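Fugu-MT: Translations of arXiv Papers

This site publishes Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, or CC BY-SA). For papers whose body text is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any outcome arising from its use (including all information and translations).

The papers listed here have a publication date of 2024-07-12.

Title	Authors	Abstract	Publication date / Translation date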
# A Survey on Symbolic Knowledge Distillation of Large Language Models ( http://arxiv.org/abs/2408.10210v1 ) License: see linked page	Kamal Acharya, Alvaro Velasquez, Houbing Herbert Song	This survey paper delves into the emerging and critical area of symbolic knowledge distillation in Large Language Models (LLMs). As LLMs like Generative Pre-trained Transformer-3 (GPT-3) and Bidirectional Encoder Representations from Transformers (BERT) continue to expand in scale and complexity, the challenge of effectively harnessing their extensive knowledge becomes paramount. This survey concentrates on the process of distilling the intricate, often implicit knowledge contained within these models into a more symbolic, explicit form. This transformation is crucial for enhancing the interpretability, efficiency, and applicability of LLMs. We categorize the existing research based on methodologies and applications, focusing on how symbolic knowledge distillation can be used to improve the transparency and functionality of smaller, more efficient Artificial Intelligence (AI) models. The survey discusses the core challenges, including maintaining the depth of knowledge in a comprehensible format, and explores the various approaches and techniques that have been developed in this field. We identify gaps in current research and potential opportunities for future advancements. This survey aims to provide a comprehensive overview of symbolic knowledge distillation in LLMs, spotlighting its significance in the progression towards more accessible and efficient AI systems.	Translated: 2024-11-08 06:44:48 Published: 2024-07-12
# Predicting Winning Captions for Weekly New Yorker Comics ( http://arxiv.org/abs/2407.18949v1 ) License: see linked page	Stanley Cao, Sonny Young	Image captioning using Vision Transformers (ViTs) represents a pivotal convergence of computer vision and natural language processing, offering the potential to enhance user experiences, improve accessibility, and provide textual representations of visual data. This paper explores the application of image captioning techniques to New Yorker cartoons, aiming to generate captions that emulate the wit and humor of winning entries in the New Yorker Cartoon Caption Contest. This task necessitates sophisticated visual and linguistic processing, along with an understanding of cultural nuances and humor. We propose several new baselines for using vision transformer encoder-decoder models to generate captions for the New Yorker Cartoon Caption Contest.	Translated: 2024-08-05 01:06:22 Published: 2024-07-12
# Unexplainability of Artificial Intelligence Judgments in Kant's Perspective ( http://arxiv.org/abs/2407.18950v1 ) License: see linked page	Jongwoo Seo	Kant's Critique of Pure Reason, a major contribution to the history of epistemology, proposes a table of categories to elucidate the structure of the a priori principle of human judgment. The technology of artificial intelligence (AI), based on functionalism, claims to simulate or replicate human judgment. To assess this claim, it is necessary to study whether AI judgment possesses the characteristics of human judgment. This paper argues that AI judgments exhibit a form that cannot be understood in terms of the characteristics of human judgments according to Kant. Because the characteristics of judgment overlap, we can call this AI's uncertainty. Then, I show that concepts without physical intuitions are not easy to explain when their functions are shown through vision. Finally, I illustrate that even if AI makes sentences through subject and predicate in natural language, which are components of judgment, it is difficult to determine whether AI understands the concepts to the level humans can accept. This shows that it is questionable whether the explanation through natural language is reliable.	Translated: 2024-08-05 01:06:22 Published: 2024-07-12
# Photogrammetry for Digital Twinning Industry 4.0 (I4) Systems ( http://arxiv.org/abs/2407.18951v1 ) License: see linked page	Ahmed Alhamadah, Muntasir Mamun, Henry Harms, Mathew Redondo, Yu-Zheng Lin, Jesus Pacheco, Soheil Salehi, Pratik Satam	The onset of Industry 4.0 is rapidly transforming the manufacturing world through the integration of cloud computing, machine learning (ML), artificial intelligence (AI), and universal network connectivity, resulting in performance optimization and increased productivity. Digital Twins (DT) are one such transformational technology that leverages software systems to replicate physical process behavior, representing the physical process in a digital environment. This paper aims to explore the use of photogrammetry (which is the process of reconstructing physical objects into virtual 3D models using photographs) and 3D scanning techniques to create an accurate visual representation of the 'Physical Process', to interact with the ML/AI based behavior models. To achieve this, we have used a readily available consumer device, the iPhone 15 Pro, which features stereo vision capabilities, to capture the depth of an Industry 4.0 system. By processing these images using 3D scanning tools, we created a raw 3D model for 3D modeling and rendering software for the creation of a DT model. The paper highlights the reliability of this method by measuring the error rate between the ground truth (measurements done manually using a tape measure) and the final 3D model created using this method. The overall mean error is 4.97% and the overall standard deviation error is 5.54% between the ground truth measurements and their photogrammetry counterparts. The results from this work indicate that photogrammetry using consumer-grade devices can be an efficient and cost-efficient approach to creating DTs for smart manufacturing, while the approach's flexibility allows for iterative improvements of the models over time.	Translated: 2024-08-05 01:06:22 Published: 2024-07-12
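The accuracy figures in this abstract compare tape-measure ground truth with lengths taken from the reconstructed 3D model. A minimal sketch of such a comparison follows. It assumes an absolute percent error per measurement and a sample standard deviation; the abstract states neither choice. The function name `percent_errors` and all length values are hypothetical and are not data from the paper.

```python
import statistics


def percent_errors(ground_truth, measured):
    """Absolute percent error of each measured length against its ground truth."""
    return [abs(m - g) * 100 / g for g, m in zip(ground_truth, measured)]


# Hypothetical lengths in centimetres (illustrative only, not from the paper).
tape_measure_cm = [100.0, 50.0, 20.0]
model_cm = [104.0, 48.0, 21.0]

errors = percent_errors(tape_measure_cm, model_cm)
mean_error = statistics.mean(errors)
std_error = statistics.stdev(errors)  # sample standard deviation (an assumption)

print(f"per-measurement errors (%): {errors}")
print(f"mean error: {mean_error:.2f}%, standard deviation: {std_error:.2f}%")
```

Reporting the spread alongside the mean matters here: a standard deviation close to the mean, as in the abstract, suggests that some individual dimensions deviate far more than the average figure implies.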
# Real Face Video Animation Platform ( http://arxiv.org/abs/2407.18955v1 ) License: see linked page	Xiaokai Chen, Xuan Liu, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su	In recent years, facial video generation models have gained popularity. However, these models often lack expressive power when dealing with exaggerated anime-style faces due to the absence of high-quality anime-style face training sets. We propose a facial animation platform that enables real-time conversion from real human faces to cartoon-style faces, supporting multiple models. Built on the Gradio framework, our platform ensures excellent interactivity and user-friendliness. Users can input a real face video or image and select their desired cartoon style. The system will then automatically analyze facial features, execute necessary preprocessing, and invoke appropriate models to generate expressive anime-style faces. We employ a variety of models within our system to process the HDTF dataset, thereby creating an animated facial video dataset.	Translated: 2024-08-05 01:06:22 Published: 2024-07-12
# Exploring Factors Affecting Student Learning Satisfaction during COVID-19 in South Korea ( http://arxiv.org/abs/2407.20234v1 ) License: see linked page	Jiwon Han, Chaeeun Ryu, Gayathri Nadarajan	Understanding students' preferences and learning satisfaction during COVID-19 has focused on learning attributes such as self-efficacy, performance, and engagement. Although existing efforts have constructed statistical models capable of accurately identifying significant factors impacting learning satisfaction, they do not necessarily explain the complex relationships among these factors in depth. This study aimed to understand several facets related to student learning preferences and satisfaction during the pandemic such as individual learner characteristics, instructional design elements and social and environmental factors. Responses from 302 students from Sungkyunkwan University, South Korea were collected between 2021 and 2022. Information gathered included their gender, study major, satisfaction and motivation levels when learning, perceived performance, emotional state and learning environment. Wilcoxon Rank sum test and Explainable Boosting Machine (EBM) were performed to determine significant differences in specific cohorts. The two core findings of the study are as follows: 1) Using Wilcoxon Rank Sum test, we can attest with 95% confidence that students who took offline classes had significantly higher learning satisfaction, among other attributes, than those who took online classes, as with STEM versus HASS students; 2) An explainable boosting machine (EBM) model fitted to 95.08% accuracy determined the top five factors affecting students' learning satisfaction as their perceived performance, their perception on participating in class activities, their study majors, their ability to conduct discussions in class and the study space availability at home. Positive perceived performance and ability to discuss with classmates had a positive impact on learning satisfaction, while negative perception on class activities participation had a negative impact on learning satisfaction.	Translated: 2024-08-05 00:56:24 Published: 2024-07-12
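The cohort comparison in this abstract rests on the Wilcoxon rank-sum test. A minimal standard-library sketch of that test follows, assuming the normal approximation, average ranks for ties, and no tie correction in the variance. The function name `rank_sum_test` and the 1-to-5 satisfaction scores are hypothetical and are not taken from the study. With samples this small and this heavily tied, the p-value is only a rough guide.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (W, z, p), where W is the sum of the ranks of sample_a.
    Tied values receive the average of the ranks they span; the variance
    is not corrected for ties, which keeps the sketch short.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((value, group)
                    for group, sample in enumerate((sample_a, sample_b))
                    for value in sample)

    # Assign 1-based ranks, averaging over runs of equal values.
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[k] = average_rank
        start = end + 1

    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    expected_w = n_a * (n_a + n_b + 1) / 2
    std_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - expected_w) / std_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, z, p_value


# Hypothetical 1-5 satisfaction scores (illustrative only, not study data).
offline_scores = [4, 5, 5, 4, 5, 3, 5, 4]
online_scores = [3, 2, 4, 3, 2, 3, 4, 2]

w, z, p_value = rank_sum_test(offline_scores, online_scores)
print(f"W = {w}, z = {z:.2f}, p = {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant")
```

A rank-based test suits ordinal survey responses because it compares orderings and does not assume that scores are normally distributed.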
# Neural-based Video Compression on Solar Dynamics Observatory Images ( http://arxiv.org/abs/2407.15730v1 ) License: see linked page	Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva	NASA's Solar Dynamics Observatory (SDO) mission collects extensive data to monitor the Sun's daily activity. In the realm of space mission design, data compression plays a crucial role in addressing the challenges posed by limited telemetry rates. The primary objective of data compression is to facilitate efficient data management and transmission to work within the constrained bandwidth, thereby ensuring that essential information is captured while optimizing the utilization of available resources. This paper introduces a neural video compression technique that achieves a high compression ratio for the SDO's image data collection. The proposed approach focuses on leveraging both temporal and spatial redundancies in the data, leading to a more efficient compression. In this work, we introduce an architecture based on the Transformer model, which is specifically designed to capture both local and global information from input images in an effective and efficient manner. Additionally, our network is equipped with an entropy model that can accurately model the probability distribution of the latent representations and improves the speed of the entropy decoding step. The entropy model leverages a channel-dependent approach and utilizes checkerboard-shaped local and global spatial contexts. By combining the Transformer-based video compression network with our entropy model, the proposed compression algorithm demonstrates superior performance over traditional video codecs like H.264 and H.265, as confirmed by our experimental results.	Translated: 2024-07-28 18:29:13 Published: 2024-07-12
# Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models ( http://arxiv.org/abs/2407.12863v1 ) License: see linked page	Jung Hyun Lee, June Yong Yang, Byeongho Heo, Dongyoon Han, Kang Min Yoo	Large Language Models (LLMs) have demonstrated impressive problem-solving capabilities in mathematics through step-by-step reasoning chains. However, they are susceptible to reasoning errors that impact the quality of subsequent reasoning chains and the final answer due to language models' autoregressive token-by-token generating nature. Recent works have proposed adopting external verifiers to guide the generation of reasoning paths, but existing works utilize models that have been trained with step-by-step labels to assess the correctness of token-by-token reasoning chains. Consequently, they struggle to recognize discriminative details of tokens within a reasoning path and lack the ability to evaluate whether an intermediate reasoning path is on a promising track toward the correct final answer. To amend the lack of sound and token-grained math-verification signals, we devise a novel training scheme for verifiers that apply token-level supervision with the expected cumulative reward (i.e., value). Furthermore, we propose a practical formulation of the cumulative reward by reducing it to finding the probability of future correctness of the final answer and thereby enabling the empirical estimation of the value. Experimental results on mathematical reasoning benchmarks show that Token-Supervised Value Model (TVM) can outperform step-by-step verifiers on GSM8K and MATH with Mistral and Llama.	Translated: 2024-07-19 20:02:37 Published: 2024-07-12
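The abstract reduces the value of a partial reasoning path to the probability that it ends in a correct final answer, which can be estimated by counting. The sketch below illustrates that counting idea only; it is not the authors' training procedure. It assumes the value of a token prefix is the fraction of sampled paths sharing that prefix whose final answer is correct. The function name `empirical_token_values` and the toy token sequences are hypothetical.

```python
from collections import defaultdict


def empirical_token_values(sampled_paths):
    """Estimate a value for every token prefix seen in the sampled paths.

    sampled_paths is a list of (tokens, is_correct) pairs. The value of a
    prefix is the share of sampled paths starting with that prefix whose
    final answer was correct.
    """
    counts = defaultdict(lambda: [0, 0])  # prefix -> [correct, total]
    for tokens, is_correct in sampled_paths:
        for length in range(1, len(tokens) + 1):
            prefix = tuple(tokens[:length])
            counts[prefix][1] += 1
            if is_correct:
                counts[prefix][0] += 1
    return {prefix: correct / total
            for prefix, (correct, total) in counts.items()}


# Hypothetical sampled reasoning paths for the question "2 + 3 = ?".
sampled_paths = [
    (["2", "+", "3", "=", "5"], True),
    (["2", "+", "3", "=", "6"], False),
    (["2", "*", "3", "=", "5"], False),
    (["2", "+", "3", "=", "5"], True),
]

values = empirical_token_values(sampled_paths)
for prefix in [("2",), ("2", "+"), ("2", "*")]:
    print(" ".join(prefix), "->", round(values[prefix], 3))
```

Per-token targets of this kind let a verifier separate a promising prefix such as `2 +` from a doomed one such as `2 *` before the path is complete, which a single label per reasoning step cannot express.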
# Clustering Time-Evolving Networks Using the Dynamic Graph Laplacian ( http://arxiv.org/abs/2407.12864v1 ) License: see linked page	Maia Trower, Nataša Djurdjevac Conrad, Stefan Klus	Time-evolving graphs arise frequently when modeling complex dynamical systems such as social networks, traffic flow, and biological processes. Developing techniques to identify and analyze communities in these time-varying graph structures is an important challenge. In this work, we generalize existing spectral clustering algorithms from static to dynamic graphs using canonical correlation analysis (CCA) to capture the temporal evolution of clusters. Based on this extended canonical correlation framework, we define the dynamic graph Laplacian and investigate its spectral properties. We connect these concepts to dynamical systems theory via transfer operators, and illustrate the advantages of our method on benchmark graphs by comparison with existing methods. We show that the dynamic graph Laplacian allows for a clear interpretation of cluster structure evolution over time for directed and undirected graphs.	Translated: 2024-07-19 20:02:37 Published: 2024-07-12
# GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering ( http://arxiv.org/abs/2407.12865v1 ) License: see linked page	Derek Austin, Elliott Chartock	Prompt engineering for large language models (LLMs) is often a manual, time-intensive process that involves generating, evaluating, and refining prompts iteratively to ensure high-quality outputs. While there has been work on automating prompt engineering, the solutions generally are either tuned to specific tasks with given answers or are quite costly. We introduce GRAD-SUM, a scalable and flexible method for automatic prompt engineering that builds on gradient-based optimization techniques. Our approach incorporates user-defined task descriptions and evaluation criteria, and features a novel gradient summarization module to generalize feedback effectively. Our results demonstrate that GRAD-SUM consistently outperforms existing methods across various benchmarks, highlighting its versatility and effectiveness in automatic prompt optimization.	Translated: 2024-07-19 20:02:37 Published: 2024-07-12
# milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing ( http://arxiv.org/abs/2306.17010v8 ) License: see linked page	Fangqiang Ding, Zhen Luo, Peijun Zhao, Chris Xiaoxuan Lu	Human motion sensing plays a crucial role in smart systems for decision-making, user interaction, and personalized services. Extensive research that has been conducted is predominantly based on cameras, whose intrusive nature limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning approach to estimate scene flow as complementary motion information for mmWave point cloud, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method when compared with the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition and human parsing and support human body part tracking. Code and dataset are available at https://github.com/Toytiny/milliFlow.	Translated: 2024-07-18 00:10:39 Published: 2024-07-12
# ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers ( http://arxiv.org/abs/2407.11065v1 ) License: see linked page	Ding Zhu, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili	Cardiovascular disease is a major life-threatening condition that is commonly monitored using electrocardiogram (ECG) signals. However, these signals are often contaminated by various types of noise at different intensities, significantly interfering with downstream tasks. Therefore, denoising ECG signals and increasing the signal-to-noise ratio is crucial for cardiovascular monitoring. In this paper, we propose a deep learning method that combines a one-dimensional convolutional layer with a transformer architecture for denoising ECG signals. The convolutional layer processes the ECG signal by various kernel/patch sizes and generates an embedding called multi-scale patch embedding. The embedding is then used as the input of a transformer network and enhances the capability of the transformer for denoising the ECG signal.	Translated: 2024-07-17 20:10:21 Published: 2024-07-12
# Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay ( http://arxiv.org/abs/2407.11068v1 ) License: see linked page	Gonçalo Hora de Carvalho, Robert Pollice, Oscar Knap	We explore the hypothesis that LLMs, such as GPT-3.5 and GPT-4, possess broader cognitive functions, particularly in non-linguistic domains. Our approach extends beyond standard linguistic benchmarks by incorporating games like Tic-Tac-Toe, Connect Four, and Battleship, encoded via ASCII, to assess strategic thinking and decision-making. To evaluate the models' ability to generalize beyond their training data, we introduce two additional games. The first game, LEGO Connect Language (LCL), tests the models' capacity to understand spatial logic and follow assembly instructions. The second game, the game of shapes, challenges the models to identify shapes represented by 1s within a matrix of zeros, further testing their spatial reasoning skills. This "show, don't tell" strategy uses games instead of simply querying the models. Our results show that despite their proficiency on standard benchmarks, GPT-3.5 and GPT-4's abilities to play and reason about fully observable games without pre-training are mediocre. Both models fail to anticipate losing moves in Tic-Tac-Toe and Connect Four, and they are unable to play Battleship correctly. While GPT-4 shows some success in the game of shapes, both models fail at the assembly tasks presented in the LCL game. These results suggest that while GPT models can emulate conversational proficiency and basic rule comprehension, their performance in strategic gameplay and spatial reasoning tasks is very limited. Importantly, this reveals a blind spot in current LLM benchmarks that we highlight with our gameplay benchmark suite ChildPlay (https://github.com/child-play-neurips/child-play). Our findings provide a cautionary tale about claims of emergent intelligence and reasoning capabilities of LLMs that are roughly the size of GPT-3.5 and GPT-4.	Translated: 2024-07-17 20:10:21 Published: 2024-07-12
# Combining Federated Learning and Control: A Survey ( http://arxiv.org/abs/2407.11069v1 ) License: see linked page	Jakob Weber, Markus Gurtner, Amadeus Lobe, Adrian Trachte, Andreas Kugi	This survey provides an overview of combining Federated Learning (FL) and control to enhance adaptability, scalability, generalization, and privacy in (nonlinear) control applications. Traditional control methods rely on controller design models, but real-world scenarios often require online model retuning or learning. FL offers a distributed approach to model training, enabling collaborative learning across distributed devices while preserving data privacy. By keeping data localized, FL mitigates concerns regarding privacy and security while reducing network bandwidth requirements for communication. This survey summarizes the state-of-the-art concepts and ideas of combining FL and control. The methodical benefits are further discussed, culminating in a detailed overview of expected applications, from dynamical system modeling over controller design, focusing on adaptive control, to knowledge transfer in multi-agent decision-making systems.	Translated: 2024-07-17 20:10:21 Published: 2024-07-12
# Optimal Defender Strategies for CAGE-2 using Causal Modeling and Tree Search ( http://arxiv.org/abs/2407.11070v1 ) License: see linked page	Kim Hammar, Neil Dhir, Rolf Stadler	The CAGE-2 challenge is considered a standard benchmark to compare methods for autonomous cyber defense. Current state-of-the-art methods evaluated against this benchmark are based on model-free (offline) reinforcement learning, which does not provide provably optimal defender strategies. We address this limitation and present a formal (causal) model of CAGE-2 together with a method that produces a provably optimal defender strategy, which we call Causal Partially Observable Monte-Carlo Planning (C-POMCP). It has two key properties. First, it incorporates the causal structure of the target system, i.e., the causal relationships among the system variables. This structure allows for a significant reduction of the search space of defender strategies. Second, it is an online method that updates the defender strategy at each time step via tree search. Evaluations against the CAGE-2 benchmark show that C-POMCP achieves state-of-the-art performance with respect to effectiveness and is two orders of magnitude more efficient in computing time than the closest competitor method.	Translated: 2024-07-17 20:00:37 Published: 2024-07-12
# MonoSparse-CAM: Harnessing Monotonicity and Sparsity for Enhanced Tree Model Processing on CAMs ( http://arxiv.org/abs/2407.11071v1 ) License: see linked page	Tergel Molom-Ochir, Brady Taylor, Hai Li, Yiran Chen	Despite significant advancements in AI driven by neural networks, tree-based machine learning (TBML) models excel on tabular data. These models exhibit promising energy efficiency and high performance, particularly when accelerated on analog content-addressable memory (aCAM) arrays. However, optimizing their hardware deployment, especially in leveraging TBML model structure and aCAM circuitry, remains challenging. In this paper, we introduce MonoSparse-CAM, a novel content-addressable memory (CAM) based computing optimization technique. MonoSparse-CAM efficiently leverages TBML model sparsity and CAM array circuits, enhancing processing performance. Our experiments show that MonoSparse-CAM reduces energy consumption by up to 28.56x compared to raw processing and 18.51x compared to existing deployment optimization techniques. Additionally, it consistently achieves at least 1.68x computational efficiency over current methods. By enabling energy-efficient CAM-based computing while preserving performance regardless of the array sparsity, MonoSparse-CAM addresses the high energy consumption problem of CAM which hinders processing of large arrays. Our contributions are twofold: we propose MonoSparse-CAM as an effective deployment optimization solution for CAM-based computing, and we investigate the impact of TBML model structure on array sparsity. This work provides crucial insights for energy-efficient TBML on hardware, highlighting a significant advancement in sustainable AI technologies.	Translated: 2024-07-17 20:00:37 Published: 2024-07-12
# MaPPing your model: Assess the Impact of Adversarial Attacks on LLM-based Programming Assistants MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants ( http://arxiv.org/abs/2407.11072v1 ) ライセンス: Link先を確認	John Heibel, Daniel Lowd,	(参考訳) LLMベースのプログラミングアシスタントは、より高速なプログラミングを約束するが、より多くのセキュリティ脆弱性を導入するリスクがある。以前の研究は、LSMがより頻繁に脆弱性を提案するために、どのように悪質に微調整されるかを研究していた。信頼できない第三者による結果を利用するエージェントLSMの台頭により、モデルのプロンプトに対する攻撃のリスクが増大する。攻撃者はプログラムタスク(500バイト以下)のプロンプトに少量のテキストを追加する。我々の迅速な戦略は、LSMが他の方法で正しいコードを書き続けながら脆弱性を追加する可能性があることを示しています。我々は,基本から最先端の商用モデルに至るまで,7つの共通LLM上での3つのプロンプトを評価する。 HumanEval ベンチマークを用いて、我々のプロンプトは広範囲に効果があり、異なる LLM のカスタマイズは不要である。さらに、HumanEval で最高の LLM もまた、悪意のある命令に従うのに最適であり、単に言語モデルのスケーリングが MaPP 攻撃を防ぐことはないことを示唆している。 16のシナリオで8つのCWEのデータセットを使用することで、MaPP攻撃は、さまざまなモデルにまたがって特定の脆弱性やターゲットとする脆弱性を実装するのにも有効であることがわかった。我々の研究は、LLMの助けを借りて生成されたコードを厳格に監査するだけでなく、操作に対するLLMプロンプトの確保の必要性を強調している。 LLM-based programming assistants offer the promise of programming faster but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text to a prompt for a programming task (under 500 bytes). We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. 
Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation as well as rigorously auditing code generated with the help of LLMs.	翻訳日:2024-07-17 20:00:37 公開日:2024-07-12
# 画像間変換のための非対称GAN Asymmetric GANs for Image-to-Image Translation ( http://arxiv.org/abs/1912.06931v2 ) ライセンス: Link先を確認	Hao Tang, Nicu Sebe,	(参考訳) GAN(Generative Adversarial Networks)による教師なし画像翻訳の既存のモデルは、サイクル一貫性損失を用いて、ソースドメインからターゲットドメインへのマッピングを学習することができる。しかし、これらの手法は常に対称なネットワークアーキテクチャを採用し、前方と後方の両方のサイクルを学習する。ソースとターゲットドメイン間のタスク複雑性とサイクル入力差のため、双方向の前後のサイクル翻訳の不等式が重要であり、2つのドメイン間の情報量が異なる。本稿では、非対称翻訳タスクにおける既存の対称GANの制限を解析し、非対称画像翻訳タスクと教師なし画像翻訳タスクの両方において非対称的なニーズに対応するために、非対称GANモデルを提案する。さらに、既存の手法の訓練段階には、生成した画像の品質を劣化させるようなモデル崩壊の一般的な問題があり、したがって、非対称GANのトレーニングを改善するために、異なる最適化損失を探索し、一貫性と安定性を向上する。 8つのデータセットを用いた教師付きおよび教師なし生成タスクの広範な実験は、AsymmetricGANが既存のGANと比較して優れたモデルキャパシティと生成性能を達成することを示す。我々の知る限りでは、教師なしと教師なしの両方の画像翻訳タスクにおいて、非対称なGAN構造を調査するのは初めてである。 Existing models for unsupervised image translation with Generative Adversarial Networks (GANs) can learn the mapping from the source domain to the target domain using a cycle-consistency loss. However, these methods always adopt a symmetric network architecture to learn both forward and backward cycles. Because of the task complexity and cycle input difference between the source and target domains, the inequality in bidirectional forward-backward cycle translations is significant and the amount of information between two domains is different. In this paper, we analyze the limitation of existing symmetric GANs in asymmetric translation tasks, and propose an AsymmetricGAN model with both translation and reconstruction generators of unequal sizes and different parameter-sharing strategy to adapt to the asymmetric need in both unsupervised and supervised image translation tasks. Moreover, the training stage of existing methods has the common problem of model collapse that degrades the quality of the generated images, thus we explore different optimization losses for better training of AsymmetricGAN, making image translation with higher consistency and better stability. 
Extensive experiments on both supervised and unsupervised generative tasks with 8 datasets show that AsymmetricGAN achieves superior model capacity and better generation performance compared with existing GANs. To the best of our knowledge, we are the first to investigate the asymmetric GAN structure on both unsupervised and supervised image translation tasks.	翻訳日:2024-07-17 05:46:45 公開日:2024-07-12
# 量子力学を完成させる現実的モデル A realistic model for completing Quantum Mechanics ( http://arxiv.org/abs/2104.12701v5 ) ライセンス: Link先を確認	M. Baldo,	(参考訳) N. Bohr が提唱した量子力学のコペンハーゲン解釈(英語版)では、物理的対象と実験結果はマクロ言語でのみ記述でき、どんな微視的記述も説明できないままである。この見解は、量子力学のリレーショナル解釈において、C. Rovelliによってより深められた。物理現象の詳細な微視的な説明と進化を試みている他の解釈の多くは、波動関数を理論の基本要素として明らかに導入している。これらの解釈は量子状態の概念を理論の基本概念として必要としており、コペンハーゲン解釈(英語版)による典型的な説明不可能な物理要素である。 2つの基本的な物理的実体は波動関数の整合性によって密接に結びついている。これらの解釈は通常、現実的なものとして表される。物理過程の記述における波動関数の利用とその時間進化は、必然的にいくつかの困難またはいわゆるパラドックスに繋がる。測定問題は、主に量子力学の数学的形式に明示的に含まれていない波動関数の還元過程の導入を必要とするため、これらの困難の中心にある。本稿では, 標準形式を超越したモデルの構築と提案を行い, 測定問題とそれに関連する他の問題をすべて解決できるモデルを提案する。 In the well known Copenhagen interpretation of Quantum mechanics, advocated by N. Bohr, the physical objects and the experimental results can be described only in a macroscopic language, leaving any possible microscopic description as unspeakable. This point of view has been deepened by C. Rovelli in the relational interpretation of Quantum mechanics. Most of the alternative interpretations, which try a detailed microscopic description of physical phenomena and of their evolution, have in common the explicit introduction of the wave function as the basic element of the theory. These interpretations require the notion of quantum state as the fundamental concept of the theory, which is the typical unspeakable physical element according to the Copenhagen interpretation. The two basic physical entities are intimately bound together by the integrity of the wave function. These interpretations are usually indicated as realistic. It is well known that the use of the wave function and its time evolution in the description of the physical processes leads unavoidably to some difficulties or so-called paradoxes. 
The measurement problem is at the center of these difficulties, mainly because it requires the introduction of the reduction process of the wave function, which is not included explicitly within the mathematical formalism of Quantum Mechanics. In this paper we build up and propose a model which goes beyond the standard formalism and which is able to solve the measurement problem and all the other difficulties which, in a way or in another, are related to it.	翻訳日:2024-07-17 05:46:45 公開日:2024-07-12
# 冷間原子干渉計用高性能シリコンフォトニックシングルサイドバンド変調器 High-Performance Silicon Photonic Single-Sideband Modulators for Cold Atom Interferometry ( http://arxiv.org/abs/2204.12537v3 ) ライセンス: Link先を確認	Ashok Kodigala, Michael Gehl, Gregory W. Hoth, Jongmin Lee, Christopher DeRose, Andrew Pomerene, Christina Dallo, Douglas Trotter, Andrew L. Starbuck, Grant Biedermann, Peter D. D. Schwindt, Anthony L. Lentine,	(参考訳) 光パルス原子干渉計(LPAI)内の最も複雑で困難なシステムは、時間とともに複数のレーザービームの周波数と強度を制御し、量子重力と慣性センサーを構成するレーザーシステムである。 LPAIレーザーの主な機能は、低温原子生成、状態準備、状態選択検出を行い、光パルスシーケンスのためのコヒーレントな2光子過程を生成することである。レーザーシステムの重要な機能をフォトニック集積回路(PIC)に導入することにより、レーザーシステムの実質的な小型化と頑丈化を実現することができる。高性能シリコンフォトニック抑圧型シングルサイドバンド (SC-SSB) 変調器を1560nmで実証し, LPAI内で動的に周波数シフトできることを示した。 RFチャネルの独立制御により、光とRFの位相/振幅のアンバランスを30dBキャリア圧縮、ピーク変換効率の47.8dBサイドバンド圧縮、最大変換効率:-6.846dB(20.7%)に到達させる。シリコンフォトニックSSB変調器を用いて、ルビジウム($^{87}$Rb)原子系において、低温原子の生成、状態選択検出、原子干渉計による重力加速度の推定、$g \approx 9.77 \pm 0.01 \,\rm{m/s^2}$を実証する。 The most complicated and challenging system within a light-pulse atom interferometer (LPAI) is the laser system, which controls the frequencies and intensities of multiple laser beams over time to configure quantum gravity and inertial sensors. The main function of an LPAI laser system is to perform cold-atom generation, state-preparation, state-selective detection and to generate coherent two-photon process for the light-pulse sequence. Substantial miniaturization and ruggedization of the laser system can be achieved by bringing most key functions of the laser system onto photonic integrated circuit (PIC). We demonstrate a high-performance silicon photonic suppressed-carrier single-sideband (SC-SSB) modulator at 1560 nm, which can dynamically frequency shift within the LPAI. With independent RF-channel control, we study the imbalances in both the optical and RF phases/amplitudes to reach 30 dB carrier-suppression, unprecedented 47.8 dB sideband-suppression at peak conversion-efficiency: -6.846 dB (20.7 %). 
Using a silicon photonic SSB-modulator, we demonstrate cold-atom generation, state-selective detection, and atom interferometer fringes to estimate gravitational acceleration, $g \approx 9.77 \pm 0.01 \,\rm{m/s^2}$, in a Rubidium ($^{87}$Rb) atom system.	翻訳日:2024-07-17 05:46:45 公開日:2024-07-12
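The abstract above quotes the peak conversion efficiency both in decibels (-6.846 dB) and as a percentage (20.7 %). As a quick sanity check, the two figures can be related with the standard power-ratio definition of the decibel. The helper names `db_to_ratio` and `ratio_to_db` are illustrative only and do not come from the paper.

```python
import math


def db_to_ratio(db: float) -> float:
    """Convert a power level in decibels to a linear power ratio."""
    return 10 ** (db / 10)


def ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(ratio)


# Peak conversion efficiency quoted in the abstract: -6.846 dB.
efficiency = db_to_ratio(-6.846)
print(f"conversion efficiency: {efficiency:.1%}")

# A 30 dB carrier suppression means the carrier power is 1/1000 of the reference.
print(f"30 dB suppression ratio: {db_to_ratio(-30):.4f}")
```

Under this definition, -6.846 dB works out to roughly 0.207, which agrees with the 20.7 % stated in the abstract.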
# ダイナミックリレーショナルデータのためのファクトリー型核融合収縮 Factorized Fusion Shrinkage for Dynamic Relational Data ( http://arxiv.org/abs/2210.00091v3 ) ライセンス: Link先を確認	Peng Zhao, Anirban Bhattacharya, Debdeep Pati, Bani K. Mallick,	(参考訳) 現代のデータサイエンスの応用は、しばしば動的構造を持つ複雑な関係データを含む。このようなダイナミックリレーショナルデータの急激な変化は、通常、介入によって状態が変化するシステムで観察される。このような場合、分解されたすべての因子がグループ単位の融合構造に対して動的に縮小される分解された融合収縮モデルを考え、分解された行列の行ベクトルの連続的な違いに先立って、グローバル局所的な収縮を適用して収縮を得る。提案手法は、推定された動的潜在因子の比較とクラスタリングにおいて、多くの好ましい特性を享受する。推定潜在因子の比較には、隣接および長期の比較の両方が関係し、比較の時間範囲は変数と見なされる。一定の条件下では、後続分布が対数係数まで最小値の最適値を達成することを示す。計算量の観点からは、最適後部推論と計算スケーラビリティのバランスを保ち、コンポーネント間の依存性と時間的依存性を両立させる構造的平均場変動推論フレームワークを提案する。このフレームワークは、動的行列分解、ネットワークの潜在空間モデル、低ランクテンソルなど、様々なモデルに対応できる。本手法の有効性は,広範囲なシミュレーションと実世界のデータ解析によって実証される。 Modern data science applications often involve complex relational data with dynamic structures. An abrupt change in such dynamic relational data is typically observed in systems that undergo regime changes due to interventions. In such a case, we consider a factorized fusion shrinkage model in which all decomposed factors are dynamically shrunk towards group-wise fusion structures, where the shrinkage is obtained by applying global-local shrinkage priors to the successive differences of the row vectors of the factorized matrices. The proposed priors enjoy many favorable properties in comparison and clustering of the estimated dynamic latent factors. Comparing estimated latent factors involves both adjacent and long-term comparisons, with the time range of comparison considered as a variable. Under certain conditions, we demonstrate that the posterior distribution attains the minimax optimal rate up to logarithmic factors. In terms of computation, we present a structured mean-field variational inference framework that balances optimal posterior inference with computational scalability, exploiting both the dependence among components and across time. 
The framework can accommodate a wide variety of models, including dynamic matrix factorization, latent space models for networks and low-rank tensors. The effectiveness of our methodology is demonstrated through extensive simulations and real-world data analysis.	翻訳日:2024-07-17 05:38:07 公開日:2024-07-12
# 大規模な差分プライベートストリーム処理 Differentially Private Stream Processing at Scale ( http://arxiv.org/abs/2303.18086v3 ) ライセンス: Link先を確認	Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek,	(参考訳) 我々は、私たちの知る限り、最初の大規模な差分プライベート(DP)ストリーム集約処理システムを設計する。当社のシステムであるDifferential Privacy SQL Pipelines (DP-SQLP) は、Sparkストリーミングに似たストリーミングフレームワークを使用して構築されており、GoogleのSpannerデータベースとF1クエリエンジン上に構築されています。 DP-SQLPの設計に向けて,アルゴリズムとシステムの両面で進歩を図った。すなわち、(i) 使用可能なキーの非有界集合上で動作し、ユーザがコントリビュートした10億個のキーまで拡張できる新規の(ユーザレベルの)DPキー選択アルゴリズムを設計し、(ii) トリガー時間毎に全てのキーを列挙しないDPキー選択のプリエンプティブ実行方式を設計し、(iii) DP連続観測のアルゴリズム的手法を用いて、ストリーム全体にわたる異なるキーに対するユーザのコントリビューションの連続DPヒストグラムを公開する。検討した有意義なベースラインと比較して誤差を少なくとも$16\times$削減し、有効性を実証的に示す。 DP-SQLPを用いてGoogle Shoppingのユーザインプレッションの差分プライベートなストリーミング処理を実装した。ストリーミングDPアルゴリズムは、Google Trendsにも適用される。 We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel (user-level) DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate the efficacy by obtaining at least $16\times$ reduction in error over meaningful baselines we consider. We implemented a streaming differentially private user impressions for Google Shopping with DP-SQLP. 
The streaming DP algorithms are further applied to Google Trends.	翻訳日:2024-07-17 05:28:16 公開日:2024-07-12
# データセット著作権のための大規模言語モデルによる透かしテキストデータ Watermarking Text Data on Large Language Models for Dataset Copyright ( http://arxiv.org/abs/2305.13257v4 ) ライセンス: Link先を確認	Yixin Liu, Hongsheng Hu, Xun Chen, Xuyun Zhang, Lichao Sun,	(参考訳) 現状の研究では、大規模なコーパス上の深層モデル(例えば、事前訓練されたモデル)が、下流のNLPタスクに有用な普遍言語表現を学習できることが示されている。しかしながら、これらの強力なモデルはさまざまなプライバシ攻撃にも脆弱であり、トレーニングデータセットには多くの機密情報が存在している。攻撃者は公共のモデル、例えば個人のメールアドレスや電話番号から容易に機密情報を盗むことができる。このような問題,特に未許可のプライベートデータの利用に対処するために,テキストマーカというバックドアベースのメンバシップ推論手法を用いて,トレーニング用テキストデータに埋め込まれたさまざまな形式のプライベート情報を保護できる新しい透かし技術を導入する。具体的には、TextMarkerはデータ所有者に対して、ターゲットモデルに対するブラックボックスアクセス仮定の下で、データ著作権保護のための少数のサンプルをマークすることのみを要求する。各種実世界のデータセットに対するTextMarkerの有効性を示す。例えば、トレーニングデータセットの0.1%しかマークしていないことは、モデルユーティリティに無視できる効果を持つ効果的なメンバーシップ推論に十分である。また、潜在的な対策について議論し、TextMarkerがそれらをバイパスするのに十分なステルス性を示している。 Substantial research works have shown that deep models, e.g., pre-trained models, on the large corpus can learn universal language representations, which are beneficial for downstream NLP tasks. However, these powerful models are also vulnerable to various privacy attacks, while much sensitive information exists in the training dataset. The attacker can easily steal sensitive information from public models, e.g., individuals' email addresses and phone numbers. In an attempt to address these issues, particularly the unauthorized use of private data, we introduce a novel watermarking technique via a backdoor-based membership inference approach named TextMarker, which can safeguard diverse forms of private information embedded in the training text data. Specifically, TextMarker only requires data owners to mark a small number of samples for data copyright protection under the black-box access assumption to the target model. Through extensive evaluation, we demonstrate the effectiveness of TextMarker on various real-world datasets, e.g., marking only 0.1% of the training dataset is practically sufficient for effective membership inference with negligible effect on model utility. 
We also discuss potential countermeasures and show that TextMarker is stealthy enough to bypass them.	翻訳日:2024-07-17 05:18:31 公開日:2024-07-12
# Equivariant vs. Invariant Layers: ポイントクラウド分類のためのバックボーンとプールの比較 Equivariant vs. Invariant Layers: A Comparison of Backbone and Pooling for Point Cloud Classification ( http://arxiv.org/abs/2306.05553v2 ) ライセンス: Link先を確認	Abihith Kothapalli, Ashkan Shahbazi, Xinran Liu, Robert Sheng, Soheil Kolouri,	(参考訳) ポイントクラウドのようなセット構造化データから学ぶことは、機械学習コミュニティから大きな注目を集めている。幾何学的深層学習は、集合構造データの置換対称性を保持する効果的な集合ニューラルネットワークを設計するための青写真を提供する。我々の関心は、置換不変ネットワークであり、置換同変バックボーン、置換不変大域プール、回帰/分類ヘッドで構成されている。既存の文献では、同変バックボーンの改善に焦点が当てられているが、プーリング層の影響はしばしば見過ごされている。本稿では,3つのベンチマークポイントクラウド分類データセット上での置換同変バックボーンと置換不変大域プールの相互作用について検討する。私たちの発見は、こう示しています。 1) トランスポートベースやアテンションベースといった複雑なプーリング手法は, 単純なバックボーンの性能を著しく向上させるが, より複雑なバックボーンではメリットが低下する。 2) 複雑なバックボーンでさえ、データが少ないシナリオではプーリング層の恩恵を受ける。 3) 驚くべきことに、プーリング層の選択は、バックボーンの幅と深さを調整するよりも、モデルの性能に顕著な影響を与える可能性がある。 4) プーリング層のペアワイズな組み合わせは、固定バックボーンの性能を著しく向上させることができる。我々の総合的な研究は、実践者がより優れた置換不変集合ニューラルネットワークを設計するための洞察を提供する。私たちのコードは https://github.com/mint-vu/backbone_vs_pooling で利用可能です。 Learning from set-structured data, such as point clouds, has gained significant attention from the machine learning community. Geometric deep learning provides a blueprint for designing effective set neural networks that preserve the permutation symmetry of set-structured data. Of our interest are permutation invariant networks, which are composed of a permutation equivariant backbone, permutation invariant global pooling, and regression/classification head. While existing literature has focused on improving equivariant backbones, the impact of the pooling layer is often overlooked. In this paper, we examine the interplay between permutation equivariant backbones and permutation invariant global pooling on three benchmark point cloud classification datasets. 
Our findings reveal that: 1) complex pooling methods, such as transport-based or attention-based poolings, can significantly boost the performance of simple backbones, but the benefits diminish for more complex backbones, 2) even complex backbones can benefit from pooling layers in low data scenarios, 3) surprisingly, the choice of pooling layers can have a more significant impact on the model's performance than adjusting the width and depth of the backbone, and 4) pairwise combination of pooling layers can significantly improve the performance of a fixed backbone. Our comprehensive study provides insights for practitioners to design better permutation invariant set neural networks. Our code is available at https://github.com/mint-vu/backbone_vs_pooling.	翻訳日:2024-07-17 05:18:31 公開日:2024-07-12
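To make the terms in the abstract above concrete, here is a minimal sketch of a permutation-equivariant backbone followed by permutation-invariant global pooling. Everything in it is a toy assumption for illustration (the `backbone` map, mean/max pooling, and concatenation as one possible "pairwise combination"); it is not the architecture or the pooling methods evaluated in the paper.

```python
import random


def backbone(points):
    """Toy permutation-equivariant backbone: the same map is applied to every point,
    so reordering the input only reorders the output features."""
    return [(x + y + z, x * y - z) for x, y, z in points]


def mean_pool(features):
    """Permutation-invariant pooling: per-channel mean over the set."""
    n = len(features)
    return tuple(sum(channel) / n for channel in zip(*features))


def max_pool(features):
    """Permutation-invariant pooling: per-channel maximum over the set."""
    return tuple(max(channel) for channel in zip(*features))


def combined_pool(features):
    """One hypothetical pairwise combination: concatenate mean and max pooling."""
    return mean_pool(features) + max_pool(features)


# Integer coordinates keep the sums exact, so equality checks are safe.
cloud = [(1, 2, 3), (4, 0, -1), (2, 2, 2), (0, 5, 1)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

# Equivariance: per-point features are the same multiset, possibly reordered.
assert sorted(backbone(shuffled)) == sorted(backbone(cloud))
# Invariance: the pooled set embedding ignores the point order entirely.
assert combined_pool(backbone(shuffled)) == combined_pool(backbone(cloud))
```

A regression or classification head would then consume the pooled vector, which is why the whole network is invariant to the order of the points.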
# 大規模言語モデルにおけるバイアスと公正性:調査 Bias and Fairness in Large Language Models: A Survey ( http://arxiv.org/abs/2309.00770v3 ) ライセンス: Link先を確認	Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed,	(参考訳) 大規模言語モデル(LLM)の急速な進歩により、人間のようなテキストの処理、理解、生成が可能となり、社会領域に触れるシステムへの統合が拡大した。この成功にもかかわらず、これらのモデルは有害な社会的バイアスを学習し、永続し、増幅することができる。本稿では,LLMのバイアス評価と緩和技術に関する総合的な調査を行う。まず、自然言語処理における社会的偏見と公平性の概念を統合、形式化し、拡張し、異なる害の面を定義し、LLMの公正性を運用するためにいくつかのデシラタを導入する。次に、3つの直感的な分類法、バイアス評価のための2つの指標とデータセット、緩和のための1つを提案して、文献を統一する。バイアス評価のためのメトリクスの最初の分類法は、メトリクスと評価データセットの関係を曖昧にし、それらがモデルで運用するさまざまなレベル(埋め込み、確率、生成されたテキスト)でメトリクスを整理します。バイアス評価のためのデータセットの第2の分類法は、その構造によるデータセットを対実的な入力やプロンプトとして分類し、ターゲットとなる害や社会集団を特定します。偏差緩和技術の第3の分類法は, 事前処理, イントレーニング, イントラプロセッシング, ポストプロセッシングの介入によって, 研究動向を解明する粒度のサブカテゴリを分類する。最後に、今後の作業におけるオープンな問題と課題を特定します。近年の幅広い研究を合成し、研究者や実践者がLLMのバイアスの伝播をよりよく理解し防止できるように、既存の文献の明確なガイドを提供することを目指している。 Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. 
Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.	翻訳日:2024-07-17 04:58:50 公開日:2024-07-12
# SimNP: 神経点間の自己相似性を学習する SimNP: Learning Self-Similarity Priors Between Neural Points ( http://arxiv.org/abs/2309.03809v2 ) ライセンス: Link先を確認	Christopher Wewer, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen,	(参考訳) 既存の3Dオブジェクト再構成のためのニューラルネットワーク表現は、(1)オブジェクトレベル表現を利用するが、グローバルな潜伏符号の条件付けにより、低品質の細部に苦しむか、(2)観察を完璧に再構築することができるが、観測されていない領域を推測するためにオブジェクトレベルの事前知識を利用できないかのいずれかである。カテゴリーレベルの自己相似性を学習する手法であるSimNPを提案する。これは、ニューラルネットワークとカテゴリレベルの自己相似性表現を結合することにより、両方の世界の利点を組み合わせたものである。私たちの貢献は2倍です。 1) コヒーレント・ポイント・クラウドの概念を利用して,カテゴリーレベルでの最初の神経点表現を設計する。結果として得られる神経点放射場は、局所的に支持された対象領域に対して高いレベルの詳細を記憶する。 2) 再建過程において, 対象の未観測領域を所定の観測から導き出すことが可能な, 制約のない, 教師なしの方法で, ニューラルポイント間での情報共有を学習する。我々は、SimNPが、カテゴリレベルまたはピクセルアラインなラディアンスフィールド上に構築され、インスタンス間の意味的対応を提供しながら、対称な見えないオブジェクト領域を再構築する従来の手法よりも優れていることを示す。 Existing neural field representations for 3D object reconstruction either (1) utilize object-level representations, but suffer from low-quality details due to conditioning on a global latent code, or (2) are able to perfectly reconstruct the observations, but fail to utilize object-level prior knowledge to infer unobserved regions. We present SimNP, a method to learn category-level self-similarities, which combines the advantages of both worlds by connecting neural point radiance fields with a category-level self-similarity representation. Our contribution is two-fold. (1) We design the first neural point representation on a category level by utilizing the concept of coherent point clouds. The resulting neural point radiance fields store a high level of detail for locally supported object regions. (2) We learn how information is shared between neural points in an unconstrained and unsupervised fashion, which allows to derive unobserved regions of an object during the reconstruction process from given observations. 
We show that SimNP is able to outperform previous methods in reconstructing symmetric unseen object regions, surpassing methods that build upon category-level or pixel-aligned radiance fields, while providing semantic correspondences between instances.	翻訳日:2024-07-17 04:58:50 公開日:2024-07-12
# MEMO:大または小血管密度差を有するロバスト多モード網膜画像登録のためのデータセットと方法 MEMO: Dataset and Methods for Robust Multimodal Retinal Image Registration with Large or Small Vessel Density Differences ( http://arxiv.org/abs/2309.14550v2 ) ライセンス: Link先を確認	Chiao-Yi Wang, Faranguisse Kakhi Sadrieh, Yi-Ting Shen, Shih-En Chen, Sarah Kim, Victoria Chen, Achyut Raghavendra, Dongyi Wang, Osamah Saeedi, Yang Tao,	(参考訳) 毛細血管における網膜血流(RBF)の測定は、眼疾患の早期診断と治療のための強力なバイオマーカーとなる。しかし、キャピラリー流量を高精度で決定できる単一のモダリティは存在しない。 EMAは網膜微小血管の絶対2D RBFを測定することができ、OCTAは毛細血管の3D構造像を提供することができるため、EMAと光コヒーレンス断層血管造影(OCTA)を組み合わせることでこの目標を達成することができる。しかし、これらの2つのモード間のマルチモーダル網膜画像の登録はほとんど未発見のままである。このギャップを埋めるために、最初のパブリックマルチモーダルEMAであるMEMOとOCTA網膜画像データセットを構築した。これらのモダリティ間のマルチモーダル網膜画像登録におけるユニークな課題は、血管密度(VD)の相対的な大きな差である。この課題に対処するために,分割型ディープラーニングフレームワーク (VDD-Reg) と新しい評価指標 (MSD) を提案する。 VDD-Regはコンテナセグメンテーションモジュールと登録モジュールで構成される。船体セグメンテーションモジュールを訓練するために,教師なしと教師なしの損失を組み合わせた2段階の半教師付き学習フレームワーク(LVD-Seg)を設計した。 CF-FAデータセットを用いた)小さなVD差と大きなVD差(MEMOデータセットを用いた)の場合に,VDD-Regはベースライン法を定量的かつ定性的に上回ることを示す。さらに、VDD-Regはその精度を維持するために3つの注釈付き容器セグメンテーションマスクが必要であり、その実現可能性を示している。 The measurement of retinal blood flow (RBF) in capillaries can provide a powerful biomarker for the early diagnosis and treatment of ocular diseases. However, no single modality can determine capillary flowrates with high precision. Combining erythrocyte-mediated angiography (EMA) with optical coherence tomography angiography (OCTA) has the potential to achieve this goal, as EMA can measure the absolute 2D RBF of retinal microvasculature and OCTA can provide the 3D structural images of capillaries. However, multimodal retinal image registration between these two modalities remains largely unexplored. To fill this gap, we establish MEMO, the first public multimodal EMA and OCTA retinal image dataset. A unique challenge in multimodal retinal image registration between these modalities is the relatively large difference in vessel density (VD). 
To address this challenge, we propose a segmentation-based deep-learning framework (VDD-Reg) and a new evaluation metric (MSD), which provide robust results despite differences in vessel density. VDD-Reg consists of a vessel segmentation module and a registration module. To train the vessel segmentation module, we further designed a two-stage semi-supervised learning framework (LVD-Seg) combining supervised and unsupervised losses. We demonstrate that VDD-Reg outperforms baseline methods quantitatively and qualitatively for cases of both small VD differences (using the CF-FA dataset) and large VD differences (using our MEMO dataset). Moreover, VDD-Reg requires as few as three annotated vessel segmentation masks to maintain its accuracy, demonstrating its feasibility.	翻訳日:2024-07-17 04:48:58 公開日:2024-07-12
# VideoDirectorGPT:LLM誘導計画による連続マルチシーン映像生成 VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning ( http://arxiv.org/abs/2309.15091v2 ) ライセンス: Link先を確認	Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal,	(参考訳) 近年のテキスト・ツー・ビデオ(T2V)生成法は大きな進歩を遂げている。しかし、これらの作品の大半は、1つのイベント(すなわちシングルシーンのビデオ)の短いビデオクリップを作ることに重点を置いている。一方、最近の大規模言語モデル(LLM)は、下流のビジュアルモジュールを制御するレイアウトとプログラムを生成する能力を実証している。これらのLLMに埋め込まれた知識を時間的に一貫した長ビデオ生成に活用できるか? 本稿では,ビデオコンテンツプランニングとグラウンドドビデオ生成にLLMの知識を利用する,一貫したマルチシーンビデオ生成のための新しいフレームワークであるVideoDirectorGPTを提案する。具体的には、1つのテキストプロンプトが与えられた場合、まずビデオプランナのLCM(GPT-4)に、シーン記述、各レイアウトを持つエンティティ、各シーンの背景、エンティティの一貫性グループ化を含む「ビデオプラン」への拡張を依頼する。次に、このビデオプランでガイドされたビデオジェネレータLayout2Vidは、空間的レイアウトを明示的に制御し、複数のシーンにまたがるエンティティの時間的一貫性を保ちながら、画像レベルのアノテーションでのみ訓練することができる。実験により,本フレームワークは単一シーンと多シーンのビデオ生成におけるレイアウトと移動制御を大幅に改善し,複数シーンのビデオの一貫性を保ちながら,オープンドメインの単一シーンT2V生成におけるSOTAとの競合性能を実現した。 LLMによるレイアウト制御強度の動的調整や、ユーザが提供する画像による映像生成など、詳細なアブレーション研究により、我々のフレームワークの各コンポーネントの有効性と今後の可能性を確認することができる。 Recent text-to-video (T2V) generation methods have seen significant advancements. However, the majority of these works focus on producing short video clips of a single event (i.e., single-scene videos). Meanwhile, recent large language models (LLMs) have demonstrated their capability in generating layouts and programs to control downstream visual modules. This prompts an important question: can we leverage the knowledge embedded in these LLMs for temporally consistent long video generation? In this paper, we propose VideoDirectorGPT, a novel framework for consistent multi-scene video generation that uses the knowledge of LLMs for video content planning and grounded video generation. Specifically, given a single text prompt, we first ask our video planner LLM (GPT-4) to expand it into a 'video plan', which includes the scene descriptions, the entities with their respective layouts, the background for each scene, and consistency groupings of the entities. 
Next, guided by this video plan, our video generator, named Layout2Vid, has explicit control over spatial layouts and can maintain temporal consistency of entities across multiple scenes, while being trained only with image-level annotations. Our experiments demonstrate that our proposed VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with consistency, while achieving competitive performance with SOTAs in open-domain single-scene T2V generation. Detailed ablation studies, including dynamic adjustment of layout control strength with an LLM and video generation with user-provided images, confirm the effectiveness of each component of our framework and its future potential.	翻訳日:2024-07-17 04:48:58 公開日:2024-07-12
# One for All: すべての分類タスクのための1つのグラフモデルの訓練に向けて One for All: Towards Training One Graph Model for All Classification Tasks ( http://arxiv.org/abs/2310.00149v3 ) ライセンス: Link先を確認	Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, Muhan Zhang,	(参考訳) 複数のタスクに対処する単一モデルを設計することは、人工知能の長年の目標である。近年,大規模言語モデルは言語領域内で異なるタスクを解く際,例外的な能力を示した。しかし、グラフ学習領域に固有の課題のために、様々なグラフタスクの統一モデルがまだ探索されていない。まず、異なる領域のグラフデータは異なる属性を持ち、異なる分布に従う。このような相違により、単一の表現空間におけるグラフの表現が困難になる。第二に、グラフ上のタスクはノード、リンク、グラフタスクに多様化し、異なる埋め込み戦略を必要とする。最後に、文脈内学習のための適切なグラフプロンプトパラダイムが不明確である。我々は、上記の課題に対処するために単一のグラフモデルを使用する最初の一般的なフレームワークである、One for All (OFA) を提案する。具体的には、ノードとエッジを自然言語で記述することで、異なるグラフデータを統一するテキスト属性グラフを提案し、言語モデルを使用して、多様でおそらくクロスドメインなテキスト属性を同じ埋め込み空間における特徴ベクトルに符号化する。さらに、OFAは1つのタスク表現で異なるタスクを標準化する関心ノード(nodes-of-interest)の概念を導入している。グラフ上のコンテキスト内学習のためにOFAは、入力グラフにプロンプト用のサブストラクチャを付加する新しいグラフプロンプトパラダイムを導入し、微調整なしで様々なタスクに対処できるようにする。我々は、複数のドメイン(引用ネットワーク、分子グラフ、知識グラフなど)のグラフデータを用いてOFAモデルを同時に訓練し、教師付き、少数ショット、ゼロショット学習シナリオにおけるその能力を評価する。 OFAは様々なタスクでうまく機能し、グラフ上の最初の汎用のクロスドメイン分類モデルとなる。 Designing a single model to address multiple tasks has been a long-standing objective in artificial intelligence. Recently, large language models have demonstrated exceptional capability in solving different tasks within the language domain. However, a unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain. First, graph data from different areas carry distinct attributes and follow different distributions. Such discrepancy makes it hard to represent graphs in a single representation space. Second, tasks on graphs diversify into node, link, and graph tasks, requiring distinct embedding strategies. Finally, an appropriate graph prompting paradigm for in-context learning is unclear. We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges. 
Specifically, OFA proposes text-attributed graphs to unify different graph data by describing nodes and edges with natural language and uses language models to encode the diverse and possibly cross-domain text attributes to feature vectors in the same embedding space. Furthermore, OFA introduces the concept of nodes-of-interest to standardize different tasks with a single task representation. For in-context learning on graphs, OFA introduces a novel graph prompting paradigm that appends prompting substructures to the input graph, which enables it to address varied tasks without fine-tuning. We train the OFA model using graph data from multiple domains (including citation networks, molecular graphs, knowledge graphs, etc.) simultaneously and evaluate its ability in supervised, few-shot, and zero-shot learning scenarios. OFA performs well across different tasks, making it the first general-purpose across-domains classification model on graphs.	翻訳日:2024-07-17 04:48:58 公開日:2024-07-12
# 安全性と活性を活用した性能向上のための2層ブロックチェーンシャーディングプロトコル A Two-Layer Blockchain Sharding Protocol Leveraging Safety and Liveness for Enhanced Performance ( http://arxiv.org/abs/2310.11373v5 ) ライセンス: Link先を確認	Yibin Xu, Jingyi Zheng, Boris Düdder, Tijs Slaats, Yongluan Zhou,	(参考訳) シャーディングはブロックチェーンのスケーラビリティ向上に不可欠だ。既存のプロトコルは、さまざまな敵攻撃を見落とし、トランザクションスループットを制限します。本稿では、この問題に対処する基盤的なシャーディングプロトコルであるReticulumを紹介し、ブロックチェーンのスケーラビリティを向上する。 Reticulumは2段階のアプローチを採用し、実行時の敵対的攻撃に応じてトランザクションスループットを適応させる。"コントロール"と"プロセス"のシャードを2つのレイヤで構成する。プロセスシャードには少なくとも1つの信頼できるノードが含まれ、コントロールシャードでは信頼できるノードが過半数を占める。最初のフェーズでは、トランザクションはブロックに書き込まれ、プロセスシャード内のノードによって投票される。全会一致で受理されたブロックが確定する。第2段階では、全会一致で受理されなかったブロックがコントロールシャードによって投票される。多数派が賛成すればブロックが認められ、第一段階の反対者や無言の有権者は排除される。 Reticulumは第1フェーズで全会一致投票を使用しており、ノードが少ないため、より並列なプロセスシャードが可能である。コントロールシャードは決定を確定し、紛争を解決します。実験により Reticulumの革新的な設計を確認し、さまざまなネットワーク攻撃に対して高いトランザクションスループットと堅牢性を提供し、ブロックチェーンネットワークの既存のシャーディングプロトコルを上回っている。 Sharding is essential for improving blockchain scalability. Existing protocols overlook diverse adversarial attacks, limiting transaction throughput. This paper presents Reticulum, a groundbreaking sharding protocol addressing this issue, boosting blockchain scalability. Reticulum employs a two-phase approach, adapting transaction throughput based on runtime adversarial attacks. It comprises "control" and "process" shards in two layers. Process shards contain at least one trustworthy node, while control shards have a majority of trusted nodes. In the first phase, transactions are written to blocks and voted on by nodes in process shards. Unanimously accepted blocks are confirmed. In the second phase, blocks without unanimous acceptance are voted on by control shards. Blocks are accepted if the majority votes in favor, eliminating first-phase opponents and silent voters. Reticulum uses unanimous voting in the first phase, involving fewer nodes, enabling more parallel process shards. Control shards finalize decisions and resolve disputes. 
Experiments confirm Reticulum's innovative design, providing high transaction throughput and robustness against various network attacks, outperforming existing sharding protocols for blockchain networks.	翻訳日:2024-07-17 04:48:58 公開日:2024-07-12
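The two-phase decision described in the Reticulum abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Vote` enum, the function names, the dictionary layout of the votes, and the use of a strict majority in the control shard are all hypothetical choices made for the example.

```python
from enum import Enum


class Vote(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    SILENT = "silent"  # node did not vote in time


def first_phase(process_votes):
    """Phase 1: the block is confirmed only if every process-shard node accepts."""
    return len(process_votes) > 0 and all(
        v is Vote.ACCEPT for v in process_votes.values()
    )


def second_phase(control_votes):
    """Phase 2: the control shard accepts on a strict majority (assumed rule)."""
    accepts = sum(1 for v in control_votes.values() if v is Vote.ACCEPT)
    return accepts * 2 > len(control_votes)


def decide_block(process_votes, control_votes):
    """Return the outcome and the process-shard nodes to remove, if any."""
    if first_phase(process_votes):
        return {"accepted": True, "phase": 1, "removed": []}
    accepted = second_phase(control_votes)
    # If the control shard accepts, first-phase opponents and silent voters
    # are eliminated, as the abstract describes.
    removed = (
        sorted(n for n, v in process_votes.items() if v is not Vote.ACCEPT)
        if accepted
        else []
    )
    return {"accepted": accepted, "phase": 2, "removed": removed}
```

Because phase 1 needs unanimity rather than a majority, a process shard in this sketch stays safe with a single honest node, which is what allows the shards to be small and numerous.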
# 暗黙的メタ学習は、言語モデルがより信頼できる情報源を信頼するよう導くかもしれない Implicit meta-learning may lead language models to trust more reliable sources ( http://arxiv.org/abs/2310.15047v4 ) ライセンス: Link先を確認	Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, Tegan Maharaj, David Krueger,	(参考訳) LLMは文書の有用性の指標を学習し,それに応じて更新を変更できることを実証する。合成微調整データセットにおける有用性の指標としてランダム文字列(タグ)を導入する。このデータセットの微調整は暗黙的なメタ学習(IML)につながる。さらに微調整では、有用とタグ付けされたテキストをより活用するようにモデルが更新される。我々は、この現象の徹底的な実証調査を行い、(とりわけ)次のことを発見した。 (i) 事前訓練されたLLMとスクラッチから訓練されたLLMの両方、および視覚タスクでも発生すること。 (ii) より大きなモデルと小さなバッチサイズは、より多くのIMLを与える傾向があること。また、モデルがパラメーターに知識を格納する方法をIMLがどう変えるかを調べるために、プロービングも使用しています。最後に、将来のAIシステムの能力、リスク、制御可能性について、私たちの結果が示唆するものを考察します。私たちのコードは https://github.com/krasheninnikov/internalization にある。 We demonstrate that LLMs may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to implicit meta-learning (IML): in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about capabilities, risks, and controllability of future AI systems. Our code can be found at https://github.com/krasheninnikov/internalization.	翻訳日:2024-07-17 02:54:11 公開日:2024-07-12
# カーシェアリングのための車間グリッド-2030年のシミュレーション研究- Vehicle-to-grid for car sharing -- A simulation study for 2030 ( http://arxiv.org/abs/2311.07349v2 ) ライセンス: Link先を確認	Nina Wiedemann, Yanan Xin, Vasco Medici, Lorenzo Nespoli, Esra Suel, Martin Raubal,	(参考訳) 近年のカーシェアリングサービスの普及は、持続可能な輸送を推し進めるための有望な道のりを示している。単に車の所有率を下げるだけでなく、これらのシステムは車両間通信(V2G)技術による補助サービスの提供を通じてグリッドの安定性を高める上で重要な役割を担っている。本研究では、スイスにおける全国規模のサービスのための将来のシナリオを設計し、カーシェアリングにおけるV2Gの可能性を分析する。カーシェアリングサービスのさまざまなビジネス戦略と同様に,人口変動を考慮したエージェントベースシミュレーションパイプラインを提案し,2030年のシナリオシミュレーションにおけるその成功例を示す。カーシェアリングのユーザ動作を模倣するため,データ駆動型モード選択モデルを開発した。本分析では, 車両使用率の向上など, 車両の小型化, 新たなシェアリングステーションの設置など, 検討シナリオにおける重要な違いを明らかにした。これらの格差は、シナリオと日時に応じて、12MWから50MWまでのアシラリーサービスで利用可能な艦隊の電力柔軟性のバリエーションに変換される。さらに、実際の電力価格データを組み込んだ、カーシェアリングフリートのサブセットを含むケーススタディも実施する。このケーススタディは、電力グリッド事業者と艦隊所有者の両方にとって、金銭的利益を伴うスイートスポットの存在を裏付けるものである。本研究は意思決定者に対してガイドラインを提供し,カーシェアリングの領域内での電力取引に関する規制強化の必要性を強調した。 The proliferation of car sharing services in recent years presents a promising avenue for advancing sustainable transportation. Beyond merely reducing car ownership rates, these systems can play a pivotal role in bolstering grid stability through the provision of ancillary services via vehicle-to-grid (V2G) technologies - a facet that has received limited attention in previous research. In this study, we analyze the potential of V2G in car sharing by designing future scenarios for a national-scale service in Switzerland. We propose an agent-based simulation pipeline that considers population changes as well as different business strategies of the car sharing service, and we demonstrate its successful application for simulating scenarios for 2030. To imitate car sharing user behavior, we develop a data-driven mode choice model. Our analysis reveals important differences in the examined scenarios, such as higher vehicle utilization rates for a reduced fleet size as well as in a scenario featuring new car sharing stations. 
These disparities translate into variations in the power flexibility of the fleet available for ancillary services, ranging from 12 to 50 MW, depending on the scenario and the time of the day. Furthermore, we conduct a case study involving a subset of the car sharing fleet, incorporating real-world electricity pricing data. The case study substantiates the existence of a sweet spot involving monetary gains for both power grid operators and fleet owners. Our findings provide guidelines to decision makers and underscore the pressing need for regulatory enhancements concerning power trading within the realm of car sharing.	翻訳日:2024-07-17 02:54:11 公開日:2024-07-12
# アップサンプリング時の特徴安定性の向上 -スペクトルアーチファクトと空間文脈の重要性- Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context ( http://arxiv.org/abs/2311.17524v2 ) ライセンス: Link先を確認	Shashank Agnihotri, Julia Grabinski, Margret Keuper,	(参考訳) 画像復元、画像分割、不均一性推定など、さまざまなタスクにおいて、画素ワイズ予測が求められている。共通モデルはデータ再サンプリングのいくつかの段階を含み、特徴マップの解像度をまず情報を集約し、次に高解像度の出力を生成する。以前の研究では、再サンプリング操作がエイリアスなどのアーティファクトの対象であることが示されている。ダウンサンプリング中、エイリアスは画像分類器の予測安定性を損なうことが示されている。アップサンプリング中は、生成されたコンテンツを検出するために利用されています。しかし、アップサンプリング中のエイリアスの影響については、ピクセルワイズ予測の安定性と堅牢性についてはまだ議論されていない。同じ用語(エイリアス)に該当する一方で、ニューラルネットワークの正当性アップサンプリングの課題は、ダウンサンプリング中のそれと大きく異なる:ダウンサンプリングの際、一部の高頻度を正しく表現できず、エイリアスを避けるために除去する必要がある。しかし、ピクセルワイズ予測のアップサンプリングでは、低解像度では符号化できないような高周波数を復元する必要がある。したがって、信号処理による発見の応用は必要であるが、望ましい出力を達成するのに十分な条件ではない。対照的に、アップサンプリング中の大きな空間コンテキストの可用性は、全てのフィルタ重みを完全に学習しても、安定で高品質な画素ワイドの予測を可能にする。 Pixel-wise predictions are required in a wide variety of tasks such as image restoration, image segmentation, or disparity estimation. Common models involve several stages of data resampling, in which the resolution of feature maps is first reduced to aggregate information and then increased to generate a high-resolution output. Previous works have shown that resampling operations are subject to artifacts such as aliasing. During downsampling, aliases have been shown to compromise the prediction stability of image classifiers. During upsampling, they have been leveraged to detect generated content. Yet, the effect of aliases during upsampling has not yet been discussed w.r.t. the stability and robustness of pixel-wise predictions. While falling under the same term (aliasing), the challenges for correct upsampling in neural networks differ significantly from those during downsampling: when downsampling, some high frequencies can not be correctly represented and have to be removed to avoid aliases. 
However, when upsampling for pixel-wise predictions, we actually require the model to restore such high frequencies that can not be encoded in lower resolutions. The application of findings from signal processing is therefore a necessary but not a sufficient condition to achieve the desirable output. In contrast, we find that the availability of large spatial context during upsampling allows to provide stable, high-quality pixel-wise predictions, even when fully learning all filter weights.	翻訳日:2024-07-17 02:44:20 公開日:2024-07-12
# 非線形連続時間系のクラスにおける標本複雑度の推定 Estimation Sample Complexity of a Class of Nonlinear Continuous-time Systems ( http://arxiv.org/abs/2312.05382v3 ) ライセンス: Link先を確認	Simon Kuang, Xinfan Lin,	(参考訳) 本稿では, 状態が出力の導関数からなり, フローがパラメータに関して線形であるような, 非線形系の大きなクラスに対するパラメータ推定法について述べる。正規化線形回帰を用いて力学を直接反転させることにより未知パラメータを解く手法は、微分フィルタと正規化最小二乗の新たな設計と解析のアイデアに基づいている。直列で組み合わせると、推定の平均絶対誤差に関する新しい有限サンプル境界が得られる。 We present a method of parameter estimation for a large class of nonlinear systems, namely those in which the state consists of output derivatives and the flow is linear in the parameter. The method, which solves for the unknown parameter by directly inverting the dynamics using regularized linear regression, is based on new design and analysis ideas for differentiation filtering and regularized least squares. Combined in series, they yield a novel finite-sample bound on mean absolute error of estimation.	翻訳日:2024-07-17 02:34:28 公開日:2024-07-12
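The idea of "differentiate, then invert the dynamics by regularized regression" can be illustrated on a scalar toy system dx/dt = θ·φ(x). This is only a sketch under simplifying assumptions: plain central differences stand in for the paper's differentiation filter, the ridge penalty `lam` is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math


def central_differences(xs, dt):
    """Crude derivative estimate at interior samples (stand-in for a differentiation filter)."""
    return [(xs[k + 1] - xs[k - 1]) / (2 * dt) for k in range(1, len(xs) - 1)]


def ridge_estimate(features, targets, lam):
    """Scalar regularized least squares: argmin_theta sum (y - theta*f)^2 + lam*theta^2."""
    num = sum(f * y for f, y in zip(features, targets))
    den = sum(f * f for f in features) + lam
    return num / den


def estimate_theta(xs, dt, phi, lam=1e-6):
    """Estimate theta in dx/dt = theta * phi(x) from uniformly sampled xs."""
    dx = central_differences(xs, dt)
    feats = [phi(x) for x in xs[1:-1]]  # align features with interior samples
    return ridge_estimate(feats, dx, lam)
```

For example, sampling x(t) = exp(-2t) on [0, 1] with φ(x) = x should recover a value close to θ = -2; the residual error comes from the finite-difference approximation and the small ridge shrinkage.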
# 報酬源としての視覚言語モデル Vision-Language Models as a Source of Rewards ( http://arxiv.org/abs/2312.09187v3 ) ライセンス: Link先を確認	Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang,	(参考訳) 豊かなオープンエンド環境で多くの目標を達成できる汎用エージェントの構築は、強化学習のための研究フロンティアの1つである。 RLを用いた一般エージェント構築の鍵となる制限要因は、異なる目標を達成するために多数の報酬関数が必要であることである。強化学習エージェントの報酬源として市販の視覚言語モデル(VLM)の有効性を検討する。様々な言語目標の視覚的達成に対する報酬は、CLIPファミリーのモデルから導き出すことができ、様々な言語目標を達成するためのRLエージェントの訓練に使用されることを示す。このアプローチを2つの異なる視覚領域で示し、より大きなVLMが視覚目標達成に対してより正確な報酬をもたらすかを示すスケーリング傾向を示し、それによってより有能なRLエージェントを生成する。 Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.	翻訳日:2024-07-17 02:34:28 公開日:2024-07-12
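One way to turn a contrastive vision-language model into a reward signal is to compare the embedding of the current observation with the embedding of the language goal, relative to some alternative descriptions. The sketch below assumes the embeddings have already been produced by an image encoder and a text encoder (not shown); the softmax-over-negatives form, the `temperature` and `threshold` values, and the function names are illustrative assumptions rather than the paper's confirmed recipe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def goal_reward(image_emb, goal_emb, negative_embs, temperature=0.1, threshold=0.5):
    """Binary reward: 1.0 if the goal text is the most probable description of the image."""
    sims = [cosine(image_emb, goal_emb)]
    sims += [cosine(image_emb, n) for n in negative_embs]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    p_goal = exps[0] / sum(exps)
    return 1.0 if p_goal > threshold else 0.0
```

Thresholding the probability gives a sparse, binary reward, which limits how much noise from the embedding similarity leaks into the agent's learning signal.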
# 検出器シグナチャシミュレーションのための深部生成モデル:分類学的レビュー Deep Generative Models for Detector Signature Simulation: A Taxonomic Review ( http://arxiv.org/abs/2312.09597v2 ) ライセンス: Link先を確認	Baran Hashemi, Claudius Krause,	(参考訳) 現代の衝突型加速器実験では、素粒子間の基本的な相互作用を探究する探索は、比類のない精度に達している。粒子物理学検出器からの信号は、衝突の物理(ハード散乱相互作用の最終状態粒子)を符号化する低レベル物体(エネルギー沈着や飛跡など)である。検出器におけるそれらの完全なシミュレーションは、計算と記憶集約的なタスクである。粒子物理学におけるこの計算ボトルネックに対処するため、新たな仮定を導入し、精度と引き換えに速度を得る別の手法が開発されてきた。深部生成モデルの進歩に後押しされ、検出器シミュレーションの代理モデルへの関心が高まっている。これらのモデルは、観測データと統計的に同一の応答を生成することを目的としている。本稿では,従来の文献を包括的かつ徹底的に分析し,方法論的・応用的両面から検出器シグネチャのシミュレーションを分類学的にレビューする。まず、検出器シグネチャシミュレーションの問題を定式化し、統一可能な様々なバリエーションについて議論する。次に、その基礎となるモデルアーキテクチャに基づいて、最先端の手法を5つのカテゴリに分類し、それぞれの生成戦略を要約する。最後に、検出器シグネチャシミュレーションの今後の課題と機会を明らかにし、将来の研究開発のステージを設定します。 In modern collider experiments, the quest to explore fundamental interactions between elementary particles has reached unparalleled levels of precision. Signatures from particle physics detectors are low-level objects (such as energy depositions or tracks) encoding the physics of collisions (the final state particles of hard scattering interactions). The complete simulation of them in a detector is a computational and storage-intensive task. To address this computational bottleneck in particle physics, alternative approaches have been developed, introducing additional assumptions and trading off accuracy for speed. The field has seen a surge in interest in surrogate modeling the detector simulation, fueled by the advancements in deep generative models. These models aim to generate responses that are statistically identical to the observed data. In this paper, we conduct a comprehensive and exhaustive taxonomic review of the existing literature on the simulation of detector signatures from both methodological and application-wise perspectives. Initially, we formulate the problem of detector signature simulation and discuss its different variations that can be unified. 
Next, we classify the state-of-the-art methods into five distinct categories based on their underlying model architectures, summarizing their respective generation strategies. Finally, we shed light on the challenges and opportunities that lie ahead in detector signature simulation, setting the stage for future research and development.	翻訳日:2024-07-17 02:24:41 公開日:2024-07-12
# 一般化されたスタインの補題と部分代数エントロピーの漸近等分性 Generalized Stein's lemma and asymptotic equipartition property for subalgebra entropies ( http://arxiv.org/abs/2401.03090v2 ) ライセンス: Link先を確認	Li Gao, Mizanur Rahaman,	(参考訳) 量子シュタインの補題は、2つの量子状態の区別という文脈における量子仮説テストの基本的な結果である。最近の予想は「一般化された量子シュタインの補題」と呼ばれ、この結果は、状態の1つが量子状態の凸集合に置き換えられる一般的な枠組みにおいて真であると主張している。この研究において、一般化されたシュタインの補題の主張は、第2の仮説が任意の部分代数 $\mathcal{N}$ の状態空間であるような設定に対して真であることを示す。これは、任意の固定平滑化パラメータ $\epsilon\in (0,1)$ に対して適用される滑らかな部分代数エントロピーに対する強い漸近等分性によって得られる。資源理論の応用として, 部分代数の相対エントロピーは, 適切な操作下での漸近希釈コストであることを示す。これにより、異なる量子リソース間の接続を確立することができる。 The quantum Stein's lemma is a fundamental result of quantum hypothesis testing in the context of distinguishing two quantum states. A recent conjecture, known as the "generalized quantum Stein's lemma", asserts that this result is true in a general framework where one of the states is replaced by convex sets of quantum states. In this work, we show that the assertion of the generalized Stein's lemma is true for the setting where the second hypothesis is the state space of any subalgebra $\mathcal{N}$. This is obtained through a strong asymptotic equipartition property for smooth subalgebra entropies that applies for any fixed smoothing parameter $\epsilon\in (0,1)$. As an application in resource theory, we show that the relative entropy of a subalgebra is the asymptotic dilution cost under suitable operations. This provides a scope to establish a connection between different quantum resources.	翻訳日:2024-07-17 02:24:41 公開日:2024-07-12
# パラメトリックマトリックスモデル Parametric Matrix Models ( http://arxiv.org/abs/2401.11694v4 ) ライセンス: Link先を確認	Patrick Cook, Danny Jammooa, Morten Hjorth-Jensen, Daniel D. Lee, Dean Lee,	(参考訳) パラメトリック行列モデルと呼ばれる機械学習アルゴリズムの一般クラスを示す。ニューロンの生物学を模倣する既存の機械学習モデルとは異なり、パラメトリック行列モデルは量子系の物理をエミュレートする行列方程式を使用する。物理問題の解法と同様に、パラメトリック行列モデルは所望の出力につながる支配方程式を学習する。パラメトリック行列モデルは経験的データから効率的に訓練することができ、方程式は代数的、微分的、あるいは積分的関係を用いることができる。もともと科学計算用に設計されたが、パラメトリック行列モデルは一般的な機械学習問題に適用可能な普遍関数近似器であることが証明されている。基礎となる理論を導入した後、パラメトリック行列モデルを幅広い問題に対してそれらの性能を示す一連の異なる課題に適用する。ここで検証された全ての課題に対して、パラメトリック行列モデルは、入力特徴外挿を可能にする効率的で解釈可能な計算フレームワーク内で正確な結果を生成する。 We present a general class of machine learning algorithms called parametric matrix models. In contrast with most existing machine learning models that imitate the biology of neurons, parametric matrix models use matrix equations that emulate the physics of quantum systems. Similar to how physics problems are usually solved, parametric matrix models learn the governing equations that lead to the desired outputs. Parametric matrix models can be efficiently trained from empirical data, and the equations may use algebraic, differential, or integral relations. While originally designed for scientific computing, we prove that parametric matrix models are universal function approximators that can be applied to general machine learning problems. After introducing the underlying theory, we apply parametric matrix models to a series of different challenges that show their performance for a wide range of problems. For all the challenges tested here, parametric matrix models produce accurate results within an efficient and interpretable computational framework that allows for input feature extrapolation.	翻訳日:2024-07-17 02:14:47 公開日:2024-07-12
# 大規模言語モデルの教育的アライメント Pedagogical Alignment of Large Language Models ( http://arxiv.org/abs/2402.05000v2 ) ライセンス: Link先を確認	Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk,	(参考訳) 本稿では,LLMの教育的文脈における応用の変革的変化を示す,Large Language Models (LLMs) の概念を紹介する。ユーザクエリへの直接応答を提供するのではなく、段階的に整列されたLLMが足場として機能し、複雑な問題を管理可能なサブプロブレムに分割し、建設的なフィードバックとヒントを通じて最終回答へと導く。目的は、学習者に課題の理解と内部化を深める問題解決戦略を付与することである。この分野でのこれまでの研究は主に、目標をアライメント問題とみなすことなく、教師付き微調整アプローチを適用してきたため、人間からのフィードバック(RLHF)法による強化学習は行わなかった。本研究は、アライメント・オブ・アライメントを通してタスクを観察することで物語を再解釈し、RLHFメソッドがLLM動作の整列に優れた代替手段として自然に現れることを示す。この観点から,LLMの教育的アライメントに特化して設計された報酬データセットを構築するための新しい手法を提案する。我々は最先端のRLHFアルゴリズムを3つ適用し、SFTを著しく上回る結果を得た。モデル差とハイパーパラメータ感度の質的解析により,SFTよりもRLHFの方が優れていることが示された。また,本研究は,教育現場における教育現場におけるLLMの性能向上のためのオンラインフィードバックの可能性に注目し,これらのモデルの発展に有意義な洞察を与えるものである。 In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final answer through constructive feedback and hints. The objective is to equip learners with problem-solving strategies that deepen their understanding and internalization of the subject matter. Previous research in this field has primarily applied the supervised finetuning approach without framing the objective as an alignment problem, hence not employing reinforcement learning through human feedback (RLHF) methods. This study reinterprets the narrative by viewing the task through the lens of alignment and demonstrates how RLHF methods emerge naturally as a superior alternative for aligning LLM behaviour. Building on this perspective, we propose a novel approach for constructing a reward dataset specifically designed for the pedagogical alignment of LLMs. 
We apply three state-of-the-art RLHF algorithms and find that they outperform SFT significantly. Our qualitative analyses across model differences and hyperparameter sensitivity further validate the superiority of RLHF over SFT. Also, our study sheds light on the potential of online feedback for enhancing the performance of pedagogically-aligned LLMs, thus providing valuable insights for the advancement of these models in educational settings.	翻訳日:2024-07-17 02:05:02 公開日:2024-07-12
# ProTIP:確率的摂動に対するテキスト・画像拡散モデルの確率的ロバスト性検証 ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation ( http://arxiv.org/abs/2402.15429v2 ) ライセンス: Link先を確認	Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao,	(参考訳) テキスト・ツー・イメージ(T2I)拡散モデル(DM)は、単純なテキスト記述に基づいて高品質な画像を生成する際、印象的な能力を示した。しかし、多くのディープラーニング(DL)モデルに共通するように、DMは堅牢性に欠ける。 T2I DMのロバスト性は二分問題や最悪の問題として評価する試みもあるが、逆例(AE)が見つかると、モデルが一般にロバストであることに答えることはできない。本研究ではまず,T2I DMsの頑健性に関する確率論的概念を導入し,統計的保証により評価するための効率的なフレームワークであるProTIPを確立する。主な課題は次の通りである。一生成工程の計算コストが高いこと。 ii) 摂動入力がAEであるか否かを決定するには、2つの出力分布を比較する必要があるが、これはラベルの誤認によりAEが識別される分類のような他のDLタスクと比べて根本的に困難である。これらの課題に対処するために,AEを識別するための統計検査において,有効性と不確実性の早期停止規則を用いた逐次解析と適応濃度の不等式を用いて,検証対象が満たされる度に,確率的摂動の「正しい」個数を動的に決定する。実験により、一般的なT2I DM上でのProTIPの有効性と効率が検証された。最後に,一般に使用されている防御手法のランク付けにProTIPを適用した。 Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust in general the model is whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. 
To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.	翻訳日:2024-07-17 01:45:18 公開日:2024-07-12
# アウト・オブ・ディストリビューション・セグメンテーションのためのインペインティングによるコンテキスト内オブジェクトの配置 Placing Objects in Context via Inpainting for Out-of-distribution Segmentation ( http://arxiv.org/abs/2402.16392v2 ) ライセンス: Link先を確認	Pau de Jorge, Riccardo Volpi, Puneet K. Dokania, Philip H. S. Torr, Gregory Rogez,	(参考訳) セマンティックセグメンテーションモデルを現実世界にデプロイする場合、トレーニング中に見られなかったセマンティッククラスに必然的に遭遇する。このようなシステムの安全なデプロイを保証するためには,その異常セグメンテーション能力を正確に評価し,改善することが重要である。しかし、セマンティックセグメンテーションデータの取得とラベル付けは高価であり、予測外の条件は長く、潜在的に危険である。実際、既存の異常セグメンテーションデータセットは限られた数の異常をキャプチャし、リアリズムを欠いているか、強いドメインシフトを持っている。本稿では,拡散モデルを用いて,任意のオブジェクトを任意の画像に現実的に付加する,コンテキストにおけるPlacing Objects in Context(POC)パイプラインを提案する。 POCは任意の数のオブジェクトで任意のデータセットを簡単に拡張するために使用することができる。実験では,POC生成データに基づく様々な異常セグメンテーションデータセットを提示し,POCが最新の最先端の異常調整手法の性能を向上させることを示す。 POCは、新しいクラスを学ぶのにも有効である。例えば、CityscapesのサンプルをPascalクラスのサブセットを組み込むことで強化し、そのようなデータに基づいてトレーニングされたモデルがPascalでトレーニングされたベースラインに匹敵するパフォーマンスを実現することを示す。このことはPOC生成画像に基づいて訓練されたモデルの低シント2リアルギャップを裏付ける。コード:https://github.com/naver/poc When deploying a semantic segmentation model into the real world, it will inevitably encounter semantic classes that were not seen during training. To ensure a safe deployment of such systems, it is crucial to accurately evaluate and improve their anomaly segmentation capabilities. However, acquiring and labelling semantic segmentation data is expensive and unanticipated conditions are long-tail and potentially hazardous. Indeed, existing anomaly segmentation datasets capture a limited number of anomalies, lack realism or have strong domain shifts. In this paper, we propose the Placing Objects in Context (POC) pipeline to realistically add any object into any image via diffusion models. POC can be used to easily extend any dataset with an arbitrary number of objects. In our experiments, we present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods across several standardized benchmarks. 
POC is also effective for learning new classes. For example, we utilize it to augment Cityscapes samples by incorporating a subset of Pascal classes and demonstrate that models trained on such data achieve comparable performance to the Pascal-trained baseline. This corroborates the low synth2real gap of models trained on POC-generated images. Code: https://github.com/naver/poc	翻訳日:2024-07-17 01:45:18 公開日:2024-07-12
# PCR-99:99%のアウトリーチを持つポイントクラウド登録の実践的方法 PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers ( http://arxiv.org/abs/2402.16598v3 ) ライセンス: Link先を確認	Seong Hun Lee, Javier Civera, Patrick Vandewalle,	(参考訳) 本稿では,未知のスケールと極端外周比の両方を扱える点雲登録法を提案する。 PCR-99と呼ばれる本手法では, 速度を著しく向上させる2つの新しいメカニズムを持つ決定論的3点サンプリング手法を用いて, 1) ペアスケールの整合性に基づくサンプルの整合性の向上, および(2) トリプルトスケールの整合性に基づく効率的な外乱除去手法, 悪いサンプルの事前スクリーニング, テスト対象の仮説数の削減を行う。提案手法は,98%のアウトレイラ比において,最先端技術に匹敵する性能を達成できることを示す。しかし、99%のアウトラヤ比では、既知のスケールと未知のスケールの問題の両方において、最先端の問題を上回ります。特に後者では、ロバスト性と速度の観点から明らかな優位性を観察する。 We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.	翻訳日:2024-07-17 01:45:18 公開日:2024-07-12
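The scale-consistency idea behind the prescreening step can be illustrated with a small check on a triplet of correspondences. Under a similarity transform, every pair of inlier correspondences implies the same scale (the ratio of target distance to source distance), so a triplet whose three pairwise scales disagree can be discarded before any pose is estimated. The sketch below is an assumption-laden illustration, not the PCR-99 code: the relative tolerance `rel_tol`, the function names, and the handling of degenerate pairs are hypothetical.

```python
import math


def pair_scale(src, dst, i, j):
    """Scale implied by correspondences i and j, or None if the source points coincide."""
    d_src = math.dist(src[i], src[j])
    if d_src == 0.0:
        return None
    return math.dist(dst[i], dst[j]) / d_src


def triplet_scale_consistent(src, dst, i, j, k, rel_tol=0.05):
    """True if the three pairwise scales of the triplet agree within rel_tol."""
    scales = [
        pair_scale(src, dst, i, j),
        pair_scale(src, dst, j, k),
        pair_scale(src, dst, i, k),
    ]
    if any(s is None or s == 0.0 for s in scales):
        return False  # degenerate sample: reject rather than guess
    return max(scales) <= (1.0 + rel_tol) * min(scales)
```

Only triplets that pass this cheap test would go on to full hypothesis generation and verification, which is where most of the running time is otherwise spent at very high outlier ratios.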
# 補助的敵防衛ネットワークによる追跡ロバスト性向上 Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks ( http://arxiv.org/abs/2402.17976v2 ) ライセンス: Link先を確認	Zhewei Wu, Ruilong Yu, Qihe Liu, Shuying Cheng, Shilin Qiu, Shijie Zhou,	(参考訳) 視覚的物体追跡における敵対的攻撃は、画像に知覚不能な摂動を導入することにより、高度なトラッカーの性能を著しく低下させた。しかし, 物体追跡のための対向防御手法の設計には, まだまだ研究の欠如がある。これらの問題に対処するため,提案するAADNは,トラッカーに入力される前に,入力画像に対する防御的変換を行う。さらに、パラメータ調整なしに他のビジュアルトラッカーとプラグイン・アンド・プレイモジュールとしてシームレスに統合することができる。我々は、AADNを、特にDua-Lossを用いて、トラッカーの分類と回帰の分岐を同時に攻撃する対向サンプルを生成するために、対向訓練を用いて訓練する。 OTB100、LaSOT、VOT2018ベンチマークで実施された大規模な実験により、AADNは適応的および非適応的な攻撃シナリオの両方において、敵攻撃手法に対する優れた防御堅牢性を維持していることが示された。さらに、防衛ネットワークを異種トラッカーに転送する際には、信頼性の高い転送性を示す。最後に、AADNは最大5ms/frameの処理時間を実現し、計算オーバーヘッドを伴わずに既存の高速トラッカーとシームレスに統合できる。 Adversarial attacks in visual object tracking have significantly degraded the performance of advanced trackers by introducing imperceptible perturbations into images. However, there is still a lack of research on designing adversarial defense methods for object tracking. To address these issues, we propose an effective auxiliary pre-processing defense network, AADN, which performs defensive transformations on the input images before feeding them into the tracker. Moreover, it can be seamlessly integrated with other visual trackers as a plug-and-play module without parameter adjustments. We train AADN using adversarial training, specifically employing Dua-Loss to generate adversarial samples that simultaneously attack the classification and regression branches of the tracker. Extensive experiments conducted on the OTB100, LaSOT, and VOT2018 benchmarks demonstrate that AADN maintains excellent defense robustness against adversarial attack methods in both adaptive and non-adaptive attack scenarios. Moreover, when transferring the defense network to heterogeneous trackers, it exhibits reliable transferability. 
Finally, AADN achieves a processing time of up to 5ms/frame, allowing seamless integration with existing high-speed trackers without introducing significant computational overhead.	翻訳日:2024-07-17 01:45:18 公開日:2024-07-12
# クラウドネイティブなマイクロサービスアプリケーションにおけるインフォームドおよびアセスブルな可観測性設計決定 Informed and Assessable Observability Design Decisions in Cloud-native Microservice Applications ( http://arxiv.org/abs/2403.00633v2 ) ライセンス: Link先を確認	Maria C. Borges, Joshua Bauer, Sebastian Werner, Michael Gebauer, Stefan Tai,	(参考訳) マイクロサービスアプリケーションの信頼性を保証するためには、可観測性が重要です。これらのアプリケーションは、異種環境にデプロイされる多くの独立したサービスがあるため、しばしば障害を起こしやすい。正しく"使用される場合、オブザーバビリティは、開発者が障害を素早く特定し、トラブルシュートするのに役立ちます。しかしながら、マイクロサービスアプリケーションの可観測性の測定と設定は簡単ではなく、ツールに依存し、コストに結びついている。アーキテクトは、観測可能性に関するトレードオフを理解して、異なる観測可能性設計の選択肢を重んじる必要がある。それでも、これらのアーキテクチャ設計決定は体系的な手法ではサポートされず、通常単に「専門的な直観」に依存している。本稿では,情報的かつ継続的に評価可能な可観測性設計決定に至るための体系的手法について論じる。具体的には、クラウドネイティブなマイクロサービスアプリケーションのフォールトオブザーバビリティに注目し、これをテスト可能で定量化可能なプロパティに変換する。目標に向かって、私たちはまず、クラウドネイティブスタック全体の可観測性設計決定の規模とスコープをモデル化します。次に、いわゆる可観測性実験を通じて、マイクロサービスアプリケーションで決定できる可観測性メトリクスを提案する。実験ツールOXNの概念実証実装について述べる。 OXNはChaos Engineeringに似た任意のフォールトをアプリケーションに注入できるが、可観測性の設定を変更するユニークな機能を備えており、以前は探索されていなかった設計上の決定を評価できる。一般的なオープンソースのマイクロサービスアプリケーションを使って、私たちのアプローチを実演し、さまざまな可観測性設計決定に関わるトレードオフを示しています。 Observability is important to ensure the reliability of microservice applications. These applications are often prone to failures, since they have many independent services deployed on heterogeneous environments. When employed "correctly", observability can help developers identify and troubleshoot faults quickly. However, instrumenting and configuring the observability of a microservice application is not trivial but tool-dependent and tied to costs. Architects need to understand observability-related trade-offs in order to weigh between different observability design alternatives. Still, these architectural design decisions are not supported by systematic methods and typically just rely on "professional intuition". In this paper, we argue for a systematic method to arrive at informed and continuously assessable observability design decisions. 
Specifically, we focus on fault observability of cloud-native microservice applications, and turn this into a testable and quantifiable property. Towards our goal, we first model the scale and scope of observability design decisions across the cloud-native stack. Then, we propose observability metrics which can be determined for any microservice application through so-called observability experiments. We present a proof-of-concept implementation of our experiment tool OXN. OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration, allowing for the assessment of design decisions that were previously left unexplored. We demonstrate our approach using a popular open source microservice application and show the trade-offs involved in different observability design decisions.	翻訳日:2024-07-17 01:45:18 公開日:2024-07-12
# 公平な医用画像分類のための基礎モデルに基づくノイズ編集 Debiased Noise Editing on Foundation Models for Fair Medical Image Classification ( http://arxiv.org/abs/2403.06104v4 ) ライセンス: Link先を確認	Ruinan Jin, Wenlong Deng, Minghui Chen, Xiaoxiao Li,	(参考訳) ファウンデーション・モデル(FM)がAIで優位に立つ時代において、我々の研究は医療画像のバイアスの問題に対処し、そのモデルがブラックボックス(例えば、FM API)で動作し、特に画素と感度属性の急激な相関関係である。従来のバイアス緩和手法は、WebホストされたFMへのアクセスが制限されていることと、FM APIで符号化された基盤となるバイアスに対処することの難しさにより、制限に直面している。本稿では,DNEノイズを発生させるD(ebiased)N(oise)E(diting)戦略を提案する。 DNEはFM APIの埋め込みとイメージ自体のバイアスを軽減することができる。さらに,G(reedy) (Z)eroth-O(rder) (GeZO) をブラックボックスAPIでアクセスできない場合,DNEはWhite-boxとBlack-boxの両方のFM APIに適している。我々のパイプライン全体は、直接モデル操作や重要な計算資源を必要とせずに、様々な医療状況にまたがって適用可能な公平性に配慮した画像編集を可能にする。本手法の有効性を実証し, 患者集団, 疾患間の公平性, 有用性について検討した。 AI駆動医療の時代において、この研究は医療診断をより公平にし、事前訓練された画像FMにおけるバイアス軽減の実践的な解決策を示す。私たちのコードはhttps://github.com/ubc-tea/DNE-foundation-model-fairnessで提供されます。 In the era of Foundation Models' (FMs) rising prominence in AI, our study addresses the challenge of biases in medical images while the model operates in black-box (e.g., using FM API), particularly spurious correlations between pixels and sensitive attributes. Traditional methods for bias mitigation face limitations due to the restricted access to web-hosted FMs and difficulties in addressing the underlying bias encoded within the FM API. We propose a D(ebiased) N(oise) E(diting) strategy, termed DNE, which generates DNE noise to mask such spurious correlation. DNE is capable of mitigating bias both within the FM API embedding and the images themselves. Furthermore, DNE is suitable for both white-box and black-box FM APIs, where we introduced G(reedy) (Z)eroth-O(rder) (GeZO) optimization for it when the gradient is inaccessible in black-box APIs. Our whole pipeline enables fairness-aware image editing that can be applied across various medical contexts without requiring direct model manipulation or significant computational resources. 
Our empirical results demonstrate the method's effectiveness in maintaining fairness and utility across different patient groups and diseases. In the era of AI-driven medicine, this work contributes to making healthcare diagnostics more equitable, showcasing a practical solution for bias mitigation in pre-trained image FMs. Our code is provided at https://github.com/ubc-tea/DNE-foundation-model-fairness.	翻訳日:2024-07-17 01:35:33 公開日:2024-07-12
# 障害とモニタリングにより局在したシステムにおける単一粒子波動関数の非破壊 Unscrambling of single-particle wave functions in systems localized through disorder and monitoring ( http://arxiv.org/abs/2403.10725v4 ) ライセンス: Link先を確認	Marcin Szyniszewski,	(参考訳) 障害やモニタリングによる局在化-非局在化量子相転移を行うシステムでは、位相を識別し、固有の性質を明らかにすることのできるロバストな方法が不可欠である。本研究では,局所粒子を正確に特徴付ける自由フェルミオン波動関数のスレーター決定式を求める過程,すなわち「アンスクラムリング」を解く過程を開発する。中心となる考え方は、単一粒子波動関数のエンベロープ間の重なりを最小化すること、または等価に、各軌道の逆参加比を最大化することである。この数値的に効率的な手法は、指数的局所化(英語版)、パワーロー局所化(英語版)、コンフォメーションクリティカル(英語版)といった異なる種類の波動関数を区別することができる。この方法は、より高次元のシステムに容易に拡張可能である。さらに,不規則な監視自由フェルミオンを1次元に含むより困難な問題に適用し,非破壊過程が共形臨界相と局所化領域法量子Zeno相の存在を明らかにする。本手法は粒子数保存のない自由フェルミオン系にも拡張可能であり, $\mathbb{Z}_2$-symmetric disordered monitored free fermion の位相図を推定して実演する。その結果, 単一粒子波動関数を応用して, 観測された自由フェルミオンや乱れモデルなどのシステムにおける局在化遷移特性について, 貴重な知見を得ることが可能となった。 In systems undergoing localization-delocalization quantum phase transitions due to disorder or monitoring, there is a crucial need for robust methods capable of distinguishing phases and uncovering their intrinsic properties. In this work, we develop a process of finding a Slater determinant representation of free-fermion wave functions that accurately characterizes localized particles, a procedure we dub "unscrambling". The central idea is to minimize the overlap between envelopes of single-particle wave functions or, equivalently, to maximize the inverse participation ratio of each orbital. This numerically efficient methodology can differentiate between distinct types of wave functions: exponentially localized, power-law localized, and conformal critical, also revealing the underlying physics of these states. The method is readily extendable to systems in higher dimensions. Furthermore, we apply this approach to a more challenging problem involving disordered monitored free fermions in one dimension, where the unscrambling process unveils the presence of a conformal critical phase and a localized area-law quantum Zeno phase. 
Importantly, our method can also be extended to free fermion systems without particle number conservation, which we demonstrate by estimating the phase diagram of $\mathbb{Z}_2$-symmetric disordered monitored free fermions. Our results unlock the potential of utilizing single-particle wave functions to gain valuable insights into the localization transition properties in systems such as monitored free fermions and disordered models.	翻訳日:2024-07-17 01:25:38 公開日:2024-07-12
# 人工知能と自然知の混合:統計力学からAIへ、乱流へ Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence ( http://arxiv.org/abs/2403.17993v3 ) ライセンス: Link先を確認	Michael Chertkov,	(参考訳) この論文は、特に乱流研究に焦点を当てた科学研究におけるAIの役割を反映し、特に非平衡統計力学に根ざした拡散モデルを通して、AIの進化について考察する。これは、ディープニューラルネットワークの革新的利用を通じて、ラグランジアンモデルによる乱流の減少に対するAIの重大な影響を浮き彫りにしている。さらに、乱流研究における様々なAI応用をレビューし、AIと統計流体力学の同時進行における潜在的な課題と機会を概説する。この議論は、AIと乱流の研究が複雑に絡み合っており、両方の分野においてより深い洞察と進歩をもたらす未来へのステージを定めている。 The paper reflects on the future role of AI in scientific research, with a special focus on turbulence studies, and examines the evolution of AI, particularly through Diffusion Models rooted in non-equilibrium statistical mechanics. It underscores the significant impact of AI on advancing reduced, Lagrangian models of turbulence through innovative use of deep neural networks. Additionally, the paper reviews various other AI applications in turbulence research and outlines potential challenges and opportunities in the concurrent advancement of AI and statistical hydrodynamics. This discussion sets the stage for a future where AI and turbulence research are intricately intertwined, leading to more profound insights and advancements in both fields.	翻訳日:2024-07-17 01:15:36 公開日:2024-07-12
# TCLC-GS:自動運転のための密結合LiDAR-カメラ ガウススプラッティング TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving ( http://arxiv.org/abs/2404.02410v2 ) ライセンス: Link先を確認	Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren,	(参考訳) 都市シーンを対象とする3Dガウススプラッティング(3D-GS)ベースの手法の多くは、3Dガウスを3D LiDARポイントで直接初期化するが、これはLiDARデータの能力を十分に活用していないだけでなく、LiDARとカメラデータを融合する潜在的な利点も見落としている。本稿では,密結合型のLiDAR-Camera Gaussian Splatting (TCLC-GS) を設計し,LiDARとカメラセンサの双方の強みをフル活用し,高速で高品質な3D再構成と新規視点のRGB/depth合成を実現する。 TCLC-GSは、LiDARカメラデータから得られた明示的(カラー化された3Dメッシュ)かつ暗黙的(階層的なオクツリー特徴)なハイブリッド3D表現を設計し、スプラッティングのために3Dガウスの性質を豊かにする。 3Dガウスの性質は、より完全な3D形状と色情報を提供する3Dメッシュに合わせて初期化されるだけでなく、検索したオクツリーの暗黙的特徴を通じてより広い文脈情報も付与される。ガウススプラッティングの最適化プロセスの間、3Dメッシュは密な深度情報を教師信号として提供し、ロバストな幾何形状の学習によってトレーニングプロセスを強化する。 Waymo Open Dataset と nuScenes Dataset での総合評価は、我々の方法のSOTA(State-of-the-art)性能を検証する。単一のNVIDIA RTX 3090 Tiを用いて高速トレーニングを行い,都市シナリオにおいて1920x1280 (Waymo)の解像度で90FPS,1600x900 (nuScenes)の解像度で120FPSのリアルタイムRGBおよび深度レンダリングを実現する。 Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of 3D Gaussians for splatting. 3D Gaussian's properties are not only initialized in alignment with the 3D mesh which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. 
During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning of a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS in resolution of 1920x1280 (Waymo), and 120 FPS in resolution of 1600x900 (nuScenes) in urban scenarios.	翻訳日:2024-07-17 01:05:49 公開日:2024-07-12
# BiSHop: 一般化スパース現代ホップフィールドモデルによる表形式データの双方向セルラー学習 BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model ( http://arxiv.org/abs/2404.03830v2 ) ライセンス: Link先を確認	Chenwei Xu, Yu-Chao Huang, Jerry Yao-Chieh Hu, Weijian Li, Ammar Gilani, Hsi-Sheng Goan, Han Liu,	(参考訳) 本稿では,深層表形式学習のための新しいエンドツーエンドフレームワークであるBi-Directional Sparse Hopfield Network (BiSHop)を紹介する。 BiSHopは、深層表形式学習の2つの大きな課題、すなわち回転不変でないデータ構造と表形式データにおける特徴のスパース性に対処する。我々の主要な動機は、連想記憶と注意機構の結びつきが最近確立されたことにある。結果として、BiSHopは2つの相互接続された方向別学習モジュールを通して、データを列方向と行方向の両方で逐次処理するデュアルコンポーネントアプローチを使用する。計算面では、これらのモジュールは一般化スパース現代ホップフィールド層を積み重ねたものであり、この層は適応可能なスパース性を持つ現代ホップフィールドモデルのスパース拡張である。方法論的には、BiSHopはマルチスケールの表現学習を促進し、特徴内相互作用と特徴間相互作用の両方を、各スケールで適応的なスパース性をもって捉える。実証的には、さまざまな実世界のデータセットの実験を通じて、BiSHopが現在のSOTAメソッドをはるかに少ないHPOの実行回数で上回り、深層表形式学習のための堅牢なソリューションであることを実証した。 We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house stacked generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it a robust solution for deep tabular learning.	
翻訳日:2024-07-17 01:05:49 公開日:2024-07-12
# 拡散シュレーディンガーブリッジによる、ターゲットドメインの正解ラベルを用いないCTベースの脳室セグメンテーション CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths ( http://arxiv.org/abs/2405.18267v2 ) ライセンス: Link先を確認	Reihaneh Teimouri, Marta Kersten-Oertel, Yiming Xiao,	(参考訳) 臨床CTスキャンからの効率的かつ正確な脳室セグメンテーションは、脳室ドレナージ術(ventriculostomy)のような緊急手術に不可欠である。軟部組織コントラストの低さと, 適切にアノテーションされた臨床脳CTデータベースの不足という課題に対し, 拡散モデルに基づくドメイン適応を活用することで, CTセグメンテーションの正解ラベルを必要としない, 不確実性を考慮した新たな脳室セグメンテーション手法を導入する。具体的には,拡散Schrödinger Bridgeとattention recurrent residual U-Netを用い,ペアになっていないCTとMRIのスキャンを活用して,より入手しやすいMRIのセグメンテーションから自動CTセグメンテーションを導出する。重要なことは、画像変換とセグメンテーションタスクのエンドツーエンドの共同トレーニングフレームワークを提案し、個々のタスクを別々にトレーニングする場合に対する利点を実証することである。ドメイン適応に2つの異なるGANモデル(CycleGAN と CUT)を用いた類似の構成と提案手法を比較することにより、セグメンテーションと画像変換の品質向上における拡散モデルの利点も明らかにする。提案手法はDiceスコア0.78 ± 0.27で,SynSeg-Netを含む比較手法よりも優れ,自動セグメンテーション結果の品質管理をさらに容易にする直感的な不確実性指標を提供する。提案手法の実装は、https://github.com/HealthX-Lab/DiffusionSynCTSegで利用可能である。 Efficient and accurate brain ventricle segmentation from clinical CT scans is critical for emergency surgeries like ventriculostomy. With the challenges in poor soft tissue contrast and a scarcity of well-annotated databases for clinical brain CTs, we introduce a novel uncertainty-aware ventricle segmentation technique without the need of CT segmentation ground truths by leveraging diffusion-model-based domain adaptation. Specifically, our method employs the diffusion Schrödinger Bridge and an attention recurrent residual U-Net to capitalize on unpaired CT and MRI scans to derive automatic CT segmentation from those of the MRIs, which are more accessible. Importantly, we propose an end-to-end, joint training framework of image translation and segmentation tasks, and demonstrate its benefit over training individual tasks separately. By comparing the proposed method against similar setups using two different GAN models for domain adaptation (CycleGAN and CUT), we also reveal the advantage of diffusion models towards improved segmentation and image translation quality. 
With a Dice score of 0.78 ± 0.27, our proposed method outperformed the compared methods, including SynSeg-Net, while providing intuitive uncertainty measures to further facilitate quality control of the automatic segmentation outcomes. The implementation of our proposed method is available at: https://github.com/HealthX-Lab/DiffusionSynCTSeg.	翻訳日:2024-07-17 00:36:09 公開日:2024-07-12
# SAMM:Sharded Automated Market Makers SAMM: Sharded Automated Market Makers ( http://arxiv.org/abs/2406.05568v2 ) ライセンス: Link先を確認	Hongyin Chen, Amit Vaisman, Ittay Eyal,	(参考訳) 自動マーケットメーカー(Automated Market Makers, AMM)は、分散型金融(DeFi)のブロックチェーンベースプラットフォームの基礎である。 AMMはスマートコントラクトであり、流動性プール(liquidity pool)を維持することで、仮想トークンの直接交換を可能にする。トレーダーは手数料を支払ってコントラクトとトークンを交換し、流動性は、その手数料を報酬として受け取る流動性提供者(liquidity providers)から供給される。しかし、需要が増えているにもかかわらず、AMMのパフォーマンスは限られている。最先端のブロックチェーンプラットフォームは、トランザクションの並列実行を可能にする。しかし,AMMの演算は可換ではなく,AMMを使うトランザクションは直列化しなければならないため,AMMはこの利得を享受できないことを示す。複数の独立したシャード(shard)からなるAMMである SAMM を提案する。すべてのシャードは同じチェーン上で動作するスマートコントラクトだが、それぞれが独立しているため、並列実行が可能である。課題は、標準的なAMMでの取引は流動性プールが大きいほど安くなることである。したがって、複数の小さなAMMを単純に使うと、トレーダーは各取引を全てのAMMに分割することになり、パフォーマンスが悪化することを示す。 SAMMは取引手数料の新しい設計でこの問題に対処する。トレーダーは、最小のシャード1つだけを使うようにインセンティブを与えられる。すべての部分ゲーム完全ナッシュ均衡(SPNE)が望ましい挙動に合致することを示す。すなわち、流動性提供者はすべてのプールの流動性を均衡させ、システムは取引が均等に分散された状態に収束する。 Suiブロックチェーンでの評価によると、SAMMのスループットは従来のAMMの5倍以上であり、システムの限界に近づいている。 SAMMは直接デプロイ可能なオープンソースのスマートコントラクトであり、個人とDeFiアプリケーションの大規模取引を可能にする。 Automated Market Makers (AMMs) are a cornerstone of decentralized finance (DeFi) blockchain-based platforms. They are smart contracts, enabling the direct exchange of virtual tokens by maintaining liquidity pools. Traders exchange tokens with the contract, paying a fee; liquidity comes from liquidity providers, paid by those fees. But despite growing demand, the performance of AMMs is limited. State-of-the-art blockchain platforms allow for parallel execution of transactions. However, we show that AMMs do not enjoy these gains, since their operations are not commutative so transactions using them must be serialized. We present SAMM, an AMM comprising multiple independent shards. All shards are smart contracts operating in the same chain, but they allow for parallel execution as each is independent. The challenge is that trading in a standard AMM is cheaper if its liquidity pool is larger. 
Therefore, we show that simply using multiple smaller AMMs results in traders splitting each trade among all AMMs, which worsens performance. SAMM addresses this issue with a novel design of the trading fees. Traders are incentivized to use only a single smallest shard. We show that all Subgame-Perfect Nash Equilibria (SPNE) fit the desired behavior: Liquidity providers balance the liquidity among all pools, so the system converges to the state where trades are evenly distributed. Evaluation in the Sui blockchain shows that SAMM's throughput is over fivefold that of traditional AMMs, approaching the system's limit. SAMM is a directly deployable open-source smart contract, allowing trading at scale for individuals and DeFi applications.	翻訳日:2024-07-17 00:16:39 公開日:2024-07-12
# IPv4ID選択精度,セキュリティ,性能の分類と比較分析 A Taxonomy and Comparative Analysis of IPv4 ID Selection Correctness, Security, and Performance ( http://arxiv.org/abs/2406.06483v2 ) ライセンス: Link先を確認	Joshua J. Daymude, Antonio M. Espinoza, Sean Bergen, Benjamin Mixon-Baca, Jeffrey Knockel, Jedidiah R. Crandall,	(参考訳) よりセキュアなインターネットへの戦いは、ネットワークプロトコルの最も基本的な部分を含む、多くの面で争われている。 IPv4 Identifier (IPID)は、IPv4ヘッダフィールドであり、ネットワーク特性をスキャンし、オフパス接続を推測し、DNSキャッシュを害する悪用されたサイドチャネルとして、インターネットと同じくらい長い歴史を持つ。本稿では,25年間のIPID利用履歴とそれに対応するIPID選択方法の変更を分類する。これらの手法の正しさと安全性を数学的に解析し、その性能を実証的に評価することにより、ネットワークセキュリティにおける体系的評価の価値を強調するとともに、現在のオペレーティングシステム実装の欠点と同様にベストプラクティスの推奨を明らかにする。 The battle for a more secure Internet is waged on many fronts, including the most basic of networking protocols. Our focus is the IPv4 Identifier (IPID), an IPv4 header field as old as the Internet with an equally long history as an exploited side channel for scanning network properties, inferring off-path connections, and poisoning DNS caches. This article taxonomizes the 25-year history of IPID-based exploits and the corresponding changes to IPID selection methods. By mathematically analyzing these methods' correctness and security and empirically evaluating their performance, we reveal recommendations for best practice as well as shortcomings of current operating system implementations, emphasizing the value of systematic evaluations in network security.	翻訳日:2024-07-17 00:16:39 公開日:2024-07-12
# 機械的解釈可能性によるモデル性能のコンパクト証明 Compact Proofs of Model Performance via Mechanistic Interpretability ( http://arxiv.org/abs/2406.11779v8 ) ライセンス: Link先を確認	Jason Gross, Rajashree Agrawal, Thomas Kwa, Euan Ong, Chun Hei Yip, Alex Gibson, Soufiane Noubir, Lawrence Chan,	(参考訳) 本稿では,機械的解釈可能性,すなわちモデルの重みを人間が解釈可能なアルゴリズムへとリバースエンジニアリングする技術を用いて,モデル性能の形式的保証を導出し,コンパクトに証明することを提案する。このアプローチのプロトタイプとして, Max-of-$K$タスクで訓練した151個の小型トランスフォーマーの精度の下界を形式的に証明する。我々は,102種類のコンピュータ支援証明戦略を作成し,各モデルについて証明の長さと境界の厳密さを評価する。定量的な指標を用いると,より短い証明ほど,より多くの機械的理解を必要とし,また提供するように見えることが分かった。さらに,より忠実な機械的理解が,より厳密な性能境界につながることが分かった。これらの関係は,証明の一部を定性的に検証することで確認する。最後に, 機械的解釈可能性を用いてモデル性能のコンパクトな証明を生成する上での重要な課題として, 累積していく構造のないノイズ(compounding structureless noise)を特定する。 We propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.	翻訳日:2024-07-17 00:16:39 公開日:2024-07-12
# 解析的リアプノフ関数発見のためのニューラルネットワークとシンボリック回帰の組み合わせ Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery ( http://arxiv.org/abs/2406.15675v3 ) ライセンス: Link先を確認	Jie Feng, Haohan Zou, Yuanyuan Shi,	(参考訳) 非線形力学系に対する解析的リアプノフ関数を構成するために,CoNSAL (Combining Neural Network and Symbolic regression for Analytical Lyapunov function)を提案する。このフレームワークは、ニューラルネットワークを精密な分析形式に蒸留するためにシンボリックレグレッションを適用する、ニューラルリアプノフ関数とシンボリックレグレッション成分を含む。本手法は, 記号回帰を翻訳の道具としてだけでなく, 反例を明らかにする手段としても活用する。この手順は、解析的定式化において反例が見つからない場合に終了する。従来の結果と比較して、CoNSALは学習過程と最終結果の両方における解釈性を改善したリアプノフ関数の解析形式を直接生成する。我々は,CoNSALを2次元逆振子,経路追従,Van Der Pol Oscillator,3次元トリグダイナミクス,4次元回転輪振子,6次元3バスパワーシステムに適用し,本アルゴリズムが有効なリアプノフ関数の発見に成功したことを示す。コード例はhttps://github.com/HaohanZou/CoNSALで公開されている。 We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regression not only as a tool for translation but also as a means to uncover counterexamples. This procedure terminates when no counterexamples are found in the analytical formulation. Compared with previous results, CoNSAL directly produces an analytical form of the Lyapunov function with improved interpretability in both the learning process and the final results. We apply CoNSAL to 2-D inverted pendulum, path following, Van Der Pol Oscillator, 3-D trig dynamics, 4-D rotating wheel pendulum, 6-D 3-bus power system, and demonstrate that our algorithm successfully finds their valid Lyapunov functions. Code examples are available at https://github.com/HaohanZou/CoNSAL.	翻訳日:2024-07-17 00:06:54 公開日:2024-07-12
# AIによる災害救助支援ドローン:課題と機会 AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities ( http://arxiv.org/abs/2406.15875v2 ) ライセンス: Link先を確認	Narek Papyan, Michel Kulhandjian, Hovannes Kulhandjian, Levon Hakob Aslanyan,	(参考訳) 本調査では,特にヒトの悲鳴やその他の苦難信号を識別することで,個人検出にドローンベースのシステムを活用することに重点を置いている。この研究は、地震、ハリケーン、軍事紛争、山火事などの災害後のシナリオに大きく関係している。これらのドローンは、救助隊が直接アクセスすることが困難な災害に遭った地域をホバリングすることができる。無人航空機(英: Unmanned air vehicle、UAV)は、災害時の捜索救助任務のためにしばしば配備される航空機である。通常、ドローンは空中画像をキャプチャして、構造的な損傷を評価し、災害の程度を識別する。また、熱画像技術を使って体温を検知し、個人を見つけるのに役立つ。大規模なドローンは、孤立した災害で立ち往生している人々に必須の物資を届けるために使われる場合もある。本論では, 空中音響による人間の位置推定にまつわる課題について考察する。聴覚システムは、動物の鳴き声や風など、自然に発生する人間の叫び声と音を区別しなければならない。さらに、人々が救助隊に合図しようとする、叫び声や拍手などの信号に関連する、異なるパターンを認識する能力も備えるべきである。この課題に対処するためには、人工知能(AI)を使用して音の周波数を分析し、一般的なオーディオシグネチャを識別する。畳み込みニューラルネットワーク(CNN)のようなディープラーニングベースのネットワークは、これらのシグネチャを使用して、ドローンモーターやその他の環境要因によって発生するノイズを除去する訓練が可能である。さらに、マイクロホンアレイ信号に基づく到着方向(DOA)のような信号処理技術を用いることで、人間の騒音の音源を追跡する精度を高めることができる。 In this survey we are focusing on utilizing drone-based systems for the detection of individuals, particularly by identifying human screams and other distress signals. This study has significant relevance in post-disaster scenarios, including events such as earthquakes, hurricanes, military conflicts, wildfires, and more. These drones are capable of hovering over disaster-stricken areas that may be challenging for rescue teams to access directly. Unmanned aerial vehicles (UAVs), commonly referred to as drones, are frequently deployed for search-and-rescue missions during disaster situations. Typically, drones capture aerial images to assess structural damage and identify the extent of the disaster. They also employ thermal imaging technology to detect body heat signatures, which can help locate individuals. In some cases, larger drones are used to deliver essential supplies to people stranded in isolated disaster-stricken areas. 
In our discussions, we delve into the unique challenges associated with locating humans through aerial acoustics. The auditory system must distinguish between human cries and sounds that occur naturally, such as animal calls and wind. Additionally, it should be capable of recognizing distinct patterns related to signals like shouting, clapping, or other ways in which people attempt to signal rescue teams. To tackle this challenge, one solution involves harnessing artificial intelligence (AI) to analyze sound frequencies and identify common audio signatures. Deep learning-based networks, such as convolutional neural networks (CNNs), can be trained using these signatures to filter out noise generated by drone motors and other environmental factors. Furthermore, employing signal processing techniques like the direction of arrival (DOA) based on microphone array signals can enhance the precision of tracking the source of human noises.	翻訳日:2024-07-17 00:06:54 公開日:2024-07-12
# ArzEn-LLM:LLMを用いたコード変換エジプト英語翻訳と音声認識 ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs ( http://arxiv.org/abs/2406.18120v2 ) ライセンス: Link先を確認	Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa,	(参考訳) 近年のエジプト・アラビア語と英語のコードスイッチング現象の広範化にともなって、機械翻訳(MT)と自動音声認識(ASR)システムの複雑さを探求し、コードスイッチしたエジプト・アラビア語を英語またはエジプト・アラビア語に翻訳することに焦点を当てた。本研究の目的は,LLama や Gemma などの大規模言語モデルを用いて,これらのシステム開発に使用される方法論を提示することである。 ASR の分野では,Whisper モデルをコード変更によるエジプトのアラビア語認識に利用し,データ前処理やトレーニング技術を含む実験手順を詳述する。 ASRをMTと統合した連続的な音声テキスト翻訳システムの実装を通じて、限られた資源とエジプト・アラビア方言の特徴によって生じる課題を克服することを目指している。確立された指標に対する評価は有望な結果を示し、我々の手法は、最先端の英語翻訳に対して56\%、アラビア語翻訳では9.3\%の大幅な改善をもたらす。コードスイッチングは音声言語に深く依存しているため、ASRシステムはこの現象を効果的に扱えることが重要である。この能力は、ビジネス交渉、文化交流、学術談話など、様々な分野におけるシームレスな対話を可能にするために不可欠である。私たちのモデルとコードはオープンソースリソースとして利用できます。コード: \url{http://github.com/ahmedheakl/arazn-llm}}, Models: \url{http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e} Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. 
Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of 56% in English translation over the state-of-the-art and 9.3% in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon. This capability is crucial for enabling seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.	翻訳日:2024-07-17 00:06:54 公開日:2024-07-12
# ResumeAtlas:大規模データセットと大規模言語モデルによるResume分類の再検討 ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models ( http://arxiv.org/abs/2406.18125v2 ) ライセンス: Link先を確認	Ahmed Heakl, Youssef Mohamed, Noran Mohamed, Aly Elsharkawy, Ahmed Zaky,	(参考訳) オンライン採用プラットフォームへの依存度の増加とAI技術の採用は、効率的な再編成手法の必要性を浮き彫りにした。しかし、小さなデータセット、標準化された履歴テンプレートの欠如、プライバシー問題といった課題は、既存の分類モデルの正確性と有効性を妨げている。本研究では,これらの課題に対して,分類を再開するための包括的アプローチを提案する。多様な情報源から13,389人の履歴書を収集し,BERT や Gemma1.1 2B などの大規模言語モデル (LLM) を用いて分類を行った。その結果,従来の機械学習手法に比べて,トップ1の精度92\%,トップ5の精度97.5\%を達成した。これらの知見は、履歴分類システムの精度と堅牢性を高めるために、データセットの品質と高度なモデルアーキテクチャの重要性を浮き彫りにして、オンライン採用の実践の分野を推し進めている。 The increasing reliance on online recruitment platforms coupled with the adoption of AI technologies has highlighted the critical need for efficient resume classification methods. However, challenges such as small datasets, lack of standardized resume templates, and privacy concerns hinder the accuracy and effectiveness of existing classification models. In this work, we address these challenges by presenting a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results demonstrate significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92\% and a top-5 accuracy of 97.5\%. These findings underscore the importance of dataset quality and advanced model architectures in enhancing the accuracy and robustness of resume classification systems, thus advancing the field of online recruitment practices.	翻訳日:2024-07-17 00:06:54 公開日:2024-07-12
# 資源依存関係研究のための概念的・形式的基礎研究 Conceptual and formal groundwork for the study of resource dependence relations ( http://arxiv.org/abs/2407.00164v2 ) ライセンス: Link先を確認	Yìlè Yīng, Tomáš Gonda, Robert Spekkens,	(参考訳) 資源理論は状態に対して事前順序を課し、1つの状態が1番目の状態から2番目の状態へ自由な操作で変換できる場合、そして自由な操作の集合が研究中のリソースフルネスの概念を定義する。一般に、1つの資源理論の序列における状態の位置は、異なる資源理論の序列における位置を制約することができる。リソースフルネスの異なる概念の間には、非自明な依存関係が存在する可能性がある。本稿では,資源依存関係の研究における概念的および形式的基礎を概説する。特に、各資源理論の完全集合を含む一組のモノトン間の関係が、資源依存関係の完全な特徴を与えることに留意する。例えば、ブリュッホ球面上の3つの直交軸に沿ったキュービットの近面非対称性に関する3つの資源理論を考えると、この近面対称性は、同一性写像と与えられた軸上の$\pi$回転からなる$\mathbb{Z}_2$の表現を指す。この例は、各資源理論に対して完全なモノトンの集合を導出することができ、これらのモノトンの間に保持されるすべての関係を決定できるので、リソース依存関係を決定できる。しかしながら、この最も単純な例であっても、これらの関係はすでにかなり曖昧である。 A resource theory imposes a preorder over states, with one state being above another if the first can be converted to the second by a free operation, and where the set of free operations defines the notion of resourcefulness under study. In general, the location of a state in the preorder of one resource theory can constrain its location in the preorder of a different resource theory. It follows that there can be nontrivial dependence relations between different notions of resourcefulness. In this article, we lay out the conceptual and formal groundwork for the study of resource dependence relations. In particular, we note that the relations holding among a set of monotones that includes a complete set for each resource theory provides a full characterization of resource dependence relations. As an example, we consider three resource theories concerning the about-face asymmetry properties of a qubit along three mutually orthogonal axes on the Bloch ball, where about-face symmetry refers to a representation of $\mathbb{Z}_2$, consisting of the identity map and a $\pi$ rotation about the given axis. 
This example is sufficiently simple that we are able to derive a complete set of monotones for each resource theory and to determine all of the relations that hold among these monotones, thereby completely solving the problem of determining resource dependence relations. Nonetheless, we show that even in this simplest of examples, these relations are already quite nuanced.	翻訳日:2024-07-16 23:57:10 公開日:2024-07-12
# FlowCon:フローベースコントラスト学習を用いたアウト・オブ・ディストリビューション検出 FlowCon: Out-of-Distribution Detection using Flow-Based Contrastive Learning ( http://arxiv.org/abs/2407.03489v2 ) ライセンス: Link先を確認	Saandeep Aathreya, Shaun Canavan,	(参考訳) ディープラーニング手法の現実的な応用が拡大するにつれて、OOD(Out-of-distribution)データの特定がますます重要になっている。ポストホック法では、外れ値データで微調整されたソフトマックススコアを修正したり、中間特徴層を活用して、In-Distribution(ID)サンプルとOODサンプルを区別する特徴的なパターンを特定する。他の方法は多様なOODサンプルを用いてIDとOODの相違を学習することに焦点を当てている。しかしながら、これらの手法は典型的には、想定される外れ値サンプルの品質に依存する。密度ベースの手法はクラス条件付き分布を明示的にモデル化するが、長いトレーニング時間や分類器の再訓練を必要とする。これらの問題に対処するために、新しい密度ベースのOOD検出技術である FlowCon を導入する。我々の主な革新は、正規化フローの特性と教師付きコントラスト学習を効率的に組み合わせ、扱いやすい(tractable)密度推定を伴う堅牢な表現学習を実現することにある。 ResNet18 や WideResNet の分類器で事前訓練した CIFAR-10 や CIFAR-100 などの一般的なビジョンデータセットに対して,本手法の性能向上を実証的に示した。また、尤度プロットを用いた定量的解析と、UMAP埋め込みを用いた定性的可視化を行い、様々なOODコンテキスト下での提案手法のロバスト性を示す。コードは、採否決定後にオープンソース化される予定である。 Identifying Out-of-distribution (OOD) data is becoming increasingly critical as the real-world applications of deep learning methods expand. Post-hoc methods modify softmax scores fine-tuned on outlier data or leverage intermediate feature layers to identify distinctive patterns between In-Distribution (ID) and OOD samples. Other methods focus on employing diverse OOD samples to learn discrepancies between ID and OOD. These techniques, however, are typically dependent on the quality of the outlier samples assumed. Density-based methods explicitly model class-conditioned distributions but this requires long training time or retraining the classifier. To tackle these issues, we introduce FlowCon, a new density-based OOD detection technique. Our main innovation lies in efficiently combining the properties of normalizing flow with supervised contrastive learning, ensuring robust representation learning with tractable density estimation. Empirical evaluation shows the enhanced performance of our method across common vision datasets such as CIFAR-10 and CIFAR-100 pretrained on ResNet18 and WideResNet classifiers. 
We also perform quantitative analysis using likelihood plots and qualitative visualization using UMAP embeddings and demonstrate the robustness of the proposed method under various OOD contexts. Code will be open-sourced post decision.	翻訳日:2024-07-16 23:47:24 公開日:2024-07-12
# フラッシュ正規化 - LLM用高速RMSNorm Flash normalization: fast RMSNorm for LLMs ( http://arxiv.org/abs/2407.09577v1 ) ライセンス: Link先を確認	Nils Graef, Matthew Clapp, Andrew Wasielewski,	(参考訳) RMSNormはLlama、Mistral、OpenELMなど多くのLLMで使われている。本稿では、線形層が後に続くRMSNormの、厳密でありながらより高速な実装であるFlashNormについて詳述する。コードとその他のトランスフォーマーのトリックについては、https://huggingface.co/open-machine/FlashNormを参照してください。 RMSNorm is used by many LLMs such as Llama, Mistral, and OpenELM. This paper details FlashNorm, which is an exact but faster implementation of RMSNorm followed by linear layers. See https://huggingface.co/open-machine/FlashNorm for code and more transformer tricks.	翻訳日:2024-07-16 21:38:05 公開日:2024-07-12
# 拡散傾向解析を用いた教師なし異常検出 Unsupervised Anomaly Detection Using Diffusion Trend Analysis ( http://arxiv.org/abs/2407.09578v1 ) ライセンス: Link先を確認	Eunwoo Kim, Un Yang, Cheol Lae Roh, Stefano Ermon,	(参考訳) 拡散モデルによる再構成に基づく従来の異常検出技術は, 異常位置や形状を高い性能で識別できるため, 広く用いられている。しかし、正常な特性を維持しながら異常を分解できる適切なノイズパラメータを決定するには限界がある。また,拡散モデルのボラティリティにより,再建時に正常領域がかなり変動し,誤検出が生じる。本稿では, 劣化度に応じて復元傾向を分析し, 既存手法の両問題を効果的に解決し, 異常検出手法を提案する。提案手法は,産業的異常検出のためのオープンデータセット上で検証され,多くの評価基準に基づいて既存手法の性能を向上させる。既存の異常検出手法と簡単に組み合わせることで、計算コストと性能のトレードオフを提供し、製造業における高い応用可能性を実現する。 Conventional anomaly detection techniques based on reconstruction via denoising diffusion model are widely used due to their ability to identify anomaly locations and shapes with high performance. However, there is a limitation in determining appropriate noise parameters that can degrade anomalies while preserving normal characteristics. Also, due to the volatility of the diffusion model, normal regions can fluctuate considerably during reconstruction, resulting in false detection. In this paper, we propose a method to detect anomalies by analysis of reconstruction trend depending on the degree of degradation, effectively solving the both problems of existing methods. The proposed method is validated on an open dataset for industrial anomaly detection, improving the performance of existing methods on a number of evaluation criteria. With the ease of combination with existing anomaly detection methods, it provides a tradeoff between computational cost and performance, allowing it high application potential in manufacturing industry.	翻訳日:2024-07-16 21:38:05 公開日:2024-07-12
# Don't Fear Peculiar Activation Functions: EUAF and Beyond ( http://arxiv.org/abs/2407.09580v1 ) License: see link	Qianchao Wang, Shijun Zhang, Dong Zeng, Zhaoheng Xie, Hengtao Guo, Feng-Lei Fan, Tieyong Zeng	In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate the effectiveness of PEUAF through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR10, Tiny-ImageNet, and ImageNet. Moreover, we significantly generalize the family of super-expressive activation functions, whose existence has been demonstrated in several recent works by showing that any continuous function can be approximated to any desired accuracy by a fixed-size network with a specific super-expressive activation function. Specifically, our work addresses two major bottlenecks impeding the development of super-expressive activation functions: the limited identification of super-expressive functions, which raises doubts about their broad applicability, and their often peculiar forms, which lead to skepticism regarding their scalability and practicality in real-world applications.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# A Scale-Invariant Diagnostic Approach Towards Understanding Dynamics of Deep Neural Networks ( http://arxiv.org/abs/2407.09585v1 ) License: see link	Ambarish Moharil, Damian Tamburri, Indika Kumara, Willem-Jan Van Den Heuvel, Alireza Azarfar	This paper introduces a scale-invariant methodology employing Fractal Geometry to analyze and explain the nonlinear dynamics of complex connectionist systems. By leveraging architectural self-similarity in Deep Neural Networks (DNNs), we quantify fractal dimensions and roughness to deeply understand their dynamics and enhance the quality of intrinsic explanations. Our approach integrates principles from Chaos Theory to improve visualizations of fractal evolution and utilizes a Graph-Based Neural Network for reconstructing network topology. This strategy aims at advancing the intrinsic explainability of connectionist Artificial Intelligence (AI) systems.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts ( http://arxiv.org/abs/2407.09590v1 ) License: see link	Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao	By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real-world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning two state-of-the-art MoE models, Mixtral-8x7B and Mixtral-8x22B. Evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. To facilitate future research, we will release our code and the pruned MoE models.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# Toward Regulatory Compliance: A few-shot Learning Approach to Extract Processing Activities ( http://arxiv.org/abs/2407.09592v1 ) License: see link	Pragyan K C, Rambod Ghandiparsi, Rocky Slavin, Sepideh Ghanavati, Travis Breaux, Mitra Bokaei Hosseini	The widespread use of mobile applications has driven the growth of the industry, with companies relying heavily on user data for services like targeted advertising and personalized offerings. In this context, privacy regulations such as the General Data Protection Regulation (GDPR) play a crucial role. One of the GDPR requirements is the maintenance of a Record of Processing Activities (RoPA) by companies. RoPA encompasses various details, including the description of data processing activities, their purposes, types of data involved, and other relevant external entities. Small app-developing companies face challenges in meeting such compliance requirements due to resource limitations and tight timelines. To aid these developers and prevent fines, we propose a method to generate segments of RoPA from user-authored usage scenarios using large language models (LLMs). Our method employs few-shot learning with GPT-3.5 Turbo to summarize usage scenarios and generate RoPA segments. We evaluate different factors that can affect few-shot learning performance consistency for our summarization task, including the number of examples in few-shot learning prompts, repetition, and order permutation of examples in the prompts. Our findings highlight the significant influence of the number of examples in prompts on summarization F1 scores, while demonstrating negligible variability in F1 scores across multiple prompt repetitions. Our prompts achieve successful summarization of processing activities with an average 70% ROUGE-L F1 score. Finally, we discuss avenues for improving results through manual evaluation of the generated summaries.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
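The entry above reports results as ROUGE-L F1. As a rough illustration of what that score measures, here is a minimal sketch based on the longest common subsequence (LCS) of tokens. It assumes lowercase whitespace tokenization and a balanced F1; the function names and the example sentences are hypothetical, and the authors' actual evaluation tooling is not stated in the abstract.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 with ASSUMED lowercase whitespace tokenization."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "the app collects location data",
    "the app collects user location data",
)
```

In this example the candidate is a subsequence of the reference, so precision is 1, recall is 5/6, and the F1 works out to 10/11. Production evaluations usually rely on an established ROUGE implementation with stemming and a fixed tokenizer, which can shift the numbers slightly.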
# Non-Adiabatic Quantum Optimization for Crossing Quantum Phase Transitions ( http://arxiv.org/abs/2407.09596v1 ) License: see link	András Grabarits, Federico Balducci, Barry C. Sanders, Adolfo del Campo	We consider the optimal driving of the ground state of a many-body quantum system across a quantum phase transition in finite time. In this context, excitations caused by the breakdown of adiabaticity can be minimized by adjusting the schedule of the control parameter that drives the transition. Drawing inspiration from the Kibble-Zurek mechanism, we characterize the timescale of onset of adiabaticity for several optimal control procedures. Our analysis reveals that schedules relying on local adiabaticity, such as Roland-Cerf's local adiabatic driving and the quantum adiabatic brachistochrone, fail to provide a significant speedup over the adiabatic evolution in the transverse-field Ising and long-range Kitaev models. As an alternative, we introduce a novel framework, Non-Adiabatic Quantum Optimization (NAQO), that, by exploiting the Landau-Zener formula and taking into account the role of higher-excited states, outperforms schedules obtained via both local adiabaticity and state-of-the-art numerical optimization. NAQO is not restricted to exactly solvable models, and we further confirm its superior performance in a disordered non-integrable model.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# Hyperbolic Spin Liquids ( http://arxiv.org/abs/2407.09601v1 ) License: see link	Patrick M. Lenggenhager, Santanu Dey, Tomáš Bzdušek, Joseph Maciejko	Hyperbolic lattices present a unique opportunity to venture beyond the conventional paradigm of crystalline many-body physics and explore correlated phenomena in negatively curved space. As a theoretical benchmark for such investigations, we extend Kitaev's spin-1/2 honeycomb model to hyperbolic lattices and exploit their non-Euclidean space-group symmetries to solve the model exactly. We elucidate the ground-state phase diagram on the $\{8,3\}$ lattice and find a gapped $\mathbb{Z}_2$ spin liquid with Abelian anyons, a gapped chiral spin liquid with non-Abelian anyons and chiral edge states, and a compressible spin liquid with low-energy density of states dominated by non-Abelian Bloch states of Majorana fermions.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# Real-time gravitational-wave inference for binary neutron stars using machine learning ( http://arxiv.org/abs/2407.09602v1 ) License: see link	Maximilian Dax, Stephen R. Green, Jonathan Gair, Nihar Gupte, Michael Pürrer, Vivien Raymond, Jonas Wildberger, Jakob H. Macke, Alessandra Buonanno, Bernhard Schölkopf	Mergers of binary neutron stars (BNSs) emit signals in both the gravitational-wave (GW) and electromagnetic (EM) spectra. Famously, the 2017 multi-messenger observation of GW170817 led to scientific discoveries across cosmology, nuclear physics, and gravity. Central to these results were the sky localization and distance obtained from GW data, which, in the case of GW170817, helped to identify the associated EM transient, AT 2017gfo, 11 hours after the GW signal. Fast analysis of GW data is critical for directing time-sensitive EM observations; however, due to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here, we develop a machine learning approach that performs complete BNS inference in just one second without making any such approximations. This is enabled by a new method for explicit integration of physical domain knowledge into neural networks. Our approach enhances multi-messenger observations by providing (i) accurate localization even before the merger; (ii) improved localization precision by $\sim30\%$ compared to approximate low-latency methods; and (iii) detailed information on luminosity distance, inclination, and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state and waveform systematics studies. Finally, we demonstrate that our method scales to extremely long signals, up to an hour in length, thus serving as a blueprint for data analysis for next-generation ground- and space-based detectors.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# Krylov complexity and chaos in deformed SYK models ( http://arxiv.org/abs/2407.09604v1 ) License: see link	Shira Chapman, Saskia Demulder, Damián A. Galante, Sameer U. Sheorey, Osher Shoval	Krylov complexity has recently been proposed as a quantum probe of chaos. The Krylov exponent characterising the exponential growth of Krylov complexity is conjectured to upper-bound the Lyapunov exponent. We compute the Krylov and the Lyapunov exponents in the Sachdev-Ye-Kitaev model and in some of its deformations. We do this analysis both at infinite and finite temperatures, in models where the number of fermionic interactions is both finite and infinite. We consider deformations that interpolate between two regions of near-maximal chaos and deformations that become nearly-integrable at low temperatures. In all cases, we find that the Krylov exponent upper-bounds the Lyapunov one. However, we find that while the Lyapunov exponent can have non-monotonic behaviour as a function of temperature, in all studied examples the Krylov exponent behaves monotonically. For instance, we find models where the Lyapunov exponent goes to zero at low temperatures, while the Krylov exponent saturates to its maximal bound. We speculate on the possibility that this monotonicity might be a generic feature of the Krylov exponent in quantum systems evolving under unitary evolution.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
# Chebyshev approximation and composition of functions in matrix product states for quantum-inspired numerical analysis ( http://arxiv.org/abs/2407.09609v1 ) License: see link	Juan José Rodríguez-Aldavero, Paula García-Molina, Luca Tagliacozzo, Juan José García-Ripoll	This work explores the representation of univariate and multivariate functions as matrix product states (MPS), also known as quantized tensor-trains (QTT). It proposes an algorithm that employs iterative Chebyshev expansions and Clenshaw evaluations to represent analytic and highly differentiable functions as MPS Chebyshev interpolants. It demonstrates rapid convergence for highly differentiable functions, aligning with theoretical predictions, and generalizes efficiently to multidimensional scenarios. The performance of the algorithm is compared with that of tensor cross-interpolation (TCI) and multiscale interpolative constructions through a comprehensive comparative study. When function evaluation is inexpensive or when the function is not analytic, TCI is generally more efficient for function loading. However, the proposed method shows competitive performance, outperforming TCI in certain multivariate scenarios. Moreover, it shows advantageous scaling rates and generalizes to a wider range of tasks by providing a framework for function composition in MPS, which is useful for non-linear problems and many-body statistical physics.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
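The entry above combines Chebyshev expansions with Clenshaw evaluation. The paper applies these to MPS operands; the sketch below only shows the ordinary scalar version on the interval [-1, 1], as an assumption-laden illustration of the two building blocks. The function names are hypothetical, and nothing here reproduces the authors' MPS algorithm.

```python
import math


def chebyshev_coefficients(f, n):
    """Coefficients c_0..c_{n-1} of the degree-(n-1) Chebyshev interpolant of f on [-1, 1].

    Uses the n Chebyshev nodes x_k = cos(pi * (k + 1/2) / n), so that
    p(x) = c_0 / 2 + sum_{j >= 1} c_j * T_j(x).
    """
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(x) for x in nodes]
    return [
        2.0 / n * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        for j in range(n)
    ]


def clenshaw(coeffs, x):
    """Evaluate c_0 / 2 + sum_{j >= 1} c_j * T_j(x) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        # b_j = c_j + 2 x b_{j+1} - b_{j+2}
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return 0.5 * coeffs[0] + x * b1 - b2


coeffs = chebyshev_coefficients(math.exp, 16)
max_error = max(abs(clenshaw(coeffs, x) - math.exp(x)) for x in [-1.0, -0.5, 0.0, 0.3, 1.0])
```

For an analytic function such as `exp`, the interpolation error falls off very quickly with the number of nodes, which is the scalar counterpart of the rapid convergence the abstract describes. The Clenshaw recurrence only needs multiplications and additions by `x`, which is presumably what makes it attractive when `x` is replaced by a tensor-network object.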
# The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges ( http://arxiv.org/abs/2407.09618v1 ) License: see link	Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka	The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g., heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most recent supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the truly challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.	Translated: 2024-07-16 21:38:05 Published: 2024-07-12
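The entry above refers to homophily metrics without naming one. A widely used example is the edge homophily ratio, the fraction of edges whose endpoints share a label. The sketch below is a minimal illustration under that assumption; the function name and the toy graph are hypothetical, and the abstract does not say which metrics the handbook actually evaluates.

```python
def edge_homophily(edges, labels):
    """Fraction of edges joining two nodes with the same label.

    edges  -- iterable of (u, v) node-index pairs
    labels -- sequence mapping node index to class label
    """
    edges = list(edges)
    if not edges:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle: nodes 0 and 1 are class 0, nodes 2 and 3 are class 1.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
toy_labels = [0, 0, 1, 1]
ratio = edge_homophily(toy_edges, toy_labels)
```

Here two of the four edges connect same-label nodes, so the ratio is 0.5. Values near 1 indicate a homophilic graph and values near 0 a heterophilic one, although a single ratio cannot distinguish the benign, malignant, and ambiguous cases the abstract mentions.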
# Dressing trapped ions with integrated wires ( http://arxiv.org/abs/2407.09623v1 ) License: see link	R. Tyler Sutherland	We discuss dressing trapped ions with the near field of a trap-integrated wire. Ramping a dressing field on/off adiabatically before/after an operation changes its effective Hamiltonian. The amplitude and detuning of the dressing field act as tunable degrees of freedom we can use to 'customize' the properties of any operation. We propose three use cases for this general tool. First, we can generate 'artificial' clock states, where we eliminate the (assumed to be small) linear sensitivity of a qubit. Second, we can break the degeneracies that often complicate shelving at low quantization fields, allowing us to implement operations with linearly polarized microwaves that would otherwise require circular polarization. Finally, we can implement laser-free single qubit gates on a set of 'target' ions using fields that are separated from the rest of the computer in frequency space.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# Noncontextuality inequalities for prepare-transform-measure scenarios ( http://arxiv.org/abs/2407.09624v1 ) License: see link	David Schmid, Roberto D. Baldijão, John H. Selby, Ana Belén Sainz, Robert W. Spekkens	We provide the first systematic technique for deriving witnesses of contextuality in prepare-transform-measure scenarios. More specifically, we show how linear quantifier elimination can be used to compute a polytope of correlations consistent with generalized noncontextuality in such scenarios. This polytope is specified as a set of noncontextuality inequalities that are necessary and sufficient conditions for observed data in the scenario to admit of a classical explanation relative to any linear operational identities, if one ignores some constraints from diagram preservation. While including these latter constraints generally leads to tighter inequalities, it seems that nonlinear quantifier elimination would be required to systematically include them. We also provide a linear program which can certify the nonclassicality of a set of numerical data arising in a prepare-transform-measure experiment. We apply our results to get a robust noncontextuality inequality for transformations that can be violated within the stabilizer subtheory. Finally, we give a simple algorithm for computing all the linear operational identities holding among a given set of states, of transformations, or of measurements.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# Accelerating Electron Dynamics Simulations through Machine Learned Time Propagators ( http://arxiv.org/abs/2407.09628v1 ) License: see link	Karan Shah, Attila Cangi	Time-dependent density functional theory (TDDFT) is a widely used method to investigate electron dynamics under various external perturbations such as laser fields. In this work, we present a novel approach to accelerate real-time TDDFT-based electron dynamics simulations using autoregressive neural operators as time-propagators for the electron density. By leveraging physics-informed constraints and high-resolution training data, our model achieves superior accuracy and computational speed compared to traditional numerical solvers. We demonstrate the effectiveness of our model on a class of one-dimensional diatomic molecules. This method has potential in enabling real-time, on-the-fly modeling of laser-irradiated molecules and materials with varying experimental parameters.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# Granger Causality in Extremes ( http://arxiv.org/abs/2407.09632v1 ) License: see link	Juraj Bodik, Olivier Pasche	We introduce a rigorous mathematical framework for Granger causality in extremes, designed to identify causal links from extreme events in time series. Granger causality plays a pivotal role in uncovering directional relationships among time-varying variables. While this notion gains heightened importance during extreme and highly volatile periods, state-of-the-art methods primarily focus on causality within the body of the distribution, often overlooking causal mechanisms that manifest only during extreme events. Our framework is designed to infer causality mainly from extreme events by leveraging the causal tail coefficient. We establish equivalences between causality in extremes and other causal concepts, including (classical) Granger causality, Sims causality, and structural causality. We prove other key properties of Granger causality in extremes and show that the framework is especially helpful under the presence of hidden confounders. We also propose a novel inference method for detecting the presence of Granger causality in extremes from data. Our method is model-free, can handle non-linear and high-dimensional time series, outperforms current state-of-the-art methods in all considered setups, both in performance and speed, and was found to uncover coherent effects when applied to financial and extreme weather observations.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# Dissipative variational quantum algorithms for Gibbs state preparation ( http://arxiv.org/abs/2407.09635v1 ) License: see link	Yigal Ilin, Itai Arad	In recent years, variational quantum algorithms (VQAs) have gained significant attention due to their adaptability and efficiency on near-term quantum hardware. They have shown potential in a variety of tasks, including linear algebra, search problems, and Gibbs and ground state preparation. Nevertheless, the presence of noise in current-day quantum hardware severely limits their performance. In this work, we introduce dissipative variational quantum algorithms (D-VQAs) by incorporating dissipative operations, such as qubit RESET and stochastic gates, as an intrinsic part of a variational quantum circuit. We argue that such dissipative variational algorithms possess some natural resilience to dissipative noise. We demonstrate how such algorithms can prepare Gibbs states over a wide range of quantum many-body Hamiltonians and temperatures, while significantly reducing errors due to both coherent and non-coherent noise. An additional advantage of our approach is that no ancilla qubits are needed. Our results highlight the potential of D-VQAs to enhance the robustness and accuracy of variational quantum computations on NISQ devices.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# Seq-to-Final: A Benchmark for Tuning from Sequential Distributions to a Final Time Point ( http://arxiv.org/abs/2407.09642v1 ) License: see link	Christina X Ji, Ahmed M Alaa, David Sontag	Distribution shift over time occurs in many settings. Leveraging historical data is necessary to learn a model for the last time point when limited data is available in the final period, yet few methods have been developed specifically for this purpose. In this work, we construct a benchmark with different sequences of synthetic shifts to evaluate the effectiveness of 3 classes of methods that 1) learn from all data without adapting to the final period, 2) learn from historical data with no regard to the sequential nature and then adapt to the final period, and 3) leverage the sequential nature of historical data when tailoring a model to the final period. We call this benchmark Seq-to-Final to highlight the focus on using a sequence of time periods to learn a model for the final time point. Our synthetic benchmark allows users to construct sequences with different types of shift and compare different methods. We focus on image classification tasks using CIFAR-10 and CIFAR-100 as the base images for the synthetic sequences. We also evaluate the same methods on the Portraits dataset to explore the relevance to real-world shifts over time. Finally, we create a visualization to contrast the initializations and updates from different methods at the final time step. Our results suggest that, for the sequences in our benchmark, methods that disregard the sequential structure and adapt to the final time point tend to perform well. The approaches we evaluate that leverage the sequential nature do not offer any improvement. We hope that this benchmark will inspire the development of new algorithms that are better at leveraging sequential historical data or a deeper understanding of why methods that disregard the sequential nature are able to perform well.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# OXN -- Automated Observability Assessments for Cloud-Native Applications ( http://arxiv.org/abs/2407.09644v1 ) License: see link	Maria C. Borges, Joshua Bauer, Sebastian Werner	Observability is important to ensure the reliability of microservice applications. These applications are often prone to failures, since they have many independent services deployed on heterogeneous environments. When employed "correctly", observability can help developers identify and troubleshoot faults quickly. However, instrumenting and configuring the observability of a microservice application is not trivial but tool-dependent and tied to costs. Practitioners need to understand observability-related trade-offs in order to weigh between different observability design alternatives. Still, these architectural design decisions are not supported by systematic methods and typically just rely on "professional intuition". To assess observability design trade-offs with concrete evidence, we advocate for conducting experiments that compare various design alternatives. Achieving a systematic and repeatable experiment process necessitates automation. We present a proof-of-concept implementation of an experiment tool - Observability eXperiment eNgine (OXN). OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration, allowing for the straightforward assessment of design decisions that were previously left unexplored.	Translated: 2024-07-16 21:28:05 Published: 2024-07-12
# 強化学習におけるハミルトン・ヤコビの到達可能性に関する調査 Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey ( http://arxiv.org/abs/2407.09645v1 ) ライセンス: Link先を確認	Milan Ganai, Sicun Gao, Sylvia Herbert,	(参考訳) 近年の文献では、安全保証を維持しつつ、高い性能で制御ポリシーを学習するアプローチが提案されている。ハミルトン・ヤコビ・リーチブル・セット(HJ)の合成は、複雑な高次元システムに対する強化学習に基づく制御ポリシーの訓練の安全性を検証し、監督するための有効なツールとなっている。以前は、HJの到達性は低次元の動的システムの検証に限られていた。これは、それが依存する動的プログラミングアプローチの計算複雑性が、システム状態の数とともに指数関数的に増加するためである。この制限に対処するため、近年では、真の到達可能な集合の信頼性を保ちながら、HJ到達可能性分析をスケールする学習制御ポリシと同時に、到達可能性値関数を計算する方法が提案されている。これらのHJ到達可能性近似は、学習された制御ポリシーの安全性の向上や、報酬のパフォーマンス向上に利用され、動的障害やライダーベースや視覚に基づく観察といった課題を解決することができる。本稿では,高次元システムにおける信頼性のさらなる研究の基盤となる強化学習におけるHJ到達可能性評価の分野における最近の展開を概観する。 Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying low-dimensional dynamical systems -- this is because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, in recent years, there have been methods that compute the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. 
In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
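As a rough illustration of the scaling issue the abstract above mentions (grid-based dynamic programming grows exponentially with the number of state dimensions), the following Python sketch simply counts the cells of a uniform grid. The resolution of 100 points per dimension and the function name `grid_cells` are illustrative assumptions, not figures or code from the paper.

```python
def grid_cells(points_per_dim: int, state_dims: int) -> int:
    """Number of cells in a uniform grid over the state space."""
    return points_per_dim ** state_dims


# Assumed resolution: 100 grid points per state dimension.
for dims in (2, 4, 6):
    print(f"{dims} state dimensions -> {grid_cells(100, dims):,} cells")
```

Under this assumption, going from 2 to 6 state dimensions multiplies the number of cells by 100^4, which is why learned approximations of the reachability value function are attractive for high-dimensional systems.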
# ハンバ:グラフ誘導バイスキャンマンバを用いたシングルビュー3Dハンドコンストラクション Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba ( http://arxiv.org/abs/2407.09646v1 ) ライセンス: Link先を確認	Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando De la Torre,	(参考訳) 1枚のRGB画像からの3Dハンド再構成は、関節運動、自己閉塞、物体との相互作用により困難である。既存のSOTA法では3次元ハンドポーズと形状を学習するためにアテンションベースのトランスフォーマーが用いられているが, 接合空間関係のモデリングが不十分なため, 頑健で正確な性能が得られなかった。この問題に対処するために,グラフ学習と状態空間モデリングを橋渡しするHambaというグラフ誘導型Mambaフレームワークを提案する。私たちの中核となる考え方は、マンバのスキャンをグラフ誘導の双方向走査に再構成し、いくつかの効果的なトークンを使って3D再構成することです。これにより、再構成性能を向上させるために、結合関係と空間配列を学習することができる。具体的には、グラフ構造関係と関節の空間配列を学習し、注意に基づく手法よりも88.5%少ないトークンを使用する新しいグラフ誘導状態空間(GSS)ブロックを設計する。さらに、我々は、フュージョンモジュールを使用して状態空間機能とグローバル機能を統合する。 GSSブロックと融合モジュールを利用することで、Hambaはグラフ誘導状態空間モデリング機能を効果的に活用し、グローバルとローカルの機能を共同で検討してパフォーマンスを向上させる。いくつかのベンチマークや実験において、ハンバは既存のSOTAよりも大幅に優れており、FreiHANDでは5.3mmとF@15mmのPA-MPVPEを達成している。ハンバは現在、3Dハンドリコンストラクションで2つの挑戦的リーダーボードで1位にランクインしている。コードは受理後利用可能になる。 [Website] (https://humansensinglab.github.io/Hamba/) 3D Hand reconstruction from a single RGB image is challenging due to the articulated motion, self-occlusion, and interaction with objects. Existing SOTA methods employ attention-based transformers to learn the 3D hand pose and shape, but they fail to achieve robust and accurate performance due to insufficient modeling of joint spatial relations. To address this problem, we propose a novel graph-guided Mamba framework, named Hamba, which bridges graph learning and state space modeling. Our core idea is to reformulate Mamba's scanning into graph-guided bidirectional scanning for 3D reconstruction using a few effective tokens. This enables us to learn the joint relations and spatial sequences for enhancing the reconstruction performance. Specifically, we design a novel Graph-guided State Space (GSS) block that learns the graph-structured relations and spatial sequences of joints and uses 88.5% fewer tokens than attention-based methods. 
Additionally, we integrate the state space features and the global features using a fusion module. By utilizing the GSS block and the fusion module, Hamba effectively leverages the graph-guided state space modeling features and jointly considers global and local features to improve performance. Extensive experiments on several benchmarks and in-the-wild tests demonstrate that Hamba significantly outperforms existing SOTAs, achieving the PA-MPVPE of 5.3mm and F@15mm of 0.992 on FreiHAND. Hamba is currently Rank 1 in two challenging competition leaderboards on 3D hand reconstruction. The code will be available upon acceptance. [Website](https://humansensinglab.github.io/Hamba/).	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
# 3x2:2次元意味対応による3次元オブジェクト部分分割 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences ( http://arxiv.org/abs/2407.09648v1 ) ライセンス: Link先を確認	Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg,	(参考訳) 3Dオブジェクト部分のセグメンテーションはコンピュータビジョンアプリケーションに不可欠である。 2Dオブジェクト部分のセグメンテーションではかなりの進歩があったが、3Dにおける対応タスクは、収集に費用がかかる注釈付き3Dデータセットの不足もあって、あまり注目されていない。本研究では,少数の注釈付き3次元形状やリッチな注釈付き2次元データセットを活用して3次元オブジェクト部分分割を実現することを提案する。我々は,様々な粒度レベルのベンチマークでSOTA性能を実現する3-By-2という新しい手法を提案する。事前訓練された基礎モデルの特徴を利用し,意味的および幾何学的対応を活用することで,限られた3次元アノテーションの課題を克服することができる。提案手法は利用可能な2次元ラベルを活用し,効果的な3次元オブジェクト部分分割を実現する。提案手法3-By-2は様々な部品分類体系と粒度に対応可能であり,異なる対象カテゴリにまたがる興味深い部分ラベル転送能力を示す。プロジェクトウェブサイト: https://ngailapdi.github.io/projects/3by2/ 3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation. We present our novel approach, termed 3-By-2 that achieves SOTA performance on different benchmarks with various granularity levels. By using features from pretrained foundation models and exploiting semantic and geometric correspondences, we are able to overcome the challenges of limited 3D annotations. Our approach leverages available 2D labels, enabling effective 3D object part segmentation. Our method 3-By-2 can accommodate various part taxonomies and granularities, demonstrating interesting part label transfer ability across different object categories. Project website: https://ngailapdi.github.io/projects/3by2/	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
# 中国語モデルと中国語モデル : 中国のLLMにおける言語政策の欠如 How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs ( http://arxiv.org/abs/2407.09652v1 ) ライセンス: Link先を確認	Andrea W Wen-Yi, Unso Eun Seo Jo, Lu Jia Lin, David Mimno,	(参考訳) 現代言語モデルは多言語化が進んでいるが、中国のLLM開発者は言語多様性に関する複雑な政治的・ビジネス的な考察を行わなければならない。中国における言語政策は、公衆の言論に影響を及ぼし、多民族社会を統治することを目的としており、1949年以降、多民族主義からより同化主義的なアプローチへと徐々に移行してきた。これらの影響が現在の言語技術に与える影響について検討する。我々は、中国企業によって18言語で事前訓練された6つのオープンソース多言語LPMを評価し、中国、アジア、アングロ・ヨーロッパ諸言語にまたがる。実験の結果,中国における多言語でのLLMのパフォーマンスは国際LLMと区別できないことがわかった。同様に、これらのモデルの技術的報告は、英語とマンダリン中国語を除いて、データ言語を事前訓練するための考慮の欠如も示している。中国のAI政策、モデル実験、技術報告を見れば、中国のLLM開発における言語多様性のいずれに対しても、一貫性のある政策の兆候は見つからない。これは、中国が人々が毎日使っている言語と言語モデルの開発の両方を規制しているが、言語モデルにおける言語に関するポリシーを持っていない、という厄介な事実を残している。 Contemporary language models are increasingly multilingual, but Chinese LLM developers must navigate complex political and business considerations of language diversity. Language policy in China aims at influencing the public discourse and governing a multi-ethnic society, and has gradually transitioned from a pluralist to a more assimilationist approach since 1949. We explore the impact of these influences on current language technology. We evaluate six open-source multilingual LLMs pre-trained by Chinese companies on 18 languages, spanning a wide range of Chinese, Asian, and Anglo-European languages. Our experiments show Chinese LLMs performance on diverse languages is indistinguishable from international LLMs. Similarly, the models' technical reports also show lack of consideration for pretraining data language coverage except for English and Mandarin Chinese. Examining Chinese AI policy, model experiments, and technical reports, we find no sign of any consistent policy, either for or against, language diversity in China's LLM development. 
This leaves a puzzling fact that while China regulates both the languages people use daily as well as language model development, they do not seem to have any policy on the languages in language models.	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
# 情報検索と製品検索のギャップを埋める:eコマースへのQ&A勧告 Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce ( http://arxiv.org/abs/2407.09653v1 ) ライセンス: Link先を確認	Saar Kuzi, Shervin Malmasi,	(参考訳) ショッピングミッションの消費者は、商品の理解を深め、購入決定に達するための反復的なプロセスにおいて、Web検索エンジンや質問回答(QA)システムのような製品検索と情報検索システムの両方を利用することが多い。商品検索は、購入者が自分の要求を満たす実際の商品をカタログで見つけるのに有用であるが、情報検索システムは、それらの要求を洗練させるために必要なあらゆる質問に答えるために利用することができる。最近、LLM(Large Language Models)の成功により、顧客が目標を迅速に効果的に達成するための2つのタスク間のギャップを埋める機会が開かれた。本稿では,ユーザに対して,製品検索に関連する質問応答(Q&A)ペアを推薦し,購入決定を支援することを提案する。本稿では、Q&Aペアの要件と特性、その生成、Q&Aレコメンデーションタスクの最適化など、問題のさまざまな側面について論じる。我々は、この新興分野における今後の研究を促進するための課題、オープンな課題、そして解決策を提案する。 Consumers on a shopping mission often leverage both product search and information seeking systems, such as web search engines and Question Answering (QA) systems, in an iterative process to improve their understanding of available products and reach a purchase decision. While product search is useful for shoppers to find the actual products meeting their requirements in the catalog, information seeking systems can be utilized to answer any questions they may have to refine those requirements. The recent success of Large Language Models (LLMs) has opened up an opportunity to bridge the gap between the two tasks to help customers achieve their goals quickly and effectively by integrating conversational QA within product search. In this paper, we propose to recommend users Question-Answer (Q&A) pairs that are relevant to their product search and can help them make a purchase decision. We discuss the different aspects of the problem including the requirements and characteristics of the Q&A pairs, their generation, and the optimization of the Q&A recommendation task. We highlight the challenges, open problems, and suggested solutions to encourage future research in this emerging area.	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
# 量子クエリローバウンドに対する置換重ね合わせオラクル Permutation Superposition Oracles for Quantum Query Lower Bounds ( http://arxiv.org/abs/2407.09655v1 ) ライセンス: Link先を確認	Christian Majenz, Giulio Malavolta, Michael Walter,	(参考訳) 本稿では,Zhandryの圧縮オラクル法をランダムな置換に一般化する手法を提案する。そこで本稿では,ランダムな置換への一般化に抵抗したZhandryの手法の重要な特徴である,入力-出力対の述語に対するアルゴリズムの成功確率の有界化に,結果として生じるオラクルシミュレーションを利用する方法を示す。重要な技術的要素の1つは、オラクルのデータベースの置換を表すために厳密に単調な分解を使用することである。本フレームワークの適用例として, 1ラウンドスポンジ構成は, ランダムな置換モデルに対して無条件プレモージュ耐性を有することを示す。これはウンルーの予想を証明している。 We propose a generalization of Zhandry's compressed oracle method to random permutations, where an algorithm can query both the permutation and its inverse. We show how to use the resulting oracle simulation to bound the success probability of an algorithm for any predicate on input-output pairs, a key feature of Zhandry's technique that had hitherto resisted attempts at generalization to random permutations. One key technical ingredient is to use strictly monotone factorizations to represent the permutation in the oracle's database. As an application of our framework, we show that the one-round sponge construction is unconditionally preimage resistant in the random permutation model. This proves a conjecture by Unruh.	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
# BoBa:フェデレーテッドラーニングにおけるデータ分布推定によるバックドア検出の強化 BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning ( http://arxiv.org/abs/2407.09658v1 ) ライセンス: Link先を確認	Ning Wang, Shanghao Shi, Yang Xiao, Yimin Chen, Y. Thomas Hou, Wenjing Lou,	(参考訳) フェデレーテッドラーニングは、協調モデルトレーニングの有望なアプローチであるが、その分散した性質のため、ポイズニング攻撃の影響を受けやすい。特にバックドア攻撃は、トリガーを含む入力の予測のみを選択的に損なうため、驚くべきステルス性を示している。このような攻撃を検知し軽減するためのこれまでの取り組みは、独立同分布(IID: Independent and Identically Distributed)データの仮定に基づいており、そこでは、良性のモデル更新はIIDデータのため複数の特徴空間において高いレベルの類似性を示す。これにより、外れ値はバックドア攻撃として検出される。それにもかかわらず、非IIDデータは、データの多様性が良性モデル間のばらつきを導入し、外れ値検出に基づくメカニズムの効果を低下させるため、バックドア攻撃検出において重大な課題を呈している。本稿では,この問題を解決するために,分布認識型異常検出機構であるBoBaを提案する。データの多様性から生じる外れ値とバックドア攻撃から生じる外れ値を区別するために,問題を,データ分布を利用したクライアントのクラスタリングと,それに続く投票ベースの検出という2つのステップに分割することを提案する。クラスタリングと後続のバックドア検出は,クライアントデータ分布を知ることで大きな恩恵を受けることができるという直感に基づいて,新しいデータ分布推定機構を提案する。検出の堅牢性を改善するために,各クライアントが複数のクラスタに関連付けられる重なり合うクラスタリング手法を導入し,モデル更新の信頼性を単一のクラスタではなく複数のクラスタで集合的に評価する。広範囲な評価により,BoBa は各種攻撃戦略や実験環境において高いメインタスク精度を維持しながら,攻撃成功率を 0.001 未満に抑えることができることを示した。 Federated learning, while being a promising approach for collaborative model training, is susceptible to poisoning attacks due to its decentralized nature. Backdoor attacks, in particular, have shown remarkable stealthiness, as they selectively compromise predictions for inputs containing triggers. Previous endeavors to detect and mitigate such attacks are based on the Independent and Identically Distributed (IID) data assumption where benign model updates exhibit high-level similarity in multiple feature spaces due to IID data. Thus, outliers are detected as backdoor attacks. Nevertheless, non-IID data presents substantial challenges in backdoor attack detection, as the data variety introduces variance among benign models, making outlier detection-based mechanisms less effective. We propose a novel distribution-aware anomaly detection mechanism, BoBa, to address this problem. 
In order to differentiate outliers arising from data variety versus backdoor attack, we propose to break down the problem into two steps: clustering clients utilizing their data distribution followed by a voting-based detection. Based on the intuition that clustering and subsequent backdoor detection can drastically benefit from knowing client data distributions, we propose a novel data distribution inference mechanism. To improve detection robustness, we introduce an overlapping clustering method, where each client is associated with multiple clusters, ensuring that the trustworthiness of a model update is assessed collectively by multiple clusters rather than a single cluster. Through extensive evaluations, we demonstrate that BoBa can reduce the attack success rate to lower than 0.001 while maintaining high main task accuracy across various attack strategies and experimental settings.	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
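The two-step idea described in the abstract above (overlapping clusters, then voting-based detection) can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions: each model update is reduced to a single scalar score, cluster membership is given rather than inferred from data distributions, and outliers inside a cluster are flagged with a median-absolute-deviation rule. The names `flag_outliers` and `vote`, the threshold, and the majority rule are hypothetical and are not taken from the BoBa paper.

```python
from statistics import median
from typing import Dict, List, Set


def flag_outliers(scores: Dict[str, float], threshold: float = 3.0) -> Set[str]:
    """Flag members whose score deviates strongly from the cluster median (MAD rule)."""
    values = list(scores.values())
    center = median(values)
    # Median absolute deviation; the fallback avoids division by zero.
    mad = median(abs(v - center) for v in values) or 1e-12
    return {cid for cid, v in scores.items() if abs(v - center) / mad > threshold}


def vote(clusters: List[Set[str]], scores: Dict[str, float],
         threshold: float = 3.0) -> Set[str]:
    """Mark a client as suspicious if a majority of its clusters flag it."""
    flagged_votes = {cid: 0 for cid in scores}
    total_votes = {cid: 0 for cid in scores}
    for members in clusters:
        flagged = flag_outliers({cid: scores[cid] for cid in members}, threshold)
        for cid in members:
            total_votes[cid] += 1
            flagged_votes[cid] += int(cid in flagged)
    return {cid for cid in scores
            if total_votes[cid] and 2 * flagged_votes[cid] > total_votes[cid]}


# Toy data: client "x" submits an update far from every cluster it belongs to.
scores = {"a": 1.0, "b": 1.1, "c": 0.9, "d": 1.0, "e": 1.05, "f": 0.95, "x": 10.0}
clusters = [{"a", "b", "c", "x"}, {"b", "c", "d", "x"}, {"d", "e", "f"}]
print(sorted(vote(clusters, scores)))
```

Because every client is judged by all clusters it belongs to, a single unusual cluster cannot decide the outcome on its own, which is the intuition behind assessing trustworthiness collectively.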
# Bridging Dictionary: パルチザン語使用のAI生成辞書 Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use ( http://arxiv.org/abs/2407.09661v1 ) ライセンス: Link先を確認	Hang Jiang, Doug Beeferman, William Brannon, Andrew Heyward, Deb Roy,	(参考訳) 言葉は様々な背景を持つ人々にとって異なる意味を持つことが多い。今日の社会的分極の時代は、特に政治的コミュニケーションやジャーナリズムにおいて、コミュニケーションの誤りを防ぐために、言葉を慎重に選ぶことを要求する。この問題に対処するために、異なる政治的見解を持つ人々によって、言葉がどのように認識されているかを示すインタラクティブなツールであるBridging Dictionaryを紹介した。 Bridging Dictionaryには、静的で印刷可能なドキュメントが含まれており、大きな言語モデルによって生成された要約を含む796の用語がある。これらの要約は、この用語が共和党員や民主党員によってどのように使われているかを強調している。さらにブリジング辞典は、ユーザーが選択した単語を探索し、その頻度、感情、要約、そして政治的分裂の例を視覚化するインタラクティブなインターフェイスを提供する。本稿では,ジャーナリストを事例として,人事機関の重要性と,このツールのさらなる強化への信頼を強調する。 Bridging Dictionaryのデプロイ版はhttps://dictionary.ccc-mit.org/で公開されている。 Words often carry different meanings for people from diverse backgrounds. Today's era of social polarization demands that we choose words carefully to prevent miscommunication, especially in political communication and journalism. To address this issue, we introduce the Bridging Dictionary, an interactive tool designed to illuminate how words are perceived by people with different political views. The Bridging Dictionary includes a static, printable document featuring 796 terms with summaries generated by a large language model. These summaries highlight how the terms are used distinctively by Republicans and Democrats. Additionally, the Bridging Dictionary offers an interactive interface that lets users explore selected words, visualizing their frequency, sentiment, summaries, and examples across political divides. We present a use case for journalists and emphasize the importance of human agency and trust in further enhancing this tool. The deployed version of Bridging Dictionary is available at https://dictionary.ccc-mit.org/.	翻訳日:2024-07-16 21:28:05 公開日:2024-07-12
# 地理空間誘導拡散を用いた混合ビューパノラマ合成 Mixed-View Panorama Synthesis using Geospatially Guided Diffusion ( http://arxiv.org/abs/2407.09672v1 ) ライセンス: Link先を確認	Zhexiao Xiong, Xin Xing, Scott Workman, Subash Khanal, Nathan Jacobs,	(参考訳) 本稿では,少数の入力パノラマとその領域の衛星画像から新しいパノラマを合成することを目的とする,混合ビューパノラマ合成というタスクを導入する。これは、入力パノラマのみ(同一ビュー合成)や入力衛星画像のみ(クロスビュー合成)を使用する以前の研究とは対照的である。混合ビュー設定は、世界中の任意の場所でパノラマ合成をサポートするのに最も自然であると主張する。重要な課題は、パノラマの空間的カバレッジが不均一であり、世界中の多くの地域ではほとんどパノラマが利用できないことである。本稿では,拡散モデルと注意に基づくアーキテクチャを用いて,利用可能なすべての入力画像から情報を抽出する手法を提案する。実験の結果,提案手法の有効性が示された。特に、利用可能なパノラマが疎である場合や、合成しようとしているパノラマの位置から遠く離れている場合のシナリオを扱うことができる。 We introduce the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area. This contrasts with previous work which only uses input panoramas (same-view synthesis), or an input satellite image (cross-view synthesis). We argue that the mixed-view setting is the most natural to support panorama synthesis for arbitrary locations worldwide. A critical challenge is that the spatial coverage of panoramas is uneven, with few panoramas available in many regions of the world. We introduce an approach that utilizes diffusion-based modeling and an attention-based architecture for extracting information from all available input imagery. Experimental results demonstrate the effectiveness of our proposed method. In particular, our model can handle scenarios when the available panoramas are sparse or far from the location of the panorama we are attempting to synthesize.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 煙を再現する特性軌道の物理インフォームド学習 Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction ( http://arxiv.org/abs/2407.09679v1 ) ライセンス: Link先を確認	Yiming Wang, Siyu Tang, Mengyu Chu,	(参考訳) 複雑な力学の限られた観察から生じる課題に対処するため、スパースビューのRGBビデオを通して、物理でインフォームドされた煙と障害物のニューラルな再構成を探索した。既存の物理インフォームドニューラルネットワークは、しばしば短期的な物理の制約を強調し、長期保存の適切な保存をあまり探さないままにしている。ラグランジアン流体軌道を暗黙的にモデル化するためにユーレリアニューラル場を利用した新しい表現であるニューラル特性軌道場を導入する。このトポロジフリーで自己微分可能な表現は、任意のフレーム間の効率的なフローマップ計算や、自動微分による効率的な速度抽出を容易にする。これにより、長期保存と短期物理学の先例をカバーするエンド・ツー・エンドの監視が可能となる。この表現に基づいて物理インフォームド・トラジェクトリ・ラーニングとNeRFに基づくシーン再構成への統合を提案する。我々は、自己教師付きシーン分解とシームレスな統合境界制約による高度な障害物処理を可能にする。以上の結果から,オクルージョンの不確実性,密度-色あいまいさ,静的-動的絡み合いといった課題を克服する能力を示した。コードとサンプルテストは \url{https://github.com/19reborn/PICT_smoke} にある。 We delve into the physics-informed neural reconstruction of smoke and obstacles through sparse-view RGB videos, tackling challenges arising from limited observation of complex dynamics. Existing physics-informed neural networks often emphasize short-term physics constraints, leaving the proper preservation of long-term conservation less explored. We introduce Neural Characteristic Trajectory Fields, a novel representation utilizing Eulerian neural fields to implicitly model Lagrangian fluid trajectories. This topology-free, auto-differentiable representation facilitates efficient flow map calculations between arbitrary frames as well as efficient velocity extraction via auto-differentiation. Consequently, it enables end-to-end supervision covering long-term conservation and short-term physics priors. Building on the representation, we propose physics-informed trajectory learning and integration into NeRF-based scene reconstruction. We enable advanced obstacle handling through self-supervised scene decomposition and seamless integrated boundary constraints. Our results showcase the ability to overcome challenges like occlusion uncertainty, density-color ambiguity, and static-dynamic entanglements. 
Code and sample tests are at https://github.com/19reborn/PICT_smoke	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 産業応用のための文字列生成に基づく化学反応モデルの推算の高速化 Accelerating the inference of string generation-based chemical reaction models for industrial applications ( http://arxiv.org/abs/2407.09685v1 ) ライセンス: Link先を確認	Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arnè Clevert,	(参考訳) テンプレートのないSMILES-to-SMILES変換モデルによる反応予測と1段階の逆合成は、コンピュータ支援合成計画システムにおける産業的応用において、最先端の精度のために重要である。しかし、推論速度が遅い。本稿では,クエリ文字列列を適切な場所でターゲット文字列にコピーすることで,投機的復号化による自動回帰SMILESジェネレータの推論を高速化する手法を提案する。そこで,本手法をPytorch Lightningで実装した分子トランスに応用し,反応予測と1段階の逆合成において3倍以上の高速化を実現し,精度を損なうことなく実現した。 Template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest for industrial applications in computer-aided synthesis planning systems due to their state-of-the-art accuracy. However, they suffer from slow inference speed. We present a method to accelerate inference in autoregressive SMILES generators through speculative decoding by copying query string subsequences into target strings in the right places. We apply our method to the molecular transformer implemented in Pytorch Lightning and achieve over 3X faster inference in reaction prediction and single-step retrosynthesis, with no loss in accuracy.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
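The drafting idea described in the abstract above (reuse subsequences of the query string as speculative drafts and keep only what the model confirms) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: tokens are single characters, the "model" is a hypothetical deterministic stand-in built by `make_toy_model`, and verification is done with sequential calls where a real transformer would score all draft positions in one parallel forward pass. The query is assumed not to contain the end-of-sequence token.

```python
from typing import Callable, List, Sequence, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence token


def draft_windows(query: Sequence[str], length: int) -> List[List[str]]:
    """Every contiguous window of the query serves as a candidate draft."""
    if len(query) <= length:
        return [list(query)]
    return [list(query[i:i + length]) for i in range(len(query) - length + 1)]


def greedy_decode(next_token: Callable[[List[str]], str],
                  max_len: int = 64) -> List[str]:
    """Plain greedy decoding: one decoding step per generated token."""
    out: List[str] = []
    while len(out) < max_len:
        out.append(next_token(out))
        if out[-1] == EOS:
            break
    return out


def speculative_decode(next_token: Callable[[List[str]], str],
                       query: Sequence[str], draft_len: int = 4,
                       max_len: int = 64) -> Tuple[List[str], int]:
    """Greedy decoding that accepts the longest query window the model agrees with."""
    out: List[str] = []
    rounds = 0  # number of verification rounds (stand-in for parallel forward passes)
    drafts = draft_windows(query, draft_len)
    while len(out) < max_len:
        rounds += 1
        best: List[str] = []
        for draft in drafts:
            accepted: List[str] = []
            for tok in draft:
                # Keep a draft token only if it equals the model's own greedy choice.
                if next_token(out + accepted) != tok:
                    break
                accepted.append(tok)
            if len(accepted) > len(best):
                best = accepted
        out.extend(best)
        out.append(next_token(out))  # the model always contributes one token itself
        if out[-1] == EOS:
            break
    return out[:max_len], rounds


def make_toy_model(target: str) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for a trained model: always continues a fixed target."""
    tokens = list(target) + [EOS]
    return lambda prefix: tokens[len(prefix)] if len(prefix) < len(tokens) else EOS


query = list("CC(=O)O.OCC")          # toy reactants as character tokens
model = make_toy_model("CC(=O)OCC")  # toy product the stand-in model "predicts"
reference = greedy_decode(model)
fast, rounds = speculative_decode(model, query)
print("".join(fast[:-1]), "| rounds:", rounds, "| greedy steps:", len(reference))
```

Since only tokens that match the model's own greedy choice are accepted, the output is identical to plain greedy decoding; the saving comes from needing fewer rounds when large parts of the product string are copied from the reactants.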
# SPIN:自然画像における部分粒度の階層的セグメンテーション SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images ( http://arxiv.org/abs/2407.09686v1 ) ライセンス: Link先を確認	Josh Myers-Dean, Jarek Reynolds, Brian Price, Yifei Fan, Danna Gurari,	(参考訳) 階層的セグメンテーションは、様々なレベルの粒度のセグメンテーションを作成する。本稿では,SPIN(SubPartImageNet)と呼ばれる自然画像のサブパートアノテーションを用いた,最初の階層的セマンティックセマンティックセマンティックセマンティクスデータセットを紹介する。また,アルゴリズムが階層レベルの空間的関係と意味的関係をいかにうまく捉えるかを評価するために,新しい評価指標を2つ導入した。 3つの異なるタスクにまたがる最新のモデルをベンチマークし、オブジェクト、部品、サブパート間の長所と短所を分析します。コミュニティ全体の進展を促進するため、データセットをhttps://joshmyersdean.github.io/spin/index.htmlで公開しています。 Hierarchical segmentation entails creating segmentations at varying levels of granularity. We introduce the first hierarchical semantic segmentation dataset with subpart annotations for natural images, which we call SPIN (SubPartImageNet). We also introduce two novel evaluation metrics to evaluate how well algorithms capture spatial and semantic relationships across hierarchical levels. We benchmark modern models across three different tasks and analyze their strengths and weaknesses across objects, parts, and subparts. To facilitate community-wide progress, we publicly release our dataset at https://joshmyersdean.github.io/spin/index.html.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 深層期待値整合近似による高速かつロバストな位相回復 Fast and Robust Phase Retrieval via Deep Expectation-Consistent Approximation ( http://arxiv.org/abs/2407.09687v1 ) ライセンス: Link先を確認	Saurav K. Shastri, Philip Schniter,	(参考訳) 位相情報のない測定から画像を正確に復元することは、困難で長年にわたる問題である。本研究では,期待値整合(EC)近似と深層デノイジングネットワークを組み合わせ,速度と精度の両面で最先端の位相回復法を上回るdeepECprを提案する。非伝統的な方法でECを適用することに加えて、deepECprは最近の拡散法にインスパイアされた新しい確率的減衰スキームを含んでいる。プラグ・アンド・プレイ事前分布、デノイジングによる正則化、または拡散に基づく既存の位相回復法と同様に、deepECprはデノイジング段階と測定活用段階を反復する。しかし、既存の手法とは異なり、deepECprが必要とするデノイザ呼び出しははるかに少ない。 deepECprを最先端のprDeep (Metzler et al., 2018)、Deep-ITA (Wang et al., 2020)、Diffusion Posterior Sampling (Chung et al., 2023) と比較した。オーバーサンプリングしたフーリエ測定および符号化回折パターン測定において、カラー画像、自然グレースケール画像、非自然グレースケール画像のノイズを含む位相回復を行った結果、デノイザ呼び出し回数を5分の1に抑えつつ、PSNRとSSIMの両方で改善が得られた。 Accurately recovering images from phaseless measurements is a challenging and long-standing problem. In this work, we present "deepECpr," which combines expectation-consistent (EC) approximation with deep denoising networks to surpass state-of-the-art phase-retrieval methods in both speed and accuracy. In addition to applying EC in a non-traditional manner, deepECpr includes a novel stochastic damping scheme that is inspired by recent diffusion methods. Like existing phase-retrieval methods based on plug-and-play priors, regularization by denoising, or diffusion, deepECpr iterates a denoising stage with a measurement-exploitation stage. But unlike existing methods, deepECpr requires far fewer denoiser calls. We compare deepECpr to the state-of-the-art prDeep (Metzler et al., 2018), Deep-ITA (Wang et al., 2020), and Diffusion Posterior Sampling (Chung et al., 2023) methods for noisy phase-retrieval of color, natural, and unnatural grayscale images on oversampled-Fourier and coded-diffraction-pattern measurements and find improvements in both PSNR and SSIM with 5x fewer denoiser calls.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 健康の社会的決定要因データ統合のための大規模言語モデル:心不全の30日以内再入院予測を事例として Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction ( http://arxiv.org/abs/2407.09688v1 ) ライセンス: Link先を確認	Chase Fensore, Rodrigo M. Carrillo-Larco, Shivani A. Patel, Alanna A. Morris, Joyce C. Ho,	(参考訳) 健康の社会的決定要因(SDOH)、すなわち人々が生活し、成長し、年齢を重ねる無数の状況は、健康上の結果に重要な役割を果たす。しかし、既存の結果予測モデルは、しばしばSDOHのプロキシのみを特徴として用いている。最近のオープンデータイニシアチブは、より包括的なSDOHのビューを構築する機会を提供するが、公的なSDOHデータの量と多様性が増大するにつれて、個々の患者にとって最も関連性の高いデータを手作業で統合することはますます困難になっている。大規模言語モデル(LLM)は、構造化されたデータの自動アノテーションにおいて有望であることが示されている。本稿では,LLMを用いたSDOHデータ統合の実現可能性と臨床予測におけるこれらのSDOH特徴の有用性について,エンド・ツー・エンドのケーススタディを行う。まず、2つの公開アクセス可能なSDOHデータソースの700以上の変数を、5つのセマンティックSDOHカテゴリのいずれかに手動でラベル付けする。そして,この分類課題において,9つのオープンソースLLMの性能をベンチマークする。最後に,3万9千人の心不全(HF)患者の30日以内の再入院を予測するためのMLモデルを訓練し,分類したSDOH変数の予測性能を標準臨床変数と比較する。さらに,LLMのアノテーション性能に対するfew-shotプロンプトの影響について検討し,それらの変数を正確にアノテートする上でどの情報が役立つかを評価するために,プロンプトに関するメタデータのアブレーション研究を行う。我々は,微調整を必要とせず,ゼロショットプロンプトでSDOH変数を効果的かつ正確にアノテートできるオープンソースのLLMが存在することを発見した。重要なことに、標準臨床特徴と組み合わせた場合、LLMでアノテートしたSDOH変数のうち「近隣および建造環境」サブセットが、HF患者の30日以内の再入院予測において最高の性能を示す。 Social determinants of health (SDOH), the myriad of circumstances in which people live, grow, and age, play an important role in health outcomes. However, existing outcome prediction models often only use proxies of SDOH as features. Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH, but manually integrating the most relevant data for individual patients becomes increasingly challenging as the volume and diversity of public SDOH data grows. Large language models (LLMs) have shown promise at automatically annotating structured data. Here, we conduct an end-to-end case study evaluating the feasibility of using LLMs to integrate SDOH data, and the utility of these SDOH features for clinical prediction. 
We first manually label 700+ variables from two publicly-accessible SDOH data sources to one of five semantic SDOH categories. Then, we benchmark performance of 9 open-source LLMs on this classification task. Finally, we train ML models to predict 30-day hospital readmission among 39k heart failure (HF) patients, and we compare the prediction performance of the categorized SDOH variables with standard clinical variables. Additionally, we investigate the impact of few-shot LLM prompting on LLM annotation performance, and perform a metadata ablation study on prompts to evaluate which information helps LLMs accurately annotate these variables. We find that some open-source LLMs can effectively, accurately annotate SDOH variables with zero-shot prompting without the need for fine-tuning. Crucially, when combined with standard clinical features, the LLM-annotated Neighborhood and Built Environment subset of the SDOH variables shows the best performance predicting 30-day readmission of HF patients.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 信頼されたサーバのないプライベートな不均一連合学習の再考:凸損失に対する誤差最適かつ通信効率の高いアルゴリズム Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses ( http://arxiv.org/abs/2407.09690v1 ) ライセンス: Link先を確認	Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright,	(参考訳) 我々は,サーバや他のサイロ/クライアントを信頼していない人々の個人データを用いた連合学習(FL)の問題を再考する。この文脈では、すべてのサイロ(例えば病院)は、複数の人(例えば患者)からのデータを持ち、サーバーや他のサイロがデータを暴こうとする場合でも、各人のデータ(例えば健康記録)のプライバシーを保護する必要がある。 Inter-Silo Record-Level Differential Privacy (ISRL-DP) は、サイロ i の通信がアイテムレベルの差分プライバシーを満たすように要求することで、各サイロのデータ漏洩を防止する。先行研究 arXiv:2203.06735 は、同種(i.i.d.)のサイロデータと凸損失関数を持つISRL-DPアルゴリズムの最適過剰リスク境界を特徴付けた。しかし、2つの重要な問題が未解決のまま残されていた。(1)同じ過剰リスク境界を不均一な(非i.i.d.)サイロデータで達成できるのか? (2)より少ない通信ラウンドで最適なリスク境界を達成できるのか? 本稿では,両質問に対して肯定的な回答を与える。異種サイロデータの存在下で最適な過剰リスク境界を実現する新しいISRL-DP FLアルゴリズムを提案する。さらに、我々のアルゴリズムは従来の最先端技術よりも通信効率が高い。滑らかな損失関数に対して、我々のアルゴリズムは最適過剰リスク境界を達成し、非プライベートな下界と一致する通信複雑性を持つ。さらに、我々のアルゴリズムは以前の最先端技術よりも計算効率が良い。 We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential Privacy (ISRL-DP) prevents each silo's data from being leaked, by requiring that silo i's communications satisfy item-level differential privacy. Prior work arXiv:2203.06735 characterized the optimal excess risk bounds for ISRL-DP algorithms with homogeneous (i.i.d.) silo data and convex loss functions. However, two important questions were left open: (1) Can the same excess risk bounds be achieved with heterogeneous (non-i.i.d.) silo data? (2) Can the optimal risk bounds be achieved with fewer communication rounds? In this paper, we give positive answers to both questions. 
We provide novel ISRL-DP FL algorithms that achieve the optimal excess risk bounds in the presence of heterogeneous silo data. Moreover, our algorithms are more communication-efficient than the prior state-of-the-art. For smooth loss functions, our algorithm achieves the optimal excess risk bound and has communication complexity that matches the non-private lower bound. Additionally, our algorithms are more computationally efficient than the previous state-of-the-art.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# EVOLVE: 微調整GPT様モデルを用いたソーシャルメディアにおけるユーザ進化とネットワークダイナミクスの予測 EVOLVE: Predicting User Evolution and Network Dynamics in Social Media Using Fine-Tuned GPT-like Model ( http://arxiv.org/abs/2407.09691v1 ) ライセンス: Link先を確認	Ismail Hossain, Md Jahangir Alam, Sai Puppala, Sajedul Talukder,	(参考訳) ソーシャルメディアプラットフォームは、個人の感情、日々の活動、さまざまなライフイベントの共有に広く使われており、最新の出来事を人々に知らせている。ユーザーがアカウントを作成する瞬間から、彼らは友達やフォロワーのネットワークを継続的に拡張し、投稿、コメント、共有によって他人と自由にやりとりする。時間の経過とともに、ユーザー行動は人口統計特性と彼らが確立したネットワークに基づいて進化する。本研究では,ユーザが生涯にわたってソーシャルメディア上でどのように進化していくかを理解するための予測手法を提案し,その進化の次の段階を予測する。我々はGPTのようなデコーダのみのモデル(E-GPT: Evolution-GPT)を微調整し、オンラインソーシャルメディアにおけるユーザの進化の将来のステージを予測する。我々は,これらのモデルの性能を評価し,ユーザの属性がネットワーク内の変化にどのように影響するかを,ソーシャルメディア上での今後のつながりやユーザ活動の変化を予測し,またリコメンデーションシステムなどの他のソーシャルメディアの課題にも対処する。 Social media platforms are extensively used for sharing personal emotions, daily activities, and various life events, keeping people updated with the latest happenings. From the moment a user creates an account, they continually expand their network of friends or followers, freely interacting with others by posting, commenting, and sharing content. Over time, user behavior evolves based on demographic attributes and the networks they establish. In this research, we propose a predictive method to understand how a user evolves on social media throughout their life and to forecast the next stage of their evolution. We fine-tune a GPT-like decoder-only model (we named it E-GPT: Evolution-GPT) to predict the future stages of a user's evolution in online social media. We evaluate the performance of these models and demonstrate how user attributes influence changes within their network by predicting future connections and shifts in user activities on social media, which also addresses other social media challenges such as recommendation systems.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 数理的枠組み,モデリングパラダイムの分類,およびニューラルシンボリックシステムのための学習技術スイート A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems ( http://arxiv.org/abs/2407.09693v1 ) ライセンス: Link先を確認	Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor,	(参考訳) ニューラル・シンボリック・システム(NeSy)の分野は急速に成長している。提案されたアプローチは、ニューラルおよびシンボリックメソッドの共生結合を達成する上で大きな可能性を示している。しかし、それぞれのNeSy系は基本的な方法で異なる。共通点とアプローチの違いを照らし、さらなる進歩を可能にする統一理論が必要である。本稿では,ニューラル・シンボリックエネルギーベースモデル(NeSy-EBMs)を紹介する。我々はNeSy-EBMを用いて,システムのニューラルシンボリックインタフェースと推論機能に着目したモデリングパラダイムの分類法を開発した。さらに,NeSy-EBMの学習手法についても紹介する。重要なことは、NeSy-EBMは、顕著な学習損失の勾配に対する一般的な表現の導出を可能にし、両レベルおよび確率的ポリシー最適化を含む複数の領域からの手法を活用する4つの学習アプローチを提供する。最後に、NeSyシステムの現実的な応用を容易にするため、スケーラビリティと表現性のために設計されたオープンソースのNeSy-EBMライブラリであるNeuPSLを提案する。画像分類,グラフノードラベリング,自動運転車の状況認識,質問応答など,さまざまなタスクにおいて,NeSy-EBMの実用的メリットを示す。 The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative modeling with probabilistic and non-probabilistic NeSy approaches. We utilize NeSy-EBMs to develop a taxonomy of modeling paradigms focusing on a system's neural-symbolic interface and reasoning capabilities. Additionally, we introduce a suite of learning techniques for NeSy-EBMs. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we provide four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. 
Finally, we present Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# ディバイドとフューズ:部分可視画像から身体部分メッシュを復元する Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images ( http://arxiv.org/abs/2407.09694v1 ) ライセンス: Link先を確認	Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu,	(参考訳) 本稿では,人体メッシュ再構築のための新しいボトムアップ手法を提案する。 SMPLのような全身のパラメトリックモデルに依存した従来のトップダウン手法は、人間の小さな部分しか見えず、正確なメッシュ再構築のためにほとんどの人の体を視認する必要がある。この制限を克服するため,本手法では「D&F(Divide and Fuse)」戦略を採用し,人体部分を融合する前に個別に再構築し,閉塞に対する堅牢性を確保する。我々は,いくつかの形状と大域的位置パラメータから独立にメッシュを再構成するHuman Part Parametric Models (HPPM) を設計する。特別に設計された融合モジュールは、少数しか見えなくても、再建された部品をシームレスに統合する。私たちは、パラメトリックメッシュモデルをトレーニングするために、大量の地上トルスSMPLデータを使用します。提案手法の訓練と評価を容易にするため,HPPMアノテーションを付加した部分可視像を特徴とするベンチマークデータセットを構築した。これらのベンチマークデータセットを用いて,本手法の有効性を実証した。特に,従来の手法が再現性を維持するのに苦労する,かなりの可視性のあるシナリオにおいて。 We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction. To overcome this limitation, our method employs a "Divide and Fuse (D&F)" strategy, reconstructing human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models (HPPM) that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible. We harness a large volume of ground-truth SMPL data to train our parametric mesh models. To facilitate the training and evaluation of our method, we have established benchmark datasets featuring images of partially visible humans with HPPM annotations. 
Our experiments, conducted on these benchmark datasets, demonstrate the effectiveness of our D&F method, particularly in scenarios with substantial invisibility, where traditional approaches struggle to maintain reconstruction quality.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 距離ビューに基づく3次元セマンティックセグメンテーションのマルチセンサフュージョンによるリアルタイム化 Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion ( http://arxiv.org/abs/2407.09697v1 ) ライセンス: Link先を確認	Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu,	(参考訳) Range-View(RV)ベースの3Dポイントクラウドセグメンテーションは、そのコンパクトなデータ形式のために広く採用されている。しかし, RV法は, 3次元点雲の希少な性質のため, 閉点に対して頑健なセグメンテーションが得られず, 投影されたRGB画像の歪みに悩まされる。これらの問題を緩和するために、新しいLiDARとカメラレンジビューベースの3Dポイントクラウドセマンティックセグメンテーション手法(LaCRange)を提案する。具体的には、RGB画像のRVプロジェクションの悪影響を改善するために、歪み補償知識蒸留(DCKD)戦略を設計する。さらに、強靭で保存的なセンサ融合のために、コンテキストベースの特徴融合モジュールが導入された。最後に, RVの分解能の限界と3次元トポロジの不足に対処するため, 2次元特徴量の適切な集約と3次元特徴量の増大のために, 新たな点修正方式を考案した。提案手法を大規模自律走行データセットであるSemanticKITTIとnuScenesで評価した。提案手法はリアルタイム性に加えて, nuScenes ベンチマークの最先端結果も達成する。 Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for the occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentation method (LaCRange). Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effect of RV projection of RGB images. Moreover, a context-based feature fusion module is introduced for robust and preservative sensor fusion. Finally, in order to address the limited resolution of RV and its insufficiency of 3D topology, a new point refinement scheme is devised for proper aggregation of features in 2D and augmentation of point features in 3D. We evaluated the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# RIO-CPD:相関を考慮したオンライン変化点検出のためのリーマン幾何学的手法 RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection ( http://arxiv.org/abs/2407.09698v1 ) ライセンス: Link先を確認	Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao,	(参考訳) 変化点検出の目的は、データシーケンス内の潜在的に複数の点における急激な変化を特定することである。このタスクは、データの限界分布とジョイント分布の両方の変化を含む、さまざまなタイプの変更が発生するオンライン環境では特に困難である。本稿では,リーマン幾何学上の相関行列を逐次追跡することにより,これらの課題に対処する。対称正定行列多様体のリーマン幾何学と,変化点を検出する累積和統計量(CUSUM)を組み合わせた,非パラメトリック相関対応オンライン変化点検出フレームワークであるRio-CPDを提案する。 Rio-CPDは、現在の観測から以前の観測のFréchet平均までの距離を計算することでCUSUMを強化する。リーマン幾何学のメトリクスを慎重に選択することで、Rio-CPDは単純で計算的に効率的である。合成データセットと実世界のデータセットの両方の実験結果から、Rio-CPDは検出精度と効率において既存の手法よりも優れていることが示された。 The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannian geometry, where the geodesic distances accurately capture the development of correlations. We propose Rio-CPD, a non-parametric correlation-aware online change point detection framework that combines the Riemannian geometry of the manifold of symmetric positive definite matrices and the cumulative sum statistic (CUSUM) for detecting change points. Rio-CPD enhances CUSUM by computing the geodesic distance from present observations to the Fréchet mean of previous observations. With careful choice of metrics equipped to the Riemannian geometry, Rio-CPD is simple and computationally efficient. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods in detection accuracy and efficiency.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# Pythonベースの化学フレームワークのシミュレーションにGPUアクセラレーションを導入する Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework ( http://arxiv.org/abs/2407.09700v1 ) ライセンス: Link先を確認	Rui Li, Qiming Sun, Xing Zhang, Garnet Kin-Lic Chan,	(参考訳) 我々はPySCFのメソッドのGPUアクセラレーションを提供するモジュールであるGPU4PySCFの最初のバージョンを紹介する。コア機能として、2電子反発積分(ERIs)のGPU実装が提供され、Rys二次関数を用いて最大g関数を構成する。量子化学のワークフローをいかに加速させるかの図解として、積分直交のハートリー・フォック構造と核勾配構造において、ERIを効率的に利用する方法について述べる。ベンチマーク計算では、PySCFのマルチスレッドCPUHartree-Fockコードに対する2桁の大幅な高速化と、1つのNVIDIA A100 GPU上のGAMESSやQUICKを含む他のGPUアクセラレーション量子化学パッケージに匹敵する性能を示している。 We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF. As a core functionality, this provides a GPU implementation of two-electron repulsion integrals (ERIs) for contracted basis sets comprising up to g functions using Rys quadrature. As an illustration of how this can accelerate a quantum chemistry workflow, we describe how to use the ERIs efficiently in the integral-direct Hartree-Fock Fock build and nuclear gradient construction. Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF, and performance comparable to other GPU-accelerated quantum chemical packages including GAMESS and QUICK on a single NVIDIA A100 GPU.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# 優先順位付けされたリプレイと一般化の相互作用の検討 Investigating the Interplay of Prioritized Replay and Generalization ( http://arxiv.org/abs/2407.09702v1 ) ライセンス: Link先を確認	Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White,	(参考訳) 過去のデータを再利用し、サンプル効率を向上させるため、強化学習では、経験の再生は至るところで行われている。性能向上のために様々なスマートサンプリングスキームが導入されたが、今までのところ、一様サンプリングが最も一般的なアプローチである。 1つの例外は優先順位付き体験再生(PER)であり、サンプリングは動的プログラミングにおける優先順位付きスイーピングの成功にインスパイアされたTDエラーに比例して行われる。 PERの当初の作業では、Atariの改善が見られたが、その後の結果はさまざまだ。本稿ではPERの様々なバリエーションについて検討し、PERがいつ役に立つかを理解する。予測タスクでは,PERは表の設定で値の伝搬を改善することができるが,ニューラルネットワークと組み合わせた場合の挙動は著しく異なる。一般化を制御するためにターゲットネットワークのアップデートを遅らせたり、確率性を追跡するためにPERで期待されるTDエラーの見積を使用するなど、ある種の緩和は、PERやニューラルネットワークによるエラーの大規模なスパイクを回避することができるが、それでも一般的には、均一なリプレイよりも優れていない。制御タスクでは、優先順位付けされたどの変種も一貫して均一なリプレイを上回っていない。 Experience replay is ubiquitous in reinforcement learning, to reuse past data and improve sample efficiency. Though a variety of smart sampling schemes have been introduced to improve performance, uniform sampling by far remains the most common approach. One exception is Prioritized Experience Replay (PER), where sampling is done proportionally to TD errors, inspired by the success of prioritized sweeping in dynamic programming. The original work on PER showed improvements in Atari, but follow-up results are mixed. In this paper, we investigate several variations on PER, to attempt to understand where and when PER may be useful. Our findings in prediction tasks reveal that while PER can improve value propagation in tabular settings, behavior is significantly different when combined with neural networks. Certain mitigations -- like delaying target network updates to control generalization and using estimates of expected TD errors in PER to avoid chasing stochasticity -- can avoid large spikes in error with PER and neural networks, but nonetheless generally do not outperform uniform replay. In control tasks, none of the prioritized variants consistently outperform uniform replay.	
翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# エレガントブリッジとは何か:多言語 LLM は異なる言語で同様にバイアスされる What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages ( http://arxiv.org/abs/2407.09704v1 ) ライセンス: Link先を確認	Viktor Mihaylov, Aleksandar Shtedritski,	(参考訳) 本稿では,文法的ジェンダーのレンズによるLarge Language Models(LLMs)のバイアスについて検討する。心理言語学における基礎研究、特にジェンダーが言語知覚に与える影響の研究からインスピレーションを得た上で、多言語LLMを活用してボロディツキーの基礎実験(2003年)を再考し、拡張する。 LLMを文法性に関連する心理言語学的バイアスを調べるための新しい手法として,様々な言語で形容詞を持つ名詞を記述するモデルを提案し,特に文法性のある言語に焦点を当てた。特に, 名詞を記述するために LLM が用いている形容詞の文法的性別を予測するために, 男女・言語間の形容詞共起について検討し, 二項分類器を訓練する。意外なことに、単純な分類器は偶然以上の名詞の性別を予測できるだけでなく、言語間の移動可能性も示せる。 LLMは異なる言語で異なる単語を記述できるが、同様にバイアスを受ける。 This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender. Drawing inspiration from seminal works in psycholinguistics, particularly the study of gender's influence on language perception, we leverage multilingual LLMs to revisit and expand upon the foundational experiments of Boroditsky (2003). Employing LLMs as a novel method for examining psycholinguistic biases related to grammatical gender, we prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender. In particular, we look at adjective co-occurrences across gender and languages, and train a binary classifier to predict grammatical gender given adjectives an LLM uses to describe a noun. Surprisingly, we find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability. We show that while LLMs may describe words differently in different languages, they are biased similarly.	翻訳日:2024-07-16 21:18:20 公開日:2024-07-12
# バランスの取れたマルチモーダル学習のための診断と再学習 Diagnosing and Re-learning for Balanced Multimodal Learning ( http://arxiv.org/abs/2407.09705v1 ) ライセンス: Link先を確認	Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu,	(参考訳) モデルが特定のモダリティのトレーニングを好む不均衡なマルチモーダル学習問題を克服するため、既存の手法では、異なる視点からユニモーダルエンコーダのトレーニングを制御し、モーダル間性能の相違を基礎として提案する。しかし、モダリティキャパシティの本質的な制限は無視される。少ない情報的モダリティは "worse-learnt" と認識できるため、モデルにより多くのノイズを記憶させ、非生産的にマルチモーダルモデルの能力に影響を与える可能性がある。さらに、現在のモダリティ変調法は、選択された劣悪な学習モダリティに狭く集中し、他者の訓練を抑える。したがって、モダリティキャパシティの本質的な制限を考慮し、バランスをとる際にすべてのモダリティを考慮することが不可欠である。そこで本研究では,診断と再学習の手法を提案する。まず、そのユニモーダル表現空間の分離性に基づいて各モーダルの学習状態を推定し、それに対応するユニモーダルエンコーダをソフトに初期化するために使用する。このように、少ない情報モダリティの過度な強調は避けられる。さらに、低遅延モードのエンコーダが強化され、他のモードの過度なトレーニングが回避される。したがって、マルチモーダル学習は効果的にバランスを保ち、強化される。複数種類のモダリティとマルチモーダルフレームワークを網羅した実験は、バランスの取れたマルチモーダル学習において、単純なyet効率の手法の優れた性能を示す。ソースコードとデータセットは https://github.com/GeWu-Lab/Diagnosing_Relearning_ECCV2024 で公開されている。 To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as "worse-learnt" ones, which could force the model to memorize more noise, counterproductively affecting the multimodal model ability. Moreover, the current modality modulation methods narrowly concentrate on selected worse-learnt modalities, even suppressing the training of others. Hence, it is essential to consider the intrinsic limitation of modality capacity and take all modalities into account during balancing. To this end, we propose the Diagnosing & Re-learning method. The learning state of each modality is firstly estimated based on the separability of its uni-modal representation space, and then used to softly re-initialize the corresponding uni-modal encoder. 
In this way, the over-emphasizing of scarcely informative modalities is avoided. In addition, encoders of worse-learnt modalities are enhanced, simultaneously avoiding the over-training of other modalities. Accordingly, multimodal learning is effectively balanced and enhanced. Experiments covering multiple types of modalities and multimodal frameworks demonstrate the superior performance of our simple-yet-effective method for balanced multimodal learning. The source code and dataset are available at https://github.com/GeWu-Lab/Diagnosing_Relearning_ECCV2024.	翻訳日:2024-07-16 21:08:36 公開日:2024-07-12
# GOFA: 共同グラフ言語モデリングのための1対オール生成モデル GOFA: A Generative One-For-All Model for Joint Graph Language Modeling ( http://arxiv.org/abs/2407.09709v1 ) ライセンス: Link先を確認	Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang,	(参考訳) LLM(Large Language Models)やLVM(Large Vision Models)といった基礎的なモデルは、各分野において最も強力なツールの1つとして登場した。しかし、テキストデータや画像データとは異なり、グラフデータは決定的な構造を持っておらず、グラフ基礎モデル(GFM)を開発する上で大きな課題となっている。例えば、グラフモデルを設計する現在の試みでは、グラフデータをLLMベースの予測のための言語形式に変換するか、あるいはアシスタントとしてLLMを使ってGNNモデルをトレーニングしている。前者は無制限のタスクを処理でき、後者はグラフ構造をよりよくキャプチャする。本稿では,自己教師型事前学習,タスクの流動性,グラフ認識という,GFMの重要な3つの特性を同定する。これらの特性を考慮し,従来の言語モデリングをグラフ領域に拡張し,新たな生成グラフ言語モデルGOFAを提案する。このモデルは、ランダムに初期化されたGNN層を凍結学習されたLLMにインターリーブし、セマンティックおよび構造モデリング能力を有機的に組み合わせる。 GOFAは、新たに提案されたグラフレベルの次単語予測、質問応答、構造的タスクに基づいて、上記のGFM特性を得るために事前訓練される。事前訓練されたモデルは、タスク解決能力を得るために下流タスクにさらに微調整される。細調整されたモデルは、様々な下流タスクに基づいて評価され、ゼロショットシナリオにおける構造的および文脈的問題を解く強力な能力を示す。コードはhttps://github.com/JiaruiFeng/GOFAで公開されている。 Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as one of the most powerful tools in the respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph data into a language format for LLM-based prediction or still train a GNN model with LLM as an assistant. The former can handle unlimited tasks, while the latter captures graph structure much better -- yet, no existing work can achieve both simultaneously. In this paper, we identify three key desirable properties of a GFM: self-supervised pretraining, fluidity in tasks, and graph awareness. To account for these properties, we extend the conventional language modeling to the graph domain and propose a novel generative graph language model GOFA to solve the problem. 
The model interleaves randomly initialized GNN layers into a frozen pre-trained LLM so that the semantic and structural modeling abilities are organically combined. GOFA is pre-trained on newly proposed graph-level next-word prediction, question-answering, and structural tasks to obtain the above GFM properties. The pre-trained model is further fine-tuned on downstream tasks to obtain task-solving ability. The fine-tuned model is evaluated on various downstream tasks, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios. The code is available at https://github.com/JiaruiFeng/GOFA.	翻訳日:2024-07-16 21:08:35 公開日:2024-07-12
# DisQ: 量子分散システムのためのマルコフ決定プロセスに基づく言語 DisQ: A Markov Decision Process Based Language for Quantum Distributed Systems ( http://arxiv.org/abs/2407.09710v1 ) ライセンス: Link先を確認	Le Chang, Saitej Yavvari, Rance Cleaveland, Samik Basu, Liyi Li,	(参考訳) 量子コンピュータの開発は、重要な量子資源の制限にもかかわらず、大きなマイルストーンに達している。近年、単一位置量子コンピューティングと量子ネットワーク技術を組み合わせて、遠隔プロセッサで大きな絡み合った量子ビット群を構築できるような分散量子システムの開発が試みられ、量子アルゴリズムを分散的に実行できるようになった。本研究では,分散バージョンへの量子アルゴリズムの書き直しを容易にするフレームワークとしてDisQを提案する。 DisQの中核は分散量子プログラミング言語であり、化学抽象機械(CHAM)とマルコフ決定プロセス(MDP)の概念と、明確に区別された量子並列性と分散挙動を提供することを目的としている。本研究では,DisQ言語に基づいて,量子アルゴリズムの等価性とその分散バージョンを検証するシミュレーション関係を構築した。分散バージョンに等価な書き直しを示すために、量子加算やショアのアルゴリズムなどのいくつかのケーススタディを示す。 The development of quantum computers has reached a great milestone, in spite of restrictions on important quantum resources: the number of qubits being entangled at a single-location quantum computer. Recently, there has been some work to combine single-location quantum computing and quantum networking techniques to develop distributed quantum systems such that large entangled qubit groups can be established through remote processors, and quantum algorithms can be executed distributively. We present DisQ as a framework to facilitate the rewrites of quantum algorithms to their distributed versions. The core of DisQ is a distributed quantum programming language that combines the concepts of Chemical Abstract Machine (CHAM) and Markov Decision Processes (MDP) with the objective of clearly distinguishing quantum concurrent and distributed behaviors. Based on the DisQ language, we develop a simulation relation for verifying the equivalence of a quantum algorithm and its distributed versions. We present several case studies, such as quantum addition and Shor's algorithm, to demonstrate their equivalent rewrites to distributed versions.	翻訳日:2024-07-16 21:08:35 公開日:2024-07-12
# Deep-TEMPEST:Deep Learningを使って意図しない電磁エマニュエーションからHDMIを盗聴する Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations ( http://arxiv.org/abs/2407.09717v1 ) ライセンス: Link先を確認	Santiago Fernández, Emilio Martínez, Gabriel Varela, Pablo Musé, Federico Larroca,	(参考訳) 本研究では,ケーブルやコネクタ,特にHDMIから無意識に放出される電磁波を解析することにより,デジタルビデオディスプレイの盗聴の問題に対処する。この問題はTEMPESTとして知られている。アナログケース(VGA)と比較して、デジタルケースは10ビットの符号化により、観測された信号と画素の強度との間の帯域幅と非線形マッピングがはるかに大きくなるため、難しい。その結果、アナログケース用に設計された盗聴システムは、デジタルビデオに適用した場合、不明瞭で読みにくい画像が得られる。提案手法は、問題を逆問題として再キャストし、深層学習モジュールを訓練し、観測された電磁波を表示された画像にマッピングする。しかし、このアプローチは信号の詳細な数学的解析を必要としており、まず、チューニングする周波数を決定するだけでなく、実際のTEMPESTセットアップを実際に必要とせずにトレーニングサンプルを生成する。これにより、特にいくつかの設定が検討されている場合、時間が節約され、これらのサンプルを取得する必要がなくなる。本システムは,テキストにおける平均文字誤り率の向上に重点を置いており,従来の実装に比べて60パーセント以上向上している。提案システムは、広く利用可能なSoftware Defined Radioに基づいており、完全にオープンソースであり、人気のあるGNU Radioフレームワークにシームレスに統合されている。トレーニング用に生成したデータセットも共有しています。最後に、同様の原理に基づいて設計されたシステムによって盗難される可能性を最小限に抑えるために、いくつかの対策について論じる。 In this work, we address the problem of eavesdropping on digital video displays by analyzing the electromagnetic waves that unintentionally emanate from the cables and connectors, particularly HDMI. This problem is known as TEMPEST. Compared to the analog case (VGA), the digital case is harder due to a 10-bit encoding that results in a much larger bandwidth and non-linear mapping between the observed signal and the pixel's intensity. As a result, eavesdropping systems designed for the analog case obtain unclear and difficult-to-read images when applied to digital video. The proposed solution is to recast the problem as an inverse problem and train a deep learning module to map the observed electromagnetic signal back to the displayed image. However, this approach still requires a detailed mathematical analysis of the signal, firstly to determine the frequency at which to tune but also to produce training samples without actually needing a real TEMPEST setup. 
This saves time and avoids the need to obtain these samples, especially if several configurations are being considered. Our focus is on improving the average Character Error Rate in text, and our system improves this rate by over 60 percentage points compared to previously available implementations. The proposed system is based on widely available Software Defined Radio and is fully open-source, seamlessly integrated into the popular GNU Radio framework. We also share the dataset we generated for training, which comprises both simulated and over 1000 real captures. Finally, we discuss some countermeasures to minimize the potential risk of being eavesdropped by systems designed based on similar principles.	翻訳日:2024-07-16 21:08:35 公開日:2024-07-12
# CLOVER:コンテキストを考慮した長期オブジェクト視点と環境不変表現学習 CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning ( http://arxiv.org/abs/2407.09718v1 ) ライセンス: Link先を確認	Dongmyeong Lee, Amanda Adkins, Joydeep Biswas,	(参考訳) 多くのアプリケーションにおいて、ロボットは、オブジェクトインスタンスを識別したり、以前見たインスタンスを再識別する機能を含む、環境のオブジェクトレベルの理解の恩恵を受けることができる。オブジェクトの再識別は、異なる視点や、天気や照明の変化に起因する顕著な外観の変化のあるシーンで困難である。一般的な対象の再識別に対処するアプローチは、前景のセグメンテーションを必要とし、オクルージョン、屋外シーン、照明変更といった課題について限定的に考慮する。様々な照明条件と視点下での8クラスの557個のオブジェクトの1,037,814個の観測を含む,地中オブジェクト再識別データセットであるCODa Re-IDを紹介する。さらに,静的なオブジェクトインスタンスを区別可能なオブジェクト観測のための表現学習手法であるCLOVERを提案する。この結果から,CLOVERは照明条件や視点変化の異なる静的オブジェクト再識別において優れた性能を示し,未知のインスタンスやクラスに一般化できることがわかった。 In many applications, robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Most works on object re-identification focus on specific classes; approaches that address general object re-identification require foreground segmentation and have limited consideration of challenges such as occlusions, outdoor scenes, and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes, and can generalize to unseen instances and classes.	翻訳日:2024-07-16 21:08:35 公開日:2024-07-12
# MSEval:アルゴリズムモデルを評価する概念設計における材料選択のためのデータセット MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models ( http://arxiv.org/abs/2407.09719v1 ) ライセンス: Link先を確認	Yash Patawari Jain, Daniele Grandi, Allin Groom, Brandon Cramer, Christopher McComb,	(参考訳) 材料選択は製造業から建設まで、多くの産業において重要な役割を担っている。材料選択は通常、設計者が設計ソリューションと意図した製造アプローチを反復的に洗練する、いくつかの概念設計のサイクル後に行われる。デザイン研究において、材料選択は一般に1つの正解を持つ最適化問題として扱われる。さらに、特定の種類のオブジェクトや設計関数に制限されることも少なくないため、選択プロセスの計算コストと時間を要する可能性がある。本稿では,多種多様なデザインブリーフィングと基準にまたがって,専門家による資料評価からなる新しいデータセットであるMSEvalを紹介する。このデータは、概念設計のための材料選択の文脈における機械学習モデルの評価と修正を容易にするためのベンチマークとして機能するように設計されている。 Material selection plays a pivotal role in many industries, from manufacturing to construction. Material selection is usually carried out after several cycles of conceptual design, during which designers iteratively refine the design solution and the intended manufacturing approach. In design research, material selection is typically treated as an optimization problem with a single correct answer. Moreover, it is also often restricted to specific types of objects or design functions, which can make the selection process computationally expensive and time-consuming. In this paper, we introduce MSEval, a novel dataset which is comprised of expert material evaluations across a variety of design briefs and criteria. This data is designed to serve as a benchmark to facilitate the evaluation and modification of machine learning models in the context of material selection for conceptual design.	翻訳日:2024-07-16 21:08:35 公開日:2024-07-12
# 大規模言語モデル推論の高速化のための多言語共同投機復号法 Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference ( http://arxiv.org/abs/2407.09722v1 ) ライセンス: Link先を確認	Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun,	(参考訳) 変換器をベースとした大規模言語モデル(LLM)は、様々なタスクにおいてそのパワーを実証しているが、その推論にはかなりの時間とエネルギーコストがかかる。 LLM推論を高速化するために、投機的復号法はより小さなモデルを用いて1つのトークン列を提案し、その後ターゲットの大モデルによってバッチで検証される。自己回帰復号法と比較すると、投機的復号法は同じ数のトークンを生成し、大きなモデルの実行量が少なくなるため、全体の推論を1〜2倍に加速する。しかし、greedy decodingは出力パープレキシティの観点からは最適な復号アルゴリズムではなく、復号アルゴリズムの有効性を直接測定する。投機的復号化よりも出力の難易度と効率性が良いアルゴリズムは、実際より有用である。この明らかに矛盾する目標を達成するために、まず、各ステップで複数のトークンを、その関節の難易度に基づいて重み付けして生成するマルチトークンジョイントグリーディデコーディング(MJGD)を導入する。アウトプット全体の難易度が向上することを示す。しかし、MJGDの計算コストは実際には実現不可能である。そこで本研究では,MJGDの近似と高速化を両面から行うMJSDを提案する。MJGDは,大モデルと小モデルの結合分布を近似し,近似の精度を保証するための検証ステップを用い,ビームデコーディングを用いて関節分布からのシーケンス生成を高速化する。バニラ投機復号法と比較すると、MJSDには2つの利点がある。(1)MJGDの近似であり、より良い出力パープレキシティを実現すること、(2)結合可能性による検証により、有効なパープレキシティを持つドラフトトークンの長いプレフィックスサブシーケンスを受け入れることができ、効率が向上する。 Transformer-based Large Language Models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs. To accelerate LLM inference, speculative decoding uses a smaller model to propose one sequence of tokens, which are subsequently validated in batch by the target large model. Compared with autoregressive decoding, speculative decoding generates the same number of tokens with fewer runs of the large model, hence accelerating the overall inference by 1-2×. However, greedy decoding is not the optimal decoding algorithm in terms of output perplexity, which is a direct measurement of the effectiveness of a decoding algorithm. An algorithm that has better output perplexity and even better efficiency than speculative decoding can be more useful in practice. 
To achieve this seemingly contradictory goal, we first introduce multi-token joint greedy decoding (MJGD), which greedily generates multiple tokens at each step based on their joint perplexity. We show that it leads to better perplexity for the whole output. But the computation cost of MJGD is infeasible in practice. So we further propose multi-token joint speculative decoding (MJSD), which approximates and accelerates the MJGD from two aspects: it approximates the joint distribution of the large model with that of a small model, and uses a verification step to guarantee the accuracy of approximation; then it uses beam decoding to accelerate the sequence generation from the joint distribution. Compared with vanilla speculative decoding, MJSD has two advantages: (1) it is an approximation of MJGD, thus achieving better output perplexity; (2) verification with joint likelihood allows it to accept the longest prefix sub-sequence of the draft tokens with valid perplexity, leading to better efficiency...	翻訳日:2024-07-16 21:08:35 公開日:2024-07-12
# FlashAttention-3: 非同期と低精度で高速で正確な注意 FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision ( http://arxiv.org/abs/2407.08608v2 ) ライセンス: Link先を確認	Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao,	(参考訳) ユビキタストランスフォーマーアーキテクチャのコアレイヤとしての注意は、大規模言語モデルと長期コンテキストアプリケーションのボトルネックとなる。 FlashAttentionは、メモリ読み込み/書き込みを最小化することでGPUの注意を加速するアプローチを詳しく説明した。しかし、FlashAttention-2はH100 GPUでわずか35%しか利用できないため、最近のハードウェアで見られる新機能をまだ活用していない。 1)ワープ特殊化による全体的な計算とデータ移動の重なり、(2)ブロックワイドの行列とソフトマックス演算のインターリーブ、(3)FP8のハードウェアサポートを利用するブロック量子化と不整合処理である。提案手法であるFlashAttention-3は,FP16が740 TFLOPs/s (75%) に達し,FP8が1.2 PFLOPs/sに近づき,H100 GPUの1.5-2.0×が高速化されることを示す。我々はFP8 FlashAttention-3がベースラインFP8よりも2.6×低い数値誤差を達成したことを検証する。 Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications. FlashAttention elaborated an approach to speed up attention on GPUs through minimizing memory reads/writes. However, it has yet to take advantage of new capabilities present in recent hardware, with FlashAttention-2 achieving only 35% utilization on the H100 GPU. We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. We demonstrate that our method, FlashAttention-3, achieves speedup on H100 GPUs by 1.5-2.0× with FP16 reaching up to 740 TFLOPs/s (75% utilization), and with FP8 reaching close to 1.2 PFLOPs/s. We validate that FP8 FlashAttention-3 achieves 2.6× lower numerical error than a baseline FP8 attention.	翻訳日:2024-07-16 11:29:58 公開日:2024-07-12
# 自動倉庫レイアウト生成のための新しいフレームワーク A Novel Framework for Automated Warehouse Layout Generation ( http://arxiv.org/abs/2407.08633v2 ) ライセンス: Link先を確認	Atefeh Shahroudnejad, Payam Mousavi, Oleksii Perepelytsia, Sahir, David Staszak, Matthew E. Taylor, Brent Bawel,	(参考訳) 倉庫レイアウトの最適化は、効率と生産性に大きな影響を与えるため、非常に重要です。自動倉庫レイアウト生成のためのAI駆動フレームワークを提案する。このフレームワークは制約されたビームサーチを用いて、任意の空間パラメータ内の最適なレイアウトを導出し、すべての機能要件を順守する。生成したレイアウトの有効性は、アイテムアクセシビリティ、必要最小限のクリアランス、および通路接続性といった基準に基づいて検証される。次に、記憶位置、アクセスポイント、アクセシビリティコストを考慮し、評価可能なレイアウトを評価するためにスコア関数が使用される。本手法は, 各種倉庫の寸法, 形状, ドア配置, インターコネクトに対して, 実現可能な最適レイアウトを作成できることを示す。このアプローチは、現在デプロイの準備が整っているため、人間のデザイナがオプションを素早く探索し、確認することが可能になり、ユースケースに最適なレイアウトの選択が容易になる。 Optimizing warehouse layouts is crucial due to its significant impact on efficiency and productivity. We present an AI-driven framework for automated warehouse layout generation. This framework employs constrained beam search to derive optimal layouts within given spatial parameters, adhering to all functional requirements. The feasibility of the generated layouts is verified based on criteria such as item accessibility, required minimum clearances, and aisle connectivity. A scoring function is then used to evaluate the feasible layouts considering the number of storage locations, access points, and accessibility costs. We demonstrate our method's ability to produce feasible, optimal layouts for a variety of warehouse dimensions and shapes, diverse door placements, and interconnections. This approach, currently being prepared for deployment, will enable human designers to rapidly explore and confirm options, facilitating the selection of the most appropriate layout for their use-case.	翻訳日:2024-07-16 11:29:58 公開日:2024-07-12
# エンボディード・チェーン・オブ・ソート推論によるロボット制御 Robotic Control via Embodied Chain-of-Thought Reasoning ( http://arxiv.org/abs/2407.08693v2 ) ライセンス: Link先を確認	Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine,	(参考訳) 学習したロボット制御ポリシーの重要な制限は、トレーニングデータの外部で一般化できないことである。視覚言語行動モデル(VLA)に関する最近の研究は、学習されたロボットポリシーのバックボーンとして、大規模なインターネット事前学習された視覚言語モデルを使用することで、その堅牢性と一般化能力を大幅に向上させることができることを示した。しかし、他の領域における大きな視覚言語モデルの最もエキサイティングな能力の1つは、複雑な問題を通して反復的に推論できる能力である。同じ能力をロボティクスに持ち込んで、行動する前に与えられたタスクを推論することで、パフォーマンスを向上させるポリシーを実現することができるだろうか? 「チェーン・オブ・ソート」(CoT)スタイルのプロンプトを単純に使用しても、標準的なVLAでは利用可能なトレーニング例が比較的単純であるため、効果は大幅に低い。さらに、通常のCoTでよく見られるように、サブタスクに関する純粋に意味論的推論は、感覚観察やロボットの状態に推論を根ざす必要があるロボットポリシーには不十分である。この目的のために、我々はVLAのためのEmbodied Chain-of-Thought Reasoning (ECoT)を導入し、ロボットの動作を予測する前に、計画、サブタスク、動き、そしてオブジェクト境界ボックスやエンドエフェクタ位置のような視覚的に接地された特徴について推論する複数のステップを実行するようにVLAを訓練する。大規模ロボットデータセット上でECoTのための合成トレーニングデータを生成するスケーラブルなパイプラインを設計する。 ECoTは、現在最強のオープンソースVLAポリシーであるOpenVLAの絶対的な成功率を、追加のロボットトレーニングデータなしで、挑戦的な一般化タスクに対して28%向上することを示した。さらに、ECoTは、人間がポリシーの失敗を解釈し、自然言語を使って行動を修正するのを容易にする。 A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities of large vision-language models in other domains is their ability to reason iteratively through complex problems. Can that same capability be brought into robotics to allow policies to improve performance by reasoning about a given task before acting? Naive use of "chain-of-thought" (CoT) style prompting is significantly less effective with standard VLAs because of the relatively simple training examples that are available to them. 
Additionally, purely semantic reasoning about sub-tasks, as is common in regular CoT, is insufficient for robot policies that need to ground their reasoning in sensory observations and the robot state. To this end, we introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features like object bounding boxes and end effector positions, before predicting the robot action. We design a scalable pipeline for generating synthetic training data for ECoT on large robot datasets. We demonstrate that ECoT increases the absolute success rate of OpenVLA, the current strongest open-source VLA policy, by 28% across challenging generalization tasks, without any additional robot training data. Additionally, ECoT makes it easier for humans to interpret a policy's failures and correct its behavior using natural language.	翻訳日:2024-07-16 11:29:58 公開日:2024-07-12
# Embodied Computational Agents を用いた連続的発達神経シミュレーション Continual Developmental Neurosimulation Using Embodied Computational Agents ( http://arxiv.org/abs/2103.05753v3 ) ライセンス: Link先を確認	Bradly Alicea, Rishabh Chakrabarty, Stefan Dvoretskii, Akshara Gopi, Avery Lim, Jesse Parent,	(参考訳) 発達生物学、認知科学、計算モデリングの合成を通じて学ぶべきことはたくさんある。私たちの進路には、Braitenberg Vehiclesをベースとした開発にインスパイアされた学習エージェントの設計が含まれます。神経系の形態形成, 発達学習, 可塑性の関連現象のブリッジングにおける発達軌跡の役割を考察することができる。本手法は, 連続学習と密接に結びついており, 発達的実施形態と密に統合されており, 発達的ブレイテンベルク車両 (dBVs) と呼ばれるエージェントを用いて実施することができる。 dBVは、体、センサー、エフェクター、神経システムなど、エージェントベースのシステムへと変貌する、未定義の構造の集合として、自らの生活を始める。この表現型は発達のタイミングで特徴づけられる: 異なる形態形成、臨界、獲得(発達学習)期間を持つ。さらに,ネットワーク形態形成は遺伝的アルゴリズムを用いて行うことができ,発達学習は多数の計算手法を用いて行うことができることを提案する。このアプローチは、発達的アプローチから生じるかもしれない適応的エージェントの振る舞いのフレームワークを提供する。すなわち、臨界周期や成長と獲得、明示的な具体的ネットワークアーキテクチャ、神経ネットワークの組み立てとこれらのネットワーク上でのアクティブな学習の区別などである。結論として、エージェント学習と開発を、非常に短い(100ms)間隔から長期的進化まで、異なる時間スケールで検討する。エンボディドエージェントベースのアプローチにおける発達、進化、学習は、生物学的にインスパイアされたインテリジェンスの統合的視点の鍵となる。 There is much to learn through synthesis of Developmental Biology, Cognitive Science and Computational Modeling. Our path forward involves a design for developmentally-inspired learning agents based on Braitenberg Vehicles. Continual developmental neurosimulation allows us to consider the role of developmental trajectories in bridging the related phenomena of nervous system morphogenesis, developmental learning, and plasticity. Being closely tied to continual learning, our approach is tightly integrated with developmental embodiment, and can be implemented using a type of agent called developmental Braitenberg Vehicles (dBVs). dBVs begin their lives as a set of undefined structures that transform into agent-based systems including a body, sensors, effectors, and nervous system. This phenotype is characterized in terms of developmental timing: with distinct morphogenetic, critical, and acquisition (developmental learning) periods. 
We further propose that network morphogenesis can be accomplished using a genetic algorithmic approach, while developmental learning can be implemented using a number of computational methodologies. This approach provides a framework for adaptive agent behavior that might result from a developmental approach: namely by exploiting critical periods or growth and acquisition, an explicitly embodied network architecture, and a distinction between the assembly of neuronal networks and active learning on these networks. In conclusion, we will consider agent learning and development at different timescales, from very short (<100ms) intervals to long-term evolution. The development, evolution, and learning in an embodied agent-based approach is key to an integrative view of biologically-inspired intelligence.	翻訳日:2024-07-16 06:11:12 公開日:2024-07-12
# CLIP-PAE: 絡み合った、解釈可能な、制御可能なテキストガイド型顔マニピュレーションのための関連特徴抽出のための投影拡張埋め込み CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation ( http://arxiv.org/abs/2210.03919v5 ) ライセンス: Link先を確認	Chenliang Zhou, Fangcheng Zhong, Cengiz Oztireli,	(参考訳) 最近導入されたContrastive Language- Image Pre-Training (CLIP) は、画像とテキストを結合した潜在空間に埋め込むことでブリッジする。これにより、テキストによる説明を提供することで、入力画像を操作することを目的とした文献を多用する扉を開く。しかし、画像とテキストの埋め込みの相違により、最適化対象としてテキストの埋め込みを用いることで、結果の画像に望ましくないアーティファクトがしばしば導入される。絡み合い、解釈可能性、制御性も操作の保証が難しい。これらの問題を緩和するために、特定の画像の特徴を捉えるための関連するプロンプトで区切られたコーパス部分空間を定義することを提案する。テキスト誘導画像操作の性能向上のための最適化ターゲットとして,CLIPプロジェクション拡張埋め込み(PAE)を導入する。提案手法は,任意のCLIPに基づく画像操作アルゴリズムに容易に計算,適応し,スムーズに組み込むことができる,シンプルで汎用的なパラダイムである。本手法の有効性を実証するため,いくつかの理論的および実証的研究を行った。ケーススタディとして,テキスト誘導型セマンティックフェイス編集の手法を用いる。我々は、PAEが、最先端の品質と精度で、より不整合で、解釈可能で、制御可能な画像操作を促進することを定量的に、質的に証明する。プロジェクトページ: https://chenliang-zhou.github.io/CLIP-PAE/。 Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. 
To demonstrate the effectiveness of our method, we conduct several theoretical and empirical studies. As a case study, we utilize the method for text-guided semantic face editing. We quantitatively and qualitatively demonstrate that PAE facilitates a more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy. Project page: https://chenliang-zhou.github.io/CLIP-PAE/.	翻訳日:2024-07-16 06:11:12 公開日:2024-07-12
# 周波数アップコンバータを用いた低周波電磁モードの量子計測 Quantum metrology of low frequency electromagnetic modes with frequency upconverters ( http://arxiv.org/abs/2210.05576v2 ) ライセンス: Link先を確認	Stephen E. Kuenstner, Elizabeth C. van Assendelft, Saptarshi Chaudhuri, Hsiao-Mei Cho, Jason Corbin, Shawn W. Henderson, Fedja Kadribasic, Dale Li, Arran Phipps, Nicholas M. Rapidis, Maria Simanovskaia, Jyotirmai Singh, Cyndia Yu, Kent D. Irwin,	(参考訳) 本稿では、RF量子アップコンバータ(RQU)と、そのdcと超高周波数帯域(VHF)間の電磁モードの量子メトロジーへの応用について述べる($\lesssim$300MHz)。 RQUは、超伝導ループとジョセフソン接合からなるジョセフソン干渉計を用いて、低周波電磁モード(dcとVHF)とマイクロ波Cバンド($\sim$5GHz)のパラメトリック相互作用を実装する。我々は量子増幅器理論を用いてRQUの性能を解析し、RQUがこの周波数範囲で量子制限オプアンプとして動作可能であることを示す。また、バックアクション回避(BAE)測定、サイドバンド冷却、二モードスクイーズなど、キャビティオプトメカニクスで使用されるものと同等の非古典的な測定プロトコルを使用することもできる。これらのプロトコルは、標準量子限界(SQL)よりも感度のよい量子センサとして、dc--VHF電磁モードを用いた実験を可能にする。 RQUを用いて低周波からマイクロ波Cバンドへの信号アップコンバージョンを示し、完全なBAEの実現に向けた必要なステップである46.9 dBの位相感度ゲイン(消光比)を示す。 We present the RF Quantum Upconverter (RQU) and describe its application to quantum metrology of electromagnetic modes between dc and the Very High Frequency band (VHF) ($\lesssim$300MHz). The RQU uses a Josephson interferometer made up of superconducting loops and Josephson junctions to implement a parametric interaction between a low-frequency electromagnetic mode (between dc and VHF) and a mode in the microwave C Band ($\sim$ 5GHz), analogous to the radiation pressure interaction between electromagnetic and mechanical modes in cavity optomechanics. We analyze RQU performance with quantum amplifier theory, and show that the RQU can operate as a quantum-limited op-amp in this frequency range. It can also use non-classical measurement protocols equivalent to those used in cavity optomechanics, including back-action evading (BAE) measurements, sideband cooling, and two-mode squeezing. These protocols enable experiments using dc--VHF electromagnetic modes as quantum sensors with sensitivity better than the Standard Quantum Limit (SQL). 
We demonstrate signal upconversion from low frequencies to microwave C band using an RQU and show a phase-sensitive gain (extinction ratio) of 46.9 dB, which is a necessary step towards the realization of full BAE.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# Laplace Approximationsによるディープラーニングの効率的なベイズ更新 Efficient Bayesian Updates for Deep Learning via Laplace Approximations ( http://arxiv.org/abs/2210.06112v2 ) ライセンス: Link先を確認	Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Zhixin Huang, Daniel Kottke, Stephan Vogt, Bernhard Sick,	(参考訳) ディープニューラルネットワークのトレーニングには重要な計算リソースを必要とするため、トレーニングデータセットを新しいデータで拡張するのは、通常は完全な再トレーニングを必要とするため、難しい。さらに、特定のアプリケーションは時間や計算上の制約によりコストのかかる再訓練を許さない。ラプラス近似を用いたディープニューラルネットワークのための新しいベイズ更新手法を提案することでこの問題に対処する。具体的には、ラプラス近似のガウス後続分布に二階最適化手法を応用し、逆ヘッセン行列を閉形式で計算する。このようにして、定常環境での新たなデータの到着時に、高速かつ効果的な更新を可能にする。さまざまなデータモダリティに対する大規模な評価調査では、当社の更新が、コストのかかる再トレーニングに代わる、迅速かつ競争的な代替手段であることを確認しています。さらに、既存の選択戦略を改善するために、我々の更新を利用することで、深いアクティブな学習シナリオで適用性を示す。 Since training deep neural networks takes significant computational resources, extending the training dataset with new data is difficult, as it typically requires complete retraining. Moreover, specific applications do not allow costly retraining due to time or computational constraints. We address this issue by proposing a novel Bayesian update method for deep neural networks by using a last-layer Laplace approximation. Concretely, we leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation, computing the inverse Hessian matrix in closed form. This way, our method allows for fast and effective updates upon the arrival of new data in a stationary setting. A large-scale evaluation study across different data modalities confirms that our updates are a fast and competitive alternative to costly retraining. Furthermore, we demonstrate its applicability in a deep active learning scenario by using our update to improve existing selection strategies.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# 良い説明とは何か:説明の性質の調和した見方 What Makes a Good Explanation?: A Harmonized View of Properties of Explanations ( http://arxiv.org/abs/2211.05667v3 ) ライセンス: Link先を確認	Zixi Chen, Varshini Subhash, Marton Havasi, Weiwei Pan, Finale Doshi-Velez,	(参考訳) 解釈可能性(Interpretability)は、人間が機械学習(ML)モデルの側面を検証する手段を提供し、タスクを完全に自動化できない状況において、人間とMLのコラボレーションを強化する。異なる文脈は異なる性質を持つ説明を必要とする。例えば、早期の心停止警告システムがケア環境に統合される準備ができているかを決定するのに必要な説明の種類は、ローン申請者がアプリケーションを成功させるために必要なアクションを決定するのに必要な説明の種類とは大きく異なります。残念ながら、説明の性質に関して、標準化の欠如がある:異なる論文は、同じ用語を異なる量を意味するために、異なる用語を同じ量を意味するために使用する。この標準化された用語の欠如とML説明の性質の分類は、解釈可能な機械学習手法を厳格に比較することと、どの文脈でどの特性が必要なのかを識別することの両方を妨げます。本研究では、解釈可能な機械学習論文で定義された特性を調査し、実際に測定したものに基づいてそれらを合成し、それらの特性の異なる定式化間のトレードオフを記述する。そこで我々は,タスクに適した説明属性の定式化や,解釈可能な機械学習における今後の作業の標準化について,より情報的な選択を可能にする。 Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. 
In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# 多文書要約モデルは合成されるか? Do Multi-Document Summarization Models Synthesize? ( http://arxiv.org/abs/2301.13844v2 ) ライセンス: Link先を確認	Jay DeYoung, Stephanie C. Martinez, Iain J. Marshall, Byron C. Wallace,	(参考訳) 多文書要約では、入力の集合の簡潔なシナプスを生成する。例えば、特定の映画について書かれた映画レビューのシナプスは、平均的な批評家のコンセンサスを反映すべきである。より簡潔な例として、臨床治験結果の生医学的体系的レビューを伴う物語要約は、個々の治験から生じる潜在的に矛盾する結果を正確に要約するべきである。本稿では,現代多文書要約モデルが如何に,このような合成を暗黙的に行うのかを問う。我々は、微調整されたトランスフォーマーからGPT-4まで、一連の要約モデルを用いて、意見とエビデンス合成データセットに関する実験を行う。既存のモデルでも部分的には合成を行うが、最高のモデルでさえ入力順序の変化に過敏であり、入力組成の変化に過敏である(例えば、正と負のレビューの比率)。提案手法は, モデル合成能力を向上させるための単純な, 汎用的, 効果的な手法であり, 明確な多様な候補出力を生成し, それらの文字列から, 入力に対して期待される集計値に最も適しているか, あるいは, モデルが良い候補を生成できない場合の留意点を選択する。 Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key aspect, e.g., a synopsis of film reviews written about a particular movie should reflect the average critic consensus. As a more consequential example, narrative summaries that accompany biomedical systematic reviews of clinical trial results should accurately summarize the potentially conflicting results from individual trials. In this paper we ask: To what extent do modern multi-document summarization models implicitly perform this sort of synthesis? We run experiments over opinion and evidence synthesis datasets using a suite of summarization models, from fine-tuned transformers to GPT-4. We find that existing models partially perform synthesis, but imperfectly: even the best performing models are over-sensitive to changes in input ordering and under-sensitive to changes in input compositions (e.g., ratio of positive to negative reviews). 
We propose a simple, general, effective method for improving model synthesis capabilities by generating an explicitly diverse set of candidate outputs, and then selecting from these the string best aligned with the expected aggregate measure for the inputs, or abstaining when the model produces no good candidate.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
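The selection-or-abstention step described in the abstract above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_sentiment` is a hypothetical stand-in for whatever scorer measures the aggregate property (for example, review sentiment), the mean is assumed as the expected aggregate, and `tolerance` is an invented abstention threshold.

```python
POSITIVE = {"great", "good", "enjoyable"}
NEGATIVE = {"flawed", "bad", "terrible"}


def toy_sentiment(text):
    """Hypothetical scorer: fraction of sentiment words that are positive (0.5 if none)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.5 if pos + neg == 0 else pos / (pos + neg)


def select_candidate(input_scores, candidates, score_fn, tolerance=0.15):
    """Pick the candidate whose score is closest to the mean input score.

    Returns None (abstains) when even the best candidate is further than
    `tolerance` from the expected aggregate.
    """
    expected = sum(input_scores) / len(input_scores)
    best = min(candidates, key=lambda c: abs(score_fn(c) - expected))
    if abs(score_fn(best) - expected) > tolerance:
        return None
    return best
```

In practice the candidates would come from sampling a summarization model several times with diversity encouraged; here they are simply passed in as strings so the selection logic can be read on its own.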
# 一般化性能向上のためのアダムの適応ステップ範囲の抑制について On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance ( http://arxiv.org/abs/2302.01029v3 ) ライセンス: Link先を確認	Guoqiang Zhang,	(参考訳) 近年のアダプティブ・オプティマイザは、基本的に適応段差の分散を減らし、運動量を伴うSGDに近づくことにより、Adamの一般化性能を向上させる。上記のモチベーションに従えば、階層的勾配統計を利用して、アダムの適応段階化の範囲を抑えることができる。特に、各イテレーションにおいて、DNNモデルの更新に使用する前に、第2運動量v_tで連続して3つの操作を実行することを提案する:(1)ダウンスケーリング、(2)エプシロン埋め込み、(3)ダウン翻訳。結果のアルゴリズムはSET-Adamと呼ばれ、SETは3つの操作の簡単な表記法である。 v_tの層状サブベクタと対応するオールワンサブベクタとの角度を利用して、v_t上のダウンスケーリング動作を行う。 SET-Adam は NLP の変換器と LSTM のトレーニングにおいて 8 つの適応最適化器より優れており,CIFAR10 と CIFAR100 のイメージ分類では VGG と ResNet が,画像生成タスクの WGAN-GP モデルのトレーニングでは 8 つの適応手法の最適性能に適合している。さらに、SET-AdamはImageNet上でResNet18をトレーニングするためにAdamやAdaBeliefよりも高い検証精度を生成する。 A number of recent adaptive optimizers improve the generalisation performance of Adam by essentially reducing the variance of adaptive stepsizes to get closer to SGD with momentum. Following the above motivation, we suppress the range of the adaptive stepsizes of Adam by exploiting the layerwise gradient statistics. In particular, at each iteration, we propose to perform three consecutive operations on the second momentum v_t before using it to update a DNN model: (1) down-scaling, (2) epsilon-embedding, and (3) down-translating. The resulting algorithm is referred to as SET-Adam, where SET is a brief notation of the three operations. The down-scaling operation on v_t is performed layerwise by making use of the angles between the layerwise subvectors of v_t and the corresponding all-one subvectors. Extensive experimental results show that SET-Adam outperforms eight adaptive optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100 while matching the best performance of the eight adaptive methods when training WGAN-GP models for image generation tasks. 
Furthermore, SET-Adam produces higher validation accuracies than Adam and AdaBelief for training ResNet18 over ImageNet.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# 周りを見回して学ぶ:探索による自己学習対象検出 Look Around and Learn: Self-Training Object Detection by Exploration ( http://arxiv.org/abs/2302.03566v3 ) ライセンス: Link先を確認	Gianluca Scarpellini, Stefano Rosa, Pietro Morerio, Lorenzo Natale, Alessio Del Bue,	(参考訳) オブジェクト検出器が新しい環境でデプロイされると、しばしばパフォーマンスが低下する。本稿では,既存の物体検出装置を人間の介入に頼らずに,新たな環境下で画像の探索と取得を行なえる方法,すなわち,完全に自己管理されたアプローチについて考察する。私たちの設定では、エージェントはまず、事前訓練されたオフザシェルフ検出器を使って、オブジェクトを検出し、擬似ラベルを関連付けることで、環境を探索することを学びます。同一対象の擬似ラベルは異なる視点で一致しなくてはならないと仮定することで、探索政策を学習し、硬いサンプルを採掘し、観察のコンセンサスから洗練された擬似ラベルを生成するための「診断和解」と呼ばれる新しいメカニズムを考案する。我々は現在の最先端の統一されたベンチマークを実装し、既存の探索政策や知覚メカニズムと比較する。提案手法は既存の手法よりも優れており,シミュレーションシナリオでは対象検出器を6.2%改善し,他の最先端手法よりも3.59%向上し,実際のロボット試験では9.97%向上した。提案されたアプローチとベースラインのコードはhttps://iit-pavis.github.io/Look_Around_And_Learn/で公開されている。 When an object detector is deployed in a novel setting it often experiences a drop in performance. This paper studies how an embodied agent can automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment without relying on human intervention, i.e., a fully self-supervised approach. In our setting, an agent initially learns to explore the environment using a pre-trained off-the-shelf detector to locate objects and associate pseudo-labels. By assuming that pseudo-labels for the same object must be consistent across different views, we learn the exploration policy Look Around to mine hard samples, and we devise a novel mechanism called Disagreement Reconciliation for producing refined pseudo-labels from the consensus among observations. We implement a unified benchmark of the current state-of-the-art and compare our approach with pre-existing exploration policies and perception mechanisms. Our method is shown to outperform existing approaches, improving the object detector by 6.2% in a simulated scenario, a 3.59% advancement over other state-of-the-art methods, and by 9.97% in the real robotic test without relying on ground-truth. 
Code for the proposed approach and baselines is available at https://iit-pavis.github.io/Look_Around_And_Learn/.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# AI強化集中治療ユニット:広汎なセンシングによる患者ケアの革新 AI-Enhanced Intensive Care Unit: Revolutionizing Patient Care with Pervasive Sensing ( http://arxiv.org/abs/2303.06252v2 ) ライセンス: Link先を確認	Subhash Nerella, Ziyuan Guan, Scott Siegel, Jiaqing Zhang, Kia Khezeli, Azra Bihorac, Parisa Rashidi,	(参考訳) 集中治療室 (ICU) は、重篤な患者が集中治療や監視を受ける特別な病院空間である。包括的モニタリングは、患者の状態、特に明度、究極的にはケアの質を評価する上で必須である。しかし、ICUにおける患者の監視範囲は、時間的制約と医療提供者の作業負荷によって制限されている。現在、表情、姿勢、移動といった細部を含む視力評価は散発的に捉えられるか、全く捉えられていない。これらの手動の観察は個人を対象としており、ドキュメントの誤りを招きやすい。人工知能(AI)によって実現されたシステムは、異常な学習能力のために、患者の視覚的モニタリングとアセスメントを増強する可能性がある。このようなシステムは、トレーニングにロバストなアノテートデータを必要とする。そこで本研究では,複数モードの深度画像,カラーRGB画像,加速度計,筋電図,音圧,光レベルからデータを収集し,連続的および粒度の計測,デリリウムリスク,痛み,移動性評価などのインテリジェントなモニタリングシステムを開発するために,広汎なセンシング・データ処理システムを開発した。本稿では,リアルタイムの患者モニタリングと視覚的評価のために開発したIntelligent Intensive Care Unit (I2CU)システムアーキテクチャについて述べる。 The intensive care unit (ICU) is a specialized hospital space where critically ill patients receive intensive care and monitoring. Comprehensive monitoring is imperative in assessing patients' conditions, in particular acuity, and ultimately the quality of care. However, the extent of patient monitoring in the ICU is limited due to time constraints and the workload on healthcare providers. Currently, visual assessments for acuity, including fine details such as facial expressions, posture, and mobility, are sporadically captured, or not captured at all. These manual observations are subjective to the individual, prone to documentation errors, and overburden care providers with the additional workload. Artificial Intelligence (AI) enabled systems have the potential to augment the patient visual monitoring and assessment due to their exceptional learning capabilities. Such systems require robust annotated data to train. 
To this end, we have developed a pervasive sensing and data processing system that collects data from multiple modalities (depth images, color RGB images, accelerometry, electromyography, sound pressure, and light levels) in the ICU for developing intelligent monitoring systems for continuous and granular acuity, delirium risk, pain, and mobility assessment. This paper presents the Intelligent Intensive Care Unit (I2CU) system architecture we developed for real-time patient monitoring and visual assessment.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# MS-TCRNet:センサ強化キネマティクスを用いた動作セグメンテーションのための多段階時間畳み込みリカレントネットワーク MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for Action Segmentation Using Sensor-Augmented Kinematics ( http://arxiv.org/abs/2303.07814v2 ) ライセンス: Link先を確認	Adam Goldbraikh, Omer Shubi, Or Rubin, Carla M Pugh, Shlomi Laufer,	(参考訳) アクションセグメンテーション(Action segmentation)は、様々なセンサーから得られるビデオやキネマティックデータで通常実行される、ハイレベルなプロセス分析において難しいタスクである。本研究は,運動学的データに対する行動セグメンテーションに関連する2つのコントリビューションを提示する。まず,動作データに特化して設計されたMS-TCRNet(Multi-Stage Temporal Convolutional Recurrent Networks)の2つのバージョンを紹介する。アーキテクチャは、ステージ内正規化を備えた予測ジェネレータと、双方向LSTMまたはGRUベースの精錬ステージで構成されている。第2に、キネマティックデータの強い幾何学的構造を利用してアルゴリズムの性能とロバスト性を向上する、World Frame RotationとHand Inversionという2つの新しいデータ拡張手法を提案する。手術縫合作業の3つのデータセット: 可変組織シミュレーション(VTS)データセットと新たに導入されたボウエル修復シミュレーション(BRS)データセット、およびロボット手術におけるよく知られたベンチマークであるJHU-ISI Gesture and Skill Assessment Working Set(JIGSAWS)データセットについて、本モデルの評価を行った。我々の手法は最先端のパフォーマンスを達成した。 Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. This work presents two contributions related to action segmentation on kinematic data. Firstly, we introduce two versions of Multi-Stage Temporal Convolutional Recurrent Networks (MS-TCRNet), specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and Bidirectional LSTM or GRU-based refinement stages. Secondly, we propose two new data augmentation techniques, World Frame Rotation and Hand Inversion, which utilize the strong geometric structure of kinematic data to improve algorithm performance and robustness. 
We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieved state-of-the-art performance.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# グラフ上のランダム逆問題:分散オンライン学習 Random Inverse Problems Over Graphs: Decentralized Online Learning ( http://arxiv.org/abs/2303.11789v6 ) ライセンス: Link先を確認	Tao Li, Xiwei Zhang,	(参考訳) ネットワークグラフ上の分散ランダム逆問題のフレームワークをオンライン測定で構築し,分散化されたオンライン学習アルゴリズムを提案する。これはヒルベルト空間における分散パラメータ推定と、再現されたカーネルヒルベルト空間(RKHS-LMS)における最小平均平方問題を統一する。我々は、アルゴリズムの収束を、L2有界なマルティンゲール差項を持つヒルベルト空間における不均一なランダム差分方程式のクラスにおける漸近安定性に変換し、ヒルベルト空間におけるL2-漸近安定性理論を開発する。ネットワークグラフが連結され、フォワード作用素の列が励起条件の無限次元時空間持続性を満たすならば、全てのノードの見積もりは平均二乗であり、ほぼ確実に一致している。さらに,RKHSにおける非定常および非独立なオンラインデータストリームに基づく分散オンライン学習アルゴリズムを提案し,ランダム入力データによって誘導される演算子が励振条件の無限次元時空間持続性を満たす場合,そのアルゴリズムが平均二乗でほぼ確実に整合であることを証明した。 We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2-asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# 画素ワイド農業用画像時系列分類:比較と変形可能なプロトタイプベースアプローチ Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach ( http://arxiv.org/abs/2303.12533v2 ) ライセンス: Link先を確認	Elliot Vincent, Jean Ponce, Mathieu Aubry,	(参考訳) 衛星による地球観測の改善により、より高時間分解能と空間分解能の画像が得られる。このデータを農業モニタリングに活用することは、環境と経済の課題に対処するための鍵となる。時間的データを用いた作物の分節化の現在の手法は、注釈付きデータに依存するか、監督の欠如を補うために非常に高度に設計されている。本稿では,衛星画像時系列(SITS)の教師付きおよび教師なし画素単位のセグメンテーションのためのデータセットと手法を提示・比較する。また,K-meansやNearest Centroid Classifier (NCC)のような古典的プロトタイプベースの手法に対して,スペクトル変形と時間シフトに不変性を加えるアプローチを導入する。我々は、異なるレベルの監督について研究し、この単純かつ高度に解釈可能な手法は、低データ体制において最高の性能を達成し、最近の4つのSITSデータセット上での農業時系列の教師なし分類の最先端を著しく改善することを示す。 Improvements in Earth observation by satellites allow for imagery of ever higher temporal and spatial resolution. Leveraging this data for agricultural monitoring is key for addressing environmental and economic challenges. Current methods for crop segmentation using temporal data either rely on annotated data or are heavily engineered to compensate the lack of supervision. In this paper, we present and compare datasets and methods for both supervised and unsupervised pixel-wise segmentation of satellite image time series (SITS). We also introduce an approach to add invariance to spectral deformations and temporal shifts to classical prototype-based methods such as K-means and Nearest Centroid Classifier (NCC). We study different levels of supervision and show this simple and highly interpretable method achieves the best performance in the low data regime and significantly improves the state of the art for unsupervised classification of agricultural time series on four recent SITS datasets.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
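To make the idea of adding temporal-shift invariance to a Nearest Centroid Classifier more concrete, here is a deliberately simplified sketch. It is an assumption-laden illustration rather than the paper's method: each pixel is a single-band time series, each class has one fixed prototype, only integer time shifts are searched (no spectral deformation), and the class names and `max_shift` parameter are hypothetical.

```python
def shifted_distance(series, prototype, shift):
    """Mean squared difference between series[t] and prototype[t - shift] over the overlap."""
    pairs = [(series[t], prototype[t - shift])
             for t in range(len(series))
             if 0 <= t - shift < len(prototype)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def classify(series, prototypes, max_shift=2):
    """Assign the label of the prototype that is closest under its best time shift.

    max_shift must be smaller than the series length so the overlap is never empty.
    """
    best_label, best_dist = None, float("inf")
    for label, proto in prototypes.items():
        dist = min(shifted_distance(series, proto, s)
                   for s in range(-max_shift, max_shift + 1))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

With `max_shift=0` this reduces to an ordinary nearest-centroid rule; allowing a small shift lets a time series whose growth curve peaks slightly earlier or later still match the prototype with the same shape.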
# TalkCLIP: テキストガイド型表現型音声スタイルによる対話ヘッドジェネレーション TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles ( http://arxiv.org/abs/2304.00334v2 ) ライセンス: Link先を確認	Yifeng Ma, Suzhen Wang, Yu Ding, Lincheng Li, Bowen Ma, Tangjie Lv, Changjie Fan, Zhipeng Hu, Zhidong Deng, Xin Yu,	(参考訳) 音声駆動音声ヘッド生成は注目を集めている。所望の表情で話すヘッドビデオを作成するために、従来の手法は、表現情報を提供するために余分な参照ビデオに頼っている。本研究では,自然言語で表現を指定した発話ヘッドを生成可能なフレームワークであるTalkCLIPを提案する。テキストから表現へのマッピングをモデル化するために,まず,粗い感情ときめ細かい顔の動きの両方を表現した多彩なテキスト記述を持つテキスト-ビデオ対話ヘッドデータセットを構築した。提案したデータセットを活用することで,表現表現に自然言語に基づく記述を投影するCLIPベースのスタイルエンコーダを導入する。 TalkCLIPはトレーニング中に見えない説明のために式を推測することもできます。 TalkCLIPはテキストを使って表現の強度を調節したり、表現を編集したりすることもできる。広汎な実験により、TalkCLIPは、テキスト記述でガイドされた鮮やかな表情で、写真リアルな発話ヘッドを生成する高度な能力を実現することが実証された。 Audio-driven talking head generation has drawn growing attention. To produce talking head videos with desired facial expressions, previous methods rely on extra reference videos to provide expression information, which may be difficult to find and hence limits their usage. In this work, we propose TalkCLIP, a framework that can generate talking heads where the expressions are specified by natural language, hence allowing for specifying expressions more conveniently. To model the mapping from text to expressions, we first construct a text-video paired talking head dataset where each video has diverse text descriptions that depict both coarse-grained emotions and fine-grained facial movements. Leveraging the proposed dataset, we introduce a CLIP-based style encoder that projects natural language-based descriptions to the representations of expressions. TalkCLIP can even infer expressions for descriptions unseen during training. TalkCLIP can also use text to modulate expression intensity and edit expressions. Extensive experiments demonstrate that TalkCLIP achieves the advanced capability of generating photo-realistic talking heads with vivid facial expressions guided by text descriptions.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# 音声からの基本構文:教師なしディープニューラルネットワークにおける自発的結合 Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks ( http://arxiv.org/abs/2305.01626v2 ) ライセンス: Link先を確認	Gašper Beguš, Thomas Lu, Zili Wang,	(参考訳) 構文の計算モデルは、主にテキストベースである。本稿では,最も基本的な構文操作を生音声から直接,教師なしの方法でモデル化できることを提案する。私たちは構文の最もユビキタスで基本的な特性の1つに焦点を合わせます。個別単語の音響記録を訓練した畳み込みニューラルネットワーク(CNN)が、入力に複数の単語を持つデータにアクセスすることなく、2つまたは3つの単語で連結された出力を生成し始める現象である。我々はこの発見を、異なるハイパーパラメータとトレーニングデータを持つ、独立に訓練されたいくつかのモデルで再現する。さらに、2つの単語で訓練されたネットワークは、新しい保存されていない単語の組み合わせに単語を埋め込むことを学ぶ。我々の知る限り、これは生の音声に基づくciwGAN/fiwGAN設定で訓練されたCNNのこれまで報告されていない特性であり、これらのアーキテクチャがどのように学習するかの理解と、生の音響入力からの構文のモデル化と進化の両方に影響を及ぼす。 Computational models of syntax are predominantly text-based. Here we propose that the most basic syntactic operations can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary properties of syntax -- concatenation. We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing data with multiple words in the input. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech and has implications both for our understanding of how these architectures learn as well as for modeling syntax and its evolution from raw acoustic inputs.	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# GenerateCT:3次元胸部CTボリュームのテキストコンディショナル生成 GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes ( http://arxiv.org/abs/2305.16037v5 ) ライセンス: Link先を確認	Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina, Enis Simsar, Alperen Tezcan, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Furkan Almas, Irem Dogan, Muhammed Furkan Dasdelen, Chinmay Prabhakar, Hadrien Reynaud, Sarthak Pati, Christian Bluethgen, Mehmet Kemal Ozdemir, Bjoern Menze,	(参考訳) フリーフォームの医療用テキストプロンプトに条件付けされた3D医療用画像を生成するための最初のアプローチであるGenerateCTには、テキストエンコーダと3つの重要なコンポーネントが組み込まれている。 3D医療画像において直接的に同等の手法を使わずに、我々はGenerateCTを最先端の手法と比較し、すべての主要な指標でその優位性を実証した。そこで我々はGenerateCTの臨床応用を多義性分類タスクで評価した。まず,実データセット上でのマルチ異常度分類器のトレーニングにより,ベースラインを確立した。ゼロショットシナリオにおいて、モデルが外部データに一般化し、未知のプロンプトで性能を評価するために、我々は、外部セットを用いて分類器を訓練し、追加のベンチマークを設定した。我々は、GenerateCTを用いて、各セットに対して等しいボリュームを合成することで、トレーニングデータセットを2倍にした2つの実験を行った。最初の実験では、実数と生成量で分類器を共同で訓練する際、APスコアが11%改善した。第2の実験では、目に見えないプロンプトに基づいて、実数と生成量の両方のトレーニングで7%改善した。さらに、GenerateCTは、任意のサイズの合成トレーニングデータセットのスケーリングを可能にする。例として,実数集合の5倍の3次元CTを10万個生成し,これらの合成CTのみを用いて分類器を訓練した。驚くべきことに、この分類器は、すべての利用可能な実データでトレーニングされたデータのパフォーマンスを8%上回った。最後に、ドメインの専門家は生成されたボリュームを評価し、テキストプロンプトと高い整合性を確認した。コード、モデルウェイト、トレーニングデータ、および生成されたデータにhttps://github.com/ibrahimethemhamamci/GenerateCTでアクセスします。 GenerateCT, the first approach to generating 3D medical imaging conditioned on free-form medical text prompts, incorporates a text encoder and three key components: a novel causal vision transformer for encoding 3D CT volumes, a text-image transformer for aligning CT and text tokens, and a text-conditional super-resolution diffusion model. Without directly comparable methods in 3D medical imaging, we benchmarked GenerateCT against cutting-edge methods, demonstrating its superiority across all key metrics. Importantly, we evaluated GenerateCT's clinical applications in a multi-abnormality classification task. First, we established a baseline by training a multi-abnormality classifier on our real dataset. 
To further assess the model's generalization to external data and performance with unseen prompts in a zero-shot scenario, we employed an external set to train the classifier, setting an additional benchmark. We conducted two experiments in which we doubled the training datasets by synthesizing an equal number of volumes for each set using GenerateCT. The first experiment demonstrated an 11% improvement in the AP score when training the classifier jointly on real and generated volumes. The second experiment showed a 7% improvement when training on both real and generated volumes based on unseen prompts. Moreover, GenerateCT enables the scaling of synthetic training datasets to arbitrary sizes. As an example, we generated 100,000 3D CTs, fivefold the number in our real set, and trained the classifier exclusively on these synthetic CTs. Impressively, this classifier surpassed the performance of the one trained on all available real data by a margin of 8%. Last, domain experts evaluated the generated volumes, confirming a high degree of alignment with the text prompt. Access our code, model weights, training data, and generated data at https://github.com/ibrahimethemhamamci/GenerateCT	翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
# Compressed Sensing:離散最適化アプローチ Compressed Sensing: A Discrete Optimization Approach ( http://arxiv.org/abs/2306.04647v3 ) ライセンス: Link先を確認	Dimitris Bertsimas, Nicholas A. G. Johnson,	(参考訳) 圧縮センシング(CS: Compressed Sensing)問題について検討する。これは、線形測定の集合をある数値的な許容誤差の範囲内で満たす最もスパースなベクトルを求める問題である。CSの$\ell_2$正則化定式化を導入し、これを混合整数2次錐計画問題として再定式化する。この問題の2次錐緩和を導出し、正則化パラメータに関する穏やかな条件の下で、得られる緩和がよく研究されている基底追跡ノイズ除去(basis pursuit denoising)問題と等価であることを示す。さらに、2次錐緩和を強化する半正定値緩和を提示し、2次錐緩和を利用してCSの小規模インスタンスを証明可能な最適性まで解く独自の分枝限定アルゴリズムを開発する。合成データにおいて3つの最先端ベンチマーク手法による解と比較すると、我々の手法は平均で$6.22\%$よりスパースな解を生成する。合成データにおいて実験ごとに最も性能の良いベンチマーク手法のみと比較した場合、我々の手法は平均で$3.10\%$よりスパースな解を生成する。実世界のECGデータでは、与えられた$\ell_2$再構成誤差に対して、我々の手法はベンチマーク手法よりも平均で$9.95\%$よりスパースな解(最も性能の良いベンチマークのみと比較した場合は$3.88\%$よりスパース)を生成し、一方、与えられたスパース性レベルに対しては、ベンチマーク手法よりも平均で$10.77\%$低い再構成誤差(最も性能の良いベンチマークのみと比較した場合は$1.42\%$低い誤差)の解を生成する。マルチラベル分類アルゴリズムの構成要素として用いた場合、提案手法はベンチマークの圧縮センシング手法よりも高い分類精度を達成する。この精度向上は、計算時間が数桁増加するという代償を伴う。 We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. We introduce an $\ell_2$ regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well-studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve small-scale instances of CS to certifiable optimality. When compared against solutions produced by three state-of-the-art benchmark methods on synthetic data, our numerical results show that our approach produces solutions that are on average $6.22\%$ more sparse. 
When compared only against the experiment-wise best performing benchmark method on synthetic data, our approach produces solutions that are on average $3.10\%$ more sparse. On real world ECG data, for a given $\ell_2$ reconstruction error our approach produces solutions that are on average $9.95\%$ more sparse than benchmark methods ($3.88\%$ more sparse if only compared against the best performing benchmark), while for a given sparsity level our approach produces solutions that have on average $10.77\%$ lower reconstruction error than benchmark methods ($1.42\%$ lower error if only compared against the best performing benchmark). When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods. This improved accuracy comes at the cost of an increase in computation time by several orders of magnitude.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
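The optimization problems named in the abstract above can be written out as follows. This is only a sketch in standard notation, not the paper's exact statement: the symbols $A$ (measurement matrix), $b$ (measurement vector), $\epsilon$ (tolerance) and $\gamma$ (regularization weight) are assumptions introduced here, and the precise form of the $\ell_2$ regularizer is a common convention rather than something the abstract specifies.

```latex
\begin{align*}
% Sparsest vector consistent with the measurements up to tolerance \epsilon
&\text{(CS)} && \min_{x \in \mathbb{R}^n} \ \|x\|_0
  \quad \text{s.t.} \quad \|Ax - b\|_2 \le \epsilon \\
% Assumed form of the \ell_2-regularized variant; \gamma > 0 is hypothetical
&\text{(regularized CS)} && \min_{x \in \mathbb{R}^n} \ \|x\|_0 + \frac{1}{\gamma}\|x\|_2^2
  \quad \text{s.t.} \quad \|Ax - b\|_2 \le \epsilon \\
% Basis pursuit denoising: the convex problem the relaxation is related to
&\text{(BPDN)} && \min_{x \in \mathbb{R}^n} \ \|x\|_1
  \quad \text{s.t.} \quad \|Ax - b\|_2 \le \epsilon
\end{align*}
```

Under this reading, the $\ell_0$ term is what makes the first two problems combinatorial (hence the mixed-integer reformulation), while BPDN replaces it with the convex $\ell_1$ norm.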
# スキルクリティカル:階層的強化学習のための学習スキルの精製 Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning ( http://arxiv.org/abs/2306.08388v3 ) ライセンス: Link先を確認	Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan,	(参考訳) 階層的強化学習(RL)は、政策を時間的に複数のレベルに抽象化することで、長期的な意思決定を促進することができる。スパース報酬環境における評価結果は、スキル、すなわちプリミティブアクションのシーケンスで見られる。通常、スキル潜在空間とポリシーはオフラインデータから発見される。しかしながら、結果として生じる低レベルのポリシーは、低カバレッジのデモンストレーションや分散シフトのために信頼性が低い可能性がある。そこで本研究では,Skill-Criticアルゴリズムを用いて,ハイレベルなスキル選択とともに低レベルなポリシーを微調整する手法を提案する。我々のスキル・クリティカル・アルゴリズムは、低レベルと高レベルの両方のポリシーを最適化する。これらのポリシーは、オフラインのデモから学んだ潜在空間によって初期化され、規則化され、並列ポリシーの最適化を導く。複数のスパース・リワードRL環境におけるスキル・クリティカルの評価を行い,グラナ・トゥリストスポーツにおけるスパース・リワード自律レースタスクについて検討した。実験の結果,Skill-Criticの低レベル政策の微調整と実演誘導型正規化が性能向上に不可欠であることが示唆された。コードとビデオは、私たちのWebサイト(https://sites.google.com/view/skill-critic)で入手できる。 Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e. sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. 
Code and videos are available at our website: https://sites.google.com/view/skill-critic.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 不均一HPCプラットフォームのためのディープラーニングハードウェアアクセラレータに関する調査 A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms ( http://arxiv.org/abs/2306.15552v2 ) ライセンス: Link先を確認	Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Fabio Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke, Stefania Perri,	(参考訳) 近年のディープラーニング(DL)は、画像分類、コンピュータビジョン、音声認識などの高性能コンピューティング(HPC)アプリケーションにおいて、ハードウェアアクセラレーターを最も有効なソリューションとして採用している。本調査は,HPCアプリケーションの性能要件に適合するDLアクセラレータの設計における最新の進歩を要約し,分類する。特に、GPUやTPUベースのアクセラレータだけでなく、FPGAベースのアクセラレータやASICベースのアクセラレータ、Neural Processing Units、オープンハードウェアRISC-Vベースのアクセラレータ、コプロセッサといった、設計固有のハードウェアアクセラレータを含む、ディープラーニングアクセラレーションをサポートする最も高度なアプローチを強調している。このサーベイでは、新しいメモリ技術とコンピューティングパラダイムに基づくアクセラレータ、例えば3Dスタックされたプロセッサ・インメモリ、不揮発性メモリ(主に抵抗RAMと位相変化メモリ)をインメモリコンピューティングを実装するためのアクセラレータ、ニューロモーフィック処理ユニット、マルチチップモジュールに基づくアクセラレータについても説明している。新興技術の中には、量子ベースの加速器やフォトニクスに関する洞察も含まれています。結論として、この調査は、ディープラーニングの急速に発展する分野において、読者に包括的な視点を提供することを目的として、過去数年間に提案された最も影響力のあるアーキテクチャと技術を分類する。 Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the performance requirements of HPC applications. In particular, it highlights the most advanced approaches to support deep learning accelerations including not only GPU and TPU-based accelerators but also design-specific hardware accelerators such as FPGA-based and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators and co-processors. 
The survey also describes accelerators based on emerging memory technologies and computing paradigms, such as 3D-stacked Processor-In-Memory, non-volatile memories (mainly, Resistive RAM and Phase Change Memories) to implement in-memory computing, Neuromorphic Processing Units, and accelerators based on Multi-Chip Modules. Among emerging technologies, we also include some insights into quantum-based accelerators and photonics. To conclude, the survey classifies the most influential architectures and technologies proposed in the last years, with the purpose of offering the reader a comprehensive perspective in the rapidly evolving field of deep learning.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 分布シフトのためのモデリング言語の必要性について: 表形式データセットによる例示 On the Need of a Modeling Language for Distribution Shifts: Illustrations on Tabular Datasets ( http://arxiv.org/abs/2307.05284v3 ) ライセンス: Link先を確認	Jiashuo Liu, Tianyu Wang, Peng Cui, Hongseok Namkoong,	(参考訳) 異なる分布シフトには異なる介入が必要であり、アルゴリズムは対処する特定のシフトに基づいていなければならない。しかし、ロバストなアルゴリズムの方法論的開発は、一般に経験的検証を欠いた構造的仮定に依存している。経験的根拠に基づくデータ駆動型の研究アプローチを提唱し、5つの表形式データセットにわたる自然なシフトと、不均衡学習および分布的ロバスト最適化(DRO)手法を含む6万のメソッド構成からなる実験的なテストベッドを構築した。ML文献が$X$(共変量)シフトに重きを置いているのとは対照的に、我々のテストベッドでは$Y|X$シフトが最も多く見られる。ロバストなアルゴリズムの性能はシフトの種類によって大きく異なり、バニラ法を上回るものではない。その理由を理解するためにDRO手法の詳細な実験分析を行い、研究者によってしばしば無視されるものの、基礎となるモデルクラス(例えばXGBoost)の選択やハイパーパラメータ選択といった実装の詳細が、曖昧性集合やその半径よりも性能に大きな影響を与えることを発見した。方法論的な研究と実践のギャップをさらに埋めるために、このようなデータ駆動型で帰納的な分布シフトの理解が、データ中心の介入とアルゴリズム的介入の両方をいかに強化するかを示すケーススタディを設計する。 Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating for an empirically grounded data-driven approach to research, we build an empirical testbed comprising natural shifts across 5 tabular datasets and 60,000 method configurations encompassing imbalanced learning and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on $X$ (covariate)-shifts in the ML literature. The performance of robust algorithms varies significantly over shift types, and is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that although often neglected by researchers, implementation details -- such as the choice of underlying model class (e.g., XGBoost) and hyperparameter selection -- have a bigger impact on performance than the ambiguity set or its radius. 
To further bridge that gap between methodological research and practice, we design case studies that illustrate how such a data-driven, inductive understanding of distribution shifts can enhance both data-centric and algorithmic interventions.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# (2+1)D SU(2) Yang-Mills格子ゲージ理論のテンソルネットワークによる有限密度シミュレーション Simulating (2+1)D SU(2) Yang-Mills Lattice Gauge Theory at finite density with tensor networks ( http://arxiv.org/abs/2307.09396v3 ) ライセンス: Link先を確認	Giovanni Cataldi, Giuseppe Magnifico, Pietro Silvi, Simone Montangero,	(参考訳) 我々は、2次元空間における非アーベル格子ゲージ理論を、テンソルネットワーク(TN)を用いて、厳密対角化の範囲を大きく超える中間的なサイズ(30を超える物質サイト)まで数値的にシミュレートする。動的物質と最小限に切り詰めたゲージ場(ハードコアグルーオン)を持つ、ハミルトニアン形式のSU(2) Yang-Millsモデルに焦点をあてる。符号問題のないTNアプローチにより、クォークの裸の質量と色電荷の関数として、ゼロおよび有限バリオン数におけるモデルの相図を特徴づける。中間的な系サイズでは、クォーク対の束縛状態準粒子(バリオン)からなる液体相を検出し、その質量は連続極限に向かって有限である。色電場項と色磁場項が最大限にフラストレートする転移境界では興味深い現象が生じる。低クォーク質量では非閉じ込めの可能性を示す痕跡が見られ、高質量ではトポロジカル秩序の可能性を示す兆候が見られる。 We numerically simulate a non-Abelian lattice gauge theory in two spatial dimensions, with tensor networks (TN), up to intermediate sizes (>30 matter sites) well beyond exact diagonalization. We focus on the SU(2) Yang-Mills model in Hamiltonian formulation, with dynamical matter and minimally truncated gauge field (hardcore gluon). Thanks to the TN sign-problem-free approach, we characterize the phase diagram of the model at zero and finite baryon number as a function of the quark bare mass and color charge. At intermediate system sizes, we detect a liquid phase of quark-pair bound-state quasiparticles (baryons), whose mass is finite towards the continuum limit. Interesting phenomena arise at the transition boundary where color-electric and color-magnetic terms are maximally frustrated: For low quark masses, we see traces of potential deconfinement, while for high masses, signatures of a possible topological order.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 自律運転システムのテスト・改善のための境界状態生成 Boundary State Generation for Testing and Improvement of Autonomous Driving Systems ( http://arxiv.org/abs/2307.10590v2 ) ライセンス: Link先を確認	Matteo Biagiola, Paolo Tonella,	(参考訳) 近年のディープニューラルネットワーク(DNN)とセンサ技術の進歩により、自律運転システム(ADS)の自律性はますます高まっている。しかし、その信頼性を評価することは依然として重要な問題である。最先端のADSテストアプローチでは、シミュレーション運転環境の制御可能な属性をADSが誤動作するまで変更する。このようなアプローチでは、ADSが成功している環境インスタンスは、ADSが誤動作する可能性のある隠れ運転条件を含む可能性があるにもかかわらず、破棄される。本稿では, ADS テストのための新しいテストジェネレータ GENBO (generator of Boundary State pairs) を提案する。 GENBOは、障害のない環境インスタンスで収集されたエゴ車両の駆動条件(位置、速度、方向)を変更し、同一環境インスタンス内の動作境界(すなわち、モデルが誤動作し始める場所)における挑戦駆動条件を効率よく生成する。このような境界条件を用いて、初期トレーニングデータセットを拡張し、テスト中のDNNモデルを再訓練する。評価結果から,再学習モデルでは,元のDNNモデルに対して,異なる評価トラックに対して平均で最大3倍の成功率を示した。 Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. In such approaches, environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GENBO (GENerator of BOundary state pairs), a novel test generator for ADS testing. GENBO mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment instance. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has, on average, up to 3x higher success rate on a separate set of evaluation tracks with respect to the original DNN model.	
翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# D2S: カメラ再局在のためのスパース記述子と3次元座標の表現 D2S: Representing sparse descriptors and 3D coordinates for camera relocalization ( http://arxiv.org/abs/2307.15250v3 ) ライセンス: Link先を確認	Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee,	(参考訳) 最先端の視覚的ローカライゼーション手法は、主にローカル記述子と3Dポイントクラウドにマッチする複雑な手順に依存している。しかし、これらの手順は時間とともに推論、ストレージ、更新の点でかなりのコストを発生させる可能性がある。本研究では,複雑な局所記述子とそのシーン座標を表現するために,D2Sという単純なネットワークを用いた直接学習に基づくアプローチを提案する。その単純さと費用対効果が特徴である。テスト段階では、単一のRGBイメージをローカライズにのみ利用し、複雑なスパースシーンをエンコードするための軽量モデルのみを必要とする。提案したD2Sは、単純な損失関数とグラフアテンションを組み合わせて、雲や木、いくつかの動的オブジェクトなどの領域を無視しながら、堅牢な記述子に選択的にフォーカスする。この選択的な注意により、D2Sはスパースディスクリプタのバイナリ・セマンティック分類を効果的に行うことができる。さらに,シーン特異的な一般化とラベルなし観測による自己更新における視覚的局所化手法の能力を評価するための簡易な屋外データセットを提案する。本手法は,屋内および屋外環境におけるシーン座標の回帰において,最先端のCNN手法よりも優れる。ラベル付きデータソースがなくても、昼から夜への移行やドメインシフトへの適応といったシナリオを含む、トレーニングデータを超えて一般化する能力を示している。ソースコード、トレーニングされたモデル、データセット、デモビデオは以下のリンクで入手できる。 State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It solely leverages a single RGB image for localization during the testing phase and only requires a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. 
Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the state-of-the-art CNN-based methods in scene coordinate regression in indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts, even in the absence of the labeled data sources. The source code, trained models, dataset, and demo videos are available at the following link: https://thpjp.github.io/d2s.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# DISQ: 変分量子アルゴリズムのための動的反復スキップ DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms ( http://arxiv.org/abs/2308.06634v3 ) ライセンス: Link先を確認	Junyao Zhang, Hanrui Wang, Gokul Subramanian Ravi, Frederic T. Chong, Song Han, Frank Mueller, Yiran Chen,	(参考訳) 本稿では、VQAトレーニングのための安定したランドスケープを構築し、ノイズドリフトの課題に取り組むDISQを提案する。DISQは参照回路を備えた「ドリフト検出器」を採用し、ノイズドリフト誤差によって深刻な影響を受けた反復を特定してスキップする。具体的には、前回のトレーニング反復の回路を現在の反復において参照回路として再実行し、ノイズドリフトの影響を推定する。ノイズドリフトが理想的な最適化勾配の方向を反転させた場合、その反復はノイズドリフト誤差によって損なわれたとみなされ、スキップされる。ノイズドリフト検出の信頼性を高めるため、過去の反復から複数の参照回路を利用して、現在のノイズドリフトを十分な根拠をもって判定する手法をさらに提案する。ただし、複数の参照回路はかなりの実行オーバーヘッドをもたらす。この追加オーバーヘッドを軽減するため、ドリフト検出時には係数の絶対値が大きい観測量回路(プライムサブセット)のみを実行するPauli項サブセッティング(プライムサブセットとマイナーサブセット)を提案する。現在の反復がドリフトフリーである場合、このマイナーサブセットのみが実行される。様々なアプリケーションおよびQPUでの評価により、DISQはVQAに対するノイズドリフトの影響のかなりの部分を緩和し、従来のベースラインに対して1.51-2.24倍の忠実度向上を達成できることが示された。DISQの利点は最良の代替手法に対して1.1-1.9倍であり、平均ノイズ検出速度は2.07倍に向上する。 This paper proposes DISQ to craft a stable landscape for VQA training and tackle the noise drift challenge. DISQ adopts a "drift detector" with a reference circuit to identify and skip iterations that are severely affected by noise drift errors. Specifically, the circuits from the previous training iteration are re-executed as a reference circuit in the current iteration to estimate noise drift impacts. The iteration is deemed compromised by noise drift errors and thus skipped if noise drift flips the direction of the ideal optimization gradient. To enhance noise drift detection reliability, we further propose to leverage multiple reference circuits from previous iterations to provide a well-founded judgment of the current noise drift. Nevertheless, multiple reference circuits also introduce considerable execution overhead. To mitigate extra overhead, we propose Pauli-term subsetting (prime and minor subsets) to execute only observable circuits with large coefficient magnitudes (prime subset) during drift detection. Only this minor subset is executed when the current iteration is drift-free. 
Evaluations across various applications and QPUs demonstrate that DISQ can mitigate a significant portion of the noise drift impact on VQAs and achieve 1.51-2.24x fidelity improvement over the traditional baseline. DISQ's benefit is 1.1-1.9x over the best alternative approach while boosting average noise detection speed by 2.07x.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
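The skip rule described in the DISQ abstract (re-run the previous iteration's circuit as a reference, then skip the iteration if drift flips the apparent optimization direction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, its arguments, and the use of scalar cost values in place of a full gradient are all assumptions made here for illustration.

```python
def drift_flips_direction(e_prev_then: float,
                          e_prev_now: float,
                          e_cur_now: float) -> bool:
    """Hypothetical drift check on scalar cost values.

    e_prev_then: cost of the previous parameters, measured last iteration.
    e_prev_now:  cost of the same parameters, re-measured now (reference circuit).
    e_cur_now:   cost of the current parameters, measured now.
    """
    # Step as the optimizer sees it, mixing measurements taken under
    # two different noise conditions.
    observed_step = e_cur_now - e_prev_then
    # Step with both measurements taken under the current noise condition.
    drift_corrected_step = e_cur_now - e_prev_now
    # Opposite signs mean drift reversed the apparent direction of progress.
    return observed_step * drift_corrected_step < 0


# Example: the reference re-run shows the old parameters now cost -0.8,
# so the new point (-0.9) is an improvement even though it looks worse than -1.0.
skip = drift_flips_direction(e_prev_then=-1.0, e_prev_now=-0.8, e_cur_now=-0.9)
print("skip this iteration:", skip)
```

The design point the sketch tries to capture is that the comparison is only meaningful when both values come from the same noise condition, which is why the reference circuit is re-executed in the current iteration.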
# MetaWeather: 天気が劣化した画像の復元 MetaWeather: Few-Shot Weather-Degraded Image Restoration ( http://arxiv.org/abs/2308.14334v4 ) ライセンス: Link先を確認	Youngrae Kim, Younggeol Cho, Thanh-Tung Nguyen, Seunghoon Hong, Dongman Lee,	(参考訳) 実際の気象条件は複雑で、しばしば同時に起こる。しかし、既存の修復アプローチのほとんどは、訓練データにおける特定の気象条件の適用性に制限されており、実際の気象条件を含む目に見えない気象タイプへの一般化に苦慮している。この問題を解決するために,メタウェザー(MetaWeather)という,多種多様な新しい気象条件を単一統一モデルで処理できる普遍的なアプローチを導入する。メタウェザーは、強力なメタラーニングフレームワークを拡張し、気象劣化画像復元のタスクを、クエリ画像の劣化パターンを予測する数ショット適応問題として定式化し、新しい空間型マッチングアルゴリズムにより、目に見えない気象条件に適応することを学ぶ。 BID Task II.A, SPA-Data, RealSnow のデータセットによる実験結果から,提案手法が観測不能な気象条件に適応可能であることを示す。 Real-world weather conditions are intricate and often occur concurrently. However, most existing restoration approaches are limited in their applicability to specific weather conditions in training data and struggle to generalize to unseen weather types, including real-world weather conditions. To address this issue, we introduce MetaWeather, a universal approach that can handle diverse and novel weather conditions with a single unified model. Extending a powerful meta-learning framework, MetaWeather formulates the task of weather-degraded image restoration as a few-shot adaptation problem that predicts the degradation pattern of a query image, and learns to adapt to unseen weather conditions through a novel spatial-channel matching algorithm. Experimental results on the BID Task II.A, SPA-Data, and RealSnow datasets demonstrate that the proposed method can adapt to unseen weather conditions, significantly outperforming the state-of-the-art multi-weather image restoration methods.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 医用画像登録のためのオンザフライ指導 On-the-Fly Guidance Training for Medical Image Registration ( http://arxiv.org/abs/2308.15216v5 ) ライセンス: Link先を確認	Yuelin Xin, Yicheng Chen, Shengxiang Ji, Kun Han, Xiaohui Xie,	(参考訳) 本研究は,既存の学習ベース画像登録モデルを改善するための新しいトレーニングフレームワークであるOn-the-Fly Guidance(OFG)を紹介し,弱教師付きおよび教師なし手法の限界に対処する。ラベル付きデータの不足により、弱教師付き手法は困難であり、教師なし手法は正確性のために画像類似度指標に直接依存する。本手法では,ラベル付きデータを必要としない登録モデルをトレーニングするための教師付き手法を提案する。 OFGは、変形予測を微分可能なオプティマイザで精錬することにより、トレーニング中に擬似地下真理を生成する。 OFGは変形予測を効率的に最適化し、推論速度を犠牲にすることなく、登録モデルの性能を向上させる。提案手法は,複数のベンチマークデータセットおよび先行モデルで検証され,性能が大幅に向上し,学習ベース登録モデルの訓練のためのプラグアンドプレイソリューションが提供される。 https://github.com/cilix-ai/on-the-fly-guidance This study introduces a novel On-the-Fly Guidance (OFG) training framework for enhancing existing learning-based image registration models, addressing the limitations of weakly-supervised and unsupervised methods. Weakly-supervised methods struggle due to the scarcity of labeled data, and unsupervised methods directly depend on image similarity metrics for accuracy. Our method proposes a supervised fashion for training registration models, without the need for any labeled data. OFG generates pseudo-ground truth during training by refining deformation predictions with a differentiable optimizer, enabling direct supervised learning. OFG optimizes deformation predictions efficiently, improving the performance of registration models without sacrificing inference speed. Our method is tested across several benchmark datasets and leading models, it significantly enhanced performance, providing a plug-and-play solution for training learning-based registration models. Code available at: https://github.com/cilix-ai/on-the-fly-guidance	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 地域差分私的プロトコルの真のコストを明らかにする:監査的視点 Revealing the True Cost of Locally Differentially Private Protocols: An Auditing Perspective ( http://arxiv.org/abs/2309.01597v3 ) ライセンス: Link先を確認	Héber H. Arcolezi, Sébastien Gambs,	(参考訳) 従来のDP監査は,集中型モデル(例えば,DP-SGDアルゴリズムの監査)に主眼を置いているが,我々は,この手法をローカルDP(LDP)監査に拡張することを提唱している。これを実現するために,ローカルな差分秘密機構のプライバシー損失を実証的に推定する LDP-Auditor フレームワークを提案する。このアプローチは、LDP周波数推定プロトコルに対するプライバシー攻撃の設計における最近の進歩を活用する。より正確には、数多くの最先端のLDPプロトコルの分析を通じて、異なるエンコーディングや摂動関数の影響など、プライバシー監査に影響を与える要因を幅広く検討する。さらに、ドメインサイズと理論的プライバシ損失パラメータ$\epsilon$と$\delta$が局所的なプライバシ推定に与える影響についても検討する。また, 長期研究用LDPプロトコルに対する識別可能性攻撃や多次元データなど, LDP監査の具体的な側面を明らかにするために, 詳細なケーススタディも実施されている。最後に,現在最先端の LDP Python パッケージにバグが発見されている LDP-Auditor フレームワークの顕著な成果を示す。 LDPプロトコルにおけるランダム性や情報損失の源泉について,我々のLDP-Auditorフレームワークおよび本研究は,総合的に貴重な知見を提供する。これらのコントリビューションは、局所的なプライバシ損失の現実的な理解を提供するもので、実践者がそれぞれの要求に最も適した LDP メカニズムとプライバシパラメータを選択するのに役立ちます。我々は LDP-Auditor in \url{https://github.com/hharcolezi/ldp-audit} をオープンソース化した。 While the existing literature on Differential Privacy (DP) auditing predominantly focuses on the centralized model (e.g., in auditing the DP-SGD algorithm), we advocate for extending this approach to audit Local DP (LDP). To achieve this, we introduce the LDP-Auditor framework for empirically estimating the privacy loss of locally differentially private mechanisms. This approach leverages recent advances in designing privacy attacks against LDP frequency estimation protocols. More precisely, through the analysis of numerous state-of-the-art LDP protocols, we extensively explore the factors influencing the privacy audit, such as the impact of different encoding and perturbation functions. Additionally, we investigate the influence of the domain size and the theoretical privacy loss parameters $\epsilon$ and $\delta$ on local privacy estimation. 
In-depth case studies are also conducted to explore specific aspects of LDP auditing, including distinguishability attacks on LDP protocols for longitudinal studies and multidimensional data. Finally, we present a notable achievement of our LDP-Auditor framework, which is the discovery of a bug in a state-of-the-art LDP Python package. Overall, our LDP-Auditor framework as well as our study offer valuable insights into the sources of randomness and information loss in LDP protocols. These contributions collectively provide a realistic understanding of the local privacy loss, which can help practitioners in selecting the LDP mechanism and privacy parameters that best align with their specific requirements. We open-sourced LDP-Auditor in \url{https://github.com/hharcolezi/ldp-audit}.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 自動運転知覚における深層学習の安全性に関する考察 Deep Learning Safety Concerns in Automated Driving Perception ( http://arxiv.org/abs/2309.03774v3 ) ライセンス: Link先を確認	Stephanie Abrecht, Alexander Hirsch, Shervin Raafatnia, Matthias Woehrle,	(参考訳) 近年のディープラーニング分野の進歩と認識のためのディープニューラルネットワーク(DNN)の性能向上により、自動走行(AD)システムへの需要が高まっている。このようなシステムの安全性は極めて重要であるため、DNNのユニークな特性を考慮する必要がある。系統的かつ包括的アプローチでDNNに基づく認識コンポーネントを用いたADシステムの安全性を実現するために,いわゆる安全懸念が適切な構造要素として導入されている。一方、安全上の懸念という概念は、ISO 21448(SOTIF)のようなADシステムの安全性に関する既存の標準によく適合している。一方、すでにいくつかの学術出版物や、ISO PAS 8800のようなAI安全性に関する今後の標準に触発されている。安全に関する概念は以前から紹介されてきたが,本論文では,様々な分野の専門家や安全専門家からのフィードバックを活用して,その拡張と改良を行っている。特に,クロスファンクショナルなチームが共同で関心事に対処できるようにすると同時に,理解を深めるための新たな分類を導入する。 Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires to consider the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive approach, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is -- by design -- well aligned to existing standards relevant for safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns.	翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class-Agnostic Counting ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting ( http://arxiv.org/abs/2309.04820v2 ) ライセンス: Link先を確認	Michael A. Hobley, Victor A. Prisacariu,	(参考訳) クラスに依存しないカウントメソッドは任意のクラスのオブジェクトを列挙する。以前の作業では、カウントされる型の例のセットか、クエリイメージが単一のタイプのオブジェクトのみを含む必要があるため、有用性が制限されていた。これらの欠点の重要な要因は、複数の種類のオブジェクトが存在する設定におけるカウントに適切に対処するデータセットがないことである。これらの問題に対処するため、トレーニングや推論の例を使わずに複数の種類のオブジェクトを同時にカウントする手法であるMCAC(Multi-class-Agnostic Counting dataset)とABC123(A Blind Counter)を提案する。 ABC123は新しいパラダイムを導入し、例題を列挙をガイドする代わりに、ユーザが生成した出力を理解するのを助けるために、カウントステージの後に例が見つかる。 ABC123は,ヒトのループ内アノテーションを必要とせず,MCACの現代的な手法よりも優れていることを示す。また、この性能は標準クラスに依存しないカウントデータセットであるFSC-147に転送されることを示す。 MCACはMCAC.active.visionで、ABC123はABC123.active.visionで入手できる。 Class-agnostic counting methods enumerate objects of an arbitrary class, providing tremendous utility in many fields. Prior works have limited usefulness as they require either a set of examples of the type to be counted or that the query image contains only a single type of object. A significant factor in these shortcomings is the lack of a dataset to properly address counting in settings with more than one kind of object present. To address these issues, we propose the first Multi-class, Class-Agnostic Counting dataset (MCAC) and A Blind Counter (ABC123), a method that can count multiple types of objects simultaneously without using examples of type during training or inference. ABC123 introduces a new paradigm where instead of requiring exemplars to guide the enumeration, examples are found after the counting stage to help a user understand the generated outputs. We show that ABC123 outperforms contemporary methods on MCAC without needing human in-the-loop annotations. We also show that this performance transfers to FSC-147, the standard class-agnostic counting dataset. MCAC is available at MCAC.active.vision and ABC123 is available at ABC123.active.vision.	
翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
# 地理空間気象データに基づく深部ニューラルネットワークによる長期干ばつ予測 Long-term drought prediction using deep neural networks based on geospatial weather data ( http://arxiv.org/abs/2309.06212v6 ) ライセンス: Link先を確認	Alexander Marusov, Vsevolod Grabar, Yury Maximov, Nazar Sotiriadi, Alexander Bulkin, Alexey Zaytsev,	(参考訳) 農業計画や保険には1年前から予測される高品質の干ばつの問題が不可欠である。しかし、データの複雑さと乾燥確率性のために、妥当な精度で解決されていない。我々は、月次気象データを入力としてアクセス可能な時空間ニューラルネットワークモデルを採用するエンドツーエンドアプローチを導入することで、干ばつデータに取り組む。本研究は,Palmer Drought Severity Index(PDSI)予測の有効性を評価するために,多種多様なモデルと5つの異なる環境領域を用いた。重要な集約された発見は、TransformerモデルであるEarthFormerの、正確な短期(最大6ヶ月)の予測における例外的なパフォーマンスである。同時に、畳み込みLSTMは長期的な予測に優れている。 The problem of high-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance. Yet, it is still unsolved with reasonable accuracy due to data complexity and aridity stochasticity. We tackle drought data by introducing an end-to-end approach that adopts a spatio-temporal neural network model with accessible open monthly climate data as the input. Our systematic research employs diverse proposed models and five distinct environmental regions as a testbed to evaluate the efficacy of the Palmer Drought Severity Index (PDSI) prediction. Key aggregated findings are the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts. At the same time, the Convolutional LSTM excels in longer-term forecasting.	翻訳日:2024-07-16 05:46:55 公開日:2024-07-12
# A Comprehensive Review of Community Detection in Graphs ( http://arxiv.org/abs/2309.11798v5 ) License: see link	Jiakang Li, Songning Lai, Zhihao Shuai, Yuan Tan, Yifan Jia, Mianyang Yu, Zichen Song, Xiaokang Peng, Ziyang Xu, Yongxin Ni, Haifeng Qiu, Jiayu Yang, Yutong Liu, Yonggang Lu	The study of complex networks has significantly advanced our understanding of community structures, which serve as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs and serves as a thorough exposition of various community detection methods from the perspectives of modularity-based methods, spectral clustering, probabilistic modelling, and deep learning. Along with these methods, a new community detection method designed by us is also presented. Additionally, the performance of these methods on datasets with and without ground truth is compared. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Streaming quantum state purification ( http://arxiv.org/abs/2309.16387v2 ) License: see link	Andrew M. Childs, Honghao Fu, Debbie Leung, Zhi Li, Maris Ozols, Vedang Vyas	Quantum state purification is the task of recovering a nearly pure copy of an unknown pure quantum state using multiple noisy copies of the state. This basic task has applications to quantum communication over noisy channels and quantum computation with imperfect devices, but has only been studied previously for the case of qubits. We derive an efficient purification procedure based on the swap test for qudits of any dimension, starting with any initial error parameter. Treating the initial error parameter and the dimension as constants, we show that our procedure has sample complexity asymptotically optimal in the final error parameter. Our protocol has a simple recursive structure that can be applied when the states are provided one at a time in a streaming fashion, requiring only a small quantum memory to implement.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models ( http://arxiv.org/abs/2310.01691v2 ) License: see link	Zijun Wu, Yongkang Wu, Lili Mou	Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks. However, the transferability of these prompts, especially continuous prompts, between different models remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method, where source prompts are encoded into relative space and the corresponding target prompts are searched for transferring to target models. Experimental results confirm the effectiveness of our method, showing that 'task semantics' in continuous prompts can be generalized across various language models. Moreover, we find that combining 'task semantics' from multiple source models can further enhance the generalizability of transfer.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology ( http://arxiv.org/abs/2310.05227v5 ) License: see link	Qingsong Xu, Yilei Shi, Jonathan Bamber, Ye Tuo, Ralf Ludwig, Xiao Xiang Zhu	Accurate hydrological understanding and water cycle prediction are crucial for addressing scientific and societal challenges associated with the management of water resources, particularly under the dynamic influence of anthropogenic climate change. Existing reviews predominantly concentrate on the development of machine learning (ML) in this field, yet there is a clear distinction between hydrology and ML as separate paradigms. Here, we introduce physics-aware ML as a transformative approach to overcome the perceived barrier and revolutionize both fields. Specifically, we present a comprehensive review of the physics-aware ML methods, building a structured community (PaML) of existing methodologies that integrate prior physical knowledge or physics-based modeling into ML. We systematically analyze these PaML methodologies with respect to four aspects: physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning. PaML facilitates ML-aided hypotheses, accelerating insights from big data and fostering scientific discoveries. We first conduct a systematic review of hydrology in PaML, including rainfall-runoff hydrological processes and hydrodynamic processes, and highlight the most promising and challenging directions for different objectives and PaML methods. Finally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications. HydroPML enhances the explainability and causality of ML and lays the groundwork for the digital water cycle's realization. The HydroPML platform is publicly available at https://hydropml.github.io/.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# HiFi-123: Towards High-fidelity One Image to 3D Content Generation ( http://arxiv.org/abs/2310.06744v3 ) License: see link	Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Wenbo Hu, Long Quan, Ying Shan, Yonghong Tian	Recent advances in diffusion models have enabled 3D generation from a single image. However, current methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods. Second, capitalizing on the RGNV, we present a novel Reference-Guided State Distillation (RGSD) loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive evaluations demonstrate the effectiveness of our approach over existing methods, both qualitatively and quantitatively. Video results are available on the project page.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Detecting Visual Cues in the Intensive Care Unit and Association with Patient Clinical Status ( http://arxiv.org/abs/2311.00565v2 ) License: see link	Subhash Nerella, Ziyuan Guan, Andrea Davidson, Yuanfang Ren, Tezcan Baslanti, Brooke Armfield, Patrick Tighe, Azra Bihorac, Parisa Rashidi	Intensive Care Units (ICU) provide close supervision and continuous care to patients with life-threatening conditions. However, continuous patient assessment in the ICU is still limited due to time constraints and the workload on healthcare providers. Existing patient assessments in the ICU such as pain or mobility assessment are mostly sporadic and administered manually, thus introducing the potential for human errors. Developing artificial intelligence (AI) tools that can augment human assessments in the ICU can be beneficial for providing more objective and granular monitoring capabilities. For example, capturing the variations in a patient's facial cues related to pain or agitation can help in adjusting pain-related medications or detecting agitation-inducing conditions such as delirium. Additionally, subtle changes in visual cues during or prior to adverse clinical events could potentially aid in continuous patient monitoring when combined with high-resolution physiological signals and Electronic Health Record (EHR) data. In this paper, we examined the association between visual cues and patient condition including acuity status, acute brain dysfunction, and pain. We leveraged our AU-ICU dataset with 107,064 frames collected in the ICU annotated with facial action unit (AU) labels by trained annotators. We developed a new "masked loss computation" technique that addresses the data imbalance problem by maximizing data resource utilization. We trained the model using our AU-ICU dataset in conjunction with three external datasets to detect 18 AUs. The SWIN Transformer model achieved 0.57 mean F1-score and 0.89 mean accuracy on the test set. Additionally, we performed AU inference on 634,054 frames to evaluate the association between facial AUs and clinically important patient conditions such as acuity status, acute brain dysfunction, and pain.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# On the estimation of the number of components in multivariate functional principal component analysis ( http://arxiv.org/abs/2311.04540v2 ) License: see link	Steven Golovkine, Edward Gunning, Andrew J. Simpkin, Norma Bargary	Happ and Greven (2018) developed a methodology for principal components analysis of multivariate functional data for data observed on different dimensional domains. Their approach relies on an estimation of univariate functional principal components for each univariate functional feature. In this paper, we present extensive simulations to investigate choosing the number of principal components to retain. We show empirically that the conventional approach of using a percentage of variance explained threshold for each univariate functional feature may be unreliable when aiming to explain an overall percentage of variance in the multivariate functional data, and thus we advise practitioners to be careful when using it.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# A Spin-Optical Quantum Computing Architecture ( http://arxiv.org/abs/2311.05605v4 ) License: see link	Grégoire de Gliniasty, Paul Hilaire, Pierre-Emmanuel Emeriau, Stephen C. Wein, Alexia Salavrakos, Shane Mansfield	We introduce an adaptable and modular hybrid architecture designed for fault-tolerant quantum computing. It combines quantum emitters and linear-optical entangling gates to leverage the strength of both matter-based and photonic-based approaches. A key feature of the architecture is its practicality, grounded in the utilisation of experimentally proven optical components. Our framework enables the execution of any quantum error correcting code, but in particular maintains scalability for low-density parity check codes by exploiting built-in non-local connectivity through distant optical links. To gauge its efficiency, we evaluated the architecture using a physically motivated error model. It exhibits loss tolerance comparable to existing all-photonic architectures but without the need for intricate linear-optical resource-state-generation modules that conventionally rely on resource-intensive multiplexing. The versatility of the architecture also offers uncharted avenues for further advancing performance standards.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Maximum expectation of observables with restricted purity states ( http://arxiv.org/abs/2311.07680v2 ) License: see link	Vikesh Siddhu, John Smolin	Assessment of practical quantum information processing (QIP) remains partial without understanding limits imposed by noise. Unfortunately, mere description of noise grows exponentially with system size, becoming cumbersome even for modest-sized systems of imminent practical interest. We fulfill the need for estimates on performing noisy quantum state preparation, verification, and observation. To do the estimation we propose fast numerical algorithms to maximize the expectation value of any $d$-dimensional observable over states of bounded purity. This bound on purity factors in noise in a measurable way. Our fastest algorithm takes $O(d)$ steps if the eigendecomposition of the observable is known, otherwise takes $O(d^3)$ steps at worst. The algorithms also solve maximum likelihood estimation for quantum state tomography with convex and even non-convex purity constraints. Numerics show performance of our key sub-routine (it finds in linear time a probability vector with bounded norm that most overlaps with a fixed vector) can be several orders of magnitude faster than a common state-of-the-art convex optimization solver. Our work fosters a practical way forward to assess limitations on QIP imposed by quantum noise. Along the way, we also give a simple but fundamental insight: noisy systems (equivalently noisy Hamiltonians) always give higher ground-state energy than their noiseless counterparts.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization ( http://arxiv.org/abs/2311.09184v2 ) License: see link	Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan	While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for desired summary characteristics. To this end, we curate an evaluation-only dataset for this task setting and conduct human evaluations of five LLM-based systems to assess their instruction-following capabilities in controllable summarization. We then benchmark LLM-based automatic evaluation for this task with 4 different evaluation protocols and 11 LLMs, resulting in 40 evaluation methods. Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) no LLM-based evaluation methods can achieve a strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation capabilities. We make our collected benchmark InstruSum publicly available to facilitate future research in this direction.	Translated: 2024-07-16 05:46:55 Published: 2024-07-12
# Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning ( http://arxiv.org/abs/2311.09821v2 ) License: see link	Qingyu Tan, Hwee Tou Ng, Lidong Bing	Knowledge in the real world is being updated constantly. However, it is costly to frequently update large language models (LLMs). Therefore, it is crucial for LLMs to understand the concept of temporal knowledge. However, prior works on temporal question answering (TQA) did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning. Besides, we also propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conducted experiments on multiple temporal QA datasets. Experimental results show that our method is able to improve LLMs' performance on temporal QA benchmarks by significant margins. Our code and data are released at: https://github.com/nusnlp/complex-tr.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# Spectroscopy of elementary excitations from quench dynamics in a dipolar XY Rydberg simulator ( http://arxiv.org/abs/2311.11726v2 ) License: see link	Cheng Chen, Gabriel Emperauger, Guillaume Bornet, Filippo Caleca, Bastien Gély, Marcus Bintz, Shubhayu Chatterjee, Vincent Liu, Daniel Barredo, Norman Y. Yao, Thierry Lahaye, Fabio Mezzacapo, Tommaso Roscilde, Antoine Browaeys	We use a Rydberg quantum simulator to demonstrate a new form of spectroscopy, called quench spectroscopy, which probes the low-energy excitations of a many-body system. We illustrate the method on a two-dimensional simulation of the spin-1/2 dipolar XY model. Through microscopic measurements of the spatial spin correlation dynamics following a quench, we extract the dispersion relation of the elementary excitations for both ferro- and anti-ferromagnetic couplings. We observe qualitatively different behaviors between the two cases that result from the long-range nature of the interactions, and the frustration inherent in the antiferromagnet. In particular, the ferromagnet exhibits elementary excitations behaving as linear spin waves. In the anti-ferromagnet, spin waves appear to decay, suggesting the presence of strong nonlinearities. Our demonstration highlights the importance of power-law interactions on the excitation spectrum of a many-body system.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation ( http://arxiv.org/abs/2311.12090v2 ) License: see link	Chenliang Zhou, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Fogarty, Alejandro Sztrajman, Hongyun Gao, Cengiz Oztireli	We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad's fast and scalable sampling. Our quantitative and qualitative results demonstrate FrePolad's state-of-the-art performance in terms of quality, diversity, and computational efficiency. Project page: https://chenliang-zhou.github.io/FrePolad/.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser ( http://arxiv.org/abs/2311.15040v3 ) License: see link	Xing Cui, Zekun Li, Pei Pei Li, Huaibo Huang, Xuannan Liu, Zhaofeng He	Stylized text-to-image generation focuses on creating images from textual descriptions while adhering to a style specified by a few reference images. However, subtle style variations within different reference images can hinder the model from accurately learning the target style. In this paper, we propose InstaStyle, a novel approach that excels in generating high-fidelity stylized images with only a single reference image. Our approach is based on the finding that the inversion noise from a stylized reference image inherently carries the style signal, as evidenced by its non-zero signal-to-noise ratio. We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the "style" noise. Additionally, the inherent ambiguity and bias of textual prompts impede the precise conveying of style. To address this, we introduce a learnable style token via prompt refinement, which enhances the accuracy of the style description for the reference image. Qualitative and quantitative experimental results demonstrate that InstaStyle achieves superior performance compared to current benchmarks. Furthermore, our approach also showcases its capability in the creative task of style combination with mixed inversion noise.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2311.17325v2 ) License: see link	Zhen Zhao, Zicheng Wang, Longyue Wang, Dian Yu, Yixuan Yuan, Luping Zhou	Semi-supervised medical image segmentation studies have shown promise in training models with limited labeled data. However, current dominant teacher-student based approaches can suffer from confirmation bias. To address this challenge, we propose AD-MT, an alternate diverse teaching approach in a teacher-student framework. It involves a single student model and two non-trainable teacher models that are momentum-updated periodically and randomly in an alternate fashion. To mitigate the confirmation bias from the diverse supervision, the core of AD-MT lies in two proposed modules: the Random Periodic Alternate (RPA) Updating Module and the Conflict-Combating Module (CCM). The RPA schedules the alternating diverse updating process with complementary data batches, distinct data augmentation, and random switching periods to encourage diverse reasoning from different teaching perspectives. The CCM employs an entropy-based ensembling strategy to encourage the model to learn from both the consistent and conflicting predictions between the teachers. Experimental results demonstrate the effectiveness and superiority of our AD-MT on the 2D and 3D medical segmentation benchmarks across various semi-supervised settings.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations ( http://arxiv.org/abs/2311.17833v3 ) License: see link	Maximilian Augustin, Yannic Neuhaus, Matthias Hein	While deep learning has led to huge progress in complex image classification tasks like ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably these classifiers work in the wild. Furthermore, for safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently. In this paper, we address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation. We analyze the decisions of image classifiers by visual counterfactual explanations (VCEs), detection of systematic mistakes by analyzing images where classifiers maximally disagree, and visualization of neurons and spurious features. In this way, we validate existing observations, e.g. the shape bias of adversarially robust models, as well as novel failure modes, e.g. systematic errors of zero-shot CLIP classifiers. Moreover, our VCEs outperform previous work while being more versatile.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution ( http://arxiv.org/abs/2312.00853v2 ) License: see link	Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang	Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, diffusion models have shown compelling performance in generating realistic details for image restoration tasks. However, the diffusion process has randomness, making it hard to control the contents of restored images. This issue becomes more serious when applying diffusion models to VSR tasks because temporal consistency is crucial to the perceptual quality of videos. In this paper, we propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models. To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow. To further mitigate the discontinuity of generated details, we insert a temporal module into the decoder and fine-tune it with an innovative sequence-oriented loss. The proposed motion-guided latent diffusion (MGLD) based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets, validating the effectiveness of the proposed model design and training strategies.	Translated: 2024-07-16 05:37:11 Published: 2024-07-12
# 空間占有による人間とシーンのインタラクションの再考 Revisit Human-Scene Interaction via Space Occupancy ( http://arxiv.org/abs/2312.02700v2 ) ライセンス: Link先を確認	Xinpeng Liu, Haowen Hou, Yanchao Yang, Yong-Lu Li, Cewu Lu,	(参考訳) HSI(Human-Scene Interaction)の生成は困難なタスクであり、さまざまな下流タスクにとって重要である。しかし、大きな障害の1つは、その限られたデータスケールである。人間と3D環境を同時にキャプチャした高品質なデータは取得が難しく、データの多様性と複雑さが制限される。本研究では、シーンとのインタラクションは、抽象的な物理的視点から見れば、本質的にシーンの空間占有とのインタラクションであると論じ、人間-占有インタラクションという統一的な新しい視点を導く。純粋な動きシーケンスを、見えないシーン占有と相互作用する人間の記録として扱うことで、動きのみのデータを、大規模なペアの人間-占有インタラクションデータベースであるMotion Occupancy Base (MOB)に集約することができる。したがって、高品質なシーンスキャンを伴う高コストなペアのモーション・シーンデータセットの必要性を大幅に軽減することができる。この新たな統一的な人間-占有インタラクションの視点により、周囲の占有状況が与えられたときに目標状態に到達するための単一のモーションコントローラが提案される。人間の動きにとって厳しい、複雑な占有配置を持つMOBで一度訓練すれば、コントローラーは狭いシーンを処理でき、通常のリビングルームのような限られた複雑さを持つ一般的なシーンにもよく一般化する。トレーニング用のGT 3Dシーンがなくても、静的シーンと動的シーンの両方を含む様々なシナリオにおいて、現実的で安定したHSIモーションを生成できる。このプロジェクトは https://foruck.github.io/occu-page/ で入手できる。 Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks. However, one of the major obstacles is its limited data scale. High-quality data with simultaneously captured human and 3D environments is hard to acquire, resulting in limited data diversity and complexity. In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective, leading us to a unified novel view of Human-Occupancy Interaction. By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database: Motion Occupancy Base (MOB). Thus, the need for costly paired motion-scene datasets with high-quality scene scans can be substantially alleviated. With this new unified view of Human-Occupancy interaction, a single motion controller is proposed to reach the target state given the surrounding occupancy. 
Once trained on MOB with complex occupancy layout, which is stringent to human movements, the controller could handle cramped scenes and generalize well to general scenes with limited complexity like regular living rooms. With no GT 3D scenes for training, our method can generate realistic and stable HSI motions in diverse scenarios, including both static and dynamic scenes. The project is available at https://foruck.github.io/occu-page/.	翻訳日:2024-07-16 05:37:11 公開日:2024-07-12
# 高等教育におけるジェネレーティブAI:大学政策・資源・ガイドラインを通してChatGPTを見る Generative AI in Higher Education: Seeing ChatGPT Through Universities' Policies, Resources, and Guidelines ( http://arxiv.org/abs/2312.05235v3 ) ライセンス: Link先を確認	Hui Wang, Anh Dang, Zihao Wu, Son Mac,	(参考訳) 生成的人工知能(GenAI)の進歩は、教育経験を豊かにする機会を提供する一方で、学術的誠実性への懸念も引き起こす。多くの教育者は、GenAIを教育実践に取り入れることに対する不安とためらいを表明しており、教室にGenAIを効果的に組み込むことを支援する所属機関からの勧告や指導を必要としている。本研究は、高等教育の教育者のニーズに応えるため、GenAI、特にChatGPTの利用に関して米国トップクラスの大学が定めた学術政策やガイドラインを分析することで、大学や教育者が学術的文脈におけるGenAIの発展にどのように対応し、適応しているかを検討することを目的とする。データソースには、米国内の上位100大学が提供する学術的方針、声明、ガイドライン、関連するリソースが含まれる。その結果、これらの大学の大半は、GenAIに対してオープンだが慎重なアプローチを採用していることが示された。主な関心事は、倫理的利用、正確性、データのプライバシーである。ほとんどの大学は積極的に対応し、シラバステンプレート、ワークショップ、共有記事、個別相談など多様な種類のリソースを提供しており、その内容は一般的な技術紹介、倫理的懸念、教育的応用、予防戦略、データプライバシー、限界、検出ツールといった幅広いトピックにわたる。この発見は、教育実践における教育者への実践的な教育的示唆を4つ与えている: その存在を受け入れ、その使用を学習目的と一致させ、誤用を防ぐためにカリキュラムを進化させ、AI検出器に頼るのではなく、多面的評価戦略を採用する。政策立案における教育者には2つの推奨事項が提案される: 分野固有の政策とガイドラインを確立し、機密情報を慎重に管理する。 The advancements in Generative Artificial Intelligence (GenAI) provide opportunities to enrich educational experiences, but also raise concerns about academic integrity. Many educators have expressed anxiety and hesitation in integrating GenAI in their teaching practices, and are in need of recommendations and guidance from their institutions that can support them to incorporate GenAI in their classrooms effectively. In order to respond to higher educators' needs, this study aims to explore how universities and educators respond and adapt to the development of GenAI in their academic contexts by analyzing academic policies and guidelines established by top-ranked U.S. universities regarding the use of GenAI, especially ChatGPT. Data sources include academic policies, statements, guidelines, and relevant resources provided by the top 100 universities in the U.S. Results show that the majority of these universities adopt an open but cautious approach towards GenAI. 
Primary concerns lie in ethical usage, accuracy, and data privacy. Most universities actively respond and provide diverse types of resources, such as syllabus templates, workshops, shared articles, and one-on-one consultations focusing on a range of topics: general technical introduction, ethical concerns, pedagogical applications, preventive strategies, data privacy, limitations, and detective tools. The findings provide four practical pedagogical implications for educators in teaching practices: accept its presence, align its use with learning objectives, evolve curriculum to prevent misuse, and adopt multifaceted evaluation strategies rather than relying on AI detectors. Two recommendations are suggested for educators in policy making: establish discipline-specific policies and guidelines, and manage sensitive information carefully.	翻訳日:2024-07-16 05:37:10 公開日:2024-07-12
# Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0 Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0 ( http://arxiv.org/abs/2312.06432v2 ) ライセンス: Link先を確認	Tao Yu, Zongdian Li, Kei Sakaguchi, Omar Hashash, Walid Saad, Merouane Debbah,	(参考訳) 物理的システムのプログラム可能なデジタル表現を可能にするデジタルツイン(DT)の概念は、将来の産業に革命をもたらすものと期待され、将来のスマート社会、すなわち社会5.0のビジョンの中心に位置する。しかし、このようなDT駆動のSociety 5.0の成功は、人工知能とネットワーク技術の相乗的収束を必要とする。これまでの研究は定性的な研究、単純な分析、単一DTのソフトウェア実装に限られていたため、Society 5.0が要求するデジタル空間と物理空間の高度にシナジスティックな統合は提供できない。これとは対照的に,本稿では,異なる社会5.0サービスを表す異種・物理的に分離されたDTを,単一のフレームワークとシステムに一元的に統合する,インターネット・オブ・フェデレーション・デジタル・ツインズ(IoFDT)の新たな概念を構想する。 IoFDTのこの概念のために、我々はまず、水平と垂直の相互作用を通じてフェデレーションされたDTを統合する階層的アーキテクチャを導入し、サイバー空間と物理空間をブリッジして、新たな可能性を開く。そして、IoFDTを実現する上での課題について議論し、通信、コンピューティング、AIネイティブネットワーク間の複雑さを強調しながら、潜在的な革新的なソリューションを強調します。その後、我々は、すべての技術コンポーネントを統合し、それらの相互作用を編成する統合IoFDTプラットフォームの実装の重要性を詳述し、スマートモビリティのような分野における実世界のアプリケーションに焦点を当てた実践的なプラットフォームの必要性を強調した。 The concept of digital twin (DT), which enables the creation of a programmable, digital representation of physical systems, is expected to revolutionize future industries and will lie at the heart of the vision of a future smart society, namely, Society 5.0, in which high integration between cyber (digital) and physical spaces is exploited to bring economic and societal advancements. However, the success of such a DT-driven Society 5.0 requires a synergistic convergence of artificial intelligence and networking technologies into an integrated, programmable system that can coordinate DT networks to effectively deliver diverse Society 5.0 services. Prior works remain restricted to either qualitative study, simple analysis or software implementations of a single DT, and thus, they cannot provide the highly synergistic integration of digital and physical spaces as required by Society 5.0. 
In contrast, this paper envisions a novel concept of an Internet of Federated Digital Twins (IoFDT) that holistically integrates heterogeneous and physically separated DTs representing different Society 5.0 services within a single framework and system. For this concept of IoFDT, we first introduce a hierarchical architecture that integrates federated DTs through horizontal and vertical interactions, bridging cyber and physical spaces to unlock new possibilities. Then, we discuss challenges of realizing IoFDT, highlighting the intricacies across communication, computing, and AI-native networks while also underscoring potential innovative solutions. Subsequently, we elaborate on the importance of the implementation of a unified IoFDT platform that integrates all technical components and orchestrates their interactions, emphasizing the necessity of practical experimental platforms with a focus on real-world applications in areas like smart mobility.	翻訳日:2024-07-16 05:37:10 公開日:2024-07-12
# 潜在回廊による適応的人軌道予測 Adaptive Human Trajectory Prediction via Latent Corridors ( http://arxiv.org/abs/2312.06653v2 ) ライセンス: Link先を確認	Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik,	(参考訳) 人間の軌道予測は、通常ゼロショットの一般化問題として提起される。すなわち、予測器はトレーニングシーンにおける人間の動きのデータセットで学習され、未知のテストシーンに展開される。このパラダイムは大きな進歩をもたらしたが、デプロイメントシーンにおける人間の振る舞いの傾向が時間的に一定であると根本的に仮定している。そのため、現在の予測モデルは、大道芸人を見るために一時的に集まる群衆、雨の中を急ぎ水たまりを避ける歩行者、突発的な抗議活動など、シーン固有の一時的な人間の行動に適応できない。本稿では,シーン固有の適応軌道予測の問題を形式化し,プロンプトチューニングに着想を得た、潜在回廊(latent corridors)と呼ばれる新しい適応手法を提案する。事前訓練された任意の人間の軌道予測器の入力を学習可能な画像プロンプトで拡張することで、予測器は極めて少量の新しいデータ(例えば、30秒間観察された2人の人間)から傾向を推測し、デプロイメントシーンでの性能を改善できる。 0.1%未満の追加モデルパラメータで、MOTSynthのシミュレーションデータでは最大23.9%、MOTおよびWildtrackの実歩行者データでは16.4%のADE改善が得られる。定性的には,非適応予測器が捉えにくいシーン形状とシーン固有の人間の行動に対する認識を、潜在回廊が予測器に与えることを観察する。プロジェクトのWebサイトはhttps://neerja.me/atp_latent_corridors/にある。 Human trajectory prediction is typically posed as a zero-shot generalization problem: a predictor is learnt on a dataset of human motion in training scenes, and then deployed on unseen test scenes. While this paradigm has yielded tremendous progress, it fundamentally assumes that trends in human behavior within the deployment scene are constant over time. As such, current prediction models are unable to adapt to scene-specific transient human behaviors, such as crowds temporarily gathering to see buskers, pedestrians hurrying through the rain and avoiding puddles, or a protest breaking out. We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors. By augmenting the input of any pre-trained human trajectory predictor with learnable image prompts, the predictor can improve in the deployment scene by inferring trends from extremely small amounts of new data (e.g., 2 humans observed for 30 seconds). With less than 0.1% additional model parameters, we see up to 23.9% ADE improvement in MOTSynth simulated data and 16.4% ADE in MOT and Wildtrack real pedestrian data. 
Qualitatively, we observe that latent corridors imbue predictors with an awareness of scene geometry and scene-specific human behaviors that non-adaptive predictors struggle to capture. The project website can be found at https://neerja.me/atp_latent_corridors/.	翻訳日:2024-07-16 05:37:10 公開日:2024-07-12
# 3DReact: 化学反応のための幾何学的深層学習 3DReact: Geometric deep learning for chemical reactions ( http://arxiv.org/abs/2312.08307v2 ) ライセンス: Link先を確認	Puck van Gerwen, Ksenia R. Briling, Charlotte Bunne, Vignesh Ram Somnath, Ruben Laplaza, Andreas Krause, Clemence Corminboeuf,	(参考訳) ニューラルネットワークアーキテクチャに関連する分子対称性を組み込んだ幾何学的ディープラーニングモデルは、分子特性の予測の精度とデータ効率を大幅に改善した。この成功に基づいて,反応物と生成物の三次元構造から反応特性を予測する幾何学的深層学習モデルである3DReactを導入する。モデルの不変バージョンが既存の反応データセットに十分であることを示す。本稿では,GDB7-22-TS,Cyclo-23-TS,Proparg-21-TSの各データセットにおけるアクティベーションバリアの予測における競合性能について述べる。反応特性予測の既存のモデルと比較して、3DReactは、もし利用可能であれば原子をマッピングする情報を利用する柔軟なフレームワークと、(不変または同変の方法で)反応物質と生成物のジオメトリを提供する。したがって、異なるデータセット、原子をマッピングするレシエーション、および補間と補間の両方のタスクを体系的にうまく実行する。 Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction datasets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different datasets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.	翻訳日:2024-07-16 05:37:10 公開日:2024-07-12
# 低エネルギー部分空間におけるディジタル量子シミュレーションの複雑さ:応用と下界 Complexity of Digital Quantum Simulation in the Low-Energy Subspace: Applications and a Lower Bound ( http://arxiv.org/abs/2312.08867v3 ) ライセンス: Link先を確認	Weiyuan Gong, Shuo Zhou, Tongyang Li,	(参考訳) デジタル量子シミュレーションは、ハミルトニアンのユニタリ進化の近似に広く応用されている。実際、量子系の多くのシミュレーションタスクはヒルベルト空間全体ではなく低エネルギー部分空間の量子状態に焦点を当てている。本稿では,低エネルギー部分空間の積公式に基づいて,ディジタル量子シミュレーションの複雑さを系統的に検討する。シミュレーション誤差は、様々なデジタル量子シミュレーションアルゴリズムや量子システムにおいて、ハミルトニアンの有効な低エネルギーノルムに依存しており、熱化による不完全な状態の準備であっても、完全なユニタリシミュレーションの以前の複雑さよりも改善できることが示される。特に、低エネルギー部分空間におけるスピンモデルをシミュレートするためには、qDRIFTやランダムな置換のようなランダム化された積公式がより小さなトロッター数を必要とすることを証明する。このような改善は対称性に保護されたデジタル量子シミュレーションでも継続する。我々は、パワーロー量子相互作用の力学をシミュレートする上で、同様の改善を証明した。また、低エネルギー部分空間における一般ディジタル量子シミュレーションのためのクエリローバウンドを提供する。 Digital quantum simulation has broad applications in approximating unitary evolution of Hamiltonians. In practice, many simulation tasks for quantum systems focus on quantum states in the low-energy subspace instead of the entire Hilbert space. In this paper, we systematically investigate the complexity of digital quantum simulation based on product formulas in the low-energy subspace. We show that the simulation error depends on the effective low-energy norm of the Hamiltonian for a variety of digital quantum simulation algorithms and quantum systems, allowing improvements over the previous complexities for full unitary simulations even for imperfect state preparations due to thermalization. In particular, for simulating spin models in the low-energy subspace, we prove that randomized product formulas such as qDRIFT and random permutation require smaller Trotter numbers. Such improvement also persists in symmetry-protected digital quantum simulations. We prove a similar improvement in simulating the dynamics of power-law quantum interactions. We also provide a query lower bound for general digital quantum simulations in the low-energy subspace.	翻訳日:2024-07-16 05:37:10 公開日:2024-07-12
# 方策学習のための任意点軌道モデリング Any-point Trajectory Modeling for Policy Learning ( http://arxiv.org/abs/2401.00025v3 ) ライセンス: Link先を確認	Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel,	(参考訳) デモから学ぶことはロボットに新しいスキルを教える強力な方法であり、より多くのデモデータを持つことで方策学習が向上することが多い。しかし、デモデータを収集するコストが高いことは、重大なボトルネックである。ビデオは、リッチなデータソースとして、行動、物理、意味に関する知識を含んでいるが、アクションラベルの欠如により、それらから制御固有の情報を抽出することは困難である。本研究では、ビデオフレーム内の任意の点の将来の軌跡を予測する軌道モデルを事前学習することで、ビデオデモを活用する新しいフレームワーク、Any-point Trajectory Modeling (ATM)を導入する。トレーニングが完了すると、これらの軌跡は詳細な制御ガイダンスを提供し、最小限のアクションラベル付きデータによる堅牢な視覚運動(visuomotor)方策の学習を可能にする。シミュレーションと実世界の両方で評価した130以上の言語条件付きタスクにおいて、ATMは強力なビデオ事前学習ベースラインを平均80%上回っている。さらに,人間のビデオや異なるロボット形態のビデオから操作スキルを効果的に転移学習できることを示す。ビジュアライゼーションとコードは https://xingyu-lin.github.io/atm で公開されている。 Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology. Visualizations and code are available at: https://xingyu-lin.github.io/atm	翻訳日:2024-07-16 05:37:10 公開日:2024-07-12
# ニューラル計算のための新しいパラダイム:学習可能なニューロンと適応可能な構造を持つXネット A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure ( http://arxiv.org/abs/2401.01772v2 ) ライセンス: Link先を確認	Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jinyi Liu, Wenqiang Li, Meilan Hao, Shu Wei, Yusong Deng, Liping Zhang, Xiaoli Dong, Hong Qin, Xin Ning, Yugui Zhang, Baoli Lu, Jian Xu, Shuang Li,	(参考訳) 多層パーセプトロン(MLP)は、バイオインフォマティクスから金融分析まで様々な分野に浸透し、現代の科学研究に欠かせない存在となっている。しかし、MLPには明らかな欠点がある。 1) 活性化関数の種類が単一で比較的固定的であるため、ネットワークの「表現能力」が低く、単純な問題を複雑なネットワークで解くことになりがちである。 2) ネットワーク構造が適応的でないため、構造が冗長あるいは不十分になりやすい。本研究では,MLPを置き換えることが期待される新しいニューラルネットワークパラダイムX-Netを提案する。 X-Netは、訓練中の導関数情報に基づいて活性化関数を個別に動的に学習し、特定のタスクに対するネットワークの表現能力を改善する。同時に、X-Netはニューロンレベルでネットワーク構造を正確に調整し、様々な複雑さのタスクに対応し、計算コストを削減できる。 X-Net は表現能力において MLP よりも優れていることを示す。 X-Netは、回帰や分類タスクにおいて、はるかに少ないパラメータでMLPと同等またはそれ以上の性能を達成することができる。具体的には、パラメータ数に関して、X-Netは平均でMLPの3%に過ぎず、一部のタスクでは1.1%に過ぎない。我々はまた、X-Netがエネルギー、環境、航空宇宙といった様々な分野のデータに対して科学的発見を行う能力を示し、X-Netは科学者が新しい数学や物理学の法則を発見する手助けをする。 Multilayer perceptron (MLP) has permeated various disciplinary domains, ranging from bioinformatics to financial analytics, where their application has become an indispensable facet of contemporary scientific research endeavors. However, MLP has obvious drawbacks. 1), The type of activation function is single and relatively fixed, which leads to poor `representation ability' of the network, and it is often to solve simple problems with complex networks; 2), the network structure is not adaptive, it is easy to cause network structure redundant or insufficient. In this work, we propose a novel neural network paradigm X-Net promising to replace MLPs. X-Net can dynamically learn activation functions individually based on derivative information during training to improve the network's representational ability for specific tasks. 
At the same time, X-Net can precisely adjust the network structure at the neuron level to accommodate tasks of varying complexity and reduce computational costs. We show that X-Net outperforms MLPs in terms of representational capability. X-Net can achieve comparable or even better performance than MLP with much smaller parameters on regression and classification tasks. Specifically, in terms of the number of parameters, X-Net is only 3% of MLP on average and only 1.1% under some tasks. We also demonstrate X-Net's ability to perform scientific discovery on data from various disciplines such as energy, environment, and aerospace, where X-Net is shown to help scientists discover new laws of mathematics or physics.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# いずれにせよ、誰の妻なのか?機械翻訳における同性関係に対する偏見の評価 Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation ( http://arxiv.org/abs/2401.04972v2 ) ライセンス: Link先を確認	Ian Stewart, Rada Mihalcea,	(参考訳) 機械翻訳は、しばしばバイアスのあるデータやアルゴリズムに悩まされる。性規範の偏見は研究されているが、MTシステムが社会関係に関する偏見を符号化しているかどうかについては、例えば「弁護士が妻にキスをした」など、あまり知られていない。 MTシステムにおける同性関係に対するバイアスの程度を,いくつかの名詞・ジェンダー言語(例えばスペイン語)から抽出されたテンプレート文を用いて検討した。 3つの一般的なMTサービスは、同じ性別のエンティティ間の関係に関する文を正確に翻訳することができないことが分かりました。エラー率は文脈によって大きく異なり、高い女性表現の職業を参照する同性文はより低い精度で翻訳される。本研究は,社会関係に関するNLPシステムにおける本質的バイアス評価のケーススタディである。 Machine translation often suffers from biased data and algorithms that can lead to unacceptable errors in system output. While bias in gender norms has been investigated, less is known about whether MT systems encode bias about social relationships, e.g., "the lawyer kissed her wife." We investigate the degree of bias against same-gender relationships in MT systems, using generated template sentences drawn from several noun-gender languages (e.g., Spanish) and comprised of popular occupation nouns. We find that three popular MT services consistently fail to accurately translate sentences concerning relationships between entities of the same gender. The error rate varies considerably based on the context, and same-gender sentences referencing high female-representation occupations are translated with lower accuracy. We provide this work as a case study in the evaluation of intrinsic bias in NLP systems with respect to social relationships.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# DISTINQT - 将来のモバイルおよびワイヤレスネットワークのためのQoS予測のための分散プライバシ意識学習フレームワーク DISTINQT: A Distributed Privacy Aware Learning Framework for QoS Prediction for Future Mobile and Wireless Networks ( http://arxiv.org/abs/2401.10158v2 ) ライセンス: Link先を確認	Nikolaos Koursioumpas, Lina Magoula, Ioannis Stavrakakis, Nancy Alonistioti, M. A. Gutierrez-Estevez, Ramin Khalili,	(参考訳) 5Gと6G以上のネットワークは、あるレベルのQuality of Service(QoS)に依存してスムーズな運用を行う、新しくて挑戦的なユースケースとアプリケーションをサポートすることが期待されている。 QoSをタイムリーに予測することは、特に車両通信の場合のように、安全クリティカルな用途において非常に重要である。近年まで、集中型人工知能(AI)ソリューションによってQoS予測が実行されてきたが、多くのプライバシー、計算、運用上の懸念が浮かび上がっている。新たなソリューション(例えば、Split Learning、Federated Learning)が浮上し、データのプライバシを保ちながら、ノード間で複雑さを低減したAIタスクが分散された。しかし、スケーラブルな分散学習アプローチに関しては、将来の無線ネットワークの異質性を考慮して、新たな課題が浮かび上がっている。現在の研究は、QoS予測のための新しいマルチヘッド入力プライバシ対応分散学習フレームワークであるDISTINQTを提案する。我々のフレームワークは、データ型とモデルアーキテクチャの観点から複数の異種ノードをサポートし、それらをまたいだ計算を共有する。これにより、最終QoS予測モデルの堅牢性と一般化能力を高めるために、多様な知識を単独の学習プロセスに組み込むことができる。 DISTINQTはまた、あらゆる生の入力データを、送信前に非常に複雑で圧縮され、不可逆な潜在表現に符号化することで、データのプライバシ保護に貢献する。評価の結果,DISTINQTは集中型よりも統計的に同じ性能を示し,プライバシー保護クレームの有効性が証明された。 DISTINQTは、Tele-Operated Drivingのユースケースで提示された6つの最先端集中型ベースラインソリューションに対して、平均65%の予測誤差の低減を実現している。 Beyond 5G and 6G networks are expected to support new and challenging use cases and applications that depend on a certain level of Quality of Service (QoS) to operate smoothly. Predicting the QoS in a timely manner is of high importance, especially for safety-critical applications as in the case of vehicular communications. Although until recent years the QoS prediction has been carried out by centralized Artificial Intelligence (AI) solutions, a number of privacy, computational, and operational concerns have emerged. Alternative solutions have surfaced (e.g. Split Learning, Federated Learning), distributing AI tasks of reduced complexity across nodes, while preserving the privacy of the data. 
However, new challenges rise when it comes to scalable distributed learning approaches, taking into account the heterogeneous nature of future wireless networks. The current work proposes DISTINQT, a novel multi-headed input privacy-aware distributed learning framework for QoS prediction. Our framework supports multiple heterogeneous nodes, in terms of data types and model architectures, by sharing computations across them. This enables the incorporation of diverse knowledge into a sole learning process that will enhance the robustness and generalization capabilities of the final QoS prediction model. DISTINQT also contributes to data privacy preservation by encoding any raw input data into highly complex, compressed, and irreversible latent representations before any transmission. Evaluation results showcase that DISTINQT achieves a statistically identical performance compared to its centralized version, while also proving the validity of the privacy preserving claims. DISTINQT manages to achieve a reduction in prediction error of up to 65% on average against six state-of-the-art centralized baseline solutions presented in the Tele-Operated Driving use case.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# 深層学習に基づく農業推薦システム:多変量気象予報手法 Agricultural Recommendation System based on Deep Learning: A Multivariate Weather Forecasting Approach ( http://arxiv.org/abs/2401.11410v3 ) ライセンス: Link先を確認	Md Zubair, Md. Shahidul Salim, Mehrab Mustafy Rahman, Mohammad Jahid Ibna Basher, Shahin Imran, Iqbal H. Sarker,	(参考訳) 農業は、経済成長を推進し、世界中の人々の食料安全保障を確保する上で、基本的な役割を担っている。労働集約型農業は多くの発展途上国で食糧穀物生産が着実に増加してきたが、豪雨、低温、干ばつなどの悪天候に悩まされることが多い。これらの要因は食料生産を著しく妨げ、世界の食料安全保障に重大なリスクをもたらしている。そこで本研究では,天気予報モデルを用いた環境適応型作物推薦システムを提案する。実施のため、バングラデシュの全領域について検討した。気象予報モデルとして多変量重畳Bi-LSTM(時間分散層を有する3層Bi-LSTM)ネットワークが広く評価されている。提案した気象モデルは、バングラデシュの任意の場所における降水量、気温、湿度、日光量を平均して0.9824と予測でき、他の最先端のLSTMモデルよりも優れている。これらの予測は、実効的な農業決定を生み出す上で、我々のシステムを導く。さらに、我々の本格的なシステムは、農作物を保護するための予防措置を実施できるように、極端な気象状況について農家に警告することができる。最後に、このシステムは、洪水や干ばつに起因した地域での知識に基づく作物提案にも長けている。 Agriculture plays a fundamental role in driving economic growth and ensuring food security for populations around the world. Although labor-intensive agriculture has led to steady increases in food grain production in many developing countries, it is frequently challenged by adverse weather conditions, including heavy rainfall, low temperatures, and drought. These factors substantially hinder food production, posing significant risks to global food security. In order to have a profitable, sustainable, and farmer-friendly agricultural practice, this paper proposes a context-based crop recommendation system powered by a weather forecast model. For implementation purposes, we have considered the whole territory of Bangladesh. With extensive evaluation, the multivariate Stacked Bi-LSTM (three Bi-LSTM layers with a time Distributed layer) Network is employed as the weather forecasting model. The proposed weather model can forecast Rainfall, Temperature, Humidity, and Sunshine for any given location in Bangladesh with an average R-Squared value of 0.9824, and the model outperforms other state-of-the-art LSTM models. 
These predictions guide our system in generating viable farming decisions. Additionally, our full-fledged system is capable of alerting the farmers about extreme weather conditions so that preventive measures can be undertaken to protect the crops. Finally, the system is also adept at making knowledge-based crop suggestions for flood and drought-prone regions.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# グリオーマの病理像解析における人工知能の応用 Applications of artificial intelligence in the analysis of histopathology images of gliomas: a review ( http://arxiv.org/abs/2401.15022v4 ) ライセンス: Link先を確認	Jan-Philipp Redlich, Friedrich Feuerhake, Joachim Weis, Nadine S. Schaadt, Sarah Teuber-Hanselmann, Christoph Buck, Sabine Luttmann, Andrea Eberle, Stefan Nikolin, Arno Appenzeller, Andreas Portmann, André Homeyer,	(参考訳) 近年,グリオーマの診断が複雑化している。人工知能(AI)を用いたグリオーマ病理組織画像の解析は,診断と予後予測を支援する新たな機会を提供する。研究の現状を概観するため、本レビューでは、ヒトグリオーマの全スライド病理組織画像に対するAIベースの手法を提案した公開済みの研究83件を調査した。対象とする診断課題は、サブタイピング(23/83)、グレーディング(27/83)、分子マーカー予測(20/83)、生存予測(29/83)である。すべての研究について、方法論的側面と臨床応用性を検討した。その結果、現在の研究の焦点は,成人型びまん性グリオーマのヘマトキシリン・エオシン染色組織切片の評価であることが分かった。研究の大半 (52/83) は、The Cancer Genome Atlas (TCGA) から公開されている膠芽腫および低悪性度グリオーマのデータセットに基づいており、他のデータセットを単独で(16/83)、あるいはTCGAデータセットに加えて(15/83)使用した研究は少数である。現在のアプローチは主に、20倍の倍率(35/83)で組織を分析するために畳み込みニューラルネットワーク(63/83)に依存している。新しい研究分野は、臨床データ、オミクスデータ、磁気共鳴イメージングの統合(29/83)である。これまでのところ、AIベースの手法は有望な成果を上げているが、実際の臨床環境ではまだ使われていない。今後の研究は、日常的な適用可能性を示すために、高品質で最新の臨床および分子病理アノテーションを持つ、より大規模な多施設データセットでの手法の独立した検証に焦点を当てるべきである。 In recent years, the diagnosis of gliomas has become increasingly complex. Analysis of glioma histopathology images using artificial intelligence (AI) offers new opportunities to support diagnosis and outcome prediction. To give an overview of the current state of research, this review examines 83 publicly available research studies that have proposed AI-based methods for whole-slide histopathology images of human gliomas, covering the diagnostic tasks of subtyping (23/83), grading (27/83), molecular marker prediction (20/83), and survival prediction (29/83). All studies were reviewed with regard to methodological aspects as well as clinical applicability. It was found that the focus of current research is the assessment of hematoxylin and eosin-stained tissue sections of adult-type diffuse gliomas. 
The majority of studies (52/83) are based on the publicly available glioblastoma and low-grade glioma datasets from The Cancer Genome Atlas (TCGA) and only a few studies employed other datasets in isolation (16/83) or in addition to the TCGA datasets (15/83). Current approaches mostly rely on convolutional neural networks (63/83) for analyzing tissue at 20x magnification (35/83). A new field of research is the integration of clinical data, omics data, or magnetic resonance imaging (29/83). So far, AI-based methods have achieved promising results, but are not yet used in real clinical settings. Future work should focus on the independent validation of methods on larger, multi-site datasets with high-quality and up-to-date clinical and molecular pathology annotations to demonstrate routine applicability.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# データ組織はバイナリ分類の予測可能性を制限する Data organization limits the predictability of binary classification ( http://arxiv.org/abs/2401.17036v2 ) ライセンス: Link先を確認	Fei Jing, Zi-Ke Zhang, Yi-Cheng Zhang, Qingpeng Zhang,	(参考訳) データ組織の構造は、特に二分分類タスクにおいて、機械学習アルゴリズムの有効性に大きな影響を与えていると広く認識されている。我々の研究は、与えられたデータセット上のバイナリ分類器の最大ポテンシャルは、データ固有の性質に主に制約されていることを示唆する理論的枠組みを提供する。理論的推論と経験的検証の両面から, 2つの主要な結論に達するために, 標準目的関数, 評価指標, 二項分類器を用いた。まず、実際のデータセットにおける二項分類性能の理論的上限が理論的に達成可能であることを示す。この上界は、学習損失と評価基準の間の計算可能な平衡を表す。第2に、一般的に使用されている3つの評価指標の正確な上界を計算し、その上界は、使用中の分類器とは独立に、データセットの特徴と複雑に結びついているという、上位のテーゼと基本的な均一性を明らかにする。さらに、その後の分析により、二項分類データにおける性能上限とクラス重複レベルとの詳細な関係が明らかになった。この関係は、機能エンジニアリングで使用する最も効果的な機能サブセットをピンポイントするのに役立ちます。 The structure of data organization is widely recognized as having a substantial influence on the efficacy of machine learning algorithms, particularly in binary classification tasks. Our research provides a theoretical framework suggesting that the maximum potential of binary classifiers on a given dataset is primarily constrained by the inherent qualities of the data. Through both theoretical reasoning and empirical examination, we employed standard objective functions, evaluative metrics, and binary classifiers to arrive at two principal conclusions. Firstly, we show that the theoretical upper bound of binary classification performance on actual datasets can be theoretically attained. This upper boundary represents a calculable equilibrium between the learning loss and the metric of evaluation. Secondly, we have computed the precise upper bounds for three commonly used evaluation metrics, uncovering a fundamental uniformity with our overarching thesis: the upper bound is intricately linked to the dataset's characteristics, independent of the classifier in use. Additionally, our subsequent analysis uncovers a detailed relationship between the upper limit of performance and the level of class overlap within the binary classification data. 
This relationship is instrumental for pinpointing the most effective feature subsets for use in feature engineering.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# アジャイルソフトウェア開発におけるデータ管理の課題と解決策を探る: 文献レビューと実践者調査 Exploring Data Management Challenges and Solutions in Agile Software Development: A Literature Review and Practitioner Survey ( http://arxiv.org/abs/2402.00462v2 ) ライセンス: Link先を確認	Ahmed Fawzy, Amjed Tahir, Matthias Galster, Peng Liang,	(参考訳) ソフトウェア製品とその開発に関連するデータを管理することは、ソフトウェアプロジェクトやアジャイルチームにとって大きな課題となる。課題には、さまざまなソースからのデータを統合し、継続的な変更と適応の観点からデータ品質を保証することが含まれる。この目的のために、私たちは、アジャイルプロジェクトでデータ管理の課題と潜在的な解決策を体系的に探求することを目指していました。研究の状況を理解するために,系統的な文献レビュー(SLR)を用いた混合手法のアプローチを採用し,実践者を対象にした調査を行った。 SLRでは、データ管理の側面と関連する課題と解決策を特定し分類する45の研究をレビューした。実践者調査では,32名の業界専門家から実践経験とソリューションを抽出し,SLRの知見を補完した。その結果,データ統合プロセスの管理,多種多様なデータの収集,データ収集の自動化,リアルタイム分析要求の達成など,SLRと実践者の双方で報告された主要なデータ管理課題が明らかになった。本研究は,データ管理方針の明確化の必要性,データ管理ツールのトレーニング,アジリティの向上,製品品質の向上,プロジェクト成果の向上といった新たなデータ管理戦略の採用など,実践者や研究者にとっての意義を示すものである。 Managing data related to a software product and its development poses significant challenges for software projects and agile development teams. Challenges include integrating data from diverse sources and ensuring data quality in light of continuous change and adaptation. To this end, we aimed to systematically explore data management challenges and potential solutions in agile projects. We employed a mixed-methods approach, utilizing a systematic literature review (SLR) to understand the state-of-research followed by a survey with practitioners to reflect on the state-of-practice. In the SLR, we reviewed 45 studies in which we identified and categorized data management aspects and the associated challenges and solutions. In the practitioner survey, we captured practical experiences and solutions from 32 industry experts to complement the findings from the SLR. Our findings reveal major data management challenges reported in both the SLR and practitioner survey, such as managing data integration processes, capturing diverse data, automating data collection, and meeting real-time analysis requirements. 
Based on our findings, we present implications for practitioners and researchers, which include the necessity of developing clear data management policies, training on data management tools, and adopting new data management strategies that enhance agility, improve product quality, and facilitate better project outcomes.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# 検証回路の再利用による言語モデルの信頼度向上 Increasing Trust in Language Models through the Reuse of Verified Circuits ( http://arxiv.org/abs/2402.02619v8 ) ライセンス: Link先を確認	Philip Quirke, Clement Neo, Fazl Barez,	(参考訳) 言語モデル(LM)は、幅広い予測タスクにますます使われていますが、それらのトレーニングは稀なエッジケースを無視し、信頼性を低下させます。ここでは、タスクアルゴリズムと回路実装を検証し、エッジケースを考慮し、既知の障害モードを含まない、厳格な信頼性基準を定義する。数学的および論理的に定義されたフレームワークを使用して構築すれば、この標準を満たすようにモデルをトレーニングできることが示される。本稿では,n桁整数加算のための自動回帰変換器モデルを完全に検証する。検証されたモジュールの再利用性を示すため、トレーニングされた整数加算モデルをより大きな未学習モデルに挿入し、加算と減算の両方を行うように組み合わせたモデルを訓練する。両タスクの加算回路を広範囲に再利用し,より複雑な減算器モデルの検証を容易にする。本稿では,検証済みのタスクモジュールをLMに挿入することで,モデルの再利用を有効活用し,それらを用いた言語モデルの妥当性と信頼性を向上させる方法について論じる。検証回路の再利用により、言語モデルの安全性に向けた重要なステップであると考えられる、より複雑な複合モデルを検証する労力が削減される。 Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify an auto-regressive transformer model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into a larger untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# ReLUニューラルネットワークの凸緩和は多項式時間で大域的最適解を近似する Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time ( http://arxiv.org/abs/2402.03625v3 ) ライセンス: Link先を確認	Sungyoon Kim, Mert Pilanci,	(参考訳) 本稿では,重み減衰で正則化された2層ReLUネットワークとその凸緩和との間の最適性ギャップについて検討する。トレーニングデータがランダムである場合,元の問題とその緩和との間の相対的最適性ギャップは $O(\sqrt{\log n})$ の係数で抑えられることを示す。ここで $n$ はトレーニングサンプル数である。単純な応用として,元の非凸問題を対数係数の範囲で解くことが保証された,扱いやすい多項式時間アルゴリズムが得られる。さらに,緩やかな仮定の下で,局所勾配法が高い確率で訓練損失の低い点に収束することを示す。この結果は既存の結果と比べて指数的な改善であり,局所勾配法がうまく機能する理由の理解に新たな光を当てる。 In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# Leggett-Garg不等式を用いた単一システムによる認証ランダムネスの生成 Single system based generation of certified randomness using Leggett-Garg inequality ( http://arxiv.org/abs/2402.03712v2 ) ライセンス: Link先を確認	Pingal Pratyush Nath, Debashis Saha, Dipankar Home, Urbasi Sinha,	(参考訳) 我々は、抜け穴のないフォトニックアーキテクチャにおいて、Leggett-Garg不等式の破れを利用した半デバイス非依存の量子乱数生成のための安全なスキームを理論的に定式化し、実験的に実証する。生成されたランダム性の定量化は、解析的および数値的アプローチによって厳密に推定され、両者は完全に一致している。我々は $9,19,118$ 個の真に予測不可能なビットを安全に生成する。これは、単一システムの量子性を利用する信頼性の高い乱数生成器の、実験的に扱いやすいクラスへの未探索の道を開く。 We theoretically formulate and experimentally demonstrate a secure scheme for semi-device-independent quantum random number generation by utilizing Leggett-Garg inequality violations, within a loophole-free photonic architecture. The quantification of the generated randomness is rigorously estimated by analytical as well as numerical approaches, both of which are in perfect agreement. We securely generate $9,19,118$ truly unpredictable bits. This opens up an unexplored avenue towards an empirically convenient class of reliable random number generators harnessing the quantumness of single systems.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# 知識集約型文脈における会話アシスタント:LLMとインテントベースシステムの評価 Conversational Assistants in Knowledge-Intensive Contexts: An Evaluation of LLM- versus Intent-based Systems ( http://arxiv.org/abs/2402.04955v2 ) ライセンス: Link先を確認	Samuel Kernan Freire, Chaofan Wang, Evangelos Niforatos,	(参考訳) 会話アシスタント(CA)は、知識管理において人間の作業者を支援することが増えている。従来、CAは事前に定義されたユーザ意図や会話パターンに対して特定の方法で応答する。しかし、この硬直性は自然言語の多様性をうまく扱えない。近年の自然言語処理の進歩、すなわち大規模言語モデル(LLM)により、CAはより柔軟で人間らしい方法で会話し、テキストから関連情報を抽出し、専門家から情報を取得できるようになったが、「ハルシネーション」のような新たな課題も生じている。知識管理タスクにLLMを使用する可能性を評価するため,対話効率,ユーザエクスペリエンス,作業負荷,ユーザビリティの観点から,LLMベースのCAとインテントベースのシステムを比較するユーザスタディを実施した。その結果,LLMベースのCAは,インテントベースのシステムよりも優れたユーザエクスペリエンス,タスク完了率,ユーザビリティ,知覚されたパフォーマンスを示し,NLP技術の切り替えが知識管理の文脈において有益となりうることが示唆された。 Conversational Assistants (CA) are increasingly supporting human workers in knowledge management. Traditionally, CAs respond in specific ways to predefined user intents and conversation patterns. However, this rigidness does not handle the diversity of natural language well. Recent advances in natural language processing, namely Large Language Models (LLMs), enable CAs to converse in a more flexible, human-like manner, extracting relevant information from texts and capturing information from expert humans but introducing new challenges such as "hallucinations". To assess the potential of using LLMs for knowledge management tasks, we conducted a user study comparing an LLM-based CA to an intent-based system regarding interaction efficiency, user experience, workload, and usability. This revealed that LLM-based CAs exhibited better user experience, task completion rate, usability, and perceived performance than intent-based systems, suggesting that switching NLP techniques can be beneficial in the context of knowledge management.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# GPT-4 構造化ナラティブ・プロンプトを用いたライフイベントの物語生成:検証研究 GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study ( http://arxiv.org/abs/2402.05435v2 ) ライセンス: Link先を確認	Christopher J. Lynch, Erik Jensen, Madison H. Munro, Virginia Zamponi, Joseph Martinez, Kevin O'Brien, Brandon Feldhaus, Katherine Smith, Ann Marie Reinhold, Ross Gore,	(参考訳) 大規模言語モデル(LLM)は、物語の膨大な配列を生成する上で重要な役割を果たす。本研究では,OpenAIのGPT-4を用いて,ゼロショット構造化された物語プロンプトを用いて24,000の物語を生成する。このデータセットから、2,880の物語を手動で分類し、出生、死亡、雇用、解雇の妥当性を評価する。注目すべきは、物語の87.43%が、構造化されたプロンプトの意図を十分に伝えることである。有効かつ無効な物語の識別を自動化するため、分類データセット上で9つの機械学習モデルをトレーニングし、検証する。これらのモデルを利用して分析を拡張し、残りの21,120の物語の分類を予測する。全てのMLモデルは有効な物語を有効に分類するのに優れていたが、無効な物語を無効に分類すると同時に課題を経験した。本研究は, LLMの能力, 限界, 妥当性の研究を前進させるだけでなく, 物語生成や自然言語処理の実用化にも有効である。 Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.	翻訳日:2024-07-16 05:27:26 公開日:2024-07-12
# 顔行動単位検出のための対照的な特徴表現の学習 Learning Contrastive Feature Representations for Facial Action Unit Detection ( http://arxiv.org/abs/2402.06165v2 ) ライセンス: Link先を確認	Ziqiao Shang, Bin Liu, Fengmao Lv, Fei Teng, Tianrui Li,	(参考訳) 顔アクションユニット(AU)検出は、AUが活性化する際の微妙な特徴差を検出するという課題に長年遭遇してきた。既存の手法はしばしばAUのピクセルレベルの情報を符号化することに頼り、余分な情報をエンコードするだけでなく、モデルの複雑さが増し、一般化可能性も制限される。さらに、各AUタイプのクラス不均衡問題や、ノイズや偽AUラベルの存在により、AU検出の精度が負の影響を受ける。本稿では、自己教師付き信号と教師付き信号の両方を組み込んだAU検出を目的とした新しいコントラスト学習フレームワークを導入し、精度の高いAU検出のための識別特徴の学習を向上する。クラス不均衡問題に対処するために、少数派および多数派のサンプルに対するパラメータの更新のステップサイズを調整する負のサンプル再重み付け戦略を用いる。さらに,雑音や偽AUラベルによる課題に対処するために,3種類の正のサンプル対を含むサンプリング手法を用いる。これにより、教師付き信号に自己教師付き信号を注入し、ノイズラベルの悪影響を効果的に軽減することができる。筆者らは,4つの広く利用されているベンチマークデータセット(BP4D, DISFA, GFT, Aff-Wild2)を用いて実験を行った。我々のコードは \url{https://github.com/Ziqiao-Shang/AUNCE} で入手できる。 Facial action unit (AU) detection has long encountered the challenge of detecting subtle feature differences when AUs activate. Existing methods often rely on encoding pixel-level information of AUs, which not only encodes additional redundant information but also leads to increased model complexity and limited generalizability. Additionally, the accuracy of AU detection is negatively impacted by the class imbalance issue of each AU type, and the presence of noisy and false AU labels. In this paper, we introduce a novel contrastive learning framework aimed for AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features for accurate AU detection. To tackle the class imbalance issue, we employ a negative sample re-weighting strategy that adjusts the step size of updating parameters for minority and majority class samples. Moreover, to address the challenges posed by noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs. 
This enables us to inject self-supervised signals into the supervised signal, effectively mitigating the adverse effects of noisy labels. Our experimental assessments, conducted on four widely-utilized benchmark datasets (BP4D, DISFA, GFT and Aff-Wild2), underscore the superior performance of our approach compared to state-of-the-art methods of AU detection. Our code is available at \url{https://github.com/Ziqiao-Shang/AUNCE}.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# 文脈付きバンディットのためのツリーアンサンブル Tree Ensembles for Contextual Bandits ( http://arxiv.org/abs/2402.06963v2 ) ライセンス: Link先を確認	Hannes Nilsson, Rikard Johansson, Niklas Åkerblom, Morteza Haghir Chehreghani,	(参考訳) 木アンサンブルに基づく文脈付き多腕バンディットのための新しいフレームワークを提案する。本フレームワークは,広く用いられている2つのバンディット手法であるUpper Confidence Bound(上側信頼限界)とThompson Samplingを,標準設定と組合せ設定の両方に統合する。代表的な木アンサンブル手法であるXGBoostとランダムフォレストの両方を用いたいくつかの実験により,本フレームワークの有効性を実証する。提案手法は,決定木やニューラルネットワークに基づく最先端手法と比較して,ベンチマークデータセットおよび道路ネットワーク上のナビゲーションという実世界の応用に適用した場合に,リグレット最小化と計算時間の両面で優れた性能を示す。 We propose a novel framework for contextual multi-armed bandits based on tree ensembles. Our framework integrates two widely used bandit methods, Upper Confidence Bound and Thompson Sampling, for both standard and combinatorial settings. We demonstrate the effectiveness of our framework via several experimental studies, employing both XGBoost and random forest, two popular tree ensemble methods. Compared to state-of-the-art methods based on decision trees and neural networks, our methods exhibit superior performance in terms of both regret minimization and computational runtime, when applied to benchmark datasets and the real-world application of navigation over road networks.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# 教師付き学習力学の比較:ディープニューラルネットワークは人間のデータ効率と一致するが、一般化ラグを示す Comparing supervised learning dynamics: Deep neural networks match human data efficiency but show a generalisation lag ( http://arxiv.org/abs/2402.09303v3 ) ライセンス: Link先を確認	Lukas S. Huber, Fred W. Mast, Felix A. Wichmann,	(参考訳) 近年の研究では、画像分類分野における人間とディープニューラルネットワーク(DNN)の行動比較が数多く行われている。比較研究は、しばしば学習過程の終末に焦点を合わせ、対象カテゴリーの表現における類似性を測定し比較する。しかし、これらの表現の出現過程、すなわち、獲得中に観察される行動変化と中間段階は、直接的かつ経験的に比較されることが少なくなる。本稿では、人間の観察者および様々な古典的かつ最先端のDNNにおける学習力学の詳細な研究について報告する。我々は,開始点,入力モダリティ,利用可能な入力データ,提供されたフィードバックなどの学習関連条件を整合させる,制約付き教師付き学習環境を開発する。学習プロセス全体にわたって、十分に学習された表現が、これまで見つからなかったテストデータにどのように一般化できるかを評価し、比較する。学習プロセス全体の比較は、DNNが人間の学習者と同等のデータ効率のレベルを示しており、この分野におけるいくつかの一般的な仮定に挑戦していることを示している。しかし,本研究の結果は,DNNの学習に顕著な一般化ラグが特徴的であるのに対して,人間は,後に新しいデータにのみ転送されるセット固有情報を学習する予備的な段階を伴わずに,すぐに一般化可能な表現を習得するように見える。 Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Often, comparison studies focus on the end-result of the learning process by measuring and comparing the similarities in the representations of object categories once they have been formed. However, the process of how these representations emerge -- that is, the behavioral changes and intermediate stages observed during the acquisition -- is less often directly and empirically compared. Here we report a detailed investigation of the learning dynamics in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment to align learning-relevant conditions such as starting point, input modality, available input data and the feedback provided. Across the whole learning process we evaluate and compare how well learned representations can be generalized to previously unseen test data. 
Comparisons across the entire learning process indicate that DNNs demonstrate a level of data efficiency comparable to human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: while DNNs' learning is characterized by a pronounced generalisation lag, humans appear to immediately acquire generalizable representations without a preliminary phase of learning training set-specific information that is only later transferred to novel data.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# セキュアコード生成のためのインストラクションチューニング Instruction Tuning for Secure Code Generation ( http://arxiv.org/abs/2402.09497v2 ) ライセンス: Link先を確認	Jingxuan He, Mark Vero, Gabriela Krasnopolska, Martin Vechev,	(参考訳) 現代の言語モデル(LM)は、日常や専門的な文脈、特にプログラミングにおいて広く受け入れられている。この導入を可能にする重要な手順は命令チューニングであり、ユーザ命令や人間の好みに従うように訓練することで、LMの実用性を大幅に向上させる。しかし、既存の命令チューニングスキームは、生成されたコードのセキュリティという重要な側面を見落としている。その結果、最先端の命令チューニングされたLMでさえ、しばしば安全でないコードを生成し、重大なセキュリティリスクを生じさせる。この作業では、このギャップに対処するためにSafeCoderを導入します。 SafeCoderは、自動パイプラインを使用して収集した多種多様な高品質データセットを使用して、セキュリティ中心の微調整を実行します。セキュリティの微調整と標準命令のチューニングを統合し,セキュリティとユーティリティの両面の最適化を容易にする。その単純さにもかかわらず、SafeCoderは様々な人気のあるLMやデータセットで有効であることを示す。ユーティリティを保ちながら、セキュリティを大幅に改善できます(約30%)。 Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the security of generated code. As a result, even the state-of-the-art instruction-tuned LMs frequently produce unsafe code, posing significant security risks. In this work, we introduce SafeCoder to address this gap. SafeCoder performs security-centric fine-tuning using a diverse and high-quality dataset that we collected using an automated pipeline. We integrate the security fine-tuning with standard instruction tuning, to facilitate a joint optimization of both security and utility. Despite its simplicity, we show that SafeCoder is effective across a variety of popular LMs and datasets. It is able to drastically improve security (by about 30%), while preserving utility.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# 拡散テンパリングは常微分方程式に対する確率的積分器によるパラメータ推定を改善する Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations ( http://arxiv.org/abs/2402.12231v3 ) ライセンス: Link先を確認	Jonas Beck, Nathanael Bosch, Michael Deistler, Kyra L. Kadhim, Jakob H. Macke, Philipp Hennig, Philipp Berens,	(参考訳) 常微分方程式(ODE)は科学における力学系の記述に広く用いられているが、実験的な測定を説明するパラメータを特定することは困難である。特に、ODEは微分可能であり、勾配に基づくパラメータ最適化が可能であるが、ODEの非線形ダイナミクスは多くの場合、多数の局所最小値と初期条件に対する極端な感度をもたらす。そこで我々は,ODEにおける勾配に基づくパラメータ最適化の収束性を改善する,確率的数値解法のための新しい正則化手法である拡散テンパリングを提案する。確率的積分器のノイズパラメータを反復的に小さくすることにより、提案手法は真のパラメータにより確実に収束する。本手法は複雑さの異なる力学系に対して有効であることを示すとともに,実用上意味のあるパラメータ数を持つHodgkin-Huxleyモデルに対して,信頼性の高いパラメータ推定値が得られることを示す。 Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# 乱れのないSachdev-Ye-Kitaevモデル:可積分性と量子カオス Disorder-Free Sachdev-Ye-Kitaev models: Integrability and Quantum Chaos ( http://arxiv.org/abs/2402.13154v2 ) ライセンス: Link先を確認	Soshun Ozaki, Hosho Katsura,	(参考訳) 本稿では、Sachdev-Ye-Kitaev(SYK)モデルの乱れのない2つの変種を導入し、それらの可積分性を示し、静的および動的性質を調べる。図式的手法とは異なり、これらのモデルの可積分性により、マヨラナフェルミオンの数が有限であっても動的相関関数を得ることができる。これらの解から、これらのモデルにおける時間外順序相関関数(OTOC)は、乱れや外部キック項を持つ系のような量子カオス系と同様に、初期の時間において指数関数的な成長を示すことが分かる。一方で、我々の解析では、準位統計やスペクトル形状因子にランダム行列的な振る舞いの証拠は見られない。以上の結果は,SYKモデルのクリーン版が,OTOCにカオス的な振る舞いを示す乱れのない量子多体系の,単純だが非自明な例であることを示している。 We introduce two disorder-free variants of the Sachdev-Ye-Kitaev (SYK) model, demonstrate their integrability, and study their static and dynamical properties. Unlike diagrammatic techniques, the integrability of these models allows us to obtain dynamical correlation functions even when the number of Majorana fermions is finite. From the solutions, we find that out-of-time-order correlators (OTOCs) in these models exhibit exponential growth at early times, resembling that of quantum chaotic systems such as those with disorder or external kick terms. Conversely, our analysis shows no evidence of random-matrix behavior in level statistics or the spectral form factor. Our findings illustrate that the clean versions of the SYK models represent simple but nontrivial examples of disorder-free quantum many-body systems displaying chaos-like behavior of OTOCs.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# STENCIL: コールドスタートアクティブラーニングのためのサブモジュラ相互情報量に基づく弱教師 STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning ( http://arxiv.org/abs/2402.13468v2 ) ライセンス: Link先を確認	Nathan Beck, Adithya Iyer, Rishabh Iyer,	(参考訳) NLPアプリケーションにおいて事前学習済みモデルの教師あり微調整が普及するにつれ、特に大規模言語モデルのパラメータ数の増加に伴い、より大きな注釈付きデータのコーパスが必要とされている。モデル性能をできるだけ速く向上させるためにラベルのないインスタンスを選び出して注釈付けするアクティブラーニングは、アノテーションコストを削減するための一般的な選択肢である。しかし、ほとんどの手法はクラス不均衡を無視しており、初期の注釈付きデータへのアクセスを前提とするか、レアクラスの性能が改善するまでに複数ラウンドのアクティブラーニング選択を必要とする。本稿では,一連のテキスト例と最近提案されたサブモジュラ相互情報量を利用して,弱くラベル付けされたレアクラスのインスタンス集合を選択し,その後アノテータが強いラベルを付与するSTENCILを提案する。 STENCILは、クラス不均衡のコールドスタート設定において、一般的なアクティブラーニング手法と比べて、複数のテキスト分類データセットで全体の精度を$10\%-18\%$、レアクラスのF-1スコアを$17\%-40\%$改善することを示す。 As supervised fine-tuning of pre-trained models within NLP applications increases in popularity, larger corpora of annotated data are required, especially with increasing parameter counts in large language models. Active learning, which attempts to mine and annotate unlabeled instances to improve model performance maximally fast, is a common choice for reducing the annotation cost; however, most methods typically ignore class imbalance and either assume access to initial annotated data or require multiple rounds of active learning selection before improving rare classes. We present STENCIL, which utilizes a set of text exemplars and the recently proposed submodular mutual information to select a set of weakly labeled rare-class instances that are then strongly labeled by an annotator. We show that STENCIL improves overall accuracy by $10\%-18\%$ and rare-class F-1 score by $17\%-40\%$ on multiple text classification datasets over common active learning methods within the class-imbalanced cold-start setting.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# 長期制約付き凸関数の追跡 Chasing Convex Functions with Long-term Constraints ( http://arxiv.org/abs/2402.14012v2 ) ライセンス: Link先を確認	Adam Lechowicz, Nicolas Christianson, Bo Sun, Noman Bashir, Mohammad Hajiesmaili, Adam Wierman, Prashant Shenoy,	(参考訳) 我々は,長期的制約を伴うオンライン計量問題の族を導入し,研究する。これらの問題において、オンラインプレーヤーは計量空間 $(X,d)$ 上で決定 $\mathbf{x}_t$ を下し、ヒットコスト $f_t(\mathbf{x}_t)$ と、計量によって決まる切り替えコストを同時に最小化する。時間軸 $T$ を通じて、プレーヤーは長期的な需要制約 $\sum_{t} c(\mathbf{x}_t) \geq 1$ を満たさなければならない。ここで $c(\mathbf{x}_t)$ は時刻 $t$ に満たされる需要の割合を表す。このような問題は、持続可能なエネルギー/計算システムにおけるオンライン資源割り当てに幅広く応用できる。我々は,ヒットコストの勾配が有界で,計量が重み付き $\ell_1$ である場合に,最適な競合アルゴリズムと学習拡張アルゴリズムを考案し,さらに,提案アルゴリズムが数値実験で良好に動作することを示す。 We introduce and study a family of online metric problems with long-term constraints. In these problems, an online player makes decisions $\mathbf{x}_t$ in a metric space $(X,d)$ to simultaneously minimize their hitting cost $f_t(\mathbf{x}_t)$ and switching cost as determined by the metric. Over the time horizon $T$, the player must satisfy a long-term demand constraint $\sum_{t} c(\mathbf{x}_t) \geq 1$, where $c(\mathbf{x}_t)$ denotes the fraction of demand satisfied at time $t$. Such problems can find a wide array of applications to online resource allocation in sustainable energy/computing systems. We devise optimal competitive and learning-augmented algorithms for the case of bounded hitting cost gradients and weighted $\ell_1$ metrics, and further show that our proposed algorithms perform well in numerical experiments.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
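The problem described in the abstract above can be written as a single constrained program. The following is only a sketch assembled from the quantities the abstract names ($f_t$, $d$, $c$, $T$). The starting point $\mathbf{x}_0$ and the purely additive combination of hitting and switching costs are assumptions made here for illustration; the paper's exact formulation may differ (for example, it may include a terminal switching term).

```latex
\begin{aligned}
\min_{\mathbf{x}_1,\dots,\mathbf{x}_T \in X}\quad
  & \sum_{t=1}^{T} f_t(\mathbf{x}_t) \;+\; \sum_{t=1}^{T} d(\mathbf{x}_t, \mathbf{x}_{t-1}) \\
\text{subject to}\quad
  & \sum_{t=1}^{T} c(\mathbf{x}_t) \;\ge\; 1 .
\end{aligned}
```

The online difficulty is that each $f_t$ is revealed only at time $t$, so the player must commit to $\mathbf{x}_t$ without knowing whether later rounds will offer cheaper opportunities to satisfy the remaining demand.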
# 人間機械社会システム Human-machine social systems ( http://arxiv.org/abs/2402.14410v2 ) ライセンス: Link先を確認	Milena Tsvetkova, Taha Yasseri, Niccolo Pescetelli, Tobias Werner,	(参考訳) 偽のソーシャルメディアアカウントや生成AIチャットボットから、金融取引アルゴリズムや自動運転車に至るまで、ロボット、ボット、アルゴリズムは増殖し、私たちのコミュニケーションチャネル、社会的相互作用、経済取引、交通網に浸透している。相互に依存し相互作用する複数の人間と自律機械のネットワークは、集合的な結果を人間または機械の行動だけからは導けない複雑な社会システムを構成する。このパラダイムの下で、さまざまな分野の最近の研究を概観し、競争、協調、協力、伝染、集団的意思決定の状況における一般的な力学とパターンを特定する。その際、高頻度取引市場、ソーシャルメディアプラットフォーム、オープンコラボレーションコミュニティ、ディスカッションフォーラムという文脈の豊かな事例を取り上げる。より堅牢でレジリエントな人間と機械のコミュニティを実現するために、研究者は複雑系の手法を用いてそれらを研究し、エンジニアは人間と機械、および機械同士の相互作用のためにAIを明示的に設計し、規制当局は人間と機械の生態学的多様性と社会的共進化を統治すべきである。 From fake social media accounts and generative-AI chatbots to financial trading algorithms and self-driving vehicles, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and autonomous machines constitute complex social systems where the collective outcomes cannot be deduced from either human or machine behavior alone. Under this paradigm, we review recent research from across a range of disciplines and identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion, and collective decision-making, with context-rich examples from high-frequency trading markets, a social media platform, an open-collaboration community, and a discussion forum. To ensure more robust and resilient human-machine communities, researchers should study them using complex-system methods, engineers should explicitly design AI for human-machine and machine-machine interactions, and regulators should govern the ecological diversity and social co-evolution of humans and machines.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# 複数のモデルにまたがる統一タスク埋め込みに向けて: プロンプトベースの大規模言語モデルとその先のギャップを埋める Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond ( http://arxiv.org/abs/2402.14522v2 ) ライセンス: Link先を確認	Xinyu Wang, Hainiu Xu, Lin Gui, Yulan He,	(参考訳) タスク固有の情報を捉えるメタ学習技術であるタスク埋め込みは、特にマルチタスク学習、モデル編集、解釈可能性などの分野で人気を集めている。しかし、勾配を用いずに動作するプロンプト誘導型の大規模言語モデル(LLM)の登場により、課題に直面している。既存のタスク埋め込み手法は、微調整されたタスク固有の言語モデルに依存しており、多様なモデル、特にプロンプトベースのLLMに対するタスク埋め込みの適応性を妨げている。 LLMの時代にタスク埋め込みの可能性を活用するため、より小さな言語モデルや様々なプロンプトを持つLLMを含む多様なモデルのタスク埋め込みを、単一のベクトル空間内で調和させる統一タスク埋め込みフレームワーク(FUTE)を提案する。このような統一性は、異なるモデル間の類似性の比較と分析を可能にし、アーキテクチャ固有の手法に匹敵する性能を維持しながら、マルチモデルのシナリオにおける既存のタスク埋め込み手法の範囲と有用性を広げる。 Task embedding, a meta-learning technique that captures task-specific information, has gained popularity, especially in areas such as multi-task learning, model editing, and interpretability. However, it faces challenges with the emergence of prompt-guided Large Language Models (LLMs) operating in a gradient-free manner. Existing task embedding methods rely on fine-tuned, task-specific language models, which hinders the adaptability of task embeddings across diverse models, especially prompt-based LLMs. To harness the potential of task embeddings in the era of LLMs, we propose a framework for unified task embeddings (FUTE), harmonizing task embeddings from various models, including smaller language models and LLMs with varied prompts, within a single vector space. Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios, while maintaining their performance comparable to architecture-specific methods.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# データのトポロジーは量子熱状態に潜む The topology of data hides in quantum thermal states ( http://arxiv.org/abs/2402.15633v2 ) ライセンス: Link先を確認	Stefano Scali, Chukwudubem Umeano, Oleksandr Kyriienko,	(参考訳) 量子熱状態の蒸留によってトポロジカルデータ解析(TDA)を行う量子プロトコルを提供する。量子熱状態生成アルゴリズムの最近の進歩は、散逸的なリンドブラディアンの性質によって定まる特徴的なスケーリングを明らかにしている。これは、組合せラプラシアンの性質に依存するスケーリングを持つ、ユニタリ発展に基づくプロトコルとは対照的である。量子熱状態生成アルゴリズムを活用するために、量子TDAを実時間描像から虚時間描像へと翻訳し、パラダイムをユニタリなアプローチから散逸的なアプローチへ移す。系の基底状態と重なりを持つ初期状態から始めると、データセットに固有のチャネルを通じてそのエネルギーを散逸させ、情報を自然に蒸留できる。したがって、ベッチ数の計算は純度の推定に帰着する。あるいは、これは単体複体のトポロジーを埋め込んだ熱状態に対するRényi 2-エントロピー、Uhlmannフィデリティ、またはHilbert-Schmidt距離の評価と解釈できる。我々の研究は、データのトポロジーのより物理的な解釈に向けてTDAの分野を切り開く。 We provide a quantum protocol to perform topological data analysis (TDA) via the distillation of quantum thermal states. Recent developments of quantum thermal state preparation algorithms reveal their characteristic scaling defined by properties of dissipative Lindbladians. This contrasts with protocols based on unitary evolution which have a scaling depending on the properties of the combinatorial Laplacian. To leverage quantum thermal state preparation algorithms, we translate quantum TDA from a real-time to an imaginary-time picture, shifting the paradigm from a unitary approach to a dissipative one. Starting from an initial state overlapping with the ground state of the system, one can dissipate its energy via channels unique to the dataset, naturally distilling its information. Therefore calculating Betti numbers translates into a purity estimation. Alternatively, this can be interpreted as the evaluation of the Rényi 2-entropy, Uhlmann fidelity or Hilbert-Schmidt distance relative to thermal states with the embedded topology of simplicial complexes. Our work opens the field of TDA toward a more physical interpretation of the topology of data.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
# アルゴリズム問題を解くニューラル書き換えシステム A Neural Rewriting System to Solve Algorithmic Problems ( http://arxiv.org/abs/2402.17407v2 ) ライセンス: Link先を確認	Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti,	(参考訳) 現代のニューラルネットワークアーキテクチャは、分布外の問題インスタンスを解くために構成規則を体系的に適用する必要があるアルゴリズム的手続きの学習に、依然として苦労している。本研究では,ニューラルアーキテクチャの体系的一般化能力の研究に用いられる合成ベンチマークの一種である,式単純化問題に焦点をあてる。本稿では,最小限の学習例のみに依存して,入れ子になった数式を解く一般的な手順を学習するよう設計されたモジュラーアーキテクチャを提案する。記号的人工知能の古典的な枠組みである書き換えシステムに着想を得て,解くことのできる部分式を識別するよう訓練されたセレクタ(Selector),部分式をその値に写像するソルバ(Solver),元の式中の部分式をSolverが与える解に置き換えるコンバイナ(Combiner)という,3つの特化した相互作用するモジュールをアーキテクチャに含める。我々は,体系的一般化に特化した最近のモデルであるNeural Data Routerと,先進的なプロンプト戦略で調べた最先端の大規模言語モデル(GPT-4)に対して,本システムをベンチマークする。3種類の式単純化問題において,これらの代替手法と比べて本手法がより高い分布外一般化を達成することを実証し,その失敗を分析することで限界を考察する。 Modern neural network architectures still struggle to learn algorithmic procedures that require systematically applying compositional rules to solve out-of-distribution problem instances. In this work, we focus on formula simplification problems, a class of synthetic benchmarks used to study the systematic generalization capabilities of neural architectures. We propose a modular architecture designed to learn a general procedure for solving nested mathematical formulas by only relying on a minimal set of training examples. Inspired by rewriting systems, a classic framework in symbolic artificial intelligence, we include in the architecture three specialized and interacting modules: the Selector, trained to identify solvable sub-expressions; the Solver, mapping sub-expressions to their values; and the Combiner, replacing sub-expressions in the original formula with the solution provided by the Solver. We benchmark our system against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies. 
We demonstrate that our approach achieves a higher degree of out-of-distribution generalization compared to these alternative approaches on three different types of formula simplification problems, and we discuss its limitations by analyzing its failures.	翻訳日:2024-07-16 05:17:24 公開日:2024-07-12
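The Selector / Solver / Combiner loop described in the abstract above follows the control flow of a classical rewriting system. The sketch below is a hand-coded symbolic stand-in, not the paper's neural modules: the function names, the regular expression, and the restriction to fully parenthesized integer arithmetic with `+`, `-`, `*` are all assumptions made here purely to illustrate how the three roles interact.

```python
import re

# Hypothetical leaf pattern: a parenthesized pair of integers joined by one operator.
LEAF = re.compile(r"\((-?\d+)([+\-*])(-?\d+)\)")


def select(formula):
    """Selector stand-in: find the first sub-expression that can be solved directly."""
    return LEAF.search(formula)


def solve(match):
    """Solver stand-in: map a selected sub-expression to its integer value."""
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


def combine(formula, match, value):
    """Combiner stand-in: splice the value back in place of the sub-expression."""
    return formula[:match.start()] + str(value) + formula[match.end():]


def simplify(formula):
    """Repeat select -> solve -> combine until no solvable sub-expression remains."""
    formula = formula.replace(" ", "")
    while (match := select(formula)) is not None:
        formula = combine(formula, match, solve(match))
    return formula


print(simplify("((2 + 3) * (4 - 1))"))
```

Because each pass rewrites only one innermost sub-expression, the loop handles any nesting depth with the same three steps, which is the property the abstract associates with out-of-distribution generalization.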
# ShapeLLM: 身体インタラクションのためのユニバーサル3Dオブジェクト理解 ShapeLLM: Universal 3D Object Understanding for Embodied Interaction ( http://arxiv.org/abs/2402.17766v3 ) ライセンス: Link先を確認	Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma,	(参考訳) 本稿では,3次元点群と言語を用いた汎用的な3次元オブジェクト理解を探求する,最初の3次元マルチモーダル大言語モデルであるShapeLLMを提案する。 ShapeLLMはReConをReCon++に拡張することで改良された3Dエンコーダ上に構築されている。 LLMのための3Dポイントクラウド入力エンコーダとしてReCon++を活用することで、ShapeLLMは命令追従データの構築を訓練し、3D MM-Vetという新しいベンチマークでテストする。 ReCon++とShapeLLMは、3Dの幾何学的理解と、具体化された視覚的接地のような言語統一された3Dインタラクションタスクにおいて最先端のパフォーマンスを達成する。プロジェクトページ: https://qizekun.github.io/shapellm/ This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point cloud input encoder for LLMs, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding. Project page: https://qizekun.github.io/shapellm/	翻訳日:2024-07-16 05:07:34 公開日:2024-07-12
# 裾の重いクラス不均衡と、言語モデルでAdamが勾配降下法を上回る理由 Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models ( http://arxiv.org/abs/2402.19449v2 ) ライセンス: Link先を確認	Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti,	(参考訳) Adamは、大規模言語モデルにおいて、他のタスクよりも大きな差で勾配降下法を上回ることが示されているが、その理由は明らかではない。この性能差の重要な要因は、言語タスクに見られる裾の重いクラス不均衡であることを示す。勾配降下法で訓練すると、低頻度の単語の損失は、高頻度の単語の損失よりもゆっくりと減少する。ほとんどのサンプルが低頻度の単語に由来するため、平均損失の減少も遅くなる。一方、Adamや符号に基づく手法は、この問題の影響を受けにくい。この挙動がクラス不均衡によって引き起こされることを確かめるために、言語Transformer、視覚CNN、線形モデルといったアーキテクチャやデータタイプをまたいで再現できることを実証的に示す。クロスエントロピー損失を持つ線形モデルでは、クラス不均衡が、Adamに有利に働くと仮定されてきた、不均衡で相関のある勾配とヘッセ行列をもたらすことを示す。また、連続時間において、勾配降下法は低頻度クラスで収束が遅いが、符号降下法はそうではないことを証明する。 Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.	翻訳日:2024-07-16 05:07:34 公開日:2024-07-12
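The contrast between gradient descent and sign descent in the abstract above can be illustrated on a toy problem. The sketch below is not the paper's experiment. It assumes a frequency-weighted quadratic loss $L(w) = \sum_k p_k \cdot \tfrac{1}{2}(w_k - 1)^2$ in place of cross-entropy, and the class frequencies, step sizes, and function names are hypothetical choices made only to expose the mechanism: the gradient of a rare class is scaled by its small frequency $p_k$, while its sign is not.

```python
def gradient_descent(freqs, lr, steps):
    """Plain gradient descent on L(w) = sum_k p_k * 0.5 * (w_k - 1)^2."""
    w = [0.0] * len(freqs)
    for _ in range(steps):
        # Each coordinate's gradient is scaled by its class frequency p_k.
        w = [wk - lr * p * (wk - 1.0) for wk, p in zip(w, freqs)]
    return w


def sign_descent(freqs, lr, steps):
    """Sign descent on the same loss: only the sign of each gradient is used."""
    w = [0.0] * len(freqs)
    for _ in range(steps):
        grads = [p * (wk - 1.0) for wk, p in zip(w, freqs)]
        # (g > 0) - (g < 0) evaluates to +1, -1, or 0, so p_k > 0 drops out.
        w = [wk - lr * ((g > 0) - (g < 0)) for wk, g in zip(w, grads)]
    return w


freqs = [0.9, 0.001]  # one frequent class, one rare class (assumed values)
gd = gradient_descent(freqs, lr=1.0, steps=100)
sd = sign_descent(freqs, lr=0.01, steps=100)

# Distance of each coordinate from its optimum w_k = 1.
gd_errors = [abs(1.0 - wk) for wk in gd]
sd_errors = [abs(1.0 - wk) for wk in sd]
print("gradient descent errors:", gd_errors)
print("sign descent errors:    ", sd_errors)
```

In this toy setting gradient descent leaves the rare-class coordinate almost untouched after 100 steps, while sign descent moves both coordinates along identical trajectories. This matches the abstract's qualitative claim but says nothing about Adam's second-moment scaling, which the sketch does not model.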
# Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing ( http://arxiv.org/abs/2403.01105v2 ) License: see link	Yafei Zhang, Shen Zhou, Huafeng Li	Recovering a clear image from a single hazy image is an open inverse problem. Although significant research progress has been made, most existing methods ignore the role that downstream tasks can play in promoting upstream dehazing. From the perspective of the haze generation mechanism, there is a potential relationship between the depth information of the scene and the hazy image. Based on this, we propose a dual-task collaborative mutual promotion framework to achieve the dehazing of a single image. This framework integrates depth estimation and dehazing by a dual-task interaction mechanism and achieves mutual enhancement of their performance. To realize the joint optimization of the two tasks, an alternative implementation mechanism with the difference perception is developed. On the one hand, the difference perception between the depth maps of the dehazing result and the ideal image is proposed to promote the dehazing network to pay attention to the non-ideal areas of the dehazed result. On the other hand, by improving the depth estimation performance in the difficult-to-recover areas of the hazy image, the dehazing network can explicitly use the depth information of the hazy image to assist the clear image recovery. To promote the depth estimation, we propose to use the difference between the dehazed image and the ground truth to guide the depth estimation network to focus on the non-ideal dehazed areas. It allows dehazing and depth estimation to leverage their strengths in a mutually reinforcing manner. Experimental results show that the proposed method can achieve better performance than that of the state-of-the-art approaches.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement ( http://arxiv.org/abs/2403.02408v3 ) License: see link	Ruirui Lin, Nantheera Anantrasirichai, Alexandra Malyugina, David Bull	Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. Restoration and enhancement have proven to be highly beneficial. However, there are only a limited number of enhancement methods explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model using a Swin Transformer as a backbone to capture low-light video features and exploit their spatio-temporal correlations. The STA-SUNet model is trained on a novel, fully registered dataset (BVI), which comprises dynamic scenes captured under varying light conditions. It is further analysed comparatively against various other models over three test datasets. The model demonstrates superior adaptivity across all datasets, obtaining the highest PSNR and SSIM values. It is particularly effective in extreme low-light conditions, yielding fairly good visualisation results.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
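For readers unfamiliar with the PSNR metric mentioned in the abstract above, the following is a minimal sketch of its standard definition. It assumes 8-bit images given as flat lists of pixel values; the function name `psnr` and its parameters are illustrative and do not come from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A restored image that is off by one grey level at every pixel.
value = psnr([0, 0, 0, 0], [1, 1, 1, 1])
print(round(value, 2))
```

Higher values mean the restored frame is closer to the reference. SSIM, the other metric named in the abstract, compares local structure rather than per-pixel error and is not sketched here.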
# The Minimum Information about CLinical Artificial Intelligence Checklist for Generative Modeling Research (MI-CLAIM-GEN) ( http://arxiv.org/abs/2403.02558v2 ) License: see link	Brenda Y. Miao, Irene Y. Chen, Christopher YK Williams, Jaysón Davidson, Augusto Garcia-Agundez, Shenghuan Sun, Travis Zack, Suchi Saria, Rima Arnaout, Giorgio Quer, Hossein J. Sadaei, Ali Torkamani, Brett Beaulieu-Jones, Bin Yu, Milena Gianfrancesco, Atul J. Butte, Beau Norgeot, Madhumita Sushil	Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage presents new challenges not addressed in previous frameworks. In particular, the ability of these models to produce useful outputs with little to no specialized training data ("zero-" or "few-shot" approaches), as well as the open-ended nature of their outputs, necessitate the development of new guidelines for robust reporting of clinical generative model research. In response to gaps in standards and best practices for the development of clinical AI tools identified by US Executive Order 14110 and several emerging national networks for clinical AI evaluation, we begin to formalize some of these guidelines by building on the original MI-CLAIM checklist. The new checklist, MI-CLAIM-GEN (Table 1), aims to address differences in training, evaluation, interpretability, and reproducibility of new generative models compared to non-generative ("predictive") AI models. This MI-CLAIM-GEN checklist also seeks to clarify cohort selection reporting with unstructured clinical data and adds additional items on alignment with ethical standards for clinical AI research.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# Leveraging Computer Vision in the Intensive Care Unit (ICU) for Examining Visitation and Mobility ( http://arxiv.org/abs/2403.06322v2 ) License: see link	Scott Siegel, Jiaqing Zhang, Sabyasachi Bandyopadhyay, Subhash Nerella, Brandon Silva, Tezcan Baslanti, Azra Bihorac, Parisa Rashidi	Despite the importance of closely monitoring patients in the Intensive Care Unit (ICU), many aspects are still assessed in a limited manner due to the time constraints imposed on healthcare providers. For example, although excessive visitations during rest hours can potentially exacerbate the risk of circadian rhythm disruption and delirium, they are not captured in the ICU. Likewise, while mobility can be an important indicator of recovery or deterioration in ICU patients, it is only captured sporadically or not captured at all. In the past few years, the computer vision field has found application in many domains by reducing the human burden. Using computer vision systems in the ICU can also potentially enable non-existing assessments or enhance the frequency and accuracy of existing assessments while reducing the staff workload. In this study, we leverage a state-of-the-art noninvasive computer vision system based on depth imaging to characterize ICU visitations and patients' mobility. We then examine the relationship between visitation and several patient outcomes, such as pain, acuity, and delirium. We found an association between deteriorating patient acuity and the incidence of delirium with increased visitations. In contrast, self-reported pain, reported using the Defense and Veteran Pain Rating Scale (DVPRS), was correlated with decreased visitations. Our findings highlight the feasibility and potential of using noninvasive autonomous systems to monitor ICU patients.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images ( http://arxiv.org/abs/2403.09302v2 ) License: see link	Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang Vu	Stain normalization algorithms aim to transform the color and intensity characteristics of a source multi-gigapixel histology image to match those of a target image, mitigating inconsistencies in the appearance of stains used to highlight cellular components in the images. We propose a new approach, StainFuser, which treats this problem as a style transfer task using a novel Conditional Latent Diffusion architecture, eliminating the need for handcrafted color components. With this method, we curate SPI-2M, the largest stain normalization dataset to date, comprising over 2 million histology images with neural style transfer for high-quality transformations. Trained on this data, StainFuser outperforms current state-of-the-art deep learning and handcrafted methods in terms of the quality of normalized images and in terms of downstream model performance on the CoNIC dataset.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks ( http://arxiv.org/abs/2403.09377v2 ) License: see link	Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens	Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or Adapter, project a model's hidden states to a lower dimension, allowing pre-trained models to adapt to new data through this low-rank bottleneck. However, PEFT tasks involving multiple modalities, like vision-language (VL) tasks, require not only adaptation to new data but also learning the relationship between different modalities. Targeting VL PEFT tasks, we propose a family of operations, called routing functions, to enhance VL alignment in the low-rank bottlenecks. These feature routing functions adopt linear operations and do not introduce new trainable parameters. In-depth analyses are conducted to study their behavior. In various VL PEFT settings, the routing functions significantly improve performance of the original PEFT methods, achieving over 20% improvement on VQAv2 (RoBERTa-large+ViT-L/16) and 30% on COCO Captioning (GPT2-medium+ViT-L/16). Also when fine-tuning a pre-trained multimodal model such as CLIP-BART, we observe smaller but consistent improvements across a range of VL PEFT tasks. Our code is available at https://github.com/tingyu215/Routing_VLPEFT.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
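The low-rank bottleneck that the abstract above refers to can be sketched in a few lines. This is a generic LoRA-style additive update in pure Python, written under the assumption of a single linear layer; it does not show the paper's routing functions, and the helper names `matvec`, `low_rank_forward`, and `trainable_params` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def low_rank_forward(w0, a, b, x):
    """Frozen weight w0 plus a trainable low-rank update b @ (a @ x)."""
    base = matvec(w0, x)            # frozen pre-trained projection
    bottleneck = matvec(a, x)       # project d_in -> r (the low-rank bottleneck)
    update = matvec(b, bottleneck)  # project r -> d_out
    return [p + q for p, q in zip(base, update)]


def trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors, versus d_in * d_out for full tuning."""
    return rank * (d_in + d_out)


w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight
a = [[1.0, 1.0, 1.0]]                    # down-projection, rank 1
b = [[0.5], [-0.5]]                      # up-projection
x = [1.0, 2.0, 3.0]

print(low_rank_forward(w0, a, b, x))
print(trainable_params(768, 768, 8), "trainable vs", 768 * 768, "full")
```

With `b` set to all zeros the layer reproduces the frozen model exactly, which is why such methods commonly start the up-projection at zero. Adapter-style methods place the bottleneck differently, so this sketch covers only one of the two families named in the abstract.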
# Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering ( http://arxiv.org/abs/2403.09622v2 ) License: see link	Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan	Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoders, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset. We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the creation of the Glyph-SDXL model for design image generation. This significantly enhances text rendering accuracy, improving it from less than 20% to nearly 90% on our design image benchmark. Noteworthy is Glyph-SDXL's newfound ability for text paragraph rendering, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts. Finally, through fine-tuning Glyph-SDXL with a small set of high-quality, photorealistic images featuring visual text, we showcase a substantial improvement in scene text rendering capabilities in open-domain real images. These compelling outcomes aim to encourage further exploration in designing customized text encoders for diverse and challenging tasks.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# Budget Recycling Differential Privacy ( http://arxiv.org/abs/2403.11445v4 ) License: see link	Bo Jiang, Jian Du, Sagar Sharma, Qiang Yan	Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# State space representations of the Roesser type for convolutional layers ( http://arxiv.org/abs/2403.11938v2 ) License: see link	Patricia Pauli, Dennis Gramlich, Frank Allgöwer	From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
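For reference, the generic 2-D Roesser model that a representation of this type instantiates is usually written as below. The notation (horizontal state $x^h$, vertical state $x^v$, block matrices $A_{ij}$, $B_i$, $C_i$, $D$) follows the common textbook convention and is an assumption here, not necessarily the paper's own symbols.

```latex
\begin{aligned}
\begin{bmatrix} x^h(i+1,j) \\ x^v(i,j+1) \end{bmatrix}
&= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
   \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}
 + \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} u(i,j), \\
y(i,j)
&= \begin{bmatrix} C_1 & C_2 \end{bmatrix}
   \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}
 + D\,u(i,j).
\end{aligned}
```

Here $u(i,j)$ would be the layer input and $y(i,j)$ the layer output at pixel $(i,j)$. With the state count quoted in the abstract, the stacked state $(x^h, x^v)$ would have dimension $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$; how that total splits between the horizontal and vertical parts is not stated in the abstract.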
# EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents ( http://arxiv.org/abs/2403.12014v2 ) License: see link	Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal	Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. We first prompt an LLM to generate training environments by giving it the task description and simulator objectives that the agents should learn and then asking it to generate a set of environment configurations (e.g., different terrains, items initially given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We also show that using an LLM to adapt environments dynamically outperforms curriculum learning approaches and how the environments are adapted to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of calls. Lastly, we present detailed ablation studies for EnvGen design choices.	Translated: 2024-07-16 05:07:34 Published: 2024-07-12
# Learning Neural Volumetric Pose Features for Camera Localization ( http://arxiv.org/abs/2403.12800v4 ) License: see link	Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, Jieping Ye	We introduce a novel neural volumetric pose feature, termed PoseMap, designed to enhance camera localization by encapsulating the information between images and the associated camera poses. Our framework leverages an Absolute Pose Regression (APR) architecture, together with an augmented NeRF module. This integration not only facilitates the generation of novel views to enrich the training dataset but also enables the learning of effective pose features. Additionally, we extend our architecture for self-supervised online alignment, allowing our method to be used and fine-tuned for unlabelled images within a unified framework. Experiments demonstrate that our method achieves 14.28% and 20.51% performance gain on average in indoor and outdoor benchmark scenes, outperforming existing APR methods with state-of-the-art accuracy.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning ( http://arxiv.org/abs/2403.12986v2 ) License: see link	Qianhan Feng, Lujing Xie, Shijie Fang, Tong Lin	Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distribution in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at the instance level through reweighting or resampling, but their performance is heavily limited by their reliance on biased backbone representation. Some other methods do perform feature-level adjustments like feature blending but might introduce unfavorable noise. In this paper, we discuss the benefit of a more balanced feature distribution for the CISSL problem, and further propose a Balanced Feature-Level Contrastive Learning method (BaCon). Our method directly regularizes the distribution of instances' representations in a well-designed contrastive manner. Specifically, class-wise feature centers are computed as the positive anchors, while negative anchors are selected by a straightforward yet effective mechanism. A distribution-related temperature adjustment is leveraged to control the class-wise contrastive degrees dynamically. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets across various settings. For example, BaCon surpasses the instance-level method FixMatch-based ABC on CIFAR10-LT with a 1.21% accuracy improvement, and outperforms the state-of-the-art feature-level method CoSSL on CIFAR100-LT with a 0.63% accuracy improvement. When encountering a more extreme degree of imbalance, BaCon also shows better robustness than other methods.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments ( http://arxiv.org/abs/2403.13556v2 ) License: see link	Djamahl Etchegaray, Zi Huang, Tatsuya Harada, Yadan Luo	In this work, we tackle the limitations of current LiDAR-based 3D object detection systems, which are hindered by a restricted class vocabulary and the high costs associated with annotating new object classes. Our exploration of open-vocabulary (OV) learning in urban environments aims to capture novel instances using pre-trained vision-language models (VLMs) with multi-sensor data. We design and benchmark a set of four potential solutions as baselines, categorizing them into either top-down or bottom-up approaches based on their input data strategies. While effective, these methods exhibit certain limitations, such as missing novel objects in 3D box estimation or applying rigorous priors, leading to biases towards objects near the camera or of rectangular geometries. To overcome these limitations, we introduce a universal Find n' Propagate approach for 3D OV tasks, aimed at maximizing the recall of novel objects and propagating this detection capability to more distant areas, thereby progressively capturing more. In particular, we utilize a greedy box seeker to search against 3D novel boxes of varying orientations and depth in each generated frustum and ensure the reliability of newly identified boxes by cross alignment and density ranker. Additionally, the inherent bias towards camera-proximal objects is alleviated by the proposed remote simulator, which randomly diversifies pseudo-labeled novel instances in the self-training process, combined with the fusion of base samples in the memory bank. Extensive experiments demonstrate a 53% improvement in novel recall across diverse OV settings, VLMs, and 3D detectors. Notably, we achieve up to a 3.97-fold increase in Average Precision (AP) for novel object classes. The source code is made available at https://github.com/djamahl99/findnpropagate.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression ( http://arxiv.org/abs/2403.14530v3 ) License: see link	Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai	3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial number of Gaussians and their associated attributes necessitates effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the first to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over $75\times$ compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over $11\times$ size reduction over the SOTA 3DGS compression approach Scaffold-GS. Our code is available here: https://github.com/YihangChen-ee/HAC	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
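The step of using a Gaussian to estimate the probability of a quantized attribute can be sketched as follows. This is the construction commonly used in learned compression, where the probability of a quantized value is the Gaussian mass of its quantization bin; whether HAC uses exactly this form is an assumption, and the names `normal_cdf`, `quantized_prob`, and `code_length_bits`, as well as the parameters `mu`, `sigma`, and `step`, are illustrative.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quantized_prob(q, mu, sigma, step=1.0):
    """Probability mass of the quantization bin [q - step/2, q + step/2]."""
    upper = normal_cdf(q + step / 2.0, mu, sigma)
    lower = normal_cdf(q - step / 2.0, mu, sigma)
    return upper - lower


def code_length_bits(q, mu, sigma, step=1.0):
    """Ideal entropy-coding cost in bits for the quantized value q."""
    p = max(quantized_prob(q, mu, sigma, step), 1e-12)  # guard against log(0)
    return -math.log2(p)


# A value at the predicted mean is cheap to code; one far from it is expensive.
print(code_length_bits(0.0, mu=0.0, sigma=1.0))
print(code_length_bits(3.0, mu=0.0, sigma=1.0))
```

In a context-model scheme of this kind, `mu`, `sigma`, and possibly `step` would be predicted per attribute from the context (here, presumably the hash-grid features), so a better prediction directly lowers the bit cost.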
# Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey ( http://arxiv.org/abs/2403.14608v6 ) License: check the linked page	Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang	Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, their expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, especially on hardware platforms constrained by computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting large models for various downstream tasks. Specifically, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
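As a rough illustration of why PEFT reduces the number of trainable parameters, the sketch below compares full fine-tuning of a single weight matrix with a low-rank adapter. The low-rank form (two factors of rank `rank`) is an assumption chosen for illustration, since the abstract above does not name a specific algorithm, and the function names and matrix sizes are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when every entry of a d_out x d_in weight matrix is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-r update B @ A (assumed adapter form).

    B has shape d_out x r and A has shape r x d_in; the frozen base matrix
    contributes no trainable parameters.
    """
    if rank <= 0 or rank > min(d_out, d_in):
        raise ValueError("rank must be in 1..min(d_out, d_in)")
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Ratio of adapter parameters to full fine-tuning parameters for one matrix."""
    return low_rank_adapter_params(d_out, d_in, rank) / full_finetune_params(d_out, d_in)


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection matrix with a rank-8 adapter.
    print(full_finetune_params(4096, 4096))
    print(low_rank_adapter_params(4096, 4096, 8))
    print(trainable_fraction(4096, 4096, 8))
```

Under these assumed sizes the adapter trains well under one percent of the parameters of the full matrix, which is the kind of saving the survey is concerned with.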
# Using quantum computers to identify prime numbers via entanglement dynamics ( http://arxiv.org/abs/2403.14703v2 ) License: check the linked page	Victor F. dos Santos, Jonas Maziero	Recently, the entanglement dynamics of two harmonic oscillators initially prepared in a separable-coherent state was demonstrated to offer a pathway for prime number identification. This article presents a generalized approach and outlines a deterministic algorithm that makes it possible to implement this theoretical concept on scalable fault-tolerant qubit-based quantum computers. We prove that the diagonal unitary operations employed in our algorithm exhibit a polynomial-time complexity of degree two, contrasting with the previously reported exponential complexity of general diagonal unitaries.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# Can large language models explore in-context? ( http://arxiv.org/abs/2403.15371v2 ) License: check the linked page	Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins	We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: (i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; (ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
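To make "an externally summarized interaction history, presented as sufficient statistics" concrete, here is a minimal sketch for a multi-armed bandit. It assumes the history is a list of (arm, reward) pairs and that the per-arm pull count and mean reward are the statistics of interest; the function names and the prompt wording are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def summarize_history(history):
    """Collapse a list of (arm, reward) pairs into per-arm counts and mean rewards."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for arm, reward in history:
        counts[arm] += 1
        totals[arm] += reward
    return {
        arm: {"pulls": counts[arm], "mean_reward": totals[arm] / counts[arm]}
        for arm in sorted(counts)
    }


def render_summary(summary, n_arms):
    """Format the summary as prompt text; arms that were never pulled are listed explicitly."""
    lines = []
    for arm in range(n_arms):
        stats = summary.get(arm)
        if stats is None:
            lines.append(f"arm {arm}: never pulled")
        else:
            lines.append(
                f"arm {arm}: pulled {stats['pulls']} times, "
                f"mean reward {stats['mean_reward']:.2f}"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    raw_history = [(0, 1), (0, 0), (1, 1)]  # hypothetical interaction log
    print(render_summary(summarize_history(raw_history), n_arms=3))
```

The summary stays the same length no matter how many rounds have been played, whereas the raw history grows with every interaction.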
# SAT Encoding of Partial Ordering Models for Graph Coloring Problems ( http://arxiv.org/abs/2403.15961v2 ) License: check the linked page	Daniel Faber, Adalat Jabrayilov, Petra Mutzel	In this paper, we suggest new SAT encodings of the partial-ordering based ILP model for the graph coloring problem (GCP) and the bandwidth coloring problem (BCP). The GCP asks for the minimum number of colors that can be assigned to the vertices of a given graph such that any two adjacent vertices get different colors. The BCP is a generalization, where each edge has a weight that enforces a minimal "distance" between the assigned colors, and the goal is to minimize the "largest" color used. For the widely studied GCP, we experimentally compare our new SAT encoding to the state-of-the-art approaches on the DIMACS benchmark set. Our evaluation confirms that this SAT encoding is effective for sparse graphs and even outperforms the state-of-the-art on some DIMACS instances. For the BCP, our theoretical analysis shows that the partial-ordering based SAT and ILP formulations have an asymptotically smaller size than that of the classical assignment-based model. Our practical evaluation confirms not only a dominance compared to the assignment-based encodings but also to the state-of-the-art approaches on a set of benchmark instances. To the best of our knowledge, we have solved several open instances of the BCP from the literature for the first time.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
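For orientation, the sketch below builds the classical assignment-based SAT encoding of k-colorability, which is the baseline the abstract compares against, not the partial-ordering encoding the paper proposes. The variable numbering, the function names, and the brute-force checker are assumptions for illustration only; a real experiment would hand the clauses to a SAT solver.

```python
from itertools import product


def assignment_encoding(n_vertices, edges, k):
    """Return CNF clauses over variables x[v][c] meaning 'vertex v gets color c'.

    Variables are numbered 1..n_vertices*k and a negative literal denotes negation.
    At-most-one-color clauses are omitted: if a vertex is assigned several colors,
    any one of them can be kept, so satisfiability still equals k-colorability.
    """
    def var(v, c):
        return v * k + c + 1

    clauses = []
    for v in range(n_vertices):
        # every vertex receives at least one color
        clauses.append([var(v, c) for c in range(k)])
    for u, v in edges:
        for c in range(k):
            # adjacent vertices never share color c
            clauses.append([-var(u, c), -var(v, c)])
    return clauses


def brute_force_sat(clauses, n_vars):
    """Exhaustively test all assignments; only suitable for tiny instances."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    for k in (2, 3):
        cnf = assignment_encoding(3, triangle, k)
        print(k, len(cnf), brute_force_sat(cnf, 3 * k))
```

For the BCP as defined in the abstract, the edge clauses would instead forbid every pair of colors whose difference is smaller than the edge weight, which is where the assignment-based model grows quickly.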
# NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation ( http://arxiv.org/abs/2403.18241v2 ) License: check the linked page	Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji	3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we meticulously enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. This yields an algorithm that consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis. Our project page is available at https://weizheliu.github.io/NeuSDFusion/ .	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# Semi-Supervised Learning for Deep Causal Generative Models ( http://arxiv.org/abs/2403.18717v2 ) License: check the linked page	Yasin Ibrahim, Hermione Warr, Konstantinos Kamnitsas	Developing models that are capable of answering questions of the form "How would x change if y had been z?" is fundamental to advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. However, clinical data may not have complete records for all patients and state-of-the-art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# Metric Learning from Limited Pairwise Preference Comparisons ( http://arxiv.org/abs/2403.19629v2 ) License: check the linked page	Zhi Wang, Geelon So, Ramya Korlakai Vinayak	We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given $\mathcal{O}(d)$ pairwise comparisons per user, in practice we often have a limited budget of $o(d)$ comparisons. We study whether the metric can still be recovered, even though it is known that learning individual ideal items is now no longer possible. We show that in general, $o(d)$ comparisons reveal no information about the metric, even with infinitely many users. However, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
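The ideal point model described above can be sketched in a few lines: a user prefers whichever item is closer to their ideal point under a shared Mahalanobis distance. The matrix `M`, the points, and the function names are hypothetical, and the sketch only shows how a single comparison is generated, not the recovery method of the paper.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y), with M given as nested lists.

    M is assumed to be symmetric positive semidefinite.
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))


def prefers(ideal, item_a, item_b, M):
    """True if a user with the given ideal point prefers item_a over item_b."""
    return mahalanobis_sq(item_a, ideal, M) < mahalanobis_sq(item_b, ideal, M)


if __name__ == "__main__":
    ideal = [0.0, 0.0]
    item_a, item_b = [1.0, 0.0], [0.0, 2.0]
    euclidean = [[1.0, 0.0], [0.0, 1.0]]
    stretched = [[10.0, 0.0], [0.0, 1.0]]  # penalizes the first coordinate more heavily
    print(prefers(ideal, item_a, item_b, euclidean))
    print(prefers(ideal, item_a, item_b, stretched))
```

The same pair of items yields opposite preferences under the two matrices, which is why observed comparisons carry information about the unknown metric.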
# GraspXL: Generating Grasping Motions for Diverse Objects at Scale ( http://arxiv.org/abs/2403.19649v2 ) License: check the linked page	Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song	Human hands possess the dexterity to interact with diverse objects, such as grasping specific parts of the objects and/or approaching them from desired directions. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions following single objectives such as a desired approach heading direction or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their capability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes and dexterous hand morphologies in a policy learning framework GraspXL. The objectives are composed of the graspable area, heading direction during approach, wrist rotation, and hand position. Without requiring any 3D hand-object interaction data, our policy trained with 58 objects can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and work with reconstructed or generated objects. We quantitatively and qualitatively evaluate our method to show the efficacy of our approach. Our model, code, and the large-scale generated motions are available at https://eth-ait.github.io/graspxl/ .	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space ( http://arxiv.org/abs/2404.00230v2 ) License: check the linked page	Zheling Meng, Bo Peng, Jing Dong	Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression, while watermarks with superior robustness usually significantly damage image quality. This dilemma stems from the traditional paradigm where watermarks are injected and detected in pixel space, relying on pixel perturbation for watermark detection and resilience against attacks. In this paper, we highlight that an effective solution to the problem is to both inject and detect watermarks in the latent diffusion space, and propose Latent Watermark (LW) with a progressive training strategy. It weakens the direct connection between quality and robustness and thus alleviates their contradiction. We conduct evaluations on two datasets and against 10 watermark attacks. Six metrics measure the image quality and watermark robustness. Results show that compared to the recently proposed methods such as StegaStamp, StableSignature, RoSteALS, and TreeRing, LW not only surpasses them in terms of robustness but also offers superior image quality. Our code will be available at https://github.com/RichardSunnyMeng/LatentWatermark.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion ( http://arxiv.org/abs/2404.00292v4 ) License: check the linked page	Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang	Camouflaged vision perception is an important vision task with numerous practical applications. Due to the expensive collection and labeling costs, this community struggles with a major bottleneck: the species categories of its datasets are limited to a small number of object species. However, the existing camouflaged generation methods require specifying the background manually, thus failing to extend the camouflaged sample diversity in a low-cost manner. In this paper, we propose a Latent Background Knowledge Retrieval-Augmented Diffusion (LAKE-RED) for camouflaged image generation. To our knowledge, our contributions mainly include: (1) For the first time, we propose a camouflaged generation paradigm that does not need to receive any background inputs. (2) Our LAKE-RED is the first knowledge retrieval-augmented method with interpretability for camouflaged generation, in which we propose an idea that knowledge retrieval and reasoning enhancement are separated explicitly, to alleviate the task-specific challenges. Moreover, our method is not restricted to specific foreground targets or backgrounds, offering a potential for extending camouflaged vision perception to more diverse domains. (3) Experimental results demonstrate that our method outperforms the existing approaches, generating more realistic camouflage images.	Translated: 2024-07-16 04:57:27 Published: 2024-07-12
# SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs ( http://arxiv.org/abs/2404.00469v3 ) License: check the linked page	Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth	We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing an object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques depending on large image databases, while requiring three orders of magnitude less storage and operating orders of magnitude faster. The code will be made public.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
# Facial Affective Behavior Analysis with Instruction Tuning ( http://arxiv.org/abs/2404.05052v2 ) License: check the linked page	Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong	Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images. However, traditional approaches primarily deploy models to discriminate among discrete emotion categories, and lack the fine granularity and reasoning capability for complex facial behaviors. The advent of Multi-modal Large Language Models (MLLMs) has been proven successful in general visual understanding tasks. However, directly harnessing MLLMs for FABA is challenging due to the scarcity of datasets and benchmarks, neglecting facial prior knowledge, and low training efficiency. To address these challenges, we introduce (i) an instruction-following dataset for two FABA tasks, namely emotion and action unit recognition, (ii) a benchmark FABA-Bench with a new metric considering both recognition and generation ability, and (iii) a new MLLM "EmoLA" as a strong baseline to the community. Our initiative on the dataset and benchmarks reveals the nature and rationale of facial affective behaviors, i.e., fine-grained facial movement, interpretability, and reasoning. Moreover, to build an effective and efficient FABA MLLM, we introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into a pre-trained MLLM. We conduct extensive experiments on FABA-Bench and four commonly-used FABA datasets. The results demonstrate that the proposed facial prior expert can boost the performance and EmoLA achieves the best results on our FABA-Bench. On commonly-used FABA datasets, EmoLA is competitive with task-specific state-of-the-art models.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
# CLIPping the Limits: Finding the Sweet Spot for Relevant Images in Automated Driving Systems Perception Testing ( http://arxiv.org/abs/2404.05309v2 ) License: check the linked page	Philipp Rigoll, Laurenz Adolph, Lennart Ries, Eric Sax	Perception systems, especially cameras, are the eyes of automated driving systems. Ensuring that they function reliably and robustly is therefore an important building block in the automation of vehicles. There are various approaches to test the perception of automated driving systems. Ultimately, however, it always comes down to the investigation of the behavior of perception systems under specific input data. Camera images are a crucial part of the input data. Image data sets are therefore collected for the testing of automated driving systems, but it is non-trivial to find specific images in these data sets. Thanks to recent developments in neural networks, there are now methods for sorting the images in a data set according to their similarity to a prompt in natural language. In order to further automate the provision of search results, we make a contribution by automating the threshold definition in these sorted results and returning only the images relevant to the prompt as a result. Our focus is on preventing false positives and false negatives equally. It is also important that our method is robust, and in case our assumptions are not fulfilled, we provide a fallback solution.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
# BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks ( http://arxiv.org/abs/2404.07387v3 ) License: check the linked page	Ruijia Cheng, Titus Barik, Alan Leung, Fred Hohman, Jeffrey Nichols	Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through a user study where 10 novices used BISCUIT for machine learning tutorials, we found that BISCUIT offers users representations of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
# OmniSat: Self-Supervised Modality Fusion for Earth Observation ( http://arxiv.org/abs/2404.08351v2 ) License: check the linked page	Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu	The field of Earth Observations (EO) offers a wealth of data from diverse sensors, presenting a great opportunity for advancing self-supervised multimodal learning. However, current multimodal EO datasets and models focus on a single data type, either mono-date images or time series, which limits their expressivity. We introduce OmniSat, a novel architecture that exploits the spatial alignment between multiple EO modalities to learn expressive multimodal representations without labels. To demonstrate the advantages of combining modalities of different natures, we augment two existing datasets with new modalities. As demonstrated on three downstream tasks (forestry, land cover classification, and crop mapping), OmniSat can learn rich representations in an unsupervised manner, leading to improved performance in the semi- and fully-supervised settings, even when only one modality is available for inference. The code and dataset are available at https://github.com/gastruc/OmniSat.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
# OneActor: Consistent Character Generation via Cluster-Conditioned Guidance ( http://arxiv.org/abs/2404.10267v2 ) License: check the linked page	Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun	Text-to-image diffusion models benefit artists with high-quality image generation. Yet their stochastic nature hinders artists from creating consistent images of the same subject. Existing methods try to tackle this challenge and generate consistent content in various ways. However, they either depend on external restricted data or require expensive tuning of the diffusion model. For this issue, we propose a novel one-shot tuning paradigm, termed OneActor. It efficiently performs consistent subject generation solely driven by prompts via a learned semantic guidance to bypass the laborious backbone tuning. We lead the way to formalize the objective of consistent subject generation from a clustering perspective, and thus design a cluster-conditioned model. To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance. These techniques are later verified to significantly enhance the generation quality. Comprehensive experiments show that our method outperforms a variety of baselines with satisfactory subject consistency, superior prompt conformity as well as high image quality. Our method is capable of multi-subject generation and compatible with popular diffusion extensions. Besides, we achieve a 4 times faster tuning speed than tuning-based baselines and, if desired, avoid increasing inference time. Furthermore, to the best of our knowledge, we are the first to prove that the semantic space of the diffusion model has the same interpolation property as the latent space does. This property can serve as another promising tool for fine generation control.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
# Establishing a Baseline for Gaze-driven Authentication Performance in VR: A Breadth-First Investigation on a Very Large Dataset ( http://arxiv.org/abs/2404.11798v2 ) License: check the linked page	Dillon Lohr, Michael J. Proulx, Oleg Komogortsev	This paper performs the crucial work of establishing a baseline for gaze-driven authentication performance to begin answering fundamental research questions using a very large dataset of gaze recordings from 9202 people with a level of eye tracking (ET) signal quality equivalent to modern consumer-facing virtual reality (VR) platforms. The size of the employed dataset is at least an order of magnitude larger than any other dataset from previous related work. Binocular estimates of the optical and visual axes of the eyes and a minimum duration for enrollment and verification are required for our model to achieve a false rejection rate (FRR) of below 3% at a false acceptance rate (FAR) of 1 in 50,000. In terms of identification accuracy, which decreases with gallery size, we estimate that our model would fall below chance-level accuracy for gallery sizes of 148,000 or more. Our major findings indicate that gaze authentication can be as accurate as required by the FIDO standard when driven by a state-of-the-art machine learning architecture and a sufficiently large training dataset.	Translated: 2024-07-16 04:47:43 Published: 2024-07-12
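The "FRR at a fixed FAR" figure quoted above can be computed from two lists of similarity scores, as the following sketch shows. It assumes higher scores mean a closer match and that an attempt is accepted when its score is strictly above the threshold; the scores and function names are hypothetical and unrelated to the paper's data.

```python
import math


def threshold_at_far(impostor_scores, target_far):
    """Pick a threshold so that the false acceptance rate is at most target_far.

    An attempt is accepted when score > threshold (assumed convention).
    """
    ranked = sorted(impostor_scores, reverse=True)
    allowed = int(target_far * len(ranked))  # number of false accepts we can tolerate
    if allowed >= len(ranked):
        return -math.inf  # every impostor may be accepted
    # Scores strictly above ranked[allowed] number at most `allowed`, even with ties.
    return ranked[allowed]


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor attempts that are (wrongly) accepted."""
    return sum(1 for s in impostor_scores if s > threshold) / len(impostor_scores)


def false_rejection_rate(genuine_scores, threshold):
    """Fraction of genuine attempts that are (wrongly) rejected."""
    return sum(1 for s in genuine_scores if s <= threshold) / len(genuine_scores)


if __name__ == "__main__":
    impostor = [0.1, 0.2, 0.3, 0.4, 0.9]  # hypothetical impostor similarity scores
    genuine = [0.35, 0.5, 0.8, 0.95]      # hypothetical genuine similarity scores
    t = threshold_at_far(impostor, target_far=0.2)
    print(t, false_acceptance_rate(impostor, t), false_rejection_rate(genuine, t))
```

Measuring a FAR as low as 1 in 50,000 this way needs at least 50,000 impostor comparisons, which is one reason a very large dataset matters for this kind of evaluation.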
# 英語からウクライナ語への機械翻訳を改良したデータプリンタのセットアップ Setting up the Data Printer with Improved English to Ukrainian Machine Translation ( http://arxiv.org/abs/2404.15196v2 ) ライセンス: Link先を確認	Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus, Volodymyr Kyrylov,	(参考訳) ウクライナ語のための大規模な言語モデルを構築するには、自然言語で表現された大量の新しいアルゴリズムタスクでコーパスを拡張する必要がある。英語で表現されたタスクパフォーマンスの例は豊富であるため、高品質な翻訳システムでは、コミュニティがデータセットを高速にキュレートすることが可能になります。この目的を達成するために、ウクライナ語と英語の3M対のノイズの多い並列データセットを用いた大規模事前学習言語モデルの教師付き微調整を用いた翻訳システムの構築法を紹介し、それに続いて、より高品質な別のデータセットからk-fold perplexity filteringによって選択された17K例を用いた第2フェーズのトレーニングを行う。我々のデコーダのみのモデルであるDragomanは、FLORESのdevtestセットにおける従来の最先端のエンコーダ-デコーダモデルのパフォーマンスを上回りました。 To build large language models for Ukrainian we need to expand our corpora with large amounts of new algorithmic tasks expressed in natural language. Examples of task performance expressed in English are abundant, so with a high-quality translation system our community will be enabled to curate datasets faster. To aid this goal, we introduce a recipe to build a translation system using supervised finetuning of a large pretrained language model with a noisy parallel dataset of 3M pairs of Ukrainian and English sentences followed by a second phase of training using 17K examples selected by k-fold perplexity filtering on another dataset of higher quality. Our decoder-only model named Dragoman outperforms previous state-of-the-art encoder-decoder models on the FLORES devtest set.	翻訳日:2024-07-16 04:47:43 公開日:2024-07-12
# TOP-Nav:Terrin, Obstacle, Proprioception Estimationを統合した脚付きナビゲーション TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation ( http://arxiv.org/abs/2404.15256v3 ) ライセンス: Link先を確認	Junli Ren, Yikai Liu, Yingru Dai, Junfeng Long, Guijin Wang,	(参考訳) 脚のついたナビゲーションは通常、オープンワールド、オフロード、挑戦的な環境で検査される。これらのシナリオでは、外乱を推定するには、多重モーダル情報の複雑な合成が必要である。これは、主に障害を避けることに焦点を当てた既存の作業において、大きな制限となる。本研究では,包括的パスプランナとTerrain認識,Obstacle回避,クローズループプロプライオセプションを統合した新しい脚付きナビゲーションフレームワークTOP-Navを提案する。 TOP-Navは、経路計画と運動計画の両方において、視覚とプロプレセプションの相乗効果を強調している。経路プランナ内では、障害物を効果的に回避しつつ、高い走行性を有する地形上の経路をロボットが選択できる地形推定器を提示し、統合する。動作計画レベルでは、ナビゲーションコマンドを追跡するために移動制御器を実装できるだけでなく、経路プランナーに動作評価を提供するための受容アドバイザも構築する。クローズループ動作フィードバックに基づいて、視覚に基づく地形と障害物推定のオンライン修正を行う。そのため、TOP-Navは、ロボットが以前の知識の分布を超えて地形や乱れを扱えるように、オープンワールドナビゲーションを実現し、視覚条件によって課される制約を克服する。 TOP-Navは、シミュレーションと実世界の環境の両方で実施された広範な実験に基づいて、既存の手法と比較して、オープンワールドナビゲーションにおいて優れた性能を示す。 Legged navigation is typically examined within open-world, off-road, and challenging environments. In these scenarios, estimating external disturbances requires a complex synthesis of multi-modal information. This underlines a major limitation in existing works that primarily focus on avoiding obstacles. In this work, we propose TOP-Nav, a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and close-loop Proprioception. TOP-Nav underscores the synergies between vision and proprioception in both path and motion planning. Within the path planner, we present and integrate a terrain estimator that enables the robot to select waypoints on terrains with higher traversability while effectively avoiding obstacles. In the motion planning level, we not only implement a locomotion controller to track the navigation commands, but also construct a proprioception advisor to provide motion evaluations for the path planner. 
Based on the close-loop motion feedback, we make online corrections for the vision-based terrain and obstacle estimations. Consequently, TOP-Nav achieves open-world navigation in which the robot can handle terrains or disturbances beyond the distribution of prior knowledge and overcome constraints imposed by visual conditions. Building upon extensive experiments conducted in both simulation and real-world environments, TOP-Nav demonstrates superior performance in open-world navigation compared to existing methods.	翻訳日:2024-07-16 04:47:43 公開日:2024-07-12
# ODMixer:Metro Origin-Destination Predictionのための微細な時空間MLP ODMixer: Fine-grained Spatial-temporal MLP for Metro Origin-Destination Prediction ( http://arxiv.org/abs/2404.15734v2 ) ライセンス: Link先を確認	Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin,	(参考訳) Metro Origin-Destination (OD) 予測は、都市コンピューティングにおいて重要な時空間予測課題であり、メトロスケジューリングを最適化し、全体の輸送効率を向上させるために、クロスステーションライダーシップを正確に予測することを目的としている。駅間の細粒度および包括的関係を効果的に分析することは、メトロOD予測に不可欠である。しかし、既存の地下鉄のODモデルは、駅の視点で複数のODペアからの情報を混合するか、ODペアのサブセットにのみ焦点を合わせている。これらのアプローチはODペア間の微細な関係を見落とし、潜在的な異常な状態を予測するのに困難をもたらす可能性がある。これらの課題に対処するために、すべてのODペアの観点からトラフィックの変動を分析し、ODMixerというメトロOD予測のための微粒な時空間MLPアーキテクチャを提案する。具体的には、ODMixerは二重分岐構造を持ち、Channel Mixer、Multi-view Mixer、Bidirectional Trend Learnerを含む。 Channel MixerはODペア間の短期的時間的関係を捉えることを目的としており、Multi-view Mixerは起源と目的地の両方の観点から関係を捉えることに集中している。長期的な時間的関係をモデル化するために,Bidirectional Trend Learnerを導入する。2つの大規模メトロOD予測データセットHZMODとSHMOの大規模な実験により,ODMixerの利点が示された。私たちのコードは https://github.com/KLatitude/ODMixer から入手可能です。 Metro Origin-Destination (OD) prediction is a crucial yet challenging spatial-temporal prediction task in urban computing, which aims to accurately forecast cross-station ridership for optimizing metro scheduling and enhancing overall transport efficiency. Analyzing fine-grained and comprehensive relations among stations effectively is imperative for metro OD prediction. However, existing metro OD models either mix information from multiple OD pairs from the station's perspective or exclusively focus on a subset of OD pairs. These approaches may overlook fine-grained relations among OD pairs, leading to difficulties in predicting potential anomalous conditions. To address these challenges, we analyze traffic variations from the perspective of all OD pairs and propose a fine-grained spatial-temporal MLP architecture for metro OD prediction, namely ODMixer. Specifically, our ODMixer has a double-branch structure and involves the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner. 
The Channel Mixer aims to capture short-term temporal relations among OD pairs, while the Multi-view Mixer concentrates on capturing relations from both origin and destination perspectives. To model long-term temporal relations, we introduce the Bidirectional Trend Learner. Extensive experiments on two large-scale metro OD prediction datasets HZMOD and SHMO demonstrate the advantages of our ODMixer. Our code is available at https://github.com/KLatitude/ODMixer.	翻訳日:2024-07-16 04:47:43 公開日:2024-07-12
# FAD-SAR:深層学習に基づく合成開口レーダ画像による漁業活動検出システム FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method ( http://arxiv.org/abs/2404.18245v2 ) ライセンス: Link先を確認	Yanbing Bai, Siao Li, Rui-Yang Ju, Zihao Yang, Jinze Yu, Jen-Shiun Chiang,	(参考訳) 違法で、報告されず、規制されていない(IUU)漁業活動は、人間の生活の様々な側面に深刻な影響を及ぼす。しかし,海洋におけるIUU漁活動の検出とモニタリングには限界がある。合成開口レーダ(SAR)は既存の容器検出システムを補完するが,従来の方法でのSAR画像から有用な情報を抽出することは,特にIUU漁では困難である。本稿では, SSD, RetinaNet, FSAF, FCOS, Faster R-CNN, Cascade R-CNNの6つの古典的物体検出モデルを用いて, xView3データセット上に実装された深層学習型漁獲活動検知システムを提案する。さらに、この研究は、より高速なR-CNNモデルの性能を向上させるために、異なる拡張技術を用いている。実験の結果,オンラインハードケースマイニング(OHEM)戦略を用いた高速R-CNNモデルのトレーニングにより,Avg-F1値が0.212から0.216に増加した。 Illegal, unreported, and unregulated (IUU) fishing activities seriously affect various aspects of human life. However, traditional methods for detecting and monitoring IUU fishing activities at sea have limitations. Although synthetic aperture radar (SAR) can complement existing vessel detection systems, extracting useful information from SAR images using traditional methods remains a challenge, especially in IUU fishing. This paper proposes a deep learning based fishing activity detection system, which is implemented on the xView3 dataset using six classical object detection models: SSD, RetinaNet, FSAF, FCOS, Faster R-CNN, and Cascade R-CNN. In addition, this work employs different enhancement techniques to improve the performance of the Faster R-CNN model. The experimental results demonstrate that training the Faster R-CNN model using the Online Hard Example Mining (OHEM) strategy increases the Avg-F1 value from 0.212 to 0.216.	翻訳日:2024-07-16 04:47:43 公開日:2024-07-12
# GRAMMAR:閉領域検索拡張言語モデルの評価のための基礎的およびモジュール的手法 GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model ( http://arxiv.org/abs/2404.19232v5 ) ライセンス: Link先を確認	Xinzhe Li, Ming Liu, Shang Gao,	(参考訳) Retrieval-augmented Generation (RAG) システムは、ドメイン固有の知識ベースを問うために、様々な産業で活発に研究され、展開されている。しかし、これらのシステムを評価することは、ドメイン固有のクエリの不足やそれに対応する基礎的な真実、そして障害の原因を診断するための体系的なアプローチの欠如など、ユニークな課題を示す。これらの課題に対処するために、GRAMMAR(GRounded and Modular Methodology for Assessment of RAG)という2つの要素からなる評価フレームワークを導入する。 1)リレーショナルデータベースとLCMを活用して,スケーラブルな問合せペアを効率よく作成し,評価を行うデータ生成プロセス。この方法は、クエリロジックを言語的バリエーションから分離しやすくし、非ロバストなテキスト形式に関する仮説の検証を可能にする。 2)知識ギャップと堅牢性を区別し,欠陥モジュールの識別を可能にする評価フレームワーク。我々の経験的結果は、モデル脆弱性を正確に識別するために、現在の基準フリー評価手法の限界とGRAMMARの信頼性を裏付けるものである。実装の詳細については、GitHubリポジトリを参照してください。 Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs for evaluation. This method facilitates the separation of query logic from linguistic variations, enabling the testing of hypotheses related to non-robust textual forms; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. 
Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities. For implementation details, refer to our GitHub repository: https://github.com/xinzhel/grammar.	翻訳日:2024-07-16 04:47:43 公開日:2024-07-12
# ドメイン一般化のためのソフトプロンプト生成 Soft Prompt Generation for Domain Generalization ( http://arxiv.org/abs/2404.19286v2 ) ライセンス: Link先を確認	Shuanghao Bai, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen,	(参考訳) 大規模な事前訓練された視覚言語モデル(VLM)は、手動で設計したプロンプトで下流のタスクに印象的なゼロショット能力を示している。 VLMを下流タスクにさらに適応させるために、ソフトプロンプトは、特定のドメインデータに基づいて微調整を行う手作業で設計されたプロンプトを置き換えることが提案されている。事前のプロンプト学習法は、主にトレーニングサンプルから固定されたプロンプトまたは予約されたプロンプトを学習する。しかし、学習したプロンプトは多様性を欠き、目に見えない領域に関する情報を無視する。本稿では,素早い学習フレームワークを生成的観点から再構築し,ドメイン一般化(DG)タスク,すなわちソフト・プロンプト・ジェネレーション(SPG)の簡易かつ効率的な手法を提案する。具体的には、SPGは2段階のトレーニングフェーズと推論フェーズから構成される。トレーニング期間中に、生成モデルドメイン知識を組み込んだソフトプロンプトラベルを各ドメインに導入する。推論フェーズでは、生成モデルのジェネレータを使用して、未知のターゲットドメインに対してインスタンス固有のソフトプロンプトを得る。 3つのDGタスクの5つの領域一般化ベンチマークの大規模な実験は、SPGが最先端のパフォーマンスを達成することを示す。コードはhttps://github.com/renytek13/Soft-Prompt-Generation-with-CGANで公開されている。 Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompt. To further adapt VLMs to downstream tasks, soft prompt is proposed to replace manually designed prompt, which undergoes fine-tuning based on specific domain data. Prior prompt learning methods primarily learn a fixed prompt or residuled prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely Soft Prompt Generation (SPG). Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce soft prompt label for each domain, aiming to incorporate the generative model domain knowledge. During the inference phase, the generator of the generative model is employed to obtain instance-specific soft prompts for the unseen target domain. 
Extensive experiments on five domain generalization benchmarks of three DG tasks demonstrate that SPG achieves state-of-the-art performance. The code is available at https://github.com/renytek13/Soft-Prompt-Generation-with-CGAN.	翻訳日:2024-07-16 04:47:43 公開日:2024-07-12
# WateRF:著作権保護分野におけるロバストな透かし WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights ( http://arxiv.org/abs/2405.02066v4 ) ライセンス: Link先を確認	Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim,	(参考訳) NeRF(Neural Radiance Fields)研究の進歩は、様々な領域に広範な応用をもたらすが、著作権保護はまだ深く研究されていない。近年、NeRFベースの3D表現を安全に展開するための重要なソリューションの1つとして、NeRF透かしが検討されている。しかし、既存の手法は暗黙的あるいは明示的なNeRF表現にのみ適用するように設計されている。本研究では,NeRFの両表現に適用可能な革新的な透かし手法を提案する。これは、NeRFを微調整してバイナリメッセージをレンダリングプロセスに埋め込むことによって実現される。本稿では,NeRF空間における離散ウェーブレット変換を透かしに利用することを提案する。さらに、遅延バックプロパゲーション手法を採用し、パッチワイズ損失と組み合わせることで、最小トレードオフでレンダリング品質とビット精度を向上させる。提案手法は,2次元レンダリング画像に埋め込まれた透かしの容量,可視性,堅牢性の3つの異なる側面で評価する。本手法は、比較した最先端手法よりも高速なトレーニング速度で最先端性能を実現する。 The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# CausalLP:知識グラフの重み付きリンク予測による因果関係の学習 CausalLP: Learning causal relations with weighted knowledge graph link prediction ( http://arxiv.org/abs/2405.02327v2 ) ライセンス: Link先を確認	Utkarshani Jaimini, Cory Henson, Amit P. Sheth,	(参考訳) 因果ネットワークは、医療診断から製造における根本原因分析まで、幅広い用途で有用である。しかし、実際には因果関係が欠如しているため、因果関係は不完全であることが多い。本稿では,知識グラフ補完問題として不完全因果関係の問題を定式化するCausalLPという新しい手法を提案する。より具体的には、不完全な因果ネットワークにおける新たな因果関係を見つけるタスクを知識グラフリンク予測のタスクにマップする。因果関係を表すために知識グラフを用いることは、外部のドメイン知識の統合を可能にし、さらに複雑さとして、因果関係は知識グラフ内のエンティティ間の因果関係の強さを表す重みを持つ。 CausalLPでは、因果的説明と因果的予測という2つの主要なタスクがサポートされている。このアプローチの評価には、因果推論のためのシミュレーションビデオのベンチマークデータセットであるCLEVRER-Humansを使用し、複数の知識グラフ埋め込みアルゴリズムの性能を比較する。 2) 因果関係のマルコフ特性を利用した新しいデータ分割手法であるマルコフスプリット(Markov-based split) と, リンク予測アルゴリズムの評価に一般的に使用されるランダムスプリット(ランダムスプリット)と, マルコフスプリット(Markov-based split) の2つの異なるデータセット分割手法が評価に用いられている。その結果,重み付き因果関係を用いることで,重み付き関係を伴わないベースライン上の因果関係の予測が向上することがわかった。 Causal networks are useful in a wide variety of applications, from medical diagnosis to root-cause analysis in manufacturing. In practice, however, causal networks are often incomplete with missing causal relations. This paper presents a novel approach, called CausalLP, that formulates the issue of incomplete causal networks as a knowledge graph completion problem. More specifically, the task of finding new causal relations in an incomplete causal network is mapped to the task of knowledge graph link prediction. The use of knowledge graphs to represent causal relations enables the integration of external domain knowledge; and as an added complexity, the causal relations have weights representing the strength of the causal association between entities in the knowledge graph. Two primary tasks are supported by CausalLP: causal explanation and causal prediction. An evaluation of this approach uses a benchmark dataset of simulated videos for causal reasoning, CLEVRER-Humans, and compares the performance of multiple knowledge graph embedding algorithms. 
Two distinct dataset splitting approaches are used for evaluation: (1) random-based split, which is the method typically employed to evaluate link prediction algorithms, and (2) Markov-based split, a novel data split technique that utilizes the Markovian property of causal relations. Results show that using weighted causal relations improves causal link prediction over the baseline without weighted relations.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# コンフォーマル性, コンバブレーション, 偽装:多言語LLMコラボレーションにおけるペルソナの不整合 Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration ( http://arxiv.org/abs/2405.03862v2 ) ライセンス: Link先を確認	Razan Baltaji, Babak Hemmatian, Lav R. Varshney,	(参考訳) マルチエージェントAIシステムは、科学的および実践的な応用において、集合的な意思決定をシミュレートするために使用することができる。また、チャットボットパイプラインに多様なグループディスカッションステップを導入して、チャットボットの応答の文化的感受性を高めるためにも使用できる。しかしながら、これらのアプリケーションは、AIエージェントが割り当てられたペルソナを確実に採用し、人間のインタラクションを模倣する能力に基づいている。 LLMエージェントがこれらの要件を満たす能力を評価するために、カルチャーコラボレーションや議論に携わるAIエージェントのアンサンブルを、個人の反応やチャットの書き起こしを分析して検討する。本研究は, 多様な視点を反映した集団的意思決定を促すことが示唆されるが, この利益は, 対人的プレッシャーや一貫したペルソナや意見を維持する上での課題により, エージェントの適合性への感受性によって誘惑される。協力よりも意見を支持する上での議論を促す指示は、矛盾の度合いを増大させる。私たちが特定した要因に対処しない限り、より文化的に多様なAI出力や、グループ意思決定のより現実的なシミュレーションを生成するマルチエージェントフレームワークの潜在能力は未完成のままである。 Multi-agent AI systems can be used for simulating collective decision-making in scientific and practical applications. They can also be used to introduce a diverse group discussion step in chatbot pipelines, enhancing the cultural sensitivity of the chatbot's responses. These applications, however, are predicated on the ability of AI agents to reliably adopt assigned personas and mimic human interactions. To evaluate the ability of LLM agents to satisfy these requirements, we examine AI agent ensembles engaged in cultural collaboration and debate by analyzing their private responses and chat transcripts. Our findings suggest that multi-agent discussions can encourage collective decisions that reflect diverse perspectives, yet this benefit is tempered by the agents' susceptibility to conformity due to perceived peer pressure and challenges in maintaining consistent personas and opinions. Instructions that encourage debate in support of one's opinions rather than collaboration increase the rate of inconstancy. 
Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs or more realistic simulations of group decision-making will remain untapped.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# ViewFormer: View-Guided Transformer を用いた多視点3次元動作知覚のための時空間モデリング ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers ( http://arxiv.org/abs/2405.04299v2 ) ライセンス: Link先を確認	Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang,	(参考訳) シナリオを駆動するための高度な認識技術である3D占有は、物理空間をグリッドマップに定量化することで、前景と背景を区別することなく、シーン全体を表現している。画像特徴を3次元表現に変換するのに効率的で、広く採用されているプロジェクションファーストの変形可能な注意力は、センサーの配置制約によるマルチビュー機能集約の課題に遭遇する。この問題に対処するために,効果的な多視点特徴集約のための学習優先視点アテンション機構を提案する。さらに,マップ構築や3Dオブジェクト検出など,多視点3Dタスクにまたがるビューアテンションのスケーラビリティについても紹介する。提案するビューアテンションと,追加のマルチフレームストリーミング時間アテンションを活用して,時空間特徴アグリゲーションのための視覚中心のトランスフォーマーベースのフレームワークであるViewFormerを紹介する。占有レベルのフロー表現をさらに探求するため,既存の高品質データセット上に構築されたベンチマークであるFlowOcc3Dを紹介した。このベンチマークの質的および定量的分析は、きめ細かいダイナミックなシーンを表現する可能性を明らかにする。大規模な実験により,本手法は従来手法よりも有意に優れていたことがわかった。コードは \url{https://github.com/ViewFormerOcc/ViewFormer-Occ} で公開されている。 3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to sensor deployment constraints. To address this issue, we propose our learning-first view attention mechanism for effective multi-view feature aggregation. Moreover, we showcase the scalability of our view attention across diverse multi-view 3D tasks, including map construction and 3D object detection. Leveraging the proposed view attention as well as an additional multi-frame streaming temporal attention, we introduce ViewFormer, a vision-centric transformer-based framework for spatiotemporal feature aggregation. To further explore occupancy-level flow representation, we present FlowOcc3D, a benchmark built on top of existing high-quality datasets. 
Qualitative and quantitative analyses on this benchmark reveal the potential to represent fine-grained dynamic scenes. Extensive experiments show that our approach significantly outperforms prior state-of-the-art methods. The codes are available at https://github.com/ViewFormerOcc/ViewFormer-Occ.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# vAttention: PagedAttention のない LLM 実行のための動的メモリ管理 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention ( http://arxiv.org/abs/2405.04437v2 ) ライセンス: Link先を確認	Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar,	(参考訳) 高スループットLLM推論には,GPUメモリの効率的な管理が不可欠である。以前のシステムではKVキャッシュのメモリを前もって保存していたため、内部の断片化が原因で容量が無駄になった。需要パージングにインスパイアされたvLLMは、KV-cacheの動的メモリ割り当てを可能にするPagedAttentionを提案した。このアプローチは断片化を排除し、全体のサービスを改善する。しかし、物理メモリを動的に割り当てるために、PagedAttentionはKV-cacheのレイアウトを連続的な仮想メモリから連続しない仮想メモリに変更した。結果として、ページングをサポートするためにアテンションカーネルを書き換え、サービスフレームワークにメモリマネージャを実装する必要がある。これにより、パフォーマンスとプログラミングのオーバーヘッドと、最先端の注目カーネルを採用する際の移植性の問題の両方が生じる。本稿では,動的KVキャッシュメモリ管理のための新しいアプローチであるvAttentionを提案する。 PagedAttentionとは対照的に、vAttentionはKV-cacheを連続した仮想メモリに格納し、物理メモリのオンデマンド割り当てにOSサポートを活用する。 vAttentionは、コードを書き換えることなく、物理メモリの動的アロケーションのサポートを追加することで、最先端の注目カーネルをすぐに使えるようにする。我々は、vLLMサービススタックにvAttentionを実装し、FlashAttentionとFlashInferの最先端のPagedAttentionベースのカーネルに比べて、最大1.99倍のデコードスループット、最大1.22倍と1.29倍のエンドツーエンドサービススループットを向上させることを実証した。 Efficient management of GPU memory is essential for high throughput LLM inference. Prior systems used to reserve KV-cache memory ahead-of-time that resulted in wasted capacity due to internal fragmentation. Inspired by demand paging, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation and improves serving throughout. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. As a consequence, one needs to rewrite the attention kernels to support paging, and implement a memory manager in the serving framework. This results in both performance and programming overheads, as well as portability challenges in adopting state-of-the-art attention kernels. In this paper, we propose vAttention, a new approach for dynamic KV-cache memory management. 
In contrast to PagedAttention, vAttention stores KV-cache in contiguous virtual memory and leverages OS support for on-demand allocation of physical memory. vAttention thus enables one to use state-of-the-art attention kernels out-of-the-box by adding support for dynamic allocation of physical memory without having to rewrite their code. We implement vAttention in the vLLM serving stack to show that it also helps improve decode throughput by up to 1.99x over vLLM, and the end-to-end serving throughput by up to 1.22x and 1.29x, compared to using the state-of-the-art PagedAttention based kernels of FlashAttention and FlashInfer.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# ProLLM:タンパク質とタンパク質の相互作用予測のためのLLMの強化 ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction ( http://arxiv.org/abs/2405.06649v2 ) ライセンス: Link先を確認	Mingyu Jin, Haochen Xue, Zhenting Wang, Boming Kang, Ruosong Ye, Kaixiong Zhou, Mengnan Du, Yongfeng Zhang,	(参考訳) タンパク質-タンパク質相互作用(PPI)の予測は、生物学的機能や疾患を理解する上で重要である。 PPI予測に対する従来の機械学習アプローチは、主に直接物理的相互作用に焦点を当てており、中間タンパク質による非物理的接続の広いコンテキストを無視し、その効果を制限している。大規模言語モデル(LLM)の出現は、この複雑な生物学的課題に対処する新たな機会を提供する。構造化されたデータを自然言語のプロンプトに変換することで、タンパク質間の関係をテキストにマッピングできる。このアプローチにより、LLMはタンパク質間の間接的な接続を識別し、上流から下流への経路をトレースすることができる。そこで本研究では,PPIに適したLLMを用いた新しいフレームワークProLLMを提案する。具体的には、自然言語のプロンプトとしてシグナル伝達経路の生物学的機構を複製する、思考のタンパク質鎖(ProCoT)を提案する。 ProCoTはシグナル伝達経路を、上流タンパク質から始まり、いくつかの中間タンパク質を通過して下流タンパク質に生物学的シグナルを伝達するタンパク質推論過程とみなしている。したがって、上流タンパクと下流タンパクとの相互作用を予測するためにProCoTを使用することができる。 ProLLMのトレーニングには、複雑な生物学的問題に対するモデルの理解を深めるProCoTフォーマットが使用されている。本稿では,ProCoTに加えて,自然言語のプロンプトにタンパク質サイトを埋め込む方法の探索や,タンパク質知識データセットの微調整の指導にも貢献する。本稿では,ベンチマークデータセットに対する厳密な検証による ProLLM の有効性を実証し,予測精度と一般化性の観点から既存手法よりも大幅に向上したことを示す。コードは https://github.com/MingyuJ666/ProLLM で入手できる。 The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases. Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions, ignoring the broader context of nonphysical connections through intermediate proteins, thus limiting their effectiveness. The emergence of Large Language Models (LLMs) provides a new opportunity for addressing this complex biological challenge. By transforming structured data into natural language prompts, we can map the relationships between proteins into texts. This approach allows LLMs to identify indirect connections between proteins, tracing the path from upstream to downstream. Therefore, we propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time. 
Specifically, we propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as natural language prompts. ProCoT considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins. Thus, we can use ProCoT to predict the interaction between upstream proteins and downstream proteins. The training of ProLLM employs the ProCoT format, which enhances the model's understanding of complex biological problems. In addition to ProCoT, this paper also contributes to the exploration of embedding replacement of protein sites in natural language prompts, and instruction fine-tuning in protein knowledge datasets. We demonstrate the efficacy of ProLLM through rigorous validation against benchmark datasets, showing significant improvement over existing methods in terms of prediction accuracy and generalizability. The code is available at: https://github.com/MingyuJ666/ProLLM.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# 静的AI評価を超えて: LLMの害とリスクに対する人間のインタラクション評価を前進させる Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks ( http://arxiv.org/abs/2405.10632v5 ) ライセンス: Link先を確認	Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung,	(参考訳) モデル評価は、AIシステムの安全性、リスク、社会的影響を理解する上で重要である。ほとんどの実世界のAIアプリケーションは人間とAIのインタラクションを含んでいるが、AIモデルの現在の評価(例えば、一般的なベンチマーク)はそうではない。その代わりに、人間的要因を限定的に組み込んで、モデルの安全性を個別に評価することで、人間とモデルの相互作用の複雑さを捉えることができない。本稿では,人-モデルインタラクションの評価や,モデルを用いた人-モデルインタラクションのプロセスと結果に焦点をあてた,新たな評価カテゴリ"ヒューマンインタラクション評価" (HIEs) の定義と運用について論じる。まず、HIEは安全性評価の妥当性を高め、直接人的影響と相互作用特異的害を評価し、モデルによる社会的影響の今後の評価を導くために使用できると論じる。第2に,安全性を重視したHIE設計フレームワーク(人-LLM相互作用分類を含む)について,(1)危険領域の同定,(2)使用状況の特徴付け,(3)評価パラメータの選択の3段階について提案する。第3に、過信と説得リスクの2つの潜在的評価に我々の枠組みを適用します。最後に,HIEのコスト,複製性,非表現性に関する懸念に対処するための具体的な勧告を述べる。 Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. 
Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# グラフバックドア攻撃を再考する: 分散保存の観点から Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective ( http://arxiv.org/abs/2405.10757v3 ) ライセンス: Link先を確認	Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang,	(参考訳) グラフニューラルネットワーク(GNN)は、様々なタスクにおいて顕著なパフォーマンスを示している。しかし、最近の研究によると、GNNはバックドア攻撃に弱い。一般的に、バックドア攻撃は、トレーニンググラフ内の一連のノードにバックドアトリガとターゲットクラスラベルをアタッチすることで、グラフを毒する。有毒グラフでトレーニングされたGNNは、ターゲットクラスにトリガが付いたテストノードを予測するために誤解される。その効果にもかかわらず、我々の経験的分析は、既存の方法によって生成されるトリガーは、クリーンデータと大きく異なる分布外(OOD)である傾向があることを示している。したがって、これらのインジェクショントリガーは、現実世界のアプリケーションで広く使われている外れ値検出法で容易に検出および切断することができる。そこで本稿では,IDトリガによる無意味なグラフバックドア攻撃の新たな問題について検討する。我々は,IDトリガを生成するために,OOD検出器を逆学習戦略と組み合わせて導入し,分散中のトリガの属性を生成する。 IDトリガによる高い攻撃成功率を確保するため,有毒グラフで訓練した被害者モデルによるトリガ記憶の促進を目的とした新しいモジュールを提案する。実世界のデータセットに対する大規模な実験は、高い攻撃成功率を維持しながら、様々な防衛戦略をバイパスできる分散トリガの生成において、提案手法の有効性を実証している。 Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with trigger to the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), which significantly differ from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within distribution. 
To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on the poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in-distribution triggers that can bypass various defense strategies while maintaining a high attack success rate.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# ベイズ学習によるクラスインクリメンタル学習のための原型コントラスト損失 Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning ( http://arxiv.org/abs/2405.11067v2 ) ライセンス: Link先を確認	Nisha L. Raichur, Lucas Heublein, Tobias Feigl, Alexander Rügamer, Christopher Mutschler, Felix Ott,	(参考訳) 連続学習における手法の主な目的は、破滅的な忘れ込みの有害な現象を軽減しつつ、データのストリームから連続的にタスクを学習することである。本稿では,従来のプロトタイプと新たに遭遇したプロトタイプの最適な表現を学習することに焦点を当てる。本稿では,クラス増分学習シナリオに特化して,ベイズ学習駆動型コントラスト損失(BLCL)を持つプロトタイプネットワークを提案する。そこで我々は,クラス間距離を小さくし,クラス間距離を増大させることにより,新しいクラスを潜在表現に組み込むコントラスト的損失を導入する。提案手法は,ベイズ学習手法を用いて,クロスエントロピーとコントラスト損失関数のバランスを動的に適用する。 CIFAR-10 と CIFAR-100 による画像分類と干渉分類のための GNSS ベースデータセットの画像化による実験的な評価により,提案手法の有効性を検証し,既存の最先端手法よりも優れていることを示す。 The primary objective of methods in continual learning is to learn tasks in a sequential manner over time from a stream of data, while mitigating the detrimental phenomenon of catastrophic forgetting. In this paper, we focus on learning an optimal representation between previous class prototypes and newly encountered ones. We propose a prototypical network with a Bayesian learning-driven contrastive loss (BLCL) tailored specifically for class-incremental learning scenarios. Therefore, we introduce a contrastive loss that incorporates new classes into the latent representation by reducing the intra-class distance and increasing the inter-class distance. Our approach dynamically adapts the balance between the cross-entropy and contrastive loss functions with a Bayesian learning technique. Empirical evaluations conducted on both the CIFAR-10 and CIFAR-100 dataset for image classification and images of a GNSS-based dataset for interference classification validate the efficacy of our method, showcasing its superiority over existing state-of-the-art approaches.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# リカレントグラフニューラルネットワークのリアルとフロートによる論理的特性評価 Logical Characterizations of Recurrent Graph Neural Networks with Reals and Floats ( http://arxiv.org/abs/2405.14606v2 ) ライセンス: Link先を確認	Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Carsten Lutz,	(参考訳) 2019年の先駆的な研究の中で、Barcel\'o氏と共著者は、一階述語論理で定義可能な特性に対して、定数反復深度グラフニューラルネットワーク(GNN)の表現力に正確に一致するロジックを特定した。本稿では,(1)浮動小数点数の設定と(2)実数の設定の2つのシナリオにおいて,繰り返しGNNの正確な論理的特徴を与える。フロートに対して、繰り返しGNNと一致する形式主義は数えられる規則に基づくモーダル論理であり、実数に対しては数えるにも適切な無限のモーダル論理を用いる。これらの結果は、どちらの場合もバックグラウンド論理に関連付けることなく、繰り返し設定における論理とGNNの正確な一致を与えるが、浮動小数点演算に関する自然な仮定を用いる。キャラクタリゼーションを適用することで、モナディック二階述語論理(MSO)で定義可能なグラフ特性と比較して、無限論理と規則論理は等しく表現力があることも証明できる。これは、実数とフロートを持つリカレントGNNが、MSO定義可能な性質に対して同じ表現力を持つことを意味し、そのような性質に対して、実数を持つリカレントGNNも(最終!)ルールに基づくモーダル論理によって特徴づけられることを示している。一般的には、フロートによる表現力は実数よりも弱い。論理指向の結果に加えて、分散オートマトンを用いて、実数とフロートの両方を持つ繰り返しGNNを特徴付け、分散コンピューティングモデルへのリンクを描画する。 In pioneering work from 2019, Barcel\'o and coauthors identified logics that precisely match the expressive power of constant iteration-depth graph neural networks (GNNs) relative to properties definable in first-order logic. In this article, we give exact logical characterizations of recurrent GNNs in two scenarios: (1) in the setting with floating-point numbers and (2) with reals. For floats, the formalism matching recurrent GNNs is a rule-based modal logic with counting, while for reals we use a suitable infinitary modal logic, also with counting. These results give exact matches between logics and GNNs in the recurrent setting without relativising to a background logic in either case, but using some natural assumptions about floating-point arithmetic. Applying our characterizations, we also prove that, relative to graph properties definable in monadic second-order logic (MSO), our infinitary and rule-based logics are equally expressive. 
This implies that recurrent GNNs with reals and floats have the same expressive power over MSO-definable properties and shows that, for such properties, recurrent GNNs with reals are also characterized by a (finitary!) rule-based modal logic. In the general case, in contrast, the expressive power with floats is weaker than with reals. In addition to logic-oriented results, we also characterize recurrent GNNs, with both reals and floats, via distributed automata, drawing links to distributed computing models.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# Alistair: 差分生産型広告測定システムのためのデバイス上での効率的な予算化 Alistair: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems ( http://arxiv.org/abs/2405.16719v2 ) ライセンス: Link先を確認	Pierre Tholoniat, Kelly Kostopoulou, Peter McNeely, Prabhpreet Singh Sodhi, Anirudh Varanasi, Benjamin Case, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer,	(参考訳) 主要なブラウザからのサードパーティ製クッキーの削除や、新しいプライバシー保護広告APIの導入によって、調査コミュニティは、Webのプライバシーを質的に改善する業界を支援する機会を、タイムリーに持っている。本稿では、既存のプライバシー保護広告計測APIを強化するため、W3Cコミュニティグループ内での取り組みについて論じる。 Google、Apple、Meta、Mozillaのデザインを分析し、より厳格で効率的な差分プライバシー(DP)予算コンポーネントでそれらを強化します。当社のアプローチはAlistairと呼ばれ、明確に定義されたDP保証を強制し、広告主がより正確なプライベートな測定クエリを実行できるようにする。 DPの個々の形態でプライバシー保証をフレーミングすることで、従来のDP定義を使用するシステムよりもDP予算を効率的にすることができる。 AlistairをChromeに組み込んで、マイクロベンチマークや広告データセットで評価します。すべてのワークロードにおいて、Alistairは、同等のDP保護の下でより多くの広告測定を可能にする点で、ベースラインを著しく上回る。 With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.	翻訳日:2024-07-16 04:37:57 公開日:2024-07-12
# Mini-Netによる医用画像分割の促進:医用画像の効率的な分別を目的とした軽量化 Advancing Medical Image Segmentation with Mini-Net: A Lightweight Solution Tailored for Efficient Segmentation of Medical Images ( http://arxiv.org/abs/2405.17520v3 ) ライセンス: Link先を確認	Syed Javed, Tariq M. Khan, Abdul Qayyum, Arcot Sowmya, Imran Razzak,	(参考訳) 医用画像における解剖学的構造と異常の正確なセグメンテーションは,コンピュータによる診断・解析に不可欠である。このタスクではディープラーニングの技術が優れていますが、その計算要求は課題を引き起こします。また, 一般的な物体分割には有効であるが, 医用画像には最適でない部分分割法もある。これらの課題に対処するために,医用画像に特化して設計された軽量セグメンテーションネットワークであるMini-Netを提案する。パラメータが38,000未満のMini-Netは、高周波数と低周波数の両方の機能を効率的にキャプチャし、様々な医療画像シナリオにおけるリアルタイムのアプリケーションを可能にする。 DRIVE, STARE, ISIC-2016, ISIC-2018, MoNuSegなどの各種データセット上でMini-Netを評価し, 最先端手法と比較して, その堅牢性と優れた性能を示す。 Accurate segmentation of anatomical structures and abnormalities in medical images is crucial for computer-aided diagnosis and analysis. While deep learning techniques excel at this task, their computational demands pose challenges. Additionally, some cutting-edge segmentation methods, though effective for general object segmentation, may not be optimised for medical images. To address these issues, we propose Mini-Net, a lightweight segmentation network specifically designed for medical images. With fewer than 38,000 parameters, Mini-Net efficiently captures both high- and low-frequency features, enabling real-time applications in various medical imaging scenarios. We evaluate Mini-Net on various datasets, including DRIVE, STARE, ISIC-2016, ISIC-2018, and MoNuSeg, demonstrating its robustness and good performance compared to state-of-the-art methods.	翻訳日:2024-07-16 04:27:57 公開日:2024-07-12
# ブロックチェーン検証器のジレンマに対するPeer-Predictionソリューションは2つ It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma ( http://arxiv.org/abs/2406.01794v2 ) ライセンス: Link先を確認	Zishuo Zhao, Xi Chen, Yuan Zhou,	(参考訳) ブロックチェーンシステムのセキュリティは、基本的には、大多数の当事者が誠実に振る舞う分散コンセンサスに基づいており、ブロックチェーンシステムの堅牢性を維持するためには、コンテンツ検証のプロセスが不可欠である。しかし、不正行為者が少ない、あるいは全くないセキュアなブロックチェーンシステムが、検証者が正直に検証を行うのに十分なインセンティブを与えられないという現象は、検証者のジレンマと呼ばれ、ブロックチェーンシステムの基本的なセキュリティを著しく損なう可能性がある。既存の研究は遅延検証の非インセンティブ化のために意図的にエラーを挿入しようと試みているが、分散環境は検証の正しさを判断したり、悪意のある検証を直接検出することは不可能である。本稿では,複数の検証者間での分散検証ゲームのためのベイズ的真理機構の設計に対するピア予測手法を活用する研究を開始し,検証プロセスにおけるノイズ観測の存在下においても,基礎的真理にアクセスせずに誠実な検証を行うよう,検証者全員にインセンティブを与える。理論的に検証ゲームのメカニズムの真実性を保証することで、当社の作業は、ブロックチェーンやその他の分散システムのセキュリティと堅牢性を向上する検証メカニズムのフレームワークを提供します。 The security of blockchain systems is fundamentally based on the decentralized consensus in which the majority of parties behave honestly, and the process of content verification is essential to keep the robustness of blockchain systems. However, the phenomenon that a secure blockchain system with few or no cheaters could not provide sufficient incentive for verifiers to honestly perform the costly verification, referred to as the Verifier's Dilemma, could severely undermine the fundamental security of blockchain systems. While existing works have attempted to insert deliberate errors to disincentivize lazy verification, the decentralized environment makes it impossible to judge the correctness of verification or detect malicious verifiers directly. In this paper, we initiate the research that leverages the peer prediction approach towards the design of Bayesian truthful mechanisms for the decentralized verification game among multiple verifiers, incentivizing all verifiers to perform honest verification without access to the ground truth even in the presence of noisy observations in the verification process. 
With theoretically guaranteed truthfulness of our mechanism for the verification game, our work provides a framework of verification mechanisms that enhances the security and robustness of the blockchain and potentially other decentralized systems.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# Pseudo-Label Filtering for Continual Test-Time Adaptation Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation ( http://arxiv.org/abs/2406.02609v2 ) ライセンス: Link先を確認	Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang,	(参考訳) 連続的テスト時間適応(CTTA)は、ソースデータにアクセスすることなく、テストフェーズ中に対象ドメインのシーケンスに事前訓練されたモデルを適用することを目的としている。未知のドメインからのラベルのないデータに適応するために、既存のメソッドは、すべてのサンプルに対して擬似ラベルを構築し、自己学習を通じてモデルを更新する。しかし、これらの擬似ラベルは、しばしばノイズを伴い、適応が不十分になる。 Pseudo Labeling Filter (PLF) と呼ばれるCTTAの擬似ラベル選択法を提案する。 PLFの鍵となる考え方は、擬似ラベルの適切なしきい値を選択し続け、自己学習のための信頼できるしきい値を特定することである。具体的には、初期化、成長、多様性を含む、継続的なドメイン学習の間にしきい値を設定するための3つの原則を提示します。これらの原則に基づいて、擬似ラベルをフィルタするために自己適応型閾値を設計する。さらに、未知のドメインサンプルに対して多様な予測を行うようモデルに促すために、クラス優先アライメント(CPA)手法を導入する。広範な実験を通じて、PLFは現在の最先端の手法よりも優れており、CTTAにおいてその効果が証明されている。 Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.	
翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# 大規模都市道路網における交通信号の最適化 -Isingモデルを用いた適応予測制御- Traffic signal optimization in large-scale urban road networks: an adaptive-predictive controller using Ising models ( http://arxiv.org/abs/2406.03690v2 ) ライセンス: Link先を確認	Daisuke Inoue, Hiroshi Yamashita, Kazuyuki Aihara, Hiroaki Yoshida,	(参考訳) カーボン中立性を達成するためには,スムーズな交通流の実現が重要である。交通条件を考慮した適応的な交通信号制御が注目されている。しかし, 計算負荷が大きいため, 既存の制御手法を用いることで, 大都市全体での車両の最適走行を確保することは困難である。本稿では,AMPIC(Adaptive Model Predictive Ising Controller)と呼ばれる,スケーラビリティと最適性の両方を保証する制御手法を提案する。提案手法では,車両流の予測モデルを明確に考慮し,各制御区間における最適制御問題の解法としてモデル予測制御を用いる。この最適制御問題は、いわゆるイジング問題と同等のバイナリ変数を持つ組合せ最適化問題に変換される。この変換により、広く研究され、高速かつ効率的な最適化性能が期待されているIsingソルバが利用可能となる。現実的な都市道路網のための微視的交通シミュレータを用いて数値実験を行った。その結果、AMPICは従来の制御方式よりも待ち時間が少なく、より高速な走行が可能であり、結果としてCO2排出量は減少することがわかった。長い予測地平線を持つモデル予測手法は、制御性能を効果的に向上させる。モデル都市におけるシステムパラメトリック研究は,提案手法が大都市道路網のスムーズな交通流を実現することを示唆している。イジング解法のうち、D-Waveの量子アニールは、妥当な計算コストで最適に近い解を見つけることが示されている。 Realizing smooth traffic flow is important for achieving carbon neutrality. Adaptive traffic signal control, which considers traffic conditions, has thus attracted attention. However, it is difficult to ensure optimal vehicle flow throughout a large city using existing control methods because of their heavy computational load. Here, we propose a control method called AMPIC (Adaptive Model Predictive Ising Controller) that guarantees both scalability and optimality. The proposed method employs model predictive control to solve an optimal control problem at each control interval with explicit consideration of a predictive model of vehicle flow. This optimal control problem is transformed into a combinatorial optimization problem with binary variables that is equivalent to the so-called Ising problem. This transformation allows us to use an Ising solver, which has been widely studied and is expected to have fast and efficient optimization performance. We performed numerical experiments using a microscopic traffic simulator for a realistic city road network. 
The results show that AMPIC enables faster vehicle cruising speed with less waiting time than that achieved by classical control methods, resulting in lower CO2 emissions. The model predictive approach with a long prediction horizon thus effectively improves control performance. Systematic parametric studies on model cities indicate that the proposed method realizes smoother traffic flows for large city road networks. Among Ising solvers, D-Wave's quantum annealing is shown to find near-optimal solutions at a reasonable computational cost.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
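The abstract above states that a binary optimal control problem is transformed into an Ising problem. The following is a minimal sketch of that general kind of transformation, not the AMPIC formulation itself: it rewrites a small quadratic cost over binary variables x in {0, 1} as an Ising energy over spins s in {-1, +1} using the substitution x = (1 + s) / 2. The function names, the toy coefficients, and the brute-force search are all assumptions made for illustration.

```python
from itertools import product


def qubo_to_ising(linear, quadratic):
    """Rewrite sum_i a_i x_i + sum_{i<j} b_ij x_i x_j (x in {0, 1})
    as offset + sum_i h_i s_i + sum_{i<j} J_ij s_i s_j (s in {-1, +1}),
    using the substitution x_i = (1 + s_i) / 2."""
    h = [a / 2 for a in linear]      # a_i x_i contributes a_i/2 * s_i ...
    offset = sum(linear) / 2         # ... plus a constant a_i/2
    J = {}
    for (i, j), b in quadratic.items():
        # b x_i x_j = b/4 * (1 + s_i + s_j + s_i s_j)
        J[(i, j)] = b / 4
        h[i] += b / 4
        h[j] += b / 4
        offset += b / 4
    return h, J, offset


def qubo_energy(x, linear, quadratic):
    """Cost of a binary assignment x under the original formulation."""
    return (sum(a * xi for a, xi in zip(linear, x))
            + sum(b * x[i] * x[j] for (i, j), b in quadratic.items()))


def ising_energy(s, h, J, offset):
    """Energy of a spin assignment s under the Ising formulation."""
    return (offset
            + sum(hi * si for hi, si in zip(h, s))
            + sum(Jij * s[i] * s[j] for (i, j), Jij in J.items()))


# Hypothetical toy cost over three binary decisions (not taken from the paper).
linear = [-1.0, -1.0, 0.5]
quadratic = {(0, 1): 2.0, (1, 2): -1.0}

h, J, offset = qubo_to_ising(linear, quadratic)

# Brute force stands in for an Ising solver on this tiny instance.
best = min(product([0, 1], repeat=3),
           key=lambda x: qubo_energy(x, linear, quadratic))
```

Because the two energies agree for every assignment, any solver that minimizes the Ising energy also minimizes the original binary cost; a real controller would replace the brute-force search with a dedicated Ising solver.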
# 安定度エントロピーの普遍飽和による量子複雑性の探索 Probing quantum complexity via universal saturation of stabilizer entropies ( http://arxiv.org/abs/2406.04190v2 ) ライセンス: Link先を確認	Tobias Haug, Leandro Aolita, M. S. Kim,	(参考訳) 非安定化器性 (nonstabilizerness) または 'magic' は量子コンピューティングの鍵となる資源であり、量子優位性に必要な条件である。非クリフォード演算は安定化器状態を資源状態に変え、安定化器R'enyi entropies (SREs)のような資源測度によって非安定化器の量を定量化する。ここでは,SREが臨界数の非クリフォード演算でその最大値を飽和させることを示す。臨界点に近いSREは普遍的な振舞いを示す。顕著なことに、SREの微分は、キュービットの数とは無関係に同じ点で交差し、単一の曲線に再スケールすることができる。臨界点は R'enyi index $\alpha$ に非自明に依存していることが分かる。 Tゲートをドープしたランダムなクリフォード回路の場合、臨界Tゲート密度は$\alpha$とは独立にスケールする。対照的に、ランダムなハミルトン進化の場合、臨界時間は、$\alpha>1$ のキュービット数で線形にスケールするが、$\alpha<1$ の定数は$\alpha<1$ である。このことは、$\alpha$-SREsは、基本的には$\alpha$:$\alpha$-SREsと$\alpha<1$は、Cliffordシミュレーションの複雑さに関連する。技術的貢献として、ランダム進化のパウリスペクトルは2つの高度集中ピークによって近似され、SREを計算することができる。さらに、ランダムなクリフォード回路と回転として表現できるランダムな進化のクラスを導入し、その正確なSREを提供する。量子システムの複雑性を特徴付ける新しい手法が提案されている。 Nonstabilizerness or `magic' is a key resource for quantum computing and a necessary condition for quantum advantage. Non-Clifford operations turn stabilizer states into resourceful states, where the amount of nonstabilizerness is quantified by resource measures such as stabilizer R\'enyi entropies (SREs). Here, we show that SREs saturate their maximum value at a critical number of non-Clifford operations. Close to the critical point SREs show universal behavior. Remarkably, the derivative of the SRE crosses at the same point independent of the number of qubits and can be rescaled onto a single curve. We find that the critical point depends non-trivially on R\'enyi index $\alpha$. For random Clifford circuits doped with T-gates, the critical T-gate density scales independently of $\alpha$. In contrast, for random Hamiltonian evolution, the critical time scales linearly with qubit number for $\alpha>1$, while is a constant for $\alpha<1$. 
This highlights that $\alpha$-SREs reveal fundamentally different aspects of nonstabilizerness depending on $\alpha$: $\alpha$-SREs with $\alpha<1$ relate to Clifford simulation complexity, while those with $\alpha>1$ probe the distance to the closest stabilizer state and approximate state certification cost via Pauli measurements. As technical contributions, we observe that the Pauli spectrum of random evolution can be approximated by two highly concentrated peaks, which allows us to compute its SRE. Further, we introduce a class of random evolution that can be expressed as random Clifford circuits and rotations, where we provide its exact SRE. Our results open up new approaches to characterize the complexity of quantum systems.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
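As a small illustration of the quantity discussed in the abstract above, the sketch below evaluates a stabilizer Rényi entropy for a single qubit. It assumes the commonly used definition M_alpha = (1 - alpha)^(-1) ln( 2^(-N) sum_P <psi|P|psi>^(2 alpha) ), with the sum running over all N-qubit Pauli strings; the abstract itself does not spell out the formula, so treat this as an assumption. For one qubit the Pauli expectation values of I, X, Y, Z are simply 1 and the three Bloch-vector components. The function name and the example states are illustrative choices.

```python
import math


def single_qubit_sre(bloch, alpha):
    """Stabilizer Renyi entropy of a pure single-qubit state.

    bloch: (x, y, z) Bloch vector of the state (unit length for pure states).
    alpha: Renyi index, any positive value except 1.
    """
    if alpha == 1:
        raise ValueError("alpha = 1 requires the limiting (Shannon-like) form")
    x, y, z = bloch
    # Expectation values of the four single-qubit Paulis I, X, Y, Z.
    expectations = [1.0, x, y, z]
    total = sum(abs(e) ** (2 * alpha) for e in expectations)
    # N = 1 qubit, so the normalisation 2^(-N) is 1/2.
    return math.log(total / 2) / (1 - alpha)


# |0> is a stabilizer state; its Bloch vector points along +z.
zero_state = (0.0, 0.0, 1.0)
# The T-type state (|0> + e^{i pi/4} |1>) / sqrt(2) lies in the x-y plane.
t_state = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)

sre_zero = single_qubit_sre(zero_state, alpha=2)
sre_t = single_qubit_sre(t_state, alpha=2)
```

Under this definition the stabilizer state gives zero, while the T-type state gives a strictly positive value, which is the sense in which SREs quantify nonstabilizerness.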
# サーキットブレーカによるアライメントとロバスト性の改善 Improving Alignment and Robustness with Circuit Breakers ( http://arxiv.org/abs/2406.04313v4 ) ライセンス: Link先を確認	Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks,	(参考訳) AIシステムは有害な行動をとることができ、敵の攻撃に対して非常に脆弱である。本稿では,近年の表現工学の進歩に触発されて,有害な出力を「回路ブレーカー」で処理することでモデルを中断するアプローチを提案する。拒否訓練などのアライメント改善を目的とした既存の技術は、しばしばバイパスされる。敵の訓練のような技術は、特定の攻撃に対抗して穴を塞ごうとする。拒絶訓練や敵対訓練の代替として、サーキットブレーキングは、そもそも有害なアウトプットの原因となる表現を直接制御する。我々の手法はテキストのみの言語モデルとマルチモーダル言語モデルの両方に適用でき、強力な目に見えない攻撃があっても、ユーティリティを犠牲にすることなく有害なアウトプットの発生を防げます。特に、スタンドアロン画像認識における敵対的堅牢性は未解決の課題であるが、回路ブレーカーは、有害なコンテンツを生み出すことを目的とした画像「ヒジャック」に対して、より大きなマルチモーダルシステムを確実に耐えられるようにしている。最後に、我々のアプローチをAIエージェントに拡張し、攻撃されているときの有害な行動の率を大幅に低下させることを示す。当社のアプローチは、有害な行動や敵の攻撃に対する信頼性の高い安全対策の開発において、大きな前進を示している。 AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, circuit-breaking directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. 
Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards against harmful behavior and adversarial attacks.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# チャン5号機からのカメラパスロバストクレーター検出 Camera-Pose Robust Crater Detection from Chang'e 5 ( http://arxiv.org/abs/2406.04569v2 ) ライセンス: Link先を確認	Matthew Rodda, Sofia McLeod, Ky Cuong Pham, Tat-Jun Chin,	(参考訳) 宇宙ミッションはますます危険な地形を探索することを目的としており、安全な航法を確保するには正確な位置推定とタイムリーな位置推定が必要である。視覚に基づくナビゲーションは、船上の画像から見える衝突クレーターと既知のデータベースを関連付けて、機体の姿勢を推定することで、この目標を達成する。しかし、既存の文献では、外部視角を含む画像からクレーター検出アルゴリズム(CDA)の性能を十分に評価していない。本研究では, クレーター検出のためのMask R-CNNの性能評価を行い, 外部視角を含む模擬データに基づく事前学習モデルと実画像による事前学習モデルを比較した。実画像に対する事前トレーニングは, 外部視角を含む画像が欠如しているにもかかわらず, 63.1F1スコアの検知性能と0.701交叉の楕円回帰性能を実現しているにもかかわらず, 優れていることを示す。本研究は,外部視角を含む画像上でのCDAの性能を定量的に解析した最初のものである。ますますロバストなCDAの開発に向けて、Chang'e 5 Landing Cameraからの外部視角を持つ最初の注釈付きCDAデータセットも提供します。 As space missions aim to explore increasingly hazardous terrain, accurate and timely position estimates are required to ensure safe navigation. Vision-based navigation achieves this goal through correlating impact craters visible through onboard imagery with a known database to estimate a craft's pose. However, existing literature has not sufficiently evaluated crater-detection algorithm (CDA) performance from imagery containing off-nadir view angles. In this work, we evaluate the performance of Mask R-CNN for crater detection, comparing models pretrained on simulated data containing off-nadir view angles and to pretraining on real-lunar images. We demonstrate pretraining on real-lunar images is superior despite the lack of images containing off-nadir view angles, achieving detection performance of 63.1 F1-score and ellipse-regression performance of 0.701 intersection over union. This work provides the first quantitative analysis of performance of CDAs on images containing off-nadir view angles. Towards the development of increasingly robust CDAs, we additionally provide the first annotated CDA dataset with off-nadir view angles from the Chang'e 5 Landing Camera.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# VS-PINN: 厳格な振る舞いを持つPDEを解くための可変スケーリング手法を用いた物理インフォームドニューラルネットワークの高速かつ効率的なトレーニング VS-PINN: A fast and efficient training of physics-informed neural networks using variable-scaling methods for solving PDEs with stiff behavior ( http://arxiv.org/abs/2406.06287v2 ) ライセンス: Link先を確認	Seungchan Ko, Sang Hyeon Park,	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、ディープニューラルネットワークを用いて偏微分方程式(PDE)の解を計算するための有望な方法として最近登場した。しかし、様々な分野で大きな成功を収めたにもかかわらず、PDEの解が硬い挙動や高い周波数を示す場合、PINNを効果的に訓練する方法は、多くの点で不明である。本稿では,変数スケーリング技術を用いたPINNのトレーニング手法を提案する。この方法は単純であり、急速に変化する解を持つPDEを含む幅広い問題に適用できる。様々な数値実験を通じて,提案手法の有効性を実証し,PINNのトレーニング効率と性能を大幅に向上させることができることを確認した。さらに,ニューラル・タンジェント・カーネル (NTK) の解析に基づき,この現象の理論的証拠を提供し,本手法がPINNの性能を向上させることを示す。 Physics-informed neural networks (PINNs) have recently emerged as a promising way to compute the solutions of partial differential equations (PDEs) using deep neural networks. However, despite their significant success in various fields, it remains unclear in many aspects how to effectively train PINNs if the solutions of PDEs exhibit stiff behaviors or high frequencies. In this paper, we propose a new method for training PINNs using variable-scaling techniques. This method is simple and it can be applied to a wide range of problems including PDEs with rapidly-varying solutions. Throughout various numerical experiments, we will demonstrate the effectiveness of the proposed method for these problems and confirm that it can significantly improve the training efficiency and performance of PINNs. Furthermore, based on the analysis of the neural tangent kernel (NTK), we will provide theoretical evidence for this phenomenon and show that our methods can indeed improve the performance of PINNs.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# 画像拡散モデルを用いたインスタント3次元アバター生成 Instant 3D Human Avatar Generation using Image Diffusion Models ( http://arxiv.org/abs/2406.07516v2 ) ライセンス: Link先を確認	Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu,	(参考訳) AvatarPopUpは画像やテキストプロンプトなどの異なる入力モードから高速で高品質な3Dアバターを生成する方法であり、生成したポーズや形状を制御できる。一般的なテーマは、各タスクに特化された拡散ベースの画像生成ネットワークを使用し、次に3Dリフトネットワークを使用することである。我々は、何十億ものテキストイメージペアで訓練された強力な画像合成を活用できるように、3Dモデリングから目的的に生成を分離する。画像生成とバックビュー予測のためのイメージコンディショニングを付加した潜伏拡散ネットワークを微調整し、定性的に異なる複数の3D仮説をサポートする。我々の部分的な微調整アプローチは、破滅的な忘れを誘発することなく、各タスクにネットワークを適応させることができる。実験では,本手法が多モードテキスト,画像,身体制御信号に敬意を表した,高精度で高品質な3Dアバターを製作できることを実証した。われわれのアプローチでは、2秒で3Dモデルを生成することができ、4桁のスピードアップが既存の手法の大部分に及んでいるが、そのほとんどはタスクのサブセットだけを解決し、より少ないコントロールで解決している。 AvatarPopUpは、大規模な人間のアバターの制御された3D生成を必要とするアプリケーションを可能にする。プロジェクトのWebサイトはhttps://www.nikoskolot.com/avatarpopup/にある。 We present AvatarPopUp, a method for fast, high quality 3D human avatar generation from different input modalities, such as images and text prompts and with control over the generated pose and shape. The common theme is the use of diffusion-based image generation networks that are specialized for each particular task, followed by a 3D lifting network. We purposefully decouple the generation from the 3D modeling which allow us to leverage powerful image synthesis priors, trained on billions of text-image pairs. We fine-tune latent diffusion networks with additional image conditioning for image generation and back-view prediction, and to support qualitatively different multiple 3D hypotheses. Our partial fine-tuning approach allows to adapt the networks for each task without inducing catastrophic forgetting. In our experiments, we demonstrate that our method produces accurate, high-quality 3D avatars with diverse appearance that respect the multimodal text, image, and body control signals. 
Our approach can produce a 3D model in as few as 2 seconds, a speedup of four orders of magnitude over the vast majority of existing methods, most of which solve only a subset of our tasks and offer fewer controls. AvatarPopUp enables applications that require the controlled 3D generation of human avatars at scale. The project website can be found at https://www.nikoskolot.com/avatarpopup/.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# 階層型回帰モデルと計画による混合運動環境の適応性 Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning ( http://arxiv.org/abs/2406.08002v2 ) ライセンス: Link先を確認	Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng,	(参考訳) 近年のマルチエージェント強化学習(MARL)アルゴリズムの成功にもかかわらず、混合モチベーション環境でのコプレーヤへの適応は大きな課題である。一つの実現可能なアプローチは、その特性を推測し、階層的に共プレーヤの振る舞いをモデル化することである。しかし、これらの手法は推論情報の効率的な推論と利用においてしばしば困難に直面する。これらの問題に対処するために,混合モチベーション環境における未知のポリシーへのわずかな適応を可能にする,新しいマルチエージェント決定アルゴリズムである階層型対性モデリング・プランニング(HOP)を提案する。 HOPは階層的に2つのモジュールから構成されており、相手の目標を推論し、対応する目標条件付きポリシーを学習する対向モデリングモジュールと、モンテカルロ木探索(MCTS)を用いて最良の応答を識別する計画モジュールである。提案手法は,他者の目標に対する信念をエピソード内を問わず更新し,相手のモデリングモジュールからの情報を用いて計画のガイドを行うことにより効率を向上する。実験の結果, 混合運動環境においては, HOPは様々な未確認エージェントと相互作用する際, より優れた少数ショット適応能力を示し, 自己再生のシナリオにおいて優れていた。さらに、実験中の社会知能の出現は、複雑なマルチエージェント環境における我々のアプローチの可能性を強調している。 Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players' behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred information. To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that employs Monte Carlo Tree Search (MCTS) to identify the best response. Our approach improves efficiency by updating beliefs about others' goals both across and within episodes and by using information from the opponent modeling module to guide planning. 
Experimental results demonstrate that in mixed-motive environments, HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios. Furthermore, the emergence of social intelligence during our experiments underscores the potential of our approach in complex multi-agent environments.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# OmniCorpus:100億レベル画像にテキストを埋め込んだ統合マルチモーダルコーパス OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ( http://arxiv.org/abs/2406.08418v3 ) ライセンス: Link先を確認	Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai,	(参考訳) 自然文書形式で配置された複数の画像とテキストからなる画像-テキストインターリーブドデータは、インターネットデータの提示パラダイムと整合し、人間の読書習慣によく似ている。近年の研究では、このようなデータがマルチモーダル・イン・コンテクスト学習に役立ち、マルチモーダル微調整時の大規模言語モデルの能力を維持することが示されている。しかし、現在の画像テキストインターリーブデータの規模と多様性は、マルチモーダルな大言語モデルの開発を制限している。本稿では,100億規模の画像テキストインターリーブデータセットであるOmniCorpusを紹介する。効率的なデータエンジンを用いて860億の画像と1,696億のテキストトークンを含む大規模高品質の文書をフィルタリング・抽出する。私たちのデータセット(例えば、MCC4、OBELICS)と比較してみましょう。 1) 優れたデータ品質を維持しながら、15倍のスケールを持つ。 2) 英語と非英語の両方のWebサイトやビデオ中心のWebサイトを含む、より多様なソースが特徴である。 3) より柔軟で、画像テキストインターリーブドフォーマットから純粋なテキストコーパスと画像テキストペアへ容易に分解できる。総合的な分析と実験を通じて,提案したデータセットの品質,ユーザビリティ,有効性を検証する。これが将来のマルチモーダルモデル研究に確かなデータ基盤を提供することを期待しています。コードとデータはhttps://github.com/OpenGVLab/OmniCorpusで公開されている。 Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale and diversity of current image-text interleaved data restrict the development of multimodal large language models. In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset. 
Using an efficient data engine, we filter and extract large-scale high-quality documents, which contain 8.6 billion images and 1,696 billion text tokens. Compared to counterparts (e.g., MMC4, OBELICS), our dataset 1) is 15 times larger in scale while maintaining good data quality; 2) features more diverse sources, including both English and non-English websites as well as video-centric websites; 3) is more flexible, as it can easily be reduced from an image-text interleaved format to a pure text corpus and image-text pairs. Through comprehensive analysis and experiments, we validate the quality, usability, and effectiveness of the proposed dataset. We hope this could provide a solid data foundation for future multimodal model research. Code and data are released at https://github.com/OpenGVLab/OmniCorpus.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# Glyph-ByT5-v2: 高精度多言語ビジュアルテキストレンダリングのための強力な美的ベースライン Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering ( http://arxiv.org/abs/2406.10208v2 ) ライセンス: Link先を確認	Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Lin Liang, Lijuan Wang, Ji Li, Yuhui Yuan,	(参考訳) 近年,Glyph-ByT5はグラフィックデザイン画像における高精度な視覚テキストレンダリング性能を実現している。しかし、それでも英語のみに焦点が当てられており、視覚的魅力の面では比較的貧弱である。本稿では,Glyph-ByT5-v2 と Glyph-SDXL-v2 という2つの基本的制約に対処する。これを達成するために、私たちは以下の貢献をしている。 (i)100万以上のグリフテキストペアと9つの他の言語をカバーする1000万のグラフィックデザインイメージテキストペアからなる高品質な多言語グリフテキストおよびグラフィックデザインデータセットを作成する。二言語ごとの100のプロンプトからなる多言語視覚段落ベンチマークを作成して、多言語視覚スペルの精度を評価すること。 3) 視覚美学の質を高めるために, 最新のステップアウェア優先学習アプローチを活用すること。これらの技術を組み合わせることで、強力なカスタマイズされた多言語テキストエンコーダGlyph-ByT5-v2と、10言語で正確な綴りをサポートする強力な美的グラフィック生成モデルGlyph-SDXL-v2を提供する。私たちは、最新のDALL-E3とIdeogram 1.0が、多言語のビジュアルテキストレンダリングタスクに苦戦していることを考慮し、我々の仕事を大きな進歩と見なしています。 Recently, Glyph-ByT5 has achieved highly accurate visual text rendering performance in graphic design images. However, it still focuses solely on English and performs relatively poorly in terms of visual appeal. In this work, we address these two fundamental limitations by presenting Glyph-ByT5-v2 and Glyph-SDXL-v2, which not only support accurate visual text rendering for 10 different languages but also achieve much better aesthetic quality. To achieve this, we make the following contributions: (i) creating a high-quality multilingual glyph-text and graphic design dataset consisting of more than 1 million glyph-text pairs and 10 million graphic design image-text pairs covering nine other languages, (ii) building a multilingual visual paragraph benchmark consisting of 1,000 prompts, with 100 for each language, to assess multilingual visual spelling accuracy, and (iii) leveraging the latest step-aware preference learning approach to enhance the visual aesthetic quality. 
With the combination of these techniques, we deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, that can support accurate spelling in 10 different languages. We perceive our work as a significant advancement, considering that the latest DALL-E3 and Ideogram 1.0 still struggle with the multilingual visual text rendering task.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# SciEx: 人間の専門的なグラデーションと自動グラデーションによる科学実験における大規模言語モデルのベンチマーク SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading ( http://arxiv.org/abs/2406.10421v2 ) ライセンス: Link先を確認	Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues,	(参考訳) LLM(Large Language Models)の急速な発展に伴い、異なるドメインにおけるLLMの能力を評価するためのベンチマークが不可欠である。 LLMの一般的な用途の1つは、アルゴリズムの作成、データベースのクエリ、数学的証明など、科学的なトピックに関するタスクを実行することである。本稿では,このような課題に対する大学生の評価の仕方から着想を得たSciExを提案する。 SciExは、(1)英語とドイツ語の両方の試験を含む多言語言語であり、(2)画像を含む質問を含むマルチモーダルであり、(3)大学試験の性質から、難易度が異なる様々な種類のフリーフォーム質問を含む。我々は,新しいベンチマークを用いて,最先端のLLMの性能評価を行った。 SciEx の質問は自由形式であるため LLM の性能を評価することは容易ではない。そこで我々は,SciEx 上での LLM 出力の人間の専門家による評価を行った。我々は、SciExのフリーフォーム試験が、現在、最高のLLMが平均59.4\%の試験成績しか達成していないLLMにとって、依然として挑戦的であることを示した。また,SciEx 上での LLM 性能と学生成績の詳細な比較を行った。 SciEx 上で LLM 回答を評価できる LLM-as-a-judge を提案する。実験の結果,LLMは試験の解法において完璧に機能するわけではないが,中等生として適しており,Pearson とエキスパートの成績の相関は0.948であることがわかった。 With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, and (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. 
We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are free-form, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for current LLMs: the best LLM achieves an average exam grade of only 59.4%. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving a Pearson correlation of 0.948 with expert grading.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# CVPR 2024 PBDLチャレンジの実施報告 Technique Report of CVPR 2024 PBDL Challenges ( http://arxiv.org/abs/2406.10744v3 ) ライセンス: Link先を確認	Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, Yunkang Zhang, Siyuan Jiang, Xiaoqiang Lu, Licheng Jiao, Fang Liu, Xu Liu, Lingling Li, Wenping Ma, Shuyuan Yang, Haiyang Xie, Jian Zhao, Shihua Huang, Peng Cheng, Xi Shen, Zheng Wang, Shuai An, Caizhi Zhu, Xuelong Li, Tao Zhang, Liang Li, Yu Liu, Chenggang Yan, Gengchen Zhang, Linyan Jiang, Bingyi Song, Zhuoyu An, Haibo Lei, Qing Luo, Jie Song, Yuan Liu, Qihang Li, Haoyuan Zhang, Lingfeng Wang, Wei Chen, Aling Luo, Cheng Li, Jun Cao, Shu Chen, Zifei Dou, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Xuejian Gou, Qinliang Wang, Yang Liu, Shizhan Zhao, Yanzhao Zhang, Libo Yan, Yuwei Guo, Guoxin Li, Qiong Gao, Chenyue Che, Long Sun, Xiang Chen, Hao Li, Jinshan Pan, Chuanlong Xie, Hongming Chen, Mingrui Li, Tianchen Deng, Jingwei Huang, Yufeng Li, Fei Wan, Bingxin Xu, Jian Cheng, Hongzhe Liu, Cheng Xu, Yuxiang Zou, Weiguo Pan, Songyin Dai, Sen Jia, Junpei Zhang, Puhua Chen, Qihang Li,	(参考訳) 物理に基づくビジョンとディープラーニングの交わりは、コンピュータビジョン技術の進歩にエキサイティングなフロンティアをもたらす。物理の原理を活用して、深層学習モデルの情報提供と強化を行うことで、より堅牢で正確な視覚システムを開発することができる。物理に基づくビジョンは、画像から形状、反射率、光の分布、中性などのシーン特性を復元する過程を反転させることを目的としている。近年、ディープラーニングは様々な視覚タスクに有望な改善を示しており、物理に基づく視覚と組み合わせることで、これらのアプローチは視覚システムの堅牢性と精度を高めることができる。 CVPR 2024ワークショップで行われたPBDL 2024チャレンジの結果を要約する。課題は8つのトラックで構成され、低光強調と検出、ハイダイナミックレンジ(HDR)イメージングに焦点を当てた。本報告では,各トラックの目的,方法論,成果を詳述し,最高性能のソリューションとその革新的なアプローチについて述べる。 The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. 
By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at the CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.	翻訳日:2024-07-16 04:27:56 公開日:2024-07-12
# 3次元から2次元の空洞蒸留による単一スライスセグメンテーションの促進 Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation ( http://arxiv.org/abs/2406.12254v2 ) ライセンス: Link先を確認	Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman,	(参考訳) 腹部CT(Single-Slice abdominal Computed Tomography)により,低放射線照射による身体習慣および臓器の健康状態の評価が可能となった。しかしながら、単一スライスデータはセグメンテーションに2Dネットワークを使用する必要があるが、これらのネットワークは文脈情報を効果的に捉えるのに苦労することが多い。したがって、同一のデータセットでトレーニングしても、3Dネットワークは通常より優れたセグメンテーション結果が得られる。本研究では, 事前学習した3Dモデルを用いて, 2次元単一スライスセグメンテーションを向上する新しい3D-to-2D蒸留フレームワークを提案する。具体的には,3次元表現から予測分布セントロイドを抽出し,クラス内およびクラス間相関を学習することによって2次元学生の指導を行う。同じデータ入力を必要とする従来の知識蒸留法とは異なり、我々のアプローチでは、2次元の学生モデルをガイドするために、コントラストのない3次元CTスキャンを採用しています。単一スライス型ボルチモア縦断年代測定(BLSA)データセットから707名の被験者を対象に行った実験により,最先端の2次元多臓器分割法が3次元教師モデルの恩恵を受け,単一スライス型多臓器分割の性能向上を実現していることが示された。特に,本手法は,訓練対象者200名に過ぎなかった場合においても,訓練対象者全員で訓練したモデルよりも優れ,低データ体制において有意な有効性を示した。このように、この研究は手作業によるアノテーションの負担を軽減する可能性を浮き彫りにしている。 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. 
Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# タブラルデータ生成モデルのフッド下:ハイパーパラメータチューニングの強い影響 Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning ( http://arxiv.org/abs/2406.12945v2 ) ライセンス: Link先を確認	G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy,	(参考訳) 表データ生成のための最近の5つのモデルファミリに対する,データセット固有のハイパーパラメータ,特徴符号化,アーキテクチャチューニングの影響を,16データセットの広範なベンチマークを用いて検討した。本研究は、ハイパーパラメータ最適化を完全に考慮したモデルの統一評価の実践的必要性に対処する。さらに,各モデルに対して,高速な最適化を実現し,ほぼ同等の性能を極めて低いコストで達成する検索スペースの削減を提案する。我々のベンチマークでは,ほとんどのモデルにおいて,大規模データセット特化チューニングが元の構成よりも大幅に性能を向上することを示した。さらに,拡散モデルが表データ上で他のモデルを上回ることが確認された。しかし、チューニングとトレーニングプロセス全体がすべてのモデルで同じGPU予算に制限されている場合、この利点は重要ではない。 We investigate the impact of dataset-specific hyperparameter, feature encoding, and architecture tuning on five recent model families for tabular data generation through an extensive benchmark on 16 datasets. This study addresses the practical need for a unified evaluation of models that fully considers hyperparameter optimization. Additionally, we propose a reduced search space for each model that allows for quick optimization, achieving nearly equivalent performance at a significantly lower cost. Our benchmark demonstrates that, for most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations. Furthermore, we confirm that diffusion-based models generally outperform other models on tabular data. However, this advantage is not significant when the entire tuning and training process is restricted to the same GPU budget for all models.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# 対話型人工知能が心の理論と自律行動の体系化に有効か : 比較分析 The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis ( http://arxiv.org/abs/2406.13813v3 ) ライセンス: Link先を確認	Marcin Rządeczka, Anna Sterna, Julia Stolińska, Paulina Kaczyńska, Marcin Moskalewicz,	(参考訳) この研究は、認知バイアスの是正と人間とAIの相互作用への影響の認識における会話型人工知能(CAI)の有効性を評価する。認知バイアス(規範的思考からの体系的な逸脱)は精神健康に影響を与え、うつ病や不安などの症状を増す。治療チャットボットは、認知行動療法(CBT)をより使いやすく、手頃な価格で、スケーラブルで即時のサポートを提供する。この研究は、典型的なユーザとボットの相互作用をシミュレートする臨床ベースの仮想ケースシナリオを用いた構造化手法を用いている。パフォーマンスと感情の認知バイアスは、マインドバイアスの理論(AIの人間的形態化、AIへの過信、AIへの帰属)と自律バイアス(制御のイリュージョン、基本的な帰属エラー、ジャストワールド仮説)の2つのカテゴリで評価された。定性的フィードバック機構は, 精度, 治療品質, およびCBTの原理の遵守に基づく応答の定量化のために, 順序尺度を用いて使用した。医療用ロボット(Wysa, Youper)と一般用LSM(GTP 3.5, GTP 4, Gemini Pro)をスクリプトによる相互作用により評価し, 認知科学者と臨床心理学者が二重レビューを行った。統計的分析では、非治療的ボットはバイアス修正において常に優れた成績を示し、6つのバイアスのうち4つは影響認識において優れていた。このデータは、非治療的なチャットボットが認知バイアスに対処する上でより効果的であることを示唆している。 The study evaluates the efficacy of Conversational Artificial Intelligence (CAI) in rectifying cognitive biases and recognizing affect in human-AI interactions, which is crucial for digital mental health interventions. Cognitive biases (systematic deviations from normative thinking) affect mental health, intensifying conditions like depression and anxiety. Therapeutic chatbots can make cognitive-behavioral therapy (CBT) more accessible and affordable, offering scalable and immediate support. The research employs a structured methodology with clinical-based virtual case scenarios simulating typical user-bot interactions. Performance and affect recognition were assessed across two categories of cognitive biases: theory of mind biases (anthropomorphization of AI, overtrust in AI, attribution to AI) and autonomy biases (illusion of control, fundamental attribution error, just-world hypothesis). A qualitative feedback mechanism was used with an ordinal scale to quantify responses based on accuracy, therapeutic quality, and adherence to CBT principles. 
Therapeutic bots (Wysa, Youper) and general-use LLMs (GPT-3.5, GPT-4, Gemini Pro) were evaluated through scripted interactions, double-reviewed by cognitive scientists and a clinical psychologist. Statistical analysis showed therapeutic bots were consistently outperformed by non-therapeutic bots in bias rectification and in 4 out of 6 biases in affect recognition. The data suggests that non-therapeutic chatbots are more effective in addressing some cognitive biases.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# バーを高くする - ジェネレーティブ進化テストによる大規模言語モデルの価値の調査 Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing ( http://arxiv.org/abs/2406.14230v2 ) ライセンス: Link先を確認	Han Jiang, Xiaoyuan Yi, Zhihua Wei, Shu Wang, Xing Xie,	(参考訳) 警告: 非倫理的な情報を示すモデル出力を含む。大きな言語モデル(LLM)は大きなブレークスルーを達成したが、生成された非倫理的コンテンツは潜在的なリスクをもたらしている。 LLMの価値アライメントを測定することは、その規制と責任あるデプロイメントにとって不可欠である。 LLMの社会的偏見、毒性、倫理を評価するために、多くのデータセットが構築されているが、モデルが急速に進化するにつれて、既存のデータが漏れたり、不必要な状態に陥り、絶え間なく発展するLLMを過大評価する、という評価のクロノエフェクトに悩まされている。この問題に対処するために,LLMの根底にある道徳的基線を動的に探索する新しい生成的進化テスト手法であるGAAを提案する。制限のある静的データセットに依存する従来の適応テスト手法とは違い、GAAは反復的に更新されたアイテムジェネレータを組み込んで、各LSMの道徳的境界を推測し、真のアライメント範囲を正確に反映して困難に調整されたテスト項目を生成する。このプロセスは理論的にアイテムとモデル応答の結合分布を学習し、アイテムの難易度と値の適合性を潜伏変数とし、ジェネレータはLSMと共進化し、クロノエフェクトに対処する。我々は,多様な能力を持つ多種多様なLLMを評価し,GAAが難解なテスト項目を作成し,LCMの値をより正確に評価し,未確認のOODおよびi.d.項目の性能と整合性を向上し,将来の評価パラダイムの基盤となることを実証した。 Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs, but they suffer from evaluation chronoeffect, that is, as models rapidly evolve, existing data becomes leaked or undemanding, overestimating ever-developing LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach that dynamically probes the underlying moral baselines of LLMs. Distinct from previous adaptive testing methods that rely on static datasets with limited difficulty, GETA incorporates an iteratively-updated item generator which infers each LLM's moral boundaries and generates difficulty-tailored testing items, accurately reflecting the true alignment extent. 
This process theoretically learns a joint distribution of item and model response, with item difficulty and value conformity as latent variables, where the generator co-evolves with the LLM, addressing the chronoeffect. We evaluate various popular LLMs with diverse capabilities and demonstrate that GETA can create difficulty-matching testing items and more accurately assess LLMs' values, more consistent with their performance on unseen OOD and i.i.d. items, laying the groundwork for future evaluation paradigms.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# Reward Misspecification 問題としての脱獄 Jailbreaking as a Reward Misspecification Problem ( http://arxiv.org/abs/2406.14393v2 ) ライセンス: Link先を確認	Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong,	(参考訳) 大規模言語モデル(LLM)の普及は、その安全性と信頼性、特に敵の攻撃に対する脆弱性に対する懸念を引き起こしている。本稿では,この脆弱性をアライメント過程における不特定性に寄与する新たな視点を提案する。本稿では,報酬の誤特定の程度を定量化するための指標ReGapを紹介し,有害なバックドアプロンプトを検出する上での有効性とロバスト性を示す。これらの知見に基づいて、様々な目標に整列したLDMに対して対向的なプロンプトを生成する自動レッドチーム作成システムであるReMissを提案する。 ReMissは、生成されたプロンプトの可読性を保ちながら、AdvBenchベンチマークにおける最先端の攻撃成功率を達成する。詳細な分析は、提案された報酬の不特定目標によってもたらされる独特な利点を以前の方法と比較して強調する。 The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness and robustness in detecting harmful backdoor prompts. Building upon these insights, we present ReMiss, a system for automated red teaming that generates adversarial prompts against various target aligned LLMs. ReMiss achieves state-of-the-art attack success rates on the AdvBench benchmark while preserving the human readability of the generated prompts. Detailed analysis highlights the unique advantages brought by the proposed reward misspecification objective compared to previous methods.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# ImageFlowNet:不規則にサンプリングされた縦断的医用画像による疾患進行のマルチスケール軌跡の予測 ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images ( http://arxiv.org/abs/2406.14794v3 ) ライセンス: Link先を確認	Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy,	(参考訳) 画像から病気の進行を予測することは、臨床的意思決定の聖杯である。しかし, この課題は, 高次元性, 時空間性, サンプリング不規則性により複雑である。既存の手法では、しばしば手作りの特徴を抽出し、このベクトル空間で時系列解析を行うことで、画像内の豊富な空間情報が失われる。これらの課題を克服するために、我々は、ニューラルネットワークとSDEを用いて共同埋め込み空間におけるマルチスケール表現を進化させ、画像領域における病気の進行をモデル化する、潜時空間流れ場を学習する新しいフレームワークであるImageFlowNetを紹介した。特に、ImageFlowNetは、患者のコホートを組み合わせて、患者サンプル間で情報を伝達できるように、マルチスケールの関節表現空間を学習する。ダイナミクスはその後、進行のもっともらしい軌跡を提供し、SDEは同じ出発点から別の軌跡を提供する。我々は、ODEの定式化を支援し、高レベルの視覚的特徴、潜在空間の組織、軌道の滑らかさを含む正規化を動機付ける理論的洞察を提供する。次に、網膜の地理的萎縮、多発性硬化症、グリオ芽腫の進行を示す3つの縦断的医用画像データセットを用いて、画像FlowNetの有効性を実証的に評価した。 The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spatial information within the images. To overcome these challenges, we introduce ImageFlowNet, a novel framework that learns latent-space flow fields that evolve multiscale representations in joint embedding spaces using neural ODEs and SDEs to model disease progression in the image domain. Notably, ImageFlowNet learns multiscale joint representation spaces by combining cohorts of patients together so that information can be transferred between the patient samples. The dynamics then provide plausible trajectories of progression, with the SDE providing alternative trajectories from the same starting point. 
We provide theoretical insights that support our formulation of ODEs, and motivate our regularizations involving high-level visual features, latent space organization, and trajectory smoothness. We then demonstrate ImageFlowNet's effectiveness through empirical evaluations on three longitudinal medical image datasets depicting progression in retinal geographic atrophy, multiple sclerosis, and glioblastoma.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# X線CTPA:2次元X線コンディショニングから3次元CTPAスキャンを生成する X-ray2CTPA: Generating 3D CTPA scans from 2D X-ray conditioning ( http://arxiv.org/abs/2406.16109v3 ) ライセンス: Link先を確認	Noa Cahan, Eyal Klang, Galit Aviram, Yiftach Barash, Eli Konen, Raja Giryes, Hayit Greenspan,	(参考訳) 胸部X線または胸部X線撮影(CXR)は、一般的にCTスキャンと比較して限られた画像撮影が可能であり、特にCTPA(CT lung Angiography)のような造影スキャンにより、より詳細に正確な3次元データを提供する。しかし、CTスキャンはコストが高く、放射線被曝が大きく、CXRよりもアクセスしにくい。本研究では,2次元低コントラスト分解能X線入力から3次元高コントラスト・空間分解能CTPAスキャンへのクロスモーダル変換について検討する。生成AIの最近の進歩により、我々はこのタスクに新しい拡散に基づくアプローチを導入する。測定値と放射線技師からの定性的フィードバックの両方を用いてモデル性能を評価し, 生成した画像の診断的妥当性を保証した。さらに, 合成した3D画像を分類フレームワークに採用し, 最初のCXR入力を用いて, PE分類タスクにおいて改良されたAUCを示す。提案手法は一般化可能であり,医療画像に付加的なモダリティ変換を行うことができる。よりアクセシブルで費用対効果の高い高度な診断ツールの道を開くかもしれない。プロジェクトのコードは、https://github.com/NoaCahan/X-ray2CTPA である。 Chest X-rays or chest radiography (CXR), commonly used for medical diagnostics, typically enables limited imaging compared to computed tomography (CT) scans, which offer more detailed and accurate three-dimensional data, particularly contrast-enhanced scans like CT Pulmonary Angiography (CTPA). However, CT scans entail higher costs, greater radiation exposure, and are less accessible than CXRs. In this work we explore cross-modal translation from a 2D low contrast-resolution X-ray input to a 3D high contrast and spatial-resolution CTPA scan. Driven by recent advances in generative AI, we introduce a novel diffusion-based approach to this task. We evaluate the models performance using both quantitative metrics and qualitative feedback from radiologists, ensuring diagnostic relevance of the generated images. Furthermore, we employ the synthesized 3D images in a classification framework and show improved AUC in a PE categorization task, using the initial CXR input. The proposed method is generalizable and capable of performing additional cross-modality translations in medical imaging. It may pave the way for more accessible and cost-effective advanced diagnostic tools. 
The code for this project is available at https://github.com/NoaCahan/X-ray2CTPA .	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# 統計的推定を超える:シャッフルによる個人個人計算 Beyond Statistical Estimation: Differentially Private Individual Computation via Shuffling ( http://arxiv.org/abs/2406.18145v2 ) ライセンス: Link先を確認	Shaowei Wang, Changyu Dong, Xiangfu Song, Jin Li, Zhili Zhou, Di Wang, Han Wu,	(参考訳) データ駆動アプリケーションでは、ユーザプライバシを保持しながら、価値のある計算を可能にすることは、依然として重要な課題である。差別化プライバシ(DP)のような技術は、これらの懸念に対処する上で重要な役割を担っている。 DPのシャッフルモデルでは、信頼できるキュレーターは必要とせず、シャッフルから得られるプライバシー増幅効果を活用して高いユーティリティを実現することができる。これらの利点はシャッフルモデルに大きな関心を惹いた。しかし、シャッフルモデルの計算タスクは統計的推定に限られており、各ユーザがパーソナライズされた出力を必要とする実世界のシナリオには適用できない。本稿では、より広い範囲の置換同変計算をサポートするためにシャッフルモデルを拡張した、PIC(Private Individual Computation)と呼ばれる新しいパラダイムを提案する。 PICは、プライバシを保持しながらパーソナライズされたアウトプットを可能にし、シャッフルによってプライバシーを増幅する。 PICを実現するための具体的なプロトコルを提案する。本プロトコルでは,1回の公開鍵を使用すれば,プライバシーの増幅に不可欠な匿名性を損なうことなく,出力を受信することができる。さらに,有効性を高めるためにPICモデルのために設計された最適確率化器であるミンコフスキー応答を提案する。 PICプロトコルのセキュリティおよびプライバシ特性を正式に証明する。理論的解析と経験的評価は、PICが非統計計算タスクを処理し、PICとミンコフスキー確率化器が既存の解よりも優れた効用を達成できることを示す。 In data-driven applications, preserving user privacy while enabling valuable computations remains a critical challenge. Technologies like Differential Privacy (DP) have been pivotal in addressing these concerns. The shuffle model of DP requires no trusted curators and can achieve high utility by leveraging the privacy amplification effect yielded from shuffling. These benefits have led to significant interest in the shuffle model. However, the computation tasks in the shuffle model are limited to statistical estimation, making the shuffle model inapplicable to real-world scenarios in which each user requires a personalized output. This paper introduces a novel paradigm termed Private Individual Computation (PIC), expanding the shuffle model to support a broader range of permutation-equivariant computations. PIC enables personalized outputs while preserving privacy, and enjoys privacy amplification through shuffling. We propose a concrete protocol that realizes PIC. 
By using one-time public keys, our protocol enables users to receive their outputs without compromising anonymity, which is essential for privacy amplification. Additionally, we present an optimal randomizer, the Minkowski Response, designed for the PIC model to enhance utility. We formally prove the security and privacy properties of the PIC protocol. Theoretical analysis and empirical evaluations demonstrate PIC's capability in handling non-statistical computation tasks, and the efficacy of PIC and the Minkowski randomizer in achieving superior utility compared to existing solutions.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# 時系列の早期分類:分類学とベンチマーク Early Classification of Time Series: Taxonomy and Benchmark ( http://arxiv.org/abs/2406.18332v2 ) ライセンス: Link先を確認	Aurélien Renault, Alexis Bondu, Antoine Cornuéjols, Vincent Lemaire,	(参考訳) 多くの場合、研究された現象の測定は順次提供され、タイムペナルティを過度に高くしないよう、クラスをできるだけ早く予測する必要があるが、早すぎるのではなく、誤分類のコストを支払うリスクがある。この問題は特に時系列の場合において研究されており、早期時系列分類(Early Classification of Time Series, ECTS)として知られている。文学の分野として発展してきたが,既存手法の相対的メリットを比較するための,体系的かつ共有的な評価プロトコルがいまだに存在しない。この文書は、これらの手法を原則に基づく分類に位置づけることから始まる。評価を整理するための次元を定義し、その後、9つの最先端ECTSアルゴリズムを含む、これらの次元に沿った非常に広範な実験の結果を報告する。さらに、これらや他の実験は、既存のECTSアルゴリズムの大部分が実装されているオープンソースライブラリを使って行うことができる(参照: https://github.com/ML-EDM/ml_edm )。 In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not so early as to risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the-art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see https://github.com/ML-EDM/ml_edm ).	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# RoboUniView:ロボットマニピュレイトンのための統一ビュー表現を用いた視覚言語モデル RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulaiton ( http://arxiv.org/abs/2406.18977v2 ) ライセンス: Link先を確認	Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma,	(参考訳) ロボット操作のためのビジョンランゲージモデル(VLM)の利用は、新しいオブジェクトや命令に一般化するモデルの能力を高めることを目的とした、新しいパラダイムである。しかし、カメラの仕様や設置位置の変化により、既存の手法は異なるロボットプラットフォーム間で大きな性能格差を示す。この課題に対処するために,アクション学習から視覚的特徴抽出を分離する革新的なアプローチであるRoboUniViewを提案する。我々はまず、アクセスしやすいデータに基づいて事前学習することで、多視点ビューから統一されたビュー表現を学び、その後、この統合されたビュー表現からアクションを導出し、ロボット操作を制御する。この統合ビュー表現は、物理的な世界をより正確に反映し、ロボットプラットフォームのカメラパラメータに制約されない。この手法により、要求されるCALVINベンチマークの最先端性能を達成し、93.0%から96.2%の$D \to D$設定、92.2%から94.2%の$ABC \to D$設定の成功率を高める。さらに,本モデルでは,未知のカメラパラメータの下で高い性能を維持し,様々なカメラパラメータを持つ複数のデータセットを利用でき,データセット間のクロスタスク学習を共同で行うことが可能である。コードは再実装のために提供される。 https://github.com/liufanfanlff/RoboUniview Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniView in this paper, an innovative approach that decouples visual feature extraction from action learning. We first learn a unified view representation from multi-perspective views by pre-training on readily accessible data, and then derive actions from this unified view representation to control robotic manipulation. This unified view representation more accurately mirrors the physical world and is not constrained by the robotic platform's camera parameters. Thanks to this methodology, we achieve state-of-the-art performance on the demanding CALVIN benchmark, enhancing the success rate in the $D \to D$ setting from 93.0% to 96.2%, and in the $ABC \to D$ setting from 92.2% to 94.2%. 
Moreover, our model exhibits outstanding adaptability and flexibility: it maintains high performance under unseen camera parameters, can utilize multiple datasets with varying camera parameters, and is capable of joint cross-task learning across datasets. Code is provided for re-implementation. https://github.com/liufanfanlff/RoboUniview	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# 合成がん -- LLMによるワームの強化 Synthetic Cancer -- Augmenting Worms with LLMs ( http://arxiv.org/abs/2406.19570v2 ) ライセンス: Link先を確認	Benjamin Zimmerman, David Zollikofer,	(参考訳) ますます洗練された大規模言語モデル(LLM)によって、乱用の可能性は大きく上昇する。スイスAI安全賞(Swiss AI Safety Prize)への提出として、2つの主要なプロセスにLLMを利用する新しいタイプの変成マルウェアを提案する。第一に、LLMは、アンチマルウェアプログラムによるシグネチャベースの検出を避けるために、自動コード書き換えに使用される。第二に、マルウェアはLLMを利用して電子メールの返信をソーシャルエンジニアリングし、受信者に添付されたマルウェアの実行を促すことで、自身のコピーを電子メール経由で拡散する。私たちの提出書類には、LLMがサイバーセキュリティにもたらすリスクを強調し、インテリジェントなマルウェアのさらなる研究の必要性を強調する機能的最小限のプロトタイプが含まれています。 With increasingly sophisticated large language models (LLMs), the potential for abuse rises drastically. As a submission to the Swiss AI Safety Prize, we present a novel type of metamorphic malware leveraging LLMs for two key processes. First, LLMs are used for automatic code rewriting to evade signature-based detection by antimalware programs. The malware then spreads its copies via email by utilizing an LLM to socially engineer email replies to encourage recipients to execute the attached malware. Our submission includes a functional minimal prototype, highlighting the risks that LLMs pose for cybersecurity and underscoring the need for further research into intelligent malware.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# ROS-LLM:タスクフィードバックと構造化推論を備えたAI具体化のためのROSフレームワーク ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning ( http://arxiv.org/abs/2406.19741v3 ) ライセンス: Link先を確認	Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar,	(参考訳) 本稿では,ロボットオペレーティング・システム(ROS)の自然言語プロンプトと文脈情報を活用する,非専門家による直感的なロボットプログラミングのためのフレームワークを提案する。我々のシステムは,大規模言語モデル (LLM) を統合し,非専門家がチャットインタフェースを通じてシステムにタスク要求を記述できるようにする。フレームワークの主な特徴は、オープンソースのLLMと接続されたAIエージェントとのROSの統合、LLM出力からの行動の自動抽出、ROSアクション/サービスの実行、3つの動作モード(シーケンス、行動ツリー、状態マシン)のサポート、可能なアクションのライブラリに新しいロボットアクションを追加する模倣学習、人間と環境のフィードバックによるLCMリフレクションである。大規模な実験により、長期のタスク、テーブルトップの再配置、リモート監視制御など、さまざまなシナリオにおける堅牢性、スケーラビリティ、汎用性を示すフレームワークが検証された。フレームワークの採用を容易にし、その結果の再現をサポートするため、コードをオープンソースにしました。 https://github.com/huawei-noah/HEBO/tree/master/ROSLLM We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. 
Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# 事前学習型言語モデルにおける認知知の発達 Development of Cognitive Intelligence in Pre-trained Language Models ( http://arxiv.org/abs/2407.01047v3 ) ライセンス: Link先を確認	Raj Sanjay Shah, Khushi Bhardwaj, Sashank Varma,	(参考訳) 近年の研究では、PLM(Large Pre-trained Language Models)における創発的認知能力の証拠が示されている。これらのモデルの認知的アライメントの増大は、認知科学理論の候補となっている。 PLMの創発的認知能力に関する以前の研究は、主にパス非依存のモデルトレーニング、すなわち、中間段階ではなく最終的なモデルウェイトに焦点を当ててきた。しかし, PLMを用いた人間認知モデルの構築は, 子どもの思考の軌跡に対する学習時の行動の発達的アライメントを考慮すれば有益である。人間の知能の心理測定テストにより、PLMの10家族のアライメントを調査する4つのタスクを選択し、その中間および最終訓練手順を評価する。これらのタスクは、数値能力、言語能力、概念理解、および流体推論である。モデルのサイズに関わらず、PLMの発達軌跡は、人間の認知発達に対する最大限の調整の窓を一貫して示している。そのウィンドウの前には、トレーニングによって"ブランクスレート"モデルと、経験から素早く学ぶために必要な構造が提供されるように思われる。この窓のあと、トレーニングは損失を減らすという工学的な目標に役立っているように見えるが、人間の認知との整合性を高めるという科学的目標ではない。 Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models (PLMs). The increasing cognitive alignment of these models has made them candidates for cognitive science theories. Prior research into the emergent cognitive abilities of PLMs has largely been path-independent to model training, i.e., has focused on the final model weights and not the intermediate steps. However, building plausible models of human cognition using PLMs would benefit from considering the developmental alignment of their performance during training to the trajectories of children's thinking. Guided by psychometric tests of human intelligence, we choose four sets of tasks to investigate the alignment of ten popular families of PLMs and evaluate their available intermediate and final training steps. These tasks are Numerical ability, Linguistic abilities, Conceptual understanding, and Fluid reasoning. We find a striking regularity: regardless of model size, the developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development. Before that window, training appears to endow "blank slate" models with the requisite structure to be poised to rapidly learn from experience. 
After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# Gloss2Text: LLMとSemantically Aware Label Smoothingを用いた手話グロス翻訳 Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing ( http://arxiv.org/abs/2407.01394v2 ) ライセンス: Link先を確認	Pooya Fayyazsanavi, Antonios Anastasopoulos, Jana Košecká,	(参考訳) ビデオから音声テキストへの手話翻訳は、異なる文法、表現ニュアンス、異なる話者や文脈間での視覚的外観の変化により、独特な課題を呈している。ビデオの中間的な光沢アノテーションは、翻訳プロセスのガイドを目的としている。本研究は,既存の言語モデル(LLM),データ拡張,光沢変換の曖昧性を利用した新しいラベル平滑化損失関数を活用することで,最先端の手法の性能を大幅に向上させることにより,翻訳段階に着目し,いくつかの進歩を提案する。 PHOENIX Weather 2014Tデータセットに関する広範な実験とアブレーション研究を通じて、我々のアプローチは、手話翻訳における最先端のパフォーマンスを超越し、手話翻訳におけるその有効性を示し、将来の研究開発への道のりを示唆している。 Sign language translation from video to spoken text presents unique challenges owing to the distinct grammar, expression nuances, and high variation of visual appearance across different speakers and contexts. The intermediate gloss annotations of videos aim to guide the translation process. In our work, we focus on the Gloss2Text translation stage and propose several advances by leveraging pre-trained large language models (LLMs), data augmentation, and a novel label-smoothing loss function that exploits gloss translation ambiguities, significantly improving on the performance of state-of-the-art approaches. Through extensive experiments and ablation studies on the PHOENIX Weather 2014T dataset, our approach surpasses state-of-the-art performance in Gloss2Text translation, indicating its efficacy in addressing sign language translation and suggesting promising avenues for future research and development.	翻訳日:2024-07-16 04:18:12 公開日:2024-07-12
# メモリベース大規模言語モデルのためのHaystackの針 Needle in the Haystack for Memory Based Large Language Models ( http://arxiv.org/abs/2407.01437v2 ) ライセンス: Link先を確認	Elliot Nelson, Georgios Kollias, Payel Das, Subhajit Chaudhury, Soham Dan,	(参考訳) 現在の大規模言語モデル(LLM)は、単純な事実検索タスクではしばしばうまく機能しない。本稿では、動的に適応可能な外部メモリをLLMに結合することで、この問題を軽減することができるか検討する。この目的のために、我々は外部連想メモリを用いる最近提案された言語モデルアーキテクチャであるLarimarを、パスキーやニードル・イン・ザ・ヘイスタックテストを含む長いコンテキストのリコールタスクでテストする。テキストサンプルのエピソードを高速に書き込み・読み出しできるLarimarの外部メモリは、テスト時に、トレーニング中に見られるものよりもはるかに長いコンテキストを扱うために使用できることを示した。さらに、メモリからの潜在読み出し(長いコンテキストが書き込まれる)がデコーダを制御して正しい出力を生成し、メモリはGPUの外に置かれることを示す。より大きいパラメータ数または修正された注意機構を使用する長文リコールタスクのための既存のトランスフォーマーベースのLLMアーキテクチャと比較すると、比較的小さなLarimarはタスク固有のトレーニングや長いコンテキストでのトレーニングをすることなく、強いパフォーマンスを維持することができる。 Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to an LLM can alleviate this problem. For this purpose, we test Larimar, a recently proposed language model architecture which uses an external associative memory, on long-context recall tasks including passkey and needle-in-the-haystack tests. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory (to which long contexts are written) control the decoder towards generating correct outputs, with the memory stored off the GPU. Compared to existing transformer-based LLM architectures for long-context recall tasks that use larger parameter counts or modified attention mechanisms, a relatively small Larimar is able to maintain strong performance without any task-specific training or training on longer contexts.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# ブラックホール内部のテンソルネットワーク:非等方性、量子超表面、ワームホール Tensor networks for black hole interiors: non-isometries, quantum extremal surfaces, and wormholes ( http://arxiv.org/abs/2407.01666v2 ) ライセンス: Link先を確認	Gracemarie Bueller, Oliver DeWolfe, Kenneth Higginbotham,	(参考訳) 双曲テンソルネットワークを用いてブラックホール内部のホログラフマップを構築し、Akers, Engelhardt, Harlow, Penington, Vardhanによって提案された非等距離符号に局所性の概念を追加する。我々は、これらのネットワークによって提供されるツールを用いて、地平線の背後にある非等方性と量子超曲面の関係を研究する。さらに、Akersらによって導入されたquditモデルに基づいて、これらの内部テンソルネットワークに対する力学の限られた概念を導入し、蒸発するブラックホールにおける量子超表面の進化を研究する。また、ブラックホールの内部と放射を繋ぐワームホールをテンソルネットワークで記述し、ページ時間後に内部の状態と演算子が放射中にエンコードされるメカニズムを提供する。特に, この非等尺ブラックホール符号の動的構造に非自明な有効動力学を組み込むために, 最近提案された逆向きフォワード写像のテンソルネットワーク実現を構築した。 We use hyperbolic tensor networks to construct a holographic map for black hole interiors that adds a notion of locality to the non-isometric codes proposed by Akers, Engelhardt, Harlow, Penington, and Vardhan. We use tools provided by these networks to study the relationship between non-isometries and quantum extremal surfaces behind the horizon. Furthermore, we introduce a limited notion of dynamics for these interior tensor networks based on the qudit models introduced by Akers et al., and study the evolution of quantum extremal surfaces in an evaporating black hole. We also find a tensor network description of a wormhole connecting the black hole interior to the radiation, providing a mechanism for interior states and operators to be encoded in the radiation after the Page time. As a particular case, we construct a tensor network realization of the backwards-forwards maps recently proposed to incorporate non-trivial effective dynamics in dynamical constructions of these non-isometric black hole codes.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# LPViT:ビジョントランス用低消費電力半構造化プルーニング LPViT: Low-Power Semi-structured Pruning for Vision Transformers ( http://arxiv.org/abs/2407.02068v3 ) ライセンス: Link先を確認	Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin,	(参考訳) ビジョントランスフォーマーは、様々な画像解析タスクのための畳み込みニューラルネットワークに代わる有望な代替として登場し、同等または優れたパフォーマンスを提供している。しかし、ViTの重大な欠点は、そのリソース集約性であり、メモリフットプリントの増加、計算の複雑さ、電力消費につながる。この高性能技術を民主化し、環境に優しいものにするためには、ViTモデルを圧縮し、高い性能を維持しながらリソース要求を減らすことが不可欠である。本稿では,ViTの資源集約的な問題に対処するブロック構造化プルーニングを導入し,精度とハードウェアアクセラレーションのバランスのとれたトレードオフを提供する。非構造化プルーニングやチャネルワイドプルーニングとは異なり、ブロックプルーニングは線形層のブロックワイド構造を利用しており、より効率的な行列乗算をもたらす。このプルーニング方式を最適化するために,ブロック間隔構造に合わせて,高速化と推論時の消費電力の最小化を同時に行う,ハードウェア対応学習目標を提案する。この目的は、経験的なルックアップテーブルの必要性を排除し、パラメタライズされたレイヤ接続の削減にのみ焦点をあてる。さらに,本論文では,2次テイラー近似と経験的最適化を用いて,ViTの学習後プルーニングを実現するための軽量なアルゴリズムを提案する。 ImageNetの大規模な実験は、DeiT-BやDeiT-Sなど様々なViTアーキテクチャで行われ、他のプルーニング手法と競合する性能を示し、精度の保存と省電力の両立を実現している。特に,DeiT-Bでは専用ハードウェアで最大3.93倍,GPUで1.79倍の高速化を実現し,実世界のGPUで1.4倍の推論パワー低下を観測した。 Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more environmentally friendly, it is essential to compress ViT models, reducing their resource requirements while maintaining high performance. In this paper, we introduce a new block-structured pruning to address the resource-intensive issue for ViTs, offering a balanced trade-off between accuracy and hardware acceleration. Unlike unstructured pruning or channel-wise structured pruning, block pruning leverages the block-wise structure of linear layers, resulting in more efficient matrix multiplications. 
To optimize this pruning scheme, our paper proposes a novel hardware-aware learning objective that simultaneously maximizes speedup and minimizes power consumption during inference, tailored to the block sparsity structure. This objective eliminates the need for empirical look-up tables and focuses solely on reducing parametrized layer connections. Moreover, our paper provides a lightweight algorithm to achieve post-training pruning for ViTs, utilizing second-order Taylor approximation and empirical optimization to solve the proposed hardware-aware objective. Extensive experiments on ImageNet are conducted across various ViT architectures, including DeiT-B and DeiT-S, demonstrating competitive performance with other pruning methods and achieving a remarkable balance between accuracy preservation and power savings. In particular, we achieve up to 3.93x and 1.79x speedups on dedicated hardware and GPUs respectively for DeiT-B, and also observe an inference power reduction of 1.4x on real-world GPUs.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
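To make the notion of block-structured pruning concrete, here is a minimal sketch of zeroing whole tiles of a linear layer's weight matrix. It is not the LPViT method: the abstract describes a hardware-aware objective solved with a second-order Taylor approximation, whereas this sketch swaps in a much simpler magnitude criterion (squared L2 norm per tile). The function name `block_prune` and the toy matrix are hypothetical.

```python
def block_prune(weight, block, keep_ratio):
    """Zero out whole (block x block) tiles of a 2D weight matrix.

    weight: list of rows (lists of floats); both dimensions must be divisible by `block`.
    keep_ratio: fraction of tiles to keep, ranked by squared L2 norm.
    Returns the pruned matrix and the set of kept tile coordinates.
    """
    rows, cols = len(weight), len(weight[0])
    assert rows % block == 0 and cols % block == 0, "dimensions must be divisible by block"
    # Score every tile by the sum of its squared entries (a simple magnitude proxy).
    scores = {}
    for bi in range(rows // block):
        for bj in range(cols // block):
            scores[(bi, bj)] = sum(
                weight[bi * block + i][bj * block + j] ** 2
                for i in range(block)
                for j in range(block)
            )
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])
    # Entries survive only if the tile they belong to was kept.
    pruned = [
        [w if (i // block, j // block) in kept else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]
    return pruned, kept

weight = [
    [0.9, 0.8, 0.01, 0.02],
    [0.7, 0.6, 0.03, 0.01],
    [0.02, 0.01, 0.5, 0.4],
    [0.01, 0.03, 0.3, 0.6],
]
pruned, kept = block_prune(weight, block=2, keep_ratio=0.5)
```

Because entire tiles become zero, a block-sparse kernel can skip them outright, which is why this layout tends to map better onto hardware than unstructured sparsity.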
# 動的アルゴリズムとコンパイラ共設計によるオンデバイス超解法のためのデータオーバーフィッティング Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design ( http://arxiv.org/abs/2407.02813v2 ) ライセンス: Link先を確認	Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma,	(参考訳) ディープニューラルネットワーク(DNN)は、様々なコンピュータビジョンアプリケーションで頻繁に使用される。現在、ビデオ配信システムにおける新たなトレンドは、DNNの過度に適合したプロパティを活用して、ビデオ解像度のアップスケールを実現することである。動画をチャンクに分割し、各チャンクに過度に適合させるために超高解像度(SR)モデルを適用することで、このSRモデルとビデオチャンクのスキームは、従来のビデオ伝送を置き換えることができ、ビデオ品質と伝送効率を向上させることができる。しかし、高パフォーマンスを保証するために多くのモデルとチャンクが必要であるため、モデルの切り替えとユーザ側のメモリフットプリントが大幅にオーバヘッドされる。このような問題を解決するために,Content-Awareデータ処理パイプラインが支援するダイナミックディープニューラルネットワークを提案する。また,Dy-DCAの動的特徴(動的形状,サイズ,制御フローなど)を最適化し,融合コード生成や静的実行計画など,一連のコンパイル最適化を可能にするフレームワークを設計した。このような手法を用いることで,市販携帯電話上でのPSNRとリアルタイム性能(33FPS)を向上する。一方、コンパイルの最適化によって、1.7$\times$スピードアップを実現し、最大1.61$\times$メモリ消費を節約します。コードはhttps://github.com/coulsonlee/Dy-DCA-ECCV2024で公開されている。 Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks is able to replace traditional video transmission to enhance video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous overhead on model switching and memory footprints at the user end. To resolve such problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the model number down to one (Dy-DCA), which helps promote performance while conserving computational resources. 
Additionally, to achieve real acceleration on the user end, we designed a framework that optimizes dynamic features (e.g., dynamic shapes, sizes, and control flow) in Dy-DCA to enable a series of compilation optimizations, including fused code generation, static execution planning, etc. By employing such techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimization, we achieve a 1.7$\times$ speedup while reducing memory consumption by up to 1.61$\times$. Code is available at https://github.com/coulsonlee/Dy-DCA-ECCV2024.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# 弱い潜伏因子はいつ統計的に推測できるのか? When can weak latent factors be statistically inferred? ( http://arxiv.org/abs/2407.03616v2 ) ライセンス: Link先を確認	Jianqing Fan, Yuling Yan, Yuheng Zheng,	(参考訳) 本稿では,主成分分析(PCA)の新しい包括的な推定・推論理論を,雑音レベルや信号対雑音比に対してほぼ最小限の因子強度のもとで,断面的に依存する固有成分を許容する弱因子モデルの下で確立する。我々の理論は断面次元$N$と時間次元$T$の相対的な成長速度によらず適用可能である。このより現実的な仮定と顕著な結果は、完全に新しい技術的手法を必要とする。一般的に用いられるleave-one-outの手法は、断面依存のある場合にはもはや適用できないためである。例えば、$N\asymp T$ の場合、PCA ベースの推定器の漸近正規性は、信号対雑音比 (SNR) が$\log N$ の多項式速度よりも速く増加する限り、保たれることを示す。この発見は、$N$の多項式速度を必要とした以前の研究を大幅に上回る。我々の理論は完全に非漸近的であり、推定誤差と統計的推論の不確実性の両方に有限サンプルの特性を与える。特筆すべき技術的革新は、PCAベースの推定器のクローズドフォームの1次近似であり、様々な統計的テストの道を開くものである。さらに,提案理論を適用して,与えられた因子が未知の潜伏因子の線形スパンに含まれるかどうかの検証,各ユニットの因子負荷における構造変化の検定,2つのユニットが同一のリスクエクスポージャーを有するかどうかの検証,系統的リスクに対する信頼区間の構築を行う。私たちの実証研究は、テスト結果と経済サイクルの洞察に富んだ相関関係を明らかにした。 This article establishes a new and comprehensive estimation and inference theory for principal component analysis (PCA) under the weak factor model that allows for cross-sectionally dependent idiosyncratic components under nearly minimal factor strength relative to the noise level or signal-to-noise ratio. Our theory is applicable regardless of the relative growth rate between the cross-sectional dimension $N$ and temporal dimension $T$. This more realistic assumption and notable result require a completely new technical device, as the commonly-used leave-one-out trick is no longer applicable to the case with cross-sectional dependence. Another notable advancement of our theory is on PCA inference: for example, under the regime where $N\asymp T$, we show that the asymptotic normality for the PCA-based estimator holds as long as the signal-to-noise ratio (SNR) grows faster than a polynomial rate of $\log N$. This finding significantly surpasses prior work that required a polynomial rate of $N$. Our theory is entirely non-asymptotic, offering finite-sample characterizations for both the estimation error and the uncertainty level of statistical inference. 
A notable technical innovation is our closed-form first-order approximation of the PCA-based estimator, which paves the way for various statistical tests. Furthermore, we apply our theories to design easy-to-implement statistics for validating whether given factors fall in the linear spans of unknown latent factors, testing structural breaks in the factor loadings for an individual unit, checking whether two units have the same risk exposures, and constructing confidence intervals for systematic risks. Our empirical studies uncover insightful correlations between our test results and economic cycles.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# 多言語ASRシステムの自己回帰デコーダの連続学習最適化 Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems ( http://arxiv.org/abs/2407.03645v2 ) ライセンス: Link先を確認	Chin Yuen Kwok, Jia Qi Yip, Eng Siong Chng,	(参考訳) 継続学習(CL)は、事前学習されたデータの性能を維持しながら、新しいデータで訓練済みモデルを微調整する。これは多言語ASR(MASR)の機能拡張に特に関係している。しかし、コンピュータビジョンと強化学習タスクを主目的とする既存のCL手法では、MASRに直接適用した場合、しばしば準最適結果が得られる。これはMASRモデルにおける自己回帰デコーダのCLが難しいためである。これを検証するために,デコーダに4つの最適化を提案する。その中には、デコーダ層勾配手術、未使用のトークン埋め込みの凍結、新たに追加されたトークンの出力の抑制、学習率の再スケーリングが含まれる。 Common VoiceデータセットからWhisperを10の未確認言語に適用する実験により、これらの最適化により、新しい言語のAWERを妥協することなく、事前訓練された言語の平均単語誤り率(AWER)が14.2%から12.4%に低下することを示した。 Continual Learning (CL) involves fine-tuning pre-trained models with new data while maintaining the performance on the pre-trained data. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, mainly designed for computer vision and reinforcement learning tasks, often yield sub-optimal results when directly applied to MASR. We hypothesise that this is because CL of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations on the decoder. They include decoder-layer gradient surgery, freezing unused token embeddings, suppressing output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) of pretrained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER of new languages.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
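One of the four decoder optimizations named in the abstract above, freezing unused token embeddings, can be sketched as follows. The abstract does not define "unused", so this sketch assumes it means tokens that never occur in the new-language training data, and it uses a plain SGD step on list-based embeddings. All names here (`used_token_ids`, `sgd_step_frozen`) are hypothetical and do not come from the paper or from Whisper.

```python
def used_token_ids(corpus):
    """Collect the token ids that actually occur in the new-language data."""
    return {tok for sentence in corpus for tok in sentence}

def sgd_step_frozen(embeddings, grads, used, lr=0.1):
    """Apply an SGD update only to the embedding rows of used tokens.

    Rows of tokens absent from `used` are copied unchanged, i.e. frozen.
    """
    updated = []
    for tok_id, (row, grad) in enumerate(zip(embeddings, grads)):
        if tok_id in used:
            updated.append([w - lr * g for w, g in zip(row, grad)])
        else:
            updated.append(list(row))
    return updated

embeddings = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
grads = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
corpus = [[0, 2], [2, 2]]  # token id 1 never appears in the new data
used = used_token_ids(corpus)
new_embeddings = sgd_step_frozen(embeddings, grads, used, lr=0.1)
```

The design intent, under this reading, is that embeddings of tokens belonging only to previously learned languages cannot drift while the model is fine-tuned on new ones.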
# SfM on-the-fly: 撮影したものからより良い3Dを得る SfM on-the-fly: Get better 3D from What You Capture ( http://arxiv.org/abs/2407.03939v2 ) ライセンス: Link先を確認	Zhan Zongqian, Yu Yifei, Xia Rui, Gan Wentian, Xie Hong, Perda Giulio, Morelli Luca, Remondino Fabio, Wang Xin,	(参考訳) 過去20年間、Structure from Motion (SfM) はフォトグラメトリー、コンピュータビジョン、ロボティクスなどの分野において、常にホットスポットとして研究されてきたが、リアルタイム性能は最近になって関心が高まっているトピックである。この研究は、オリジナルのオンザフライSfM(Zhan et al., 2024)の上に構築され、撮影したものからより良い3Dを得るための3つの新しい改良を加えた更新版を提示する。 (i) Hierarchical Navigable Small World (HNSW)グラフを用いることにより、リアルタイム画像マッチングをさらに強化し、より多くの真陽性の重複画像候補をより高速に同定する。 (ii) SfM結果を改善するために,頑健な階層的局所バンドル調整のための自己適応重み付け戦略を提案する。 (iii) 協調SfMを支援するために複数のエージェントを導入し、共通に登録された画像が現れたときに、複数の3D再構成をシームレスに完全な3Dシーンにマージする。提案したSfM法(on-the-fly SfMv2)は,より完全でロバストな3次元再構成を高い時間効率で実現できることを示す。コードはhttp://yifeiyu225.github.io/on-the-flySfMv2.github.io/で公開されている。 In the last twenty years, Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics etc., whereas real-time performance is just a recent topic of growing interest. This work builds upon the original on-the-fly SfM (Zhan et al., 2024) and presents an updated version with three new advancements to get better 3D from what you capture: (i) real-time image matching is further boosted by employing the Hierarchical Navigable Small World (HNSW) graphs, thus more true positive overlapping image candidates are identified faster; (ii) a self-adaptive weighting strategy is proposed for robust hierarchical local bundle adjustment to improve the SfM results; (iii) multiple agents are included for supporting collaborative SfM and seamlessly merging multiple 3D reconstructions into a complete 3D scene when commonly registered images appear. Various comprehensive experiments demonstrate that the proposed SfM method (named on-the-fly SfMv2) can generate more complete and robust 3D reconstructions in a highly time-efficient way. Code is available at http://yifeiyu225.github.io/on-the-flySfMv2.github.io/.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# 観察可能な近接面の検出:クロスドメイン3次元物体検出の新しいモデリングと評価 Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection ( http://arxiv.org/abs/2407.04061v3 ) ライセンス: Link先を確認	Ruixiao Zhang, Yihong Wu, Juheon Lee, Adam Prugel-Bennett, Xiaohao Cai,	(参考訳) ドメイン適応技術の性能は、現在の自動運転車の3Dオブジェクト検出分野において、まだ理想的なレベルに達していない。これらの要因が組み合わさって、特定のデータセットから学んだ知識の効果的な伝達と応用を妨げる。既存の評価指標は、当初、予測と接地トラスト境界ボックス間の2次元または3次元の重なりを計算して、単一領域上での評価のために設計されているため、データセット間のサイズ差に起因する過度な問題に悩まされることが多い。ドメインにまたがって適用された後、元の3Dバウンディングボックスで優れたパフォーマンスを維持するために、本当にモデルが必要なのでしょうか? 実用的アプリケーションの観点からは、車両と他の障害物との衝突を防止することに重点を置いています。言い換えれば、モデルがエゴ車両に最も近い表面を正確に識別できる限り、障害を効果的に回避することは十分である。本稿では,エゴ車両のセンサに近接する表面を検出する3次元物体検出モデルの能力を測定するための2つの指標を提案する。さらに、EdgeHeadと呼ばれる改良ヘッドを提案し、学習可能な近接面にもっと焦点を合わせることで、既存のモデルのクロスドメインパフォーマンスを大幅に向上させることができる。 The performance of domain adaptation technologies has not yet reached an ideal level in the current 3D object detection field for autonomous driving, which is mainly due to significant differences in the size of vehicles, as well as the environments they operate in when applied across domains. These factors together hinder the effective transfer and application of knowledge learned from specific datasets. Since the existing evaluation metrics are initially designed for evaluation on a single domain by calculating the 2D or 3D overlap between the prediction and ground-truth bounding boxes, they often suffer from the overfitting problem caused by the size differences among datasets. This raises a fundamental question related to the evaluation of the 3D object detection models' cross-domain performance: Do we really need models to maintain excellent performance in their original 3D bounding boxes after being applied across domains? From a practical application perspective, one of our main focuses is actually on preventing collisions between vehicles and other obstacles, especially in cross-domain scenarios where correctly predicting the size of vehicles is much more difficult. 
In other words, as long as a model can accurately identify the closest surfaces to the ego vehicle, it is sufficient to effectively avoid obstacles. In this paper, we propose two metrics to measure 3D object detection models' ability to detect the closer surfaces to the sensor on the ego vehicle, which can be used to evaluate their cross-domain performance more comprehensively and reasonably. Furthermore, we propose a refinement head, named EdgeHead, to guide models to focus more on the learnable closer surfaces, which can greatly improve the cross-domain performance of existing models not only under our new metrics, but also under the original BEV/3D metrics.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# Stephanie: 社会会話におけるヒューマンインタラクションの模倣のためのステップバイステップ対話 Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations ( http://arxiv.org/abs/2407.04093v2 ) ライセンス: Link先を確認	Hao Yang, Hongyuan Lu, Xinhua Zeng, Yang Liu, Xiang Zhang, Haoran Yang, Yumeng Zhang, Shan Huang, Yiran Wei, Wai Lam,	(参考訳) 急速に発展する自然言語処理の分野では、対話システムは1段階の対話パラダイムを主に採用している。このパラダイムは効率的だが、人間の相互作用の深さと流動性が欠如しており、自然に見えない。本稿では,人間の会話の継続的でダイナミックな性質を模倣するように設計された,新しいStep-by-Step Dialogue Paradigm (Stephanie)を紹介する。デュアルラーニング戦略と,さらに分割した後編集手法を用いることで,高品質なステップバイステップ対話データセットを作成し,既存の大規模言語モデルの微調整に活用して,ステップバイステップ対話を可能にする。私たちはStephanieを徹底的に紹介する。従来の単段階対話のパラダイムと比較して,その効果を評価するために,専用の自動評価と人的評価を行った。チャットボットの未来を促進するために、コード、Stephanieデータセット、Stephanie LLMをリリースします。 In the rapidly evolving field of natural language processing, dialogue systems primarily employ a single-step dialogue paradigm. Although this paradigm is efficient, it lacks the depth and fluidity of human interactions and does not appear natural. We introduce a novel Step-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing dynamic nature of human conversations. By employing a dual learning strategy and a further-split post-editing method, we generated and utilized a high-quality step-by-step dialogue dataset to fine-tune existing large language models, enabling them to perform step-by-step dialogues. We thoroughly present Stephanie. Tailored automatic and human evaluations are conducted to assess its effectiveness compared to the traditional single-step dialogue paradigm. We will release code, Stephanie datasets, and Stephanie LLMs to facilitate the future of the chatbot era.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
# NeuFair: ドロップアウトによるニューラルネットワークのフェアネス修復 NeuFair: Neural Network Fairness Repair with Dropout ( http://arxiv.org/abs/2407.04268v2 ) ライセンス: Link先を確認	Vishnu Asutosh Dasu, Ashish Kumar, Saeid Tizpaz-Niari, Gang Tan,	(参考訳) 本稿では,ディープニューラルネットワーク(DNN)における後処理バイアス緩和としてのニューロンのドロップアウトについて検討する。神経駆動型ソフトウェアソリューションは、社会的に重要な領域において、重要な公正性に影響を及ぼす。ニューラルネットワークは、データから統計的パターンを見つけるのに非常に適しているが、過去のデータから既存のバイアスをエンコードして増幅することができる。既存のバイアス軽減アルゴリズムでは、入力データセットや学習アルゴリズムを変更する必要があることが多い。ランダムにニューロンを落とすことによるトレーニング中に過剰な適合を防げる一般的なドロップアウト手法は、事前訓練されたDNNの公平性を改善するための効果的な、より侵入的なアプローチである可能性があると仮定する。しかし、ドロップするニューロンの理想的な集合を見つけることは組合せ問題である。我々は,事前学習したDNNにおける不公平さをトレーニング後の推論におけるドロップアウトによって軽減する,後処理のランダム化アルゴリズムであるNeuFairを提案する。我々のランダム化検索は、モデルの実用性を維持しながら差別を最小限に抑える目的によって導かれる。ランダム化アルゴリズムの設計は, モデルの性能劣化を最小限に抑えつつ, 公平性(最大69%)を向上させるのに有効であり, 効率的であることを示す。本稿では,これらの現象を直感的に説明し,探索アルゴリズムの様々なハイパーパラメータが結果に与える影響を慎重に検討する。最後に、NeuFairと異なる最先端バイアス緩和器を経験的、概念的に比較する。 This paper investigates neuron dropout as a post-processing bias mitigation for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While neural networks are exceptionally good at finding statistical patterns from data, they may encode and amplify existing biases from the historical data. Existing bias mitigation algorithms often require modifying the input dataset or the learning algorithms. We posit that the prevalent dropout methods that prevent over-fitting during training by randomly dropping neurons may be an effective and less intrusive approach to improve the fairness of pre-trained DNNs. However, finding the ideal set of neurons to drop is a combinatorial problem. We propose NeuFair, a family of post-processing randomized algorithms that mitigate unfairness in pre-trained DNNs via dropouts during inference after training. Our randomized search is guided by an objective to minimize discrimination while maintaining the model's utility. 
We show that our design of randomized algorithms is effective and efficient in improving fairness (up to 69%) with minimal or no model performance degradation. We provide intuitive explanations of these phenomena and carefully examine the influence of various hyperparameters of search algorithms on the results. Finally, we empirically and conceptually compare NeuFair to different state-of-the-art bias mitigators.	翻訳日:2024-07-16 04:08:24 公開日:2024-07-12
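The search problem described in the NeuFair abstract (choose which neurons to drop at inference so that discrimination falls while utility is preserved) can be illustrated with a generic randomized local search. This is only a sketch under stated assumptions: the abstract does not specify the algorithms, and the scoring functions `toy_unfairness` and `toy_utility` below are hypothetical stand-ins for evaluating a real pre-trained DNN on a validation set.

```python
import random

def random_dropout_search(n_neurons, unfairness, utility, min_utility, steps=200, seed=0):
    """Randomized local search over binary dropout masks (1 = keep, 0 = drop).

    Each step flips one random neuron. The flip is accepted only if utility
    stays at or above `min_utility` and unfairness strictly decreases.
    """
    rng = random.Random(seed)
    mask = [1] * n_neurons  # start from the unmodified network
    best = unfairness(mask)
    for _ in range(steps):
        candidate = list(mask)
        i = rng.randrange(n_neurons)
        candidate[i] = 1 - candidate[i]
        if utility(candidate) < min_utility:
            continue  # reject masks that hurt the model too much
        score = unfairness(candidate)
        if score < best:
            mask, best = candidate, score
    return mask, best

# Toy stand-ins: neurons 0 and 1 carry "bias"; every dropped neuron costs utility.
bias = [0.3, 0.2, 0.0, 0.0, 0.0, 0.0]

def toy_unfairness(mask):
    return sum(b for b, keep in zip(bias, mask) if keep)

def toy_utility(mask):
    return sum(mask) / len(mask)

mask, score = random_dropout_search(6, toy_unfairness, toy_utility, min_utility=0.5)
```

The space of masks grows as 2^n, which is why the abstract frames the choice of neurons as a combinatorial problem and resorts to randomized search rather than enumeration.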
# パーソナライズによる公正なフェデレーションデータクラスタリング - 多様なデータ分布間のギャップを埋める Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions ( http://arxiv.org/abs/2407.04302v2 ) ライセンス: Link先を確認	Shivam Gupta, Tarushi, Tsering Wangzes, Shweta Jain,	(参考訳) エッジデバイスからのデータの急速な成長は、機械学習アルゴリズムのパフォーマンス向上を促進してきた。しかしながら、生成されたデータはクライアントデバイスに存在するため、従来の機械学習パラダイムは2つの大きな課題に直面する。第一に、トレーニングのためにデータを集中化する必要がある。第二に、生成データの大部分はクラスラベルが欠落しており、高コストと専門知識の欠如により、クライアントが手動でデータをラベル付けするインセンティブは非常に低い。これらの問題を解決するために、教師なしのフェデレーションデータクラスタリングを使用して、プライバシを保護しつつ分散的にラベルなしデータを処理する初期の試みがあった。目標は、クライアントで利用可能なデータを、実際のデータ交換なしで、$k$個のパーティション(クラスタと呼ばれる)に分割することである。既存のアルゴリズムのほとんどは、クライアント間のデータ分布パターンに強く依存しているか、あるいは計算コストが高い。さらに、現実的なシナリオのほとんどではクライアント間でデータが偏っているため、既存のモデルではクライアントが高いクラスタリングコストを被り、フェデレーションプロセスへの参加に消極的になる可能性がある。そこで,我々は初めて,フェデレーションクラスタリングにパーソナライゼーションの考え方を導入する。目標は、より低いクラスタリングコストの達成と、クライアント間で均一なコストの達成とのバランスを取ることである。サーバとクライアント間の1ラウンドの通信でこれらの目標に対処するp-FClusを提案する。我々は,様々なフェデレーションデータセットに対してp-FClusの有効性を検証し,データ分布に依存しない性質,任意の有限$\ell$-normへの適用可能性,およびコストと分散の同時低減を示した。 The rapid growth of data from edge devices has catalyzed the performance of machine learning algorithms. However, the generated data resides on client devices, so traditional machine learning paradigms face two major challenges: first, data must be centralized for training; second, class labels are missing for most of the generated data, and clients have very little incentive to label their data manually owing to high cost and lack of expertise. To overcome these issues, there have been initial attempts to handle unlabelled data in a privacy-preserving distributed manner using unsupervised federated data clustering. The goal is to partition the data available on clients into $k$ partitions (called clusters) without actual exchange of data. Most of the existing algorithms are highly dependent on data distribution patterns across clients or are computationally expensive. 
Furthermore, because data is skewed across clients in most practical scenarios, existing models might leave some clients with a high clustering cost, making them reluctant to participate in the federated process. To address this, we are the first to introduce the idea of personalization in federated clustering. The goal is to strike a balance between achieving a lower clustering cost and, at the same time, achieving a uniform cost across clients. We propose p-FClus, which addresses these goals in a single round of communication between server and clients. We validate the efficacy of p-FClus on a variety of federated datasets, showcasing its data-independent nature and its applicability to any finite $\ell$-norm, while simultaneously achieving lower cost and variance.	翻訳日:2024-07-16 04:08:23 公開日:2024-07-12
# Segment any 4D Gaussians Segment Any 4D Gaussians ( http://arxiv.org/abs/2407.04504v2 ) ライセンス: Link先を確認	Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang,	(参考訳) XR/VRでは、現実世界のモデリング、理解、再構築が不可欠である。近年,3D Gaussian Splatting (3D-GS)法は3次元シーンのモデリングと理解において顕著な成功を収めている。同様に、様々な4D表現は、4D世界のダイナミクスを捉える能力を示している。しかし、4次元表現におけるセグメンテーションに焦点をあてた研究は不足している。本稿では, 4D ガウスをベースとした 4D デジタル世界において, あらゆるものをセグメント化する最初のフレームワークの一つである Segment Any 4D Gaussians (SA4D) を提案する。 SA4Dでは、ガウスのドリフトを扱うために効率的な時間的アイデンティティ特徴場を導入し、ノイズの多いスパースな入力から正確なアイデンティティ特徴を学習することができる。さらに, アーティファクトを除去するために, 4次元セグメンテーション精製法を提案する。われわれのSA4Dは4Dガウスにおいて数秒以内に正確で高品質なセグメンテーションを実現し、高品質なanythingマスクの除去、再着色、合成、レンダリングを行う能力を示している。さらなるデモは、https://jsxzs.github.io/sa4d/ で公開されている。 Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks. More demos are available at: https://jsxzs.github.io/sa4d/.	翻訳日:2024-07-16 04:08:23 公開日:2024-07-12
# 強いLLMを判断する弱いLLMによるスケーラブルな監視について On scalable oversight with weak LLMs judging strong LLMs ( http://arxiv.org/abs/2407.04622v2 ) ライセンス: Link先を確認	Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah,	(参考訳) スケーラブルな監督プロトコルは、人間が超人的なAIを正確に監督できるようにすることを目的としている。本稿では,2つのAIが判定者の説得を競うディベートと,単一のAIが質問を行う判定者を説得しようとするコンサルタンシーを検討し,AIなしで判定者が直接回答する直接質問応答のベースラインと比較する。大規模言語モデル(LLM)をAIエージェントと人間の判定者の代役の両方として使用し、判定者モデルはエージェントモデルよりも弱いものとする。我々は、判定者とエージェント間のさまざまな非対称性をベンチマークし、情報非対称性を持つ1つの抽出型QAタスクに関する以前の研究を拡張して、数学、コーディング、論理、マルチモーダル推論の非対称性も含むようにした。コンサルタントが正しい/間違った回答のどちらを主張するかをランダムに割り当てられる場合、ディベートはすべてのタスクでコンサルタンシーを上回る。ディベートを直接質問応答と比較すると、結果はタスクの種類に依存する。情報非対称性のある抽出型QAタスクではディベートが直接質問応答を上回るが、情報非対称性のない他のタスクでは結果はまちまちである。以前の研究ではディベーターやコンサルタントに主張すべき答えを割り当てていた。代わりに、どの答えを主張するかを選ばせると、判定者が間違った答えに説得される頻度は、コンサルタンシーよりもディベートの方が低いことが分かる。さらに、より強力なディベーターモデルは判定者の精度を高めるが、その効果は従来の研究よりも控えめであることが分かった。 Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. 
Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.	翻訳日:2024-07-16 04:08:23 公開日:2024-07-12
# Toucan: 150のアフリカ語ペアの多言語翻訳 Toucan: Many-to-Many Translation for 150 African Language Pairs ( http://arxiv.org/abs/2407.04796v2 ) ライセンス: Link先を確認	AbdelRahim Elmadany, Ife Adebara, Muhammad Abdul-Mageed,	(参考訳) 我々は、特にアフリカの言語に焦点を当て、低リソース言語のための機械翻訳(MT)を改善するために設計されたリソースのコレクションを導入することで、自然言語処理(NLP)の顕著なギャップに対処する。まず、12億と37億のパラメータを持つ2つの言語モデル、Cheetah-1.2BとCheetah-3.7Bを紹介する。次に、前述のモデルを微調整して、アフリカ語ペア156をサポートするように設計された、アフロセントリックな機械翻訳モデルであるToucanを作成します。 Toucanを評価するため、我々はAfroLingu-MTと呼ばれる機械翻訳評価のための広範囲な機械翻訳ベンチマークを慎重に開発した。Toucanは他のモデルよりも大幅に優れており、アフリカの言語におけるMTでの顕著なパフォーマンスを示している。最後に、新しいモデルspBLEU-1Kをトレーニングし、614のアフリカ語を含む1K言語をカバーする翻訳評価指標を強化する。この研究は、特にアフリカなどの限られた言語資源を持つ地域で、異文化間の理解と知識交換を促進し、NLP分野を前進させることを目的としている。 ToucanプロジェクトのGitHubリポジトリはhttps://github.com/UBC-NLP/Toucanで公開されている。 We address a notable gap in Natural Language Processing (NLP) by introducing a collection of resources designed to improve Machine Translation (MT) for low-resource languages, with a specific focus on African languages. First, we introduce two language models (LMs), Cheetah-1.2B and Cheetah-3.7B, with 1.2 billion and 3.7 billion parameters respectively. Next, we finetune the aforementioned models to create Toucan, an Afrocentric machine translation model designed to support 156 African language pairs. To evaluate Toucan, we carefully develop an extensive machine translation benchmark, dubbed AfroLingu-MT, tailored for evaluating machine translation. Toucan significantly outperforms other models, showcasing its remarkable performance on MT for African languages. Finally, we train a new model, spBLEU-1K, to enhance translation evaluation metrics, covering 1K languages, including 614 African languages. This work aims to advance the field of NLP, fostering cross-cultural understanding and knowledge exchange, particularly in regions with limited language resources such as Africa. The GitHub repository for the Toucan project is available at https://github.com/UBC-NLP/Toucan.	翻訳日:2024-07-16 04:08:23 公開日:2024-07-12
# LaSe-E2V:言語誘導型セマンティック・アウェア・イベント・ビデオ再構成を目指して LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction ( http://arxiv.org/abs/2407.05547v2 ) ライセンス: Link先を確認	Kanghao Chen, Hangyu Li, JiaZhou Zhou, Zeyu Wang, Lin Wang,	(参考訳) イベントカメラは、標準カメラと比較して低レイテンシ、高時間分解能、高ダイナミックレンジ(HDR)などの利点を利用する。画像パラダイムの相違により、イベント・ツー・ビデオ(E2V)の再構築が主流となり、イベントベースと標準的なコンピュータビジョンが橋渡しされる。しかし、イベントカメラは、エッジとモーションの情報のみをローカルで検出する、本質的に不適切な性質のため、このタスクは依然として困難である。その結果、再構成されたビデオは、主にイベントデータのあいまいな意味論によって引き起こされる、アーティファクトや地域的曖昧さに悩まされることが多い。本稿では,言語は自然に豊富な意味情報を伝達し,E2V再構成のセマンティック一貫性を確保するのに驚くほど優れていることを示す。そこで本稿では,テキスト条件拡散モデルを用いて,言語誘導の観点から意味認識による高品質なE2V再構築を実現する,LaSe-E2Vという新しいフレームワークを提案する。しかし、拡散モデル固有の多様性とランダム性のため、E2V再構成のための空間的・時間的整合性を実現するために直接適用することは不可能である。そこで,まずイベント誘導時空間アテンション(ESA)モジュールを提案する。次に、時間的コヒーレンスを確保するためのイベント対応マスクロスと、空間的一貫性を高めるためのノイズ初期化戦略を導入する。イベントテキストとビデオのペアデータがないため、既存のE2Vデータセットを集約し、トレーニングと評価のためにタグ付けモデルを使用してテキスト記述を生成する。様々な難解なシナリオ(例えば、高速な動き、低光)をカバーする3つのデータセットの大規模な実験は、我々の手法の優位性を実証している。 Event cameras harness advantages such as low latency, high temporal resolution, and high dynamic range (HDR), compared to standard cameras. Due to the distinct imaging paradigm shift, a dominant line of research focuses on event-to-video (E2V) reconstruction to bridge event-based and standard computer vision. However, this task remains challenging due to its inherently ill-posed nature: event cameras only detect the edge and motion information locally. Consequently, the reconstructed videos are often plagued by artifacts and regional blur, primarily caused by the ambiguous semantics of event data. In this paper, we find language naturally conveys abundant semantic information, rendering it stunningly superior in ensuring semantic consistency for E2V reconstruction. Accordingly, we propose a novel framework, called LaSe-E2V, that can achieve semantic-aware high-quality E2V reconstruction from a language-guided perspective, buttressed by the text-conditional diffusion models. 
However, due to diffusion models' inherent diversity and randomness, it is hardly possible to directly apply them to achieve spatial and temporal consistency for E2V reconstruction. Thus, we first propose an Event-guided Spatiotemporal Attention (ESA) module to condition the event data to the denoising pipeline effectively. We then introduce an event-aware mask loss to ensure temporal coherence and a noise initialization strategy to enhance spatial consistency. Given the absence of event-text-video paired data, we aggregate existing E2V datasets and generate textual descriptions using the tagging models for training and evaluation. Extensive experiments on three datasets covering diverse challenging scenarios (e.g., fast motion, low light) demonstrate the superiority of our method.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# PORCA:部分観測データによる根本原因解析 PORCA: Root Cause Analysis with Partially Observed Data ( http://arxiv.org/abs/2407.05869v2 ) ライセンス: Link先を確認	Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi,	(参考訳) 根本原因分析(RCA)は、複雑なシステムから因果構造を発見し解析することによって、システム障害の原因を特定することを目的としている。多くのアプリケーションドメインで広く使われている。信頼性の高い診断の結論は、システム障害と財政的損失を軽減する上で非常に重要である。しかし、以前の研究では、システムの完全な観測を暗黙的に仮定しており、部分観測の影響(すなわち、欠損ノードと潜在的な異常)を無視していた。その結果、信頼できるRCA結果の導出に失敗する。本稿では, 部分観測における未観測交絡因子と異質性の問題を明らかにするとともに, 部分観測データを用いた根本原因分析という新たな課題を提起する。そこで本研究では,未観測交絡因子と未観測の異質性の両方の下で信頼できる根本原因を探索できる新しいRCAフレームワークであるPORCAを提案する。 PORCAは、拡大したスコアベースの因果探索を利用して、未観測交絡因子の下で、非巡回有向混合グラフを効率的に最適化する。さらに、適応的なサンプル重み付けを提供する異質性を考慮したスケジューリング戦略も開発している。 1つの合成データセットと2つの実世界のデータセットに対する大規模な実験結果は、提案フレームワークの有効性と優位性を示している。 Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, which neglects the effect of partial observation (i.e., missing nodes and latent malfunction). As a result, they fail in deriving reliable RCA results. In this paper, we unveil the issues of unobserved confounders and heterogeneity in partial observation and come up with a new problem of root cause analysis with partially observed data. To achieve this, we propose PORCA, a novel RCA framework which can explore reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we also develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. 
Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# PAS:データ効率の良いPlug-and-Play Prompt Augmentation System PAS: Data-Efficient Plug-and-Play Prompt Augmentation System ( http://arxiv.org/abs/2407.06027v3 ) ライセンス: Link先を確認	Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou,	(参考訳) 近年、Large Language Models(LLMs)の台頭により、プラグアンドプレイAIシステムへの需要が高まっている。様々なAI技術の中で、プロンプトエンジニアリングは特に重要である。しかし、学習曲線が急で多くの時間投資を要するため、ユーザーはプロンプトを書くことの難しさに直面することが多く、既存の自動プロンプトエンジニアリング(APE)モデルも使いにくい場合がある。この問題に対処するために, LLM ベースのプラグアンドプレイ APE システム PAS を提案する。 PASは高品質で自動生成されたプロンプト補完データセットに基づいてトレーニングされたLLMを使用し、例外的なパフォーマンスを実現している。総合的なベンチマークでは、PASは従来のAPEモデルと比較して最先端(SoTA)の結果を達成し、平均6.09ポイントの改善を示している。さらに、PASは非常に効率的で、わずか9000のデータポイントでSoTAの性能を実現している。さらに、PASは追加の人的労働を必要とせずに、プロンプト拡張データを自律的に生成することができる。この柔軟性により、既存のすべてのLLMと互換性があり、幅広いタスクに適用できる。 PASは人間の評価に優れており、ユーザのためのプラグインとしての適合性を強調している。高い性能、効率、柔軟性の組み合わせにより、PASはプロンプトエンジニアリングの改善を通じてLLMのユーザビリティと有効性を向上する貴重なシステムとなっている。 In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficult to use. To address this issue, we propose PAS, an LLM-based plug-and-play APE system. PAS utilizes LLMs trained on high-quality, automatically generated prompt complementary datasets, resulting in exceptional performance. In comprehensive benchmarks, PAS achieves state-of-the-art (SoTA) results compared to previous APE models, with an average improvement of 6.09 points. Moreover, PAS is highly efficient, achieving SoTA performance with only 9000 data points. Additionally, PAS can autonomously generate prompt augmentation data without requiring additional human labor. 
Its flexibility also allows it to be compatible with all existing LLMs and applicable to a wide range of tasks. PAS excels in human evaluations, underscoring its suitability as a plug-in for users. This combination of high performance, efficiency, and flexibility makes PAS a valuable system for enhancing the usability and effectiveness of LLMs through improved prompt engineering.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 構造生成:階層的クラスタを用いて拡散モデルを導出する Structured Generations: Using Hierarchical Clusters to guide Diffusion Models ( http://arxiv.org/abs/2407.06124v2 ) ライセンス: Link先を確認	Jorge da Silva Goncalves, Laura Manduchi, Moritz Vandenhirtz, Julia E. Vogt,	(参考訳) 本稿では,階層的クラスタリングをDenoising Diffusion Probabilistic Models (DDPMs) の枠組みに統合したDiffuse-TreeVAEを提案する。提案手法は,学習した潜在木VAE構造体の根埋め込みから新たな画像を生成し,階層的な経路を伝播し,第2段階のDDPMを用いて各データクラスタの異なる高品質な画像を洗練・生成する。その結果、画像の明瞭度を向上するだけでなく、生成されたサンプルがそれぞれのクラスタに代表されることを保証するモデルとなり、従来のVAEベースの手法の限界に対処し、クラスタリングベースの生成モデリングの状況を改善する。 This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling from a root embedding of a learned latent tree VAE-based structure, it then propagates through hierarchical paths, and utilizes a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# SideSeeing: 歩道アセスメントのためのマルチモーダルデータセットとツールコレクション SideSeeing: A multimodal dataset and collection of tools for sidewalk assessment ( http://arxiv.org/abs/2407.06464v2 ) ライセンス: Link先を確認	R. J. P. Damaceno, L. Ferreira, F. Miranda, M. Hosseini, R. M. Cesar Jr,	(参考訳) 構築された環境を評価するためのツールとデータセットを提供する新しいイニシアティブであるSideSeeingを紹介する。本稿では,道路レベルのデータ取得,ロード,分析のためのフレームワークを提案する。このフレームワークを用いて,胸部搭載モバイルデバイスから撮影した同期ビデオ映像とセンサデータ(加速度計,ジャイロスコープ,磁力計,GPS)を統合した新しいデータセットを収集した。それぞれのデータサンプルは、ブラジルとアメリカの病院の近くで歩道を撮影するユーザーが通過した経路を表している。データセットは、9つの病院の周囲12kmをカバーする3時間のコンテンツを含み、325,000のビデオフレームと対応するセンサーデータを含んでいる。さらに,歩道のシーン識別のための新しい68要素分類法を提案する。 SideSeeingは、都市の専門家が詳細な歩道アクセシビリティ評価に利用できる一連のツールへの一歩だ。 SideSeeingデータとツールはhttps://sites.usp.br/sideseeing/で公開されている。 This paper introduces SideSeeing, a novel initiative that provides tools and datasets for assessing the built environment. We present a framework for street-level data acquisition, loading, and analysis. Using the framework, we collected a novel dataset that integrates synchronized video footage captured from chest-mounted mobile devices with sensor data (accelerometer, gyroscope, magnetometer, and GPS). Each data sample represents a path traversed by a user filming sidewalks near hospitals in Brazil and the USA. The dataset encompasses three hours of content covering 12 kilometers around nine hospitals, and includes 325,000 video frames with corresponding sensor data. Additionally, we present a novel 68-element taxonomy specifically created for sidewalk scene identification. SideSeeing is a step towards a suite of tools that urban experts can use to perform in-depth sidewalk accessibility evaluations. SideSeeing data and tools are publicly available at https://sites.usp.br/sideseeing/.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 視覚言語モデルは盲目です Vision language models are blind ( http://arxiv.org/abs/2407.06581v3 ) ライセンス: Link先を確認	Pooyan Rahmanzadehgervi, Logan Bolton, Mohammad Reza Taesiri, Anh Totti Nguyen,	(参考訳) 視覚機能を備えた大規模言語モデル(VLM)、例えば、GPT-4o、Gemini 1.5 Proは、数え切れないほどの画像テキストアプリケーションを動かし、多くの視覚理解ベンチマークで高いスコアを得ている。私たちはBlindTestを提案します。BlindTestは、人間にとっては極めて簡単な7つの視覚タスクのスイートであり、例えば (a) 2つの円が重なっているか否か、(b) 2つの線が交差するか否か、(c) 単語中のどの文字が丸で囲まれているか、の識別や、(d) オリンピック風のロゴに含まれる円の数を数えることなどである。驚いたことに、最先端の4つのVLMは平均してベンチマークで56.20%しか正確ではなく、Sonnet-3.5が最も正確である(73.77%)。 BlindTestでは、VLMは正確な空間情報とカウント(0から10)を必要とするタスクに苦労し、ときに、細部がぼやけて見える近視の人が経験に基づく推測をしているかのような印象を与える。コードは、https://vlmsareblind.github.io/で入手できる。 Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. We propose BlindTest, a suite of 7 visual tasks absurdly easy to humans such as identifying (a) whether two circles overlap; (b) whether two lines intersect; (c) which letter is being circled in a word; and (d) counting the number of circles in an Olympic-like logo. Surprisingly, four state-of-the-art VLMs are, on average, only 56.20% accurate on our benchmark, with Sonnet-3.5 being the best (73.77% accuracy). On BlindTest, VLMs struggle with tasks that require precise spatial information and counting (from 0 to 10), sometimes providing an impression of a person with myopia seeing fine details as blurry and making educated guesses. Code is available at: https://vlmsareblind.github.io/	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# Mobius:テキスト・ビデオ生成タスクのための高能率空間時間並列学習パラダイム Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task ( http://arxiv.org/abs/2407.06617v2 ) ライセンス: Link先を確認	Yiran Yang, Jinchao Zhang, Ying Deng, Jie Zhou,	(参考訳) テキスト・トゥ・イメージ(T2I)生成タスクの成功に触発されて、多くの研究者がテキスト・トゥ・ビデオ(T2V)生成タスクに力を注いでいる。 T2Vフレームワークの多くは、通常、T2Iモデルから継承し、動的ビデオを生成するための時間外トレーニング層を追加します。しかし、従来の3D-Unetはシリアルモードであり、時空間層は空間層に追従する。我々は、このシリアルモードは、環境に優しいものではなく、T2Vの開発に適さない大規模な拡散モデルと大規模なデータセットで、より多くのトレーニングコストをもたらすと信じている。そこで本稿では,T2Vタスクのための高効率な時空間並列訓練パラダイムであるMobiusを提案する。我々の3D-Unetでは、時間層と空間層は並列であり、特徴フローとバックプロパゲーションを最適化する。 Mobiusは24%のGPUメモリと12%のトレーニング時間を節約し、T2Vの微調整タスクを大幅に改善し、AIGCコミュニティに新たな洞察を与える。将来、コードをリリースします。 Inspired by the success of the text-to-image (T2I) generation task, many researchers are devoting themselves to the text-to-video (T2V) generation task. Most of the T2V frameworks usually inherit from the T2I model and add extra-temporal layers of training to generate dynamic videos, which can be viewed as a fine-tuning task. However, the traditional 3D-Unet is a serial mode and the temporal layers follow the spatial layers, which will result in high GPU memory and training time consumption according to its serial feature flow. We believe that this serial mode will bring more training costs with the large diffusion model and massive datasets, which are not environmentally friendly and not suitable for the development of the T2V. Therefore, we propose a highly efficient spatial-temporal parallel training paradigm for T2V tasks, named Mobius. In our 3D-Unet, the temporal layers and spatial layers are parallel, which optimizes the feature flow and backpropagation. The Mobius will save 24% GPU memory and 12% training time, which can greatly improve the T2V fine-tuning task and provide a novel insight for the AIGC community. We will release our codes in the future.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 物理世界とサイバー空間の整合: Embodied AIに関する包括的調査 Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI ( http://arxiv.org/abs/2407.06886v2 ) ライセンス: Link先を確認	Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin,	(参考訳) Embodied Artificial Intelligence (Embodied AI)は、AGI(Artificial General Intelligence)を達成するために不可欠であり、サイバースペースと物理世界を橋渡しする様々なアプリケーションの基盤として機能する。近年、MLM(Multi-modal Large Models)やWM(World Models)は、その優れた知覚、相互作用、推論能力により大きな注目を集めており、身体性エージェントの頭脳として有望なアーキテクチャとなっている。しかし、MLMの時代におけるEmbodied AIに関する包括的な調査は行われていない。本調査では,Embodied AIの最近の進歩を包括的に調査する。まず,身体性ロボットとシミュレータの代表的な研究の最前線を概観し,研究の焦点とその限界を十分に理解する。そして、主な研究対象である 1) 身体性知覚、2) 身体性インタラクション、3) 身体性エージェント、4) sim-to-real適応の4つを分析し、最先端の手法、必須パラダイム、包括的なデータセットを網羅する。さらに,仮想および実世界の身体性エージェントにおけるMLMの複雑さを考察し,動的なデジタル環境および物理環境における相互作用を促進する上でのその重要性を強調する。最後に、Embodied AIの課題と限界を要約し、今後の方向性について論じる。この調査が研究コミュニティの基礎的な参考として役立ち、継続的なイノベーションを刺激することを期待しています。関連するプロジェクトはhttps://github.com/HCPLab-SYSU/Embodied_AI_Paper_Listにある。 Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. 
Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 天皇が服を着る:パスワードマネージャによるWebユーザ認証のためのセキュアなガバナンスフレームワーク The Emperor is Now Clothed: A Secure Governance Framework for Web User Authentication through Password Managers ( http://arxiv.org/abs/2407.07205v2 ) ライセンス: Link先を確認	Ali Cherry, Konstantinos Barmpis, Siamak F. Shahandashti,	(参考訳) パスワードマネージャとWebアプリケーション間のインタラクションを促進する既存のアプローチは、適切な機能を提供し、重要な攻撃に対する緩和戦略を提供していない。 HTML Autofillは十分な表現力がなく、Credential Management APIはブラウザ拡張パスワードマネージャをサポートしておらず、他の提案されたソリューションは確立したユーザメンタルモデルに準拠していない。本稿では,パスワードマネージャとWebアプリケーション間のインタラクションを仲介するブラウザベースのガバナンスフレームワークであるBerytusを提案する。 2つのAPIは、パスワードマネージャとWebアプリケーションの間のオーケストレータとして機能するBerytusをサポートするように設計されている。 Firefoxにおけるフレームワークの実装は、登録および認証プロセスを完全にサポートする。これは、フィッシング、クロスサイトスクリプティング、インラインコードインジェクション(例えば、悪意のあるブラウザ拡張による)、TLSプロキシに対する効果的な緩和戦略を提供するのに対して、コンテンツセキュリティポリシーやクレデンシャルトークン化のような既存の緩和戦略は部分的に有効である。フレームワーク設計は、マルチステップ、マルチファクタ、カスタム認証スキームのサポートなど、望ましい機能も提供する。包括的セキュリティと機能評価を提供し、将来的な方向性について議論する。 Existing approaches to facilitate the interaction between password managers and web applications fall short of providing adequate functionality and mitigation strategies against prominent attacks. HTML Autofill is not sufficiently expressive, Credential Management API does not support browser extension password managers, and other proposed solutions do not conform to established user mental models. In this paper, we propose Berytus, a browser-based governance framework that mediates the interaction between password managers and web applications. Two APIs are designed to support Berytus acting as an orchestrator between password managers and web applications. An implementation of the framework in Firefox is developed that fully supports registration and authentication processes. 
As an orchestrator, Berytus is able to authenticate web applications and facilitate authenticated key exchange between web applications and password managers, which as we show, can provide effective mitigation strategies against phishing, cross-site scripting, inline code injection (e.g., by a malicious browser extension), and TLS proxy in the middle attacks, whereas existing mitigation strategies such as Content Security Policy and credential tokenisation are only partially effective. The framework design also provides desirable functional properties such as support for multi-step, multi-factor, and custom authentication schemes. We provide a comprehensive security and functionality evaluation and discuss possible future directions.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 強化学習による構造設計 Structural Design Through Reinforcement Learning ( http://arxiv.org/abs/2407.07288v2 ) ライセンス: Link先を確認	Thomas Rochefort-Beaudoin, Aurelian Vadean, Niels Aage, Sofiane Achiche,	(参考訳) 本稿では、トポロジ最適化(TO)における機械学習を促進するために設計された、オープンソースの強化学習(RL)環境であるStructural Optimization gym(SOgym)を紹介する。SOgymは、TOの物理を報酬関数に統合することにより、RLエージェントが物理的に実現可能で構造的に堅牢な設計を生成することを可能にする。スケーラビリティを高めるために、SOgymは、環境とエージェントの間のメッシュ非依存のインターフェースとしてフィーチャーマッピング手法を活用し、メッシュの解像度に関わらず、設計変数との効率的なインタラクションを可能にする。ベースラインの結果には、モデルフリーのProximal Policy Optimization(PPO)エージェントとモデルベースのDreamerV3エージェントを使用した。 3つの観測空間構成を試験した。体積制約下でコンプライアンスを最小化する構造設計における学生の直感を養うインタラクティブな教育ツールであるTopOptゲームに着想を得た構成が、性能とサンプル効率の両面で最良であった。 DreamerV3の100Mパラメータバージョンは、従来の最適化手法によって達成されたベースラインコンプライアンスの54%以内の構造を0%の切断率で生成し、荷重経路の切断にしばしば悩まされる教師あり学習手法よりも改善された。エージェントの学習率とTopOptゲーム実験の工学系学生の学習率を比較すると、DreamerV3-100Mモデルは約4桁低い学習率を示しており、試行錯誤を通じてスクラッチからトレーニングされたポリシーとしては素晴らしい成果である。これらの結果は、RLが連続的なTO問題を解く可能性と、多様な設計解を探索し学習する能力を持っていることを示唆している。 SOgymは複雑な構造設計の課題に対してRLエージェントを開発するためのプラットフォームを提供しており、この分野のさらなる研究を支援するために公開されている。 This paper introduces the Structural Optimization gym (SOgym), a novel open-source Reinforcement Learning (RL) environment designed to advance machine learning in Topology Optimization (TO). SOgym enables RL agents to generate physically viable and structurally robust designs by integrating the physics of TO into the reward function. To enhance scalability, SOgym leverages feature-mapping methods as a mesh-independent interface between the environment and the agent, allowing efficient interaction with the design variables regardless of mesh resolution. Baseline results use a model-free Proximal Policy Optimization agent and a model-based DreamerV3 agent. Three observation space configurations were tested. The TopOpt game-inspired configuration, an interactive educational tool that improves students' intuition in designing structures to minimize compliance under volume constraints, performed best in terms of performance and sample efficiency. 
The 100M parameter version of DreamerV3 produced structures within 54% of the baseline compliance achieved by traditional optimization methods and a 0% disconnection rate, an improvement over supervised learning approaches that often struggle with disconnected load paths. When comparing the learning rates of the agents to those of engineering students from the TopOpt game experiment, the DreamerV3-100M model shows a learning rate approximately four orders of magnitude lower, an impressive feat for a policy trained from scratch through trial and error. These results suggest RL's potential to solve continuous TO problems and its capacity to explore and learn from diverse design solutions. SOgym provides a platform for developing RL agents for complex structural design challenges and is publicly available to support further research in the field.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 大規模視覚言語モデルに対する攻撃調査:資源・進歩・今後の動向 A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends ( http://arxiv.org/abs/2407.07403v2 ) ライセンス: Link先を確認	Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu,	(参考訳) 近年の大規模モデルの発展に伴い、LVLM(Large Vision-Language Models)はマルチモーダルな理解と推論タスクの幅広い分野において顕著な能力を示した。 LVLMは従来のLarge Language Models (LLMs)と比較して、マルチリソースの現実世界アプリケーションにより近く、マルチモーダル処理が複雑であるため、大きな可能性と課題の両方を示す。しかし、LVLMの脆弱性は比較的研究が進んでおらず、日々の使用において潜在的なセキュリティリスクを生じさせている。本稿では,既存のLVLM攻撃の様々な形態について包括的に概説する。具体的には、まず、攻撃の予備知識、攻撃の課題、攻撃資源を含む、LVLMをターゲットにした攻撃の背景を紹介する。次に,モデル出力を操作する敵対的攻撃,モデルの脆弱性を悪用して不正な動作を引き起こすジェイルブレイク攻撃,プロンプトの種類やパターンを設計するプロンプトインジェクション攻撃,モデルトレーニングに影響を与えるデータ汚染など,LVLM攻撃手法の発展を体系的に検討する。最後に,将来有望な研究の方向性について論じる。我々の調査は、LVLMの脆弱性の現在の状況に関する洞察を提供し、より多くの研究者がLVLM開発における潜在的な安全性問題を探求し緩和するよう促すものと信じています。 LVLM攻撃に関する最新の論文は、https://github.com/liudaizong/Awesome-LVLM-Attackで継続的に収集されている。 With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to the multi-resource real-world applications and the complexity of multi-modal processing. However, the vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage. In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks. Specifically, we first introduce the background of attacks targeting LVLMs, including the attack preliminary, attack challenges, and attack resources. Then, we systematically review the development of LVLM attack methods, such as adversarial attacks that manipulate model outputs, jailbreak attacks that exploit model vulnerabilities for unauthorized actions, prompt injection attacks that engineer the prompt type and pattern, and data poisoning that affects model training. 
Finally, we discuss promising research directions in the future. We believe that our survey provides insights into the current landscape of LVLM vulnerabilities, inspiring more researchers to explore and mitigate potential safety issues in LVLM developments. The latest papers on LVLM attacks are continuously collected in https://github.com/liudaizong/Awesome-LVLM-Attack.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# Open-Vocabulary Video Instance Segmentationのための統一埋め込みアライメント Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation ( http://arxiv.org/abs/2407.07427v2 ) ライセンス: Link先を確認	Hao Fang, Peng Wu, Yawei Li, Xinxin Zhang, Xiankai Lu,	(参考訳) Open-Vocabulary Video Instance Segmentation (VIS)は、任意のオブジェクトをセグメンテーションし追跡する能力によって、注目を集めている。しかし、最近のOpen-Vocabulary VISの試みは、特に新しいカテゴリへの一般化能力に関して、不満足な結果を得た。 VLM特徴(例えばCLIP)とインスタンスクエリの間のドメインギャップと、時間的一貫性の未活用が2つの中心的な原因であることが判明した。これらの問題を緩和するために、我々はOVFormerと呼ばれる新しいOpen-Vocabulary VISベースラインを設計し、訓練する。 OVFormerは軽量なモジュールを使用して、クエリの埋め込みとCLIPイメージの埋め込みの間で統一埋め込みアライメントを行い、ドメインギャップを修復する。従来の画像ベースのトレーニング手法とは異なり、ビデオベースのモデルトレーニングを行い、ビデオ内の時間的一貫性を完全に活用する半オンライン推論スキームを導入する。特別な工夫なしに、OVFormerはLV-VIS上でResNet-50バックボーンを用いて21.9 mAPを達成し、従来の最先端性能を7.7上回った。いくつかのClose-Vocabulary VISデータセットに対する大規模な実験は、OVFormerの強いゼロショット一般化能力(YouTube-VIS 2019では+7.6 mAP、OVISでは+3.9 mAP)も示している。コードはhttps://github.com/fanghaook/OVFormerで入手できる。 Open-Vocabulary Video Instance Segmentation (VIS) is attracting increasing attention due to its ability to segment and track arbitrary objects. However, the recent Open-Vocabulary VIS attempts obtained unsatisfactory results, especially in terms of generalization ability to novel categories. We discover that the domain gap between the VLM features (e.g., CLIP) and the instance queries and the underutilization of temporal consistency are two central causes. To mitigate these issues, we design and train a novel Open-Vocabulary VIS baseline called OVFormer. OVFormer utilizes a lightweight module for unified embedding alignment between query embeddings and CLIP image embeddings to remedy the domain gap. Unlike previous image-based training methods, we conduct video-based model training and deploy a semi-online inference scheme to fully mine the temporal consistency in the video. Without bells and whistles, OVFormer achieves 21.9 mAP with a ResNet-50 backbone on LV-VIS, exceeding the previous state-of-the-art performance by 7.7. 
Extensive experiments on some Close-Vocabulary VIS datasets also demonstrate the strong zero-shot generalization ability of OVFormer (+ 7.6 mAP on YouTube-VIS 2019, + 3.9 mAP on OVIS). Code is available at https://github.com/fanghaook/OVFormer.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 半監督的時間的行動定位のための適応的擬似ラベル学習に向けて Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization ( http://arxiv.org/abs/2407.07673v2 ) ライセンス: Link先を確認	Feixiang Zhou, Bryan Williams, Hossein Rahmani,	(参考訳) ノイズを緩和する擬似ラベルは、セミスーパーバイズド・テンポラル・アクション・ローカライゼーション(SS-TAL)において重要な課題である。既存の手法はしばしば厳密な条件に基づいて擬似ラベルをフィルタリングするが、典型的には分類とローカライゼーションの質を別々に評価し、最適でない擬似ラベルのランク付けと選択に繋がる。特に、選択された正のラベルの中に不正確な擬似ラベルがあり、信頼されたラベルは誤って負のラベルに割り当てられる。これらの問題に対処するため, 擬似ラベル選択を容易にするために, 適応型擬似ラベル学習(APL)フレームワークを提案する。具体的には、ランキング品質を改善するために、分類信頼性と局所化信頼性を協調的に学習し、次いで、共同スコアに基づいて擬似ラベルを動的に選択する適応ラベル品質評価(ALQA)を提案する。さらに、インスタンスレベルの一貫性判別器(ICD)を提案し、不明瞭な正と潜在的な正を同時に除去し、インスタンス間固有の一貫性に基づいて、より正確な選択をもたらす。さらに,行動と背景の区別を高めるために,一般教師なしの行動対応コントラスト事前訓練(ACP)を導入し,SS-TALの恩恵を受ける。 THUMOS14とActivityNet v1.3の広範囲な実験により,様々な半教師付き環境下での最先端性能が実証された。 Alleviating noisy pseudo labels remains a key challenge in Semi-Supervised Temporal Action Localization (SS-TAL). Existing methods often filter pseudo labels based on strict conditions, but they typically assess classification and localization quality separately, leading to suboptimal pseudo-label ranking and selection. In particular, there might be inaccurate pseudo labels within selected positives, alongside reliable counterparts erroneously assigned to negatives. To tackle these problems, we propose a novel Adaptive Pseudo-label Learning (APL) framework to facilitate better pseudo-label selection. Specifically, to improve the ranking quality, Adaptive Label Quality Assessment (ALQA) is proposed to jointly learn classification confidence and localization reliability, followed by dynamically selecting pseudo labels based on the joint score. Additionally, we propose an Instance-level Consistency Discriminator (ICD) for eliminating ambiguous positives and mining potential positives simultaneously based on inter-instance intrinsic consistency, thereby leading to a more precise selection. 
We further introduce a general unsupervised Action-aware Contrastive Pre-training (ACP) to enhance the discrimination both within actions and between actions and backgrounds, which benefits SS-TAL. Extensive experiments on THUMOS14 and ActivityNet v1.3 demonstrate that our method achieves state-of-the-art performance under various semi-supervised settings.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# 科学シミュレーションのためのスマートサロゲートの能動的学習の可能性 Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations ( http://arxiv.org/abs/2407.07674v2 ) ライセンス: Link先を確認	Pradeep Bajracharya, Javier Quetzalcóatl Toledo-Marín, Geoffrey Fox, Shantenu Jha, Linwei Wang,	(参考訳) 複雑なシステムを理解する上で重要な高性能な科学シミュレーションは、特に広いパラメータ空間を探索する際に計算上の問題に遭遇する。シミュレーションを加速できる代理モデルとして、ディープニューラルネットワーク(DNN)の開発への関心が高まっている。しかし、これらのDNNサロゲートをトレーニングするための既存のアプローチは、ヒューリスティックに選択され、高価な計算で生成される広範なシミュレーションデータに依存している。本稿では,DNNサロゲートトレーニングにアクティブラーニングを取り入れることの可能性を検討する。これにより、インテリジェントで客観的なトレーニングシミュレーションの選択が可能になり、広範なシミュレーションデータを生成する必要がなくなり、事前定義されたトレーニングシミュレーションに対するDNNサロゲートのパフォーマンスの依存性が軽減される。 2つの異なるDNNアーキテクチャを考慮し,拡散方程式に対するDNNサロゲート構築の問題点として,多様性と不確実性に基づくトレーニングシミュレーション選択手法の有効性を検討する。研究成果は,科学シミュレーションの効率向上を図るために,能動的学習戦略によるシミュレーションデータのオンザフライ生成を支援する,スマートサロゲートのための高性能コンピューティング基盤の開発の基礎となるものである。 High-performance scientific simulations, important for comprehension of complex systems, encounter computational challenges especially when exploring extensive parameter spaces. There has been an increasing interest in developing deep neural networks (DNNs) as surrogate models capable of accelerating the simulations. However, existing approaches for training these DNN surrogates rely on extensive simulation data which are heuristically selected and generated with expensive computation -- a challenge under-explored in the literature. In this paper, we investigate the potential of incorporating active learning into DNN surrogate training. This allows intelligent and objective selection of training simulations, reducing the need to generate extensive simulation data as well as the dependency of the performance of DNN surrogates on pre-defined training simulations. In the problem context of constructing DNN surrogates for diffusion equations with sources, we examine the efficacy of diversity- and uncertainty-based strategies for selecting training simulations, considering two different DNN architecture. 
The results set the groundwork for developing the high-performance computing infrastructure for Smart Surrogates that supports on-the-fly generation of simulation data steered by active learning strategies to potentially improve the efficiency of scientific simulations.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# カー非線形発振器におけるトポロジカル転移 Topological Transitions in a Kerr Nonlinear Oscillator ( http://arxiv.org/abs/2407.07729v2 ) ライセンス: Link先を確認	Juan Lin, Shou-Bang Yang, Fan Wu, Zhen-Biao Yang,	(参考訳) カー非線形発振器(KNO)は、連続変数量子ビット基底状態の符号化に適した一対の定常固有状態、すなわち反対位相のコヒーレント状態をサポートする。定常状態部分空間内に閉じ込められたKNOの任意制御は、システムのクエンチ速度に対する物理的観測量の線形応答を通じたベリー曲率の抽出を可能にし、KNOにおけるトポロジーの効果的な評価法を提供する。代替として、KNOに「断熱へのショートカット」を採用する制御は、加速された断熱的固有状態の進化を通じてトポロジーの探索を可能にし、3つの物理観測量すべてを測定する。トポロジカル転移は、パラメータ空間全体にわたるベリー曲率の積分と新しい極角関係の積分からそれぞれ得られる第1チャーン数の跳びによって明らかにされる。我々の戦略は、連続変数系のトポロジカル転移を測定する道を開くものである。 A Kerr nonlinear oscillator (KNO) supports a pair of steady eigenstates, coherent states with opposite phases, that are good for the encoding of continuous variable qubit basis states. Arbitrary control of the KNO confined within the steady state subspace allows extraction of the Berry curvature through the linear response of the physical observable to the quench velocity of the system, providing an effective method for the characterization of topology in the KNO. As an alternative, the control adopting the "shortcut to adiabaticity" to the KNO enables the exploration of the topology through accelerated adiabatic eigenstate evolution to measure all three physical observables. Topological transitions are revealed by the jump of the first Chern number, obtained respectively from the integral of the Berry curvature and of the new polar angle relation, over the whole parameter space. Our strategy paves the way for measuring topological transitions in continuous variable systems.	翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
# Mobility VLA: 長期VLMとトポロジグラフを用いたマルチモーダルインストラクションナビゲーション Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs ( http://arxiv.org/abs/2407.07775v2 ) ライセンス: Link先を確認	Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan,	(参考訳) ナビゲーション研究の究極的な目標は、自然言語や画像を含むマルチモーダル命令を理解し、有用なナビゲーションを実行するインテリジェントエージェントを構築することである。そこで本研究では,MINT (Multimodal Instruction Navigation with Demo Tours) と呼ばれる,従来記録されていたデモビデオを通じて,事前の環境を提供するナビゲーションタスクのカテゴリについて検討する。視覚言語モデル(VLM)の最近の進歩は、マルチモーダル入力の知覚と推論能力を示すものとして、この目標を達成する上で有望な道筋を示している。しかしながら、VLMは典型的にはテキスト出力を予測するために訓練されており、ナビゲーションに最適な方法に関するオープンな研究課題である。 MINT を解決するために,環境理解と長文 VLM の共通感覚推論能力とトポロジグラフに基づくロバストな低レベルナビゲーションポリシを組み合わせた階層型視覚言語行動(VLA)ナビゲーションポリシーであるモビリティ VLA を提案する。高レベルポリシーは、デモツアービデオとマルチモーダルユーザーインストラクションを入力として、ツアービデオのゴールフレームを見つけるための長文VLMで構成されている。次に、低レベルのポリシーでは、ゴールフレームとオフラインで構築されたトポロジグラフを使用して、各ステップでロボットアクションを生成する。我々は,836m^2実環境におけるモビリティVLAの評価を行い,プラスチック製の容器を持ちながら,それまで未解決であったマルチモーダル命令に対して,モビリティVLAは高いエンドツーエンドの成功率を示す。 Mobility VLAのデモビデオはこちらで見ることができる。 An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path in achieving this goal as it demonstrates capabilities in perceiving and reasoning about multimodal inputs. 
However, VLMs are typically trained to predict textual output, and how best to use them in navigation remains an open research question. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common-sense reasoning power of long-context VLMs with a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline-constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in an 836 m^2 real-world environment and show that it has high end-to-end success rates on previously unsolved multimodal instructions, such as "Where should I return this?" asked while holding a plastic bin. A video demonstrating Mobility VLA can be found here: https://youtu.be/-Tof__Q8_5s	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 無効機器を用いた双方向MRの同定と推定 Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments ( http://arxiv.org/abs/2407.07933v2 ) ライセンス: Link先を確認	Feng Xie, Zhen Yao, Lin Xie, Yan Zeng, Zhi Geng,	(参考訳) 両方向メンデルランダム化(MR)における純粋観測データから因果効果を推定する難しい問題について考察する。この問題に対処するために、既存のほとんどの手法は、専門家の知識によって、あるいは因果モデルが一方向MRモデルであると仮定して、対象因果効果の適切な有効器用変数(IV)を見つけようとする。そこで,本稿ではまず,観測データから双方向MRの同定を理論的に検討する。特に、一対の表現型(すなわち、治療と結果)の因果方向を含む双方向MRモデルが識別可能であるように、有効なIV集合が正しく同定される必要十分条件を提供する。さらに、同定理論に基づいて、有効なIV集合を発見し、興味の因果効果を推定するクラスタ融合のような手法を開発する。理論的に提案アルゴリズムの正しさを実証する。両方向MRの因果効果を推定するための方法の有効性を実験的に検証した。 We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming that the causal model is a one-directional MR model. As such, in this paper, we first theoretically investigate the identification of the bi-directional MR from observational data. In particular, we provide necessary and sufficient conditions under which valid IV sets are correctly identified such that the bi-directional MR model is identifiable, including the causal directions of a pair of phenotypes (i.e., the treatment and outcome). Moreover, based on the identification theory, we develop a cluster fusion-like method to discover valid IV sets and estimate the causal effects of interest. We theoretically demonstrate the correctness of the proposed algorithm. Experimental results show the effectiveness of our method for estimating causal effects in bi-directional MR.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 潜伏条件付き要約因果グラフにおけるマクロ条件の不依存性とマクロトータル効果の同定 Identifying macro conditional independencies and macro total effects in summary causal graphs with latent confounding ( http://arxiv.org/abs/2407.07934v2 ) ライセンス: Link先を確認	Simon Ferreira, Charles K. Assaad,	(参考訳) ダイナミックシステムにおける因果関係を理解することは、疫学、経済学、生物学を含む多くの科学分野において不可欠である。因果推論法は広く研究されているが、しばしば完全に定義された因果グラフに依存しており、必ずしも複雑な力学系では利用できないかもしれない。要約因果グラフ(SCG)のような部分特定因果グラフは、因果関係の単純化、時間的情報の省略、高レベルの因果構造に焦点を当てる。グラフ内の頂点として表されるクラスタ間の関係を含むマクロクエリと、グラフの頂点を通して直接見えない変数間の関係を含むマイクロクエリである。本稿では,まず,マクロ条件の非依存性とマイクロ条件の非依存性と,マクロ効果とマイクロトータル効果を明確に区別する。次に,SCGにおけるマクロ条件の不一致を識別するために,d-セパレーションの健全性と完全性を示す。さらに,SCGにおけるマクロトータル効果を同定するために,do-calculusが健全かつ完全であることが確認された。逆に,マイクロコンディショナル・インディペンデンシーとマイクロトータル・エフェクトを考慮した場合,これらの結果は成立しないことを示す。 Understanding causal relationships in dynamic systems is essential for numerous scientific fields, including epidemiology, economics, and biology. While causal inference methods have been extensively studied, they often rely on fully specified causal graphs, which may not always be available or practical in complex dynamic systems. Partially specified causal graphs, such as summary causal graphs (SCGs), provide a simplified representation of causal relationships, omitting temporal information and focusing on high-level causal structures. This simplification introduces new challenges concerning the types of queries of interest: macro queries, which involve relationships between clusters represented as vertices in the graph, and micro queries, which pertain to relationships between variables that are not directly visible through the vertices of the graph. In this paper, we first clearly distinguish between macro conditional independencies and micro conditional independencies and between macro total effects and micro total effects. Then, we demonstrate the soundness and completeness of the d-separation to identify macro conditional independencies in SCGs. 
Furthermore, we establish that the do-calculus is sound and complete for identifying macro total effects in SCGs. Conversely, we also show through various examples that these results do not hold when considering micro conditional independencies and micro total effects.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 言語モデル復号化のためのオートマタによる制約 Automata-based constraints for language model decoding ( http://arxiv.org/abs/2407.08103v2 ) ライセンス: Link先を確認	Terry Koo, Frederick Liu, Luheng He,	(参考訳) LMは、構造化データ、API呼び出し、コードスニペットなど、何らかの形式言語の文字列を生成することがしばしば期待される。 LMは形式構文への適合性を改善するために調整できるが、特に大規模展開に適した小型のLMでは適合性は保証されない。加えて、チューニングにはかなりのリソースが必要であるため、一般的でないフォーマットやタスク固有のフォーマットでは実用的ではない。下流のパースエラーを防ぐためには、LMが有効な出力のみを生成するように制約することが理想的だが、トークン化は一般に曖昧で形式文法と整合しないため、これは非常に複雑になる。 我々はオートマトン理論を適用してこれらの問題を解決し、APIコールやスキーマ誘導JSON、YAMLなど多くの実用的な応用を持つ幅広い形式言語のクラスである正規言語に対して、効率的な閉形式解を導出する。また、高分岐係数の問題に対処するための実用的拡張についても論じる。最後に、我々の手法を決定論的文脈自由言語に拡張する。これも同様に効率的な閉形式解を持つ。その柔軟性と表現力にもかかわらず、我々のアプローチでは、トークンごとの復号化ロジットへのアクセスしか必要とせず、LMサイズに依存しない単純な計算に抑えられるため、ほぼ全てのLMアーキテクチャに効率よく簡単に適用できる。 LMs are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided JSON and YAML. We also discuss pragmatic extensions for coping with the issue of high branching factor. Finally, we extend our techniques to deterministic context-free languages, which similarly admit an efficient closed-form solution. 
In spite of its flexibility and representative power, our approach only requires access to per-token decoding logits and lowers into simple calculations that are independent of LM size, making it both efficient and easy to apply to almost any LM architecture.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 欧州連合におけるフェデレーション学習とAI規制 : 責任とは何か -- 学際的分析 Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis ( http://arxiv.org/abs/2407.08105v2 ) ライセンス: Link先を確認	Herbert Woisetschläger, Simon Mertel, Christoph Krönke, Ruben Mayer, Hans-Arno Jacobsen,	(参考訳) 欧州連合人工知能法(EU)は、膨大な罰金を回避するため、機械学習アプリケーションの開発とデプロイにおけるステークホルダーの明確な責任を委任し、その起源にあるデータによるプライベートでセキュアなデータ処理を優先する。フェデレートラーニング(FL)は、データサイロを越えた生成AIモデルのトレーニングを可能にし、データセキュリティを改善しながらモデルパラメータのみを共有する。 FLは協調学習パラダイムであるため、クライアントとサーバはFLパイプラインにおける法的責任を自然に共有する。我々の仕事は、双方の役割を明確にし、責任をサーバオペレータに移すための戦略を説明し、EU AI法の下でFLの実践的適用性を改善するために解決しなければならない、オープンな技術的課題を指摘している。 The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI Models across data siloes, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL's practical applicability under the EU AI Act.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# EchoMimic: 編集可能なランドマーク条件によるライブライクなオーディオ駆動のポートレートアニメーション EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions ( http://arxiv.org/abs/2407.08136v2 ) ライセンス: Link先を確認	Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma,	(参考訳) オーディオ入力によって推進されるポートレートイメージアニメーションの領域は、ライフライクでダイナミックなポートレートの生成において顕著な進歩を見せている。従来の方法では、音声または顔のキーポイントを使用して映像をビデオに駆動するに限られるが、良好な結果が得られるが、ある問題が存在する。例えば、音声のみによって駆動される手法は、比較的弱い音声信号のために時々不安定になり、一方、顔のキーポイントのみによって駆動される手法は、運転時により安定しているが、キーポイント情報の過剰な制御による不自然な結果をもたらす可能性がある。本稿では,これまでに述べた課題に対処するため,EchoMimicという新しいアプローチを提案する。 EchoMimicはオーディオと顔のランドマークの両方を使って同時にトレーニングされている。新たなトレーニング戦略の実装を通じて、EchoMimicは、オーディオと顔のランドマークを個別に生成するだけでなく、オーディオと選択された顔のランドマークを組み合わせることで、ポートレートビデオを生成することができる。 EchoMimicは、さまざまな公開データセットや収集データセットの代替アルゴリズムと比較して総合的に比較され、定量評価と定性評価の両方において優れたパフォーマンスを示している。ソースコードへのさらなる視覚化とアクセスは、EchoMimicプロジェクトページにある。 The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into videos, while they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audios can be unstable at times due to the relatively weaker audio signal, while methods driven exclusively by facial key points, although more stable in driving, can result in unnatural outcomes due to the excessive control of key point information. In addressing the previously mentioned challenges, in this paper, we introduce a novel approach which we named EchoMimic. EchoMimic is concurrently trained using both audios and facial landmarks. Through the implementation of a novel training strategy, EchoMimic is capable of generating portrait videos not only by audios and facial landmarks individually, but also by a combination of both audios and selected facial landmarks. 
EchoMimic has been comprehensively compared with alternative algorithms across various public datasets and our collected dataset, showcasing superior performance in both quantitative and qualitative evaluations. Additional visualization and access to the source code can be located on the EchoMimic project page.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 距離一貫性リハーサルによる生涯学習型の病理組織Whole Slide Image検索 Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal ( http://arxiv.org/abs/2407.08153v2 ) ライセンス: Link先を確認	Xinyu Zhu, Zhiguo Jiang, Kun Wu, Jun Shi, Yushan Zheng,	(参考訳) 近年,CBHIR (Content-based Histopathological Image Retrieval) が注目されている。しかし、臨床実践においては、WSIデータベースのサイズが継続的に拡大するため、現在のCBHIR法の実用化が制限される。本稿では,連続的に成長する検索データベース上でのプログレッシブモデル更新による破滅的忘却の課題を解決するために,ライフロング・ホール・スライド検索(LWSR)フレームワークを提案する。私たちのフレームワークは、継続的学習中に安定性と可塑性のバランスを達成することを目的としています。システムの可塑性を維持するため,ローカルメモリバンクと貯水池サンプリングを用いて,旧タスクと新タスクの両方の特徴空間を包括的にカバーするインスタンスを保存する。さらに,従来のタスクに対する検索キューの一貫性を確保するために,距離一貫性リハーサル (DCR) モジュールを設計した。これは生涯CBHIRシステムにおける安定性とみなされる。提案手法をTCGAプロジェクトの4つの公開WSIデータセット上で評価した。実験により,提案手法は有効であり,最先端手法よりも優れていることが示された。 Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images that are content-wise similar to the query one from an established database. However, in clinical practice, the continuously expanding size of WSI databases limits the practical application of current CBHIR methods. In this paper, we propose a Lifelong Whole Slide Retrieval (LWSR) framework to address the challenges of catastrophic forgetting by progressive model updating on a continuously growing retrieval database. Our framework aims to achieve a balance between stability and plasticity during continuous learning. To preserve system plasticity, we use a local memory bank with reservoir sampling to save instances, which can comprehensively encompass the feature spaces of both old and new tasks. Furthermore, a distance consistency rehearsal (DCR) module is designed to ensure the retrieval queue's consistency for previous tasks, which is regarded as stability within a lifelong CBHIR system. We evaluated the proposed method on four public WSI datasets from TCGA projects. The experimental results demonstrate that the proposed method is effective and outperforms state-of-the-art methods.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# RB-SQL: テキスト・トゥ・SQLのための検索ベースのLLMフレームワーク RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL ( http://arxiv.org/abs/2407.08273v2 ) ライセンス: Link先を確認	Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song,	(参考訳) 文脈内学習を伴う大規模言語モデル(LLM)は、テキスト対SQLタスクのパフォーマンスを大幅に改善した。これまでの作業は一般的に、LLMの推論能力を改善するために排他的なSQL生成プロンプトを使用することに重点を置いていた。しかし、それらは多くのテーブルや列を持つ大規模なデータベースを扱うことが難しく、通常、より効率的なプロンプトエンジニアリングのためにデータベースを前処理して有用な情報を抽出することの重要性を無視している。以上の分析に基づき,我々はRB-SQLを提案する。これは,簡潔なテーブルと列をスキーマとして検索し,コンテキスト内学習のための的を絞った例を検索する3つのモジュールからなる,コンテキスト内プロンプトエンジニアリングのための新しい検索ベースのLLMフレームワークである。実験により,我々のモデルは,公開データセットのBIRDとSpiderにおいて,複数の競合ベースラインよりも優れた性能が得られることが示された。 Large language models (LLMs) with in-context learning have significantly improved performance on the text-to-SQL task. Previous work generally focuses on using exclusive SQL generation prompts to improve the LLMs' reasoning ability. However, these approaches struggle with large databases that have numerous tables and columns, and they usually ignore the significance of pre-processing the database and extracting valuable information for more efficient prompt engineering. Based on the above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experimental results demonstrate that our model achieves better performance than several competitive baselines on the public datasets BIRD and Spider.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 任意分解能における適応型深部虹彩特徴エクストラクタ Adaptive Deep Iris Feature Extractor at Arbitrary Resolutions ( http://arxiv.org/abs/2407.08341v2 ) ライセンス: Link先を確認	Yuho Shoji, Yuka Ogino, Takahiro Toizumi, Atsushi Ito,	(参考訳) 本稿では,任意の解像度で虹彩認識を行うための深部特徴抽出器を提案する。分解能劣化は、高解像度画像で訓練されたディープラーニングモデルの認識性能を低下させる。高解像度画像の認識性能を犠牲にしながら、各種解像度画像のトレーニングによりモデルの堅牢性を向上させることができる。様々な解像度で高い認識性能を実現するために,自動切替ネットワークを用いた分解能適応特徴抽出法を提案する。我々のフレームワークには、ダウンサンプリングやアウト・オブ・フォーカスのぼかしなど、様々な分解能劣化に特化した分解能専門家モジュールが含まれています。入力画像の劣化条件に応じて自動的に切り替える。低解像度の専門家は、両方の専門家が共通のアイデンティティの特徴を抽出できるように、高解像度の専門家からの知識蒸留によって訓練される。従来の3つのニューラルネットワークモデルに我々のフレームワークを適用した。実験結果から,本手法は従来手法の低解像度での認識性能の向上と高解像度での認識性能の維持を図っている。 This paper proposes a deep feature extractor for iris recognition at arbitrary resolutions. Resolution degradation reduces the recognition performance of deep learning models trained by high-resolution images. Using various-resolution images for training can improve the model's robustness while sacrificing recognition performance for high-resolution images. To achieve higher recognition performance at various resolutions, we propose a method of resolution-adaptive feature extraction with automatically switching networks. Our framework includes resolution expert modules specialized for different resolution degradations, including down-sampling and out-of-focus blurring. The framework automatically switches them depending on the degradation condition of an input image. Lower-resolution experts are trained by knowledge-distillation from the high-resolution expert in such a manner that both experts can extract common identity features. We applied our framework to three conventional neural network models. The experimental results show that our method enhances the recognition performance at low-resolution in the conventional methods and also maintains their performance at high-resolution.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 統計物理学の基礎のための量子熱力学的可積分性 Quantum Thermodynamic Integrability for Foundations of Statistical Physics ( http://arxiv.org/abs/2407.08344v2 ) ライセンス: Link先を確認	Ruo-Xun Zhai, C. P. Sun,	(参考訳) 第二法則のカラテオドリの原理を、体積や磁場などのマクロ変数に依存するエネルギー準位を持つ量子熱力学に拡張する。この拡張は量子熱力学的可積分性(QTI)の概念を導入し、統計力学の代替基盤を提供する。 QTIの特徴は、熱力学多様体内の仕事と熱の経路非依存性であり、この多様体はエネルギー準位と特定の熱力学パラメータによって局所的に記述される。この枠組みの中で、温度は自然に積分因子として現れ、QTIに基づくエントロピー可積分方程式(EIE)から正準分布と非正準分布の両方を導出することができる。特に、非正準状態は、熱力学限界の外側で特に重要となり、有限サイズの熱力学系における情報相関の存在を明らかにしている。 We extend the Carathéodory principle of the Second Law to quantum thermodynamics with energy levels depending on macroscopic variables, such as volume and magnetic field. This extension introduces the concept of Quantum Thermodynamic Integrability (QTI), offering an alternative foundation for statistical mechanics. QTI is characterized by the path-independence of work and heat within the thermodynamic manifold, which is locally described by energy levels and specific thermodynamic parameters. Within this framework, temperature naturally emerges as an integrating factor, allowing for the derivation of both canonical and non-canonical distributions from the Entropy Integrable Equations (EIE) based on QTI. Notably, non-canonical states, which become particularly significant outside the thermodynamic limit, reveal the existence of informational correlations in finite-size thermodynamic systems.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# PredBench: さまざまな分野にわたる時空間予測のベンチマーク PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines ( http://arxiv.org/abs/2407.08418v2 ) ライセンス: Link先を確認	ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai,	(参考訳) 本稿では,時空間予測ネットワークの全体的評価に適したベンチマークであるPredBenchを紹介する。この分野では大きな進歩があったが、様々な予測ネットワークアーキテクチャの詳細と比較分析のための標準化されたフレームワークはいまだに存在しない。 PredBenchはこのギャップに対処するため、大規模な実験を行い、標準化された適切な実験環境を維持し、多次元評価を実装する。このベンチマークは、広く採用されている12のメソッドと、複数のアプリケーションドメインにまたがる15の多様なデータセットを統合し、現代の時空間予測ネットワークを広範囲に評価する。 PredBenchは、様々なアプリケーションにわたる予測設定の厳密な校正を通じて、意図した使用に関する評価を保証し、公正な比較を可能にする。さらに、その多次元評価フレームワークは、包括的なメトリクスセットで分析を拡張し、モデルの能力に関する深い洞察を提供する。本研究から得られた知見は,今後の発展に向けての戦略的方向性を提供するものである。私たちのコードベースは https://github.com/OpenEarthLab/PredBench で公開されています。 In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering extensive evaluation of contemporary spatio-temporal prediction networks. Through meticulous calibration of prediction settings across various applications, PredBench ensures evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into the capabilities of models. The findings from our research offer strategic directions for future developments in the field. Our codebase is available at https://github.com/OpenEarthLab/PredBench.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# 無限運動:長文命令による拡張運動生成 Infinite Motion: Extended Motion Generation via Long Text Instructions ( http://arxiv.org/abs/2407.08443v2 ) ライセンス: Link先を確認	Mengtian Li, Chengshuo Zhai, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang,	(参考訳) モーションジェネレーションの領域では、長時間で高品質なモーションシーケンスの作成は依然として重要な課題である。本稿では,長いテキストを長時間の動き生成に活用し,短時間と長時間の動き合成のギャップを効果的に埋める新しい手法である「無限運動」に関する画期的な研究について述べる。私たちの中核となる洞察は、既存の高品質なテキストモーションデータセットの戦略的拡張と再組み立てであり、それによって、拡張されたモーションシーケンスのためのモデルのトレーニングを容易にする新しいベンチマークデータセットが作成されました。我々のモデルの重要な革新は、任意の長さのテキストを入力として受け入れることであり、特定の物語やシナリオに合わせた動き列の生成を可能にする。さらに、テキストのタイムスタンプ設計を取り入れ、生成したシーケンス内の局所セグメントの正確な編集を可能にし、動き合成において比類のない制御と柔軟性を提供する。さらに、自然言語インタラクティブな編集、長いシーケンス内の動作シーケンスの編集、独立した動きシーケンスのスプライシングという3つの応用を通して、「無限運動」の汎用性と実用性を実証する。各アプリケーションは、我々のアプローチの適応性を強調し、モーションジェネレーションにおける研究と開発の可能性の範囲を広げる。大規模な実験を通じて,既存手法と比較して長いシーケンスの動作の生成におけるモデルの優れた性能を実証する。プロジェクトページ: https://shuochengzhai.github.io/Infinite-motion.github.io/ In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text for extended motion generation, effectively bridging the gap between short and long-duration motion synthesis. Our core insight is the strategic extension and reassembly of existing high-quality text-motion datasets, which has led to the creation of a novel benchmark dataset to facilitate the training of models for extended motion sequences. A key innovation of our model is its ability to accept arbitrary lengths of text as input, enabling the generation of motion sequences tailored to specific narratives or scenarios. Furthermore, we incorporate the timestamp design for text which allows precise editing of local segments within the generated sequences, offering unparalleled control and flexibility in motion synthesis. 
We further demonstrate the versatility and practical utility of "Infinite Motion" through three specific applications: natural language interactive editing, motion sequence editing within long sequences and splicing of independent motion sequences. Each application highlights the adaptability of our approach and broadens the spectrum of possibilities for research and development in motion generation. Through extensive experiments, we demonstrate the superior performance of our model in generating long-sequence motions compared to existing methods. Project page: https://shuochengzhai.github.io/Infinite-motion.github.io/	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# Neural Poisson Solver: 自然信号ブレンディングのための普遍的で継続的なフレームワーク Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending ( http://arxiv.org/abs/2407.08457v2 ) ライセンス: Link先を確認	Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao,	(参考訳) Inlicit Neural Representation (INR)は、視覚信号(例えば、2D画像や3Dシーン)を表現し、様々なダウンストリームアプリケーションで有望な結果を示す一般的な方法となっている。視覚信号の媒体としての可能性を考えると、INRを利用したニューラルブレンディング法の開発は自然な進歩である。ニューラルブレンディングは、2つのINRをマージして、両方の元の表現から情報をカプセル化する新しいINRを作成する。直接的アプローチでは、INRレンダリングプロセスに従来の画像編集手法を適用する。しかし、この手法はしばしば歪み、アーティファクト、色の変化をブレンドする。主な原因は、下層の画素格子の離散化と、変分問題を解くための境界条件の導入である。この問題に対処するために,INRによって表現される視覚信号をブレンドするための,プラグアンドプレイで普遍的に適用可能なフレームワークであるNeural Poisson Solverを導入する。我々のニューラル・ポアソン・ソルバーは連続ポアソン方程式に基づく変分問題解決手法を提供し、様々な領域で例外的な性能を示す。具体的には、変分問題の解法過程を表現するための勾配誘導型ニューラルソルバを提案し、対象信号を精製して自然なブレンディング結果を得る。また,ポアソン方程式に基づく損失と最適化手法を開発し,入力されたINRシーンを効果的にブレンドし,固有の構造と意味的内容を保存する。追加の事前知識への依存の欠如により,本手法は様々なタスクカテゴリに適応しやすく,その汎用性を強調している。総合的な実験結果は、複数の次元にまたがるアプローチの頑健さとタスクのブレンディングを検証した。 Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals, exploring the development of a neural blending method that utilizes INRs is a natural progression. Neural blending involves merging two INRs to create a new INR that encapsulates information from both original representations. A direct approach involves applying traditional image editing methods to the INR rendering process. However, this method often results in blending distortions, artifacts, and color shifts, primarily due to the discretization of the underlying pixel grid and the introduction of boundary conditions for solving variational problems. To tackle this issue, we introduce the Neural Poisson Solver, a plug-and-play and universally applicable framework across different signal dimensions for blending visual signals represented by INRs. 
Our Neural Poisson Solver offers a variational problem-solving approach based on the continuous Poisson equation, demonstrating exceptional performance across various domains. Specifically, we propose a gradient-guided neural solver to represent the solution process of the variational problem, refining the target signal to achieve natural blending results. We also develop a Poisson equation-based loss and optimization scheme to train our solver, ensuring it effectively blends the input INR scenes while preserving their inherent structure and semantic content. The lack of dependence on additional prior knowledge makes our method easily adaptable to various task categories, highlighting its versatility. Comprehensive experimental results validate the robustness of our approach across multiple dimensions and blending tasks.	翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
# マルチモーダル顔画像テキストデータセット1500万 15M Multimodal Facial Image-Text Dataset ( http://arxiv.org/abs/2407.08515v2 ) ライセンス: Link先を確認	Dawei Dai, YuTang Li, YingGe Liu, Mingming Jia, Zhang YuanHui, Guoyin Wang,	(参考訳) 現在、画像テキスト駆動型マルチモーダルディープラーニングモデルは、多くの分野でその顕著な可能性を実証している。実際には、顔画像を中心としたタスクは幅広い応用可能性を持っている。本稿では,自然言語記述を伴う顔画像(顔画像からテキストへ)の大規模・多様・高品質なデータセットであるFaceCaption-15Mを提示する。このデータセットは、顔中心タスクの研究を容易にすることを目的としている。 FaceCaption-15Mは、1500万対以上の顔画像と、それに対応する顔の特徴の自然言語記述で構成されており、これまでで最大の顔画像キャプションデータセットとなっている。画像品質, テキストの自然性, テキストの複雑さ, テキスト画像の関連性を総合的に分析し, FaceCaption-15Mの優位性を実証した。 FaceCaption-15Mの有効性を検証するために,顔画像と対応するキャプションを特徴空間で整列させるために,まず顔言語画像事前学習モデル(FLIP,CLIPに類似)を訓練した。その後、画像エンコーダとテキストエンコーダを併用し、線形層のみを微調整することで、FLIPベースのモデルでは、2つの課題のある顔中心タスクに対して最先端の結果が得られた。目的は、FaceCaption-15Mデータセットの公開を通じて、顔関連タスクの研究を促進することである。すべてのデータ、コード、モデルは公開されています。 https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This dataset aims to facilitate a study on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date. We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M. To validate the effectiveness of FaceCaption-15M, we first trained a facial language-image pre-training model (FLIP, similar to CLIP) to align facial images with their corresponding captions in feature space. 
Subsequently, using both image and text encoders and fine-tuning only the linear layer, our FLIP-based models achieved state-of-the-art results on two challenging face-centered tasks. The purpose is to promote research in the field of face-related tasks through the availability of the proposed FaceCaption-15M dataset. All data, codes, and models are publicly available. https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M	翻訳日:2024-07-16 03:38:34 公開日:2024-07-12
# 時空間的フェデレーション学習のグラディエント・インバージョン・アタックに対するプライバシー強化 Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks ( http://arxiv.org/abs/2407.08529v2 ) ライセンス: Link先を確認	Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa,	(参考訳) 時空間フェデレーション学習は、様々な位置情報ベースのサービスにおいて、共有勾配しか持たない価値あるモデルを訓練する能力のために、近年、集中的な研究が進められている。一方、最近の研究では、画像やテキスト上での共有勾配は、勾配反転攻撃(GIA)を受ける可能性があることが示されている。しかし、現在、時空間学習における勾配反転攻撃に関する体系的な研究は行われていない。本稿では,攻撃と防衛の観点からの時空間的フェデレーション学習における勾配攻撃問題について検討する。まず、時空間学習におけるプライバシーリスクを理解するために、時空間データに適した勾配攻撃アルゴリズムである時空間勾配反転攻撃(ST-GIA)を提案する。さらに、時空間学習における勾配反転攻撃を軽減するための適応的な防御戦略を設計する。摂動レベルを動的に調整することで、さまざまなトレーニングデータに対して、適切な保護を提供することができます。実世界の3つのデータセットに対する集中的な実験分析により、提案した防衛戦略が、効果的なセキュリティ保護を備えた時空間フェデレーション学習の有用性を十分に維持できることが明らかとなった。 Spatiotemporal federated learning has recently raised intensive studies due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion attacks in spatiotemporal federated learning. In this paper, we explore the gradient attack problem in spatiotemporal federated learning from attack and defense perspectives. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversion Attack (ST-GIA), a gradient attack algorithm tailored to spatiotemporal data that successfully reconstructs the original location from gradients. Furthermore, we design an adaptive defense strategy to mitigate gradient inversion attacks in spatiotemporal federated learning. By dynamically adjusting the perturbation levels, we can offer tailored protection for varying rounds of training data, thereby achieving a better trade-off between privacy and utility than current state-of-the-art methods. 
Through intensive experimental analysis on three real-world datasets, we reveal that the proposed defense strategy can well preserve the utility of spatiotemporal federated learning with effective security protection.	翻訳日:2024-07-16 03:38:34 公開日:2024-07-12
# IDAT: 対話型タスクソービングエージェントの構築と評価のためのマルチモーダルデータセットとツールキット IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents ( http://arxiv.org/abs/2407.08898v1 ) ライセンス: Link先を確認	Shrestha Mohanty, Negar Arabzadeh, Andrea Tupini, Yuxuan Sun, Alexey Skrynnik, Artem Zholus, Marc-Alexandre Côté, Julia Kiseleva,	(参考訳) AIエージェントと自然言語を用いた人間とのシームレスな対話は、AI研究の重要な目標である。本稿では,NeurIPSにおけるIGLUコンペティションを通じて,自然言語命令の理解と実行が可能な対話型エージェントを開発する上での課題について述べる。進歩にもかかわらず、適切なデータセットの不足や効果的な評価プラットフォームの必要性といった課題が続いている。 Minecraftのような環境で対話的な接地言語命令を収集するためのスケーラブルなデータ収集ツールを導入する。さらに,人間アノテータとのマルチターン通信による定性解析とエージェント性能の比較を行うための,Human-in-the-Loopインタラクティブ評価プラットフォームを提案する。我々は、知的な対話型AIエージェントの開発を促進し、さらなる研究に不可欠なリソースを提供することを目的として、IDAT(IGLU Dataset And Toolkit)と呼ばれるこれらの資産をコミュニティに提供します。 Seamless interaction between AI agents and humans using natural language remains a key goal in AI research. This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions through the IGLU competition at NeurIPS. Despite advancements, challenges such as a scarcity of appropriate datasets and the need for effective evaluation platforms persist. We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment, resulting in a Multi-Modal dataset with around 9,000 utterances and over 1,000 clarification questions. Additionally, we present a Human-in-the-Loop interactive evaluation platform for qualitative analysis and comparison of agent performance through multi-turn communication with human annotators. We offer to the community these assets referred to as IDAT (IGLU Dataset And Toolkit) which aim to advance the development of intelligent, interactive AI agents and provide essential resources for further research.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# 自閉症児の治療における人工知能の医療専門家・介護者支援への応用 Application of Artificial Intelligence in Supporting Healthcare Professionals and Caregivers in Treatment of Autistic Children ( http://arxiv.org/abs/2407.08902v1 ) ライセンス: Link先を確認	Hossein Mohammadi Rouzbahani, Hadis Karimipour,	(参考訳) 自閉症スペクトラム障害(Autism Spectrum Disorder、ASD)は、社会的相互作用の困難、コミュニケーション障害、反復行動に特徴付けられる多面的な神経発達状態である。 ASDの理解の進展にもかかわらず、その診断と治療は、症状学の多様性と多分野の医療アプローチの必要性により、大きな課題を呈し続けている。本稿では,医療従事者や介護者のASD管理能力を高める人工知能(AI)の可能性について検討する。我々は,自閉症児と非自閉症児の日常活動における表情と身体表現を解析するための高度なアルゴリズムを開発し,強力な深層学習に基づく自閉症検出システムの開発に繋がった。我々の研究は、AIモデル、特にXceptionとResNet50V2アーキテクチャが自閉症スペクトラム障害(ASD)の診断において高い精度を実現していることを示した。本研究は, ASDの診断, 治療, 包括的管理の改善におけるAIの変革的可能性を強調する。我々の研究によると、AIモデル、特にXceptionとResNet50V2アーキテクチャは、ASDの診断において高い精度を示した。 Autism Spectrum Disorder (ASD) represents a multifaceted neurodevelopmental condition marked by difficulties in social interaction, communication impediments, and repetitive behaviors. Despite progress in understanding ASD, its diagnosis and treatment continue to pose significant challenges due to the variability in symptomatology and the necessity for multidisciplinary care approaches. This paper investigates the potential of Artificial Intelligence (AI) to augment the capabilities of healthcare professionals and caregivers in managing ASD. We have developed a sophisticated algorithm designed to analyze facial and bodily expressions during daily activities of both autistic and non-autistic children, leading to the development of a powerful deep learning-based autism detection system. Our study demonstrated that AI models, specifically the Xception and ResNet50V2 architectures, achieved high accuracy in diagnosing Autism Spectrum Disorder (ASD). This research highlights the transformative potential of AI in improving the diagnosis, treatment, and comprehensive management of ASD. Our study revealed that AI models, notably the Xception and ResNet50V2 architectures, demonstrated high accuracy in diagnosing ASD.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# TensorTEE: 安全なコラボレーション型テンソルコンピューティングのための不均一なTEE粒度の統合 TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing ( http://arxiv.org/abs/2407.08903v1 ) ライセンス: Link先を確認	Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, Ling Liang, Jianan Mu, Xiaqing Li, Tianyun Ma, Pengwei Jin, Xinkai Song, Zidong Du, Qi Guo, Xing Hu,	(参考訳) NPUとCPUによる不均一なコラボレーティブコンピューティングは、その性能上の利点から広く注目を集めている。コンピューティングにおけるデータの機密性と整合性を確保するため、Trusted Execution Environments (TEE) は比較的低いオーバーヘッドのため、有望なソリューションであると考えられている。しかし、既存の異種TEE設計は、CPUとNPUのメモリの粒度が微妙で異なるため、協調コンピューティングでは非効率である。 1) CPU TEEのキャッシュラインの粒度は、余分なメモリアクセスによるメモリ圧力を増大させ、 2)NPUのキャッシュライン粒度MACは、限られたメモリストレージの圧力を増大させる。 3) 異種エンクレーブ間のデータ転送は非セキュア領域の転送に依存しており, 煩雑な再暗号化とスケジューリングを行う。これらの問題に対処するために,効率的な協調テンソル計算のための統一テンソル粒度不均一TEEであるTensorTEEを提案する。まず,CPUTEEにおけるテンソルの粒度を仮想的にサポートし,チップ上のテンソル構造を検出し維持することにより,オフチップメタデータアクセスを除去する。第2に,オフチップMACストレージとアクセスを排除しつつ,計算停止を回避するために,予測実行を伴うテンソル粒度MAC管理を提案する。さらに、統一された粒度に基づいて、再暗号化やジレンマのスケジューリングを行わずに直接データ転送を可能にする。本評価は,改良されたGem5とサイクル精度NPUシミュレータ上に構築した。その結果、TensorTEEは、既存の作業に比べてLarge Language Model(LLM)トレーニングワークロードのパフォーマンスを4.0倍改善し、非セキュアトレーニングに比べて2.1%オーバーヘッドしか発生せず、LLMトレーニングの実践的なセキュリティ保証を提供することがわかった。 Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. 1) The cacheline granularity of CPU TEE intensifies memory pressure due to its extra memory access, and 2) the cacheline granularity MAC of NPU escalates the pressure on the limited memory storage. 
3) Data transfer across heterogeneous enclaves relies on the transit of non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in CPU TEE to eliminate the off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# ローレンツ共変物理ブラウン運動:古典的および量子的 Lorentz covariant physical Brownian motion: Classical and quantum ( http://arxiv.org/abs/2407.08905v1 ) ライセンス: Link先を確認	Henryk Gzyl,	(参考訳) 本研究では,Goldstein-Kac速度スイッチングモデルを二つの観点から再検討する。第一に、軌道を固有時でパラメータ化した場合、確率過程の前向きおよび後ろ向きのチャップマン・コルモゴロフ方程式がローレンツ共変であることを証明する。第二に、このモデルを量子ランダム進化として定式化し直すために、Goldstein-Kacモデルをハミルトン系として書き直し、標準的な対応規則を用いて量子化する。ランダムな量子進化の密度は古典的な場合と同様のチャップマン・コルモゴロフ方程式を満たすことが分かり、したがってこれもローレンツ共変である。平均量子分散を計算する。最後に、量子モデルも特殊相対性理論と整合しており、光円錐の外側への遷移、すなわち時空において互いに交わらない台を持つ状態間の遷移は起こり得ないことを検証する。 In this work, we re-examine the Goldstein-Kac velocity switching model from two points of view. On the one hand, we prove that the forward and backward Chapman-Kolmogorov equations of the stochastic process are Lorentz covariant when the trajectories are parameterized by their proper time. On the other hand, to recast the model as a quantum random evolution, we consider restating the Goldstein-Kac model as a Hamiltonian system, which can then be quantized using the standard correspondence rules. It turns out that the density for the random quantum evolution satisfies a Chapman-Kolmogorov equation similar to that of the classical case, and therefore, it is also Lorentz covariant. We compute the average quantum variance. To finish, we verify that the quantum model is also consistent with special relativity and that transitions outside the light cone, that is, transitions between states with disjoint supports in space-time, cannot occur.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# AirSketch: スケッチのための生成モーション AirSketch: Generative Motion to Sketch ( http://arxiv.org/abs/2407.08906v1 ) ライセンス: Link先を確認	Hui Xian Grace Lim, Xuanming Cui, Yogesh S Rawat, Ser-Nam Lim,	(参考訳) イラストレーションは人間の表現とコミュニケーションの基本的なモードである。音声に付随するある種の動きは、この説明的なコミュニケーションのモードを提供することができる。 Augmented and Virtual Reality Technologies (AR/VR) は手の動き(空気描画)を描画するツールを導入したが、通常は高価なハードウェアと追加のデジタルマーカーが必要であり、それによってアクセシビリティとポータビリティが制限される。さらに、空気描画は美的な結果を得るためにかなりの技術を必要とする。これらの課題に対処するために,手の動きから直接忠実で視覚的に整合したスケッチを生成し,複雑なヘッドセットやマーカーを必要としないAirSketchの概念を紹介した。制御可能な画像拡散モデルにより、ノイズの多い手追跡画像から、クリーンで美的なスケッチへの変換を学習し、元の追跡データから不可欠な視覚的手がかりを保ちながら、簡単な拡張ベースの自己教師付き訓練手順を考案する。この問題を研究するために,空気描画データセットを2つ提示する。以上の結果から,空間的正確な入力から写真実写画像を生成するだけでなく,制御可能な画像拡散により,ノイズの多い入力から鮮明なスケッチを効果的に作成できることが示唆された。我々の研究は、マーカーレス空気描画への最初のステップとして機能し、AirSketchやAR/VR全般に制御可能な拡散モデルの異なる応用を明らかにする。 Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. 
Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# 同じ画像か? 画像検索における人間とAIの協調のための概念ボトルネックモデルの適応 Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval ( http://arxiv.org/abs/2407.08908v1 ) ライセンス: Link先を確認	Vaibhav Balloli, Sara Beery, Elizabeth Bondi-Kelly,	(参考訳) 画像検索は、野生生物保護から医療まで、個体の特定や診断を支援する関連画像の検索といった応用において重要な役割を果たす。画像検索のためのディープラーニング技術は著しく進歩しているが、実世界での性能は完全ではなく、人間の専門知識を取り入れる必要がしばしば生じる。これらのモデルは解釈可能性や修正可能性(correctability)に乏しいため、ヒューマン・イン・ザ・ループ(Human-in-the-loop)アプローチは一般に、人間が独立してタスクを完了し、その意見をさまざまな方法でAIモデルと組み合わせることに依存している。人間がAIモデル自体に介入できるようにして時間と労力を節約するため、概念ボトルネックモデル(CBM)を適応させたCHAIRを提案する。 CHAIRは、(a) 人間が中間概念を修正できるようにして生成される埋め込みを改善し、(b) より良い検索のために、人間の専門知識のレベルに応じた柔軟な介入を可能にする。CHAIRの有効性を示すため、本手法が外部からの介入なしでも画像検索指標において類似モデルを上回ることを示す。さらに,人間の介入によって検索性能がさらに向上し,人間とAIの相補性が達成されることを示す。 Image retrieval plays a pivotal role in applications from wildlife conservation to healthcare, for finding individual animals or relevant images to aid diagnosis. Although deep learning techniques for image retrieval have advanced significantly, their imperfect real-world performance often necessitates including human expertise. Human-in-the-loop approaches typically rely on humans completing the task independently and then combining their opinions with an AI model in various ways, as these models offer very little interpretability or correctability. To allow humans to intervene in the AI model instead, thereby saving human time and effort, we adapt the Concept Bottleneck Model (CBM) and propose CHAIR. CHAIR (a) enables humans to correct intermediate concepts, which helps improve the embeddings generated, and (b) allows for flexible levels of intervention that accommodate varying levels of human expertise for better retrieval. To show the efficacy of CHAIR, we demonstrate that our method performs better than similar models on image retrieval metrics without any external intervention. Furthermore, we also showcase how human intervention helps further improve retrieval performance, thereby achieving human-AI complementarity.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# KGpose: ポイントワイズポーズ投票によるキーポイントグラフ駆動型エンド・ツー・エンド多目的6Dポーズ推定 KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting ( http://arxiv.org/abs/2407.08909v1 ) ライセンス: Link先を確認	Andrew Jeong,	(参考訳) KGposeは、複数のオブジェクトの6Dポーズ推定のための新しいエンドツーエンドフレームワークである。提案手法は,キーポイントのグラフ表現である'keypoint-graph'を通じて,キーポイントベースの手法と学習可能なポーズ回帰を組み合わせる。 KGposeはまず、RGBとポイントクラウドの機能を融合したマルチモーダル機能を用いて、各オブジェクトの3Dキーポイントを推定する。これらのキーポイントは点雲の各点から推定され、グラフ表現に変換される。ネットワークは、グラフ畳み込みで設計されたキーポイントグラフ埋め込みと局所グラフ埋め込みのシーケンスを通じて、各ポイントの6Dパラメータを直接回帰し、その後に回転と変換ヘッドが続く。各オブジェクトの最終ポーズは、ポイントワイズ予測候補から選択される。本手法は,本モデルの有効性を実証し,ベンチマークデータセット上での競合結果を実現する。 KGposeは、ロボットアプリケーションのための複雑なシーンにおける幾何学的コンテキストを理解するための統一的で効率的なソリューションを提供するため、追加のローカライゼーションステップを必要とせずに、多目的ポーズ推定を可能にする。 This letter presents KGpose, a novel end-to-end framework for 6D pose estimation of multiple objects. Our approach combines keypoint-based method with learnable pose regression through `keypoint-graph', which is a graph representation of the keypoints. KGpose first estimates 3D keypoints for each object using an attentional multi-modal feature fusion of RGB and point cloud features. These keypoints are estimated from each point of point cloud and converted into a graph representation. The network directly regresses 6D pose parameters for each point through a sequence of keypoint-graph embedding and local graph embedding which are designed with graph convolutions, followed by rotation and translation heads. The final pose for each object is selected from the candidates of point-wise predictions. The method achieves competitive results on the benchmark dataset, demonstrating the effectiveness of our model. KGpose enables multi-object pose estimation without requiring an extra localization step, offering a unified and efficient solution for understanding geometric contexts in complex scenes for robotic applications.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# PAIL:カーボンニュートラル最適化のための性能に基づく逆模倣学習エンジン PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization ( http://arxiv.org/abs/2407.08910v1 ) ライセンス: Link先を確認	Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong,	(参考訳) 工業運転における炭素中立化は、持続可能な開発にますます不可欠になっている。これは重要な課題であり、業界4.0における運用最適化の重要な機会でもある。近年、深層強化学習(DRL)に基づく手法は、逐次最適化プロセスの有望な拡張を提供し、二酸化炭素排出量の削減に利用できる。しかし、既存のDRL法では、各アクションが最終的な持続可能な開発目標(SDG)に与える影響を評価するために、事前に定義された報酬関数が必要である。多くの実応用において、そのような報酬関数は事前に与えられない。そこで本研究では,PAIL(Performance Based Adversarial Imitation Learning)エンジンを提案する。これは、事前に定義されたアクション報酬を伴わずに、炭素中立性のための最適な操作ポリシーを取得するための新しい方法である。具体的には、Transformerベースのポリシージェネレータを使用して、履歴情報をエンコードし、多次元空間内の後続のアクションを予測する。アクションシーケンス全体を環境シミュレータによって反復的に更新する。次に、PAILは判別器を用いて、生成されたシーケンスと高SDGの実世界のサンプルとの差を最小限にする。並行して、Qラーニングフレームワークに基づくパフォーマンス推定器は、各アクションがSDGに与える影響を推定するために設計されている。これらの推定に基づいて、PAILは識別器と性能推定器の両方の報酬で生成されたポリシーを洗練する。 PAILは、複数の実世界のアプリケーションケースとデータセットで評価される。実験結果は,他の最先端ベースラインと比較したPAILの有効性を示した。さらに、PAILは炭素中立性の最適化に有意義な解釈性を提供する。 Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emissions. However, existing DRL methods need a pre-defined reward function to assess the impact of each action on the final sustainable development goals (SDG). In many real applications, such a reward function cannot be given in advance. To address the problem, this study proposes a Performance based Adversarial Imitation Learning (PAIL) engine. It is a novel method to acquire optimal operational policies for carbon neutrality without any pre-defined action rewards. 
Specifically, PAIL employs a Transformer-based policy generator to encode historical information and predict subsequent actions within a multi-dimensional space. The entire action sequence is iteratively updated by an environmental simulator. PAIL then uses a discriminator to minimize the discrepancy between generated sequences and real-world samples of high SDG. In parallel, a performance estimator based on a Q-learning framework is designed to estimate the impact of each action on SDG. Based on these estimates, PAIL refines generated policies with the rewards from both the discriminator and the performance estimator. PAIL is evaluated on multiple real-world application cases and datasets. The experimental results demonstrate the effectiveness of PAIL compared with other state-of-the-art baselines. In addition, PAIL offers meaningful interpretability for carbon-neutrality optimization.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# 高度な機械学習による映画推薦の変革:NMF,SVD,K-Meansクラスタリングの検討 Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering ( http://arxiv.org/abs/2407.08916v1 ) ライセンス: Link先を確認	Yubing Yan, Camille Moreau, Zhuoyue Wang, Wenhan Fan, Chengqian Fu,	(参考訳) 本研究では,Non-Negative Matrix Factorization (NMF),Truncated Singular Value Decomposition (SVD),K-Means Clusteringなどの機械学習技術を用いて,ロバストな映画推薦システムを開発した。主な目的は、パーソナライズされた映画レコメンデーションを提供することでユーザーエクスペリエンスを向上させることである。この研究は、データ前処理、モデルトレーニング、評価を含み、採用手法の有効性を強調している。その結果,提案システムはレコメンデーションの精度と妥当性が高く,レコメンデーションシステムの分野に多大な貢献をしていることがわかった。 This study develops a robust movie recommendation system using various machine learning techniques, including Non-Negative Matrix Factorization (NMF), Truncated Singular Value Decomposition (SVD), and K-Means clustering. The primary objective is to enhance user experience by providing personalized movie recommendations. The research encompasses data preprocessing, model training, and evaluation, highlighting the efficacy of the employed methods. Results indicate that the proposed system achieves high accuracy and relevance in recommendations, making significant contributions to the field of recommendation systems.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# 進化的多タスク最適化における知識伝達の探索--複雑ネットワークの視点から Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective ( http://arxiv.org/abs/2407.08918v1 ) ライセンス: Link先を確認	Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu,	(参考訳) 進化的多タスク最適化(EMaTO)の分野は、繰り返し特性による最適化課題の解決を合理化し、計算資源を保存できることで、ますます認識されている。本稿では,個々のタスク評価の計算要求によって複雑化するタスクであるEMATO内で,効率的な知識伝達機構を構築することの課題に取り組む。本稿では,EMATO内のタスク間の知識伝達のダイナミクスを包括的に解析するために,複雑なネットワークを用いた新しいフレームワークを提案する。既存のEMATOアルゴリズムから知識伝達ネットワークを抽出し、精査することにより、ネットワーク修正が全体的なアルゴリズムの有効性に与える影響を評価する。その結果,これらのネットワークは多様であり,ネットワーク密度は異なるタスクセットに適応し,コミュニティ構造を指向したグラフ特性を示すことが示唆された。本研究は、複雑なネットワーク概念をEMATOに統合し、知識伝達プロセスを洗練し、将来的なドメインの進歩への道を開くことの可能性を実証するものである。 The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task evaluations. We introduce a novel framework that employs a complex network to comprehensively analyze the dynamics of knowledge transfer between tasks within EMaTO. By extracting and scrutinizing the knowledge transfer network from existing EMaTO algorithms, we evaluate the influence of network modifications on overall algorithmic efficacy. Our findings indicate that these networks are diverse, displaying community-structured directed graph characteristics, with their network density adapting to different task sets. This research underscores the viability of integrating complex network concepts into EMaTO to refine knowledge transfer processes, paving the way for future advancements in the domain.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# ナノ合成メカニズム説明のための大規模言語モデルの活用:確かな基盤か、単なる推測か? Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? ( http://arxiv.org/abs/2407.08922v1 ) ライセンス: Link先を確認	Yingming Pu, Liping Huang, Tao Lin, Hongyu Chen,	(参考訳) 人工知能(AI)の急速な発展に伴い、GPT-4のような大規模言語モデル(LLM)は科学界で大きな注目を集め、科学的発見の進展に大きな可能性を示している。この進歩は重要な疑問を提起する。これらのLLMは、現実世界の物理化学的原理と十分に整合しているのだろうか? 現在の評価戦略は、材料特性の予測や名前認識などの事実に基づく知識を主に重視しているが、論理的推論を必要とする基本的な物理化学的メカニズムの理解を評価できていないことが多い。このギャップを埋めるために,金ナノ粒子合成のメカニズムに焦点をあてた775問の多肢選択問題からなるベンチマークを開発した。既存の評価指標を振り返り、単純な正誤評価は当て推量を示しているにすぎないのではないかと問う。そこで本研究では,新しい評価指標である信頼度に基づくスコア(cスコア)を提案する。これは出力ロジットを調べ,正解に対する正確な確率を導出するものである。広範な実験の結果,金ナノ粒子合成の文脈では,LLMは当て推量に頼るのではなく、基礎となる物理化学的メカニズムを理解していることが示された。本研究は,LLMが本質的な科学的メカニズムを把握できる可能性を強調し,さまざまな科学領域でより信頼性が高く効果的なAIツールを開発するための土台を築くものである。 With the rapid development of artificial intelligence (AI), large language models (LLMs) such as GPT-4 have garnered significant attention in the scientific community, demonstrating great potential in advancing scientific discovery. This progress raises a critical question: are these LLMs well-aligned with real-world physicochemical principles? Current evaluation strategies largely emphasize fact-based knowledge, such as material property prediction or name recognition, but they often lack an understanding of fundamental physicochemical mechanisms that require logical reasoning. To bridge this gap, our study developed a benchmark consisting of 775 multiple-choice questions focusing on the mechanisms of gold nanoparticle synthesis. By reflecting on existing evaluation metrics, we question whether a direct true-or-false assessment merely suggests conjecture. Hence, we propose a novel evaluation metric, the confidence-based score (c-score), which probes the output logits to derive the precise probability for the correct answer. Based on extensive experiments, our results show that in the context of gold nanoparticle synthesis, LLMs understand the underlying physicochemical mechanisms rather than relying on conjecture. This study underscores the potential of LLMs to grasp intrinsic scientific mechanisms and sets the stage for developing more reliable and effective AI tools across various scientific domains.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# LLMによる難読化された実行ファイルの逆アセンブル Disassembling Obfuscated Executables with LLM ( http://arxiv.org/abs/2407.08924v1 ) ライセンス: Link先を確認	Huanyao Rong, Yue Duan, Hang Zhang, XiaoFeng Wang, Hongbo Chen, Shengchen Duan, Shen Wang,	(参考訳) 逆アセンブルは、特に逆アセンブルエラーを引き起こすように設計されたジャンクバイトを含む難読化された実行ファイルにとって、困難なタスクである。既存のソリューションはヒューリスティックや機械学習技術を利用するが、限られた成功しか収めていない。根本的に、このような難読化は、バイナリ実行ファイルのセマンティクスを深く理解しなければ打ち破ることができず、その理解は大規模言語モデル(LLM)の登場によって可能になった。本稿では,難読化された実行ファイルの解析における課題を克服するために,新しいLLM駆動型逆アセンブラであるDisasLLMを提案する。 DisasLLMは、アセンブリコードスニペット内の命令が正しくデコードされているかどうかを判定するLLMベースの分類器と、このモデルを利用して難読化された実行ファイルをエンドツーエンドで逆アセンブルする逆アセンブル戦略の2つのコンポーネントで構成される。高度に難読化された実行ファイル群でDisasLLMを評価し、他の最先端の逆アセンブル手法を大幅に上回ることを示した。 Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which are designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but achieve only limited success. Fundamentally, such obfuscation cannot be defeated without in-depth understanding of the binary executable's semantics, which is made possible by the emergence of large language models (LLMs). In this paper, we present DisasLLM, a novel LLM-driven disassembler to overcome the challenge in analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated executables, where it is shown to significantly outperform other state-of-the-art disassembly solutions.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# ハードウェアデバイス上での複数量子回路のリソース認識スケジューリング Resource-aware scheduling of multiple quantum circuits on a hardware device ( http://arxiv.org/abs/2407.08930v1 ) ライセンス: Link先を確認	Debasmita Bhoumik, Ritajit Majumdar, Susmita Sur-Kolay,	(参考訳) 最近の量子技術と量子誤り訂正符号は、望ましくないノイズを避けるために、量子回路を所定のハードウェアデバイスにマッピングする際、相互作用する量子ビットを最近傍(NN)構成で配置する必要性を強調している。合計n量子ビット(n < m)の回路を実行する際に、m量子ビットを持つ量子ハードウェア装置における量子ビットの無駄を最小化することも同様に重要である。 2つの回路間のクロストークを防ぐために、レイアウト間のバッファ距離が必要となる。さらに、全ての量子ビットおよび全ての2量子ビット相互作用が同じノイズレベルにあるわけではない。同じハードウェア上で複数の回路をスケジューリングすると、一部の回路が他の回路よりもノイズの多いレイアウトで実行される可能性がある。本稿では,各回路について事前に定義されたレイアウト品質を維持しつつ、ハードウェア上での並列実行に可能な限り多くの回路をスケジュールする最適化問題について検討する。相互作用する量子ビット間の最近傍配置を保ちながら、最大忠実度を確保する整数線形計画法による定式化を示す。我々の主張は、よく知られた各種の量子回路ベンチマークを用いた包括的な調査によって裏付けられている。このスケジューリング問題はNP困難であることが示されるため、貪欲なヒューリスティック手法も提案する。この手法は、27量子ビットおよび127量子ビットのハードウェアデバイスにおいて、量子ビットと時間の観点でそれぞれ2倍および3倍高い利用率を実現する。 Recent quantum technologies and quantum error-correcting codes emphasize the requirement for arranging interacting qubits in a nearest-neighbor (NN) configuration while mapping a quantum circuit onto a given hardware device, in order to avoid undesirable noise. It is equally important to minimize the wastage of qubits in a quantum hardware device with m qubits while running circuits of n qubits in total, with n < m. In order to prevent cross-talk between two circuits, a buffer distance between their layouts is needed. Furthermore, not all the qubits and all the two-qubit interactions are at the same noise level. Scheduling multiple circuits on the same hardware may create a possibility that some circuits are executed on a noisier layout than the others. In this paper, we consider an optimization problem which schedules as many circuits as possible for execution in parallel on the hardware, while maintaining a pre-defined layout quality for each. An integer linear programming formulation to ensure maximum fidelity while preserving the nearest neighbor arrangement among interacting qubits is presented. Our assertion is supported by comprehensive investigations involving various well-known quantum circuit benchmarks. As this scheduling problem is shown to be NP-hard, we also propose a greedy heuristic method which provides 2x and 3x better utilization for 27-qubit and 127-qubit hardware devices respectively in terms of qubits and time.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# ライダーを用いた開語彙検出のためのLLMを用いたグローバルローカル協調推論 Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection ( http://arxiv.org/abs/2407.08931v1 ) ライセンス: Link先を確認	Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu,	(参考訳) Open-Vocabulary Detection (OVD)は、事前に定義されたオブジェクトクラスなしで、あるシーンですべての興味深いオブジェクトを検出するタスクである。 2D RGB画像のOVDに対する大規模な取り組みは行われているが、3D OVDの探索はまだ限られている。直感的には、ライダーポイントクラウドはオブジェクトレベルとシーンレベルの両方の3D情報を提供し、信頼できる検出結果を生成する。しかし、従来のライダーベースのOVD手法は、シーンレベルの情報の本質を無視して、オブジェクトレベルの特徴の使用のみに焦点を当てていた。本稿では、オブジェクトレベルの検出結果を生成するローカルブランチと、シーンレベルのグローバル機能を得るグローバルブランチを含む、ライダーベースのOVDタスクのためのグローバルローカル協調スキーム(GLIS)を提案する。グローバルなローカル情報を用いて、連鎖推定にLarge Language Model(LLM)を適用し、それに応じて検出結果を洗練することができる。さらに,Reflectioned Pseudo Labels Generation (RPLG) を提案し,高品質な擬似ラベルを生成するとともに,背景認識オブジェクトローカライゼーション (BAOL) を用いて正確なオブジェクト提案を選択する。 ScanNetV2 と SUN RGB-D の大規模な実験により,本手法の優位性を実証した。コードはhttps://github.com/GradiusTwinbee/GLISで公開されている。 Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous lidar-based OVD methods only focus on the usage of object-level features, ignoring the essence of scene-level information. In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task, which contains a local branch to generate object-level detection result and a global branch to obtain scene-level global feature. With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference, and the detection result can be refined accordingly. 
We further propose Reflected Pseudo Labels Generation (RPLG) to generate high-quality pseudo labels for supervision and Background-Aware Object Localization (BAOL) to select precise object proposals. Extensive experiments on ScanNetV2 and SUN RGB-D demonstrate the superiority of our methods. Code is released at https://github.com/GradiusTwinbee/GLIS.	翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
# 動的環境における自律走行車両意思決定のための深部注意駆動型強化学習(DAD-RL) Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Vehicle Decision-Making in Dynamic Environment ( http://arxiv.org/abs/2407.08932v1 ) ライセンス: Link先を確認	Jayabrata Chowdhury, Venkataramanan Shivaraman, Sumit Dangi, Suresh Sundaram, P. B. Sujit,	(参考訳) 都市環境における自律走行車(AV)の意思決定は、周囲の車両との動的相互作用のために本質的に困難である。安全な計画のためには、AVはシーン内の様々な時空間相互作用の重み付けを理解する必要がある。現代の研究では、トラジェクトリ予測を中心に相互作用を符号化するためにコロッサルトランスフォーマーアーキテクチャを使用しており、計算複雑性が増大している。時空間的理解と性能を損なうことなくこの問題に対処するため,エゴのRL駆動意思決定プロセスにおいて,周囲車両の意義を動的に割り当て,組み込む,DADRL(Deep Attention Driven Reinforcement Learning)フレームワークを提案する。 AV中心の時空間アテンション符号化(STAE)機構を導入し,周囲の車両との動的相互作用を学習する。地図と経路のコンテキストを理解するために,コンテキストマップから特徴を抽出するためにコンテキストエンコーダを用いる。時空間表現と文脈符号化の組み合わせは、包括的な状態表現を提供する。得られたモデルは、Soft Actor Critic (SAC)アルゴリズムを用いて訓練される。我々は,交通信号のないSMARTS都市ベンチマークの枠組みを検証し,DADRLが最近の最先端手法よりも優れていることを示す。さらに、アブレーション研究は、優れた性能を達成する上で、文脈エンコーダと時空間アテンションエンコーダの重要性を強調している。 Autonomous Vehicle (AV) decision making in urban environments is inherently challenging due to the dynamic interactions with surrounding vehicles. For safe planning, AV must understand the weightage of various spatiotemporal interactions in a scene. Contemporary works use colossal transformer architectures to encode interactions mainly for trajectory prediction, resulting in increased computational complexity. To address this issue without compromising spatiotemporal understanding and performance, we propose the simple Deep Attention Driven Reinforcement Learning (DADRL) framework, which dynamically assigns and incorporates the significance of surrounding vehicles into the ego's RL driven decision making process. We introduce an AV centric spatiotemporal attention encoding (STAE) mechanism for learning the dynamic interactions with different surrounding vehicles. To understand map and route context, we employ a context encoder to extract features from context maps. 
The spatiotemporal representations combined with contextual encoding provide a comprehensive state representation. The resulting model is trained using the Soft Actor-Critic (SAC) algorithm. We evaluate the proposed framework on the SMARTS urban benchmarking scenarios without traffic signals to demonstrate that DADRL outperforms recent state-of-the-art methods. Furthermore, an ablation study underscores the importance of the context encoder and the spatiotemporal attention encoder in achieving superior performance.	翻訳日:2024-07-16 01:06:34 公開日:2024-07-12
# 高ボリュームメディア製造における機械学習 Machine Learning in High Volume Media Manufacturing ( http://arxiv.org/abs/2407.08933v1 ) ライセンス: Link先を確認	Siddarth Reddy Karuka, Abhinav Sunderrajan, Zheng Zheng, Yong Woon Tiean, Ganesh Nagappan, Allan Luk,	(参考訳) 大量生産環境でのエラーや失敗は、時間とお金の損失をもたらす大きな影響を与える可能性がある。このような失敗を早期に特定することは製造業にとって最優先事項であり、長年にわたり様々なルールベースのアルゴリズムが開発されてきた。しかし、これらの失敗をキャッチすることは時間がかかり、そのようなアルゴリズムは設計の変化にうまく適応できない。さらに重要なのは、大量生産環境で監視するユニットの数は、手動の監視や単純なプログラムには大きすぎることだ。ここでは、ルールベースの意思決定と機械学習モデルを組み合わせて、このような日々のバリエーションや長期的なデザインの変更を学習し、適応できるだけでなく、現在使われている多くの製造ユニットにも大規模に適用できる新しいプログラムを開発する。現在の最先端技術を用いて、我々はこのプログラムを大規模に展開し、製造環境からの需要の増加に対処する。 Errors or failures in a high-volume manufacturing environment can have significant impact that can result in both the loss of time and money. Identifying such failures early has been a top priority for manufacturing industries and various rule-based algorithms have been developed over the years. However, catching these failures is time consuming and such algorithms cannot adapt well to changes in designs, and sometimes variations in everyday behavior. More importantly, the number of units to monitor in a high-volume manufacturing environment is too big for manual monitoring or for a simple program. Here we develop a novel program that combines both rule-based decisions and machine learning models that can not only learn and adapt to such day-to-day variations or long-term design changes, but also can be applied at scale to the high number of manufacturing units in use today. Using the current state-of-the-art technologies, we then deploy this program at-scale to handle the needs of ever-increasing demand from the manufacturing environment.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# ニューラル埋め込みにおける構成的構造と相互作用分解 Compositional Structures in Neural Embedding and Interaction Decompositions ( http://arxiv.org/abs/2407.08934v1 ) ライセンス: Link先を確認	Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto,	(参考訳) 人工ニューラルネットワークのベクトル埋め込みにおける線形代数的構造と,これらのネットワークがモデル化する確率分布に対する条件付き独立性制約との間の基本的な対応について述べる。我々のフレームワークは、データ表現における構造パターンの出現に光を当てることを目的としている。この現象は広く認識されているものの、確かな形式的基礎はいまだ欠けていると言える。具体的には、「相互作用分解」の観点から構成的構造の特徴づけを導入し、モデルの表現の中にそのような構造が存在するための必要十分条件を確立する。 We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks and conditional independence constraints on the probability distributions modeled by these networks. Our framework aims to shed light on the emergence of structural patterns in data representations, a phenomenon widely acknowledged but arguably still lacking a solid formal grounding. Specifically, we introduce a characterization of compositional structures in terms of "interaction decompositions," and we establish necessary and sufficient conditions for the presence of such structures within the representations of a model.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# フェデレーショングラフ学習に対する分散型バックドア攻撃と認証付き防御 Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses ( http://arxiv.org/abs/2407.08935v1 ) ライセンス: Link先を確認	Yuxin Yang, Qiang Li, Jinyuan Jia, Yuan Hong, Binghui Wang,	(参考訳) フェデレーショングラフ学習(Federated Graph Learning, FedGL)は、FLを拡張してさまざまなソースからグラフデータを学習する、新たなフェデレーション学習(FL)フレームワークである。非グラフデータのFLは、バックドア攻撃に対して脆弱であることが示されている。この攻撃は、共有のバックドアトリガーをトレーニングデータに注入し、バックドアを仕込まれたFLモデルが、トリガーを含むテストデータを攻撃者の望むとおりに予測するようにするものである。しかし、バックドア攻撃に対するFedGLの脆弱性はほとんど研究されておらず、効果的な防御は存在しない。本稿では,このような重大な欠陥に対処することを目的とする。まず,FedGLに対する効果的でステルス性が高く永続的なバックドア攻撃を提案する。我々の攻撃では、サブグラフをトリガーとし、各グラフに対して効果的なトリガーの位置と形状を導出できる適応トリガージェネレータを設計する。攻撃の結果は、経験的防御では我々が生成したトリガーの検出・除去が困難であることを示している。これを軽減するため、任意の位置にある任意の形状のトリガーに対して、バックドアを仕込まれた任意のFedGLモデルに適用できる認証付き防御をさらに開発する。我々の防御は、テストグラフを複数のサブグラフに慎重に分割し、これらのサブグラフに対する多数決ベースのアンサンブル分類器を設計することである。次に、アンサンブル分類器に基づいて決定論的な認証付きロバスト性を導出し、そのタイトさを証明する。 6つのグラフデータセットに対して攻撃と防御を広範囲に評価した。攻撃の結果は、ほぼ全てのデータセットで90%を超えるバックドア精度が得られることを示している。防御の結果は、サイズ20の任意のトリガーに対するクリーンなテストグラフの認証精度が、場合によっては攻撃を受けない場合の通常精度に近く、その他の場合には中程度の差があることを示している。さらに、我々の攻撃によって生成されたバックドアテストグラフに対する認証付きバックドア精度は常に0であり、防御が攻撃を完全に軽減できることを意味している。ソースコードはhttps://github.com/Yuxin104/Opt-GDBAで入手できる。 Federated graph learning (FedGL) is an emerging federated learning (FL) framework that extends FL to learn graph data from diverse sources. FL for non-graph data has been shown to be vulnerable to backdoor attacks, which inject a shared backdoor trigger into the training data such that the trained backdoored FL model can predict the testing data containing the trigger as the attacker desires. However, FedGL against backdoor attacks is largely unexplored, and no effective defense exists. In this paper, we aim to address this significant deficiency. First, we propose an effective, stealthy, and persistent backdoor attack on FedGL. Our attack uses a subgraph as the trigger and designs an adaptive trigger generator that can derive the effective trigger location and shape for each graph. Our attack results show that empirical defenses struggle to detect or remove our generated triggers. To mitigate it, we further develop a certified defense for any backdoored FedGL model against the trigger with any shape at any location. Our defense involves carefully dividing a testing graph into multiple subgraphs and designing a majority vote-based ensemble classifier on these subgraphs. We then derive the deterministic certified robustness based on the ensemble classifier and prove its tightness. We extensively evaluate our attack and defense on six graph datasets. Our attack results show our attack can obtain > 90% backdoor accuracy in almost all datasets. Our defense results show, in certain cases, the certified accuracy for clean testing graphs against an arbitrary trigger with size 20 can be close to the normal accuracy under no attack, while there is a moderate gap in other cases. Moreover, the certified backdoor accuracy is always 0 for backdoored testing graphs generated by our attack, implying our defense can fully mitigate the attack. Source code is available at: https://github.com/Yuxin104/Opt-GDBA.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 自己進化型GPT : 生涯にわたる自律的経験的学習者 Self-Evolving GPT: A Lifelong Autonomous Experiential Learner ( http://arxiv.org/abs/2407.08937v1 ) ライセンス: Link先を確認	Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin,	(参考訳) 大規模言語モデル(LLM)の性能向上のために,研究者らは,プロンプトによるテキストタスク解決エクスペリエンスを備えたLLMの提供を検討した。しかし、各タスクに対してこのような経験を習得し、適用するための手作業に頼っているため、LLMの需要の増加や様々なユーザ質問に対して実現不可能である。この問題に対処するために、LLMをベースとした生涯にわたる自律的経験学習フレームワークを設計し、LLMが人間の学習能力を模倣し、経験を活用できるかどうかを考察する。このフレームワークは、経験の伝達と帰納を通じて自律的に経験を学習・蓄積し、入力質問の種類を分類して、蓄積された経験のうちどれを用いるかを選択する。広く使われている6つのNLPデータセットによる実験結果から,本フレームワークは各中間段階において確実に動作し,GPT-3.5およびGPT-4の性能を効果的に向上することが示された。これは、人間の経験的学習と応用能力を模倣するためにLLMを使うことの可能性を検証する。さらに、各ステップでフレームワークの振る舞いを詳細に分析します。 To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential learning framework based on LLMs to explore whether LLMs can imitate human ability for learning and utilizing experience. It autonomously learns and accumulates experience through experience transfer and induction, categorizing the types of input questions to select which accumulated experience to employ for them. Experimental results on six widely used NLP datasets show that our framework performs reliably in each intermediate step and effectively improves the performance of GPT-3.5 and GPT-4. This validates the feasibility of using LLMs to mimic human experiential learning and application capabilities. Additionally, we provide a detailed analysis of the behavior of our framework at each step.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# LightenDiffusion: 潜在Retinex拡散モデルによる教師なし低照度画像強調 LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models ( http://arxiv.org/abs/2407.08939v1 ) ライセンス: Link先を確認	Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu,	(参考訳) 本稿では,物理的に説明可能なRetinex理論を拡散モデルに組み込んだ，低照度画像強調のための拡散ベースの教師なしフレームワークLightenDiffusionを提案する。具体的には,従来手法のように画像空間ではなく潜在空間内でRetinex分解を行うコンテントトランスファー分解ネットワークを提案し,未ペアの低照度画像と通常光画像の符号化特徴をコンテントリッチな反射率マップとコンテントフリーな照明マップに分解する。その後、低照度画像の反射率マップと通常光画像の照明マップとを、低照度特徴の誘導のもとで教師なし復元のための拡散モデルに入力し、さらに自己拘束的整合性損失を提案して、復元結果に対する通常光コンテンツの干渉を排除し、全体的な視覚的品質を向上させる。公開されている実世界のベンチマークに関する大規模な実験によると、提案されたLightenDiffusionは最先端の教師なし手法よりも優れており、様々な場面でより一般化可能でありながら、教師付き手法に匹敵する。私たちのコードはhttps://github.com/JianghaiSCU/LightenDiffusionで公開されています。 In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded features of unpaired low-light and normal-light images to be decomposed into content-rich reflectance maps and content-free illumination maps. Subsequently, the reflectance map of the low-light image and the illumination map of the normal-light image are taken as input to the diffusion model for unsupervised restoration with the guidance of the low-light feature, where a self-constrained consistency loss is further proposed to eliminate the interference of normal-light content on the restored results to improve overall visual quality. Extensive experiments on publicly available real-world benchmarks show that the proposed LightenDiffusion outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. 
Our code is available at https://github.com/JianghaiSCU/LightenDiffusion.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# バイオメディカル仮説生成系としての大規模言語モデル:包括的評価 Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation ( http://arxiv.org/abs/2407.08940v1 ) ライセンス: Link先を確認	Biqing Qi, Kaiyan Zhang, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, Bowen Zhou,	(参考訳) 生物医学的知識の急速な成長は、洞察を効率的に抽出し、新しい仮説を創出する能力を大きく上回っている。大規模言語モデル(LLM)は、知識の相互作用を革新し、生体医学的な発見を加速するための有望なツールとして登場した。本稿では, LLMをバイオメディカル仮説生成器として包括的に評価する。バイオメディカル文献から背景と仮説のペアのデータセットを構築し、データ汚染を軽減するために、公開日に基づくトレーニング、観察、不明なテストセットに慎重に分割する。このデータセットを用いて、ゼロショット、少数ショット、微調整設定で上位層の指示されたモデルの仮説生成能力を評価する。科学的発見の重要な側面である不確実性の探索を強化するため,評価枠組みにツール利用とマルチエージェントインタラクションを取り入れた。さらに, LLMに基づく評価と人的評価の両面から, 仮説の質を評価するために, 広範な文献レビューに基礎を置く4つの新しい指標を提案する。我々の実験は2つの重要な発見をもたらす。 1)LLMは、トレーニング中に見えない文献でテストしても、新規で検証された仮説を生成できる。 2)マルチエージェントインタラクションやツール利用による不確実性の向上により,多様な候補生成が容易になり,ゼロショット仮説生成性能が向上する。しかし、数発の学習とツール使用による追加知識の統合は、必ずしもパフォーマンス向上につながるとは限りませんし、組み込まれた外部知識のタイプや範囲を慎重に検討する必要性も浮き彫りにしています。これらの知見は、LLMが生物医学的仮説生成の強力な補助となり、この分野のさらなる研究を導く貴重な洞察を与える可能性を示している。 The rapid growth of biomedical knowledge has outpaced our ability to efficiently extract insights and generate novel hypotheses. Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction and potentially accelerate biomedical discovery. In this paper, we present a comprehensive evaluation of LLMs as biomedical hypothesis generators. We construct a dataset of background-hypothesis pairs from biomedical literature, carefully partitioned into training, seen, and unseen test sets based on publication date to mitigate data contamination. Using this dataset, we assess the hypothesis generation capabilities of top-tier instructed models in zero-shot, few-shot, and fine-tuning settings. To enhance the exploration of uncertainty, a crucial aspect of scientific discovery, we incorporate tool use and multi-agent interactions in our evaluation framework. 
Furthermore, we propose four novel metrics grounded in extensive literature review to evaluate the quality of generated hypotheses, considering both LLM-based and human assessments. Our experiments yield two key findings: 1) LLMs can generate novel and validated hypotheses, even when tested on literature unseen during training, and 2) Increasing uncertainty through multi-agent interactions and tool use can facilitate diverse candidate generation and improve zero-shot hypothesis generation performance. However, we also observe that the integration of additional knowledge through few-shot learning and tool use may not always lead to performance gains, highlighting the need for careful consideration of the type and scope of external knowledge incorporated. These findings underscore the potential of LLMs as powerful aids in biomedical hypothesis generation and provide valuable insights to guide further research in this area.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# マルチモーダル大言語モデルに基づくニューラルネットワーク行列分解レコメンダシステムモデル A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model ( http://arxiv.org/abs/2407.08942v1 ) ライセンス: Link先を確認	Ao Xiang, Bingjie Huang, Xinyu Guo, Haowei Yang, Tianyao Zheng,	(参考訳) 推薦システムは情報検索問題に対する重要な解決策となっている。本稿では,BoNMFと呼ばれる多モーダル大規模言語モデルに基づくニューラルネットワーク行列分解推薦システムモデルを提案する。このモデルは、自然言語処理におけるBoBERTaの強力な能力、コンピュータビジョンにおけるViT、およびニューラルマトリックス分解技術を組み合わせたものである。ユーザとアイテムの潜在的な特性をキャプチャし、ユーザとアイテムIDからなる低次元行列と対話した後、ニューラルネットワークは結果を出力する。コールドスタートおよびアブレーション実験の結果,BoNMFモデルは大規模な公開データセットに対して優れた性能を示し,レコメンデーションの精度を大幅に向上させることがわかった。 Recommendation systems have become an important solution to information search problems. This article proposes a neural matrix factorization recommendation system model based on the multimodal large language model called BoNMF. This model combines BoBERTa's powerful capabilities in natural language processing, ViT in computer vision, and neural matrix decomposition technology. By capturing the potential characteristics of users and items, and after interacting with a low-dimensional matrix composed of user and item IDs, the neural network outputs the results. Cold start and ablation experimental results show that the BoNMF model exhibits excellent performance on large public data sets and significantly improves the accuracy of recommendations.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 大規模ローカライゼーションのための展開可能な量子アクセスポイント選択アルゴリズム A Deployable Quantum Access Points Selection Algorithm for Large-Scale Localization ( http://arxiv.org/abs/2407.08943v1 ) ライセンス: Link先を確認	Ahmed Shokry, Moustafa Youssef,	(参考訳) 効果的なアクセスポイントの選択は、ローカライズシステムにおいて重要なステップである。これは、ローカライズ精度と計算効率の両方に直接的な影響を与える。古典的なAP選択アルゴリズムは通常計算コストが高く、大規模なグローバルスケールでのローカライゼーションシステムの展開を妨げる。本稿では,大規模ローカライゼーションシステムのための量子APs選択アルゴリズムを提案する。提案アルゴリズムは、量子アニールを利用して冗長でノイズの多いAPを除去する。本稿では、量子アニールに適した2次非拘束バイナリ最適化(QUBO)問題としてAPs選択問題を定式化する方法と、完全APsセットと同じ局所化系精度を維持する最小数のAPを選択する方法について説明する。これに基づいて、最適なAP数を選択するための対数複雑度アルゴリズムを提案する。我々は,実D-Wave Systems量子マシンに量子アルゴリズムを実装し,フロアローカライズ問題に対する実テスト環境での性能評価を行う。その結果, 利用可能なAPの14%未満を環境下で選択することにより, 量子アルゴリズムは, 古典的なAP選択によるデータセットの削減よりも, 全体のAPを利用するのと同じフロアローカライズ精度と優れた精度を達成できることが判明した。さらに、提案した量子アルゴリズムは、対応する古典的なAPs選択アルゴリズムよりも1桁以上のスピードアップを実現し、大規模ローカライゼーションシステムにおける提案した量子アルゴリズムの効率性を強調した。 Effective access points (APs) selection is a crucial step in localization systems. It directly affects both localization accuracy and computational efficiency. Classical APs selection algorithms are usually computationally expensive, hindering the deployment of localization systems in a large worldwide scale. In this paper, we introduce a quantum APs selection algorithm for large-scale localization systems. The proposed algorithm leverages quantum annealing to eliminate redundant and noisy APs. We explain how to formulate the APs selection problem as a quadratic unconstrained binary optimization (QUBO) problem, suitable for quantum annealing, and how to select the minimum number of APs that maintain the same overall localization system accuracy as the complete APs set. Based on this, we further propose a logarithmic-complexity algorithm to select the optimal number of APs. We implement our quantum algorithm on a real D-Wave Systems quantum machine and assess its performance in a real test environment for a floor localization problem. 
Our findings reveal that by selecting fewer than 14% of the available APs in the environment, our quantum algorithm achieves the same floor localization accuracy as utilizing the entire set of APs and a superior accuracy over utilizing the reduced dataset by classical APs selection counterparts. Moreover, the proposed quantum algorithm achieves more than an order of magnitude speedup over the corresponding classical APs selection algorithms, emphasizing the efficiency of the proposed quantum algorithm for large-scale localization systems.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# Bora:バイオメディカルジェネリストのビデオ生成モデル Bora: Biomedical Generalist Video Generation Model ( http://arxiv.org/abs/2407.08944v1 ) ライセンス: Link先を確認	Weixiang Sun, Xiaocao You, Ruizhe Zheng, Zhengqing Yuan, Xiang Li, Lifang He, Quanzheng Li, Lichao Sun,	(参考訳) 生成モデルは、医療教育の革新、ロボット支援手術、医療AI開発のためのデータ拡張を約束する。拡散モデルはテキストプロンプトからリアルな画像を生成できるようになったが、最近の進歩は、多種多様な高品質のビデオを作成する能力を示している。しかしながら、これらのモデルは、医療処置の正確な表現と詳細な解剖学的構造の生成に苦慮することが多い。本稿では,テキスト誘導バイオメディカルビデオ生成のための最初の時空間拡散確率モデルであるBoraを紹介する。 BoraはTransformerアーキテクチャを活用し、汎用ビデオ生成タスクで事前訓練されている。様々な医療分野のテキストビデオデータを含む,新たに確立された医用ビデオコーパスを用いて,モデルアライメントとインストラクションチューニングによって微調整を行う。私たちの知る限りでは、このような包括的な注釈付きバイオメディカルビデオデータセットを確立するための最初の試みである。 Boraは、4つの異なるバイオメディカル領域にまたがる高品質なビデオデータを生成し、医療専門家の基準に準拠し、一貫性と多様性を示す。このジェネラリストビデオ生成モデルは、特にリソース限定の設定において、医療相談や意思決定の強化に重要な可能性を秘めている。さらに、ボラは没入型医療訓練と手続き計画の道を開くことができる。内視鏡, 超音波, MRI, 細胞追跡などの異なる医用モダリティに関する広範囲な実験により, 生医学的指示を理解する上での本モデルの有効性と, 最先端の世代モデルと比較して, 被験者間での優れた性能が検証された。 Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for medical AI development. Diffusion models can now generate realistic images from text prompts, while recent advancements have demonstrated their ability to create diverse, high-quality videos. However, these models often struggle with generating accurate representations of medical procedures and detailed anatomical structures. This paper introduces Bora, the first spatio-temporal diffusion probabilistic model designed for text-guided biomedical video generation. Bora leverages Transformer architecture and is pre-trained on general-purpose video generation tasks. It is fine-tuned through model alignment and instruction tuning using a newly established medical video corpus, which includes paired text-video data from various biomedical fields. To the best of our knowledge, this is the first attempt to establish such a comprehensive annotated biomedical video dataset. 
Bora is capable of generating high-quality video data across four distinct biomedical domains, adhering to medical expert standards and demonstrating consistency and diversity. This generalist video generative model holds significant potential for enhancing medical consultation and decision-making, particularly in resource-limited settings. Additionally, Bora could pave the way for immersive medical training and procedure planning. Extensive experiments on distinct medical modalities such as endoscopy, ultrasound, MRI, and cell tracking validate the effectiveness of our model in understanding biomedical instructions and its superior performance across subjects compared to state-of-the-art generation models.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 拡散モデルは秘密裏にノイズ分類器であり、コントラストトレーニングの恩恵を受ける Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training ( http://arxiv.org/abs/2407.08946v1 ) ライセンス: Link先を確認	Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg,	(参考訳) 拡散モデルはデータのノイズ除去を学習し、訓練されたデノイザを使用してデータ分布から新しいサンプルを生成する。本稿では, 拡散サンプリングプロセスを再検討し, 試料品質劣化の根本原因を同定する。このデノイザは, トレーニング分布外(OOD)の遠く離れた領域では推定が不十分であり, サンプリングプロセスは必然的にこれらのOOD領域で評価を行う。これは全てのサンプリング手法において問題となりうるが、特に並列サンプリングに移行する際には、ダイナミクスのサンプル軌跡全体を並列に初期化および更新する必要があるため、多くのOOD評価が生じる。この問題に対処するために,サンプルに付加されるノイズのレベルを区別する新たな自己教師型学習目標を導入し，OODでのノイズ除去性能を改善する。提案手法は, 拡散モデルがノイズ量の異なる分布を識別する対数尤度比を暗黙的に定義するという観察に基づいており, この表現は, 標準学習分布の外でのデノイザ性能に依存する。多様な実験により，提案したコントラスト拡散訓練は逐次的および並列的な設定の両方に有効であり, 並列サンプラーの性能と速度を著しく向上することを示す。 Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. 
We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings, and it improves the performance and speed of parallel samplers significantly.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 最小限の人的労力でスプリアス相関を緩和する概念ベースモデルの構築 Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort ( http://arxiv.org/abs/2407.08947v1 ) ライセンス: Link先を確認	Jeeyung Kim, Ze Wang, Qiang Qiu,	(参考訳) モデル解釈可能性の強化は、モデルがどのように予測を引き出すかを明らかにすることで、スプリアス相関(疑似相関)に対処することができる。概念ボトルネックモデル(Concept Bottleneck Models, CBM)は、データアノテーションにおける人間の努力のコストが高いにもかかわらず、人間の理解可能な概念を通じてモデル行動の開示とガイドを行う、原則化された方法を提供する。本稿では,複数の基礎モデルの相乗効果を利用して,人的労力をほとんど伴わずにCBMを構築する。我々は、事前学習モデル上に構築されたCBMの望ましくないバイアスを発見し、これらのバイアスの影響を受けずに事前学習モデルを利用するように設計された新しいフレームワークを提案することで，スプリアス相関に対する脆弱性を低減する。具体的には、データセットの潜在的なスプリアス相関を評価し、画像の概念を注釈付けし、ロバスト性を改善するためにアノテーションを洗練する、基礎モデルを採用したシームレスなパイプラインを提供する。提案手法を複数のデータセット上で評価し,その解釈可能性を維持しつつ,スプリアス相関へのモデル依存を低減する効果を示した。 Enhancing model interpretability can address spurious correlations by revealing how models draw their predictions. Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts, albeit at a high cost of human efforts in data annotation. In this paper, we leverage a synergy of multiple foundation models to construct CBMs with nearly no human effort. We discover undesirable biases in CBMs built on pre-trained models and propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations. Specifically, our method offers a seamless pipeline that adopts foundation models for assessing potential spurious correlations in datasets, annotating concepts for images, and refining the annotations for improved robustness. We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 脳画像解析のためのDeep Learning Frameworkをコード化した対称性認識 Symmetry Awareness Encoded Deep Learning Framework for Brain Imaging Analysis ( http://arxiv.org/abs/2407.08948v1 ) ライセンス: Link先を確認	Yang Ma, Dongang Wang, Peilin Liu, Lynette Masters, Michael Barnett, Weidong Cai, Chenyu Wang,	(参考訳) 構造異常から機能障害まで、神経学的条件の不均一性は、医用画像解析タスクにおいて重要な課題である。さらに、注釈付きデータセットの可用性の制限により、ロバスト分析モデルの開発が制限される。この背景から,本研究では,ヒト脳の解剖学的対称的特徴を活用する新たなアプローチを導入し,その後の脳疾患の検出とセグメンテーション分析を強化する。左右半球の対称的特徴を符号化する新しいシンメトリー・アウェア・クロス・アテンション(SACA)モジュールが提案され、様々なMRIおよびCTで健康な脳画像と疾患の脳画像からなる広範囲な脳画像データセット上でネットワーク全体の事前学習を導くシンメトリー・アウェア・ヘッド(SAH)として対称的特徴を検出するプロキシタスクが提案されている。脳疾患の分類とセグメンテーションの両方を含む下流タスクの綿密な実験を通じて、我々のモデルは最先端の方法論よりも優れた性能を示し、特に対称性学習の重要性を強調している。本研究は, 事前トレーニングに対称性認識を取り入れることの有効性を提唱し, 医用画像解析のための新しいベンチマークを作成し, 正確かつ効率的な診断プロセスに向けて大きな前進を約束する。コードはhttps://github.com/bitMyron/sa-swinから入手できる。 The heterogeneity of neurological conditions, ranging from structural anomalies to functional impairments, presents a significant challenge in medical imaging analysis tasks. Moreover, the limited availability of well-annotated datasets constrains the development of robust analysis models. Against this backdrop, this study introduces a novel approach leveraging the inherent anatomical symmetrical features of the human brain to enhance the subsequent detection and segmentation analysis for brain diseases. A novel Symmetry-Aware Cross-Attention (SACA) module is proposed to encode symmetrical features of left and right hemispheres, and a proxy task to detect symmetrical features as the Symmetry-Aware Head (SAH) is proposed, which guides the pretraining of the whole network on a vast 3D brain imaging dataset comprising both healthy and diseased brain images across various MRI and CT. 
Through meticulous experimentation on downstream tasks, including both classification and segmentation for brain diseases, our model demonstrates superior performance over state-of-the-art methodologies, particularly highlighting the significance of symmetry-aware learning. Our findings advocate for the effectiveness of incorporating symmetry awareness into pretraining and set a new benchmark for medical imaging analysis, promising significant strides toward accurate and efficient diagnostic processes. Code is available at https://github.com/bitMyron/sa-swin.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# ワンショット姿勢駆動型顔アニメーションプラットフォーム One-Shot Pose-Driving Face Animation Platform ( http://arxiv.org/abs/2407.08949v1 ) ライセンス: Link先を確認	He Feng, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su,	(参考訳) 顔アニメーションの目的は、ビデオまたは音声入力から導かれる駆動条件を利用して、単一の参照顔から動的で表現力のあるトーキングヘッド動画を生成することである。現在のアプローチでは、特定のアイデンティティを微調整する必要があることが多く、Wav2Poseモジュールの有効性が限られているため、表現力のあるビデオの生成に失敗することが多い。ワンショットかつより連続的なトーキングヘッド動画の生成を容易にするため,Face LocatorとMotion Frame機構を統合し,既存のImage2Videoモデルを洗練する。その後、大規模な人間の顔ビデオデータセットを用いてモデルを最適化し、高品質で表現力のあるトーキングヘッド動画を作成する能力を大幅に向上させた。さらに,Gradioフレームワークを用いたデモプラットフォームを開発し,プロセスを合理化して，ユーザがカスタマイズしたトーキングヘッド動画を素早く作成できるようにした。 The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, utilizing driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos due to the limited effectiveness of Wav2Pose modules. To facilitate the generation of one-shot and more consecutive talking head videos, we refine an existing Image2Video model by integrating a Face Locator and Motion Frame mechanism. We subsequently optimize the model using extensive human face video datasets, significantly enhancing its ability to produce high-quality and expressive talking head videos. Additionally, we develop a demo platform using the Gradio framework, which streamlines the process, enabling users to quickly create customized talking head videos.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 画像復元のための周波数選択によるよりリッチで高精度な情報探索 Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration ( http://arxiv.org/abs/2407.08950v1 ) ライセンス: Link先を確認	Hu Gao, Depeng Dang,	(参考訳) 画像復元は、破損した画像から高品質な画像を復元することを目的としている。既存の多くの手法は、主に空間領域に注目し、周波数変動の理解を無視し、スキップ接続における暗騒音の影響を無視している。本稿では,空間および周波数領域の知識をシームレスに統合し,よりリッチで正確な情報を選択的に回復するマルチスケール周波数選択ネットワーク(MSFSNet)を提案する。具体的には、まずまず空間的特徴を捉え、周波数知識を統合するために異なるスケールで動的フィルタ選択モジュール(DFS)に入力する。 DFSは学習可能なフィルタを用いて高周波数・低周波情報を生成し、周波数クロスアテンション機構(FCAM)を用いて回復する最も多くの情報を決定する。マルチスケールで正確なハイブリッド特徴集合を学習するために,コンテキスト特徴を利用したスキップ特徴融合ブロック(SFF)を開発し,どの情報をスキップ接続で伝播すべきかを識別する。 DFSとSFFがジェネリックプラグインモジュールであることは注目に値する。様々な画像復元タスクに対する大規模な実験により、MSFSNetは最先端のアルゴリズムに匹敵する性能を達成できることを示した。 Image restoration aims to recover high-quality images from their corrupted counterparts. Many existing methods primarily focus on the spatial domain, neglecting the understanding of frequency variations and ignoring the impact of implicit noise in skip connections. In this paper, we introduce a multi-scale frequency selection network (MSFSNet) that seamlessly integrates spatial and frequency domain knowledge, selectively recovering richer and more accurate information. Specifically, we initially capture spatial features and input them into dynamic filter selection modules (DFS) at different scales to integrate frequency knowledge. DFS utilizes learnable filters to generate high and low-frequency information and employs a frequency cross-attention mechanism (FCAM) to determine the most information to recover. To learn a multi-scale and accurate set of hybrid features, we develop a skip feature fusion block (SFF) that leverages contextual features to discriminatively determine which information should be propagated in skip-connections. It is worth noting that our DFS and SFF are generic plug-in modules that can be directly employed in existing networks without any adjustments, leading to performance improvements. 
Extensive experiments across various image restoration tasks demonstrate that our MSFSNet achieves performance that is either superior or comparable to state-of-the-art algorithms.	翻訳日:2024-07-16 01:06:33 公開日:2024-07-12
# 検知・調査・判断:Few-shot Fakeニュース検出のための新しいLCMベースのフレームワーク Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection ( http://arxiv.org/abs/2407.08952v1 ) ライセンス: Link先を確認	Ye Liu, Jiajun Zhu, Kai Zhang, Haoyu Tang, Yanghai Zhang, Xukai Liu, Qi Liu, Enhong Chen,	(参考訳) Few-Shot Fake News Detection (FS-FND) は、極めて低リソースのシナリオにおいて、非正確なニュースを実際のニュースと区別することを目的としている。ソーシャルメディア上でのフェイクニュースの拡散や有害な影響により、このタスクは注目を集めている。大きな言語モデル(LLM)は、豊富な事前知識と優れたコンテキスト内学習能力の助けを借りて、競争性能を実証している。しかし、既存の手法では、LLMの可能性を著しく損なう「理解のあいまいさ」や「インフォメーション・スカシティ」といった重大な制限に直面している。これらの欠点に対処するため、内部および外部からLLMを強化するために設計されたDual-perspective Augmented Fake News Detection (DAFND)モデルを提案する。具体的には、DAFNDはまず、検出モジュールを通じて各ニュース記事のキーワードを識別する。その後、DAFNDは、現在のニュースに関する情報の内部および外部の貴重な情報を検索するための調査モジュールを創造的に設計し、続いて別の審査モジュールがそれぞれの2つの予測結果を導出する。最後に、決定モジュールはこれらの2つの予測をさらに統合し、最終的な結果を引き出す。 2つの公開データセットに対する大規模な実験により,提案手法の有効性,特に低リソース環境での有効性が示された。 Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learning abilities. However, existing methods face significant limitations, such as the Understanding Ambiguity and Information Scarcity, which significantly undermine the potential of LLMs. To address these shortcomings, we propose a Dual-perspective Augmented Fake News Detection (DAFND) model, designed to enhance LLMs from both inside and outside perspectives. Specifically, DAFND first identifies the keywords of each news article through a Detection Module. Subsequently, DAFND creatively designs an Investigation Module to retrieve inside and outside valuable information concerning to the current news, followed by another Judge Module to derive its respective two prediction results. 
Finally, a Determination Module further integrates these two predictions and derives the final result. Extensive experiments on two publicly available datasets show the efficacy of our proposed method, particularly in low-resource settings.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 資産価格における帰属方法:彼らはリスクを考慮しているか? Attribution Methods in Asset Pricing: Do They Account for Risk? ( http://arxiv.org/abs/2407.08953v1 ) ライセンス: Link先を確認	Dangxing Chen, Yuan Gao,	(参考訳) 過去数十年間、機械学習モデルは極めて成功した。公理的帰属法の結果、特徴的貢献はより明確かつ厳密に説明されている。しかし、公理とともにドメイン知識を調べる研究はほとんどない。本研究では,リスク管理と密接に関連する金融の資産価格について検討する。したがって、機械学習モデルを適用する際には、帰属法が根底にあるリスクを正確に反映することを保証する必要がある。本研究では、資産価格ドメインの知識から導かれるいくつかの公理を提示し、研究する。シャプリー値と積分勾配は、ほとんどの公理を保存するが、どちらも全ての公理を満たすことはできない。分析的および実証的な例を用いて、帰属法がいかにリスクを反映し、いつ使用すべきでないかを実証する。 Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Consequently, when applying machine learning models, we must ensure that the attribution methods reflect the underlying risks accurately. In this work, we present and study several axioms derived from asset pricing domain knowledge. It is shown that while Shapley value and Integrated Gradients preserve most axioms, neither can satisfy all axioms. Using extensive analytical and empirical examples, we demonstrate how attribution methods can reflect risks and when they should not be used.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# PriRoAgg: フェデレートラーニングのための最小限のプライバシリークでロバストモデル集約を実現する PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning ( http://arxiv.org/abs/2407.08954v1 ) ライセンス: Link先を確認	Sizai Hou, Songze Li, Tayyebeh Jahani-Nezhad, Giuseppe Caire,	(参考訳) フェデレートラーニング(FL)は、ユーザプライバシを保ちながら、大規模な分散ユーザデータを活用できる可能性から、最近大きな勢いを増している。しかし、FLの典型的なパラダイムは、プライバシとロバスト性の両方の課題に直面している。送信されたモデル更新は、機密性の高いユーザ情報を漏洩させる可能性があるし、ローカルトレーニングプロセスの集中的な制御の欠如は、モデル更新に対する悪意のある操作の影響を受けやすいグローバルモデルを残している。ワンサーバFL設定の下で両方の問題に対処しようとする現在のソリューションは、以下の側面で不足している。 1) 高度な攻撃に対して不十分な簡易な妥当性確認(例えば、個別更新のノルムのチェック)向けに設計されている。 2) より複雑なロバストな集約アルゴリズムに対する部分的なプライバシリーク(例えば、multi-Krumではモデル更新間の距離がリークされる)。本研究では,より高度なロバスト集約を実現するために開示が必要となる最小限のユーザ情報を，ユーザ更新の集計統計の形で特徴づける，新たなセキュリティ概念「集約プライバシ」を形式化する。我々は、Lagrange符号化計算と分散ゼロ知識証明を利用した汎用フレームワークPriRoAggを開発し、集約プライバシを満たしつつ、幅広いロバストな集約アルゴリズムを実行する。 PriRoAggの具体的なインスタンス化として、最先端のロバストアルゴリズムに基づく2つのセキュアでロバストなプロトコルを構築し、セキュリティと複雑性に関する完全な理論的分析を行う。これらのプロトコルに対して大規模な実験を行い、様々なモデルの整合性攻撃に対する頑健さと、ベースラインに対する効率上の優位性を実証した。 Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data while preserving user privacy. However, the typical paradigm of FL faces challenges of both privacy and robustness: the transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. Current solutions attempting to address both problems under the one-server FL setting fall short in the following aspects: 1) designed for simple validity checks that are insufficient against advanced attacks (e.g., checking norm of individual update); and 2) partial privacy leakage for more complicated robust aggregation algorithms (e.g., distances between model updates are leaked for multi-Krum). 
In this work, we formalize a novel security notion of aggregated privacy that characterizes the minimum amount of user information, in the form of some aggregated statistics of users' updates, that is necessary to be revealed to accomplish more advanced robust aggregation. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy. As concrete instantiations of PriRoAgg, we construct two secure and robust protocols based on state-of-the-art robust algorithms, for which we provide full theoretical analyses on security and complexity. Extensive experiments are conducted for these protocols, demonstrating their robustness against various model integrity attacks, and their efficiency advantages over baselines.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# DeCE: バックドア攻撃の防御のために設計された欺瞞的クロスエントロピー損失 DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks ( http://arxiv.org/abs/2407.08956v1 ) ライセンス: Link先を確認	Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen,	(参考訳) コード言語モデル(CLM)、特にディープラーニングを活用するものは、コードインテリジェンス領域において大きな成功を収めています。しかし、セキュリティの問題、特にバックドア攻撃は、このプロセスでは見過ごされがちである。これまでの研究では、CLMのバックドア攻撃の設計に焦点が当てられていたが、効果的な防御は適切に対処されていない。特に、自然言語処理からの既存の防御手法は、CLMに直接適用しても効果が十分ではなく、汎用性に欠けており、いくつかのモデルやシナリオではうまく機能するが、他のモデルではうまく機能しないため、バックドア攻撃を継続的に軽減するには不十分である。このギャップを埋めるために,我々はまず,CLMの訓練中に発生する「早期学習」現象を確認した。この現象は、モデルが最初はトレーニングデータの主な特徴に焦点を当てていたが、時間が経つにつれてバックドアのトリガーに敏感になり、バックドアの攻撃に対する過度な適合と感受性をもたらす可能性があることを示唆している。次に, バックドアへの過度な適合は, クロスエントロピー損失関数の使用による結果であり, クロスエントロピーの非有界性は, 有毒データの特徴にますます集中させる。そこで本研究では,欺瞞的分布をブレンドし，ラベルスムージングを適用して勾配を有界に制限することで,モデルがバックドアトリガに過度に適合することを防止し,バックドア攻撃に対するCLMの安全性を高める,汎用的で効果的な損失関数DeCE(Deceptive Cross-Entropy)を提案する。本手法の有効性を検証するために,コード合成タスクを実験シナリオとして選択する。各種コード合成データセット,モデル,ポイズニング比率に対する実験により,CLMの安全性を高める上でのDeCEの適用性と有効性を示した。 Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. The previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality, working well in some models and scenarios but failing in others, thus fall short in consistently mitigating backdoor attacks. To bridge this gap, we first confirm the phenomenon of "early learning" as a general occurrence during the training of CLMs. This phenomenon refers to that a model initially focuses on the main features of training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. 
We then show through analysis that overfitting to backdoor triggers results from the use of the cross-entropy loss function, where the unboundedness of cross-entropy leads the model to increasingly concentrate on the features of the poisoned data. Based on this insight, we propose a general and effective loss function DeCE (Deceptive Cross-Entropy) that blends deceptive distributions and applies label smoothing to keep the gradient bounded, which prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
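The contrast between an unbounded cross-entropy and a bounded, label-smoothed alternative can be illustrated with a minimal sketch. This is not the paper's definition of DeCE: the blending rule, the decay factor `alpha`, the smoothing factor `epsilon`, and every function name below are assumptions chosen only to show why mixing the prediction with smoothed labels keeps the loss (and hence its gradient) bounded.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Label smoothing: (1 - epsilon) on the true class plus epsilon / K on every class."""
    return [
        (1.0 - epsilon) * (1.0 if i == target_index else 0.0) + epsilon / num_classes
        for i in range(num_classes)
    ]


def cross_entropy(logits, target_index):
    """Standard cross-entropy; grows without bound as the true-class probability goes to 0."""
    return -math.log(softmax(logits)[target_index])


def bounded_blended_loss(logits, target_index, epoch, alpha=0.99, epsilon=0.1):
    """Illustrative bounded loss (an assumption, not the paper's formula).

    The model prediction is blended with the smoothed label, with the weight on
    the prediction decaying as alpha ** epoch. The log is taken of the smoothed
    label, whose entries are at least epsilon / K, so the loss never exceeds
    log(K / epsilon).
    """
    probs = softmax(logits)
    smoothed = smooth_labels(target_index, len(logits), epsilon)
    weight = alpha ** epoch
    blended = [weight * p + (1.0 - weight) * s for p, s in zip(probs, smoothed)]
    return -sum(b * math.log(s) for b, s in zip(blended, smoothed))
```

For a confidently wrong prediction such as `logits = [-50.0, 50.0, 0.0]` with true class 0, `cross_entropy` is roughly 100, whereas `bounded_blended_loss` stays below `log(3 / 0.1)`, whatever the logits are.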
# デバッグのための実用的かつ便利なプログラム修復に向けて Towards Practical and Useful Automated Program Repair for Debugging ( http://arxiv.org/abs/2407.08958v1 ) ライセンス: Link先を確認	Qi Xin, Haojun Wu, Steven P. Reiss, Jifeng Xuan,	(参考訳) 現在の自動プログラム修復(APR)技術は、現実的なデバッグに十分な実用的かつ有用なものではない。それらは、パッチ検証の正確性基準と頻繁なプログラムの再実行として、包括的なテストケースのスイートを必要とすること、高速ではないこと、プログラムの複数箇所を修正して、一般的に発生する複雑なバグを修復する能力など、非現実的な仮定に依存している。 APRの実用性、有効性、有用性を大幅に改善して、デバッグを支援したいと思っています。この目標に向けて,統合開発環境(IDE)で動作する対話型修復システムであるPracAPRを構想し,デバッグに有効な修復提案を行う。 PracAPRはテストスイートやプログラムの再実行を必要としない。開発者はIDEデバッガを使用しており、問題が観測された場所でプログラムが停止していると仮定する。問題仕様を得るために開発者と対話する。この仕様に基づいて、テストフリーでフロー分析に基づくフォールトローカライゼーション、大規模言語モデルに基づく局所的な修復と調整された戦略駆動のグローバルな修復を組み合わせたパッチ生成、およびシミュレーショントレース比較に基づくプログラム再実行不要なパッチ検証を実行し、修復を提案する。 PracAPRを使用することで、APRを便利にし、デバッグの日常的な部分へと、大きな一歩を踏み出したいと考えています。 Current automated program repair (APR) techniques are far from being practical and useful enough to be considered for realistic debugging. They rely on unrealistic assumptions including the requirement of a comprehensive suite of test cases as the correctness criterion and frequent program re-execution for patch validation; they are not fast; and their ability of repairing the commonly arising complex bugs by fixing multiple locations of the program is very limited. We hope to substantially improve APR's practicality, effectiveness, and usefulness to help people debug. Towards this goal, we envision PracAPR, an interactive repair system that works in an Integrated Development Environment (IDE) to provide effective repair suggestions for debugging. PracAPR does not require a test suite or program re-execution. It assumes that the developer uses an IDE debugger and the program has suspended at a location where a problem is observed. It interacts with the developer to obtain a problem specification. 
Based on the specification, it performs test-free, flow-analysis-based fault localization; patch generation that combines large language model-based local repair with tailored strategy-driven global repair; and program re-execution-free patch validation based on simulated trace comparison, in order to suggest repairs. With PracAPR, we hope to take a significant step towards making APR useful and an everyday part of debugging.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 数ショット階層テキスト分類のための反復推論の連鎖によるドメイン階層適応 Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification ( http://arxiv.org/abs/2407.08959v1 ) ライセンス: Link先を確認	Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang,	(参考訳) 近年,様々な事前学習型言語モデル (PLM) が提案されている。しかし、PLMにおける非構造的事前知識に制限されているため、特に下流データが極めて少ない場合に、階層的テキスト分類(HTC)のような複雑な構造化シナリオで一貫した性能を維持することは困難である。主な課題は、PLMの非構造化セマンティック空間を下流ドメイン階層に転送する方法である。複数ラベルの分類やグラフニューラルネットワーク(GNN)を用いてラベル階層をインジェクトする以前のHTCの作業とは異なり、本研究では、HTCの問題を数ショットの条件下で研究し、構造化されていない方法でPLMの知識を下流階層に適応させる。技術的には、階層的反復条件ランダムフィールド (HierICRF) と呼ばれる単純な手法を設計し、最もドメインが混在する方向を探索し、ドメイン階層適応を階層的反復言語モデリング問題として巧妙に構築し、推論中に階層的一貫性を自己補正し、階層的一貫性の維持による知識伝達を実現する。私たちは、さまざまなアーキテクチャ上でHierICRFを実行し、2つの人気のあるHTCデータセット上で大規模な実験を行い、HierICRFによるプロンプトによって、平均的なMicro-F1が28.80%、Macro-F1が36.29%から1.5%向上し、SOTAの階層的一貫性が保たれる一方で、以前のSOTAベースラインよりも大幅に向上することを示した。 Recently, various pre-trained language models (PLMs) have been proposed to prove their impressive performances on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy. Unlike previous work on HTC which directly performs multi-label classification or uses graph neural network (GNN) to inject label hierarchy, in this work, we study the HTC problem under a few-shot setting to adapt knowledge in PLMs from an unstructured manner to the downstream hierarchy. 
Technically, we design a simple yet effective method named Hierarchical Iterative Conditional Random Field (HierICRF) that searches for the most domain-challenging directions and casts domain-hierarchy adaptation as a hierarchical iterative language modeling problem; it then encourages the model to self-correct for hierarchical consistency during inference, thereby achieving knowledge transfer while preserving hierarchical consistency. We apply HierICRF to various architectures, and extensive experiments on two popular HTC datasets demonstrate that prompting with HierICRF significantly boosts few-shot HTC performance, with average Micro-F1 gains of 28.80% to 1.50% and Macro-F1 gains of 36.29% to 1.5% over the previous state-of-the-art (SOTA) baselines under few-shot settings, while retaining SOTA hierarchical consistency performance.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 胸部CTにおけるセグメンテーション前訓練のための組織競合性セミマスクオートエンコーダ Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT ( http://arxiv.org/abs/2407.08961v1 ) ライセンス: Link先を確認	Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang,	(参考訳) 既存のマスク付き画像モデリング(MIM)は、胸部CTに適用した場合の2つの制限に直面する未ラベル画像から物体の像を知覚するための空間パッチベースのマスク再構築戦略に依存している。 1)CT画像に示される複雑な解剖学的詳細による非効率な特徴学習 2)上流モデルと下流モデルとの入力格差による準最適知識伝達。これらの課題に対処するため,胸部CT画像のモデリングのためのCTS-MAEと呼ばれる新しいMIM法を提案する。私たちの手法には2つの新しい設計があります。 1)より微細な解剖学的特徴を捉えるための組織ベースのマスキング・リコンストラクション戦略 2) 上流モデルと下流モデルのギャップを埋めるために,マスクとオリジナル画像ビューの対比学習を施したデュアルAEアーキテクチャ。本手法の有効性を検証するために, 肺炎, 縦隔腫瘍, 各種臓器の分節化に関わる課題に対して, 代表的コントラスト, 生成的, ハイブリッド自己教師型学習法を体系的に検討した。その結果,既存の手法と比較して組織認識表現をより効果的に学習し,全タスクのセグメンテーション性能を大幅に向上させることができた。 Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap between the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on top of tasks involving segmenting pneumonia, mediastinal tumors, and various organs. 
The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 熱力学エントロピーと情報 Information vs Thermodynamic Entropy ( http://arxiv.org/abs/2407.08962v1 ) ライセンス: Link先を確認	Phil Attard,	(参考訳) シャノン情報は熱力学のエントロピーと異なり、熱力学の第二法則とは無関係である。 The Shannon information is shown to be different to the thermodynamic entropy, and indifferent to the Second Law of Thermodynamics.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 多様性最適化における局所最適性:非自明なオフスプリング人口は不可欠である Local Optima in Diversity Optimization: Non-trivial Offspring Population is Essential ( http://arxiv.org/abs/2407.08963v1 ) ライセンス: Link先を確認	Denis Antipov, Aneta Neumann, Frank Neumann,	(参考訳) 多様性最適化の主目的は、適合性にある程度の低い制約を満たす多様なソリューションを見つけることである。進化的アルゴリズム(EA)は、自然に解の集団を最適化するために設計されているため、そのようなタスクにしばしば使用される。 EDOと呼ばれるこの多様性最適化のアプローチは、以前は理論的な観点から研究されてきたが、ほとんどの研究は、(\mu + 1)$ EAのような自明な子孫を持つEAのみを考慮に入れている。そこで本論文では,従来の単一目的最適化と多様性最適化の重大な違い,すなわち,少なくとも2つの個人を一度に置き換えることによってのみ逃れることのできる局所的最適集団の存在,すなわち$(\mu + 1)$アルゴリズムでは不可能である,という問題を例に挙げる。また、$(\mu + \lambda)$ EA with $\lambda \ge \mu$ は、ブランソン・アンド・サットン(TCS 2023)にインスパイアされた突然変異演算子を使用する場合、$k$-vertex カバー上の多様な集団を効果的に見つけることができることを示した。多様性を最適化するとき、$(\mu + \lambda)$ EAに生じる部分集合選択の問題を避けるために、人口に対する$(1 + 1)$ EAの類似である$(1_\mu + 1_\mu)$ EA$_D$も提案する。 The main goal of diversity optimization is to find a diverse set of solutions which satisfy some lower bound on their fitness. Evolutionary algorithms (EAs) are often used for such tasks, since they are naturally designed to optimize populations of solutions. This approach to diversity optimization, called EDO, has been previously studied from theoretical perspective, but most studies considered only EAs with a trivial offspring population such as the $(\mu + 1)$ EA. In this paper we give an example instance of a $k$-vertex cover problem, which highlights a critical difference of the diversity optimization from the regular single-objective optimization, namely that there might be a locally optimal population from which we can escape only by replacing at least two individuals at once, which the $(\mu + 1)$ algorithms cannot do. We also show that the $(\mu + \lambda)$ EA with $\lambda \ge \mu$ can effectively find a diverse population on $k$-vertex cover, if using a mutation operator inspired by Branson and Sutton (TCS 2023). 
To avoid the problem of subset selection which arises in the $(\mu + \lambda)$ EA when it optimizes diversity, we also propose the $(1_\mu + 1_\mu)$ EA$_D$, which is an analogue of the $(1 + 1)$ EA for populations, and which is also efficient at optimizing diversity on the $k$-vertex cover problem.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 協調型適応型クルーズ制御のためのコミュニケーション・アウェア強化学習 Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control ( http://arxiv.org/abs/2407.08964v1 ) ライセンス: Link先を確認	Sicong Jiang, Seongjin Choi, Lijun Sun,	(参考訳) コラボレーティブ・アダプティブ・クルーズ・コントロール(CACC)は、コネクテッド・アンド・オートマチック・ビークルズ(CAV)における交通効率と安全性を高める上で重要な役割を担っている。強化学習(RL)は、CACCにおける複雑な意思決定プロセスの最適化に有効であることが証明され、システム性能と適応性が改善された。 RLのアプローチの中で、マルチエージェント強化学習(MARL)は、分散実行による集中訓練(CTDE)を通じて、複数のCAV間で協調的な行動を可能にすることで、顕著な可能性を示している。しかし、MARLはスケーラビリティの問題に直面することが多く、特にCACC車両が突然小隊に加わるか去ると性能が低下する。これらの課題に対処するため,コミュニケーション・アウェア・強化学習(CA-RL)を提案する。 CA-RLは、前方および後方情報伝送モジュールを介して車両の通信情報を抽出し、圧縮する通信対応モジュールを含む。これにより、CACCトラフィックフロー内での効率的な循環情報伝搬が可能となり、ポリシーの整合性を確保し、CACCにおけるMARLのスケーラビリティ問題を軽減できる。実験の結果,CA-RLは様々な交通シナリオにおいてベースライン法よりも優れており,車両数の変化にもかかわらず信頼性の高い性能を維持しつつ,優れたスケーラビリティ,堅牢性,システム全体の性能を実現していることがわかった。 Cooperative Adaptive Cruise Control (CACC) plays a pivotal role in enhancing traffic efficiency and safety in Connected and Autonomous Vehicles (CAVs). Reinforcement Learning (RL) has proven effective in optimizing complex decision-making processes in CACC, leading to improved system performance and adaptability. Among RL approaches, Multi-Agent Reinforcement Learning (MARL) has shown remarkable potential by enabling coordinated actions among multiple CAVs through Centralized Training with Decentralized Execution (CTDE). However, MARL often faces scalability issues, particularly when CACC vehicles suddenly join or leave the platoon, resulting in performance degradation. To address these challenges, we propose Communication-Aware Reinforcement Learning (CA-RL). CA-RL includes a communication-aware module that extracts and compresses vehicle communication information through forward and backward information transmission modules. This enables efficient cyclic information propagation within the CACC traffic flow, ensuring policy consistency and mitigating the scalability problems of MARL in CACC. 
Experimental results demonstrate that CA-RL significantly outperforms baseline methods in various traffic scenarios, achieving superior scalability, robustness, and overall system performance while maintaining reliable performance despite changes in the number of participating vehicles.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# Lite-SAMは、あらゆるセグメンテーションに必要なもの Lite-SAM Is Actually What You Need for Segment Everything ( http://arxiv.org/abs/2407.08965v1 ) ライセンス: Link先を確認	Jianhai Fu, Yuanjie Yu, Ningchuan Li, Yi Zhang, Qichao Chen, Jianping Xiong, Jun Yin, Zhiyu Xiang,	(参考訳) 本稿では、計算コストと冗長性を低減するために設計されたSegEveryタスクの効率的なエンドツーエンドソリューションであるLite-SAMを紹介する。 Lite-SAMは、CNN-Transformerハイブリッドエンコーダ(LiteViT)、自動プロンプトプロポーザルネットワーク(AutoPPN)、従来のプロンプトエンコーダ、マスクデコーダの4つの主要コンポーネントで構成されている。これらのコンポーネントはすべてSAMフレームワークに統合されます。我々のLiteViTは、高性能で軽量なバックボーンネットワークであり、1.16Mのパラメータしか持たない。また,AutoPPNを導入し,プロンプトボックスとポイント生成のための革新的なエンドツーエンド手法を提案する。これは従来のグリッドサーチサンプリング法よりも改善され、そのユニークな設計により、SAMシリーズのアルゴリズムに容易に統合でき、使い勝手を向上させることができる。公開データセットとプライベートデータセットの両方で、Lite-SAMを徹底的にベンチマークしました。評価には、パラメータの数、SegEveryの実行時間、精度など、幅広い普遍的な指標が含まれていた。その結果、Lite-SAMはリーン4.2Mパラメータで動作しており、SAM、MobileSAM、Edge-SAM、EfficientViT-SAM、MobileSAM-v2よりも43x、31x、20x、21x、1.6xのパフォーマンス改善を示しながら、競争の正確さを維持していることがわかった。これにより、Lite-SAMは、パフォーマンスと精度の最適な均衡を達成し、ドメインに新しい最先端(SOTA)ベンチマークを設定できる。 This paper introduces Lite-SAM, an efficient end-to-end solution for the SegEvery task designed to reduce computational costs and redundancy. Lite-SAM is composed of four main components: a streamlined CNN-Transformer hybrid encoder (LiteViT), an automated prompt proposal network (AutoPPN), a traditional prompt encoder, and a mask decoder. All these components are integrated within the SAM framework. Our LiteViT, a high-performance lightweight backbone network, has only 1.16M parameters, which is a 23% reduction compared to the lightest existing backbone network Shufflenet. We also introduce AutoPPN, an innovative end-to-end method for prompt boxes and points generation. This is an improvement over traditional grid search sampling methods, and its unique design allows for easy integration into any SAM series algorithm, extending its usability. We have thoroughly benchmarked Lite-SAM across a plethora of both public and private datasets. 
The evaluation encompassed a broad spectrum of universal metrics, including the number of parameters, SegEvery execution time, and accuracy. The findings reveal that Lite-SAM, operating with a lean 4.2M parameters, significantly outpaces its counterparts, demonstrating performance improvements of 43x, 31x, 20x, 21x, and 1.6x over SAM, MobileSAM, Edge-SAM, EfficientViT-SAM, and MobileSAM-v2 respectively, all the while maintaining competitive accuracy. This underscores Lite-SAM's prowess in achieving an optimal equilibrium between performance and precision, thereby setting a new state-of-the-art (SOTA) benchmark in the domain.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# LAPT:視覚言語モデルを用いたOOD検出のためのラベル駆動型自動プロンプトチューニング LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models ( http://arxiv.org/abs/2407.08966v1 ) ライセンス: Link先を確認	Yabin Zhang, Wenjie Zhu, Chenhang He, Lei Zhang,	(参考訳) オフ・オブ・ディストリビューション(OOD)検出は、未知のクラスからのサンプルを特定し、予期しない入力によるエラーを低減するため、モデルの信頼性に不可欠である。 CLIPのような視覚言語モデル(VLM)は、マルチモーダル情報を統合することで、OOD検出の強力なツールとして現れている。しかし、そのようなシステムの実践的応用は、ドメインの専門知識を必要とし、言語的なニュアンスに敏感な手動プロンプト工学によって挑戦されている。本稿では,手動プロンプトエンジニアリングの必要性を低減させるOOD検出の新しいアプローチである,ラベル駆動型自動プロンプトチューニング(LAPT)を提案する。 In-distribution (ID) クラス名と負ラベルを自動的にマイニングする分布認識プロンプトを開発した。これらのクラスラベルに関連付けられたトレーニングサンプルは、画像合成と検索によって自律的に収集され、手作業なしで即時学習が可能である。簡単なクロスエントロピー損失を即時最適化に利用し、クロスモーダルとクロスディストリビューションの混合戦略を用いて、画像ノイズを低減し、分布間の中間空間を探索する。 LAPTフレームワークは自律的に動作し、IDクラス名のみを入力として必要とし、手動による介入を不要とする。広範な実験により、LAPTは手作業によるプロンプトを一貫して上回り、OOD検出の新しい標準を設定した。さらに、LAPTは、IDとOODの区別を強化するだけでなく、ID分類精度も向上し、共変量シフトに対する一般化ロバスト性を強化し、フルスペクトルOOD検出タスクに挑戦する際、優れたパフォーマンスをもたらす。コードは https://github.com/YBZh/LAPT で公開されている。 Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. 
We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Codes are available at https://github.com/YBZh/LAPT.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# 従来のRE手法と大規模言語モデルの統合によるFew-Shot関係抽出 Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models ( http://arxiv.org/abs/2407.08967v1 ) ライセンス: Link先を確認	Ye Liu, Kai Zhang, Aoran Gan, Linan Yue, Feng Hu, Qi Liu, Enhong Chen,	(参考訳) Few-Shot Relation extract (FSRE)は、限られたトレーニングインスタンスを利用するリレーショナル抽出(RE)のサブタスクであり、非常に低リソースのシナリオでテキスト情報を抽出する能力により、自然言語処理(NLP)の研究者にアピールする。 FSREの主要な手法は、事前学習言語モデル(PLM)に基づく微調整または即時チューニング技術である。近年,大規模言語モデル (LLM) の出現により,多くの研究者が文脈学習 (ICL) を通じてFSREを探求している。しかし、従来のREモデルやLLMに基づいたメソッドには、かなりの制限がある。従来のREモデルは、必要な事前知識の欠如によって妨げられ、一方LLMは、REのタスク固有の能力に不足しています。これらの欠点に対処するため,従来のREモデルとLLMを相乗的に組み合わせたデュアルシステム拡張関係エクストラクタ(DSARE)を提案する。具体的には、DSARE は従来の RE モデルに LLM の以前の知識を革新的に注入し、関係抽出による RE に対する LLM のタスク固有の適性を向上させる。さらに、統合予測モジュールを用いて、これらの2つの予測を共同で検討し、最終的な結果を導出する。大規模実験により提案手法の有効性が示された。 Few-Shot Relation Extraction (FSRE), a subtask of Relation Extraction (RE) that utilizes limited training instances, appeals to more researchers in Natural Language Processing (NLP) due to its capability to extract textual information in extremely low-resource scenarios. The primary methodologies employed for FSRE have been fine-tuning or prompt tuning techniques based on Pre-trained Language Models (PLMs). Recently, the emergence of Large Language Models (LLMs) has prompted numerous researchers to explore FSRE through In-Context Learning (ICL). However, there are substantial limitations associated with methods based on either traditional RE models or LLMs. Traditional RE models are hampered by a lack of necessary prior knowledge, while LLMs fall short in their task-specific capabilities for RE. To address these shortcomings, we propose a Dual-System Augmented Relation Extractor (DSARE), which synergistically combines traditional RE models with LLMs. Specifically, DSARE innovatively injects the prior knowledge of LLMs into traditional RE models, and conversely enhances LLMs' task-specific aptitude for RE through relation extraction augmentation. 
Moreover, an Integrated Prediction module is employed to jointly consider these two respective predictions and derive the final results. Extensive experiments demonstrate the efficacy of our proposed method.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# SlideGCD:全スライド画像分類のための知識蒸留によるグラフ協調学習 SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification ( http://arxiv.org/abs/2407.08968v1 ) ライセンス: Link先を確認	Tong Shu, Jun Shi, Dongdong Sun, Zhiguo Jiang, Yushan Zheng,	(参考訳) 既存のWSI分析法は、腫瘍の病理組織学的特徴ががん診断の重要な指針である、という結論に基づいている。特に、がんの進化は連続的なプロセスであるため、様々な段階、解剖学的位置、患者との相関や差異を考慮する必要がある。しかし、最近の研究は主にスライド間の相関を無視して、単一のWSIの内部コンテキスト情報に焦点を当てている。スライド相互相関の導入がWSI表現学習の改善をもたらすかどうかを検証するため,既存のマルチインスタンス学習(MIL)手法をバックボーンとして考慮し,WSI分類タスクをノード分類問題としてフォッジする,汎用的なWSI解析パイプラインであるSlideGCDを提案する。より具体的には、SlideGCDは、その後の広範なスライドベースのグラフ構築のために、以前のスライド埋め込みを格納するノードバッファを宣言し、グラフ学習を実施して、スライドベースのグラフに暗示される相関関係を探索する。さらに、MIL分類器とグラフ学習を2つの並列ワークフローに分類し、知識蒸留をデプロイして、識別可能な情報をグラフニューラルネットワークに転送する。 2つのTCGAベンチマークデータセットで、これまでの4つの最先端MILメソッドのSlideGCDによる一貫したパフォーマンス向上が観察された。コードはhttps://github.com/HFUT-miaLab/SlideGCDで入手できる。 Existing WSI analysis methods lie on the consensus that histopathological characteristics of tumors are significant guidance for cancer diagnostics. Particularly, as the evolution of cancers is a continuous process, the correlations and differences across various stages, anatomical locations and patients should be taken into account. However, recent research mainly focuses on the inner-contextual information in a single WSI, ignoring the correlations between slides. To verify whether introducing the slide inter-correlations can bring improvements to WSI representation learning, we propose a generic WSI analysis pipeline SlideGCD that considers the existing multi-instance learning (MIL) methods as the backbone and forge the WSI classification task as a node classification problem. More specifically, SlideGCD declares a node buffer that stores previous slide embeddings for subsequent extensive slide-based graph construction and conducts graph learning to explore the inter-correlations implied in the slide-based graph. 
Moreover, we frame the MIL classifier and graph learning as two parallel workflows and deploy knowledge distillation to transfer the differentiable information to the graph neural network. Consistent performance gains from SlideGCD are observed for four previous state-of-the-art MIL methods on two TCGA benchmark datasets. The code is available at https://github.com/HFUT-miaLab/SlideGCD.	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# Llamaの検出 - 大規模言語モデルによるスマートコントラクトの脆弱性検出 Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models ( http://arxiv.org/abs/2407.08969v1 ) ライセンス: Link先を確認	Peter Ince, Xiapu Luo, Jiangshan Yu, Joseph K. Liu, Xiaoning Du,	(参考訳) 本稿では, OpenAI の GPT-4 がよく動作するが, スマートコントラクトの脆弱性検出において, GPT-4 よりも優れたオープンソースモデルを微調整できるという仮説を検証した。我々はMetaのCode Llamaと17kプロンプトのデータセット、Llama - Foundation と Detect Llama - Instruct の2つのモデルを微調整し、OpenAI の GPT-3.5 Turbo Model (GPT-3.5FT) を微調整する。次に、これらのモデルとランダムなベースラインを、GPT-4とGPT-4のTurboに対して開発したテストセットに基づいて評価し、データセットから8つの脆弱性を検出し、2つの最上位の脆弱性(重み付きF1スコア)を検出します。バイナリ分類(つまり、このスマートコントラクトは脆弱か?)では、GPT-3.5FT と Detect Llama - Foundation の2つの最高のパフォーマンスモデルが、0.776$と0.68$のF1スコアを達成し、GPT-4とGPT-4 Turboを0.66$と0.675$で上回ります。 GPT-4は0.218ドル、GPT-4は0.243ドル、F1は0.719ドル、GPT-3.5FTは0.674ドル、Llamaは0.363ドル、GPT-4は0.429ドルだった。 In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. We fine-tune two models from Meta's Code Llama and a dataset of 17k prompts, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evaluate these models, plus a random baseline, on a testset we develop against GPT-4, and GPT-4 Turbo's, detection of eight vulnerabilities from the dataset and the two top identified vulnerabilities - and their weighted F1 scores. We find that for binary classification (i.e., is this smart contract vulnerable?), our two best-performing models, GPT-3.5FT and Detect Llama - Foundation, achieve F1 scores of $0.776$ and $0.68$, outperforming both GPT-4 and GPT-4 Turbo, $0.66$ and $0.675$. 
For the evaluation against individual vulnerability identification, our top two models, GPT-3.5FT and Detect Llama - Foundation, both significantly outperformed GPT-4 and GPT-4 Turbo in both weighted F1 for all vulnerabilities ($0.61$ and $0.56$ respectively against GPT-4's $0.218$ and GPT-4 Turbo's $0.243$) and weighted F1 for the top two identified vulnerabilities ($0.719$ for GPT-3.5FT, $0.674$ for Detect Llama - Foundation against GPT-4's $0.363$ and GPT-4 Turbo's $0.429$).	翻訳日:2024-07-16 00:56:38 公開日:2024-07-12
# ソフトプロンプトは難しい - 隠れたメタ命令でビジュアル言語モデルをステアリングする Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions ( http://arxiv.org/abs/2407.08970v1 ) ライセンス: Link先を確認	Tingwei Zhang, Collin Zhang, John X. Morris, Eugene Bagdasaryan, Vitaly Shmatikov,	(参考訳) 隠れた"メタインストラクション"は、モデルがどのようにイメージを解釈し、モデルのアウトプットを操り、逆長線スタイル、感情、視点を表現する。ソフトプロンプトとして機能する画像を生成することによってメタ命令を生成する方法について説明する。ジェイルブレイク攻撃や敵の例とは異なり、これらの画像から得られる出力は、画像の視覚的内容に基づいて可視であり、敵の指示に従う。誤情報やスピンを含むこれらの攻撃のリスクについて述べるとともに、複数の視覚言語モデルや敵対的メタオブジェクトに対する有効性を評価し、明示的なテキスト命令によって利用できない基盤となる言語モデルの能力を「アンロック」する方法を実証する。最後に、これらの攻撃に対する防御について論じる。 We introduce a new type of indirect injection vulnerabilities in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks and adversarial examples, the outputs resulting from these images are plausible and based on the visual content of the image, yet follow the adversary's (meta-)instructions. We describe the risks of these attacks, including misinformation and spin, evaluate their efficacy for multiple visual language models and adversarial meta-objectives, and demonstrate how they can "unlock" the capabilities of the underlying language models that are unavailable via explicit text instructions. Finally, we discuss defenses against these attacks.	翻訳日:2024-07-16 00:46:39 公開日:2024-07-12
# 弱教師付き時間行動定位のためのフルステージ擬似ラベル品質向上 Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization ( http://arxiv.org/abs/2407.08971v1 ) ライセンス: Link先を確認	Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen,	(参考訳) 微弱に監督された時間的行動局所化(WSTAL)は、ビデオレベルの監督のみを使用して、未編集ビデオのアクションをローカライズすることを目的としている。最新のWSTAL手法は、分類に基づくトレーニングとローカライゼーションにおける推論ターゲットのギャップを埋め、最先端の結果を得るための擬似ラベル学習フレームワークを導入している。これらのフレームワークでは、回帰に基づく学生モデルのために、分類に基づくモデルを使用して擬似ラベルを生成し、そこから学習する。しかし、最終結果の鍵となるフレームワークにおける擬似ラベルの品質は慎重に研究されていない。本稿では,FuSTALフレームワークを構築するための簡易かつ効率的な擬似ラベル品質向上機構を提案する。 FuSTALは擬似ラベルの品質を3段階で強化する: 提案生成段階におけるクロスビデオコントラスト学習、提案した選択段階における事前フィルタリング、訓練段階におけるEMAベースの蒸留。これらの設計は、フレームワークの異なる段階で擬似ラベルの品質を高め、より情報的で、偽りがなく、よりスムーズなアクション提案を生み出すのに役立つ。これらの総合的な設計の助けを借りて、FuSTALはTHUMOS'14で平均50.8%のmAPを達成し、以前のベストメソッドを1.2%上回った。 Weakly-supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos using only video-level supervision. Latest WSTAL methods introduce pseudo label learning framework to bridge the gap between classification-based training and inferencing targets at localization, and achieve cutting-edge results. In these frameworks, a classification-based model is used to generate pseudo labels for a regression-based student model to learn from. However, the quality of pseudo labels in the framework, which is a key factor to the final result, is not carefully studied. In this paper, we propose a set of simple yet efficient pseudo label quality enhancement mechanisms to build our FuSTAL framework. FuSTAL enhances pseudo label quality at three stages: cross-video contrastive learning at proposal Generation-Stage, prior-based filtering at proposal Selection-Stage and EMA-based distillation at Training-Stage. These designs enhance pseudo label quality at different stages in the framework, and help produce more informative, less false and smoother action proposals. 
With the help of these comprehensive designs at all stages, FuSTAL achieves an average mAP of 50.8% on THUMOS'14, outperforming the previous best method by 1.2%, and becomes the first method to reach the milestone of 50%.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 極大カーネルネットワークの暗黒秘密をロバスト性に発見する Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness ( http://arxiv.org/abs/2407.08972v1 ) ライセンス: Link先を確認	Honghao Chen, Yurong Zhang, Xiaokun Feng, Xiangxiang Chu, Kaiqi Huang,	(参考訳) ロバスト性は、ディープラーニングモデルを野生に展開する上で、考慮すべき重要な側面である。ビジョントランスフォーマー(ViT)の堅牢性の研究に多くの研究が注がれており、2020年代以降、視覚タスクの主要なバックボーン選択として支配されてきた。近年、一部の大規模なカーネル・コンブネットは、性能と効率性で復活している。しかし、大規模なカーネルネットワークが堅牢なのか、その堅牢性に起因するのかはまだ不明である。本稿では,6つの多種多様なロバスト性ベンチマークデータセット上で,大カーネルのロバスト性と,典型的な小カーネルのロバスト性とViTとの相違点を総合的に評価する。そして、その強靭性の背後にある要因を分析するために、定量的および質的な視点から実験を設計し、典型的な共振器とは全く異なる大きなカーネル共振器の興味深い特性を明らかにする。我々の実験は、純粋なCNNが、ViTに匹敵する、あるいはそれよりも優れた、非常に堅牢性を達成できることを初めて実証した。本研究は,オクルージョンの不変性,カーネルの注意パターン,周波数特性について解析し,ロバスト性の原因に関する新たな知見を提供する。 Robustness is a vital aspect to consider when deploying deep learning models into the wild. Numerous studies have been dedicated to the study of the robustness of vision transformers (ViTs), which have dominated as the mainstream backbone choice for vision tasks since the dawn of 2020s. Recently, some large kernel convnets make a comeback with impressive performance and efficiency. However, it still remains unclear whether large kernel networks are robust and the attribution of their robustness. In this paper, we first conduct a comprehensive evaluation of large kernel convnets' robustness and their differences from typical small kernel counterparts and ViTs on six diverse robustness benchmark datasets. Then to analyze the underlying factors behind their strong robustness, we design experiments from both quantitative and qualitative perspectives to reveal large kernel convnets' intriguing properties that are completely different from typical convnets. Our experiments demonstrate for the first time that pure CNNs can achieve exceptional robustness comparable or even superior to that of ViTs. Our analysis on occlusion invariance, kernel attention patterns and frequency characteristics provide novel insights into the source of robustness.	
翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 解釈可能な機械学習のための白黒ボックス技術の統合 Integrating White and Black Box Techniques for Interpretable Machine Learning ( http://arxiv.org/abs/2407.08973v1 ) ライセンス: Link先を確認	Eric M. Vernon, Naoki Masuyama, Yusuke Nojima,	(参考訳) 機械学習アルゴリズム設計では、アルゴリズムの解釈可能性と性能の間にトレードオフが存在する。一般に、人間が理解するのが簡単で容易なアルゴリズムは、より複雑で透明性の低いアルゴリズムよりも悪い性能を示す傾向がある。例えば、ランダムな森林分類器は単純な決定木よりも正確である可能性が高いが、解釈可能性の犠牲になる。本稿では,高い解釈可能な分類器(ホワイトボックスモデル)を用いてより簡単な入力を分類するアンサンブル分類器の設計と,より強力だが解釈可能な分類器(ブラックボックスモデル)を用いてより難しい入力を行う。 In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms which are simpler and easier for humans to comprehend tend to show worse performance than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly-interpretable classifier (i.e., white box model), and more difficult inputs using a more powerful, but less interpretable classifier (i.e., black box model).	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
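The routing idea in the abstract above (easy inputs go to a white box model, hard inputs to a black box model) can be sketched as follows. The abstract does not say how "easy" and "hard" are decided, so the confidence threshold used here is an assumption, and both toy models and all names are hypothetical stand-ins rather than the authors' design.

```python
from typing import Callable, Sequence, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])
Model = Callable[[Sequence[float]], Prediction]


def route_and_classify(
    x: Sequence[float],
    white_box: Model,
    black_box: Model,
    threshold: float = 0.9,
) -> Tuple[str, str]:
    """Return (label, model_used).

    Assumption: an input counts as "easy" when the interpretable model is
    confident enough; otherwise the more powerful model takes over.
    """
    label, confidence = white_box(x)
    if confidence >= threshold:
        return label, "white_box"
    label, _ = black_box(x)
    return label, "black_box"


def toy_white_box(x: Sequence[float]) -> Prediction:
    """Hypothetical interpretable rule: one threshold on the first feature.

    Confidence grows with the distance from the decision boundary.
    """
    margin = abs(x[0] - 0.5)
    label = "positive" if x[0] > 0.5 else "negative"
    return label, min(1.0, 0.5 + margin)


def toy_black_box(x: Sequence[float]) -> Prediction:
    """Hypothetical stand-in for a stronger, less interpretable model."""
    score = 0.7 * x[0] + 0.3 * x[1]
    return ("positive" if score > 0.5 else "negative"), 1.0
```

With these toy models, an input far from the rule's boundary such as `[0.95, 0.0]` is handled by the white box, while a borderline input such as `[0.55, 0.1]` falls through to the black box. Returning the name of the model used makes it possible to report what fraction of predictions stayed interpretable.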
# 抗がんペプチド予測のためのトポロジー強化機械学習モデル(Top-ML) Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction ( http://arxiv.org/abs/2407.08974v1 ) ライセンス: Link先を確認	Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia,	(参考訳) 近年,治療ペプチドは癌治療に大いに期待されている。強力な抗がんペプチドを探索するために、人工知能(AI)ベースのアプローチが、潜在的な候補を体系的にスクリーニングするために開発されている。しかし、これらの機械学習モデルでは、ペプチドの効率的な分解の欠如がボトルネックとなっている。本稿では,抗がんペプチド予測のためのトポロジー強化機械学習モデル(Top-ML)を提案する。筆者らのTop-MLでは, ベクターおよびスペクトル記述子を特徴とする配列"接続"情報から得られるペプチドトポロジ的特徴を用いている。我々のTop-MLモデルは、広く使われている2つのAntiCP 2.0ベンチマークデータセットで検証され、最先端のパフォーマンスを達成した。本研究は,抗がんペプチドの同定を促進するために,新規なトポロジを基盤とした創製の可能性を強調した。 Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by vector and spectral descriptors. Our Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and has achieved state-of-the-art performance. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# ランダムフーリエ特徴を用いたカーネル2標本検定における計算統計的トレードオフ Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features ( http://arxiv.org/abs/2407.08976v1 ) ライセンス: Link先を確認	Ikjun Choi, Ilmun Kim,	(参考訳) 近年、2標本検定の手法が急増しており、その中でも最大平均差異(Maximum Mean Discrepancy, MMD)検定は、複雑で高次元のデータを扱うための有効なツールとして登場している。成功を収め広く採用されているにもかかわらず、MMD検定の主な制限は2次時間の計算量であり、大規模な分析の課題となっている。手順を高速化するさまざまなアプローチが提案されているが、MMD検定と同等の検出力保証を準2次時間で達成できるかどうかは不明であった。このギャップを埋めるために、ランダムフーリエ特徴を用いた近似MMD検定を再検討し、その計算統計的トレードオフについて検討する。まず、近似MMD検定は、ランダム特徴の数が無限大に近づく場合にのみ、点ごとの意味で検出力の一致性を持つことを明らかにする。次に、検定の一様検出力を考察し、ミニマックス検定の枠組みの下で時間と検出力のトレードオフを調べる。その結果、ランダム特徴の数を慎重に選択することにより、MMD検定と同一のミニマックス分離率を準2次時間で達成できることが示された。我々は、ソボレフ球内の密度といった異なる分布仮定の下でこの点を実証する。我々の理論的発見はシミュレーション研究によって裏付けられている。 Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. 
We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
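To make the time saving in the abstract above concrete, here is a rough sketch of an MMD statistic approximated with random Fourier features. It assumes a Gaussian kernel and one-dimensional samples, and it computes only a biased estimate of the squared MMD (no test threshold or permutation step). The function names, the default `num_features`, and the `bandwidth` parameter are illustrative choices, not taken from the paper.

```python
import math
import random


def rff_features(x, omegas, phases):
    """Map a scalar x to random Fourier features of a Gaussian kernel."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]


def rff_mmd2(xs, ys, num_features=200, bandwidth=1.0, seed=0):
    """Biased estimate of squared MMD between samples xs and ys.

    Cost is O((len(xs) + len(ys)) * num_features), instead of the
    quadratic cost of evaluating the kernel on every pair of points.
    """
    rng = random.Random(seed)
    # Frequencies follow the spectral density of exp(-(x - y)^2 / (2 * bandwidth^2)).
    omegas = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]

    def mean_embedding(sample):
        acc = [0.0] * num_features
        for x in sample:
            for j, v in enumerate(rff_features(x, omegas, phases)):
                acc[j] += v
        return [a / len(sample) for a in acc]

    mx, my = mean_embedding(xs), mean_embedding(ys)
    return sum((a - b) ** 2 for a, b in zip(mx, my))
```

The number of random features is the knob the abstract refers to: more features give a closer approximation of the exact statistic at a higher cost.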
# CURE:プライバシ保護のスプリット学習を正しく行う CURE: Privacy-Preserving Split Learning Done Right ( http://arxiv.org/abs/2407.08977v1 ) ライセンス: Link先を確認	Halil Ibrahim Kanpak, Aqsa Shabbir, Esra Genç, Alptekin Küpçü, Sinem Sav,	(参考訳) ディープニューラルネットワークのトレーニングには、計算上の制約により、大規模なデータセット、ストレージ、クラウドサーバでの処理が必要になることが少なくない。この手続きは、医療などの領域で厳格なプライバシー規制に従う必要がある。モデル層をクライアントとサーバに分割するフレームワークであるスプリットラーニング(SL)は、分散モデルトレーニングに広く採用されている。 Split Learningは、完全なパラメータセットへのサーバアクセスを制限することによって、プライバシのリスクを低減するが、以前の調査では、サーバとクライアントの間で交換された中間出力が、クライアントのデータプライバシを損なう可能性がある。このシナリオには、同型暗号化(HE)ベースのソリューションが存在するが、しばしば禁止的な計算負担を課す。これらの課題に対処するために,モデルサーバ側と任意にデータのみを暗号化するHEに基づく新しいシステムであるCUREを提案する。 CUREはセキュアなSLを実現すると同時に、高度なパッキング技術による通信と並列化を大幅に改善する。我々は,1層ネットワークのHEレベルを1つ消費し,n層ニューラルネットワークへの解を一般化する2つのパッキング手法を提案する。 CUREは,従来のプライバシ保存方式に比べて,実行時の16倍の効率で,平文SLと同等の精度を達成できることを実証した。 Training deep neural networks often requires large-scale datasets, necessitating storage and processing on cloud servers due to computational constraints. The procedures must follow strict privacy regulations in domains like healthcare. Split Learning (SL), a framework that divides model layers between client(s) and server(s), is widely adopted for distributed model training. While Split Learning reduces privacy risks by limiting server access to the full parameter set, previous research has identified that intermediate outputs exchanged between server and client can compromise client's data privacy. Homomorphic encryption (HE)-based solutions exist for this scenario but often impose prohibitive computational burdens. To address these challenges, we propose CURE, a novel system based on HE, that encrypts only the server side of the model and optionally the data. CURE enables secure SL while substantially improving communication and parallelization through advanced packing techniques. We propose two packing schemes that consume one HE level for one-layer networks and generalize our solutions to n-layer neural networks. 
We demonstrate that CURE can achieve similar accuracy to plaintext SL while being 16x more efficient in terms of the runtime compared to the state-of-the-art privacy-preserving alternatives.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 大規模言語モデルによる章単位(Chapter-to-Chapter)の文脈対応文学翻訳に向けて Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models ( http://arxiv.org/abs/2407.08978v1 ) ライセンス: Link先を確認	Linghao Jin, Li An, Xuezhe Ma,	(参考訳) 既存の文書レベル翻訳データセットでは談話現象が少なく、これが文脈対応機械翻訳モデルの開発における根本的な障害となっている。さらに、既存の文書レベルコーパスや文脈対応機械翻訳手法の多くは、文レベルのアライメントに関する非現実的な仮定に依存している。これらの問題を緩和するために、我々はまず、複雑な談話構造を持つ160冊の本からなる中国語・英語の文学作品の新しいデータセットをキュレートする。次に、チャプタ・ツー・チャプタ(Ch2Ch)翻訳と呼ぶ、より実用的で挑戦的な文脈対応翻訳の設定を提案し、この設定の下で一般的に使用される機械翻訳モデルの性能を調査する。さらに、Ch2Ch文学翻訳の領域で大規模言語モデル(LLM)を微調整する潜在的なアプローチを導入し、ベースラインに対して顕著な改善を得る。包括的な分析を通して、Ch2Ch設定での文学翻訳は、モデル学習法と翻訳復号アルゴリズムの両方に関して本質的に困難であることを明らかにする。 Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of Chinese-English literature, which consists of 160 books with intricate discourse structures. Then, we propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation, and investigate the performance of commonly-used machine translation models under this setting. Furthermore, we introduce a potential approach of finetuning large language models (LLMs) within the domain of Ch2Ch literary translation, yielding impressive improvements over baselines. Through our comprehensive analysis, we unveil that literary translation under the Ch2Ch setting is challenging in nature, with respect to both model learning methods and translation decoding algorithms.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 構文に基づく説明による、より信頼でき解釈可能なコード向けLLMに向けて Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations ( http://arxiv.org/abs/2407.08983v1 ) ライセンス: Link先を確認	David N. Palacio, Daniel Rodriguez-Cardenas, Alejandro Velasco, Dipin Khati, Kevin Moran, Denys Poshyvanyk,	(参考訳) 信頼性と解釈可能性は、LLMにとって密接に結びついた概念である。 LLMが解釈可能であるほど、より信頼できるものになる。しかし、コード関連タスクに適用する場合の現在のLLM解釈技術は、より高い解釈可能性、ひいては信頼のために予測時に必要となるきめ細かな説明ではなく、精度の測定、モデルが変化にどう反応するかの測定、あるいは個々のタスクの性能に主に焦点を当てている。この現状を改善するために、本稿では、モデルの信頼度とプログラミング言語の構文構造との関係に基づく説明を生成する、コード向けLLMの解釈可能性手法であるASTrustを紹介する。 ASTrustは抽象構文木(AST)に基づく構文カテゴリの文脈で生成コードを説明し、ローカル(個別のコードスニペット)とグローバル(より大きなコードデータセット)の両方のレベルで実践者がモデルの予測を理解するのを支援する。 AST内に存在するよく知られた構文構造にモデルの信頼度スコアを分配して割り当てることにより、我々のアプローチは、開発者が慣れ親しんだプログラミング言語の概念と直接一致するモデル信頼度の見方を提供し、トークンレベルの信頼度マッピングを行う従来の技術を超える。 ASTrustを実践するために、ASTの構文構造のシーケンス、ヒートマップ、グラフに基づく可視化の上に、集約されたモデル信頼度スコアを重ねて示す自動可視化を開発した。我々は、GitHubリポジトリのキュレートされたセットに対する12の人気のあるLLMに関するデータサイエンス研究を通じてASTrustがもたらす実用的なメリットと、人間を対象とした研究によるASTrustの有用性の両方を検証する。 Trustworthiness and interpretability are inextricably linked concepts for LLMs. The more interpretable an LLM is, the more trustworthy it becomes. However, current techniques for interpreting LLMs when applied to code-related tasks largely focus on accuracy measurements, measures of how models react to change, or individual task performance instead of the fine-grained explanations needed at prediction time for greater interpretability, and hence trust. To improve upon this status quo, this paper introduces ASTrust, an interpretability method for LLMs of code that generates explanations grounded in the relationship between model confidence and syntactic structures of programming languages. ASTrust explains generated code in the context of syntax categories based on Abstract Syntax Trees and aids practitioners in understanding model predictions at both local (individual code snippets) and global (larger datasets of code) levels. 
By distributing and assigning model confidence scores to well-known syntactic structures that exist within ASTs, our approach moves beyond prior techniques that perform token-level confidence mapping by offering a view of model confidence that directly aligns with programming language concepts with which developers are familiar. To put ASTrust into practice, we developed an automated visualization that illustrates the aggregated model confidence scores superimposed on sequence, heat-map, and graph-based visuals of syntactic structures from ASTs. We examine both the practical benefit that ASTrust can provide through a data science study on 12 popular LLMs on a curated set of GitHub repos and the usefulness of ASTrust through a human study.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 高等教育における生成AI政策の探求--中国,日本,モンゴル,米国からの比較の視点 Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA ( http://arxiv.org/abs/2407.08986v1 ) ライセンス: Link先を確認	Qin Xie, Ming Li, Ariunaa Enkhtur,	(参考訳) 本研究は、中国、日本、モンゴル、米国の4カ国における生成AIに関する国家政策の比較分析を行う。質的比較分析(QCA)の手法を用いて、高等教育の場における生成AIに対するこれらの国々の対応を調査し、このグループ内でのアプローチの多様性を精査する。 4カ国はいずれも高等教育における生成AIに対して肯定的な態度を示しているが、日本と米国は人間中心のアプローチを優先し、教育と学習に関する直接的なガイダンスを提供している。対照的に、中国とモンゴルは国家安全保障上の懸念を優先し、そのガイドラインは教育に特化したものではなく、社会レベルに重点を置いている。さらに、4カ国はいずれも多様性、公平性、包摂性を強調しているにもかかわらず、デジタル格差に対処する措置を明確に議論したり実施したりすることには一貫して至っていない。これらの国々の高等教育における生成AIに関する態度と政策の包括的な比較分析を提供することにより、本研究は既存の文献を豊かにし、政策立案者にグローバルな視点を提供して、この領域の政策が排除ではなく包摂を促進することを確かなものにする。 This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positive attitude toward Generative AI in higher education, Japan and the USA prioritize a human-centered approach and provide direct guidance in teaching and learning. In contrast, China and Mongolia prioritize national security concerns, with their guidelines focusing more on the societal level rather than being specifically tailored to education. Additionally, despite all four countries emphasizing diversity, equity, and inclusion, they consistently fail to clearly discuss or implement measures to address the digital divide. By offering a comprehensive comparative analysis of attitudes and policies regarding Generative AI in higher education across these countries, this study enriches existing literature and provides policymakers with a global perspective, ensuring that policies in this domain promote inclusion rather than exclusion.	
翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 非定常未知過程からのパラメータ推定 Parameter inference from a non-stationary unknown process ( http://arxiv.org/abs/2407.08987v1 ) ライセンス: Link先を確認	Kieran S. Owens, Ben D. Fulcher,	(参考訳) 非定常系は、二酸化炭素濃度の変動による気候パターンから、上昇する神経調節によって引き起こされる脳のダイナミクスまで、世界中で見られる。したがって、非定常過程を解析する手法が必要であるが、科学や産業における重要な問題において実際に使用されるほとんどの時系列解析手法は、定常性の仮定を単純化する。非定常システムの解析における重要な問題は、非定常未知プロセス(PINUP)からのパラメータ推論と呼ばれる問題クラスである。観測された時系列が与えられた場合、基礎となるシステムの数学的モデルに関する知識や推論を必要とせず、時系列の非定常性を駆動するパラメータを推測する。ここでは、PINUPのための多様なアルゴリズムの文献をレビューし、統一する。問題を定式化し、様々なアルゴリズムの貢献を分類する。この合成により、研究者は文献のギャップを特定でき、異なる方法の体系的な比較が可能になる。また、既存の手法がテストされている最も一般的なシステム(特に静止しないLorenzプロセスとロジスティックマップ)は、ウィンドウ付き平均や分散のような単純な統計的特徴を使用することで驚くほど簡単に動作できることを示し、アルゴリズム性能の証拠としてこれらのシステムで優れたパフォーマンスを使用するプラクティスを損なう。そして、多くの既存手法が不十分に動作し、この分野の方法論的進歩を促進するために使用できる、より困難な問題を特定する。本研究は,非定常系解析への科学的貢献を統一し,PINUP問題と非定常現象のより広範な研究の進展に向けた新たな方向性を提案する。 Non-stationary systems are found throughout the world, from climate patterns under the influence of variation in carbon dioxide concentration, to brain dynamics driven by ascending neuromodulation. Accordingly, there is a need for methods to analyze non-stationary processes, and yet most time-series analysis methods that are used in practice, on important problems across science and industry, make the simplifying assumption of stationarity. One important problem in the analysis of non-stationary systems is the problem class that we refer to as Parameter Inference from a Non-stationary Unknown Process (PINUP). Given an observed time series, this involves inferring the parameters that drive non-stationarity of the time series, without requiring knowledge or inference of a mathematical model of the underlying system. Here we review and unify a diverse literature of algorithms for PINUP. We formulate the problem, and categorize the various algorithmic contributions. This synthesis will allow researchers to identify gaps in the literature and will enable systematic comparisons of different methods. 
We also demonstrate that the most common systems that existing methods are tested on - notably the non-stationary Lorenz process and logistic map - are surprisingly easy to perform well on using simple statistical features like windowed mean and variance, undermining the practice of using good performance on these systems as evidence of algorithmic performance. We then identify more challenging problems that many existing methods perform poorly on and which can be used to drive methodological advances in the field. Our results unify disjoint scientific contributions to analyzing non-stationary systems and suggest new directions for progress on the PINUP problem and the broader study of non-stationary phenomena.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
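The simple baseline mentioned in the abstract above (windowed mean and variance on a non-stationary logistic map) can be illustrated roughly as follows. This is a sketch under assumptions, not the authors' benchmark: the linear drift of the parameter `r`, its range, the initial value `x0`, and the window length are arbitrary choices made for the example.

```python
import statistics


def windowed_stats(series, window):
    """Return (mean, variance) for each non-overlapping window of the series."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats


def drifting_logistic_map(steps, r_start=3.6, r_end=3.99, x0=0.3):
    """Simulate x[t+1] = r[t] * x[t] * (1 - x[t]) with r drifting linearly.

    The hidden, slowly varying parameter r is what a PINUP method
    would try to recover from the observed series alone.
    """
    xs, rs = [], []
    x = x0
    for t in range(steps):
        r = r_start + (r_end - r_start) * t / (steps - 1)
        x = r * x * (1.0 - x)
        xs.append(x)
        rs.append(r)
    return xs, rs
```

Comparing the per-window statistics from `windowed_stats` with the per-window average of `r` shows how far such elementary features already track the hidden parameter.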
# テキスト中の摂動に対するLLMの頑健性 Robustness of LLMs to Perturbations in Text ( http://arxiv.org/abs/2407.08989v1 ) ライセンス: Link先を確認	Ayush Singh, Navpreet Singh, Shubham Vatsal,	(参考訳) クリーンなデータセットを持つことは、ほとんどの自然言語処理(NLP)システムの基本的な前提となっている。しかし、適切に書かれたテキストは現実のシナリオではほとんど見つからないため、この前提はしばしば成り立たない。最近、大規模言語モデル(LLM)は目覚ましい性能を示しているが、現実のデータでは避けられないノイズを処理できるだろうか? この研究は、テキストの形態的な変動に対するLLMの頑健性を調査することによって、この重要な問題に取り組む。そこで、様々なレベルのノイズを多種多様なデータセットに人工的に導入し、元のテキストを破損させた変種に対するLLMの頑健性を体系的に評価する。その結果、一般に信じられていることに反して、生成LLMはテキスト中のノイズの多い摂動に対して非常に頑健であることが明らかとなった。これは、ノイズの多いテキストで性能が劣化しやすいことが示されているBERTやRoBERTaのような事前学習済みモデルとは異なる。さらに、実際によく見られる誤りを忠実に模倣した複数の実世界のベンチマークでLLMの頑健性を検証する。最小限のプロンプトにより、LLMはGrammar Error Correction (GEC) と Lexical Semantic Change (LSC) のベンチマークタスクにおいて、新たな最先端を達成する。今後の研究を促進するために、LLMの出力と人間が修正した出力のどちらを好むかを人間がアノテーションしたデータセットを、結果を再現するためのコードとともに公開する。 Having a clean dataset has been the foundational assumption of most natural language processing (NLP) systems. However, properly written text is rarely found in real-world scenarios and hence, oftentimes invalidates the aforementioned foundational assumption. Recently, Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data? This work tackles this critical question by investigating LLMs' resilience against morphological variations in text. To that end, we artificially introduce varying levels of noise into a diverse set of datasets and systematically evaluate LLMs' robustness against the corrupt variations of the original text. Our findings show that contrary to popular belief, generative LLMs are quite robust to noisy perturbations in text. This is a departure from pre-trained models like BERT or RoBERTa whose performance has been shown to be sensitive to deteriorating noisy text. Additionally, we test LLMs' resilience on multiple real-world benchmarks that closely mimic commonly found errors in the wild. 
With minimal prompting, LLMs achieve a new state-of-the-art on the benchmark tasks of Grammar Error Correction (GEC) and Lexical Semantic Change (LSC). To empower future research, we also release a dataset annotated by humans stating their preference for LLM vs. human-corrected outputs along with the code to reproduce our results.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# メモリスティブCIMとCAMを用いた2次元・3次元視覚のための動的ニューラルネットワーク Dynamic neural network with memristive CIM and CAM for 2D and 3D vision ( http://arxiv.org/abs/2407.08990v1 ) ライセンス: Link先を確認	Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu,	(参考訳) 脳は動的で、連想的で、効率的である。脳は、メモリと処理を融合させ、入力を過去の経験と関連付けることで再構成する。対照的に、AIモデルは静的であり、入力を過去の経験と関連付けることができず、メモリと処理が物理的に分離されたデジタルコンピュータ上で実行される。我々は、メモリスタを用いたセマンティックメモリベースの動的ニューラルネットワーク(DNN)というハードウェア・ソフトウェア協調設計を提案する。ネットワークは、受信したデータを、セマンティックベクトルとして格納された過去の経験と関連付ける。ネットワークとセマンティックメモリは、それぞれノイズに頑健な3値メモリスタベースのCIM(Computing-In-Memory)回路とCAM(Content-Addressable Memory)回路上に物理的に実装されている。我々は、40nmのメモリスタマクロを用いて、MNISTおよびModelNetデータセットの画像と3D点群を分類するResNetとPointNet++で協調設計を検証し、ソフトウェアと同等の精度を達成するだけでなく、計算予算を48.1%および15.9%削減する。さらに、エネルギー消費を77.6%および93.3%削減する。 The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network (DNN) using memristor. The network associates incoming data with the past experience stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets, which not only achieves accuracy on par with software but also a 48.1% and 15.9% reduction in computational budget. Moreover, it delivers a 77.6% and 93.3% reduction in energy consumption.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# 効率的な量子化手法によるDNN話者検証モデルの最適化 Optimization of DNN-based speaker verification model through efficient quantization technique ( http://arxiv.org/abs/2407.08991v1 ) ライセンス: Link先を確認	Yeona Hong, Woo-Jin Chung, Hong-Goo Kang,	(参考訳) ディープニューラルネットワーク(Deep Neural Networks, DNN)は、音声検証を含む様々な分野で急速に進歩している一方で、一般的に高い計算コストとかなりのメモリ消費を伴い、モバイルシステムでは扱いが難しい。ディープモデルの量子化は、計算コストとメモリコストの両方を削減する手段を提供する。本研究では、話者検証モデルの量子化のための最適化フレームワークを提案する。事前学習済み話者検証モデルの各層における性能変化とモデルサイズ削減を解析することにより、モデルサイズを大幅に削減しつつ、性能劣化を効果的に最小化した。我々の量子化アルゴリズムは、モデルサイズを大幅に圧縮しつつ、最先端の事前学習済み話者検証モデルであるECAPA-TDNNの性能を維持する最初の試みである。全体として、我々の量子化アプローチはモデルサイズを半分に削減し、EERの増加は0.07%に抑えられた。 As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of the speaker verification model. By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size. Our quantization algorithm is the first attempt to maintain the performance of the state-of-the-art pre-trained speaker verification model, ECAPA-TDNN, while significantly compressing its model size. Overall, our quantization approach resulted in reducing the model size by half, with an increase in EER limited to 0.07%.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
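As background for the abstract above, the following is a generic sketch of symmetric uniform weight quantization and of measuring the error it introduces for one layer. The abstract does not specify the paper's quantization scheme or bit widths, so everything here (the functions `quantize`, `dequantize`, `max_abs_error`, and the `num_bits` parameter) is an assumption chosen for illustration.

```python
def quantize(weights, num_bits=8):
    """Symmetric uniform quantization of a list of floats to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]


def max_abs_error(weights, num_bits):
    """Largest reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize(weights, num_bits)
    return max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))
```

Running `max_abs_error` layer by layer at several bit widths is one simple way to see which layers tolerate aggressive compression, in the spirit of the per-layer analysis the abstract describes.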
# Emotion Talk:心理的支援のための音声メッセージによる感情支援 Emotion Talk: Emotional Support via Audio Messages for Psychological Assistance ( http://arxiv.org/abs/2407.08992v1 ) ライセンス: Link先を確認	Fabrycio Leite Nakano Almada, Kauan Divino Pouso Mariano, Maykon Adriell Dutra, Victor Emanuel da Silva Monteiro,	(参考訳) 本稿では、心理的支援のための音声メッセージを通じて、継続的な感情的支援を提供するシステム「Emotion Talk」を提案する。主な目的は、音声メッセージを分析して感情を検出し、適切な応答を生成することによって、従来のセラピーセッション以外の場面でも患者に一貫した支援を提供することである。このソリューションはポルトガル語話者を対象とし、システムが言語的かつ文化的に適切であることを保証する。本システムは、セラピストが行う心理的フォローアップのプロセスを補完・強化し、特に迅速な対応が不可欠な緊急時において、即時かつアクセスしやすい支援を提供することを目的としている。実験結果は提案システムの有効性を示し、心理的支援の応用における可能性を明らかにしている。 This paper presents "Emotion Talk," a system designed to provide continuous emotional support through audio messages for psychological assistance. The primary objective is to offer consistent support to patients outside traditional therapy sessions by analyzing audio messages to detect emotions and generate appropriate responses. The solution focuses on Portuguese-speaking users, ensuring that the system is linguistically and culturally relevant. This system aims to complement and enhance the psychological follow-up process conducted by therapists, providing immediate and accessible assistance, especially in emergency situations where rapid response is crucial. Experimental results demonstrate the effectiveness of the proposed system, highlighting its potential in applications of psychological support.	翻訳日:2024-07-16 00:46:38 公開日:2024-07-12
# タスク駆動型の単一画像超解像による文書スキャンの再構成 Task-driven single-image super-resolution reconstruction of document scans ( http://arxiv.org/abs/2407.08993v1 ) ライセンス: Link先を確認	Maciej Zyrek, Michal Kawulok,	(参考訳) 超解像再構成は、低解像度の観測から高い空間解像度の画像を生成することを目的としている。ディープラーニングに基づく最先端の超解像技術は、優れた視覚的品質の結果を得ることができるが、それらが特定のコンピュータビジョン応用にとって有用な情報源となるかどうかはほとんど検証されていない。本稿では、文書スキャンからの光学文字認識を改善するための前処理ステップとして超解像を利用する可能性を検討する。そのために、単一画像超解像のための深層ネットワークをタスク駆動方式で訓練し、テキスト検出の目的により適合させることを提案する。特定のタスクに限定した問題は強く不良設定であるため、テキスト検出に関連する成分と画像類似性に導かれる成分とを組み合わせたマルチタスク損失関数を導入する。本稿で報告する結果は有望であり、文書画像の実世界での超解像に向けた重要なステップとなる。 Super-resolution reconstruction is aimed at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned with deep learning allow for obtaining results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that embraces components related with text detection coupled with those guided by image similarity. The obtained results reported in this paper are encouraging and they constitute an important step towards real-world super-resolution of document images.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# グローバルアテンション誘導型デュアルドメイン・ポイント・クラウド特徴学習による分類とセグメンテーション Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation ( http://arxiv.org/abs/2407.08994v1 ) ライセンス: Link先を確認	Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul,	(参考訳) 従来の研究では、ポイントクラウド分析タスクにおけるポイントベースニューラルモデルの有効性が実証されている。しかし, 原点座標に対する効率的な入力埋め込みを実現する上で, 依然として重要な課題が残っている。さらに、もう1つの問題は、ネットワークの幹において重要な要素である隣り合う集約の効率の制限にある。本稿では,上記の課題に対処するため,グローバルアテンション誘導型デュアルドメイン特徴学習ネットワーク(GAD)を提案する。我々はまず,グローバルアテンション機構を改良したCPTモジュールを考案し,その後のアグリゲーションのガイダンスとして機能するグローバル・アウェア・インプット・埋め込みを開発した。次に、Dual-domain K-nearest neighbor Feature Fusion (DKFF)をカスケードして、局所幾何学的関係と長距離意味的接続の両方を高く評価する新しい二重ドメイン特徴学習を通して効果的な特徴集約を行う。マルチポイントクラウド解析タスク(例えば分類,部分セグメンテーション,シーンセグメンテーション)における広範囲な実験により,提案手法の優れた性能と,考案したモジュールの有効性が示された。 Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Attention-guided Dual-domain Feature Learning network (GAD) to address the above-mentioned issues. We first devise the Contextual Position-enhanced Transformer (CPT) module, which is armed with an improved global attention mechanism, to produce a global-aware input embedding that serves as the guidance to subsequent aggregations. Then, the Dual-domain K-nearest neighbor Feature Fusion (DKFF) is cascaded to conduct effective feature aggregation through novel dual-domain feature learning which appreciates both local geometric relations and long-distance semantic connections. Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.	
翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# セルフプロンプトチューニング: LLMでの自律的なロールプレイを可能にする Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs ( http://arxiv.org/abs/2407.08995v1 ) ライセンス: Link先を確認	Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Jiaming Zhou, Haoqin Sun,	(参考訳) LLMの最近の進歩は、異なる指示や文脈に基づいて様々な役割の対話スタイルと認知過程を正確にシミュレートできる、目覚ましいロールプレイ能力を示している。研究によれば、LLMに専門家の役割を割り当てるロールプレイプロンプトと呼ばれる戦略は、対応する領域での性能を高める。しかし、プロンプトは与えられた問題に対して手動で設計する必要があり、一定の専門知識と反復的な修正を要する。そこで我々は、微調整によってLLM自身にロールプレイプロンプトを生成させるセルフプロンプトチューニングを提案する。 LIMAデータセットを基礎コーパスとして、GPT-4を用いて各データポイントにロールプレイプロンプトをアノテートし、LIMA-Roleデータセットを作成する。次に、Llama-2-7BやMistral-7BなどのLLMをLIMA-Role上で微調整する。その結果、セルフプロンプトチューニングされたLLMは、任意の質問に対して専門家のロールプロンプトを自動的に生成できる。我々は、広く使われているNLPベンチマークとオープンエンドの質問テストで、セルフプロンプトチューニングされたLLMを広範に評価した。実験結果は、セルフプロンプトチューニングされたLLMが、ほとんどのデータセットで標準的な命令チューニングのベースラインを上回ることを示している。このことは、微調整を利用してLLMにセルフプロンプトを可能にし、複雑なプロンプト戦略を自動化することの大きな可能性を示している。データセット、モデル、コードは https://anonymous.4open.science/r/Self-Prompt-Tuning-739E/ で公開する。 Recent advancements in LLMs have showcased their remarkable role-playing capabilities, able to accurately simulate the dialogue styles and cognitive processes of various roles based on different instructions and contexts. Studies indicate that assigning LLMs the roles of experts, a strategy known as role-play prompting, can enhance their performance in the corresponding domains. However, the prompt needs to be manually designed for the given problem, requiring certain expertise and iterative modifications. To this end, we propose self-prompt tuning, making LLMs themselves generate role-play prompts through fine-tuning. Leveraging the LIMA dataset as our foundational corpus, we employ GPT-4 to annotate role-play prompts for each data point, resulting in the creation of the LIMA-Role dataset. We then fine-tune LLMs like Llama-2-7B and Mistral-7B on LIMA-Role. 
Consequently, the self-prompt tuned LLMs can automatically generate expert role prompts for any given question. We extensively evaluate self-prompt tuned LLMs on widely used NLP benchmarks and an open-ended question test. Our empirical results illustrate that self-prompt tuned LLMs outperform standard instruction tuned baselines across most datasets. This highlights the great potential of utilizing fine-tuning to enable LLMs to self-prompt, thereby automating complex prompting strategies. We release the dataset, models, and code at https://anonymous.4open.science/r/Self-Prompt-Tuning-739E/ .	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# ソース項を持つ量子場理論の音響アナログ Acoustic Analogue for Quantum Field Theory with a Source term ( http://arxiv.org/abs/2407.08999v1 ) ライセンス: Link先を確認	Akshat Pandey,	(参考訳) 古典的なソースに結合したスカラー場に対する非相対論的な流体アナログモデルを提案する。一般的なアナログ重力モデルは、音響計量に結合したフォノン場を含む。我々は音響アナログの特殊相対論的極限で議論する。流体系に時間依存の外部ポテンシャルを仮定することにより、スカラー場のソース項をモデル化できる。量子化した上で、ソースによるフォノン生成を調べる。 We propose a non-relativistic fluid analogue model for a scalar field coupled to a classical source. The generic analogue gravity model involves the phonon field which is coupled to the acoustic metric. We work in the special relativity limit of the acoustic analogue. By assuming a time dependent external potential on the fluid system, we are able to model a source term for the scalar field. Upon quantisation, phonon creation due to the source is studied.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# 大規模言語モデルによる少数ショット株価トレンド予測の強化 Enhancing Few-Shot Stock Trend Prediction with Large Language Models ( http://arxiv.org/abs/2407.09003v1 ) ライセンス: Link先を確認	Yiqi Deng, Xingwei He, Jiahao Hu, Siu-Ming Yiu,	(参考訳) 株価トレンド予測の目標は、情報に基づく投資判断のために将来の市場の動きを予測することである。既存の手法は主に、大量の注釈付きデータで訓練された教師ありモデルによる株価トレンドの予測に重点を置いている。しかし、人手によるアノテーションは多くのリソースを要し、注釈付きデータは容易には入手できない。 LLM(Large Language Models)の優れた少数ショット能力に着想を得て、ラベル付きデータの不足を克服し、投資家にとって予測をより実行しやすくするために、LLMを少数ショット設定で用いることを提案する。従来の研究は通常、複数の金融ニュースを統合して株価トレンドを予測しており、LLMを用いる際に2つの重大な問題を引き起こす。(1)統合されたニュースにはノイズが含まれ、(2)LLMの入力長の上限を超えて性能低下を招く可能性がある。これらの問題を克服するため、我々は2段階の「denoising-then-voting」(ノイズ除去後に投票)手法を提案する。具体的には、「Irrelevant(無関係)」カテゴリを導入し、統合ニュースの代わりに個々のニュースに対して株価トレンドを予測する。次に、多数決を用いてこれらの予測を集約する。提案手法には2つの利点がある。(1)ノイズの多いニュースを無関係と分類することで、最終予測への影響を取り除く。(2)個々のニュースに対して予測することで、LLMの入力長の制限を緩和する。本手法は、S&P 500で66.59%、CSI-100で62.17%、HK株の予測で61.17%の精度を達成し、標準的な少数ショット手法をそれぞれ約7%、4%、4%上回った。さらに、提案手法は最先端の教師あり手法と同等の性能を示す。 The goal of stock trend prediction is to forecast future market movements for informed investment decisions. Existing methods mostly focus on predicting stock trends with supervised models trained on extensive annotated data. However, human annotation can be resource-intensive and the annotated data are not readily available. Inspired by the impressive few-shot capability of Large Language Models (LLMs), we propose using LLMs in a few-shot setting to overcome the scarcity of labeled data and make prediction more feasible for investors. Previous works typically merge multiple financial news for predicting stock trends, causing two significant problems when using LLMs: (1) Merged news contains noise, and (2) it may exceed LLMs' input limits, leading to performance degradation. To overcome these issues, we propose a two-step method 'denoising-then-voting'. Specifically, we introduce an 'Irrelevant' category, and predict stock trends for individual news instead of merged news. Then we aggregate these predictions using majority voting. 
The proposed method offers two advantages: (1) Classifying noisy news as irrelevant removes its impact on the final prediction. (2) Predicting for individual news mitigates LLMs' input length limits. Our method achieves 66.59% accuracy in S&P 500, 62.17% in CSI-100, and 61.17% in HK stock prediction, outperforming the standard few-shot counterparts by around 7%, 4%, and 4%. Furthermore, our proposed method performs on par with state-of-the-art supervised methods.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
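The aggregation step of the "denoising-then-voting" method described above can be sketched as follows. This is an illustration only: in the paper the per-news labels come from an LLM in a few-shot setting, whereas here they are passed in as plain strings. The trend label names ("Up", "Down"), the `default` fallback, and the tie-breaking behaviour are assumptions. Only the 'Irrelevant' category is taken from the abstract.

```python
from collections import Counter


def denoise_then_vote(per_news_labels, irrelevant="Irrelevant", default="Unknown"):
    """Drop news labelled irrelevant, then take a majority vote over the rest.

    Ties are broken in favour of the label that appears first in the input,
    which follows from Counter.most_common keeping first-seen order for
    equal counts.
    """
    relevant = [label for label in per_news_labels if label != irrelevant]
    if not relevant:
        # Every news item was judged to be noise: no basis for a prediction.
        return default
    return Counter(relevant).most_common(1)[0][0]
```

For example, `denoise_then_vote(["Up", "Irrelevant", "Down", "Up"])` returns `"Up"`. Because each news item is labelled separately, no single prompt has to hold the merged text of all news items.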
# プライバシ保護型協調ゲノム研究 : 実生活の展開とビジョン Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision ( http://arxiv.org/abs/2407.09004v1 ) ライセンス: Link先を確認	Zahra Rahmani, Nahal Shahini, Nadav Gat, Zebin Yun, Yuzhou Jiang, Ofir Farchy, Yaniv Harel, Vipin Chaudhary, Mahmood Sharif, Erman Ayday,	(参考訳) データ革命は、医療セクターにとって大きな可能性を秘めている。個人から収集された膨大なデータは、知識、AIモデル、予測システム、ベストプラクティスに変換される。健康分野の1つにゲノム領域がある。 AI、機械学習、データサイエンスの進歩は、ゲノム研究の新しい機会を開き、パーソナライズドメディカルのブレークスルーを約束している。しかし、プライバシーとサイバーセキュリティに対する意識の高まりは、協調研究において機密データを保護するための堅牢なソリューションを必要としている。本稿では、健康データコラボレーションのためのプラットフォームであるLynx.MDと共同で開発された、ゲノム研究のためのプライバシ保護フレームワークの実践的展開について述べる。このフレームワークは、重要なサイバーセキュリティとプライバシの課題に対処し、データ漏洩に伴うリスクを軽減しつつ、プライバシ保護によるゲノムデータの共有と分析を可能にする。高度なプライバシ保護アルゴリズムを統合することで、このソリューションは、データユーティリティを損なうことなく、個々のプライバシを保護する。このシステムのユニークな特徴は、データ共有とプライバシのトレードオフのバランスをとる能力であり、ステークホルダーがプライバシのリスクを定量化し、情報的な決定を行うためのツールを提供する。 Lynx.MD内でのフレームワークの実装には、ゲノムデータをバイナリ形式に符号化し、制御された摂動技術を通じてノイズを適用することが含まれる。このアプローチはデータの本質的な統計特性を保ち、効果的な研究と分析を容易にする。さらに、リアルタイムデータ監視と高度な可視化ツールが組み込まれ、ユーザエクスペリエンスと意思決定が向上する。この論文は、ゲノムデータに特有のプライバシー攻撃と防衛の必要性を強調している。これらの課題に対処することで、ゲノム研究の協力が促進され、パーソナライズされた医療と公衆衛生が推進される。 The data revolution holds significant promise for the health sector. Vast amounts of data collected from individuals will be transformed into knowledge, AI models, predictive systems, and best practices. One area of health that stands to benefit greatly is the genomic domain. Progress in AI, machine learning, and data science has opened new opportunities for genomic research, promising breakthroughs in personalized medicine. However, increasing awareness of privacy and cybersecurity necessitates robust solutions to protect sensitive data in collaborative research. This paper presents a practical deployment of a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD, a platform for secure health data collaboration. 
The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data while mitigating risks associated with data breaches. By integrating advanced privacy-preserving algorithms, the solution ensures the protection of individual privacy without compromising data utility. A unique feature of the system is its ability to balance trade-offs between data sharing and privacy, providing stakeholders tools to quantify privacy risks and make informed decisions. Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques. This approach preserves essential statistical properties of the data, facilitating effective research and analysis. Moreover, the system incorporates real-time data monitoring and advanced visualization tools, enhancing user experience and decision-making. The paper highlights the need for tailored privacy attacks and defenses specific to genomic data. Addressing these challenges fosters collaboration in genomic research, advancing personalized medicine and public health.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
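The abstract above mentions encoding genomic data into binary formats and applying noise through controlled perturbation, but it does not name the mechanism. The following is a minimal sketch under the assumption that the perturbation is simple randomized response (independent bit flips); the two-bit genotype encoding and the names `encode_genotype`, `perturb`, and `estimate_frequency` are hypothetical, chosen for illustration, and are not taken from the paper or from Lynx.MD.

```python
import random

def encode_genotype(minor_allele_count):
    """Encode a genotype (0, 1 or 2 minor alleles) as two bits (assumed encoding)."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    return [int(minor_allele_count >= 1), int(minor_allele_count == 2)]

def perturb(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (randomized response)."""
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in bits]

def estimate_frequency(noisy_bits, flip_prob):
    """Debias the observed share of 1s; requires flip_prob != 0.5.

    E[observed] = flip_prob + true_share * (1 - 2 * flip_prob), solved for true_share.
    """
    observed = sum(noisy_bits) / len(noisy_bits)
    return (observed - flip_prob) / (1 - 2 * flip_prob)

if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 3000 + [0] * 7000  # true share of 1s is 0.30
    noisy_bits = perturb(true_bits, flip_prob=0.1, rng=rng)
    print(round(estimate_frequency(noisy_bits, flip_prob=0.1), 3))
```

The sketch illustrates the trade-off the abstract describes: a larger `flip_prob` hides individual bits better, while the debiased estimate of an aggregate statistic remains usable at the cost of higher variance.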
# VaDAの導入:新しいデータセットを用いた海上物体分割のための新しい画像分割モデル Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset ( http://arxiv.org/abs/2407.09005v1 ) ライセンス: Link先を確認	Yongjin Kim, Jinbum Park, Sanha Kang, Hanguen Kim,	(参考訳) 海上輸送産業は、コンピュータビジョン人工知能(AI)の進歩によって急速に進化している。その結果、海上輸送のためのAIベースの物体認識モデルの研究は着実に増加しており、センサー技術とコンピュータ性能の進歩を活用している。しかし、海洋環境における物体認識は、光の反射、干渉、激しい照明、様々な気象条件といった課題に直面している。これらの課題に対処するためには、海洋画像に適した高性能ディープラーニングアルゴリズムと海洋シーンに特化した高品質データセットが不可欠である。既存のAI認識モデルとデータセットは、自律ナビゲーションシステムを構成するのに限定的に適している。そこで本稿では,海洋オブジェクトセグメンテーションのためのVaDAモデルと新たなモデル評価手法であるIFCP(Integrated Figure of Compute Performance)を提案する。さらに、様々な海洋環境におけるモデルパフォーマンス評価を標準化するために、ベンチマーク海事データセットOASI(Ocean AI Segmentation Initiatives)を導入する。 OASIsデータセットと詳細は、私たちのWebサイトにある。 The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light reflection, interference, intense lighting, and various weather conditions. To address these challenges, high-performance deep learning algorithms tailored to maritime imagery and high-quality datasets specialized for maritime scenes are essential. Existing AI recognition models and datasets have limited suitability for composing autonomous navigation systems. Therefore, in this paper, we propose a Vertical and Detail Attention (VaDA) model for maritime object segmentation and a new model evaluation method, the Integrated Figure of Calculation Performance (IFCP), to verify its suitability for the system in real-time. Additionally, we introduce a benchmark maritime dataset, OASIs (Ocean AI Segmentation Initiatives) to standardize model performance evaluation across diverse maritime environments. 
The OASIs dataset and details are available on our website: https://www.navlue.com/dataset	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# ベンチマーク言語モデルの創造性:コード生成のケーススタディ Benchmarking Language Model Creativity: A Case Study on Code Generation ( http://arxiv.org/abs/2407.09007v1 ) ライセンス: Link先を確認	Yining Lu, Dixuan Wang, Tianjian Li, Dongwei Jiang, Daniel Khashabi,	(参考訳) LLMが普及するにつれて、これらのモデルがいかに「創造的」であるかを考えることは興味深い。認知科学では、創造性は少なくとも2つの重要な特徴から構成される: 収束的(convergent)思考(与えられた目標を達成するための目的性)と発散的(divergent)思考(新しい環境や制約への適応性) \citep{runco2003critical} である。本稿では,2つの特徴を取り入れたLLMの創造性を定量化する枠組みを提案する。これは,(1) 直前の解に新たな制約を段階的に課すことで LLM に新たな戦略の採用を迫り,与えられた問題に対してより創造的な解を導き出させる Denial Prompting と,(2) LLM が生成した創造的応答における収束的思考と発散的思考の両方を考察するNeoGauge メトリクスの定義と計算によって達成される。我々は,人間のコーディングソリューションを収集する自然なデータソースであるCodeforces問題に対して,提案したフレームワークを適用した。さまざまなプロプライエタリモデルおよびオープンソースモデルに対してNeoGaugeを定量化し、最も創造的なモデルであるGPT-4でさえ、人間のような創造性を実証するに足りていないことを発見した。また、先進的推論戦略(MCTS、自己補正など)も試行し、創造性に大きな改善は見つからない。分析の副産物として、将来のモデルで結果を再現するためのNeoCoderデータセットをリリースします。 As LLMs become increasingly prevalent, it is interesting to consider how "creative" these models can be. From cognitive science, creativity consists of at least two key characteristics: convergent thinking (purposefulness to achieve a given goal) and divergent thinking (adaptability to new environments or constraints) \citep{runco2003critical}. In this work, we introduce a framework for quantifying LLM creativity that incorporates the two characteristics. This is achieved by (1) Denial Prompting, which pushes LLMs to come up with more creative solutions to a given problem by incrementally imposing new constraints on the previous solution, compelling LLMs to adopt new strategies, and (2) defining and computing the NeoGauge metric, which examines both convergent and divergent thinking in the creative responses generated by LLMs. We apply the proposed framework to Codeforces problems, a natural data source for collecting human coding solutions. 
We quantify NeoGauge for various proprietary and open-source models and find that even the most creative model, GPT-4, still falls short of demonstrating human-like creativity. We also experiment with advanced reasoning strategies (MCTS, self-correction, etc.) and observe no significant improvement in creativity. As a by-product of our analysis, we release the NeoCoder dataset for reproducing our results on future models.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# 1羽, 4羽の鳥:教師付きコントラスト学習を用いたQAシステムの総合的解法 One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning ( http://arxiv.org/abs/2407.09011v1 ) ライセンス: Link先を確認	Bo Wang, Tsunenori Mine,	(参考訳) 本稿では,教師付きコントラスト学習(SCL)による質問応答(QA)システムの堅牢性と効率性を両立させる,新しい総合的ソリューションを提案する。事前訓練された言語モデルでは、少量のデータと単純な微調整を必要とせず、高性能なQAシステムのトレーニングが簡単になっている。しかし、近年の進歩にもかかわらず、既存のQAシステムは機能や訓練効率に重大な欠陥をみせている。ユーザ入力意図分類、ドメイン外入力検出、新しい意図発見、継続学習の4つの重要なタスクを定義することで、機能問題に対処する。次に,SCLをベースとした表現学習手法を活用し,クラス内およびクラス間分散特徴空間を効率的に構築し,既知の意図分類と未知の意図検出と発見を容易にする。その結果、下流タスクに最小限のチューニングを施すことで、モデル効率を大幅に改善し、全てのタスクにまたがる新しい最先端パフォーマンスを実現することができる。 This paper presents a novel and comprehensive solution to enhance both the robustness and efficiency of question answering (QA) systems through supervised contrastive learning (SCL). Training a high-performance QA system has become straightforward with pre-trained language models, requiring only a small amount of data and simple fine-tuning. However, despite recent advances, existing QA systems still exhibit significant deficiencies in functionality and training efficiency. We address the functionality issue by defining four key tasks: user input intent classification, out-of-domain input detection, new intent discovery, and continual learning. We then leverage a unified SCL-based representation learning method to efficiently build an intra-class compact and inter-class scattered feature space, facilitating both known intent classification and unknown intent detection and discovery. Consequently, with minimal additional tuning on downstream tasks, our approach significantly improves model efficiency and achieves new state-of-the-art performance across all tasks.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# TCAN:拡散モデルを用いた時間的視点誘導による人物画像のアニメーション TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models ( http://arxiv.org/abs/2407.09012v1 ) ライセンス: Link先を確認	Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo,	(参考訳) ポーズ駆動型人像アニメーション拡散モデルは、リアルな人間のビデオ合成において顕著な能力を示した。従来のアプローチによる有望な結果にもかかわらず、時間的に一貫したアニメーションの実現と、市販のポーズ検出器による堅牢性の確保には課題が続いている。本稿では,ポーズ駆動型人間画像アニメーション法であるTCANについて述べる。従来の手法とは対照的に,事前学習したControlNetを微調整なしで利用し,多数のポーズ・イメージ・ペアから取得した膨大な知識を活用する。 ControlNetを凍結に保つために、LoRAをUNet層に適応させ、ポーズと外観の特徴の間に潜伏した空間を調整できるようにします。さらに、ControlNetに追加の時間層を導入することで、ポーズ検出器の外れ値に対する堅牢性を高める。時間軸上のアテンションマップの解析を通じて、ポーズ情報を利用した新しい温度マップを設計し、より静的な背景を実現する。大規模な実験により,チビのような様々なポーズを含む映像合成タスクにおいて,提案手法が期待できる結果が得られることが示された。プロジェクトページ: https://eccv2024tcan.github.io/ Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption pairs. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer to the ControlNet, we enhance robustness against outliers of the pose detector. Through the analysis of attention maps over the temporal axis, we also designed a novel temperature map leveraging pose information, allowing for a more static background. Extensive experiments demonstrate that the proposed method can achieve promising results in video synthesis tasks encompassing various poses, like chibi. 
Project Page: https://eccv2024tcan.github.io/	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# 生成人工知能による手続き的コンテンツ生成 Procedural Content Generation via Generative Artificial Intelligence ( http://arxiv.org/abs/2407.09013v1 ) ライセンス: Link先を確認	Xinyu Mao, Wanli Yu, Kazunori D Yamada, Michael R. Zielewski,	(参考訳) PCGで機械学習を活用する試みは過去にも行われてきた。そこで本研究では,2010年代中盤に注目が集まってきた生成人工知能(AI)がPCGにどのように利用されているかを検討する。我々は、地形、アイテム、さらにはストーリーラインを含む様々なタイプのコンテンツを作成するための生成AIの応用についてレビューする。生成AIはPCGに有効だが、それが直面する重要な問題は、高性能生成AIの構築には膨大なトレーニングデータが必要であることだ。コンテンツは一般的に高度にカスタマイズされているため、ドメイン固有のトレーニングデータは少なく、生成AIモデルへの直接的なアプローチはうまく機能しないかもしれない。 PCG研究をさらに進めるためには、限られたトレーニングデータに関連する問題を克服する必要がある。このように、限られたトレーニングデータによってもたらされる課題に対処する研究についても、特別に検討する。 Attempts to utilize machine learning in procedural content generation (PCG) have been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is effective for PCG, one significant issue it faces is that building high-performance generative AI requires vast amounts of training data. Because content is generally highly customized, domain-specific training data is scarce, and straightforward approaches to generative AI models may not work well. For PCG research to advance further, issues related to limited training data must be overcome. Thus, we also give special consideration to research that addresses the challenges posed by limited training data.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# CompAct: 質問応答のために検索した文書をアクティブに圧縮する CompAct: Compressing Retrieved Documents Actively for Question Answering ( http://arxiv.org/abs/2407.09014v1 ) ライセンス: Link先を確認	Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang,	(参考訳) Retrieval-augmented Generationは、言語モデルをサポートし、外部コンテキストを提供することで、実際の基盤を強化する。しかし、言語モデルは、広範囲な情報を与えるとしばしば課題に直面し、問題の解決においての有効性を低下させる。コンテキスト圧縮は、無関係な情報をフィルタリングすることでこの問題に対処するが、現在の手法は、単一ステップのアプローチで重要な情報をキャプチャできない現実的なシナリオで依然として苦労している。この制限を克服するために、キー情報を失うことなく広範囲の文書を凝縮するアクティブな戦略を取り入れた新しいフレームワークCompActを紹介する。本実験は,マルチホップ質問応答(QA)ベンチマークにおいて,CompActが性能と圧縮速度の両方に大幅な改善をもたらすことを示した。 CompActは、様々なオフザシェルフレトリバーやリーダーを備えたコスト効率のよいプラグインモジュールとして柔軟に動作し、非常に高い圧縮率(47倍)を達成する。 Retrieval-augmented generation supports language models to strengthen their factual groundings by providing external contexts. However, language models often face challenges when given extensive information, diminishing their effectiveness in solving questions. Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios where crucial information cannot be captured with a single-step approach. To overcome this limitation, we introduce CompAct, a novel framework that employs an active strategy to condense extensive documents without losing key information. Our experiments demonstrate that CompAct brings significant improvements in both performance and compression rate on multi-hop question-answering (QA) benchmarks. CompAct flexibly operates as a cost-efficient plug-in module with various off-the-shelf retrievers or readers, achieving exceptionally high compression rates (47x).	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# ブールネットワークを用いた論理プログラムの静的解析 Static Analysis of Logic Programs via Boolean Networks ( http://arxiv.org/abs/2407.09015v1 ) ライセンス: Link先を確認	Van-Giang Trinh, Belaid Benhamou,	(参考訳) 解答集合プログラミング(Answer Set Programming, ASP)は、対象とする問題の解に対応する安定モデルを持つ論理プログラムとして組合せ問題の符号化に使用できる宣言的問題解決パラダイムである。 ASPはAIなどのさまざまな領域に広く適用されています。「静的情報から論理プログラムの安定モデルについて何が言えるのか?」という疑問が研究され、多くの状況で有用であることが証明されている。本研究では,論理プログラムとBooleanネットワークを接続させることにより,この方向をさらに深く掘り下げる。提案する接続により、Booleanネットワークの静的解析に関する豊富な既存の結果をASPの理論的結果の探究と証明に活用でき、ASPの静的解析をさらに研究するための統一的で強力なツールとなる。特に、新しく得られた洞察は、ASPの分野における多くの問題に役立つ可能性がある。 Answer Set Programming (ASP) is a declarative problem solving paradigm that can be used to encode a combinatorial problem as a logic program whose stable models correspond to the solutions of the considered problem. ASP has been widely applied to various domains in AI and beyond. The question "What can be said about stable models of a logic program from its static information?" has been investigated and proved useful in many circumstances. In this work, we dive into this direction more deeply by making the connection between a logic program and a Boolean network, which is a prominent modeling framework with applications to various areas. The proposed connection brings existing results from the rich history of static analysis of Boolean networks to bear on exploring and proving further theoretical results on ASP, making it a unified and powerful tool for further study of the static analysis of ASP. In particular, the newly obtained insights have the potential to benefit many problems in the field of ASP.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
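To make the connection between logic programs and Boolean networks concrete, here is a small brute-force sketch. It assumes one natural encoding, which may differ from the paper's construction: each atom becomes a network node whose update function is the disjunction of the bodies of the rules with that atom as head. Under this encoding the fixed points of the network are the supported models of the program, and every stable model is among them. The rule representation `(head, positive_body, negative_body)` and all function names are hypothetical.

```python
from itertools import combinations

def atoms_of(rules):
    """Collect every atom mentioned in the program."""
    atoms = set()
    for head, pos, neg in rules:
        atoms |= {head} | set(pos) | set(neg)
    return sorted(atoms)

def network_step(rules, state):
    """Synchronous update: an atom becomes true iff some rule body for it holds."""
    return frozenset(
        head for head, pos, neg in rules
        if set(pos) <= state and not (set(neg) & state)
    )

def all_states(atoms):
    """Enumerate every subset of atoms (exponential; for tiny examples only)."""
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            yield frozenset(combo)

def fixed_points(rules):
    """Fixed points of the assumed Boolean network (supported models)."""
    return [s for s in all_states(atoms_of(rules)) if network_step(rules, s) == s]

def least_model(positive_rules):
    """Least model of a negation-free program by naive forward chaining."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if set(pos) <= model and head not in model:
                model.add(head)
                changed = True
    return frozenset(model)

def is_stable(rules, candidate):
    """Gelfond-Lifschitz check: the least model of the reduct equals the candidate."""
    reduct = [(h, pos) for h, pos, neg in rules if not (set(neg) & candidate)]
    return least_model(reduct) == candidate

def stable_models(rules):
    return [s for s in all_states(atoms_of(rules)) if is_stable(rules, s)]

if __name__ == "__main__":
    # a :- not b.   b :- not a.   p :- p.
    program = [("a", [], ["b"]), ("b", [], ["a"]), ("p", ["p"], [])]
    print("fixed points:", [sorted(s) for s in fixed_points(program)])
    print("stable models:", [sorted(s) for s in stable_models(program)])
```

In the example program the self-supporting rule `p :- p` yields fixed points containing `p` that are not stable models, which shows why fixed points only over-approximate stable models in this encoding.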
# Microsoft Copilotによるセキュリティ運用センターのためのAI駆動ガイド応答 AI-Driven Guided Response for Security Operation Centers with Microsoft Copilot for Security ( http://arxiv.org/abs/2407.09017v1 ) ライセンス: Link先を確認	Scott Freitas, Jovan Kalajdjieski, Amir Gharib, Rob McCann,	(参考訳) セキュリティオペレーションセンターは、単純なものから非常に複雑なものまで、セキュリティインシデントの絶え間ないストリームと競合する。この問題を解決するために、業界規模のMLアーキテクチャであるCopilot Guided Response(CGR)を開発した。これは、(1)類似のインシデントを特定して不可欠な履歴コンテキストを提供する調査、(2)インシデントが真陽性、偽陽性、良性陽性のいずれであるかを判断するトリアージ、(3)状況に合わせた封じ込めアクションを推奨する修復、という3つの重要なタスクにわたって、セキュリティアナリストをガイドするものだ。 CGRはMicrosoft Defender XDR製品に統合され、世界中でデプロイされ、何千もの顧客に対して数百万のレコメンデーションを生成する。内部評価、セキュリティ専門家とのコラボレーション、顧客からのフィードバックを取り入れた大規模な評価は、CGRが3つのタスクすべてにわたって高品質なレコメンデーションを提供することを示すものです。我々は、CGRアーキテクチャの概要を包括的に紹介し、このような詳細でこれらの機能をオープンに議論した最初のサイバーセキュリティ企業として、先例を定めている。さらに、現実のセキュリティインシデントに関する最大の公開コレクションであるGUIDEは、100万件の注釈付きインシデントにわたる1300万件のエビデンスを収録しています。研究者や実践者が現実世界のデータの研究を行うことで、GUIDEはサイバーセキュリティの状態を前進させ、次世代の機械学習システムの開発をサポートする。 Security operation centers contend with a constant stream of security incidents, ranging from straightforward to highly complex. To address this, we developed Copilot Guided Response (CGR), an industry-scale ML architecture that guides security analysts across three key tasks -- (1) investigation, providing essential historical context by identifying similar incidents; (2) triaging to ascertain the nature of the incident -- whether it is a true positive, false positive, or benign positive; and (3) remediation, recommending tailored containment actions. CGR is integrated into the Microsoft Defender XDR product and deployed worldwide, generating millions of recommendations across thousands of customers. Our extensive evaluation, incorporating internal evaluation, collaboration with security experts, and customer feedback, demonstrates that CGR delivers high-quality recommendations across all three tasks. 
We provide a comprehensive overview of the CGR architecture, setting a precedent as the first cybersecurity company to openly discuss these capabilities in such depth. Additionally, we introduce GUIDE, the largest public collection of real-world security incidents, spanning 13M pieces of evidence across 1M annotated incidents. By enabling researchers and practitioners to conduct research on real-world data, GUIDE advances the state of cybersecurity and supports the development of next-generation machine learning systems.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# AUITestAgent: 自動要件指向GUI機能テスト AUITestAgent: Automatic Requirements Oriented GUI Function Testing ( http://arxiv.org/abs/2407.09018v1 ) ライセンス: Link先を確認	Yongxiang Hu, Xuan Wang, Yingchuan Wang, Yu Zhang, Shiyu Guo, Chaoyi Chen, Xin Wang, Yangfan Zhou,	(参考訳) Graphical User Interface (GUI)は、ユーザがモバイルアプリと対話する方法である。適切に機能するためには、テストエンジニアは、通常自然言語で書かれたテスト要件に基づいて、意図した通りに機能するようにする必要がある。広く採用されている手動テストとスクリプトベースの手法は有効であるが、モダンなモバイルアプリではGUIページの多さと迅速なイテレーションのため、かなりの努力を要する。本稿では,モバイル向け初の自動自然言語駆動型GUIテストツールであるAUITestAgentについて紹介する。テスト要件は通常、インタラクションコマンドと検証オラクルを含む。 AUITestAgentは動的に整理されたエージェントを介してテスト要件からGUIインタラクションを抽出できる。次に、AUITestAgentは多次元データ抽出戦略を使用して、インタラクショントレースからテスト要件に関連するデータを検索し、検証を行う。カスタマイズされたベンチマークの実験では、AUITestAgentは生成されたGUIインタラクションの品質で既存のツールよりも優れており、検証の精度は94%に達した。さらに、Meituanのフィールドデプロイメントでは、AUITestAgentの実用的ユーザビリティが示されており、2ヶ月で10回の回帰テスト中に4つの新しい機能バグを検出する。 The Graphical User Interface (GUI) is how users interact with mobile apps. To ensure that it works properly, testing engineers must verify that it functions as intended, based on test requirements that are typically written in natural language. While widely adopted manual testing and script-based methods are effective, they demand substantial effort due to the vast number of GUI pages and rapid iterations in modern mobile apps. This paper introduces AUITestAgent, the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. Since test requirements typically contain interaction commands and verification oracles, AUITestAgent can extract GUI interactions from test requirements via dynamically organized agents. Then, AUITestAgent employs a multi-dimensional data extraction strategy to retrieve data relevant to the test requirements from the interaction trace and perform verification. 
Experiments on customized benchmarks demonstrate that AUITestAgent outperforms existing tools in the quality of generated GUI interactions and achieves a verification accuracy of 94%. Moreover, field deployment at Meituan has shown AUITestAgent's practical usability: it detected 4 new functional bugs during 10 regression tests in two months.	翻訳日:2024-07-16 00:36:46 公開日:2024-07-12
# ソーシャルメディア上での解釈型抑うつ検出のためのプロンプト学習を用いた不均質なサブグラフネットワーク Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media ( http://arxiv.org/abs/2407.09019v1 ) ライセンス: Link先を確認	Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin,	(参考訳) ソーシャルメディアの膨大なデータは、人々の真正な思考、感情、コミュニケーションなどを反映し、うつ病などの精神疾患の早期発見のために分析することができる。ソーシャルメディアにおける初期うつ病検出に関する既存の研究は、解釈可能性に欠け、ソーシャルメディアデータの異質性を無視した。さらに、ユーザ間のグローバルなインタラクションも見落としていた。これらの課題に対処するために,不均質なサブグラフネットワークとPrompt Learning(HSNPL)とコントラスト学習機構を活用した新しい手法を開発した。具体的には、ユーザの暗黙的な心理的シンボルを解釈しやすくマッピングするために、迅速な学習が使用され、深い意味と多様な行動特徴が異種情報ネットワークに組み込まれている。そして、二重注意機構を有する異種グラフネットワークを構築し、特徴レベルにおける異種社会情報間の関係をモデル化する。さらに、ユーザレベルでのユーザとグループ間の複雑な相互作用を探索するために、サブグラフ注意と自己教師付きコントラスト学習を統合した異種サブグラフネットワークを開発した。その結果,提案手法はソーシャルメディア上での抑うつ検出の最先端手法よりも優れていた。 Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users. To address these issues, we develop a novel method that leverages a Heterogeneous Subgraph Network with Prompt Learning(HSNPL) and contrastive learning mechanisms. Specifically, prompt learning is employed to map users' implicit psychological symbols with excellent interpretability while deep semantic and diverse behavioral features are incorporated by a heterogeneous information network. Then, the heterogeneous graph network with a dual attention mechanism is constructed to model the relationships among heterogeneous social information at the feature level. Furthermore, the heterogeneous subgraph network integrating subgraph attention and self-supervised contrastive learning is developed to explore complicated interactions among users and groups at the user level. 
Extensive experimental results demonstrate that our proposed method significantly outperforms state-of-the-art methods for depression detection on social media.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# 3M-Health:メンタルヘルス検出のためのマルチモーダルマルチテラー知識蒸留 3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection ( http://arxiv.org/abs/2407.09020v1 ) ライセンス: Link先を確認	Rina Carines Cabral, Siwen Luo, Soyeon Caren Han, Josiah Poon,	(参考訳) メンタルヘルスの分類の重要性は現代社会において最重要であり、デジタルプラットフォームは個人の健康をモニタリングするための重要な情報源となっている。しかし、既存のソーシャルメディアのメンタルヘルスデータセットは、主にテキストのみのサンプルで構成されており、そのようなデータに基づいてトレーニングされたモデルの有効性を制限する可能性がある。人間は複雑な状況や問題を理解するために横断的な情報を活用することを認識して、現在の方法論の限界に対処するための新しいアプローチを提案する。本研究では, メンタルヘルス分類のためのマルチモーダル・マルチモーダル知識蒸留モデルを提案する。多様な特徴を統合するための単純な結合にしばしば依存する従来のアプローチとは異なり、我々のモデルは様々な性質(例えばテキストや音)の入力を適切に表現するという課題に対処する。すべての機能をひとつのモデルに統合する際の計算複雑性を軽減するために,マルチモーダル・マルチ教師アーキテクチャを採用する。複数の教員にまたがって学習過程を分散し、それぞれが特定の特徴抽出の側面を特化することにより、メンタルヘルスの全体的分類性能を向上させる。実験により,性能向上のためのモデルの有効性を実証した。関連するすべてのコードは、出版時に利用可能になる。 The significance of mental health classification is paramount in contemporary society, where digital platforms serve as crucial sources for monitoring individuals' well-being. However, existing social media mental health datasets primarily consist of text-only samples, potentially limiting the efficacy of models trained on such data. Recognising that humans utilise cross-modal information to comprehend complex situations or issues, we present a novel approach to address the limitations of current methodologies. In this work, we introduce a Multimodal and Multi-Teacher Knowledge Distillation model for Mental Health Classification, leveraging insights from cross-modal human understanding. Unlike conventional approaches that often rely on simple concatenation to integrate diverse features, our model addresses the challenge of appropriately representing inputs of varying natures (e.g., texts and sounds). To mitigate the computational complexity associated with integrating all features into a single model, we employ a multimodal and multi-teacher architecture. 
By distributing the learning process across multiple teachers, each specialising in a particular feature extraction aspect, we enhance the overall mental health classification performance. Through experimental validation, we demonstrate the efficacy of our model in achieving improved performance. All relevant code will be made available upon publication.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# 効率的な連続制御のためのQ関数付き拡散挙動の調整 Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control ( http://arxiv.org/abs/2407.09024v1 ) ライセンス: Link先を確認	Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu,	(参考訳) 言語モデルアライメントの最近の進歩に基づき、オフライン強化学習を2段階最適化問題として定式化します。まず、報酬のない行動データセットに対して表現豊かな生成ポリシーを事前訓練し、次に、これらのポリシーをQ値のようなタスク固有のアノテーションに合わせるように微調整します。この戦略により、多種多様な行動データを活用し、一般化を強化し、最小限のアノテーションを使って下流タスクへの迅速な適応を可能にする。特に,連続制御問題を解くための効率的な拡散アライメント(EDA)を導入する。 EDAは拡散モデルを用いて行動モデリングを行う。しかし、従来のアプローチとは異なり、我々は拡散ポリシーを行動入力に対するスカラーニューラルネットワークの微分として表現する。この表現は拡散モデルの直接密度計算を可能にするため、既存のLLMアライメント理論と互換性がある。ポリシーの微調整中に、直接優先度最適化(DPO)のような嗜好に基づくアライメント手法を拡張して、拡散挙動を連続的なQ-関数と整合させる。 D4RL ベンチマークによる評価の結果,EDA は全体の性能においてすべての基準手法を超越していることがわかった。特に、EDAは微調整中にQラベル付きデータのわずか1%しか与えられなくても、95%程度のパフォーマンスを維持し、いくつかのベースラインを上回ります。 Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: first pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them compatible with existing LLM alignment theories. During policy fine-tuning, we extend preference-based alignment methods like Direct Preference Optimization (DPO) to align diffusion behaviors with continuous Q-functions. Our evaluation on the D4RL benchmark shows that EDA exceeds all baseline methods in overall performance. 
Notably, EDA maintains about 95% of performance and still outperforms several baselines given only 1% of Q-labelled data during fine-tuning.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# SpreadsheetLLM: 大規模言語モデルのためのスプレッドシートのエンコード SpreadsheetLLM: Encoding Spreadsheets for Large Language Models ( http://arxiv.org/abs/2407.09025v1 ) ライセンス: Link先を確認	Yuzhang Tian, Jianbo Zhao, Haoyu Dong, Junyu Xiong, Shiyu Xia, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang,	(参考訳) 広範な2次元グリッド、様々なレイアウト、多様なフォーマットオプションを備えたスプレッドシートは、大きな言語モデル(LLM)において顕著な課題を提示する。そこで我々は,スプレッドシート上でのLLMの強力な理解と推論能力の解放と最適化を目的とした,効率的な符号化手法であるSpreadsheetLLMを紹介した。まず、セルアドレス、値、フォーマットを組み込んだバニラシリアライズ手法を提案する。しかし、このアプローチはLLMのトークン制約によって制限され、ほとんどのアプリケーションでは実用的ではない。この課題に対処するために,LLMのスプレッドシートを効果的に圧縮する革新的な符号化フレームワークである SheetCompressor を開発した。構造アンカーベースの圧縮、逆インデックス変換、データフォーマット対応アグリゲーションの3つのモジュールで構成されている。これはスプレッドシートテーブル検出タスクのパフォーマンスを大幅に改善し、GPT4のコンテキスト内学習環境ではバニラアプローチを25.6%上回った。さらに、シート圧縮機を用いた微調整LDMの圧縮率は平均25倍であるが、最先端の78.9%のF1スコアを達成し、既存のモデルでは12.3%を上回っている。最後に、スプレッドシート理解の下流タスクのためのスプレッドシートのチェーンを提案し、新しい要求のスプレッドシートQAタスクで検証する。我々はスプレッドシートのレイアウトと構造を手法的に利用し、スプレッドシートLLMが様々なスプレッドシートタスクにおいて極めて有効であることを示す。 Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4's in-context learning setting. 
Moreover, an LLM fine-tuned with SheetCompressor has an average compression ratio of 25 times yet achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate it on a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
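The difference between the vanilla serialization and an inverse-index style encoding can be illustrated with a short sketch. This is an assumption-laden toy, not the SheetCompressor implementation: cell formats are left out for brevity, the `address,value` text layout and the `|` separator are invented for the example, and the function names are hypothetical. It only shows why grouping addresses by value and skipping empty cells shortens the encoded sheet.

```python
from collections import defaultdict

def column_letter(index):
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def vanilla_serialize(grid):
    """Emit every cell, including empty ones, as 'address,value' joined by '|'."""
    parts = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            parts.append(f"{column_letter(c)}{r + 1},{value}")
    return "|".join(parts)

def inverse_index(grid):
    """Map each distinct non-empty value to the list of addresses holding it."""
    index = defaultdict(list)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != "":
                index[value].append(f"{column_letter(c)}{r + 1}")
    return dict(index)

if __name__ == "__main__":
    sheet = [
        ["Year", "Sales", ""],
        ["2023", "10", ""],
        ["2024", "10", ""],
    ]
    print(vanilla_serialize(sheet))
    print(inverse_index(sheet))
```

On sparse sheets with many repeated or empty cells, the dictionary form grows with the number of distinct values rather than with the grid area, which is the intuition behind the token savings the abstract reports.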
# HPC: ボリュームビデオのための階層的プログレッシブコーディングフレームワーク HPC: Hierarchical Progressive Coding Framework for Volumetric Video ( http://arxiv.org/abs/2407.09026v1 ) ライセンス: Link先を確認	Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang,	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)に基づくボリュームビデオは、様々な3Dアプリケーションにとって大きな可能性を秘めている。現在のNeRF圧縮は、様々なネットワークとデバイス容量のための単一のモデル内でビデオ品質とビットレートを調整する柔軟性に欠ける。これらの問題に対処するために,HPCを提案する。HPCは,単一のモデルを用いて可変ビットレートを実現する新しい階層的なプログレッシブボリュームビデオ符号化フレームワークである。具体的には、HPCは、多分解能残留放射場を持つ階層表現を導入し、様々な詳細レベルを同時に生成しながら、長期化シーケンスにおける時間的冗長性を減少させる。そこで本稿では,階層的表現と圧縮の両面を協調的に最適化するマルチレート歪み損失関数を用いたエンドツーエンドのプログレッシブ・ラーニング手法を提案する。我々のHPCは一度だけ複数の圧縮レベルを実現することができるが、現在の手法では異なるレート歪み(RD)トレードオフのために複数の固定ビットレートモデルをトレーニングする必要がある。大規模な実験により、HPCは可変ビットレートの柔軟な品質レベルを単一モデルで達成し、競争力のあるRD性能を示し、また様々なデータセットで固定ビットレートモデルよりも優れていた。 Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework achieving variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field to reduce temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. Then, we propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize both hierarchical representation and compression. Our HPC trained only once can realize multiple compression levels, while the current methods need to train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. 
Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate by a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# 異方性量子ラビスタークモデルによる量子オットーサイクルにおける臨界性の役割の探索 Exploring the role of criticality in the quantum Otto cycle fueled by the anisotropic quantum Rabi-Stark model ( http://arxiv.org/abs/2407.09027v1 ) ライセンス: Link先を確認	He-Guang Xu, Jiasen Jin, Norton G. de Almeida, G. D. de Moraes Neto,	(参考訳) 熱エンジン、冷蔵庫、ヒーター、加速器を含む量子熱機械は、量子熱力学の最前線を象徴し、熱エネルギーを有用な機械作業に変換する新しいパラダイムを提供する。量子力学の原理を活用することで、これらの機械は、再生可能エネルギーと量子コンピューティングの潜在的な応用により、古典的な機械よりも優れた効率と性能を約束する。本稿では, 理想と有限時間の両方のシナリオで動作する量子オットーエンジンについて検討し, 異方性量子Rabi-Starkモデル(AQRSM)のフレームワーク内の調和振動子と相互作用する2レベルシステムを用いて検討する。このモデルは、一次および連続の量子相転移の両方を示すことで有名である。量子熱機関に着目して、これらの相転移がAQRSMベースのエンジンの効率とパワーをクリティカルに調節し、ハーモニックスペクトルを持つ作業媒体によって駆動される量子エンジンよりも優れていることを示した。さらに, 有限時間運転における量子摩擦の影響について検討し, 実用化に向けた量子熱エンジンの最適化に関する知見を提供する。 Quantum heat machines, encompassing heat engines, refrigerators, heaters, and accelerators, represent the forefront of quantum thermodynamics, offering a novel paradigm for converting heat energy into useful mechanical work. Leveraging quantum mechanical principles, these machines promise superior efficiency and performance compared to classical counterparts, with potential applications in renewable energy and quantum computing. This paper investigates a quantum Otto engine operating in both ideal and finite-time scenarios, employing a two-level system interacting with a harmonic oscillator within the framework of the anisotropic quantum Rabi-Stark model (AQRSM) as the working medium. This model is notable for exhibiting both first-order and continuous quantum phase transitions. By focusing on quantum heat engines, our study reveals that these phase transitions critically modulate the efficiency and power of AQRSM-based engines, so that they outperform quantum engines fueled by a working medium with a harmonic spectrum. Additionally, we explore the impacts of quantum friction and conduct limit cycle analysis in finite-time operations, providing insights into optimizing quantum heat engines for practical implementation.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# 不完全データにおける感情認識の強化:新しいクロスモーダルアライメント,リコンストラクション,リファインメントフレームワーク Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework ( http://arxiv.org/abs/2407.09029v1 ) ライセンス: Link先を確認	Haoqin Sun, Shiwan Zhao, Shaokai Li, Xiangyu Kong, Xuechen Wang, Aobo Kong, Jiaming Zhou, Yong Chen, Wenjia Zeng, Yong Qin,	(参考訳) マルチモーダル感情認識システムは、モーダルデータの完全利用に大きく依存しており、モーダルデータが不完全である場合に顕著な性能低下を被る。この問題に対処するために,クロスモーダルアライメント,リコンストラクション,リファインメント(CM-ARR)フレームワークを提案する。このフレームワークは、教師なし分布に基づくコントラスト学習を利用して、不均一なモーダル分布を整列させ、相違を低減し、意味的不確実性を効果的にモデル化する。再構成フェーズは、これらの整列分布を変換し、欠落したモダリティを回復するために、正規化フローモデルを適用する。改善フェーズでは、教師付きポイントベースのコントラスト学習を用いて、意味的相関を乱し、感情的特徴をアクセントし、再構成された表現の感情的内容を強化する。 IEMOCAP と MSP-IMPROV データセットの大規模な実験により、CM-ARR の欠落と完全モダリティの両方の条件下での優れた性能が確認された。 CM-ARRは6つのモダリティのシナリオの平均として、IEMOCAPデータセットではWARが2.11%、UARが2.12%、MSP-IMPROVデータセットではWARが1.71%、UARが1.96%という絶対的な改善を実現している。 Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle missing modalities and enhance emotion recognition. This framework utilizes unsupervised distribution-based contrastive learning to align heterogeneous modal distributions, reducing discrepancies and modeling semantic uncertainty effectively. The reconstruction phase applies normalizing flow models to transform these aligned distributions and recover missing modalities. The refinement phase employs supervised point-based contrastive learning to disrupt semantic correlations and accentuate emotional traits, thereby enriching the affective content of the reconstructed representations. 
Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the superior performance of CM-ARR under conditions of both missing and complete modalities. Notably, averaged across six scenarios of missing modalities, CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the MSP-IMPROV dataset.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
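The CM-ARR abstract reports its gains in WAR and UAR without expanding the acronyms. The sketch below assumes they carry their usual meaning in speech emotion recognition: weighted average recall (equal to overall accuracy) and unweighted average recall (the mean of per-class recalls). The function name `war_uar` and the toy labels are illustrative only and do not come from the paper.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) for two equal-length label sequences.

    WAR: recall weighted by class support, i.e. overall accuracy.
    UAR: plain mean of the per-class recalls, so rare classes count equally.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    totals = Counter(y_true)  # support of each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    war = sum(hits.values()) / len(y_true)
    uar = sum(hits[c] / totals[c] for c in totals) / len(totals)
    return war, uar


# Toy, imbalanced example (hypothetical labels, not IEMOCAP data).
y_true = ["neutral"] * 6 + ["happy"] * 2 + ["sad"] * 2
y_pred = ["neutral"] * 6 + ["happy", "neutral"] + ["neutral", "neutral"]
war, uar = war_uar(y_true, y_pred)
print(f"WAR={war:.2f} UAR={uar:.2f}")
```

On imbalanced data the two metrics diverge. A model that favours the majority class keeps a high WAR while its UAR drops, which is presumably why results of this kind are reported with both.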
# CAMP: 病理学における継続的かつ適応的な学習モデル CAMP: Continuous and Adaptive Learning Model in Pathology ( http://arxiv.org/abs/2407.09030v1 ) ライセンス: Link先を確認	Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, Jin Tae Kwak,	(参考訳) 病理には多くの診断課題がある。従来の計算病理学は、それらを独立および個別の画像分類問題として定式化し、それによって計算の非効率性と高いコストをもたらす。この課題に対処するために,病理画像分類のための連続的適応学習モデル (CAMP) と呼ばれる汎用的,統一的,普遍的なフレームワークを提案する。 CAMPは、どんな分類タスクにも継続的に適応できる生成的、効率的、適応的な分類モデルであり、病理学固有の事前知識を活用し、タスク固有の知識を最小の計算コストで学習し、既存のタスクからの知識を忘れることなく得る。我々はCAMPを17の分類タスクに対して,1,171,526のパッチと11,811の病理スライドを含む22のデータセットで評価した。 CAMPは、パッチレベルとスライドレベルの両方で、幅広いデータセットとタスクに対して最先端の分類性能を達成し、従来の分類モデルと比較して、計算時間の94%とストレージメモリの85%を削減した。以上の結果から,CAMPは画像分類の根本的な変換を図り,完全にデジタル化されコンピュータ化された病理学の実践の道を開くことができることが示された。 There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pathology image classification. CAMP is a generative, efficient, and adaptive classification model that can continuously adapt to any classification task by leveraging pathology-specific prior knowledge and learning task-specific knowledge with minimal computational cost and without forgetting the knowledge from the existing tasks. We evaluated CAMP on 22 datasets, including 1,171,526 patches and 11,811 pathology slides, across 17 classification tasks. CAMP achieves state-of-the-art classification performance on a wide range of datasets and tasks at both patch- and slide-levels and reduces up to 94% of computation time and 85% of storage memory in comparison to the conventional classification models. 
Our results demonstrate that CAMP can offer a fundamental transformation in pathology image classification, paving the way for the fully digitized and computerized pathology practice.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# DRMの再検討:完全なエラー分析 DRM Revisited: A Complete Error Analysis ( http://arxiv.org/abs/2407.09032v1 ) ライセンス: Link先を確認	Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang,	(参考訳) 本研究では, 過剰パラメータ化の状況下でのDeep Ritz Method (DRM) の理論解析における基礎的な問題に取り組む。目標精度レベルが与えられた場合、勾配降下過程の出力が基礎となる偏微分方程式の真の解を指定された精度で近似するように、トレーニングサンプルの適切な数、ニューラルネットワークの鍵となるアーキテクチャパラメータ、射影勾配降下最適化手順のステップサイズ、および必要な反復回数をどのように決定できるか。 In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision?	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# ドメイン一般化セグメンテーションのためのテキストクエリ駆動型マスク変換器 Textual Query-Driven Mask Transformer for Domain Generalized Segmentation ( http://arxiv.org/abs/2407.09033v1 ) ライセンス: Link先を確認	Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim,	(参考訳) 本稿では,視覚言語モデルのテキスト埋め込みから,ドメイン不変の意味知識を活用することによって,ドメイン一般化セマンティックセグメンテーション(DGSS)に取り組む手法を提案する。我々は、変換器ベースのセグメンテーションフレームワーク(テキストオブジェクトクエリ)内で、オブジェクトクエリとしてテキスト埋め込みを使用します。これらのクエリは、DGSSにおけるピクセルグループ化のドメイン不変基底と見なされる。テキスト・オブジェクト・クエリのパワーを活用するために,テキスト・クエリ・ドリブン・マスク・トランスフォーマ (tqdm) と呼ばれる新しいフレームワークを導入する。 tqdmの目的は,(1)ドメイン不変セマンティクスを最大エンコードするテキストオブジェクトクエリを生成し,(2)高密度な視覚的特徴のセマンティクスを明確にすることである。さらに,視覚的特徴とテキスト的特徴の整合により,tqdmの有効性を向上させるために3つの正規化損失を提案する。本手法を用いることで,本モデルは興味のあるクラスに固有の意味情報を理解し,極端なドメイン(スケッチスタイルなど)に一般化することができる。我々のtqdmはGTA5$\rightarrow$Cityscapes上で68.9 mIoUを達成し、従来の最先端手法を2.5 mIoU上回った。プロジェクトのページは https://byeonghyunpak.github.io/tqdm で公開されている。 In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses to improve the efficacy of tqdm by aligning between visual and textual features. By utilizing our method, the model can comprehend inherent semantic information for classes of interest, enabling it to generalize to extreme domains (e.g., sketch style). Our tqdm achieves 68.9 mIoU on GTA5$\rightarrow$Cityscapes, outperforming the prior state-of-the-art method by 2.5 mIoU. 
The project page is available at https://byeonghyunpak.github.io/tqdm.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# GPC:ジェネレーティブおよび一般的な病理画像分類器 GPC: Generative and General Pathology Image Classifier ( http://arxiv.org/abs/2407.09035v1 ) ライセンス: Link先を確認	Anh Tien Nguyen, Jin Tae Kwak,	(参考訳) ディープラーニングは、その効率、正確性、堅牢性を改善するために、様々な計算病理アプリケーションに組み込まれている。画像分類における従来のアプローチは成功したが、重大な欠点がある。病理学には多くのタスクがあるが、タスクごとにモデルを構築する必要がある。さらに、任意のタスク固有のモデルを別のタスクに転送することは、依然として難しい問題である。本稿では,多種多様な病理画像から学習し,多数の分類タスクを統一モデルで実行することを目的とした,GPCと呼ばれるタスク依存型画像分類器を提案する。 GPCは、畳み込みニューラルネットワークとトランスフォーマーベースの言語モデルを備え、病理画像を高次元の特徴空間にマッピングし、画像からテキストへの分類機構を介して、関連するクラスラベルをテキストとして生成する。我々は,4つの病理画像分類タスクに対して,GPCを6つのデータセットで評価した。実験の結果,GPCは病理画像解析のための効率的かつ効率的な普遍的モデルの開発にかなりの可能性を秘めていることが明らかとなった。 Deep learning has been increasingly incorporated into various computational pathology applications to improve its efficiency, accuracy, and robustness. Although successful, most previous approaches for image classification have crucial drawbacks. There exist numerous tasks in pathology, but one needs to build a model per task, i.e., a task-specific model, thereby increasing the number of models, training resources, and cost. Moreover, transferring arbitrary task-specific model to another task is still a challenging problem. Herein, we propose a task-agnostic generative and general pathology image classifier, so called GPC, that aims at learning from diverse kinds of pathology images and conducting numerous classification tasks in a unified model. GPC, equipped with a convolutional neural network and a Transformer-based language model, maps pathology images into a high-dimensional feature space and generates pertinent class labels as texts via the image-to-text classification mechanism. We evaluate GPC on six datasets for four different pathology image classification tasks. Experimental results show that GPC holds considerable potential for developing an effective and efficient universal model for pathology image analysis.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# タブラルデータ分類におけるカタストロフィック・フォーミングの克服--擬似リハーサルに基づくアプローチ Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach ( http://arxiv.org/abs/2407.09039v1 ) ライセンス: Link先を確認	Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo,	(参考訳) 継続学習(CL)は、それまで獲得した知識を忘れずに、新たな知識を統合しながら、データ分散の進化に適応する上で重要な課題となる。本稿では,Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3) と呼ばれる新しい手法を提案する。 TRIL3は、プロトタイプベースのインクリメンタル生成モデルXuILVQを使用して、古い知識を保存するために合成データを生成する。合成データの適切なパーセンテージを取得し, TRIL3 と他の CL との比較を行った結果, TRIL3 の性能は, 合成データの 50% しか利用しない文献の他の選択肢に勝っていると結論できる。 Continual learning (CL) poses the important challenge of adapting to evolving data distributions without forgetting previously acquired knowledge while consolidating new knowledge. In this paper, we introduce a new methodology, coined as Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address the phenomenon of catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data to preserve old knowledge and the DNDF algorithm, which was modified to run in an incremental way, to learn classification tasks for tabular data, without storing old samples. After different tests to obtain the adequate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we can conclude that TRIL3 outperforms the other options in the literature while using only 50% synthetic data.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# 拡張ペアを用いた分子言語モデルとエキスパートトランスファー Molecule Language Model with Augmented Pairs and Expertise Transfer ( http://arxiv.org/abs/2407.09043v1 ) ライセンス: Link先を確認	Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun,	(参考訳) 最近、分子言語モデル(MoLM)による分子とそのテキスト記述の理解が、研究者の間で注目を集めている。しかし、MOLMの分野には独自の課題が存在する。 1)分子文のペア化データの限られた量と 2)専門家の専門分野による専門知識の欠如。この目的のために,我々はAMOLEを提案する。 1)構造的類似性保持損失を有する分子文対を増補し、 2) 専門知識を分子間で伝達する。様々な下流タスクに関する大規模な実験は、コンプレッション分子とその記述におけるAMOLEの優位性を示し、現実世界の薬物発見への応用の可能性を強調している。 Understanding the molecules and their textual descriptions via molecule language models (MoLM) recently got a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with structural similarity preserving loss, and 2) transfers the expertise between the molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# 人物再同定のための可変長WiFi CSI信号の時間周波数解析 Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification ( http://arxiv.org/abs/2407.09045v1 ) ライセンス: Link先を確認	Chen Mao, Chong Tan, Jingqi Hu, Min Zheng,	(参考訳) セキュリティ分野における重要な技術である人物再識別(ReID)は、セキュリティ検出や数え方において重要な役割を担っている。現在のセキュリティと監視システムは、主に視覚情報に依存しており、個人のプライバシーを侵害し、特定のシナリオにおける歩行者の外観や衣服からの干渉を受けやすい可能性がある。一方、ルータの普及はReIDに新たな可能性をもたらす。本文では, 歩行者の特徴を識別するための基礎として, WiFi信号のマルチパス伝搬特性を活用する, WiFiチャネル状態情報(CSI)を用いた手法を紹介する。本稿では、WiFi信号の周波数領域における時間領域と位相の振幅を解析し、連続的な横方向接続を通して時間周波数情報を融合し、表現とメートル法学習のための高度な目的関数を用いる可変長データを処理する2ストリームネットワーク構造を提案する。実世界で収集されたデータセットを用いて実験し、93.68%のmAPと98.13%のランク-1を達成した。 Person re-identification (ReID), as a crucial technology in the field of security, plays an important role in security detection and people counting. Current security and monitoring systems largely rely on visual information, which may infringe on personal privacy and be susceptible to interference from pedestrian appearances and clothing in certain scenarios. Meanwhile, the widespread use of routers offers new possibilities for ReID. This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# Cs2K:インクリメンタルセマンティックセグメンテーションのためのクラス固有およびクラス共有知識ガイダンス Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation ( http://arxiv.org/abs/2407.09047v1 ) ライセンス: Link先を確認	Wei Cong, Yang Cong, Yuyang Liu, Gan Sun,	(参考訳) 増分的セグメンテーションは、古いクラスの知識を維持しながら、新しく遭遇したクラスをセグメンテーションする。しかし、既存の方法もある。 1) クラス固有の知識(例えば、古いクラスプロトタイプ)からのガイダンスが欠如し、新しいクラスへの偏見につながるか、 2) クラス共有知識(すなわち古いモデルウェイト)を過度に差別せずに拘束し, 古いクラスを優先する。本稿では,モデル性能をトレードオフするために,段階的セマンティックセグメンテーションのためのクラス固有およびクラス共有知識(Cs2K)ガイダンスを提案する。具体的には、クラス固有の知識の観点から、プロトタイプからの特徴近接を利用して擬似ラベルを補正し、破滅的な忘れを克服するプロトタイプ誘導擬似ラベルを設計する。一方,従来の拡張プロトタイプを学習することで,データセット間のクラス分布を整合させるプロトタイプ誘導型クラス適応を開発した。さらに,クラス共有の知識的側面から,新しいメモリを維持しつつ,古いクラスの重みと新しいモデルの重みを統合することで,古いメモリの強化を図るための重み付き選択的統合を提案する。公開データセットの実験により,提案したCs2Kはセグメンテーション性能を著しく向上し,プラグアンドプレイであることが示された。 Incremental semantic segmentation endeavors to segment newly encountered classes while maintaining knowledge of old classes. However, existing methods either 1) lack guidance from class-specific knowledge (i.e., old class prototypes), leading to a bias towards new classes, or 2) constrain class-shared knowledge (i.e., old model weights) excessively without discrimination, resulting in a preference for old classes. In this paper, to trade off model performance, we propose the Class-specific and Class-shared Knowledge (Cs2K) guidance for incremental semantic segmentation. Specifically, from the class-specific knowledge aspect, we design a prototype-guided pseudo labeling that exploits feature proximity from prototypes to correct pseudo labels, thereby overcoming catastrophic forgetting. Meanwhile, we develop a prototype-guided class adaptation that aligns class distribution across datasets via learning old augmented prototypes. Moreover, from the class-shared knowledge aspect, we propose a weight-guided selective consolidation to strengthen old memory while maintaining new memory by integrating old and new model weights based on weight importance relative to old classes. 
Experiments on public datasets demonstrate that our proposed Cs2K significantly improves segmentation performance and is plug-and-play.	翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
# KUNPENG:Intelligent Maritimeのためのエンボディード大型モデル KUNPENG: An Embodied Large Model for Intelligent Maritime ( http://arxiv.org/abs/2407.09048v1 ) ライセンス: Link先を確認	Naiyao Wang, Tongbang Jiang, Ye Wang, Shaoyang Qiu, Bo Zhang, Xinqiang Xie, Munan Li, Chunliu Wang, Yiyang Wang, Hongxiang Ren, Ruili Wang, Hongjun Shan, Hongbo Liu,	(参考訳) インテリジェントな海運は、スマート海洋構築の不可欠なコンポーネントとして、高度な人工知能技術とデータ分析手法を深く統合し、スマート船舶、ルート最適化、安全な航行といった複数の側面を網羅し、海洋資源利用の効率化と輸送ネットワークの知性向上を目指している。しかし、複雑でダイナミックな海洋環境は、多種多様で異質な大規模データソースとともに、インテリジェントな海洋におけるリアルタイムな意思決定の課題を提示している。本稿では,6つのシステムからなるスマート海洋構築における知的海洋モデルであるKUNPENGを提案する。このモデルは、環境相互作用の認識のためのマルチソースの異種データを認識し、インテリジェントな船舶が安全と緊急の保証の下で航行行動を行い、海上でのエンボディドインテリジェンスを達成するために継続的に電力を最適化する自律的な意思決定戦略を行う。総合的な海上作業評価において、KUNPENGは優れた性能を示した。 Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, which covers multiple aspects such as smart vessels, route optimization, safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, present challenges for real-time decision-making in intelligent maritime. In this paper, we propose KUNPENG, the first-ever embodied large model for intelligent maritime in the smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data for the cognition of environmental interaction and makes autonomous decision strategies, which are used for intelligent vessels to perform navigation behaviors under safety and emergency guarantees and continuously optimize power to achieve embodied intelligence in maritime. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# マルチモーダル大言語モデルのための安全プロンプトの再利用 Refusing Safe Prompts for Multi-modal Large Language Models ( http://arxiv.org/abs/2407.09050v1 ) ライセンス: Link先を確認	Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong,	(参考訳) マルチモーダルな大規模言語モデル(MLLM)は、今日の生成AIエコシステムの基盤となり、テック大企業やスタートアップの間で激しい競争を巻き起こしている。特に、MLLMは、画像と質問からなるプロンプトが与えられたテキスト応答を生成する。最先端のMLLMは安全フィルタとアライメント技術を用いて安全でないプロンプトを拒否するが,本研究では,安全プロンプトに対する拒絶を誘導する最初の手法であるMLLM-Refusalを紹介する。特に、MLLM-Refusalは、ほとんど認識不能な拒絶摂動を最適化し、画像を付加するので、ターゲットMLLMは、摂動画像と安全な質問を含む安全なプロンプトを拒否する可能性が高い。具体的には,MLLM-Refusalを制約付き最適化問題として定式化し,その解法を提案する。本手法は,MLLM のユーザエクスペリエンスを損なう可能性を秘めているため,MLLM モデルプロバイダに対して競争上の優位性を提供する。 4つのデータセットにわたるMLLMに対するMLLM-Refusalの評価を行い、競合するMLLMが非競合MLLMに影響を与えずに安全なプロンプトを拒否する効果を示した。さらに,ガウス雑音,DiffPure,対人訓練の3つの潜在的な対策について検討した。 MLLM-Refusalの有効性を緩和できるが、競合するMLLMの精度や効率を犠牲にすることができる。コードはhttps://github.com/Sadcardation/MLLM-Refusalで入手できる。 Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-Refusal, the first method that induces refusals for safe prompts. In particular, our MLLM-Refusal optimizes a nearly-imperceptible refusal perturbation and adds it to an image, causing target MLLMs to likely refuse a safe prompt containing the perturbed image and a safe question. Specifically, we formulate MLLM-Refusal as a constrained optimization problem and propose an algorithm to solve it. Our method offers competitive advantages for MLLM model providers by potentially disrupting user experiences of competing MLLMs, since competing MLLM's users will receive unexpected refusals when they unwittingly use these perturbed images in their prompts. 
We evaluate MLLM-Refusal on four MLLMs across four datasets, demonstrating its effectiveness in causing competing MLLMs to refuse safe prompts while not affecting non-competing MLLMs. Furthermore, we explore three potential countermeasures -- adding Gaussian noise, DiffPure, and adversarial training. Our results show that they are insufficient: though they can mitigate MLLM-Refusal's effectiveness, they also sacrifice the accuracy and/or efficiency of the competing MLLM. The code is available at https://github.com/Sadcardation/MLLM-Refusal.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# DroneMOT:ドローンと物体の同時移動と検出困難を考慮したドローンによる多物体追跡 DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects ( http://arxiv.org/abs/2407.09051v1 ) ライセンス: Link先を確認	Peng Wang, Yongcai Wang, Deying Li,	(参考訳) 監視カメラのような静的なプラットフォーム上でのマルチオブジェクトトラッキング(MOT)は、様々なパラダイムが魅力的なパフォーマンスを提供するなど、大きな進歩を遂げている。しかし、ドローンのような動的プラットフォームに関しては、従来のMOT手法の有効性は著しく低下している。この減少は、(1)物体は画像平面において一般的に小さく、ぼやけており、しばしば隠蔽されているため、検出と認識が困難である、(2)ドローンが異なる角度から物体を移動して見ることにより、予測された位置の信頼性が低下し、物体の特徴が埋もれてしまうという、MOT-on-droneシナリオの特異な課題に起因している。本稿では,DroneMOTを提案する。DroneMOTは,ドローンによる物体検出の高速化と,小型でぼやけた,隠蔽された物体に対する特徴埋め込みを実現するために,ドローンの高速動作を考慮したDual-domain Integrated Attention (DIA)モジュールを提案する。次に、ドローンと物体の同時動作を考慮した、革新的な動き駆動協会(MDA)方式を導入する。 MDAでは、異なる角度から見る物体の特徴を更新するために、適応的特徴同期(AFS)技術が提示されている。さらに、物体の位置を予測するために、デュアルモーションベース予測(DMP)手法を用いる。最後に、改良された特徴埋め込みと予測された位置を統合して、オブジェクトアソシエーションを強化する。 VisDrone2019-MOTとUAVDTデータセットの総合的な評価によると、DroneMOTは、ドローン上のMOTの領域における最先端技術よりも大幅にパフォーマンスが改善されている。 Multi-object tracking (MOT) on static platforms, such as by surveillance cameras, has achieved significant progress, with various paradigms providing attractive performances. However, the effectiveness of traditional MOT methods is significantly reduced when it comes to dynamic platforms like drones. This decrease is attributed to the distinctive challenges in the MOT-on-drone scenario: (1) objects are generally small in the image plane, blurred, and frequently occluded, making them challenging to detect and recognize; (2) drones move and see objects from different angles, causing the unreliability of the predicted positions and feature embeddings of the objects. This paper proposes DroneMOT, which firstly proposes a Dual-domain Integrated Attention (DIA) module that considers the fast movements of drones to enhance the drone-based object detection and feature embedding for small-sized, blurred, and occluded objects. 
Then, an innovative Motion-Driven Association (MDA) scheme is introduced, considering the concurrent movements of both the drone and the objects. Within MDA, an Adaptive Feature Synchronization (AFS) technique is presented to update the object features seen from different angles. Additionally, a Dual Motion-based Prediction (DMP) method is employed to forecast the object positions. Finally, both the refined feature embeddings and the predicted positions are integrated to enhance the object association. Comprehensive evaluations on VisDrone2019-MOT and UAVDT datasets show that DroneMOT provides substantial performance improvements over the state-of-the-art in the domain of MOT on drones.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# MIDIからリッチタブラチュアへ:リードギター奏者の指とスティリスティックな選択を取り入れた自動生成システム From MIDI to Rich Tablatures: an Automatic Generative System incorporating Lead Guitarists' Fingering and Stylistic choices ( http://arxiv.org/abs/2407.09052v1 ) ライセンス: Link先を確認	Pierluigi Bontempi, Daniele Manerba, Alexandre D'Hooge, Sergio Canazza,	(参考訳) 弦楽器のメロディ演奏に最適な指の自動識別は、文献では(少なくとも部分的には)既に行われているが、リードエレキギターに関する特定のケースには、専用のアプローチが必要である。簡単なMIDIメロディから,指や調音,表現技術に富んだタブを生成できるシステムを提案する。基本指は,各瞬間に使用する指だけでなく,フレッティングハンドの最適な位置を導出する制約付き多属性最適化問題の解法から導かれるもので,mySongBook corpusの統計データを解析することにより,最も一般的なクリシェと生体力学的実現性,調音,表現技術を導入している。最後に、得られた出力はMusicXMLフォーマットに変換され、視覚化と使用が容易になる。提案手法の質と高い構成性は、特に器楽教育、補助的な作曲とアレンジング、および計算的表現力のある演奏モデルにおいて、いくつかの影響を及ぼす可能性がある。 Although the automatic identification of the optimal fingering for the performance of melodies on fretted string instruments has already been addressed (at least partially) in the literature, the specific case regarding lead electric guitar requires a dedicated approach. We propose a system that can generate, from simple MIDI melodies, tablatures enriched by fingerings, articulations, and expressive techniques. The basic fingering is derived by solving a constrained and multi-attribute optimization problem, which derives the best position of the fretting hand, not just the finger used at each moment. Then, by analyzing statistical data from the mySongBook corpus, the most common clichés and biomechanical feasibility, articulations, and expressive techniques are introduced. Finally, the obtained output is converted into MusicXML format, which allows for easy visualization and use. The quality of the tablatures derived and the high configurability of the proposed approach can have several impacts, in particular in the fields of instrumental teaching, assisted composition and arranging, and computational expressive music performance models.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# 高度なグラフクラスタリング手法: 包括的で詳細な分析 Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis ( http://arxiv.org/abs/2407.09055v1 ) ライセンス: Link先を確認	Timothé Watteau, Aubin Bonnefoy, Simon Illouz-Laurent, Joaquim Jusseau, Serge Iovleff,	(参考訳) グラフクラスタリングは、グラフを複数の均質なグループに分割することを目的としており、ソーシャルネットワーク分析、バイオインフォマティクス、イメージセグメンテーションといった様々な分野にまたがるアプリケーションにおいて重要な研究領域である。本稿では,従来のグラフクラスタリング手法と最近のグラフクラスタリング手法について検討する。まず、グラフ理論における重要な概念と定義を紹介する。背景のセクションでは、グラフラプラシアンやグラフ解析におけるディープラーニングの統合など、重要なトピックが取り上げられている。論文では、スペクトルクラスタリングやライデンアルゴリズムなど、従来のクラスタリング手法について論じる。次に,ディープラーニングを活用した最先端クラスタリング手法について検討した。これらの手法の総合的な比較は実験を通じて行われる。本稿では,グラフクラスタリングの実用化と今後の研究の方向性について論じる。 Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# 高エネルギー物理実験におけるジェットクラスタリングの新しい量子化 A Novel Quantum Realization of Jet Clustering in High-Energy Physics Experiments ( http://arxiv.org/abs/2407.09056v1 ) ライセンス: Link先を確認	Yongfeng Zhu, Weifeng Zhuang, Chen Qian, Yunheng Ma, Dong E. Liu, Manqi Ruan, Chen Zhou,	(参考訳) 量子技術の基礎科学への応用を探求することは、双方にとってイノベーションを育む鍵となる。高エネルギー粒子衝突ではクォークとグルーオンが生成され、すぐにジェットとして知られるコリメートされた粒子の噴流を形成する。正確なジェット・クラスタリングは、起源のクォークやグルーオンの情報を保持し、亜原子粒子の質量生成の機構を基盤とするヒッグス粒子の性質の研究の基礎を形成するため、重要である。衝突イベントを粒子をノード、角分離をエッジとするグラフにマッピングすることで、利用可能な量子資源で古典的な組合せ最適化問題に対処するハイブリッド量子古典アルゴリズムであるQuantum Approximate Optimization Algorithm (QAOA)を用いたジェットクラスタリングを初めて実現する。量子コンピュータシミュレータの30量子ビットと量子コンピュータハードウェアの6量子ビットから得られた本研究では,QAOAを用いたジェットクラスタリング性能が,小型問題に対する古典的アルゴリズムと同等かそれ以上に優れていることを示す。この研究は、ジェットクラスタリングに革命をもたらす量子コンピューティングの可能性を強調し、高エネルギー物理実験における量子コンピューティングの実践的応用を一歩近づいた。 Exploring the application of quantum technologies to fundamental sciences holds the key to fostering innovation for both sides. In high-energy particle collisions, quarks and gluons are produced and immediately form collimated particle sprays known as jets. Accurate jet clustering is crucial as it retains the information of the originating quark or gluon and forms the basis for studying properties of the Higgs boson, which underlies the mechanism of mass generation for subatomic particles. For the first time, by mapping collision events into graphs--with particles as nodes and their angular separations as edges--we realize jet clustering using the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantum-classical algorithm for addressing classical combinatorial optimization problems with available quantum resources. Our results, derived from 30 qubits on a quantum computer simulator and 6 qubits on quantum computer hardware, demonstrate that jet clustering performance with QAOA is comparable with or even better than classical algorithms for a small-sized problem. 
This study highlights the feasibility of quantum computing to revolutionize jet clustering, bringing the practical application of quantum computing in high-energy physics experiments one step closer.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
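The graph mapping in the jet-clustering abstract (particles as nodes, angular separations as edges) can be illustrated classically. The sketch below assumes a Max-Cut-style objective, a common target for QAOA, in which particles that are far apart in angle are pushed into different jets. The abstract does not state the paper's actual cost function, so this is an assumption, and the brute-force search merely stands in for the quantum optimizer. The particle coordinates and all function names are made up for illustration.

```python
import itertools
import math


def delta_r(p, q):
    """Angular separation between two particles given as (eta, phi)."""
    d_eta = p[0] - q[0]
    d_phi = abs(p[1] - q[1]) % (2 * math.pi)
    if d_phi > math.pi:  # wrap the azimuthal difference into [0, pi]
        d_phi = 2 * math.pi - d_phi
    return math.hypot(d_eta, d_phi)


def build_graph(particles):
    """Complete weighted graph: edge (i, j) carries the angular separation."""
    n = len(particles)
    return {(i, j): delta_r(particles[i], particles[j])
            for i in range(n) for j in range(i + 1, n)}


def cut_value(assignment, weights):
    """Total weight of edges whose endpoints fall in different jets."""
    return sum(w for (i, j), w in weights.items() if assignment[i] != assignment[j])


def brute_force_max_cut(n, weights):
    """Exhaustive two-jet partition; feasible only for a handful of particles."""
    best, best_val = None, -1.0
    for bits in itertools.product((0, 1), repeat=n - 1):
        assignment = (0,) + bits  # pin particle 0 to jet 0 to remove the mirror solution
        val = cut_value(assignment, weights)
        if val > best_val:
            best, best_val = assignment, val
    return best, best_val


# Two back-to-back sprays of three particles each (hypothetical toy event).
particles = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
             (-0.1, 3.2), (0.1, 3.0), (0.0, 3.3)]
weights = build_graph(particles)
assignment, value = brute_force_max_cut(len(particles), weights)
print(assignment, round(value, 3))
```

The exhaustive search scales as 2^(n-1), which is what makes this family of problems a candidate for QAOA. In a quantum version each particle would map to one qubit and the cut value would become the cost Hamiltonian.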
# PersonificationNet：カスタマイズされた被写体を人のように振る舞わせる PersonificationNet: Making customized subject act like a person ( http://arxiv.org/abs/2407.09057v1 ) ライセンス: Link先を確認	Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua,	(参考訳) 近年、カスタマイズされた生成には大きな可能性を秘めており、3～5個のユーザ提供画像を用いて、特定の被写体の新たな画像の合成をモデルに訓練している。その後のアプリケーションは、カスタマイズされた生成の柔軟性と多様性を高めるが、人のポーズのように振る舞う対象に対するきめ細かい制御は、まだ研究の欠如である。本稿では,人物像と同一のポーズを演じるために,漫画キャラクタやぬいぐるみなどの特定対象を制御できるPersonificationNetを提案する。カスタマイズされたブランチ、ポーズ条件ブランチ、構造アライメントモジュールが含まれている。具体的には、まず、カスタマイズされたブランチが特定の被写体を模倣する。第2に、ポーズ条件分岐は、人体構造情報を変種インスタンスに転送する。最後に、構造アライメントモジュールは、推論段階における人と特定被写体の間の構造ギャップをブリッジする。実験の結果,提案するPersonificationNetは最先端の手法よりも優れていた。 Recently, customized generation, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject, has shown significant potential. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over making the given subject act in a person's pose remains understudied. In this paper, we propose PersonificationNet, which can control a specified subject, such as a cartoon character or plush toy, to act in the same pose as a given reference image of a person. It contains a customized branch, a pose condition branch and a structure alignment module. Specifically, first, the customized branch mimics the specified subject's appearance. Second, the pose condition branch transfers the body structure information from the human to variant instances. Last, the structure alignment module bridges the structure gap between the human and the specified subject in the inference stage. Experimental results show our proposed PersonificationNet outperforms the state-of-the-art methods.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# テスト時間ブラリングによるドメイン適応型ビデオの劣化 Domain-adaptive Video Deblurring via Test-time Blurring ( http://arxiv.org/abs/2407.09059v1 ) ライセンス: Link先を確認	Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin,	(参考訳) ダイナミックなシーンビデオのデブロアリングは、露光プロセス中にキャプチャーされた望ましくないぼやけたアーティファクトを取り除くことを目的としている。従来のビデオデブロアリング手法は目覚ましい結果を得たが、トレーニングとテストビデオの領域差、特に実世界のシナリオで捉えた場合、大きなパフォーマンス低下に悩まされている。そこで本研究では,未確認領域におけるデブロアリングモデルに対するテスト時間微調整を実現するために,ぼやけたモデルに基づくドメイン適応方式を提案する。そこで本手法では, 対象領域の劣化モデルを校正するために, ドメイン適応型トレーニングペアを生成することができる。まず、ぼやけた入力画像から比較的シャープな領域を識別し、擬似シャープ画像とみなすための相対シャープネス検出モジュールを提案する。次に、テスト中に抽出した擬似シャープ画像に基づいて、ぼやけた画像を生成するために、ぼやけたモデルを用いる。対象データ分布に応じてぼやけた画像を合成するために, ドメイン適応型ブラ条件生成モジュールを提案し, ぼやけたモデルに対して, ドメイン固有なぼやけた条件を作成する。最後に、生成された擬似シャープとぼやけたペアを使用して、より優れた性能を得るためにデブロアリングモデルを微調整する。大規模な実験結果から,本手法は最先端のビデオデブロアリング法を大幅に改善し,実世界のビデオデブロアリングデータセットに対して最大7.54dBの性能向上を達成できることが示された。ソースコードは https://github.com/Jin-Ting-He/DADeblur で入手できる。 Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adaptation scheme based on a blurring model to achieve test-time fine-tuning for deblurring models in unseen domains. Since blurred and sharp pairs are unavailable for fine-tuning during inference, our scheme can generate domain-adaptive training pairs to calibrate a deblurring model for the target domain. First, a Relative Sharpness Detection Module is proposed to identify relatively sharp regions from the blurry input images and regard them as pseudo-sharp images. Next, we utilize a blurring model to produce blurred images based on the pseudo-sharp images extracted during testing. 
To synthesize blurred images in compliance with the target data distribution, we propose a Domain-adaptive Blur Condition Generation Module to create domain-specific blur conditions for the blurring model. Finally, the generated pseudo-sharp and blurred pairs are used to fine-tune a deblurring model for better performance. Extensive experimental results demonstrate that our approach can significantly improve state-of-the-art video deblurring methods, providing performance gains of up to 7.54dB on various real-world video deblurring datasets. The source code is available at https://github.com/Jin-Ting-He/DADeblur.	翻訳日:2024-07-16 00:17:04 公開日:2024-07-12
# Spectral Self-supervised Feature Selection ( http://arxiv.org/abs/2407.09061v1 ) License: see link	Daniel Segal, Ofir Lindenbaum, Ariel Jaffe	Choosing a meaningful subset of features from high-dimensional observations in unsupervised settings can greatly enhance the accuracy of downstream analysis, such as clustering or dimensionality reduction, and provide valuable insights into the sources of heterogeneity in a given dataset. In this paper, we propose a self-supervised graph-based approach for unsupervised feature selection. The core of our method is computing robust pseudo-labels by applying simple processing steps to the eigenvectors of the graph Laplacian. The subset of eigenvectors used for computing pseudo-labels is chosen based on a model stability criterion. We then measure the importance of each feature by training a surrogate model to predict the pseudo-labels from the observations. Our approach is shown to be robust in challenging scenarios, such as the presence of outliers and complex substructures. We demonstrate the effectiveness of our method through experiments on real-world datasets, showing its robustness across multiple domains and, in particular, its effectiveness on biological datasets.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
# Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports ( http://arxiv.org/abs/2407.09064v1 ) License: see link	Malte Tölle, Lukas Burger, Halvar Kelm, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Philipp Garthe, Stefan Groß, Anja Hennemuth, Lars Kaderali, Nina Krüger, Andreas Leha, Simon Martin, Alexander Meyer, Eike Nagel, Stefan Orwat, Clemens Scherer, Moritz Seiffert, Jan Moritz Seliger, Stefan Simm, Tim Friede, Tim Seidler, Sandy Engelhardt	Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration and interactive filtering that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as by streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We demonstrate its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data includes DICOM data (i.e. computed tomography images, electrocardiography scans) as well as annotations (i.e. calcification segmentations, pointsets and pacemaker dependency), and metadata (i.e. prosthesis and diagnoses). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. By utilizing the inherent DICOM reference system, arbitrary data types can be queried concurrently to create meaningful cohorts for clinical studies. The graphical interface as well as example structured report templates will be made publicly available.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
# New Desiderata for Direct Preference Optimization ( http://arxiv.org/abs/2407.09072v1 ) License: see link	Xiangkun Hu, Tong He, David Wipf	Large language models have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences. However, because of oft-observed instabilities when implementing these RLHF pipelines, various reparameterization techniques have recently been introduced to sidestep the need for separately learning an RL reward model. Instead, directly fine-tuning for human preferences is achieved via the minimization of a single closed-form training objective, a process originally referred to as direct preference optimization (DPO) and followed by several notable descendants. Although these methods are effective in certain real-world settings, we introduce new evaluation criteria that highlight unresolved shortcomings in the ability of existing DPO methods to interpolate between a pre-trained reference model and empirical measures of human preferences, as well as unavoidable trade-offs in how low- and high-quality responses are regularized and constraints are handled. Our insights then motivate an alternative DPO-like loss that provably mitigates these limitations. Empirical results corroborate notable aspects of our analyses.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
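For orientation, the single closed-form objective mentioned in the entry above can be sketched as follows. This is a minimal illustration of the original DPO loss for one preference pair, not the alternative loss the paper proposes (the abstract does not specify it). The function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under the
    policy being trained (logp_*) or under the frozen reference model
    (ref_logp_*). All names here are illustrative.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, measured relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form for both signs of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the chosen response, and `beta` controls how strongly the policy is held close to the reference.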
# Open Vocabulary Multi-Label Video Classification ( http://arxiv.org/abs/2407.09073v1 ) License: see link	Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi	Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single-label action classification in videos. However, previous methods fall short in holistic video understanding, which requires the ability to simultaneously recognize multiple actions and entities (e.g., objects) in the video in an open vocabulary setting. We formulate this problem as open vocabulary multi-label video classification and propose a method to adapt a pre-trained VLM such as CLIP to solve this task. We leverage large language models (LLMs) to provide semantic guidance to the VLM about class labels to improve its open vocabulary performance, with two key contributions. First, we propose an end-to-end trainable architecture that learns to prompt an LLM to generate soft attributes for the CLIP text encoder, enabling it to recognize novel classes. Second, we integrate a temporal modeling module into CLIP's vision encoder to effectively model the spatio-temporal dynamics of video concepts, and we propose a novel regularized finetuning technique to ensure strong open vocabulary classification performance in the video domain. Our extensive experimentation showcases the efficacy of our approach on multiple benchmark datasets.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
# BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation ( http://arxiv.org/abs/2407.09083v1 ) License: see link	Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He	Spiking neural networks (SNNs), which mimic biological neural systems by conveying information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time steps) have emerged recently. Nevertheless, due to the difficulty of deriving precise gradient estimates for discrete spikes with learning-based methods, a distinct accuracy gap persists between SNNs and their artificial neural network (ANN) counterparts. To address this issue, we propose a blurred knowledge distillation (BKD) technique, which leverages randomly blurred SNN features to restore and imitate the ANN features. Note that our BKD is applied to the feature map right before the last layer of the SNN, and it can also be combined with prior logits-based knowledge distillation for a maximized accuracy boost. To the best of our knowledge, in the category of learning-based methods, our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets. On the ImageNet dataset, BKDSNN outperforms the prior best results by 4.51% and 0.93% with CNN and Transformer network topologies, respectively.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
# On the Role of Discrete Tokenization in Visual Representation Learning ( http://arxiv.org/abs/2407.09087v1 ) License: see link	Tianqi Du, Yifei Wang, Yisen Wang	In the realm of self-supervised learning (SSL), masked image modeling (MIM) has gained popularity alongside contrastive learning methods. MIM involves reconstructing masked regions of input images using their unmasked portions. A notable subset of MIM methodologies employs discrete tokens as the reconstruction target, but the theoretical underpinnings of this choice remain underexplored. In this paper, we explore the role of these discrete tokens, aiming to unravel their benefits and limitations. Building upon the connection between MIM and contrastive learning, we provide a comprehensive theoretical understanding of how discrete tokenization affects the model's generalization capabilities. Furthermore, we propose a novel metric named TCAS, which is specifically designed to assess the effectiveness of discrete tokens within the MIM framework. Inspired by this metric, we contribute an innovative tokenizer design and propose a corresponding MIM method named ClusterMIM. It demonstrates superior performance on a variety of benchmark datasets and ViT backbones. Code is available at https://github.com/PKU-ML/ClusterMIM.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
# FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images ( http://arxiv.org/abs/2407.09088v1 ) License: see link	Marawan Elbatel, Keyuan Liu, Yanqi Yang, Xiaomeng Li	Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis of FD. This paper presents a novel and clinically significant application: FD detection solely from intraoral images. To achieve this, we propose FD-SOS, a novel open-set object detector for FD detection from intraoral images. FD-SOS has two novel components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA). These modules enable FD-SOS to effectively leverage external dental semantics. Experimental results show that our method outperforms existing detection methods and surpasses dental professionals by 35% in recall at the same level of precision. Code is available at https://github.com/xmed-lab/FD-SOS.	Translated: 2024-07-16 00:17:04 Published: 2024-07-12
# Diagnosing quantum transport from wave function snapshots ( http://arxiv.org/abs/2407.09092v1 ) License: see link	Devendra Singh Bhakuni, Roberto Verdel, Cristiano Muzzi, Riccardo Andreoni, Monika Aidelsburger, Marcello Dalmonte	We study nonequilibrium quantum dynamics of spin chains by employing principal component analysis (PCA) on data sets of wave function snapshots and examine how information propagates within these data sets. The quantities we employ are derived from the spectrum of the sample second moment matrix, built directly from the data sets. Our investigations of several interacting spin chains featuring distinct spin or energy transport reveal that the growth of data information spreading follows the same dynamical exponents as the underlying quantum transport of spin or energy. Specifically, our approach enables an easy, data-driven, and, importantly, interpretable diagnostic for tracking energy transport with a limited number of samples, which is usually challenging without any assumption on the form of the Hamiltonian. These observations are obtained at modest finite sizes and evolution times, in line with experimental and numerical constraints. Our framework applies directly to experimental quantum simulator data sets of dynamics in higher-dimensional systems, where classical simulation methods usually face significant limitations, and it applies equally to both near- and far-from-equilibrium quenches.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
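The data-processing step named in the entry above, building a sample second moment matrix from snapshots and inspecting its spectrum, can be illustrated with a small pure-Python sketch. It assumes each snapshot is a list of per-site values such as +1/-1 spin outcomes and normalizes by the number of samples. The abstract does not say which spectral quantities the authors track, so extracting only the leading eigenvalue by power iteration is an illustrative choice, and the function names are hypothetical.

```python
import random


def second_moment_matrix(snapshots):
    """Sample second moment matrix M[i][j] = (1/N) * sum_n x_n[i] * x_n[j]."""
    n_samples = len(snapshots)
    n_sites = len(snapshots[0])
    matrix = [[0.0] * n_sites for _ in range(n_sites)]
    for snap in snapshots:
        for i in range(n_sites):
            for j in range(n_sites):
                matrix[i][j] += snap[i] * snap[j] / n_samples
    return matrix


def leading_eigenvalue(matrix, n_iter=200, seed=0):
    """Largest eigenvalue of a symmetric positive semidefinite matrix.

    Uses power iteration from a seeded random start vector and returns the
    Rayleigh quotient of the final iterate.
    """
    size = len(matrix)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    for _ in range(n_iter):
        new = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0.0:
            return 0.0  # start vector lies in the null space of the matrix
        vec = [x / norm for x in new]
    applied = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
    return sum(v * a for v, a in zip(vec, applied))
```

Evaluating such a quantity on snapshot sets taken at successive evolution times gives a time series whose growth could then be compared with the transport exponents discussed in the abstract.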
# On Exact Bit-level Reversible Transformers Without Changing Architectures ( http://arxiv.org/abs/2407.09093v1 ) License: see link	Guoqiang Zhang, J. P. Lewis, W. B. Kleijn	In the literature, various reversible deep neural network (DNN) models have been proposed to reduce memory consumption or improve data throughput in the training process. However, almost all existing reversible DNNs either are constrained to have special structures or are constructed by considerably modifying the original DNN architectures to enable reversibility. In this work, we propose exact bit-level reversible transformers without changing the architectures in the inference procedure. The basic idea is to first treat each transformer block as the Euler integration approximation for solving an ordinary differential equation (ODE) and then incorporate the technique of bidirectional integration approximation (BDIA) (see [26] for BDIA-based diffusion inversion) into the neural architecture, together with activation quantization, to make it exactly bit-level reversible; we refer to the result as BDIA-transformer. In the training process, we let a hyper-parameter $\gamma$ in BDIA-transformer randomly take one of the two values $\{0.5, -0.5\}$ per transformer block for averaging two consecutive integration approximations, which regularizes the model and improves validation accuracy. Light-weight side information per transformer block must be stored in the forward process to account for binary quantization loss and enable exact bit-level reversibility. In the inference procedure, the expectation $\mathbb{E}(\gamma)=0$ is taken so that the resulting architecture of BDIA-transformer is identical to that of a standard transformer up to activation quantization. An empirical study indicates that BDIA-transformers notably outperform their original counterparts due to the regularization effect of the $\gamma$ parameter.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
# Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer ( http://arxiv.org/abs/2407.09094v1 ) License: see link	Yuanfei Huang, Hua Huang	Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets, and they suffer from the variability in noise distributions encountered in real-world scenarios. In this work, we propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors. This insight forms the basis for our conditional optimization framework, designed to overcome the constraints of the traditional denoising framework. To this end, we introduce a Locally Noise Prior Estimation (LoNPE) algorithm, which accurately estimates the noise prior directly from a single raw noisy image. This estimate acts as an explicit prior representation of the camera sensor's imaging environment, distinct from the image prior of scenes. Additionally, we design an auxiliary learnable LoNPE network tailored for practical application to sRGB noisy images. Leveraging the estimated noise prior, we present a novel Conditional Denoising Transformer (Condformer), which incorporates the noise prior into a conditional self-attention mechanism. This integration allows the Condformer to segment the optimization process into multiple explicit subspaces, significantly enhancing the model's generalization and flexibility. Extensive experimental evaluations on both synthetic and real-world datasets demonstrate that the proposed method achieves superior performance over current state-of-the-art methods. The source code is available at https://github.com/YuanfeiHuang/Condformer.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
# TAPFixer: Automatic Detection and Repair of Home Automation Vulnerabilities based on Negated-property Reasoning ( http://arxiv.org/abs/2407.09095v1 ) License: see link	Yinbo Yu, Yuanqi Xu, Kepu Huang, Jiajia Liu	Trigger-Action Programming (TAP) is a popular end-user programming framework in home automation (HA) systems, which makes it easy for users to customize home automation and control devices as expected. However, its simplified syntax also introduces new safety threats to HA systems through vulnerable rule interactions. Accurately fixing these vulnerabilities by logically and physically eliminating their root causes is essential before rules are deployed, yet this problem has not been well studied. In this paper, we present TAPFixer, a novel framework to automatically detect and repair rule interaction vulnerabilities in HA systems. It extracts TAP rules from HA profiles, translates them into an automaton model with physical and latency features, and performs model checking with various correctness properties. It then uses a novel negated-property reasoning algorithm to automatically infer a patch via model abstraction and refinement and model checking based on negated properties. We evaluate TAPFixer on market HA apps (1177 TAP rules and 53 properties) and find that it achieves an 86.65% success rate in repairing rule interaction vulnerabilities. We additionally recruit 23 HA users to conduct a user study that demonstrates the usefulness of TAPFixer for vulnerability repair in practical HA scenarios.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
# STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs ( http://arxiv.org/abs/2407.09096v1 ) License: see link	Yiheng Huang, Xiaowei Mao, Shengnan Guo, Yubin Chen, Youfang Lin, Huaiyu Wan	Spatial-temporal forecasting and imputation are important for real-world dynamic systems such as intelligent transportation, urban planning, and public health. Most existing methods are tailored for either forecasting or imputation individually and are not designed for both. Additionally, they are less effective for zero-shot and few-shot learning. While large language models (LLMs) have exhibited strong pattern recognition and reasoning abilities across various tasks, including few-shot and zero-shot learning, their ability to understand spatial-temporal data has been constrained by insufficient modeling of complex correlations within the data, such as temporal correlations, spatial connectivity, and non-pairwise and high-order spatial-temporal correlations. In this paper, we propose STD-LLM for understanding both spatial and temporal properties of Spatial-Temporal Data with LLMs, which is capable of performing both spatial-temporal forecasting and imputation tasks. STD-LLM understands spatial-temporal correlations via explicitly designed spatial and temporal tokenizers as well as virtual nodes. Topology-aware node embeddings are designed so that LLMs can comprehend and exploit the topological structure of the data. Additionally, to capture non-pairwise and higher-order correlations, we design a hypergraph learning module for LLMs, which can enhance the overall performance and improve efficiency. Extensive experiments demonstrate that STD-LLM exhibits strong performance and generalization capabilities across forecasting and imputation tasks on various datasets. Moreover, STD-LLM achieves promising results on both few-shot and zero-shot learning tasks.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
# Music Proofreading with RefinPaint: Where and How to Modify Compositions given Context ( http://arxiv.org/abs/2407.09099v1 ) License: see link	Pedro Ramoneda, Martin Rocamora, Taketo Akama	Autoregressive generative transformers are key in music generation, producing coherent compositions but facing challenges in human-machine collaboration. We propose RefinPaint, an iterative technique that improves the sampling process. It does this by identifying the weaker music elements using a feedback model, which then informs the choices for resampling by an inpainting model. This dual-focus methodology not only lets the machine improve its automatic inpainting generation through repeated cycles but also offers a valuable tool for humans seeking to refine their compositions with automatic proofreading. Experimental results suggest RefinPaint's effectiveness in inpainting and proofreading tasks, demonstrating its value for refining music created by both machines and humans. This approach not only facilitates creativity but also aids amateur composers in improving their work.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
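The iterative loop described in the entry above (a feedback model flags weak elements, an inpainting model resamples them) can be sketched as follows. This is a toy illustration, not the authors' implementation: `refine`, `feedback_fn`, `inpaint_fn`, `n_resample`, and `n_rounds` are hypothetical names, the rule "resample the lowest-scoring positions" is an assumption the abstract does not confirm, and the two "models" are trivial stand-ins.

```python
def refine(tokens, feedback_fn, inpaint_fn, n_resample=2, n_rounds=3):
    """Iteratively resample the positions that a feedback model rates lowest."""
    tokens = list(tokens)
    for _ in range(n_rounds):
        scores = feedback_fn(tokens)  # one quality score per position
        # Indices of the n_resample lowest-scoring tokens (ties keep index order).
        weakest = sorted(range(len(tokens)), key=lambda i: scores[i])[:n_resample]
        # The inpainting model proposes replacements for those positions only.
        proposals = inpaint_fn(tokens, weakest)
        for pos, new_token in zip(weakest, proposals):
            tokens[pos] = new_token
    return tokens


# Toy stand-ins: a "feedback model" that knows a target scale (MIDI pitches)
# and an "inpainting model" that fills the requested positions from it.
TARGET = [60, 62, 64, 65, 67]


def toy_feedback(tokens):
    return [1.0 if tok == ref else 0.0 for tok, ref in zip(tokens, TARGET)]


def toy_inpaint(tokens, positions):
    return [TARGET[pos] for pos in positions]


repaired = refine([60, 61, 64, 66, 67], toy_feedback, toy_inpaint)
```

In a human-in-the-loop setting, the scores from the feedback function could instead be shown to the composer, who chooses which flagged positions to hand to the inpainting step.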
# DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents ( http://arxiv.org/abs/2407.09103v1 ) License: see link	Thomas Constum, Pierrick Tranouez, Thierry Paquet	Information extraction from handwritten documents traditionally involves three distinct steps: Document Layout Analysis, Handwritten Text Recognition, and Named Entity Recognition. Recent approaches have attempted to integrate these steps into a single process using fully end-to-end architectures. Despite this, these integrated approaches have not yet matched the performance of language models applied to information extraction in plain text. In this paper, we introduce DANIEL (Document Attention Network for Information Extraction and Labelling), a fully end-to-end architecture integrating a language model and designed for comprehensive handwritten document understanding. DANIEL performs layout recognition, handwriting recognition, and named entity recognition on full-page documents. Moreover, it can simultaneously learn across multiple languages, layouts, and tasks. For named entity recognition, the ontology to be applied can be specified via the input prompt. The architecture employs a convolutional encoder capable of processing images of any size without resizing, paired with an autoregressive decoder based on a transformer-based language model. DANIEL achieves competitive results on four datasets, including new state-of-the-art performance on RIMES 2009 and M-POPP for Handwriting Text Recognition, and on IAM NER for Named Entity Recognition. Furthermore, DANIEL is much faster than existing approaches. We provide the source code and the weights of the trained models at https://github.com/Shulk97/daniel.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
# UserBoost: Generating User-specific Synthetic Data for Faster Enrolment into Behavioural Biometric Systems ( http://arxiv.org/abs/2407.09104v1 ) License: see link	George Webber, Jack Sturgess, Ivan Martinovic	Behavioural biometric authentication systems entail an enrolment period that is burdensome for the user. In this work, we explore generating synthetic gestures from a few real user gestures with generative deep learning, with the application of training a simple (i.e. non-deep-learned) authentication model. Specifically, we show that utilising synthetic data alongside real data can reduce the number of real datapoints a user must provide to enrol into a biometric system. To validate our methods, we use the publicly available dataset of WatchAuth, a system proposed in 2022 for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. We develop a regularised autoencoder model for generating synthetic user-specific wrist motion data representing these physical gestures, and demonstrate the diversity and fidelity of our synthetic gestures. We show that using synthetic gestures in training can improve classification ability for a real-world system. Through this technique, we can reduce the number of gestures required to enrol a user into a WatchAuth-like system by more than 40% without negatively impacting its error rates.	Translated: 2024-07-16 00:07:20 Published: 2024-07-12
# Flashアテンションによるパッケージングによるトレーニング効率の向上 Enhancing Training Efficiency Using Packing with Flash Attention ( http://arxiv.org/abs/2407.09105v1 ) ライセンス: Link先を確認	Achintya Kundu, Rhui Dih Lee, Laura Wynter, Raghu Kiran Ganti,	(参考訳) パディングは、各バッチの最長シーケンスの長さに合わせて、短いトレーニング例に特別なトークンを追加することで、LLMモデルのチューニングによく使用される。これはバッチ処理の統一性を保証するが、計算に無関係なパディングトークンを含め、GPUリソースを浪費することで非効率を導入する。一方、Hugging Face SFTトレーナーは、最大シーケンス長まで複数のトレーニング例を組み合わせるためにパッキングを使用するオプションを提供する。これにより、GPUリソースの最大活用が可能になる。しかし、各充填トレーニング例の適切なマスキングがなければ、SFTトレーナーを使用する場合、注意は正しく計算されない。私たちは、各例の適切な注意マスクで、パッキングとFlashアテンションを有効化し、分析し、このトレーニングパラダイムの利点を示します。 Padding is often used in tuning LLM models by adding special tokens to shorter training examples to match the length of the longest sequence in each batch. While this ensures uniformity for batch processing, it introduces inefficiencies by including irrelevant padding tokens in the computation and wastes GPU resources. On the other hand, the Hugging Face SFT trainer offers the option to use packing to combine multiple training examples up to the maximum sequence length. This allows for maximal utilization of GPU resources. However, without proper masking of each packed training example, attention will not be computed correctly when using SFT trainer. We enable and then analyse packing and Flash Attention with proper attention masking of each example and show the benefits of this training paradigm.	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
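The masking requirement described in the abstract above can be illustrated with a minimal sketch. It is not the Hugging Face SFT trainer's implementation or the authors' code; the function names `pack_examples`, `packed_causal_mask`, and `cumulative_lengths` are hypothetical, and the greedy "stop at the first example that does not fit" policy is an assumption made for brevity. The point is only that a token in a packed sequence should attend to earlier tokens of its own example and to nothing from its neighbours.

```python
from typing import List, Tuple


def pack_examples(examples: List[List[int]], max_len: int) -> Tuple[List[int], List[int], List[int]]:
    """Greedily concatenate tokenized examples into one sequence of at most max_len tokens.

    Returns (tokens, segment_ids, position_ids). Position ids restart at 0
    for every example so each one is encoded as if it stood alone.
    """
    tokens: List[int] = []
    segment_ids: List[int] = []
    position_ids: List[int] = []
    for seg, example in enumerate(examples):
        if len(tokens) + len(example) > max_len:
            break  # assumption: this sketch stops at the first example that does not fit
        tokens.extend(example)
        segment_ids.extend([seg] * len(example))
        position_ids.extend(range(len(example)))
    return tokens, segment_ids, position_ids


def packed_causal_mask(segment_ids: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j:
    j must not be in the future and must belong to the same example."""
    n = len(segment_ids)
    return [
        [j <= i and segment_ids[i] == segment_ids[j] for j in range(n)]
        for i in range(n)
    ]


def cumulative_lengths(segment_ids: List[int]) -> List[int]:
    """Example boundaries [0, len_0, len_0 + len_1, ...] of the packed sequence."""
    bounds = [0]
    for i in range(1, len(segment_ids)):
        if segment_ids[i] != segment_ids[i - 1]:
            bounds.append(i)
    bounds.append(len(segment_ids))
    return bounds


if __name__ == "__main__":
    toks, segs, pos = pack_examples([[1, 2, 3], [4, 5]], max_len=8)
    print(toks, segs, pos)
    print(cumulative_lengths(segs))
    for row in packed_causal_mask(segs):
        print("".join("x" if allowed else "." for allowed in row))
```

Variable-length attention kernels commonly take the example boundaries (as returned by `cumulative_lengths`) rather than a dense mask, which avoids materializing an n-by-n matrix; the dense mask is shown here only because it makes the block-diagonal structure easy to inspect.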
# MLOps: 産業4.0における複数のケーススタディ MLOps: A Multiple Case Study in Industry 4.0 ( http://arxiv.org/abs/2407.09107v1 ) ライセンス: Link先を確認	Leonhard Faubel, Klaus Schmid,	(参考訳) 機械学習(ML)が産業4.0で普及するにつれて、産業環境においてMLを本番環境に導入するための体系的なアプローチをどのように実践できるかを理解する必要性が高まっている。ここでMLOpsが活躍する。 MLOpsは、MLモデルを確実かつ効率的に開発、テスト、デプロイ、管理するために使用されるプロセス、ツール、組織構造を指す。しかし、現在、産業企業におけるMLOpsの実践的実装に関する情報が不足している。この問題に対処するため、私たちは、産業4.0環境において確立されたツールと明確に定義されたモデルデプロイプロセスを用い、MLOps専用のチームを持つ3つの大企業を対象に、MLOpsに関する複数のケーススタディを実施した。本研究は,これらの企業の産業4.0シナリオを4つ説明し,その実装と,多数のプロジェクトで直面した課題について,関連する知見を提供する。さらに、MLOpsプロセス、プロシージャ、技術、企業間のコンテキストの違いについても論じる。 As Machine Learning (ML) becomes more prevalent in Industry 4.0, there is a growing need to understand how systematic approaches to bringing ML into production can be practically implemented in industrial environments. Here, MLOps comes into play. MLOps refers to the processes, tools, and organizational structures used to develop, test, deploy, and manage ML models reliably and efficiently. However, there is currently a lack of information on the practical implementation of MLOps in industrial enterprises. To address this issue, we conducted a multiple case study on MLOps in three large companies with dedicated MLOps teams, using established tools and well-defined model deployment processes in the Industry 4.0 environment. This study describes four of the companies' Industry 4.0 scenarios and provides relevant insights into their implementation and the challenges they faced in numerous projects. Further, we discuss MLOps processes, procedures, technologies, as well as contextual variations among companies.	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
# 光共振器内で光ピンセットにより組み立てた量子ネットワークレジスタ A quantum-network register assembled with optical tweezers in an optical cavity ( http://arxiv.org/abs/2407.09109v1 ) ライセンス: Link先を確認	Lukas Hartung, Matthias Seubert, Stephan Welte, Emanuele Distante, Gerhard Rempe,	(参考訳) 量子計算と量子通信は、ユーザーに古典物理学ではアクセスできない機能を提供すると期待されている。しかし、多くの量子ビットを持つ大規模システムへのスケーラビリティは困難である。 1つの解決策は、通信キュービットに可逆的にインタフェースされる計算キュービットを含む小規模量子レジスタからなる量子ネットワークを開発することである。本稿では、光ピンセットと光格子の両方を用いて、2次元の原子配列を光共振器内に決定論的に組み立てるレジスタについて報告する。単一原子アドレッシングビームを用いて各原子からの光子放出を誘起し, 生成から検出までの効率が90$\%$に迫る多重化された原子-光子エンタングルメントを実証する。共振器を介した量子論理と組み合わせることで、本手法は分散量子情報処理への経路となりうる。 Quantum computation and quantum communication are expected to provide users with capabilities inaccessible by classical physics. However, scalability to larger systems with many qubits is challenging. One solution is to develop a quantum network consisting of small-scale quantum registers containing computation qubits that are reversibly interfaced to communication qubits. Here we report on a register that uses both optical tweezers and optical lattices to deterministically assemble a two-dimensional array of atoms in an optical cavity. Harnessing a single-atom addressing beam, we stimulate the emission of a photon from each atom and demonstrate multiplexed atom-photon entanglement with a generation-to-detection efficiency approaching 90$\%$. Combined with cavity-mediated quantum logic, our approach provides a possible route to distributed quantum information processing.	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
# AIアクセラレータ上での基礎モデルの推論最適化 Inference Optimization of Foundation Models on AI Accelerators ( http://arxiv.org/abs/2407.09111v1 ) ライセンス: Link先を確認	Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis,	(参考訳) 大規模言語モデル(LLM)を含む、トランスフォーマーアーキテクチャに基づく強力な基礎モデルは、さまざまな産業にまたがるジェネレーティブAIの新しい時代を切り開いてきた。産業と研究のコミュニティは、これらの基礎モデルに基づいて、多くの新しいアプリケーションを見てきた。このようなアプリケーションには、質問と回答、カスタマーサービス、画像とビデオの生成、コード補完などが含まれる。しかし、モデルパラメータの数が数千億に達すると、実際のシナリオにおけるデプロイには法外な推論コストと高いレイテンシが伴う。結果として、AIアクセラレータを使用したコスト効率が高く高速な推論の需要はさらに高まっている。この目的のために,本チュートリアルでは,AIアクセラレータを用いた相補的な推論最適化手法に関する総合的な議論を行っている。基本的なTransformerアーキテクチャとディープラーニングシステムフレームワークの概要から始め、高速かつメモリ効率の注意計算のためのシステム最適化手法を深く掘り下げ、AIアクセラレータに効率的に実装する方法について議論する。次に、高速トランスフォーマー推論の鍵となるアーキテクチャ要素について述べる。最後に、同じ文脈で様々なモデル圧縮と高速復号化戦略について検討する。 Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. The industry and research communities have witnessed a large number of new applications based on those foundation models. Such applications include question and answer, customer services, image and video generation, and code completions, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is ever higher. To this end, our tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. 
Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
# 単一CdSeマジックナノ結晶の分光 Spectroscopy of Single CdSe Magic-Sized Nanocrystals ( http://arxiv.org/abs/2407.09114v1 ) ライセンス: Link先を確認	Gabriel Nagamine, Julian Santen, Juri G. Crimmann, Aniket S. Mule, Andrew B. Pun, David J. Norris,	(参考訳) サイズと形状の狭いナノ結晶(NC)を提供する化学合成は、NC研究にとって重要である。これにより、個々のステップで成長する半導体結晶のクラスであるマジックサイズNC(MSNC)が研究され、単一のサイズと形状(すなわち単分散性)を提供する可能性がある。しかし、室温で測定されたCdSe MSNCの光ルミネッセンス(PL)スペクトルは、最先端の量子ドットよりも広いことが報告されている。この違いは、ライン幅を広げるMSNCのサイズが小さいことや、そのサイズが分散しているためかもしれない。ここでは、MSNCの光学性能をよりよく理解するために、単粒子分光を行う。以上の結果から,CdSe MSNCは粒子-粒子間変動を示すが,最大寄与は単一粒子線幅による。粒径や殻の異なるMSNCを調べた結果, この単一粒子の拡大は, NC表面からの音響フォノンとの励起子結合と一致していることがわかった。小さいため、この結合と残留サイズ分散の役割は、アンサンブルの放出線幅に大きな影響を及ぼす。特に、小さい(直径2.7nm)MSNCと量子ドットを比較すると、MSNCのアンサンブルPL線幅は実際よりシャープになる。サイズが小さいため、MSNCは室温で強い反膨らみの$[g^{(2)}(0) \sim 0.05]$を示す。したがって、MSNCは明るく、スペクトル的に純粋な量子エミッタのクラスであり、強い3次元閉じ込めが必要な光電子・量子情報技術への応用に有用である。 Chemical syntheses that provide nanocrystals (NCs) with narrow distributions in size and shape are critical for NC research. This has led to the investigation of magic-sized NCs (MSNCs), a class of semiconductor crystallites that grow in discrete steps, potentially offering a single size and shape (i.e., monodispersity). However, the photoluminescence (PL) spectra of CdSe MSNCs measured at room temperature have been reported to be broader than those of state-of-the-art quantum dots. This difference could be due to the smaller size of MSNCs, which broadens their line widths, or due to their residual size dispersity. To better understand the optical performance of MSNCs, here we perform single-particle spectroscopy. Our results show that, while CdSe MSNCs do exhibit particle-to-particle variations that lead to modest broadening of their ensemble emission spectra, the largest contribution comes from the single-particle line width. By examining MSNCs with different sizes and shells, we conclude that this single-particle broadening is consistent with exciton coupling to acoustic phonons from the NC surface. 
Because of their small size, this coupling and the role of residual size dispersity have a larger impact on the ensemble emission line widths. Notably, when small (<2.7 nm diameter) MSNCs and quantum dots are compared, the ensemble PL line widths of MSNCs are actually sharper. Due to their small size, MSNCs also exhibit strong anti-bunching $[g^{(2)}(0) \sim 0.05]$ at room temperature. Thus, MSNCs represent a bright, spectrally pure class of quantum emitter, useful for applications in optoelectronic and quantum-information technologies where strong three-dimensional confinement is required.	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
# ResNetの保存性を考慮した層幅関係伝播 Layer-Wise Relevance Propagation with Conservation Property for ResNet ( http://arxiv.org/abs/2407.09115v1 ) ライセンス: Link先を確認	Seitaro Otsuki, Tsumugi Iida, Félix Doublet, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi, Komei Sugiura,	(参考訳) 説明法の透明な定式化は、一般的にブラックボックスモデルであるニューラルネットワークの予測の解明に不可欠である。レイヤワイド・レバレンス・プロパゲーション(Layer-wise Relevance Propagation, LRP)は、リバレンス・スコアをバックプロパゲートすることで、モデルがアーキテクチャを通して逆向きに予測する流れを透過的に追跡する、よく確立された手法である。しかし、従来のLRPはスキップ接続の存在を十分に考慮していないため、広く使われているResNetアーキテクチャへの応用は十分に検討されていない。本研究では、スキップ接続からの出力が残留ブロックからの出力と収束する点において、Relevance Splittingを導入することで、LRPをResNetモデルに拡張する。我々の定式化はプロセス全体の保存性を保証し、生成した説明の完全性を維持する。提案手法の有効性を評価するため,ImageNetとCaltech-UCSD Birds-200-2011データセットを用いて実験を行った。本手法は, 保存性を維持しつつ, インサーション・削除スコアなどの標準評価指標の基準法よりも優れた性能を実現する。詳細はhttps://5ei74r0.github.io/lrp-for-resnet.page/で公開します。 The transparent formulation of explanation methods is essential for elucidating the predictions of neural networks, which are typically black-box models. Layer-wise Relevance Propagation (LRP) is a well-established method that transparently traces the flow of a model's prediction backward through its architecture by backpropagating relevance scores. However, the conventional LRP does not fully consider the existence of skip connections, and thus its application to the widely used ResNet architecture has not been thoroughly explored. In this study, we extend LRP to ResNet models by introducing Relevance Splitting at points where the output from a skip connection converges with that from a residual block. Our formulation guarantees the conservation property throughout the process, thereby preserving the integrity of the generated explanations. To evaluate the effectiveness of our approach, we conduct experiments on ImageNet and the Caltech-UCSD Birds-200-2011 dataset. 
Our method achieves superior performance to that of baseline methods on standard evaluation metrics such as the Insertion-Deletion score while maintaining its conservation property. We will release our code for further research at https://5ei74r0.github.io/lrp-for-resnet.page/	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
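The conservation property mentioned in the abstract above can be made concrete with a small sketch. The abstract does not state the splitting rule, so the rule used here (dividing relevance between the skip branch and the residual branch in proportion to the absolute value of each branch's output) is purely an assumption for illustration, and `split_relevance` is a hypothetical name. What the sketch does show is the invariant itself: whatever ratio is chosen, the relevance sent to the two branches must add up to the relevance that arrived at the merge point.

```python
from typing import List, Tuple


def split_relevance(
    relevance: List[float],
    skip_out: List[float],
    block_out: List[float],
    eps: float = 1e-9,
) -> Tuple[List[float], List[float]]:
    """Split relevance arriving at y = skip_out + block_out between the two branches.

    Assumed rule (not taken from the paper): each element's relevance is
    divided in proportion to |skip_out| and |block_out|; when both are
    (near) zero it is divided evenly. Each element's two shares sum to the
    incoming relevance, so the total is conserved.
    """
    r_skip: List[float] = []
    r_block: List[float] = []
    for r, s, b in zip(relevance, skip_out, block_out):
        total = abs(s) + abs(b)
        share = 0.5 if total < eps else abs(s) / total
        r_skip.append(r * share)
        r_block.append(r * (1.0 - share))
    return r_skip, r_block


if __name__ == "__main__":
    rel = [1.0, 2.0, 0.5]
    r_skip, r_block = split_relevance(rel, skip_out=[1.0, 0.0, 0.0], block_out=[3.0, 2.0, 0.0])
    print(r_skip, r_block)
    # Conservation check: the two branches together carry all incoming relevance.
    print(abs(sum(r_skip) + sum(r_block) - sum(rel)) < 1e-12)
```

Other ratios (for example a fixed even split) would satisfy the same invariant; the choice affects how the explanation is distributed across branches, not whether relevance is conserved.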
# 光位相雑音のフィードフォワードキャンセルによる量子状態伝達の促進 Enhanced quantum state transfer via feedforward cancellation of optical phase noise ( http://arxiv.org/abs/2407.09119v1 ) ライセンス: Link先を確認	Benjamin P. Maddox, Jonathan M. Mortlock, Tom R. Hepworth, Adarsh P. Raghuram, Philip D. Gregory, Alexander Guttridge, Simon L. Cornish,	(参考訳) 量子科学の多くの実験プラットフォームは、レーザー磁場による状態制御に依存している。しかし、光位相ノイズにより制御忠実度が制限されることが少なくない。これは、高周波位相ノイズがフィードバックの避けられない結果となる安定化レーザーシステムで悪化する。ここでは,超低温RbCs分子の114 THzにおけるSTIRAP状態遷移におけるレーザー位相ノイズを抑制するための光フィードフォワード法を実装した。単一分子上で100以上の状態伝達を行うことで、利用可能なレーザー強度のみによって制限された98.7(1)%の転写効率を著しく向上させる。 Many experimental platforms for quantum science depend on state control via laser fields. Frequently, however, the control fidelity is limited by optical phase noise. This is exacerbated in stabilized laser systems where high-frequency phase noise is an unavoidable consequence of feedback. Here we implement an optical feedforward technique to suppress laser phase noise in the STIRAP state transfer of ultracold RbCs molecules, across 114 THz, from a weakly bound Feshbach state to the rovibrational ground state. By performing over 100 state transfers on single molecules, we measure a significantly enhanced transfer efficiency of 98.7(1)% limited only by available laser intensity.	翻訳日:2024-07-16 00:07:20 公開日:2024-07-12
# URRL-IMVC:不完全なマルチビュークラスタリングのための統一とロバスト表現学習 URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering ( http://arxiv.org/abs/2407.09120v1 ) ライセンス: Link先を確認	Ge Teng, Ting Mao, Chen Shen, Xiang Tian, Xuesong Liu, Yaowu Chen, Jieping Ye,	(参考訳) 不完全なマルチビュークラスタリング(IMVC)は、部分的にしか利用できないマルチビューデータをクラスタリングすることを目的としている。これは、マルチビュー情報を効果的に活用し、欠落したビューの影響を緩和する、という2つの大きな課題を提起する。一般的なソリューションでは、クロスビューのコントラスト学習と、ビューリカバリの欠如が採用されている。しかし、彼らは意見の一致にのみ焦点をあてることで、貴重な補完情報を無視するか、監督が欠如しているため、信頼できない見解を提供するかのいずれかである。これらの制約に対処するため,不完全なマルチビュークラスタリングのためのUnified and Robust Representation Learning(URRL-IMVC)を提案する。 URRL-IMVCは、複数のビューや隣接するサンプルからの情報を統合することで、失われた状態を見るのに堅牢な統合埋め込みを直接学習する。第一に、クロスビューコントラスト学習の限界を克服するため、URRL-IMVCはアテンションベースのオートエンコーダフレームワークを導入し、マルチビュー情報を融合し、統合された埋め込みを生成する。第2に、URRL-IMVCは、KNNの計算とデータ拡張技術により、ビューロス条件に対する統一的な埋め込みの堅牢性を直接的に強化し、明らかに欠落したビューリカバリを不要にする。最後に、クラスタリングモジュールやエンコーダのカスタマイズなど、全体的なパフォーマンスをさらに向上するために、漸進的な改善が導入されている。提案するURRL-IMVCフレームワークを様々なベンチマークデータセット上で広範囲に評価し,その最先端性能を実証した。さらに, 設計の有効性を検証するため, 包括的アブレーション研究を行った。 Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing view recovery techniques. However, they either neglect valuable complementary information by focusing only on consensus between views or provide unreliable recovered views due to the absence of supervision. To address these limitations, we propose a novel Unified and Robust Representation Learning for Incomplete Multi-View Clustering (URRL-IMVC). URRL-IMVC directly learns a unified embedding that is robust to view missing conditions by integrating information from multiple views and neighboring samples. 
Firstly, to overcome the limitations of cross-view contrastive learning, URRL-IMVC incorporates an attention-based auto-encoder framework to fuse multi-view information and generate unified embeddings. Secondly, URRL-IMVC directly enhances the robustness of the unified embedding against view-missing conditions through KNN imputation and data augmentation techniques, eliminating the need for explicit missing view recovery. Finally, incremental improvements are introduced to further enhance the overall performance, such as the Clustering Module and the customization of the Encoder. We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance. Furthermore, comprehensive ablation studies are performed to validate the effectiveness of our design.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# 安全でないと感じたらいつでも拒否する - 分離型拒否訓練によるLLMの安全性向上 Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training ( http://arxiv.org/abs/2407.09121v1 ) ライセンス: Link先を確認	Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu,	(参考訳) 本研究では,安全チューニングデータに含まれ,安全でないコンテンツの生成を適切に拒否するモデルの能力を損なう拒否位置バイアスを特定し対処することで,大規模言語モデル(LLM)の安全チューニングの実践における重要なギャップに取り組む。本稿では, LLM が応答中のどの位置においても有害なプロンプトへの追従を拒否できるようにし, 安全性を大幅に向上させる新しいアプローチである Decoupled Refusal Training (DeRTa) を導入する。 DeRTaは,(1) 安全な応答の先頭に有害応答の一部を付加することにより,安全でないコンテンツを認識し回避するようモデルを訓練する,有害応答プレフィックス付き最尤推定(MLE),(2) 有害応答列全体を通じて一貫して潜在的な害から安全な拒否へ移行する能力をモデルに与える強化遷移最適化(RTO) という2つの新しいコンポーネントを組み込んでいる。 6つの攻撃シナリオにわたるLLaMA3およびMistralモデルファミリーを用いて実施した実証評価により,本手法は,性能を損なうことなくモデルの安全性を向上するだけでなく,攻撃に対する防御においてGPT-4などのよく知られたモデルを上回ることを示した。重要な点として,本手法は GPT-4 や LLaMA3-70B-Instruct をジェイルブレイクした最近の高度な攻撃手法 (CodeAttack など) を効果的に防御する。コードとデータは https://github.com/RobustNLP/DeRTa で確認できます。 This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. 
Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# 量子力学の通常の枠組みにおける一般化された不確実性原理 The generalized uncertainty principle within the ordinary framework of quantum mechanics ( http://arxiv.org/abs/2407.09123v1 ) ライセンス: Link先を確認	Y. V. Przhiyalkovskiy,	(参考訳) 量子力学における基底座標と運動量交換関係の適切な変形は、小さなスケールにおける重力の影響を考慮に入れた現象論的アプローチを提供する。平方運動量項の導入は、粒子位置の最小不確実性をプランク長に制限する一般化された不確実性原理をもたらす。しかし、そのような可換子の変形は形式性を大きく変え、量子力学の正準形式性とは分離する。本研究では、位置と運動量演算子の変形代数を通常の量子力学の枠組みに組み込むことができることを示した。 A proper deformation of the underlying coordinate and momentum commutation relations in quantum mechanics provides a phenomenological approach to account for the influence of gravity on small scales. Introducing the squared momentum term results in a generalized uncertainty principle, which limits the minimum uncertainty in particle position to the Planck length. However, such a deformation of the commutator significantly changes the formalism, making it separate from the canonical formalism of quantum mechanics. In this study, it is shown that the deformed algebra of position and momentum operators can be incorporated into the framework of ordinary quantum mechanics.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
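The step from a deformed commutator to a minimal position uncertainty, which the abstract above states without derivation, can be written out as a short worked calculation. The quadratic form of the commutator and the symbol $\beta$ (a positive deformation parameter) are assumptions taken from the common textbook presentation of the generalized uncertainty principle; the paper's own conventions may differ.

```latex
% Assumed illustrative deformation; \beta > 0 is a hypothetical parameter.
\begin{align}
  [\hat{x}, \hat{p}] &= i\hbar\left(1 + \beta\,\hat{p}^{2}\right), \\
  % Robertson inequality with <p^2> = (\Delta p)^2 + <p>^2:
  \Delta x\,\Delta p &\ge \frac{\hbar}{2}\left(1 + \beta(\Delta p)^{2}
      + \beta\langle\hat{p}\rangle^{2}\right), \\
  % Taking <p> = 0 and dividing by \Delta p:
  \Delta x &\ge \frac{\hbar}{2}\left(\frac{1}{\Delta p} + \beta\,\Delta p\right), \\
  % The right-hand side is minimized at \Delta p = 1/\sqrt{\beta}:
  \Delta x_{\min} &= \hbar\sqrt{\beta}.
\end{align}
```

Under these assumptions, choosing $\beta$ so that $\hbar\sqrt{\beta}$ equals the Planck length reproduces the bound described in the abstract.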
# クラスタ同期レーザネットワークを用いた分散マルチエージェント強化学習アルゴリズム Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network ( http://arxiv.org/abs/2407.09124v1 ) ライセンス: Link先を確認	Shun Kotoku, Takatomo Mihana, André Röhm, Ryoichi Horisaki,	(参考訳) マルチエージェント強化学習(MARL)は、無線ネットワークや自律運転など、さまざまな分野に適用可能な重要な原則を研究する。本稿では,MARLの最も基本的な問題であるCMAB問題に対処するフォトニクスに基づく意思決定アルゴリズムを提案する。計算機シミュレーションにより,光結合型レーザーのカオス振動とクラスタ同期が,エージェント間で情報を共有することなく協調的な意思決定を容易にし,効率よく探索と利用のバランスをとることを示した。本研究は, 単純なアルゴリズムによって制御される複雑な物理過程を活用することにより, 分散強化学習を実現する方法を示す。 Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# ディラック量子場における電荷演算子定義問題について On the Problem of Defining Charge Operators for the Dirac Quantum Field ( http://arxiv.org/abs/2407.09126v1 ) ライセンス: Link先を確認	Pablo Costa Rico, Roderich Tumulka,	(参考訳) 第二量子化ディラック方程式の標準ヒルベルト空間上で、トータル電荷(ポジトロン数マイナス電子数)に対する演算子$Q$を定義する方法が知られている。ここで、作用素 $Q_A$ は3次元物理空間における領域 $A\subseteq \mathbb{R}^3$ の電荷量を表す。 Q_A$ の自然な公式はあるが、ここで説明するように、それを数学的に正確な定義にすることは困難である。まず、$Q_A$ は級数として書くことができるが、収束は絶望的とは思えない。第二に、$A$ のいくつかの選択について、$Q_A$ が定義できるならば、その領域は真空ベクトルまたは真空から得られるベクトルのいずれかを生成および消滅演算子に多項式を適用することによって含めることができないことを示す。どちらの観測も、一般的な$A$に対する$Q_A$の存在に反対している。 It is well known how to define the operator $Q$ for the total charge (i.e., positron number minus electron number) on the standard Hilbert space of the second-quantized Dirac equation. Here we ask about operators $Q_A$ representing the charge content of a region $A\subseteq \mathbb{R}^3$ in 3d physical space. There is a natural formula for $Q_A$ but, as we explain, there are difficulties about turning it into a mathematically precise definition. First, $Q_A$ can be written as a series but its convergence seems hopeless. Second, we show for some choices of $A$ that if $Q_A$ could be defined then its domain could not contain either the vacuum vector or any vector obtained from the vacuum by applying a polynomial in creation and annihilation operators. Both observations speak against the existence of $Q_A$ for generic $A$.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# 産業プロセスモデリングにおける説明可能な人工知能のロバスト性 Robustness of Explainable Artificial Intelligence in Industrial Process Modelling ( http://arxiv.org/abs/2407.09127v1 ) ライセンス: Link先を確認	Benedikt Kantz, Clemens Staudinger, Christoph Feilmayr, Johannes Wachlmayr, Alexander Haberl, Stefan Schuster, Franz Pernkopf,	(参考訳) eXplainable Artificial Intelligence (XAI)は、ブラックボックスモデルの理解可能な説明を提供することを目的としている。本稿では,地中真実シミュレーションと感度解析に基づいて,現在のXAI手法を評価する。この目的のために、我々は、HAAP(SHAP)、LIME(Local Interpretable Model-Agnostic Explanations)、ALE(Averaged Local Effects)、Smooth Gradients(SG)といったXAI手法の限界とロバスト性をよりよく理解するために、Electric Arc Furnace(EAF)モデルを使用しました。これらのXAI法は, 各種ブラックボックスモデルに適用され, その正しさをデータ生成過程の地味感度と比較した。その結果、機械学習(ML)モデルが正確にプロセスをキャプチャする能力は、実際に、基礎となるデータ生成プロセスの説明可能性の正しさと結びついていることが判明した。さらに、XAI法とXAI法の違いが、モデル化された産業プロセスの真の感度を正確に予測する能力の相違について述べる。 eXplainable Artificial Intelligence (XAI) aims at providing understandable explanations of black box models. In this paper, we evaluate current XAI methods by scoring them based on ground truth simulations and sensitivity analysis. To this end, we used an Electric Arc Furnace (EAF) model to better understand the limits and robustness characteristics of XAI methods such as SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), as well as Averaged Local Effects (ALE) or Smooth Gradients (SG) in a highly topical setting. These XAI methods were applied to various types of black-box models and then scored based on their correctness compared to the ground-truth sensitivity of the data-generating processes using a novel scoring evaluation methodology over a range of simulated additive noise. The resulting evaluation shows that the capability of the Machine Learning (ML) models to capture the process accurately is, indeed, coupled with the correctness of the explainability of the underlying data-generating process. 
We furthermore show the differences between XAI methods in their ability to correctly predict the true sensitivity of the modeled industrial process.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# 大規模言語モデルチュータを用いた学生共振誤差の段階的検証と修正 Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors ( http://arxiv.org/abs/2407.09136v1 ) ライセンス: Link先を確認	Nico Daheim, Jakub Macina, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan,	(参考訳) 大規模言語モデル(LLM)は、高品質なパーソナライズされた教育を全員に拡大する機会を提供する。これに対する有望なアプローチは、学生の問題解決を支援するダイアログ学習モデルを構築することである。しかしながら、既存のLLMは推論問題の解法においてよく機能するが、学生の誤りを正確に検出し、これらの誤りに対するフィードバックを調整することは困難である。教師が学生の誤りを識別し、それに基づいて回答をカスタマイズする現実世界の教育実践に触発されて、学生のソリューションを検証することに集中し、そのような検証に基礎を置くことによって、教師の反応生成の全体的な品質が向上することを示す。教師がアノテートした最初のエラーステップで、1K段階の算数推論チェーンのデータセットを収集する。学生ソリューションの誤りを見つけることは、現在のモデルでは難しいことを実証的に示す。これらの誤りを検出するための検証器を複数提案し,評価する。自動評価と人的評価の両方を用いて,既存のベースラインに比べて幻覚の少ない学生の誤りに対する高度に標的を絞った応答に対して,学生のソリューション検証が生成モデルを操ることを示す。 Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach towards this means is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well in solving reasoning questions, they struggle to precisely detect student's errors and tailor their feedback to these errors. Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions and show how grounding to such verification improves the overall quality of tutor response generation. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate several verifiers for detecting these errors. Using both automatic and human evaluation we show that the student solution verifiers steer the generation model towards highly targeted responses to student errors which are more often correct with less hallucinations compared to existing baselines.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# AWRSによるニュース回避の展望:回避型レコメンダシステム A Look Into News Avoidance Through AWRS: An Avoidance-Aware Recommender System ( http://arxiv.org/abs/2407.09137v1 ) ライセンス: Link先を確認	Igor L. R. Azevedo, Toyotaro Suzumura, Yuichiro Yasui,	(参考訳) 近年、ジャーナリストはニュース記事の回避傾向の高まり、特に特定の分野における懸念を表明している。この問題はレコメンデーターシステムの台頭によって悪化している。我々の研究は、推奨システムは回避を基本要因として考えるべきであることを示唆している。我々は、ニュース記事は、露出、関連性、回避の3つの主要な要素によって特徴づけられると論じる。これらの課題に対処するために、AWRS(Avoidance-Aware Recommender System)を導入する。このフレームワークは、ニュース記事の回避がユーザの好みに関する重要な情報を伝えるという前提に基づいて、ニュースを推薦する際の回避意識を取り入れている。異なる言語(英語,ノルウェー語,日本語)における3つのニュースデータセットの評価結果から,提案手法が既存手法より優れていることを示す。 In recent years, journalists have expressed concerns about the increasing trend of news article avoidance, especially within specific domains. This issue has been exacerbated by the rise of recommender systems. Our research indicates that recommender systems should consider avoidance as a fundamental factor. We argue that news articles can be characterized by three principal elements: exposure, relevance, and avoidance, all of which are closely interconnected. To address these challenges, we introduce AWRS, an Avoidance-Aware Recommender System. This framework incorporates avoidance awareness when recommending news, based on the premise that news article avoidance conveys significant information about user preferences. Evaluation results on three news datasets in different languages (English, Norwegian, and Japanese) demonstrate that our method outperforms existing approaches.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# 正確さは必要なものばかりではない Accuracy is Not All You Need ( http://arxiv.org/abs/2407.09141v1 ) ライセンス: Link先を確認	Abhinav Dutta, Sanjeev Krishnan, Nipun Kwatra, Ramachandran Ramjee,	(参考訳) 大規模言語モデル(LLM)を量子化などの手法を用いて圧縮する場合,その妥当性を示す主要な方法は,様々なベンチマーク上でモデルの精度を測定することであり,ベースラインモデルと圧縮モデルの精度が近い場合には,品質の劣化は無視できると仮定される。しかし,ベースラインモデルと圧縮モデルの精度が類似している場合でも,回答が正解から不正解へ,またその逆へと同程度の割合で変化するフリップ現象が観察される。複数の圧縮技術,モデル,データセットにわたって指標を詳細に調査し,精度が類似している場合でも,エンドユーザから見た圧縮モデルの挙動がベースラインモデルと著しく異なることが多いことを示す。さらに,MT-Benchを用いて圧縮モデルを定性的・定量的に評価し,この自由形式の生成タスクでは圧縮モデルがベースラインモデルより著しく劣ることを示す。したがって,圧縮技術は距離指標を用いても評価されるべきであると主張する。そのような指標としてKLダイバージェンスとフリップの2つを提案し,両者がよく相関することを示す。 When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is by measuring the model's accuracy on various benchmarks. If the accuracies of the baseline model and the compressed model are close, it is assumed that there was negligible degradation in quality. However, even when the accuracy of baseline and compressed model are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in proportion. We conduct a detailed study of metrics across multiple compression techniques, models and datasets, demonstrating that the behavior of compressed models as visible to end-users is often significantly different from the baseline model, even when accuracy is similar. We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models are significantly worse than baseline models in this free-form generative task. Thus, we argue that compression techniques should also be evaluated using distance metrics. We propose two such metrics, KL-Divergence and flips, and show that they are well correlated.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
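A toy calculation helps show why equal accuracy can hide a change in behavior, as the abstract above argues. The sketch below is a reading of the abstract, not the authors' evaluation code: `flip_rate` counts items whose correctness differs between the baseline and the compressed model (in either direction), and `kl_divergence` is the standard discrete formula applied to two answer distributions. The function names and the four-item example are hypothetical.

```python
import math
from typing import List, Sequence


def accuracy(preds: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold answers."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def flip_rate(base: Sequence[str], compressed: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose correctness changes between the two models
    (correct -> incorrect or incorrect -> correct)."""
    flips = sum((b == g) != (c == g) for b, c, g in zip(base, compressed, gold))
    return flips / len(gold)


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same answer options.
    q is clamped at eps to avoid division by zero (an implementation choice)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    gold = ["A", "B", "C", "D"]
    base = ["A", "B", "C", "A"]        # wrong on the last item
    compressed = ["A", "B", "D", "D"]  # wrong on the third item instead
    print(accuracy(base, gold), accuracy(compressed, gold))
    print(flip_rate(base, compressed, gold))
    print(kl_divergence([0.7, 0.1, 0.1, 0.1], [0.4, 0.4, 0.1, 0.1]))
```

In this example both models score the same accuracy, yet half of the items changed correctness, which is the kind of divergence an accuracy comparison alone would not reveal.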
# 分散型ソフトウェア開発チームのための機密データのセキュア化 - 暗号化されたコンテナファイル Securing Confidential Data For Distributed Software Development Teams: Encrypted Container File ( http://arxiv.org/abs/2407.09142v1 ) ライセンス: Link先を確認	Tobias J. Bauer, Andreas Aßmuth,	(参考訳) 現代のソフトウェアエンジニアリングの文脈では、世界中のメンバが参加する国際チームを含むクラウドネイティブなソフトウェア開発の傾向があります。 GitHubのようなクラウドベースのバージョン管理サービスは、一般的にソースコードやその他のファイルに使われている。しかし、特定の開発者のみへのアクセスを制限するために機密データを暗号化する必要があるため、異なる企業や組織の開発者がプラットフォームを共有する場合、問題が発生する。本稿では,この問題に対処する既存のツールについて論じ,その欠点を浮き彫りにする。著者らは、他のツールで見られる欠陥を克服するために設計された、独自のソリューションであるEncrypted Container Filesを提案する。 In the context of modern software engineering, there is a trend towards Cloud-native software development involving international teams with members from all over the world. Cloud-based version management services like GitHub are commonly used for source code and other files. However, a challenge arises when developers from different companies or organizations share the platform, as sensitive data should be encrypted to restrict access to certain developers only. This paper discusses existing tools addressing this issue, highlighting their shortcomings. The authors propose their own solution, Encrypted Container Files, designed to overcome the deficiencies observed in other tools.	翻訳日:2024-07-15 23:57:34 公開日:2024-07-12
# Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off ( http://arxiv.org/abs/2407.09150v1 ) License: see link	Levente Halmosi, Bálint Mohos, Márk Jelasity	Machine learning models are vulnerable to tiny adversarial input perturbations optimized to cause a very large output error. To measure this vulnerability, we need reliable methods that can find such adversarial perturbations. For image classification models, evaluation methodologies have emerged that have stood the test of time. However, we argue that in the area of semantic segmentation, a good approximation of the sensitivity to adversarial perturbations requires significantly more effort than what is currently considered satisfactory. To support this claim, we re-evaluate a number of well-known robust segmentation models in an extensive empirical study. We propose new attacks and combine them with the strongest attacks available in the literature. We also analyze the sensitivity of the models in fine detail. The results indicate that most of the state-of-the-art models have a dramatically larger sensitivity to adversarial perturbations than previously reported. We also demonstrate a size-bias: small objects are often more easily attacked, even if the large objects are robust, a phenomenon not revealed by current evaluation metrics. Our results also demonstrate that a diverse set of strong attacks is necessary, because different models are often vulnerable to different attacks.	Translated: 2024-07-15 23:57:34 Published: 2024-07-12
# The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs ( http://arxiv.org/abs/2407.09152v1 ) License: see link	Anh Thu Maria Bui, Saskia Felizitas Brech, Natalie Hußfeldt, Tobias Jennert, Melanie Ullrich, Timo Breuer, Narjes Nikzad Khasmakhi, Philipp Schaer	Hallucination detection in Large Language Models (LLMs) is crucial for ensuring their reliability. This work presents our participation in the CLEF ELOQUENT HalluciGen shared task, where the goal is to develop evaluators for both generating and detecting hallucinated content. We explored the capabilities of four LLMs: Llama 3, Gemma, GPT-3.5 Turbo, and GPT-4, for this purpose. We also employed ensemble majority voting to incorporate all four models for the detection task. The results provide valuable insights into the strengths and weaknesses of these LLMs in handling hallucination generation and detection tasks.	Translated: 2024-07-15 23:57:34 Published: 2024-07-12
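The ensemble majority voting mentioned in this abstract can be illustrated with a minimal sketch. The label strings, the per-model verdicts, and the tie-break rule below are assumptions made for illustration; the abstract does not say how ties among four voters are resolved.

```python
from collections import Counter


def majority_vote(labels, tie_label="hallucination"):
    """Return the most common label; fall back to tie_label on a tie.

    The tie-break policy is an assumption, not taken from the paper.
    """
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


# Hypothetical per-model verdicts for a single example.
votes = {
    "Llama 3": "hallucination",
    "Gemma": "not_hallucination",
    "GPT-3.5 Turbo": "hallucination",
    "GPT-4": "hallucination",
}
print(majority_vote(list(votes.values())))  # prints "hallucination"
```

With an even number of voters a 2-2 split is possible, so any real pipeline needs an explicit tie policy such as the `tie_label` parameter used here.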
# Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion ( http://arxiv.org/abs/2407.09157v1 ) License: see link	Linhan Xia, Yicheng Yang, Ziou Chen, Zheng Yang, Shengxin Zhu	Pre-trained models learn general representations from large datasets which can be fine-tuned for specific tasks to significantly reduce training time. Pre-trained models like generative pretrained transformers (GPT), bidirectional encoder representations from transformers (BERT), and vision transformers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi-modal movie recommendation system by extracting features of the well-designed posters for each movie and the narrative text description of the movie. This system uses the BERT model to extract the information of the text modality, the ViT model to extract the information of the poster/image modality, and the Transformer architecture for feature fusion of all modalities to predict users' preference. The integration of pre-trained foundational models with some smaller data sets in downstream applications captures multi-modal content features in a more comprehensive manner, thereby providing more accurate recommendations. The efficiency of the proof-of-concept model is verified on standard benchmarks, the MovieLens 100K and 1M datasets. The prediction accuracy of user ratings is enhanced in comparison to the baseline algorithm, thereby demonstrating the potential of this cross-modal algorithm to be applied for movie or video recommendation.	Translated: 2024-07-15 23:57:34 Published: 2024-07-12
# Weakly-supervised Autism Severity Assessment in Long Videos ( http://arxiv.org/abs/2407.09159v1 ) License: see link	Abid Ali, Mahmoud Ali, Jean-Marc Odobez, Camilla Barbini, Séverine Dubuisson, Francois Bremond, Susanne Thümmler	Autism Spectrum Disorder (ASD) is a diverse collection of neurobiological conditions marked by challenges in social communication and reciprocal interactions, as well as repetitive and stereotypical behaviors. Atypical behavior patterns in a long, untrimmed video can serve as biomarkers for children with ASD. In this paper, we propose a video-based weakly-supervised method that takes spatio-temporal features of long videos to learn typical and atypical behaviors for autism detection. On top of that, we propose a shallow TCN-MLP network, which is designed to further categorize the severity score. We evaluate our method on actual evaluation videos of children with autism collected and annotated (for severity score) by clinical professionals. Experimental results demonstrate the effectiveness of behavioral biomarkers that could help clinicians in autism spectrum analysis.	Translated: 2024-07-15 23:57:34 Published: 2024-07-12
# Encoding arbitrary Ising Hamiltonians on Spatial Photonic Ising Machines ( http://arxiv.org/abs/2407.09161v1 ) License: see link	Jason Sakellariou, Alexis Askitopoulos, Georgios Pastras, Symeon I. Tsintzos	Photonic Ising Machines constitute an emergent new paradigm of computation, geared towards tackling combinatorial optimization problems that can be reduced to the problem of finding the ground state of an Ising model. Spatial Photonic Ising Machines have proven to be advantageous for simulating fully connected large-scale spin systems. However, fine control of a general interaction matrix $J$ has so far only been accomplished through eigenvalue decomposition methods that either limit the scalability or increase the execution time of the optimization process. We introduce and experimentally validate a SPIM instance that enables direct control over the full interaction matrix, enabling the encoding of Ising Hamiltonians with arbitrary couplings and connectivity. We demonstrate the conformity of the experimentally measured Ising energy with the theoretically expected values and then proceed to solve both the unweighted and weighted graph partitioning problems, showcasing a systematic convergence to an optimal solution via simulated annealing. Our approach greatly expands the applicability of SPIMs for real-world applications without sacrificing any of the inherent advantages of the system, and paves the way to encoding the full range of NP problems that are known to be equivalent to Ising models, on SPIM devices.	Translated: 2024-07-15 23:57:34 Published: 2024-07-12
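To make "finding the ground state of an Ising model" concrete, here is a small sketch that evaluates the Ising energy for an interaction matrix $J$ and finds the minimum by exhaustive search. The sign convention $H(s) = -\sum_{i<j} J_{ij} s_i s_j$ and the toy 4-spin matrix are assumptions. The brute-force search stands in for the simulated annealing the abstract mentions and is only feasible for a handful of spins.

```python
from itertools import product


def ising_energy(J, spins):
    """H(s) = -sum_{i<j} J[i][j] * s_i * s_j, with spins in {-1, +1}."""
    n = len(spins)
    return -sum(
        J[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def brute_force_ground_state(J):
    """Enumerate all 2^n spin configurations and return one with minimal energy."""
    n = len(J)
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(J, s))


# Toy symmetric coupling matrix (illustrative values only):
# spins 0-1 and 2-3 are coupled ferromagnetically, 1-2 antiferromagnetically.
J = [
    [0, 1, 0, 0],
    [1, 0, -1, 0],
    [0, -1, 0, 1],
    [0, 0, 1, 0],
]
ground = brute_force_ground_state(J)
print(ground, ising_energy(J, ground))  # prints (-1, -1, 1, 1) -3
```

The globally flipped configuration `(1, 1, -1, -1)` has the same energy, which reflects the spin-flip symmetry of a model without external field.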
# Exploring State Space and Reasoning by Elimination in Tsetlin Machine ( http://arxiv.org/abs/2407.09162v1 ) License: see link	Ahmed K. Kadhim, Ole-Christoffer Granmo, Lei Jiao, Rishad Shafik	The Tsetlin Machine (TM) has gained significant attention in Machine Learning (ML). By employing logical fundamentals, it facilitates pattern learning and representation, offering an alternative approach for developing comprehensible Artificial Intelligence (AI) with a specific focus on pattern classification in the form of conjunctive clauses. In the domain of Natural Language Processing (NLP), TM is utilised to construct word embedding and describe target words using clauses. To enhance the descriptive capacity of these clauses, we study the concept of Reasoning by Elimination (RbE) in clauses' formulation, which involves incorporating feature negations to provide a more comprehensive representation. In more detail, this paper employs the Tsetlin Machine Auto-Encoder (TM-AE) architecture to generate dense word vectors, aiming at capturing contextual information by extracting feature-dense vectors for a given vocabulary. Thereafter, the principle of RbE is explored to improve descriptivity and optimise the performance of the TM. Specifically, the specificity parameter s and the voting margin parameter T are leveraged to regulate feature distribution in the state space, resulting in a dense representation of information for each clause. In addition, we investigate the state spaces of TM-AE, especially for the forgotten/excluded features. Empirical investigations on artificially generated data, the IMDB dataset, and the 20 Newsgroups dataset showcase the robustness of the TM, with accuracy reaching 90.62% on IMDB.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs ( http://arxiv.org/abs/2407.09164v1 ) License: see link	Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren	Recently, code-oriented large language models (Code LLMs) have been widely and successfully used to simplify and facilitate code programming. With these tools, developers can easily generate the desired complete functional code based on incomplete code and natural language prompts. However, a few pioneering works revealed that these Code LLMs are also vulnerable, e.g., against backdoor and adversarial attacks. The former could induce LLMs to respond to triggers to insert malicious code snippets by poisoning the training data or model parameters, while the latter can craft malicious adversarial input code to reduce the quality of the generated code. However, both attack methods have underlying limitations: backdoor attacks rely on controlling the model training process, while adversarial attacks struggle with fulfilling specific malicious purposes. To inherit the advantages of both backdoor and adversarial attacks, this paper proposes a new attack paradigm, i.e., target-specific and adversarial prompt injection (TAPI), against Code LLMs. TAPI generates unreadable comments containing information about malicious instructions and hides them as triggers in the external source code. When users exploit Code LLMs to complete code containing the trigger, the models will generate attacker-specified malicious code snippets at specific locations. We evaluate our TAPI attack on four representative LLMs under three representative malicious objectives and seven cases. The results show that our method is highly threatening (achieving an attack success rate of up to 89.3%) and stealthy (saving an average of 53.1% of tokens in the trigger design). In particular, we successfully attack some famous deployed code completion integrated applications, including CodeGeex and GitHub Copilot. This further confirms the realistic threat of our attack.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Robust Yet Efficient Conformal Prediction Sets ( http://arxiv.org/abs/2407.09165v1 ) License: see link	Soroush H. Zargarbashi, Mohammad Sadegh Akhondzadeh, Aleksandar Bojchevski	Conformal prediction (CP) can convert any model's output into prediction sets guaranteed to include the true label with any user-specified probability. However, like the model itself, CP is vulnerable to adversarial test examples (evasion) and perturbed calibration data (poisoning). We derive provably robust sets by bounding the worst-case change in conformity scores. Our tighter bounds lead to more efficient sets. We cover both continuous and discrete (sparse) data and our guarantees work both for evasion and poisoning attacks (on both features and labels).	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
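For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard (non-robust) split conformal procedure that the abstract builds on. It is not the robust method proposed in the paper. The score `1 - p(label)`, the calibration values, and the class names are illustrative assumptions.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    cal_scores are nonconformity scores (here assumed to be 1 - p(true label))
    computed on held-out calibration data.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the order statistic
    if k > n:
        # Too few calibration points for this alpha: every label must be kept.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in probs.items() if 1 - p <= threshold}


# Hypothetical calibration scores and one test-time probability vector.
cal_scores = [0.10, 0.40, 0.20, 0.05, 0.30, 0.25, 0.15]
threshold = conformal_threshold(cal_scores, alpha=0.25)
print(threshold)  # prints 0.3
print(prediction_set({"cat": 0.75, "dog": 0.2, "bird": 0.05}, threshold))  # prints {'cat'}
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least `1 - alpha`. An adversary who perturbs test inputs or calibration data can shift the scores, which is the failure mode the paper addresses.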
# SE(3)-bi-equivariant Transformers for Point Cloud Assembly ( http://arxiv.org/abs/2407.09167v1 ) License: see link	Ziming Wang, Rebecka Jörnsten	Given a pair of point clouds, the goal of assembly is to recover a rigid transformation that aligns one point cloud to the other. This task is challenging because the point clouds may be non-overlapped, and they may have arbitrary initial positions. To address these difficulties, we propose a method, called SE(3)-bi-equivariant transformer (BITR), based on the SE(3)-bi-equivariance prior of the task: it guarantees that when the inputs are rigidly perturbed, the output will transform accordingly. Due to its equivariance property, BITR can not only handle non-overlapped point clouds, but also guarantee robustness against initial positions. Specifically, BITR first extracts features of the inputs using a novel $SE(3) \times SE(3)$-transformer, and then projects the learned feature to group SE(3) as the output. Moreover, we theoretically show that swap and scale equivariances can be incorporated into BITR, thus it further guarantees stable performance under scaling and swapping the inputs. We experimentally show the effectiveness of BITR in practical tasks.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Tensor networks enable the calculation of turbulence probability distributions ( http://arxiv.org/abs/2407.09169v1 ) License: see link	Nikita Gourianov, Peyman Givi, Dieter Jaksch, Stephen B. Pope	Predicting the dynamics of turbulent fluid flows has long been a central goal of science and engineering. Yet, even with modern computing technology, accurate simulation of all but the simplest turbulent flow-fields remains impossible: the fields are too chaotic and multi-scaled to directly store them in memory and perform time-evolution. An alternative is to treat turbulence probabilistically, viewing flow properties as random variables distributed according to joint probability density functions (PDFs). Turbulence PDFs are neither chaotic nor multi-scale, but are still challenging to simulate due to their high dimensionality. Here we show how to overcome the dimensionality problem by parameterising turbulence PDFs into an extremely compressed format known as a "tensor network" (TN). The TN paradigm enables simulations on single CPU cores that would otherwise be impractical even with supercomputers: for a $5+1$ dimensional PDF of a chemically reactive turbulent flow, we achieve reductions in memory and computational costs by factors of $\mathcal{O}(10^6)$ and $\mathcal{O}(10^3)$, respectively, compared to standard finite difference algorithms. A future path is opened towards something heretofore regarded as infeasible: directly simulating high-dimensional PDFs of both turbulent flows and other chaotic systems that are useful to describe probabilistically.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Optimal Fidelity-Aware Entanglement Distribution in Linear Quantum Networks ( http://arxiv.org/abs/2407.09171v1 ) License: see link	Iordanis Koutsopoulos	We study the problem of entanglement distribution in terms of maximizing a utility function that captures the total fidelity of end-to-end entanglements in a two-link linear quantum network with a source, a repeater, and a destination. The nodes have several quantum memories, and the problem is how to coordinate entanglement purification in each of the links, and entanglement swapping across links, so as to achieve the goal above. We show that entanglement swapping (i.e., deciding on the pair of qubits from each link to perform swapping on) is equivalent to finding a max-weight matching on a bipartite graph. Further, entanglement purification (i.e., deciding which pairs of qubits in a link will undergo purification) is equivalent to finding a max-weight matching on a non-bipartite graph. We propose two polynomial algorithms, the Purify-then-Swap (PtS) and the Swap-then-Purify (StP) ones, where the decisions about purification and swapping are taken in different order. Numerical results show that PtS performs better than StP, and also that the omission of purification in StP gives substantial benefits.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
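The reduction of entanglement swapping to max-weight bipartite matching can be illustrated on a toy instance. The sketch below is not the paper's algorithm: it enumerates all perfect matchings by brute force, which is exponential, whereas a polynomial method such as the Hungarian algorithm would be used in practice. The integer weights are made-up utilities, not fidelities derived from any physical model.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Brute-force max-weight perfect matching on a square weight matrix.

    weights[i][j] is a hypothetical utility of swapping qubit pair i on link 1
    with qubit pair j on link 2. Returns (best_total, list of (i, j) pairs).
    """
    n = len(weights)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))


# Illustrative utilities (scaled to integers to avoid float rounding).
weights = [
    [50, 90, 30],
    [80, 60, 40],
    [20, 70, 60],
]
print(max_weight_matching(weights))  # prints (230, [(0, 1), (1, 0), (2, 2)])
```

A greedy choice that pairs each row with its largest remaining entry would also pick 90 first here, but greedy matching is not optimal in general, which is why the problem is posed as a matching.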
# Machine Apophenia: The Kaleidoscopic Generation of Architectural Images ( http://arxiv.org/abs/2407.09172v1 ) License: see link	Alexey Tikhonov, Dmitry Sinyavin	This study investigates the application of generative artificial intelligence in architectural design. We present a novel methodology that combines multiple neural networks to create an unsupervised and unmoderated stream of unique architectural images. Our approach is grounded in the conceptual framework called machine apophenia. We hypothesize that neural networks, trained on diverse human-generated data, internalize aesthetic preferences and tend to produce coherent designs even from random inputs. The methodology involves an iterative process of image generation, description, and refinement, resulting in captioned architectural postcards automatically shared on several social media platforms. Evaluation and ablation studies show the improvement both in technical and aesthetic metrics of resulting images on each step.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Conformal Inductive Graph Neural Networks ( http://arxiv.org/abs/2407.09173v1 ) License: see link	Soroush H. Zargarbashi, Aleksandar Bojchevski	Conformal prediction (CP) transforms any model's output into prediction sets guaranteed to include (cover) the true label. CP requires exchangeability, a relaxation of the i.i.d. assumption, to obtain a valid distribution-free coverage guarantee. This makes it directly applicable to transductive node-classification. However, conventional CP cannot be applied in inductive settings due to the implicit shift in the (calibration) scores caused by message passing with the new nodes. We fix this issue for both cases of node and edge-exchangeable graphs, recovering the standard coverage guarantee without sacrificing statistical efficiency. We further prove that the guarantee holds independently of the prediction time, e.g. upon arrival of a new node/edge or at any subsequent moment.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training ( http://arxiv.org/abs/2407.09174v1 ) License: see link	Chen Xin, Andreas Hartel, Enkelejda Kasneci	Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, which struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while excelling in diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage where open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to guarantee credibility before serving as ground truth to train real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Exploring the Effectiveness of Methods for Persona Extraction ( http://arxiv.org/abs/2407.09181v1 ) License: see link	Konstantin Zaitsev	The paper presents a study of methods for extracting information about dialogue participants and evaluating their performance in Russian. To train models for this task, the Multi-Session Chat dataset was translated into Russian using multiple translation models, resulting in improved data quality. A metric based on the F-score concept is presented to evaluate the effectiveness of the extraction models. The metric uses a trained classifier to identify the dialogue participant to whom the persona belongs. Experiments were conducted on MBart, FRED-T5, Starling-7B, which is based on Mistral, and Encoder2Encoder models. The results demonstrated that all models exhibited an insufficient level of recall in the persona extraction task. The incorporation of the NCE Loss improved the model's precision at the expense of its recall. Furthermore, increasing the model's size led to enhanced extraction of personas.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
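The precision/recall trade-off reported in this abstract follows from the definition of the F-score. The sketch below computes a plain exact-match set F1 over extracted persona facts. It is a simplification: the paper's metric relies on a trained classifier instead of exact matching, and the persona strings are invented for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 using exact string matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical extraction: everything extracted is correct, but half is missed.
gold = ["has a dog", "likes tea", "lives in Moscow", "plays chess"]
predicted = ["has a dog", "likes tea"]
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(precision, recall, round(f1, 3))  # prints 1.0 0.5 0.667
```

A model that extracts fewer but safer facts scores high precision and low recall, which matches the pattern the abstract describes.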
# Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers ( http://arxiv.org/abs/2407.09184v1 ) License: see link	Jong Myoung Kim, Young-Jun Lee, Yong-jin Han, Sangkeun Jung, Ho-Jin Choi	Syntactic elements, such as word order and case markers, are fundamental in natural language processing. Recent studies show that syntactic information boosts language model performance and offers clues for people to understand their learning mechanisms. Unlike languages with a fixed word order such as English, Korean allows for varied word sequences, despite its canonical structure, due to case markers that indicate the functions of sentence components. This study explores whether Korean language models can accurately capture this flexibility. We note that incomplete word orders and omitted case markers frequently appear in ordinary Korean communication. To investigate this further, we introduce the Syntactically Incomplete Korean (SIKO) dataset. Through SIKO, we assessed Korean language models' flexibility with incomplete syntax and confirmed the dataset's training value. Results indicate these models reflect Korean's inherent flexibility, accurately handling incomplete inputs. Moreover, fine-tuning with SIKO enhances the ability to handle common incomplete Korean syntactic forms. The dataset's simple construction process, coupled with significant performance enhancements, solidifies its standing as an effective data augmentation technique.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Variational Inference via Smoothed Particle Hydrodynamics ( http://arxiv.org/abs/2407.09186v1 ) License: see link	Yongchao Huang	A new variational inference method, SPH-ParVI, based on smoothed particle hydrodynamics (SPH), is proposed for sampling partially known densities (e.g. up to a constant) or sampling using gradients. SPH-ParVI simulates the flow of a fluid under external effects driven by the target density; transient or steady state of the fluid approximates the target density. The continuum fluid is modelled as an interacting particle system (IPS) via SPH, where each particle carries smoothed properties, interacts and evolves as per the Navier-Stokes equations. This mesh-free, Lagrangian simulation method offers fast, flexible, scalable and deterministic sampling and inference for a class of probabilistic models such as those encountered in Bayesian inference and generative modelling.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
# Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings ( http://arxiv.org/abs/2407.09187v1 ) License: see link	Saad Ahmed Sazan, Mahdi H. Miraz, A B M Muntasir Rahman	Due to massive adoption of social media, detection of users' depression through social media analytics bears significant importance, particularly for underrepresented languages, such as Bangla. This study introduces a well-grounded approach to identify depressive social media posts in Bangla, by employing advanced natural language processing techniques. The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the model's ability to accurately detect depressive posts. We explored various numerical representation techniques, including Term Frequency-Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) embedding and FastText embedding, by integrating them with a deep learning-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results obtained through extensive experimentation indicate that the BERT approach performed better than the others, achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture, effectively recognises the nuances of Bangla texts relevant to depressive contents. Comparative analysis with the existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others in terms of evaluation metrics and the reliability of dataset annotations. Our research contributes significantly to the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy of different embedding techniques and deep learning models, this study paves the way for improved mental health monitoring through social media platforms.	Translated: 2024-07-15 23:47:49 Published: 2024-07-12
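Random oversampling of the minority class, as mentioned in this abstract, can be sketched in a few lines. The function name, the seed handling, and the toy posts are assumptions for illustration; the abstract does not describe the authors' implementation. A common precaution, assumed here and not stated in the abstract, is to oversample only the training split so that duplicated items do not leak into evaluation data.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class items until all classes match
    the size of the largest class. Original items are kept unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Toy imbalanced dataset: six non-depressive posts (0), two depressive posts (1).
posts = [f"post_{i}" for i in range(8)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_posts, balanced_labels = random_oversample(posts, labels)
print(Counter(balanced_labels))  # prints Counter({0: 6, 1: 6})
```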
# 限られたデータによる医用画像のセグメンテーション Segmenting Medical Images with Limited Data ( http://arxiv.org/abs/2407.09189v1 ) ライセンス: Link先を確認	Zhaoshan Liua, Qiujie Lv, Chau Hung Lee, Lei Shen,	(参考訳) コンピュータビジョンは、医療画像のセグメンテーションに有用であることが証明されているが、その応用は、限られたデータセットサイズや、ラベルなし画像の有効活用の複雑さといった課題に直面している。これらの課題に対処するために、データ効率医療セグメンタ(DEMS)と呼ばれる、半教師付き一貫性に基づく新しいアプローチを提案する。 DEMSはエンコーダ・デコーダアーキテクチャを備え、開発されているオンライン自動拡張器(OAA)と残留ロバストネス強化(RRE)ブロックを組み込んでいる。 OAAは入力データを様々な画像変換で拡張し、データセットを多様化して一般化能力を向上させる。 RREは特徴の多様性を豊かにし、様々なデコーダに対して様々な入力を生成するために摂動を導入する。さらに、異なるデコーダ間の一貫性をさらに向上し、トレーニングプロセスの安定化を図るために、敏感な損失を導入する。我々の公開データセットと3つの公開データセットの大規模な実験結果から、DEMの有効性が確認された。極端なデータ不足のシナリオ下では、私たちのDEMSは、それぞれU-Netとトップパフォーマンスの最先端手法と比較して、ダイススコアが16.85\%と10.37\%向上している。データ効率が優れていることから、DEMSは小さなデータ体制下での医療分野の大幅な進歩を示す可能性がある。プロジェクトのホームページはhttps://github.com/NUS-Tim/DEMSでアクセスできる。 While computer vision has proven valuable for medical image segmentation, its application faces challenges such as limited dataset sizes and the complexity of effectively leveraging unlabeled images. To address these challenges, we present a novel semi-supervised, consistency-based approach termed the data-efficient medical segmenter (DEMS). The DEMS features an encoder-decoder architecture and incorporates the developed online automatic augmenter (OAA) and residual robustness enhancement (RRE) blocks. The OAA augments input data with various image transformations, thereby diversifying the dataset to improve the generalization ability. The RRE enriches feature diversity and introduces perturbations to create varied inputs for different decoders, thereby providing enhanced variability. Moreover, we introduce a sensitive loss to further enhance consistency across different decoders and stabilize the training process. Extensive experimental results on both our own and three public datasets affirm the effectiveness of DEMS. 
Under extreme data shortage scenarios, our DEMS achieves 16.85% and 10.37% improvement in dice score compared with U-Net and the top-performing state-of-the-art method, respectively. Given its superior data efficiency, DEMS could present significant advancements in medical segmentation under small data regimes. The project homepage can be accessed at https://github.com/NUS-Tim/DEMS.	翻訳日:2024-07-15 23:47:49 公開日:2024-07-12
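The improvements above are reported in Dice score. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient for binary masks, 2|A∩B| / (|A| + |B|). It is not taken from the DEMS code base; the function name `dice_score`, the nested-list mask format and the empty-mask convention are assumptions for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks
    given as nested lists of 0/1 values with the same shape."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]

    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        # Both masks are empty: treated as perfect agreement (a convention,
        # assumed here; other implementations may return 0 or use smoothing).
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 3 predicted pixels, 3 true pixels, 2 in common.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_score(pred_mask, true_mask)
print(round(score, 4))  # 2*2 / (3+3) = 0.6667
```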
# 難易度から難易度へ:ロバストなパノプティカルシーングラフ生成のための曲線形状認識特徴を学習する From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation ( http://arxiv.org/abs/2407.09191v1 ) ライセンス: Link先を確認	Hanrong Shi, Lin Li, Jun Xiao, Yueting Zhuang, Long Chen,	(参考訳) パノプティック・シーングラフ生成(PSG)は、パノプティック・セグメンテーション・マスクに基づく総合的なグラフ構造表現を作成することを目的としている。 PSGの顕著な進歩にもかかわらず、既存のほとんどの手法は、本質的には輪郭やオブジェクトの境界に焦点をあてる形状認識の特徴の重要性を無視している。このギャップを埋めるために,PSG のためのモデルに依存しない shApe-aware feature (CAFE) 学習戦略を提案する。具体的には、形状認識機能(マスク機能やバウンダリ機能など)をPSGに組み込んで、bbox機能のみに依存しないようにします。さらに, 人間の認識からインスピレーションを得た形状認識機能を, 容易かつハードな方法で統合することを提案する。そこで我々は,認識学習の難しさに基づいて,述語を3つのグループに分類し,学習過程を3つの段階に分けた。各段階は、特定の述語群を区別するために特殊関係分類器を使用する。述語学習の難しさが増大するにつれて、これらの分類器は複雑性を上昇させる特徴を備えている。また,早期に獲得した知識を維持するため,知識蒸留も取り入れた。モデルに依存しない性質のため、CAFEは任意のPSGモデルにシームレスに組み込むことができる。強靭性PSGとゼロショットPSGの両条件下での2つのPSGタスクに対する広範な実験と改善により,提案したCAFEの優位性と堅牢性が証明された。 Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects. To bridge this gap, we propose a model-agnostic Curricular shApe-aware FEature (CAFE) learning strategy for PSG. Specifically, we incorporate shape-aware features (i.e., mask features and boundary features) into PSG, moving beyond reliance solely on bbox features. Furthermore, drawing inspiration from human cognition, we propose to integrate shape-aware features in an easy-to-hard manner. To achieve this, we categorize the predicates into three groups based on cognition learning difficulty and correspondingly divide the training process into three stages. Each stage utilizes a specialized relation classifier to distinguish specific groups of predicates. As the learning difficulty of predicates increases, these classifiers are equipped with features of ascending complexity. 
We also incorporate knowledge distillation to retain knowledge acquired in earlier stages. Due to its model-agnostic nature, CAFE can be seamlessly incorporated into any PSG model. Extensive experiments and ablations on two PSG tasks under both robust and zero-shot PSG have attested to the superiority and robustness of our proposed CAFE, which outperforms existing state-of-the-art methods by a large margin.	翻訳日:2024-07-15 23:47:49 公開日:2024-07-12
# 塩とペッパーのヒートマップ:拡散インフォームドランドマーク検出戦略 Salt & Pepper Heatmaps: Diffusion-informed Landmark Detection Strategy ( http://arxiv.org/abs/2407.09192v1 ) ライセンス: Link先を確認	Julian Wyatt, Irina Voiculescu,	(参考訳) 解剖学的ランドマーク検出(Anatomical Landmark Detection)は、臨床測定のための画像の重要な領域を特定するプロセスである。それぞれのランドマークは、臨床医によってラベル付けされた単一の真実点である。機械学習モデルは、ヒートマップで表される確率領域としてランドマークの軌跡を予測する。拡散モデルは、高品質なサンプリングとモードカバレッジのため、生成モデリングで人気が高まり、セマンティックセグメンテーションのための医療画像処理に採用されている。拡散モデリングはランドマーク上の分布を学習するためにさらに適応することができる。拡散モデルの確率的性質は、有意な確率領域にぼやけることにより、ランドマーク予測における揺らぎを捉える。本稿では,自動解剖学的ランドマーク検出を高精度な生成モデルタスクとして再構成し,数ドットのヒートマップを生成する。提案手法は,既存の作業に対して,最先端のMREと同等のSDR性能を実現する。 Anatomical Landmark Detection is the process of identifying key areas of an image for clinical measurements. Each landmark is a single ground truth point labelled by a clinician. A machine learning model predicts the locus of a landmark as a probability region represented by a heatmap. Diffusion models have increased in popularity for generative modelling due to their high quality sampling and mode coverage, leading to their adoption in medical image processing for semantic segmentation. Diffusion modelling can be further adapted to learn a distribution over landmarks. The stochastic nature of diffusion models captures fluctuations in the landmark prediction, which we leverage by blurring into meaningful probability regions. In this paper, we reformulate automatic Anatomical Landmark Detection as a precise generative modelling task, producing a few-hot pixel heatmap. Our method achieves state-of-the-art MRE and comparable SDR performance with existing work.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# ヨーロッパにおける亡命希望者のためのチャットボット A Chatbot for Asylum-Seeking Migrants in Europe ( http://arxiv.org/abs/2407.09197v1 ) ライセンス: Link先を確認	Bettina Fazzinga, Elena Palmieri, Margherita Vestoso, Luca Bolognini, Andrea Galassi, Filippo Furfaro, Paolo Torroni,	(参考訳) 本稿では,ヨーロッパにおける亡命希望者のためのチャットボットACMEについて紹介する。 ACMEは、計算的議論に依存しており、移民が申請可能な最も高いレベルの保護を特定するのを支援することを目的としている。これは、亡命申請者を支援する領土委員会、裁判所、人道団体の負担を減らすことで、より持続可能な移住に寄与しうる。実演に使われたコンテキスト、システムアーキテクチャ、技術、ケーススタディについて説明する。 We present ACME: A Chatbot for asylum-seeking Migrants in Europe. ACME relies on computational argumentation and aims to help migrants identify the highest level of protection they can apply for. This would contribute to a more sustainable migration by reducing the load on territorial commissions, Courts, and humanitarian organizations supporting asylum applicants. We describe the context, system architectures, technologies, and the case study used to run the demonstration.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 総合都市移動データ生成モデル : 体系的文献レビュー Generative Models for Synthetic Urban Mobility Data: A Systematic Literature Review ( http://arxiv.org/abs/2407.09198v1 ) ライセンス: Link先を確認	Alexandra Kapp, Julia Hansmeyer, Helena Mihaljević,	(参考訳) 様々な用途に非常に価値があるが、センシティブな個人情報を含むため、都市移動データを公開することは滅多にない。合成データは、構造的および統計的特性のオリジナルのデータセットに似た人工データを生成することで、この問題を解決することを目的としている。モビリティデータについては、過去10年間に多数の対応するモデルが提案されている。この体系的なレビューは、この異質で活発な研究分野の現状に関する構造化された比較概要を提供する。レビューされたモデルの適用性に特に焦点が当てられている。 Although highly valuable for a variety of applications, urban mobility data is rarely made openly available as it contains sensitive personal information. Synthetic data aims to solve this issue by generating artificial data that resembles an original dataset in structural and statistical characteristics, but omits sensitive information. For mobility data, a large number of corresponding models have been proposed in the last decade. This systematic review provides a structured comparative overview of the current state of this heterogeneous, active field of research. A special focus is put on the applicability of the reviewed models in practice.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# Instagramにおけるインフルエンサー自己開示の実践:多人数縦断的研究 Influencer Self-Disclosure Practices on Instagram: A Multi-Country Longitudinal Study ( http://arxiv.org/abs/2407.09202v1 ) ライセンス: Link先を確認	Thales Bertaglia, Catalina Goanta, Gerasimos Spanakis, Adriana Iamnitchi,	(参考訳) 本稿では、米国、ブラジル、オランダ、ドイツ4カ国のコンテンツクリエーター400名による100万以上の投稿からなるInstagram上での10年以上にわたる活動について、縦断的研究を行った。本研究は、各国間のコンテンツ収益化の専門化における差異や、類似したユーザーエンゲージメント傾向の頻度の顕著な相違、一部の国におけるスポンサーコンテンツ公開における顕著な相違、および国家法との直接的な結びつきを示すものである。我々は、コンテンツクリエーターが異なる法律環境に開示方法を適用する方法に焦点を当て、立法やプラットフォーム機能の変更によるマーケティング戦略の変化を分析する。また、情報開示やスポンサー投稿がエンゲージメントに与える影響を分析し、スポンサー投稿は平均してエンゲージメントが低いが、広告を適切に開示することはエンゲージメントをさらに減少させるものではないと結論付けている。我々の観察は、開示コンプライアンスの重要性を強調し、より効果的にそれらを開発・監視する当局を導くことができる。 This paper presents a longitudinal study of more than ten years of activity on Instagram consisting of over a million posts by 400 content creators from four countries: the US, Brazil, Netherlands and Germany. Our study shows differences in the professionalisation of content monetisation between countries, yet consistent patterns; significant differences in the frequency of posts yet similar user engagement trends; and significant differences in the disclosure of sponsored content in some countries, with a direct connection with national legislation. We analyse shifts in marketing strategies due to legislative and platform feature changes, focusing on how content creators adapt disclosure methods to different legal environments. We also analyse the impact of disclosures and sponsored posts on engagement and conclude that, although sponsored posts have lower engagement on average, properly disclosing ads does not reduce engagement further. Our observations stress the importance of disclosure compliance and can guide authorities in developing and monitoring them more effectively.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 集合的リモート検査プロトコルの設計とセキュリティについて On the Design and Security of Collective Remote Attestation Protocols ( http://arxiv.org/abs/2407.09203v1 ) ライセンス: Link先を確認	Sharar Ahmadi, Jay Le-Papin, Liqun Chen, Brijesh Dongol, Sasa Radomirovic, Helen Treharne,	(参考訳) Collective Remote Attestation (CRA) は、(不均一な)ネットワークにおいて、侵入された(しばしば低出力の)デバイスを効率的に識別することを目的としたセキュリティサービスである。ここ数年、CRAプロトコルの提案が大幅に増加し、様々なネットワークトポロジ、ハードウェアの仮定、その他の機能要件によってガイドされたさまざまな設計が示されている。しかし、信頼前提、敵モデル、役割記述が異なるため、セキュリティ保証を均一に評価することは困難である。本稿では,40のCRAプロトコルとその逆モデルに関する包括的研究に基づいて,CRAプロトコルを体系的に比較可能な統合フレームワークであるCattを提案する。 Cattは、デバイスが果たす役割を特徴付け、これらに基づいて、CRAプロトコルのための新しいセキュリティ特性セットを開発します。次に、研究対象とするすべてのプロトコルのセキュリティ目標を分類する。タマリン証明器にエンコードしてSIMPLE+プロトコルを検証することにより,セキュリティ特性の適用性を説明する。 Collective remote attestation (CRA) is a security service that aims to efficiently identify compromised (often low-powered) devices in a (heterogeneous) network. The last few years have seen an extensive growth in CRA protocol proposals, showing a variety of designs guided by different network topologies, hardware assumptions and other functional requirements. However, they differ in their trust assumptions, adversary models and role descriptions making it difficult to uniformly assess their security guarantees. In this paper we present Catt, a unifying framework for CRA protocols that enables them to be compared systematically, based on a comprehensive study of 40 CRA protocols and their adversary models. Catt characterises the roles that devices can take and based on these we develop a novel set of security properties for CRA protocols. We then classify the security aims of all the studied protocols. We illustrate the applicability of our security properties by encoding them in the tamarin prover and verifying the SIMPLE+ protocol against them.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# マルチモーダル大言語モデルによる発音評価 Pronunciation Assessment with Multi-modal Large Language Models ( http://arxiv.org/abs/2407.09209v1 ) ライセンス: Link先を確認	Kaiqi Fu, Linkai Peng, Nan Yang, Shuran Zhou,	(参考訳) 大きな言語モデル(LLM)は、強力な対話能力で知られており、特に言語学習のための自動化インテリジェントな教育システムにおいて、教育分野における例外的なツールとして広く認識されている。本稿では,テキスト関連スコアリングタスクに対する肯定的な影響を動機として,LLMに基づくスコアリングシステムを提案する。具体的には、まず学習者の発話を文脈的特徴にマッピングする。アダプタ層は、これらの機能を潜在空間に埋め込まれたテキストに合わせるように変換する。評価タスク固有のプレフィックスおよびプロンプトテキストは、モダリティアダプタ層によって生成された特徴に埋め込み、連結され、LCMが精度および流速スコアを予測する。実験により,提案したスコアリングシステムは,Speechocean762データセットのベースラインと比較して,競争力のある結果が得られることを示した。また,提案したスコアリングシステムにおいて,迅速なテキストとトレーニング戦略の貢献をより深く理解するために,アブレーション調査を行った。 Large language models (LLMs), renowned for their powerful conversational abilities, are widely recognized as exceptional tools in the field of education, particularly in the context of automated intelligent instruction systems for language learning. In this paper, we propose a scoring system based on LLMs, motivated by their positive impact on text-related scoring tasks. Specifically, the speech encoder first maps the learner's speech into contextual features. The adapter layer then transforms these features to align with the text embedding in latent space. The assessment task-specific prefix and prompt text are embedded and concatenated with the features generated by the modality adapter layer, enabling the LLMs to predict accuracy and fluency scores. Our experiments demonstrate that the proposed scoring systems achieve competitive results compared to the baselines on the Speechocean762 datasets. Moreover, we also conducted an ablation study to better understand the contributions of the prompt text and training strategy in the proposed scoring system.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 量子ドット系北エフ鎖:鎖長の増加を伴うマヨナの品質測定とスケーリング Quantum-dot-based Kitaev chains: Majorana quality measures and scaling with increasing chain length ( http://arxiv.org/abs/2407.09211v1 ) ライセンス: Link先を確認	Viktor Svensson, Martin Leijnse,	(参考訳) マヨラナ境界状態(MBS)を短く、よく制御可能な量子ドットの鎖で実現することは、障害の問題を後押しするが、微調整が必要であり、長い鎖に固有の真の位相的保護を与えない。ここでは、強い電子-電子相互作用の存在下でも適用可能な新しい品質尺度を導入し、短い量子ドット鎖における微細なMBSの位相的保護の近さを定量化する。この測度は、任意の局所測度が2つの状態を区別できる程度に有界であるからである。異なる長さの量子ドット鎖の局所的識別性について検討する。 3ドット鎖は詳細に研究されており、摂動理論から導かれる有効モデルの中で理解できる事実である2ドットの場合よりも常に改善されるとは限らないことが分かる。長い鎖の場合、局所的な区別性は指数関数的に消失し、局所的な測定では区別できない2つの基底状態を持つ位相相への遷移をシグナルする。 Realizing Majorana bound states (MBSs) in short, well-controllable chains of coupled quantum dots sidesteps the problem of disorder, but requires fine-tuning and does not give the true topological protection inherent to long chains. Here, we introduce a new quality measure that is applicable also in the presence of strong electron-electron interactions and that quantifies the closeness to topological protection of finetuned MBSs in short quantum-dot chains. We call this measure local distinguishability because it puts a bound to the degree an arbitrary local measurement can distinguish between two states. We study the local distinguishability for quantum-dot chains of different length. The three-dot chain is studied in detail, and we find that it may not always be an improvement over the two-dot case, a fact that can be understood within an effective model derived from perturbation theory. For longer chains, the local distinguishability vanishes exponentially, signalling a transition to a topological phase with two ground states that cannot be distinguished by any local measurement.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 知識グラフクエリ埋め込み学習によるSROI^{-}オントロジーの生成 Generating SROI^{-} Ontologies via Knowledge Graph Query Embedding Learning ( http://arxiv.org/abs/2407.09212v1 ) ライセンス: Link先を確認	Yunjie He, Daniel Hernandez, Mojtaba Nayyeri, Bo Xiong, Yuqicheng Zhu, Evgeny Kharlamov, Steffen Staab,	(参考訳) クエリ埋め込みアプローチは、エンティティ、リレーション、クエリの低次元ベクトル表現を計算し操作することで、不完全知識グラフ(KG)上の複雑な論理的クエリに答える。しかし、現在のクエリ埋め込みモデルは過度にパラメータ化されたニューラルネットワークに依存しており、グラフから学んだ知識を説明できない。本稿では,SROI^{-}記述論理公理の形でグラフから学習した知識を,既存手法よりもパラメータ効率がよい新しいクエリ埋め込み手法AConEを提案する。 AConEはクエリをSROI^{-}記述ロジックの概念に関連付ける。すべての SROI^{-} の概念は複素ベクトル空間の錐として埋め込まれ、それぞれの SROI^{-} の関係は錐を回転させ拡大する変換として埋め込まれる。 AConE が SROI^{-} の公理を学習できることを理論的に示し、演算が 1 から SROI^{-} の記述論理概念に 1 に対応する代数を定義する。複数のクエリデータセットに関する実証研究により、AConEはパラメータが少なく、以前のベースラインよりも優れた結果が得られることが示された。特にWN18RRデータセットでは、AConEはベースラインモデルよりも大幅に改善されている。我々は,公理を表現する能力が問合せ応答の結果に肯定的な影響を及ぼすことを示す包括的分析を行った。 Query embedding approaches answer complex logical queries over incomplete knowledge graphs (KGs) by computing and operating on low-dimensional vector representations of entities, relations, and queries. However, current query embedding models heavily rely on excessively parameterized neural networks and cannot explain the knowledge learned from the graph. We propose a novel query embedding method, AConE, which explains the knowledge learned from the graph in the form of SROI^{-} description logic axioms while being more parameter-efficient than most existing approaches. AConE associates queries to a SROI^{-} description logic concept. Every SROI^{-} concept is embedded as a cone in complex vector space, and each SROI^{-} relation is embedded as a transformation that rotates and scales cones. We show theoretically that AConE can learn SROI^{-} axioms, and defines an algebra whose operations correspond one to one to SROI^{-} description logic concept constructs. Our empirical study on multiple query datasets shows that AConE achieves superior results over previous baselines with fewer parameters. 
Notably, on the WN18RR dataset, AConE achieves a significant improvement over baseline models. We provide comprehensive analyses showing that the capability to represent axioms positively impacts the results of query answering.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# HUP-3D: 自己中心型手超音波ポーズ推定のための3次元多視点合成データセット HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation ( http://arxiv.org/abs/2407.09215v1 ) ライセンス: Link先を確認	Manuel Birlo, Razvan Caramalau, Philip J. "Eddie" Edwards, Brian Dromey, Matthew J. Clarkson, Danail Stoyanov,	(参考訳) HUP-3Dは, 超音波による手超音波探触子ポーズ推定のための3次元マルチモーダル合成データセットである。エゴセントリックなマーカーレス3D共同ポーズ推定は、混合現実に基づく医療教育において潜在的に有用である。手動とプローブの動きをプログラム的に理解する能力は、調整された指導と指導の応用への扉を開く。我々のデータセットは31万セット以上のRGBと深度とセグメンテーションマスクフレームで構成されており、画像の多様性と複雑さに重点を置いている。カメラ視点に基づくスフィアの概念を採用することで、さまざまなビューをキャプチャし、トレーニング済みのネットワークを使用して複数のハンドグリップポーズを生成することができる。さらに,本手法には,手や腕のテクスチャ,照明条件,背景画像による多様性の向上など,ソフトウェアベースの画像レンダリングの概念が含まれている。さらに,提案したデータセットを最先端の学習モデルで検証し,手指のキーポイント誤りの最小値を得た。データセットおよびその他の詳細は、補足材料を備える。グリップ生成とレンダリングパイプラインのソースコードが公開されます。 We present HUP-3D, a 3D multi-view multi-modal synthetic dataset for hand-ultrasound (US) probe pose estimation in the context of obstetric ultrasound. Egocentric markerless 3D joint pose estimation has potential applications in mixed reality based medical education. The ability to understand hand and probe movements programmatically opens the door to tailored guidance and mentoring applications. Our dataset consists of over 31k sets of RGB, depth and segmentation mask frames, including pose related ground truth data, with a strong emphasis on image diversity and complexity. Adopting a camera viewpoint-based sphere concept allows us to capture a variety of views and generate multiple hand grasp poses using a pre-trained network. Additionally, our approach includes a software-based image rendering concept, enhancing diversity with various hand and arm textures, lighting conditions, and background images. Furthermore, we validated our proposed dataset with state-of-the-art learning models and we obtained the lowest hand-object keypoint errors. The dataset and other details are provided with the supplementary material. 
The source code of our grasp generation and rendering pipeline will be made publicly available.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# パン光学シーングラフ生成のための公正ランキングと新しいモデル A Fair Ranking and New Model for Panoptic Scene Graph Generation ( http://arxiv.org/abs/2407.09216v1 ) ライセンス: Link先を確認	Julian Lorenz, Alexander Pest, Daniel Kienzle, Katja Ludwig, Rainer Lienhart,	(参考訳) パノプティック・シーングラフ生成(PSGG)では、モデルがパノプティック・セグメンテーション・マスクによってグラウンディングされた画像内のオブジェクト間の相互作用を検索する。従来,同一物体の複数のマスクがマスクとマスクのペアあたりの複数の関係分布を導出する誤った評価プロトコルが提案されてきた。これは最終スコアを上げるために利用することができる。我々は、この欠陥を修正し、既存のPSGGモデルに対して公正なランキングを提供する。既存の手法で観測されたスコアは、すべての2段階法で7.4 mR@50まで増加し、一方、1段階法では19.3 mR@50まで減少し、正しい評価の重要性を強調した。近年の論文とは対照的に,既存の2段階法は1段階法と競合することを示す。そこで本研究では,Decoupled SceneFormer(DSFormer)という,既存のシーングラフモデルに対して,修正した評価値に対して,+11mR@50と+10mNgR@50の大きなマージンで優れた2段階モデルを導入し,新たなSOTAを設定した。基本設計原則として、DSFormerは被写体とオブジェクトマスクを直接特徴空間にエンコードする。 In panoptic scene graph generation (PSGG), models retrieve interactions between objects in an image which are grounded by panoptic segmentation masks. Previous evaluations on panoptic scene graphs have been subject to an erroneous evaluation protocol where multiple masks for the same object can lead to multiple relation distributions per mask-mask pair. This can be exploited to increase the final score. We correct this flaw and provide a fair ranking over a wide range of existing PSGG models. The observed scores for existing methods increase by up to 7.4 mR@50 for all two-stage methods, while dropping by up to 19.3 mR@50 for all one-stage methods, highlighting the importance of a correct evaluation. Contrary to recent publications, we show that existing two-stage methods are competitive to one-stage methods. Building on this, we introduce the Decoupled SceneFormer (DSFormer), a novel two-stage model that outperforms all existing scene graph models by a large margin of +11 mR@50 and +10 mNgR@50 on the corrected evaluation, thus setting a new SOTA. As a core design principle, DSFormer encodes subject and object masks directly into feature space.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 移動ミラーモデルにおけるホーキング放射の量子エンタングルメントの指標としての2次コヒーレンス Second-Order Coherence as an Indicator of Quantum Entanglement of Hawking Radiation in Moving-Mirror Models ( http://arxiv.org/abs/2407.09218v1 ) ライセンス: Link先を確認	Masanori Tomonaga, Yasusada Nambu,	(参考訳) 光の2階コヒーレンス(英: second-order coherence of light)は、光の量子特性を評価するために広く知られている物理量であり、その性質は量子光学の分野で広く研究されている。近年,量子エンタングルメントの指標として2階コヒーレンスを利用する方法が提案されている。本研究では,ブラックホールからのホーキング放射のアナログモデルとして機能する移動ミラーモデルを用いて,第2次コヒーレンスの評価を行った。本研究では, ホーキング放射の熱性によるノイズ効果に注意を払って, 絡み合いと2次コヒーレンスの関係を考察し, 2量子検出器による絡み合いハーベスティングプロトコルの量子相関を低減させる。 The second-order coherence of light is a widely recognized physical quantity used to assess the quantum characteristics of light, and its properties have been extensively investigated in the field of quantum optics. Recently, it has been proposed that second-order coherence can be utilized as an indicator of quantum entanglement. In this study, we evaluated the second-order coherence in the context of the moving-mirror model, which serves as an analog model for Hawking radiation from a black hole. We discuss the relation between entanglement and the second-order coherence of Hawking radiation paying attention to the noise effect due to the thermality of Hawking radiation, which reduces the quantum correlation in the entanglement-harvesting protocol with two-qubit detectors.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# AI評価を評価する - 限界と展望 Evaluating AI Evaluation: Perils and Prospects ( http://arxiv.org/abs/2407.09221v1 ) ライセンス: Link先を確認	John Burden,	(参考訳) AIシステムは、常に増加する能力と一般性を示すため、その真の可能性と安全性を評価することが最重要である。本稿では,これらのシステムに対する評価手法が根本的に不適切であり,AIに関連するリスクや潜在的なリスクを高めることを主張する。 AIシステムを評価する方法には改革が必要であり、多種多様な知能を評価する長年の伝統を持つ我々のアプローチにインスピレーションを得るために、認知科学に目を向けるべきである、と私は主張する。我々は、認知にインスパイアされたアプローチを汎用AIシステムに適用する上で、克服すべき課題のいくつかを特定し、また、新たな領域である"Evals"を分析する。論文は、AI評価を洗練させる有望な研究経路を特定し、安全なAIシステムの開発に寄与する厳格な科学領域へと前進させることで、結論付けている。 As AI systems appear to exhibit ever-increasing capability and generality, assessing their true potential and safety becomes paramount. This paper contends that the prevalent evaluation methods for these systems are fundamentally inadequate, heightening the risks and potential hazards associated with AI. I argue that a reformation is required in the way we evaluate AI systems and that we should look towards cognitive sciences for inspiration in our approaches, which have a longstanding tradition of assessing general intelligence across diverse species. We will identify some of the difficulties that need to be overcome when applying cognitively-inspired approaches to general-purpose AI systems and also analyse the emerging area of "Evals". The paper concludes by identifying promising research pathways that could refine AI evaluation, advancing it towards a rigorous scientific domain that contributes to the development of safe AI systems.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 量子シミュレータ上での状態の断熱的調製によるシュウィンガーモデルの位相図 Phase Diagram of the Schwinger Model by Adiabatic Preparation of States on a Quantum Simulator ( http://arxiv.org/abs/2407.09224v1 ) ライセンス: Link先を確認	Oleg Kaikov, Theo Saporiti, Vasily Sazonov, Mohamed Tamaazousti,	(参考訳) 我々は、状態の断熱的準備を通じて量子デバイス上の量子物理系の位相構造を研究することが可能であると主張している。位相的$\theta$-term の存在下で、新しい手法を導入し、Schwinger モデルに適用することに成功した。対応する相図の1次相転移領域と非相転移領域について検討する。この方法の中核となる考え方は、時間依存のハミルトニアンで基底状態と第一励起状態を別々に発展させることであり、その時間依存性は$\theta$の異なる値の間を補間する。我々のアプローチは断熱的定理の直接的な応用であるにもかかわらず、いくつかのケースでは、同じく断熱的状態準備を用いる文献中の別の方法と比較して、その利点を実証することができる。 We argue that it is feasible to study the phase structure of a quantum physical system on quantum devices via adiabatic preparation of states. We introduce a novel method and successfully test it on the Schwinger model in the presence of a topological $\theta$-term. We explore the first-order-phase-transition and the no-transition regions of the corresponding phase diagram. The core idea of the method is to separately evolve the ground and the first excited states with a time-dependent Hamiltonian, the time-dependence of which interpolates between different values of $\theta$. Despite our approach being a direct application of the adiabatic theorem, in some cases we are able to demonstrate its advantages in comparison to a different method from the literature that also employs adiabatic state preparation.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 光子を媒介とする相互作用を持つ量子気体中のサブラジアントパターンの安定性と崩壊 Stability and decay of subradiant patterns in a quantum gas with photon-mediated interactions ( http://arxiv.org/abs/2407.09227v1 ) ライセンス: Link先を確認	Alexander Baumgärtner, Simon Hertlein, Tom Schmit, Davide Dreon, Carlos Máximo, Xiangliang Li, Giovanna Morigi, Tobias Donner,	(参考訳) 自発放射の驚くほどの抑制を特徴とするサブ放射の現象は、散布器の集団行動に対する従来の期待に挑戦する。 2つの光学キャビティのモード交差に位置付けられたボース・アインシュタイン凝縮体の実験的設定におけるサブラディアンスについて検討した。この設定では、準放射は準安定密度構造として現れ、1つのキャビティモードへの放出を抑えることで、系のエネルギーを最小化する静止超放射格子への緩和を防ぐ。我々は、数百ミリ秒を超えるサブラジアント状態の寿命を観測し、系の特徴的な動的時間スケールをはるかに超えている。最終的に不安定がスーパーラジアント定常パターンへの急激な遷移を引き起こす。我々は、これらの力学を量子平均場モデルにより再現し、宇宙物理クラスターやプラズマのような他の長距離相互作用系で予測される準定常状態と、準定常状態の特性を共有することを示唆した。この挙動は光子を媒介する長距離力の可能性を制御可能で利用可能な量子協調現象として強調している。 The phenomenon of subradiance, marked by its surprising suppression of spontaneous emission, challenges conventional expectations of the collective behavior of scatterers. We study subradiance in the experimental setting of a Bose-Einstein condensate positioned at the mode crossing of two optical cavities. In this setup, subradiance manifests in the form of metastable density structures that suppress emission into one cavity mode, thereby preventing relaxation to the stationary, superradiant grating that minimizes the system's energy. We observe lifetimes of the subradiant states exceeding hundred milliseconds, far surpassing any characteristic dynamic time scale of the system. Eventually, an instability triggers a rapid transition to the superradiant stationary pattern. We reproduce these dynamics by a quantum mean field model, suggesting that subradiance shares characteristics with quasi-stationary states predicted in other long-range interacting systems such as astrophysical clusters and plasmas. This behavior highlights the potential of photon-mediated long-range forces as controllable and exploitable quantum cooperative phenomenon.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# 外科的テキスト・画像生成 Surgical Text-to-Image Generation ( http://arxiv.org/abs/2407.09230v1 ) ライセンス: Link先を確認	Chinedu Innocent Nwoye, Rupak Bose, Kareem Elgohary, Lorenzo Arboit, Giorgio Carlino, Joël L. Lavanchy, Pietro Mascagni, Nicolas Padoy,	(参考訳) 研究開発のための外科的データを取得することは、高いアノテーションコストと実践的および倫理的制約によって著しく妨げられている。合成画像を利用することは、価値ある代替手段となるかもしれない。本研究は,ColecT50データセットを用いて,手術領域におけるテキスト・ツー・イメージ生成モデルの適用について詳細な解析を行い,手術行動トリガ(インストラメント,動詞,ターゲット)を付加した手術画像を提供する。様々な言語モデルについて検討し,T5は三重項に基づくテキスト入力に基づく手術動作の識別に,より明確な特徴を提供する。分析の結果,三重奏法と三重奏法を併用し,三重奏法と三重奏法を併用した。本稿では,3重テキスト埋め込みが潜時空間において楽器中心であることを明らかにすることで,付加的な入力信号を持たない3重テキストキャプション上でのテキスト・ツー・イメージモデルの訓練課題に対処する。拡散型画像生成モデルであるRetensing Imagenを用いて,三重項ベースのテキストプロンプトからフォトリアリスティックかつ活動対応の手術画像を生成する。 FIDやCLIPスコアなど,人間の専門家による調査や自動化手法など,さまざまな指標を用いたモデルの評価を行った。我々は、品質、アライメント、推論、知識、堅牢性といった重要な側面でモデルパフォーマンスを評価し、実際のデータ収集の現実的な代替手段を提供する上で、我々のアプローチの有効性を実証する。 Acquiring surgical data for research and development is significantly hindered by high annotation costs and practical and ethical constraints. Utilizing synthetically generated images could offer a valuable alternative. In this work, we conduct an in-depth analysis on adapting text-to-image generative models for the surgical domain, leveraging the CholecT50 dataset, which provides surgical images annotated with surgical action triplets (instrument, verb, target). We investigate various language models and find T5 to offer more distinct features for differentiating surgical actions based on triplet-based textual inputs. Our analysis demonstrates strong alignment between long and triplet-based captions, supporting the use of triplet-based labels. We address the challenges in training text-to-image models on triplet-based captions without additional input signals by uncovering that triplet text embeddings are instrument-centric in the latent space and then, by designing an instrument-based class balancing technique to counteract the imbalance and skewness in the surgical data, improving training convergence. 
Extending Imagen, a diffusion-based generative model, we develop Surgical Imagen to generate photorealistic and activity-aligned surgical images from triplet-based textual prompts. We evaluate our model using diverse metrics, including human expert surveys and automated methods like FID and CLIP scores. We assess the model performance on key aspects: quality, alignment, reasoning, knowledge, and robustness, demonstrating the effectiveness of our approach in providing a realistic alternative to real data collection.	翻訳日:2024-07-15 23:38:05 公開日:2024-07-12
# Prompts First, そして最後に Prompts First, Finally ( http://arxiv.org/abs/2407.09231v1 ) ライセンス: Link先を確認	Brent N. Reeves, James Prather, Paul Denny, Juho Leinonen, Stephen MacNeil, Brett A. Becker, Andrew Luxton-Reilly,	(参考訳) 生成AI(GenAI)と特に大規模言語モデルは、コンピュータサイエンス教育を妨害している。彼らはますます多くの課題に対して能力を発揮している。一部の教育者は、彼らがコンピューティング教育に深刻な脅威をもたらし、教室での彼らの使用を禁止すべきだと主張している。真剣なGenAI問題はまだ解決されていないが、現在の段階では、コンピュータサイエンスの全体的な軌跡を調べるのに有用かもしれない。初めから、我々の規律は、それぞれの新しい表現における抽象化のレベルを高めようとしてきた。ハードウェアのディップスイッチから、特別な目的言語やフローチャートのような視覚表現を経て、今では「自然言語」へと進化しています。GenAIの出現により、学生はついに問題の抽象レベルを、これまでずっと「問題解決」に使ってきた「言語」へと変えることができる。本稿では、プログラミングの抽象化は常に自然言語に向けられていると論じる。今こそ,コンピュータサイエンス教育に「Prompts First」アプローチを採用する時だ。 Generative AI (GenAI), and large language models in particular, are disrupting Computer Science Education. They are proving increasingly capable at more and more challenges. Some educators argue that they pose a serious threat to computing education, and that we should ban their use in the classroom. While there are serious GenAI issues that remain unsolved, it may be useful in the present moment to step back and examine the overall trajectory of Computer Science writ large. Since the very beginning, our discipline has sought to increase the level of abstraction in each new representation. We have progressed from hardware dip switches, through special purpose languages and visual representations like flow charts, all the way now to "natural language." With the advent of GenAI, students can finally change the abstraction level of a problem to the "language" they've been "problem solving" with all their lives. In this paper, we argue that our programming abstractions were always headed here -- to natural language. Now is the time to adopt a "Prompts First" approach to Computer Science Education.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 畳み込みニューラルネットワーク画像における人間の直観のモデル化 Modelling the Human Intuition to Complete the Missing Information in Images for Convolutional Neural Networks ( http://arxiv.org/abs/2407.09236v1 ) ライセンス: Link先を確認	Robin Koç, Fatoş T. Yarman Vural,	(参考訳) 本研究では、直観をモデル化し、この形式を組み込んで畳み込みニューラルネットワークの性能向上を図る。何十年もの研究にもかかわらず、曖昧さは直観の原理に根ざしている。実験心理学は、人間の心の状態に依存する多くの種類の直観を明らかにする。視覚認知タスクにおいて、行方不明情報を完成させるのに有用な視覚的直観に焦点を当てる。まず,データセットの画像中の視覚情報の量を徐々に減らし,CNNの精度への影響を調べるシナリオを設定した。そして、ゲシュタルト理論を用いて視覚的直観のモデルを示す。この理論は、人間が意識下の経験に基づいて一連のテンプレートを導出すると主張している。脳がオクルージョンのようなシーンに欠落している情報があると判断すると、その欠落した部分を最もよく似たものに置き換えることで、即座に情報を完成させる。 Gestalt理論に基づいて、視覚的直観を2つの層でモデル化する。これらの層の詳細は、全紙に記載されている。我々は、MNISTデータセットを用いて、不足した情報を完成させるために提案された直観モデルをテストする。実験により、拡張CNNアーキテクチャは、不完全画像を使用する場合の古典モデルと比較して高い性能を提供することが示された。 In this study, we attempt to model intuition and incorporate this formalism to improve the performance of the Convolutional Neural Networks. Despite decades of research, ambiguities persist on principles of intuition. Experimental psychology reveals many types of intuition, which depend on state of the human mind. We focus on visual intuition, useful for completing missing information during visual cognitive tasks. First, we set up a scenario to gradually decrease the amount of visual information in the images of a dataset to examine its impact on CNN accuracy. Then, we represent a model for visual intuition using Gestalt theory. The theory claims that humans derive a set of templates according to their subconscious experiences. When the brain decides that there is missing information in a scene, such as occlusion, it instantaneously completes the information by replacing the missing parts with the most similar ones. Based upon Gestalt theory, we model the visual intuition, in two layers. Details of these layers are provided throughout the paper. We use the MNIST data set to test the suggested intuition model for completing the missing information. 
Experiments show that the augmented CNN architecture provides higher performances compared to the classic models when using incomplete images.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# FedVAE:Federated Variational AutoEncoderに基づくトラジェクティブプライバシ保護 FedVAE: Trajectory privacy preserving based on Federated Variational AutoEncoder ( http://arxiv.org/abs/2407.09239v1 ) ライセンス: Link先を確認	Yuchen Jiang, Ying Wu, Shiyao Zhang, James J. Q. Yu,	(参考訳) 知的交通システム(ITS)や様々な交通システムタスクにおいて,空間的情報量の多い軌跡データの利用が重要である。位置情報ベースのサービス(LBS)は、位置情報に合わせてパーソナライズされたサービスを提供するために、このトラジェクトリデータを活用する。しかし、この軌跡データには、ユーザの行動パターンや習慣に関する機密情報が含まれており、未知の収集者からの機密性や保護が必要である。この課題に対処するため、データセット内の個人情報を保護するために、K匿名性や差分プライバシーといったプライバシ保護手法が提案されている。その効果にもかかわらず、これらの手法は摂動を導入したり、非現実的な軌道データを生成することによって元の特徴に影響を与え、下流のタスクにおいて最適以下のパフォーマンスをもたらす。これらの制約を克服するため,FedVAE (Federated Variational AutoEncoder) アプローチを提案する。さらに、FedVAEは変分オートエンコーダ(VAE)を活用して、元の機能空間を維持し、新しいトラジェクトリデータを生成し、トレーニング段階ではフェデレートラーニング(FL)を組み込んで、ユーザのデータがローカルに保存されて個人情報を保護する。この結果は、位置情報ベースのアプリケーションにおけるデータプライバシとユーティリティを強化するための有望なソリューションとしてFedVAEを肯定する、既存の方法と比較して、優れたパフォーマンスを示している。 The use of trajectory data with abundant spatial-temporal information is pivotal in Intelligent Transport Systems (ITS) and various traffic system tasks. Location-Based Services (LBS) capitalize on this trajectory data to offer users personalized services tailored to their location information. However, this trajectory data contains sensitive information about users' movement patterns and habits, necessitating confidentiality and protection from unknown collectors. To address this challenge, privacy-preserving methods like K-anonymity and Differential Privacy have been proposed to safeguard private information in the dataset. Despite their effectiveness, these methods can impact the original features by introducing perturbations or generating unrealistic trajectory data, leading to suboptimal performance in downstream tasks. To overcome these limitations, we propose a Federated Variational AutoEncoder (FedVAE) approach, which effectively generates a new trajectory dataset while preserving the confidentiality of private information and retaining the structure of the original features. 
In addition, FedVAE leverages Variational AutoEncoder (VAE) to maintain the original feature space and generate new trajectory data, and incorporates Federated Learning (FL) during the training stage, ensuring that users' data remains locally stored to protect their personal information. The results demonstrate its superior performance compared to other existing methods, affirming FedVAE as a promising solution for enhancing data privacy and utility in location-based applications.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 言語モデリングの社会言語学的基礎 The Sociolinguistic Foundations of Language Modeling ( http://arxiv.org/abs/2407.09241v1 ) ライセンス: Link先を確認	Jack Grieve, Sara Bartl, Matteo Fuoli, Jason Grafmiller, Weihang Huang, Alejandro Jawerbaum, Akira Murakami, Marcus Perlman, Dana Roemling, Bodo Winter,	(参考訳) 本稿では,言語モデリングにおける社会言語学的視点を紹介する。我々は、大規模言語モデルは本質的に言語モデルのモデルであり、この洞察が大規模言語モデルの開発と展開にどのように役立つかを考える。我々はまず、社会言語学で開発された様々な言語の概念の技術的な定義を提示することから始める。次に、この視点が言語モデリングにおける5つの基本的な課題(社会的バイアス、ドメイン適応、アライメント、言語の変化、スケール)にどのように対処できるかについて議論する。最終的に、大規模言語モデルの性能と社会的価値を最大化するために、モデル化されている特定の言語の種類を正確に表現したトレーニングコーパスを慎重に定義し、コンパイルすることが重要であると論じる。 In this paper, we introduce a sociolinguistic perspective on language modeling. We claim that large language models are inherently models of varieties of language, and we consider how this insight can inform the development and deployment of large language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss how this perspective can help address five basic challenges in language modeling: social bias, domain adaptation, alignment, language change, and scale. Ultimately, we argue that it is crucial to carefully define and compile training corpora that accurately represent the specific varieties of language being modeled to maximize the performance and societal value of large language models.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 量子通信の物理層的側面:サーベイ Physical Layer Aspects of Quantum Communications: A Survey ( http://arxiv.org/abs/2407.09244v1 ) ライセンス: Link先を確認	Seid Koudia, Leonardo Oleynik, Mert Bayraktar, Junaid ur Rehman, Symeon Chatzinotas,	(参考訳) 量子通信システムは、分散量子コンピューティング、分散量子センシング、およびいくつかの暗号プロトコルの形式で独自のアプリケーションをサポートする。これらの通信システムの主な有効性は、未知の量子状態の転送を高速かつ忠実に行える効率的なインフラである。この偉業は、量子情報の制限と原則を尊重しつつ、利用可能な物理層資源を効率的に活用する通信システム設計への新しいアプローチを必要とする。古典的世界と量子的世界の間には根本的な違いがあるが、量子通信システムでも有益であることが証明される普遍的なコミュニケーション概念が存在する。本調査では,古典的通信と量子的通信の共通点と相違点を引き出す試みとして,物理層量子通信の特異な側面を強調した。具体的には、さまざまな光伝搬媒体上での量子チャネルとユースケースの概要を説明し、クロストークと干渉の概念に光を当てることから始める。その後、量子源、検出器、チャネル、変調技術を調査した。さらに,コヒーレント制御,多重化,多様性,MIMOといった空間多重化技術について議論し,分析する。最後に,2つの通信技術間の相乗効果と,次世代の量子通信システムの開発において重要な課題を明らかにする。 Quantum communication systems support unique applications in the form of distributed quantum computing, distributed quantum sensing, and several cryptographic protocols. The main enabler in these communication systems is an efficient infrastructure that is capable of transporting unknown quantum states with high rate and fidelity. This feat requires a new approach to communication system design which efficiently exploits the available physical layer resources, while respecting the limitations and principles of quantum information. Despite the fundamental differences between the classic and quantum worlds, there exist universal communication concepts that may prove beneficial in quantum communication systems as well. In this survey, the distinctive aspects of physical layer quantum communications are highlighted in an attempt to draw commonalities and divergences between classic and quantum communications. More specifically, we begin by overviewing the quantum channels and use cases over diverse optical propagation media, shedding light on the concepts of crosstalk and interference. Subsequently, we survey quantum sources, detectors, channels and modulation techniques. 
More importantly, we discuss and analyze spatial multiplexing techniques, such as coherent control, multiplexing, diversity and MIMO. Finally, we identify synergies between the two communication technologies and grand open challenges that can be pivotal in the development of next-generation quantum communication systems.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 強化学習のための制約付き固有の動機づけ Constrained Intrinsic Motivation for Reinforcement Learning ( http://arxiv.org/abs/2407.09247v1 ) ライセンス: Link先を確認	Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang,	(参考訳) 本稿では,Reward-Free Pre-Training (RFPT)タスクにおける強化学習に内在的モチベーション(IM)を利用する場合と,内在的モチベーション(EIM)タスクによる探索(EIM)タスクにおいて生じる2つの基本的な問題点について検討する。 1)RFPTタスクに有効な本質的な目的を設計する方法、及び 2)EIMタスクにおける本質的な目的によってもたらされるバイアスを軽減する方法。既存のIM手法は、静的スキル、限られた状態カバレッジ、RFPTタスクのサンプル非効率、EIMタスクのサブ最適性に悩まされている。これらの問題に対処するため,RFPT と EIM のタスクに対して \emph{Constrained Intrinsic Motivation (CIM) を提案する。 1)RFPT用CIMは、状態エンコーダネットワーク上のアライメント制約を受ける条件状態エントロピーの下限を最大化し、動的かつ多様なスキル発見及び状態カバレッジの最大化を行う。 2) EIMのCIMは,制約付き政策最適化を利用して本質的目標の係数を適応的に調整し,本質的目標からの逸脱を軽減する。各種の MuJoCo ロボット環境において,RFPT の CIM が,スキル多様性,状態カバレッジ,微調整性能の面で,教師なしスキル発見のための 15 の IM 手法を大きく上回っていることを実証的に示す。また,当初から課題報酬が暴露された場合の本質的な報酬の再評価におけるCIMの有効性を示す。私たちのコードはhttps://github.com/x-zheng16/CIMで公開されています。 This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer from static skills, limited state coverage, sample inefficiency in RFPT tasks, and suboptimality in EIM tasks. To tackle these problems, we propose \emph{Constrained Intrinsic Motivation (CIM)} for RFPT and EIM tasks, respectively: 1) CIM for RFPT maximizes the lower bound of the conditional state entropy subject to an alignment constraint on the state encoder network for efficient dynamic and diverse skill discovery and state coverage maximization; 2) CIM for EIM leverages constrained policy optimization to adaptively adjust the coefficient of the intrinsic objective to mitigate the distraction from the intrinsic objective. 
In various MuJoCo robotics environments, we empirically show that CIM for RFPT greatly surpasses fifteen IM methods for unsupervised skill discovery in terms of skill diversity, state coverage, and fine-tuning performance. Additionally, we showcase the effectiveness of CIM for EIM in redeeming intrinsic rewards when task rewards are exposed from the beginning. Our code is available at https://github.com/x-zheng16/CIM.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 室内シーンのテクスチャ塗装改善のためのセマンティックUVマッピング Semantic UV mapping to improve texture inpainting for indoor scenes ( http://arxiv.org/abs/2407.09248v1 ) ライセンス: Link先を確認	Jelle Vermandere, Maarten Bassier, Maarten Vergauwen,	(参考訳) 本研究の目的は, 走査された屋内メッシュの粗い除去後のテクスチャ塗布の改善である。これは、室内のシーンのセマンティック情報を利用して、UVの島々と、壁や床のような異なる構造要素の3D表現をより正確に一致させる新しいUVマッピング前処理によって達成される。セマンティックUVマッピングは、幾何学的特徴だけでなく、現在のテクスチャに由来する視覚的特徴に依存することによって、古典的なUVアンラッピングアルゴリズムを豊かにする。セグメンテーションは紫外線マッピングを改善し、ゆるい物体を除去した後のシーンの3次元幾何学的再構成を同時に単純化する。各セグメント要素は、隣接素子の境界条件を用いて別々に再構成することができる。これは前処理のステップとして実行されるため、将来は幾何学的およびテクスチャ的再構成のための他の特殊な手法を用いて結果をさらに改善することができる。 This work aims to improve texture inpainting after clutter removal in scanned indoor meshes. This is achieved with a new UV mapping pre-processing step which leverages semantic information of indoor scenes to more accurately match the UV islands with the 3D representation of distinct structural elements like walls and floors. Semantic UV Mapping enriches classic UV unwrapping algorithms by not only relying on geometric features but also visual features originating from the present texture. The segmentation improves the UV mapping and simultaneously simplifies the 3D geometric reconstruction of the scene after the removal of loose objects. Each segmented element can be reconstructed separately using the boundary conditions of the adjacent elements. Because this is performed as a pre-processing step, other specialized methods for geometric and texture reconstruction can be used in the future to improve the results even further.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 多エージェントシステムのためのモデルベースRLを用いたGNN GNN with Model-based RL for Multi-agent Systems ( http://arxiv.org/abs/2407.09249v1 ) ライセンス: Link先を確認	Hanxiao Chen,	(参考訳) マルチエージェントシステム(MAS)は、マシンインテリジェンスと高度な応用を探索する上で重要な役割を果たしている。モデルベース強化学習を用いた状態空間グラフニューラルネットワークを用いて,MASのミッション(例えばビリヤード回避,自律走行車)に対処する。具体的には,まずGNNモデルを用いて,複数のエージェントの将来の状態や軌道を予測し,次にCEM(Cross-Entropy Method)最適化モデル予測制御を適用して,エゴエージェント計画動作を支援し,特定のMASタスクの達成に成功した。 Multi-agent systems (MAS) constitute a significant role in exploring machine intelligence and advanced applications. In order to deeply investigate complicated interactions within MAS scenarios, we originally propose "GNN for MBRL" model, which utilizes a state-spaced Graph Neural Networks with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomous Driving Cars). In detail, we firstly used GNN model to predict future states and trajectories of multiple agents, then applied the Cross-Entropy Method (CEM) optimized Model Predictive Control to assist the ego-agent planning actions and successfully accomplish certain MAS tasks.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
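The planning step named in the abstract above, Cross-Entropy Method (CEM) optimized Model Predictive Control, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 1-D dynamics in `rollout_cost` is a hypothetical stand-in for the learned GNN state predictor, and `GOAL`, the population size, and the elite count are assumed values chosen only for illustration.

```python
import random
import statistics

GOAL = 1.0  # hypothetical target position for the toy agent


def rollout_cost(actions):
    """Toy stand-in for a learned dynamics model: a 1-D point moved by each action."""
    position, cost = 0.0, 0.0
    for a in actions:
        position += a
        # Penalize distance from the goal at every step, plus a small action penalty.
        cost += (position - GOAL) ** 2 + 0.01 * a ** 2
    return cost


def cem_plan(cost_fn, horizon, iterations=20, population=64, n_elite=8, seed=0):
    """Cross-Entropy Method over an action sequence of length `horizon`."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate action sequences from the current Gaussian.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(population)
        ]
        # Keep the lowest-cost candidates and refit the Gaussian to them.
        elites = sorted(candidates, key=cost_fn)[:n_elite]
        columns = list(zip(*elites))
        mean = [statistics.fmean(c) for c in columns]
        std = [statistics.pstdev(c) + 1e-6 for c in columns]
    return mean


plan = cem_plan(rollout_cost, horizon=5)
first_action = plan[0]  # in MPC only this action is executed before replanning
```

In a model-based setting, `rollout_cost` would instead unroll the learned predictor over the horizon and score the predicted trajectory of the ego-agent.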
# FedsLLM: ネットワーク上の大規模言語モデルのためのフェデレーション・スプリット学習 FedsLLM: Federated Split Learning for Large Language Models over Communication Networks ( http://arxiv.org/abs/2407.09250v1 ) ライセンス: Link先を確認	Kai Zhao, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang,	(参考訳) 本稿では,無線通信ネットワークに大規模言語モデルをデプロイする際の課題に対処するため,低ランク適応技術(LoRA)と分割学習フレームワークを併用し,大規模言語モデル(FedsLLM)フレームワークに対するフェデレーション分割学習を提案する。本稿ではLoRA技術を用いて,ネットワークをクライアントサブネットワークとサーバサブネットワークに分割することで処理負荷を削減する手法を提案する。フェデレーションサーバを活用して、クライアントモデルを集約し、更新する。トレーニングデータは、クライアントとメインサーバおよびフェデレーションサーバ間の無線ネットワークを介して送信されるので、学習精度と通信帯域の割り当てによりトレーニング遅延を決定する。本稿では,計算と通信の最適化を統合することにより,学習遅延の最小化をモデル化し,最適化問題を凸問題に単純化し,最適解を求める。さらに、この問題の正確な解を記述した補題も提示する。シミュレーションの結果、最適化アルゴリズムは最適化されていないシナリオと比較して平均47.63%の遅延を減少させることが示された。 Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into client subnetworks and server subnetworks. It leverages a federated server to aggregate and update client models. As the training data are transmitted through a wireless network between clients and both main and federated servers, the training delay is determined by the learning accuracy and the allocation of communication bandwidth. This paper models the minimization of the training delay by integrating computation and communication optimization, simplifying the optimization problem into a convex problem to find the optimal solution. Additionally, it presents a lemma that describes the precise solutions to this problem. Simulation results demonstrate that the proposed optimization algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
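The abstract above relies on low-rank adaptation (LoRA) to cut the processing load. The sketch below shows the standard LoRA forward pass, y = Wx + (alpha / r) * B(Ax), in plain Python. It is illustrative only: the function names, matrix sizes, and `alpha` are assumptions, and nothing here reproduces the FedsLLM client/server split or its delay optimization.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Frozen weight W plus a trainable rank-r update B @ A, scaled by alpha / r."""
    r = len(A)  # A has shape (r, d_in), B has shape (d_out, r)
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]


def trainable_params(d_out, d_in, r):
    """Number of parameters in the LoRA factors A and B."""
    return r * (d_in + d_out)


# Hypothetical 3x4 layer with a rank-2 adapter; B starts at zero,
# so the adapted layer initially matches the frozen layer.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
A = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
B = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
x = [1, 2, 3, 4]
y = lora_forward(W, A, B, x, alpha=16)
```

For a square 4096-wide layer with r = 8, the adapter holds 8 * (4096 + 4096) = 65,536 trainable values against 16,777,216 in the full matrix, which is the kind of reduction that makes adapter updates cheaper to train and transmit.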
# 多レベルLp攻撃に対する深層防御 Deep Adversarial Defense Against Multilevel-Lp Attacks ( http://arxiv.org/abs/2407.09251v1 ) ライセンス: Link先を確認	Ren Wang, Yuxuan Li, Alfred Hero,	(参考訳) ディープラーニングモデルは、特に攻撃戦略が高度化するにつれて、敵の攻撃に対して重大な脆弱性を示す。従来の敵対的トレーニング(AT)技術はレジリエンスを提供するが、多くの場合、他のタイプで失敗する可能性のある、単一のタイプの攻撃、例えば$\ell_\infty$-norm攻撃に対する防御に重点を置いている。本稿では,複数の$\ell_p$-norm攻撃に対するディープラーニングモデルのレジリエンスを高めることを目的とした,EMRC(Efficient Robust Mode Connectivity)法という,計算効率のよいマルチレベル$\ell_p$ディフェンスを提案する。連続最適化で用いられる解析的連続アプローチと同様に、この手法は2つの$p$固有の逆最適モデル、$\ell_1$-と$\ell_\infty$-norm ATの解をブレンドし、$p$の範囲で良好な逆堅牢性を提供する。我々は,CIFAR-10, CIFAR-100 / PreResNet110, WideResNet, ViT-Baseを含むデータセット/アーキテクチャに対して,AT-$\ell_\infty$, E-AT, MSDと比較して,本手法が様々な攻撃に対して優れていることを示す実験を行った。 Deep learning models have shown considerable vulnerability to adversarial attacks, particularly as attacker strategies become more sophisticated. While traditional adversarial training (AT) techniques offer some resilience, they often focus on defending against a single type of attack, e.g., the $\ell_\infty$-norm attack, which can fail for other types. This paper introduces a computationally efficient multilevel $\ell_p$ defense, called the Efficient Robust Mode Connectivity (EMRC) method, which aims to enhance a deep learning model's resilience against multiple $\ell_p$-norm attacks. Similar to analytical continuation approaches used in continuous optimization, the method blends two $p$-specific adversarially optimal models, the $\ell_1$- and $\ell_\infty$-norm AT solutions, to provide good adversarial robustness for a range of $p$. We present experiments demonstrating that our approach performs better on various attacks as compared to AT-$\ell_\infty$, E-AT, and MSD, for datasets/architectures including: CIFAR-10, CIFAR-100 / PreResNet110, WideResNet, ViT-Base.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
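The abstract above describes blending two $p$-specific adversarially trained models along a connecting path. The abstract does not state the path parameterization, so the sketch below assumes a quadratic Bezier curve, a common choice in the mode-connectivity literature. The weight vectors and the control point are hypothetical placeholders, not trained models.

```python
def bezier_point(theta_start, theta_ctrl, theta_end, t):
    """Quadratic Bezier curve in weight space, evaluated at t in [0, 1]."""
    return [
        (1 - t) ** 2 * a + 2 * t * (1 - t) * c + t ** 2 * b
        for a, c, b in zip(theta_start, theta_ctrl, theta_end)
    ]


# Hypothetical flattened weights of an l1-trained and an l_inf-trained model.
theta_l1 = [0.0, 2.0, -1.0]
theta_linf = [4.0, 0.0, 1.0]
# In mode connectivity the control point is the trainable part of the path;
# here it is simply fixed by hand.
theta_ctrl = [2.0, 2.0, 2.0]

midpoint_model = bezier_point(theta_l1, theta_ctrl, theta_linf, 0.5)
```

The endpoints t = 0 and t = 1 recover the two original models, and intermediate values of t give candidate models whose robustness across a range of $p$ could then be evaluated.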
# RAGにおける効率的なアンサー生成のためのコンテキスト埋め込み Context Embeddings for Efficient Answer Generation in RAG ( http://arxiv.org/abs/2407.09252v1 ) ライセンス: Link先を確認	David Rau, Shuai Wang, Hervé Déjean, Stéphane Clinchant,	(参考訳) Retrieval-Augmented Generation (RAG) は、入力を外部情報で拡張することで、LLMの限られた知識を克服することができる。結果として、モデルへのコンテキスト入力はずっと長くなり、ユーザが答えを待つ時間に直接変換するデコード時間を遅くする。この課題に対処するために、COCOMという効果的なコンテキスト圧縮手法を提案し、長いコンテキストを少数のコンテキスト埋め込みに減らし、生成時間を大きなマージンで高速化する。提案手法では,デコード時間と解答品質の異なる圧縮速度が可能である。以前の方法と比較すると、COCOMは複数のコンテキストをより効果的に扱えるようになり、長い入力の復号時間を大幅に短縮する。提案手法では,最大5.69$\times$の高速化を実現しつつ,既存の効率的な文脈圧縮手法と比較して高い性能を実現している。 Retrieval-Augmented Generation (RAG) allows overcoming the limited knowledge of LLMs by extending the input with external information. As a consequence, the contextual inputs to the model become much longer which slows down decoding time directly translating to the time a user has to wait for an answer. We address this challenge by presenting COCOM, an effective context compression method, reducing long contexts to only a handful of Context Embeddings speeding up the generation time by a large margin. Our method allows for different compression rates trading off decoding time for answer quality. Compared to earlier methods, COCOM allows for handling multiple contexts more effectively, significantly reducing decoding time for long inputs. Our method demonstrates a speed-up of up to 5.69 $\times$ while achieving higher performance compared to existing efficient context compression methods.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 縮退近傍に励起されたドープフォトニック結晶繊維の波長変換 Tunable frequency conversion in doped photonic crystal fiber pumped near degeneracy ( http://arxiv.org/abs/2407.09266v1 ) ライセンス: Link先を確認	Leah R Murphy, Mateusz J Olszewski, Petros Androvitsaneas, Miguel Alvarez Perez, Will A M Smith, Anthony J Bennett, Peter J Mosley, Alex O C Davis,	(参考訳) 将来の量子ネットワークは、異なる波長帯域間で光的に符号化された量子情報をコヒーレントに転送する能力に依存する。光ファイバーにおけるブラッグ散乱4波混合は、これを実現するには有望な方法であるが、正確な分散制御と信号、ターゲット、ポンプ波長でのブロードバンド伝送を必要とする。本稿では、InAs量子ドットに基づく効率的な単一光子源の放出範囲内で、1550nm、Cバンド、920nmの群速度マッチングを特徴とするゲルマニウムドープコアを有するフォトニック結晶ファイバを紹介する。長波長でも低い色のウォークオフと優れた光学誘導により、この繊維は920nm付近の波長間のナノメートルスケールの周波数シフトを最大79.4\%内部変換効率で達成し、異種InAsドットをインタフェース化することができる。また、この周波数変換をカスケードして、通信波長から周波数コムを生成する方法も示す。最後に、ファイバを用いて、918nm付近の弱古典信号からCバンドへの可変周波数変換を示す。 Future quantum networks will rely on the ability to coherently transfer optically encoded quantum information between different wavelength bands. Bragg-scattering four-wave mixing in optical fiber is a promising route to achieving this, but requires fibers with precise dispersion control and broadband transmission at signal, target and pump wavelengths. Here we introduce a photonic crystal fiber with a germanium-doped core featuring group velocity matching at 1550 nm, the telecoms C-band, and 920 nm, within the emission range of efficient single photon sources based on InAs quantum dots. With low chromatic walk-off and good optical guidance even at long wavelengths, large lengths of this fiber are used to achieve nanometer-scale frequency shifts between wavelengths around 920 nm with up to 79.4\% internal conversion efficiency, allowing dissimilar InAs dots to be interfaced. We also show how cascading this frequency conversion can be used to generate a frequency comb away from telecoms wavelengths. Finally, we use the fiber to demonstrate tunable frequency conversion of weak classical signals around 918 nm to the telecoms C-band.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 医用画像復元用領域注意変換器 Region Attention Transformer for Medical Image Restoration ( http://arxiv.org/abs/2407.09268v1 ) ライセンス: Link先を確認	Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Zhou, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu,	(参考訳) トランスフォーマーを用いた手法は, 空間次元におけるマルチヘッド自己注意機構(MSA)による医用画像復元において, 顕著な結果を示した。しかし、既存のトランスフォーマーの大多数は、固定的で粗く分割された領域（例えば画像全体や固定パッチ）内で注意を計算しており、無関係な領域からの干渉と連続した画像内容の断片化をもたらす。これらの課題を克服するために,領域ベースマルチヘッド自己注意機構(R-MSA)を利用した新しい領域注意変換器(RAT)を導入する。 R-MSAは、ロバストなSegment Anything Model(SAM)を使用して、入力画像を非重複セマンティック領域に動的に分割し、これらの領域内で自己注意を行う。この領域分割はより柔軟で解釈可能であり、類似のセマンティック領域のピクセルだけが互いに補完し合い、無関係な領域からの干渉を排除できる。さらに,高難度領域の回復に適応的に焦点を合わせるために,焦点領域の損失を導入する。 PET画像合成,CT画像デノイング,病理画像超解像など,様々な医用画像復元作業におけるRATの有効性を広範囲にわたる実験により実証した。コードは https://github.com/Yaziwel/Region-Attention-Transformer-for-Medical-Image-Restoration.git で公開されている。 Transformer-based methods have demonstrated impressive results in medical image restoration, attributed to the multi-head self-attention (MSA) mechanism in the spatial dimension. However, the majority of existing Transformers conduct attention within fixed and coarsely partitioned regions (e.g., the entire image or fixed patches), resulting in interference from irrelevant regions and fragmentation of continuous image content. To overcome these challenges, we introduce a novel Region Attention Transformer (RAT) that utilizes a region-based multi-head self-attention mechanism (R-MSA). The R-MSA dynamically partitions the input image into non-overlapping semantic regions using the robust Segment Anything Model (SAM) and then performs self-attention within these regions. This region partitioning is more flexible and interpretable, ensuring that only pixels from similar semantic regions complement each other, thereby eliminating interference from irrelevant regions. Moreover, we introduce a focal region loss to guide our model to adaptively focus on recovering high-difficulty regions. 
Extensive experiments demonstrate the effectiveness of RAT in various medical image restoration tasks, including PET image synthesis, CT image denoising, and pathological image super-resolution. Code is available at https://github.com/Yaziwel/Region-Attention-Transformer-for-Medical-Image-Restoration.git.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# 多モードおよび絡み合った光場における実験的光子付加とサブトラクション Experimental photon addition and subtraction in multi-mode and entangled optical fields ( http://arxiv.org/abs/2407.09269v1 ) ライセンス: Link先を確認	Kishore Thapliyal, Jan Peřina Jr., Ondřej Haderka, Václav Michálek, Radek Machulka,	(参考訳) 多モード熱場, サブポアソン場, およびツインビームへの光子付加と減算を, 1つの実験装置を用いて相互に比較した。空間分解能の高い強化CCDカメラによって検出された密な空間相関を持つツインビームを用いて初期フィールドを調製する。最大3個の光子が加えられ、非古典的および非ガウス的状態に到達する。光子置換熱状態のみが古典的のままである。一般に、実験的光子付加状態は、同等の光子置換状態よりも非古典性と非ガウス性を示す。光子をツインビームで加減すると、両方のプロセスはツインビームの光子対によって得られる状態の同等の性質をもたらす。 Multiple photon addition and subtraction applied to multi-mode thermal and sub-Poissonian fields as well as twin beams is mutually compared using one experimental setup. Twin beams with tight spatial correlations detected by an intensified CCD camera with high spatial resolution are used to prepare the initial fields. Up to three photons are added or subtracted to arrive at the nonclassical and non-Gaussian states. Only the photon-subtracted thermal states remain classical. In general, the experimental photon-added states exhibit greater nonclassicality and non-Gaussianity than the comparable photon-subtracted states. Once photons are added or subtracted in twin beams, both processes result in comparable properties of the obtained states owing to twin-beam photon pairing.	翻訳日:2024-07-15 23:28:21 公開日:2024-07-12
# iNeMo:ロバストクラス増分学習のためのインクリメンタルニューラルメッシュモデル iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning ( http://arxiv.org/abs/2407.09271v1 ) ライセンス: Link先を確認	Tom Fischer, Yaoyao Liu, Artur Jesslen, Noor Ahmed, Prakhar Kaushik, Angtian Wang, Alan Yuille, Adam Kortylewski, Eddy Ilg,	(参考訳) 人間の性質と異なり、視覚タスクでは、ディープラーニングモデルを最初に一度だけ、固定データセットで訓練するのが今なお一般的である。最近、さまざまなアプローチが連続的なデータストリームの処理に対処している。しかし、これらの手法をアウト・オブ・ディストリビューション(OOD)のシナリオに拡張することは、効果的に研究されていない。一方、近年、非連続ニューラルメッシュモデルは、そのようなOODシナリオを一般化する上で、強い性能を示すことが示されている。この決定的特性を連続的な学習環境で活用するために、時間とともに新しいメッシュで拡張可能なインクリメンタルニューラルメッシュモデルを提案する。さらに,今後の未確認クラスの特徴空間を予め割り当てる潜在空間初期化戦略と,各潜在空間領域に各クラスの特徴を連続的に保持させる位置正規化項を提案する。我々はPascal3DおよびObjectNet3Dデータセットの広範な実験により,本手法の有効性を実証し,本手法が分類のベースラインをドメイン内では2～6 %,OOD環境では6～50 %上回ることを示す。我々の研究は、ポーズ推定のための最初の漸進的な学習手法も提示している。私たちのコードとモデルは https://github.com/Fischer-Tom/iNeMo で確認できます。 Different from human nature, it is still common practice today for vision tasks to train deep learning models only initially and on fixed datasets. A variety of approaches have recently addressed handling continual data streams. However, extending these methods to manage out-of-distribution (OOD) scenarios has not effectively been investigated. On the other hand, it has recently been shown that non-continual neural mesh models exhibit strong performance in generalizing to such OOD scenarios. To leverage this decisive property in a continual learning setting, we propose incremental neural mesh models that can be extended with new meshes over time. In addition, we present a latent space initialization strategy that enables us to allocate feature space for future unseen classes in advance and a positional regularization term that forces the features of the different classes to consistently stay in respective latent space regions. 
We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets and show that our approach outperforms the baselines for classification by $2-6\%$ in the in-domain and by $6-50\%$ in the OOD setting. Our work also presents the first incremental learning approach for pose estimation. Our code and model can be found at https://github.com/Fischer-Tom/iNeMo.	翻訳日:2024-07-15 23:28:20 公開日:2024-07-12
# 大規模マルチモーダルモデルHelixProtXを用いたタンパク質生成のための配列・構造・記述の統一 Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX ( http://arxiv.org/abs/2407.09274v1 ) ライセンス: Link先を確認	Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang,	(参考訳) タンパク質は生物学的システムの基本的な構成要素であり、配列、構造、テキストの記述を含む様々なモダリティを通して表現することができる。タンパク質研究のための深層学習と科学的大規模言語モデル(LLMs)の進歩にもかかわらず、現在の方法論は主に限定的なタスクに焦点を当てている。これらのアプローチはマルチモーダルタンパク質データの理解と生成を制限する。対照的に、大規模なマルチモーダルモデルでは、テキスト、画像、ビデオなどの任意のコンテンツを生成する可能性を示しており、それによってさまざまなドメインにわたるユーザーインタラクションが強化されている。これらのマルチモーダルモデル技術をタンパク質研究に統合することは、タンパク質の研究方法を変える可能性を秘めている。この目的のために我々は,タンパク質研究の包括的ソリューションを提供することを目的として,大規模マルチモーダルモデルに基づくシステムであるHelixProtXを紹介した。既存の方法とは異なり、任意の入力タンパク質モダリティを任意の所望のタンパク質モダリティに変換することができる。実験結果は、アミノ酸配列から機能記述を生成するだけでなく、テキスト記述からタンパク質配列や構造を設計するといった重要なタスクの実行においても、HelixProtXの高度な能力を裏付けている。予備的な発見は、HelixProtXが、既存の最先端モデルよりも優れたタンパク質関連タスクを一貫して達成していることを示している。マルチモーダルな大型モデルをタンパク質研究に統合することで、HelixProtXはタンパク質生物学を理解するための新たな道を開き、科学的な発見を加速する。 Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have demonstrated potential capabilities in generating any-to-any content like text, images, and videos, thus enriching user interactions across various domains. 
Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon the large multimodal model, aiming to offer a comprehensive solution to protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows for the transformation of any input protein modality into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating multimodal large models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# H2O-Danube3技術報告 H2O-Danube3 Technical Report ( http://arxiv.org/abs/2407.09276v1 ) ライセンス: Link先を確認	Pascal Pfeiffer, Philipp Singer, Yauhen Babakhin, Gabor Fodor, Nischay Dhankhar, Sri Satish Ambati,	(参考訳) 6Tトークンで訓練されたH2O-Danube3-4Bと、4Tトークンで訓練されたH2O-Danube3-500Mからなる一連の小言語モデルであるH2O-Danube3を提案する。我々のモデルは、チャットバージョンの最終教師ありチューニングの前に、主に英語のトークンを3段階に分けた高品質なWebデータに基づいて事前訓練されている。これらのモデルは、さまざまな学術的、チャット、微調整のベンチマークで非常に競争力のある指標を示している。コンパクトなアーキテクチャのおかげで、H2O-Danube3は最新のスマートフォン上で効率的に動作し、モバイル端末でもローカル推論と高速な処理を可能にする。私たちは、すべてのモデルをApache 2.0ライセンスの下で公開して、LLMをさらに経済的に幅広い聴衆に民主化させています。 We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high quality Web data consisting of primarily English tokens in three stages with different data mixes before final supervised tuning for chat version. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be efficiently run on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under Apache 2.0 license further democratizing LLMs to a wider audience economically.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# 軌道のコヒーレントアンサンブルによる波動関数の解釈 Interpretation of wave function by coherent ensembles of trajectories ( http://arxiv.org/abs/2407.09277v1 ) ライセンス: Link先を確認	Vladimir V. Kisil,	(参考訳) de Broglie, Schrödinger, Dirac, Feynman のアイデアを再利用し、量子力学における波動関数のアンサンブル解釈を再検討する。この目的のために、時空における量子軌道の集合のコヒーレンス(自動一致)を導入する。コヒーレンス条件は、ファインマン経路積分法の基礎となる古典的な作用に比例する位相を説明できる。したがって、我々の解釈は波動力学のよく知られた、テストされた概念と方法に基づいている。他のアンサンブル解釈と同様に、我々の手法は測定過程における波動関数の崩壊に関連する全ての問題やパラドックスを避けることができる。もう一つの結果は、特定のqビットが波動関数全体を表すと仮定した場合、量子計算や量子暗号法は機能しないということである。 We re-use some original ideas of de Broglie, Schrödinger, Dirac and Feynman to revise the ensemble interpretation of wave function in quantum mechanics. To this end we introduce coherence (auto-concordance) of ensembles of quantum trajectories in the space-time. The coherence condition accounts for phases proportional to the classical action, which are at the foundation of the Feynman path integral technique. Therefore, our interpretation is entirely based on well-known and tested concepts and methods of wave mechanics. Similarly to other ensemble interpretations, our approach allows us to avoid all problems and paradoxes related to wave function collapse during a measurement process. Another consequence is that no quantum computation or quantum cryptography method will ever work if it assumes that a particular q-bit represents the entire wave function.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# パラメトリックダウンコンバージョンにおける高次元最大絡み合った光子対 High-dimensional maximally entangled photon pairs in parametric down-conversion ( http://arxiv.org/abs/2407.09280v1 ) ライセンス: Link先を確認	Richard Bernecker, Baghdasar Baghdasaryan, Stephan Fritzsche,	(参考訳) 自発パラメトリックダウンコンバージョンから生成される光子対は、絡み合った2部フォトニックシステムを実現するための確立された方法である。軌道角運動量(OAM)を持つラゲール・ガウスモードは、高次元の絡み合った量子状態を実験的に設計するために一般的に利用される。次元 d>2 のヒルベルト空間の場合、最大絡み合った状態(MES)は量子通信プロトコルの容量とセキュリティを改善するのに役立つ。しかし、無限 OAM 基底のよく定義された高次元部分空間における MES の直接生成は依然として挑戦である。ここでは, ポンプビームの空間分布と結晶の非線形分布を利用して, 部分空間内でのOAMモードの追加空間フィルタリングを行なわずにMESを生成する方法について定式化する。我々は、最大絡み合ったキュートリット (d=3) とキュークイント (d=5) を用いて、我々のアプローチを説明する。 Photon pairs generated from spontaneous parametric down-conversion are a well-established method to realize entangled bipartite photonic systems. Laguerre-Gaussian modes, which carry orbital angular momentum (OAM), are commonly exploited to engineer high-dimensional entangled quantum states experimentally. For Hilbert spaces with dimension d>2, maximally entangled states (MES) help to improve the capacity and security of quantum communication protocols, among several other promising features. However, the direct generation of MES in well-defined high-dimensional subspaces of the infinite OAM basis has remained a challenge. Here, we formalize how the spatial distribution of the pump beam and the nonlinear profile of the crystal can be utilized to generate MES without additional spatial filtering of OAM modes within a subspace. We illustrate our approach with maximally entangled qutrits (d=3) and ququints (d=5).	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# 人間の行動決定の予測と理解:大規模言語モデルと認知事例に基づく学習から Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning ( http://arxiv.org/abs/2407.09281v1 ) ライセンス: Link先を確認	Thuy Ngoc Nguyen, Kasturi Jamale, Cleotilde Gonzalez,	(参考訳) 大きな言語モデル(LLM)は、言語翻訳から複雑な推論まで、様々なタスクでその能力を実証している。人間の行動とバイアスの理解と予測は、人工知能(AI)支援システムに有用な支援を提供する上で不可欠である。本稿では,LLMの推論と生成能力を活用して,2つの逐次意思決定タスクにおける人間の行動を予測することによって,このギャップを解消する。これらのタスクには、搾取行動と探索行動のバランスをとることと、実際の意思決定プロセスのシミュレーションに不可欠な遅延フィードバックを扱うことが含まれる。我々は,LLMの性能を,人間の経験的意思決定を模倣した認知的インスタンスベース学習(IBL)モデルと比較した。以上の結果から,LLMはフィードバックを迅速に取り入れて予測精度を向上させることが示唆された。対照的に、認知的IBLモデルは、人間の探索行動をよりよく説明し、損失回避バイアスを効果的に捉えている。その結果,LLMを認知的アーキテクチャに統合することで,複雑な人間の意思決定パターンのモデリングと理解が促進される可能性が示唆された。 Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI) assisted systems to provide useful assistance, yet it remains an open question whether these models can achieve this. This paper addresses this gap by leveraging the reasoning and generative capabilities of the LLMs to predict human behavior in two sequential decision-making tasks. These tasks involve balancing between exploitative and exploratory actions and handling delayed feedback, both essential for simulating real-life decision processes. We compare the performance of LLMs with a cognitive instance-based learning (IBL) model, which imitates human experiential decision-making. Our findings indicate that LLMs excel at rapidly incorporating feedback to enhance prediction accuracy. In contrast, the cognitive IBL model better accounts for human exploratory behaviors and effectively captures loss aversion bias, i.e., the tendency to choose a sub-optimal goal with fewer step-cost penalties rather than exploring to find the optimal choice, even with limited experience. 
The results highlight the benefits of integrating LLMs with cognitive architectures, suggesting that this synergy could enhance the modeling and understanding of complex human decision-making patterns.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# DAHRS:ダイバージェンスを意識した幻覚修復SRLプロジェクション DAHRS: Divergence-Aware Hallucination-Remediated SRL Projection ( http://arxiv.org/abs/2407.09283v1 ) ライセンス: Link先を確認	Sangpil Youm, Brodie Mather, Chathuri Jayaweera, Juliana Prada, Bonnie Dorr,	(参考訳) セマンティックロールラベリング(SRL)は、機械翻訳、質問応答、要約、スタンス/ビリーフ検出など、多くの下流アプリケーションを強化している。しかし、多言語SRLモデルの構築は、複数の言語に対する意味的注釈付きコーパスが不足しているため困難である。さらに、大規模言語モデル(LLM)に基づく最先端のSRLプロジェクション(XSRL)は、誤ったロールラベルを多数含む出力を生成する。このような幻覚の修復は、LLMの説明可能性の欠如のため、容易ではない。幻覚的役割ラベルは、初期アライメントに干渉する自然発生の発散タイプと関連していることを示す。言語的にインフォームドされたアライメントの修復と,greedy First-Come First-Assign (FCFA) SRL プロジェクションを併用し,Divergence-Aware Hallucination-Remediated SRL projection (DAHRS)を実装した。 DAHRSは、追加のトランスフォーマーベースの機械を使わずにSRLプロジェクションの精度を改善し、XSRLを人間と自動の比較の両方で打ち破り、主辞(ヘッドワード)を超えてフレーズレベルのSRLプロジェクション(例:EN-FR、EN-ES)に対応する。CoNLL-2009を正解データとして用い、単語レベルF1でXSRLを上回る: 87.6% 対 77.3% (EN-FR)、89.0% 対 82.7% (EN-ES)。ヒトによる句レベルの評価は89.1%(EN-FR)と91.0%(EN-ES)である。また、他の言語ペア(例えば、英語-タガログ)にアプローチを適用するために、分岐計量を定義する。 Semantic role labeling (SRL) enriches many downstream applications, e.g., machine translation, question answering, summarization, and stance/belief detection. However, building multilingual SRL models is challenging due to the scarcity of semantically annotated corpora for multiple languages. Moreover, state-of-the-art SRL projection (XSRL) based on large language models (LLMs) yields output that is riddled with spurious role labels. Remediation of such hallucinations is not straightforward due to the lack of explainability of LLMs. We show that hallucinated role labels are related to naturally occurring divergence types that interfere with initial alignments. We implement Divergence-Aware Hallucination-Remediated SRL projection (DAHRS), leveraging linguistically-informed alignment remediation followed by greedy First-Come First-Assign (FCFA) SRL projection. 
DAHRS improves the accuracy of SRL projection without additional transformer-based machinery, beating XSRL in both human and automatic comparisons, and advancing beyond headwords to accommodate phrase-level SRL projection (e.g., EN-FR, EN-ES). Using CoNLL-2009 as our ground truth, we achieve a higher word-level F1 over XSRL: 87.6% vs. 77.3% (EN-FR) and 89.0% vs. 82.7% (EN-ES). Human phrase-level assessments yield 89.1% (EN-FR) and 91.0% (EN-ES). We also define a divergence metric to adapt our approach to other language pairs (e.g., English-Tagalog).	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
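
The greedy First-Come First-Assign step named in the DAHRS abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so the data layout, the function name `project_roles_fcfa`, and the rule "copy each source role to the first aligned target token that is still unlabeled" are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def project_roles_fcfa(
    source_roles: List[Tuple[int, str]],
    alignments: Dict[int, List[int]],
) -> Dict[int, str]:
    """Greedily project role labels from source tokens onto target tokens.

    source_roles: (source_token_index, role_label) pairs in source order.
    alignments: source_token_index -> candidate target token indices
        (hypothetical aligner output, assumed already remediated).
    Returns a mapping target_token_index -> role_label.
    """
    assigned: Dict[int, str] = {}
    for src_idx, role in source_roles:
        for tgt_idx in alignments.get(src_idx, []):
            if tgt_idx not in assigned:  # first come, first assigned
                assigned[tgt_idx] = role
                break  # each source role is projected at most once
    return assigned


if __name__ == "__main__":
    # Toy example: source token 3 is aligned to targets 1 and 2, but
    # target 1 is already taken by the predicate, so it falls through to 2.
    roles = [(0, "A0"), (1, "V"), (3, "A1")]
    links = {0: [0], 1: [1], 3: [1, 2]}
    print(project_roles_fcfa(roles, links))
```

Processing roles in source order keeps the procedure deterministic and cheap, at the cost of never revisiting an earlier assignment.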
# MetaFood CVPR 2024 : 物理的インフォームド3D食品再構成への挑戦:方法と結果 MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results ( http://arxiv.org/abs/2407.09285v1 ) ライセンス: Link先を確認	Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang,	(参考訳) 栄養・食事モニタリングにおけるコンピュータビジョンの応用への関心が高まり、食品の高度な3D再構成技術が開発されるようになった。しかし、高品質なデータの不足と産学連携の制限により、この分野の進歩は制限されている。近年の3Dリコンストラクションの進歩を踏まえ,MetaFoodワークショップと物理インフォームド3Dフードリコンストラクションの課題を開催する。本課題は,可視のチェッカーボードをサイズ基準として,2次元画像から体積の正確な食品の3次元モデルを再構築することに焦点を当てる。参加者は、難易度(易・中・難)の異なる20品目の食品について3Dモデルを再構築した。易レベルでは200枚、中レベルでは30枚、難レベルでは1枚のみの画像が提供される。合計16チームが最終テストフェーズで結果を提出した。この課題で開発されたソリューションは、3D食品の復元において有望な成果を達成し、食事評価と栄養モニタリングのための分量推定の改善に有意な可能性を秘めている。このワークショップの課題とデータセットへのアクセスに関する詳細は、https://sites.google.com/view/cvpr-metafood-2024にある。 The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop and its challenge for Physically Informed 3D Food Reconstruction. This challenge focuses on reconstructing volume-accurate 3D models of food items from 2D images, using a visible checkerboard as a size reference. Participants were tasked with reconstructing 3D models for 20 selected food items of varying difficulty levels: easy, medium, and hard. The easy level provides 200 images, the medium level provides 30 images, and the hard level provides only 1 image for reconstruction. In total, 16 teams submitted results in the final testing phase. 
The solutions developed in this challenge achieved promising results in 3D food reconstruction, with significant potential for improving portion estimation for dietary assessment and nutritional monitoring. More details about this workshop challenge and access to the dataset can be found at https://sites.google.com/view/cvpr-metafood-2024.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
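
Using a checkerboard as a size reference comes down to a unit conversion, sketched below. This only illustrates why a known square size makes a reconstructed volume metric; the function and parameter names are hypothetical and do not describe any participating team's pipeline.

```python
def metric_volume(
    model_volume: float,
    square_size_model: float,
    square_size_real_cm: float,
) -> float:
    """Convert a volume in arbitrary model units to cubic centimetres.

    model_volume: mesh volume in the reconstruction's own units (cubed).
    square_size_model: edge length of one checkerboard square as measured
        in the reconstructed model (same arbitrary units).
    square_size_real_cm: true edge length of that square in centimetres.
    """
    scale = square_size_real_cm / square_size_model  # cm per model unit
    return model_volume * scale ** 3  # volume scales with the cube of length


if __name__ == "__main__":
    # Made-up numbers: the square measures 0.5 units in the model, 1 cm in reality.
    print(metric_volume(model_volume=2.0, square_size_model=0.5, square_size_real_cm=1.0))
```

Because the scale factor is cubed, a small error in the measured square size is amplified roughly threefold in the volume estimate.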
# 仮想環境における目標条件付き強化学習による指示追従 Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments ( http://arxiv.org/abs/2407.09287v1 ) ライセンス: Link先を確認	Zoya Volovikova, Alexey Skrynnik, Petr Kuderov, Aleksandr I. Panov,	(参考訳) 本研究では,人工知能エージェントが仮想環境内で複雑な言語命令を実行できるようにするという課題に対処する。本フレームワークでは,これらの命令は複雑な言語構造と複数の相互依存的タスクを伴い,望まれる結果を達成するためにうまくナビゲートする必要があると仮定する。これらの複雑さを効果的に管理するために,大規模言語モデルの深い言語理解と強化学習エージェントの適応的行動実行能力を組み合わせた階層型フレームワークを提案する。言語モジュール(LLMに基づく)は、言語命令をハイレベルなアクションプランに変換し、事前訓練された強化学習エージェントによって実行される。我々は2つの異なる環境で提案手法の有効性を実証した。IGLUではエージェントが構造物を構築するように指示され、Crafterではエージェントがタスクを実行し、言語コマンドに従って周辺環境のオブジェクトと対話する。 In this study, we address the issue of enabling an artificial intelligence agent to execute complex language instructions within virtual environments. In our framework, we assume that these instructions involve intricate linguistic structures and multiple interdependent tasks that must be navigated successfully to achieve the desired outcomes. To effectively manage these complexities, we propose a hierarchical framework that combines the deep language comprehension of large language models with the adaptive action-execution capabilities of reinforcement learning agents. The language module (based on LLM) translates the language instruction into a high-level action plan, which is then executed by a pre-trained reinforcement learning agent. We have demonstrated the effectiveness of our approach in two different environments: in IGLU, where agents are instructed to build structures, and in Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# WSESeg:対話型セグメンテーションのためのベースライン付き冬季スポーツ機器セグメンテーションデータセットの導入 WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation ( http://arxiv.org/abs/2407.09288v1 ) ライセンス: Link先を確認	Robin Schön, Daniel Kienzle, Rainer Lienhart,	(参考訳) 本稿では,冬期スポーツ機器の10種類のカテゴリを対象とした,インスタンスセグメンテーションマスクを含む新しいデータセット,WSESeg(Winter Sports Equipment Segmentation)を提案する。さらに、これらのデータセット上でインタラクティブなセグメンテーション実験を行い、より効率的なラベリングの可能性を探る。 SAMモデルとHQ-SAMモデルは、ユーザガイドセグメンテーションを行うための基礎モデルとして概念化されている。彼らの主張する一般化能力を測定するために、WSESegでそれらを評価します。インタラクティブなセグメンテーションは、テスト期間中に容易に活用可能な真理データを作成する利点を提供するので、モデルを明示的に微調整することなく、改善のための可能性を探るため、様々なオンライン適応手法をテストする。実験の結果,適応手法がフェールレート (FR) とNoC (Number of Clicks) の指標を大幅に削減し,対話的なセグメンテーション結果の高速化が図られた。 In this paper we introduce a new dataset containing instance segmentation masks for ten different categories of winter sports equipment, called WSESeg (Winter Sports Equipment Segmentation). Furthermore, we carry out interactive segmentation experiments on said dataset to explore possibilities for efficient further labeling. The SAM and HQ-SAM models are conceptualized as foundation models for performing user guided segmentation. In order to measure their claimed generalization capability we evaluate them on WSESeg. Since interactive segmentation offers the benefit of creating easily exploitable ground truth data during test-time, we are going to test various online adaptation methods for the purpose of exploring potentials for improvements without having to fine-tune the models explicitly. Our experiments show that our adaptation methods drastically reduce the Failure Rate (FR) and Number of Clicks (NoC) metrics, which generally leads faster to better interactive segmentation results.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# CEIPA:大規模言語モデルにおける反実仮想的に説明可能なインクリメンタル・プロンプト・アタック解析 CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models ( http://arxiv.org/abs/2407.09292v1 ) ライセンス: Link先を確認	Dong Shu, Mingyu Jin, Tianle Chen, Chong Zhang, Yongfeng Zhang,	(参考訳) 本研究は, プロンプト攻撃の説明可能な分析を通じて脆弱性を特定・緩和することにより, GPT-4 や LLaMA-2 などの大規模言語モデル (LLM) における安全性とプライバシ対策を強化する必要性を明らかにする。本稿では,攻撃効果を定量的に測定し,それらのモデルに埋め込まれた防御機構を探索するために,特定の方法でプロンプトを誘導する新しい手法であるCounterfactual Explainable Incremental Prompt Attack (CEIPA)を提案する。本手法は,LLMによる有害応答の生成の背景にある要因を,段階的な反実仮想手法によって解明できる点に特徴がある。プロンプト修正プロセスを4つの段階(単語、文、文字、文字と単語の組み合わせ)に整理することで、LLM固有の脆弱性の徹底的な検証を容易にする。本研究から得られた知見は,反実的説明の洞察を提供するだけでなく,我々の枠組みが攻撃プロンプトの有効性を著しく向上させることを示すものである。 This study sheds light on the imperative need to bolster safety and privacy measures in large language models (LLMs), such as GPT-4 and LLaMA-2, by identifying and mitigating their vulnerabilities through explainable analysis of prompt attacks. We propose Counterfactual Explainable Incremental Prompt Attack (CEIPA), a novel technique where we guide prompts in a specific manner to quantitatively measure attack effectiveness and explore the embedded defense mechanisms in these models. Our approach is distinctive for its capacity to elucidate the reasons behind the generation of harmful responses by LLMs through an incremental counterfactual methodology. By organizing the prompt modification process into four incremental levels (word, sentence, character, and a combination of character and word), we facilitate a thorough examination of the susceptibilities inherent to LLMs. The findings from our study not only provide counterfactual explanation insight but also demonstrate that our framework significantly enhances the effectiveness of attack prompts.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12
# SS-SfP:(混合)偏光による自己監督形状のニューラル逆レンダリング SS-SfP:Neural Inverse Rendering for Self Supervised Shape from (Mixed) Polarization ( http://arxiv.org/abs/2407.09294v1 ) ライセンス: Link先を確認	Ashish Tiwari, Shanmuganathan Raman,	(参考訳) 本研究では,一視点偏光画像から物体やシーンの3次元形状(画素ごとの表面法線と深度)を推定する,新しい逆レンダリングベースのフレームワークを提案する。この問題はShape from Polarization (SfP)として広く知られている。 SfPの既存の物理ベースおよび学習ベースの手法は、一定の制約の下で動作する。すなわち、(a) 実際の表面ではまれな純粋な拡散反射または純粋な鏡面反射、(b) 取得が困難でスキャナの解像度に制限される、直接監督のための正解表面法線の利用可能性、(c) 既知の屈折率である。これらの制約を克服するために、我々はまず、修正偏光反射モデルに基づいて、部分偏光した拡散反射成分と鏡面反射成分(反射率キューと呼ぶ)を分離することを学習し、その後、偏光データと推定した反射率キューに導かれる、SS-SfPと呼ばれる逆レンダリングベースの自己教師付き深層学習フレームワークを用いて混合偏光下での形状を推定する。さらに, 屈折率を非線形最小二乗解として求める。広範な定量的かつ定性的な評価を行うことで、DeepSfPデータセットからの単純な単一オブジェクトシーンとSPWデータセットからの複雑なインザワイルドシーンに対して、完全に自己教師された設定で、提案フレームワークの有効性を確立した。我々の知る限りでは、これは完全に自己教師付きのフレームワークで混合偏光下のSfPに対処する最初の学習ベースのアプローチである。 We present a novel inverse rendering-based framework to estimate the 3D shape (per-pixel surface normals and depth) of objects and scenes from single-view polarization images, the problem popularly known as Shape from Polarization (SfP). The existing physics-based and learning-based methods for SfP perform under certain restrictions, i.e., (a) purely diffuse or purely specular reflections, which are seldom found on real surfaces, (b) availability of the ground truth surface normals for direct supervision that are hard to acquire and are limited by the scanner's resolution, and (c) known refractive index. To overcome these restrictions, we start by learning to separate the partially-polarized diffuse and specular reflection components, which we call reflectance cues, based on a modified polarization reflection model and then estimate shape under mixed polarization through an inverse-rendering based self-supervised deep learning framework called SS-SfP, guided by the polarization data and estimated reflectance cues. Furthermore, we also obtain the refractive index as a non-linear least squares solution. 
Through extensive quantitative and qualitative evaluation, we establish the efficacy of the proposed framework over simple single-object scenes from DeepSfP dataset and complex in-the-wild scenes from SPW dataset in an entirely self-supervised setting. To the best of our knowledge, this is the first learning-based approach to address SfP under mixed polarization in a completely self-supervised framework.	翻訳日:2024-07-15 23:18:28 公開日:2024-07-12

# 大規模言語モデルの記号的知識蒸留に関する調査

A Survey on Symbolic Knowledge Distillation of Large Language Models ( http://arxiv.org/abs/2408.10210v1 )

ライセンス: Link先を確認

Kamal Acharya, Alvaro Velasquez, Houbing Herbert Song,

(参考訳) 本稿では,Large Language Models (LLMs) における記号的知識蒸留の新たな重要領域について検討する。 Generative Pre-trained Transformer-3 (GPT-3) や Bidirectional Encoder Representations from Transformers (BERT) のようなLLMは、スケールと複雑さを拡大し続けており、その広範な知識を効果的に活用するという課題が最重要である。この調査は、これらのモデルに含まれる複雑で、しばしば暗黙的な知識を、より記号的で明示的な形式に蒸留するプロセスに集中している。この変換は, LLMの解釈可能性, 効率, 適用性の向上に不可欠である。我々は,より小型で効率的な人工知能(AI)モデルの透明性と機能を改善するために,記号的知識蒸留をいかに利用できるかに着目し,方法論と応用に基づいて既存の研究を分類する。本調査では,知識の深さを理解しやすい形式で維持することを含む中核的な課題について論じ,この分野で開発された様々なアプローチや手法について考察する。現在の研究のギャップと今後の進歩の可能性を見極める。この調査は, LLMにおける記号的知識蒸留の総合的な概要を提供することを目的としており, よりアクセスしやすく, 効率的なAIシステムへの進化におけるその意義を浮き彫りにしている。

This survey paper delves into the emerging and critical area of symbolic knowledge distillation in Large Language Models (LLMs). As LLMs like Generative Pre-trained Transformer-3 (GPT-3) and Bidirectional Encoder Representations from Transformers (BERT) continue to expand in scale and complexity, the challenge of effectively harnessing their extensive knowledge becomes paramount. This survey concentrates on the process of distilling the intricate, often implicit knowledge contained within these models into a more symbolic, explicit form. This transformation is crucial for enhancing the interpretability, efficiency, and applicability of LLMs. We categorize the existing research based on methodologies and applications, focusing on how symbolic knowledge distillation can be used to improve the transparency and functionality of smaller, more efficient Artificial Intelligence (AI) models. The survey discusses the core challenges, including maintaining the depth of knowledge in a comprehensible format, and explores the various approaches and techniques that have been developed in this field. We identify gaps in current research and potential opportunities for future advancements. This survey aims to provide a comprehensive overview of symbolic knowledge distillation in LLMs, spotlighting its significance in the progression towards more accessible and efficient AI systems.

翻訳日:2024-11-08 06:44:48 公開日:2024-07-12

# 週刊ニューヨーカー漫画の受賞キャプションの予測

Predicting Winning Captions for Weekly New Yorker Comics ( http://arxiv.org/abs/2407.18949v1 )

ライセンス: Link先を確認

Stanley Cao, Sonny Young,

(参考訳) Vision Transformer (ViT) を用いた画像キャプション生成は、コンピュータビジョンと自然言語処理の重要な収束を示し、ユーザエクスペリエンスを高め、アクセシビリティを改善し、視覚データのテキスト表現を提供する可能性を持つ。本稿では,ニューヨーカー漫画キャプションコンテスト(New Yorker Cartoon Caption Contest)の入賞作品の機知とユーモアをエミュレートしたキャプションを生成することを目的として,ニューヨーカーの漫画への画像キャプション技術の適用について検討する。この課題は、文化的ニュアンスやユーモアの理解とともに、洗練された視覚的・言語的な処理を必要とする。本稿では,ニューヨーカーの漫画キャプションコンテストのキャプションを生成するために,Vision Transformerのエンコーダ・デコーダモデルを用いた新しいベースラインをいくつか提案する。

Image captioning using Vision Transformers (ViTs) represents a pivotal convergence of computer vision and natural language processing, offering the potential to enhance user experiences, improve accessibility, and provide textual representations of visual data. This paper explores the application of image captioning techniques to New Yorker cartoons, aiming to generate captions that emulate the wit and humor of winning entries in the New Yorker Cartoon Caption Contest. This task necessitates sophisticated visual and linguistic processing, along with an understanding of cultural nuances and humor. We propose several new baselines for using vision transformer encoder-decoder models to generate captions for the New Yorker cartoon caption contest.

翻訳日:2024-08-05 01:06:22 公開日:2024-07-12

# Kantの立場から見た人工知能判断の説明不可能性

Unexplainability of Artificial Intelligence Judgments in Kant's Perspective ( http://arxiv.org/abs/2407.18950v1 )

ライセンス: Link先を確認

Jongwoo Seo,

(参考訳) 認識論の歴史に大きく貢献したカントの『純粋理性批判』は、人間の判断のアプリオリな原理の構造を解明するためにカテゴリー表を提案している。人工知能(AI)の技術は機能主義に基づいて、人間の判断をシミュレートまたは再現すると主張している。この主張を評価するためには、AI判断が人間の判断の特徴を持っているかどうかを検討する必要がある。本稿は,AI判断が,カントによる人間の判断の特性の観点からは理解できない形態を示すことを論じる。判断の特性が重なるので、これをAIの不確実性と呼ぶことができる。次に,物理的な直観を伴わない概念は,その機能が視覚を通して示される場合に説明が容易でないことを示す。最後に、AIが判断の構成要素である主語と述語を用いて自然言語で文章を作成するとしても、AIが人間に受け入れられる水準で概念を理解しているかどうかを判断することは困難である、と説明する。これは、自然言語による説明が信頼できるかどうかが疑問であることを示している。

Kant's Critique of Pure Reason, a major contribution to the history of epistemology, proposes a table of categories to elucidate the structure of the a priori principle of human judgment. The technology of artificial intelligence (AI), based on functionalism, claims to simulate or replicate human judgment. To assess this claim, it is necessary to study whether AI judgment possesses the characteristics of human judgment. This paper argues that AI judgments exhibit a form that cannot be understood in terms of the characteristics of human judgments according to Kant. Because the characteristics of judgment overlap, we can call this AI's uncertainty. Then, I show that concepts without physical intuitions are not easy to explain when their functions are shown through vision. Finally, I illustrate that even if AI makes sentences through subject and predicate in natural language, which are components of judgment, it is difficult to determine whether AI understands the concepts to the level humans can accept. This shows that it is questionable whether the explanation through natural language is reliable.

翻訳日:2024-08-05 01:06:22 公開日:2024-07-12

# インダストリー4.0(I4)システムのデジタルツイン化のためのフォトグラメトリ

Photogrammetry for Digital Twinning Industry 4.0 (I4) Systems ( http://arxiv.org/abs/2407.18951v1 )

ライセンス: Link先を確認

Ahmed Alhamadah, Muntasir Mamun, Henry Harms, Mathew Redondo, Yu-Zheng Lin, Jesus Pacheco, Soheil Salehi, Pratik Satam,

(参考訳) 産業4.0の開始は、クラウドコンピューティング、機械学習(ML)、人工知能(AI)、ユニバーサルネットワーク接続の統合によって急速に製造業の世界を変え、パフォーマンスの最適化と生産性の向上をもたらしている。デジタルツイン(Digital Twins, DT)は、ソフトウェアシステムを利用して物理プロセスの振る舞いを再現する技術である。本稿では,写真を用いた物理オブジェクトを仮想3次元モデルに再構成するプロセスであるフォトグラメトリと3次元走査技術を用いて,ML/AIに基づく行動モデルと相互作用する「物理プロセス」の正確な視覚表現を実現することを目的とする。これを実現するために、私たちは、ステレオビジョン機能を備えたiPhone 15 Proを使って、産業用4.0システムの奥行きを捉えました。これらの画像を3Dスキャンツールを用いて処理することにより、DTモデルを作成するための3Dモデリングおよびレンダリングソフトウェアのための生の3Dモデルを作成しました。本手法の信頼性は, 地中真理(テープ測度を用いた手作業による計測)と本手法を用いて作成した最終3次元モデルとの間の誤差率を計測することによって強調する。全体的な平均誤差は4.97 %であり、標準偏差誤差は5.54 %である。本研究の結果から,コンシューマグレードデバイスを用いたフォトグラメトリは,スマートマニュファクチャリングのためのDTを作成するための効率的かつコスト効率のよいアプローチであり,フレキシブルなアプローチは,時間とともにモデルの反復的な改善を可能にすることが示唆された。

The onset of Industry 4.0 is rapidly transforming the manufacturing world through the integration of cloud computing, machine learning (ML), artificial intelligence (AI), and universal network connectivity, resulting in performance optimization and increased productivity. Digital Twins (DT) are one such transformational technology that leverages software systems to replicate physical process behavior, representing the physical process in a digital environment. This paper aims to explore the use of photogrammetry (which is the process of reconstructing physical objects into virtual 3D models using photographs) and 3D scanning techniques to create an accurate visual representation of the 'Physical Process', to interact with the ML/AI based behavior models. To achieve this, we have used a readily available consumer device, the iPhone 15 Pro, which features stereo vision capabilities, to capture the depth of an Industry 4.0 system. By processing these images using 3D scanning tools, we created a raw 3D model for 3D modeling and rendering software for the creation of a DT model. The paper highlights the reliability of this method by measuring the error rate between the ground truth (measurements done manually using a tape measure) and the final 3D model created using this method. The overall mean error is 4.97% and the overall standard deviation error is 5.54% between the ground truth measurements and their photogrammetry counterparts. The results from this work indicate that photogrammetry using consumer-grade devices can be an efficient and cost-efficient approach to creating DTs for smart manufacturing, while the approach's flexibility allows for iterative improvements of the models over time.

翻訳日:2024-08-05 01:06:22 公開日:2024-07-12
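
The error figures in the abstract above compare tape-measure ground truth with measurements taken from the 3D model. A minimal sketch of how such a comparison could be computed is shown below. The measurement values are made up, and the choice of absolute percent error with a population standard deviation is an assumption, since the abstract does not specify either.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def percent_errors(pairs: List[Tuple[float, float]]) -> List[float]:
    """Absolute percent error of each model measurement vs. ground truth.

    pairs: (ground_truth, model_measurement) in the same length unit.
    """
    return [abs(model - truth) / truth * 100.0 for truth, model in pairs]


if __name__ == "__main__":
    # (tape-measure value, value read from the 3D model) in cm; illustrative only.
    measurements = [(100.0, 96.0), (50.0, 52.0), (20.0, 21.0)]
    errors = percent_errors(measurements)
    print(f"mean error: {mean(errors):.2f}%")
    print(f"std of error: {pstdev(errors):.2f}%")
```

Reporting both the mean and the spread matters here: a standard deviation larger than the mean, as in the abstract, suggests a few measurements contribute disproportionately to the error.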

# 実顔動画アニメーションプラットフォーム

Real Face Video Animation Platform ( http://arxiv.org/abs/2407.18955v1 )

ライセンス: Link先を確認

Xiaokai Chen, Xuan Liu, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su,

(参考訳) 近年、顔画像生成モデルが人気を博している。しかし、高品質のアニメスタイルの顔訓練セットがないため、誇張されたアニメスタイルの顔を扱う場合、表現力に欠けることが多い。複数のモデルをサポートしながら、実際の人間の顔から漫画的な顔へのリアルタイムな変換を可能にする顔アニメーションプラットフォームを提案する。 Gradioフレームワーク上に構築された当社のプラットフォームは,優れた対話性とユーザフレンドリ性を保証する。ユーザーは本物の顔のビデオや画像を入力し、好きな漫画のスタイルを選択することができる。システムは自動的に顔の特徴を分析し、必要な事前処理を実行し、適切なモデルを実行して表現力のあるアニメスタイルの顔を生成する。私たちは、HDTFデータセットを処理するために、システム内にさまざまなモデルを使用し、アニメーションの顔ビデオデータセットを作成します。

In recent years, facial video generation models have gained popularity. However, these models often lack expressive power when dealing with exaggerated anime-style faces due to the absence of high-quality anime-style face training sets. We propose a facial animation platform that enables real-time conversion from real human faces to cartoon-style faces, supporting multiple models. Built on the Gradio framework, our platform ensures excellent interactivity and user-friendliness. Users can input a real face video or image and select their desired cartoon style. The system will then automatically analyze facial features, execute necessary preprocessing, and invoke appropriate models to generate expressive anime-style faces. We employ a variety of models within our system to process the HDTF dataset, thereby creating an animated facial video dataset.

翻訳日:2024-08-05 01:06:22 公開日:2024-07-12

# 韓国におけるCOVID-19流行下の学生の学習満足度に影響を及ぼす要因の探索

Exploring Factors Affecting Student Learning Satisfaction during COVID-19 in South Korea ( http://arxiv.org/abs/2407.20234v1 )

ライセンス: Link先を確認

Jiwon Han, Chaeeun Ryu, Gayathri Nadarajan,

(参考訳) COVID-19流行下における学生の好みや学習満足度の理解は、自己効力感、パフォーマンス、エンゲージメントなどの学習特性に焦点を当ててきた。既存の研究は、学習満足度に影響を与える重要な要因を正確に特定できる統計モデルを構築してきたが、これらの要因間の複雑な関係を深く説明するとは限らない。本研究の目的は,個人学習者の特徴,指導的デザイン要素,社会的・環境要因など,パンデミック時の学生の学習の好みや満足度に関連するいくつかの側面を理解することである。 2021年から2022年の間に,韓国の成均館大学(Sungkyunkwan University)の学生302人から回答を収集した。収集された情報には、性別、専攻、学習時の満足度とモチベーションのレベル、自己認識上のパフォーマンス、感情状態、学習環境が含まれていた。特定のコホート間の有意差を判定するために、Wilcoxonの順位和検定とExplainable Boosting Machine (EBM)を用いた。本研究の主要な知見は次の2点である。 1)Wilcoxonの順位和検定により、オフライン授業を受講した学生は、オンライン授業を受講した学生よりも学習満足度などの属性が有意に高いことが95%の信頼度で確認でき、STEMとHASSの学生の間でも同様であった。 2)95.08%の精度で適合したEBMモデルは,学生の学習満足度に影響を及ぼす上位5つの要因として,自己認識上のパフォーマンス,授業活動への参加に対する認識,専攻,授業で議論を行う能力,家庭における学習スペースの利用可能性を特定した。自己認識上のパフォーマンスが肯定的であることとクラスメートと議論できることは学習満足度に正の影響を与え,授業活動への参加に対する否定的な認識は学習満足度に負の影響を及ぼした。

Understanding students' preferences and learning satisfaction during COVID-19 has focused on learning attributes such as self-efficacy, performance, and engagement. Although existing efforts have constructed statistical models capable of accurately identifying significant factors impacting learning satisfaction, they do not necessarily explain the complex relationships among these factors in depth. This study aimed to understand several facets related to student learning preferences and satisfaction during the pandemic such as individual learner characteristics, instructional design elements and social and environmental factors. Responses from 302 students from Sungkyunkwan University, South Korea were collected between 2021 and 2022. Information gathered included their gender, study major, satisfaction and motivation levels when learning, perceived performance, emotional state and learning environment. Wilcoxon Rank sum test and Explainable Boosting Machine (EBM) were performed to determine significant differences in specific cohorts. The two core findings of the study are as follows:1) Using Wilcoxon Rank Sum test, we can attest with 95% confidence that students who took offline classes had significantly higher learning satisfaction, among other attributes, than those who took online classes, as with STEM versus HASS students; 2) An explainable boosting machine (EBM) model fitted to 95.08% accuracy determined the top five factors affecting students' learning satisfaction as their perceived performance, their perception on participating in class activities, their study majors, their ability to conduct discussions in class and the study space availability at home. Positive perceived performance and ability to discuss with classmates had a positive impact on learning satisfaction, while negative perception on class activities participation had a negative impact on learning satisfaction.

翻訳日:2024-08-05 00:56:24 公開日:2024-07-12
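
The cohort comparison described above relies on the Wilcoxon rank-sum test. The sketch below implements the two-sided test with the large-sample normal approximation using only the standard library. It is an illustration under stated assumptions, not the authors' analysis code: the satisfaction scores are invented, ties receive average ranks, and no tie or continuity correction is applied to the variance.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z statistic, p-value) for the null hypothesis that both
    samples come from the same distribution.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical 5-point satisfaction scores for two cohorts.
    offline = [5, 4, 5, 4, 5, 3, 4, 5]
    online = [3, 2, 4, 3, 3, 2, 4, 3]
    z, p = rank_sum_test(offline, online)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

A rank-based test suits Likert-style responses because it does not assume the scores are normally distributed. With heavily tied data like this, a tie-corrected variance would be more accurate than the simplified formula used here.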

# 太陽ダイナミクス観測画像におけるニューラルベース映像圧縮

Neural-based Video Compression on Solar Dynamics Observatory Images ( http://arxiv.org/abs/2407.15730v1 )

ライセンス: Link先を確認

Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi, Barbara J. Thompson, Michael S. F. Kirk, Daniel da Silva,

(参考訳) NASAのソーラー・ダイナミクス・オブザーバトリー(SDO)ミッションは、太陽の日常活動を監視するために膨大なデータを収集する。宇宙ミッション設計の領域では、データ圧縮は、限られたテレメトリレートによって引き起こされる課題に対処する上で重要な役割を担っている。データ圧縮の主な目的は、制約帯域内で作業するための効率的なデータ管理と送信を容易にすることである。本稿では,SDOの画像データ収集における圧縮率の高いニューラルビデオ圧縮手法を提案する。提案手法は、データ中の時間的および空間的冗長性の両方を活用することに焦点を当て、より効率的な圧縮を実現する。本研究では,入力画像から局所的情報とグローバル的情報の両方を効果的かつ効率的にキャプチャするトランスフォーマーモデルに基づくアーキテクチャを提案する。さらに,潜在表現の確率分布を正確にモデル化し,エントロピー復号処理の高速化を図るエントロピーモデルも備えている。エントロピーモデルは、チャネルに依存したアプローチを活用し、チェッカーボード型の局所的および大域的空間的コンテキストを利用する。提案手法はトランスフォーマーをベースとしたビデオ圧縮ネットワークとエントロピーモデルを組み合わせることで,H.264やH.265といった従来のビデオコーデックよりも優れた性能を示す。

NASA's Solar Dynamics Observatory (SDO) mission collects extensive data to monitor the Sun's daily activity. In the realm of space mission design, data compression plays a crucial role in addressing the challenges posed by limited telemetry rates. The primary objective of data compression is to facilitate efficient data management and transmission to work within the constrained bandwidth, thereby ensuring that essential information is captured while optimizing the utilization of available resources. This paper introduces a neural video compression technique that achieves a high compression ratio for the SDO's image data collection. The proposed approach focuses on leveraging both temporal and spatial redundancies in the data, leading to a more efficient compression. In this work, we introduce an architecture based on the Transformer model, which is specifically designed to capture both local and global information from input images in an effective and efficient manner. Additionally, our network is equipped with an entropy model that can accurately model the probability distribution of the latent representations and improves the speed of the entropy decoding step. The entropy model leverages a channel-dependent approach and utilizes checkerboard-shaped local and global spatial contexts. By combining the Transformer-based video compression network with our entropy model, the proposed compression algorithm demonstrates superior performance over traditional video codecs like H.264 and H.265, as confirmed by our experimental results.

翻訳日:2024-07-28 18:29:13 公開日:2024-07-12
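
The abstract above mentions checkerboard-shaped spatial contexts in the entropy model. The general idea of a checkerboard context can be sketched as follows: latent positions are split into two interleaved sets, one decoded first and the other decoded afterwards using the first set as context. Which parity serves as the "anchor" set and the helper names are assumptions for illustration; the paper's actual entropy model is not reproduced here.

```python
from typing import List, Tuple


def checkerboard_mask(height: int, width: int) -> List[List[int]]:
    """Return a 0/1 grid; 1 marks 'anchor' positions decoded in the first pass."""
    return [[(r + c) % 2 for c in range(width)] for r in range(height)]


def anchor_neighbours(mask: List[List[int]], r: int, c: int) -> List[Tuple[int, int]]:
    """List the 4-connected anchor positions around (r, c).

    For a non-anchor position these are the already-decoded latents that
    could serve as local spatial context in the second pass.
    """
    height, width = len(mask), len(mask[0])
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [
        (rr, cc)
        for rr, cc in candidates
        if 0 <= rr < height and 0 <= cc < width and mask[rr][cc] == 1
    ]


if __name__ == "__main__":
    mask = checkerboard_mask(3, 4)
    for row in mask:
        print(row)
    print(anchor_neighbours(mask, 1, 1))
```

With this split, every non-anchor position has only anchors as direct neighbours, so the second pass can run in parallel over all non-anchor positions instead of decoding them one by one.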

# 大規模言語モデルの数学的推論能力向上のためのToken-Supervised Value Model

Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models ( http://arxiv.org/abs/2407.12863v1 )

ライセンス: Link先を確認

Jung Hyun Lee, June Yong Yang, Byeongho Heo, Dongyoon Han, Kang Min Yoo,

(参考訳) 大規模言語モデル(LLM)は、ステップバイステップの推論チェーンを通じて、数学における顕著な問題解決能力を実証している。しかし、言語モデルは自己回帰的にトークン単位で生成するため、後続の推論チェーンの品質と最終的な答えに影響を及ぼす推論誤りを起こしやすい。近年の研究では、推論経路の生成を導くために外部検証器の採用が提案されているが、既存の研究では、トークン単位の推論チェーンの正確性を評価するために、ステップ単位のラベルで訓練されたモデルを利用している。その結果、彼らは推論経路内のトークンの識別的詳細を認識するのに苦労し、中間推論経路が正しい最終回答に向けて有望な軌道上にあるかどうかを評価する能力に欠ける。健全でトークン粒度の数学検証信号の欠如を補うために,我々は期待累積報酬(すなわち価値)を用いたトークンレベルの監督を適用する,検証器のための新しい訓練手法を考案した。さらに,累積報酬を最終回答が将来正しくなる確率を求める問題に帰着させる実用的な定式化を提案し,価値の経験的推定を可能にする。数学的推論ベンチマークによる実験結果から,Token-Supervised Value Model (TVM) は,Mistral と Llama を用いた GSM8K と MATH において,ステップバイステップ検証器よりも優れていることが示された。

Large Language Models (LLMs) have demonstrated impressive problem-solving capabilities in mathematics through step-by-step reasoning chains. However, they are susceptible to reasoning errors that impact the quality of subsequent reasoning chains and the final answer due to language models' autoregressive token-by-token generating nature. Recent works have proposed adopting external verifiers to guide the generation of reasoning paths, but existing works utilize models that have been trained with step-by-step labels to assess the correctness of token-by-token reasoning chains. Consequently, they struggle to recognize discriminative details of tokens within a reasoning path and lack the ability to evaluate whether an intermediate reasoning path is on a promising track toward the correct final answer. To amend the lack of sound and token-grained math-verification signals, we devise a novel training scheme for verifiers that apply token-level supervision with the expected cumulative reward (i.e., value). Furthermore, we propose a practical formulation of the cumulative reward by reducing it to finding the probability of future correctness of the final answer and thereby enabling the empirical estimation of the value. Experimental results on mathematical reasoning benchmarks show that Token-Supervised Value Model (TVM) can outperform step-by-step verifiers on GSM8K and MATH with Mistral and Llama.

翻訳日:2024-07-19 20:02:37 公開日:2024-07-12

# 動的グラフラプラシアンを用いた時間進化ネットワークのクラスタリング

Clustering Time-Evolving Networks Using the Dynamic Graph Laplacian ( http://arxiv.org/abs/2407.12864v1 )

ライセンス: Link先を確認

Maia Trower, Nataša Djurdjevac Conrad, Stefan Klus,

(参考訳) 時間進化グラフは、ソーシャルネットワーク、トラフィックフロー、生物学的プロセスなどの複雑な力学系をモデル化する際に頻繁に発生する。これらの時間変化グラフ構造におけるコミュニティを特定し解析する技術を開発することは重要な課題である。本研究では,正準相関解析(CCA)を用いて,既存のスペクトルクラスタリングアルゴリズムを静的グラフから動的グラフへ一般化し,クラスタの時間的進化を捉える。この拡張正準相関フレームワークに基づいて、動的グラフラプラシアンを定義し、そのスペクトル特性について検討する。これらの概念を転送演算子を介して力学系理論に結合し,既存の手法と比較してベンチマークグラフ上での手法の利点を説明する。動的グラフ Laplacian は、有向グラフと無向グラフの時間経過に伴うクラスタ構造進化の明確な解釈を可能にすることを示す。

Time-evolving graphs arise frequently when modeling complex dynamical systems such as social networks, traffic flow, and biological processes. Developing techniques to identify and analyze communities in these time-varying graph structures is an important challenge. In this work, we generalize existing spectral clustering algorithms from static to dynamic graphs using canonical correlation analysis (CCA) to capture the temporal evolution of clusters. Based on this extended canonical correlation framework, we define the dynamic graph Laplacian and investigate its spectral properties. We connect these concepts to dynamical systems theory via transfer operators, and illustrate the advantages of our method on benchmark graphs by comparison with existing methods. We show that the dynamic graph Laplacian allows for a clear interpretation of cluster structure evolution over time for directed and undirected graphs.

翻訳日:2024-07-19 20:02:37 公開日:2024-07-12

# GRAD-SUM: 最適プロンプトエンジニアリングのためのグラディエント要約の活用

GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering ( http://arxiv.org/abs/2407.12865v1 )

ライセンス: Link先を確認

Derek Austin, Elliott Chartock,

(参考訳) 大規模言語モデル(LLM)のプロンプトエンジニアリングは、高品質な出力を保証するためにプロンプトの生成、評価、改良を反復的に行う、手作業で時間のかかるプロセスであることが多い。プロンプトエンジニアリングを自動化する研究は行われているが、既存の解決策は一般に、正解が与えられた特定のタスク向けに調整されているか、非常にコストがかかる。本稿では、勾配に基づく最適化技術を基盤とする、スケーラブルで柔軟な自動プロンプトエンジニアリング手法 GRAD-SUM を紹介する。提案手法は、ユーザ定義のタスク記述と評価基準を取り入れ、フィードバックを効果的に一般化する新たな勾配要約モジュールを備える。実験の結果、GRAD-SUM は様々なベンチマークで既存の手法を一貫して上回っており、自動プロンプト最適化における汎用性と有効性が示された。

Prompt engineering for large language models (LLMs) is often a manual time-intensive process that involves generating, evaluating, and refining prompts iteratively to ensure high-quality outputs. While there has been work on automating prompt engineering, the solutions generally are either tuned to specific tasks with given answers or are quite costly. We introduce GRAD-SUM, a scalable and flexible method for automatic prompt engineering that builds on gradient-based optimization techniques. Our approach incorporates user-defined task descriptions and evaluation criteria, and features a novel gradient summarization module to generalize feedback effectively. Our results demonstrate that GRAD-SUM consistently outperforms existing methods across various benchmarks, highlighting its versatility and effectiveness in automatic prompt optimization.

翻訳日:2024-07-19 20:02:37 公開日:2024-07-12

# milliFlow: 人間のモーションセンシングのためのミリ波レーダ点群のシーンフロー推定

milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing ( http://arxiv.org/abs/2306.17010v8 )

ライセンス: Link先を確認

Fangqiang Ding, Zhen Luo, Peijun Zhao, Chris Xiaoxuan Lu,

(参考訳) ヒューマンモーションセンシングは、意思決定、ユーザインタラクション、パーソナライズされたサービスのためのスマートシステムにおいて重要な役割を果たす。これまでの広範な研究は主にカメラに基づいているが、カメラの侵入的な性質はスマートホームアプリケーションでの利用を制限する。この問題に対処するため、プライバシーに配慮した特性を持つmmWaveレーダーが普及しつつある。本研究では、mmWave点群に対する補完的な動き情報としてシーンフローを推定する新たな深層学習手法 milliFlow を提案する。シーンフローは中間レベルの特徴として機能し、下流のヒューマンモーションセンシングタスクに直接役立つ。実験結果は、競合手法と比較して提案手法が優れた性能を持つことを示している。さらに、シーンフロー情報を取り入れることで、人間の行動認識と人体パーシングにおいて顕著な改善を達成し、人体部位の追跡も支援する。コードとデータセットは https://github.com/Toytiny/milliFlow で入手できる。

Human motion sensing plays a crucial role in smart systems for decision-making, user interaction, and personalized services. Extensive research that has been conducted is predominantly based on cameras, whose intrusive nature limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning approach to estimate scene flow as complementary motion information for mmWave point cloud, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method when compared with the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition and human parsing and support human body part tracking. Code and dataset are available at https://github.com/Toytiny/milliFlow.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-12

# マルチスケールパッチ埋め込みとTransformerを用いた心電図信号のノイズ除去

ECG Signal Denoising Using Multi-scale Patch Embedding and Transformers ( http://arxiv.org/abs/2407.11065v1 )

ライセンス: Link先を確認

Ding Zhu, Vishnu Kabir Chhabra, Mohammad Mahdi Khalili,

(参考訳) 心血管疾患は生命を脅かす主要な疾患であり、一般に心電図(ECG)信号を用いて監視される。しかし、これらの信号は強度の異なる様々な種類のノイズによって汚染されることが多く、下流のタスクを著しく妨げる。したがって、ECG信号のノイズを除去し、信号対雑音比を高めることは、心血管モニタリングに不可欠である。本稿では、ECG信号のノイズ除去のために、1次元畳み込み層とTransformerアーキテクチャを組み合わせた深層学習手法を提案する。畳み込み層は、様々なカーネル/パッチサイズでECG信号を処理し、マルチスケールパッチ埋め込みと呼ばれる埋め込みを生成する。次に、この埋め込みをTransformerネットワークの入力として用いることで、ECG信号のノイズを除去するTransformerの能力を高める。

Cardiovascular disease is a major life-threatening condition that is commonly monitored using electrocardiogram (ECG) signals. However, these signals are often contaminated by various types of noise at different intensities, significantly interfering with downstream tasks. Therefore, denoising ECG signals and increasing the signal-to-noise ratio is crucial for cardiovascular monitoring. In this paper, we propose a deep learning method that combines a one-dimensional convolutional layer with transformer architecture for denoising ECG signals. The convolutional layer processes the ECG signal by various kernel/patch sizes and generates an embedding called multi-scale patch embedding. The embedding then is used as the input of a transformer network and enhances the capability of the transformer for denoising the ECG signal.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-12

# Show, Don't Tell: ChildPlayによるテキスト理解以上の大規模言語モデルの評価

Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay ( http://arxiv.org/abs/2407.11068v1 )

ライセンス: Link先を確認

Gonçalo Hora de Carvalho, Robert Pollice, Oscar Knap,

(参考訳) 我々は、GPT-3.5 や GPT-4 のような LLM が、特に非言語領域において、より広い認知機能を持つという仮説を検証する。我々のアプローチは、戦略的思考と意思決定を評価するために、ASCIIでエンコードされた Tic-Tac-Toe、Connect Four、Battleship といったゲームを取り入れることで、標準的な言語ベンチマークを超えて拡張されている。モデルが学習データを超えて一般化する能力を評価するために、さらに2つのゲームを導入する。1つ目のゲームである LEGO Connect Language (LCL) は、空間論理を理解して組立指示に従うモデルの能力をテストする。2つ目のゲームである形状のゲーム(game of shapes)は、0の行列の中で1によって表される形状を識別することをモデルに課し、空間推論能力をさらにテストする。この「Show, don't tell」戦略は、単にモデルに質問するのではなく、ゲームを用いる。実験の結果、標準ベンチマークには習熟しているにもかかわらず、事前学習なしで完全観測可能なゲームをプレイし、それについて推論する GPT-3.5 と GPT-4 の能力は平凡であることが示された。どちらのモデルも、Tic-Tac-Toe と Connect Four で負けにつながる手を予測できず、Battleship を正しくプレイすることもできない。GPT-4 は形状のゲームである程度の成功を示すが、両モデルとも LCL ゲームで提示された組立タスクには失敗する。これらの結果は、GPTモデルが会話の熟練度や基本的なルールの理解を模倣できる一方で、戦略的なゲームプレイや空間推論タスクにおける性能は極めて限定的であることを示唆している。重要なことに、これは現在のLLMベンチマークの盲点を明らかにしており、我々はゲームプレイベンチマークスイートである ChildPlay (https://github.com/child-play-neurips/child-play) によってこの点を浮き彫りにする。本研究の結果は、GPT-3.5 や GPT-4 とほぼ同規模の LLM が創発的な知能や推論能力を持つという主張に対して、警鐘を鳴らすものである。

We explore the hypothesis that LLMs, such as GPT-3.5 and GPT-4, possess broader cognitive functions, particularly in non-linguistic domains. Our approach extends beyond standard linguistic benchmarks by incorporating games like Tic-Tac-Toe, Connect Four, and Battleship, encoded via ASCII, to assess strategic thinking and decision-making. To evaluate the models' ability to generalize beyond their training data, we introduce two additional games. The first game, LEGO Connect Language (LCL), tests the models' capacity to understand spatial logic and follow assembly instructions. The second game, the game of shapes, challenges the models to identify shapes represented by 1s within a matrix of zeros, further testing their spatial reasoning skills. This "show, don't tell" strategy uses games instead of simply querying the models. Our results show that despite their proficiency on standard benchmarks, GPT-3.5 and GPT-4's abilities to play and reason about fully observable games without pre-training is mediocre. Both models fail to anticipate losing moves in Tic-Tac-Toe and Connect Four, and they are unable to play Battleship correctly. While GPT-4 shows some success in the game of shapes, both models fail at the assembly tasks presented in the LCL game. These results suggest that while GPT models can emulate conversational proficiency and basic rule comprehension, their performance in strategic gameplay and spatial reasoning tasks is very limited. Importantly, this reveals a blind spot in current LLM benchmarks that we highlight with our gameplay benchmark suite ChildPlay (https://github.com/child-play-neurips/child-play). Our findings provide a cautionary tale about claims of emergent intelligence and reasoning capabilities of LLMs that are roughly the size of GPT-3.5 and GPT-4.
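To make the "game of shapes" concrete, the sketch below builds the kind of input the abstract describes: a matrix of zeros in which a shape is drawn with 1s, rendered as plain text that could be placed in a prompt. The helper names `square_board` and `render`, the board size, and the choice of a square are illustrative assumptions; the benchmark's actual encoding and shape set are defined in the ChildPlay repository, not here.

```python
def square_board(size, top, left, side):
    """Return a size x size matrix of zeros with a filled square of 1s.

    (top, left) is the upper-left corner of the square; all names and
    the encoding are hypothetical, chosen only for illustration.
    """
    board = [[0] * size for _ in range(size)]
    for r in range(top, top + side):
        for c in range(left, left + side):
            board[r][c] = 1
    return board


def render(board):
    """Render the matrix as space-separated rows of text."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in board)


board = square_board(size=5, top=1, left=1, side=3)
prompt_grid = render(board)
```

A model under test would be shown `prompt_grid` and asked which shape the 1s form; scoring the answer against the known generator parameters is what makes the task checkable without human grading.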

翻訳日:2024-07-17 20:10:21 公開日:2024-07-12

# フェデレーションラーニングとコントロールを組み合わせた調査

Combining Federated Learning and Control: A Survey ( http://arxiv.org/abs/2407.11069v1 )

ライセンス: Link先を確認

Jakob Weber, Markus Gurtner, Amadeus Lobe, Adrian Trachte, Andreas Kugi,

(参考訳) この調査は、(非線形)制御アプリケーションにおける適応性、スケーラビリティ、一般化、プライバシを高めるために、フェデレートラーニング(FL)と制御を組み合わせる研究の概要を提供する。従来の制御手法はコントローラ設計用のモデルに依存しているが、現実のシナリオではオンラインでのモデルの再調整や学習が必要になることが多い。FLは、データプライバシを保持しながら分散デバイス間の協調学習を可能にする、モデル学習への分散アプローチを提供する。データをローカルに保持することで、FLは通信に必要なネットワーク帯域幅を削減しつつ、プライバシとセキュリティに関する懸念を軽減する。この調査は、FLと制御を組み合わせる最先端の概念とアイデアをまとめたものである。さらに方法論上の利点を議論し、最後に、動的システムモデリングから、適応制御に焦点を当てたコントローラ設計、マルチエージェント意思決定システムにおける知識伝達に至るまで、期待される応用の詳細な概要を示す。

This survey provides an overview of combining Federated Learning (FL) and control to enhance adaptability, scalability, generalization, and privacy in (nonlinear) control applications. Traditional control methods rely on controller design models, but real-world scenarios often require online model retuning or learning. FL offers a distributed approach to model training, enabling collaborative learning across distributed devices while preserving data privacy. By keeping data localized, FL mitigates concerns regarding privacy and security while reducing network bandwidth requirements for communication. This survey summarizes the state-of-the-art concepts and ideas of combining FL and control. The methodical benefits are further discussed, culminating in a detailed overview of expected applications, from dynamical system modeling over controller design, focusing on adaptive control, to knowledge transfer in multi-agent decision-making systems.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-12

# 因果モデリングと木探索を用いたCAGE-2の最適デフェンダ戦略

Optimal Defender Strategies for CAGE-2 using Causal Modeling and Tree Search ( http://arxiv.org/abs/2407.11070v1 )

ライセンス: Link先を確認

Kim Hammar, Neil Dhir, Rolf Stadler,

(参考訳) CAGE-2チャレンジは、自律的なサイバー防御方法を比較するための標準ベンチマークと考えられている。このベンチマークに対して評価された現在の最先端の手法は、モデルなし(オフライン)強化学習に基づいており、証明可能な最適なディフェンダー戦略を提供していない。本稿では,この制限に対処し,CAGE-2の形式的(因果的)モデルと,C-POMCP(Causal partially Observable Monte-Carlo Planning)と呼ばれる,実証可能な最適なディフェンダー戦略を生成する手法を提案する。 2つの重要な性質を持つ。まず、対象システムの因果構造、すなわちシステム変数間の因果関係を組み込む。この構造により、ディフェンダー戦略の探索空間が大幅に減少する。第2に、木探索を通じて各ステップでディフェンダー戦略を更新するオンライン手法である。 CAGE-2ベンチマークに対する評価は、C-POMCPが有効性に関して最先端の性能を達成し、最も近い競合手法よりも計算時間で2桁効率が良いことを示している。

The CAGE-2 challenge is considered a standard benchmark to compare methods for autonomous cyber defense. Current state-of-the-art methods evaluated against this benchmark are based on model-free (offline) reinforcement learning, which does not provide provably optimal defender strategies. We address this limitation and present a formal (causal) model of CAGE-2 together with a method that produces a provably optimal defender strategy, which we call Causal Partially Observable Monte-Carlo Planning (C-POMCP). It has two key properties. First, it incorporates the causal structure of the target system, i.e., the causal relationships among the system variables. This structure allows for a significant reduction of the search space of defender strategies. Second, it is an online method that updates the defender strategy at each time step via tree search. Evaluations against the CAGE-2 benchmark show that C-POMCP achieves state-of-the-art performance with respect to effectiveness and is two orders of magnitude more efficient in computing time than the closest competitor method.

翻訳日:2024-07-17 20:00:37 公開日:2024-07-12

# MonoSparse-CAM: CAMにおける木モデル処理強化のための単調性とスパース性の活用

MonoSparse-CAM: Harnessing Monotonicity and Sparsity for Enhanced Tree Model Processing on CAMs ( http://arxiv.org/abs/2407.11071v1 )

ライセンス: Link先を確認

Tergel Molom-Ochir, Brady Taylor, Hai Li, Yiran Chen,

(参考訳) ニューラルネットワークによって駆動されるAIの大幅な進歩にもかかわらず、ツリーベース機械学習(TBML)モデルは表形式データにおいて優れた性能を発揮する。これらのモデルは、特にアナログ連想メモリ(analog content-addressable memory, aCAM)アレイ上で高速化された場合に、有望なエネルギー効率と高い性能を示す。しかし、特にTBMLのモデル構造とaCAM回路を活用する上で、ハードウェアへのデプロイの最適化は依然として困難である。本稿では、連想メモリ(CAM)に基づく新しい計算最適化技術である MonoSparse-CAM を紹介する。MonoSparse-CAM は、TBMLモデルのスパース性とCAMアレイ回路を効率的に活用し、処理性能を向上させる。実験の結果、MonoSparse-CAM はエネルギー消費を、最適化なしの生の処理と比較して最大28.56倍、既存のデプロイ最適化手法と比較して18.51倍削減することが示された。さらに、現行の手法に対して一貫して少なくとも1.68倍の計算効率を達成する。MonoSparse-CAM は、アレイのスパース性にかかわらず性能を維持しながらエネルギー効率の高いCAMベースの計算を可能にすることで、大規模アレイの処理を妨げるCAMの高エネルギー消費問題に対処する。我々の貢献は2つある。CAMベースのコンピューティングにおける効果的なデプロイ最適化ソリューションとして MonoSparse-CAM を提案すること、そしてTBMLのモデル構造がアレイのスパース性に与える影響を調査することである。この研究は、ハードウェア上でのエネルギー効率の高いTBMLに関する重要な洞察を提供し、持続可能なAI技術における大きな進歩を示している。

Despite significant advancements in AI driven by neural networks, tree-based machine learning (TBML) models excel on tabular data. These models exhibit promising energy efficiency, and high performance, particularly when accelerated on analog content-addressable memory (aCAM) arrays. However, optimizing their hardware deployment, especially in leveraging TBML model structure and aCAM circuitry, remains challenging. In this paper, we introduce MonoSparse-CAM, a novel content-addressable memory (CAM) based computing optimization technique. MonoSparse-CAM efficiently leverages TBML model sparsity and CAM array circuits, enhancing processing performance. Our experiments show that MonoSparse-CAM reduces energy consumption by up to 28.56x compared to raw processing and 18.51x compared to existing deployment optimization techniques. Additionally, it consistently achieves at least 1.68x computational efficiency over current methods. By enabling energy-efficient CAM-based computing while preserving performance regardless of the array sparsity, MonoSparse-CAM addresses the high energy consumption problem of CAM which hinders processing of large arrays. Our contributions are twofold: we propose MonoSparse-CAM as an effective deployment optimization solution for CAM-based computing, and we investigate the impact of TBML model structure on array sparsity. This work provides crucial insights for energy-efficient TBML on hardware, highlighting a significant advancement in sustainable AI technologies.

翻訳日:2024-07-17 20:00:37 公開日:2024-07-12

# MaPPing Your Model: LLMベースのプログラミングアシスタントに対する敵対的攻撃の影響評価

MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants ( http://arxiv.org/abs/2407.11072v1 )

ライセンス: Link先を確認

John Heibel, Daniel Lowd,

(参考訳) LLMベースのプログラミングアシスタントは、より高速なプログラミングを約束する一方で、より多くのセキュリティ脆弱性を持ち込むリスクがある。先行研究は、脆弱性をより頻繁に提案するようにLLMを悪意を持って微調整する方法を調べていた。信頼できない第三者からの結果を利用する可能性のあるエージェント型LLMの台頭により、モデルのプロンプトに対する攻撃のリスクが増大している。本稿では、攻撃者がプログラミングタスクのプロンプトに少量のテキスト(500バイト未満)を追加する Malicious Programming Prompt (MaPP) 攻撃を導入する。我々のプロンプト戦略により、LLMがそれ以外は正しいコードを書き続けながら脆弱性を追加しうることを示す。基本的なモデルから最先端の商用モデルまで、7つの一般的なLLM上で3つのプロンプトを評価する。HumanEval ベンチマークを用いた評価では、我々のプロンプトは幅広く有効であり、LLMごとのカスタマイズは不要であった。さらに、HumanEval で最も優れたLLMは、我々の悪意ある指示に従うことにも最も優れており、単に言語モデルをスケールさせるだけでは MaPP 攻撃を防げないことを示唆している。16のシナリオにおける8つのCWEのデータセットを用いて、MaPP 攻撃が様々なモデルにわたって特定の標的型脆弱性を実装する上でも有効であることを確認した。我々の研究は、LLMの助けを借りて生成されたコードを厳格に監査することに加え、LLMのプロンプトを改ざんから保護する必要性を強調している。

LLM-based programming assistants offer the promise of programming faster but with the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text to a prompt for a programming task (under 500 bytes). We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation as well as rigorously auditing code generated with the help of LLMs.

翻訳日:2024-07-17 20:00:37 公開日:2024-07-12

# 画像間変換のための非対称GAN

Asymmetric GANs for Image-to-Image Translation ( http://arxiv.org/abs/1912.06931v2 )

ライセンス: Link先を確認

Hao Tang, Nicu Sebe,

(参考訳) GAN(Generative Adversarial Networks)による教師なし画像変換の既存モデルは、サイクル一貫性損失を用いて、ソースドメインからターゲットドメインへのマッピングを学習できる。しかし、これらの手法は常に対称なネットワークアーキテクチャを採用し、順方向と逆方向の両方のサイクルを学習する。ソースドメインとターゲットドメインの間にはタスクの複雑さとサイクル入力の違いがあるため、双方向の順方向・逆方向サイクル変換における不均等性は大きく、2つのドメイン間の情報量も異なる。本稿では、非対称な変換タスクにおける既存の対称GANの限界を分析し、教師なしと教師ありの両方の画像変換タスクにおける非対称な要求に適応するために、サイズの異なる変換生成器と再構成生成器、および異なるパラメータ共有戦略を備えた AsymmetricGAN モデルを提案する。さらに、既存手法の学習段階には、生成画像の品質を劣化させるモデル崩壊という共通の問題があるため、AsymmetricGAN をより良く学習させるための様々な最適化損失を検討し、より高い一貫性と安定性を持つ画像変換を実現する。8つのデータセットを用いた教師ありおよび教師なしの生成タスクに関する広範な実験により、AsymmetricGAN が既存のGANと比較して優れたモデル容量とより良い生成性能を達成することが示された。我々の知る限り、教師なしと教師ありの両方の画像変換タスクにおいて非対称なGAN構造を調査したのは本研究が初めてである。

Existing models for unsupervised image translation with Generative Adversarial Networks (GANs) can learn the mapping from the source domain to the target domain using a cycle-consistency loss. However, these methods always adopt a symmetric network architecture to learn both forward and backward cycles. Because of the task complexity and cycle input difference between the source and target domains, the inequality in bidirectional forward-backward cycle translations is significant and the amount of information between two domains is different. In this paper, we analyze the limitation of existing symmetric GANs in asymmetric translation tasks, and propose an AsymmetricGAN model with both translation and reconstruction generators of unequal sizes and different parameter-sharing strategy to adapt to the asymmetric need in both unsupervised and supervised image translation tasks. Moreover, the training stage of existing methods has the common problem of model collapse that degrades the quality of the generated images, thus we explore different optimization losses for better training of AsymmetricGAN, making image translation with higher consistency and better stability. Extensive experiments on both supervised and unsupervised generative tasks with 8 datasets show that AsymmetricGAN achieves superior model capacity and better generation performance compared with existing GANs. To the best of our knowledge, we are the first to investigate the asymmetric GAN structure on both unsupervised and supervised image translation tasks.

翻訳日:2024-07-17 05:46:45 公開日:2024-07-12

# 量子力学を完成させる現実的モデル

A realistic model for completing Quantum Mechanics ( http://arxiv.org/abs/2104.12701v5 )

ライセンス: Link先を確認

M. Baldo,

(参考訳) N. Bohr が提唱した、よく知られた量子力学のコペンハーゲン解釈では、物理的対象と実験結果は巨視的な言語でのみ記述でき、あらゆる微視的記述は語りえないものとして残される。この見解は、C. Rovelli による量子力学の関係論的解釈においてさらに深められた。物理現象とその時間発展の詳細な微視的記述を試みる他の解釈の多くは、波動関数を理論の基本要素として明示的に導入する点で共通している。これらの解釈は量子状態の概念を理論の基本概念として必要とするが、これはコペンハーゲン解釈によれば典型的な語りえない物理的要素である。2つの基本的な物理的実体は、波動関数の一体性によって密接に結びついている。これらの解釈は通常、実在論的なものと呼ばれる。物理過程の記述に波動関数とその時間発展を用いると、いくつかの困難、いわゆるパラドックスが不可避的に生じることはよく知られている。測定問題はこれらの困難の中心にある。それは主に、量子力学の数学的形式に明示的には含まれていない波動関数の収縮過程の導入を必要とするためである。本稿では、標準的な形式を超えるモデルを構築・提案する。このモデルは、測定問題と、何らかの形でそれに関連する他のすべての困難を解決できる。

In the well known Copenhagen interpretation of Quantum mechanics, advocated by N. Bohr, the physical objects and the experimental results can be described only in a macroscopic language, leaving any possible microscopic description as unspeakable. This point of view has been deepened by C. Rovelli in the relational interpretation of Quantum mechanics. Most of the alternative interpretations, which try a detailed microscopic description of physical phenomena and of their evolution, have in common the explicit introduction of the wave function as the basic element of the theory. These interpretations require the notion of quantum state as the fundamental concept of the theory, which is the typical unspeakable physical element according to the Copenhagen interpretation. The two basic physical entities are intimately bound together by the integrity of the wave function. These interpretations are usually indicated as realistic. It is well known that the use of the wave function and its time evolution in the description of the physical processes leads unavoidably to some difficulties or so-called paradoxes. The measurement problem is at the center of these difficulties, mainly because it requires the introduction of the reduction process of the wave function, which is not included explicitly within the mathematical formalism of Quantum Mechanics. In this paper we build up and propose a model which goes beyond the standard formalism and which is able to solve the measurement problem and all the other difficulties which, in a way or in another, are related to it.

翻訳日:2024-07-17 05:46:45 公開日:2024-07-12

# 冷却原子干渉計用高性能シリコンフォトニックシングルサイドバンド変調器

High-Performance Silicon Photonic Single-Sideband Modulators for Cold Atom Interferometry ( http://arxiv.org/abs/2204.12537v3 )

ライセンス: Link先を確認

Ashok Kodigala, Michael Gehl, Gregory W. Hoth, Jongmin Lee, Christopher DeRose, Andrew Pomerene, Christina Dallo, Douglas Trotter, Andrew L. Starbuck, Grant Biedermann, Peter D. D. Schwindt, Anthony L. Lentine,

(参考訳) 光パルス原子干渉計(LPAI)の中で最も複雑で困難なシステムはレーザーシステムであり、複数のレーザービームの周波数と強度を時間的に制御して、量子重力センサーおよび慣性センサーを構成する。LPAIレーザーシステムの主な機能は、冷却原子の生成、状態準備、状態選択的検出を行い、光パルスシーケンスのためのコヒーレントな2光子過程を生成することである。レーザーシステムの主要機能の大部分をフォトニック集積回路(PIC)上に集約することにより、レーザーシステムの大幅な小型化と堅牢化を実現できる。我々は、LPAI内で動的に周波数シフトが可能な、1560nmの高性能シリコンフォトニック搬送波抑圧シングルサイドバンド(SC-SSB)変調器を実証する。RFチャネルの独立制御により、光とRFの双方の位相/振幅の不均衡を調べ、ピーク変換効率 -6.846dB(20.7%)において、30dBのキャリア抑圧と、前例のない47.8dBのサイドバンド抑圧を達成した。シリコンフォトニックSSB変調器を用いて、ルビジウム($^{87}$Rb)原子系において、冷却原子の生成、状態選択的検出、および重力加速度 $g \approx 9.77 \pm 0.01 \,\rm{m/s^2}$ を推定するための原子干渉計フリンジを実証する。

The most complicated and challenging system within a light-pulse atom interferometer (LPAI) is the laser system, which controls the frequencies and intensities of multiple laser beams over time to configure quantum gravity and inertial sensors. The main function of an LPAI laser system is to perform cold-atom generation, state-preparation, state-selective detection and to generate coherent two-photon process for the light-pulse sequence. Substantial miniaturization and ruggedization of the laser system can be achieved by bringing most key functions of the laser system onto photonic integrated circuit (PIC). We demonstrate a high-performance silicon photonic suppressed-carrier single-sideband (SC-SSB) modulator at 1560 nm, which can dynamically frequency shift within the LPAI. With independent RF-channel control, we study the imbalances in both the optical and RF phases/amplitudes to reach 30 dB carrier-suppression, unprecedented 47.8 dB sideband-suppression at peak conversion-efficiency: -6.846 dB (20.7 %). Using a silicon photonic SSB-modulator, we demonstrate cold-atom generation, state-selective detection, and atom interferometer fringes to estimate gravitational acceleration, $g \approx 9.77 \pm 0.01 \,\rm{m/s^2}$, in a Rubidium ($^{87}$Rb) atom system.
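As a quick consistency check of the figures quoted in this abstract (a worked conversion added for illustration, using only the quoted values), a power ratio of $x$ dB corresponds to a linear ratio of $10^{x/10}$:

$$
\eta = 10^{-6.846/10} \approx 0.207 = 20.7\,\%, \qquad
10^{30/10} = 10^{3}, \qquad
10^{47.8/10} \approx 6.0 \times 10^{4}.
$$

The quoted peak conversion efficiency of -6.846 dB therefore agrees with the stated 20.7 %. Assuming the suppression figures are power ratios relative to the desired sideband, 30 dB and 47.8 dB mean the residual carrier and the unwanted sideband are roughly $10^{3}$ and $6 \times 10^{4}$ times weaker, respectively.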

翻訳日:2024-07-17 05:46:45 公開日:2024-07-12

# 動的関係データのための因子分解型融合縮小

Factorized Fusion Shrinkage for Dynamic Relational Data ( http://arxiv.org/abs/2210.00091v3 )

ライセンス: Link先を確認

Peng Zhao, Anirban Bhattacharya, Debdeep Pati, Bani K. Mallick,

(参考訳) 現代のデータサイエンスの応用は、しばしば動的構造を持つ複雑な関係データを含む。このような動的関係データの急激な変化は、通常、介入によってレジームが変化するシステムで観察される。このような場合に対し、我々は因子分解型融合縮小(factorized fusion shrinkage)モデルを考える。このモデルでは、分解されたすべての因子がグループ単位の融合構造に向かって動的に縮小され、その縮小は、分解された行列の行ベクトルの逐次差分にグローバル・ローカル縮小事前分布を適用することで得られる。提案する事前分布は、推定された動的潜在因子の比較とクラスタリングにおいて多くの好ましい性質を持つ。推定された潜在因子の比較には、隣接する時点間の比較と長期的な比較の両方が含まれ、比較の時間範囲は変数として扱われる。一定の条件下で、事後分布が対数因子を除いてミニマックス最適レートを達成することを示す。計算面では、成分間の依存性と時間方向の依存性の両方を活用し、最適な事後推論と計算のスケーラビリティのバランスをとる構造化平均場変分推論フレームワークを提示する。このフレームワークは、動的行列分解、ネットワークの潜在空間モデル、低ランクテンソルなど、多様なモデルに対応できる。本手法の有効性は、広範なシミュレーションと実世界のデータ解析によって実証される。

Modern data science applications often involve complex relational data with dynamic structures. An abrupt change in such dynamic relational data is typically observed in systems that undergo regime changes due to interventions. In such a case, we consider a factorized fusion shrinkage model in which all decomposed factors are dynamically shrunk towards group-wise fusion structures, where the shrinkage is obtained by applying global-local shrinkage priors to the successive differences of the row vectors of the factorized matrices. The proposed priors enjoy many favorable properties in comparison and clustering of the estimated dynamic latent factors. Comparing estimated latent factors involves both adjacent and long-term comparisons, with the time range of comparison considered as a variable. Under certain conditions, we demonstrate that the posterior distribution attains the minimax optimal rate up to logarithmic factors. In terms of computation, we present a structured mean-field variational inference framework that balances optimal posterior inference with computational scalability, exploiting both the dependence among components and across time. The framework can accommodate a wide variety of models, including dynamic matrix factorization, latent space models for networks and low-rank tensors. The effectiveness of our methodology is demonstrated through extensive simulations and real-world data analysis.

翻訳日:2024-07-17 05:38:07 公開日:2024-07-12

# 大規模な差分プライベートストリーム処理

Differentially Private Stream Processing at Scale ( http://arxiv.org/abs/2303.18086v3 )

ライセンス: Link先を確認

Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek,

(参考訳) 我々は、我々の知る限り初となる、大規模な差分プライベート(DP)ストリーム集約処理システムを設計する。我々のシステムである Differential Privacy SQL Pipelines (DP-SQLP) は、Spark Streaming に類似したストリーミングフレームワークを用いて構築されており、Google の Spanner データベースと F1 クエリエンジンの上に構築されている。DP-SQLP の設計に向けて、我々はアルゴリズムとシステムの両面で進歩をもたらす。具体的には、(i) 取りうるキーの集合が非有界であっても動作し、ユーザが寄与した10億個のキーまでスケールできる、新しい(ユーザレベルの)DPキー選択アルゴリズムを設計し、(ii) 各トリガー時刻ですべてのキーを列挙することを避ける、DPキー選択のためのプリエンプティブ実行方式を設計し、(iii) DP連続観測のアルゴリズム技術を用いて、ストリーム長にわたり、異なるキーへのユーザ寄与の連続的なDPヒストグラムを公開する。我々が検討した有意義なベースラインと比較して少なくとも $16\times$ の誤差削減を得ることで、有効性を実証的に示す。我々は DP-SQLP を用いて、Google Shopping 向けに差分プライベートなユーザインプレッションのストリーミングを実装した。このストリーミングDPアルゴリズムは、Google Trends にも適用されている。

We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel (user-level) DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate the efficacy by obtaining at least $16\times$ reduction in error over meaningful baselines we consider. We implemented a streaming differentially private user impressions for Google Shopping with DP-SQLP. The streaming DP algorithms are further applied to Google Trends.
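Point (iii) of the abstract refers to techniques from DP continual observation. As background, the sketch below shows the classic binary-tree mechanism from that literature for releasing a running count of a single 0/1 stream. It is not the DP-SQLP algorithm, whose details the abstract does not give. The function names are hypothetical, the sketch assumes each stream element changes the count by at most 1, and it omits key selection and user-level contribution bounding entirely.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_prefix_sums(stream, epsilon, rng):
    """Binary-tree mechanism: noisy running sums of a 0/1 stream.

    Each element falls into at most `levels` dyadic blocks, so adding
    Laplace(levels / epsilon) noise to every block sum gives epsilon-DP
    for the whole output sequence (illustrative sketch only).
    """
    levels = max(1, len(stream).bit_length())
    scale = levels / epsilon
    noisy_node = {}  # (level, start) -> noisy block sum, drawn once and reused
    outputs = []
    for t in range(1, len(stream) + 1):
        start, estimate = 0, 0.0
        # Decompose [0, t) into aligned dyadic blocks, largest first.
        for level in reversed(range(levels)):
            size = 1 << level
            if t & size:
                key = (level, start)
                if key not in noisy_node:
                    block_sum = sum(stream[start:start + size])
                    noisy_node[key] = block_sum + laplace(scale, rng)
                estimate += noisy_node[key]
                start += size
        outputs.append(estimate)
    return outputs


stream = [1, 0, 1, 1, 0, 1, 0, 1]
noisy = private_prefix_sums(stream, epsilon=1.0, rng=random.Random(0))
```

Each released running sum combines only a logarithmic number of noisy block sums, so the error grows polylogarithmically with the stream length instead of linearly, which is why this family of techniques suits long-running streams.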

翻訳日:2024-07-17 05:28:16 公開日:2024-07-12

# データセット著作権保護のための大規模言語モデルにおけるテキストデータへの透かし

Watermarking Text Data on Large Language Models for Dataset Copyright ( http://arxiv.org/abs/2305.13257v4 )

ライセンス: Link先を確認

Yixin Liu, Hongsheng Hu, Xun Chen, Xuyun Zhang, Lichao Sun,

(参考訳) 多くの研究により、大規模なコーパスで学習した深層モデル(例えば事前学習モデル)は、下流のNLPタスクに有用な普遍的な言語表現を学習できることが示されている。しかし、学習データセットには多くの機密情報が含まれる一方で、これらの強力なモデルは様々なプライバシ攻撃に対して脆弱でもある。攻撃者は、個人のメールアドレスや電話番号などの機密情報を、公開モデルから容易に盗み出すことができる。こうした問題、特に個人データの無許可利用に対処するため、我々はバックドアに基づくメンバーシップ推論アプローチによる新しい透かし技術 TextMarker を導入する。TextMarker は、学習用テキストデータに埋め込まれた様々な形式の個人情報を保護できる。具体的には、TextMarker は、ターゲットモデルへのブラックボックスアクセスという仮定の下で、データ著作権保護のためにデータ所有者が少数のサンプルにマークを付けることだけを必要とする。広範な評価を通じて、様々な実世界のデータセットに対する TextMarker の有効性を示す。例えば、学習データセットのわずか0.1%にマークを付けるだけで、モデルの有用性への影響を無視できる程度に抑えつつ、効果的なメンバーシップ推論を行うのに実用上十分である。また、考えられる対策についても議論し、TextMarker がそれらを回避するのに十分なステルス性を持つことを示す。

Substantial research works have shown that deep models, e.g., pre-trained models, on the large corpus can learn universal language representations, which are beneficial for downstream NLP tasks. However, these powerful models are also vulnerable to various privacy attacks, while much sensitive information exists in the training dataset. The attacker can easily steal sensitive information from public models, e.g., individuals' email addresses and phone numbers. In an attempt to address these issues, particularly the unauthorized use of private data, we introduce a novel watermarking technique via a backdoor-based membership inference approach named TextMarker, which can safeguard diverse forms of private information embedded in the training text data. Specifically, TextMarker only requires data owners to mark a small number of samples for data copyright protection under the black-box access assumption to the target model. Through extensive evaluation, we demonstrate the effectiveness of TextMarker on various real-world datasets, e.g., marking only 0.1% of the training dataset is practically sufficient for effective membership inference with negligible effect on model utility. We also discuss potential countermeasures and show that TextMarker is stealthy enough to bypass them.

翻訳日:2024-07-17 05:18:31 公開日:2024-07-12

# Equivariant vs. Invariant Layers: 点群分類のためのバックボーンとプーリングの比較

Equivariant vs. Invariant Layers: A Comparison of Backbone and Pooling for Point Cloud Classification ( http://arxiv.org/abs/2306.05553v2 )

ライセンス: Link先を確認

Abihith Kothapalli, Ashkan Shahbazi, Xinran Liu, Robert Sheng, Soheil Kolouri,

(参考訳) 点群のような集合構造データからの学習は、機械学習コミュニティから大きな注目を集めている。幾何学的深層学習は、集合構造データの置換対称性を保持する効果的な集合ニューラルネットワークを設計するための青写真を提供する。我々が関心を持つのは置換不変ネットワークであり、これは置換同変バックボーン、置換不変なグローバルプーリング、および回帰/分類ヘッドで構成される。既存の文献は同変バックボーンの改善に焦点を当ててきたが、プーリング層の影響はしばしば見過ごされている。本稿では、3つのベンチマーク点群分類データセット上で、置換同変バックボーンと置換不変グローバルプーリングの相互作用を調べる。我々の調査結果は次のことを示している。 1) 輸送ベースやアテンションベースのプーリングといった複雑なプーリング手法は、単純なバックボーンの性能を大幅に向上させるが、より複雑なバックボーンではその利点は小さくなる。 2) 複雑なバックボーンであっても、データが少ないシナリオではプーリング層の恩恵を受けられる。 3) 驚くべきことに、プーリング層の選択は、バックボーンの幅と深さを調整するよりも、モデルの性能に大きな影響を与えうる。 4) プーリング層をペアで組み合わせることで、固定したバックボーンの性能を大幅に向上させることができる。我々の包括的な研究は、実践者がより優れた置換不変集合ニューラルネットワークを設計するための洞察を提供する。コードは https://github.com/mint-vu/backbone_vs_pooling で利用可能である。

Learning from set-structured data, such as point clouds, has gained significant attention from the machine learning community. Geometric deep learning provides a blueprint for designing effective set neural networks that preserve the permutation symmetry of set-structured data. Of our interest are permutation invariant networks, which are composed of a permutation equivariant backbone, permutation invariant global pooling, and regression/classification head. While existing literature has focused on improving equivariant backbones, the impact of the pooling layer is often overlooked. In this paper, we examine the interplay between permutation equivariant backbones and permutation invariant global pooling on three benchmark point cloud classification datasets. Our findings reveal that: 1) complex pooling methods, such as transport-based or attention-based poolings, can significantly boost the performance of simple backbones, but the benefits diminish for more complex backbones, 2) even complex backbones can benefit from pooling layers in low data scenarios, 3) surprisingly, the choice of pooling layers can have a more significant impact on the model's performance than adjusting the width and depth of the backbone, and 4) pairwise combination of pooling layers can significantly improve the performance of a fixed backbone. Our comprehensive study provides insights for practitioners to design better permutation invariant set neural networks. Our code is available at https://github.com/mint-vu/backbone_vs_pooling.

翻訳日:2024-07-17 05:18:31 公開日:2024-07-12

# 大規模言語モデルにおけるバイアスと公正性:調査

Bias and Fairness in Large Language Models: A Survey ( http://arxiv.org/abs/2309.00770v3 )

ライセンス: Link先を確認

Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed,

(参考訳) 大規模言語モデル(LLM)の急速な進歩により、人間のようなテキストの処理、理解、生成が可能となり、社会領域に触れるシステムへの統合が拡大した。この成功にもかかわらず、これらのモデルは有害な社会的バイアスを学習し、永続し、増幅することができる。本稿では,LLMのバイアス評価と緩和技術に関する総合的な調査を行う。まず、自然言語処理における社会的偏見と公平性の概念を統合、形式化し、拡張し、異なる害の面を定義し、LLMの公正性を運用するためにいくつかのデシラタを導入する。次に、3つの直感的な分類法、バイアス評価のための2つの指標とデータセット、緩和のための1つを提案して、文献を統一する。バイアス評価のためのメトリクスの最初の分類法は、メトリクスと評価データセットの関係を曖昧にし、それらがモデルで運用するさまざまなレベル(埋め込み、確率、生成されたテキスト)でメトリクスを整理します。バイアス評価のためのデータセットの第2の分類法は、その構造によるデータセットを対実的な入力やプロンプトとして分類し、ターゲットとなる害や社会集団を特定します。偏差緩和技術の第3の分類法は, 事前処理, イントレーニング, イントラプロセッシング, ポストプロセッシングの介入によって, 研究動向を解明する粒度のサブカテゴリを分類する。最後に、今後の作業におけるオープンな問題と課題を特定します。近年の幅広い研究を合成し、研究者や実践者がLLMのバイアスの伝播をよりよく理解し防止できるように、既存の文献の明確なガイドを提供することを目指している。

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

翻訳日:2024-07-17 04:58:50 公開日:2024-07-12

# SimNP: 神経点間の自己相似性を学習する

SimNP: Learning Self-Similarity Priors Between Neural Points ( http://arxiv.org/abs/2309.03809v2 )

ライセンス: Link先を確認

Christopher Wewer, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen,

(参考訳) 既存の3Dオブジェクト再構成のためのニューラルネットワーク表現は、(1)オブジェクトレベル表現を利用するが、グローバルな潜伏符号の条件付けにより、低品質の細部に苦しむか、(2)観察を完璧に再構築することができるが、観測されていない領域を推測するためにオブジェクトレベルの事前知識を利用できないかのいずれかである。カテゴリーレベルの自己相似性を学習する手法であるSimNPを提案する。これは、ニューラルネットワークとカテゴリレベルの自己相似性表現を結合することにより、両方の世界の利点を組み合わせたものである。私たちの貢献は2倍です。 1) コヒーレント・ポイント・クラウドの概念を利用して,カテゴリーレベルでの最初の神経点表現を設計する。結果として得られる神経点放射場は、局所的に支持された対象領域に対して高いレベルの詳細を記憶する。 2) 再建過程において, 対象の未観測領域を所定の観測から導き出すことが可能な, 制約のない, 教師なしの方法で, ニューラルポイント間での情報共有を学習する。我々は、SimNPが、カテゴリレベルまたはピクセルアラインなラディアンスフィールド上に構築され、インスタンス間の意味的対応を提供しながら、対称な見えないオブジェクト領域を再構築する従来の手法よりも優れていることを示す。

Existing neural field representations for 3D object reconstruction either (1) utilize object-level representations, but suffer from low-quality details due to conditioning on a global latent code, or (2) are able to perfectly reconstruct the observations, but fail to utilize object-level prior knowledge to infer unobserved regions. We present SimNP, a method to learn category-level self-similarities, which combines the advantages of both worlds by connecting neural point radiance fields with a category-level self-similarity representation. Our contribution is two-fold. (1) We design the first neural point representation on a category level by utilizing the concept of coherent point clouds. The resulting neural point radiance fields store a high level of detail for locally supported object regions. (2) We learn how information is shared between neural points in an unconstrained and unsupervised fashion, which allows unobserved regions of an object to be derived from given observations during the reconstruction process. We show that SimNP is able to outperform previous methods in reconstructing symmetric unseen object regions, surpassing methods that build upon category-level or pixel-aligned radiance fields, while providing semantic correspondences between instances.

翻訳日:2024-07-17 04:58:50 公開日:2024-07-12

# MEMO:大または小血管密度差を有するロバスト多モード網膜画像登録のためのデータセットと方法

MEMO: Dataset and Methods for Robust Multimodal Retinal Image Registration with Large or Small Vessel Density Differences ( http://arxiv.org/abs/2309.14550v2 )

ライセンス: Link先を確認

Chiao-Yi Wang, Faranguisse Kakhi Sadrieh, Yi-Ting Shen, Shih-En Chen, Sarah Kim, Victoria Chen, Achyut Raghavendra, Dongyi Wang, Osamah Saeedi, Yang Tao,

(参考訳) 毛細血管における網膜血流(RBF)の測定は、眼疾患の早期診断と治療のための強力なバイオマーカーとなる。しかし、毛細血管の流量を高精度で決定できる単一のモダリティは存在しない。赤血球介在血管造影(EMA)は網膜微小血管の絶対2D RBFを測定することができ、光コヒーレンス断層血管造影(OCTA)は毛細血管の3D構造像を提供することができるため、EMAとOCTAを組み合わせることでこの目標を達成できる可能性がある。しかし、これらの2つのモダリティ間のマルチモーダル網膜画像レジストレーションはほとんど未開拓のままである。このギャップを埋めるために、最初の公開マルチモーダルEMA・OCTA網膜画像データセットであるMEMOを構築した。これらのモダリティ間のマルチモーダル網膜画像レジストレーションにおけるユニークな課題は、血管密度(VD)の比較的大きな差である。この課題に対処するために,血管密度の差があっても頑健な結果が得られる、セグメンテーションベースのディープラーニングフレームワーク (VDD-Reg) と新しい評価指標 (MSD) を提案する。 VDD-Regは血管セグメンテーションモジュールとレジストレーションモジュールで構成される。血管セグメンテーションモジュールを訓練するために,教師ありと教師なしの損失を組み合わせた2段階の半教師付き学習フレームワーク(LVD-Seg)を設計した。 VD差が小さい場合(CF-FAデータセットを使用)と大きい場合(MEMOデータセットを使用)の両方で,VDD-Regはベースライン法を定量的かつ定性的に上回ることを示す。さらに、VDD-Regはわずか3つの注釈付き血管セグメンテーションマスクで精度を維持でき、その実現可能性を示している。

The measurement of retinal blood flow (RBF) in capillaries can provide a powerful biomarker for the early diagnosis and treatment of ocular diseases. However, no single modality can determine capillary flowrates with high precision. Combining erythrocyte-mediated angiography (EMA) with optical coherence tomography angiography (OCTA) has the potential to achieve this goal, as EMA can measure the absolute 2D RBF of retinal microvasculature and OCTA can provide the 3D structural images of capillaries. However, multimodal retinal image registration between these two modalities remains largely unexplored. To fill this gap, we establish MEMO, the first public multimodal EMA and OCTA retinal image dataset. A unique challenge in multimodal retinal image registration between these modalities is the relatively large difference in vessel density (VD). To address this challenge, we propose a segmentation-based deep-learning framework (VDD-Reg) and a new evaluation metric (MSD), which provide robust results despite differences in vessel density. VDD-Reg consists of a vessel segmentation module and a registration module. To train the vessel segmentation module, we further designed a two-stage semi-supervised learning framework (LVD-Seg) combining supervised and unsupervised losses. We demonstrate that VDD-Reg outperforms baseline methods quantitatively and qualitatively for cases of both small VD differences (using the CF-FA dataset) and large VD differences (using our MEMO dataset). Moreover, VDD-Reg requires as few as three annotated vessel segmentation masks to maintain its accuracy, demonstrating its feasibility.

翻訳日:2024-07-17 04:48:58 公開日:2024-07-12

# VideoDirectorGPT:LLM誘導計画による連続マルチシーン映像生成

VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning ( http://arxiv.org/abs/2309.15091v2 )

ライセンス: Link先を確認

Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal,

(参考訳) 近年のテキスト・ツー・ビデオ(T2V)生成法は大きな進歩を遂げている。しかし、これらの研究の大半は、1つのイベント(すなわちシングルシーンのビデオ)の短いビデオクリップを作ることに重点を置いている。一方、最近の大規模言語モデル(LLM)は、下流のビジュアルモジュールを制御するレイアウトとプログラムを生成する能力を実証している。これらのLLMに埋め込まれた知識を時間的に一貫した長ビデオ生成に活用できるか? 本稿では,ビデオコンテンツプランニングとグラウンドドビデオ生成にLLMの知識を利用する,一貫したマルチシーンビデオ生成のための新しいフレームワークであるVideoDirectorGPTを提案する。具体的には、1つのテキストプロンプトが与えられた場合、まずビデオプランナのLLM(GPT-4)に、シーン記述、各レイアウトを持つエンティティ、各シーンの背景、エンティティの一貫性グループ化を含む「ビデオプラン」への拡張を依頼する。次に、このビデオプランでガイドされたビデオジェネレータLayout2Vidは、空間的レイアウトを明示的に制御し、複数のシーンにまたがるエンティティの時間的一貫性を保ちながら、画像レベルのアノテーションでのみ訓練することができる。実験により,本フレームワークは単一シーンと多シーンのビデオ生成におけるレイアウトと移動制御を大幅に改善し,複数シーンのビデオの一貫性を保ちながら,オープンドメインの単一シーンT2V生成におけるSOTAとの競合性能を実現した。 LLMによるレイアウト制御強度の動的調整や、ユーザが提供する画像による映像生成など、詳細なアブレーション研究により、我々のフレームワークの各コンポーネントの有効性と今後の可能性を確認することができる。

Recent text-to-video (T2V) generation methods have seen significant advancements. However, the majority of these works focus on producing short video clips of a single event (i.e., single-scene videos). Meanwhile, recent large language models (LLMs) have demonstrated their capability in generating layouts and programs to control downstream visual modules. This prompts an important question: can we leverage the knowledge embedded in these LLMs for temporally consistent long video generation? In this paper, we propose VideoDirectorGPT, a novel framework for consistent multi-scene video generation that uses the knowledge of LLMs for video content planning and grounded video generation. Specifically, given a single text prompt, we first ask our video planner LLM (GPT-4) to expand it into a 'video plan', which includes the scene descriptions, the entities with their respective layouts, the background for each scene, and consistency groupings of the entities. Next, guided by this video plan, our video generator, named Layout2Vid, has explicit control over spatial layouts and can maintain temporal consistency of entities across multiple scenes, while being trained only with image-level annotations. Our experiments demonstrate that our proposed VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with consistency, while achieving competitive performance with SOTAs in open-domain single-scene T2V generation. Detailed ablation studies, including dynamic adjustment of layout control strength with an LLM and video generation with user-provided images, confirm the effectiveness of each component of our framework and its future potential.

翻訳日:2024-07-17 04:48:58 公開日:2024-07-12

# One for All: すべての分類タスクのための1つのグラフモデルの学習に向けて

One for All: Towards Training One Graph Model for All Classification Tasks ( http://arxiv.org/abs/2310.00149v3 )

ライセンス: Link先を確認

Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, Muhan Zhang,

(参考訳) 複数のタスクに対処する単一モデルを設計することは、人工知能の長年の目標である。近年,大規模言語モデルは言語領域内で異なるタスクを解く際,例外的な能力を示した。しかし、グラフ学習領域に固有の課題のために、様々なグラフタスクの統一モデルはまだ十分に探索されていない。まず、異なる領域のグラフデータは異なる属性を持ち、異なる分布に従う。このような相違により、単一の表現空間におけるグラフの表現が困難になる。第二に、グラフ上のタスクはノード、リンク、グラフタスクに多様化し、異なる埋め込み戦略を必要とする。最後に、文脈内学習のための適切なグラフプロンプトパラダイムが不明確である。我々は、上記の課題に対処するために単一のグラフモデルを使用できる最初の一般的なフレームワークである One for All (OFA) を提案する。具体的には、OFAはノードとエッジを自然言語で記述することで異なるグラフデータを統一するテキスト属性グラフを提案し、言語モデルを使用して、多様でおそらくクロスドメインなテキスト属性を同じ埋め込み空間の特徴ベクトルに符号化する。さらに、OFAは1つのタスク表現で異なるタスクを標準化する nodes-of-interest (注目ノード) の概念を導入している。グラフ上のコンテキスト内学習のために、OFAは入力グラフにプロンプト用の部分構造を付加する新しいグラフプロンプトパラダイムを導入し、微調整なしで様々なタスクに対処できるようにする。我々は、複数のドメイン(引用ネットワーク、分子グラフ、知識グラフなど)のグラフデータを用いてOFAモデルを同時に訓練し、教師付き、少数ショット、ゼロショット学習シナリオにおけるその能力を評価する。 OFAは様々なタスクでうまく機能し、グラフ上の最初の汎用のクロスドメイン分類モデルとなる。

Designing a single model to address multiple tasks has been a long-standing objective in artificial intelligence. Recently, large language models have demonstrated exceptional capability in solving different tasks within the language domain. However, a unified model for various graph tasks remains underexplored, primarily due to the challenges unique to the graph learning domain. First, graph data from different areas carry distinct attributes and follow different distributions. Such discrepancy makes it hard to represent graphs in a single representation space. Second, tasks on graphs diversify into node, link, and graph tasks, requiring distinct embedding strategies. Finally, an appropriate graph prompting paradigm for in-context learning is unclear. We propose One for All (OFA), the first general framework that can use a single graph model to address the above challenges. Specifically, OFA proposes text-attributed graphs to unify different graph data by describing nodes and edges with natural language and uses language models to encode the diverse and possibly cross-domain text attributes to feature vectors in the same embedding space. Furthermore, OFA introduces the concept of nodes-of-interest to standardize different tasks with a single task representation. For in-context learning on graphs, OFA introduces a novel graph prompting paradigm that appends prompting substructures to the input graph, which enables it to address varied tasks without fine-tuning. We train the OFA model using graph data from multiple domains (including citation networks, molecular graphs, knowledge graphs, etc.) simultaneously and evaluate its ability in supervised, few-shot, and zero-shot learning scenarios. OFA performs well across different tasks, making it the first general-purpose across-domains classification model on graphs.

翻訳日:2024-07-17 04:48:58 公開日:2024-07-12

# 安全性と活性(ライブネス)を活かした高性能化のための2層ブロックチェーンシャーディングプロトコル

A Two-Layer Blockchain Sharding Protocol Leveraging Safety and Liveness for Enhanced Performance ( http://arxiv.org/abs/2310.11373v5 )

ライセンス: Link先を確認

Yibin Xu, Jingyi Zheng, Boris Düdder, Tijs Slaats, Yongluan Zhou,

(参考訳) シャーディングはブロックチェーンのスケーラビリティ向上に不可欠だ。既存のプロトコルは、さまざまな敵攻撃を見落とし、トランザクションスループットを制限します。本稿では、この問題に対処する基盤的なシャーディングプロトコルであるReticulumを紹介し、ブロックチェーンのスケーラビリティを向上する。 Reticulumは2段階のアプローチを採用し、実行時逆アタックに基づくトランザクションスループットを適用している。コントロール"と"プロセス"のシャードを2つのレイヤで構成する。プロセスシャードには少なくとも1つの信頼できるノードが含まれ、コントロールシャードには信頼性のあるノードが多数含まれている。最初のフェーズでは、トランザクションはブロックに書き込まれ、プロセスシャード内のノードによって投票される。承認されたブロックが全会一致で確認される。第2段階では、全会一致の受け入れられないブロックは制御シャードによって投票される。多数派が賛成すればブロックが認められ、第一段階の反対者や無言の有権者は排除される。 Reticulumは第1フェーズで全会一致投票を使用しており、ノードが少ないため、より並列なプロセスシャードが可能である。コントロールシャードは決定を確定し、紛争を解決します。 Reticulumの革新的な設計を確認し、さまざまなネットワーク攻撃に対して高いトランザクションスループットと堅牢性を提供し、ブロックチェーンネットワークの既存のシャーディングプロトコルを上回っている。

Sharding is essential for improving blockchain scalability. Existing protocols overlook diverse adversarial attacks, limiting transaction throughput. This paper presents Reticulum, a groundbreaking sharding protocol addressing this issue, boosting blockchain scalability. Reticulum employs a two-phase approach, adapting transaction throughput based on runtime adversarial attacks. It comprises "control" and "process" shards in two layers. Process shards contain at least one trustworthy node, while control shards have a majority of trusted nodes. In the first phase, transactions are written to blocks and voted on by nodes in process shards. Unanimously accepted blocks are confirmed. In the second phase, blocks without unanimous acceptance are voted on by control shards. Blocks are accepted if the majority votes in favor, eliminating first-phase opponents and silent voters. Reticulum uses unanimous voting in the first phase, involving fewer nodes, enabling more parallel process shards. Control shards finalize decisions and resolve disputes. Experiments confirm Reticulum's innovative design, providing high transaction throughput and robustness against various network attacks, outperforming existing sharding protocols for blockchain networks.
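The two-phase decision rule described above can be written down as a small function. This is only a sketch of the rule as stated in the abstract, not Reticulum's implementation: the function name, the vote encoding (`True` / `False` / `None` for silent), the strict-majority threshold, and the handling of rejected blocks are all assumptions made for illustration.

```python
# Sketch of the two-phase block decision described in the abstract.
# Vote encoding and the treatment of rejected blocks are assumptions.

def decide_block(process_votes, control_votes):
    """Return (outcome, flagged_nodes).

    process_votes: dict mapping process-shard node id -> True (accept),
                   False (reject) or None (silent).
    control_votes: list of booleans from control-shard nodes.
    """
    # Phase 1: unanimous acceptance inside the process shard confirms the block.
    if process_votes and all(v is True for v in process_votes.values()):
        return "confirmed_phase1", []

    # Phase 2: the control shard decides by (assumed strict) majority.
    yes = sum(1 for v in control_votes if v)
    if 2 * yes > len(control_votes):
        # First-phase opponents and silent voters are flagged for elimination.
        flagged = sorted(n for n, v in process_votes.items() if v is not True)
        return "accepted_phase2", flagged

    return "rejected", []

print(decide_block({"a": True, "b": None, "c": False}, [True, True, False]))
```

Because phase 1 requires unanimity, a single honest node in a process shard is enough to keep an invalid block from being confirmed there, which is consistent with the abstract's requirement of at least one trustworthy node per process shard.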

翻訳日:2024-07-17 04:48:58 公開日:2024-07-12

# 暗黙的メタ学習は、言語モデルがより信頼できる情報源を信頼するよう導く可能性がある

Implicit meta-learning may lead language models to trust more reliable sources ( http://arxiv.org/abs/2310.15047v4 )

ライセンス: Link先を確認

Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Mlodozeniec, Tegan Maharaj, David Krueger,

(参考訳) LLMは文書の有用性の指標を学習し,それに応じて更新を調整しうることを実証する。合成微調整データセットにおける有用性の指標としてランダム文字列(タグ)を導入する。このデータセットでの微調整は暗黙的メタ学習(IML)につながる。すなわち、さらなる微調整において、モデルは有用とタグ付けされたテキストをより活用するように更新される。我々は、この現象の徹底的な実証調査を行い、(とりわけ)次のことを発見した。 (i) 事前訓練されたLLMとスクラッチから訓練されたLLMの両方で、また視覚タスクでも発生すること。 (ii) より大きなモデルと小さなバッチサイズは、より多くのIMLを与える傾向があること。また、IMLがモデルのパラメータへの知識の格納方法をどう変えるかを調べるために、プロービングも使用している。最後に、将来のAIシステムの能力、リスク、制御可能性について、私たちの結果が示唆しうることを考察する。私たちのコードは https://github.com/krasheninnikov/internalization にある。

We demonstrate that LLMs may learn indicators of document usefulness and modulate their updates accordingly. We introduce random strings ("tags") as indicators of usefulness in a synthetic fine-tuning dataset. Fine-tuning on this dataset leads to implicit meta-learning (IML): in further fine-tuning, the model updates to make more use of text that is tagged as useful. We perform a thorough empirical investigation of this phenomenon, finding (among other things) that (i) it occurs in both pretrained LLMs and those trained from scratch, as well as on a vision task, and (ii) larger models and smaller batch sizes tend to give more IML. We also use probing to examine how IML changes the way models store knowledge in their parameters. Finally, we reflect on what our results might imply about capabilities, risks, and controllability of future AI systems. Our code can be found at https://github.com/krasheninnikov/internalization.

翻訳日:2024-07-17 02:54:11 公開日:2024-07-12

# カーシェアリングのためのVehicle-to-Grid (V2G) -2030年のシミュレーション研究-

Vehicle-to-grid for car sharing -- A simulation study for 2030 ( http://arxiv.org/abs/2311.07349v2 )

ライセンス: Link先を確認

Nina Wiedemann, Yanan Xin, Vasco Medici, Lorenzo Nespoli, Esra Suel, Martin Raubal,

(参考訳) 近年のカーシェアリングサービスの普及は、持続可能な輸送を推し進めるための有望な道のりを示している。単に車の所有率を下げるだけでなく、これらのシステムは Vehicle-to-Grid (V2G) 技術によるアンシラリーサービスの提供を通じて電力系統の安定性を高める上で重要な役割を果たしうるが、この側面は先行研究ではあまり注目されてこなかった。本研究では、スイスにおける全国規模のサービスのための将来のシナリオを設計し、カーシェアリングにおけるV2Gの可能性を分析する。人口変動とカーシェアリングサービスのさまざまなビジネス戦略を考慮したエージェントベースシミュレーションパイプラインを提案し,2030年のシナリオシミュレーションへの適用に成功したことを示す。カーシェアリングのユーザ行動を模倣するため,データ駆動型の交通手段選択モデルを開発した。本分析では, 車両数を削減した場合や新たなカーシェアリングステーションを設けたシナリオで車両利用率が高くなるなど, 検討シナリオ間の重要な違いを明らかにした。これらの違いは、シナリオと時間帯に応じて12MWから50MWの範囲となる、アンシラリーサービスに利用可能な車両群の電力柔軟性の差につながる。さらに、実際の電力価格データを組み込んだ、カーシェアリング車両群の一部を対象とするケーススタディも実施する。このケーススタディは、電力系統運用者と車両群の所有者の双方に金銭的利益をもたらすスイートスポットの存在を裏付けるものである。本研究は意思決定者に対してガイドラインを提供し,カーシェアリングの領域内での電力取引に関する規制整備の必要性を強調する。

The proliferation of car sharing services in recent years presents a promising avenue for advancing sustainable transportation. Beyond merely reducing car ownership rates, these systems can play a pivotal role in bolstering grid stability through the provision of ancillary services via vehicle-to-grid (V2G) technologies - a facet that has received limited attention in previous research. In this study, we analyze the potential of V2G in car sharing by designing future scenarios for a national-scale service in Switzerland. We propose an agent-based simulation pipeline that considers population changes as well as different business strategies of the car sharing service, and we demonstrate its successful application for simulating scenarios for 2030. To imitate car sharing user behavior, we develop a data-driven mode choice model. Our analysis reveals important differences in the examined scenarios, such as higher vehicle utilization rates for a reduced fleet size as well as in a scenario featuring new car sharing stations. These disparities translate into variations in the power flexibility of the fleet available for ancillary services, ranging from 12 to 50 MW, depending on the scenario and the time of the day. Furthermore, we conduct a case study involving a subset of the car sharing fleet, incorporating real-world electricity pricing data. The case study substantiates the existence of a sweet spot involving monetary gains for both power grid operators and fleet owners. Our findings provide guidelines to decision makers and underscore the pressing need for regulatory enhancements concerning power trading within the realm of car sharing.

翻訳日:2024-07-17 02:54:11 公開日:2024-07-12

# アップサンプリング時の特徴安定性の向上 -スペクトルアーチファクトと空間文脈の重要性-

Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context ( http://arxiv.org/abs/2311.17524v2 )

ライセンス: Link先を確認

Shashank Agnihotri, Julia Grabinski, Margret Keuper,

(参考訳) 画像復元、画像分割、不均一性推定など、さまざまなタスクにおいて、画素ワイズ予測が求められている。共通モデルはデータ再サンプリングのいくつかの段階を含み、特徴マップの解像度をまず情報を集約し、次に高解像度の出力を生成する。以前の研究では、再サンプリング操作がエイリアスなどのアーティファクトの対象であることが示されている。ダウンサンプリング中、エイリアスは画像分類器の予測安定性を損なうことが示されている。アップサンプリング中は、生成されたコンテンツを検出するために利用されています。しかし、アップサンプリング中のエイリアスの影響については、ピクセルワイズ予測の安定性と堅牢性についてはまだ議論されていない。同じ用語(エイリアス)に該当する一方で、ニューラルネットワークの正当性アップサンプリングの課題は、ダウンサンプリング中のそれと大きく異なる:ダウンサンプリングの際、一部の高頻度を正しく表現できず、エイリアスを避けるために除去する必要がある。しかし、ピクセルワイズ予測のアップサンプリングでは、低解像度では符号化できないような高周波数を復元する必要がある。したがって、信号処理による発見の応用は必要であるが、望ましい出力を達成するのに十分な条件ではない。対照的に、アップサンプリング中の大きな空間コンテキストの可用性は、全てのフィルタ重みを完全に学習しても、安定で高品質な画素ワイドの予測を可能にする。

Pixel-wise predictions are required in a wide variety of tasks such as image restoration, image segmentation, or disparity estimation. Common models involve several stages of data resampling, in which the resolution of feature maps is first reduced to aggregate information and then increased to generate a high-resolution output. Previous works have shown that resampling operations are subject to artifacts such as aliasing. During downsampling, aliases have been shown to compromise the prediction stability of image classifiers. During upsampling, they have been leveraged to detect generated content. Yet, the effect of aliases during upsampling has not yet been discussed w.r.t. the stability and robustness of pixel-wise predictions. While falling under the same term (aliasing), the challenges for correct upsampling in neural networks differ significantly from those during downsampling: when downsampling, some high frequencies can not be correctly represented and have to be removed to avoid aliases. However, when upsampling for pixel-wise predictions, we actually require the model to restore such high frequencies that can not be encoded in lower resolutions. The application of findings from signal processing is therefore a necessary but not a sufficient condition to achieve the desirable output. In contrast, we find that the availability of large spatial context during upsampling allows to provide stable, high-quality pixel-wise predictions, even when fully learning all filter weights.
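The spectral artifacts mentioned above can be reproduced with a one-dimensional toy signal. The sketch below is a standard signal-processing illustration rather than the paper's method: zero-insertion upsampling of a single low-frequency cosine leaves copies of that frequency near the new Nyquist limit. The signal length, the upsampling factor, and all function names are illustrative choices.

```python
# Zero-insertion upsampling creates spectral replicas (imaging artifacts).
# Pure-Python DFT so the example has no external dependencies.
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude of the discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]

def zero_insert_upsample(signal, factor=2):
    """Insert (factor - 1) zeros after every sample, with no filtering."""
    out = []
    for x in signal:
        out.append(x)
        out.extend([0.0] * (factor - 1))
    return out

N = 8
low_res = [math.cos(2 * math.pi * t / N) for t in range(N)]  # one cycle, bin 1
high_res = zero_insert_upsample(low_res, factor=2)
spectrum = dft_magnitudes(high_res)

# Bins 1 and 15 hold the original frequency; bins 7 and 9 are the replicas.
for k, mag in enumerate(spectrum):
    print(k, round(mag, 6))
```

A classical anti-imaging low-pass filter would suppress the replicas at bins 7 and 9. The abstract's point is that for pixel-wise prediction this is necessary but not sufficient, since the network must also restore genuine high-frequency content.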

翻訳日:2024-07-17 02:44:20 公開日:2024-07-12

# 非線形連続時間系のクラスにおける標本複雑度の推定

Estimation Sample Complexity of a Class of Nonlinear Continuous-time Systems ( http://arxiv.org/abs/2312.05382v3 )

ライセンス: Link先を確認

Simon Kuang, Xinfan Lin,

(参考訳) 本稿では, 状態が出力の導関数からなり、流れがパラメータに関して線形であるような、大きなクラスの非線形系のパラメータ推定法を提示する。正則化線形回帰を用いて力学を直接反転させることにより未知パラメータを解くこの手法は、微分フィルタリングと正則化最小二乗の新たな設計と解析のアイデアに基づいている。これらを直列に組み合わせると、推定の平均絶対誤差に関する新しい有限サンプル境界が得られる。

We present a method of parameter estimation for a large class of nonlinear systems, namely those in which the state consists of output derivatives and the flow is linear in the parameter. The method, which solves for the unknown parameter by directly inverting the dynamics using regularized linear regression, is based on new design and analysis ideas for differentiation filtering and regularized least squares. Combined in series, they yield a novel finite-sample bound on mean absolute error of estimation.
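The pipeline described above (differentiate the output, then invert the dynamics with regularized least squares) can be illustrated on the scalar system y' = -θ·y. This is a hedged sketch, not the paper's estimator: central finite differences stand in for the paper's differentiation filter, the data are noise-free, and `TRUE_THETA`, `DT`, and the regularization weight are arbitrary illustrative values.

```python
# Estimate theta in y' = -theta * y from samples of y.
# Central differences replace the paper's differentiation filter (assumption).
import math

def central_differences(samples, dt):
    """Derivative estimates at the interior sample points."""
    return [(samples[i + 1] - samples[i - 1]) / (2.0 * dt)
            for i in range(1, len(samples) - 1)]

def ridge_estimate(regressors, targets, lam):
    """Scalar ridge regression: argmin_theta sum (d - theta*phi)^2 + lam*theta^2."""
    num = sum(r * d for r, d in zip(regressors, targets))
    den = sum(r * r for r in regressors) + lam
    return num / den

TRUE_THETA = 2.0   # illustrative ground truth
DT = 0.01          # illustrative sampling interval

ys = [math.exp(-TRUE_THETA * i * DT) for i in range(101)]
dys = central_differences(ys, DT)   # estimates of y'
phis = [-y for y in ys[1:-1]]       # regressor, since y' = theta * (-y)
theta_hat = ridge_estimate(phis, dys, lam=1e-6)
print(theta_hat)
```

The estimate is close to but not exactly `TRUE_THETA`, because the finite-difference derivative carries a small bias; bounding this kind of error for finite samples is what the abstract's result addresses.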

翻訳日:2024-07-17 02:34:28 公開日:2024-07-12

# 報酬源としての視覚言語モデル

Vision-Language Models as a Source of Rewards ( http://arxiv.org/abs/2312.09187v3 )

ライセンス: Link先を確認

Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Dmitry Nikulin, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang,

(参考訳) 豊かなオープンエンド環境で多くの目標を達成できる汎用エージェントの構築は、強化学習のための研究フロンティアの1つである。 RLを用いた一般エージェント構築の鍵となる制限要因は、異なる目標を達成するために多数の報酬関数が必要であることである。強化学習エージェントの報酬源として市販の視覚言語モデル(VLM)の有効性を検討する。様々な言語目標の視覚的達成に対する報酬は、CLIPファミリーのモデルから導き出すことができ、様々な言語目標を達成するためのRLエージェントの訓練に使用されることを示す。このアプローチを2つの異なる視覚領域で示し、より大きなVLMが視覚目標達成に対してより正確な報酬をもたらすかを示すスケーリング傾向を示し、それによってより有能なRLエージェントを生成する。

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
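One way a reward could be derived from image and text embeddings is sketched below. The abstract does not spell out the construction, so everything here is an assumption for illustration: the embeddings are hand-written stand-ins for the outputs of a CLIP-style encoder, and the softmax over a goal prompt plus negative prompts, the temperature, and the threshold are hypothetical design choices.

```python
# Sketch: binary goal-achievement reward from embedding similarities.
# Embeddings are placeholders for CLIP-style encoder outputs (assumption).
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def goal_reward(image_emb, goal_emb, negative_embs, temperature=0.1, threshold=0.5):
    """Return 1.0 if the goal text wins the softmax over goal + negatives."""
    sims = [cosine(image_emb, goal_emb)]
    sims += [cosine(image_emb, n) for n in negative_embs]
    logits = [s / temperature for s in sims]
    m = max(logits)                              # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    p_goal = exps[0] / sum(exps)
    return 1.0 if p_goal > threshold else 0.0

goal = [1.0, 0.1]
negatives = [[0.0, 1.0]]
print(goal_reward([1.0, 0.0], goal, negatives))  # image resembles the goal
print(goal_reward([0.0, 1.0], goal, negatives))  # image resembles a negative
```

Thresholding to a binary signal is one simple way to turn noisy similarity scores into a sparse reward for an RL agent; other mappings are equally plausible.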

翻訳日:2024-07-17 02:34:28 公開日:2024-07-12

# 検出器シグナチャシミュレーションのための深部生成モデル:分類学的レビュー

Deep Generative Models for Detector Signature Simulation: A Taxonomic Review ( http://arxiv.org/abs/2312.09597v2 )

ライセンス: Link先を確認

Baran Hashemi, Claudius Krause,

(参考訳) 現代の衝突型加速器実験では、素粒子間の基本的な相互作用を探究する探索は、非平行な精度に達している。粒子物理学検出器からの信号は、衝突の物理(ハード散乱相互作用の最終状態粒子)を符号化する低レベル物体(エネルギー沈降や軌道など)である。検出器におけるそれらの完全なシミュレーションは、計算と記憶集約的なタスクである。粒子物理学におけるこの計算ボトルネックに対処するため、新たな仮定を導入し、速度の精度をトレードオフする別の手法が開発され、深部生成モデルの進歩によって加速された検出器シミュレーションの代理モデルへの関心が高まっている。これらのモデルは、観測データと統計的に同一の応答を生成することを目的としている。本稿では,従来の文献を包括的かつ徹底的に分析し,方法論的・応用的両面から検出シグネチャのシミュレーションを行う。まず、検出器シグネチャシミュレーションの問題を定式化し、統一可能な様々なバリエーションについて議論する。次に、その基礎となるモデルアーキテクチャに基づいて、最先端の手法を5つのカテゴリに分類し、それぞれの生成戦略を要約する。最後に、検出器シグネチャシミュレーションに先立つ課題と機会を明らかにし、将来の研究開発のステージを設定します。

In modern collider experiments, the quest to explore fundamental interactions between elementary particles has reached unparalleled levels of precision. Signatures from particle physics detectors are low-level objects (such as energy depositions or tracks) encoding the physics of collisions (the final state particles of hard scattering interactions). The complete simulation of them in a detector is a computational and storage-intensive task. To address this computational bottleneck in particle physics, alternative approaches have been developed, introducing additional assumptions and trading off accuracy for speed. The field has seen a surge in interest in surrogate modeling of detector simulation, fueled by the advancements in deep generative models. These models aim to generate responses that are statistically identical to the observed data. In this paper, we conduct a comprehensive and exhaustive taxonomic review of the existing literature on the simulation of detector signatures from both methodological and application-wise perspectives. Initially, we formulate the problem of detector signature simulation and discuss its different variations that can be unified. Next, we classify the state-of-the-art methods into five distinct categories based on their underlying model architectures, summarizing their respective generation strategies. Finally, we shed light on the challenges and opportunities that lie ahead in detector signature simulation, setting the stage for future research and development.

翻訳日:2024-07-17 02:24:41 公開日:2024-07-12

# 一般化されたスタインの補題と部分代数エントロピーの漸近等分性

Generalized Stein's lemma and asymptotic equipartition property for subalgebra entropies ( http://arxiv.org/abs/2401.03090v2 )

ライセンス: Link先を確認

Li Gao, Mizanur Rahaman,

(参考訳) 量子シュタインの補題は、2つの量子状態の区別という文脈における量子仮説検定の基本的な結果である。「一般化された量子シュタインの補題」として知られる最近の予想は、状態の1つが量子状態の凸集合に置き換えられる一般的な枠組みにおいてもこの結果が成り立つと主張している。この研究において、一般化されたシュタインの補題の主張は、第2の仮説が任意の部分代数 $\mathcal{N}$ の状態空間であるような設定に対して真であることを示す。これは、任意の固定平滑化パラメータ $\epsilon\in (0,1)$ に対して適用される、滑らかな部分代数エントロピーに対する強い漸近等分性によって得られる。資源理論の応用として, 部分代数の相対エントロピーは, 適切な操作の下での漸近希釈コストであることを示す。これにより、異なる量子リソース間の接続を確立する余地が生まれる。

The quantum Stein's lemma is a fundamental result of quantum hypothesis testing in the context of distinguishing two quantum states. A recent conjecture, known as the "generalized quantum Stein's lemma", asserts that this result is true in a general framework where one of the states is replaced by convex sets of quantum states. In this work, we show that the assertion of the generalized Stein's lemma is true for the setting where the second hypothesis is the state space of any subalgebra $\mathcal{N}$. This is obtained through a strong asymptotic equipartition property for smooth subalgebra entropies that applies for any fixed smoothing parameter $\epsilon\in (0,1)$. As an application in resource theory, we show that the relative entropy of a subalgebra is the asymptotic dilution cost under suitable operations. This provides a scope to establish a connection between different quantum resources.

翻訳日:2024-07-17 02:24:41 公開日:2024-07-12

# パラメトリックマトリックスモデル

Parametric Matrix Models ( http://arxiv.org/abs/2401.11694v4 )

ライセンス: Link先を確認

Patrick Cook, Danny Jammooa, Morten Hjorth-Jensen, Daniel D. Lee, Dean Lee,

(参考訳) パラメトリック行列モデルと呼ばれる機械学習アルゴリズムの一般クラスを示す。ニューロンの生物学を模倣する既存の機械学習モデルとは異なり、パラメトリック行列モデルは量子系の物理をエミュレートする行列方程式を使用する。物理問題の解法と同様に、パラメトリック行列モデルは所望の出力につながる支配方程式を学習する。パラメトリック行列モデルは経験的データから効率的に訓練することができ、方程式は代数的、微分的、あるいは積分的関係を用いることができる。もともと科学計算用に設計されたが、パラメトリック行列モデルは一般的な機械学習問題に適用可能な普遍関数近似器であることが証明されている。基礎となる理論を導入した後、パラメトリック行列モデルを幅広い問題に対してそれらの性能を示す一連の異なる課題に適用する。ここで検証された全ての課題に対して、パラメトリック行列モデルは、入力特徴外挿を可能にする効率的で解釈可能な計算フレームワーク内で正確な結果を生成する。

We present a general class of machine learning algorithms called parametric matrix models. In contrast with most existing machine learning models that imitate the biology of neurons, parametric matrix models use matrix equations that emulate the physics of quantum systems. Similar to how physics problems are usually solved, parametric matrix models learn the governing equations that lead to the desired outputs. Parametric matrix models can be efficiently trained from empirical data, and the equations may use algebraic, differential, or integral relations. While originally designed for scientific computing, we prove that parametric matrix models are universal function approximators that can be applied to general machine learning problems. After introducing the underlying theory, we apply parametric matrix models to a series of different challenges that show their performance for a wide range of problems. For all the challenges tested here, parametric matrix models produce accurate results within an efficient and interpretable computational framework that allows for input feature extrapolation.
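As a toy illustration of a model whose output comes from a matrix equation rather than a neuron-style activation, consider the lowest eigenvalue of a 2x2 symmetric matrix that depends on the input. The abstract does not give the model's form, so the affine dependence M(c) = A + c·B, the choice of the lowest eigenvalue as output, and the fixed matrices below are all assumptions; in a trained model the matrix entries would be learned from data.

```python
# Toy matrix-equation model: output = lowest eigenvalue of M(c) = A + c * B.
# The affine form and the fixed entries are illustrative assumptions.
import math

def lowest_eigenvalue_2x2(a, b, d):
    """Smallest eigenvalue of the real symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b ** 2)

def toy_matrix_model(c, A, B):
    """A and B are symmetric 2x2 matrices stored as (m00, m01, m11)."""
    a = A[0] + c * B[0]
    b = A[1] + c * B[1]
    d = A[2] + c * B[2]
    return lowest_eigenvalue_2x2(a, b, d)

A = (0.0, 1.0, 0.0)    # off-diagonal coupling
B = (1.0, 0.0, -1.0)   # input-dependent diagonal splitting

for c in (0.0, 0.5, 1.0, 2.0):
    print(c, toy_matrix_model(c, A, B))
```

With these matrices the output equals -sqrt(c² + 1), a smooth nonlinear function of the input obtained without any explicit activation function.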

翻訳日:2024-07-17 02:14:47 公開日:2024-07-12

# 大規模言語モデルの教育的アライメント

Pedagogical Alignment of Large Language Models ( http://arxiv.org/abs/2402.05000v2 )

ライセンス: Link先を確認

Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk,

(参考訳) 本稿では,LLMの教育的文脈における応用の変革的変化を示す,Large Language Models (LLMs) の概念を紹介する。ユーザクエリへの直接応答を提供するのではなく、段階的に整列されたLLMが足場として機能し、複雑な問題を管理可能なサブプロブレムに分割し、建設的なフィードバックとヒントを通じて最終回答へと導く。目的は、学習者に課題の理解と内部化を深める問題解決戦略を付与することである。この分野でのこれまでの研究は主に、目標をアライメント問題とみなすことなく、教師付き微調整アプローチを適用してきたため、人間からのフィードバック(RLHF)法による強化学習は行わなかった。本研究は、アライメント・オブ・アライメントを通してタスクを観察することで物語を再解釈し、RLHFメソッドがLLM動作の整列に優れた代替手段として自然に現れることを示す。この観点から,LLMの教育的アライメントに特化して設計された報酬データセットを構築するための新しい手法を提案する。我々は最先端のRLHFアルゴリズムを3つ適用し、SFTを著しく上回る結果を得た。モデル差とハイパーパラメータ感度の質的解析により,SFTよりもRLHFの方が優れていることが示された。また,本研究は,教育現場における教育現場におけるLLMの性能向上のためのオンラインフィードバックの可能性に注目し,これらのモデルの発展に有意義な洞察を与えるものである。

In this paper, we introduce the novel concept of pedagogically aligned Large Language Models (LLMs) that signifies a transformative shift in the application of LLMs within educational contexts. Rather than providing direct responses to user queries, pedagogically-aligned LLMs function as scaffolding tools, breaking complex problems into manageable subproblems and guiding students towards the final answer through constructive feedback and hints. The objective is to equip learners with problem-solving strategies that deepen their understanding and internalization of the subject matter. Previous research in this field has primarily applied the supervised finetuning approach without framing the objective as an alignment problem, hence not employing reinforcement learning through human feedback (RLHF) methods. This study reinterprets the narrative by viewing the task through the lens of alignment and demonstrates how RLHF methods emerge naturally as a superior alternative for aligning LLM behaviour. Building on this perspective, we propose a novel approach for constructing a reward dataset specifically designed for the pedagogical alignment of LLMs. We apply three state-of-the-art RLHF algorithms and find that they outperform SFT significantly. Our qualitative analyses across model differences and hyperparameter sensitivity further validate the superiority of RLHF over SFT. Also, our study sheds light on the potential of online feedback for enhancing the performance of pedagogically-aligned LLMs, thus providing valuable insights for the advancement of these models in educational settings.

翻訳日:2024-07-17 02:05:02 公開日:2024-07-12

# ProTIP:確率的摂動に対するテキスト・画像拡散モデルの確率的ロバスト性検証

ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation ( http://arxiv.org/abs/2402.15429v2 )

ライセンス: Link先を確認

Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao,

(参考訳) テキスト・ツー・イメージ(T2I)拡散モデル(DM)は、単純なテキスト記述に基づいて高品質な画像を生成する際、印象的な能力を示した。しかし、多くのディープラーニング(DL)モデルに共通するように、DMは堅牢性に欠ける。 T2I DMのロバスト性を二値問題や最悪ケースの問題として評価する試みもあるが、敵対的事例(AE)が見つかった場合に、モデルが全体としてどの程度ロバストであるかには答えられない。本研究ではまず,T2I DMのロバスト性に関する確率論的概念を導入し,次にそれを統計的保証付きで評価するための効率的なフレームワークであるProTIPを確立する。主な課題は次の2点である。 i) 生成工程の計算コストが高いこと。 ii) 摂動入力がAEであるか否かを決定するには2つの出力分布を比較する必要があり、これはラベルの誤予測によってAEが識別される分類のような他のDLタスクと比べて根本的に困難であること。これらの課題に対処するために,AEを識別するための統計的検定において有効性(efficacy)と無益性(futility)の早期停止規則を備えた逐次解析を用い,さらに適応的集中不等式を用いて,検証目標が満たされた時点で確率的摂動の「ちょうどよい」個数を動的に決定する。実験により、一般的なT2I DM上でのProTIPの有効性と効率が検証された。最後に,一般に使用されている防御手法のランク付けにProTIPを適用した例を示す。

Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust in general the model is whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.
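
The sequential idea in the abstract above (keep sampling stochastic perturbations only until a statistical bound settles the question) can be illustrated with a minimal sketch. It is not ProTIP's actual procedure: it assumes each perturbation yields a boolean "is adversarial" outcome, and it swaps in a plain Hoeffding bound with a union bound over sample counts as a stand-in for the adaptive concentration inequality the authors use. The function name and parameters are hypothetical.

```python
import math
from typing import Callable, Tuple


def sequential_estimate(sample: Callable[[], bool], threshold: float,
                        delta: float = 0.05,
                        max_samples: int = 10_000) -> Tuple[str, int, float]:
    """Draw samples until the adversarial rate is provably above or below threshold.

    Assumption: a Hoeffding bound with per-step budget delta / (n * (n + 1)),
    which sums to delta over all n, so the decision holds with probability
    at least 1 - delta no matter when we stop.
    """
    hits = 0
    for n in range(1, max_samples + 1):
        hits += 1 if sample() else 0
        p_hat = hits / n
        delta_n = delta / (n * (n + 1))
        eps = math.sqrt(math.log(2.0 / delta_n) / (2.0 * n))
        if p_hat - eps > threshold:
            return "above", n, p_hat   # rate exceeds threshold: not robust enough
        if p_hat + eps < threshold:
            return "below", n, p_hat   # rate is under threshold: target met
    return "undecided", max_samples, hits / max_samples
```

With a clear-cut input the loop stops early, and with a rate close to the threshold it keeps sampling, which is the behaviour the "just-right" number of perturbations refers to.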

翻訳日:2024-07-17 01:45:18 公開日:2024-07-12

# アウト・オブ・ディストリビューション・セグメンテーションのためのインペインティングによるコンテキスト内オブジェクトの配置

Placing Objects in Context via Inpainting for Out-of-distribution Segmentation ( http://arxiv.org/abs/2402.16392v2 )

ライセンス: Link先を確認

Pau de Jorge, Riccardo Volpi, Puneet K. Dokania, Philip H. S. Torr, Gregory Rogez,

(参考訳) セマンティックセグメンテーションモデルを現実世界にデプロイする場合、トレーニング中に見られなかったセマンティッククラスに必然的に遭遇する。このようなシステムの安全なデプロイを保証するためには,その異常セグメンテーション能力を正確に評価し,改善することが重要である。しかし、セマンティックセグメンテーションデータの取得とラベル付けは高価であり、予測外の条件は長く、潜在的に危険である。実際、既存の異常セグメンテーションデータセットは限られた数の異常をキャプチャし、リアリズムを欠いているか、強いドメインシフトを持っている。本稿では,拡散モデルを用いて,任意のオブジェクトを任意の画像に現実的に付加する,コンテキストにおけるPlacing Objects in Context(POC)パイプラインを提案する。 POCは任意の数のオブジェクトで任意のデータセットを簡単に拡張するために使用することができる。実験では,POC生成データに基づく様々な異常セグメンテーションデータセットを提示し,POCが最新の最先端の異常調整手法の性能を向上させることを示す。 POCは、新しいクラスを学ぶのにも有効である。例えば、CityscapesのサンプルをPascalクラスのサブセットを組み込むことで強化し、そのようなデータに基づいてトレーニングされたモデルがPascalでトレーニングされたベースラインに匹敵するパフォーマンスを実現することを示す。このことはPOC生成画像に基づいて訓練されたモデルの低シント2リアルギャップを裏付ける。コード:https://github.com/naver/poc

When deploying a semantic segmentation model into the real world, it will inevitably encounter semantic classes that were not seen during training. To ensure a safe deployment of such systems, it is crucial to accurately evaluate and improve their anomaly segmentation capabilities. However, acquiring and labelling semantic segmentation data is expensive and unanticipated conditions are long-tail and potentially hazardous. Indeed, existing anomaly segmentation datasets capture a limited number of anomalies, lack realism or have strong domain shifts. In this paper, we propose the Placing Objects in Context (POC) pipeline to realistically add any object into any image via diffusion models. POC can be used to easily extend any dataset with an arbitrary number of objects. In our experiments, we present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods across several standardized benchmarks. POC is also effective for learning new classes. For example, we utilize it to augment Cityscapes samples by incorporating a subset of Pascal classes and demonstrate that models trained on such data achieve comparable performance to the Pascal-trained baseline. This corroborates the low synth2real gap of models trained on POC-generated images. Code: https://github.com/naver/poc

翻訳日:2024-07-17 01:45:18 公開日:2024-07-12

# PCR-99:99%の外れ値を含む点群レジストレーションの実践的手法

PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers ( http://arxiv.org/abs/2402.16598v3 )

ライセンス: Link先を確認

Seong Hun Lee, Javier Civera, Patrick Vandewalle,

(参考訳) 本稿では,未知のスケールと極端な外れ値率の両方を扱える頑健な点群レジストレーション手法を提案する。 PCR-99と呼ばれる本手法は,決定論的な3点サンプリング手法と,速度を大幅に向上させる2つの新しい仕組みを用いる。 (1) ペアワイズなスケール整合性に基づいてサンプルの順序付けを改善し,インライアである可能性の高い点対応を優先する。 (2) トリプレットのスケール整合性に基づく効率的な外れ値除去手法により,不良サンプルを事前にふるい分け,検証すべき仮説の数を削減する。評価の結果,外れ値率98%までは,提案手法は最先端手法に匹敵する性能を達成する。一方,外れ値率99%では,既知スケールと未知スケールの両方の問題において最先端手法を上回る。特に後者では,ロバスト性と速度の両面で明らかな優位性が観察される。

We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.
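
To make the scale-consistency idea above concrete, here is a small sketch of one way a triplet prescreening test could look. It assumes a similarity transform (rotation, translation, uniform scale), under which every pair of inlier correspondences implies the same scale ratio. The tolerance, function names, and exact test are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def pair_scale(p_i, p_j, q_i, q_j):
    """Scale implied by two correspondences (p_i -> q_i) and (p_j -> q_j).

    Raises ZeroDivisionError if the two source points coincide.
    """
    return math.dist(q_i, q_j) / math.dist(p_i, p_j)


def triplet_is_scale_consistent(src, dst, idx, rel_tol=0.05):
    """Accept a 3-point sample only if its three pairwise scales roughly agree."""
    scales = [pair_scale(src[a], src[b], dst[a], dst[b])
              for a, b in combinations(idx, 2)]
    return max(scales) <= (1.0 + rel_tol) * min(scales)


# Toy data: the first three targets are 2 * source + (5, -1, 3); the fourth is an outlier.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
dst = [(5.0, -1.0, 3.0), (7.0, -1.0, 3.0), (5.0, 1.0, 3.0), (40.0, 7.0, -9.0)]
```

A sample that fails this cheap check never needs a full transform hypothesis, which is where the speed-up described in the abstract would come from.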

翻訳日:2024-07-17 01:45:18 公開日:2024-07-12

# 補助的敵防衛ネットワークによる追跡ロバスト性向上

Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks ( http://arxiv.org/abs/2402.17976v2 )

ライセンス: Link先を確認

Zhewei Wu, Ruilong Yu, Qihe Liu, Shuying Cheng, Shilin Qiu, Shijie Zhou,

(参考訳) 視覚的物体追跡における敵対的攻撃は、画像に知覚不能な摂動を導入することにより、高度なトラッカーの性能を著しく低下させた。しかし, 物体追跡のための対向防御手法の設計には, まだまだ研究の欠如がある。これらの問題に対処するため,提案するAADNは,トラッカーに入力される前に,入力画像に対する防御的変換を行う。さらに、パラメータ調整なしに他のビジュアルトラッカーとプラグイン・アンド・プレイモジュールとしてシームレスに統合することができる。我々は、AADNを、特にDua-Lossを用いて、トラッカーの分類と回帰の分岐を同時に攻撃する対向サンプルを生成するために、対向訓練を用いて訓練する。 OTB100、LaSOT、VOT2018ベンチマークで実施された大規模な実験により、AADNは適応的および非適応的な攻撃シナリオの両方において、敵攻撃手法に対する優れた防御堅牢性を維持していることが示された。さらに、防衛ネットワークを異種トラッカーに転送する際には、信頼性の高い転送性を示す。最後に、AADNは最大5ms/frameの処理時間を実現し、計算オーバーヘッドを伴わずに既存の高速トラッカーとシームレスに統合できる。

Adversarial attacks in visual object tracking have significantly degraded the performance of advanced trackers by introducing imperceptible perturbations into images. However, there is still a lack of research on designing adversarial defense methods for object tracking. To address these issues, we propose an effective auxiliary pre-processing defense network, AADN, which performs defensive transformations on the input images before feeding them into the tracker. Moreover, it can be seamlessly integrated with other visual trackers as a plug-and-play module without parameter adjustments. We train AADN using adversarial training, specifically employing Dua-Loss to generate adversarial samples that simultaneously attack the classification and regression branches of the tracker. Extensive experiments conducted on the OTB100, LaSOT, and VOT2018 benchmarks demonstrate that AADN maintains excellent defense robustness against adversarial attack methods in both adaptive and non-adaptive attack scenarios. Moreover, when transferring the defense network to heterogeneous trackers, it exhibits reliable transferability. Finally, AADN achieves a processing time of up to 5ms/frame, allowing seamless integration with existing high-speed trackers without introducing significant computational overhead.

翻訳日:2024-07-17 01:45:18 公開日:2024-07-12

# クラウドネイティブなマイクロサービスアプリケーションにおけるインフォームドおよびアセスブルな可観測性設計決定

Informed and Assessable Observability Design Decisions in Cloud-native Microservice Applications ( http://arxiv.org/abs/2403.00633v2 )

ライセンス: Link先を確認

Maria C. Borges, Joshua Bauer, Sebastian Werner, Michael Gebauer, Stefan Tai,

(参考訳) マイクロサービスアプリケーションの信頼性を保証するためには、可観測性が重要です。これらのアプリケーションは、異種環境にデプロイされる多くの独立したサービスがあるため、しばしば障害を起こしやすい。正しく"使用される場合、オブザーバビリティは、開発者が障害を素早く特定し、トラブルシュートするのに役立ちます。しかしながら、マイクロサービスアプリケーションの可観測性の測定と設定は簡単ではなく、ツールに依存し、コストに結びついている。アーキテクトは、観測可能性に関するトレードオフを理解して、異なる観測可能性設計の選択肢を重んじる必要がある。それでも、これらのアーキテクチャ設計決定は体系的な手法ではサポートされず、通常単に「専門的な直観」に依存している。本稿では,情報的かつ継続的に評価可能な可観測性設計決定に至るための体系的手法について論じる。具体的には、クラウドネイティブなマイクロサービスアプリケーションのフォールトオブザーバビリティに注目し、これをテスト可能で定量化可能なプロパティに変換する。目標に向かって、私たちはまず、クラウドネイティブスタック全体の可観測性設計決定の規模とスコープをモデル化します。次に、いわゆる可観測性実験を通じて、マイクロサービスアプリケーションで決定できる可観測性メトリクスを提案する。実験ツールOXNの概念実証実装について述べる。 OXNはChaos Engineeringに似た任意のフォールトをアプリケーションに注入できるが、可観測性の設定を変更するユニークな機能を備えており、以前は探索されていなかった設計上の決定を評価できる。一般的なオープンソースのマイクロサービスアプリケーションを使って、私たちのアプローチを実演し、さまざまな可観測性設計決定に関わるトレードオフを示しています。

Observability is important to ensure the reliability of microservice applications. These applications are often prone to failures, since they have many independent services deployed on heterogeneous environments. When employed "correctly", observability can help developers identify and troubleshoot faults quickly. However, instrumenting and configuring the observability of a microservice application is not trivial but tool-dependent and tied to costs. Architects need to understand observability-related trade-offs in order to weigh between different observability design alternatives. Still, these architectural design decisions are not supported by systematic methods and typically just rely on "professional intuition". In this paper, we argue for a systematic method to arrive at informed and continuously assessable observability design decisions. Specifically, we focus on fault observability of cloud-native microservice applications, and turn this into a testable and quantifiable property. Towards our goal, we first model the scale and scope of observability design decisions across the cloud-native stack. Then, we propose observability metrics which can be determined for any microservice application through so-called observability experiments. We present a proof-of-concept implementation of our experiment tool OXN. OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration, allowing for the assessment of design decisions that were previously left unexplored. We demonstrate our approach using a popular open source microservice application and show the trade-offs involved in different observability design decisions.

翻訳日:2024-07-17 01:45:18 公開日:2024-07-12

# 公平な医用画像分類のための基礎モデルに基づくノイズ編集

Debiased Noise Editing on Foundation Models for Fair Medical Image Classification ( http://arxiv.org/abs/2403.06104v4 )

ライセンス: Link先を確認

Ruinan Jin, Wenlong Deng, Minghui Chen, Xiaoxiao Li,

(参考訳) ファウンデーション・モデル(FM)がAIで存在感を増す時代において、本研究は、モデルがブラックボックスとして動作する状況(例えばFM APIの利用時)における医療画像のバイアス、特に画素とセンシティブ属性との間の疑似相関の問題に取り組む。従来のバイアス緩和手法は、WebホストされたFMへのアクセスが制限されていることと、FM APIに符号化された根本的なバイアスへの対処が難しいことにより、限界に直面している。本稿では,そのような疑似相関を覆い隠すDNEノイズを生成する,DNEと呼ぶD(ebiased) N(oise) E(diting)戦略を提案する。 DNEはFM APIの埋め込みと画像自体の両方におけるバイアスを軽減できる。さらに,ブラックボックスAPIで勾配にアクセスできない場合のためにG(reedy) (Z)eroth-O(rder) (GeZO) 最適化を導入しており,DNEはホワイトボックスとブラックボックスの両方のFM APIに適している。我々のパイプライン全体は、直接的なモデル操作や多大な計算資源を必要とせずに、様々な医療の文脈に適用可能な公平性を考慮した画像編集を可能にする。実験結果は,異なる患者集団や疾患にわたって公平性と有用性を維持するうえでの本手法の有効性を示している。 AI駆動医療の時代において、この研究は医療診断をより公平にすることに貢献し、事前訓練された画像FMにおけるバイアス軽減の実践的な解決策を示す。私たちのコードはhttps://github.com/ubc-tea/DNE-foundation-model-fairnessで提供されます。

In the era of Foundation Models' (FMs) rising prominence in AI, our study addresses the challenge of biases in medical images when the model operates as a black box (e.g., through an FM API), particularly spurious correlations between pixels and sensitive attributes. Traditional methods for bias mitigation face limitations due to the restricted access to web-hosted FMs and difficulties in addressing the underlying bias encoded within the FM API. We propose a D(ebiased) N(oise) E(diting) strategy, termed DNE, which generates DNE noise to mask such spurious correlations. DNE is capable of mitigating bias both within the FM API embedding and the images themselves. Furthermore, DNE is suitable for both white-box and black-box FM APIs: for the case where the gradient is inaccessible in black-box APIs, we introduce G(reedy) (Z)eroth-O(rder) (GeZO) optimization. Our whole pipeline enables fairness-aware image editing that can be applied across various medical contexts without requiring direct model manipulation or significant computational resources. Our empirical results demonstrate the method's effectiveness in maintaining fairness and utility across different patient groups and diseases. In the era of AI-driven medicine, this work contributes to making healthcare diagnostics more equitable, showcasing a practical solution for bias mitigation in pre-trained image FMs. Our code is provided at https://github.com/ubc-tea/DNE-foundation-model-fairness.

翻訳日:2024-07-17 01:35:33 公開日:2024-07-12

# 障害とモニタリングにより局在したシステムにおける単一粒子波動関数の非破壊

Unscrambling of single-particle wave functions in systems localized through disorder and monitoring ( http://arxiv.org/abs/2403.10725v4 )

ライセンス: Link先を確認

Marcin Szyniszewski,

(参考訳) 障害やモニタリングによる局在化-非局在化量子相転移を行うシステムでは、位相を識別し、固有の性質を明らかにすることのできるロバストな方法が不可欠である。本研究では,局所粒子を正確に特徴付ける自由フェルミオン波動関数のスレーター決定式を求める過程,すなわち「アンスクラムリング」を解く過程を開発する。中心となる考え方は、単一粒子波動関数のエンベロープ間の重なりを最小化すること、または等価に、各軌道の逆参加比を最大化することである。この数値的に効率的な手法は、指数的局所化(英語版)、パワーロー局所化(英語版)、コンフォメーションクリティカル(英語版)といった異なる種類の波動関数を区別することができる。この方法は、より高次元のシステムに容易に拡張可能である。さらに,不規則な監視自由フェルミオンを1次元に含むより困難な問題に適用し,非破壊過程が共形臨界相と局所化領域法量子Zeno相の存在を明らかにする。本手法は粒子数保存のない自由フェルミオン系にも拡張可能であり, $\mathbb{Z}_2$-symmetric disordered monitored free fermion の位相図を推定して実演する。その結果, 単一粒子波動関数を応用して, 観測された自由フェルミオンや乱れモデルなどのシステムにおける局在化遷移特性について, 貴重な知見を得ることが可能となった。

In systems undergoing localization-delocalization quantum phase transitions due to disorder or monitoring, there is a crucial need for robust methods capable of distinguishing phases and uncovering their intrinsic properties. In this work, we develop a process of finding a Slater determinant representation of free-fermion wave functions that accurately characterizes localized particles, a procedure we dub "unscrambling". The central idea is to minimize the overlap between envelopes of single-particle wave functions or, equivalently, to maximize the inverse participation ratio of each orbital. This numerically efficient methodology can differentiate between distinct types of wave functions: exponentially localized, power-law localized, and conformal critical, also revealing the underlying physics of these states. The method is readily extendable to systems in higher dimensions. Furthermore, we apply this approach to a more challenging problem involving disordered monitored free fermions in one dimension, where the unscrambling process unveils the presence of a conformal critical phase and a localized area-law quantum Zeno phase. Importantly, our method can also be extended to free fermion systems without particle number conservation, which we demonstrate by estimating the phase diagram of $\mathbb{Z}_2$-symmetric disordered monitored free fermions. Our results unlock the potential of utilizing single-particle wave functions to gain valuable insights into the localization transition properties in systems such as monitored free fermions and disordered models.
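
The inverse participation ratio (IPR) that the abstract above proposes to maximize has a standard definition, IPR = sum_i |psi_i|^4 / (sum_i |psi_i|^2)^2, which a short sketch can illustrate. Only the IPR itself is shown here; the unscrambling optimization over Slater determinant representations is not reproduced, and the function name is a hypothetical one.

```python
def inverse_participation_ratio(psi):
    """IPR of a single-particle orbital given as a sequence of (complex) amplitudes.

    Equals 1 for an orbital sitting on a single site and 1/N for an orbital
    spread uniformly over N sites, so larger values mean stronger localization.
    """
    weights = [abs(amplitude) ** 2 for amplitude in psi]
    norm = sum(weights)
    if norm == 0.0:
        raise ValueError("orbital has zero norm")
    return sum(w * w for w in weights) / (norm * norm)
```

Dividing by the squared norm makes the value independent of overall normalization, so unnormalized orbitals can be compared directly.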

翻訳日:2024-07-17 01:25:38 公開日:2024-07-12

# 人工知能と自然知の混合:統計力学からAIへ、乱流へ

Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence ( http://arxiv.org/abs/2403.17993v3 )

ライセンス: Link先を確認

Michael Chertkov,

(参考訳) この論文は、特に乱流研究に焦点を当てた科学研究におけるAIの役割を反映し、特に非平衡統計力学に根ざした拡散モデルを通して、AIの進化について考察する。これは、ディープニューラルネットワークの革新的利用を通じて、ラグランジアンモデルによる乱流の減少に対するAIの重大な影響を浮き彫りにしている。さらに、乱流研究における様々なAI応用をレビューし、AIと統計流体力学の同時進行における潜在的な課題と機会を概説する。この議論は、AIと乱流の研究が複雑に絡み合っており、両方の分野においてより深い洞察と進歩をもたらす未来へのステージを定めている。

The paper reflects on the future role of AI in scientific research, with a special focus on turbulence studies, and examines the evolution of AI, particularly through Diffusion Models rooted in non-equilibrium statistical mechanics. It underscores the significant impact of AI on advancing reduced, Lagrangian models of turbulence through innovative use of deep neural networks. Additionally, the paper reviews various other AI applications in turbulence research and outlines potential challenges and opportunities in the concurrent advancement of AI and statistical hydrodynamics. This discussion sets the stage for a future where AI and turbulence research are intricately intertwined, leading to more profound insights and advancements in both fields.

翻訳日:2024-07-17 01:15:36 公開日:2024-07-12

# TCLC-GS:LiDAR-Camera Gaussian Splatting for autonomous Driving

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving ( http://arxiv.org/abs/2404.02410v2 )

ライセンス: Link先を確認

Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren,

(参考訳) 都市シーン向けの3Dガウススプラッティング(3D-GS)ベースの手法の多くは、3Dガウスを3D LiDAR点で直接初期化するが、これはLiDARデータの能力を十分に活用していないだけでなく、LiDARとカメラデータを融合する潜在的な利点も見落としている。本稿では,LiDARとカメラセンサの双方の強みを最大限に活用する,新しい密結合型LiDAR-Camera Gaussian Splatting (TCLC-GS) を設計し,高速で高品質な3D再構成と新規視点のRGB/depth合成を実現する。 TCLC-GSは、LiDAR-カメラデータから得られる明示的表現(カラー化された3Dメッシュ)と暗黙的表現(階層的なオクツリー特徴)を組み合わせたハイブリッドな3D表現を設計し、スプラッティングのために3Dガウスの性質を豊かにする。 3Dガウスの性質は、より完全な3D形状と色情報を提供する3Dメッシュに合わせて初期化されるだけでなく、検索されたオクツリーの暗黙的特徴を通じてより広い文脈情報も付与される。ガウススプラッティングの最適化過程では、3Dメッシュが密な深度情報を教師信号として提供し、ロバストな幾何形状の学習によってトレーニングを強化する。 Waymo Open Dataset と nuScenes Dataset での包括的な評価により、本手法の最先端(SOTA)性能が検証された。 NVIDIA RTX 3090 Tiを1枚用いて、本手法は高速なトレーニングを示し、都市シナリオにおいて解像度1920x1280 (Waymo) で90 FPS、解像度1600x900 (nuScenes) で120 FPSのリアルタイムRGBおよび深度レンダリングを実現する。

Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of 3D Gaussians for splatting. The properties of the 3D Gaussians are not only initialized in alignment with the 3D mesh, which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS at a resolution of 1920x1280 (Waymo), and 120 FPS at a resolution of 1600x900 (nuScenes) in urban scenarios.

翻訳日:2024-07-17 01:05:49 公開日:2024-07-12

# BiSHop: 一般化スパース現代ホップフィールドモデルによる表形式データの双方向セルラー学習

BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model ( http://arxiv.org/abs/2404.03830v2 )

ライセンス: Link先を確認

Chenwei Xu, Yu-Chao Huang, Jerry Yao-Chieh Hu, Weijian Li, Ammar Gilani, Hsi-Sheng Goan, Han Liu,

(参考訳) 本稿では,深層表形式学習のための新しいエンドツーエンドフレームワークであるBi-Directional Sparse Hopfield Network (BiSHop)を紹介する。 BiSHopは、深層表形式学習の2つの大きな課題、すなわち回転不変でないデータ構造と表形式データにおける特徴のスパース性に対処する。我々の主要な動機は、連想記憶と注意機構の結びつきが最近確立されたことにある。結果として、BiSHopは2つの相互接続された方向性学習モジュールを通して列方向と行方向の両方でデータを逐次処理するデュアルコンポーネントアプローチを使用する。計算面では、これらのモジュールは一般化スパース現代ホップフィールド層を備えており、これは適応可能なスパース性を持つ現代ホップフィールドモデルのスパース拡張である。方法論的には、BiSHopはマルチスケールの表現学習を促進し、特徴内相互作用と特徴間相互作用の両方を、各スケールで適応的なスパース性を用いて捉える。実証的には、さまざまな実世界のデータセットでの実験を通じて、BiSHopが現在のSOTA手法をはるかに少ないHPOの実行回数で上回り、深層表形式学習のための堅牢なソリューションであることを実証した。

We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer hyperparameter optimization (HPO) runs, marking it as a robust solution for deep tabular learning.

翻訳日:2024-07-17 01:05:49 公開日:2024-07-12

# CTを用いた拡散シュレーディンガーブリッジによる脳室分画 : 対象領域の真理を伴わない

CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths ( http://arxiv.org/abs/2405.18267v2 )

ライセンス: Link先を確認

Reihaneh Teimouri, Marta Kersten-Oertel, Yiming Xiao,

(参考訳) 臨床CTスキャンからの効率的かつ正確な脳室セグメンテーションは、脳室造瘻術(ventriculostomy)のような緊急手術に不可欠である。軟部組織コントラストの低さと、十分に注釈付けされた臨床脳CTデータベースの不足という課題に対し、拡散モデルに基づくドメイン適応を活用することで、CTセグメンテーションの正解データを必要としない、新たな不確実性を考慮した脳室セグメンテーション技術を導入する。具体的には、拡散Schrödinger Bridgeとattention recurrent residual U-Netを用い、対になっていないCTとMRIのスキャンを活用して、より入手しやすいMRIのセグメンテーションから自動CTセグメンテーションを導出する。重要なことは、画像変換とセグメンテーションタスクのエンドツーエンドの共同トレーニングフレームワークを提案し、個別のタスクを別々にトレーニングする場合に対する利点を実証することである。ドメイン適応に2つの異なるGANモデル(CycleGAN と CUT)を用いた類似の構成と比較することにより、セグメンテーションと画像変換の品質向上における拡散モデルの利点も明らかにする。提案手法はDiceスコア0.78$\pm$0.27を達成し、SynSeg-Netを含む比較手法を上回るとともに、自動セグメンテーション結果の品質管理を容易にする直感的な不確実性指標を提供する。提案手法の実装は、https://github.com/HealthX-Lab/DiffusionSynCTSegで利用可能である。

Efficient and accurate brain ventricle segmentation from clinical CT scans is critical for emergency surgeries like ventriculostomy. Given the challenges of poor soft tissue contrast and a scarcity of well-annotated databases for clinical brain CTs, we introduce a novel uncertainty-aware ventricle segmentation technique that does not need CT segmentation ground truths, by leveraging diffusion-model-based domain adaptation. Specifically, our method employs the diffusion Schrödinger Bridge and an attention recurrent residual U-Net to capitalize on unpaired CT and MRI scans to derive automatic CT segmentation from those of the MRIs, which are more accessible. Importantly, we propose an end-to-end, joint training framework of image translation and segmentation tasks, and demonstrate its benefit over training individual tasks separately. By comparing the proposed method against similar setups using two different GAN models for domain adaptation (CycleGAN and CUT), we also reveal the advantage of diffusion models towards improved segmentation and image translation quality. With a Dice score of 0.78$\pm$0.27, our proposed method outperformed the compared methods, including SynSeg-Net, while providing intuitive uncertainty measures to further facilitate quality control of the automatic segmentation outcomes. The implementation of our proposed method is available at: https://github.com/HealthX-Lab/DiffusionSynCTSeg.
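
For readers unfamiliar with the Dice score reported above, the following sketch computes it for two binary masks given as flat sequences. The convention of returning 1.0 when both masks are empty is an assumption made here; how the paper handles that edge case is not stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1 means the predicted and reference ventricle masks coincide, and 0 means they do not overlap at all.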

翻訳日:2024-07-17 00:36:09 公開日:2024-07-12

# SAMM:Sharded Automated Market Makers

SAMM: Sharded Automated Market Makers ( http://arxiv.org/abs/2406.05568v2 )

ライセンス: Link先を確認

Hongyin Chen, Amit Vaisman, Ittay Eyal,

(参考訳) Automated Market Makers (AMMs)は、分散型金融(DeFi)ブロックチェーンベースのプラットフォームの基礎である。それらはスマートコントラクトであり、流動性プール(liquidity pool)を維持することで、仮想トークンの直接交換を可能にする。トレーダーは手数料を支払ってコントラクトとトークンを交換し、流動性はその手数料から報酬を受け取る流動性プロバイダ(liquidity provider)から供給される。しかし、需要が増えているにもかかわらず、AMMのパフォーマンスは限られている。最先端のブロックチェーンプラットフォームは、トランザクションの並列実行を可能にする。しかし,AMMの操作は可換ではなく,それを利用するトランザクションは直列化しなければならないため,AMMはこれらの利得を享受できないことを示す。本稿では,複数の独立なシャード(shard)からなるAMMであるSAMMを提示する。すべてのシャードは同じチェーン上で動作するスマートコントラクトだが、それぞれが独立しているため並列実行が可能である。課題は、標準的なAMMでの取引は流動性プールが大きいほど安くなることである。したがって、単純に複数の小さなAMMを用いると、トレーダーが各取引をすべてのAMMに分割し、パフォーマンスが悪化することを示す。 SAMMは取引手数料の新しい設計でこの問題に対処する。トレーダーは最小のシャード1つだけを使用するようインセンティブを与えられる。すべての部分ゲーム完全ナッシュ均衡(SPNE)が望ましい振る舞いに合致することを示す。すなわち、流動性プロバイダはすべてのプールの流動性を均衡させ、システムは取引が均等に分散された状態に収束する。 Suiブロックチェーンでの評価によると、SAMMのスループットは従来のAMMの5倍以上であり、システムの限界に近づいている。 SAMMは直接デプロイ可能なオープンソースのスマートコントラクトであり、個人とDeFiアプリケーションの大規模な取引を可能にする。

Automated Market Makers (AMMs) are a cornerstone of decentralized finance (DeFi) blockchain-based platforms. They are smart contracts, enabling the direct exchange of virtual tokens by maintaining liquidity pools. Traders exchange tokens with the contract, paying a fee; liquidity comes from liquidity providers, paid by those fees. But despite growing demand, the performance of AMMs is limited. State-of-the-art blockchain platforms allow for parallel execution of transactions. However, we show that AMMs do not enjoy these gains, since their operations are not commutative so transactions using them must be serialized. We present SAMM, an AMM comprising multiple independent shards. All shards are smart contracts operating in the same chain, but they allow for parallel execution as each is independent. The challenge is that trading in a standard AMM is cheaper if its liquidity pool is larger. Therefore, we show that simply using multiple smaller AMMs results in traders splitting each trade among all AMMs, which worsens performance. SAMM addresses this issue with a novel design of the trading fees. Traders are incentivized to use only a single smallest shard. We show that all Subgame-Perfect Nash Equilibria (SPNE) fit the desired behavior: Liquidity providers balance the liquidity among all pools, so the system converges to the state where trades are evenly distributed. Evaluation in the Sui blockchain shows that SAMM's throughput is over fivefold that of traditional AMMs, approaching the system's limit. SAMM is a directly deployable open-source smart contract, allowing trading at scale for individuals and DeFi applications.
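
Two claims in the abstract above, that AMM operations do not commute and that a larger pool gives a cheaper trade, can be checked on a toy model. The sketch below assumes a constant-product pool (x * y = k) with a 0.3% fee, a common AMM design that the abstract itself does not name. The function names, fee value, and trader names are illustrative only, and SAMM's own fee design is not modelled.

```python
def swap_x_for_y(pool, dx, fee=0.003):
    """Sell dx of token X to a constant-product pool; return (new_pool, dy received)."""
    x, y = pool
    dx_eff = dx * (1.0 - fee)          # the fee is taken from the input amount
    dy = y * dx_eff / (x + dx_eff)     # keeps x * y constant before fees
    return (x + dx, y - dy), dy


def run(pool, trades):
    """Apply trades in the given order and record what each trader receives."""
    outputs = {}
    for name, dx in trades:
        pool, dy = swap_x_for_y(pool, dx)
        outputs[name] = dy
    return pool, outputs


start = (1000.0, 1000.0)
_, alice_first = run(start, [("alice", 10.0), ("bob", 20.0)])
_, bob_first = run(start, [("bob", 20.0), ("alice", 10.0)])
_, big_pool_dy = swap_x_for_y((2000.0, 2000.0), 10.0)
```

Because each trader's output depends on who traded before them, the two orders are not interchangeable, which is why such transactions must be serialized on a single pool.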

翻訳日:2024-07-17 00:16:39 公開日:2024-07-12

# IPv4ID選択精度,セキュリティ,性能の分類と比較分析

A Taxonomy and Comparative Analysis of IPv4 ID Selection Correctness, Security, and Performance ( http://arxiv.org/abs/2406.06483v2 )

ライセンス: Link先を確認

Joshua J. Daymude, Antonio M. Espinoza, Sean Bergen, Benjamin Mixon-Baca, Jeffrey Knockel, Jedidiah R. Crandall,

(参考訳) よりセキュアなインターネットへの戦いは、ネットワークプロトコルの最も基本的な部分を含む、多くの面で争われている。 IPv4 Identifier (IPID)は、IPv4ヘッダフィールドであり、ネットワーク特性をスキャンし、オフパス接続を推測し、DNSキャッシュを害する悪用されたサイドチャネルとして、インターネットと同じくらい長い歴史を持つ。本稿では,25年間のIPID利用履歴とそれに対応するIPID選択方法の変更を分類する。これらの手法の正しさと安全性を数学的に解析し、その性能を実証的に評価することにより、ネットワークセキュリティにおける体系的評価の価値を強調するとともに、現在のオペレーティングシステム実装の欠点と同様にベストプラクティスの推奨を明らかにする。

The battle for a more secure Internet is waged on many fronts, including the most basic of networking protocols. Our focus is the IPv4 Identifier (IPID), an IPv4 header field as old as the Internet with an equally long history as an exploited side channel for scanning network properties, inferring off-path connections, and poisoning DNS caches. This article taxonomizes the 25-year history of IPID-based exploits and the corresponding changes to IPID selection methods. By mathematically analyzing these methods' correctness and security and empirically evaluating their performance, we reveal recommendations for best practice as well as shortcomings of current operating system implementations, emphasizing the value of systematic evaluations in network security.
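
To illustrate why the choice of IPID selection method matters, the sketch below models three commonly described families (a single global counter, per-destination counters, and random values) and shows how a global counter leaks traffic volume to an observer. These classes are simplified assumptions for illustration and are not the taxonomy or the operating system implementations analysed in the article; real per-destination counters, for instance, usually start from a random offset.

```python
import random

MOD = 1 << 16  # the IPv4 Identification field is 16 bits wide


class GlobalCounter:
    """One counter shared by all destinations."""
    def __init__(self, start=0):
        self._next = start % MOD

    def next_id(self, dst):
        ipid = self._next
        self._next = (self._next + 1) % MOD
        return ipid


class PerDestinationCounter:
    """A separate counter per destination (simplified: each starts at 0)."""
    def __init__(self):
        self._next = {}

    def next_id(self, dst):
        ipid = self._next.get(dst, 0)
        self._next[dst] = (ipid + 1) % MOD
        return ipid


class RandomId:
    """Random IDs; random.Random is NOT cryptographically secure, demo only."""
    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def next_id(self, dst):
        return self._rng.randrange(MOD)


def packets_leaked(host, hidden=3):
    """Observer probes twice; the host sends `hidden` packets elsewhere in between."""
    first = host.next_id("observer")
    for _ in range(hidden):
        host.next_id("victim")
    second = host.next_id("observer")
    return (second - first - 1) % MOD
```

With the global counter the observer recovers exactly how many packets went to other hosts, while the per-destination variant reveals nothing in this scenario.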

翻訳日:2024-07-17 00:16:39 公開日:2024-07-12

# 機械的解釈可能性によるモデル性能のコンパクト証明

Compact Proofs of Model Performance via Mechanistic Interpretability ( http://arxiv.org/abs/2406.11779v8 )

ライセンス: Link先を確認

Jason Gross, Rajashree Agrawal, Thomas Kwa, Euan Ong, Chun Hei Yip, Alex Gibson, Soufiane Noubir, Lawrence Chan,

(参考訳) 本稿では,モデル性能の形式的保証を導出し,コンパクトに証明するために,機械的解釈可能性,すなわちリバースエンジニアリングモデルウェイトを人間解釈可能なアルゴリズムに変換する手法を提案する。提案手法は, 最大K$タスクで訓練した151個の小型変圧器の精度について, 下限を正式に証明して試作する。我々は,コンピュータ支援型証明戦略を102種類作成し,それぞれのモデルに対して,その長さと厳密さを評価する。定量的な測定値を用いることで、より短い証明が必要になり、より機械的な理解が得られます。さらに、より忠実なメカニスティックな理解が、パフォーマンス境界の厳密化につながることが分かっています。これらの関係は、証明のサブセットを質的に検証することで確認する。最後に, モデル性能に関するコンパクトな証明を生成するために, 機械的解釈可能性を利用する上で重要な課題として, 合成構造のないノイズを同定する。

We propose using mechanistic interpretability -- techniques for reverse engineering model weights into human-interpretable algorithms -- to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving lower bounds on the accuracy of 151 small transformers trained on a Max-of-$K$ task. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless noise as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.

翻訳日:2024-07-17 00:16:39 公開日:2024-07-12

# 解析的リアプノフ関数発見のためのニューラルネットワークとシンボリック回帰の組み合わせ

Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery ( http://arxiv.org/abs/2406.15675v3 )

ライセンス: Link先を確認

Jie Feng, Haohan Zou, Yuanyuan Shi,

(参考訳) 非線形力学系に対する解析的リアプノフ関数を構成するために,CoNSAL (Combining Neural Network and Symbolic regression for Analytical Lyapunov function)を提案する。このフレームワークは、ニューラルネットワークを精密な分析形式に蒸留するためにシンボリックレグレッションを適用する、ニューラルリアプノフ関数とシンボリックレグレッション成分を含む。本手法は, 記号回帰を翻訳の道具としてだけでなく, 反例を明らかにする手段としても活用する。この手順は、解析的定式化において反例が見つからない場合に終了する。従来の結果と比較して、CoNSALは学習過程と最終結果の両方における解釈性を改善したリアプノフ関数の解析形式を直接生成する。我々は,CoNSALを2次元逆振子,経路追従,Van Der Pol Oscillator,3次元トリグダイナミクス,4次元回転輪振子,6次元3バスパワーシステムに適用し,本アルゴリズムが有効なリアプノフ関数の発見に成功したことを示す。コード例はhttps://github.com/HaohanZou/CoNSALで公開されている。

We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regression not only as a tool for translation but also as a means to uncover counterexamples. This procedure terminates when no counterexamples are found in the analytical formulation. Compared with previous results, CoNSAL directly produces an analytical form of the Lyapunov function with improved interpretability in both the learning process and the final results. We apply CoNSAL to 2-D inverted pendulum, path following, Van Der Pol Oscillator, 3-D trig dynamics, 4-D rotating wheel pendulum, 6-D 3-bus power system, and demonstrate that our algorithm successfully finds their valid Lyapunov functions. Code examples are available at https://github.com/HaohanZou/CoNSAL.
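
The counterexample step described above can be sketched on a toy system. The code checks an analytical candidate V against the Lyapunov conditions V(x) > 0 and dV/dt < 0 on a grid of sampled states and returns the violating points. The damped linear system, the two candidates, the grid, and the margin are all assumptions chosen for illustration, and grid sampling only searches for counterexamples without proving validity. CoNSAL's neural network and symbolic regression stages are not reproduced here.

```python
def dynamics(x):
    """Toy system: x1' = x2, x2' = -x1 - x2 (a damped linear oscillator)."""
    x1, x2 = x
    return (x2, -x1 - x2)


def v_bad(x):
    """Candidate whose derivative is -2 * x2**2, which vanishes whenever x2 = 0."""
    return x[0] ** 2 + x[1] ** 2


def v_good(x):
    """Candidate whose derivative is -(x1**2 + x2**2), strictly negative off the origin."""
    return 1.5 * x[0] ** 2 + x[0] * x[1] + x[1] ** 2


def lie_derivative(v, f, x, h=1e-5):
    """dV/dt along trajectories, using a central finite-difference gradient."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        total += (v(up) - v(down)) / (2.0 * h) * fx[i]
    return total


def find_counterexamples(v, f, half_width=1.0, steps=4, margin=1e-6):
    """Return grid points (origin excluded) where V <= 0 or dV/dt >= -margin."""
    bad = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            if i == 0 and j == 0:
                continue
            x = (half_width * i / steps, half_width * j / steps)
            if v(x) <= 0.0 or lie_derivative(v, f, x) >= -margin:
                bad.append(x)
    return bad
```

In a loop of the kind the abstract describes, any returned points would be fed back as extra training data, and the procedure would stop once the list comes back empty.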

翻訳日:2024-07-17 00:06:54 公開日:2024-07-12

# AIによる災害救助支援ドローン:課題と機会

AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities ( http://arxiv.org/abs/2406.15875v2 )

ライセンス: Link先を確認

Narek Papyan, Michel Kulhandjian, Hovannes Kulhandjian, Levon Hakob Aslanyan,

(参考訳) 本調査では,特に人間の悲鳴やその他の救難信号を識別することで,個人の検出にドローンベースのシステムを活用することに重点を置いている。この研究は、地震、ハリケーン、軍事紛争、山火事などの災害後のシナリオに大きく関係している。これらのドローンは、救助隊が直接アクセスすることが困難な被災地の上空をホバリングすることができる。無人航空機(UAV)は、一般にドローンと呼ばれ、災害時の捜索救助任務にしばしば配備される。通常、ドローンは空中画像を撮影して、構造的な損傷を評価し、災害の範囲を把握する。また、熱画像技術を使って体温を検知し、個人を見つけるのに役立てる。大型のドローンは、孤立した被災地に取り残された人々に必須の物資を届けるために使われる場合もある。本論では, 空中音響による人間の位置特定にまつわる固有の課題について考察する。聴覚システムは、人間の叫び声と、動物の鳴き声や風など自然に発生する音とを区別しなければならない。さらに、叫び声や拍手など、人々が救助隊に合図しようとする際の信号に関連する特徴的なパターンを認識する能力も備えるべきである。この課題に対処する一つの方法は、人工知能(AI)を活用して音の周波数を分析し、一般的なオーディオシグネチャを識別することである。畳み込みニューラルネットワーク(CNN)のようなディープラーニングベースのネットワークは、これらのシグネチャを使用して、ドローンのモーターやその他の環境要因によって発生するノイズを除去するように訓練できる。さらに、マイクロホンアレイ信号に基づく到来方向(DOA)推定のような信号処理技術を用いることで、人間が発する音の音源を追跡する精度を高めることができる。

In this survey we are focusing on utilizing drone-based systems for the detection of individuals, particularly by identifying human screams and other distress signals. This study has significant relevance in post-disaster scenarios, including events such as earthquakes, hurricanes, military conflicts, wildfires, and more. These drones are capable of hovering over disaster-stricken areas that may be challenging for rescue teams to access directly. Unmanned aerial vehicles (UAVs), commonly referred to as drones, are frequently deployed for search-and-rescue missions during disaster situations. Typically, drones capture aerial images to assess structural damage and identify the extent of the disaster. They also employ thermal imaging technology to detect body heat signatures, which can help locate individuals. In some cases, larger drones are used to deliver essential supplies to people stranded in isolated disaster-stricken areas. In our discussions, we delve into the unique challenges associated with locating humans through aerial acoustics. The auditory system must distinguish between human cries and sounds that occur naturally, such as animal calls and wind. Additionally, it should be capable of recognizing distinct patterns related to signals like shouting, clapping, or other ways in which people attempt to signal rescue teams. To tackle this challenge, one solution involves harnessing artificial intelligence (AI) to analyze sound frequencies and identify common audio signatures. Deep learning-based networks, such as convolutional neural networks (CNNs), can be trained using these signatures to filter out noise generated by drone motors and other environmental factors. Furthermore, employing signal processing techniques like the direction of arrival (DOA) based on microphone array signals can enhance the precision of tracking the source of human noises.

翻訳日:2024-07-17 00:06:54 公開日:2024-07-12

# ArzEn-LLM:LLMを用いたコード変換エジプト英語翻訳と音声認識

ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs ( http://arxiv.org/abs/2406.18120v2 )

ライセンス: Link先を確認

Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa,

(参考訳) 近年、エジプト・アラビア語と英語のコードスイッチング現象が広く増加していることを踏まえ、本稿では機械翻訳(MT)と自動音声認識(ASR)システムの詳細を探求し、コードスイッチしたエジプト・アラビア語と英語の混在文を英語またはエジプト・アラビア語に翻訳することに焦点を当てる。本研究の目的は,LLama や Gemma などの大規模言語モデルを用いて,これらのシステム開発に使用した方法論を提示することである。 ASR の分野では,Whisper モデルをコードスイッチしたエジプト・アラビア語の認識に利用し,データ前処理やトレーニング技術を含む実験手順を詳述する。 ASRをMTと統合した逐次的な音声テキスト翻訳システムの実装を通じて、限られた資源とエジプト・アラビア方言の特徴によって生じる課題を克服することを目指している。確立された指標による評価は有望な結果を示し、我々の手法は、最先端手法に対して英語翻訳で56%、アラビア語翻訳で9.3%の大幅な改善をもたらす。コードスイッチングは話し言葉に深く根付いているため、ASRシステムがこの現象を効果的に扱えることが重要である。この能力は、ビジネス交渉、文化交流、学術談話など、様々な分野におけるシームレスな対話を可能にするために不可欠である。私たちのモデルとコードはオープンソースリソースとして利用できる。コード: http://github.com/ahmedheakl/arazn-llm , モデル: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e

Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of 56% in English translation over the state-of-the-art and 9.3% in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon. This capability is crucial for enabling seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm , Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e .

翻訳日:2024-07-17 00:06:54 公開日:2024-07-12

# ResumeAtlas:大規模データセットと大規模言語モデルによるResume分類の再検討

ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models ( http://arxiv.org/abs/2406.18125v2 )

ライセンス: Link先を確認

Ahmed Heakl, Youssef Mohamed, Noran Mohamed, Aly Elsharkawy, Ahmed Zaky,

(参考訳) オンライン採用プラットフォームへの依存度の増加とAI技術の採用は、効率的な履歴書分類手法の必要性を浮き彫りにした。しかし、小さなデータセット、標準化された履歴書テンプレートの欠如、プライバシー問題といった課題は、既存の分類モデルの正確性と有効性を妨げている。本研究では,これらの課題に対して,履歴書分類のための包括的アプローチを提案する。多様な情報源から13,389件の履歴書からなる大規模データセットを構築し,BERT や Gemma1.1 2B などの大規模言語モデル (LLM) を用いて分類を行った。その結果,従来の機械学習手法に比べて大幅な改善が得られ,最良のモデルはトップ1の精度92%,トップ5の精度97.5%を達成した。これらの知見は、履歴書分類システムの精度と堅牢性を高めるうえでのデータセットの品質と高度なモデルアーキテクチャの重要性を浮き彫りにし、オンライン採用の実践の分野を前進させるものである。

The increasing reliance on online recruitment platforms coupled with the adoption of AI technologies has highlighted the critical need for efficient resume classification methods. However, challenges such as small datasets, lack of standardized resume templates, and privacy concerns hinder the accuracy and effectiveness of existing classification models. In this work, we address these challenges by presenting a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results demonstrate significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the importance of dataset quality and advanced model architectures in enhancing the accuracy and robustness of resume classification systems, thus advancing the field of online recruitment practices.
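
For readers unfamiliar with the two reported metrics, the sketch below shows how top-1 and top-5 accuracy are conventionally computed from per-class scores. The function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the paper or its data.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with three samples and three classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Top-k accuracy can never decrease as k grows, which is why a top-5 figure is always at least as high as the corresponding top-1 figure.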

翻訳日:2024-07-17 00:06:54 公開日:2024-07-12

# 資源依存関係研究のための概念的・形式的基礎研究

Conceptual and formal groundwork for the study of resource dependence relations ( http://arxiv.org/abs/2407.00164v2 )

ライセンス: Link先を確認

Yìlè Yīng, Tomáš Gonda, Robert Spekkens,

(参考訳) 資源理論は状態に対して前順序を課す。ある状態が自由操作によって別の状態に変換できるとき、前者は後者より上位にあり、自由操作の集合が研究対象のリソースフルネスの概念を定義する。一般に、ある資源理論の前順序における状態の位置は、別の資源理論の前順序における位置を制約しうる。したがって、リソースフルネスの異なる概念の間には、非自明な依存関係が存在しうる。本稿では,資源依存関係の研究のための概念的および形式的な基礎を整備する。特に、各資源理論の完全集合を含むモノトンの集合の間に成り立つ関係が、資源依存関係の完全な特徴づけを与えることに留意する。例として、ブロッホ球上の互いに直交する3つの軸に沿ったキュービットの反転(about-face)非対称性に関する3つの資源理論を考える。ここで反転対称性とは、恒等写像と与えられた軸まわりの$\pi$回転からなる$\mathbb{Z}_2$の表現を指す。この例は十分に単純であるため、各資源理論に対してモノトンの完全集合を導出し、これらのモノトンの間に成り立つすべての関係を決定することができ、資源依存関係を決定する問題を完全に解くことができる。しかしながら、この最も単純な例においてさえ、これらの関係はすでにかなり微妙であることを示す。

A resource theory imposes a preorder over states, with one state being above another if the first can be converted to the second by a free operation, and where the set of free operations defines the notion of resourcefulness under study. In general, the location of a state in the preorder of one resource theory can constrain its location in the preorder of a different resource theory. It follows that there can be nontrivial dependence relations between different notions of resourcefulness. In this article, we lay out the conceptual and formal groundwork for the study of resource dependence relations. In particular, we note that the relations holding among a set of monotones that includes a complete set for each resource theory provides a full characterization of resource dependence relations. As an example, we consider three resource theories concerning the about-face asymmetry properties of a qubit along three mutually orthogonal axes on the Bloch ball, where about-face symmetry refers to a representation of $\mathbb{Z}_2$, consisting of the identity map and a $\pi$ rotation about the given axis. This example is sufficiently simple that we are able to derive a complete set of monotones for each resource theory and to determine all of the relations that hold among these monotones, thereby completely solving the problem of determining resource dependence relations. Nonetheless, we show that even in this simplest of examples, these relations are already quite nuanced.

翻訳日:2024-07-16 23:57:10 公開日:2024-07-12

# FlowCon:フローベースコントラスト学習を用いたアウト・オブ・ディストリビューション検出

FlowCon: Out-of-Distribution Detection using Flow-Based Contrastive Learning ( http://arxiv.org/abs/2407.03489v2 )

ライセンス: Link先を確認

Saandeep Aathreya, Shaun Canavan,

(参考訳) ディープラーニング手法の現実世界での応用が拡大するにつれて、OOD(Out-of-distribution)データの特定がますます重要になっている。ポストホック法では、外れ値データで微調整されたソフトマックススコアを修正したり、中間特徴層を活用したりして、In-Distribution(ID)とOODサンプルの間の特徴的なパターンを識別する。他の方法は多様なOODサンプルを用いてIDとOODの相違を学習することに焦点を当てている。しかしながら、これらの手法は典型的には、想定される外れ値サンプルの品質に依存する。密度ベースの手法は明示的にクラス条件付き分布をモデル化するが、これには長いトレーニング時間や分類器の再訓練が必要となる。これらの問題に対処するために、新しい密度ベースのOOD検出技術である FlowCon を導入する。我々の主な革新は、正規化フローの特性と教師付きコントラスト学習を効率的に組み合わせることであり、扱いやすい密度推定とともに堅牢な表現学習を実現する。 ResNet18 や WideResNet の分類器で事前訓練した CIFAR-10 や CIFAR-100 などの一般的なビジョンデータセットにおいて,実証的評価により本手法の性能向上を示した。また、尤度プロットを用いた定量的解析とUMAP埋め込みを用いた定性的可視化を行い、様々なOODコンテキスト下で提案手法のロバスト性を示す。コードは、採否決定後にオープンソース化される。

Identifying Out-of-distribution (OOD) data is becoming increasingly critical as the real-world applications of deep learning methods expand. Post-hoc methods modify softmax scores fine-tuned on outlier data or leverage intermediate feature layers to identify distinctive patterns between In-Distribution (ID) and OOD samples. Other methods focus on employing diverse OOD samples to learn discrepancies between ID and OOD. These techniques, however, are typically dependent on the quality of the outlier samples assumed. Density-based methods explicitly model class-conditioned distributions, but this requires long training time or retraining the classifier. To tackle these issues, we introduce FlowCon, a new density-based OOD detection technique. Our main innovation lies in efficiently combining the properties of normalizing flow with supervised contrastive learning, ensuring robust representation learning with tractable density estimation. Empirical evaluation shows the enhanced performance of our method across common vision datasets such as CIFAR-10 and CIFAR-100 pretrained on ResNet18 and WideResNet classifiers. We also perform quantitative analysis using likelihood plots and qualitative visualization using UMAP embeddings and demonstrate the robustness of the proposed method under various OOD contexts. Code will be open-sourced post decision.

翻訳日:2024-07-16 23:47:24 公開日:2024-07-12

# フラッシュ正規化 - LLM用高速RMSNorm

Flash normalization: fast RMSNorm for LLMs ( http://arxiv.org/abs/2407.09577v1 )

ライセンス: Link先を確認

Nils Graef, Matthew Clapp, Andrew Wasielewski,

(参考訳) RMSNormはLlama、Mistral、OpenELMなど多くのLLMで使われている。本稿では、線形層が後に続くRMSNormの、厳密でありながらより高速な実装であるFlashNormについて詳述する。コードとその他のトランスフォーマーのトリックについては、https://huggingface.co/open-machine/FlashNorm を参照のこと。

RMSNorm is used by many LLMs such as Llama, Mistral, and OpenELM. This paper details FlashNorm, which is an exact but faster implementation of RMSNorm followed by linear layers. See https://huggingface.co/open-machine/FlashNorm for code and more transformer tricks.
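
The abstract does not say how the speed-up is obtained. As a hedged illustration of why an exact rewrite of "RMSNorm followed by a linear layer" is possible at all, the sketch below assumes that the elementwise normalization weights are folded into the weight matrix of the following linear layer ahead of time, and that the division by the RMS is applied after the matrix product. Both choices are assumptions made for this example, and every name in it is hypothetical.

```python
import math


def rms(x, eps=0.0):
    """Root mean square of a vector, with an optional stabilizer eps."""
    return math.sqrt(sum(v * v for v in x) / len(x) + eps)


def linear(x, W):
    """Bias-free linear layer; W is a list of rows with shape (out, in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rmsnorm_then_linear(x, g, W):
    """Reference path: normalize, scale elementwise by g, then apply W."""
    r = rms(x)
    normed = [v / r * gi for v, gi in zip(x, g)]
    return linear(normed, W)


def fold_weights(W, g):
    """One-time precomputation: scale column i of W by g[i]."""
    return [[w * gi for w, gi in zip(row, g)] for row in W]


def folded_path(x, W_folded):
    """Rewritten path: matrix product first, a single division by the RMS afterwards."""
    r = rms(x)
    return [y / r for y in linear(x, W_folded)]


x = [1.0, -2.0, 3.0]
g = [0.5, 1.5, 2.0]
W = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]

reference = rmsnorm_then_linear(x, g, W)
rewritten = folded_path(x, fold_weights(W, g))
print(reference, rewritten)
```

Because the RMS is a single scalar per input vector, it commutes with the matrix product, so the two paths agree up to floating-point rounding while the rewritten one skips the per-element multiplication by `g` at inference time.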

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 拡散傾向解析を用いた教師なし異常検出

Unsupervised Anomaly Detection Using Diffusion Trend Analysis ( http://arxiv.org/abs/2407.09578v1 )

ライセンス: Link先を確認

Eunwoo Kim, Un Yang, Cheol Lae Roh, Stefano Ermon,

(参考訳) 拡散モデルによる再構成に基づく従来の異常検出技術は, 異常位置や形状を高い性能で識別できるため, 広く用いられている。しかし、正常な特性を維持しながら異常を分解できる適切なノイズパラメータを決定するには限界がある。また,拡散モデルのボラティリティにより,再建時に正常領域がかなり変動し,誤検出が生じる。本稿では, 劣化度に応じて復元傾向を分析し, 既存手法の両問題を効果的に解決し, 異常検出手法を提案する。提案手法は,産業的異常検出のためのオープンデータセット上で検証され,多くの評価基準に基づいて既存手法の性能を向上させる。既存の異常検出手法と簡単に組み合わせることで、計算コストと性能のトレードオフを提供し、製造業における高い応用可能性を実現する。

Conventional anomaly detection techniques based on reconstruction via denoising diffusion models are widely used due to their ability to identify anomaly locations and shapes with high performance. However, it is difficult to determine appropriate noise parameters that degrade anomalies while preserving normal characteristics. Also, due to the volatility of the diffusion model, normal regions can fluctuate considerably during reconstruction, resulting in false detections. In this paper, we propose a method that detects anomalies by analyzing the reconstruction trend as a function of the degree of degradation, effectively solving both problems of existing methods. The proposed method is validated on an open dataset for industrial anomaly detection, improving the performance of existing methods on a number of evaluation criteria. Because it is easy to combine with existing anomaly detection methods, it provides a tradeoff between computational cost and performance, giving it high application potential in the manufacturing industry.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 奇妙な活性化関数を恐れるな - EUAFとそれを超えるもの

Don't Fear Peculiar Activation Functions: EUAF and Beyond ( http://arxiv.org/abs/2407.09580v1 )

ライセンス: Link先を確認

Qianchao Wang, Shijun Zhang, Dong Zeng, Zhaoheng Xie, Hengtao Guo, Feng-Lei Fan, Tieyong Zeng,

(参考訳) 本稿では,Parametric Elementary Universal Activation Function (PEUAF) と呼ばれる新しい超表現的活性化関数を提案する。 CIFAR10, Tiny-ImageNet, ImageNetなど, 各種の産業データセットおよび画像データセットに対する体系的かつ包括的な実験により, PEUAFの有効性を実証した。さらに, 超表現的活性化関数の族を大幅に一般化する。その存在は, 特定の超表現的活性化関数を持つ固定サイズのネットワークによって任意の連続関数を任意の所望の精度で近似できることを示した最近のいくつかの研究で実証されている。特に、我々の研究は、超表現的活性化関数の発展を妨げる2つの大きなボトルネックに対処している。すなわち、超表現的関数がごく限られた数しか同定されておらず、その広範な適用性に疑念が生じていること、そしてそれらがしばしば特異な形をしており、現実のアプリケーションにおけるスケーラビリティと実用性に懐疑的な見方をもたらしていることである。

In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate the effectiveness of PEUAF through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR10, Tiny-ImageNet, and ImageNet. Moreover, we significantly generalize the family of super-expressive activation functions, whose existence has been demonstrated in several recent works by showing that any continuous function can be approximated to any desired accuracy by a fixed-size network with a specific super-expressive activation function. Specifically, our work addresses two major bottlenecks impeding the development of super-expressive activation functions: the limited identification of super-expressive functions, which raises doubts about their broad applicability, and their often peculiar forms, which lead to skepticism regarding their scalability and practicality in real-world applications.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# ディープニューラルネットワークのダイナミクスの理解に向けてのスケール不変な診断手法

A Scale-Invariant Diagnostic Approach Towards Understanding Dynamics of Deep Neural Networks ( http://arxiv.org/abs/2407.09585v1 )

ライセンス: Link先を確認

Ambarish Moharil, Damian Tamburri, Indika Kumara, Willem-Jan Van Den Heuvel, Alireza Azarfar,

(参考訳) 本稿では,複雑なコネクショニストシステムの非線形力学を解析・説明するために,\textit{Fractal Geometry} を用いたスケール不変手法を提案する。ディープニューラルネットワーク(DNN)におけるアーキテクチャ的自己相似性を活用することにより、フラクタル次元と \textit{roughness} を定量化し、それらの力学を深く理解し、 \textit{intrinsic} 説明の質を高める。提案手法はカオス理論の原理を統合し,フラクタル進化の可視化を改善し,グラフベースニューラルネットワークを用いてネットワークトポロジを再構築する。この戦略は,コネクショニスト人工知能(AI)システムの \textit{intrinsic} な説明可能性の向上を目的としている。

This paper introduces a scale-invariant methodology employing \textit{Fractal Geometry} to analyze and explain the nonlinear dynamics of complex connectionist systems. By leveraging architectural self-similarity in Deep Neural Networks (DNNs), we quantify fractal dimensions and \textit{roughness} to deeply understand their dynamics and enhance the quality of \textit{intrinsic} explanations. Our approach integrates principles from Chaos Theory to improve visualizations of fractal evolution and utilizes a Graph-Based Neural Network for reconstructing network topology. This strategy aims at advancing the \textit{intrinsic} explainability of connectionist Artificial Intelligence (AI) systems.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# Sparse Mixture-of-Expertsにおけるタスク非依存プルーニングのエキスパート知識の多様化

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts ( http://arxiv.org/abs/2407.09590v1 )

ライセンス: Link先を確認

Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao,

(参考訳) モデルパラメータを増大させるが、タスクの実行時にわずかに活性化することにより、Mixture-of-Experts (MoE)アーキテクチャの使用は、推論コストを増大させることなく、LLM(Large Language Models)の性能を大幅に向上させる。しかし、専門家の増加によるメモリ消費量の増加は、これらのモデルを多くの実環境に展開する上での課題となっている。実験によっては、一部の専門家が事前トレーニング中に冗長な知識をエンコードしていることが明らかになりました。そこで本研究では,モデルパラメータの効率を向上させるために,類似の専門家をグループ化して抽出する手法を提案する。本手法の有効性を,Mixtral-8x7BとMixtral-8x22Bの2種類のMoEモデルを用いて評価した。評価の結果,本手法は様々な自然言語タスクにおいて,他のモデルプルーニング手法よりも優れていることがわかった。今後の研究を容易にするため、コードと刈り取られたMoEモデルをリリースします。

By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real-world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning two state-of-the-art MoE models, Mixtral-8x7B and Mixtral-8x22B. Evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. To facilitate future research, we will release our code and the pruned MoE models.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 規則順守に向けて: 処理活動の抽出に向けた数発の学習アプローチ

Toward Regulatory Compliance: A few-shot Learning Approach to Extract Processing Activities ( http://arxiv.org/abs/2407.09592v1 )

ライセンス: Link先を確認

Pragyan K C, Rambod Ghandiparsi, Rocky Slavin, Sepideh Ghanavati, Travis Breaux, Mitra Bokaei Hosseini,

(参考訳) モバイルアプリケーションの普及によって業界は成長し、企業はターゲット広告やパーソナライズされたオファリングといったサービスのためにユーザーデータに大きく依存している。この文脈では、GDPR(General Data Protection Regulation)のようなプライバシー規制が重要な役割を果たす。 GDPRの要件の1つは、企業による処理活動記録(RoPA)の維持である。 RoPAには、データ処理活動の記述、その目的、関連するデータの種類、その他の関連する外部エンティティなど、さまざまな詳細が含まれる。小規模なアプリ開発企業は、リソースの制限と厳しいスケジュールのために、このようなコンプライアンス要件を満たすことに困難を抱えている。こうした開発者を支援し罰金を防ぐために,本稿では,大規模言語モデル(LLM)を用いて,ユーザが作成した使用シナリオからRoPAのセグメントを生成する手法を提案する。提案手法では,GPT-3.5 Turboによる数発学習を用いて,使用シナリオを要約し,RoPAセグメントを生成する。要約タスクでは,数発学習のプロンプトにおける例の数,繰り返し,プロンプト内の例の順序の並べ替えなど,数発学習の性能の一貫性に影響を与えうるさまざまな要因を評価した。その結果,プロンプト内の例の数が要約のF1スコアに大きく影響する一方で,プロンプトを複数回繰り返した際のF1スコアの変動は無視できる程度であることが示された。我々のプロンプトは,平均70%のROUGE-L F1スコアで処理活動の要約に成功している。最後に、生成された要約の手動評価を通じて結果を改善する方法について議論する。

The widespread use of mobile applications has driven the growth of the industry, with companies relying heavily on user data for services like targeted advertising and personalized offerings. In this context, privacy regulations such as the General Data Protection Regulation (GDPR) play a crucial role. One of the GDPR requirements is the maintenance of a Record of Processing Activities (RoPA) by companies. RoPA encompasses various details, including the description of data processing activities, their purposes, types of data involved, and other relevant external entities. Small app-developing companies face challenges in meeting such compliance requirements due to resource limitations and tight timelines. To aid these developers and prevent fines, we propose a method to generate segments of RoPA from user-authored usage scenarios using large language models (LLMs). Our method employs few-shot learning with GPT-3.5 Turbo to summarize usage scenarios and generate RoPA segments. We evaluate different factors that can affect few-shot learning performance consistency for our summarization task, including the number of examples in few-shot learning prompts, repetition, and order permutation of examples in the prompts. Our findings highlight the significant influence of the number of examples in prompts on summarization F1 scores, while demonstrating negligible variability in F1 scores across multiple prompt repetitions. Our prompts achieve successful summarization of processing activities with an average 70% ROUGE-L F1 score. Finally, we discuss avenues for improving results through manual evaluation of the generated summaries.
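
Since the headline result is a ROUGE-L F1 score, a minimal sketch of that metric may help: ROUGE-L is based on the longest common subsequence (LCS) between a reference and a candidate summary. The code assumes lowercase whitespace tokenization and a plain F1 (harmonic mean of LCS precision and recall). Published ROUGE toolkits add their own tokenization and stemming options, so their numbers can differ; the example sentences are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 under the simplifying assumptions stated above."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "the app shares location data with advertisers"
candidate = "the app shares data with advertisers"
print(round(rouge_l_f1(reference, candidate), 3))  # 12/13, about 0.923
```

Here the candidate drops one reference word, so precision is 6/6 and recall is 6/7, giving an F1 of 12/13.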

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 交差量子相転移に対する非断熱量子最適化

Non-Adiabatic Quantum Optimization for Crossing Quantum Phase Transitions ( http://arxiv.org/abs/2407.09596v1 )

ライセンス: Link先を確認

András Grabarits, Federico Balducci, Barry C. Sanders, Adolfo del Campo,

(参考訳) 有限時間で量子相転移を横切る多体量子系の基底状態の最適駆動について考察する。この文脈では、遷移を駆動する制御パラメータのスケジュールを調整することにより、断熱性の破れによる励起を最小化することができる。 Kibble-Zurek 機構からインスピレーションを得て,複数の最適制御手順に対する断熱性の発現の時間スケールを特徴付ける。解析の結果,Roland-Cerfの局所断熱駆動や量子断熱最速降下線(brachistochrone)のように局所断熱性に依存するスケジュールでは,横磁場イジングモデルと長距離Kitaevモデルにおいて,断熱発展に対する大幅なスピードアップが得られないことが判明した。代替として,ランダウ・ツェナーの公式を利用し高励起状態の役割を考慮することで,局所断熱性と最先端の数値最適化のいずれによるスケジュールをも上回る新しい枠組み,非断熱量子最適化(Non-Adiabatic Quantum Optimization,NAQO)を提案する。 NAQOは厳密に解けるモデルに限定されず,乱れのある非可積分モデルにおいてもその優れた性能を確認する。

We consider the optimal driving of the ground state of a many-body quantum system across a quantum phase transition in finite time. In this context, excitations caused by the breakdown of adiabaticity can be minimized by adjusting the schedule of the control parameter that drives the transition. Drawing inspiration from the Kibble-Zurek mechanism, we characterize the timescale of onset of adiabaticity for several optimal control procedures. Our analysis reveals that schedules relying on local adiabaticity, such as Roland-Cerf's local adiabatic driving and the quantum adiabatic brachistochrone, fail to provide a significant speedup over the adiabatic evolution in the transverse-field Ising and long-range Kitaev models. As an alternative, we introduce a novel framework, Non-Adiabatic Quantum Optimization (NAQO), that, by exploiting the Landau-Zener formula and taking into account the role of higher-excited states, outperforms schedules obtained via both local adiabaticity and state-of-the-art numerical optimization. NAQO is not restricted to exactly solvable models, and we further confirm its superior performance in a disordered non-integrable model.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 双曲性スピン液体

Hyperbolic Spin Liquids ( http://arxiv.org/abs/2407.09601v1 )

ライセンス: Link先を確認

Patrick M. Lenggenhager, Santanu Dey, Tomáš Bzdušek, Joseph Maciejko,

(参考訳) 双曲格子は、結晶多体物理学の従来のパラダイムを超えて、負に曲がった空間における相関現象を探求するユニークな機会を与える。そのような研究の理論的ベンチマークとして、Kitaevのスピン-1/2ハニカムモデルを双曲格子に拡張し、非ユークリッド空間群対称性を利用してモデルを厳密に解く。我々は、$\{8,3\}$格子上の基底状態相図を解明し、アーベリアン・エニオンを持つギャップのある$\mathbb{Z}_2$スピン液体、非アーベリアン・エニオンとカイラルエッジ状態を持つギャップのあるカイラルスピン液体、そしてマヨラナフェルミオンの非アーベリアン・ブロッホ状態が低エネルギー状態密度を支配する圧縮性スピン液体を見いだす。

Hyperbolic lattices present a unique opportunity to venture beyond the conventional paradigm of crystalline many-body physics and explore correlated phenomena in negatively curved space. As a theoretical benchmark for such investigations, we extend Kitaev's spin-1/2 honeycomb model to hyperbolic lattices and exploit their non-Euclidean space-group symmetries to solve the model exactly. We elucidate the ground-state phase diagram on the $\{8,3\}$ lattice and find a gapped $\mathbb{Z}_2$ spin liquid with Abelian anyons, a gapped chiral spin liquid with non-Abelian anyons and chiral edge states, and a compressible spin liquid with low-energy density of states dominated by non-Abelian Bloch states of Majorana fermions.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 機械学習を用いた二元中性子星のリアルタイム重力波推定

Real-time gravitational-wave inference for binary neutron stars using machine learning ( http://arxiv.org/abs/2407.09602v1 )

ライセンス: Link先を確認

Maximilian Dax, Stephen R. Green, Jonathan Gair, Nihar Gupte, Michael Pürrer, Vivien Raymond, Jonas Wildberger, Jakob H. Macke, Alessandra Buonanno, Bernhard Schölkopf,

(参考訳) 二元中性子星(BNS)の合体は重力波(GW)と電磁スペクトル(EM)の両方で信号を放出する。よく知られているように、2017年のGW170817のマルチメッセンジャー観測は、宇宙論、核物理学、重力にわたる科学的発見につながった。これらの結果の中心となったのは、GWデータから得られる天球上の位置決定と距離であり、GW170817の場合、GW信号の11時間後に、関連するEM突発天体AT 2017gfoを特定するのに役立った。 GWデータの高速解析は、時間に敏感なEM観測を誘導するために重要であるが、信号の長さと複雑さから生じる問題のため、精度を犠牲にする近似を行う必要があることが多い。そこで我々は,そのような近似を行わずに,わずか1秒で完全なBNS推論を行う機械学習手法を開発した。これは、物理的なドメイン知識をニューラルネットワークに明示的に統合する新しい方法によって実現されている。提案手法は, (i) 合体前であっても正確な位置決定, (ii) 近似的な低遅延手法と比較して$\sim30\%$改善された位置決定精度, (iii) 高価な望遠鏡時間の優先順位付けに利用できる光度距離、傾斜角、質量の詳細な情報を提供することで、マルチメッセンジャー観測を強化する。さらに,本手法の柔軟性とコスト削減により,状態方程式および波形系統誤差の研究に新しい機会が開かれる。最後に,提案手法が最大1時間に及ぶ極めて長い信号にスケールし,次世代の地上・宇宙検出器のデータ解析の青写真として機能することを示す。

Mergers of binary neutron stars (BNSs) emit signals in both the gravitational-wave (GW) and electromagnetic (EM) spectra. Famously, the 2017 multi-messenger observation of GW170817 led to scientific discoveries across cosmology, nuclear physics, and gravity. Central to these results were the sky localization and distance obtained from GW data, which, in the case of GW170817, helped to identify the associated EM transient, AT 2017gfo, 11 hours after the GW signal. Fast analysis of GW data is critical for directing time-sensitive EM observations; however, due to challenges arising from the length and complexity of signals, it is often necessary to make approximations that sacrifice accuracy. Here, we develop a machine learning approach that performs complete BNS inference in just one second without making any such approximations. This is enabled by a new method for explicit integration of physical domain knowledge into neural networks. Our approach enhances multi-messenger observations by providing (i) accurate localization even before the merger; (ii) improved localization precision by $\sim30\%$ compared to approximate low-latency methods; and (iii) detailed information on luminosity distance, inclination, and masses, which can be used to prioritize expensive telescope time. Additionally, the flexibility and reduced cost of our method open new opportunities for equation-of-state and waveform systematics studies. Finally, we demonstrate that our method scales to extremely long signals, up to an hour in length, thus serving as a blueprint for data analysis for next-generation ground- and space-based detectors.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 変形SYKモデルにおけるクリロフ複雑性とカオス

Krylov complexity and chaos in deformed SYK models ( http://arxiv.org/abs/2407.09604v1 )

ライセンス: Link先を確認

Shira Chapman, Saskia Demulder, Damián A. Galante, Sameer U. Sheorey, Osher Shoval,

(参考訳) クリロフ複雑性は、カオスの量子プローブとして最近提案されている。クリロフ複雑性の指数的成長を特徴づけるクリロフ指数は、リャプノフ指数の上界に予想される。 Sachdev-Ye-Kitaevモデルにおけるクリロフ指数とリャプノフ指数を、その変形のいくつかで計算する。この解析は、フェルミオン相互作用の数が有限かつ無限であるモデルにおいて、無限温度と有限温度の両方で行う。本研究では,2つの領域間を交差する変形と,低温でほぼ可積分となる変形を考察する。いずれの場合も、クリロフ指数がリャプノフ指数の上界であることが分かる。しかし、リアプノフ指数は温度関数として非単調な振舞いを持つことができるが、すべての研究例において、クリロフ指数は単調に振舞う。例えば、リャプノフ指数が低温でゼロとなるモデルを見つけ、一方、クリロフ指数はその極大境界に飽和する。この単調性は、ユニタリ進化の下で進化する量子系におけるクリロフ指数の一般的な特徴である可能性があると推測する。

Krylov complexity has recently been proposed as a quantum probe of chaos. The Krylov exponent characterising the exponential growth of Krylov complexity is conjectured to upper-bound the Lyapunov exponent. We compute the Krylov and the Lyapunov exponents in the Sachdev-Ye-Kitaev model and in some of its deformations. We do this analysis both at infinite and finite temperatures, in models where the number of fermionic interactions is both finite and infinite. We consider deformations that interpolate between two regions of near-maximal chaos and deformations that become nearly-integrable at low temperatures. In all cases, we find that the Krylov exponent upper-bounds the Lyapunov one. However, we find that while the Lyapunov exponent can have non-monotonic behaviour as a function of temperature, in all studied examples the Krylov exponent behaves monotonically. For instance, we find models where the Lyapunov exponent goes to zero at low temperatures, while the Krylov exponent saturates to its maximal bound. We speculate on the possibility that this monotonicity might be a generic feature of the Krylov exponent in quantum systems evolving under unitary evolution.

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 量子に着想を得た数値解析のための行列積状態におけるチェビシェフ近似と関数の合成

Chebyshev approximation and composition of functions in matrix product states for quantum-inspired numerical analysis ( http://arxiv.org/abs/2407.09609v1 )

ライセンス: Link先を確認

Juan José Rodríguez-Aldavero, Paula García-Molina, Luca Tagliacozzo, Juan José García-Ripoll,

(参考訳) 本研究では,一変量および多変量関数を,量子化テンソルトレイン(QTT)とも呼ばれる行列積状態(MPS)として表現する方法を検討する。解析的かつ高度に微分可能な関数をMPSチェビシェフ補間子として表現するために,反復的なチェビシェフ展開とクレンショウ(Clenshaw)評価を用いるアルゴリズムを提案する。このアルゴリズムは高度に微分可能な関数に対して理論的予測と整合する急速な収束を示し,多次元のシナリオに効率的に一般化する。このアルゴリズムの性能を, テンソルクロス補間 (TCI) およびマルチスケール補間構成と包括的に比較した。関数評価が安価である場合や、関数が解析的でない場合、TCIは一般に関数のロードにおいてより効率的である。しかし,提案手法は競争力のある性能を示し,特定の多変量シナリオではTCIを上回る。さらに,有利なスケーリング率を示すとともに,MPSにおける関数合成の枠組みを提供することでより広い範囲のタスクに一般化でき,非線形問題や多体統計物理学に有用である。

This work explores the representation of univariate and multivariate functions as matrix product states (MPS), also known as quantized tensor-trains (QTT). It proposes an algorithm that employs iterative Chebyshev expansions and Clenshaw evaluations to represent analytic and highly differentiable functions as MPS Chebyshev interpolants. It demonstrates rapid convergence for highly-differentiable functions, aligning with theoretical predictions, and generalizes efficiently to multidimensional scenarios. The performance of the algorithm is compared with that of tensor cross-interpolation (TCI) and multiscale interpolative constructions through a comprehensive comparative study. When function evaluation is inexpensive or when the function is not analytical, TCI is generally more efficient for function loading. However, the proposed method shows competitive performance, outperforming TCI in certain multivariate scenarios. Moreover, it shows advantageous scaling rates and generalizes to a wider range of tasks by providing a framework for function composition in MPS, which is useful for non-linear problems and many-body statistical physics.
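
The two numerical ingredients named in the abstract, Chebyshev expansion and Clenshaw evaluation, can be sketched on ordinary floating-point numbers. The paper applies them to matrix product states; the version below deliberately does not, and only shows the scalar recurrence. Function names and the choice of `exp` as the test function are assumptions made for illustration.

```python
import math


def chebyshev_coefficients(f, n):
    """Coefficients c_0..c_{n-1} of the interpolant of f at the n Chebyshev zeros on [-1, 1]."""
    nodes = [math.cos(math.pi * (j + 0.5) / n) for j in range(n)]
    values = [f(x) for x in nodes]
    coeffs = []
    for k in range(n):
        # T_k evaluated at node j equals cos(k * pi * (j + 0.5) / n).
        s = sum(v * math.cos(math.pi * k * (j + 0.5) / n) for j, v in enumerate(values))
        coeffs.append(2.0 * s / n)
    coeffs[0] /= 2.0  # the k = 0 term carries half the weight of the others
    return coeffs


def clenshaw(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2


coeffs = chebyshev_coefficients(math.exp, 16)
worst = max(abs(clenshaw(coeffs, t / 50.0) - math.exp(t / 50.0)) for t in range(-50, 51))
print(worst)  # close to machine precision for this smooth function
```

The recurrence needs only additions and multiplications by `x`, which is presumably what makes it attractive when `x` is itself a tensor-network object rather than a number. The rapid decay of the error with the number of coefficients for smooth functions is the convergence behaviour the abstract refers to.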

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# Heterophilic Graph Learning Handbook:ベンチマーク、モデル、理論的分析、応用と課題

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges ( http://arxiv.org/abs/2407.09618v1 )

ライセンス: Link先を確認

Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka,

(参考訳) ホモフィリー原理、すなわち同じラベルや類似した属性を持つノードほど接続されやすいという原理は、グラフ構造化されたデータ、特にノードレベルのタスクにおいて、従来のニューラルネットワーク(NN)よりもグラフニューラルネットワーク(GNN)が優れている主な理由であると一般に信じられてきた。しかし、最近の研究は、NNと比べたGNNの性能が満足のいくものではない、無視できない数のデータセットを特定している。ヘテロフィリー、すなわち低ホモフィリーは、この経験的観察の主要な原因と考えられている。人々は、グラフトランスフォーマーとその変種を含む既存のほとんどのグラフモデルを、異種グラフ、時間グラフ、ハイパーグラフなど様々な種類のグラフにわたるヘテロフィリーのシナリオで再検討し、再評価し始めている。さらに、グラフ関連の多くの応用がヘテロフィリー問題と密接に関連していることが判明した。ここ数年、ヘテロフィリー問題の研究と解決に多大な努力が注がれている。本調査では, ヘテロフィリックグラフ学習の最新の進歩を包括的に概観する。これには, ベンチマークデータセットの広範な要約と合成グラフ上のホモフィリー指標の評価, 最新の教師付きおよび教師なし学習手法の綿密な分類, ホモフィリー・ヘテロフィリーに関する理論解析の徹底的な整理, ヘテロフィリー関連アプリケーションの幅広い探索が含まれる。特に、詳細な実験を通じて、我々は初めて、ベンチマークのヘテロフィリックデータセットを、悪性、良性、曖昧なヘテロフィリーの3つのサブカテゴリに分類した。悪性および曖昧なデータセットは、ヘテロフィリーの課題に対する新しいモデルの有効性をテストするための真に困難なデータセットとして特定される。最後に,ヘテロフィリックグラフ表現学習におけるいくつかの課題と今後の方向性を提案する。

The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where the performance of GNNs compared to that of NNs is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformers and their variants, in the heterophily scenario across various kinds of graphs, e.g., heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most recent supervised and unsupervised learning methods, a thorough digest of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the truly challenging datasets for testing the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.
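
To make the notion of homophily concrete, here is a minimal sketch of one widely used measure, the edge homophily ratio: the fraction of edges whose two endpoints share a label. The abstract does not say which homophily metrics the handbook evaluates, so this particular metric and the function name `edge_homophily` are assumptions made for illustration.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label.

    edges  -- iterable of (u, v) node-index pairs
    labels -- sequence or mapping giving the label of each node
    Values near 1 indicate a homophilic graph, values near 0 a heterophilic one.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


if __name__ == "__main__":
    # A path 0-1-2-3 whose first two nodes have label 0 and last two label 1:
    # two of the three edges are intra-class.
    print(edge_homophily([(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1]))
```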

翻訳日:2024-07-16 21:38:05 公開日:2024-07-12

# 集積ワイヤによる捕捉イオンのドレッシング

Dressing trapped ions with integrated wires ( http://arxiv.org/abs/2407.09623v1 )

ライセンス: Link先を確認

R. Tyler Sutherland,

(参考訳) トラップ集積ワイヤの近接場におけるトラップイオンのドレッシングについて検討する。手術前後にドレッシングフィールドをアディバティカルに打ち込むと、有効ハミルトニアンが変化する。ドレッシング場(英語版)の振幅と変形は、任意の操作の特性を「コトミズ」するために使用できる調整可能な自由度として作用する。この汎用ツールには3つのユースケースを提案する。まず「人工的な」クロック状態を生成し、量子ビットの線形感度を(小さいと仮定される)排除する。第二に、低い量子化場においてシェルビングを複雑にするデジネラシーを分解することができる。最後に、他のコンピュータから周波数空間で分離されたフィールドを用いて、レーザーフリーの単一量子ビットゲートを「ターゲット」イオンの集合上に実装することができる。

We discuss dressing trapped ions with the near field of a trap-integrated wire. Ramping a dressing field on/off adiabatically before/after an operation changes its effective Hamiltonian. The amplitude and detuning of the dressing field act as tunable degrees of freedom we can use to 'customize' the properties of any operation. We propose three use cases for this general tool. First, we can generate 'artificial' clock states, where we eliminate the (assumed to be small) linear sensitivity of a qubit. Second, we can break the degeneracies that often complicate shelving at low quantization fields, allowing us to implement operations with linearly polarized microwaves that would otherwise require circular polarization. Finally, we can implement laser-free single-qubit gates on a set of 'target' ions using fields that are separated from the rest of the computer in frequency space.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 準備-変換-測定シナリオにおける非テクスチュアリティの不等式

Noncontextuality inequalities for prepare-transform-measure scenarios ( http://arxiv.org/abs/2407.09624v1 )

ライセンス: Link先を確認

David Schmid, Roberto D. Baldijão, John H. Selby, Ana Belén Sainz, Robert W. Spekkens,

(参考訳) 我々は,準備-変換-測定シナリオにおいて,文脈性の証人を導出するための最初の体系的手法を提供する。より具体的には、線形量化器の除去が、そのようなシナリオにおける一般化された非コンテクスト性と整合した相関関係のポリトープを計算するためにどのように使用できるかを示す。このポリトープは、図の保存からいくつかの制約を無視した場合に、任意の線形操作単位に対して古典的な説明を認めるシナリオにおける観測データに必要な、必要かつ十分な条件の集合として指定される。これらの後者の制約を含むと、一般により厳密な不等式につながるが、非線形量子化器の除去はそれらを体系的に含む必要がある。また,準備-変換-測定実験において発生した数値データの非古典性を証明できる線形プログラムを提案する。この結果を用いて、安定部分定理に違反する可能性のある変換に対して、頑健な非コンテクスト性不等式を得る。最後に、与えられた状態の集合、変換の集合、あるいは測定の集合で保持されるすべての線形操作IDを計算するための単純なアルゴリズムを与える。

We provide the first systematic technique for deriving witnesses of contextuality in prepare-transform-measure scenarios. More specifically, we show how linear quantifier elimination can be used to compute a polytope of correlations consistent with generalized noncontextuality in such scenarios. This polytope is specified as a set of noncontextuality inequalities that are necessary and sufficient conditions for observed data in the scenario to admit of a classical explanation relative to any linear operational identities, if one ignores some constraints from diagram preservation. While including these latter constraints generally leads to tighter inequalities, it seems that nonlinear quantifier elimination would be required to systematically include them. We also provide a linear program which can certify the nonclassicality of a set of numerical data arising in a prepare-transform-measure experiment. We apply our results to get a robust noncontextuality inequality for transformations that can be violated within the stabilizer subtheory. Finally, we give a simple algorithm for computing all the linear operational identities holding among a given set of states, of transformations, or of measurements.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 機械学習タイムプロパゲータによる電子動力学シミュレーションの高速化

Accelerating Electron Dynamics Simulations through Machine Learned Time Propagators ( http://arxiv.org/abs/2407.09628v1 )

ライセンス: Link先を確認

Karan Shah, Attila Cangi,

(参考訳) 時間依存密度汎関数理論(TDDFT)は、レーザー場のような様々な外部摂動下での電子力学を研究するために広く用いられる手法である。本研究では, 自己回帰型ニューラル演算子を電子密度の時間プロパゲータとして利用して, リアルタイムTDDFTに基づく電子動力学シミュレーションを高速化する新しい手法を提案する。物理インフォームド制約と高分解能トレーニングデータを活用することにより,従来の数値解法と比較して精度と計算速度が向上する。一次元二原子分子のクラスにおけるモデルの有効性を実証する。この方法は、様々な実験パラメータを持つレーザー照射された分子や材料のリアルタイム・オンザフライモデリングを可能にする可能性がある。

Time-dependent density functional theory (TDDFT) is a widely used method to investigate electron dynamics under various external perturbations such as laser fields. In this work, we present a novel approach to accelerate real-time TDDFT-based electron dynamics simulations using autoregressive neural operators as time-propagators for the electron density. By leveraging physics-informed constraints and high-resolution training data, our model achieves superior accuracy and computational speed compared to traditional numerical solvers. We demonstrate the effectiveness of our model on a class of one-dimensional diatomic molecules. This method has potential for enabling real-time, on-the-fly modeling of laser-irradiated molecules and materials with varying experimental parameters.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 極端における顆粒の因果関係

Granger Causality in Extremes ( http://arxiv.org/abs/2407.09632v1 )

ライセンス: Link先を確認

Juraj Bodik, Olivier Pasche,

(参考訳) 本稿では,時系列における極端事象からの因果関係の同定を目的とした,極端におけるグランガー因果関係の厳密な数学的枠組みを提案する。グランガー因果関係は、時間変化変数間の方向関係を明らかにする上で重要な役割を果たす。この概念は極端かつ非常に不安定な期間に重要性を増すが、最先端の手法は主に分布の本体内の因果性に焦点を当てており、しばしば極端な出来事にのみ現れる因果的メカニズムを見落としている。本フレームワークは, 因果尾係数を利用して, 主に極端な事象から因果関係を推定するように設計されている。我々は、極端な因果関係と(古典的な)グランガー因果関係、シムズ因果関係、構造因果関係などの他の因果関係の概念の等価性を確立する。 Grangerの因果関係の他の重要な性質を極端に証明し、このフレームワークが隠れた共同創設者の存在下で特に有用であることを示す。また,データから極端にグランガー因果性が存在することを検出する新しい推論手法を提案する。提案手法はモデルフリーであり, 非線形・高次元時系列処理が可能であり, 性能, 速度の両面において, 現状の手法よりも優れており, 財務・極端気象観測におけるコヒーレントな効果を明らかにすることができた。

We introduce a rigorous mathematical framework for Granger causality in extremes, designed to identify causal links from extreme events in time series. Granger causality plays a pivotal role in uncovering directional relationships among time-varying variables. While this notion gains heightened importance during extreme and highly volatile periods, state-of-the-art methods primarily focus on causality within the body of the distribution, often overlooking causal mechanisms that manifest only during extreme events. Our framework is designed to infer causality mainly from extreme events by leveraging the causal tail coefficient. We establish equivalences between causality in extremes and other causal concepts, including (classical) Granger causality, Sims causality, and structural causality. We prove other key properties of Granger causality in extremes and show that the framework is especially helpful under the presence of hidden confounders. We also propose a novel inference method for detecting the presence of Granger causality in extremes from data. Our method is model-free, can handle non-linear and high-dimensional time series, outperforms current state-of-the-art methods in all considered setups, both in performance and speed, and was found to uncover coherent effects when applied to financial and extreme weather observations.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# Gibbs状態生成のための散逸変動量子アルゴリズム

Dissipative variational quantum algorithms for Gibbs state preparation ( http://arxiv.org/abs/2407.09635v1 )

ライセンス: Link先を確認

Yigal Ilin, Itai Arad,

(参考訳) 近年、変動量子アルゴリズム(VQA)は、短期量子ハードウェアへの適応性と効率性から注目されている。彼らは線形代数、探索問題、ギブズ、基底状態の準備を含む様々なタスクにポテンシャルを示してきた。それでも、現在の量子ハードウェアにおけるノイズの存在は、その性能を著しく制限している。本研究では、変分量子回路の本質的な部分として、qubit RESETや確率ゲートなどの散逸演算を組み込むことにより、散逸変動量子アルゴリズム(D-VQA)を導入する。このような散逸的変分アルゴリズムは、散逸的雑音に対して自然な弾力性を持つと主張する。このようなアルゴリズムは、広範囲の量子多体ハミルトンと温度でギブス状態を作ることができ、コヒーレントノイズと非コヒーレントノイズの両方による誤差を著しく低減することができる。このアプローチのもう1つの利点は、アンシラキュービットが不要であることです。我々は,NISQデバイス上での変動量子計算の堅牢性と精度を高めるため,D-VQAの可能性を強調した。

In recent years, variational quantum algorithms (VQAs) have gained significant attention due to their adaptability and efficiency on near-term quantum hardware. They have shown potential in a variety of tasks, including linear algebra, search problems, and Gibbs and ground state preparation. Nevertheless, the presence of noise in current-day quantum hardware severely limits their performance. In this work, we introduce dissipative variational quantum algorithms (D-VQAs) by incorporating dissipative operations, such as qubit RESET and stochastic gates, as an intrinsic part of a variational quantum circuit. We argue that such dissipative variational algorithms possess some natural resilience to dissipative noise. We demonstrate how such algorithms can prepare Gibbs states over a wide range of quantum many-body Hamiltonians and temperatures, while significantly reducing errors due to both coherent and non-coherent noise. An additional advantage of our approach is that no ancilla qubits are needed. Our results highlight the potential of D-VQAs to enhance the robustness and accuracy of variational quantum computations on NISQ devices.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# Seq-to-Final: シーケンス分布から最終時点へのチューニングベンチマーク

Seq-to-Final: A Benchmark for Tuning from Sequential Distributions to a Final Time Point ( http://arxiv.org/abs/2407.09642v1 )

ライセンス: Link先を確認

Christina X Ji, Ahmed M Alaa, David Sontag,

(参考訳) 時間とともに分布の変化は、多くの設定で起こる。歴史データの活用は、最終期間中に限られたデータが利用できる最後の時点のモデルを学ぶために必要だが、この目的のために特別に開発された手法はほとんどない。本研究では,3種類の方法の有効性を評価するために,異なる順序の合成シフトを用いたベンチマークを構築した。 1)最終期間に適応することなく、すべてのデータから学ぶこと。 2 シーケンシャルな性質によらず、歴史資料から学び、最終期間に順応し、 3)モデルを最終期間に調整する場合に、履歴データのシーケンシャルな性質を活用する。我々はこのベンチマークをSeq-to-Finalと呼び、最終時点のモデルを学習するために一連の時間を用いて焦点を合わせる。我々の総合ベンチマークにより、ユーザーは異なるタイプのシフトでシーケンスを構築でき、異なる方法を比較することができる。 CIFAR-10とCIFAR-100をベース画像として用いた画像分類タスクに着目する。また、Portraitsデータセット上の同じ手法を評価し、時間とともに現実のシフトとの関連性を探る。最後に、最終段階において異なるメソッドの初期化と更新を対比する視覚化を作成します。この結果から, ベンチマークのシーケンスに対して, 逐次構造を無視し, 最終時点に適応する手法は良好に動作することが示唆された。シーケンシャルな性質を活用するアプローチは、いかなる改善も提供しません。このベンチマークは、シーケンシャルな履歴データを活用するのに優れた新しいアルゴリズムの開発や、シーケンシャルな性質を無視した方法の深い理解を促すことを願っている。

Distribution shift over time occurs in many settings. Leveraging historical data is necessary to learn a model for the last time point when limited data is available in the final period, yet few methods have been developed specifically for this purpose. In this work, we construct a benchmark with different sequences of synthetic shifts to evaluate the effectiveness of 3 classes of methods that 1) learn from all data without adapting to the final period, 2) learn from historical data with no regard to the sequential nature and then adapt to the final period, and 3) leverage the sequential nature of historical data when tailoring a model to the final period. We call this benchmark Seq-to-Final to highlight the focus on using a sequence of time periods to learn a model for the final time point. Our synthetic benchmark allows users to construct sequences with different types of shift and compare different methods. We focus on image classification tasks using CIFAR-10 and CIFAR-100 as the base images for the synthetic sequences. We also evaluate the same methods on the Portraits dataset to explore the relevance to real-world shifts over time. Finally, we create a visualization to contrast the initializations and updates from different methods at the final time step. Our results suggest that, for the sequences in our benchmark, methods that disregard the sequential structure and adapt to the final time point tend to perform well. The approaches we evaluate that leverage the sequential nature do not offer any improvement. We hope that this benchmark will inspire the development of new algorithms that are better at leveraging sequential historical data or a deeper understanding of why methods that disregard the sequential nature are able to perform well.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# OXN -- クラウドネイティブアプリケーションのための自動可観測性評価

OXN -- Automated Observability Assessments for Cloud-Native Applications ( http://arxiv.org/abs/2407.09644v1 )

ライセンス: Link先を確認

Maria C. Borges, Joshua Bauer, Sebastian Werner,

(参考訳) マイクロサービスアプリケーションの信頼性を保証するためには、可観測性が重要です。これらのアプリケーションは、異種環境にデプロイされる多くの独立したサービスがあるため、しばしば障害を起こしやすい。正しく"使用される場合、オブザーバビリティは、開発者が障害を素早く特定し、トラブルシュートするのに役立ちます。しかしながら、マイクロサービスアプリケーションの可観測性の測定と設定は簡単ではなく、ツールに依存し、コストに結びついている。実践者は、異なる可観測性設計の選択肢を重み付けするために、可観測性に関連するトレードオフを理解する必要がある。それでも、これらのアーキテクチャ設計決定は体系的な手法ではサポートされず、通常単に「専門的な直観」に依存している。具体的な証拠とともに可観測性設計のトレードオフを評価するため,様々な設計代替品を比較する実験を行うことを提唱する。組織的で反復可能な実験プロセスを達成するには、自動化が必要です。本稿では,実験ツール-Observability eXperiment eNgine (OXN) の概念実証実装について述べる。 OXNはChaos Engineeringに似た任意のフォールトをアプリケーションに注入することができるが、可観測性の設定を変更するユニークな機能も備えており、これまで探索されていなかった設計決定の直接的な評価を可能にしている。

Observability is important to ensure the reliability of microservice applications. These applications are often prone to failures, since they have many independent services deployed on heterogeneous environments. When employed "correctly", observability can help developers identify and troubleshoot faults quickly. However, instrumenting and configuring the observability of a microservice application is not trivial but tool-dependent and tied to costs. Practitioners need to understand observability-related trade-offs in order to weigh between different observability design alternatives. Still, these architectural design decisions are not supported by systematic methods and typically just rely on "professional intuition". To assess observability design trade-offs with concrete evidence, we advocate for conducting experiments that compare various design alternatives. Achieving a systematic and repeatable experiment process necessitates automation. We present a proof-of-concept implementation of an experiment tool - Observability eXperiment eNgine (OXN). OXN is able to inject arbitrary faults into an application, similar to Chaos Engineering, but also possesses the unique capability to modify the observability configuration, allowing for the straightforward assessment of design decisions that were previously left unexplored.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 強化学習におけるハミルトン・ヤコビの到達可能性に関する調査

Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey ( http://arxiv.org/abs/2407.09645v1 )

ライセンス: Link先を確認

Milan Ganai, Sicun Gao, Sylvia Herbert,

(参考訳) 近年の文献では、安全保証を維持しつつ、高い性能で制御ポリシーを学習するアプローチが提案されている。ハミルトン・ヤコビ・リーチブル・セット(HJ)の合成は、複雑な高次元システムに対する強化学習に基づく制御ポリシーの訓練の安全性を検証し、監督するための有効なツールとなっている。以前は、HJの到達性は低次元の動的システムの検証に限られていた。これは、それが依存する動的プログラミングアプローチの計算複雑性が、システム状態の数とともに指数関数的に増加するためである。この制限に対処するため、近年では、真の到達可能な集合の信頼性を保ちながら、HJ到達可能性分析をスケールする学習制御ポリシと同時に、到達可能性値関数を計算する方法が提案されている。これらのHJ到達可能性近似は、学習された制御ポリシーの安全性の向上や、報酬のパフォーマンス向上に利用され、動的障害やライダーベースや視覚に基づく観察といった課題を解決することができる。本稿では,高次元システムにおける信頼性のさらなる研究の基盤となる強化学習におけるHJ到達可能性評価の分野における最近の展開を概観する。

Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was limited to verifying low-dimensional dynamical systems -- this is because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. To address this limitation, in recent years, there have been methods that compute the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# ハンバ:グラフ誘導バイスキャンマンバを用いたシングルビュー3Dハンドコンストラクション

Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba ( http://arxiv.org/abs/2407.09646v1 )

ライセンス: Link先を確認

Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando De la Torre,

(参考訳) 1枚のRGB画像からの3Dハンド再構成は、関節運動、自己閉塞、物体との相互作用により困難である。既存のSOTA法では3次元ハンドポーズと形状を学習するためにアテンションベースのトランスフォーマーが用いられているが, 接合空間関係のモデリングが不十分なため, 頑健で正確な性能が得られなかった。この問題に対処するために,グラフ学習と状態空間モデリングを橋渡しするHambaというグラフ誘導型Mambaフレームワークを提案する。私たちの中核となる考え方は、マンバのスキャンをグラフ誘導の双方向走査に再構成し、いくつかの効果的なトークンを使って3D再構成することです。これにより、再構成性能を向上させるために、結合関係と空間配列を学習することができる。具体的には、グラフ構造関係と関節の空間配列を学習し、注意に基づく手法よりも88.5%少ないトークンを使用する新しいグラフ誘導状態空間(GSS)ブロックを設計する。さらに、我々は、フュージョンモジュールを使用して状態空間機能とグローバル機能を統合する。 GSSブロックと融合モジュールを利用することで、Hambaはグラフ誘導状態空間モデリング機能を効果的に活用し、グローバルとローカルの機能を共同で検討してパフォーマンスを向上させる。いくつかのベンチマークや実験において、ハンバは既存のSOTAよりも大幅に優れており、FreiHANDでは5.3mmとF@15mmのPA-MPVPEを達成している。ハンバは現在、3Dハンドリコンストラクションで2つの挑戦的リーダーボードで1位にランクインしている。コードは受理後利用可能になる。 [Website] (https://humansensinglab.github.io/Hamba/)

3D Hand reconstruction from a single RGB image is challenging due to the articulated motion, self-occlusion, and interaction with objects. Existing SOTA methods employ attention-based transformers to learn the 3D hand pose and shape, but they fail to achieve robust and accurate performance due to insufficient modeling of joint spatial relations. To address this problem, we propose a novel graph-guided Mamba framework, named Hamba, which bridges graph learning and state space modeling. Our core idea is to reformulate Mamba's scanning into graph-guided bidirectional scanning for 3D reconstruction using a few effective tokens. This enables us to learn the joint relations and spatial sequences for enhancing the reconstruction performance. Specifically, we design a novel Graph-guided State Space (GSS) block that learns the graph-structured relations and spatial sequences of joints and uses 88.5% fewer tokens than attention-based methods. Additionally, we integrate the state space features and the global features using a fusion module. By utilizing the GSS block and the fusion module, Hamba effectively leverages the graph-guided state space modeling features and jointly considers global and local features to improve performance. Extensive experiments on several benchmarks and in-the-wild tests demonstrate that Hamba significantly outperforms existing SOTAs, achieving the PA-MPVPE of 5.3mm and F@15mm of 0.992 on FreiHAND. Hamba is currently Rank 1 in two challenging competition leaderboards on 3D hand reconstruction. The code will be available upon acceptance. [Website](https://humansensinglab.github.io/Hamba/).

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 3x2:2次元意味対応による3次元オブジェクト部分分割

3x2: 3D Object Part Segmentation by 2D Semantic Correspondences ( http://arxiv.org/abs/2407.09648v1 )

ライセンス: Link先を確認

Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg,

(参考訳) 3Dオブジェクト部分のセグメンテーションはコンピュータビジョンアプリケーションに不可欠である。 2Dオブジェクト部分のセグメンテーションでかなりの進歩があったが、この3Dデータセットは、収集に費用がかかる注釈付き3Dデータセットが不足しているため、あまり注目されていない。本研究では,いくつかの注釈付き3次元形状やリッチな注釈付き2次元データセットを活用して3次元オブジェクト部分分割を実現することを提案する。我々は,様々な粒度レベルのベンチマークでSOTA性能を実現する3-By-2という新しい手法を提案する。事前訓練された基礎モデルの特徴を利用し,意味的および幾何学的対応を活用することで,限られた3次元アノテーションの課題を克服することができる。提案手法は利用可能な2次元ラベルを活用し,有効3次元オブジェクト部分分割を実現する。提案手法は,3-By-2で様々な分類・粒度に対応可能であり,異なる対象カテゴリにまたがる興味深い部分ラベル転送能力を示す。プロジェクトウェブサイト: \url{https://ngailapdi.github.io/projects/3by2/}

3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation. We present our novel approach, termed 3-By-2, which achieves SOTA performance on different benchmarks with various granularity levels. By using features from pretrained foundation models and exploiting semantic and geometric correspondences, we are able to overcome the challenges of limited 3D annotations. Our approach leverages available 2D labels, enabling effective 3D object part segmentation. Our method 3-By-2 can accommodate various part taxonomies and granularities, demonstrating interesting part label transfer ability across different object categories. Project website: https://ngailapdi.github.io/projects/3by2/

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 中国語モデルと中国語モデル : 中国のLLMにおける言語政策の欠如

How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs ( http://arxiv.org/abs/2407.09652v1 )

ライセンス: Link先を確認

Andrea W Wen-Yi, Unso Eun Seo Jo, Lu Jia Lin, David Mimno,

(参考訳) 現代言語モデルは多言語化が進んでいるが、中国のLLM開発者は言語多様性に関する複雑な政治的・ビジネス的な考察を行わなければならない。中国における言語政策は、公衆の言論に影響を及ぼし、多民族社会を統治することを目的としており、1949年以降、多民族主義からより同化主義的なアプローチへと徐々に移行してきた。これらの影響が現在の言語技術に与える影響について検討する。我々は、中国企業によって18言語で事前訓練された6つのオープンソース多言語LPMを評価し、中国、アジア、アングロ・ヨーロッパ諸言語にまたがる。実験の結果,中国における多言語でのLLMのパフォーマンスは国際LLMと区別できないことがわかった。同様に、これらのモデルの技術的報告は、英語とマンダリン中国語を除いて、データ言語を事前訓練するための考慮の欠如も示している。中国のAI政策、モデル実験、技術報告を見れば、中国のLLM開発における言語多様性のいずれに対しても、一貫性のある政策の兆候は見つからない。これは、中国が人々が毎日使っている言語と言語モデルの開発の両方を規制しているが、言語モデルにおける言語に関するポリシーを持っていない、という厄介な事実を残している。

Contemporary language models are increasingly multilingual, but Chinese LLM developers must navigate complex political and business considerations of language diversity. Language policy in China aims at influencing the public discourse and governing a multi-ethnic society, and has gradually transitioned from a pluralist to a more assimilationist approach since 1949. We explore the impact of these influences on current language technology. We evaluate six open-source multilingual LLMs pre-trained by Chinese companies on 18 languages, spanning a wide range of Chinese, Asian, and Anglo-European languages. Our experiments show that the performance of Chinese LLMs on diverse languages is indistinguishable from that of international LLMs. Similarly, the models' technical reports show a lack of consideration for pretraining data language coverage except for English and Mandarin Chinese. Examining Chinese AI policy, model experiments, and technical reports, we find no sign of any consistent policy, either for or against, language diversity in China's LLM development. This leaves a puzzling fact: while China regulates both the languages people use daily and language model development, it does not seem to have any policy on the languages in language models.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 情報検索と製品検索のギャップを埋める:eコマースへのQ&A勧告

Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce ( http://arxiv.org/abs/2407.09653v1 )

ライセンス: Link先を確認

Saar Kuzi, Shervin Malmasi,

(参考訳) ショッピングミッションの消費者は、商品の理解を深め、購入決定に達するための反復的なプロセスにおいて、Web検索エンジンや質問回答(QA)システムのような製品検索と情報検索システムの両方を利用することが多い。商品検索は、購入者が自分の要求を満たす実際の商品をカタログで見つけるのに有用であるが、情報検索システムは、それらの要求を洗練させるために必要なあらゆる質問に答えるために利用することができる。最近、LLM(Large Language Models)の成功により、顧客が目標を迅速に効果的に達成するための2つのタスク間のギャップを埋める機会が開かれた。本稿では,ユーザに対して,製品検索に関連する質問応答(Q&A)ペアを推薦し,購入決定を支援することを提案する。本稿では、Q&Aペアの要件と特性、その生成、Q&Aレコメンデーションタスクの最適化など、問題のさまざまな側面について論じる。我々は、この新興分野における今後の研究を促進するための課題、オープンな課題、そして解決策を提案する。

Consumers on a shopping mission often leverage both product search and information seeking systems, such as web search engines and Question Answering (QA) systems, in an iterative process to improve their understanding of available products and reach a purchase decision. While product search is useful for shoppers to find the actual products meeting their requirements in the catalog, information seeking systems can be utilized to answer any questions they may have to refine those requirements. The recent success of Large Language Models (LLMs) has opened up an opportunity to bridge the gap between the two tasks to help customers achieve their goals quickly and effectively by integrating conversational QA within product search. In this paper, we propose to recommend users Question-Answer (Q&A) pairs that are relevant to their product search and can help them make a purchase decision. We discuss the different aspects of the problem including the requirements and characteristics of the Q&A pairs, their generation, and the optimization of the Q&A recommendation task. We highlight the challenges, open problems, and suggested solutions to encourage future research in this emerging area.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 量子クエリローバウンドに対する置換重ね合わせオラクル

Permutation Superposition Oracles for Quantum Query Lower Bounds ( http://arxiv.org/abs/2407.09655v1 )

ライセンス: Link先を確認

Christian Majenz, Giulio Malavolta, Michael Walter,

(参考訳) 本稿では,Zhandryの圧縮オラクル法をランダムな置換に一般化する手法を提案する。そこで本稿では,ランダムな置換への一般化に抵抗したZhandryの手法の重要な特徴である,入力-出力対の述語に対するアルゴリズムの成功確率の有界化に,結果として生じるオラクルシミュレーションを利用する方法を示す。重要な技術的要素の1つは、オラクルのデータベースの置換を表すために厳密に単調な分解を使用することである。本フレームワークの適用例として, 1ラウンドスポンジ構成は, ランダムな置換モデルに対して無条件プレモージュ耐性を有することを示す。これはウンルーの予想を証明している。

We propose a generalization of Zhandry's compressed oracle method to random permutations, where an algorithm can query both the permutation and its inverse. We show how to use the resulting oracle simulation to bound the success probability of an algorithm for any predicate on input-output pairs, a key feature of Zhandry's technique that had hitherto resisted attempts at generalization to random permutations. One key technical ingredient is to use strictly monotone factorizations to represent the permutation in the oracle's database. As an application of our framework, we show that the one-round sponge construction is unconditionally preimage resistant in the random permutation model. This proves a conjecture by Unruh.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# BoBa:フェデレートラーニングにおけるデータ分散推論によるバックドア検出の強化

BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning ( http://arxiv.org/abs/2407.09658v1 )

ライセンス: Link先を確認

Ning Wang, Shanghao Shi, Yang Xiao, Yimin Chen, Y. Thomas Hou, Wenjing Lou,

(参考訳) フェデレーテッドラーニングは、協調モデルトレーニングの有望なアプローチであるが、その分散した性質のため、中毒攻撃の影響を受けやすい。特にバックドア攻撃は、トリガーを含む入力の予測を選択的に妥協するため、驚くべきステルス性を示している。このような攻撃を検知し軽減するためのこれまでの取り組みは、独立分散IID(Independent and Identically Distributed)データ仮定に基づいており、そこでは、良質なモデル更新はIDデータによる複数の特徴空間において高いレベルの類似性を示す。これにより、アウトリアはバックドア攻撃として検出される。それにもかかわらず、非IIDデータは、データバリエーションが良性モデル間のばらつきを導入し、異常検出に基づくメカニズムがより効果的になるため、バックドアアタック検出において重大な課題を呈している。本稿では,この問題を解決するために,分布認識型異常検出機構であるBoBaを提案する。データバラエティとバックドアアタックから生じるアウトレージを区別するために,データ分散を利用したクラスタリングクライアントと投票ベースの検出という2つのステップに分割することを提案する。クラスタリングと後続のバックドア検出は,クライアントデータ分布を知ることで大きな恩恵を受けることができるという直感に基づいて,新しいデータ分布推定機構を提案する。検出の堅牢性を改善するために,各クライアントが複数のクラスタに関連付けられている重なり合うクラスタリング手法を導入し,モデル更新の信頼性を単一のクラスタではなく複数のクラスタで評価する。広範囲な評価により,BoBa は攻撃成功率を 0.001 以下に抑えつつ,各種攻撃戦略や実験環境において高いタスク精度を維持しながら,攻撃成功率を0.001 以下に抑えることができることを示した。

Federated learning, while being a promising approach for collaborative model training, is susceptible to poisoning attacks due to its decentralized nature. Backdoor attacks, in particular, have shown remarkable stealthiness, as they selectively compromise predictions for inputs containing triggers. Previous endeavors to detect and mitigate such attacks are based on the Independent and Identically Distributed (IID) data assumption where benign model updates exhibit high-level similarity in multiple feature spaces due to IID data. Thus, outliers are detected as backdoor attacks. Nevertheless, non-IID data presents substantial challenges in backdoor attack detection, as the data variety introduces variance among benign models, making outlier detection-based mechanisms less effective. We propose a novel distribution-aware anomaly detection mechanism, BoBa, to address this problem. In order to differentiate outliers arising from data variety versus backdoor attack, we propose to break down the problem into two steps: clustering clients utilizing their data distribution followed by a voting-based detection. Based on the intuition that clustering and subsequent backdoor detection can drastically benefit from knowing client data distributions, we propose a novel data distribution inference mechanism. To improve detection robustness, we introduce an overlapping clustering method, where each client is associated with multiple clusters, ensuring that the trustworthiness of a model update is assessed collectively by multiple clusters rather than a single cluster. Through extensive evaluations, we demonstrate that BoBa can reduce the attack success rate to lower than 0.001 while maintaining high main task accuracy across various attack strategies and experimental settings.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# Bridging Dictionary: パルチザン語使用のAI生成辞書

Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use ( http://arxiv.org/abs/2407.09661v1 )

ライセンス: Link先を確認

Hang Jiang, Doug Beeferman, William Brannon, Andrew Heyward, Deb Roy,

(参考訳) 言葉は様々な背景を持つ人々にとって異なる意味を持つことが多い。今日の社会的分極の時代は、特に政治的コミュニケーションやジャーナリズムにおいて、コミュニケーションの誤りを防ぐために、言葉を慎重に選ぶことを要求する。この問題に対処するために、異なる政治的見解を持つ人々によって、言葉がどのように認識されているかを示すインタラクティブなツールであるBridging Dictionaryを紹介した。 Bridging Dictionaryには、静的で印刷可能なドキュメントが含まれており、大きな言語モデルによって生成された要約を含む796の用語がある。これらの要約は、この用語が共和党員や民主党員によってどのように使われているかを強調している。さらにブリジング辞典は、ユーザーが選択した単語を探索し、その頻度、感情、要約、そして政治的分裂の例を視覚化するインタラクティブなインターフェイスを提供する。本稿では,ジャーナリストを事例として,人事機関の重要性と,このツールのさらなる強化への信頼を強調する。 Bridging Dictionaryのデプロイ版はhttps://dictionary.ccc-mit.org/で公開されている。

Words often carry different meanings for people from diverse backgrounds. Today's era of social polarization demands that we choose words carefully to prevent miscommunication, especially in political communication and journalism. To address this issue, we introduce the Bridging Dictionary, an interactive tool designed to illuminate how words are perceived by people with different political views. The Bridging Dictionary includes a static, printable document featuring 796 terms with summaries generated by a large language model. These summaries highlight how the terms are used distinctively by Republicans and Democrats. Additionally, the Bridging Dictionary offers an interactive interface that lets users explore selected words, visualizing their frequency, sentiment, summaries, and examples across political divides. We present a use case for journalists and emphasize the importance of human agency and trust in further enhancing this tool. The deployed version of Bridging Dictionary is available at https://dictionary.ccc-mit.org/.

翻訳日:2024-07-16 21:28:05 公開日:2024-07-12

# 地理空間誘導拡散を用いた混合ビューパノラマ合成

Mixed-View Panorama Synthesis using Geospatially Guided Diffusion ( http://arxiv.org/abs/2407.09672v1 )

ライセンス: Link先を確認

Zhexiao Xiong, Xin Xing, Scott Workman, Subash Khanal, Nathan Jacobs,

(参考訳) 本稿では,少数の入力パノラマとその地域の衛星画像から新しいパノラマを合成することを目的とする,混合ビューパノラマ合成というタスクを導入する。これは、入力パノラマのみを使用する従来研究(同一ビュー合成)や、入力衛星画像のみを使用する従来研究(クロスビュー合成)とは対照的である。世界中の任意の場所でのパノラマ合成をサポートするには、混合ビュー設定が最も自然であると主張する。重要な課題は、パノラマの空間的カバレッジが不均一であり、世界の多くの地域では利用可能なパノラマがわずかしかないことである。本稿では,拡散モデルと注意機構に基づくアーキテクチャを用いて,利用可能なすべての入力画像から情報を抽出する手法を提案する。実験の結果,提案手法の有効性が示された。特に、利用可能なパノラマが疎である場合や、合成しようとするパノラマの位置から遠く離れている場合にも対応できる。

We introduce the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area. This contrasts with previous work which only uses input panoramas (same-view synthesis), or an input satellite image (cross-view synthesis). We argue that the mixed-view setting is the most natural to support panorama synthesis for arbitrary locations worldwide. A critical challenge is that the spatial coverage of panoramas is uneven, with few panoramas available in many regions of the world. We introduce an approach that utilizes diffusion-based modeling and an attention-based architecture for extracting information from all available input imagery. Experimental results demonstrate the effectiveness of our proposed method. In particular, our model can handle scenarios when the available panoramas are sparse or far from the location of the panorama we are attempting to synthesize.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 煙の再構成のための特性軌道の物理インフォームド学習

Physics-Informed Learning of Characteristic Trajectories for Smoke Reconstruction ( http://arxiv.org/abs/2407.09679v1 )

ライセンス: Link先を確認

Yiming Wang, Siyu Tang, Mengyu Chu,

(参考訳) 本稿では,スパースビューのRGBビデオから煙と障害物を物理情報に基づいてニューラル再構成する問題を掘り下げ,複雑な力学の観測が限られることから生じる課題に取り組む。既存の物理インフォームドニューラルネットワークは短期的な物理制約を重視することが多く、長期的な保存則を適切に保つことは十分に検討されていない。ラグランジュ的な流体軌道を暗黙的にモデル化するためにオイラー的なニューラル場を利用する新しい表現、Neural Characteristic Trajectory Fieldsを導入する。このトポロジフリーで自動微分可能な表現は、任意のフレーム間の効率的なフローマップ計算と、自動微分による効率的な速度抽出を可能にする。これにより、長期的な保存則と短期的な物理的事前知識の両方をカバーするエンドツーエンドの教師信号が得られる。この表現に基づいて、物理インフォームドな軌道学習と、NeRFベースのシーン再構成への統合を提案する。また、自己教師付きシーン分解とシームレスに統合された境界制約により、高度な障害物処理を可能にする。結果として、オクルージョンの不確実性、密度と色の曖昧さ、静的・動的要素の絡み合いといった課題を克服できることを示す。コードとサンプルテストは https://github.com/19reborn/PICT_smoke にある。

We delve into the physics-informed neural reconstruction of smoke and obstacles through sparse-view RGB videos, tackling challenges arising from limited observation of complex dynamics. Existing physics-informed neural networks often emphasize short-term physics constraints, leaving the proper preservation of long-term conservation less explored. We introduce Neural Characteristic Trajectory Fields, a novel representation utilizing Eulerian neural fields to implicitly model Lagrangian fluid trajectories. This topology-free, auto-differentiable representation facilitates efficient flow map calculations between arbitrary frames as well as efficient velocity extraction via auto-differentiation. Consequently, it enables end-to-end supervision covering long-term conservation and short-term physics priors. Building on the representation, we propose physics-informed trajectory learning and integration into NeRF-based scene reconstruction. We enable advanced obstacle handling through self-supervised scene decomposition and seamlessly integrated boundary constraints. Our results showcase the ability to overcome challenges like occlusion uncertainty, density-color ambiguity, and static-dynamic entanglements. Code and sample tests are at https://github.com/19reborn/PICT_smoke.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 産業応用のための文字列生成に基づく化学反応モデルの推論の高速化

Accelerating the inference of string generation-based chemical reaction models for industrial applications ( http://arxiv.org/abs/2407.09685v1 )

ライセンス: Link先を確認

Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arnè Clevert,

(参考訳) 反応予測と1段階逆合成のためのテンプレートフリーなSMILES-to-SMILES翻訳モデルは、最先端の精度を持つことから、コンピュータ支援合成計画システムにおける産業応用で関心を集めている。しかし、推論速度が遅いという問題がある。本稿では,クエリ文字列の部分列をターゲット文字列の適切な位置にコピーすることによる投機的復号を用いて、自己回帰型SMILES生成器の推論を高速化する手法を提案する。本手法をPyTorch Lightningで実装されたMolecular Transformerに適用し,反応予測と1段階逆合成において,精度を損なうことなく3倍以上の推論高速化を達成した。

Template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest for industrial applications in computer-aided synthesis planning systems due to their state-of-the-art accuracy. However, they suffer from slow inference speed. We present a method to accelerate inference in autoregressive SMILES generators through speculative decoding by copying query string subsequences into target strings in the right places. We apply our method to the molecular transformer implemented in PyTorch Lightning and achieve over 3x faster inference in reaction prediction and single-step retrosynthesis, with no loss in accuracy.
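
As a rough illustration of the idea in this abstract, the sketch below shows greedy speculative decoding in which draft tokens are copied from the query string. It is a toy under stated assumptions, not the authors' implementation: `find_draft`, `toy_next_token`, the character-level tokenization, and the example reaction are all hypothetical, and a real model would verify the whole draft in one batched forward pass instead of a Python loop.

```python
from typing import Callable, List, Optional, Sequence, Tuple

EOS = "<eos>"
# Greedy next-token oracle; a hypothetical stand-in for a trained decoder.
NextToken = Callable[[Sequence[str]], str]


def find_draft(query: Sequence[str], generated: Sequence[str], draft_len: int) -> List[str]:
    """Copy a draft from the query: find the last generated token in the
    query and propose the tokens that follow it (illustrative heuristic)."""
    if not generated:
        return list(query[:draft_len])
    last = generated[-1]
    for i, tok in enumerate(query):
        if tok == last:
            return list(query[i + 1:i + 1 + draft_len])
    return []


def greedy_decode(next_token: NextToken, max_len: int = 64) -> Tuple[List[str], int]:
    """Baseline: one simulated forward pass per generated token."""
    out: List[str] = []
    passes = 0
    while len(out) < max_len and (not out or out[-1] != EOS):
        out.append(next_token(out))
        passes += 1
    return out, passes


def speculative_decode(next_token: NextToken, query: Sequence[str],
                       draft_len: int = 4, max_len: int = 64) -> Tuple[List[str], int]:
    """Greedy speculative decoding with drafts copied from the query.

    Every appended token is the model's own prediction, so the output is
    identical to plain greedy decoding; matching draft tokens only let us
    emit several tokens per simulated forward pass.
    """
    out: List[str] = []
    passes = 0
    while len(out) < max_len and (not out or out[-1] != EOS):
        candidates: List[Optional[str]] = [*find_draft(query, out, draft_len), None]
        passes += 1  # one (simulated) parallel verification pass
        for tok in candidates:
            pred = next_token(out)
            out.append(pred)
            # Stop at the first mismatch; the trailing None yields the bonus token.
            if pred != tok or pred == EOS or len(out) >= max_len:
                break
    return out, passes


# Toy example (assumption): acetic acid + ethanol -> ethyl acetate, one character per token.
QUERY = list("CC(=O)O.OCC")
TARGET = list("CC(=O)OCC") + [EOS]


def toy_next_token(prefix: Sequence[str]) -> str:
    """Deterministic oracle that always continues towards TARGET."""
    return TARGET[min(len(prefix), len(TARGET) - 1)]


baseline, baseline_passes = greedy_decode(toy_next_token)
fast, fast_passes = speculative_decode(toy_next_token, QUERY)
print(baseline == fast, baseline_passes, fast_passes)
```

Because reactants and products share long substrings in SMILES, copied drafts are often accepted, which is where the reduction in forward passes comes from in this toy.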

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# SPIN:自然画像における部分粒度の階層的セグメンテーション

SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images ( http://arxiv.org/abs/2407.09686v1 )

ライセンス: Link先を確認

Josh Myers-Dean, Jarek Reynolds, Brian Price, Yifei Fan, Danna Gurari,

(参考訳) 階層的セグメンテーションは、様々な粒度のセグメンテーションを作成するタスクである。本稿では,自然画像に対するサブパートアノテーションを備えた最初の階層的セマンティックセグメンテーションデータセットであるSPIN(SubPartImageNet)を紹介する。また,アルゴリズムが階層レベル間の空間的関係と意味的関係をどの程度捉えているかを評価するために,新しい評価指標を2つ導入する。 3つの異なるタスクで最新のモデルをベンチマークし、オブジェクト、パーツ、サブパートにわたる長所と短所を分析する。コミュニティ全体の進展を促進するため、データセットをhttps://joshmyersdean.github.io/spin/index.htmlで公開している。

Hierarchical segmentation entails creating segmentations at varying levels of granularity. We introduce the first hierarchical semantic segmentation dataset with subpart annotations for natural images, which we call SPIN (SubPartImageNet). We also introduce two novel evaluation metrics to evaluate how well algorithms capture spatial and semantic relationships across hierarchical levels. We benchmark modern models across three different tasks and analyze their strengths and weaknesses across objects, parts, and subparts. To facilitate community-wide progress, we publicly release our dataset at https://joshmyersdean.github.io/spin/index.html.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 深層期待値整合近似による高速かつロバストな位相回復

Fast and Robust Phase Retrieval via Deep Expectation-Consistent Approximation ( http://arxiv.org/abs/2407.09687v1 )

ライセンス: Link先を確認

Saurav K. Shastri, Philip Schniter,

(参考訳) 位相情報のない計測から画像を正確に復元することは、困難で長年にわたる問題である。本研究では,期待値整合(EC)近似と深層デノイジングネットワークを組み合わせ,速度と精度の両面で最先端の位相回復手法を上回る「deepECpr」を提案する。非伝統的な方法でECを適用することに加えて、deepECprは最近の拡散法に着想を得た新しい確率的ダンピング方式を含んでいる。プラグ・アンド・プレイ事前分布、デノイジングによる正則化、あるいは拡散に基づく既存の位相回復法と同様に、deepECprはデノイジング段階と計測利用段階を交互に反復する。しかし、既存手法とは異なり、deepECprが必要とするデノイザ呼び出しははるかに少ない。オーバーサンプリングされたフーリエ計測および符号化回折パターン計測において、カラー画像、自然グレースケール画像、非自然グレースケール画像のノイズを含む位相回復を対象に、deepECprを最先端のprDeep (Metzler et al., 2018)、Deep-ITA (Wang et al., 2020)、Diffusion Posterior Sampling (Chung et al., 2023) と比較した結果、5分の1のデノイザ呼び出しでPSNRとSSIMの両方が改善することを確認した。

Accurately recovering images from phaseless measurements is a challenging and long-standing problem. In this work, we present "deepECpr," which combines expectation-consistent (EC) approximation with deep denoising networks to surpass state-of-the-art phase-retrieval methods in both speed and accuracy. In addition to applying EC in a non-traditional manner, deepECpr includes a novel stochastic damping scheme that is inspired by recent diffusion methods. Like existing phase-retrieval methods based on plug-and-play priors, regularization by denoising, or diffusion, deepECpr iterates a denoising stage with a measurement-exploitation stage. But unlike existing methods, deepECpr requires far fewer denoiser calls. We compare deepECpr to the state-of-the-art prDeep (Metzler et al., 2018), Deep-ITA (Wang et al., 2020), and Diffusion Posterior Sampling (Chung et al., 2023) methods for noisy phase-retrieval of color, natural, and unnatural grayscale images on oversampled-Fourier and coded-diffraction-pattern measurements and find improvements in both PSNR and SSIM with 5x fewer denoiser calls.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 健康の社会的決定要因データ統合のための大規模言語モデル:心不全の30日以内再入院予測を事例として

Large Language Models for Integrating Social Determinant of Health Data: A Case Study on Heart Failure 30-Day Readmission Prediction ( http://arxiv.org/abs/2407.09688v1 )

ライセンス: Link先を確認

Chase Fensore, Rodrigo M. Carrillo-Larco, Shivani A. Patel, Alanna A. Morris, Joyce C. Ho,

(参考訳) 健康の社会的決定要因(SDOH)、すなわち人々が生活し、成長し、年齢を重ねる無数の状況は、健康アウトカムにおいて重要な役割を果たす。しかし、既存のアウトカム予測モデルは、SDOHの代理指標のみを特徴量として用いることが多い。最近のオープンデータの取り組みは、より包括的なSDOHの全体像を構築する機会を提供するが、公開SDOHデータの量と多様性が増大するにつれて、個々の患者に最も関連性の高いデータを手作業で統合することはますます困難になっている。大規模言語モデル(LLM)は、構造化データの自動アノテーションにおいて有望であることが示されている。本稿では,LLMを用いたSDOHデータ統合の実現可能性と、得られたSDOH特徴量の臨床予測における有用性を評価する、エンドツーエンドのケーススタディを行う。まず、公開されている2つのSDOHデータソースから700以上の変数を、5つの意味的SDOHカテゴリのいずれかに手作業でラベル付けする。次に,この分類タスクにおいて9つのオープンソースLLMの性能をベンチマークする。最後に,3万9千人の心不全(HF)患者を対象に30日以内の再入院を予測するMLモデルを訓練し,分類したSDOH変数と標準的な臨床変数の予測性能を比較する。さらに,few-shotプロンプトがLLMのアノテーション性能に与える影響を調査し,どの情報がこれらの変数の正確なアノテーションに役立つかを評価するために、プロンプトに含めるメタデータのアブレーション研究を行う。その結果、一部のオープンソースLLMは、ファインチューニングを必要とせず、ゼロショットプロンプトでSDOH変数を効果的かつ正確にアノテーションできることがわかった。重要な点として、標準的な臨床特徴量と組み合わせた場合、LLMがアノテーションしたSDOH変数のうち「近隣・建造環境(Neighborhood and Built Environment)」のサブセットが、HF患者の30日以内再入院の予測で最も高い性能を示す。

Social determinants of health (SDOH), the myriad of circumstances in which people live, grow, and age, play an important role in health outcomes. However, existing outcome prediction models often only use proxies of SDOH as features. Recent open data initiatives present an opportunity to construct a more comprehensive view of SDOH, but manually integrating the most relevant data for individual patients becomes increasingly challenging as the volume and diversity of public SDOH data grow. Large language models (LLMs) have shown promise at automatically annotating structured data. Here, we conduct an end-to-end case study evaluating the feasibility of using LLMs to integrate SDOH data, and the utility of these SDOH features for clinical prediction. We first manually label 700+ variables from two publicly accessible SDOH data sources to one of five semantic SDOH categories. Then, we benchmark the performance of 9 open-source LLMs on this classification task. Finally, we train ML models to predict 30-day hospital readmission among 39k heart failure (HF) patients, and we compare the prediction performance of the categorized SDOH variables with standard clinical variables. Additionally, we investigate the impact of few-shot LLM prompting on LLM annotation performance, and perform a metadata ablation study on prompts to evaluate which information helps LLMs accurately annotate these variables. We find that some open-source LLMs can effectively and accurately annotate SDOH variables with zero-shot prompting without the need for fine-tuning. Crucially, when combined with standard clinical features, the LLM-annotated Neighborhood and Built Environment subset of the SDOH variables shows the best performance in predicting 30-day readmission of HF patients.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 信頼されたサーバのないプライベートな不均一なフェデレーション学習:凸損失に対する誤り最適かつコミュニケーション効率のアルゴリズム

Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses ( http://arxiv.org/abs/2407.09690v1 )

ライセンス: Link先を確認

Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright,

(参考訳) サーバや他のサイロ/クライアントを信頼しない人々の個人データを用いたフェデレーション学習(FL)の問題を再考する。この文脈では、各サイロ(例えば病院)は複数の人(例えば患者)のデータを持ち、サーバや他のサイロがデータを暴こうとする場合でも、各人のデータ(例えば健康記録)のプライバシーを保護する必要がある。 Inter-Silo Record-Level Differential Privacy (ISRL-DP) は、サイロ i の通信がアイテムレベルの差分プライバシーを満たすことを要求することで、各サイロのデータ漏洩を防ぐ。先行研究 arXiv:2203.06735 は、均質な(i.i.d.)サイロデータと凸損失関数を持つISRL-DPアルゴリズムの最適な過剰リスク境界を特徴付けた。しかし、2つの重要な問いが未解決のまま残されていた。 (1) 不均質な(非i.i.d.)サイロデータでも同じ過剰リスク境界を達成できるか。 (2) より少ない通信ラウンドで最適なリスク境界を達成できるか。本稿では,両方の問いに肯定的な答えを与える。不均質なサイロデータの存在下で最適な過剰リスク境界を達成する新しいISRL-DP FLアルゴリズムを提案する。さらに、我々のアルゴリズムは従来の最先端手法よりも通信効率が高い。滑らかな損失関数に対しては、我々のアルゴリズムは最適な過剰リスク境界を達成し、その通信複雑性は非プライベートな下界と一致する。加えて、我々のアルゴリズムは従来の最先端手法よりも計算効率が高い。

We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g. hospital) has data from several people (e.g. patients) and needs to protect the privacy of each person's data (e.g. health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential Privacy (ISRL-DP) prevents each silo's data from being leaked, by requiring that silo i's communications satisfy item-level differential privacy. Prior work arXiv:2203.06735 characterized the optimal excess risk bounds for ISRL-DP algorithms with homogeneous (i.i.d.) silo data and convex loss functions. However, two important questions were left open: (1) Can the same excess risk bounds be achieved with heterogeneous (non-i.i.d.) silo data? (2) Can the optimal risk bounds be achieved with fewer communication rounds? In this paper, we give positive answers to both questions. We provide novel ISRL-DP FL algorithms that achieve the optimal excess risk bounds in the presence of heterogeneous silo data. Moreover, our algorithms are more communication-efficient than the prior state-of-the-art. For smooth loss functions, our algorithm achieves the optimal excess risk bound and has communication complexity that matches the non-private lower bound. Additionally, our algorithms are more computationally efficient than the previous state-of-the-art.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# EVOLVE: 微調整GPT様モデルを用いたソーシャルメディアにおけるユーザ進化とネットワークダイナミクスの予測

EVOLVE: Predicting User Evolution and Network Dynamics in Social Media Using Fine-Tuned GPT-like Model ( http://arxiv.org/abs/2407.09691v1 )

ライセンス: Link先を確認

Ismail Hossain, Md Jahangir Alam, Sai Puppala, Sajedul Talukder,

(参考訳) ソーシャルメディアプラットフォームは、個人の感情、日々の活動、さまざまなライフイベントの共有に広く使われており、最新の出来事を人々に知らせている。ユーザーがアカウントを作成する瞬間から、彼らは友達やフォロワーのネットワークを継続的に拡張し、投稿、コメント、共有によって他人と自由にやりとりする。時間の経過とともに、ユーザー行動は人口統計特性と彼らが確立したネットワークに基づいて進化する。本研究では,ユーザが生涯にわたってソーシャルメディア上でどのように進化していくかを理解するための予測手法を提案し,その進化の次の段階を予測する。我々はGPTのようなデコーダのみのモデル(E-GPT: Evolution-GPT)を微調整し、オンラインソーシャルメディアにおけるユーザの進化の将来のステージを予測する。我々は,これらのモデルの性能を評価し,ユーザの属性がネットワーク内の変化にどのように影響するかを,ソーシャルメディア上での今後のつながりやユーザ活動の変化を予測し,またリコメンデーションシステムなどの他のソーシャルメディアの課題にも対処する。

Social media platforms are extensively used for sharing personal emotions, daily activities, and various life events, keeping people updated with the latest happenings. From the moment a user creates an account, they continually expand their network of friends or followers, freely interacting with others by posting, commenting, and sharing content. Over time, user behavior evolves based on demographic attributes and the networks they establish. In this research, we propose a predictive method to understand how a user evolves on social media throughout their life and to forecast the next stage of their evolution. We fine-tune a GPT-like decoder-only model (we named it E-GPT: Evolution-GPT) to predict the future stages of a user's evolution in online social media. We evaluate the performance of these models and demonstrate how user attributes influence changes within their network by predicting future connections and shifts in user activities on social media, which also addresses other social media challenges such as recommendation systems.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 数理的枠組み,モデリングパラダイムの分類,およびニューラルシンボリックシステムのための学習技術スイート

A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems ( http://arxiv.org/abs/2407.09693v1 )

ライセンス: Link先を確認

Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor,

(参考訳) ニューラル・シンボリック(NeSy)システムの分野は急速に成長している。提案されているアプローチは、ニューラル手法とシンボリック手法の共生的な結合を達成する上で大きな可能性を示している。しかし、個々のNeSyシステムは根本的な点で異なっている。アプローチの共通点と相違点を明らかにし、さらなる進歩を可能にする統一理論が強く求められている。本稿では,確率的および非確率的なNeSyアプローチによる識別的・生成的モデリングのための統一的な数学的枠組みである、ニューラル・シンボリックエネルギーベースモデル(NeSy-EBMs)を紹介する。我々はNeSy-EBMを用いて,システムのニューラル・シンボリックインタフェースと推論能力に着目したモデリングパラダイムの分類法を開発する。さらに,NeSy-EBMのための一連の学習手法を紹介する。重要な点として、NeSy-EBMは主要な学習損失の勾配に対する一般的な表現の導出を可能にし、我々は二段階最適化や確率的方策最適化を含む複数の領域の手法を活用する4つの学習アプローチを提供する。最後に、NeSyシステムの実世界への応用を容易にするため、スケーラビリティと表現力を重視して設計されたオープンソースのNeSy-EBMライブラリであるNeural Probabilistic Soft Logic(NeuPSL)を提示する。複数のデータセットにわたる広範な実証分析を通じて、画像分類,グラフノードラベリング,自動運転車の状況認識,質問応答など,さまざまなタスクにおけるNeSy-EBMの実用上の利点を示す。

The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative modeling with probabilistic and non-probabilistic NeSy approaches. We utilize NeSy-EBMs to develop a taxonomy of modeling paradigms focusing on a system's neural-symbolic interface and reasoning capabilities. Additionally, we introduce a suite of learning techniques for NeSy-EBMs. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we provide four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. Finally, we present Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# ディバイドとフューズ:部分可視画像から身体部分メッシュを復元する

Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images ( http://arxiv.org/abs/2407.09694v1 )

ライセンス: Link先を確認

Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu,

(参考訳) 本稿では,入力画像における部分的な可視性とオクルージョンがもたらす課題に対処するために設計した,人体メッシュ再構成のための新しいボトムアップ手法を提案する。 SMPLのような全身パラメトリックモデルに依存する従来のトップダウン手法は、正確なメッシュ再構成に人体の大部分が見えていることを必要とするため、人体のごく一部しか見えない場合にはうまく機能しない。この制限を克服するため,本手法は「Divide and Fuse(D&F)」戦略を採用し,人体の各パーツを個別に再構成してから融合することで、オクルージョンに対する頑健性を確保する。我々は,パーツ間の依存関係なしに、少数の形状パラメータと大域的位置パラメータからメッシュを独立に再構成するHuman Part Parametric Models (HPPM) を設計する。さらに、特別に設計した融合モジュールが、見えているパーツが少数であっても、再構成されたパーツをシームレスに統合する。パラメトリックメッシュモデルの訓練には、大量の正解(ground-truth)SMPLデータを利用する。提案手法の訓練と評価を容易にするため,HPPMアノテーションを付与した、部分的にしか見えない人物の画像からなるベンチマークデータセットを構築した。これらのベンチマークデータセットを用いた実験により、特に不可視部分が大きく従来手法が再構成品質の維持に苦労するシナリオにおいて、D&F手法の有効性が示された。

We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruction. To overcome this limitation, our method employs a "Divide and Fuse (D&F)" strategy, reconstructing human body parts independently before fusing them, thereby ensuring robustness against occlusions. We design Human Part Parametric Models (HPPM) that independently reconstruct the mesh from a few shape and global-location parameters, without inter-part dependency. A specially designed fusion module then seamlessly integrates the reconstructed parts, even when only a few are visible. We harness a large volume of ground-truth SMPL data to train our parametric mesh models. To facilitate the training and evaluation of our method, we have established benchmark datasets featuring images of partially visible humans with HPPM annotations. Our experiments, conducted on these benchmark datasets, demonstrate the effectiveness of our D&F method, particularly in scenarios with substantial invisibility, where traditional approaches struggle to maintain reconstruction quality.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# マルチセンサフュージョンによるレンジビューベース3次元セマンティックセグメンテーションのリアルタイムでの性能向上

Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion ( http://arxiv.org/abs/2407.09697v1 )

ライセンス: Link先を確認

Shiqi Tan, Hamidreza Fazlali, Yixuan Xu, Yuan Ren, Bingbing Liu,

(参考訳) Range-View(RV)ベースの3次元点群セグメンテーションは、そのコンパクトなデータ形式のために広く採用されている。しかし, RVベースの手法は、遮蔽された点に対して頑健なセグメンテーションを提供できず、また3次元点群が疎であるために、投影されたRGB画像の歪みに悩まされる。これらの問題を緩和するために、LiDARとカメラを用いる新しいレンジビューベースの3次元点群セマンティックセグメンテーション手法(LaCRange)を提案する。具体的には、RGB画像のRV投影による悪影響を補うために、歪み補償知識蒸留(DCKD)戦略を設計する。さらに、頑健で情報を保持するセンサ融合のために、コンテキストベースの特徴融合モジュールを導入する。最後に, RVの解像度の限界と3次元トポロジ情報の不足に対処するため, 2次元での特徴の適切な集約と3次元での点特徴の拡張を行う新たな点リファインメント方式を考案する。提案手法を大規模自動運転データセットであるSemanticKITTIとnuScenesで評価した。提案手法はリアルタイムで動作することに加え, nuScenes ベンチマークで最先端の結果を達成する。

Range-View (RV)-based 3D point cloud segmentation is widely adopted due to its compact data form. However, RV-based methods fall short in providing robust segmentation for occluded points and suffer from distortion of projected RGB images due to the sparse nature of 3D point clouds. To alleviate these problems, we propose a new LiDAR and Camera Range-view-based 3D point cloud semantic segmentation method (LaCRange). Specifically, a distortion-compensating knowledge distillation (DCKD) strategy is designed to remedy the adverse effect of RV projection of RGB images. Moreover, a context-based feature fusion module is introduced for robust and preservative sensor fusion. Finally, in order to address the limited resolution of RV and its insufficiency of 3D topology, a new point refinement scheme is devised for proper aggregation of features in 2D and augmentation of point features in 3D. We evaluated the proposed method on large-scale autonomous driving datasets, i.e., SemanticKITTI and nuScenes. In addition to being real-time, the proposed method achieves state-of-the-art results on the nuScenes benchmark.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# RIO-CPD:相関を考慮したオンライン変化点検出のためのリーマン幾何学的手法

RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection ( http://arxiv.org/abs/2407.09698v1 )

ライセンス: Link先を確認

Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao,

(参考訳) 変化点検出の目的は、データ系列内の複数箇所に生じうる急激な変化を特定することである。このタスクは、データの周辺分布と同時分布の両方の変化を含む、さまざまな種類の変化が発生しうるオンライン環境では特に困難である。本稿では,測地線距離が相関の推移を正確に捉えるリーマン幾何学上で相関行列を逐次追跡することにより,これらの課題に取り組む。対称正定値行列多様体のリーマン幾何学と,変化点検出のための累積和統計量(CUSUM)を組み合わせた,ノンパラメトリックで相関を考慮したオンライン変化点検出フレームワークRio-CPDを提案する。 Rio-CPDは、現在の観測から過去の観測のFréchet平均までの測地線距離を計算することでCUSUMを強化する。リーマン幾何学に備える計量を慎重に選択することで、Rio-CPDは単純で計算効率が高い。合成データセットと実世界のデータセットの両方での実験結果から、Rio-CPDは検出精度と効率において既存手法を上回ることが示された。

The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannian geometry, where the geodesic distances accurately capture the development of correlations. We propose Rio-CPD, a non-parametric correlation-aware online change point detection framework that combines the Riemannian geometry of the manifold of symmetric positive definite matrices and the cumulative sum statistic (CUSUM) for detecting change points. Rio-CPD enhances CUSUM by computing the geodesic distance from present observations to the Fréchet mean of previous observations. With careful choice of metrics equipped to the Riemannian geometry, Rio-CPD is simple and computationally efficient. Experimental results on both synthetic and real-world datasets demonstrate that Rio-CPD outperforms existing methods in detection accuracy and efficiency.
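
The combination of a geodesic distance to a Fréchet mean with a CUSUM statistic can be sketched as follows. This is a minimal illustration under explicit assumptions, not the Rio-CPD implementation: it assumes the log-Euclidean metric (under which the Fréchet mean is the matrix exponential of the averaged matrix logarithms), non-overlapping windows, and hand-picked `drift` and `threshold` values; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def spd_log(m: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T


def window_corr(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Correlation matrix of one window, with a small ridge to keep it positive definite."""
    c = np.corrcoef(x, rowvar=False)
    return c + eps * np.eye(c.shape[0])


def detect_change(series: np.ndarray, window: int = 25,
                  drift: float = 2.5, threshold: float = 1.5):
    """Return the start index of the first window that raises an alarm, or None.

    For each window, the log-Euclidean geodesic distance between its correlation
    matrix and the Frechet mean of all earlier windows feeds a one-sided CUSUM.
    """
    past_logs = []  # matrix logs of earlier correlation matrices
    cusum = 0.0
    for start in range(0, len(series) - window + 1, window):
        log_c = spd_log(window_corr(series[start:start + window]))
        if past_logs:
            mean_log = np.mean(past_logs, axis=0)       # log of the Frechet mean
            dist = np.linalg.norm(log_c - mean_log)     # geodesic distance (Frobenius norm of logs)
            cusum = max(0.0, cusum + dist - drift)      # CUSUM update
            if cusum > threshold:
                return start
        past_logs.append(log_c)
    return None


# Synthetic example (assumption): the marginals stay roughly the same, but two
# coordinates become strongly correlated from sample 200 onwards.
rng = np.random.default_rng(0)
before = rng.normal(size=(200, 3))
after = rng.normal(size=(200, 3))
after[:, 1] = after[:, 0] + 0.1 * rng.normal(size=200)
series = np.vstack([before, after])

change_point = detect_change(series)
print(change_point)
```

The log-Euclidean choice is made here only because it gives a closed-form Fréchet mean; other metrics on the manifold of symmetric positive definite matrices would change `spd_log`-based steps but not the CUSUM logic.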

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# Pythonベースの化学フレームワークのシミュレーションにGPUアクセラレーションを導入する

Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework ( http://arxiv.org/abs/2407.09700v1 )

ライセンス: Link先を確認

Rui Li, Qiming Sun, Xing Zhang, Garnet Kin-Lic Chan,

(参考訳) PySCFの各種手法にGPUアクセラレーションを提供するモジュールであるGPU4PySCFの最初のバージョンを紹介する。中核機能として、Rys求積法を用いて、g関数までを含む縮約基底関数系に対する2電子反発積分(ERI)のGPU実装を提供する。これが量子化学のワークフローをどのように加速できるかの例として、積分ダイレクト法によるHartree-FockのFock行列構築と核勾配の構築においてERIを効率的に利用する方法を説明する。ベンチマーク計算では、PySCFのマルチスレッドCPU版Hartree-Fockコードに対して2桁の大幅な高速化が得られ、単一のNVIDIA A100 GPU上でGAMESSやQUICKなど他のGPU対応量子化学パッケージに匹敵する性能を示す。

We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF. As a core functionality, this provides a GPU implementation of two-electron repulsion integrals (ERIs) for contracted basis sets comprising up to g functions using Rys quadrature. As an illustration of how this can accelerate a quantum chemistry workflow, we describe how to use the ERIs efficiently in the integral-direct Hartree-Fock Fock build and nuclear gradient construction. Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF, and performance comparable to other GPU-accelerated quantum chemical packages including GAMESS and QUICK on a single NVIDIA A100 GPU.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# 優先順位付けされたリプレイと一般化の相互作用の検討

Investigating the Interplay of Prioritized Replay and Generalization ( http://arxiv.org/abs/2407.09702v1 )

ライセンス: Link先を確認

Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White,

(参考訳) 過去のデータを再利用し、サンプル効率を向上させるため、強化学習では、経験の再生は至るところで行われている。性能向上のために様々なスマートサンプリングスキームが導入されたが、今までのところ、一様サンプリングが最も一般的なアプローチである。 1つの例外は優先順位付き体験再生(PER)であり、サンプリングは動的プログラミングにおける優先順位付きスイーピングの成功にインスパイアされたTDエラーに比例して行われる。 PERの当初の作業では、Atariの改善が見られたが、その後の結果はさまざまだ。本稿ではPERの様々なバリエーションについて検討し、PERがいつ役に立つかを理解する。予測タスクでは,PERは表の設定で値の伝搬を改善することができるが,ニューラルネットワークと組み合わせた場合の挙動は著しく異なる。一般化を制御するためにターゲットネットワークのアップデートを遅らせたり、確率性を追跡するためにPERで期待されるTDエラーの見積を使用するなど、ある種の緩和は、PERやニューラルネットワークによるエラーの大規模なスパイクを回避することができるが、それでも一般的には、均一なリプレイよりも優れていない。制御タスクでは、優先順位付けされたどの変種も一貫して均一なリプレイを上回っていない。

Experience replay is ubiquitous in reinforcement learning, to reuse past data and improve sample efficiency. Though a variety of smart sampling schemes have been introduced to improve performance, uniform sampling by far remains the most common approach. One exception is Prioritized Experience Replay (PER), where sampling is done proportionally to TD errors, inspired by the success of prioritized sweeping in dynamic programming. The original work on PER showed improvements in Atari, but follow-up results are mixed. In this paper, we investigate several variations on PER, to attempt to understand where and when PER may be useful. Our findings in prediction tasks reveal that while PER can improve value propagation in tabular settings, behavior is significantly different when combined with neural networks. Certain mitigations -- like delaying target network updates to control generalization and using estimates of expected TD errors in PER to avoid chasing stochasticity -- can avoid large spikes in error with PER and neural networks, but nonetheless generally do not outperform uniform replay. In control tasks, none of the prioritized variants consistently outperform uniform replay.
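
The proportional prioritization referred to above can be sketched with the sampling rule from the original PER formulation, where transition i is drawn with probability proportional to (|TD error| + eps)^alpha and corrected with importance weights (N * P(i))^(-beta). The function names, hyperparameter values, and the choice to normalize weights by the batch maximum are illustrative assumptions, not details taken from this paper.

```python
import numpy as np


def per_probabilities(td_errors: np.ndarray, alpha: float = 0.6, eps: float = 1e-3) -> np.ndarray:
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha.

    alpha = 0 recovers uniform replay; larger alpha sharpens the prioritization.
    """
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()


def sample_batch(td_errors: np.ndarray, batch_size: int, rng: np.random.Generator,
                 alpha: float = 0.6, beta: float = 0.4):
    """Draw transition indices and their importance-sampling weights."""
    probs = per_probabilities(td_errors, alpha)
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance weights correct the bias from non-uniform sampling;
    # here they are scaled so the largest weight in the batch equals 1.
    weights = (len(probs) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()


td = np.array([0.1, 0.5, 2.0, 0.0])
probs = per_probabilities(td)
idx, weights = sample_batch(td, batch_size=8, rng=np.random.default_rng(0))
print(probs, idx, weights)
```

Setting `alpha=0.0` gives the uniform-replay baseline that the abstract compares against, which makes the two schemes easy to swap in an experiment.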

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# なんとエレガントな橋:多言語LLMは異なる言語でも同様に偏っている

What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages ( http://arxiv.org/abs/2407.09704v1 )

ライセンス: Link先を確認

Viktor Mihaylov, Aleksandar Shtedritski,

(参考訳) 本稿では,文法的ジェンダーのレンズによるLarge Language Models(LLMs)のバイアスについて検討する。心理言語学における基礎研究、特にジェンダーが言語知覚に与える影響の研究からインスピレーションを得た上で、多言語LLMを活用してボロディツキーの基礎実験(2003年)を再考し、拡張する。 LLMを文法性に関連する心理言語学的バイアスを調べるための新しい手法として,様々な言語で形容詞を持つ名詞を記述するモデルを提案し,特に文法性のある言語に焦点を当てた。特に, 名詞を記述するために LLM が用いている形容詞の文法的性別を予測するために, 男女・言語間の形容詞共起について検討し, 二項分類器を訓練する。意外なことに、単純な分類器は偶然以上の名詞の性別を予測できるだけでなく、言語間の移動可能性も示せる。 LLMは異なる言語で異なる単語を記述できるが、同様にバイアスを受ける。

This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender. Drawing inspiration from seminal works in psycholinguistics, particularly the study of gender's influence on language perception, we leverage multilingual LLMs to revisit and expand upon the foundational experiments of Boroditsky (2003). Employing LLMs as a novel method for examining psycholinguistic biases related to grammatical gender, we prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender. In particular, we look at adjective co-occurrences across gender and languages, and train a binary classifier to predict grammatical gender given adjectives an LLM uses to describe a noun. Surprisingly, we find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability. We show that while LLMs may describe words differently in different languages, they are biased similarly.

翻訳日:2024-07-16 21:18:20 公開日:2024-07-12

# バランスの取れたマルチモーダル学習のための診断と再学習

Diagnosing and Re-learning for Balanced Multimodal Learning ( http://arxiv.org/abs/2407.09705v1 )

ライセンス: Link先を確認

Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu,

(参考訳) モデルが特定のモダリティの学習に偏る不均衡なマルチモーダル学習問題を克服するため、既存手法は、モダリティ間の性能差を基準として、さまざまな観点からユニモーダルエンコーダの訓練を制御することを提案している。しかし、モダリティの容量に内在する限界は無視されている。情報量の乏しいモダリティが「worse-learnt」(学習が不十分)と認識されることがあり、その場合モデルはより多くのノイズを記憶することを強いられ、かえってマルチモーダルモデルの能力を損なう可能性がある。さらに、現在のモダリティ調整手法は、選択された学習不十分なモダリティだけに狭く集中し、他のモダリティの訓練を抑制することさえある。したがって、モダリティの容量に内在する限界を考慮し、バランス調整の際にすべてのモダリティを考慮することが不可欠である。そこで本研究では,Diagnosing & Re-learning手法を提案する。まず、ユニモーダル表現空間の分離性に基づいて各モダリティの学習状態を推定し、それを用いて対応するユニモーダルエンコーダをソフトに再初期化する。これにより、情報量の乏しいモダリティの過度な強調を避けられる。加えて、学習不十分なモダリティのエンコーダが強化されると同時に、他のモダリティの過剰な訓練も避けられる。その結果、マルチモーダル学習は効果的にバランスが取られ、強化される。複数種類のモダリティとマルチモーダルフレームワークを対象とした実験により、単純ながら効果的な本手法が、バランスの取れたマルチモーダル学習において優れた性能を示すことが確認された。ソースコードとデータセットは https://github.com/GeWu-Lab/Diagnosing_Relearning_ECCV2024 で公開されている。

To overcome the imbalanced multimodal learning problem, where models prefer the training of specific modalities, existing methods propose to control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. The scarcely informative modalities can be recognized as "worse-learnt" ones, which could force the model to memorize more noise, counterproductively affecting the multimodal model ability. Moreover, the current modality modulation methods narrowly concentrate on selected worse-learnt modalities, even suppressing the training of others. Hence, it is essential to consider the intrinsic limitation of modality capacity and take all modalities into account during balancing. To this end, we propose the Diagnosing & Re-learning method. The learning state of each modality is firstly estimated based on the separability of its uni-modal representation space, and then used to softly re-initialize the corresponding uni-modal encoder. In this way, the over-emphasizing of scarcely informative modalities is avoided. In addition, encoders of worse-learnt modalities are enhanced, simultaneously avoiding the over-training of other modalities. Accordingly, multimodal learning is effectively balanced and enhanced. Experiments covering multiple types of modalities and multimodal frameworks demonstrate the superior performance of our simple-yet-effective method for balanced multimodal learning. The source code and dataset are available at https://github.com/GeWu-Lab/Diagnosing_Relearning_ECCV2024.

翻訳日:2024-07-16 21:08:36 公開日:2024-07-12

# GOFA: 共同グラフ言語モデリングのための1対オール生成モデル

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling ( http://arxiv.org/abs/2407.09709v1 )

ライセンス: Link先を確認

Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang,

(参考訳) LLM(Large Language Models)やLVM(Large Vision Models)といった基盤モデルは、各分野において最も強力なツールの1つとして登場した。しかし、テキストデータや画像データとは異なり、グラフデータは決定的な構造を持っておらず、グラフ基盤モデル(GFM)を開発する上で大きな課題となっている。例えば、汎用グラフモデルを設計する現在の試みでは、グラフデータをLLMベースの予測のための言語形式に変換するか、あるいはLLMをアシスタントとして使いながらGNNモデルをトレーニングしている。前者は無制限のタスクを処理でき、後者はグラフ構造をよりよく捉えるが、両方を同時に達成できる既存研究はない。本稿では,自己教師型事前学習,タスクの流動性,グラフ認識という,GFMに望まれる重要な3つの特性を特定する。これらの特性を考慮し,従来の言語モデリングをグラフ領域に拡張し,新たな生成グラフ言語モデルGOFAを提案する。このモデルは、ランダムに初期化されたGNN層を凍結した事前学習済みLLMにインターリーブし、セマンティックおよび構造モデリング能力を有機的に組み合わせる。 GOFAは、新たに提案されたグラフレベルの次単語予測、質問応答、構造的タスクで事前訓練され、上記のGFM特性を獲得する。事前訓練されたモデルは、タスク解決能力を得るために下流タスクでさらに微調整される。微調整されたモデルは、様々な下流タスクで評価され、ゼロショットシナリオにおいて構造的および文脈的問題を解く強力な能力を示す。コードはhttps://github.com/JiaruiFeng/GOFAで公開されている。

Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as one of the most powerful tools in the respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph data into a language format for LLM-based prediction or still train a GNN model with LLM as an assistant. The former can handle unlimited tasks, while the latter captures graph structure much better -- yet, no existing work can achieve both simultaneously. In this paper, we identify three key desirable properties of a GFM: self-supervised pretraining, fluidity in tasks, and graph awareness. To account for these properties, we extend the conventional language modeling to the graph domain and propose a novel generative graph language model GOFA to solve the problem. The model interleaves randomly initialized GNN layers into a frozen pre-trained LLM so that the semantic and structural modeling abilities are organically combined. GOFA is pre-trained on newly proposed graph-level next-word prediction, question-answering, and structural tasks to obtain the above GFM properties. The pre-trained model is further fine-tuned on downstream tasks to obtain task-solving ability. The fine-tuned model is evaluated on various downstream tasks, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios. The code is available at https://github.com/JiaruiFeng/GOFA.

翻訳日:2024-07-16 21:08:35 公開日:2024-07-12

# DisQ: 量子分散システムのためのマルコフ決定プロセスに基づく言語

DisQ: A Markov Decision Process Based Language for Quantum Distributed Systems ( http://arxiv.org/abs/2407.09710v1 )

ライセンス: Link先を確認

Le Chang, Saitej Yavvari, Rance Cleaveland, Samik Basu, Liyi Li,

(参考訳) 量子コンピュータの開発は、単一拠点の量子コンピュータでエンタングルできる量子ビット数という重要な量子資源の制限にもかかわらず、大きなマイルストーンに達している。近年、単一拠点の量子コンピューティングと量子ネットワーク技術を組み合わせて分散量子システムを開発し、遠隔プロセッサを介して大きなエンタングル量子ビット群を構築し、量子アルゴリズムを分散的に実行できるようにする試みがなされている。本研究では,量子アルゴリズムの分散バージョンへの書き換えを容易にするフレームワークとしてDisQを提案する。 DisQの中核は分散量子プログラミング言語であり、化学抽象機械(CHAM)とマルコフ決定プロセス(MDP)の概念を組み合わせ、量子の並行動作と分散動作を明確に区別することを目的としている。本研究では,DisQ言語に基づいて,量子アルゴリズムとその分散バージョンの等価性を検証するシミュレーション関係を構築した。分散バージョンへの等価な書き換えを示すために、量子加算やショアのアルゴリズムなどのいくつかのケーススタディを示す。

The development of quantum computers has reached a great milestone, in spite of restrictions on important quantum resources: the number of qubits being entangled at a single-location quantum computer. Recently, there has been some work to combine single-location quantum computing and quantum networking techniques to develop distributed quantum systems such that large entangled qubit groups can be established through remote processors, and quantum algorithms can be executed distributively. We present DisQ as a framework to facilitate the rewrites of quantum algorithms to their distributed versions. The core of DisQ is a distributed quantum programming language that combines the concepts of Chemical Abstract Machine (CHAM) and Markov Decision Processes (MDP) with the objective of clearly distinguishing quantum concurrent and distributed behaviors. Based on the DisQ language, we develop a simulation relation for verifying the equivalence of a quantum algorithm and its distributed versions. We present several case studies, such as quantum addition and Shor's algorithm, to demonstrate their equivalent rewrites to distributed versions.

翻訳日:2024-07-16 21:08:35 公開日:2024-07-12

# Deep-TEMPEST: Deep Learningを使って意図しない電磁放射からHDMIを盗聴する

Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations ( http://arxiv.org/abs/2407.09717v1 )

ライセンス: Link先を確認

Santiago Fernández, Emilio Martínez, Gabriel Varela, Pablo Musé, Federico Larroca,

(参考訳) 本研究では,ケーブルやコネクタ,特にHDMIから意図せず放射される電磁波を解析することにより,デジタルビデオディスプレイを盗聴する問題に取り組む。この問題はTEMPESTとして知られている。アナログの場合(VGA)と比較して、デジタルの場合は、10ビット符号化によって帯域幅がはるかに大きくなり、観測信号と画素強度との対応も非線形になるため、より難しい。その結果、アナログ用に設計された盗聴システムをデジタルビデオに適用すると、不明瞭で読みにくい画像しか得られない。提案手法は、この問題を逆問題として捉え直し、観測された電磁信号を表示画像に写像する深層学習モジュールを訓練する。ただし、このアプローチでも信号の詳細な数学的解析が必要であり、これは同調すべき周波数を決定するためだけでなく、実際のTEMPESTセットアップを必要とせずにトレーニングサンプルを生成するためでもある。これにより、特に複数の構成を検討する場合に、時間が節約され、実サンプルを取得する必要がなくなる。本研究はテキストの平均文字誤り率の改善に重点を置いており、提案システムは利用可能な従来の実装と比べてこの誤り率を60パーセントポイント以上改善する。提案システムは、広く利用可能なSoftware Defined Radioに基づき、完全にオープンソースであり、人気のあるGNU Radioフレームワークにシームレスに統合されている。トレーニング用に生成したデータセットも公開しており、これにはシミュレーションデータと1000件以上の実キャプチャが含まれる。最後に、同様の原理に基づいて設計されたシステムによって盗聴されるリスクを最小限に抑えるための対策について論じる。

In this work, we address the problem of eavesdropping on digital video displays by analyzing the electromagnetic waves that unintentionally emanate from the cables and connectors, particularly HDMI. This problem is known as TEMPEST. Compared to the analog case (VGA), the digital case is harder due to a 10-bit encoding that results in a much larger bandwidth and non-linear mapping between the observed signal and the pixel's intensity. As a result, eavesdropping systems designed for the analog case obtain unclear and difficult-to-read images when applied to digital video. The proposed solution is to recast the problem as an inverse problem and train a deep learning module to map the observed electromagnetic signal back to the displayed image. However, this approach still requires a detailed mathematical analysis of the signal, firstly to determine the frequency at which to tune but also to produce training samples without actually needing a real TEMPEST setup. This saves time and avoids the need to obtain these samples, especially if several configurations are being considered. Our focus is on improving the average Character Error Rate in text, and our system improves this rate by over 60 percentage points compared to previous available implementations. The proposed system is based on widely available Software Defined Radio and is fully open-source, seamlessly integrated into the popular GNU Radio framework. We also share the dataset we generated for training, which comprises both simulated and over 1000 real captures. Finally, we discuss some countermeasures to minimize the potential risk of being eavesdropped by systems designed based on similar principles.

翻訳日:2024-07-16 21:08:35 公開日:2024-07-12

# CLOVER:コンテキストを考慮した長期オブジェクト視点と環境不変表現学習

CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning ( http://arxiv.org/abs/2407.09718v1 )

ライセンス: Link先を確認

Dongmyeong Lee, Amanda Adkins, Joydeep Biswas,

(参考訳) 多くのアプリケーションにおいて、ロボットは、オブジェクトインスタンスを識別したり、以前見たインスタンスを再識別する機能を含む、環境のオブジェクトレベルの理解の恩恵を受けることができる。オブジェクトの再識別は、視点が異なる場合や、天候や照明の変化によって外観が大きく変わるシーンでは困難である。オブジェクト再識別に関する研究の多くは特定のクラスに焦点を当てており、一般的なオブジェクトの再識別に取り組むアプローチは前景のセグメンテーションを必要とし、オクルージョン、屋外シーン、照明変化といった課題を限定的にしか考慮していない。この問題に対処するため、多様な照明条件と視点の下で8クラス557個のオブジェクトを観測した1,037,814件の観測を含む、実環境(in-the-wild)のオブジェクト再識別データセットであるCODa Re-IDを導入する。さらに,静的なオブジェクトインスタンスを区別できる、オブジェクト観測のための表現学習手法CLOVERを提案する。この結果から,CLOVERは照明条件や視点が変化する状況での静的オブジェクト再識別において優れた性能を示し,未知のインスタンスやクラスにも一般化できることがわかった。

In many applications, robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Most works on object re-identification focus on specific classes; approaches that address general object re-identification require foreground segmentation and have limited consideration of challenges such as occlusions, outdoor scenes, and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes, and can generalize to unseen instances and classes.

翻訳日:2024-07-16 21:08:35 公開日:2024-07-12

# MSEval:アルゴリズムモデルを評価する概念設計における材料選択のためのデータセット

MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models ( http://arxiv.org/abs/2407.09719v1 )

ライセンス: Link先を確認

Yash Patawari Jain, Daniele Grandi, Allin Groom, Brandon Cramer, Christopher McComb,

(参考訳) 材料選択は製造業から建設まで、多くの産業において重要な役割を担っている。材料選択は通常、設計者が設計ソリューションと意図した製造アプローチを反復的に洗練する、いくつかの概念設計のサイクル後に行われる。デザイン研究において、材料選択は一般に1つの正解を持つ最適化問題として扱われる。さらに、特定の種類のオブジェクトや設計機能に限定されることも多く、選択プロセスの計算コストと時間がかさむ可能性がある。本稿では,多種多様なデザインブリーフと基準にわたる専門家の材料評価からなる新しいデータセットであるMSEvalを紹介する。このデータは、概念設計のための材料選択の文脈における機械学習モデルの評価と修正を容易にするためのベンチマークとして機能するように設計されている。

Material selection plays a pivotal role in many industries, from manufacturing to construction. Material selection is usually carried out after several cycles of conceptual design, during which designers iteratively refine the design solution and the intended manufacturing approach. In design research, material selection is typically treated as an optimization problem with a single correct answer. Moreover, it is also often restricted to specific types of objects or design functions, which can make the selection process computationally expensive and time-consuming. In this paper, we introduce MSEval, a novel dataset which is comprised of expert material evaluations across a variety of design briefs and criteria. This data is designed to serve as a benchmark to facilitate the evaluation and modification of machine learning models in the context of material selection for conceptual design.

翻訳日:2024-07-16 21:08:35 公開日:2024-07-12

# 大規模言語モデル推論の高速化のためのマルチトークン結合投機的復号法

Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference ( http://arxiv.org/abs/2407.09722v1 )

ライセンス: Link先を確認

Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun,

(参考訳) Transformerベースの大規模言語モデル(LLM)は、様々なタスクでその能力を実証しているが、その推論にはかなりの時間とエネルギーコストがかかる。 LLM推論を高速化するために、投機的復号法はより小さなモデルを用いて1つのトークン列を提案し、その後、ターゲットの大モデルがそれをバッチで検証する。自己回帰復号法と比較すると、投機的復号法は大モデルの実行回数を減らしながら同じ数のトークンを生成するため、推論全体を1〜2倍高速化する。しかし、greedy decodingは、復号アルゴリズムの有効性を直接測る指標である出力パープレキシティの観点からは最適な復号アルゴリズムではない。投機的復号法よりも出力パープレキシティが良く、さらに効率も良いアルゴリズムは、実用上より有用となりうる。この一見矛盾する目標を達成するために、まず、各ステップで複数のトークンをその結合パープレキシティに基づいて貪欲に生成するマルチトークンジョイントグリーディデコーディング(MJGD)を導入する。これにより出力全体のパープレキシティが向上することを示す。しかし、MJGDの計算コストは実用上許容できない。そこで本研究では,MJGDを2つの側面から近似・高速化するマルチトークンジョイント投機的復号法(MJSD)を提案する。すなわち、大モデルの結合分布を小モデルの結合分布で近似し、検証ステップによって近似の精度を保証する。次に、ビームデコーディングを用いて結合分布からの系列生成を高速化する。バニラ投機的復号法と比較すると、MJSDには2つの利点がある。(1)MJGDの近似であるため、より良い出力パープレキシティを実現すること、(2)結合尤度による検証により、有効なパープレキシティを持つドラフトトークンの最長プレフィックス部分列を受理でき、効率が向上することである。

Transformer-based large language models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs. To accelerate LLM inference, speculative decoding uses a smaller model to propose one sequence of tokens, which are subsequently validated in batch by the target large model. Compared with autoregressive decoding, speculative decoding generates the same number of tokens with fewer runs of the large model, hence accelerating the overall inference by 1-2×. However, greedy decoding is not the optimal decoding algorithm in terms of output perplexity, which is a direct measurement of the effectiveness of a decoding algorithm. An algorithm that has better output perplexity and even better efficiency than speculative decoding can be more useful in practice. To achieve this seemingly contradictory goal, we first introduce multi-token joint greedy decoding (MJGD), which greedily generates multiple tokens at each step based on their joint perplexity. We show that it leads to better perplexity for the whole output. But the computation cost of MJGD is infeasible in practice. So we further propose multi-token joint speculative decoding (MJSD), which approximates and accelerates the MJGD from two aspects: it approximates the joint distribution of the large model with that of a small model, and uses a verification step to guarantee the accuracy of approximation; then it uses beam decoding to accelerate the sequence generation from the joint distribution. Compared with vanilla speculative decoding, MJSD has two advantages: (1) it is an approximation of MJGD, thus achieving better output perplexity; (2) verification with joint likelihood allows it to accept the longest prefix sub-sequence of the draft tokens with valid perplexity, leading to better efficiency...
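
The draft-then-verify loop described in this abstract can be sketched as follows. This is a minimal illustration of vanilla greedy speculative decoding only, not of MJGD or MJSD, and it is not the authors' code. The names `toy_target`, `toy_draft`, `greedy_decode`, and `speculative_greedy_decode` are hypothetical, and the two "models" are deterministic toy functions rather than neural networks.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def toy_target(prefix: List[int]) -> int:
    """Stand-in for the large target model (assumption: a toy arithmetic rule)."""
    return (3 * prefix[-1] + len(prefix)) % 11


def toy_draft(prefix: List[int]) -> int:
    """Stand-in for the small draft model: agrees with the target except
    when the prefix length is a multiple of 4."""
    guess = toy_target(prefix)
    return (guess + 1) % 11 if len(prefix) % 4 == 0 else guess


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive greedy decoding: one target call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_greedy_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft proposes up to k tokens; the target verifies them and keeps the
    longest agreeing prefix plus one token of its own.

    Returns the sequence and the number of verification passes. In a real
    system each pass is one batched forward pass of the target model; here
    it is emulated with per-position calls.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # target's correction replaces the bad draft token
                break
        else:
            # Every draft token was accepted; the same pass also yields one extra token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq, passes
```

Because every emitted token equals the target's greedy choice for the current prefix, the output matches plain greedy decoding while needing fewer verification passes than tokens generated.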

翻訳日:2024-07-16 21:08:35 公開日:2024-07-12

# FlashAttention-3: 非同期と低精度で高速で正確な注意

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision ( http://arxiv.org/abs/2407.08608v2 )

ライセンス: Link先を確認

Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao,

(参考訳) 至る所で使われるTransformerアーキテクチャのコアレイヤであるアテンションは、大規模言語モデルと長文脈アプリケーションのボトルネックとなっている。 FlashAttentionは、メモリの読み書きを最小化することでGPU上のアテンションを高速化するアプローチを示した。しかし、FlashAttention-2のH100 GPU上での利用率はわずか35%にとどまっており、最近のハードウェアに備わる新機能をまだ活用できていない。我々はHopper GPU上でアテンションを高速化する3つの主要な手法を開発する。Tensor CoreとTMAの非同期性を利用して、(1)ワープ特殊化により計算全体とデータ移動をオーバーラップさせ、(2)ブロック単位の行列積とソフトマックス演算をインターリーブする。さらに(3)FP8低精度のハードウェアサポートを活用するブロック量子化とインコヒーレント処理を用いる。提案手法であるFlashAttention-3は,H100 GPU上で1.5〜2.0倍の高速化を達成し,FP16では最大740 TFLOPs/s(利用率75%),FP8では1.2 PFLOPs/sに迫ることを示す。また、FP8 FlashAttention-3がベースラインのFP8アテンションよりも2.6倍低い数値誤差を達成することを検証する。

Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications. FlashAttention elaborated an approach to speed up attention on GPUs through minimizing memory reads/writes. However, it has yet to take advantage of new capabilities present in recent hardware, with FlashAttention-2 achieving only 35% utilization on the H100 GPU. We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. We demonstrate that our method, FlashAttention-3, achieves speedup on H100 GPUs by 1.5-2.0× with FP16 reaching up to 740 TFLOPs/s (75% utilization), and with FP8 reaching close to 1.2 PFLOPs/s. We validate that FP8 FlashAttention-3 achieves 2.6× lower numerical error than a baseline FP8 attention.
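
The block-wise interleaving of score computation and softmax mentioned above rests on a running-maximum rescaling trick, which the sketch below illustrates for a single query in plain Python. It shows only this numerical idea; warp specialization, TMA, and FP8 handling are GPU-specific and are not modeled. The function names and the block size are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_reference(q: Sequence[float], keys: List[List[float]],
                        values: List[List[float]]) -> List[float]:
    """Standard softmax attention for one query: all scores are materialized."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z
            for j in range(len(values[0]))]


def attention_blockwise(q: Sequence[float], keys: List[List[float]],
                        values: List[List[float]], block_size: int = 2) -> List[float]:
    """Same result, but keys/values are consumed block by block with a
    running max, running normalizer, and running weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * dot(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first block).
        rescale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(weights)
        acc = [a * rescale + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The block-wise version never holds more than `block_size` scores at once, which is the property that lets GPU kernels keep intermediate results in fast on-chip memory.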

翻訳日:2024-07-16 11:29:58 公開日:2024-07-12

# 自動倉庫レイアウト生成のための新しいフレームワーク

A Novel Framework for Automated Warehouse Layout Generation ( http://arxiv.org/abs/2407.08633v2 )

ライセンス: Link先を確認

Atefeh Shahroudnejad, Payam Mousavi, Oleksii Perepelytsia, Sahir, David Staszak, Matthew E. Taylor, Brent Bawel,

(参考訳) 倉庫レイアウトの最適化は、効率と生産性に大きな影響を与えるため、非常に重要である。自動倉庫レイアウト生成のためのAI駆動フレームワークを提案する。このフレームワークは制約付きビームサーチを用いて、与えられた空間パラメータの範囲内で、すべての機能要件を満たす最適なレイアウトを導出する。生成したレイアウトの実現可能性は、アイテムへのアクセス性、必要最小限のクリアランス、通路の接続性といった基準に基づいて検証される。次に、保管場所の数、アクセスポイント、アクセスコストを考慮したスコア関数を用いて、実現可能なレイアウトを評価する。本手法は, 様々な倉庫の寸法や形状, 多様なドア配置, 接続関係に対して, 実現可能で最適なレイアウトを生成できることを示す。このアプローチは現在デプロイに向けて準備中であり、人間の設計者が選択肢を素早く探索・確認できるようにし、ユースケースに最適なレイアウトの選択を容易にする。

Optimizing warehouse layouts is crucial due to its significant impact on efficiency and productivity. We present an AI-driven framework for automated warehouse layout generation. This framework employs constrained beam search to derive optimal layouts within given spatial parameters, adhering to all functional requirements. The feasibility of the generated layouts is verified based on criteria such as item accessibility, required minimum clearances, and aisle connectivity. A scoring function is then used to evaluate the feasible layouts considering the number of storage locations, access points, and accessibility costs. We demonstrate our method's ability to produce feasible, optimal layouts for a variety of warehouse dimensions and shapes, diverse door placements, and interconnections. This approach, currently being prepared for deployment, will enable human designers to rapidly explore and confirm options, facilitating the selection of the most appropriate layout for their use-case.
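
As a rough illustration of constrained beam search with a feasibility check and a scoring function, the sketch below lays out a single row of cells as storage (`S`) or aisle (`A`). Everything here is an assumption made for illustration: the one-dimensional layout, the accessibility rule (each storage cell must touch an aisle), and the score (number of storage cells) are far simpler than the framework described in the abstract.

```python
def accessible(layout: str, complete: bool) -> bool:
    """Feasibility check (assumed rule): every storage cell 'S' whose
    neighbours are all known must be adjacent to an aisle 'A'."""
    n = len(layout)
    for i, cell in enumerate(layout):
        if cell != "S":
            continue
        if not complete and i == n - 1:
            continue  # right neighbour has not been placed yet
        left = i > 0 and layout[i - 1] == "A"
        right = i + 1 < n and layout[i + 1] == "A"
        if not (left or right):
            return False
    return True


def beam_search_layout(n_cells: int, beam_width: int = 4) -> str:
    """Grow layouts cell by cell, pruning infeasible prefixes and keeping
    the beam_width prefixes with the most storage cells."""
    beam = [""]
    for step in range(n_cells):
        complete = step == n_cells - 1
        expanded = [prefix + cell for prefix in beam for cell in "SA"]
        feasible = [p for p in expanded if accessible(p, complete)]
        # Score function (assumed): number of storage locations.
        feasible.sort(key=lambda p: p.count("S"), reverse=True)
        beam = feasible[:beam_width]
    return beam[0]
```

Appending an aisle to any feasible prefix keeps it feasible, so the beam never becomes empty. Beam search is a heuristic, so a narrow beam can miss the best layout in larger instances.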

翻訳日:2024-07-16 11:29:58 公開日:2024-07-12

# エンボディード・チェーン・オブ・ソート推論によるロボット制御

Robotic Control via Embodied Chain-of-Thought Reasoning ( http://arxiv.org/abs/2407.08693v2 )

ライセンス: Link先を確認

Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine,

(参考訳) 学習したロボット制御ポリシーの重要な制限は、トレーニングデータの外部で一般化できないことである。視覚言語行動モデル(VLA)に関する最近の研究は、学習されたロボットポリシーのバックボーンとして、インターネット規模で事前学習された大規模な視覚言語モデルを使用することで、その堅牢性と一般化能力を大幅に向上できることを示した。しかし、他の領域における大規模視覚言語モデルの最もエキサイティングな能力の1つは、複雑な問題を通して反復的に推論できる能力である。同じ能力をロボティクスに持ち込み、行動する前に与えられたタスクについて推論することで性能を向上させるポリシーを実現できるだろうか? チェーン・オブ・ソート(CoT)スタイルのプロンプトを単純に用いても、標準的なVLAでは利用可能なトレーニング例が比較的単純であるため、効果は大幅に低い。さらに、通常のCoTでよく見られるサブタスクに関する純粋に意味論的な推論は、感覚観察やロボットの状態に推論を根ざす必要があるロボットポリシーには不十分である。この目的のために、我々はVLAのためのEmbodied Chain-of-Thought Reasoning (ECoT)を導入し、ロボットの動作を予測する前に、計画、サブタスク、動き、そしてオブジェクトのバウンディングボックスやエンドエフェクタ位置のような視覚的に接地された特徴について複数ステップの推論を実行するようにVLAを訓練する。大規模ロボットデータセット上でECoTのための合成トレーニングデータを生成するスケーラブルなパイプラインを設計する。 ECoTは、現在最強のオープンソースVLAポリシーであるOpenVLAの絶対成功率を、追加のロボットトレーニングデータなしで、困難な一般化タスクにおいて28%向上させることを示す。さらに、ECoTは、人間がポリシーの失敗を解釈し、自然言語を使って行動を修正することを容易にする。

A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability. Yet, one of the most exciting capabilities of large vision-language models in other domains is their ability to reason iteratively through complex problems. Can that same capability be brought into robotics to allow policies to improve performance by reasoning about a given task before acting? Naive use of "chain-of-thought" (CoT) style prompting is significantly less effective with standard VLAs because of the relatively simple training examples that are available to them. Additionally, purely semantic reasoning about sub-tasks, as is common in regular CoT, is insufficient for robot policies that need to ground their reasoning in sensory observations and the robot state. To this end, we introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features like object bounding boxes and end effector positions, before predicting the robot action. We design a scalable pipeline for generating synthetic training data for ECoT on large robot datasets. We demonstrate that ECoT increases the absolute success rate of OpenVLA, the current strongest open-source VLA policy, by 28% across challenging generalization tasks, without any additional robot training data. Additionally, ECoT makes it easier for humans to interpret a policy's failures and correct its behavior using natural language.

翻訳日:2024-07-16 11:29:58 公開日:2024-07-12

# Embodied Computational Agents を用いた連続的発達神経シミュレーション

Continual Developmental Neurosimulation Using Embodied Computational Agents ( http://arxiv.org/abs/2103.05753v3 )

ライセンス: Link先を確認

Bradly Alicea, Rishabh Chakrabarty, Stefan Dvoretskii, Akshara Gopi, Avery Lim, Jesse Parent,

(参考訳) 発達生物学、認知科学、計算モデリングの統合を通じて学ぶべきことはたくさんある。私たちの進路には、Braitenberg Vehiclesをベースとした、発達にインスパイアされた学習エージェントの設計が含まれる。連続的発達神経シミュレーションにより、神経系の形態形成、発達学習、可塑性という関連現象を橋渡しする上での発達軌跡の役割を考察することができる。本手法は、連続学習と密接に結びついており、発達的身体性と緊密に統合されており、発達的Braitenberg Vehicles (dBVs) と呼ばれるエージェントを用いて実装することができる。 dBVは、未定義の構造の集合として生を受け、それが体、センサー、エフェクター、神経系を含むエージェントベースのシステムへと変化する。この表現型は発達のタイミング、すなわち異なる形態形成期、臨界期、獲得(発達学習)期によって特徴づけられる。さらに,ネットワーク形態形成は遺伝的アルゴリズムのアプローチで実現でき,発達学習は多数の計算手法を用いて実装できることを提案する。このアプローチは、発達的アプローチから生じうる適応的エージェントの振る舞いのためのフレームワークを提供する。すなわち、臨界期や成長と獲得、明示的に身体化されたネットワークアーキテクチャ、そして神経ネットワークの組み立てとこれらのネットワーク上でのアクティブな学習の区別を活用することである。結論として、エージェントの学習と発達を、非常に短い(<100ms)間隔から長期的進化まで、異なる時間スケールで検討する。身体化されたエージェントベースのアプローチにおける発達、進化、学習は、生物学的にインスパイアされたインテリジェンスの統合的視点の鍵となる。

There is much to learn through synthesis of Developmental Biology, Cognitive Science and Computational Modeling. Our path forward involves a design for developmentally-inspired learning agents based on Braitenberg Vehicles. Continual developmental neurosimulation allows us to consider the role of developmental trajectories in bridging the related phenomena of nervous system morphogenesis, developmental learning, and plasticity. Being closely tied to continual learning, our approach is tightly integrated with developmental embodiment, and can be implemented using a type of agent called developmental Braitenberg Vehicles (dBVs). dBVs begin their lives as a set of undefined structures that transform into agent-based systems including a body, sensors, effectors, and nervous system. This phenotype is characterized in terms of developmental timing: with distinct morphogenetic, critical, and acquisition (developmental learning) periods. We further propose that network morphogenesis can be accomplished using a genetic algorithmic approach, while developmental learning can be implemented using a number of computational methodologies. This approach provides a framework for adaptive agent behavior that might result from a developmental approach: namely by exploiting critical periods or growth and acquisition, an explicitly embodied network architecture, and a distinction between the assembly of neuronal networks and active learning on these networks. In conclusion, we will consider agent learning and development at different timescales, from very short (<100ms) intervals to long-term evolution. The development, evolution, and learning in an embodied agent-based approach is key to an integrative view of biologically-inspired intelligence.

翻訳日:2024-07-16 06:11:12 公開日:2024-07-12

# CLIP-PAE: 解きほぐされた(disentangled)、解釈可能な、制御可能なテキストガイド型顔マニピュレーションのための関連特徴抽出のための投影拡張埋め込み

CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation ( http://arxiv.org/abs/2210.03919v5 )

ライセンス: Link先を確認

Chenliang Zhou, Fangcheng Zhong, Cengiz Oztireli,

(参考訳) 最近導入されたContrastive Language-Image Pre-Training (CLIP) は、画像とテキストを結合潜在空間に埋め込むことで両者を橋渡しする。これにより、テキストによる説明を与えて入力画像を操作することを目的とした多くの研究への扉が開かれた。しかし、結合空間における画像埋め込みとテキスト埋め込みの相違により、最適化ターゲットとしてテキスト埋め込みを用いると、結果の画像に望ましくないアーティファクトがしばしば導入される。解きほぐし(disentanglement)、解釈可能性、制御性も操作において保証が難しい。これらの問題を緩和するために、特定の画像の特徴を捉えるための、関連するプロンプトによって張られるコーパス部分空間を定義することを提案する。テキスト誘導画像操作の性能向上のための最適化ターゲットとして,CLIPプロジェクション拡張埋め込み(PAE)を導入する。提案手法は,容易に計算・適応でき,任意のCLIPベースの画像操作アルゴリズムにスムーズに組み込むことができる,シンプルで汎用的なパラダイムである。本手法の有効性を実証するため,いくつかの理論的および実証的研究を行った。ケーススタディとして,テキスト誘導型セマンティックフェイス編集に本手法を用いる。我々は、PAEが、最先端の品質と精度で、より解きほぐされ、解釈可能で、制御可能な画像操作を促進することを定量的かつ定性的に示す。プロジェクトページ: https://chenliang-zhou.github.io/CLIP-PAE/。

Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. To demonstrate the effectiveness of our method, we conduct several theoretical and empirical studies. As a case study, we utilize the method for text-guided semantic face editing. We quantitatively and qualitatively demonstrate that PAE facilitates a more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy. Project page: https://chenliang-zhou.github.io/CLIP-PAE/.

翻訳日:2024-07-16 06:11:12 公開日:2024-07-12

# 周波数アップコンバータを用いた低周波電磁モードの量子力学

Quantum metrology of low frequency electromagnetic modes with frequency upconverters ( http://arxiv.org/abs/2210.05576v2 )

ライセンス: Link先を確認

Stephen E. Kuenstner, Elizabeth C. van Assendelft, Saptarshi Chaudhuri, Hsiao-Mei Cho, Jason Corbin, Shawn W. Henderson, Fedja Kadribasic, Dale Li, Arran Phipps, Nicholas M. Rapidis, Maria Simanovskaia, Jyotirmai Singh, Cyndia Yu, Kent D. Irwin,

(参考訳) 本稿では、RF量子アップコンバータ(RQU)と、dcから超短波帯(VHF、≲300 MHz)までの電磁モードの量子メトロロジーへの応用について述べる。 RQUは、超伝導ループとジョセフソン接合からなるジョセフソン干渉計を用いて、低周波電磁モード(dcからVHFまで)とマイクロ波Cバンド(~5 GHz)のモードとの間のパラメトリック相互作用を実現する。これは、共振器オプトメカニクスにおける電磁モードと機械モードの間の放射圧相互作用に類似している。我々は量子増幅器理論を用いてRQUの性能を解析し、RQUがこの周波数範囲で量子限界のオペアンプとして動作可能であることを示す。また、バックアクション回避(BAE)測定、サイドバンド冷却、二モードスクイージングなど、共振器オプトメカニクスで使用されるものと等価な非古典的測定プロトコルを使用することもできる。これらのプロトコルにより、dc〜VHFの電磁モードを、標準量子限界(SQL)を上回る感度を持つ量子センサとして用いる実験が可能になる。 RQUを用いて低周波からマイクロ波Cバンドへの信号アップコンバージョンを実証し、完全なBAEの実現に向けて必要なステップである46.9 dBの位相感応ゲイン(消光比)を示す。

We present the RF Quantum Upconverter (RQU) and describe its application to quantum metrology of electromagnetic modes between dc and the Very High Frequency band (VHF) (≲300 MHz). The RQU uses a Josephson interferometer made up of superconducting loops and Josephson junctions to implement a parametric interaction between a low-frequency electromagnetic mode (between dc and VHF) and a mode in the microwave C Band (~5 GHz), analogous to the radiation pressure interaction between electromagnetic and mechanical modes in cavity optomechanics. We analyze RQU performance with quantum amplifier theory, and show that the RQU can operate as a quantum-limited op-amp in this frequency range. It can also use non-classical measurement protocols equivalent to those used in cavity optomechanics, including back-action evading (BAE) measurements, sideband cooling, and two-mode squeezing. These protocols enable experiments using dc-VHF electromagnetic modes as quantum sensors with sensitivity better than the Standard Quantum Limit (SQL). We demonstrate signal upconversion from low frequencies to microwave C band using an RQU and show a phase-sensitive gain (extinction ratio) of 46.9 dB, which is a necessary step towards the realization of full BAE.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# Laplace Approximationsによるディープラーニングの効率的なベイズ更新

Efficient Bayesian Updates for Deep Learning via Laplace Approximations ( http://arxiv.org/abs/2210.06112v2 )

ライセンス: Link先を確認

Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Zhixin Huang, Daniel Kottke, Stephan Vogt, Bernhard Sick,

(参考訳) ディープニューラルネットワークのトレーニングには多大な計算リソースが必要であるため、トレーニングデータセットを新しいデータで拡張することは、通常は完全な再トレーニングを必要とし、難しい。さらに、特定のアプリケーションでは、時間や計算上の制約によりコストのかかる再トレーニングが許されない。最終層ラプラス近似を用いたディープニューラルネットワークのための新しいベイズ更新手法を提案することで、この問題に対処する。具体的には、ラプラス近似のガウス事後分布に二次最適化手法を適用し、逆ヘッセ行列を閉形式で計算する。これにより、定常的な設定において新たなデータが到着した際に、高速かつ効果的な更新が可能になる。さまざまなデータモダリティにわたる大規模な評価により、本更新手法がコストのかかる再トレーニングに代わる、高速で競争力のある手段であることを確認する。さらに、本更新手法を用いて既存の選択戦略を改善することで、深層能動学習のシナリオにおける適用性を示す。

Since training deep neural networks takes significant computational resources, extending the training dataset with new data is difficult, as it typically requires complete retraining. Moreover, specific applications do not allow costly retraining due to time or computational constraints. We address this issue by proposing a novel Bayesian update method for deep neural networks by using a last-layer Laplace approximation. Concretely, we leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation, computing the inverse Hessian matrix in closed form. This way, our method allows for fast and effective updates upon the arrival of new data in a stationary setting. A large-scale evaluation study across different data modalities confirms that our updates are a fast and competitive alternative to costly retraining. Furthermore, we demonstrate its applicability in a deep active learning scenario by using our update to improve existing selection strategies.
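
A closed-form Gaussian posterior update of the kind alluded to above can be illustrated with the simplest analogue: Bayesian linear regression on fixed last-layer features, where the inverse Hessian is updated with the Sherman-Morrison identity. This is a sketch under that assumption, not the method of the paper, which targets neural-network classifiers. The function name `bayes_update` and the parameter `noise_var` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def matvec(a: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in a]


def bayes_update(mean: Vector, cov: Matrix, phi: Vector, y: float,
                 noise_var: float = 1.0) -> Tuple[Vector, Matrix]:
    """Update a Gaussian posterior N(mean, cov) over last-layer weights with
    one new observation (phi, y), without refitting on the old data.

    cov plays the role of the inverse Hessian; Sherman-Morrison gives
    cov' = cov - (cov phi)(cov phi)^T / (noise_var + phi^T cov phi).
    """
    cov_phi = matvec(cov, phi)
    denom = noise_var + dot(phi, cov_phi)
    gain = [c / denom for c in cov_phi]
    residual = y - dot(phi, mean)
    new_mean = [m + g * residual for m, g in zip(mean, gain)]
    n = len(mean)
    new_cov = [[cov[i][j] - gain[i] * cov_phi[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov
```

Each update costs a matrix-vector product and an outer product in the feature dimension, independent of how much data was seen before.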

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# 良い説明とは何か:説明の性質の調和した見方

What Makes a Good Explanation?: A Harmonized View of Properties of Explanations ( http://arxiv.org/abs/2211.05667v3 )

ライセンス: Link先を確認

Zixi Chen, Varshini Subhash, Marton Havasi, Weiwei Pan, Finale Doshi-Velez,

(参考訳) 解釈可能性(Interpretability)は、人間が機械学習(ML)モデルの側面を検証する手段を提供し、タスクを完全に自動化できない状況において、人間とMLのコラボレーションを強化する。異なる文脈は異なる性質を持つ説明を必要とする。例えば、早期の心停止警告システムがケア環境に統合される準備ができているかを決定するのに必要な説明の種類は、ローン申請者がアプリケーションを成功させるために必要なアクションを決定するのに必要な説明の種類とは大きく異なります。残念ながら、説明の性質に関して、標準化の欠如がある:異なる論文は、同じ用語を異なる量を意味するために、異なる用語を同じ量を意味するために使用する。この標準化された用語の欠如とML説明の性質の分類は、解釈可能な機械学習手法を厳格に比較することと、どの文脈でどの特性が必要なのかを識別することの両方を妨げます。本研究では、解釈可能な機械学習論文で定義された特性を調査し、実際に測定したものに基づいてそれらを合成し、それらの特性の異なる定式化間のトレードオフを記述する。そこで我々は,タスクに適した説明属性の定式化や,解釈可能な機械学習における今後の作業の標準化について,より情報的な選択を可能にする。

Interpretability provides a means for humans to verify aspects of machine learning (ML) models and empower human+ML teaming in situations where the task cannot be fully automated. Different contexts require explanations with different properties. For example, the kind of explanation required to determine if an early cardiac arrest warning system is ready to be integrated into a care setting is very different from the type of explanation required for a loan applicant to help determine the actions they might need to take to make their application successful. Unfortunately, there is a lack of standardization when it comes to properties of explanations: different papers may use the same term to mean different quantities, and different terms to mean the same quantity. This lack of a standardized terminology and categorization of the properties of ML explanations prevents us from both rigorously comparing interpretable machine learning methods and identifying what properties are needed in what contexts. In this work, we survey properties defined in interpretable machine learning papers, synthesize them based on what they actually measure, and describe the trade-offs between different formulations of these properties. In doing so, we enable more informed selection of task-appropriate formulations of explanation properties as well as standardization for future work in interpretable machine learning.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# 多文書要約モデルは合成されるか?

Do Multi-Document Summarization Models Synthesize? ( http://arxiv.org/abs/2301.13844v2 )

ライセンス: Link先を確認

Jay DeYoung, Stephanie C. Martinez, Iain J. Marshall, Byron C. Wallace,

(参考訳) 多文書要約では、入力の集合に対する簡潔な概要を生成する。一部のアプリケーションでは、概要は重要な側面に関して入力を正確に統合(synthesize)する必要がある。例えば、特定の映画について書かれた映画レビューの概要は、批評家の平均的なコンセンサスを反映すべきである。より重大な例として、臨床試験結果の生物医学系統的レビューに付随する叙述的要約は、個々の試験から得られる、矛盾する可能性のある結果を正確に要約すべきである。本稿では,現代の多文書要約モデルがこの種の統合をどの程度暗黙的に行っているのかを問う。我々は、微調整されたトランスフォーマーからGPT-4まで、一連の要約モデルを用いて、意見とエビデンスの統合データセットに関する実験を行う。既存のモデルは部分的には統合を行うが不完全であり、最高性能のモデルでさえ入力順序の変化に過敏であり、入力構成の変化(例えば、正と負のレビューの比率)には鈍感である。我々は、モデルの統合能力を向上させるための単純で汎用的かつ効果的な手法を提案する。この手法は、明示的に多様な候補出力の集合を生成し、その中から入力に対して期待される集約指標に最もよく合致する文字列を選択するか、モデルが良い候補を生成しない場合は出力を控える。

Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key aspect, e.g., a synopsis of film reviews written about a particular movie should reflect the average critic consensus. As a more consequential example, narrative summaries that accompany biomedical systematic reviews of clinical trial results should accurately summarize the potentially conflicting results from individual trials. In this paper we ask: To what extent do modern multi-document summarization models implicitly perform this sort of synthesis? We run experiments over opinion and evidence synthesis datasets using a suite of summarization models, from fine-tuned transformers to GPT-4. We find that existing models partially perform synthesis, but imperfectly: even the best performing models are over-sensitive to changes in input ordering and under-sensitive to changes in input compositions (e.g., ratio of positive to negative reviews). We propose a simple, general, effective method for improving model synthesis capabilities by generating an explicitly diverse set of candidate outputs, and then selecting from these the string best aligned with the expected aggregate measure for the inputs, or abstaining when the model produces no good candidate.
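
The select-or-abstain step at the end of this abstract can be sketched as follows. The sketch assumes that each input and each candidate summary can be mapped to a scalar (for example a sentiment score) and that the expected aggregate is the mean of the input scores. The names `select_or_abstain`, `score_fn`, and `tolerance` are hypothetical, and generating the diverse candidates is left outside the sketch.

```python
from typing import Callable, Optional, Sequence


def select_or_abstain(
    candidates: Sequence[str],
    input_scores: Sequence[float],
    score_fn: Callable[[str], float],
    tolerance: float = 0.2,
) -> Optional[str]:
    """Pick the candidate whose score is closest to the aggregate of the
    inputs; return None (abstain) if even the best one is too far off.

    Assumption: the expected aggregate is the arithmetic mean of the
    per-input scores, and score_fn measures the same quantity on a summary.
    """
    if not candidates or not input_scores:
        return None
    target = sum(input_scores) / len(input_scores)
    best = min(candidates, key=lambda c: abs(score_fn(c) - target))
    if abs(score_fn(best) - target) > tolerance:
        return None
    return best
```

In practice `score_fn` would be a trained classifier or regressor and the candidates would come from diverse decoding of the summarization model; both are stand-ins here.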

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
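
The selection step described in the abstract above (generate diverse candidates, keep the one best aligned with the expected aggregate of the inputs, otherwise abstain) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `select_or_abstain`, `toy_sentiment`, the word lists, and the `tolerance` threshold are all hypothetical names and values chosen for the example, and the aggregate is assumed to be a simple mean of per-input scores.

```python
from statistics import mean

# Hypothetical word lists for a toy sentiment scorer (assumption, not from the paper).
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "dull", "weak"}


def toy_sentiment(text):
    """Return the fraction of sentiment words that are positive (0.5 if none)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.5 if pos + neg == 0 else pos / (pos + neg)


def select_or_abstain(candidates, input_scores, score_fn, tolerance=0.2):
    """Pick the candidate whose score is closest to the mean input score.

    Returns None (abstain) when there are no candidates or when even the
    closest candidate is farther than `tolerance` from the target.
    """
    if not candidates:
        return None
    target = mean(input_scores)
    best = min(candidates, key=lambda c: abs(score_fn(c) - target))
    if abs(score_fn(best) - target) > tolerance:
        return None
    return best


# Three positive reviews and one negative review: expected aggregate is 0.75.
review_scores = [1.0, 1.0, 0.0, 1.0]
candidates = [
    "great and enjoyable",
    "good but dull",
    "great enjoyable good yet weak",
]
print(select_or_abstain(candidates, review_scores, toy_sentiment))
```

In practice the scorer would be a trained sentiment or effect-direction model; the point of the sketch is only the select-or-abstain control flow.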

# 一般化性能向上のためのアダムの適応ステップ範囲の抑制について

On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance ( http://arxiv.org/abs/2302.01029v3 )

ライセンス: Link先を確認

Guoqiang Zhang,

(参考訳) 近年のいくつかの適応オプティマイザは、基本的に適応ステップサイズの分散を減らし、モーメンタム付きSGDに近づけることにより、Adamの一般化性能を向上させる。上記のモチベーションに従い、層ごとの勾配統計を利用して、Adamの適応ステップサイズの範囲を抑える。特に、各イテレーションにおいて、DNNモデルの更新に使用する前に、第2モーメントv_tに対して連続して3つの操作を実行することを提案する:(1)ダウンスケーリング、(2)エプシロン埋め込み、(3)ダウントランスレーティング。結果のアルゴリズムはSET-Adamと呼ばれ、SETは3つの操作の簡単な表記法である。 v_tの層ごとのサブベクトルと対応するオールワンサブベクトルとの角度を利用して、v_t上のダウンスケーリング操作を層ごとに行う。 SET-Adam は,NLP 向けのトランスフォーマーと LSTM,および CIFAR10 と CIFAR100 の画像分類向けの VGG と ResNet のトレーニングにおいて 8 つの適応最適化器より優れており,画像生成タスクの WGAN-GP モデルのトレーニングでは 8 つの適応手法の最良性能に匹敵する。さらに、SET-AdamはImageNet上でResNet18をトレーニングする際に、AdamやAdaBeliefよりも高い検証精度を達成する。

A number of recent adaptive optimizers improve the generalisation performance of Adam by essentially reducing the variance of adaptive stepsizes to get closer to SGD with momentum. Following the above motivation, we suppress the range of the adaptive stepsizes of Adam by exploiting the layerwise gradient statistics. In particular, at each iteration, we propose to perform three consecutive operations on the second momentum v_t before using it to update a DNN model: (1): down-scaling, (2): epsilon-embedding, and (3): down-translating. The resulting algorithm is referred to as SET-Adam, where SET is a brief notation of the three operations. The down-scaling operation on v_t is performed layerwise by making use of the angles between the layerwise subvectors of v_t and the corresponding all-one subvectors. Extensive experimental results show that SET-Adam outperforms eight adaptive optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100 while matching the best performance of the eight adaptive methods when training WGAN-GP models for image generation tasks. Furthermore, SET-Adam produces higher validation accuracies than Adam and AdaBelief for training ResNet18 over ImageNet.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
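
To make "range of adaptive stepsizes" concrete, the sketch below computes the per-coordinate stepsizes of standard Adam, `lr / (sqrt(v_t) + eps)`, and the ratio between the largest and smallest one. It is not SET-Adam: the three SET operations are not specified in enough detail in the abstract to reproduce them here. The example only shows, under the usual Adam update rule, that a larger `eps` compresses the range; the function names and the sample values of `v` are assumptions made for illustration.

```python
import math


def adaptive_stepsizes(v, lr=1e-3, eps=1e-8):
    """Per-coordinate Adam stepsizes lr / (sqrt(v_i) + eps) for second-moment estimates v."""
    return [lr / (math.sqrt(v_i) + eps) for v_i in v]


def stepsize_range_ratio(v, lr=1e-3, eps=1e-8):
    """Ratio of the largest to the smallest adaptive stepsize (1.0 means SGD-like)."""
    steps = adaptive_stepsizes(v, lr, eps)
    return max(steps) / min(steps)


# Second-moment estimates spanning eight orders of magnitude (illustrative values).
v = [1e-8, 1e-4, 1.0]
print(stepsize_range_ratio(v, eps=1e-8))  # wide range with a tiny epsilon
print(stepsize_range_ratio(v, eps=1e-2))  # much narrower range with a larger epsilon
```

A ratio close to 1 means every coordinate uses nearly the same stepsize, which is the SGD-with-momentum regime that the abstract says recent optimizers move toward.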

# 周りを見回して学ぶ:探索による自己学習対象検出

Look Around and Learn: Self-Training Object Detection by Exploration ( http://arxiv.org/abs/2302.03566v3 )

ライセンス: Link先を確認

Gianluca Scarpellini, Stefano Rosa, Pietro Morerio, Lorenzo Natale, Alessio Del Bue,

(参考訳) オブジェクト検出器が新しい環境でデプロイされると、しばしばパフォーマンスが低下する。本稿では,既存の物体検出装置を人間の介入に頼らずに,新たな環境下で画像の探索と取得を行なえる方法,すなわち,完全に自己管理されたアプローチについて考察する。私たちの設定では、エージェントはまず、事前訓練されたオフザシェルフ検出器を使って、オブジェクトを検出し、擬似ラベルを関連付けることで、環境を探索することを学びます。同一対象の擬似ラベルは異なる視点で一致しなくてはならないと仮定することで、探索政策を学習し、硬いサンプルを採掘し、観察のコンセンサスから洗練された擬似ラベルを生成するための「診断和解」と呼ばれる新しいメカニズムを考案する。我々は現在の最先端の統一されたベンチマークを実装し、既存の探索政策や知覚メカニズムと比較する。提案手法は既存の手法よりも優れており,シミュレーションシナリオでは対象検出器を6.2%改善し,他の最先端手法よりも3.59%向上し,実際のロボット試験では9.97%向上した。提案されたアプローチとベースラインのコードはhttps://iit-pavis.github.io/Look_Around_And_Learn/で公開されている。

When an object detector is deployed in a novel setting it often experiences a drop in performance. This paper studies how an embodied agent can automatically fine-tune a pre-existing object detector while exploring and acquiring images in a new environment without relying on human intervention, i.e., a fully self-supervised approach. In our setting, an agent initially learns to explore the environment using a pre-trained off-the-shelf detector to locate objects and associate pseudo-labels. By assuming that pseudo-labels for the same object must be consistent across different views, we learn the exploration policy Look Around to mine hard samples, and we devise a novel mechanism called Disagreement Reconciliation for producing refined pseudo-labels from the consensus among observations. We implement a unified benchmark of the current state-of-the-art and compare our approach with pre-existing exploration policies and perception mechanisms. Our method is shown to outperform existing approaches, improving the object detector by 6.2% in a simulated scenario, a 3.59% advancement over other state-of-the-art methods, and by 9.97% in the real robotic test without relying on ground-truth. Code for the proposed approach and baselines is available at https://iit-pavis.github.io/Look_Around_And_Learn/.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
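
The idea that pseudo-labels for one object should agree across views can be illustrated with a simple majority vote. This is only a sketch of consensus-based label refinement under that assumption; it is not the Disagreement Reconciliation mechanism itself, whose details are not given in the abstract. The function name `reconcile` and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def reconcile(view_labels, min_agreement=0.5):
    """Majority-vote a refined pseudo-label from per-view detector labels.

    Returns (label, agreement). The label is None when the most common label
    does not exceed `min_agreement`, which marks the object as a hard sample.
    """
    if not view_labels:
        return None, 0.0
    label, count = Counter(view_labels).most_common(1)[0]
    agreement = count / len(view_labels)
    if agreement <= min_agreement:
        return None, agreement
    return label, agreement


# The same physical object seen from three viewpoints.
print(reconcile(["chair", "chair", "sofa"]))
# Two views that disagree: no consensus, so no refined label.
print(reconcile(["chair", "sofa"]))
```

Objects with low agreement are natural candidates for the "hard samples" an exploration policy would want to revisit.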

# AI強化集中治療ユニット:広汎なセンシングによる患者ケアの革新

AI-Enhanced Intensive Care Unit: Revolutionizing Patient Care with Pervasive Sensing ( http://arxiv.org/abs/2303.06252v2 )

ライセンス: Link先を確認

Subhash Nerella, Ziyuan Guan, Scott Siegel, Jiaqing Zhang, Kia Khezeli, Azra Bihorac, Parisa Rashidi,

(参考訳) 集中治療室 (ICU) は、重篤な患者が集中治療や監視を受ける特別な病院空間である。包括的モニタリングは、患者の状態、特に明度、究極的にはケアの質を評価する上で必須である。しかし、ICUにおける患者の監視範囲は、時間的制約と医療提供者の作業負荷によって制限されている。現在、表情、姿勢、移動といった細部を含む視力評価は散発的に捉えられるか、全く捉えられていない。これらの手動の観察は個人を対象としており、ドキュメントの誤りを招きやすい。人工知能(AI)によって実現されたシステムは、異常な学習能力のために、患者の視覚的モニタリングとアセスメントを増強する可能性がある。このようなシステムは、トレーニングにロバストなアノテートデータを必要とする。そこで本研究では,複数モードの深度画像,カラーRGB画像,加速度計,筋電図,音圧,光レベルからデータを収集し,連続的および粒度の計測,デリリウムリスク,痛み,移動性評価などのインテリジェントなモニタリングシステムを開発するために,広汎なセンシング・データ処理システムを開発した。本稿では,リアルタイムの患者モニタリングと視覚的評価のために開発したIntelligent Intensive Care Unit (I2CU)システムアーキテクチャについて述べる。

The intensive care unit (ICU) is a specialized hospital space where critically ill patients receive intensive care and monitoring. Comprehensive monitoring is imperative in assessing patients' conditions, in particular acuity, and ultimately the quality of care. However, the extent of patient monitoring in the ICU is limited due to time constraints and the workload on healthcare providers. Currently, visual assessments for acuity, including fine details such as facial expressions, posture, and mobility, are sporadically captured, or not captured at all. These manual observations are subjective to the individual, prone to documentation errors, and overburden care providers with the additional workload. Artificial Intelligence (AI) enabled systems have the potential to augment patient visual monitoring and assessment due to their exceptional learning capabilities. Such systems require robust annotated data to train. To this end, we have developed a pervasive sensing and data processing system that collects data from multiple modalities (depth images, color RGB images, accelerometry, electromyography, sound pressure, and light levels) in the ICU for developing intelligent monitoring systems for continuous and granular acuity, delirium risk, pain, and mobility assessment. This paper presents the Intelligent Intensive Care Unit (I2CU) system architecture we developed for real-time patient monitoring and visual assessment.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# MS-TCRNet:センサ強化キネマティクスを用いた動作セグメンテーションのための多段階時間畳み込みリカレントネットワーク

MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for Action Segmentation Using Sensor-Augmented Kinematics ( http://arxiv.org/abs/2303.07814v2 )

ライセンス: Link先を確認

Adam Goldbraikh, Omer Shubi, Or Rubin, Carla M Pugh, Shlomi Laufer,

(参考訳) アクションセグメンテーション(Action segmentation)は、様々なセンサーから得られるビデオやキネマティックデータで通常実行される、ハイレベルなプロセス分析において難しいタスクである。本研究は,運動学的データに対する行動セグメンテーションに関連する2つのコントリビューションを提示する。まず,動作データに特化して設計されたMS-TCRNet(Multi-Stage Temporal Convolutional Recurrent Networks)の2つのバージョンを紹介する。アーキテクチャは、ステージ内正規化を備えた予測ジェネレータと、双方向LSTMまたはGRUベースの精錬ステージで構成されている。第2に、キネマティックデータの強い幾何学的構造を利用してアルゴリズムの性能とロバスト性を向上する、World Frame RotationとHand Inversionという2つの新しいデータ拡張手法を提案する。手術縫合作業の3つのデータセット: 可変組織シミュレーション(VTS)データセットと新たに導入されたボウエル修復シミュレーション(BRS)データセット、およびロボット手術におけるよく知られたベンチマークであるJHU-ISI Gesture and Skill Assessment Working Set(JIGSAWS)データセットについて、本モデルの評価を行った。我々の手法は最先端のパフォーマンスを達成した。

Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. This work presents two contributions related to action segmentation on kinematic data. Firstly, we introduce two versions of Multi-Stage Temporal Convolutional Recurrent Networks (MS-TCRNet), specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and Bidirectional LSTM or GRU-based refinement stages. Secondly, we propose two new data augmentation techniques, World Frame Rotation and Hand Inversion, which utilize the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieved state-of-the-art performance.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
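
A geometric augmentation of kinematic data in the spirit of World Frame Rotation might look like the sketch below: every 3D position in a sequence is rotated by the same angle, so relative geometry is preserved while the absolute frame changes. The choice of the z axis, the uniform angle distribution, and the function names are assumptions for illustration; the abstract does not specify how the authors parameterise the rotation, and Hand Inversion is not covered here.

```python
import math
import random


def rotate_z(point, angle):
    """Rotate a 3D point (x, y, z) about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def world_frame_rotation(sequence, angle=None, rng=random):
    """Apply one shared rotation to every position in a kinematic sequence.

    If `angle` is None, a single angle is drawn uniformly from [0, 2*pi).
    """
    if angle is None:
        angle = rng.uniform(0.0, 2.0 * math.pi)
    return [rotate_z(p, angle) for p in sequence]


# Two sensor positions; after a quarter turn their distance is unchanged.
sequence = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
augmented = world_frame_rotation(sequence, angle=math.pi / 2)
print(augmented)
print(math.dist(*sequence), math.dist(*augmented))
```

Because the rotation is rigid, distances and angles between sensors are unchanged, which is what makes the augmented sequence a plausible recording of the same gesture.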

# グラフ上のランダム逆問題:分散オンライン学習

Random Inverse Problems Over Graphs: Decentralized Online Learning ( http://arxiv.org/abs/2303.11789v6 )

ライセンス: Link先を確認

Tao Li, Xiwei Zhang,

(参考訳) ネットワークグラフ上の分散ランダム逆問題のフレームワークをオンライン測定で構築し,分散化されたオンライン学習アルゴリズムを提案する。これはヒルベルト空間における分散パラメータ推定と、再現されたカーネルヒルベルト空間(RKHS-LMS)における最小平均平方問題を統一する。我々は、アルゴリズムの収束を、L2有界なマルティンゲール差項を持つヒルベルト空間における不均一なランダム差分方程式のクラスにおける漸近安定性に変換し、ヒルベルト空間におけるL2-漸近安定性理論を開発する。ネットワークグラフが連結され、フォワード作用素の列が励起条件の無限次元時空間持続性を満たすならば、全てのノードの見積もりは平均二乗であり、ほぼ確実に一致している。さらに,RKHSにおける非定常および非独立なオンラインデータストリームに基づく分散オンライン学習アルゴリズムを提案し,ランダム入力データによって誘導される演算子が励振条件の無限次元時空間持続性を満たす場合,そのアルゴリズムが平均二乗でほぼ確実に整合であることを証明した。

We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2-asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
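
A finite-dimensional toy version of decentralized online estimation over a graph helps fix ideas. The sketch below uses a generic "consensus plus innovation" least-mean-square update for a scalar parameter: each node corrects its estimate with its own measurement and with the differences to its neighbours. It is an assumption-laden simplification (scalar parameter, noise-free measurements, constant gains, fixed regressors), not the Hilbert-space algorithm of the paper; all names and gain values are hypothetical.

```python
def decentralized_lms(neighbors, regressors, theta_true, steps=200,
                      innovation_gain=0.5, consensus_gain=0.3):
    """Run a synchronous consensus-plus-innovation LMS update on every node.

    neighbors[i]  : list of node indices adjacent to node i
    regressors[i] : scalar h_i, so node i measures y_i = h_i * theta_true
    Returns the list of final node estimates.
    """
    n = len(neighbors)
    estimates = [0.0] * n
    for _ in range(steps):
        updated = []
        for i in range(n):
            h = regressors[i]
            y = h * theta_true  # noise-free local measurement
            innovation = h * (y - h * estimates[i])
            consensus = sum(estimates[j] - estimates[i] for j in neighbors[i])
            updated.append(estimates[i]
                           + innovation_gain * innovation
                           + consensus_gain * consensus)
        estimates = updated  # all nodes switch to the new values together
    return estimates


# Path graph 0 - 1 - 2. The middle node has h = 0, so it observes nothing
# and can only learn the parameter through its neighbours.
neighbors = [[1], [0, 2], [1]]
regressors = [1.0, 0.0, 1.0]
print(decentralized_lms(neighbors, regressors, theta_true=2.0))
```

The middle node illustrates the role of connectivity: no single node needs to be informative on its own as long as the network as a whole is, which is a loose finite-dimensional analogue of the spatio-temporal persistence of excitation condition.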

# 画素ワイド農業用画像時系列分類:比較と変形可能なプロトタイプベースアプローチ

Pixel-wise Agricultural Image Time Series Classification: Comparisons and a Deformable Prototype-based Approach ( http://arxiv.org/abs/2303.12533v2 )

ライセンス: Link先を確認

Elliot Vincent, Jean Ponce, Mathieu Aubry,

(参考訳) 衛星による地球観測の改善により、より高時間分解能と空間分解能の画像が得られる。このデータを農業モニタリングに活用することは、環境と経済の課題に対処するための鍵となる。時間的データを用いた作物の分節化の現在の手法は、注釈付きデータに依存するか、監督の欠如を補うために非常に高度に設計されている。本稿では,衛星画像時系列(SITS)の教師付きおよび教師なし画素単位のセグメンテーションのためのデータセットと手法を提示・比較する。また,K-meansやNearest Centroid Classifier (NCC)のような古典的プロトタイプベースの手法に対して,スペクトル変形と時間シフトに不変性を加えるアプローチを導入する。我々は、異なるレベルの監督について研究し、この単純かつ高度に解釈可能な手法は、低データ体制において最高の性能を達成し、最近の4つのSITSデータセット上での農業時系列の教師なし分類の最先端を著しく改善することを示す。

Improvements in Earth observation by satellites allow for imagery of ever higher temporal and spatial resolution. Leveraging this data for agricultural monitoring is key for addressing environmental and economic challenges. Current methods for crop segmentation using temporal data either rely on annotated data or are heavily engineered to compensate for the lack of supervision. In this paper, we present and compare datasets and methods for both supervised and unsupervised pixel-wise segmentation of satellite image time series (SITS). We also introduce an approach to add invariance to spectral deformations and temporal shifts to classical prototype-based methods such as K-means and Nearest Centroid Classifier (NCC). We study different levels of supervision and show this simple and highly interpretable method achieves the best performance in the low data regime and significantly improves the state of the art for unsupervised classification of agricultural time series on four recent SITS datasets.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12
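
The effect of adding temporal-shift invariance to a nearest-centroid classifier can be shown on a single-band toy example. In the sketch below, each class prototype may be shifted by a few time steps before it is compared with a pixel's time series, and the best alignment defines the distance. This is a hand-rolled illustration, not the learned deformable prototypes of the paper: spectral deformations are not modelled, and the crop names, series values, and `max_shift` parameter are invented for the example.

```python
def shifted(series, s):
    """Shift a series right by s steps (left if s < 0), padding with the edge value."""
    n = len(series)
    return [series[min(max(t - s, 0), n - 1)] for t in range(n)]


def shift_invariant_distance(x, prototype, max_shift=2):
    """Smallest squared distance between x and any allowed shift of the prototype."""
    return min(
        sum((a - b) ** 2 for a, b in zip(x, shifted(prototype, s)))
        for s in range(-max_shift, max_shift + 1)
    )


def classify(x, prototypes, max_shift=2):
    """Return the label of the prototype closest to x under the shift-invariant distance."""
    return min(prototypes,
               key=lambda label: shift_invariant_distance(x, prototypes[label], max_shift))


# Toy vegetation-index curves for two hypothetical crop classes.
prototypes = {
    "wheat": [0, 0, 1, 3, 1, 0, 0, 0],
    "corn": [0, 0, 0, 0, 1, 1, 1, 0],
}
late_wheat = [0, 0, 0, 0, 1, 3, 1, 0]  # same shape as wheat, two steps later

print(classify(late_wheat, prototypes, max_shift=0))  # plain nearest centroid
print(classify(late_wheat, prototypes, max_shift=2))  # with shift invariance
```

With `max_shift=0` the late-season field is matched to the wrong prototype simply because its peak overlaps the other curve in time; allowing a shift recovers the correct class.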

# TalkCLIP: テキストガイド型表現型音声スタイルによる対話ヘッドジェネレーション

TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles ( http://arxiv.org/abs/2304.00334v2 )

ライセンス: Link先を確認

Yifeng Ma, Suzhen Wang, Yu Ding, Lincheng Li, Bowen Ma, Tangjie Lv, Changjie Fan, Zhipeng Hu, Zhidong Deng, Xin Yu,

(参考訳) 音声駆動音声ヘッド生成は注目を集めている。所望の表情で話すヘッドビデオを作成するために、従来の手法は、表現情報を提供するために余分な参照ビデオに頼っている。本研究では,自然言語で表現を指定した発話ヘッドを生成可能なフレームワークであるTalkCLIPを提案する。テキストから表現へのマッピングをモデル化するために,まず,粗い感情ときめ細かい顔の動きの両方を表現した多彩なテキスト記述を持つテキスト-ビデオ対話ヘッドデータセットを構築した。提案したデータセットを活用することで,表現表現に自然言語に基づく記述を投影するCLIPベースのスタイルエンコーダを導入する。 TalkCLIPはトレーニング中に見えない説明のために式を推測することもできます。 TalkCLIPはテキストを使って表現の強度を調節したり、表現を編集したりすることもできる。広汎な実験により、TalkCLIPは、テキスト記述でガイドされた鮮やかな表情で、写真リアルな発話ヘッドを生成する高度な能力を実現することが実証された。

Audio-driven talking head generation has drawn growing attention. To produce talking head videos with desired facial expressions, previous methods rely on extra reference videos to provide expression information, which may be difficult to find and hence limits their usage. In this work, we propose TalkCLIP, a framework that can generate talking heads where the expressions are specified by natural language, hence allowing for specifying expressions more conveniently. To model the mapping from text to expressions, we first construct a text-video paired talking head dataset where each video has diverse text descriptions that depict both coarse-grained emotions and fine-grained facial movements. Leveraging the proposed dataset, we introduce a CLIP-based style encoder that projects natural language-based descriptions to the representations of expressions. TalkCLIP can even infer expressions for descriptions unseen during training. TalkCLIP can also use text to modulate expression intensity and edit expressions. Extensive experiments demonstrate that TalkCLIP achieves the advanced capability of generating photo-realistic talking heads with vivid facial expressions guided by text descriptions.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# 音声からの基本構文:教師なしディープニューラルネットワークにおける自発的結合

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks ( http://arxiv.org/abs/2305.01626v2 )

ライセンス: Link先を確認

Gašper Beguš, Thomas Lu, Zili Wang,

(参考訳) 構文の計算モデルは、主にテキストベースである。本稿では,最も基本的な構文操作を生音声から直接,教師なしの方法でモデル化できることを提案する。私たちは構文の最もユビキタスで基本的な特性の1つに焦点を合わせます。個別単語の音響記録を訓練した畳み込みニューラルネットワーク(CNN)が、入力に複数の単語を持つデータにアクセスすることなく、2つまたは3つの単語で連結された出力を生成し始める現象である。我々はこの発見を、異なるハイパーパラメータとトレーニングデータを持つ、独立に訓練されたいくつかのモデルで再現する。さらに、2つの単語で訓練されたネットワークは、新しい保存されていない単語の組み合わせに単語を埋め込むことを学ぶ。我々の知る限り、これは生の音声に基づくciwGAN/fiwGAN設定で訓練されたCNNのこれまで報告されていない特性であり、これらのアーキテクチャがどのように学習するかの理解と、生の音響入力からの構文のモデル化と進化の両方に影響を及ぼす。

Computational models of syntax are predominantly text-based. Here we propose that the most basic syntactic operations can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and elementary properties of syntax -- concatenation. We introduce spontaneous concatenation: a phenomenon where convolutional neural networks (CNNs) trained on acoustic recordings of individual words start generating outputs with two or even three words concatenated without ever accessing data with multiple words in the input. We replicate this finding in several independently trained models with different hyperparameters and training data. Additionally, networks trained on two words learn to embed words into novel unobserved word combinations. To our knowledge, this is a previously unreported property of CNNs trained in the ciwGAN/fiwGAN setting on raw speech and has implications both for our understanding of how these architectures learn as well as for modeling syntax and its evolution from raw acoustic inputs.

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# GenerateCT:3次元胸部CTボリュームのテキストコンディショナル生成

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes ( http://arxiv.org/abs/2305.16037v5 )

ライセンス: Link先を確認

Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina, Enis Simsar, Alperen Tezcan, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Furkan Almas, Irem Dogan, Muhammed Furkan Dasdelen, Chinmay Prabhakar, Hadrien Reynaud, Sarthak Pati, Christian Bluethgen, Mehmet Kemal Ozdemir, Bjoern Menze,

(参考訳) フリーフォームの医療用テキストプロンプトに条件付けされた3D医療用画像を生成するための最初のアプローチであるGenerateCTには、テキストエンコーダと3つの重要なコンポーネントが組み込まれている。 3D医療画像において直接的に同等の手法を使わずに、我々はGenerateCTを最先端の手法と比較し、すべての主要な指標でその優位性を実証した。そこで我々はGenerateCTの臨床応用を多義性分類タスクで評価した。まず,実データセット上でのマルチ異常度分類器のトレーニングにより,ベースラインを確立した。ゼロショットシナリオにおいて、モデルが外部データに一般化し、未知のプロンプトで性能を評価するために、我々は、外部セットを用いて分類器を訓練し、追加のベンチマークを設定した。我々は、GenerateCTを用いて、各セットに対して等しいボリュームを合成することで、トレーニングデータセットを2倍にした2つの実験を行った。最初の実験では、実数と生成量で分類器を共同で訓練する際、APスコアが11%改善した。第2の実験では、目に見えないプロンプトに基づいて、実数と生成量の両方のトレーニングで7%改善した。さらに、GenerateCTは、任意のサイズの合成トレーニングデータセットのスケーリングを可能にする。例として,実数集合の5倍の3次元CTを10万個生成し,これらの合成CTのみを用いて分類器を訓練した。驚くべきことに、この分類器は、すべての利用可能な実データでトレーニングされたデータのパフォーマンスを8%上回った。最後に、ドメインの専門家は生成されたボリュームを評価し、テキストプロンプトと高い整合性を確認した。コード、モデルウェイト、トレーニングデータ、および生成されたデータにhttps://github.com/ibrahimethemhamamci/GenerateCTでアクセスします。

GenerateCT, the first approach to generating 3D medical imaging conditioned on free-form medical text prompts, incorporates a text encoder and three key components: a novel causal vision transformer for encoding 3D CT volumes, a text-image transformer for aligning CT and text tokens, and a text-conditional super-resolution diffusion model. Without directly comparable methods in 3D medical imaging, we benchmarked GenerateCT against cutting-edge methods, demonstrating its superiority across all key metrics. Importantly, we evaluated GenerateCT's clinical applications in a multi-abnormality classification task. First, we established a baseline by training a multi-abnormality classifier on our real dataset. To further assess the model's generalization to external data and performance with unseen prompts in a zero-shot scenario, we employed an external set to train the classifier, setting an additional benchmark. We conducted two experiments in which we doubled the training datasets by synthesizing an equal number of volumes for each set using GenerateCT. The first experiment demonstrated an 11% improvement in the AP score when training the classifier jointly on real and generated volumes. The second experiment showed a 7% improvement when training on both real and generated volumes based on unseen prompts. Moreover, GenerateCT enables the scaling of synthetic training datasets to arbitrary sizes. As an example, we generated 100,000 3D CTs, fivefold the number in our real set, and trained the classifier exclusively on these synthetic CTs. Impressively, this classifier surpassed the performance of the one trained on all available real data by a margin of 8%. Last, domain experts evaluated the generated volumes, confirming a high degree of alignment with the text prompt. Access our code, model weights, training data, and generated data at https://github.com/ibrahimethemhamamci/GenerateCT

翻訳日:2024-07-16 06:06:24 公開日:2024-07-12

# Compressed Sensing:離散最適化アプローチ

Compressed Sensing: A Discrete Optimization Approach ( http://arxiv.org/abs/2306.04647v3 )

ライセンス: Link先を確認

Dimitris Bertsimas, Nicholas A. G. Johnson,

(参考訳) 圧縮センシング(CS: Compressed Sensing)問題について検討した。これは,線形測定の集合をある程度の数値許容誤差の範囲で満足する最もスパースなベクトルを求める問題である。 $\ell_2$ 正則化した CS の定式化を導入し、これを混合整数二階円錐プログラムとして再定式化する。この問題の2次円錐緩和を導出し、正則化パラメータに関する穏やかな条件下では、結果として得られる緩和は、よく研究された基底追跡デノイジング(basis pursuit denoising)問題と等価であることを示す。本稿では,2次円錐緩和を強化する半正定値緩和を提示し,2次円錐緩和を利用してCSの小規模インスタンスを証明可能な最適性まで解く独自の分枝限定アルゴリズムを開発する。合成データに対する3つの最先端ベンチマーク手法による解と比較すると,我々の手法は平均 $6.22\%$ よりスパースな解を生成することがわかった。合成データに対して実験ごとに最も優れたベンチマーク法のみと比較した場合、我々の手法は平均 $3.10\%$ よりスパースな解を生成する。実世界のECGデータでは、与えられた $\ell_2$ 再構成誤差に対して、我々のアプローチは、ベンチマーク手法よりも平均 $9.95\%$ よりスパースな解(最高性能のベンチマークのみと比較すれば $3.88\%$ よりスパース)を生成し、一方、与えられたスパーシティレベルでは、ベンチマーク手法よりも平均 $10.77\%$ 低い再構成誤差(最高性能のベンチマークのみと比較すれば $1.42\%$ 低い誤差)を生成する。マルチラベル分類アルゴリズムの構成要素として用いられる場合,提案手法は,ベンチマーク圧縮センシング法よりも高い分類精度を実現する。この精度の向上は、計算時間が数桁増加するというコストを伴う。

We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. We introduce an $\ell_2$ regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve small-scale instances of CS to certifiable optimality. When compared against solutions produced by three state of the art benchmark methods on synthetic data, our numerical results show that our approach produces solutions that are on average $6.22\%$ more sparse. When compared only against the experiment-wise best performing benchmark method on synthetic data, our approach produces solutions that are on average $3.10\%$ more sparse. On real world ECG data, for a given $\ell_2$ reconstruction error our approach produces solutions that are on average $9.95\%$ more sparse than benchmark methods ($3.88\%$ more sparse if only compared against the best performing benchmark), while for a given sparsity level our approach produces solutions that have on average $10.77\%$ lower reconstruction error than benchmark methods ($1.42\%$ lower error if only compared against the best performing benchmark). When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods. This improved accuracy comes at the cost of an increase in computation time by several orders of magnitude.

翻訳日:2024-07-16 05:56:40 公開日:2024-07-12
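
For orientation, the optimization problems named in the abstract above can be written out as follows. The first two are the textbook forms of compressed sensing with a tolerance and of basis pursuit denoising. The third is only an assumed shape for an $\ell_2$ regularized formulation: the symbols $A$ (measurement matrix), $b$ (measurements), $\epsilon$ (tolerance), and $\gamma$ (regularization parameter), as well as the exact placement of the regularizer, are assumptions and should be checked against the paper.

```latex
\begin{align}
\text{(CS)} \quad
  & \min_{x \in \mathbb{R}^n} \ \|x\|_0
  && \text{s.t.} \quad \|Ax - b\|_2 \le \epsilon, \\
\text{(basis pursuit denoising)} \quad
  & \min_{x \in \mathbb{R}^n} \ \|x\|_1
  && \text{s.t.} \quad \|Ax - b\|_2 \le \epsilon, \\
\text{(regularized CS, assumed form)} \quad
  & \min_{x \in \mathbb{R}^n} \ \|x\|_0 + \frac{1}{\gamma}\,\|x\|_2^2
  && \text{s.t.} \quad \|Ax - b\|_2 \le \epsilon .
\end{align}
```

Here $\|x\|_0$ counts the nonzero entries of $x$, which is what makes the first and third problems combinatorial, while the $\ell_1$ norm in basis pursuit denoising is its standard convex surrogate.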

# スキルクリティカル:階層的強化学習のための学習スキルの精製

Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning ( http://arxiv.org/abs/2306.08388v3 )

ライセンス: Link先を確認

Ce Hao, Catherine Weaver, Chen Tang, Kenta Kawamoto, Masayoshi Tomizuka, Wei Zhan,

(参考訳) 階層的強化学習(RL)は、政策を時間的に複数のレベルに抽象化することで、長期的な意思決定を促進することができる。スパース報酬環境における評価結果は、スキル、すなわちプリミティブアクションのシーケンスで見られる。通常、スキル潜在空間とポリシーはオフラインデータから発見される。しかしながら、結果として生じる低レベルのポリシーは、低カバレッジのデモンストレーションや分散シフトのために信頼性が低い可能性がある。そこで本研究では,Skill-Criticアルゴリズムを用いて,ハイレベルなスキル選択とともに低レベルなポリシーを微調整する手法を提案する。我々のスキル・クリティカル・アルゴリズムは、低レベルと高レベルの両方のポリシーを最適化する。これらのポリシーは、オフラインのデモから学んだ潜在空間によって初期化され、規則化され、並列ポリシーの最適化を導く。複数のスパース・リワードRL環境におけるスキル・クリティカルの評価を行い,グラナ・トゥリストスポーツにおけるスパース・リワード自律レースタスクについて検討した。実験の結果,Skill-Criticの低レベル政策の微調整と実演誘導型正規化が性能向上に不可欠であることが示唆された。コードとビデオは、私たちのWebサイト(https://sites.google.com/view/skill-critic)で入手できる。

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e. sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.

翻訳日:2024-07-16 05:56:40 公開日:2024-07-12

# 不均一HPCプラットフォームのためのディープラーニングハードウェアアクセラレータに関する調査

A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms ( http://arxiv.org/abs/2306.15552v2 )

ライセンス: Link先を確認

Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Fabio Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke, Stefania Perri,

(参考訳) 近年のディープラーニング(DL)は、画像分類、コンピュータビジョン、音声認識などの高性能コンピューティング(HPC)アプリケーションにおいて、ハードウェアアクセラレーターを最も有効なソリューションとして採用している。本調査は,HPCアプリケーションの性能要件に適合するDLアクセラレータの設計における最新の進歩を要約し,分類する。特に、GPUやTPUベースのアクセラレータだけでなく、FPGAベースのアクセラレータやASICベースのアクセラレータ、Neural Processing Units、オープンハードウェアRISC-Vベースのアクセラレータ、コプロセッサといった、設計固有のハードウェアアクセラレータを含む、ディープラーニングアクセラレーションをサポートする最も高度なアプローチを強調している。このサーベイでは、新しいメモリ技術とコンピューティングパラダイムに基づくアクセラレータ、例えば3Dスタックされたプロセッサ・インメモリ、不揮発性メモリ(主に抵抗RAMと位相変化メモリ)をインメモリコンピューティングを実装するためのアクセラレータ、ニューロモーフィック処理ユニット、マルチチップモジュールに基づくアクセラレータについても説明している。新興技術の中には、量子ベースの加速器やフォトニクスに関する洞察も含まれています。結論として、この調査は、ディープラーニングの急速に発展する分野において、読者に包括的な視点を提供することを目的として、過去数年間に提案された最も影響力のあるアーキテクチャと技術を分類する。

Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the performance requirements of HPC applications. In particular, it highlights the most advanced approaches to support deep learning accelerations including not only GPU and TPU-based accelerators but also design-specific hardware accelerators such as FPGA-based and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators and co-processors. The survey also describes accelerators based on emerging memory technologies and computing paradigms, such as 3D-stacked Processor-In-Memory, non-volatile memories (mainly, Resistive RAM and Phase Change Memories) to implement in-memory computing, Neuromorphic Processing Units, and accelerators based on Multi-Chip Modules. Among emerging technologies, we also include some insights into quantum-based accelerators and photonics. To conclude, the survey classifies the most influential architectures and technologies proposed in the last years, with the purpose of offering the reader a comprehensive perspective in the rapidly evolving field of deep learning.

翻訳日:2024-07-16 05:56:40 公開日:2024-07-12

# 分散シフトのためのモデリング言語の必要性について--タブラルデータセットの例-

On the Need of a Modeling Language for Distribution Shifts: Illustrations on Tabular Datasets ( http://arxiv.org/abs/2307.05284v3 )

ライセンス: Link先を確認

Jiashuo Liu, Tianyu Wang, Peng Cui, Hongseok Namkoong,

(参考訳) 異なる分散シフトは異なる介入を必要とし、アルゴリズムはそれらが対処する特定のシフトに基礎を置く必要がある。しかし、ロバストアルゴリズムの方法論的開発は一般に経験的検証に欠ける構造的仮定に依存している。 5つのグラフデータセットと6万のメソッド構成に、不均衡学習と分散ロバスト最適化(DRO)メソッドを含む自然なシフトを含む実験的なテストベッドを構築した。 ML文献のX$(共変量)シフトに重きを置いているのとは対照的に、Y|X$-shiftsはテストベッドで最も多く使われている。頑健なアルゴリズムの性能はシフトタイプによって大きく異なり、バニラ法ほど良くない。そこで我々はDRO手法の詳細な実験分析を行い、研究者によってしばしば無視されるが、基礎となるモデルクラス(例えば、XGBoost)やハイパーパラメータ選択などの実装の詳細は、あいまいさセットや半径よりもパフォーマンスに大きな影響を与えることを発見した。方法論的な研究と実践のギャップをさらに埋めるために、そのようなデータ駆動型、帰納的な分散シフトの理解が、データ中心とアルゴリズムの介入をいかに促進するかを示すケーススタディを設計する。

Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating for an empirically grounded data-driven approach to research, we build an empirical testbed comprising natural shifts across 5 tabular datasets and 60,000 method configurations encompassing imbalanced learning and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on $X$ (covariate)-shifts in the ML literature. The performance of robust algorithms varies significantly over shift types, and is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that although often neglected by researchers, implementation details -- such as the choice of underlying model class (e.g., XGBoost) and hyperparameter selection -- have a bigger impact on performance than the ambiguity set or its radius. To further bridge that gap between methodological research and practice, we design case studies that illustrate how such a data-driven, inductive understanding of distribution shifts can enhance both data-centric and algorithmic interventions.

翻訳日:2024-07-16 05:56:40 公開日:2024-07-12

# (2+1)D SU(2) Yang-Mills Lattice Gauge理論のテンソルネットワークによる有限密度でのシミュレーション

Simulating (2+1)D SU(2) Yang-Mills Lattice Gauge Theory at finite density with tensor networks ( http://arxiv.org/abs/2307.09396v3 )

ライセンス: Link先を確認

Giovanni Cataldi, Giuseppe Magnifico, Pietro Silvi, Simone Montangero,

(参考訳) 我々は、テンソルネットワーク(TN)を持つ2次元の非アベリア格子ゲージ理論を数値的にシミュレートする。ハミルトンの定式化におけるSU(2)Yang-Millsモデルに焦点をあて、動的物質と極小歪んだゲージ場(ハードコアグルーオン)を持つ。 TN符号プロブレムフリーアプローチにより、クォーク素質量と色電荷の関数として、0および有限バリオン数のモデルの位相図を特徴づける。中間系サイズでは、クォーク対有界準粒子(バリオン)の液相を検出し、その質量は連続極限に向かって有限である。低クォーク質量では、潜在的な分解の痕跡が見られ、高質量では、トポロジカル秩序のシグネチャが見られる。

We numerically simulate a non-Abelian lattice gauge theory in two spatial dimensions, with tensor networks (TN), up to intermediate sizes (>30 matter sites) well beyond exact diagonalization. We focus on the SU(2) Yang-Mills model in Hamiltonian formulation, with dynamical matter and minimally truncated gauge field (hardcore gluon). Thanks to the TN sign-problem-free approach, we characterize the phase diagram of the model at zero and finite baryon number as a function of the quark bare mass and color charge. At intermediate system sizes, we detect a liquid phase of quark-pair bound-state quasiparticles (baryons), whose mass is finite towards the continuum limit. Interesting phenomena arise at the transition boundary where color-electric and color-magnetic terms are maximally frustrated: For low quark masses, we see traces of potential deconfinement, while for high masses, signatures of a possible topological order.

翻訳日:2024-07-16 05:56:40 公開日:2024-07-12

# 自律運転システムのテスト・改善のための境界状態生成

Boundary State Generation for Testing and Improvement of Autonomous Driving Systems ( http://arxiv.org/abs/2307.10590v2 )

ライセンス: Link先を確認

Matteo Biagiola, Paolo Tonella,

(参考訳) 近年のディープニューラルネットワーク(DNN)とセンサ技術の進歩により、自律運転システム(ADS)の自律性はますます高まっている。しかし、その信頼性を評価することは依然として重要な問題である。最先端のADSテストアプローチでは、シミュレーション運転環境の制御可能な属性をADSが誤動作するまで変更する。このようなアプローチでは、ADSが成功している環境インスタンスは、ADSが誤動作する可能性のある隠れ運転条件を含む可能性があるにもかかわらず、破棄される。本稿では, ADS テストのための新しいテストジェネレータ GENBO (generator of Boundary State pairs) を提案する。 GENBOは、障害のない環境インスタンスで収集されたエゴ車両の駆動条件(位置、速度、方向)を変更し、同一環境インスタンス内の動作境界(すなわち、モデルが誤動作し始める場所)における挑戦駆動条件を効率よく生成する。このような境界条件を用いて、初期トレーニングデータセットを拡張し、テスト中のDNNモデルを再訓練する。評価結果から,再学習モデルでは,元のDNNモデルに対して,異なる評価トラックに対して平均で最大3倍の成功率を示した。

Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. In such approaches, environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GENBO (GENerator of BOundary state pairs), a novel test generator for ADS testing. GENBO mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment instance. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has, on average, up to 3x higher success rate on a separate set of evaluation tracks with respect to the original DNN model.

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

D2S: Representing sparse descriptors and 3D coordinates for camera relocalization ( http://arxiv.org/abs/2307.15250v3 )

License: see the linked page

Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee,

State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinates. Our method is characterized by its simplicity and cost-effectiveness. It leverages only a single RGB image for localization during the testing phase and requires only a lightweight model to encode a complex sparse scene. The proposed D2S employs a combination of a simple loss function and graph attention to selectively focus on robust descriptors while disregarding areas such as clouds, trees, and several dynamic objects. This selective attention enables D2S to effectively perform a binary-semantic classification for sparse descriptors. Additionally, we propose a simple outdoor dataset to evaluate the capabilities of visual localization methods in scene-specific generalization and self-updating from unlabeled observations. Our approach outperforms the state-of-the-art CNN-based methods in scene coordinate regression in indoor and outdoor environments. It demonstrates the ability to generalize beyond training data, including scenarios involving transitions from day to night and adapting to domain shifts, even in the absence of labeled data sources. The source code, trained models, dataset, and demo videos are available at the following link: https://thpjp.github.io/d2s.

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms

DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms ( http://arxiv.org/abs/2308.06634v3 )

License: see the linked page

Junyao Zhang, Hanrui Wang, Gokul Subramanian Ravi, Frederic T. Chong, Song Han, Frank Mueller, Yiran Chen,

This paper proposes DISQ to craft a stable landscape for VQA training and tackle the noise drift challenge. DISQ adopts a "drift detector" with a reference circuit to identify and skip iterations that are severely affected by noise drift errors. Specifically, the circuits from the previous training iteration are re-executed as a reference circuit in the current iteration to estimate noise drift impacts. The iteration is deemed compromised by noise drift errors and thus skipped if noise drift flips the direction of the ideal optimization gradient. To enhance noise drift detection reliability, we further propose to leverage multiple reference circuits from previous iterations to provide a well-founded judgment of the current noise drift. Nevertheless, multiple reference circuits also introduce considerable execution overhead. To mitigate extra overhead, we propose Pauli-term subsetting (prime and minor subsets) to execute only observable circuits with large coefficient magnitudes (prime subset) during drift detection. Only this minor subset is executed when the current iteration is drift-free. Evaluations across various applications and QPUs demonstrate that DISQ can mitigate a significant portion of the noise drift impact on VQAs and achieve 1.51-2.24x fidelity improvement over the traditional baseline. DISQ's benefit is 1.1-1.9x over the best alternative approach while boosting average noise detection speed by 2.07x.
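The skip rule and the Pauli-term split described above might be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: the function names, the use of scalar energies in place of gradients, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def drift_flips_direction(prev_energy: float, ref_energy_now: float, curr_energy: float) -> bool:
    """Return True when noise drift reverses the apparent direction of progress.

    prev_energy:    previous iteration's circuit, measured in the previous iteration
    ref_energy_now: the same circuit re-executed now as the reference circuit
    curr_energy:    current iteration's circuit, measured now
    """
    observed_change = curr_energy - prev_energy        # mixes real progress with drift
    drift_free_change = curr_energy - ref_energy_now   # both terms share the current noise
    return observed_change * drift_free_change < 0


def split_pauli_terms(
    terms: Dict[str, float], threshold: float
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split Hamiltonian terms into a prime subset (large |coefficient|) and a minor subset."""
    prime = {pauli: c for pauli, c in terms.items() if abs(c) >= threshold}
    minor = {pauli: c for pauli, c in terms.items() if abs(c) < threshold}
    return prime, minor


# Drift raised the reference energy from -1.00 to -0.80, so a real improvement looks like a regression.
skip = drift_flips_direction(prev_energy=-1.00, ref_energy_now=-0.80, curr_energy=-0.90)
prime, minor = split_pauli_terms({"ZZ": 0.9, "XI": -0.7, "IY": 0.05}, threshold=0.5)
```

In this toy run the iteration would be skipped, and drift detection would only need the circuits for `ZZ` and `XI`.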

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# MetaWeather: Few-Shot Weather-Degraded Image Restoration

MetaWeather: Few-Shot Weather-Degraded Image Restoration ( http://arxiv.org/abs/2308.14334v4 )

License: see the linked page

Youngrae Kim, Younggeol Cho, Thanh-Tung Nguyen, Seunghoon Hong, Dongman Lee,

Real-world weather conditions are intricate and often occur concurrently. However, most existing restoration approaches are limited to the specific weather conditions present in their training data and struggle to generalize to unseen weather types, including real-world weather conditions. To address this issue, we introduce MetaWeather, a universal approach that can handle diverse and novel weather conditions with a single unified model. Extending a powerful meta-learning framework, MetaWeather formulates the task of weather-degraded image restoration as a few-shot adaptation problem that predicts the degradation pattern of a query image, and learns to adapt to unseen weather conditions through a novel spatial-channel matching algorithm. Experimental results on the BID Task II.A, SPA-Data, and RealSnow datasets demonstrate that the proposed method can adapt to unseen weather conditions, significantly outperforming the state-of-the-art multi-weather image restoration methods.

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# On-the-Fly Guidance Training for Medical Image Registration

On-the-Fly Guidance Training for Medical Image Registration ( http://arxiv.org/abs/2308.15216v5 )

License: see the linked page

Yuelin Xin, Yicheng Chen, Shengxiang Ji, Kun Han, Xiaohui Xie,

This study introduces a novel On-the-Fly Guidance (OFG) training framework for enhancing existing learning-based image registration models, addressing the limitations of weakly-supervised and unsupervised methods. Weakly-supervised methods struggle due to the scarcity of labeled data, and unsupervised methods directly depend on image similarity metrics for accuracy. Our method trains registration models in a supervised fashion without the need for any labeled data. OFG generates pseudo-ground truth during training by refining deformation predictions with a differentiable optimizer, enabling direct supervised learning. OFG optimizes deformation predictions efficiently, improving the performance of registration models without sacrificing inference speed. Tested across several benchmark datasets and leading models, our method significantly enhances performance, providing a plug-and-play solution for training learning-based registration models. Code available at: https://github.com/cilix-ai/on-the-fly-guidance

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# Revealing the True Cost of Locally Differentially Private Protocols: An Auditing Perspective

Revealing the True Cost of Locally Differentially Private Protocols: An Auditing Perspective ( http://arxiv.org/abs/2309.01597v3 )

License: see the linked page

Héber H. Arcolezi, Sébastien Gambs,

While the existing literature on Differential Privacy (DP) auditing predominantly focuses on the centralized model (e.g., in auditing the DP-SGD algorithm), we advocate for extending this approach to audit Local DP (LDP). To achieve this, we introduce the LDP-Auditor framework for empirically estimating the privacy loss of locally differentially private mechanisms. This approach leverages recent advances in designing privacy attacks against LDP frequency estimation protocols. More precisely, through the analysis of numerous state-of-the-art LDP protocols, we extensively explore the factors influencing the privacy audit, such as the impact of different encoding and perturbation functions. Additionally, we investigate the influence of the domain size and the theoretical privacy loss parameters $\epsilon$ and $\delta$ on local privacy estimation. In-depth case studies are also conducted to explore specific aspects of LDP auditing, including distinguishability attacks on LDP protocols for longitudinal studies and multidimensional data. Finally, we present a notable achievement of our LDP-Auditor framework, which is the discovery of a bug in a state-of-the-art LDP Python package. Overall, our LDP-Auditor framework as well as our study offer valuable insights into the sources of randomness and information loss in LDP protocols. These contributions collectively provide a realistic understanding of the local privacy loss, which can help practitioners in selecting the LDP mechanism and privacy parameters that best align with their specific requirements. LDP-Auditor is open-sourced at https://github.com/hharcolezi/ldp-audit.
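To make the idea of empirically estimating privacy loss concrete, here is a minimal sketch that audits generalized randomized response with a simple distinguishability attack. It does not use the LDP-Auditor package; every function name is hypothetical, and the result is a plain point estimate with no confidence interval, which a real audit would normally add.

```python
import math
import random


def grr_report(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Generalized randomized response over the domain {0, ..., k-1}."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_truth:
        return value
    other = rng.randrange(k - 1)  # uniform index over the k-1 remaining values
    return other if other < value else other + 1


def empirical_epsilon(k: int, epsilon: float, trials: int, seed: int = 0) -> float:
    """Attack: guess 'the input was v0' whenever the report equals v0."""
    rng = random.Random(seed)
    v0, v1 = 0, 1
    tpr = sum(grr_report(v0, k, epsilon, rng) == v0 for _ in range(trials)) / trials
    fpr = sum(grr_report(v1, k, epsilon, rng) == v0 for _ in range(trials)) / trials
    return math.log(tpr / fpr)  # for a pure epsilon-LDP mechanism this ratio is at most e^epsilon


estimate = empirical_epsilon(k=4, epsilon=1.0, trials=200_000)
```

For this mechanism the attack is tight, so the estimate should land close to the theoretical value of 1.0; a large gap in either direction would point to a weak attack or a faulty implementation.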

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# Deep Learning Safety Concerns in Automated Driving Perception

Deep Learning Safety Concerns in Automated Driving Perception ( http://arxiv.org/abs/2309.03774v3 )

License: see the linked page

Stephanie Abrecht, Alexander Hirsch, Shervin Raafatnia, Matthias Woehrle,

Recent advances in the field of deep learning and the impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires considering the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive approach, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is, by design, well aligned with existing standards relevant for the safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns.

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting

ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting ( http://arxiv.org/abs/2309.04820v2 )

License: see the linked page

Michael A. Hobley, Victor A. Prisacariu,

Class-agnostic counting methods enumerate objects of an arbitrary class, providing tremendous utility in many fields. Prior works have limited usefulness as they require either a set of examples of the type to be counted or that the query image contains only a single type of object. A significant factor in these shortcomings is the lack of a dataset to properly address counting in settings with more than one kind of object present. To address these issues, we propose the first Multi-class, Class-Agnostic Counting dataset (MCAC) and A Blind Counter (ABC123), a method that can count multiple types of objects simultaneously without using examples of those types during training or inference. ABC123 introduces a new paradigm where instead of requiring exemplars to guide the enumeration, examples are found after the counting stage to help a user understand the generated outputs. We show that ABC123 outperforms contemporary methods on MCAC without needing human in-the-loop annotations. We also show that this performance transfers to FSC-147, the standard class-agnostic counting dataset. MCAC is available at MCAC.active.vision and ABC123 is available at ABC123.active.vision.

Translated: 2024-07-16 05:56:40; Published: 2024-07-12

# Long-term drought prediction using deep neural networks based on geospatial weather data

Long-term drought prediction using deep neural networks based on geospatial weather data ( http://arxiv.org/abs/2309.06212v6 )

License: see the linked page

Alexander Marusov, Vsevolod Grabar, Yury Maximov, Nazar Sotiriadi, Alexander Bulkin, Alexey Zaytsev,

High-quality drought forecasting up to a year in advance is critical for agriculture planning and insurance. Yet, the problem remains unsolved at reasonable accuracy due to data complexity and aridity stochasticity. We tackle drought data by introducing an end-to-end approach that adopts a spatio-temporal neural network model with accessible open monthly climate data as the input. Our systematic research employs diverse proposed models and five distinct environmental regions as a testbed to evaluate the efficacy of the Palmer Drought Severity Index (PDSI) prediction. A key aggregated finding is the exceptional performance of a Transformer model, EarthFormer, in making accurate short-term (up to six months) forecasts. At the same time, the Convolutional LSTM excels in longer-term forecasting.

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# A Comprehensive Review of Community Detection in Graphs

A Comprehensive Review of Community Detection in Graphs ( http://arxiv.org/abs/2309.11798v5 )

License: see the linked page

Jiakang Li, Songning Lai, Zhihao Shuai, Yuan Tan, Yifan Jia, Mianyang Yu, Zichen Song, Xiaokang Peng, Ziyang Xu, Yongxin Ni, Haifeng Qiu, Jiayu Yang, Yutong Liu, Yonggang Lu,

The study of complex networks has significantly advanced our understanding of community structure, which serves as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs and serves as a thorough exposition of various community detection methods from the perspectives of modularity-based methods, spectral clustering, probabilistic modelling, and deep learning. Along with these methods, a new community detection method designed by us is also presented. Additionally, the performance of these methods on datasets with and without ground truth is compared. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs.
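Modularity-based methods, the first family listed above, score a partition with the standard Newman modularity $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $L_c$ is the number of edges inside community $c$, $d_c$ the total degree of its nodes, and $m$ the number of edges. A minimal sketch for an unweighted, undirected graph follows; the function name and the toy graph are illustrative and not taken from the review.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]], community_of: Dict[Hashable, Hashable]) -> float:
    """Newman modularity of a partition of an unweighted, undirected graph."""
    edges = list(edges)
    m = len(edges)
    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community_of[node]] += d
    return sum(internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(toy_edges, toy_partition)
```

For the toy graph each community has 3 internal edges and total degree 7 out of 14, giving Q = 2(3/7 - 1/4) = 5/14; putting every node in one community gives Q = 0.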

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# Streaming quantum state purification

Streaming quantum state purification ( http://arxiv.org/abs/2309.16387v2 )

License: see the linked page

Andrew M. Childs, Honghao Fu, Debbie Leung, Zhi Li, Maris Ozols, Vedang Vyas,

Quantum state purification is the task of recovering a nearly pure copy of an unknown pure quantum state using multiple noisy copies of the state. This basic task has applications to quantum communication over noisy channels and quantum computation with imperfect devices, but has only been studied previously for the case of qubits. We derive an efficient purification procedure based on the swap test for qudits of any dimension, starting with any initial error parameter. Treating the initial error parameter and the dimension as constants, we show that our procedure has sample complexity asymptotically optimal in the final error parameter. Our protocol has a simple recursive structure that can be applied when the states are provided one at a time in a streaming fashion, requiring only a small quantum memory to implement.
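The effect of one swap test can be tracked numerically. A sketch under stated assumptions: the noisy state is $\rho = \lambda |\psi\rangle\langle\psi|$ plus the remaining weight spread evenly over the other $d-1$ directions, and a passed swap test on two copies leaves the reduced state $(\rho + \rho^2)/(1 + \mathrm{tr}\,\rho^2)$, which follows from projecting onto the symmetric subspace. The function names are hypothetical, and failed tests (which discard copies) are ignored, so this is not the paper's full streaming protocol.

```python
def swap_test_step(fidelity: float, dim: int) -> float:
    """Fidelity with |psi> after one passed swap test on two identical noisy copies."""
    other = (1.0 - fidelity) / (dim - 1)              # each of the dim-1 noise eigenvalues
    purity = fidelity ** 2 + (dim - 1) * other ** 2   # tr(rho^2)
    return (fidelity + fidelity ** 2) / (1.0 + purity)


def purify(fidelity: float, dim: int, depth: int) -> float:
    """Apply the step recursively; depth n consumes at least 2**n input copies."""
    for _ in range(depth):
        fidelity = swap_test_step(fidelity, dim)
    return fidelity


one_step_qubit = swap_test_step(0.8, dim=2)
ten_levels_qutrit = purify(0.8, dim=3, depth=10)
```

For a qubit with fidelity 0.8 a single step gives 1.44/1.68, roughly 0.857. Close to fidelity 1 the error roughly halves per level, which is why a recursion of modest depth already yields a nearly pure state.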

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models

Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models ( http://arxiv.org/abs/2310.01691v2 )

License: see the linked page

Zijun Wu, Yongkang Wu, Lili Mou,

Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks. However, the transferability of these prompts, especially continuous prompts, between different models remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method, where source prompts are encoded into a relative space and the corresponding target prompts are searched for transferring to target models. Experimental results confirm the effectiveness of our method, showing that 'task semantics' in continuous prompts can be generalized across various language models. Moreover, we find that combining 'task semantics' from multiple source models can further enhance the generalizability of transfer.
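One common way to build a relative space is to describe each vector by its cosine similarity to a shared set of anchor vectors. The sketch below assumes that construction; the abstract does not spell out the encoding, so the anchor choice and function names are hypothetical, and the search for target prompts is not shown.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def to_relative(vector: Sequence[float], anchors: Sequence[Sequence[float]]) -> List[float]:
    """Encode a vector as its cosine similarities to the anchor vectors."""
    return [cosine(vector, anchor) for anchor in anchors]


def rotate_and_scale(v: Sequence[float]) -> List[float]:
    """Toy 'second model': the same 2-D space rotated by 90 degrees and scaled by 2."""
    x, y = v
    return [-2.0 * y, 2.0 * x]


anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt = [2.0, 1.0]
relative_in_source = to_relative(prompt, anchors)
relative_in_target = to_relative(rotate_and_scale(prompt), [rotate_and_scale(a) for a in anchors])
```

Both encodings coincide even though the raw coordinates differ, which illustrates why a relative space can serve as common ground between models whose embedding spaces are not aligned.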

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology

Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology ( http://arxiv.org/abs/2310.05227v5 )

License: see the linked page

Qingsong Xu, Yilei Shi, Jonathan Bamber, Ye Tuo, Ralf Ludwig, Xiao Xiang Zhu,

Accurate hydrological understanding and water cycle prediction are crucial for addressing scientific and societal challenges associated with the management of water resources, particularly under the dynamic influence of anthropogenic climate change. Existing reviews predominantly concentrate on the development of machine learning (ML) in this field, yet there is a clear distinction between hydrology and ML as separate paradigms. Here, we introduce physics-aware ML as a transformative approach to overcome the perceived barrier and revolutionize both fields. Specifically, we present a comprehensive review of physics-aware ML methods, building a structured community (PaML) of existing methodologies that integrate prior physical knowledge or physics-based modeling into ML. We systematically analyze these PaML methodologies with respect to four aspects: physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning. PaML facilitates ML-aided hypotheses, accelerating insights from big data and fostering scientific discoveries. We first conduct a systematic review of hydrology in PaML, including rainfall-runoff hydrological processes and hydrodynamic processes, and highlight the most promising and challenging directions for different objectives and PaML methods. Finally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications. HydroPML enhances the explainability and causality of ML and lays the groundwork for the digital water cycle's realization. The HydroPML platform is publicly available at https://hydropml.github.io/.

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# HiFi-123: Towards High-fidelity One Image to 3D Content Generation

HiFi-123: Towards High-fidelity One Image to 3D Content Generation ( http://arxiv.org/abs/2310.06744v3 )

License: see the linked page

Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, Wenbo Hu, Long Quan, Ying Shan, Yonghong Tian,

Recent advances in diffusion models have enabled 3D generation from a single image. However, current methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold. First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods. Second, capitalizing on RGNV, we present a novel Reference-Guided State Distillation (RGSD) loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive evaluations demonstrate the effectiveness of our approach over existing methods, both qualitatively and quantitatively. Video results are available on the project page.

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# Detecting Visual Cues in the Intensive Care Unit and Association with Patient Clinical Status

Detecting Visual Cues in the Intensive Care Unit and Association with Patient Clinical Status ( http://arxiv.org/abs/2311.00565v2 )

License: see the linked page

Subhash Nerella, Ziyuan Guan, Andrea Davidson, Yuanfang Ren, Tezcan Baslanti, Brooke Armfield, Patrick Tighe, Azra Bihorac, Parisa Rashidi,

Intensive Care Units (ICUs) provide close supervision and continuous care to patients with life-threatening conditions. However, continuous patient assessment in the ICU is still limited due to time constraints and the workload on healthcare providers. Existing patient assessments in the ICU, such as pain or mobility assessment, are mostly sporadic and administered manually, thus introducing the potential for human errors. Developing artificial intelligence (AI) tools that can augment human assessments in the ICU can be beneficial for providing more objective and granular monitoring capabilities. For example, capturing the variations in a patient's facial cues related to pain or agitation can help in adjusting pain-related medications or detecting agitation-inducing conditions such as delirium. Additionally, subtle changes in visual cues during or prior to adverse clinical events could potentially aid in continuous patient monitoring when combined with high-resolution physiological signals and Electronic Health Record (EHR) data. In this paper, we examined the association between visual cues and patient condition, including acuity status, acute brain dysfunction, and pain. We leveraged our AU-ICU dataset with 107,064 frames collected in the ICU and annotated with facial action unit (AU) labels by trained annotators. We developed a new "masked loss computation" technique that addresses the data imbalance problem by maximizing data resource utilization. We trained the model using our AU-ICU dataset in conjunction with three external datasets to detect 18 AUs. The SWIN Transformer model achieved 0.57 mean F1-score and 0.89 mean accuracy on the test set. Additionally, we performed AU inference on 634,054 frames to evaluate the association between facial AUs and clinically important patient conditions such as acuity status, acute brain dysfunction, and pain.
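The abstract does not describe how the "masked loss computation" works. One plausible reading, shown here purely as an assumption, is that when datasets annotate different subsets of the 18 AUs, the loss is averaged only over the AUs that a given frame actually has labels for, so partially labeled frames can still be used. The function name and the choice of binary cross-entropy are hypothetical.

```python
import math
from typing import Optional, Sequence


def masked_bce(
    probs: Sequence[float],
    labels: Sequence[Optional[int]],
    mask: Sequence[bool],
) -> float:
    """Mean binary cross-entropy over the entries whose mask is True.

    probs:  predicted probability per action unit, strictly between 0 and 1
    labels: 0/1 label per action unit, or None where the dataset has no annotation
    mask:   True where a label exists and should contribute to the loss
    """
    total, count = 0.0, 0
    for p, y, keep in zip(probs, labels, mask):
        if not keep:
            continue  # unlabeled action unit: contributes neither loss nor gradient
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total / count if count else 0.0


loss = masked_bce(probs=[0.9, 0.2, 0.5], labels=[1, 0, None], mask=[True, True, False])
```

Here the third AU is unlabeled in the source dataset, so the loss is the mean over the first two entries only.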

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# On the estimation of the number of components in multivariate functional principal component analysis

On the estimation of the number of components in multivariate functional principal component analysis ( http://arxiv.org/abs/2311.04540v2 )

License: see the linked page

Steven Golovkine, Edward Gunning, Andrew J. Simpkin, Norma Bargary,

Happ and Greven (2018) developed a methodology for principal components analysis of multivariate functional data for data observed on different dimensional domains. Their approach relies on an estimation of univariate functional principal components for each univariate functional feature. In this paper, we present extensive simulations to investigate choosing the number of principal components to retain. We show empirically that the conventional approach of using a percentage of variance explained threshold for each univariate functional feature may be unreliable when aiming to explain an overall percentage of variance in the multivariate functional data, and thus we advise practitioners to be careful when using it.
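The conventional rule discussed above, keeping the smallest number of components whose cumulative share of variance reaches a threshold, can be sketched as follows. The function name and the toy eigenvalues are illustrative only.

```python
from typing import Sequence


def n_components_for_threshold(eigenvalues: Sequence[float], threshold: float) -> int:
    """Smallest number of leading components explaining at least `threshold` of the variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Eigenvalues of one univariate functional feature (toy numbers).
kept_75 = n_components_for_threshold([5.0, 3.0, 1.0, 1.0], threshold=0.75)
kept_85 = n_components_for_threshold([5.0, 3.0, 1.0, 1.0], threshold=0.85)
```

The abstract's caution is that applying such a threshold to each univariate feature separately does not guarantee the same share of variance in the combined multivariate decomposition, so the multivariate eigenvalues should be checked as well.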

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# A Spin-Optical Quantum Computing Architecture

A Spin-Optical Quantum Computing Architecture ( http://arxiv.org/abs/2311.05605v4 )

License: see the linked page

Grégoire de Gliniasty, Paul Hilaire, Pierre-Emmanuel Emeriau, Stephen C. Wein, Alexia Salavrakos, Shane Mansfield,

We introduce an adaptable and modular hybrid architecture designed for fault-tolerant quantum computing. It combines quantum emitters and linear-optical entangling gates to leverage the strengths of both matter-based and photonic-based approaches. A key feature of the architecture is its practicality, grounded in the utilisation of experimentally proven optical components. Our framework enables the execution of any quantum error correcting code, but in particular maintains scalability for low-density parity check codes by exploiting built-in non-local connectivity through distant optical links. To gauge its efficiency, we evaluated the architecture using a physically motivated error model. It exhibits loss tolerance comparable to existing all-photonic architectures but without the need for intricate linear-optical resource-state-generation modules that conventionally rely on resource-intensive multiplexing. The versatility of the architecture also offers uncharted avenues for further advancing performance standards.

Translated: 2024-07-16 05:46:55; Published: 2024-07-12

# Maximum expectation of observables with restricted purity states

Maximum expectation of observables with restricted purity states ( http://arxiv.org/abs/2311.07680v2 )

License: see the linked page

Vikesh Siddhu, John Smolin,

Assessment of practical quantum information processing (QIP) remains partial without understanding limits imposed by noise. Unfortunately, mere description of noise grows exponentially with system size, becoming cumbersome even for modest-sized systems of imminent practical interest. We fulfill the need for estimates on performing noisy quantum state preparation, verification, and observation. To do the estimation we propose fast numerical algorithms to maximize the expectation value of any $d$-dimensional observable over states of bounded purity. This bound on purity factors in noise in a measurable way. Our fastest algorithm takes $O(d)$ steps if the eigendecomposition of the observable is known, and otherwise takes $O(d^3)$ steps at worst. The algorithms also solve maximum likelihood estimation for quantum state tomography with convex and even non-convex purity constraints. Numerics show that the performance of our key sub-routine (it finds in linear time a probability vector with bounded norm that most overlaps with a fixed vector) can be several orders of magnitude faster than a common state-of-the-art convex optimization solver. Our work fosters a practical way forward to assess limitations on QIP imposed by quantum noise. Along the way, we also give a simple but fundamental insight: noisy systems (equivalently noisy Hamiltonians) always give higher ground-state energy than their noiseless counterparts.
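The key sub-routine has a closed form in the easy case, sketched below under stated assumptions: the optimal state is taken to be diagonal in the observable's eigenbasis, so the problem reduces to choosing probabilities $p$ with $\sum_i p_i = 1$ and $\sum_i p_i^2 \le \gamma$, and the non-negativity constraint is assumed inactive. When that last assumption fails the sketch raises an error; the paper's algorithm handles the general case and is not reproduced here. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def max_expectation(eigenvalues: Sequence[float], purity: float) -> Tuple[float, List[float]]:
    """Maximize sum(p_i * eigenvalue_i) subject to sum(p) = 1 and sum(p_i^2) <= purity."""
    d = len(eigenvalues)
    if not 1.0 / d <= purity <= 1.0:
        raise ValueError("purity must lie in [1/d, 1]")
    mean = sum(eigenvalues) / d
    centered = [x - mean for x in eigenvalues]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return mean, [1.0 / d] * d  # observable proportional to identity
    radius = math.sqrt(purity - 1.0 / d)  # allowed distance from the uniform vector
    probs = [1.0 / d + radius * c / norm for c in centered]
    if min(probs) < 0.0:
        raise ValueError("non-negativity is active; this sketch does not handle that case")
    value = sum(p * x for p, x in zip(probs, eigenvalues))
    return value, probs


# Pauli-Z observable (eigenvalues +1 and -1) with purity bound 0.68.
best_value, best_probs = max_expectation([1.0, -1.0], purity=0.68)
```

The idea is to write $p$ as the uniform vector plus a zero-sum perturbation of length at most $\sqrt{\gamma - 1/d}$ and to point that perturbation along the centered eigenvalues. Each step is a single pass over the eigenvalues, consistent with the $O(d)$ cost quoted for a known eigendecomposition.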

翻訳日:2024-07-16 05:46:55 公開日:2024-07-12

# 命令制御可能な要約のための大規模言語モデルのベンチマーク生成と評価能力

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization ( http://arxiv.org/abs/2311.09184v2 )

ライセンス: Link先を確認

Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan,

(参考訳) 大規模言語モデル(LLM)は、標準的な汎用要約ベンチマークでは既に高い性能を達成できるが、より複雑な要約タスク設定における性能はあまり研究されていない。そこで本研究では,命令制御可能なテキスト要約に対してLLMをベンチマークする。このタスクでは,モデル入力はソース記事と,所望の要約特性に関する自然言語による要求の両方から構成される。そのために我々は,このタスク設定のための評価専用データセットをキュレートし,LLMに基づく5つのシステムの人間による評価を行い,制御可能な要約における命令追従能力を評価する。次に、4つの異なる評価プロトコルと11個のLLMを用いて、このタスクのLLMベースの自動評価をベンチマークし、40個の評価方法を得た。本研究は,(1) 評価された全てのLLMは,その要約において事実的および他の種類の誤りを犯していること,(2) 候補要約の質を判断する上で,LLMに基づく評価手法が人間アノテータと強い整合性を達成できないこと,(3) 異なるLLMが要約生成と評価能力において大きなパフォーマンスギャップを示すことから,命令制御可能なテキスト要約がLLMにとって依然として困難な課題であることを明らかにする。収集したベンチマークであるInstruSumを公開して、今後の研究を促進する。

While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for desired summary characteristics. To this end, we curate an evaluation-only dataset for this task setting and conduct human evaluations of five LLM-based systems to assess their instruction-following capabilities in controllable summarization. We then benchmark LLM-based automatic evaluation for this task with 4 different evaluation protocols and 11 LLMs, resulting in 40 evaluation methods. Our study reveals that instruction controllable text summarization remains a challenging task for LLMs, since (1) all LLMs evaluated still make factual and other types of errors in their summaries; (2) no LLM-based evaluation methods can achieve a strong alignment with human annotators when judging the quality of candidate summaries; (3) different LLMs show large performance gaps in summary generation and evaluation capabilities. We make our collected benchmark InstruSum publicly available to facilitate future research in this direction.

翻訳日:2024-07-16 05:46:55 公開日:2024-07-12

# マルチホップQAデータセットと擬似インストラクションチューニングによる大規模言語モデルのロバスト時間推論に向けて

Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning ( http://arxiv.org/abs/2311.09821v2 )

ライセンス: Link先を確認

Qingyu Tan, Hwee Tou Ng, Lidong Bing,

(参考訳) 現実世界の知識は常に更新されている。しかし、大きな言語モデル(LLM)を頻繁に更新するのはコストがかかる。したがって、LLMには時間的知識の概念を理解することが不可欠である。しかし、時間的質問応答 (TQA) に関する先行研究は、時間的推論の複数解答と複数ホップタイプを強調していなかった。本稿では,複数解答と複数ホップの時間的推論に焦点をあてた,複雑な時間的質問応答データセットであるComplex-TRを提案する。また,LLMの複雑な時間的推論能力とロバスト性を改善するために,新たなデータ拡張戦略を提案する。複数の時間的QAデータセットについて実験を行った。実験結果から,本手法は時間的QAベンチマークにおけるLLMの性能をかなりのマージンで向上できることが示された。私たちのコードとデータは、https://github.com/nusnlp/complex-tr でリリースされています。

Knowledge in the real world is being updated constantly. However, it is costly to frequently update large language models (LLMs). Therefore, it is crucial for LLMs to understand the concept of temporal knowledge. However, prior works on temporal question answering (TQA) did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose a complex temporal question-answering dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning. Besides, we also propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conducted experiments on multiple temporal QA datasets. Experimental results show that our method is able to improve LLMs' performance on temporal QA benchmarks by significant margins. Our code and data are released at: https://github.com/nusnlp/complex-tr.

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# 双極子XYリドバーグシミュレータにおけるクエンチダイナミクスからの初等励起の分光

Spectroscopy of elementary excitations from quench dynamics in a dipolar XY Rydberg simulator ( http://arxiv.org/abs/2311.11726v2 )

ライセンス: Link先を確認

Cheng Chen, Gabriel Emperauger, Guillaume Bornet, Filippo Caleca, Bastien Gély, Marcus Bintz, Shubhayu Chatterjee, Vincent Liu, Daniel Barredo, Norman Y. Yao, Thierry Lahaye, Fabio Mezzacapo, Tommaso Roscilde, Antoine Browaeys,

(参考訳) 我々はRydberg量子シミュレータを用いて、多体系の低エネルギー励起を探索するクエンチ分光と呼ばれる新しいタイプの分光法を実証する。本稿では,スピン-1/2双極子XYモデルの二次元シミュレーションについて述べる。クエンチ後の空間スピン相関ダイナミクスの顕微鏡計測により, 強磁性体と反強磁性体の双方に対する基本励起の分散関係を抽出する。相互作用の長距離の性質から生じる2つのケースと反強磁性体に固有のフラストレーションの質的に異なる挙動を観察する。特に、強磁性体は線形スピン波として振る舞う基本的な励起を示す。反強磁性体では、スピン波は崩壊し、強い非線形性の存在が示唆される。実演では,多体系の励起スペクトルにおけるパワー・ロー相互作用の重要性を強調した。

We use a Rydberg quantum simulator to demonstrate a new form of spectroscopy, called quench spectroscopy, which probes the low-energy excitations of a many-body system. We illustrate the method on a two-dimensional simulation of the spin-1/2 dipolar XY model. Through microscopic measurements of the spatial spin correlation dynamics following a quench, we extract the dispersion relation of the elementary excitations for both ferro- and anti-ferromagnetic couplings. We observe qualitatively different behaviors between the two cases that result from the long-range nature of the interactions, and the frustration inherent in the antiferromagnet. In particular, the ferromagnet exhibits elementary excitations behaving as linear spin waves. In the anti-ferromagnet, spin waves appear to decay, suggesting the presence of strong nonlinearities. Our demonstration highlights the importance of power-law interactions on the excitation spectrum of a many-body system.

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# FrePolad: 点雲生成のための周波数補正点潜在拡散

FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation ( http://arxiv.org/abs/2311.12090v2 )

ライセンス: Link先を確認

Chenliang Zhou, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Fogarty, Alejandro Sztrajman, Hongyun Gao, Cengiz Oztireli,

(参考訳) 本稿では,変分オートエンコーダ(VAE)と潜在分布のためのデノイジング拡散確率モデル(DDPM)を統合した点雲生成パイプラインである周波数補正点潜在拡散(FrePolad)を提案する。 FrePoladは、高い計算効率を維持しながら、生成タスクにおいて高い品質、多様性、および点雲の点数に関する柔軟性を同時に達成する。生成品質と多様性の向上は,(1)点雲分布を学習しながら高周波コンテンツを保持するように設計された、球面調和関数による新しい周波数補正,(2)正則化されているが複雑な潜在分布を学習するための潜在DDPMによって達成される。さらに、FrePolad は、点のサンプリングを潜在的な形状分布上の条件付き分布として定式化することによって、可変な点数の点雲をサポートする。最後に、VAEによって符号化された低次元の潜在空間は、FrePoladの高速でスケーラブルなサンプリングに寄与する。我々の定量的および定性的な結果は、品質、多様性、計算効率の点でFrePoladの最先端の性能を示している。プロジェクトページ: https://chenliang-zhou.github.io/FrePolad/

We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad's fast and scalable sampling. Our quantitative and qualitative results demonstrate FrePolad's state-of-the-art performance in terms of quality, diversity, and computational efficiency. Project page: https://chenliang-zhou.github.io/FrePolad/.

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# InstaStyle: スタイライズされた画像の反転ノイズは密かにスタイルアドバイザーである

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser ( http://arxiv.org/abs/2311.15040v3 )

ライセンス: Link先を確認

Xing Cui, Zekun Li, Pei Pei Li, Huaibo Huang, Xuannan Liu, Zhaofeng He,

(参考訳) スティル化されたテキスト・ツー・イメージ生成は、いくつかの参照画像で指定されたスタイルに固執しながら、テキスト記述から画像を作成することに焦点を当てる。しかし、異なる参照画像内の微妙なスタイルの変化は、モデルがターゲットのスタイルを正確に学習することを妨げる。本稿では,単一の参照画像のみを用いて高忠実度スタイリング画像を生成する手法であるInstaStyleを提案する。提案手法は,非ゼロ信号対雑音比で示されるように,スタイリングされた参照画像からの逆ノイズが本質的にスタイル信号を運ぶことに基づく。我々は、DDIMインバージョンを用いて、参照画像からこのノイズを抽出し、拡散モデルを利用して「スタイル」ノイズから新しいスタイル化された画像を生成する。さらに、テキストプロンプトの本来の曖昧さと偏見は、スタイルの正確な伝達を妨げる。そこで本研究では,参照画像のスタイル記述の精度を高めるために,プロンプトリファインメントによる学習可能なスタイルトークンを提案する。定性的かつ定量的な実験結果から、InstaStyleは現在のベンチマークよりも優れた性能を発揮することが示された。さらに,本手法は,混合インバージョンノイズと組み合わせたスタイルの創造的タスクにおいて,その能力を示す。

Stylized text-to-image generation focuses on creating images from textual descriptions while adhering to a style specified by a few reference images. However, subtle style variations within different reference images can hinder the model from accurately learning the target style. In this paper, we propose InstaStyle, a novel approach that excels in generating high-fidelity stylized images with only a single reference image. Our approach is based on the finding that the inversion noise from a stylized reference image inherently carries the style signal, as evidenced by its non-zero signal-to-noise ratio. We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the "style" noise. Additionally, the inherent ambiguity and bias of textual prompts impede the precise conveying of style. To address this, we introduce a learnable style token via prompt refinement, which enhances the accuracy of the style description for the reference image. Qualitative and quantitative experimental results demonstrate that InstaStyle achieves superior performance compared to current benchmarks. Furthermore, our approach also showcases its capability in the creative task of style combination with mixed inversion noise.
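The DDIM inversion step mentioned above can be sketched on a single scalar "pixel". This is a toy illustration of the standard deterministic DDIM update, not the InstaStyle pipeline: the noise prediction `eps` is passed in as a fixed number (in practice it would come from a trained diffusion model), and the cumulative signal coefficients `abar_from` and `abar_to` are hypothetical values.

```python
import math


def ddim_step(x, eps, abar_from, abar_to):
    """One deterministic DDIM update from noise level abar_from to abar_to.

    x    : current sample (a scalar here, for illustration)
    eps  : predicted noise at the current step (supplied by the caller)
    abar : cumulative product of alphas, in (0, 1]; smaller means noisier
    """
    # Estimate of the clean sample implied by x and eps.
    x0 = (x - math.sqrt(1.0 - abar_from) * eps) / math.sqrt(abar_from)
    # Re-noise (or de-noise) that estimate to the target level.
    return math.sqrt(abar_to) * x0 + math.sqrt(1.0 - abar_to) * eps


# Inversion moves toward more noise (abar decreases) ...
x_clean_side = 0.7
eps = 0.3
x_noisy_side = ddim_step(x_clean_side, eps, abar_from=0.9, abar_to=0.5)
# ... and sampling moves back toward less noise (abar increases).
x_back = ddim_step(x_noisy_side, eps, abar_from=0.5, abar_to=0.9)
print(x_noisy_side, x_back)
```

With the same `eps` in both directions the round trip is exact. With a real model the prediction differs slightly between steps, so inversion is only approximate.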

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# 半教師型医用画像セグメンテーションのための交互教育

Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2311.17325v2 )

ライセンス: Link先を確認

Zhen Zhao, Zicheng Wang, Longyue Wang, Dian Yu, Yixuan Yuan, Luping Zhou,

(参考訳) 半教師付き医用画像セグメンテーション研究は、ラベル付きデータに制限のあるトレーニングモデルにおいて有望であることが示されている。しかし、現在の指導学生ベースのアプローチは、確証バイアスに悩まされることがある。この課題に対処するために,教師-学生フレームワークにおける多様な教育手法であるAD-MTを提案する。一人の生徒モデルと2つの訓練不可能な教師モデルがあり、それは定期的に、ランダムに、別の方法で、モーメントを更新する。 AD-MTのコアはRPA (Random Periodic Alternate) Updating Module と Conflict-Combating Module (CCM) の2つの提案されたモジュールにある。 RPAは、異なる教育の観点から多様な推論を促進するために、補完的なデータバッチ、異なるデータ拡張、ランダムな切り替え期間の交互に多様な更新プロセスをスケジュールする。 CCMは、教師間の一貫性と矛盾する予測の両方からモデルを学習するよう促すために、エントロピーに基づくアンサンブル戦略を採用している。各種半教師付き環境における2次元および3次元医用セグメンテーションベンチマークにおけるAD-MTの有効性と優位性を示す実験結果を得た。

Semi-supervised medical image segmentation studies have shown promise in training models with limited labeled data. However, current dominant teacher-student based approaches can suffer from confirmation bias. To address this challenge, we propose AD-MT, an alternate diverse teaching approach in a teacher-student framework. It involves a single student model and two non-trainable teacher models that are momentum-updated periodically and randomly in an alternate fashion. To mitigate confirmation bias through diverse supervision, the core of AD-MT lies in two proposed modules: the Random Periodic Alternate (RPA) Updating Module and the Conflict-Combating Module (CCM). The RPA schedules the alternating diverse updating process with complementary data batches, distinct data augmentation, and random switching periods to encourage diverse reasoning from different teaching perspectives. The CCM employs an entropy-based ensembling strategy to encourage the model to learn from both the consistent and conflicting predictions between the teachers. Experimental results demonstrate the effectiveness and superiority of our AD-MT on the 2D and 3D medical segmentation benchmarks across various semi-supervised settings.
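The alternating momentum update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AD-MT implementation: model weights are plain lists of floats, the momentum update is the usual exponential moving average, and the switching rule (a uniformly random period between `min_period` and `max_period` steps) is a hypothetical stand-in for the RPA schedule. All names are invented for the example.

```python
import random


def ema_update(teacher, student, momentum=0.99):
    """Return teacher weights moved toward the student by an EMA step."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def alternating_schedule(num_steps, min_period, max_period, seed=0):
    """For each training step, pick which of two teachers (0 or 1) is updated.

    The active teacher switches after a randomly drawn number of steps.
    """
    rng = random.Random(seed)
    schedule = []
    active = 0
    while len(schedule) < num_steps:
        period = rng.randint(min_period, max_period)
        schedule.extend([active] * period)
        active = 1 - active
    return schedule[:num_steps]


# Toy run: the student drifts upward and only the active teacher follows it.
student = [0.0, 0.0]
teachers = [[0.0, 0.0], [0.0, 0.0]]
for step, active in enumerate(alternating_schedule(20, 3, 6, seed=1)):
    student = [w + 0.1 for w in student]  # stand-in for a gradient step
    teachers[active] = ema_update(teachers[active], student, momentum=0.9)
print(teachers)
```

Because each teacher is frozen while the other is being updated, the two end up with different weights, which is the source of the diverse supervision the abstract refers to.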

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations

DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations ( http://arxiv.org/abs/2311.17833v3 )

ライセンス: Link先を確認

Maximilian Augustin, Yannic Neuhaus, Matthias Hein,

(参考訳) ディープラーニングは、ImageNetのような複雑な画像分類タスクに大きな進歩をもたらしたが、予期せぬ失敗モード、例えば突発的な機能によって、これらの分類器が野生でいかに確実に機能するかを疑問視する。さらに、安全クリティカルなタスクには、その決定のブラックボックスの性質に問題がある。本稿では、ガイド画像生成のためのフレームワークを用いて、分類器由来の目的を最適化した画像を生成することにより、これらの問題に対処する。視覚的対実的説明(VCE)による画像分類器の決定、分類器が最大に一致しない画像の解析による系統的誤りの検出、ニューロンの可視化と刺激的特徴の可視化を行う。このようにして、敵の頑健なモデルの形状バイアスや新しい故障モード、例えばゼロショットCLIP分類器の系統的エラーなど、既存の観測結果を検証する。さらに、VCEはより汎用性が高く、以前の作業よりも優れています。

While deep learning has led to huge progress in complex image classification tasks like ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably these classifiers work in the wild. Furthermore, for safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently. In this paper, we address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation. We analyze the decisions of image classifiers by visual counterfactual explanations (VCEs), detection of systematic mistakes by analyzing images where classifiers maximally disagree, and visualization of neurons and spurious features. In this way, we validate existing observations, e.g. the shape bias of adversarially robust models, as well as novel failure modes, e.g. systematic errors of zero-shot CLIP classifiers. Moreover, our VCEs outperform previous work while being more versatile.

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# 時間的に一貫した実世界ビデオ超解像のための動き誘導潜在拡散

Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution ( http://arxiv.org/abs/2312.00853v2 )

ライセンス: Link先を確認

Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang,

(参考訳) 現実世界の低解像度(LR)ビデオは多種多様で複雑な劣化があり、高解像度(HR)を高品質に再現するビデオ超解像度(VSR)アルゴリズムに大きな課題を生んでいる。近年,画像復元作業における現実的な細部の生成において,拡散モデルの性能が向上している。しかし、拡散過程はランダム性を持ち、復元された画像の内容を制御することは困難である。この問題は、ビデオの知覚品質に時間的一貫性が不可欠であるため、VSRタスクに拡散モデルを適用する際にさらに深刻になる。本稿では,事前学習した潜伏拡散モデルの強度を利用した実世界のVSRアルゴリズムを提案する。隣接フレーム間のコンテンツ整合性を確保するため、LRビデオの時間的ダイナミクスを利用して、遅延サンプリングパスを動作誘導損失で最適化し、生成したHRビデオがコヒーレントかつ連続的な視覚的流れを維持することを保証する。生成した細部の不連続性をさらに軽減するため、デコーダに時間モジュールを挿入し、革新的なシーケンス指向の損失で微調整する。動き誘導型潜在拡散(MGLD)に基づくVSRアルゴリズムは、実世界のVSRベンチマークデータセットの最先端技術よりもはるかに優れた知覚品質を実現し、提案したモデル設計およびトレーニング戦略の有効性を検証した。

Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, diffusion models have shown compelling performance in generating realistic details for image restoration tasks. However, the diffusion process has randomness, making it hard to control the contents of restored images. This issue becomes more serious when applying diffusion models to VSR tasks because temporal consistency is crucial to the perceptual quality of videos. In this paper, we propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models. To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow. To further mitigate the discontinuity of generated details, we insert a temporal module into the decoder and fine-tune it with an innovative sequence-oriented loss. The proposed motion-guided latent diffusion (MGLD) based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets, validating the effectiveness of the proposed model design and training strategies.

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# 空間占有による人間とシーンのインタラクションの再考

Revisit Human-Scene Interaction via Space Occupancy ( http://arxiv.org/abs/2312.02700v2 )

ライセンス: Link先を確認

Xinpeng Liu, Haowen Hou, Yanchao Yang, Yong-Lu Li, Cewu Lu,

(参考訳) HSI(Human-Scene Interaction)の生成は困難なタスクであり、さまざまな下流タスクにとって重要である。しかし、大きな障害の1つは、その限られたデータスケールである。同時にキャプチャされた人間と3D環境による高品質なデータを取得するのは難しいため、データの多様性と複雑さが制限される。本研究では、シーンとのインタラクションは、抽象的な物理的視点から見れば、本質的にシーンの空間占有とのインタラクションであると論じ、人間-占有相互作用という統一的な新しい視点を導く。純粋な動きシーケンスを、見えないシーン占有と相互作用する人間の記録として扱うことで、動きのみのデータを、大規模なペア化された人間-占有相互作用データベースであるMotion Occupancy Base (MOB)に集約することができる。したがって、高品質なシーンスキャンを伴う、コストのかかるペアのモーション-シーンデータセットの必要性を大幅に軽減することができる。この新たな統一された人間-占有相互作用の視点により、周囲の占有状況から目標状態に到達するための単一のモーションコントローラが提案される。人間の動きを強く制約する複雑な占有配置を持つMOBでトレーニングをすれば、コントローラーは狭いシーンを処理し、通常のリビングルームのような限られた複雑さを持つ一般的なシーンにもうまく一般化することができる。トレーニング用のGT 3Dシーンなしで、静的シーンと動的シーンの両方を含む様々なシナリオにおいて、現実的で安定したHSIモーションを生成できる。このプロジェクトは https://foruck.github.io/occu-page/ で入手できる。

Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks. However, one of the major obstacles is its limited data scale. High-quality data with simultaneously captured human and 3D environments is hard to acquire, resulting in limited data diversity and complexity. In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective, leading us to a unified novel view of Human-Occupancy Interaction. By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database: Motion Occupancy Base (MOB). Thus, the need for costly paired motion-scene datasets with high-quality scene scans can be substantially alleviated. With this new unified view of Human-Occupancy interaction, a single motion controller is proposed to reach the target state given the surrounding occupancy. Once trained on MOB with complex occupancy layouts, which tightly constrain human movement, the controller can handle cramped scenes and generalize well to general scenes with limited complexity like regular living rooms. With no GT 3D scenes for training, our method can generate realistic and stable HSI motions in diverse scenarios, including both static and dynamic scenes. The project is available at https://foruck.github.io/occu-page/.

翻訳日:2024-07-16 05:37:11 公開日:2024-07-12

# 高等教育におけるジェネレーティブAI:大学政策・資源・ガイドラインを通してChatGPTを見る

Generative AI in Higher Education: Seeing ChatGPT Through Universities' Policies, Resources, and Guidelines ( http://arxiv.org/abs/2312.05235v3 )

ライセンス: Link先を確認

Hui Wang, Anh Dang, Zihao Wu, Son Mac,

(参考訳) ジェネレーティブ・人工知能(GenAI)の進歩は、教育経験を豊かにする機会を提供するだけでなく、学術的完全性への懸念も引き起こす。多くの教育者は、GenAIを教育実践に取り入れることに対する不安とためらいを表明しており、彼らの教室にGenAIを効果的に組み込むために支援できる制度からの勧告や指導を必要としている。本研究は、高等教育者のニーズに応えるため、大学や教育者が、GenAIの利用、特にChatGPTに関する米国トップクラスの大学が確立した学術政策やガイドラインを分析し、学術的文脈におけるGenAIの発展にどのように対応し、適応するかを検討することを目的とする。データソースには、米国内の上位100大学が提供する学術的方針、声明、ガイドライン、関連するリソースが含まれており、これらの大学の大半は、GenAIに対してオープンだが慎重なアプローチを採用していることを示している。主な関心事は、倫理的利用、正確性、データのプライバシーである。ほとんどの大学は、シラバステンプレート、ワークショップ、共有記事、一般的な技術紹介、倫理的関心事、教育的応用、予防戦略、データプライバシー、制限、探偵ツールなど、様々な種類のリソースを積極的に対応し提供しています。この発見は、教育実践における教育者への実践的な教育的意味を4つ与えている: その存在を受け入れ、学習目的と使用を一致させ、誤用を防ぐためのカリキュラムを進化させ、AI検出器に頼るのではなく、多面的評価戦略を採用する。政策立案における教育者には2つの推奨事項が提案される: 規律固有の政策とガイドラインを確立し、機密情報を慎重に管理する。

The advancements in Generative Artificial Intelligence (GenAI) provide opportunities to enrich educational experiences, but also raise concerns about academic integrity. Many educators have expressed anxiety and hesitation in integrating GenAI in their teaching practices, and are in need of recommendations and guidance from their institutions that can support them to incorporate GenAI in their classrooms effectively. To respond to higher educators' needs, this study aims to explore how universities and educators respond and adapt to the development of GenAI in their academic contexts by analyzing academic policies and guidelines established by top-ranked U.S. universities regarding the use of GenAI, especially ChatGPT. Data sources include academic policies, statements, guidelines, and relevant resources provided by the top 100 universities in the U.S. Results show that the majority of these universities adopt an open but cautious approach towards GenAI. Primary concerns lie in ethical usage, accuracy, and data privacy. Most universities actively respond and provide diverse types of resources, such as syllabus templates, workshops, shared articles, and one-on-one consultations focusing on a range of topics: general technical introduction, ethical concerns, pedagogical applications, preventive strategies, data privacy, limitations, and detection tools. The findings provide four practical pedagogical implications for educators in teaching practices: accept its presence, align its use with learning objectives, evolve curriculum to prevent misuse, and adopt multifaceted evaluation strategies rather than relying on AI detectors. Two recommendations are suggested for educators in policy making: establish discipline-specific policies and guidelines, and manage sensitive information carefully.

翻訳日:2024-07-16 05:37:10 公開日:2024-07-12

# Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0

Internet of Federated Digital Twins (IoFDT): Connecting Twins Beyond Borders for Society 5.0 ( http://arxiv.org/abs/2312.06432v2 )

ライセンス: Link先を確認

Tao Yu, Zongdian Li, Kei Sakaguchi, Omar Hashash, Walid Saad, Merouane Debbah,

(参考訳) 物理的システムのプログラム可能なデジタル表現を可能にするデジタルツイン(DT)の概念は、将来の産業に革命をもたらすものと期待され、将来のスマート社会、すなわち社会5.0のビジョンの中心に位置する。しかし、このようなDT駆動のSociety 5.0の成功は、人工知能とネットワーク技術の相乗的収束を必要とする。これまでの研究は定性的な研究、単純な分析、単一DTのソフトウェア実装に限られていたため、Society 5.0が要求するデジタル空間と物理空間の高度にシナジスティックな統合は提供できない。これとは対照的に,本稿では,異なる社会5.0サービスを表す異種・物理的に分離されたDTを,単一のフレームワークとシステムに一元的に統合する,インターネット・オブ・フェデレーション・デジタル・ツインズ(IoFDT)の新たな概念を構想する。 IoFDTのこの概念のために、我々はまず、水平と垂直の相互作用を通じてフェデレーションされたDTを統合する階層的アーキテクチャを導入し、サイバー空間と物理空間をブリッジして、新たな可能性を開く。そして、IoFDTを実現する上での課題について議論し、通信、コンピューティング、AIネイティブネットワーク間の複雑さを強調しながら、潜在的な革新的なソリューションを強調します。その後、我々は、すべての技術コンポーネントを統合し、それらの相互作用を編成する統合IoFDTプラットフォームの実装の重要性を詳述し、スマートモビリティのような分野における実世界のアプリケーションに焦点を当てた実践的なプラットフォームの必要性を強調した。

The concept of digital twin (DT), which enables the creation of a programmable, digital representation of physical systems, is expected to revolutionize future industries and will lie at the heart of the vision of a future smart society, namely, Society 5.0, in which high integration between cyber (digital) and physical spaces is exploited to bring economic and societal advancements. However, the success of such a DT-driven Society 5.0 requires a synergistic convergence of artificial intelligence and networking technologies into an integrated, programmable system that can coordinate DT networks to effectively deliver diverse Society 5.0 services. Prior works remain restricted to either qualitative study, simple analysis or software implementations of a single DT, and thus, they cannot provide the highly synergistic integration of digital and physical spaces as required by Society 5.0. In contrast, this paper envisions a novel concept of an Internet of Federated Digital Twins (IoFDT) that holistically integrates heterogeneous and physically separated DTs representing different Society 5.0 services within a single framework and system. For this concept of IoFDT, we first introduce a hierarchical architecture that integrates federated DTs through horizontal and vertical interactions, bridging cyber and physical spaces to unlock new possibilities. Then, we discuss challenges of realizing IoFDT, highlighting the intricacies across communication, computing, and AI-native networks while also underscoring potential innovative solutions. Subsequently, we elaborate on the importance of the implementation of a unified IoFDT platform that integrates all technical components and orchestrates their interactions, emphasizing the necessity of practical experimental platforms with a focus on real-world applications in areas like smart mobility.

翻訳日:2024-07-16 05:37:10 公開日:2024-07-12

# 潜在回廊による適応的人軌道予測

Adaptive Human Trajectory Prediction via Latent Corridors ( http://arxiv.org/abs/2312.06653v2 )

ライセンス: Link先を確認

Neerja Thakkar, Karttikeya Mangalam, Andrea Bajcsy, Jitendra Malik,

(参考訳) 人間の軌道予測は、通常ゼロショットの一般化問題として提起される:予測器はトレーニングシーンで人間の動きのデータセットで学習され、見当たらないテストシーンに展開される。このパラダイムは、非常に進歩していますが、デプロイメントシーンにおける人間の振る舞いの傾向は、時間とともに一定であると、基本的に仮定しています。このように、現在の予測モデルは、一時的に集まる群衆や、雨の中を急いでいる歩行者、水たまりを避けたり、抗議活動など、シーン固有の一時的な人間の行動に適応できない。本稿では,シーン固有の適応軌道予測の問題を形式化し,潜時廊下と呼ばれる即時チューニングにヒントを得た新しい適応手法を提案する。学習可能な画像プロンプトで事前訓練された人間の軌道予測器の入力を増大させることで、極めて少ない新しいデータ(例えば、30秒間観察された2人の人間)からトレンドを推測することで、配置シーンを改善することができる。 0.1%の追加モデルパラメータでは、MOTSynthのシミュレーションデータの改善が23.9%、MOTおよびWildtrackにおけるADEが16.4%となる。定性的には,非適応予測器が捕捉に苦慮するシーン幾何学とシーン固有の人間の行動に意識を抱く潜伏廊下が予測器に現れるのを観察する。プロジェクトのWebサイトはhttps://neerja.me/atp_latent_corridors/にある。

Human trajectory prediction is typically posed as a zero-shot generalization problem: a predictor is learnt on a dataset of human motion in training scenes, and then deployed on unseen test scenes. While this paradigm has yielded tremendous progress, it fundamentally assumes that trends in human behavior within the deployment scene are constant over time. As such, current prediction models are unable to adapt to scene-specific transient human behaviors, such as crowds temporarily gathering to see buskers, pedestrians hurrying through the rain and avoiding puddles, or a protest breaking out. We formalize the problem of scene-specific adaptive trajectory prediction and propose a new adaptation approach inspired by prompt tuning called latent corridors. By augmenting the input of any pre-trained human trajectory predictor with learnable image prompts, the predictor can improve in the deployment scene by inferring trends from extremely small amounts of new data (e.g., 2 humans observed for 30 seconds). With less than 0.1% additional model parameters, we see up to 23.9% ADE improvement in MOTSynth simulated data and 16.4% ADE in MOT and Wildtrack real pedestrian data. Qualitatively, we observe that latent corridors imbue predictors with an awareness of scene geometry and scene-specific human behaviors that non-adaptive predictors struggle to capture. The project website can be found at https://neerja.me/atp_latent_corridors/.
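The ADE figures quoted above refer to average displacement error. A minimal sketch of the metric follows, assuming the common definition (mean Euclidean distance between predicted and ground-truth positions over the prediction horizon) and assuming the reported percentages are relative reductions of that error. Both function names are hypothetical.

```python
import math


def average_displacement_error(predicted, truth):
    """Mean Euclidean distance between matching points of two trajectories."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(predicted, truth)) / len(predicted)


def relative_improvement(baseline_error, adapted_error):
    """Fraction by which the adapted predictor reduces the baseline error."""
    return (baseline_error - adapted_error) / baseline_error


truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
baseline = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
adapted = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
base_ade = average_displacement_error(baseline, truth)
new_ade = average_displacement_error(adapted, truth)
print(base_ade, new_ade, relative_improvement(base_ade, new_ade))
```

The toy trajectories are invented for illustration and are unrelated to the MOTSynth, MOT, or Wildtrack data.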

翻訳日:2024-07-16 05:37:10 公開日:2024-07-12

# 3DReact: 化学反応のための幾何学的深層学習

3DReact: Geometric deep learning for chemical reactions ( http://arxiv.org/abs/2312.08307v2 )

ライセンス: Link先を確認

Puck van Gerwen, Ksenia R. Briling, Charlotte Bunne, Vignesh Ram Somnath, Ruben Laplaza, Andreas Krause, Clemence Corminboeuf,

(参考訳) ニューラルネットワークアーキテクチャに関連する分子対称性を組み込んだ幾何学的ディープラーニングモデルは、分子特性の予測の精度とデータ効率を大幅に改善した。この成功に基づいて,反応物と生成物の三次元構造から反応特性を予測する幾何学的深層学習モデルである3DReactを導入する。モデルの不変バージョンが既存の反応データセットに十分であることを示す。本稿では,GDB7-22-TS,Cyclo-23-TS,Proparg-21-TSの各データセットにおける、異なる原子マッピング条件下での活性化障壁の予測における競合性能について述べる。反応特性予測の既存のモデルと比較して、3DReactは、利用可能であれば原子マッピング情報を活用し、さらに反応物と生成物のジオメトリを(不変または同変の方法で)活用する柔軟なフレームワークを提供する。したがって、異なるデータセット、原子マッピング条件、および内挿と外挿の両方のタスクにわたって一貫して良好な性能を示す。

Geometric deep learning models, which incorporate the relevant molecular symmetries within the neural network architecture, have considerably improved the accuracy and data efficiency of predictions of molecular properties. Building on this success, we introduce 3DReact, a geometric deep learning model to predict reaction properties from three-dimensional structures of reactants and products. We demonstrate that the invariant version of the model is sufficient for existing reaction datasets. We illustrate its competitive performance on the prediction of activation barriers on the GDB7-22-TS, Cyclo-23-TS and Proparg-21-TS datasets in different atom-mapping regimes. We show that, compared to existing models for reaction property prediction, 3DReact offers a flexible framework that exploits atom-mapping information, if available, as well as geometries of reactants and products (in an invariant or equivariant fashion). Accordingly, it performs systematically well across different datasets, atom-mapping regimes, as well as both interpolation and extrapolation tasks.

翻訳日:2024-07-16 05:37:10 公開日:2024-07-12

# 低エネルギー部分空間におけるディジタル量子シミュレーションの複雑さ:応用と下界

Complexity of Digital Quantum Simulation in the Low-Energy Subspace: Applications and a Lower Bound ( http://arxiv.org/abs/2312.08867v3 )

ライセンス: Link先を確認

Weiyuan Gong, Shuo Zhou, Tongyang Li,

(参考訳) デジタル量子シミュレーションは、ハミルトニアンのユニタリ進化の近似に広く応用されている。実際、量子系の多くのシミュレーションタスクはヒルベルト空間全体ではなく低エネルギー部分空間の量子状態に焦点を当てている。本稿では,低エネルギー部分空間の積公式に基づいて,ディジタル量子シミュレーションの複雑さを系統的に検討する。シミュレーション誤差は、様々なデジタル量子シミュレーションアルゴリズムや量子システムにおいて、ハミルトニアンの有効な低エネルギーノルムに依存しており、熱化による不完全な状態の準備であっても、完全なユニタリシミュレーションの以前の複雑さよりも改善できることが示される。特に、低エネルギー部分空間におけるスピンモデルをシミュレートするためには、qDRIFTやランダムな置換のようなランダム化された積公式がより小さなトロッター数を必要とすることを証明する。このような改善は対称性に保護されたデジタル量子シミュレーションでも継続する。我々は、パワーロー量子相互作用の力学をシミュレートする上で、同様の改善を証明した。また、低エネルギー部分空間における一般ディジタル量子シミュレーションのためのクエリローバウンドを提供する。

Digital quantum simulation has broad applications in approximating unitary evolution of Hamiltonians. In practice, many simulation tasks for quantum systems focus on quantum states in the low-energy subspace instead of the entire Hilbert space. In this paper, we systematically investigate the complexity of digital quantum simulation based on product formulas in the low-energy subspace. We show that the simulation error depends on the effective low-energy norm of the Hamiltonian for a variety of digital quantum simulation algorithms and quantum systems, allowing improvements over the previous complexities for full unitary simulations even for imperfect state preparations due to thermalization. In particular, for simulating spin models in the low-energy subspace, we prove that randomized product formulas such as qDRIFT and random permutation require smaller Trotter numbers. Such improvement also persists in symmetry-protected digital quantum simulations. We prove a similar improvement in simulating the dynamics of power-law quantum interactions. We also provide a query lower bound for general digital quantum simulations in the low-energy subspace.
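The randomized product formula qDRIFT named above can be sketched at the level of its sampling rule. This follows the standard qDRIFT recipe as commonly stated (sample each term with probability proportional to the magnitude of its coefficient, then evolve under it for a fixed small angle) and does not reproduce the paper's low-energy analysis. The Hamiltonian is represented only by its coefficients `coeffs`, the terms are assumed to be normalized operators, and the function name is hypothetical.

```python
import math
import random


def qdrift_sequence(coeffs, total_time, num_samples, seed=0):
    """Sample a qDRIFT gate sequence for H = sum_j coeffs[j] * H_j.

    Returns a list of (term_index, angle) pairs; step i stands for applying
    exp(-1j * angle * H_j) with the sampled term H_j.
    """
    lam = sum(abs(h) for h in coeffs)
    if lam == 0.0:
        raise ValueError("at least one coefficient must be non-zero")
    rng = random.Random(seed)
    weights = [abs(h) / lam for h in coeffs]
    # Every sampled step uses the same magnitude lam * t / N.
    tau = lam * total_time / num_samples
    indices = rng.choices(range(len(coeffs)), weights=weights, k=num_samples)
    # The sign of the coefficient sets the direction of the rotation.
    return [(j, math.copysign(tau, coeffs[j])) for j in indices]


for index, angle in qdrift_sequence([1.0, -0.5, 0.0], total_time=2.0, num_samples=8):
    print(index, angle)
```

Here `num_samples` plays the role of the Trotter number discussed in the abstract: the claim there is that fewer samples suffice when the initial state lies in the low-energy subspace.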

翻訳日:2024-07-16 05:37:10 公開日:2024-07-12

# 政策学習のための任意の軌道モデリング

Any-point Trajectory Modeling for Policy Learning ( http://arxiv.org/abs/2401.00025v3 )

ライセンス: Link先を確認

Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel,

(参考訳) デモから学ぶことはロボットに新しいスキルを教える強力な方法であり、より多くのデモデータを持つことでポリシー学習が向上することが多い。しかし、デモデータを収集するコストが高いことは、重大なボトルネックである。ビデオは、リッチなデータソースとして、行動、物理、意味に関する知識を含んでいるが、アクションラベルの欠如により、それらから制御固有の情報を抽出することは困難である。本研究では、ビデオフレーム内の任意の点の将来の軌跡を予測するために、トラジェクトリモデルを事前学習することで、ビデオデモを利用する新しいフレームワーク、Any-point Trajectory Modeling (ATM)を導入する。トレーニングが完了すると、これらのトラジェクトリは詳細な制御ガイダンスを提供し、最小のアクションラベル付きデータによる堅牢な視覚運動ポリシーの学習を可能にする。シミュレーションと実世界の両方で評価した130以上の言語条件タスクにおいて、ATMは強力なビデオ事前学習ベースラインを平均80%上回っている。さらに,人間のビデオや異なるロボット形態のビデオから操作スキルを効果的に転移学習できることを示す。ビジュアライゼーションとコードは https://xingyu-lin.github.io/atm で入手できる。

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology. Visualizations and code are available at: https://xingyu-lin.github.io/atm.

翻訳日:2024-07-16 05:37:10 公開日:2024-07-12

# ニューラル計算のための新しいパラダイム:学習可能なニューロンと適応可能な構造を持つXネット

A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure ( http://arxiv.org/abs/2401.01772v2 )

ライセンス: Link先を確認

Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jinyi Liu, Wenqiang Li, Meilan Hao, Shu Wei, Yusong Deng, Liping Zhang, Xiaoli Dong, Hong Qin, Xin Ning, Yugui Zhang, Baoli Lu, Jian Xu, Shuang Li,

(参考訳) 多層パーセプトロン(MLP)は、バイオインフォマティクスから金融分析まで、様々な分野に浸透し、現代の科学研究に欠かせない存在となっている。しかし、MLPには明らかな欠点がある。 1) アクティベーション関数のタイプは単一かつ比較的固定的であり,ネットワークの「表現能力」が低下し,単純な問題を複雑なネットワークで解くことが多い。 2) ネットワーク構造が適応的でなく,ネットワーク構造が冗長あるいは不十分になりやすい。本研究では,MLPを置き換えることを約束する新しいニューラルネットワークパラダイムX-Netを提案する。 X-Netは、訓練中の微分情報に基づいて個別にアクティベーション関数を動的に学習し、特定のタスクに対するネットワークの表現能力を改善する。同時に、X-Netはニューロンレベルでネットワーク構造を正確に調整し、様々な複雑さのタスクに対応し、計算コストを削減できる。 X-Net は表現能力において MLP よりも優れていることを示す。 X-Netは、回帰や分類タスクにおいて、はるかに少ないパラメータでMLPと同等またはそれ以上の性能を達成することができる。具体的には、パラメータの数に関して言えば、X-Netは平均でMLPの3%しかなく、一部のタスクでは1.1%しか持たない。我々はまた、X-Netがエネルギー、環境、航空宇宙といった様々な分野のデータに対して科学的発見を行う能力を示し、X-Netは科学者が新しい数学や物理学の法則を発見する手助けをする。

The multilayer perceptron (MLP) has permeated many disciplines, from bioinformatics to financial analytics, where it has become an indispensable tool of contemporary scientific research. However, the MLP has obvious drawbacks: 1) the type of activation function is single and relatively fixed, which leads to poor "representation ability" of the network, so simple problems often end up being solved with complex networks; 2) the network structure is not adaptive, which easily leaves the network redundant or insufficient. In this work, we propose X-Net, a novel neural network paradigm that promises to replace MLPs. X-Net can dynamically learn activation functions individually based on derivative information during training to improve the network's representational ability for specific tasks. At the same time, X-Net can precisely adjust the network structure at the neuron level to accommodate tasks of varying complexity and reduce computational costs. We show that X-Net outperforms MLPs in terms of representational capability. X-Net can achieve comparable or even better performance than an MLP with far fewer parameters on regression and classification tasks. Specifically, X-Net uses only 3% as many parameters as an MLP on average and only 1.1% on some tasks. We also demonstrate X-Net's ability to perform scientific discovery on data from various disciplines such as energy, environment, and aerospace, where X-Net is shown to help scientists discover new laws of mathematics or physics.

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# いずれにせよ、誰の妻なのか?機械翻訳における同性関係に対する偏見の評価

Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation ( http://arxiv.org/abs/2401.04972v2 )

ライセンス: Link先を確認

Ian Stewart, Rada Mihalcea,

(参考訳) 機械翻訳は、しばしばバイアスのあるデータやアルゴリズムに悩まされる。性規範の偏見は研究されているが、MTシステムが社会関係に関する偏見を符号化しているかどうかについては、例えば「弁護士が妻にキスをした」など、あまり知られていない。 MTシステムにおける同性関係に対するバイアスの程度を,いくつかの名詞・ジェンダー言語(例えばスペイン語)から抽出されたテンプレート文を用いて検討した。 3つの一般的なMTサービスは、同じ性別のエンティティ間の関係に関する文を正確に翻訳することができないことが分かりました。エラー率は文脈によって大きく異なり、高い女性表現の職業を参照する同性文はより低い精度で翻訳される。本研究は,社会関係に関するNLPシステムにおける本質的バイアス評価のケーススタディである。

Machine translation often suffers from biased data and algorithms that can lead to unacceptable errors in system output. While bias in gender norms has been investigated, less is known about whether MT systems encode bias about social relationships, e.g., "the lawyer kissed her wife." We investigate the degree of bias against same-gender relationships in MT systems, using generated template sentences drawn from several noun-gender languages (e.g., Spanish) and comprised of popular occupation nouns. We find that three popular MT services consistently fail to accurately translate sentences concerning relationships between entities of the same gender. The error rate varies considerably based on the context, and same-gender sentences referencing high female-representation occupations are translated with lower accuracy. We provide this work as a case study in the evaluation of intrinsic bias in NLP systems with respect to social relationships.

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# DISTINQT - 将来のモバイルおよびワイヤレスネットワークのためのQoS予測のための分散プライバシ意識学習フレームワーク

DISTINQT: A Distributed Privacy Aware Learning Framework for QoS Prediction for Future Mobile and Wireless Networks ( http://arxiv.org/abs/2401.10158v2 )

ライセンス: Link先を確認

Nikolaos Koursioumpas, Lina Magoula, Ioannis Stavrakakis, Nancy Alonistioti, M. A. Gutierrez-Estevez, Ramin Khalili,

(参考訳) 5Gと6G以上のネットワークは、あるレベルのQuality of Service(QoS)に依存してスムーズな運用を行う、新しくて挑戦的なユースケースとアプリケーションをサポートすることが期待されている。 QoSをタイムリーに予測することは、特に車両通信の場合のように、安全クリティカルな用途において非常に重要である。近年まで、集中型人工知能(AI)ソリューションによってQoS予測が実行されてきたが、多くのプライバシー、計算、運用上の懸念が浮かび上がっている。新たなソリューション(例えば、Split Learning、Federated Learning)が浮上し、データのプライバシを保ちながら、ノード間で複雑さを低減したAIタスクが分散された。しかし、スケーラブルな分散学習アプローチに関しては、将来の無線ネットワークの異質性を考慮して、新たな課題が浮かび上がっている。現在の研究は、QoS予測のための新しいマルチヘッド入力プライバシ対応分散学習フレームワークであるDISTINQTを提案する。我々のフレームワークは、データ型とモデルアーキテクチャの観点から複数の異種ノードをサポートし、それらをまたいだ計算を共有する。これにより、最終QoS予測モデルの堅牢性と一般化能力を高めるために、多様な知識を単独の学習プロセスに組み込むことができる。 DISTINQTはまた、あらゆる生の入力データを、送信前に非常に複雑で圧縮され、不可逆な潜在表現に符号化することで、データのプライバシ保護に貢献する。評価の結果,DISTINQTは集中型よりも統計的に同じ性能を示し,プライバシー保護クレームの有効性が証明された。 DISTINQTは、Tele-Operated Drivingのユースケースで提示された6つの最先端集中型ベースラインソリューションに対して、平均65%の予測誤差の低減を実現している。

Beyond 5G and 6G networks are expected to support new and challenging use cases and applications that depend on a certain level of Quality of Service (QoS) to operate smoothly. Predicting the QoS in a timely manner is of high importance, especially for safety-critical applications such as vehicular communications. Although QoS prediction has until recently been carried out by centralized Artificial Intelligence (AI) solutions, a number of privacy, computational, and operational concerns have emerged. Alternative solutions have surfaced (e.g., Split Learning, Federated Learning), distributing AI tasks of reduced complexity across nodes while preserving the privacy of the data. However, new challenges arise when it comes to scalable distributed learning approaches, given the heterogeneous nature of future wireless networks. The current work proposes DISTINQT, a novel multi-headed input privacy-aware distributed learning framework for QoS prediction. Our framework supports multiple heterogeneous nodes, in terms of data types and model architectures, by sharing computations across them. This enables the incorporation of diverse knowledge into a sole learning process that will enhance the robustness and generalization capabilities of the final QoS prediction model. DISTINQT also contributes to data privacy preservation by encoding any raw input data into highly complex, compressed, and irreversible latent representations before any transmission. Evaluation results showcase that DISTINQT achieves a statistically identical performance compared to its centralized version, while also proving the validity of the privacy preserving claims. DISTINQT manages to achieve a reduction in prediction error of up to 65% on average against six state-of-the-art centralized baseline solutions presented in the Tele-Operated Driving use case.

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# 深層学習に基づく農業推薦システム:多変量気象予報手法

Agricultural Recommendation System based on Deep Learning: A Multivariate Weather Forecasting Approach ( http://arxiv.org/abs/2401.11410v3 )

ライセンス: Link先を確認

Md Zubair, Md. Shahidul Salim, Mehrab Mustafy Rahman, Mohammad Jahid Ibna Basher, Shahin Imran, Iqbal H. Sarker,

(参考訳) 農業は、経済成長を推進し、世界中の人々の食料安全保障を確保する上で、基本的な役割を担っている。労働集約型農業は多くの発展途上国で食糧穀物生産が着実に増加してきたが、豪雨、低温、干ばつなどの悪天候に悩まされることが多い。これらの要因は食料生産を著しく妨げ、世界の食料安全保障に重大なリスクをもたらしている。そこで本研究では,天気予報モデルを用いた環境適応型作物推薦システムを提案する。実施のため、バングラデシュの全領域について検討した。気象予報モデルとして多変量重畳Bi-LSTM(時間分散層を有する3層Bi-LSTM)ネットワークが広く評価されている。提案した気象モデルは、バングラデシュの任意の場所における降水量、気温、湿度、日光量を平均して0.9824と予測でき、他の最先端のLSTMモデルよりも優れている。これらの予測は、実効的な農業決定を生み出す上で、我々のシステムを導く。さらに、我々の本格的なシステムは、農作物を保護するための予防措置を実施できるように、極端な気象状況について農家に警告することができる。最後に、このシステムは、洪水や干ばつに起因した地域での知識に基づく作物提案にも長けている。

Agriculture plays a fundamental role in driving economic growth and ensuring food security for populations around the world. Although labor-intensive agriculture has led to steady increases in food grain production in many developing countries, it is frequently challenged by adverse weather conditions, including heavy rainfall, low temperatures, and drought. These factors substantially hinder food production, posing significant risks to global food security. In order to have a profitable, sustainable, and farmer-friendly agricultural practice, this paper proposes a context-based crop recommendation system powered by a weather forecast model. For implementation purposes, we have considered the whole territory of Bangladesh. With extensive evaluation, the multivariate Stacked Bi-LSTM (three Bi-LSTM layers with a time Distributed layer) Network is employed as the weather forecasting model. The proposed weather model can forecast Rainfall, Temperature, Humidity, and Sunshine for any given location in Bangladesh with an average R-Squared value of 0.9824, and the model outperforms other state-of-the-art LSTM models. These predictions guide our system in generating viable farming decisions. Additionally, our full-fledged system is capable of alerting the farmers about extreme weather conditions so that preventive measures can be undertaken to protect the crops. Finally, the system is also adept at making knowledge-based crop suggestions for flood and drought-prone regions.
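For readers less familiar with the R-squared figure quoted above, the following is a minimal sketch of how the coefficient of determination is computed for one forecast variable. The temperature values are hypothetical and serve only as an illustration; how the abstract's average of 0.9824 is taken across the four variables is not stated here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the forecast.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily temperatures (deg C) and forecasts, for illustration only.
observed = [24.0, 26.5, 25.0, 28.0]
forecast = [24.5, 26.0, 25.5, 27.5]
print(r_squared(observed, forecast))
```

A value of 1.0 means the forecast reproduces the observations exactly; values near 0 mean it does no better than predicting the mean.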

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# グリオーマの病理像解析における人工知能の応用

Applications of artificial intelligence in the analysis of histopathology images of gliomas: a review ( http://arxiv.org/abs/2401.15022v4 )

ライセンス: Link先を確認

Jan-Philipp Redlich, Friedrich Feuerhake, Joachim Weis, Nadine S. Schaadt, Sarah Teuber-Hanselmann, Christoph Buck, Sabine Luttmann, Andrea Eberle, Stefan Nikolin, Arno Appenzeller, Andreas Portmann, André Homeyer,

(参考訳) 近年,グリオーマの診断が複雑化している。人工知能(AI)を用いたグリオーマ組織像の解析は,診断と予後予測を支援する新たな機会を提供する。本レビューでは,研究の現状を概観するため,ヒトグリオーマのホールスライド病理組織画像に対するAIベースの手法を提案した公開済みの83件の研究を調査し,サブタイプ分類(23/83),グレード分類(27/83),分子マーカー予測(20/83),生存予測(29/83)の診断課題を扱った。すべての研究について,方法論的側面と臨床応用性を検討した。現在の研究の焦点は,成人型びまん性グリオーマのヘマトキシリンおよびエオシン染色組織切片の評価であることがわかった。研究の大半 (52/83) は、The Cancer Genome Atlas (TCGA) から公開されている膠芽腫と低悪性度グリオーマのデータセットに基づいており、他のデータセットのみを用いた研究(16/83) やTCGAデータセットに加えて他のデータセットを用いた研究(15/83) は少数である。現在のアプローチは主に20倍(35/83)で組織を分析するために畳み込みニューラルネットワーク(63/83)に依存している。新しい研究分野は、臨床データ、オミクスデータ、磁気共鳴イメージング(29/83)の統合である。これまでのところ、AIベースの手法は有望な成果を上げているが、実際の臨床環境ではまだ使われていない。今後の研究は、高品質で最新の臨床および分子病理アノテーションを持つ大規模で多サイトなデータセットに対するメソッドの独立した検証に焦点をあてて、定期的な適用性を示す必要がある。

In recent years, the diagnosis of gliomas has become increasingly complex. Analysis of glioma histopathology images using artificial intelligence (AI) offers new opportunities to support diagnosis and outcome prediction. To give an overview of the current state of research, this review examines 83 publicly available research studies that have proposed AI-based methods for whole-slide histopathology images of human gliomas, covering the diagnostic tasks of subtyping (23/83), grading (27/83), molecular marker prediction (20/83), and survival prediction (29/83). All studies were reviewed with regard to methodological aspects as well as clinical applicability. It was found that the focus of current research is the assessment of hematoxylin and eosin-stained tissue sections of adult-type diffuse gliomas. The majority of studies (52/83) are based on the publicly available glioblastoma and low-grade glioma datasets from The Cancer Genome Atlas (TCGA) and only a few studies employed other datasets in isolation (16/83) or in addition to the TCGA datasets (15/83). Current approaches mostly rely on convolutional neural networks (63/83) for analyzing tissue at 20x magnification (35/83). A new field of research is the integration of clinical data, omics data, or magnetic resonance imaging (29/83). So far, AI-based methods have achieved promising results, but are not yet used in real clinical settings. Future work should focus on the independent validation of methods on larger, multi-site datasets with high-quality and up-to-date clinical and molecular pathology annotations to demonstrate routine applicability.
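The study counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the abstract; the dictionary labels are shorthand introduced here, and reading 52/83 as "TCGA only" is an inference from the three dataset counts summing to 83.

```python
total_studies = 83

# Dataset usage as reported in the abstract (labels are shorthand, not the authors').
dataset_usage = {
    "TCGA only": 52,
    "other datasets only": 16,
    "TCGA plus other datasets": 15,
}

# Diagnostic tasks as reported in the abstract.
task_counts = {
    "subtyping": 23,
    "grading": 27,
    "molecular marker prediction": 20,
    "survival prediction": 29,
}

# The three dataset categories partition the 83 studies.
assert sum(dataset_usage.values()) == total_studies

# The task counts add up to more than 83, so some studies address several tasks.
print(sum(task_counts.values()))  # 99

for name, count in dataset_usage.items():
    print(f"{name}: {count / total_studies:.1%}")
```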

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# データ組織はバイナリ分類の予測可能性を制限する

Data organization limits the predictability of binary classification ( http://arxiv.org/abs/2401.17036v2 )

ライセンス: Link先を確認

Fei Jing, Zi-Ke Zhang, Yi-Cheng Zhang, Qingpeng Zhang,

(参考訳) データ組織の構造は、特に二分分類タスクにおいて、機械学習アルゴリズムの有効性に大きな影響を与えていると広く認識されている。我々の研究は、与えられたデータセット上のバイナリ分類器の最大ポテンシャルは、データ固有の性質に主に制約されていることを示唆する理論的枠組みを提供する。理論的推論と経験的検証の両面から, 2つの主要な結論に達するために, 標準目的関数, 評価指標, 二項分類器を用いた。まず、実際のデータセットにおける二項分類性能の理論的上限が理論的に達成可能であることを示す。この上界は、学習損失と評価基準の間の計算可能な平衡を表す。第2に、一般的に使用されている3つの評価指標の正確な上界を計算し、その上界は、使用中の分類器とは独立に、データセットの特徴と複雑に結びついているという、上位のテーゼと基本的な均一性を明らかにする。さらに、その後の分析により、二項分類データにおける性能上限とクラス重複レベルとの詳細な関係が明らかになった。この関係は、機能エンジニアリングで使用する最も効果的な機能サブセットをピンポイントするのに役立ちます。

The structure of data organization is widely recognized as having a substantial influence on the efficacy of machine learning algorithms, particularly in binary classification tasks. Our research provides a theoretical framework suggesting that the maximum potential of binary classifiers on a given dataset is primarily constrained by the inherent qualities of the data. Through both theoretical reasoning and empirical examination, we employed standard objective functions, evaluative metrics, and binary classifiers to arrive at two principal conclusions. Firstly, we show that the theoretical upper bound of binary classification performance on actual datasets can be theoretically attained. This upper boundary represents a calculable equilibrium between the learning loss and the metric of evaluation. Secondly, we have computed the precise upper bounds for three commonly used evaluation metrics, uncovering a fundamental uniformity with our overarching thesis: the upper bound is intricately linked to the dataset's characteristics, independent of the classifier in use. Additionally, our subsequent analysis uncovers a detailed relationship between the upper limit of performance and the level of class overlap within the binary classification data. This relationship is instrumental for pinpointing the most effective feature subsets for use in feature engineering.

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# アジャイルソフトウェア開発におけるデータ管理の課題と解決策を探る: 文献レビューと実践者調査

Exploring Data Management Challenges and Solutions in Agile Software Development: A Literature Review and Practitioner Survey ( http://arxiv.org/abs/2402.00462v2 )

ライセンス: Link先を確認

Ahmed Fawzy, Amjed Tahir, Matthias Galster, Peng Liang,

(参考訳) ソフトウェア製品とその開発に関連するデータを管理することは、ソフトウェアプロジェクトやアジャイルチームにとって大きな課題となる。課題には、さまざまなソースからのデータを統合し、継続的な変更と適応の観点からデータ品質を保証することが含まれる。この目的のために、私たちは、アジャイルプロジェクトでデータ管理の課題と潜在的な解決策を体系的に探求することを目指していました。研究の状況を理解するために,系統的な文献レビュー(SLR)を用いた混合手法のアプローチを採用し,実践者を対象にした調査を行った。 SLRでは、データ管理の側面と関連する課題と解決策を特定し分類する45の研究をレビューした。実践者調査では,32名の業界専門家から実践経験とソリューションを抽出し,SLRの知見を補完した。その結果,データ統合プロセスの管理,多種多様なデータの収集,データ収集の自動化,リアルタイム分析要求の達成など,SLRと実践者の双方で報告された主要なデータ管理課題が明らかになった。本研究は,データ管理方針の明確化の必要性,データ管理ツールのトレーニング,アジリティの向上,製品品質の向上,プロジェクト成果の向上といった新たなデータ管理戦略の採用など,実践者や研究者にとっての意義を示すものである。

Managing data related to a software product and its development poses significant challenges for software projects and agile development teams. Challenges include integrating data from diverse sources and ensuring data quality in light of continuous change and adaptation. To this end, we aimed to systematically explore data management challenges and potential solutions in agile projects. We employed a mixed-methods approach, utilizing a systematic literature review (SLR) to understand the state-of-research followed by a survey with practitioners to reflect on the state-of-practice. In the SLR, we reviewed 45 studies in which we identified and categorized data management aspects and the associated challenges and solutions. In the practitioner survey, we captured practical experiences and solutions from 32 industry experts to complement the findings from the SLR. Our findings reveal major data management challenges reported in both the SLR and practitioner survey, such as managing data integration processes, capturing diverse data, automating data collection, and meeting real-time analysis requirements. Based on our findings, we present implications for practitioners and researchers, which include the necessity of developing clear data management policies, training on data management tools, and adopting new data management strategies that enhance agility, improve product quality, and facilitate better project outcomes.

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# 検証回路の再利用による言語モデルの信頼度向上

Increasing Trust in Language Models through the Reuse of Verified Circuits ( http://arxiv.org/abs/2402.02619v8 )

ライセンス: Link先を確認

Philip Quirke, Clement Neo, Fazl Barez,

(参考訳) 言語モデル(LM)は、幅広い予測タスクにますます使われていますが、それらのトレーニングは稀なエッジケースを無視し、信頼性を低下させます。ここでは、タスクアルゴリズムと回路実装を検証し、エッジケースを考慮し、既知の障害モードを含まない、厳格な信頼性基準を定義する。数学的および論理的に定義されたフレームワークを使用して構築すれば、この標準を満たすようにモデルをトレーニングできることが示される。本稿では,n桁整数加算のための自動回帰変換器モデルを完全に検証する。検証されたモジュールの再利用性を示すため、トレーニングされた整数加算モデルをより大きな未学習モデルに挿入し、加算と減算の両方を行うように組み合わせたモデルを訓練する。両タスクの加算回路を広範囲に再利用し,より複雑な減算器モデルの検証を容易にする。本稿では,検証済みのタスクモジュールをLMに挿入することで,モデルの再利用を有効活用し,それらを用いた言語モデルの妥当性と信頼性を向上させる方法について論じる。検証回路の再利用により、言語モデルの安全性に向けた重要なステップであると考えられる、より複雑な複合モデルを検証する労力が削減される。

Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify an auto-regressive transformer model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into a larger untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
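To make the n-digit addition task concrete, here is a minimal reference sketch of digit-wise addition with carry propagation, the kind of task algorithm whose edge cases (such as a carry cascading through every position) a verified model has to handle. This is a plain reference implementation written for illustration, not the circuit the authors found in their transformer; the function name and the digit-list encoding are assumptions.

```python
def add_digits(a, b):
    """Add two equal-length digit lists, most significant digit first.

    Returns a list with one extra leading digit to hold the final carry.
    """
    assert len(a) == len(b)
    result, carry = [], 0
    # Walk from the least significant digit, carrying into the next position.
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(da + db + carry, 10)
        result.append(digit)
    result.append(carry)
    return result[::-1]


# Edge case: a carry that cascades through every position.
print(add_digits([9, 9, 9], [0, 0, 1]))  # [1, 0, 0, 0]
print(add_digits([4, 4, 5], [5, 5, 5]))  # [1, 0, 0, 0]
```

The output always has n+1 digits, so sums without a final carry come back with a leading 0.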

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# ポリノミアル時間におけるReLUニューラルネットワーク近似グローバルオプティマの凸緩和

Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time ( http://arxiv.org/abs/2402.03625v3 )

ライセンス: Link先を確認

Sungyoon Kim, Mert Pilanci,

(参考訳) 本稿では,重み減衰で正則化された2層ReLUネットワークとその凸緩和との間の最適性ギャップについて検討する。トレーニングデータがランダムであれば,n をトレーニングサンプル数として,元の問題と緩和の間の相対的最適性ギャップが $O(\sqrt{\log n})$ の係数で有界であることが示される。単純な応用は、元の非凸問題を対数係数まで解くことが保証される、扱いやすい多項式時間アルゴリズムにつながる。さらに, 緩やかな仮定の下では, 局所勾配法は高い確率で訓練損失の低い点に収束することを示す。我々の結果は既存の結果と比べて指数関数的な改善であり,局所勾配法が有効である理由の理解に新たな光を当てるものである。

In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that local gradient methods converge to a point with low training loss with high probability. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.
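One way to read the stated bound is sketched below. The notation is assumed here for illustration: $p^{*}$ is the optimal value of the original non-convex training problem, $p^{*}_{\mathrm{cvx}}$ is the optimal value of its convex relaxation, and $C$ is an absolute constant. The precise statement and its conditions are in the paper.

```latex
p^{*}_{\mathrm{cvx}} \;\le\; p^{*} \;\le\; C \sqrt{\log n}\; p^{*}_{\mathrm{cvx}}
```

The left inequality holds because a relaxation can only lower the minimum. The right inequality is the claimed guarantee: solving the relaxation loses at most a factor that grows like the square root of $\log n$ in the number of training samples $n$.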

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# Leggett-Garg不等式を用いた単一システムによる認証ランダムネスの生成

Single system based generation of certified randomness using Leggett-Garg inequality ( http://arxiv.org/abs/2402.03712v2 )

ライセンス: Link先を確認

Pingal Pratyush Nath, Debashis Saha, Dipankar Home, Urbasi Sinha,

(参考訳) 我々は、ループホールのないフォトニックアーキテクチャにおいて、Leggett-Gargの不等式違反を利用して、半デバイス非依存の量子乱数生成のための安全なスキームを理論的に定式化し、実験的に示す。生成したランダム性の定量化は、解析的および数値的アプローチによって厳密に推定され、どちらも完全に一致している。 919,118個の真に予測不可能なビットを安全に生成した。これは、単一のシステムの量子性を利用する信頼性の高い乱数生成器の、経験的に便利なクラスへの未探索の道を開く。

We theoretically formulate and experimentally demonstrate a secure scheme for semi-device-independent quantum random number generation by utilizing Leggett-Garg inequality violations, within a loophole-free photonic architecture. The quantification of the generated randomness is rigorously estimated by analytical as well as numerical approaches, both of which are in perfect agreement. We securely generate 919,118 truly unpredictable bits. This opens up an unexplored avenue towards an empirically convenient class of reliable random number generators harnessing the quantumness of single systems.
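For orientation, the simplest textbook form of the Leggett-Garg inequality is given below. It is shown as background only; the specific variant and measurement settings used in the paper may differ. Here $Q = \pm 1$ is a dichotomic observable of a single system measured at times $t_1 < t_2 < t_3$, and $C_{ij} = \langle Q(t_i)\, Q(t_j) \rangle$ are the two-time correlators.

```latex
K_3 \;=\; C_{12} + C_{23} - C_{13} \;\le\; 1
```

Macrorealist descriptions obey this bound, whereas a quantum two-level system can reach $K_3 = 3/2$. A measured violation therefore certifies behaviour of a single system that no such description can reproduce, and this is the resource the scheme draws on for randomness.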

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# 知識集約型文脈における会話アシスタント:LLMとインテントベースシステムの評価

Conversational Assistants in Knowledge-Intensive Contexts: An Evaluation of LLM- versus Intent-based Systems ( http://arxiv.org/abs/2402.04955v2 )

ライセンス: Link先を確認

Samuel Kernan Freire, Chaofan Wang, Evangelos Niforatos,

(参考訳) 会話アシスタント(CA)は、知識管理における人間の労働者を支援している。伝統的に、CAはユーザー意図や会話パターンを事前に定義した特定の方法で応答する。しかし、この剛性は自然言語の多様性をうまく扱えない。近年の自然言語処理,すなわちLarge Language Models (LLMs) の進歩により,CAはテキストから関連情報を抽出し,専門家から情報を取り出すとともに,'hallucinations'のような新たな課題を導入し,より柔軟で人間的な方法で会話することが可能になった。知識管理タスクにLLMを使用する可能性を評価するため,LLMベースのCAを対話効率,ユーザエクスペリエンス,作業負荷,ユーザビリティに関する意図に基づくシステムと比較した。この結果,LLMをベースとしたCAは,インテントベースシステムよりも優れたユーザエクスペリエンス,タスク完了率,ユーザビリティ,認識性能を示し,NLP技術の変更は知識管理の文脈において有益であることが示唆された。

Conversational Assistants (CAs) are increasingly supporting human workers in knowledge management. Traditionally, CAs respond in specific ways to predefined user intents and conversation patterns. However, this rigidity does not handle the diversity of natural language well. Recent advances in natural language processing, namely Large Language Models (LLMs), enable CAs to converse in a more flexible, human-like manner, extracting relevant information from texts and capturing information from expert humans, while introducing new challenges such as "hallucinations". To assess the potential of using LLMs for knowledge management tasks, we conducted a user study comparing an LLM-based CA to an intent-based system regarding interaction efficiency, user experience, workload, and usability. This revealed that LLM-based CAs exhibited better user experience, task completion rate, usability, and perceived performance than intent-based systems, suggesting that switching NLP techniques can be beneficial in the context of knowledge management.

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# GPT-4 構造化ナラティブ・プロンプトを用いたライフイベントの物語生成:検証研究

GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study ( http://arxiv.org/abs/2402.05435v2 )

ライセンス: Link先を確認

Christopher J. Lynch, Erik Jensen, Madison H. Munro, Virginia Zamponi, Joseph Martinez, Kevin O'Brien, Brandon Feldhaus, Katherine Smith, Ann Marie Reinhold, Ross Gore,

(参考訳) 大規模言語モデル(LLM)は、物語の膨大な配列を生成する上で重要な役割を果たす。本研究では,OpenAIのGPT-4を用いて,ゼロショット構造化された物語プロンプトを用いて24,000の物語を生成する。このデータセットから、2,880の物語を手動で分類し、出生、死亡、雇用、解雇の妥当性を評価する。注目すべきは、物語の87.43%が、構造化されたプロンプトの意図を十分に伝えることである。有効かつ無効な物語の識別を自動化するため、分類データセット上で9つの機械学習モデルをトレーニングし、検証する。これらのモデルを利用して分析を拡張し、残りの21,120の物語の分類を予測する。全てのMLモデルは有効な物語を有効に分類するのに優れていたが、無効な物語を無効に分類すると同時に課題を経験した。本研究は, LLMの能力, 限界, 妥当性の研究を前進させるだけでなく, 物語生成や自然言語処理の実用化にも有効である。

Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.
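The sample sizes in the abstract fit together as follows. The sketch uses only the figures quoted above; the count of valid narratives is an approximation derived from the rounded percentage, not a number reported in the abstract.

```python
total_narratives = 24_000
manually_classified = 2_880
valid_share = 0.8743  # 87.43% of the manually classified narratives

# Narratives left for the trained ML models to classify.
remaining = total_narratives - manually_classified
print(remaining)  # 21120

# Approximate number of manually classified narratives judged valid.
approx_valid = round(manually_classified * valid_share)
print(approx_valid)  # 2518
```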

翻訳日:2024-07-16 05:27:26 公開日:2024-07-12

# 顔行動単位検出のための対照的な特徴表現の学習

Learning Contrastive Feature Representations for Facial Action Unit Detection ( http://arxiv.org/abs/2402.06165v2 )

ライセンス: Link先を確認

Ziqiao Shang, Bin Liu, Fengmao Lv, Fei Teng, Tianrui Li,

(参考訳) 顔アクションユニット(AU)検出は、AUが活性化する際の微妙な特徴差を検出するという課題に長年遭遇してきた。既存の手法はしばしばAUのピクセルレベルの情報を符号化することに頼り、余分な情報をエンコードするだけでなく、モデルの複雑さが増し、一般化可能性も制限される。さらに、各AUタイプのクラス不均衡問題や、ノイズや偽AUラベルの存在により、AU検出の精度が負の影響を受ける。本稿では、自己教師付き信号と教師付き信号の両方を組み込んだAU検出を目的とした新しいコントラスト学習フレームワークを導入し、精度の高いAU検出のための識別特徴の学習を向上する。クラス不均衡問題に対処するために、少数派および多数派のサンプルに対するパラメータの更新のステップサイズを調整する負のサンプル再重み付け戦略を用いる。さらに,雑音や偽AUラベルによる課題に対処するために,3種類の正のサンプル対を含むサンプリング手法を用いる。これにより、教師付き信号に自己教師付き信号を注入し、ノイズラベルの悪影響を効果的に軽減することができる。筆者らは,4つの広く利用されているベンチマークデータセット(BP4D, DISFA, GFT, Aff-Wild2)を用いて実験を行った。我々のコードは https://github.com/Ziqiao-Shang/AUNCE で入手できる。

Facial action unit (AU) detection has long encountered the challenge of detecting subtle feature differences when AUs activate. Existing methods often rely on encoding pixel-level information of AUs, which not only encodes additional redundant information but also leads to increased model complexity and limited generalizability. Additionally, the accuracy of AU detection is negatively impacted by the class imbalance issue of each AU type, and the presence of noisy and false AU labels. In this paper, we introduce a novel contrastive learning framework aimed for AU detection that incorporates both self-supervised and supervised signals, thereby enhancing the learning of discriminative features for accurate AU detection. To tackle the class imbalance issue, we employ a negative sample re-weighting strategy that adjusts the step size of updating parameters for minority and majority class samples. Moreover, to address the challenges posed by noisy and false AU labels, we employ a sampling technique that encompasses three distinct types of positive sample pairs. This enables us to inject self-supervised signals into the supervised signal, effectively mitigating the adverse effects of noisy labels. Our experimental assessments, conducted on four widely-utilized benchmark datasets (BP4D, DISFA, GFT and Aff-Wild2), underscore the superior performance of our approach compared to state-of-the-art methods of AU detection. Our code is available at https://github.com/Ziqiao-Shang/AUNCE.

翻訳日:2024-07-16 05:17:24 公開日:2024-07-12

# コンテキストバンドのためのツリーアンサンブル

Tree Ensembles for Contextual Bandits ( http://arxiv.org/abs/2402.06963v2 )

ライセンス: Link先を確認

Hannes Nilsson, Rikard Johansson, Niklas Åkerblom, Morteza Haghir Chehreghani,

(参考訳) 木アンサンブルに基づくコンテキスト型マルチアームバンディットのための新しいフレームワークを提案する。本フレームワークは,標準設定と組合せ設定の両方に,アッパー信頼境界とトンプソンサンプリングという2つの広範に使用されている帯域幅法を統合している。我々は,XGBoostとランダム林を併用したいくつかの実験により,本フレームワークの有効性を実証した。提案手法は,決定木やニューラルネットワークに基づく最先端の手法と比較して,ベンチマークデータセットに適用した場合の,後悔の最小化と計算ランタイムの両方の観点から,優れた性能を示す。

We propose a novel framework for contextual multi-armed bandits based on tree ensembles. Our framework integrates two widely used bandit methods, Upper Confidence Bound and Thompson Sampling, for both standard and combinatorial settings. We demonstrate the effectiveness of our framework via several experimental studies, employing both XGBoost and random forest, two popular tree ensemble methods. Compared to state-of-the-art methods based on decision trees and neural networks, our methods exhibit superior performance in terms of both regret minimization and computational runtime, when applied to benchmark datasets and the real-world application of navigation over road networks.
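As a rough illustration of how an Upper Confidence Bound rule can sit on top of a tree ensemble, the sketch below scores each arm by the mean of the per-tree reward predictions plus a bonus proportional to their spread. This is an assumption made for illustration: the abstract does not say how the framework estimates uncertainty, and the function names, the `exploration` parameter, and the prediction values are all hypothetical.

```python
from statistics import mean, pstdev


def ucb_scores(per_tree_predictions, exploration=1.0):
    """Score each arm as mean + exploration * std of its per-tree predictions."""
    return [mean(preds) + exploration * pstdev(preds) for preds in per_tree_predictions]


def select_arm(per_tree_predictions, exploration=1.0):
    """Return the index of the arm with the highest UCB score."""
    scores = ucb_scores(per_tree_predictions, exploration)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical reward predictions from a 4-tree ensemble for 3 arms in one context.
predictions = [
    [0.50, 0.52, 0.49, 0.51],  # arm 0: trees agree, moderate reward
    [0.10, 0.80, 0.20, 0.70],  # arm 1: trees disagree, so the bonus is large
    [0.40, 0.41, 0.39, 0.40],  # arm 2: trees agree, lower reward
]

print(select_arm(predictions, exploration=0.0))  # purely greedy choice
print(select_arm(predictions, exploration=1.0))  # optimistic choice
```

With no exploration bonus the rule picks the arm with the best mean prediction. With the bonus it favours the arm the ensemble is least sure about, which is the usual exploration behaviour of UCB-style methods.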

翻訳日:2024-07-16 05:17:24 公開日:2024-07-12

# 教師付き学習力学の比較:ディープニューラルネットワークは人間のデータ効率と一致するが、一般化ラグを示す

Comparing supervised learning dynamics: Deep neural networks match human data efficiency but show a generalisation lag ( http://arxiv.org/abs/2402.09303v3 )

ライセンス: Link先を確認

Lukas S. Huber, Fred W. Mast, Felix A. Wichmann,

(参考訳) 近年の研究では、画像分類分野における人間とディープニューラルネットワーク(DNN)の行動比較が数多く行われている。比較研究は、しばしば学習過程の終末に焦点を合わせ、対象カテゴリーの表現における類似性を測定し比較する。しかし、これらの表現の出現過程、すなわち、獲得中に観察される行動変化と中間段階は、直接的かつ経験的に比較されることが少なくなる。本稿では、人間の観察者および様々な古典的かつ最先端のDNNにおける学習力学の詳細な研究について報告する。我々は,開始点,入力モダリティ,利用可能な入力データ,提供されたフィードバックなどの学習関連条件を整合させる,制約付き教師付き学習環境を開発する。学習プロセス全体にわたって、十分に学習された表現が、これまで見つからなかったテストデータにどのように一般化できるかを評価し、比較する。学習プロセス全体の比較は、DNNが人間の学習者と同等のデータ効率のレベルを示しており、この分野におけるいくつかの一般的な仮定に挑戦していることを示している。しかし,本研究の結果は,DNNの学習に顕著な一般化ラグが特徴的であるのに対して,人間は,後に新しいデータにのみ転送されるセット固有情報を学習する予備的な段階を伴わずに,すぐに一般化可能な表現を習得するように見える。

Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Often, comparison studies focus on the end-result of the learning process by measuring and comparing the similarities in the representations of object categories once they have been formed. However, the process of how these representations emerge -- that is, the behavioral changes and intermediate stages observed during the acquisition -- is less often directly and empirically compared. Here we report a detailed investigation of the learning dynamics in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment to align learning-relevant conditions such as starting point, input modality, available input data and the feedback provided. Across the whole learning process we evaluate and compare how well learned representations can be generalized to previously unseen test data. Comparisons across the entire learning process indicate that DNNs demonstrate a level of data efficiency comparable to human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: while DNNs' learning is characterized by a pronounced generalisation lag, humans appear to immediately acquire generalizable representations without a preliminary phase of learning training set-specific information that is only later transferred to novel data.

翻訳日:2024-07-16 05:17:24 公開日:2024-07-12

# セキュアコード生成のためのインストラクションチューニング

Instruction Tuning for Secure Code Generation ( http://arxiv.org/abs/2402.09497v2 )

ライセンス: Link先を確認

Jingxuan He, Mark Vero, Gabriela Krasnopolska, Martin Vechev,

(参考訳) 現代の言語モデル(LM)は、日常や専門的な文脈、特にプログラミングにおいて広く受け入れられている。この導入を可能にする重要な手順は命令チューニングであり、ユーザ命令や人間の好みに従うように訓練することで、LMの実用性を大幅に向上させる。しかし、既存の命令チューニングスキームは、生成されたコードのセキュリティという重要な側面を見落としている。その結果、最先端の命令チューニングされたLMでさえ、しばしば安全でないコードを生成し、重大なセキュリティリスクを生じさせる。この作業では、このギャップに対処するためにSafeCoderを導入します。 SafeCoderは、自動パイプラインを使用して収集した多種多様な高品質データセットを使用して、セキュリティ中心の微調整を実行します。セキュリティの微調整と標準命令のチューニングを統合し,セキュリティとユーティリティの両面の最適化を容易にする。その単純さにもかかわらず、SafeCoderは様々な人気のあるLMやデータセットで有効であることを示す。ユーティリティを保ちながら、セキュリティを大幅に改善できます(約30%)。

Modern language models (LMs) have gained widespread acceptance in everyday and professional contexts, particularly in programming. An essential procedure enabling this adoption is instruction tuning, which substantially enhances LMs' practical utility by training them to follow user instructions and human preferences. However, existing instruction tuning schemes overlook a crucial aspect: the security of generated code. As a result, even the state-of-the-art instruction-tuned LMs frequently produce unsafe code, posing significant security risks. In this work, we introduce SafeCoder to address this gap. SafeCoder performs security-centric fine-tuning using a diverse and high-quality dataset that we collected using an automated pipeline. We integrate the security fine-tuning with standard instruction tuning, to facilitate a joint optimization of both security and utility. Despite its simplicity, we show that SafeCoder is effective across a variety of popular LMs and datasets. It is able to drastically improve security (by about 30%), while preserving utility.

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations

Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations ( http://arxiv.org/abs/2402.12231v3 )

License: see the linked page

Jonas Beck, Nathanael Bosch, Michael Deistler, Kyra L. Kadhim, Jakob H. Macke, Philipp Hennig, Philipp Berens,

(参考訳) 通常微分方程式(ODE)は科学の力学系を記述するために広く用いられているが、実験的な測定を説明するパラメータを特定することは困難である。特に、ODEは微分可能であり、勾配に基づくパラメータ最適化が可能であるが、ODEの非線形ダイナミクスは多くの場合、多くの局所最小化と初期条件に対する極度な感度をもたらす。そこで我々は,ODEにおける勾配に基づくパラメータ最適化の収束性を改善する確率的数値法の新しい正規化手法である拡散テンパリングを提案する。確率積分器の雑音パラメータを反復的に低減することにより、提案手法は真のパラメータにより確実に収束する。本手法は複雑性の異なる力学系に対して有効であることを示すとともに,実際に関連するパラメータ数を持つHodgkin-Huxleyモデルに対して,信頼性の高いパラメータ推定値が得られることを示す。

Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
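
The tempering idea in the abstract above can be illustrated with a toy that is only an analogy: it does not use a probabilistic ODE integrator. It assumes a one-dimensional loss with many local minima and replaces the integrator's noise parameter with the standard deviation `noise_scale` of a Gaussian smoothing, for which the smoothed gradient has a closed form. All names and constants are hypothetical.

```python
import math

# Toy constants chosen for illustration; none of them come from the paper.
AMPLITUDE, FREQUENCY, TRUE_THETA = 2.0, 5.0, 2.0


def smoothed_grad(theta, noise_scale):
    """Gradient of u**2 - AMPLITUDE*cos(FREQUENCY*u), u = theta - TRUE_THETA,
    after averaging the loss over Gaussian noise with std `noise_scale`."""
    u = theta - TRUE_THETA
    damping = math.exp(-0.5 * (FREQUENCY * noise_scale) ** 2)
    return 2.0 * u + AMPLITUDE * FREQUENCY * damping * math.sin(FREQUENCY * u)


def descend(theta, noise_scale, steps=200, lr=0.01):
    """Plain gradient descent on the smoothed loss at a fixed noise level."""
    for _ in range(steps):
        theta -= lr * smoothed_grad(theta, noise_scale)
    return theta


def tempered_fit(theta0, schedule=(1.0, 0.5, 0.25, 0.1, 0.0)):
    """Optimize at decreasing noise levels, warm-starting each stage."""
    theta = theta0
    for noise_scale in schedule:
        theta = descend(theta, noise_scale)
    return theta


plain = descend(-2.0, noise_scale=0.0, steps=1000)  # stalls in a local minimum
tempered = tempered_fit(-2.0)                       # ends close to TRUE_THETA
print(f"plain: {plain:.3f}, tempered: {tempered:.3f}")
```

With a large noise level the oscillating term is damped away and the loss is nearly convex, so each later stage starts close to the global minimum; this is the intuition only, not a claim about the paper's algorithm.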

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# Disorder-Free Sachdev-Ye-Kitaev models: Integrability and Quantum Chaos

Disorder-Free Sachdev-Ye-Kitaev models: Integrability and Quantum Chaos ( http://arxiv.org/abs/2402.13154v2 )

License: see the linked page

Soshun Ozaki, Hosho Katsura,

(参考訳) 本稿では、Sachdev-Ye-Kitaevモデル(SYK)の2つの障害のない変種を紹介し、それらの可積分性を実証し、それらの静的および動的性質について検討する。図式的手法とは異なり、これらのモデルの可積分性は、マヨラナフェルミオンの数が有限である場合でも、動的相関関数を得ることができる。これらの解から、これらのモデルにおける時間外相関器(OTOC)は、障害や外的キック項のような量子カオス系と同様、早期に指数関数的な成長を示すことが分かる。逆に、我々の分析では、レベル統計学やスペクトル形状因子におけるランダム行列の挙動の証拠は示されていない。以上の結果から,SYKモデルのクリーンバージョンは,OTOCのカオス的挙動を示す乱れのない量子多体系の単純かつ非自明な例であることがわかった。

We introduce two disorder-free variants of the Sachdev-Ye-Kitaev (SYK) model, demonstrate their integrability, and study their static and dynamical properties. Unlike diagrammatic techniques, the integrability of these models allows us to obtain dynamical correlation functions even when the number of Majorana fermions is finite. From the solutions, we find that out-of-time-order correlators (OTOCs) in these models exhibit exponential growth at early times, resembling that of quantum chaotic systems such as those with disorder or external kick terms. Conversely, our analysis shows no evidence of random-matrix behavior in level statistics or the spectral form factor. Our findings illustrate that the clean versions of the SYK models represent simple but nontrivial examples of disorder-free quantum many-body systems displaying chaos-like behavior of OTOCs.

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning

STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning ( http://arxiv.org/abs/2402.13468v2 )

License: see the linked page

Nathan Beck, Adithya Iyer, Rishabh Iyer,

(参考訳) NLPアプリケーションにおける事前訓練済みモデルの微調整が普及するにつれて、特に大きな言語モデルにおけるパラメータ数の増加に伴い、注釈付きデータのコーパスが大きいことが要求される。モデルパフォーマンスを最大に向上させるためにラベルのないインスタンスをマイニングし注釈付けしようとするアクティブラーニングは、アノテーションコストを削減するための一般的な選択肢であるが、ほとんどのメソッドは、クラス不均衡を無視したり、初期アノテーション付きデータへのアクセスを前提としたり、稀なクラスを改善する前に複数のアクティブラーニング選択を必要とする。本稿では,一連のテキスト例と最近提案されたサブモジュール相互情報を利用して,アノテータによって強くラベル付けされた弱いラベル付けされたレアクラスのインスタンス群を選択する。 STENCILは、クラス不均衡のコールドスタート設定において、一般的なアクティブな学習方法よりも、複数のテキスト分類データセットに対して10\%-18\%$とレアクラスのF-1スコアを17\%-40\%$に改善することを示した。

As supervised fine-tuning of pre-trained models within NLP applications increases in popularity, larger corpora of annotated data are required, especially with increasing parameter counts in large language models. Active learning, which attempts to mine and annotate unlabeled instances to improve model performance maximally fast, is a common choice for reducing the annotation cost; however, most methods typically ignore class imbalance and either assume access to initial annotated data or require multiple rounds of active learning selection before improving rare classes. We present STENCIL, which utilizes a set of text exemplars and the recently proposed submodular mutual information to select a set of weakly labeled rare-class instances that are then strongly labeled by an annotator. We show that STENCIL improves overall accuracy by $10\%-18\%$ and rare-class F-1 score by $17\%-40\%$ on multiple text classification datasets over common active learning methods within the class-imbalanced cold-start setting.
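
A minimal sketch of exemplar-guided selection follows. The abstract does not say which submodular mutual information (SMI) function is used, so this assumes a facility-location-style SMI with non-negative cosine similarities and a plain greedy maximizer; `smi`, `greedy_select`, `eta` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def smi(selected, pool, queries, eta=1.0):
    """Assumed facility-location-style SMI between a selected set and the query exemplars."""
    if not selected:
        return 0.0
    # How well the selected set covers every exemplar.
    coverage = sum(max(cosine(pool[i], q) for i in selected) for q in queries)
    # How relevant every selected item is to its closest exemplar.
    relevance = sum(max(cosine(pool[i], q) for q in queries) for i in selected)
    return coverage + eta * relevance


def greedy_select(pool, queries, budget, eta=1.0):
    """Greedily add the pool item that yields the largest SMI value."""
    selected = []
    for _ in range(budget):
        candidates = [i for i in range(len(pool)) if i not in selected]
        best = max(candidates, key=lambda i: smi(selected + [i], pool, queries, eta))
        selected.append(best)
    return selected


# Toy 2-D "embeddings": items 2 and 3 resemble the rare-class exemplar.
pool = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.7, 0.7)]
rare_exemplars = [(0.0, 1.0)]
picked = greedy_select(pool, rare_exemplars, budget=2)
print(picked)
```

In the setting the abstract describes, the picked instances would be treated as weakly labeled rare-class candidates and then passed to an annotator for strong labels.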

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# Chasing Convex Functions with Long-term Constraints

Chasing Convex Functions with Long-term Constraints ( http://arxiv.org/abs/2402.14012v2 )

License: see the linked page

Adam Lechowicz, Nicolas Christianson, Bo Sun, Noman Bashir, Mohammad Hajiesmaili, Adam Wierman, Prashant Shenoy,

(参考訳) 我々は,長期的制約を伴うオンライン計量問題群を紹介し,研究する。これらの問題において、オンラインプレーヤーは、計量空間$(X,d)$で$\mathbf{x}_t$を判定し、ヒットコスト$f_t(\mathbf{x}_t)$を同時に最小化し、計量によって決定される切り替えコストを最小化する。時間が経つにつれて、プレイヤーは長期要求制約である$\sum_{t} c(\mathbf{x}_t) \geq 1$を満たさなければならない。このような問題は、持続可能なエネルギー/計算システムにおけるオンラインリソース割り当てへの幅広い応用を見出すことができる。我々は,有界ヒットコスト勾配と重み付き$\ell_1$メトリクスの場合に最適な競合性および学習強化アルゴリズムを考案し,さらに,提案アルゴリズムが数値実験で良好に動作することを示す。

We introduce and study a family of online metric problems with long-term constraints. In these problems, an online player makes decisions $\mathbf{x}_t$ in a metric space $(X,d)$ to simultaneously minimize their hitting cost $f_t(\mathbf{x}_t)$ and switching cost as determined by the metric. Over the time horizon $T$, the player must satisfy a long-term demand constraint $\sum_{t} c(\mathbf{x}_t) \geq 1$, where $c(\mathbf{x}_t)$ denotes the fraction of demand satisfied at time $t$. Such problems can find a wide array of applications to online resource allocation in sustainable energy/computing systems. We devise optimal competitive and learning-augmented algorithms for the case of bounded hitting cost gradients and weighted $\ell_1$ metrics, and further show that our proposed algorithms perform well in numerical experiments.
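
To make the cost structure concrete, here is a small sketch under simplifying assumptions that are not taken from the paper: decisions are scalars in $[0,1]$, the hitting cost is linear ($f_t(x) = p_t x$), the satisfied fraction is $c(x) = x$, the switching cost is a weighted absolute difference, and the player starts at $x_0 = 0$. The function names are hypothetical.

```python
def total_cost(decisions, prices, weight):
    """Hitting cost sum_t p_t * x_t plus weighted l1 switching cost, starting from x_0 = 0."""
    hitting = sum(p * x for p, x in zip(prices, decisions))
    previous = [0.0] + list(decisions[:-1])
    switching = sum(weight * abs(x - x_prev) for x, x_prev in zip(decisions, previous))
    return hitting + switching


def satisfies_demand(decisions, tol=1e-9):
    """Long-term constraint: the satisfied fractions must add up to at least 1."""
    return sum(decisions) >= 1.0 - tol


prices = [5.0, 1.0, 4.0, 2.0]
all_at_cheapest = [0.0, 1.0, 0.0, 0.0]   # buy everything at the lowest price
held_steady = [0.0, 0.5, 0.5, 0.0]       # pay more per unit, move less

for name, plan in [("all_at_cheapest", all_at_cheapest), ("held_steady", held_steady)]:
    print(name, satisfies_demand(plan), total_cost(plan, prices, weight=3.0))
```

Both plans meet the demand constraint, but once switching is expensive the plan that moves less is cheaper overall, which is the tension an online algorithm for this problem has to manage without knowing future prices.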

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# Human-machine social systems

Human-machine social systems ( http://arxiv.org/abs/2402.14410v2 )

License: see the linked page

Milena Tsvetkova, Taha Yasseri, Niccolo Pescetelli, Tobias Werner,

(参考訳) 偽のソーシャルメディアアカウントや生成AIチャットボットから金融取引アルゴリズムや自動運転車、ロボット、ボット、アルゴリズムに至るまで、私たちのコミュニケーションチャネル、社会的相互作用、経済取引、そして交通機関が普及し、浸透しています。複数の相互依存・相互作用する人間と自律機械のネットワークは複雑な社会システムを構成する。本パラダイムでは, 競争, 協調, 協力, 伝染, 集団的意思決定の状況において, 競争, 協調, 協調, 協調, 集団的意思決定の状況における, さまざまな分野からの最近の研究を概観し, 高頻度取引市場, ソーシャルメディアプラットフォーム, オープン・コラボレーション・コミュニティ, ディスカッション・フォーラムの事例を考察する。より堅牢でレジリエントな人間と機械のコミュニティを確実にするためには、研究者たちは複雑なシステム手法を使ってそれらを研究し、エンジニアは人間と機械の相互作用のためのAIを明示的に設計し、規制当局は人間と機械の生態多様性と社会的共進化を統治する必要がある。

From fake social media accounts and generative-AI chatbots to financial trading algorithms and self-driving vehicles, robots, bots, and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions, and transportation arteries. Networks of multiple interdependent and interacting humans and autonomous machines constitute complex social systems where the collective outcomes cannot be deduced from either human or machine behavior alone. Under this paradigm, we review recent research from across a range of disciplines and identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion, and collective decision-making, with context-rich examples from high-frequency trading markets, a social media platform, an open-collaboration community, and a discussion forum. To ensure more robust and resilient human-machine communities, researchers should study them using complex-system methods, engineers should explicitly design AI for human-machine and machine-machine interactions, and regulators should govern the ecological diversity and social co-evolution of humans and machines.

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond

Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond ( http://arxiv.org/abs/2402.14522v2 )

License: see the linked page

Xinyu Wang, Hainiu Xu, Lin Gui, Yulan He,

(参考訳) タスク固有の情報をキャプチャするメタ学習技術であるタスク埋め込みは、特にマルチタスク学習、モデル編集、解釈可能性などの分野で人気を集めている。しかし、プロンプト誘導型大規模言語モデル(LLM)がグラデーションフリーで動作し、課題に直面している。既存のタスク埋め込み手法は、細調整されたタスク固有の言語モデルに依存しており、様々なモデル、特にプロンプトベースのLLMに対するタスク埋め込みの適応性を妨げている。 LLMの時代にタスク埋め込みの可能性を困難にするため、単一ベクトル空間内で、より小さな言語モデルや様々なプロンプトを持つLLMを含む様々なモデルからタスク埋め込みを調和させる統合タスク埋め込み(FUTE)フレームワークを提案する。このような統一性は、異なるモデル間の類似性の比較と分析を可能にし、アーキテクチャ固有のメソッドに匹敵する性能を維持しながら、既存のタスク埋め込みメソッドの範囲と実用性を広げる。

Task embedding, a meta-learning technique that captures task-specific information, has gained popularity, especially in areas such as multi-task learning, model editing, and interpretability. However, it faces challenges with the emergence of prompt-guided Large Language Models (LLMs) operating in a gradient-free manner. Existing task embedding methods rely on fine-tuned, task-specific language models, which hinders the adaptability of task embeddings across diverse models, especially prompt-based LLMs. To harness the potential of task embeddings in the era of LLMs, we propose a framework for unified task embeddings (FUTE), harmonizing task embeddings from various models, including smaller language models and LLMs with varied prompts, within a single vector space. Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios, while maintaining their performance comparable to architecture-specific methods.

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# The topology of data hides in quantum thermal states

The topology of data hides in quantum thermal states ( http://arxiv.org/abs/2402.15633v2 )

License: see the linked page

Stefano Scali, Chukwudubem Umeano, Oleksandr Kyriienko,

(参考訳) 量子熱状態の蒸留によるトポロジカルデータ解析(TDA)を行うための量子プロトコルを提供する。量子熱状態生成アルゴリズムの最近の進歩は、散逸性リンドブレディアンの性質によって定義される特徴的スケーリングを明らかにする。これは、組合せラプラシアンの性質に依存するスケーリングを持つユニタリ進化に基づくプロトコルとは対照的である。量子熱状態生成アルゴリズムを活用するために、量子TDAをリアルタイムから虚像に変換し、パラダイムをユニタリなアプローチから散逸的なアプローチにシフトする。システムの基底状態と重なり合う初期状態から始めると、そのエネルギーはデータセット固有のチャネルを介して散逸し、その情報を自然に蒸留することができる。したがって、ベッチ数の計算は純度推定に変換される。あるいは、このことはR\'{e}nyi 2-エントロピー、ウルマンのフィディリティ、あるいは単純錯体の埋め込みトポロジーとの熱状態に対するヒルベルト・シュミット距離の評価と解釈できる。我々の研究は、データトポロジのより物理的解釈に向けて、TDAの分野を開放する。

We provide a quantum protocol to perform topological data analysis (TDA) via the distillation of quantum thermal states. Recent developments of quantum thermal state preparation algorithms reveal their characteristic scaling defined by properties of dissipative Lindbladians. This contrasts with protocols based on unitary evolution which have a scaling depending on the properties of the combinatorial Laplacian. To leverage quantum thermal state preparation algorithms, we translate quantum TDA from a real-time to an imaginary-time picture, shifting the paradigm from a unitary approach to a dissipative one. Starting from an initial state overlapping with the ground state of the system, one can dissipate its energy via channels unique to the dataset, naturally distilling its information. Therefore calculating Betti numbers translates into a purity estimation. Alternatively, this can be interpreted as the evaluation of the Rényi 2-entropy, Uhlmann fidelity or Hilbert-Schmidt distance relative to thermal states with the embedded topology of simplicial complexes. Our work opens the field of TDA toward a more physical interpretation of the topology of data.
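
The link between Betti numbers and purity can be checked classically on tiny examples. The sketch below is not the quantum protocol: it assumes that the Hamiltonian is a combinatorial Laplacian $L$, and it approximates the low-temperature thermal state by repeatedly applying the small imaginary-time step $I - \epsilon L$. In that limit the state is maximally mixed over the kernel of $L$, so its purity tends to one over the kernel dimension, which is the Betti number. Function names and the step size are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def betti_from_purity(laplacian, step=0.25, doublings=8):
    """Return (purity, rounded 1/purity) of the cooled, normalized state.

    Assumes step * (largest eigenvalue of the Laplacian) <= 1, so that only
    the kernel of the Laplacian survives the repeated application.
    """
    n = len(laplacian)
    m = [[(1.0 if i == j else 0.0) - step * laplacian[i][j] for j in range(n)]
         for i in range(n)]
    for _ in range(doublings):  # repeated squaring: m becomes m ** (2 ** doublings)
        m = matmul(m, m)
    z = trace(m)
    purity = trace(matmul(m, m)) / (z * z)
    return purity, round(1.0 / purity)


# 0-th Laplacian of two disjoint edges: two connected components.
two_edges = [[1, -1, 0, 0], [-1, 1, 0, 0], [0, 0, 1, -1], [0, 0, -1, 1]]
# 1-st Laplacian of a hollow triangle (edges 01, 02, 12, no filled face): one loop.
hollow_triangle_edges = [[2, 1, -1], [1, 2, 1], [-1, 1, 2]]

print(betti_from_purity(two_edges))
print(betti_from_purity(hollow_triangle_edges))
```

The quantum algorithm in the paper prepares the thermal state dissipatively and estimates the purity on hardware; the matrices here are small enough that direct multiplication suffices.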

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# A Neural Rewriting System to Solve Algorithmic Problems

A Neural Rewriting System to Solve Algorithmic Problems ( http://arxiv.org/abs/2402.17407v2 )

License: see the linked page

Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti,

(参考訳) 現代のニューラルネットワークアーキテクチャは、アウト・オブ・ディストリビューションの問題を解決するために構成規則を体系的に適用する必要があるアルゴリズムの手順を学ぶのに依然として苦労している。本研究では,ニューラルアーキテクチャの体系的一般化能力の研究に使用される合成ベンチマークのクラスである式単純化問題に焦点をあてる。本稿では,最小限の学習例にのみ依存して,ネストした数式を解くための一般的な手順を学習するために設計されたモジュラーアーキテクチャを提案する。シンボリック人工知能の古典的な枠組みであるシステム書き換えに触発された我々は、解決可能な部分表現を識別するために訓練されたセレクタ(Selector)と、それらの値にサブ表現をマッピングするソルバー(Solver)と、元の式でのサブ表現をソルバー(Solver)が提供する解に置き換えるコンビネータ(Compiner)という、3つの特殊かつ相互作用するモジュールをアーキテクチャに含めている。我々は,系統的な一般化に特化した最近のモデルであるニューラル・データ・ルータと,先進的なプロンプト戦略で探索された最先端の大規模言語モデル(GPT-4)とをベンチマークした。本稿では,3種類の式単純化問題に対するこれらの代替手法と比較して,分布外一般化の程度が高いことを実証し,その限界を解析して考察する。

Modern neural network architectures still struggle to learn algorithmic procedures that require systematically applying compositional rules to solve out-of-distribution problem instances. In this work, we focus on formula simplification problems, a class of synthetic benchmarks used to study the systematic generalization capabilities of neural architectures. We propose a modular architecture designed to learn a general procedure for solving nested mathematical formulas by only relying on a minimal set of training examples. Inspired by rewriting systems, a classic framework in symbolic artificial intelligence, we include in the architecture three specialized and interacting modules: the Selector, trained to identify solvable sub-expressions; the Solver, mapping sub-expressions to their values; and the Combiner, replacing sub-expressions in the original formula with the solution provided by the Solver. We benchmark our system against the Neural Data Router, a recent model specialized for systematic generalization, and a state-of-the-art large language model (GPT-4) probed with advanced prompting strategies. We demonstrate that our approach achieves a higher degree of out-of-distribution generalization compared to these alternative approaches on three different types of formula simplification problems, and we discuss its limitations by analyzing its failures.
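
The division of labor between the three modules can be sketched with a purely symbolic stand-in. In the paper the Selector, Solver and Combiner are trained neural modules; the functions below are hand-written, hypothetical replacements that only show the rewriting loop on fully parenthesized integer arithmetic.

```python
import re

# A "solvable" sub-expression here: one parenthesized binary operation on integers.
LEAF = re.compile(r"\((-?\d+)([+\-*])(-?\d+)\)")


def select(formula):
    """Selector stand-in: locate a sub-expression that can be solved directly."""
    return LEAF.search(formula)


def solve(sub_expression):
    """Solver stand-in: map a leaf sub-expression to its value."""
    left, op, right = LEAF.fullmatch(sub_expression).groups()
    a, b = int(left), int(right)
    return str({"+": a + b, "-": a - b, "*": a * b}[op])


def combine(formula, match, value):
    """Combiner stand-in: substitute the solved value back into the formula."""
    return formula[:match.start()] + value + formula[match.end():]


def simplify(formula):
    """Apply select -> solve -> combine until nothing solvable is left."""
    while True:
        match = select(formula)
        if match is None:
            return formula
        formula = combine(formula, match, solve(match.group(0)))


print(simplify("((1+2)*(3+4))"))
```

Because each step only ever touches one small sub-expression, the same loop handles formulas of any nesting depth, which is the property the modular design aims to learn from few examples.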

Translated: 2024-07-16 05:17:24 Published: 2024-07-12

# ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction ( http://arxiv.org/abs/2402.17766v3 )

License: see the linked page

Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma,

(参考訳) 本稿では,3次元点群と言語を用いた汎用的な3次元オブジェクト理解を探求する,最初の3次元マルチモーダル大言語モデルであるShapeLLMを提案する。 ShapeLLMはReConをReCon++に拡張することで改良された3Dエンコーダ上に構築されている。 LLMのための3Dポイントクラウド入力エンコーダとしてReCon++を活用することで、ShapeLLMは命令追従データの構築を訓練し、3D MM-Vetという新しいベンチマークでテストする。 ReCon++とShapeLLMは、3Dの幾何学的理解と、具体化された視覚的接地のような言語統一された3Dインタラクションタスクにおいて最先端のパフォーマンスを達成する。プロジェクトページ: https://qizekun.github.io/shapellm/

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages. ShapeLLM is built upon an improved 3D encoder by extending ReCon to ReCon++ that benefits from multi-view image distillation for enhanced geometry understanding. By utilizing ReCon++ as the 3D point cloud input encoder for LLMs, ShapeLLM is trained on constructed instruction-following data and tested on our newly human-curated benchmark, 3D MM-Vet. ReCon++ and ShapeLLM achieve state-of-the-art performance in 3D geometry understanding and language-unified 3D interaction tasks, such as embodied visual grounding. Project page: https://qizekun.github.io/shapellm/

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models ( http://arxiv.org/abs/2402.19449v2 )

License: see the linked page

Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti,

(参考訳) Adamは、他のタスクよりも大きなマージンで、大きな言語モデルでの勾配勾配よりも優れていることが示されているが、なぜかは定かではない。この性能ギャップの重要な要因は、言語タスクで見られる重み付きクラス不均衡であることを示す。勾配降下法で訓練すると、頻度の低い単語の損失は、頻繁な単語の損失よりも遅くなる。これは、ほとんどのサンプルが頻度の低い単語から来ているため、平均的な損失が緩やかに減少する。一方、Adamと手話に基づく手法はこの問題にはあまり敏感ではない。この動作がクラス不均衡によって引き起こされることを示すために、アーキテクチャやデータタイプ、言語変換器、視覚CNN、線形モデル上で再現できることを実証的に示す。クロスエントロピー損失を持つ線形モデルにおいて、クラス不均衡はアダムに利益をもたらすと仮定された不均衡な相関勾配とヘッセン性をもたらすことを示す。また、連続時間において、勾配降下は低周波のクラスにゆっくりと収束するが、符号降下は必ずしも収束しないことを示す。

Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease on the average loss as most samples come from infrequent words. On the other hand, Adam and sign-based methods are less sensitive to this problem. To establish that this behavior is caused by class imbalance, we show empirically that it can be reproduced across architectures and data types, on language transformers, vision CNNs, and linear models. On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. We also prove that, in continuous time, gradient descent converges slowly on low-frequency classes while sign descent does not.
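
A toy comparison, which is an illustrative assumption and not the paper's experiment: each class gets its own scalar weight and a quadratic loss, and the training objective averages those losses weighted by class frequency. The gradient for a class is then proportional to its frequency, so gradient descent barely moves rare classes, while sign descent moves every coordinate by the same amount.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def train(class_freqs, update, steps=20):
    """Minimize sum_c freq_c * 0.5 * (w_c - 1)**2 and return the per-class losses."""
    weights = [0.0] * len(class_freqs)
    for _ in range(steps):
        grads = [p * (w - 1.0) for p, w in zip(class_freqs, weights)]
        weights = [update(w, g) for w, g in zip(weights, grads)]
    return [0.5 * (w - 1.0) ** 2 for w in weights]


freqs = [0.9, 0.09, 0.01]  # frequent, uncommon and rare class
gd_losses = train(freqs, lambda w, g: w - 0.5 * g)            # gradient descent
sign_losses = train(freqs, lambda w, g: w - 0.05 * sign(g))   # sign descent

print("gradient descent:", gd_losses)
print("sign descent:    ", sign_losses)
```

After the same number of steps, gradient descent has fitted the frequent class but left the rare class almost untouched, whereas sign descent has reduced all three losses; real language models couple the classes, so this shows only the scaling effect.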

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing

Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing ( http://arxiv.org/abs/2403.01105v2 )

License: see the linked page

Yafei Zhang, Shen Zhou, Huafeng Li,

(参考訳) 一つのぼんやりした画像から明確なイメージを復元することは、オープンな逆問題である。研究の進展は著しいが、既存の手法のほとんどは上流の脱ハイキングを促進するために下流のタスクが果たす影響を無視している。ヘイズ生成機構の観点からは、シーンの深さ情報とヘイズ画像との間に潜在的な関係がある。そこで本研究では,単一画像のデハジングを実現するためのマルチタスク協調促進フレームワークを提案する。本フレームワークは,両タスクインタラクション機構による深度推定とデハジングを統合し,性能の相互向上を実現する。 2つのタスクの協調最適化を実現するために,差分認識を用いた代替実装機構を開発した。一方,デハジング結果の深度マップと理想像との差分認識を提案し,デハジングネットワークを促進させ,デハジングの非理想領域に注意を払う。一方、ヘイズ画像の回収困難な領域における深度推定性能を向上させることにより、ヘイズ画像の深度情報を明示的に利用して鮮明な画像復元を支援することができる。深度推定を促進するために,デハズド画像と地上の真実との差を利用して,デハズド一理想領域に焦点を合わせ,深度推定ネットワークを誘導する手法を提案する。これにより、デハジングと深さの推定は、相互に強化された方法で彼らの強みを活用することができる。実験結果から,提案手法は最先端手法よりも優れた性能が得られることが示された。

Recovering a clear image from a single hazy image is an open inverse problem. Although significant research progress has been made, most existing methods ignore the effect that downstream tasks play in promoting upstream dehazing. From the perspective of the haze generation mechanism, there is a potential relationship between the depth information of the scene and the hazy image. Based on this, we propose a dual-task collaborative mutual promotion framework to achieve the dehazing of a single image. This framework integrates depth estimation and dehazing by a dual-task interaction mechanism and achieves mutual enhancement of their performance. To realize the joint optimization of the two tasks, an alternative implementation mechanism with the difference perception is developed. On the one hand, the difference perception between the depth maps of the dehazing result and the ideal image is proposed to promote the dehazing network to pay attention to the non-ideal areas of the dehazing. On the other hand, by improving the depth estimation performance in the difficult-to-recover areas of the hazy image, the dehazing network can explicitly use the depth information of the hazy image to assist the clear image recovery. To promote the depth estimation, we propose to use the difference between the dehazed image and the ground truth to guide the depth estimation network to focus on the dehazed unideal areas. It allows dehazing and depth estimation to leverage their strengths in a mutually reinforcing manner. Experimental results show that the proposed method can achieve better performance than that of the state-of-the-art approaches.
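
The abstract does not state which haze model it has in mind. A common choice in the dehazing literature is the atmospheric scattering model, $I = J\,t + A\,(1 - t)$ with transmission $t = e^{-\beta d}$, which ties the hazy pixel $I$ to the clear pixel $J$, the airlight $A$ and the scene depth $d$. The sketch below assumes that model for a single grayscale pixel; parameter values and function names are hypothetical.

```python
import math


def add_haze(clear_pixel, depth, airlight=0.9, beta=1.0):
    """Synthesize a hazy pixel: I = J*t + A*(1 - t), with t = exp(-beta * depth)."""
    t = math.exp(-beta * depth)
    return clear_pixel * t + airlight * (1.0 - t)


def remove_haze(hazy_pixel, depth, airlight=0.9, beta=1.0, t_min=0.05):
    """Invert the model when depth is known; t is clamped to avoid dividing by ~0."""
    t = max(math.exp(-beta * depth), t_min)
    return (hazy_pixel - airlight * (1.0 - t)) / t


clear = 0.2
for depth in (0.0, 1.0, 3.0):
    hazy = add_haze(clear, depth)
    print(depth, round(hazy, 4), round(remove_haze(hazy, depth), 4))
```

Under this model a good depth estimate makes dehazing nearly a closed-form inversion, and a good dehazed image constrains the depth, which is one way to read the mutual promotion between the two tasks.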

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement

A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement ( http://arxiv.org/abs/2403.02408v3 )

License: see the linked page

Ruirui Lin, Nantheera Anantrasirichai, Alexandra Malyugina, David Bull,

(参考訳) 低照度条件による歪みは視覚的に不快なだけでなく、コンピュータビジョンタスクのパフォーマンスを低下させる。修復と強化は、非常に有益であることが証明されている。しかし、低照度で取得したビデオ用に明示的に設計された拡張手法は限られている。本稿では,Swin Transformer をバックボーンとした時空間適応SUNet(Spatio-Temporal Aligned SUNet)モデルを提案する。 STA-SUNetモデルは、様々な光条件下でキャプチャされた動的なシーンを含む、新しい完全に登録されたデータセット(BVI)に基づいて訓練されている。さらに3つのテストデータセット上で、他のさまざまなモデルに対して比較分析される。このモデルは全てのデータセットに対して優れた適応性を示し、最も高いPSNRとSSIM値を得る。極端に低照度な条件下では特に有効であり、非常に良好な視覚化結果をもたらす。

Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. The restoration and enhancement have proven to be highly beneficial. However, there are only a limited number of enhancement methods explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model using a Swin Transformer as a backbone to capture low light video features and exploit their spatio-temporal correlations. The STA-SUNet model is trained on a novel, fully registered dataset (BVI), which comprises dynamic scenes captured under varying light conditions. It is further analysed comparatively against various other models over three test datasets. The model demonstrates superior adaptivity across all datasets, obtaining the highest PSNR and SSIM values. It is particularly effective in extreme low-light conditions, yielding fairly good visualisation results.
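
For readers unfamiliar with the reported metric, PSNR compares a restored frame with its reference through the mean squared error: $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The helper below is a generic sketch over flat lists of pixel values and assumes 8-bit data by default; it is not the paper's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [100, 100, 100, 100]
restored = [110, 90, 110, 90]   # every pixel is off by 10 grey levels
print(round(psnr(reference, restored), 2))
```

Higher values mean the enhanced frame is closer to the reference; SSIM, the other metric named in the abstract, additionally compares local structure and is not sketched here.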

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# The Minimum Information about CLinical Artificial Intelligence Checklist for Generative Modeling Research (MI-CLAIM-GEN)

The Minimum Information about CLinical Artificial Intelligence Checklist for Generative Modeling Research (MI-CLAIM-GEN) ( http://arxiv.org/abs/2403.02558v2 )

License: see the linked page

Brenda Y. Miao, Irene Y. Chen, Christopher YK Williams, Jaysón Davidson, Augusto Garcia-Agundez, Shenghuan Sun, Travis Zack, Suchi Saria, Rima Arnaout, Giorgio Quer, Hossein J. Sadaei, Ali Torkamani, Brett Beaulieu-Jones, Bin Yu, Milena Gianfrancesco, Atul J. Butte, Beau Norgeot, Madhumita Sushil,

(参考訳) 大規模言語モデル(LLMs)、視覚言語モデル(VLMs)、拡散モデル(拡散モデル)などの生成モデルの最近の進歩は、医学における自然言語と画像処理の分野を加速させ、バイオメディカルモデルを開発・展開する際の重要なパラダイムシフトとなった。これらのモデルは、新しいタスクに非常に適応できるが、その使い方のスケーリングと評価は、以前のフレームワークでは対処できなかった新しい課題を示す。特に、これらのモデルが、専門的なトレーニングデータ(「ゼロ」または「ファウショット」アプローチ)をほとんど必要とせず、有用なアウトプットを生成する能力と、そのアウトプットのオープンな性質は、臨床生成モデル研究の堅牢な報告のための新しいガイドラインの開発を必要としている。米国大統領令141103および臨床AI評価のための新興国ネットワークによって特定される臨床AIツールの開発における標準とベストプラクティスのギャップに対応するため、我々は、元のMI-CLAIMチェックリストに基づいてこれらのガイドラインのいくつかを定式化し始めた。新しいチェックリストであるMI-CLAIM-GEN(Table 1)は、非生成的(予測的)AIモデルと比較して、新しい生成モデルのトレーニング、評価、解釈可能性、再現性の違いに対処することを目的としている。このMI-CLAIM-GENチェックリストは、非構造化臨床データによるコホート選択報告を明確にし、臨床AI研究の倫理基準に沿った追加項目を追加することを目的とする。

Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage presents new challenges not addressed in previous frameworks. In particular, the ability of these models to produce useful outputs with little to no specialized training data ("zero-" or "few-shot" approaches), as well as the open-ended nature of their outputs, necessitate the development of new guidelines for robust reporting of clinical generative model research. In response to gaps in standards and best practices for the development of clinical AI tools identified by US Executive Order 14110 and several emerging national networks for clinical AI evaluation, we begin to formalize some of these guidelines by building on the original MI-CLAIM checklist. The new checklist, MI-CLAIM-GEN (Table 1), aims to address differences in training, evaluation, interpretability, and reproducibility of new generative models compared to non-generative ("predictive") AI models. This MI-CLAIM-GEN checklist also seeks to clarify cohort selection reporting with unstructured clinical data and adds additional items on alignment with ethical standards for clinical AI research.

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# Leveraging Computer Vision in the Intensive Care Unit (ICU) for Examining Visitation and Mobility

Leveraging Computer Vision in the Intensive Care Unit (ICU) for Examining Visitation and Mobility ( http://arxiv.org/abs/2403.06322v2 )

License: see the linked page

Scott Siegel, Jiaqing Zhang, Sabyasachi Bandyopadhyay, Subhash Nerella, Brandon Silva, Tezcan Baslanti, Azra Bihorac, Parisa Rashidi,

(参考訳) ICU (Intensive Care Unit) において患者を綿密に監視することの重要性にもかかわらず、医療提供者に課される時間的制約のため、多くの側面が限定的に評価されている。例えば、休息中の過度の訪問は概日リズムの破壊やデリリウムのリスクを悪化させる可能性があるが、ICUでは捕獲されない。同様に、ICU患者の回復または悪化の指標としてモビリティが重要であるが、これは散発的にのみ捕獲されるか、全く捕獲されないかのどちらかである。過去数年間、コンピュータビジョン分野は、人的負担を減らすことで、多くの領域で応用を見出した。 ICUのコンピュータビジョンシステムを使用することで、既存の評価の頻度と精度を高めつつ、スタッフの作業量を削減できる可能性がある。本研究では、奥行き画像に基づく最先端の非侵襲型コンピュータビジョンシステムを活用し、ICU訪問と患者の移動性を特徴付ける。次に、訪問と、痛み、明度、デリリウムなどのいくつかの患者結果との関係について検討する。患者視力低下と訪問の増加に伴うデリリウムの出現との関連を見いだした。一方,DVPRS(Defense and Veteran Pain Rating Scale)を用いた自己報告痛は,来院率の低下と相関した。 ICU患者に対する非侵襲的自律システムの有用性と可能性について検討した。

Despite the importance of closely monitoring patients in the Intensive Care Unit (ICU), many aspects are still assessed in a limited manner due to the time constraints imposed on healthcare providers. For example, although excessive visitations during rest hours can potentially exacerbate the risk of circadian rhythm disruption and delirium, it is not captured in the ICU. Likewise, while mobility can be an important indicator of recovery or deterioration in ICU patients, it is only captured sporadically or not captured at all. In the past few years, the computer vision field has found application in many domains by reducing the human burden. Using computer vision systems in the ICU can also potentially enable non-existing assessments or enhance the frequency and accuracy of existing assessments while reducing the staff workload. In this study, we leverage a state-of-the-art noninvasive computer vision system based on depth imaging to characterize ICU visitations and patients' mobility. We then examine the relationship between visitation and several patient outcomes, such as pain, acuity, and delirium. We found an association between deteriorating patient acuity and the incidence of delirium with increased visitations. In contrast, self-reported pain, reported using the Defense and Veteran Pain Rating Scale (DVPRS), was correlated with decreased visitations. Our findings highlight the feasibility and potential of using noninvasive autonomous systems to monitor ICU patients.

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images

StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images ( http://arxiv.org/abs/2403.09302v2 )

License: see the linked page

Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang Vu,

(参考訳) 静止正規化アルゴリズムは、ソースマルチギガピクセルのヒストロジー画像の色と強度特性を、対象画像の色に合わせるように変換することを目的としており、画像中の細胞成分の強調に用いられる染色の外観上の矛盾を緩和する。我々は,新しい条件付き潜在拡散アーキテクチャを用いて,この問題をスタイル伝達タスクとして扱う新しいアプローチであるStainFuserを提案し,手作りカラーコンポーネントの必要性を排除した。本手法により,SPI-2Mは,200万枚以上の組織像に対して,高品質な画像変換のためのニューラルスタイル転送を行うため,これまでで最大の染色正規化データセットである。このデータに基づいてトレーニングされたStainFuserは、正規化された画像の品質と、CoNICデータセットのダウンストリームモデルパフォーマンスの観点から、最先端のディープラーニングおよび手作りの手法より優れています。

Stain normalization algorithms aim to transform the color and intensity characteristics of a source multi-gigapixel histology image to match those of a target image, mitigating inconsistencies in the appearance of stains used to highlight cellular components in the images. We propose a new approach, StainFuser, which treats this problem as a style transfer task using a novel Conditional Latent Diffusion architecture, eliminating the need for handcrafted color components. With this method, we curate SPI-2M, the largest stain normalization dataset to date, comprising over 2 million histology images with neural style transfer for high-quality transformations. Trained on this data, StainFuser outperforms current state-of-the-art deep learning and handcrafted methods in terms of the quality of normalized images and in terms of downstream model performance on the CoNIC dataset.

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks ( http://arxiv.org/abs/2403.09377v2 )

License: see the linked page

Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens,

(参考訳) LoRA(英語版)やAdapter(英語版)のようなメインストリームパラメータ効率の良い微調整(PEFT)手法は、モデルの隠れた状態を低い次元に投影し、トレーニング済みのモデルがこの低ランクのボトルネックを通じて新しいデータに適応できるようにする。しかしながら、視覚言語(VL)タスクのような複数のモダリティを含むPEFTタスクは、新しいデータへの適応だけでなく、異なるモダリティ間の関係も学習する必要がある。 VL PEFTタスクをターゲットに、低ランクボトルネックにおけるVLアライメントを高めるためにルーティング関数と呼ばれる一連の操作を提案する。これらの特徴ルーティング関数は線形演算を採用し、新しいトレーニング可能なパラメータを導入しない。詳細な分析を行ない、その振る舞いを研究する。様々なVL PEFT設定において、ルーティング機能は元のPEFTメソッドのパフォーマンスを大幅に改善し、VQAv2$\text{RoBERTa}_{\text{large}}$+ViT-L/16)とCOCOキャプション(GPT2-medium+ViT-L/16)を20以上改善した。また,CLIP-BARTのような事前学習型マルチモーダルモデルの微調整では,VL PEFTタスクの幅が小さくても一貫した改善が観察される。私たちのコードはhttps://github.com/tingyu215/Routing_VLPEFTで利用可能です。

Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or Adapter, project a model's hidden states to a lower dimension, allowing pre-trained models to adapt to new data through this low-rank bottleneck. However, PEFT tasks involving multiple modalities, like vision-language (VL) tasks, require not only adaptation to new data but also learning the relationship between different modalities. Targeting VL PEFT tasks, we propose a family of operations, called routing functions, to enhance VL alignment in the low-rank bottlenecks. These feature routing functions adopt linear operations and do not introduce new trainable parameters. In-depth analyses are conducted to study their behavior. In various VL PEFT settings, the routing functions significantly improve performance of the original PEFT methods, achieving over 20\% improvement on VQAv2 ($\text{RoBERTa}_{\text{large}}$+ViT-L/16) and 30\% on COCO Captioning (GPT2-medium+ViT-L/16). Also, when fine-tuning a pre-trained multimodal model such as CLIP-BART, we observe smaller but consistent improvements across a range of VL PEFT tasks. Our code is available at https://github.com/tingyu215/Routing_VLPEFT.
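
The abstract does not list the routing functions themselves, so the following is only a guess at their general shape. It assumes a LoRA-style update (down-projection to rank $r$, then up-projection) and uses an element-wise product with a visual feature that is already of size $r$ as one plausible parameter-free operation; the paper's actual operations may differ, and every name here is hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_delta(x, down, up, visual_r=None):
    """Low-rank update for hidden state x, optionally routed by a visual feature."""
    z = matvec(down, x)                 # hidden state inside the rank-r bottleneck
    if visual_r is not None:            # assumed routing: element-wise product,
        z = [zi * vi for zi, vi in zip(z, visual_r)]  # adds no trainable weights
    return matvec(up, z)                # back to the model dimension


down = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]    # r x d, here 2 x 3
up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # d x r, here 3 x 2
text_hidden = [1.0, 2.0, 3.0]
visual_feature = [0.5, 2.0]                  # assumed to be already in rank-r space

print(lora_delta(text_hidden, down, up))
print(lora_delta(text_hidden, down, up, visual_feature))
```

The point of placing the interaction inside the bottleneck is that the cross-modal mixing happens on $r$ numbers rather than on the full hidden size, and the only trained weights remain the two projection matrices.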

Translated: 2024-07-16 05:07:34 Published: 2024-07-12

# Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering ( http://arxiv.org/abs/2403.09622v2 )

License: see the linked page

Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan,

(参考訳) ビジュアルテキストレンダリングは、テキストエンコーダの欠陥が中心的な問題となっている現代テキスト・画像生成モデルにおいて、根本的な課題となっている。正確なテキストレンダリングを実現するために,文字認識とグリフとのアライメントという,テキストエンコーダの2つの重要な要件を特定した。我々のソリューションは、微妙にキュレートされたグリフテキストデータセットを使用して文字認識のBYT5エンコーダを微調整することで、一連のカスタマイズされたテキストエンコーダ、Glyph-ByT5を作成することである。本稿では,Glyph-ByT5をSDXLに統合する方法を提案する。これにより、テキストレンダリングの精度が大幅に向上し、デザインイメージベンチマークで20セント未満から90セント近くに改善します。注目すべきは、Glyph-SDXLの新しいテキスト段落レンダリング機能で、自動的な複数行レイアウトを持つ数十から数百文字のスペル精度を実現することである。最後に,Glyph-SDXLの微調整により,オープンドメイン実画像におけるシーンテキストレンダリング機能を大幅に向上させることを示す。これらの魅力的な成果は、多様で困難なタスクのためにカスタマイズされたテキストエンコーダを設計する際のさらなる調査を促進することを目的としている。

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoders, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset. We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the creation of the Glyph-SDXL model for design image generation. This significantly enhances text rendering accuracy, improving it from less than $20\%$ to nearly $90\%$ on our design image benchmark. Noteworthy is Glyph-SDXL's newfound ability for text paragraph rendering, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts. Finally, through fine-tuning Glyph-SDXL with a small set of high-quality, photorealistic images featuring visual text, we showcase a substantial improvement in scene text rendering capabilities in open-domain real images. These compelling outcomes aim to encourage further exploration in designing customized text encoders for diverse and challenging tasks.

翻訳日:2024-07-16 05:07:34 公開日:2024-07-12

# 予算リサイクル差分プライバシー

Budget Recycling Differential Privacy ( http://arxiv.org/abs/2403.11445v4 )

ライセンス: Link先を確認

Bo Jiang, Jian Du, Sagar Sharma, Qiang Yan,

(参考訳) 差分プライバシー(DP)メカニズムは通常、厳しいプライバシー予算の下で「範囲外」のノイズを含む結果を生成することにより、データの有用性の低下を余儀なくされる。本稿では,既存の幅広いDPメカニズムに対して,ソフトバウンドなノイズ出力を提供するために,BR-DP(Budget Recycling Differential Privacy)フレームワークを導入する。「ソフトバウンド」とは、事前に定義されたエラー境界内でほとんどの出力を公開し、有用性を改善しつつ同時にプライバシを維持するメカニズムの能力を指す。 BR-DPのコアは2つのコンポーネントから構成される: 繰り返しごとにノイズの答えを生成するDPカーネルと、ノイズの答えを確率的にリサイクル(再生成)または公開するリサイクル器である。我々は, BR-DP のプライバシ会計を探求し, DP カーネルとリサイクル器の間で利用可能な予算を最適に配分する予算策定の原則を導く。さらに, 合成シナリオにおけるBR-DPの厳密な会計アルゴリズムを導入し, BR-DPがDPに比べて合成後のプライバシー漏洩を低減することを示す。さらに、BR-DPフレームワーク内でのサブサンプリングによるプライバシ増幅の概念について検討し、様々なクエリに対するBR-DPの最適なサンプリングレートを提案する。実データを用いて実験を行い, BR-DPがDPメカニズムによって提供される有用性とプライバシのトレードオフを改善する効果を実証した。

Differential Privacy (DP) mechanisms usually force a reduction in data utility by producing "out-of-bound" noisy results for a tight privacy budget. We introduce the Budget Recycling Differential Privacy (BR-DP) framework, designed to provide soft-bounded noisy outputs for a broad range of existing DP mechanisms. By "soft-bounded," we refer to the mechanism's ability to release most outputs within a predefined error boundary, thereby improving utility and maintaining privacy simultaneously. The core of BR-DP consists of two components: a DP kernel responsible for generating a noisy answer per iteration, and a recycler that probabilistically recycles/regenerates or releases the noisy answer. We delve into the privacy accounting of BR-DP, culminating in the development of a budgeting principle that optimally sub-allocates the available budget between the DP kernel and the recycler. Furthermore, we introduce algorithms for tight BR-DP accounting in composition scenarios, and our findings indicate that BR-DP achieves reduced privacy leakage post-composition compared to DP. Additionally, we explore the concept of privacy amplification via subsampling within the BR-DP framework and propose optimal sampling rates for BR-DP across various queries. We experiment with real data, and the results demonstrate BR-DP's effectiveness in lifting the utility-privacy tradeoff provided by DP mechanisms.

翻訳日:2024-07-16 05:07:34 公開日:2024-07-12

# 畳み込み層に対する Roesser 型の状態空間表現

State space representations of the Roesser type for convolutional layers ( http://arxiv.org/abs/2403.11938v2 )

ライセンス: Link先を確認

Patricia Pauli, Dennis Gramlich, Frank Allgöwer,

(参考訳) 制御理論の観点からは、(ニューラルネットワークの)畳み込み層は2-D(またはN-D)線形時間不変力学系である。畳み込みカーネルによる畳み込み層の通常の表現は、そのインパルス応答による力学系の表現に対応する。しかし、制御理論からの多くの解析ツール、例えば線形行列不等式を用いるものは状態空間表現を必要とする。この理由から、我々は、2-D畳み込み層に対して$c_\mathrm{in}r_1 + c_\mathrm{out}r_2$個の状態を持つRoesser型の状態空間表現を明示的に与える。ここで、$c_\mathrm{in}$/$c_\mathrm{out}$は層の入力/出力チャネルの数であり、$r_1$/$r_2$は畳み込みカーネルの幅/長さを特徴づける。この表現は$c_\mathrm{in} = c_\mathrm{out}$に対して最小であることが示されている。さらに、拡張(dilated)、ストライド付き、N-D畳み込みのための状態空間表現を構築する。

From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.
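The correspondence between a convolution kernel and an impulse response can be checked numerically. The following is a minimal pure-Python sketch for a single-channel 2-D layer; the helper names `conv2d_full` and `roesser_state_count` are illustrative assumptions and do not come from the paper. The second helper only evaluates the state-count formula quoted in the abstract and treats $r_1$ and $r_2$ as given inputs, since the abstract does not define how they are derived from the kernel size.

```python
def conv2d_full(x, k):
    """Full 2-D convolution of two single-channel arrays (lists of lists)."""
    height, width = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = [[0.0] * (width + kw - 1) for _ in range(height + kh - 1)]
    for i in range(height):
        for j in range(width):
            # Each input sample adds a shifted, scaled copy of the kernel.
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += x[i][j] * k[a][b]
    return out


def roesser_state_count(c_in, c_out, r1, r2):
    """State count c_in*r1 + c_out*r2 quoted in the abstract (r1, r2 given)."""
    return c_in * r1 + c_out * r2


kernel = [[1.0, 2.0], [3.0, 4.0]]
impulse = [[1.0]]  # unit impulse at the origin
response = conv2d_full(impulse, kernel)
```

Feeding a unit impulse through the layer returns the kernel itself, which is the sense in which the kernel is the system's impulse response.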

翻訳日:2024-07-16 05:07:34 公開日:2024-07-12

# EnvGen: 身体性エージェントを訓練するためのLLMによる環境の生成と適応

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents ( http://arxiv.org/abs/2403.12014v2 )

ライセンス: Link先を確認

Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal,

(参考訳) インタラクションを通じた身体性学習に対する近年のSOTAアプローチでは,環境における次のステップを決定するために,大規模言語モデル(LLM)を直接エージェントとして採用している。世界知識と推論能力のため、LLMエージェントは強化学習(RL)に基づく従来のより小さなエージェントよりも高い性能を達成するが、LLMを頻繁に呼び出すのは遅くて高価である。 LLMをエージェントとして直接利用する代わりに、LLMの推論能力を使って、より小さなRLエージェントが苦手とする有用なスキルを学ぶのに役立つトレーニング環境を適応的に作成できるだろうか? 本稿では,この問いに答えるための新しいフレームワークであるEnvGenを提案する。まず, LLMに, エージェントが学習すべきタスク記述とシミュレータの目標を与え, 環境設定(例えば, 異なる地形, 最初にエージェントに与えられるアイテムなど)のセットを生成するように要求することで, トレーニング環境を生成させる。次に、元の環境とLLM生成環境を混合した環境で小さなRLエージェントを訓練する。さらに, エージェントのパフォーマンスという形でLLMにフィードバックを提供することにより, LLMが生成した環境を継続的に適応させ, エージェントが苦手とするスキルを徐々に向上させる。 Crafter および Heist 環境での総合的な実験により,EnvGen の有用性を実証する。我々は、EnvGenで訓練された小さなRLエージェントが、GPT-4エージェントを含むSOTAメソッドより優れており、長期的なタスクをかなり高速に学習できることを発見した。また,LLMを用いて環境を動的に適応させる手法がカリキュラム学習のアプローチを上回ること,およびRLエージェントの苦手なスキルを時間とともに改善するために環境がどのように適応されるかを示す。さらに、EnvGenは、少数のLLMコール(例えば、合計4回)しか使用しないのに対して、LLMエージェントは数千回の呼び出しを必要とするため、かなり効率的である。最後に、EnvGenの設計選択に関する詳細なアブレーション研究について述べる。

Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. We first prompt an LLM to generate training environments by giving it the task description and simulator objectives that the agents should learn and then asking it to generate a set of environment configurations (e.g., different terrains, items initially given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We also show that using an LLM to adapt environments dynamically outperforms curriculum learning approaches and how the environments are adapted to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of calls. Lastly, we present detailed ablation studies for EnvGen design choices.

翻訳日:2024-07-16 05:07:34 公開日:2024-07-12

# カメラローカライゼーションのためのニューラルボリュームポーズ特徴の学習

Learning Neural Volumetric Pose Features for Camera Localization ( http://arxiv.org/abs/2403.12800v4 )

ライセンス: Link先を確認

Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, Jieping Ye,

(参考訳) 本稿では,PoseMapと呼ばれるニューラルボリュームポーズ機能を導入し,画像と関連するカメラポーズの情報をカプセル化することで,カメラのローカライゼーションを強化する。我々のフレームワークは、拡張されたNeRFモジュールとともにAPR(Absolute Pose Regression)アーキテクチャを活用している。この統合は、トレーニングデータセットを豊かにする新しいビューの生成を促進するだけでなく、効果的なポーズ特徴の学習も可能にする。さらに、自己教師付きオンラインアライメントのためのアーキテクチャを拡張し、統合されたフレームワーク内で、未実装の画像に対してメソッドを使用および微調整できるようにします。室内および屋外のベンチマークシーンで平均14.28%, 20.51%の性能向上が得られた。

We introduce a novel neural volumetric pose feature, termed PoseMap, designed to enhance camera localization by encapsulating the information between images and the associated camera poses. Our framework leverages an Absolute Pose Regression (APR) architecture, together with an augmented NeRF module. This integration not only facilitates the generation of novel views to enrich the training dataset but also enables the learning of effective pose features. Additionally, we extend our architecture for self-supervised online alignment, allowing our method to be used and fine-tuned for unlabelled images within a unified framework. Experiments demonstrate that our method achieves 14.28% and 20.51% performance gain on average in indoor and outdoor benchmark scenes, outperforming existing APR methods with state-of-the-art accuracy.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# BaCon: バランスのとれた特徴レベルのコントラスト学習による非バランスな半教師あり学習の促進

BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning ( http://arxiv.org/abs/2403.12986v2 )

ライセンス: Link先を確認

Qianhan Feng, Lujing Xie, Shijie Fang, Tong Lin,

(参考訳) 半教師付き学習(SSL)は、ディープラーニングにおける広範なアノテーションの必要性を減らしますが、SSLにおける不均衡なデータ分散のより現実的な課題は、まだ明らかにされていません。クラス不均衡半教師学習(CISSL)では、信頼できない擬似ラベルによって引き起こされるバイアスは、不均衡なデータ分布によって悪化させることができる。既存のほとんどのメソッドは、再重み付けや再サンプリングを通じて、インスタンスレベルでこの問題に対処するが、パフォーマンスはバイアス付きバックボーン表現に依存しているため、非常に制限されている。その他の方法は、機能ブレンディングのような機能レベルの調整を行うが、好ましくないノイズをもたらす可能性がある。本稿では、CISSL問題に対するよりバランスのとれた特徴分布のボーナスについて論じ、さらにバランスのとれた特徴レベルコントラスト学習法(BaCon)を提案する。提案手法は、よく設計されたコントラスト的な方法で、インスタンスの表現の分布を直接正規化する。特に、クラスワイドの特徴中心は正のアンカーとして計算され、負のアンカーは単純で効果的なメカニズムによって選択される。分布関連温度調整を利用して、クラスワイドコントラストの度合いを動的に制御する。提案手法は, CIFAR10-LT, CIFAR100-LT, STL10-LT, SVHN-LTデータセットを様々な設定で包括的に実験することにより, その有効性を示す。例えば、BaConはCIFAR10-LTのインスタンスレベルのFixMatchベースのABCを1.21%の精度で上回り、CIFAR100-LTのCoSSLの精度は0.63%向上した。より極端な不均衡の度合いに直面すると、BaConは他の方法よりも堅牢性も向上する。

Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distribution in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at instance-level through reweighting or resampling, but the performance is heavily limited by their reliance on biased backbone representation. Some other methods do perform feature-level adjustments like feature blending but might introduce unfavorable noise. In this paper, we discuss the bonus of a more balanced feature distribution for the CISSL problem, and further propose a Balanced Feature-Level Contrastive Learning method (BaCon). Our method directly regularizes the distribution of instances' representations in a well-designed contrastive manner. Specifically, class-wise feature centers are computed as the positive anchors, while negative anchors are selected by a straightforward yet effective mechanism. A distribution-related temperature adjustment is leveraged to control the class-wise contrastive degrees dynamically. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets across various settings. For example, BaCon surpasses instance-level method FixMatch-based ABC on CIFAR10-LT with a 1.21% accuracy improvement, and outperforms state-of-the-art feature-level method CoSSL on CIFAR100-LT with a 0.63% accuracy improvement. When encountering more extreme imbalance degree, BaCon also shows better robustness than other methods.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# Find n' Propagate: 都市環境におけるオープンボキャブラリ3次元物体検出

Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments ( http://arxiv.org/abs/2403.13556v2 )

ライセンス: Link先を確認

Djamahl Etchegaray, Zi Huang, Tatsuya Harada, Yadan Luo,

(参考訳) 本研究では,従来のLiDARに基づく3次元オブジェクト検出システムの限界に対処する。都市環境におけるオープンボキャブラリ(OV)学習の探索は,複数センサデータを用いた事前学習型視覚言語モデル(VLM)を用いて,新規なインスタンスを捕捉することを目的としている。入力データ戦略に基づいて、トップダウンまたはボトムアップのアプローチに分類し、ベースラインとして4つの潜在的なソリューションを設計し、ベンチマークする。有効ではあるが、これらの手法は、3Dボックス推定における新しい物体の欠如や厳密な事前適用といった一定の制限を示しており、カメラや長方形地形の物体に偏りが生じる。これらの制約を克服するために、新しい物体のリコールを最大化し、この検出能力をより遠くまで伝播させることを目的として、3次元OVタスクに対して普遍的な \textsc{Find n' Propagate} アプローチを導入する。特に、グリーディボックス探索器を用いて、生成したフラストラムごとに異なる向きと深さの3D新鮮ボックスを探索し、クロスアライメントと密度ランク付けにより、新たに同定されたボックスの信頼性を確保する。さらに、カメラ近位物体に対する固有のバイアスは、メモリバンク内のベースサンプルの融合と相まって、自己学習プロセスにおいて擬似ラベル付き新規インスタンスをランダムに分散する遠隔シミュレーターによって軽減される。大規模な実験では、様々なOV設定、VLM、および3D検出器にまたがる新しいリコールが53%改善された。特に、新しいオブジェクトクラスに対する平均精度(AP)が最大3.97倍に向上する。ソースコードはhttps://github.com/djamahl99/findnpropagateで公開されている。

In this work, we tackle the limitations of current LiDAR-based 3D object detection systems, which are hindered by a restricted class vocabulary and the high costs associated with annotating new object classes. Our exploration of open-vocabulary (OV) learning in urban environments aims to capture novel instances using pre-trained vision-language models (VLMs) with multi-sensor data. We design and benchmark a set of four potential solutions as baselines, categorizing them into either top-down or bottom-up approaches based on their input data strategies. While effective, these methods exhibit certain limitations, such as missing novel objects in 3D box estimation or applying rigorous priors, leading to biases towards objects near the camera or of rectangular geometries. To overcome these limitations, we introduce a universal \textsc{Find n' Propagate} approach for 3D OV tasks, aimed at maximizing the recall of novel objects and propagating this detection capability to more distant areas thereby progressively capturing more. In particular, we utilize a greedy box seeker to search against 3D novel boxes of varying orientations and depth in each generated frustum and ensure the reliability of newly identified boxes by cross alignment and density ranker. Additionally, the inherent bias towards camera-proximal objects is alleviated by the proposed remote simulator, which randomly diversifies pseudo-labeled novel instances in the self-training process, combined with the fusion of base samples in the memory bank. Extensive experiments demonstrate a 53% improvement in novel recall across diverse OV settings, VLMs, and 3D detectors. Notably, we achieve up to a 3.97-fold increase in Average Precision (AP) for novel object classes. The source code is made available at https://github.com/djamahl99/findnpropagate.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# HAC:3Dガウススプラッティング圧縮のためのハッシュグリッド支援コンテキスト

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression ( http://arxiv.org/abs/2403.14530v3 )

ライセンス: Link先を確認

Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai,

(参考訳) 3D Gaussian Splatting (3DGS)は、新しいビュー合成のための有望なフレームワークとして登場し、高速なレンダリングと高い忠実度を誇っている。しかし、膨大な数のガウスとその関連属性は効果的な圧縮技術を必要とする。一方で、ガウス(本論文ではアンカー)の点群のスパースで非組織的な性質は、圧縮の課題となっている。そこで我々は,非組織的なアンカーと構造化ハッシュグリッドの関係を利用し,それらの相互情報をコンテキストモデリングに活用する,高度にコンパクトな3DGS表現のためのHash-grid Assisted Context(HAC)フレームワークを提案する。提案手法では, 連続的な空間的整合性を確立するための2値ハッシュグリッドを導入し, 慎重に設計したコンテキストモデルを用いて, アンカーに内在する空間的関係を明らかにする。エントロピー符号化を容易にするために,ガウス分布を用いて各量子化属性の確率を正確に推定し,忠実度の復元を改善するためにこれらの属性の高精度な量子化を可能にする適応量子化モジュールを提案する。さらに,無効なガウスとアンカーを除去するために,適応的なマスキング戦略を取り入れた。重要なことは、我々の研究は3DGS表現のコンテキストベースの圧縮を探求する先駆けであり、バニラ3DGSと比較して$75\times$以上のサイズ削減を達成すると同時に忠実度を向上させ、SOTAの3DGS圧縮アプローチであるScaffold-GSに対しても$11\times$以上のサイズ削減を達成した。私たちのコードは https://github.com/YihangChen-ee/HAC で入手可能です。

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the pioneer to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over $75\times$ compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over $11\times$ size reduction over SOTA 3DGS compression approach Scaffold-GS. Our code is available here: https://github.com/YihangChen-ee/HAC

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# 大規模モデルのためのパラメータ効率の良いファインチューニング:包括的調査

Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey ( http://arxiv.org/abs/2403.14608v6 )

ライセンス: Link先を確認

Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang,

(参考訳) 大規模モデルは、複数のアプリケーション分野における画期的な進歩を表しており、様々なタスクにおける顕著な達成を可能にしている。しかし、その前例のない規模には計算コストがかなり伴う。これらのモデルはしばしば数十億のパラメータで構成され、実行には膨大な量の計算資源を必要とする。特に、拡張スケールと計算要求は、特定の下流タスク、特に計算能力に制約されたハードウェアプラットフォームをカスタマイズする際に大きな課題を生じさせる。パラメータ効率の良いファインチューニング(PEFT)は、様々な下流タスクに対して大きなモデルを効率的に調整することで、実用的なソリューションを提供する。特にPEFTは、訓練済みの大規模モデルのパラメータを特定のタスクやドメインに適応させ、導入された追加パラメータの数や計算資源を最小限に抑えるプロセスを指す。これらのモデルをスクラッチから微調整することは、計算コストが高く、リソース集約的であり、システムプラットフォーム設計をサポートする上で大きな課題となるため、大規模な言語モデルに高いパラメータ数で対処する上で特に重要である。本稿では,様々なPEFTアルゴリズムの総合的な研究を行い,その性能と計算オーバーヘッドについて検討する。さらに,異なるPEFTアルゴリズムを用いて開発されたアプリケーションの概要を述べるとともに,PEFTの計算コストを軽減するための一般的な手法について議論する。アルゴリズムの観点からの広範な調査に加えて,様々なPEFT手法による実装コストを調査するために,実世界のシステム設計についても検討する。この調査は、PEFTアルゴリズムとそのシステム実装の両方を理解することを目的とした研究者にとって、必須のリソースとなる。

Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. In particular, their expansive scale and computational demands pose considerable challenges when customizing them for particular downstream tasks, especially on hardware platforms with constrained computational capabilities. Parameter Efficient Fine-Tuning (PEFT) provides a practical solution by efficiently adjusting large models for various downstream tasks. Specifically, PEFT refers to the process of adjusting the parameters of a pre-trained large model to adapt it to a specific task or domain while minimizing the number of additional parameters introduced or computational resources required. This approach is particularly important when dealing with large-scale language models with high parameter counts, as fine-tuning these models from scratch can be computationally expensive and resource-intensive, posing considerable challenges in the supporting system platform design. In this survey, we present comprehensive studies of various PEFT algorithms, examining their performance and computational overhead. Moreover, we provide an overview of applications developed using different PEFT algorithms and discuss common techniques employed to mitigate computation costs for PEFT. In addition to providing an extensive survey from an algorithmic standpoint, we also examine various real-world system designs to investigate the implementation costs associated with different PEFT approaches. This survey serves as an indispensable resource for researchers aiming to understand both the PEFT algorithm and its system implementation, offering detailed ......

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# 量子コンピュータを用いた絡み合い力学による素数同定

Using quantum computers to identify prime numbers via entanglement dynamics ( http://arxiv.org/abs/2403.14703v2 )

ライセンス: Link先を確認

Victor F. dos Santos, Jonas Maziero,

(参考訳) 近年,分離型コヒーレント状態に初期準備された2つの高調波発振器の絡み合いダイナミクスが,素数同定のための経路として実証された。本稿では、一般化されたアプローチを示し、スケーラブルなフォールトトレラント量子ビットベースの量子コンピュータにおけるこの理論概念の実装を可能にする決定論的アルゴリズムの概要を示す。本アルゴリズムで用いられる対角ユニタリ演算は,従来報告されていた一般対角ユニタリの指数的複雑性とは対照的に,次数2の多項式時間複雑性を示す。

Recently, the entanglement dynamics of two harmonic oscillators initially prepared in a separable-coherent state was demonstrated to offer a pathway for prime number identification. This article presents a generalized approach and outlines a deterministic algorithm making possible the implementation of this theoretical concept on scalable fault-tolerant qubit-based quantum computers. We prove that the diagonal unitary operations employed in our algorithm exhibit a polynomial-time complexity of degree two, contrasting with the previously reported exponential complexity of general diagonal unitaries.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# 大規模言語モデルはコンテキスト内を探索できるのか?

Can large language models explore in-context? ( http://arxiv.org/abs/2403.15371v2 )

ライセンス: Link先を確認

Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins,

(参考訳) 本稿では,現代の大規模言語モデル(LLM)が,強化学習と意思決定における中核的能力である探索にどの程度関与できるかを考察する。訓練による介入を行わない,既存のLLMのネイティブな性能に焦点を当てる。単純なマルチアームバンディット環境において LLMをエージェントとしてデプロイし, 環境記述とインタラクション履歴を完全にコンテキスト内,すなわちLLMプロンプト内で指定する。 GPT-3.5, GPT-4, および Llama2 を各種のプロンプト設計を用いて実験した結果, モデルは実質的な介入なしには堅牢に探索に関与しないことが判明した。 i) すべての実験において、満足できる探索行動が得られた構成は1つだけであった。すなわち、チェーン・オブ・ソート推論と、十分統計量として提示される外部で要約されたインタラクション履歴を備えたGPT-4である。 ii) 他のすべての構成は、チェーン・オブ・ソート推論を行うが履歴が要約されていないものを含め、堅牢な探索行動には至らなかった。これらの知見は肯定的に解釈できるが、より複雑な環境では不可能かもしれない外部の要約が、LLMエージェントから望ましい行動を得るために重要であることを示唆している。我々は,LLMに基づく意思決定エージェントを複雑な設定で強化するために,微調整やデータセットキュレーションなどの非自明なアルゴリズム介入が必要になる可能性があると結論付けている。

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# グラフ彩色問題に対する部分順序付けモデルのSAT符号化

SAT Encoding of Partial Ordering Models for Graph Coloring Problems ( http://arxiv.org/abs/2403.15961v2 )

ライセンス: Link先を確認

Daniel Faber, Adalat Jabrayilov, Petra Mutzel,

(参考訳) 本稿では,グラフ彩色問題 (GCP) と帯域幅彩色問題 (BCP) に対する部分順序付けベースのILPモデルの新たなSAT符号化を提案する。 GCPは、隣接する2つの頂点がそれぞれ異なる色を得るように、与えられたグラフの頂点に割り当てられる最小の色数を求める。 BCPはその一般化であり、各エッジは、割り当てられた色の間に最小の「距離」を強制する重みを持ち、その目標は、使用される「最大の」色を最小化することである。広く研究されているGCPでは、新しいSAT符号化とDIMACSベンチマークセットにおける最先端アプローチを実験的に比較する。評価の結果、このSAT符号化はスパースグラフに有効であり、一部のDIMACSインスタンスでは最先端よりも優れていることが確認された。 BCP では,部分順序付けベースのSAT と ILP の定式化が古典的な代入ベースモデルよりも漸近的に小さいサイズを持つことを理論解析により示す。実験的評価では,ベンチマークインスタンスの集合において,代入ベースの符号化だけでなく最先端のアプローチに対しても優位であることが確認された。私たちの知る限り、文献にあるBCPのいくつかの未解決インスタンスを初めて解決した。

In this paper, we suggest new SAT encodings of the partial-ordering based ILP model for the graph coloring problem (GCP) and the bandwidth coloring problem (BCP). The GCP asks for the minimum number of colors that can be assigned to the vertices of a given graph such that each two adjacent vertices get different colors. The BCP is a generalization, where each edge has a weight that enforces a minimal "distance" between the assigned colors, and the goal is to minimize the "largest" color used. For the widely studied GCP, we experimentally compare our new SAT encoding to the state-of-the-art approaches on the DIMACS benchmark set. Our evaluation confirms that this SAT encoding is effective for sparse graphs and even outperforms the state-of-the-art on some DIMACS instances. For the BCP, our theoretical analysis shows that the partial-ordering based SAT and ILP formulations have an asymptotically smaller size than that of the classical assignment-based model. Our practical evaluation confirms not only a dominance compared to the assignment-based encodings but also to the state-of-the-art approaches on a set of benchmark instances. To the best of our knowledge, we have solved several open instances of the BCP from the literature for the first time.
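For orientation, the classical assignment-based encoding that the abstract uses as its baseline can be sketched in a few lines. Note that this is not the partial-ordering encoding proposed in the paper, whose details are not given in the abstract. The function names and the brute-force checker are assumptions made for illustration, and the checker is only usable on tiny instances.

```python
from itertools import product


def assignment_encoding(n_vertices, edges, k):
    """CNF clauses stating that the graph has a proper coloring with k colors.

    Variable var(v, c) is true when vertex v receives color c. Literals are
    signed integers, as in the DIMACS CNF convention.
    """
    def var(v, c):
        return v * k + c + 1

    clauses = []
    for v in range(n_vertices):
        # Every vertex gets at least one color ...
        clauses.append([var(v, c) for c in range(k)])
        # ... and at most one color.
        for c1 in range(k):
            for c2 in range(c1 + 1, k):
                clauses.append([-var(v, c1), -var(v, c2)])
    for u, v in edges:
        # Adjacent vertices must not share a color.
        for c in range(k):
            clauses.append([-var(u, c), -var(v, c)])
    return clauses, n_vertices * k


def brute_force_sat(clauses, n_vars):
    """Exhaustive satisfiability check; exponential, for toy instances only."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False


triangle = [(0, 1), (1, 2), (0, 2)]
two_colorable = brute_force_sat(*assignment_encoding(3, triangle, 2))
three_colorable = brute_force_sat(*assignment_encoding(3, triangle, 3))
```

A triangle needs three colors, so the formula for two colors is unsatisfiable and the one for three colors is satisfiable. In practice the clauses would be handed to a real SAT solver rather than enumerated.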

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# NeuSDFusion:3次元形状補完・再構成・生成のための空間認識生成モデル

NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation ( http://arxiv.org/abs/2403.18241v2 )

ライセンス: Link先を確認

Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji,

(参考訳) 3D形状生成は、特定の条件や制約に固執する革新的な3Dコンテンツを作成することを目的としている。既存の方法では、しばしば3次元形状を局所成分の列に分解し、各要素を空間的一貫性を考慮せずに分離して扱う。その結果、これらの手法は、3次元データ表現と形状生成において限られた汎用性を示し、指定された制約を満たす高度に多様な3次元形状を生成する能力を妨げている。本稿では,2次元平面表現を利用した空間認識型3次元形状生成フレームワークを提案する。空間コヒーレンスを確保し,メモリ使用量を削減するため,直交2次元平面を用いて3次元形状の連続符号付き距離場表現を直接学習するハイブリッド形状表現手法を組み込んだ。さらに,トランスを用いたオートエンコーダ構造を用いて,異なる平面間の空間的対応を慎重に実施し,生成した3次元形状における空間的関係の保存を促進する。これにより、無条件形状生成、マルチモーダル形状完了、単一ビュー再構成、テキスト・ツー・シェイプ合成など、様々なタスクにおける最先端の3D形状生成手法を一貫して上回るアルゴリズムが得られる。私たちのプロジェクトページはhttps://weizheliu.github.io/NeuSDFusion/ で公開されています。

3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we meticulously enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. This yields an algorithm that consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis. Our project page is available at https://weizheliu.github.io/NeuSDFusion/ .

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# 深部因果生成モデルの半教師付き学習

Semi-Supervised Learning for Deep Causal Generative Models ( http://arxiv.org/abs/2403.18717v2 )

ライセンス: Link先を確認

Yasin Ibrahim, Hermione Warr, Konstantinos Kamnitsas,

(参考訳) 「yがzであった場合、xはどのように変化するか」という形式の疑問に答えることのできるモデルを開発することは、医療画像解析の進歩に不可欠である。しかし、このような反事実的な問いに対処する因果生成モデルの訓練には、現在、すべての関連する変数が観察され、対応するラベルがトレーニングデータで利用可能であることが要求されている。しかし、臨床データは全患者の完全な記録を持つとは限らず、最先端の因果生成モデルはこれを十分に活用できない。そこで我々は,変数間の因果関係を利用して利用可能な全データの活用を最大化する半教師付き深層因果生成モデルを初めて開発した。各サンプルが完全にラベル付けされているか完全にラベルなしである設定に加え、サンプルごとに異なるラベルが欠落しているという、より臨床的に現実的なケースでもこれを調査する。因果推論の手法を利用して欠落した値を推測し、不完全なラベルを持つサンプルであっても現実的な反事実を生成する。

Developing models that are capable of answering questions of the form "How would x change if y had been z?" is fundamental to advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. However, clinical data may not have complete records for all patients and state of the art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# 限られたペアワイズ選好比較からの計量学習

Metric Learning from Limited Pairwise Preference Comparisons ( http://arxiv.org/abs/2403.19629v2 )

ライセンス: Link先を確認

Zhi Wang, Geelon So, Ramya Korlakai Vinayak,

(参考訳) 理想点モデルに基づく選好比較からの計量学習について検討する。このモデルでは、ユーザは、自身の潜在的な理想アイテムに近いアイテムを他のアイテムよりも好む。これらのアイテムは、ユーザ間で共有される未知のマハラノビス距離を備えた$\mathbb{R}^d$に埋め込まれる。最近の研究は、ユーザ1人あたり$\mathcal{O}(d)$個のペアワイズ比較があれば計量と理想アイテムを同時に復元できることを示しているが、実際には$o(d)$個の比較という限られた予算しか持たないことが多い。個々の理想アイテムを学習することはもはや不可能であると知られているにもかかわらず、計量が依然として復元可能であるかどうかを考察する。一般に、$o(d)$個の比較は、ユーザが無限に多くても、計量に関する情報を何も明らかにしないことを示す。しかし、低次元構造を示すアイテムについて比較が行われる場合、各ユーザは低次元部分空間に制限された計量の学習に貢献でき、計量を共同で識別できる。我々は,これを達成する分割統治アプローチを提案し,理論的な復元保証と実証的検証を提供する。

We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given $\mathcal{O}(d)$ pairwise comparisons per user, in practice we often have a limited budget of $o(d)$ comparisons. We study whether the metric can still be recovered, even though it is known that learning individual ideal items is now no longer possible. We show that in general, $o(d)$ comparisons reveal no information about the metric, even with infinitely many users. However, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation.
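The ideal point model itself is easy to state in code: a user prefers the item whose Mahalanobis distance to their ideal point is smaller. The sketch below only illustrates that preference rule, with made-up vectors and metric matrices; it is not the paper's divide-and-conquer recovery method, and the function names are our own.

```python
def mahalanobis_sq(x, y, metric):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return sum(diff[i] * metric[i][j] * diff[j]
               for i in range(d) for j in range(d))


def prefers(ideal, item_a, item_b, metric):
    """True when the user with this ideal point prefers item_a over item_b."""
    return mahalanobis_sq(item_a, ideal, metric) < mahalanobis_sq(item_b, ideal, metric)


ideal = [0.0, 0.0]
item_a = [1.0, 0.0]
item_b = [0.0, 2.0]

euclidean = [[1.0, 0.0], [0.0, 1.0]]   # identity metric
stretched = [[10.0, 0.0], [0.0, 1.0]]  # first coordinate weighted heavily

choice_euclidean = prefers(ideal, item_a, item_b, euclidean)
choice_stretched = prefers(ideal, item_a, item_b, stretched)
```

The same pair of items is ranked differently under the two metrics, which is why observed comparisons carry information about the unknown metric.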

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# GraspXL: 多様な物体に対する把握動作の大規模生成

GraspXL: Generating Grasping Motions for Diverse Objects at Scale ( http://arxiv.org/abs/2403.19649v2 )

ライセンス: Link先を確認

Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song,

(参考訳) 人間の手は、対象の特定の部分をつかんだり、望ましい方向から近づいたりするなど、多様な物体と相互作用する器用さを持っている。さらに重要なのは、人間は物体固有のスキルを使わずにあらゆる形の物体を把握できるということである。近年の研究では、所望の接近方向や把握領域などの単一の目的に従う把握動作を合成している。さらに、それらは通常、トレーニングや推論の間、高価な3Dハンドオブジェクトデータに頼っているため、未知の物体の把握動作を大規模に合成する能力が制限される。本論文では、政策学習フレームワークGraspXLにおいて、複数の動作目的、多様な物体形状、および器用な手の形態にまたがる手と物体の把握動作の生成を統一する。目的は、把握可能な領域、接近中の進行方向、手首の回転、手の位置から構成される。 3Dハンドオブジェクトのインタラクションデータを必要とせず、58個のオブジェクトで訓練されたポリシーは、50万以上の未知のオブジェクトに対して、成功率82.2%で多様な把握動作を堅牢に合成することができる。同時に、ポリシーは目的に従うため、オブジェクトごとに多様な把握の生成が可能になる。さらに、我々のフレームワークは、異なる器用なハンドにデプロイでき、再構成または生成されたオブジェクトでも動作することを示す。提案手法の有効性を定量的および定性的に評価した。私たちのモデル、コード、そして大規模な生成モーションは https://eth-ait.github.io/graspxl/ で利用可能である。

Human hands possess the dexterity to interact with diverse objects such as grasping specific parts of the objects and/or approaching them from desired directions. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions following single objectives such as a desired approach heading direction or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their capability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes and dexterous hand morphologies in a policy learning framework GraspXL. The objectives are composed of the graspable area, heading direction during approach, wrist rotation, and hand position. Without requiring any 3D hand-object interaction data, our policy trained with 58 objects can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and work with reconstructed or generated objects. We quantitatively and qualitatively evaluate our method to show the efficacy of our approach. Our model, code, and the large-scale generated motions are available at https://eth-ait.github.io/graspxl/.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# 潜伏拡散空間における潜伏透かし:潜伏拡散空間における透かしの注入と検出

Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space ( http://arxiv.org/abs/2404.00230v2 )

ライセンス: Link先を確認

Zheling Meng, Bo Peng, Jing Dong,

(参考訳) ウォーターマーキング(英: Watermarking)は、潜伏拡散モデルによって生成された画像を積極的に識別し、帰属するツールである。既存の手法は、画質と透かしの堅牢性のジレンマに直面している。画像品質の優れた透かしは通常、ぼかしやJPEG圧縮などの攻撃に対して弱い頑健さを持つが、優れた強靭性を持つ透かしは通常、画像品質に著しくダメージを与える。このジレンマは、透かしがピクセル空間に注入され、検出される伝統的なパラダイムに由来し、透かしの検出と攻撃に対するレジリエンスにピクセルの摂動に依存している。本稿では,潜伏拡散空間における透かしの注入と検出を効果的に行うことを強調し,進行的学習戦略を用いた潜伏透かしを提案する。品質とロバスト性の間の直接的な関係を弱め、矛盾を和らげる。 2つのデータセットと10のウォーターマーク攻撃に対して評価を行う。 6のメトリクスは、画質と透かしの堅牢性を測定する。その結果、StegaStamp、StableSignature、RoSteALS、TreeRingといった最近提案された手法と比較して、LWはロバスト性だけでなく、画質も優れていることがわかった。私たちのコードはhttps://github.com/RichardSunnyMeng/LatentWatermarkで公開されます。

Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression, while watermarks with superior robustness usually significantly damage image quality. This dilemma stems from the traditional paradigm where watermarks are injected and detected in pixel space, relying on pixel perturbation for watermark detection and resilience against attacks. In this paper, we highlight that an effective solution to the problem is to both inject and detect watermarks in the latent diffusion space, and propose Latent Watermark with a progressive training strategy. It weakens the direct connection between quality and robustness and thus alleviates their contradiction. We conduct evaluations on two datasets and against 10 watermark attacks. 6 metrics measure the image quality and watermark robustness. Results show that compared to the recently proposed methods such as StegaStamp, StableSignature, RoSteALS, and TreeRing, LW not only surpasses them in terms of robustness but also offers superior image quality. Our code will be available at https://github.com/RichardSunnyMeng/LatentWatermark.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# LAKE-RED:潜在背景知識検索拡散によるカモフラージュ画像の生成

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion ( http://arxiv.org/abs/2404.00292v4 )

ライセンス: Link先を確認

Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang,

(参考訳) カモフラージュされた視覚知覚は、多くの実用的な応用において重要な視覚課題である。高価な収集とラベル付けコストのため、このコミュニティはデータセットの種分類が少数の対象種に限られているという大きなボトルネックに直面している。しかし、既存のカモフラージュ生成法では、手動でバックグラウンドを指定する必要があるため、カモフラージュされたサンプルの多様性を低コストで拡張できない。本稿では,カモフラージュ画像生成のための潜在背景知識検索拡散(LAKE-RED)を提案する。 1) 背景入力を受信する必要のないカモフラージュ生成パラダイムを提案する。 2) LAKE-REDは, カモフラージュ生成のための解釈可能性を持つ最初の知識検索拡張手法であり, タスク固有の課題を軽減するために, 知識検索と推論の強化を明示的に分離する考え方を提案する。さらに,本手法は特定の前景的対象や背景に限らず,より多様な領域に視知覚を拡大する可能性がある。実験の結果,提案手法は既存の手法よりも優れ,よりリアルなカモフラージュ画像を生成することがわかった。

Camouflaged vision perception is an important vision task with numerous practical applications. Due to the expensive collection and labeling costs, this community struggles with a major bottleneck that the species category of its datasets is limited to a small number of object species. However, the existing camouflaged generation methods require specifying the background manually, thus failing to extend the camouflaged sample diversity in a low-cost manner. In this paper, we propose a Latent Background Knowledge Retrieval-Augmented Diffusion (LAKE-RED) for camouflaged image generation. To our knowledge, our contributions mainly include: (1) For the first time, we propose a camouflaged generation paradigm that does not need to receive any background inputs. (2) Our LAKE-RED is the first knowledge retrieval-augmented method with interpretability for camouflaged generation, in which we propose an idea that knowledge retrieval and reasoning enhancement are separated explicitly, to alleviate the task-specific challenges. Moreover, our method is not restricted to specific foreground targets or backgrounds, offering a potential for extending camouflaged vision perception to more diverse domains. (3) Experimental results demonstrate that our method outperforms the existing approaches, generating more realistic camouflage images.

翻訳日:2024-07-16 04:57:27 公開日:2024-07-12

# SceneGraphLoc: 3D Scene Graph上でのクロスモーダル粗なビジュアルローカライゼーション

SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs ( http://arxiv.org/abs/2404.00469v3 )

ライセンス: Link先を確認

Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth,

(参考訳) 本稿では,3次元シーングラフのデータベースで表されるマルチモーダル参照マップ内での入力画像のローカライゼーションという,新たな問題を紹介する。これらのグラフは、オブジェクトレベルの点群、画像、属性、オブジェクト間の関係を含む複数のモダリティから構成されており、広範囲な画像データベースに依存する従来の方法に対する軽量で効率的な代替手段を提供する。提案手法であるSceneGraphLocは、利用可能なモダリティを考慮し、シーングラフ内の各ノード(すなわちオブジェクトインスタンスを表す)に対する固定サイズの埋め込みを学習し、入力されたクエリ画像に写っているオブジェクトとの効果的なマッチングを可能にする。この戦略は、地図埋め込みに画像を組み込まない場合でも、他のクロスモーダル手法よりも大幅に優れている。画像を利用する場合、SceneGraphLocは、大規模な画像データベースに依存する最先端技術に近いパフォーマンスを達成すると同時に、必要なストレージを3桁削減し、桁違いに高速に動作する。コードは公開されます。

We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing an object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques depending on large image databases, while requiring three orders-of-magnitude less storage and operating orders-of-magnitude faster. The code will be made public.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# インストラクションチューニングによる顔の感情行動解析

Facial Affective Behavior Analysis with Instruction Tuning ( http://arxiv.org/abs/2404.05052v2 )

ライセンス: Link先を確認

Yifan Li, Anh Dao, Wentao Bao, Zhen Tan, Tianlong Chen, Huan Liu, Yu Kong,

(参考訳) 顔の感情行動分析(FABA)は、画像から人間の精神状態を理解するために重要である。しかし、従来のアプローチは、主に個別の感情カテゴリーを識別するためのモデルをデプロイし、複雑な顔の振る舞いに対する細かい粒度と推論能力が欠如している。 MLLM(Multi-modal Large Language Models)は、一般的な視覚的理解タスクにおいて成功を収めている。しかし、データセットやベンチマークの不足、顔の事前知識の無視、トレーニング効率の低さのため、MLLMを直接FABAに活用することは難しい。これらの課題に対処するために、(i)2つのFABAタスク(感情認識と行動単位認識)のための指示追従データセット、(ii)認識能力と生成能力の両方を考慮した新しい指標を持つベンチマークFABA-Bench、(iii)コミュニティの強力なベースラインとなる新しいMLLM「EmoLA」を導入する。データセットとベンチマークに関する我々の取り組みは、顔の感情行動の性質と根拠、すなわち、きめ細かい顔の動き、解釈可能性、推論を明らかにする。さらに,FABA MLLMを効果的かつ効率的に構築するために,顔構造の知識を持つ顔事前知識エキスパートモジュールと低ランク適応モジュールを事前訓練したMLLMに導入する。 FABA-Benchと4つの一般的なFABAデータセットについて広範な実験を行った。以上の結果から,提案した顔事前知識エキスパートはパフォーマンスを向上し,EmoLAはFABA-Benchで最高の結果を得ることができた。一般的に使用されるFABAデータセットでは、EmoLAはタスク固有の最先端モデルと競合する。

Facial affective behavior analysis (FABA) is crucial for understanding human mental states from images. However, traditional approaches primarily deploy models to discriminate among discrete emotion categories, and lack the fine granularity and reasoning capability for complex facial behaviors. The advent of Multi-modal Large Language Models (MLLMs) has been proven successful in general visual understanding tasks. However, directly harnessing MLLMs for FABA is challenging due to the scarcity of datasets and benchmarks, neglecting facial prior knowledge, and low training efficiency. To address these challenges, we introduce (i) an instruction-following dataset for two FABA tasks, e.g., emotion and action unit recognition, (ii) a benchmark FABA-Bench with a new metric considering both recognition and generation ability, and (iii) a new MLLM "EmoLA" as a strong baseline to the community. Our initiative on the dataset and benchmarks reveals the nature and rationale of facial affective behaviors, i.e., fine-grained facial movement, interpretability, and reasoning. Moreover, to build an effective and efficient FABA MLLM, we introduce a facial prior expert module with face structure knowledge and a low-rank adaptation module into a pre-trained MLLM. We conduct extensive experiments on FABA-Bench and four commonly-used FABA datasets. The results demonstrate that the proposed facial prior expert can boost the performance and EmoLA achieves the best results on our FABA-Bench. On commonly-used FABA datasets, EmoLA is competitive with task-specific state-of-the-art models.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# CLIPping the Limits: Finding the Sweet Spot for Relevant Images in Automated Driving Systems Perception Testing

CLIPping the Limits: Finding the Sweet Spot for Relevant Images in Automated Driving Systems Perception Testing ( http://arxiv.org/abs/2404.05309v2 )

ライセンス: Link先を確認

Philipp Rigoll, Laurenz Adolph, Lennart Ries, Eric Sax,

(参考訳) 認識システム、特にカメラは自動走行システムの目玉だ。確実かつ堅牢に機能することを保証することは、車両の自動化において重要なビルディングブロックである。自動走行システムの認識をテストするには様々な方法がある。しかし、究極的には、それは常に特定の入力データの下での知覚システムの振舞いの調査に繋がる。カメラ画像は入力データの重要な部分である。そのため、自動走行システムのテストのために画像データセットが収集されるが、これらのデータセットに特定の画像を見つけることは容易ではない。ニューラルネットワークの最近の進歩により、自然言語のプロンプトと類似性に応じてデータセット内の画像をソートする手法が現在存在する。検索結果の提供をさらに自動化するために、これらのソート結果のしきい値定義を自動化し、結果としてプロンプトに関連する画像のみを返すことでコントリビューションを行う。私たちの焦点は、偽陽性と偽陰性を平等に防止することにあります。また,本手法が堅牢であり,仮定が満たされていない場合には,フォールバックソリューションを提供することも重要である。

Perception systems, especially cameras, are the eyes of automated driving systems. Ensuring that they function reliably and robustly is therefore an important building block in the automation of vehicles. There are various approaches to test the perception of automated driving systems. Ultimately, however, it always comes down to the investigation of the behavior of perception systems under specific input data. Camera images are a crucial part of the input data. Image data sets are therefore collected for the testing of automated driving systems, but it is non-trivial to find specific images in these data sets. Thanks to recent developments in neural networks, there are now methods for sorting the images in a data set according to their similarity to a prompt in natural language. In order to further automate the provision of search results, we make a contribution by automating the threshold definition in these sorted results and returning only the images relevant to the prompt as a result. Our focus is on preventing false positives and false negatives equally. It is also important that our method is robust and in the case that our assumptions are not fulfilled, we provide a fallback solution.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# BISCUIT:計算ノートにおける一時UIによるLLM生成コードの共有

BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks ( http://arxiv.org/abs/2404.07387v3 )

ライセンス: Link先を確認

Ruijia Cheng, Titus Barik, Alan Leung, Fred Hohman, Jeffrey Nichols,

(参考訳) プログラマは計算ノートブックの機械学習チュートリアルに頻繁に携わり、大規模言語モデル(LLM)に基づいたコード生成技術を採用してきた。しかし、LLMが生成したコードを理解し、操作することの難しさに直面する。これらの課題を軽減するため,ユーザプロンプトとコード生成の中間段階としてユーザUIスキャフォールドを提供するとともに,LLMベースのコード生成を一時UIステップで強化する新しいワークフローを計算ノートに導入する。このワークフローは、JupyterLabの拡張機能であるBISCUITで、ユーザに対して、コードと意図のコンテキストに基づいてLLMが生成した短命なUIを提供し、ユーザがLLM生成コードを理解し、ガイドし、探索するための足場を提供する。 10人の初心者が機械学習チュートリアルにBISCUITを使用したユーザスタディを通じて、BISCUITはユーザの理解を助けるためのコードの表現を提供し、迅速なエンジニアリングの複雑さを低減し、ユーザが異なる変数を探索し、アイデアを反復するための遊び場を作成する。

Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through a user study where 10 novices used BISCUIT for machine learning tutorials, we found that BISCUIT offers users representations of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# オムニサット:地球観測のための自己監督されたモーダリティ融合

OmniSat: Self-Supervised Modality Fusion for Earth Observation ( http://arxiv.org/abs/2404.08351v2 )

ライセンス: Link先を確認

Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu,

(参考訳) 地球観測(EO)の分野は、多様なセンサーからの豊富なデータを提供し、自己教師付きマルチモーダル学習を前進させる大きな機会を提供する。しかし、現在のマルチモーダルなEOデータセットとモデルは、単一のデータタイプ、すなわち単一時点の画像または時系列のいずれかに焦点を合わせており、表現力が制限されている。 OmniSatは,複数のEOモダリティ間の空間的アライメントを利用して,ラベルなしで表現力の高いマルチモーダル表現を学習する新しいアーキテクチャである。異なる性質のモダリティを組み合わせる利点を示すため、既存の2つのデータセットを新しいモダリティで拡張する。林業、土地被覆分類、作物マッピングという3つの下流タスクで示すように、OmniSatは教師なしの方法でリッチな表現を学習することができ、推論に1つのモダリティしか利用できない場合でも、半教師付き設定と完全教師付き設定のパフォーマンスが改善される。コードとデータセットは https://github.com/gastruc/OmniSat で入手できる。

The field of Earth Observations (EO) offers a wealth of data from diverse sensors, presenting a great opportunity for advancing self-supervised multimodal learning. However, current multimodal EO datasets and models focus on a single data type, either mono-date images or time series, which limits their expressivity. We introduce OmniSat, a novel architecture that exploits the spatial alignment between multiple EO modalities to learn expressive multimodal representations without labels. To demonstrate the advantages of combining modalities of different natures, we augment two existing datasets with new modalities. As demonstrated on three downstream tasks (forestry, land cover classification, and crop mapping), OmniSat can learn rich representations in an unsupervised manner, leading to improved performance in the semi- and fully-supervised settings, even when only one modality is available for inference. The code and dataset are available at https://github.com/gastruc/OmniSat.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# OneActor: クラスタ駆動誘導による一貫性キャラクタ生成

OneActor: Consistent Character Generation via Cluster-Conditioned Guidance ( http://arxiv.org/abs/2404.10267v2 )

ライセンス: Link先を確認

Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun,

(参考訳) テキストから画像への拡散モデルは、高品質な画像生成でアーティストに恩恵を与える。しかし、彼らの確率的な性質は、アーティストが同じ主題の一貫性のあるイメージを作成するのを妨げる。既存の手法はこの課題に取り組み、様々な方法で一貫性のあるコンテンツを生成する。しかし、それらは外部の制限されたデータに依存するか、拡散モデルの高価なチューニングを必要とする。本稿では,OneActorと呼ばれる新しいワンショットチューニングパラダイムを提案する。学習したセマンティックガイダンスを通じてのみプロンプトによって駆動される一貫した主題生成を効率よく実行し、面倒なバックボーンチューニングを回避します。我々は、クラスタリングの観点から一貫した主題生成の目的を定式化し、クラスタ条件モデルの設計を導く。ワンショットチューニングパイプラインが共有するオーバーフィッティングの課題を軽減するため、補助的なサンプルによるチューニングを強化し、セマンティック補間とクラスタガイダンスという2つの推論戦略を考案する。これらの技術は後に、生成品質を著しく向上させるために検証される。包括的実験により,本手法は,良好な主観的整合性,即時整合性,高画質で,様々なベースラインに優れることが示された。提案手法は多目的生成が可能であり, 一般的な拡散拡張と互換性がある。さらに、チューニングベースのベースラインよりも4倍高速なチューニング速度を実現し、望めば推論時間の増加を回避できる。さらに、我々の知る限り、拡散モデルの意味空間が潜在空間と同じ補間性を持っていることを初めて証明する。この特性は、ファインジェネレーション制御のためのもう1つの有望なツールとして機能する。

Text-to-image diffusion models benefit artists with high-quality image generation. Yet their stochastic nature hinders artists from creating consistent images of the same subject. Existing methods try to tackle this challenge and generate consistent content in various ways. However, they either depend on external restricted data or require expensive tuning of the diffusion model. For this issue, we propose a novel one-shot tuning paradigm, termed as OneActor. It efficiently performs consistent subject generation solely driven by prompts via a learned semantic guidance to bypass the laborious backbone tuning. We lead the way to formalize the objective of consistent subject generation from a clustering perspective, and thus design a cluster-conditioned model. To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance. These techniques are later verified to significantly enhance the generation quality. Comprehensive experiments show that our method outperforms a variety of baselines with satisfactory subject consistency, superior prompt conformity as well as high image quality. Our method is capable of multi-subject generation and compatible with popular diffusion extensions. Besides, we achieve a 4 times faster tuning speed than tuning-based baselines and, if desired, avoid increasing inference time. Furthermore, to our best knowledge, we are the first to prove that the semantic space of the diffusion model has the same interpolation property as the latent space does. This property can serve as another promising tool for fine generation control.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# VRにおける視線駆動認証性能のベースライン構築:超大規模データセットに関する第1報

Establishing a Baseline for Gaze-driven Authentication Performance in VR: A Breadth-First Investigation on a Very Large Dataset ( http://arxiv.org/abs/2404.11798v2 )

ライセンス: Link先を確認

Dillon Lohr, Michael J. Proulx, Oleg Komogortsev,

(参考訳) 本稿では、現代の消費者向けバーチャルリアリティ(VR)プラットフォームと同等の視線追跡(ET)信号品質を持つ、9202人分の非常に大規模な視線記録データセットを用いて、視線駆動型認証性能のベースラインを確立し、基礎的な研究課題への回答に着手する。使用したデータセットの規模は、関連する先行研究のどのデータセットよりも少なくとも1桁大きい。我々のモデルが5万分の1の偽受容率(FAR)で3%未満の偽拒絶率(FRR)を達成するには、眼の光軸と視軸の両眼推定値と、登録および検証のための最小限の記録時間が必要である。ギャラリーサイズとともに低下する識別精度については、ギャラリーサイズが148,000以上になると我々のモデルはチャンスレベルの精度を下回ると推定する。主な知見として、最先端の機械学習アーキテクチャと十分に大きなトレーニングデータセットを用いれば、視線認証はFIDO標準が要求する水準の精度を達成しうることが示された。

This paper performs the crucial work of establishing a baseline for gaze-driven authentication performance to begin answering fundamental research questions using a very large dataset of gaze recordings from 9202 people with a level of eye tracking (ET) signal quality equivalent to modern consumer-facing virtual reality (VR) platforms. The size of the employed dataset is at least an order-of-magnitude larger than any other dataset from previous related work. Binocular estimates of the optical and visual axes of the eyes and a minimum duration for enrollment and verification are required for our model to achieve a false rejection rate (FRR) of below 3% at a false acceptance rate (FAR) of 1 in 50,000. In terms of identification accuracy which decreases with gallery size, we estimate that our model would fall below chance-level accuracy for gallery sizes of 148,000 or more. Our major findings indicate that gaze authentication can be as accurate as required by the FIDO standard when driven by a state-of-the-art machine learning architecture and a sufficiently large training dataset.
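The operating point quoted above (FRR below 3% at a FAR of 1 in 50,000) can be illustrated with a minimal sketch of how an FRR is read off at a fixed FAR from verification scores. This is not the authors' evaluation code: the function name `frr_at_far`, the convention that higher scores mean "more similar", and the toy scores are all assumptions made for illustration.

```python
def frr_at_far(genuine, impostor, target_far):
    """Return (threshold, frr) for the lowest threshold whose FAR <= target_far.

    Assumes higher scores mean "more likely the same person" and that an
    attempt is accepted when score >= threshold (hypothetical convention).
    """
    # Candidate thresholds: every observed score, plus one value above the
    # maximum so that "accept nothing" (FAR = 0) is always reachable.
    candidates = sorted(set(genuine) | set(impostor))
    candidates.append(candidates[-1] + 1.0)
    for t in candidates:
        # FAR: fraction of impostor attempts that would be accepted.
        far = sum(s >= t for s in impostor) / len(impostor)
        if far <= target_far:
            # FRR: fraction of genuine attempts that would be rejected.
            frr = sum(s < t for s in genuine) / len(genuine)
            return t, frr
    raise AssertionError("unreachable: the last candidate always gives FAR = 0")


# Toy scores, invented for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.6]
threshold, frr = frr_at_far(genuine_scores, impostor_scores, target_far=0.2)
print(threshold, frr)
```

With N impostor comparisons the smallest non-zero FAR that can be measured is 1/N, so resolving a FAR of 1 in 50,000 needs at least 50,000 impostor comparisons, which is one reason a very large dataset matters for this kind of evaluation.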

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# 英語からウクライナ語への機械翻訳を改良したデータプリンタのセットアップ

Setting up the Data Printer with Improved English to Ukrainian Machine Translation ( http://arxiv.org/abs/2404.15196v2 )

ライセンス: Link先を確認

Yurii Paniv, Dmytro Chaplynskyi, Nikita Trynus, Volodymyr Kyrylov,

(参考訳) ウクライナ語のための大規模な言語モデルを構築するには、自然言語で表現された大量の新しいアルゴリズムタスクでコーパスを拡張する必要がある。英語で表現されたタスク遂行の例は豊富にあるため、高品質な翻訳システムがあれば、コミュニティはデータセットをより速くキュレートできるようになる。この目的のために、翻訳システムを構築するレシピを紹介する。まず、ウクライナ語と英語の文対300万件からなるノイズの多い対訳データセットで大規模事前学習言語モデルを教師付き微調整し、続く第2フェーズでは、より高品質な別のデータセットからk-fold perplexity filteringで選択した1万7千例を用いて学習する。我々のデコーダのみのモデルDragomanは、FLORESのdevtestセットにおいて従来の最先端のエンコーダ・デコーダモデルの性能を上回る。

To build large language models for Ukrainian we need to expand our corpora with large amounts of new algorithmic tasks expressed in natural language. Examples of task performance expressed in English are abundant, so with a high-quality translation system our community will be enabled to curate datasets faster. To aid this goal, we introduce a recipe to build a translation system using supervised finetuning of a large pretrained language model with a noisy parallel dataset of 3M pairs of Ukrainian and English sentences followed by a second phase of training using 17K examples selected by k-fold perplexity filtering on another dataset of higher quality. Our decoder-only model named Dragoman beats performance of previous state of the art encoder-decoder models on the FLORES devtest set.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# TOP-Nav:Terrain, Obstacle, Proprioception Estimationを統合した脚付きナビゲーション

TOP-Nav: Legged Navigation Integrating Terrain, Obstacle and Proprioception Estimation ( http://arxiv.org/abs/2404.15256v3 )

ライセンス: Link先を確認

Junli Ren, Yikai Liu, Yingru Dai, Junfeng Long, Guijin Wang,

(参考訳) 脚のついたナビゲーションは通常、オープンワールド、オフロード、挑戦的な環境で検査される。これらのシナリオでは、外乱を推定するには、多重モーダル情報の複雑な合成が必要である。これは、主に障害を避けることに焦点を当てた既存の作業において、大きな制限となる。本研究では,包括的パスプランナとTerrain認識,Obstacle回避,クローズループプロプライオセプションを統合した新しい脚付きナビゲーションフレームワークTOP-Navを提案する。 TOP-Navは、経路計画と運動計画の両方において、視覚とプロプレセプションの相乗効果を強調している。経路プランナ内では、障害物を効果的に回避しつつ、高い走行性を有する地形上の経路をロボットが選択できる地形推定器を提示し、統合する。動作計画レベルでは、ナビゲーションコマンドを追跡するために移動制御器を実装できるだけでなく、経路プランナーに動作評価を提供するための受容アドバイザも構築する。クローズループ動作フィードバックに基づいて、視覚に基づく地形と障害物推定のオンライン修正を行う。そのため、TOP-Navは、ロボットが以前の知識の分布を超えて地形や乱れを扱えるように、オープンワールドナビゲーションを実現し、視覚条件によって課される制約を克服する。 TOP-Navは、シミュレーションと実世界の環境の両方で実施された広範な実験に基づいて、既存の手法と比較して、オープンワールドナビゲーションにおいて優れた性能を示す。

Legged navigation is typically examined within open-world, off-road, and challenging environments. In these scenarios, estimating external disturbances requires a complex synthesis of multi-modal information. This underlines a major limitation in existing works that primarily focus on avoiding obstacles. In this work, we propose TOP-Nav, a novel legged navigation framework that integrates a comprehensive path planner with Terrain awareness, Obstacle avoidance and close-loop Proprioception. TOP-Nav underscores the synergies between vision and proprioception in both path and motion planning. Within the path planner, we present and integrate a terrain estimator that enables the robot to select waypoints on terrains with higher traversability while effectively avoiding obstacles. In the motion planning level, we not only implement a locomotion controller to track the navigation commands, but also construct a proprioception advisor to provide motion evaluations for the path planner. Based on the close-loop motion feedback, we make online corrections for the vision-based terrain and obstacle estimations. Consequently, TOP-Nav achieves open-world navigation that the robot can handle terrains or disturbances beyond the distribution of prior knowledge and overcomes constraints imposed by visual conditions. Building upon extensive experiments conducted in both simulation and real-world environments, TOP-Nav demonstrates superior performance in open-world navigation compared to existing methods.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# ODMixer:Metro Origin-Destination Predictionのための微細な時空間MLP

ODMixer: Fine-grained Spatial-temporal MLP for Metro Origin-Destination Prediction ( http://arxiv.org/abs/2404.15734v2 )

ライセンス: Link先を確認

Yang Liu, Binglin Chen, Yongsen Zheng, Guanbin Li, Liang Lin,

(参考訳) Metro Origin-Destination (OD) 予測は、都市コンピューティングにおいて重要かつ困難な時空間予測課題であり、メトロスケジューリングを最適化し、全体の輸送効率を向上させるために、駅間の乗客数を正確に予測することを目的としている。駅間の細粒度かつ包括的な関係を効果的に分析することは、メトロOD予測に不可欠である。しかし、既存のメトロODモデルは、駅の視点から複数のODペアの情報を混合するか、ODペアの一部のみに焦点を当てている。これらのアプローチはODペア間の細粒度の関係を見落とし、潜在的な異常状態の予測を困難にする可能性がある。これらの課題に対処するために、すべてのODペアの観点からトラフィックの変動を分析し、ODMixerというメトロOD予測のための細粒度の時空間MLPアーキテクチャを提案する。具体的には、ODMixerは二重分岐構造を持ち、Channel Mixer、Multi-view Mixer、Bidirectional Trend Learnerを含む。 Channel MixerはODペア間の短期的な時間的関係を捉えることを目的としており、Multi-view Mixerは出発地と目的地の両方の観点から関係を捉えることに集中している。長期的な時間的関係をモデル化するために,Bidirectional Trend Learnerを導入する。 2つの大規模メトロOD予測データセットHZMODとSHMOでの広範な実験により,ODMixerの利点が示された。私たちのコードは https://github.com/KLatitude/ODMixer から入手可能です。

Metro Origin-Destination (OD) prediction is a crucial yet challenging spatial-temporal prediction task in urban computing, which aims to accurately forecast cross-station ridership for optimizing metro scheduling and enhancing overall transport efficiency. Analyzing fine-grained and comprehensive relations among stations effectively is imperative for metro OD prediction. However, existing metro OD models either mix information from multiple OD pairs from the station's perspective or exclusively focus on a subset of OD pairs. These approaches may overlook fine-grained relations among OD pairs, leading to difficulties in predicting potential anomalous conditions. To address these challenges, we analyze traffic variations from the perspective of all OD pairs and propose a fine-grained spatial-temporal MLP architecture for metro OD prediction, namely ODMixer. Specifically, our ODMixer has double-branch structure and involves the Channel Mixer, the Multi-view Mixer, and the Bidirectional Trend Learner. The Channel Mixer aims to capture short-term temporal relations among OD pairs, the Multi-view Mixer concentrates on capturing relations from both origin and destination perspectives. To model long-term temporal relations, we introduce the Bidirectional Trend Learner. Extensive experiments on two large-scale metro OD prediction datasets HZMOD and SHMO demonstrate the advantages of our ODMixer. Our code is available at https://github.com/KLatitude/ODMixer.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# FAD-SAR:深層学習に基づく合成開口レーダ画像による漁業活動検出システム

FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method ( http://arxiv.org/abs/2404.18245v2 )

ライセンス: Link先を確認

Yanbing Bai, Siao Li, Rui-Yang Ju, Zihao Yang, Jinze Yu, Jen-Shiun Chiang,

(参考訳) 違法・無報告・無規制(IUU)漁業活動は、人間の生活の様々な側面に深刻な影響を及ぼす。しかし,海上におけるIUU漁業活動を検出・監視する従来の手法には限界がある。合成開口レーダ(SAR)は既存の船舶検出システムを補完できるが,従来の手法でSAR画像から有用な情報を抽出することは,特にIUU漁業においては依然として困難である。本稿では, SSD, RetinaNet, FSAF, FCOS, Faster R-CNN, Cascade R-CNNの6つの古典的物体検出モデルを用いて, xView3データセット上に実装された深層学習ベースの漁業活動検出システムを提案する。さらに、この研究は、Faster R-CNNモデルの性能を向上させるために、異なる改善技術を用いている。実験の結果,Online Hard Example Mining(OHEM)戦略を用いてFaster R-CNNモデルをトレーニングすることにより,Avg-F1値が0.212から0.216に増加した。

Illegal, unreported, and unregulated (IUU) fishing activities seriously affect various aspects of human life. However, traditional methods for detecting and monitoring IUU fishing activities at sea have limitations. Although synthetic aperture radar (SAR) can complement existing vessel detection systems, extracting useful information from SAR images using traditional methods remains a challenge, especially in IUU fishing. This paper proposes a deep learning based fishing activity detection system, which is implemented on the xView3 dataset using six classical object detection models: SSD, RetinaNet, FSAF, FCOS, Faster R-CNN, and Cascade R-CNN. In addition, this work employs different enhancement techniques to improve the performance of the Faster R-CNN model. The experimental results demonstrate that training the Faster R-CNN model using the Online Hard Example Mining (OHEM) strategy increases the Avg-F1 value from 0.212 to 0.216.
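The abstract does not say how OHEM was configured, so the following is only a minimal sketch of the core idea behind Online Hard Example Mining: compute a loss for every candidate example, then train on the highest-loss ones only. The function names `select_hard_examples` and `ohem_loss`, the `keep` parameter, and the toy losses are assumptions for illustration; the sketch also omits details of the original OHEM formulation for detectors, such as de-duplicating heavily overlapping regions before selection.

```python
def select_hard_examples(losses, keep):
    """Return the indices of the `keep` highest-loss examples, hardest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Sort indices by loss, largest first; ties keep their original order.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:keep]


def ohem_loss(losses, keep):
    """Mean loss over the selected hard examples only (0.0 if none selected)."""
    hard = select_hard_examples(losses, keep)
    if not hard:
        return 0.0
    return sum(losses[i] for i in hard) / len(hard)


# Toy per-example losses, invented for illustration only.
per_example_losses = [0.1, 2.0, 0.5, 1.5]
print(select_hard_examples(per_example_losses, keep=2))
print(ohem_loss(per_example_losses, keep=2))
```

In a real detector the per-example losses would come from a forward pass over region proposals, and only the selected examples would contribute gradients in the backward pass.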

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# GRAMMAR:閉領域検索拡張言語モデルの評価のための基礎的およびモジュール的手法

GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model ( http://arxiv.org/abs/2404.19232v5 )

ライセンス: Link先を確認

Xinzhe Li, Ming Liu, Shang Gao,

(参考訳) Retrieval-augmented Generation (RAG) システムは、ドメイン固有の知識ベースに問い合わせるために、様々な産業で活発に研究され、展開されている。しかし、これらのシステムの評価には、ドメイン固有のクエリとそれに対応する正解の不足や、失敗の原因(知識不足に起因するのか、システムの堅牢性に関する問題に起因するのか)を診断するための体系的なアプローチの欠如といった、特有の課題がある。これらの課題に対処するために、GRAMMAR(GRounded And Modular Methodology for Assessment of RAG)という2つの要素からなる評価フレームワークを導入する。 1)リレーショナルデータベースとLLMを活用して、評価用のクエリと回答のペアをスケーラブルかつ効率的に作成するデータ生成プロセス。この方法は、クエリロジックを言語的バリエーションから分離しやすくし、堅牢でないテキスト形式に関する仮説の検証を可能にする。 2)知識ギャップと堅牢性を区別し,欠陥モジュールの識別を可能にする評価フレームワーク。我々の経験的結果は、現在の参照フリー評価手法の限界と、モデルの脆弱性を正確に識別するGRAMMARの信頼性を裏付けるものである。実装の詳細については、GitHubリポジトリを参照してください: https://github.com/xinzhel/grammar

Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs for evaluation. This method facilitates the separation of query logic from linguistic variations, enabling the testing of hypotheses related to non-robust textual forms; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities. For implementation details, refer to our GitHub repository: https://github.com/xinzhel/grammar.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# ドメイン一般化のためのソフトプロンプト生成

Soft Prompt Generation for Domain Generalization ( http://arxiv.org/abs/2404.19286v2 )

ライセンス: Link先を確認

Shuanghao Bai, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen,

(参考訳) 大規模な事前訓練された視覚言語モデル(VLM)は、手動で設計したプロンプトで下流のタスクに印象的なゼロショット能力を示している。 VLMを下流タスクにさらに適応させるために、手動で設計したプロンプトを置き換えるソフトプロンプトが提案されており、これは特定のドメインデータに基づいて微調整される。従来のプロンプト学習法は、主にトレーニングサンプルから固定されたプロンプトまたは残差プロンプトを学習する。しかし、学習したプロンプトは多様性を欠き、未知のドメインに関する情報を無視する。本稿では,プロンプト学習フレームワークを生成的観点から再構築し,ドメイン一般化(DG)タスクのための簡易かつ効率的な手法であるソフト・プロンプト・ジェネレーション(SPG)を提案する。具体的には、SPGは2段階のトレーニングフェーズと推論フェーズから構成される。トレーニングフェーズでは、生成モデルにドメイン知識を組み込むことを目的として、各ドメインにソフトプロンプトラベルを導入する。推論フェーズでは、生成モデルのジェネレータを使用して、未知のターゲットドメインに対してインスタンス固有のソフトプロンプトを得る。 3つのDGタスクの5つのドメイン一般化ベンチマークでの広範な実験は、SPGが最先端のパフォーマンスを達成することを示す。コードは https://github.com/renytek13/Soft-Prompt-Generation-with-CGAN で公開されている。

Large pre-trained vision language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompt. To further adapt VLMs to downstream tasks, soft prompt is proposed to replace manually designed prompt, which undergoes fine-tuning based on specific domain data. Prior prompt learning methods primarily learn a fixed prompt or residuled prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely Soft Prompt Generation (SPG). Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce soft prompt label for each domain, aiming to incorporate the generative model domain knowledge. During the inference phase, the generator of the generative model is employed to obtain instance-specific soft prompts for the unseen target domain. Extensive experiments on five domain generalization benchmarks of three DG tasks demonstrate that SPG achieves state-of-the-art performance. The code is available at https://github.com/renytek13/Soft-Prompt-Generation-with-CGAN.

翻訳日:2024-07-16 04:47:43 公開日:2024-07-12

# WateRF:著作権保護のための放射場におけるロバストな透かし

WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights ( http://arxiv.org/abs/2405.02066v4 )

ライセンス: Link先を確認

Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim,

(参考訳) NeRF(Neural Radiance Fields)研究の進歩は、様々な領域に広範な応用をもたらすが、著作権保護はまだ深く研究されていない。近年、NeRFベースの3D表現を安全に展開するための重要なソリューションの1つとして、NeRF透かしが検討されている。しかし、既存の手法は暗黙的あるいは明示的なNeRF表現にのみ適用するように設計されている。本研究では,NeRFの両表現に適用可能な革新的な透かし手法を提案する。これは、NeRFを微調整してバイナリメッセージをレンダリングプロセスに埋め込むことによって実現される。本稿では,NeRF空間における離散ウェーブレット変換を透かしに利用することを提案する。さらに、遅延バックプロパゲーション手法を採用し、パッチワイズ損失と組み合わせることで、最小トレードオフでレンダリング品質とビット精度を向上させる。提案手法は,2次元レンダリング画像に埋め込まれた透かしの容量,可視性,堅牢性の3つの異なる側面で評価する。本手法は、比較した最先端手法よりも高速なトレーニング速度で最先端性能を実現する。

The advances in the Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed over the compared state-of-the-art methods.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# CausalLP:知識グラフの重み付きリンク予測による因果関係の学習

CausalLP: Learning causal relations with weighted knowledge graph link prediction ( http://arxiv.org/abs/2405.02327v2 )

ライセンス: Link先を確認

Utkarshani Jaimini, Cory Henson, Amit P. Sheth,

(参考訳) 因果ネットワークは、医療診断から製造における根本原因分析まで、幅広い用途で有用である。しかし、実際には因果関係が欠落しているため、因果ネットワークは不完全であることが多い。本稿では,知識グラフ補完問題として不完全な因果ネットワークの問題を定式化するCausalLPという新しい手法を提案する。より具体的には、不完全な因果ネットワークにおける新たな因果関係を見つけるタスクを知識グラフリンク予測のタスクにマップする。因果関係を表すために知識グラフを用いることは、外部のドメイン知識の統合を可能にし、さらに複雑さとして、因果関係は知識グラフ内のエンティティ間の因果関係の強さを表す重みを持つ。 CausalLPでは、因果的説明と因果的予測という2つの主要なタスクがサポートされている。このアプローチの評価には、因果推論のためのシミュレーションビデオのベンチマークデータセットであるCLEVRER-Humansを使用し、複数の知識グラフ埋め込みアルゴリズムの性能を比較する。評価には2つの異なるデータセット分割手法を用いる。(1) リンク予測アルゴリズムの評価に一般的に使用されるランダム分割(random-based split)と、(2) 因果関係のマルコフ性を利用した新しいデータ分割手法であるマルコフ分割(Markov-based split)である。その結果,重み付き因果関係を用いることで,重み付き関係を用いないベースラインよりも因果リンク予測が向上することがわかった。

Causal networks are useful in a wide variety of applications, from medical diagnosis to root-cause analysis in manufacturing. In practice, however, causal networks are often incomplete with missing causal relations. This paper presents a novel approach, called CausalLP, that formulates the issue of incomplete causal networks as a knowledge graph completion problem. More specifically, the task of finding new causal relations in an incomplete causal network is mapped to the task of knowledge graph link prediction. The use of knowledge graphs to represent causal relations enables the integration of external domain knowledge; and as an added complexity, the causal relations have weights representing the strength of the causal association between entities in the knowledge graph. Two primary tasks are supported by CausalLP: causal explanation and causal prediction. An evaluation of this approach uses a benchmark dataset of simulated videos for causal reasoning, CLEVRER-Humans, and compares the performance of multiple knowledge graph embedding algorithms. Two distinct dataset splitting approaches are used for evaluation: (1) random-based split, which is the method typically employed to evaluate link prediction algorithms, and (2) Markov-based split, a novel data split technique that utilizes the Markovian property of causal relations. Results show that using weighted causal relations improves causal link prediction over the baseline without weighted relations.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# 同調, 作話, なりすまし:マルチエージェントLLMコラボレーションにおけるペルソナの非一貫性

Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration ( http://arxiv.org/abs/2405.03862v2 )

ライセンス: Link先を確認

Razan Baltaji, Babak Hemmatian, Lav R. Varshney,

(参考訳) マルチエージェントAIシステムは、科学的および実践的な応用において、集合的な意思決定をシミュレートするために使用することができる。また、チャットボットパイプラインに多様なグループディスカッションステップを導入して、チャットボットの応答の文化的感受性を高めるためにも使用できる。しかしながら、これらのアプリケーションは、AIエージェントが割り当てられたペルソナを確実に採用し、人間のインタラクションを模倣する能力に基づいている。 LLMエージェントがこれらの要件を満たす能力を評価するために、カルチャーコラボレーションや議論に携わるAIエージェントのアンサンブルを、個人の反応やチャットの書き起こしを分析して検討する。本研究は, 多様な視点を反映した集団的意思決定を促すことが示唆されるが, この利益は, 対人的プレッシャーや一貫したペルソナや意見を維持する上での課題により, エージェントの適合性への感受性によって誘惑される。協力よりも意見を支持する上での議論を促す指示は、矛盾の度合いを増大させる。私たちが特定した要因に対処しない限り、より文化的に多様なAI出力や、グループ意思決定のより現実的なシミュレーションを生成するマルチエージェントフレームワークの潜在能力は未完成のままである。

Multi-agent AI systems can be used for simulating collective decision-making in scientific and practical applications. They can also be used to introduce a diverse group discussion step in chatbot pipelines, enhancing the cultural sensitivity of the chatbot's responses. These applications, however, are predicated on the ability of AI agents to reliably adopt assigned personas and mimic human interactions. To evaluate the ability of LLM agents to satisfy these requirements, we examine AI agent ensembles engaged in cultural collaboration and debate by analyzing their private responses and chat transcripts. Our findings suggest that multi-agent discussions can encourage collective decisions that reflect diverse perspectives, yet this benefit is tempered by the agents' susceptibility to conformity due to perceived peer pressure and challenges in maintaining consistent personas and opinions. Instructions that encourage debate in support of one's opinions rather than collaboration increase the rate of inconstancy. Without addressing the factors we identify, the full potential of multi-agent frameworks for producing more culturally diverse AI outputs or more realistic simulations of group decision-making will remain untapped.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# ViewFormer: View-Guided Transformer を用いた多視点3次元占有知覚のための時空間モデリング

ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers ( http://arxiv.org/abs/2405.04299v2 )

ライセンス: Link先を確認

Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang,

(参考訳) 運転シナリオのための高度な認識技術である3D占有は、物理空間をグリッドマップに量子化することで、前景と背景を区別することなく、シーン全体を表現している。画像特徴を3次元表現に変換するのに効率的で、広く採用されているプロジェクションファーストの変形可能アテンションは、センサーの配置制約によるマルチビュー特徴集約の課題に遭遇する。この問題に対処するために,効果的な多視点特徴集約のための学習優先視点アテンション機構を提案する。さらに,マップ構築や3Dオブジェクト検出など,多視点3Dタスクにまたがるビューアテンションのスケーラビリティについても紹介する。提案するビューアテンションと,追加のマルチフレームストリーミング時間アテンションを活用して,時空間特徴アグリゲーションのための視覚中心のトランスフォーマーベースのフレームワークであるViewFormerを紹介する。占有レベルのフロー表現をさらに探求するため,既存の高品質データセット上に構築されたベンチマークであるFlowOcc3Dを紹介した。このベンチマークの質的および定量的分析は、きめ細かいダイナミックなシーンを表現する可能性を明らかにする。大規模な実験により,本手法は従来手法よりも有意に優れていたことがわかった。コードは https://github.com/ViewFormerOcc/ViewFormer-Occ で公開されている。

3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to sensor deployment constraints. To address this issue, we propose our learning-first view attention mechanism for effective multi-view feature aggregation. Moreover, we showcase the scalability of our view attention across diverse multi-view 3D tasks, including map construction and 3D object detection. Leveraging the proposed view attention as well as an additional multi-frame streaming temporal attention, we introduce ViewFormer, a vision-centric transformer-based framework for spatiotemporal feature aggregation. To further explore occupancy-level flow representation, we present FlowOcc3D, a benchmark built on top of existing high-quality datasets. Qualitative and quantitative analyses on this benchmark reveal the potential to represent fine-grained dynamic scenes. Extensive experiments show that our approach significantly outperforms prior state-of-the-art methods. The code is available at https://github.com/ViewFormerOcc/ViewFormer-Occ.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# vAttention: PagedAttention のない LLM 実行のための動的メモリ管理

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention ( http://arxiv.org/abs/2405.04437v2 )

ライセンス: Link先を確認

Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar,

(参考訳) 高スループットLLM推論には,GPUメモリの効率的な管理が不可欠である。以前のシステムではKV-cacheのメモリを前もって予約していたため、内部の断片化が原因で容量が無駄になった。デマンドページングにインスパイアされたvLLMは、KV-cacheの動的メモリ割り当てを可能にするPagedAttentionを提案した。このアプローチは断片化を排除し、サービングスループットを改善する。しかし、物理メモリを動的に割り当てるために、PagedAttentionはKV-cacheのレイアウトを連続的な仮想メモリから連続しない仮想メモリに変更した。結果として、ページングをサポートするためにアテンションカーネルを書き換え、サービングフレームワークにメモリマネージャを実装する必要がある。これにより、パフォーマンスとプログラミングのオーバーヘッドと、最先端のアテンションカーネルを採用する際の移植性の問題の両方が生じる。本稿では,動的KV-cacheメモリ管理のための新しいアプローチであるvAttentionを提案する。 PagedAttentionとは対照的に、vAttentionはKV-cacheを連続した仮想メモリに格納し、物理メモリのオンデマンド割り当てにOSサポートを活用する。 vAttentionは、コードを書き換えることなく、物理メモリの動的アロケーションのサポートを追加することで、最先端のアテンションカーネルをすぐに使えるようにする。我々は、vLLMサービングスタックにvAttentionを実装し、vLLMに対してデコードスループットを最大1.99倍、FlashAttentionおよびFlashInferの最先端のPagedAttentionベースのカーネルを使用する場合と比べてエンドツーエンドのサービングスループットをそれぞれ最大1.22倍および1.29倍向上させることを示した。

Efficient management of GPU memory is essential for high-throughput LLM inference. Prior systems reserved KV-cache memory ahead of time, which resulted in wasted capacity due to internal fragmentation. Inspired by demand paging, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation and improves serving throughput. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. As a consequence, one needs to rewrite the attention kernels to support paging, and implement a memory manager in the serving framework. This results in both performance and programming overheads, as well as portability challenges in adopting state-of-the-art attention kernels. In this paper, we propose vAttention, a new approach for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention stores KV-cache in contiguous virtual memory and leverages OS support for on-demand allocation of physical memory. vAttention thus enables one to use state-of-the-art attention kernels out of the box by adding support for dynamic allocation of physical memory without having to rewrite their code. We implement vAttention in the vLLM serving stack to show that it also helps improve decode throughput by up to 1.99x over vLLM, and the end-to-end serving throughput by up to 1.22x and 1.29x, compared to using the state-of-the-art PagedAttention-based kernels of FlashAttention and FlashInfer.
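
The contrast drawn in this abstract can be illustrated with a small toy model. The sketch below is not vAttention's implementation: the class `ContiguousKVCache`, the constant `PAGE_TOKENS`, and the page pool are hypothetical names chosen for illustration. It only shows the idea of reserving a contiguous virtual range per request while attaching physical pages on demand, so a token's position is its index and no block table is consulted.

```python
from dataclasses import dataclass, field

PAGE_TOKENS = 16  # hypothetical: tokens of KV-cache that fit in one physical page


@dataclass
class ContiguousKVCache:
    """Toy model of one request's KV-cache: a contiguous virtual range
    whose physical pages are mapped only when tokens actually arrive."""
    max_tokens: int                                   # size of the virtual reservation
    num_tokens: int = 0                               # tokens written so far
    mapped_pages: list = field(default_factory=list)  # physical page ids, in virtual order

    def append(self, n, free_pages):
        """Grow the cache by n tokens, mapping physical pages on demand."""
        new_total = self.num_tokens + n
        if new_total > self.max_tokens:
            raise ValueError("virtual reservation exceeded")
        pages_needed = -(-new_total // PAGE_TOKENS)   # ceiling division
        while len(self.mapped_pages) < pages_needed:
            if not free_pages:
                raise MemoryError("out of physical pages")
            self.mapped_pages.append(free_pages.pop())
        self.num_tokens = new_total

    def virtual_offset(self, token_idx):
        """Token i always lives at virtual offset i: no block-table lookup."""
        if not 0 <= token_idx < self.num_tokens:
            raise IndexError(token_idx)
        return token_idx


free_pages = list(range(8))             # physical page pool shared by all requests
req = ContiguousKVCache(max_tokens=4096)
req.append(20, free_pages)              # 20 tokens need 2 pages
req.append(30, free_pages)              # 50 tokens need 4 pages
```

The virtual reservation (4096 tokens here) costs no physical memory in this model; only the four mapped pages do. In a real system the mapping step would be carried out by the OS or driver, which is outside the scope of this sketch.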

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# ProLLM:タンパク質とタンパク質の相互作用予測のためのLLMの強化

ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction ( http://arxiv.org/abs/2405.06649v2 )

ライセンス: Link先を確認

Mingyu Jin, Haochen Xue, Zhenting Wang, Boming Kang, Ruosong Ye, Kaixiong Zhou, Mengnan Du, Yongfeng Zhang,

(参考訳) タンパク質-タンパク質相互作用(PPI)の予測は、生物学的機能や疾患を理解する上で重要である。 PPI予測に対する従来の機械学習アプローチは、主に直接物理的相互作用に焦点を当てており、中間タンパク質による非物理的接続の広いコンテキストを無視し、その効果を制限している。大規模言語モデル(LLM)の出現は、この複雑な生物学的課題に対処する新たな機会を提供する。構造化されたデータを自然言語のプロンプトに変換することで、タンパク質間の関係をテキストにマッピングできる。このアプローチにより、LLMはタンパク質間の間接的な接続を識別し、上流から下流への経路をトレースすることができる。そこで本研究では,PPIに適したLLMを用いた新しいフレームワークProLLMを提案する。具体的には、自然言語のプロンプトとしてシグナル伝達経路の生物学的機構を複製する、Protein Chain of Thought (ProCoT)を提案する。 ProCoTはシグナル伝達経路を、上流タンパク質から始まり、いくつかの中間タンパク質を通過して下流タンパク質に生物学的シグナルを伝達するタンパク質推論過程とみなしている。したがって、上流タンパクと下流タンパクとの相互作用を予測するためにProCoTを使用することができる。 ProLLMのトレーニングには、複雑な生物学的問題に対するモデルの理解を深めるProCoTフォーマットが使用されている。本稿では,ProCoTに加えて,自然言語プロンプト中のタンパク質部位の埋め込み置換の探索や,タンパク質知識データセットでの命令微調整にも貢献する。本稿では,ベンチマークデータセットに対する厳密な検証による ProLLM の有効性を実証し,予測精度と一般化性の観点から既存手法よりも大幅に向上したことを示す。コードは https://github.com/MingyuJ666/ProLLM で入手できる。

The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases. Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions, ignoring the broader context of nonphysical connections through intermediate proteins, thus limiting their effectiveness. The emergence of Large Language Models (LLMs) provides a new opportunity for addressing this complex biological challenge. By transforming structured data into natural language prompts, we can map the relationships between proteins into texts. This approach allows LLMs to identify indirect connections between proteins, tracing the path from upstream to downstream. Therefore, we propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time. Specifically, we propose Protein Chain of Thought (ProCoT), which replicates the biological mechanism of signaling pathways as natural language prompts. ProCoT considers a signaling pathway as a protein reasoning process, which starts from upstream proteins and passes through several intermediate proteins to transmit biological signals to downstream proteins. Thus, we can use ProCoT to predict the interaction between upstream proteins and downstream proteins. The training of ProLLM employs the ProCoT format, which enhances the model's understanding of complex biological problems. In addition to ProCoT, this paper also contributes to the exploration of embedding replacement of protein sites in natural language prompts, and instruction fine-tuning in protein knowledge datasets. We demonstrate the efficacy of ProLLM through rigorous validation against benchmark datasets, showing significant improvement over existing methods in terms of prediction accuracy and generalizability. The code is available at: https://github.com/MingyuJ666/ProLLM.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# 静的AI評価を超えて: LLMの害とリスクに対する人間のインタラクション評価を前進させる

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks ( http://arxiv.org/abs/2405.10632v5 )

ライセンス: Link先を確認

Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung,

(参考訳) モデル評価は、AIシステムの安全性、リスク、社会的影響を理解する上で重要である。ほとんどの実世界のAIアプリケーションは人間とAIのインタラクションを含んでいるが、AIモデルの現在の評価(例えば、一般的なベンチマーク)はそうではない。その代わりに、人間的要因を限定的に組み込んで、モデルの安全性を個別に評価することで、人間とモデルの相互作用の複雑さを捉えることができない。本稿では,人-モデルインタラクションの評価や,モデルを用いた人-モデルインタラクションのプロセスと結果に焦点をあてた,新たな評価カテゴリ"ヒューマンインタラクション評価" (HIEs) の定義と運用について論じる。まず、HIEは安全性評価の妥当性を高め、直接人的影響と相互作用特異的害を評価し、モデルによる社会的影響の今後の評価を導くために使用できると論じる。第2に,安全性を重視したHIE設計フレームワーク(人-LLM相互作用分類を含む)について,(1)危険領域の同定,(2)使用状況の特徴付け,(3)評価パラメータの選択の3段階について提案する。第3に、過信と説得リスクの2つの潜在的評価に我々の枠組みを適用します。最後に,HIEのコスト,複製性,非表現性に関する懸念に対処するための具体的な勧告を述べる。

Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# グラフバックドア攻撃を再考する: 分散保存の観点から

Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective ( http://arxiv.org/abs/2405.10757v3 )

ライセンス: Link先を確認

Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang,

(参考訳) グラフニューラルネットワーク(GNN)は、様々なタスクにおいて顕著なパフォーマンスを示している。しかし、最近の研究によると、GNNはバックドア攻撃に弱い。一般的に、バックドア攻撃は、トレーニンググラフ内の一連のノードにバックドアトリガとターゲットクラスラベルをアタッチすることで、グラフを毒する。有毒グラフでトレーニングされたGNNは、ターゲットクラスにトリガが付いたテストノードを予測するために誤解される。その効果にもかかわらず、我々の経験的分析は、既存の方法によって生成されるトリガーは、クリーンデータと大きく異なる分布外(OOD)である傾向があることを示している。したがって、これらのインジェクショントリガーは、現実世界のアプリケーションで広く使われている外れ値検出法で容易に検出および切断することができる。そこで本稿では,IDトリガによる無意味なグラフバックドア攻撃の新たな問題について検討する。我々は,IDトリガを生成するために,OOD検出器を逆学習戦略と組み合わせて導入し,分散中のトリガの属性を生成する。 IDトリガによる高い攻撃成功率を確保するため,有毒グラフで訓練した被害者モデルによるトリガ記憶の促進を目的とした新しいモジュールを提案する。実世界のデータセットに対する大規模な実験は、高い攻撃成功率を維持しながら、様々な防衛戦略をバイパスできる分散トリガの生成において、提案手法の有効性を実証している。

Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, a backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with a trigger to the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), differing significantly from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within distribution. To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on the poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in-distribution triggers that can bypass various defense strategies while maintaining a high attack success rate.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# ベイズ学習によるクラスインクリメンタル学習のための原型コントラスト損失

Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning ( http://arxiv.org/abs/2405.11067v2 )

ライセンス: Link先を確認

Nisha L. Raichur, Lucas Heublein, Tobias Feigl, Alexander Rügamer, Christopher Mutschler, Felix Ott,

(参考訳) 連続学習における手法の主な目的は、破滅的な忘れ込みの有害な現象を軽減しつつ、データのストリームから連続的にタスクを学習することである。本稿では,従来のプロトタイプと新たに遭遇したプロトタイプの最適な表現を学習することに焦点を当てる。本稿では,クラス増分学習シナリオに特化して,ベイズ学習駆動型コントラスト損失(BLCL)を持つプロトタイプネットワークを提案する。そこで我々は,クラス間距離を小さくし,クラス間距離を増大させることにより,新しいクラスを潜在表現に組み込むコントラスト的損失を導入する。提案手法は,ベイズ学習手法を用いて,クロスエントロピーとコントラスト損失関数のバランスを動的に適用する。 CIFAR-10 と CIFAR-100 による画像分類と干渉分類のための GNSS ベースデータセットの画像化による実験的な評価により,提案手法の有効性を検証し,既存の最先端手法よりも優れていることを示す。

The primary objective of methods in continual learning is to learn tasks in a sequential manner over time from a stream of data, while mitigating the detrimental phenomenon of catastrophic forgetting. In this paper, we focus on learning an optimal representation between previous class prototypes and newly encountered ones. We propose a prototypical network with a Bayesian learning-driven contrastive loss (BLCL) tailored specifically for class-incremental learning scenarios. To this end, we introduce a contrastive loss that incorporates new classes into the latent representation by reducing the intra-class distance and increasing the inter-class distance. Our approach dynamically adapts the balance between the cross-entropy and contrastive loss functions with a Bayesian learning technique. Empirical evaluations conducted on both the CIFAR-10 and CIFAR-100 datasets for image classification and on images of a GNSS-based dataset for interference classification validate the efficacy of our method, showcasing its superiority over existing state-of-the-art approaches.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# 実数と浮動小数点数を用いたリカレントグラフニューラルネットワークの論理的特徴付け

Logical Characterizations of Recurrent Graph Neural Networks with Reals and Floats ( http://arxiv.org/abs/2405.14606v2 )

ライセンス: Link先を確認

Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Carsten Lutz,

(参考訳) 2019年の先駆的な研究の中で、Barceló氏と共著者は、一階述語論理で定義可能な特性に対して、定数反復深度グラフニューラルネットワーク(GNN)の表現力に正確に一致するロジックを特定した。本稿では,(1)浮動小数点数の設定と(2)実数の設定の2つのシナリオにおいて,リカレントGNNの正確な論理的特徴付けを与える。フロートに対して、リカレントGNNと一致する形式主義は計数付きの規則ベースのモーダル論理であり、実数に対しては同じく計数付きの適切な無限項モーダル論理を用いる。これらの結果は、どちらの場合もバックグラウンド論理に相対化することなく、リカレント設定における論理とGNNの正確な一致を与えるが、浮動小数点演算に関する自然な仮定を用いる。キャラクタリゼーションを適用することで、モナディック二階述語論理(MSO)で定義可能なグラフ特性に対して、無限項論理と規則ベース論理は等しく表現力があることも証明できる。これは、実数とフロートを持つリカレントGNNが、MSO定義可能な性質に対して同じ表現力を持つことを意味し、そのような性質に対して、実数を持つリカレントGNNも(有限的な!)規則ベースのモーダル論理によって特徴づけられることを示している。これに対し、一般の場合には、フロートによる表現力は実数よりも弱い。論理指向の結果に加えて、分散オートマトンを用いて、実数とフロートの両方を持つリカレントGNNを特徴付け、分散コンピューティングモデルとの関連を示す。

In pioneering work from 2019, Barceló and coauthors identified logics that precisely match the expressive power of constant iteration-depth graph neural networks (GNNs) relative to properties definable in first-order logic. In this article, we give exact logical characterizations of recurrent GNNs in two scenarios: (1) in the setting with floating-point numbers and (2) with reals. For floats, the formalism matching recurrent GNNs is a rule-based modal logic with counting, while for reals we use a suitable infinitary modal logic, also with counting. These results give exact matches between logics and GNNs in the recurrent setting without relativising to a background logic in either case, but using some natural assumptions about floating-point arithmetic. Applying our characterizations, we also prove that, relative to graph properties definable in monadic second-order logic (MSO), our infinitary and rule-based logics are equally expressive. This implies that recurrent GNNs with reals and floats have the same expressive power over MSO-definable properties and shows that, for such properties, also recurrent GNNs with reals are characterized by a (finitary!) rule-based modal logic. In the general case, in contrast, the expressive power with floats is weaker than with reals. In addition to logic-oriented results, we also characterize recurrent GNNs, with both reals and floats, via distributed automata, drawing links to distributed computing models.

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# Alistair: 差分プライベートな広告測定システムのためのデバイス上での効率的な予算管理

Alistair: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems ( http://arxiv.org/abs/2405.16719v2 )

ライセンス: Link先を確認

Pierre Tholoniat, Kelly Kostopoulou, Peter McNeely, Prabhpreet Singh Sodhi, Anirudh Varanasi, Benjamin Case, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer,

(参考訳) 主要なブラウザからのサードパーティ製クッキーの削除や、新しいプライバシー保護広告APIの導入によって、調査コミュニティは、Webのプライバシーを質的に改善する業界を支援する機会を、タイムリーに持っている。本稿では、既存のプライバシー保護広告計測APIを強化するため、W3Cコミュニティグループ内での取り組みについて論じる。 Google、Apple、Meta、Mozillaのデザインを分析し、より厳格で効率的な差分プライバシー(DP)予算コンポーネントでそれらを強化します。当社のアプローチはAlistairと呼ばれ、明確に定義されたDP保証を強制し、広告主がより正確なプライベートな測定クエリを実行できるようにする。 DPの個々の形態でプライバシー保証をフレーミングすることで、従来のDP定義を使用するシステムよりもDP予算を効率的にすることができる。 AlistairをChromeに組み込んで、マイクロベンチマークや広告データセットで評価します。すべてのワークロードにおいて、Alistairは、同等のDP保護の下でより多くの広告測定を可能にする点で、ベースラインを著しく上回る。

With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.
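
As a rough illustration of what on-device DP budgeting can look like, the sketch below keeps one epsilon budget per device and deducts from it under basic (additive) composition. It is not Alistair's algorithm: `DeviceBudget`, `answer_report`, the capacity value, and in particular the rule that a device with no relevant data is not charged are assumptions made here to convey the flavor of per-device accounting.

```python
class DeviceBudget:
    """Tracks the epsilon consumed on one device under basic (additive) composition."""

    def __init__(self, capacity):
        self.capacity = capacity   # total epsilon this device may spend (hypothetical value)
        self.spent = 0.0

    def try_consume(self, epsilon):
        """Deduct epsilon if it fits; otherwise refuse and leave the budget unchanged."""
        if epsilon < 0:
            raise ValueError("epsilon must be non-negative")
        if self.spent + epsilon > self.capacity:
            return False
        self.spent += epsilon
        return True


def answer_report(budget, has_relevant_data, epsilon):
    """Return True if the device may contribute a real report to this query.

    Assumption for illustration only: a device holding no data relevant to
    the query sends a null report and is not charged any budget.
    """
    if not has_relevant_data:
        return False
    return budget.try_consume(epsilon)


device = DeviceBudget(capacity=1.0)
first = answer_report(device, has_relevant_data=True, epsilon=0.5)    # charged 0.5
skipped = answer_report(device, has_relevant_data=False, epsilon=0.5) # not charged
second = answer_report(device, has_relevant_data=True, epsilon=0.5)   # charged 0.5
refused = answer_report(device, has_relevant_data=True, epsilon=0.5)  # budget exhausted
```

Under a single global budget, every query would be charged regardless of which devices hold relevant data. Charging each device only for what it contributes is one way per-device accounting could stretch the same nominal budget further, under the assumption stated above.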

翻訳日:2024-07-16 04:37:57 公開日:2024-07-12

# Mini-Netによる医用画像分割の促進:医用画像の効率的な分別を目的とした軽量化

Advancing Medical Image Segmentation with Mini-Net: A Lightweight Solution Tailored for Efficient Segmentation of Medical Images ( http://arxiv.org/abs/2405.17520v3 )

ライセンス: Link先を確認

Syed Javed, Tariq M. Khan, Abdul Qayyum, Arcot Sowmya, Imran Razzak,

(参考訳) 医用画像における解剖学的構造と異常の正確なセグメンテーションは,コンピュータによる診断・解析に不可欠である。このタスクではディープラーニングの技術が優れていますが、その計算要求は課題を引き起こします。また, 一般的な物体分割には有効であるが, 医用画像には最適でない部分分割法もある。これらの課題に対処するために,医用画像に特化して設計された軽量セグメンテーションネットワークであるMini-Netを提案する。パラメータが38,000未満のMini-Netは、高周波数と低周波数の両方の機能を効率的にキャプチャし、様々な医療画像シナリオにおけるリアルタイムのアプリケーションを可能にする。 DRIVE, STARE, ISIC-2016, ISIC-2018, MoNuSegなどの各種データセット上でMini-Netを評価し, 最先端手法と比較して, その堅牢性と優れた性能を示す。

Accurate segmentation of anatomical structures and abnormalities in medical images is crucial for computer-aided diagnosis and analysis. While deep learning techniques excel at this task, their computational demands pose challenges. Additionally, some cutting-edge segmentation methods, though effective for general object segmentation, may not be optimised for medical images. To address these issues, we propose Mini-Net, a lightweight segmentation network specifically designed for medical images. With fewer than 38,000 parameters, Mini-Net efficiently captures both high- and low-frequency features, enabling real-time applications in various medical imaging scenarios. We evaluate Mini-Net on various datasets, including DRIVE, STARE, ISIC-2016, ISIC-2018, and MoNuSeg, demonstrating its robustness and good performance compared to state-of-the-art methods.

翻訳日:2024-07-16 04:27:57 公開日:2024-07-12

# It Takes Two: ブロックチェーンにおける検証者のジレンマに対するピア予測による解決策

It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma ( http://arxiv.org/abs/2406.01794v2 )

ライセンス: Link先を確認

Zishuo Zhao, Xi Chen, Yuan Zhou,

(参考訳) ブロックチェーンシステムのセキュリティは、基本的には、大多数の当事者が誠実に振る舞う分散コンセンサスに基づいており、ブロックチェーンシステムの堅牢性を維持するためには、コンテンツ検証のプロセスが不可欠である。しかし、不正行為者が少ない、あるいは全くないセキュアなブロックチェーンシステムが、検証者が正直に検証を行うのに十分なインセンティブを与えられないという現象は、検証者のジレンマと呼ばれ、ブロックチェーンシステムの基本的なセキュリティを著しく損なう可能性がある。既存の研究は遅延検証の非インセンティブ化のために意図的にエラーを挿入しようと試みているが、分散環境は検証の正しさを判断したり、悪意のある検証を直接検出することは不可能である。本稿では,複数の検証者間での分散検証ゲームのためのベイズ的真理機構の設計に対するピア予測手法を活用する研究を開始し,検証プロセスにおけるノイズ観測の存在下においても,基礎的真理にアクセスせずに誠実な検証を行うよう,検証者全員にインセンティブを与える。理論的に検証ゲームのメカニズムの真実性を保証することで、当社の作業は、ブロックチェーンやその他の分散システムのセキュリティと堅牢性を向上する検証メカニズムのフレームワークを提供します。

The security of blockchain systems is fundamentally based on the decentralized consensus in which the majority of parties behave honestly, and the process of content verification is essential to keep the robustness of blockchain systems. However, the phenomenon that a secure blockchain system with few or no cheaters could not provide sufficient incentive for verifiers to honestly perform the costly verification, referred to as the Verifier's Dilemma, could severely undermine the fundamental security of blockchain systems. While existing works have attempted to insert deliberate errors to disincentivize lazy verification, the decentralized environment makes it impossible to judge the correctness of verification or detect malicious verifiers directly. In this paper, we initiate the research that leverages the peer prediction approach towards the design of Bayesian truthful mechanisms for the decentralized verification game among multiple verifiers, incentivizing all verifiers to perform honest verification without access to the ground truth even in the presence of noisy observations in the verification process. With theoretically guaranteed truthfulness of our mechanism for the verification game, our work provides a framework of verification mechanisms that enhances the security and robustness of the blockchain and potentially other decentralized systems.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# Pseudo-Label Filtering for Continual Test-Time Adaptation

Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation ( http://arxiv.org/abs/2406.02609v2 )

ライセンス: Link先を確認

Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang,

(参考訳) 連続的テスト時間適応(CTTA)は、ソースデータにアクセスすることなく、テストフェーズ中に対象ドメインのシーケンスに事前訓練されたモデルを適用することを目的としている。未知のドメインからのラベルのないデータに適応するために、既存のメソッドは、すべてのサンプルに対して擬似ラベルを構築し、自己学習を通じてモデルを更新する。しかし、これらの擬似ラベルは、しばしばノイズを伴い、適応が不十分になる。 Pseudo Labeling Filter (PLF) と呼ばれるCTTAの擬似ラベル選択法を提案する。 PLFの鍵となる考え方は、擬似ラベルの適切なしきい値を選択し続け、自己学習のための信頼できるしきい値を特定することである。具体的には、初期化、成長、多様性を含む、継続的なドメイン学習の間にしきい値を設定するための3つの原則を提示します。これらの原則に基づいて、擬似ラベルをフィルタするために自己適応型閾値を設計する。さらに、未知のドメインサンプルに対して多様な予測を行うようモデルに促すために、クラス優先アライメント(CPA)手法を導入する。広範な実験を通じて、PLFは現在の最先端の手法よりも優れており、CTTAにおいてその効果が証明されている。

Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.
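
The basic operation behind pseudo-label selection can be sketched in a few lines. The example below uses a single fixed confidence threshold, which is a simplification assumed here for illustration; the Self-Adaptive Thresholding and Class Prior Alignment described in the abstract are not reproduced, and `filter_pseudo_labels` is a hypothetical helper name.

```python
def filter_pseudo_labels(probs, threshold):
    """Keep (sample_index, label) pairs whose top class probability reaches threshold.

    probs: list of per-sample class-probability lists (e.g. softmax outputs).
    """
    kept = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if p[label] >= threshold:
            kept.append((i, label))
    return kept


probs = [
    [0.90, 0.05, 0.05],   # confident: kept as class 0
    [0.40, 0.35, 0.25],   # ambiguous: dropped
    [0.10, 0.15, 0.75],   # confident: kept as class 2
]
selected = filter_pseudo_labels(probs, threshold=0.7)
```

Only the samples in `selected` would be used for the self-training update; the ambiguous sample is left out rather than trained on with a possibly wrong label.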

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# 大規模都市道路網における交通信号の最適化 -Isingモデルを用いた適応予測制御-

Traffic signal optimization in large-scale urban road networks: an adaptive-predictive controller using Ising models ( http://arxiv.org/abs/2406.03690v2 )

ライセンス: Link先を確認

Daisuke Inoue, Hiroshi Yamashita, Kazuyuki Aihara, Hiroaki Yoshida,

(参考訳) カーボン中立性を達成するためには,スムーズな交通流の実現が重要である。交通条件を考慮した適応的な交通信号制御が注目されている。しかし, 計算負荷が大きいため, 既存の制御手法を用いることで, 大都市全体での車両の最適走行を確保することは困難である。本稿では,AMPIC(Adaptive Model Predictive Ising Controller)と呼ばれる,スケーラビリティと最適性の両方を保証する制御手法を提案する。提案手法では,車両流の予測モデルを明確に考慮し,各制御区間における最適制御問題の解法としてモデル予測制御を用いる。この最適制御問題は、いわゆるイジング問題と同等のバイナリ変数を持つ組合せ最適化問題に変換される。この変換により、広く研究され、高速かつ効率的な最適化性能が期待されているIsingソルバが利用可能となる。現実的な都市道路網のための微視的交通シミュレータを用いて数値実験を行った。その結果、AMPICは従来の制御方式よりも待ち時間が少なく、より高速な走行が可能であり、結果としてCO2排出量は減少することがわかった。長い予測地平線を持つモデル予測手法は、制御性能を効果的に向上させる。モデル都市におけるシステムパラメトリック研究は,提案手法が大都市道路網のスムーズな交通流を実現することを示唆している。イジング解法のうち、D-Waveの量子アニールは、妥当な計算コストで最適に近い解を見つけることが示されている。

Realizing smooth traffic flow is important for achieving carbon neutrality. Adaptive traffic signal control, which considers traffic conditions, has thus attracted attention. However, it is difficult to ensure optimal vehicle flow throughout a large city using existing control methods because of their heavy computational load. Here, we propose a control method called AMPIC (Adaptive Model Predictive Ising Controller) that guarantees both scalability and optimality. The proposed method employs model predictive control to solve an optimal control problem at each control interval with explicit consideration of a predictive model of vehicle flow. This optimal control problem is transformed into a combinatorial optimization problem with binary variables that is equivalent to the so-called Ising problem. This transformation allows us to use an Ising solver, which has been widely studied and is expected to have fast and efficient optimization performance. We performed numerical experiments using a microscopic traffic simulator for a realistic city road network. The results show that AMPIC enables faster vehicle cruising speed with less waiting time than that achieved by classical control methods, resulting in lower CO2 emissions. The model predictive approach with a long prediction horizon thus effectively improves control performance. Systematic parametric studies on model cities indicate that the proposed method realizes smoother traffic flows for large city road networks. Among Ising solvers, D-Wave's quantum annealing is shown to find near-optimal solutions at a reasonable computational cost.
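
The equivalence between a binary optimization problem and an Ising problem mentioned above rests on the substitution $x_i = (1 + s_i)/2$ with $s_i \in \{-1, +1\}$. The sketch below applies it to a generic quadratic binary objective and checks by brute force that both forms give the same energy. The matrix `Q` is an arbitrary toy example, not the traffic model of the paper, and the function names are hypothetical.

```python
from itertools import product


def qubo_energy(Q, x):
    """E(x) = sum_ij Q[i][j] * x_i * x_j for binary x_i in {0, 1}."""
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def qubo_to_ising(Q):
    """Rewrite E(x) as offset + sum_i h_i s_i + sum_{i<j} J_ij s_i s_j via x = (1 + s) / 2."""
    n = len(Q)
    h = [0.0] * n
    J = {}
    offset = 0.0
    for i in range(n):
        h[i] += Q[i][i] / 2            # x_i^2 = x_i = (1 + s_i) / 2
        offset += Q[i][i] / 2
        for j in range(n):
            if i == j:
                continue
            h[i] += (Q[i][j] + Q[j][i]) / 4
            offset += Q[i][j] / 4
            if i < j:
                J[(i, j)] = (Q[i][j] + Q[j][i]) / 4
    return h, J, offset


def ising_energy(h, J, offset, s):
    """Energy of spin configuration s with s_i in {-1, +1}."""
    energy = offset + sum(hi * si for hi, si in zip(h, s))
    energy += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return energy


# Toy objective: reward switching variables on, penalize adjacent pairs both on.
Q = [[-1.0, 2.0, 0.0],
     [0.0, -1.0, 2.0],
     [0.0, 0.0, -1.0]]
h, J, offset = qubo_to_ising(Q)

# Brute-force check that both formulations agree on every configuration.
for x in product((0, 1), repeat=3):
    s = tuple(2 * xi - 1 for xi in x)
    assert abs(qubo_energy(Q, x) - ising_energy(h, J, offset, s)) < 1e-9

best = min(product((0, 1), repeat=3), key=lambda x: qubo_energy(Q, x))
```

Once the problem is in the `(h, J)` form it can be handed to any Ising solver; the brute-force minimum here is only feasible because the example has three variables.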

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# 安定度エントロピーの普遍飽和による量子複雑性の探索

Probing quantum complexity via universal saturation of stabilizer entropies ( http://arxiv.org/abs/2406.04190v2 )

ライセンス: Link先を確認

Tobias Haug, Leandro Aolita, M. S. Kim,

(参考訳) 非安定化器性 (nonstabilizerness)、いわゆる「マジック」は量子コンピューティングの鍵となる資源であり、量子優位性に必要な条件である。非クリフォード演算は安定化器状態を資源状態に変え、その非安定化器性の量は安定化器レニーエントロピー (stabilizer Rényi entropies, SREs) のような資源測度によって定量化される。ここでは,SREが臨界数の非クリフォード演算で最大値に飽和することを示す。臨界点の近傍でSREは普遍的な振舞いを示す。顕著なことに、SREの微分は、キュービット数とは無関係に同じ点で交差し、単一の曲線に再スケールすることができる。臨界点はレニー指数 $\alpha$ に非自明に依存することが分かる。 Tゲートをドープしたランダムなクリフォード回路の場合、臨界Tゲート密度は$\alpha$とは独立にスケールする。対照的に、ランダムなハミルトン進化の場合、臨界時間は $\alpha>1$ ではキュービット数に対して線形にスケールするが、$\alpha<1$ では定数である。このことは、$\alpha$-SREが $\alpha$ に応じて非安定化器性の根本的に異なる側面を明らかにすることを示している。すなわち、$\alpha<1$ の $\alpha$-SREはクリフォードシミュレーションの複雑さに関連し、$\alpha>1$ の場合は最も近い安定化器状態までの距離と、パウリ測定による近似的な状態認証のコストを測る。技術的貢献として、ランダム進化のパウリスペクトルが2つの高度に集中したピークによって近似でき、これによりSREを計算できることを見出した。さらに、ランダムなクリフォード回路と回転として表現できるランダムな進化のクラスを導入し、その正確なSREを与える。我々の結果は、量子システムの複雑性を特徴付ける新しいアプローチを切り開く。

Nonstabilizerness, or 'magic', is a key resource for quantum computing and a necessary condition for quantum advantage. Non-Clifford operations turn stabilizer states into resourceful states, where the amount of nonstabilizerness is quantified by resource measures such as stabilizer Rényi entropies (SREs). Here, we show that SREs saturate their maximum value at a critical number of non-Clifford operations. Close to the critical point, SREs show universal behavior. Remarkably, the derivative of the SRE crosses at the same point independent of the number of qubits and can be rescaled onto a single curve. We find that the critical point depends non-trivially on the Rényi index $\alpha$. For random Clifford circuits doped with T-gates, the critical T-gate density scales independently of $\alpha$. In contrast, for random Hamiltonian evolution, the critical time scales linearly with qubit number for $\alpha>1$, while it is constant for $\alpha<1$. This highlights that $\alpha$-SREs reveal fundamentally different aspects of nonstabilizerness depending on $\alpha$: $\alpha$-SREs with $\alpha<1$ relate to Clifford simulation complexity, while those with $\alpha>1$ probe the distance to the closest stabilizer state and the cost of approximate state certification via Pauli measurements. As technical contributions, we observe that the Pauli spectrum of random evolution can be approximated by two highly concentrated peaks, which allows us to compute its SRE. Further, we introduce a class of random evolution that can be expressed as random Clifford circuits and rotations, for which we provide the exact SRE. Our results open up new approaches to characterize the complexity of quantum systems.
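For readers unfamiliar with the quantity, a minimal single-qubit sketch of the stabilizer Rényi entropy may help. It assumes the commonly used definition $M_\alpha(|\psi\rangle) = (1-\alpha)^{-1} \ln\big(2^{-n} \sum_{P} \langle\psi|P|\psi\rangle^{2\alpha}\big)$, where the sum runs over all $n$-qubit Pauli strings; the abstract itself does not restate the definition, and the function names below are hypothetical. The code is restricted to $n = 1$ and says nothing about the critical behavior studied in the paper.

```python
import cmath
import math


def pauli_expectations(state):
    """Expectation values of I, X, Y, Z for a normalized one-qubit state (a, b)."""
    a, b = state
    c = a.conjugate() * b
    return [
        abs(a) ** 2 + abs(b) ** 2,  # <I>
        2 * c.real,                 # <X>
        2 * c.imag,                 # <Y>
        abs(a) ** 2 - abs(b) ** 2,  # <Z>
    ]


def stabilizer_renyi_entropy(state, alpha):
    """Single-qubit alpha-SRE under the definition assumed in the lead-in."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("this sketch handles alpha > 0 with alpha != 1 only")
    n = 1  # number of qubits
    total = sum(abs(e) ** (2 * alpha) for e in pauli_expectations(state))
    return math.log(total / 2 ** n) / (1 - alpha)


# |0> is a stabilizer state, so its SRE vanishes.
zero_state = (1.0, 0.0)
# T state: (|0> + e^{i*pi/4}|1>) / sqrt(2), the result of a T gate applied to |+>.
t_state = (1 / math.sqrt(2), cmath.exp(1j * math.pi / 4) / math.sqrt(2))

assert abs(stabilizer_renyi_entropy(zero_state, 2)) < 1e-12
assert abs(stabilizer_renyi_entropy(t_state, 2) - math.log(4 / 3)) < 1e-12
```

The T state gives $M_2 = \ln(4/3)$, a small illustration of how one non-Clifford gate moves a stabilizer state away from zero SRE.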

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# サーキットブレーカによるアライメントとロバスト性の改善

Improving Alignment and Robustness with Circuit Breakers ( http://arxiv.org/abs/2406.04313v4 )

ライセンス: Link先を確認

Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks,

(参考訳) AIシステムは有害な行動をとることができ、敵の攻撃に対して非常に脆弱である。本稿では,近年の表現工学の進歩に触発されて,有害な出力を「回路ブレーカー」で処理することでモデルを中断するアプローチを提案する。拒否訓練などのアライメント改善を目的とした既存の技術は、しばしばバイパスされる。敵の訓練のような技術は、特定の攻撃に対抗して穴を塞ごうとする。拒絶訓練や敵対訓練の代替として、サーキットブレーキングは、そもそも有害なアウトプットの原因となる表現を直接制御する。我々の手法はテキストのみの言語モデルとマルチモーダル言語モデルの両方に適用でき、強力な目に見えない攻撃があっても、ユーティリティを犠牲にすることなく有害なアウトプットの発生を防げます。特に、スタンドアロン画像認識における敵対的堅牢性は未解決の課題であるが、回路ブレーカーは、有害なコンテンツを生み出すことを目的とした画像「ヒジャック」に対して、より大きなマルチモーダルシステムを確実に耐えられるようにしている。最後に、我々のアプローチをAIエージェントに拡張し、攻撃されているときの有害な行動の率を大幅に低下させることを示す。当社のアプローチは、有害な行動や敵の攻撃に対する信頼性の高い安全対策の開発において、大きな前進を示している。

AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, circuit-breaking directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# 嫦娥5号 (Chang'e 5) 画像からのカメラポーズにロバストなクレーター検出

Camera-Pose Robust Crater Detection from Chang'e 5 ( http://arxiv.org/abs/2406.04569v2 )

ライセンス: Link先を確認

Matthew Rodda, Sofia McLeod, Ky Cuong Pham, Tat-Jun Chin,

(参考訳) 宇宙ミッションはますます危険な地形の探索を目指しており、安全な航法を確保するには正確かつタイムリーな位置推定が必要である。視覚に基づくナビゲーションは、搭載カメラの画像に写る衝突クレーターを既知のデータベースと対応付けて機体の姿勢を推定することで、この目標を達成する。しかし、既存の文献では、オフナディア（直下から外れた）視角を含む画像に対するクレーター検出アルゴリズム(CDA)の性能が十分に評価されていない。本研究では, クレーター検出のためのMask R-CNNの性能を評価し, オフナディア視角を含む模擬データで事前学習したモデルと、実際の月面画像で事前学習したモデルを比較する。実際の月面画像による事前学習は, オフナディア視角を含む画像を欠いているにもかかわらず優れており, 検出性能でF1スコア63.1、楕円回帰性能でIoU (intersection over union) 0.701を達成することを示す。本研究は,オフナディア視角を含む画像上でのCDAの性能を定量的に解析した最初のものである。よりロバストなCDAの開発に向けて、Chang'e 5 Landing Cameraによるオフナディア視角を含む最初の注釈付きCDAデータセットも提供する。

As space missions aim to explore increasingly hazardous terrain, accurate and timely position estimates are required to ensure safe navigation. Vision-based navigation achieves this goal by correlating impact craters visible in onboard imagery with a known database to estimate a craft's pose. However, existing literature has not sufficiently evaluated crater-detection algorithm (CDA) performance on imagery containing off-nadir view angles. In this work, we evaluate the performance of Mask R-CNN for crater detection, comparing models pretrained on simulated data containing off-nadir view angles with models pretrained on real lunar images. We demonstrate that pretraining on real lunar images is superior despite the lack of images containing off-nadir view angles, achieving a detection performance of 63.1 F1-score and an ellipse-regression performance of 0.701 intersection over union. This work provides the first quantitative analysis of the performance of CDAs on images containing off-nadir view angles. Towards the development of increasingly robust CDAs, we additionally provide the first annotated CDA dataset with off-nadir view angles from the Chang'e 5 Landing Camera.
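The two metrics quoted above can be sketched as follows. This is a generic illustration, not the paper's evaluation code: the counts and pixel sets are hypothetical, and the IoU here is computed on rasterized pixel sets rather than on fitted ellipses. The abstract appears to report F1 on a 0 to 100 scale, whereas the function below returns a value between 0 and 1.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mask_iou(region_a, region_b):
    """Intersection over union of two regions given as sets of (row, col) pixels."""
    union = region_a | region_b
    return len(region_a & region_b) / len(union) if union else 0.0


# Hypothetical detection counts: 8 matched craters, 2 false alarms, 4 misses.
print(round(f1_score(tp=8, fp=2, fn=4), 3))  # 16 / 22 -> 0.727

# Hypothetical predicted and ground-truth crater regions (2x2 blocks offset by one column).
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
ground_truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(round(mask_iou(predicted, ground_truth), 3))  # 2 / 6 -> 0.333
```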

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# VS-PINN: 硬い挙動を持つPDEを解くための可変スケーリング手法を用いた物理インフォームドニューラルネットワークの高速かつ効率的なトレーニング

VS-PINN: A fast and efficient training of physics-informed neural networks using variable-scaling methods for solving PDEs with stiff behavior ( http://arxiv.org/abs/2406.06287v2 )

ライセンス: Link先を確認

Seungchan Ko, Sang Hyeon Park,

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、ディープニューラルネットワークを用いて偏微分方程式(PDE)の解を計算するための有望な方法として最近登場した。しかし、様々な分野で大きな成功を収めたにもかかわらず、PDEの解が硬い挙動や高い周波数を示す場合、PINNを効果的に訓練する方法は、多くの点で不明である。本稿では,変数スケーリング技術を用いたPINNのトレーニング手法を提案する。この方法は単純であり、急速に変化する解を持つPDEを含む幅広い問題に適用できる。様々な数値実験を通じて,提案手法の有効性を実証し,PINNのトレーニング効率と性能を大幅に向上させることができることを確認した。さらに,ニューラル・タンジェント・カーネル (NTK) の解析に基づき,この現象の理論的証拠を提供し,本手法がPINNの性能を向上させることを示す。

Physics-informed neural networks (PINNs) have recently emerged as a promising way to compute the solutions of partial differential equations (PDEs) using deep neural networks. However, despite their significant success in various fields, it remains unclear in many respects how to effectively train PINNs if the solutions of PDEs exhibit stiff behaviors or high frequencies. In this paper, we propose a new method for training PINNs using variable-scaling techniques. This method is simple and can be applied to a wide range of problems, including PDEs with rapidly varying solutions. Through various numerical experiments, we demonstrate the effectiveness of the proposed method for these problems and confirm that it can significantly improve the training efficiency and performance of PINNs. Furthermore, based on the analysis of the neural tangent kernel (NTK), we provide theoretical evidence for this phenomenon and show that our method can indeed improve the performance of PINNs.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# 画像拡散モデルを用いたインスタント3次元アバター生成

Instant 3D Human Avatar Generation using Image Diffusion Models ( http://arxiv.org/abs/2406.07516v2 )

ライセンス: Link先を確認

Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu,

(参考訳) AvatarPopUpは、画像やテキストプロンプトなどの異なる入力モダリティから高速かつ高品質な3D人間アバターを生成する手法であり、生成されるポーズや形状を制御できる。共通するテーマは、各タスクに特化した拡散ベースの画像生成ネットワークを使用し、その後に3Dリフティングネットワークを適用することである。数十億のテキストと画像のペアで訓練された強力な画像合成の事前知識を活用できるように、生成を3Dモデリングから意図的に分離する。画像生成と背面ビュー予測のために追加の画像条件付けを施した潜在拡散ネットワークを微調整し、定性的に異なる複数の3D仮説をサポートする。我々の部分的な微調整アプローチは、破滅的忘却を引き起こすことなく、各タスクにネットワークを適応させることができる。実験では,本手法がマルチモーダルなテキスト、画像、身体制御信号に従った、多様な外観を持つ高精度で高品質な3Dアバターを生成することを実証する。本手法は最短2秒で3Dモデルを生成でき、既存手法の大多数に対して4桁の高速化となる。しかも既存手法のほとんどは我々のタスクの一部しか解決できず、制御も少ない。 AvatarPopUpは、人間のアバターの制御された3D生成を大規模に必要とするアプリケーションを可能にする。プロジェクトのWebサイトはhttps://www.nikoskolot.com/avatarpopup/にある。

We present AvatarPopUp, a method for fast, high-quality 3D human avatar generation from different input modalities, such as images and text prompts, with control over the generated pose and shape. The common theme is the use of diffusion-based image generation networks that are specialized for each particular task, followed by a 3D lifting network. We purposefully decouple the generation from the 3D modeling, which allows us to leverage powerful image synthesis priors trained on billions of text-image pairs. We fine-tune latent diffusion networks with additional image conditioning for image generation and back-view prediction, and to support qualitatively different multiple 3D hypotheses. Our partial fine-tuning approach allows the networks to be adapted for each task without inducing catastrophic forgetting. In our experiments, we demonstrate that our method produces accurate, high-quality 3D avatars with diverse appearance that respect the multimodal text, image, and body control signals. Our approach can produce a 3D model in as few as 2 seconds, a four-orders-of-magnitude speedup with respect to the vast majority of existing methods, most of which solve only a subset of our tasks and offer fewer controls. AvatarPopUp enables applications that require the controlled 3D generation of human avatars at scale. The project website can be found at https://www.nikoskolot.com/avatarpopup/.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# 階層的な対戦相手モデリングとプランニングによる混合動機環境での効率的な適応

Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning ( http://arxiv.org/abs/2406.08002v2 )

ライセンス: Link先を確認

Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng,

(参考訳) 近年のマルチエージェント強化学習(MARL)アルゴリズムの成功にもかかわらず、混合動機環境における共プレーヤへの効率的な適応は依然として大きな課題である。一つの実現可能なアプローチは、共プレーヤの特性を推測し、その振る舞いを階層的にモデル化することである。しかし、これらの手法は、効率的な推論や推論した情報の活用においてしばしば困難に直面する。これらの問題に対処するために,混合動機環境において未知のポリシーへの少数ショット適応を可能にする,新しいマルチエージェント意思決定アルゴリズムである階層的対戦相手モデリング・プランニング(HOP)を提案する。 HOPは階層的に2つのモジュールから構成される。相手の目標を推論し、対応する目標条件付きポリシーを学習する対戦相手モデリングモジュールと、モンテカルロ木探索(MCTS)を用いて最良応答を特定するプランニングモジュールである。提案手法は,他者の目標に対する信念をエピソード間およびエピソード内の両方で更新し,対戦相手モデリングモジュールからの情報を用いてプランニングを導くことにより効率を向上する。実験の結果, 混合動機環境において, HOPは様々な未知のエージェントと相互作用する際に優れた少数ショット適応能力を示し, 自己対戦のシナリオでも優れていた。さらに、実験中に見られた社会的知能の創発は、複雑なマルチエージェント環境における我々のアプローチの可能性を強調している。

Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players' behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred information. To address these issues, we propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm that enables few-shot adaptation to unseen policies in mixed-motive environments. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that employs Monte Carlo Tree Search (MCTS) to identify the best response. Our approach improves efficiency by updating beliefs about others' goals both across and within episodes and by using information from the opponent modeling module to guide planning. Experimental results demonstrate that in mixed-motive environments, HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios. Furthermore, the emergence of social intelligence during our experiments underscores the potential of our approach in complex multi-agent environments.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# OmniCorpus:100億レベル画像にテキストを埋め込んだ統合マルチモーダルコーパス

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ( http://arxiv.org/abs/2406.08418v3 )

ライセンス: Link先を確認

Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai,

(参考訳) 自然な文書形式で配置された複数の画像とテキストからなる画像テキストインターリーブデータは、インターネットデータの提示形式と整合し、人間の読書習慣によく似ている。近年の研究では、このようなデータがマルチモーダルなインコンテキスト学習に役立ち、マルチモーダル微調整時に大規模言語モデルの能力を維持することが示されている。しかし、現在の画像テキストインターリーブデータは規模と多様性が限られており、マルチモーダル大規模言語モデルの開発を制限している。本稿では,100億規模の画像テキストインターリーブデータセットであるOmniCorpusを紹介する。効率的なデータエンジンを用いて、86億枚の画像と1兆6,960億のテキストトークンを含む大規模かつ高品質な文書をフィルタリング・抽出する。既存のデータセット（例えば、MMC4、OBELICS）と比較して、我々のデータセットは 1) 優れたデータ品質を維持しながら、15倍の規模を持つ。 2) 英語と非英語の両方のWebサイトやビデオ中心のWebサイトを含む、より多様なソースを特徴とする。 3) より柔軟で、画像テキストインターリーブ形式から純粋なテキストコーパスや画像テキストペアへ容易に変換できる。総合的な分析と実験を通じて,提案したデータセットの品質,ユーザビリティ,有効性を検証する。これが将来のマルチモーダルモデル研究に確かなデータ基盤を提供することを期待する。コードとデータはhttps://github.com/OpenGVLab/OmniCorpusで公開されている。

Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale and diversity of current image-text interleaved data restrict the development of multimodal large language models. In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset. Using an efficient data engine, we filter and extract large-scale high-quality documents, which contain 8.6 billion images and 1,696 billion text tokens. Compared to counterparts (e.g., MMC4, OBELICS), our dataset 1) is 15 times larger in scale while maintaining good data quality; 2) features more diverse sources, including both English and non-English websites as well as video-centric websites; 3) is more flexible, as it can easily be reduced from an image-text interleaved format to a pure text corpus or to image-text pairs. Through comprehensive analysis and experiments, we validate the quality, usability, and effectiveness of the proposed dataset. We hope this provides a solid data foundation for future multimodal model research. Code and data are released at https://github.com/OpenGVLab/OmniCorpus.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# Glyph-ByT5-v2: 高精度多言語ビジュアルテキストレンダリングのための強力な美的ベースライン

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering ( http://arxiv.org/abs/2406.10208v2 )

ライセンス: Link先を確認

Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Lin Liang, Lijuan Wang, Ji Li, Yuhui Yuan,

(参考訳) 近年,Glyph-ByT5はグラフィックデザイン画像において高精度な視覚テキストレンダリング性能を実現している。しかし、依然として英語のみを対象としており、視覚的魅力の面では比較的劣っている。本稿では,Glyph-ByT5-v2 と Glyph-SDXL-v2 を提示することで、これら2つの基本的な制約に対処する。これらは10言語の正確な視覚テキストレンダリングをサポートするだけでなく、はるかに優れた美的品質も達成する。これを達成するために、以下の貢献を行う。 (i) 英語以外の9言語をカバーする、100万以上のグリフテキストペアと1,000万のグラフィックデザイン画像テキストペアからなる高品質な多言語グリフテキストおよびグラフィックデザインデータセットを作成する。 (ii) 多言語の視覚的スペリング精度を評価するために、各言語100個、計1,000個のプロンプトからなる多言語視覚段落ベンチマークを構築する。 (iii) 視覚的な美的品質を高めるために, 最新のステップアウェア選好学習アプローチを活用する。これらの技術を組み合わせることで、強力なカスタマイズ多言語テキストエンコーダGlyph-ByT5-v2と、10言語で正確な綴りをサポートする強力な美的グラフィック生成モデルGlyph-SDXL-v2を提供する。最新のDALL-E3とIdeogram 1.0が依然として多言語の視覚テキストレンダリングタスクに苦戦していることを考えると、我々の仕事は大きな進歩であると考える。

Recently, Glyph-ByT5 has achieved highly accurate visual text rendering performance in graphic design images. However, it still focuses solely on English and performs relatively poorly in terms of visual appeal. In this work, we address these two fundamental limitations by presenting Glyph-ByT5-v2 and Glyph-SDXL-v2, which not only support accurate visual text rendering for 10 different languages but also achieve much better aesthetic quality. To achieve this, we make the following contributions: (i) creating a high-quality multilingual glyph-text and graphic design dataset consisting of more than 1 million glyph-text pairs and 10 million graphic design image-text pairs covering nine other languages, (ii) building a multilingual visual paragraph benchmark consisting of 1,000 prompts, with 100 for each language, to assess multilingual visual spelling accuracy, and (iii) leveraging the latest step-aware preference learning approach to enhance the visual aesthetic quality. With the combination of these techniques, we deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, that can support accurate spelling in 10 different languages. We perceive our work as a significant advancement, considering that the latest DALL-E3 and Ideogram 1.0 still struggle with the multilingual visual text rendering task.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# SciEx: 人間の専門家による採点と自動採点を用いた、科学試験における大規模言語モデルのベンチマーク

SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading ( http://arxiv.org/abs/2406.10421v2 )

ライセンス: Link先を確認

Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues,

(参考訳) LLM(Large Language Models)の急速な発展に伴い、異なるドメインにおけるLLMの能力を評価できるベンチマークが不可欠である。 LLMの一般的な用途の1つは、アルゴリズムの作成、データベースのクエリ、数学的証明など、科学的なトピックに関するタスクを実行することである。本稿では,このような課題に対する大学生の評価方法から着想を得て、科学的タスクを解くLLMの能力を評価するための、大学のコンピュータサイエンス試験問題からなるベンチマークSciExを提案する。 SciExは、(1)英語とドイツ語の両方の試験を含む多言語であり、(2)画像を伴う問題を含むマルチモーダルであり、(3)大学試験の性質上、難易度の異なる様々な種類の自由記述問題を含む。我々は,この新しいベンチマークを用いて,様々な最先端のLLMの性能を評価した。 SciEx の問題は自由記述形式であるため、LLM の性能を評価することは容易ではない。そこで我々は,SciEx 上での LLM 出力に対する人間の専門家による採点を提供する。SciExの自由記述試験は現在のLLMにとって依然として難しく、最良のLLMでも平均59.4%の試験成績しか達成していないことを示す。また,SciEx 上での LLM の成績と学生の成績の詳細な比較も行う。新しいLLMの将来の評価を可能にするために、SciEx 上の LLM の回答を採点する LLM-as-a-judge の利用を提案する。実験の結果,LLMは試験を完璧に解けるわけではないが,採点者としては十分に機能し,専門家による採点とのピアソン相関は0.948に達することがわかった。

With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks that can evaluate the ability of LLMs in different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases, or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper we propose SciEx, a benchmark consisting of university computer science exam questions, to evaluate LLMs' ability to solve scientific tasks. SciEx (1) is multilingual, containing both English and German exams, (2) is multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the freeform exams in SciEx remain challenging for current LLMs: the best LLM achieves an exam grade of only 59.4% on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading.
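The agreement figure quoted above is a Pearson correlation between automatic and expert grades. A minimal sketch of that statistic follows; the grade lists are hypothetical and are not taken from SciEx, and the function name is an assumption for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grades (points per answer) from an expert and from an LLM judge.
expert_grades = [2.0, 5.0, 3.5, 0.0, 4.0]
judge_grades = [2.5, 4.5, 3.0, 0.5, 4.0]
r = pearson(expert_grades, judge_grades)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the judge ranks and spaces the answers much like the expert does; it does not by itself show that the absolute grades match.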

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# CVPR 2024 PBDLチャレンジの技術報告

Technique Report of CVPR 2024 PBDL Challenges ( http://arxiv.org/abs/2406.10744v3 )

ライセンス: Link先を確認

Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, Yunkang Zhang, Siyuan Jiang, Xiaoqiang Lu, Licheng Jiao, Fang Liu, Xu Liu, Lingling Li, Wenping Ma, Shuyuan Yang, Haiyang Xie, Jian Zhao, Shihua Huang, Peng Cheng, Xi Shen, Zheng Wang, Shuai An, Caizhi Zhu, Xuelong Li, Tao Zhang, Liang Li, Yu Liu, Chenggang Yan, Gengchen Zhang, Linyan Jiang, Bingyi Song, Zhuoyu An, Haibo Lei, Qing Luo, Jie Song, Yuan Liu, Qihang Li, Haoyuan Zhang, Lingfeng Wang, Wei Chen, Aling Luo, Cheng Li, Jun Cao, Shu Chen, Zifei Dou, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Xuejian Gou, Qinliang Wang, Yang Liu, Shizhan Zhao, Yanzhao Zhang, Libo Yan, Yuwei Guo, Guoxin Li, Qiong Gao, Chenyue Che, Long Sun, Xiang Chen, Hao Li, Jinshan Pan, Chuanlong Xie, Hongming Chen, Mingrui Li, Tianchen Deng, Jingwei Huang, Yufeng Li, Fei Wan, Bingxin Xu, Jian Cheng, Hongzhe Liu, Cheng Xu, Yuxiang Zou, Weiguo Pan, Songyin Dai, Sen Jia, Junpei Zhang, Puhua Chen, Qihang Li,

(参考訳) 物理に基づくビジョンとディープラーニングの交わりは、コンピュータビジョン技術の進歩にエキサイティングなフロンティアをもたらす。物理の原理を活用して深層学習モデルに情報を与え強化することで、より堅牢で正確な視覚システムを開発することができる。物理に基づくビジョンは、過程を反転させて、画像から形状、反射率、光の分布、媒質特性などのシーン特性を復元することを目的としている。近年、ディープラーニングは様々な視覚タスクに有望な改善を示しており、物理に基づくビジョンと組み合わせることで、これらのアプローチは視覚システムの堅牢性と精度を高めることができる。本技術報告は、CVPR 2024ワークショップで開催されたPhysics-Based Vision Meets Deep Learning (PBDL) 2024チャレンジの結果を要約する。チャレンジは8つのトラックで構成され、低照度強調と検出、およびハイダイナミックレンジ(HDR)イメージングに焦点を当てた。本報告では,各トラックの目的,方法論,成果を詳述し,最高性能のソリューションとその革新的なアプローチを取り上げる。

The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at the CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

翻訳日:2024-07-16 04:27:56 公開日:2024-07-12

# 3D-to-2D非ペアスキャン蒸留による単一スライスセグメンテーションの強化

Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation ( http://arxiv.org/abs/2406.12254v2 )

ライセンス: Link先を確認

Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael E. Kim, Rendong Zhang, Shunxing Bao, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman,

(参考訳) 2次元単一スライスの腹部CT(Computed Tomography)により,低い放射線被曝で体格および臓器の健康状態を評価できる。しかし、単一スライスデータではセグメンテーションに2Dネットワークを使用する必要があり、これらのネットワークは文脈情報を効果的に捉えるのに苦労することが多い。そのため、同一のデータセットで訓練しても、通常は3Dネットワークの方が優れたセグメンテーション結果を達成する。本研究では, 事前学習済みの3Dモデルを活用して2次元単一スライスセグメンテーションを強化する、新しい3D-to-2D蒸留フレームワークを提案する。具体的には,3次元表現から予測分布のセントロイドを抽出し,クラス内およびクラス間の相関を学習させることで2次元の生徒モデルを導く。同じデータ入力を必要とする従来の知識蒸留法とは異なり、我々のアプローチでは、任意のコントラストを持つ非ペアの3次元CTスキャンを用いて2次元の生徒モデルを導く。単一スライスのBaltimore Longitudinal Study of Aging (BLSA) データセットの707名の被験者を対象に行った実験により,最先端の2次元多臓器セグメンテーション手法が3次元教師モデルの恩恵を受け,単一スライス多臓器セグメンテーションの性能が向上することが示された。特に,本手法は低データ領域で顕著な有効性を示し,訓練被験者を200名しか用いない場合でも,利用可能な訓練被験者全員で訓練したモデルを上回った。このように、本研究は手作業によるアノテーションの負担を軽減できる可能性を浮き彫りにしている。

2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# 表形式データ生成モデルの内側: ハイパーパラメータチューニングの強い影響

Under the Hood of Tabular Data Generation Models: the Strong Impact of Hyperparameter Tuning ( http://arxiv.org/abs/2406.12945v2 )

ライセンス: Link先を確認

G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy,

(参考訳) 表形式データ生成のための最近の5つのモデルファミリに対する,データセット固有のハイパーパラメータ,特徴符号化,アーキテクチャのチューニングの影響を,16データセットの広範なベンチマークを用いて検討した。本研究は、ハイパーパラメータ最適化を完全に考慮したモデルの統一的評価という実践的な必要性に対処する。さらに,各モデルに対して,高速な最適化を可能にし,ほぼ同等の性能を大幅に低いコストで達成する縮小探索空間を提案する。我々のベンチマークは,ほとんどのモデルにおいて,大規模なデータセット固有のチューニングが元の構成に比べて性能を大幅に向上させることを示している。さらに,拡散ベースのモデルが表形式データにおいて概して他のモデルを上回ることを確認した。しかし、チューニングと訓練のプロセス全体がすべてのモデルで同じGPU予算に制限されている場合、この優位性は有意ではない。

We investigate the impact of dataset-specific hyperparameter, feature encoding, and architecture tuning on five recent model families for tabular data generation through an extensive benchmark on 16 datasets. This study addresses the practical need for a unified evaluation of models that fully considers hyperparameter optimization. Additionally, we propose a reduced search space for each model that allows for quick optimization, achieving nearly equivalent performance at a significantly lower cost. Our benchmark demonstrates that, for most models, large-scale dataset-specific tuning substantially improves performance compared to the original configurations. Furthermore, we confirm that diffusion-based models generally outperform other models on tabular data. However, this advantage is not significant when the entire tuning and training process is restricted to the same GPU budget for all models.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# 心の理論バイアスと自律性バイアスの是正における対話型人工知能の有効性: 比較分析

The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis ( http://arxiv.org/abs/2406.13813v3 )

ライセンス: Link先を確認

Marcin Rządeczka, Anna Sterna, Julia Stolińska, Paulina Kaczyńska, Marcin Moskalewicz,

(参考訳) この研究は、人間とAIの相互作用における認知バイアスの是正と感情の認識について、対話型人工知能(CAI)の有効性を評価する。これはデジタルメンタルヘルス介入にとって極めて重要である。認知バイアス(規範的思考からの体系的な逸脱)は精神的健康に影響を与え、うつ病や不安などの症状を悪化させる。治療用チャットボットは、認知行動療法(CBT)をより利用しやすく手頃なものにし、スケーラブルで即時のサポートを提供できる。この研究は、典型的なユーザとボットの相互作用をシミュレートする、臨床に基づく仮想ケースシナリオを用いた構造化された方法論を採用している。性能と感情認識は、心の理論バイアス(AIの擬人化、AIへの過信、AIへの帰属)と自律性バイアス(コントロールの錯覚、根本的な帰属の誤り、公正世界仮説)という2つのカテゴリの認知バイアスにわたって評価された。定性的フィードバック機構は、正確性、治療的品質、CBTの原則への準拠に基づいて応答を定量化するために、順序尺度とともに用いられた。治療用ボット(Wysa, Youper)と汎用LLM(GPT 3.5, GPT 4, Gemini Pro)をスクリプト化された相互作用により評価し、認知科学者と臨床心理学者が二重にレビューした。統計的分析では、バイアスの是正において非治療用ボットが一貫して治療用ボットを上回り、感情認識においても6つのバイアスのうち4つで上回った。このデータは、非治療用チャットボットの方が一部の認知バイアスへの対処においてより効果的であることを示唆している。

The study evaluates the efficacy of Conversational Artificial Intelligence (CAI) in rectifying cognitive biases and recognizing affect in human-AI interactions, which is crucial for digital mental health interventions. Cognitive biases (systematic deviations from normative thinking) affect mental health, intensifying conditions like depression and anxiety. Therapeutic chatbots can make cognitive-behavioral therapy (CBT) more accessible and affordable, offering scalable and immediate support. The research employs a structured methodology with clinical-based virtual case scenarios simulating typical user-bot interactions. Performance and affect recognition were assessed across two categories of cognitive biases: theory of mind biases (anthropomorphization of AI, overtrust in AI, attribution to AI) and autonomy biases (illusion of control, fundamental attribution error, just-world hypothesis). A qualitative feedback mechanism was used with an ordinal scale to quantify responses based on accuracy, therapeutic quality, and adherence to CBT principles. Therapeutic bots (Wysa, Youper) and general-use LLMs (GPT 3.5, GPT 4, Gemini Pro) were evaluated through scripted interactions, double-reviewed by cognitive scientists and a clinical psychologist. Statistical analysis showed that therapeutic bots were consistently outperformed by non-therapeutic bots in bias rectification, and in 4 out of 6 biases in affect recognition. The data suggests that non-therapeutic chatbots are more effective in addressing some cognitive biases.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# バーを高くする - ジェネレーティブ進化テストによる大規模言語モデルの価値の調査

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing ( http://arxiv.org/abs/2406.14230v2 )

ライセンス: Link先を確認

Han Jiang, Xiaoyuan Yi, Zhihua Wei, Shu Wang, Xing Xie,

(参考訳) 警告: 本論文は非倫理的な情報を示すモデル出力を含む。大規模言語モデル(LLM)は大きなブレークスルーを達成したが、生成される非倫理的コンテンツは潜在的なリスクをもたらしている。 LLMの価値アライメントを測定することは、その規制と責任あるデプロイメントにとって不可欠である。 LLMの社会的偏見、毒性、倫理を評価するために多くのデータセットが構築されているが、それらは評価のクロノエフェクトに悩まされている。すなわち、モデルが急速に進化するにつれて、既存のデータが漏洩したり容易になりすぎたりして、絶えず発展するLLMを過大評価してしまう。この問題に対処するために,LLMの根底にある道徳的基準を動的に探索する新しい生成的進化テスト手法であるGETAを提案する。難易度の限られた静的データセットに依存する従来の適応テスト手法とは異なり、GETAは反復的に更新されるアイテムジェネレータを組み込み、各LLMの道徳的境界を推測し、難易度を調整したテスト項目を生成することで、真のアライメントの程度を正確に反映する。このプロセスは理論的には、アイテムの難易度と価値適合性を潜在変数として、アイテムとモデル応答の結合分布を学習する。ジェネレータはLLMと共進化し、クロノエフェクトに対処する。我々は,多様な能力を持つ様々な人気LLMを評価し,GETAが難易度に合ったテスト項目を作成し,LLMの価値をより正確に評価できること、その評価が未知のOODおよびi.i.d.項目での性能とよりよく整合することを実証し、将来の評価パラダイムの基盤を築く。

Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs, but they suffer from evaluation chronoeffect, that is, as models rapidly evolve, existing data becomes leaked or undemanding, overestimating ever-developing LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach that dynamically probes the underlying moral baselines of LLMs. Distinct from previous adaptive testing methods that rely on static datasets with limited difficulty, GETA incorporates an iteratively-updated item generator which infers each LLM's moral boundaries and generates difficulty-tailored testing items, accurately reflecting the true alignment extent. This process theoretically learns a joint distribution of item and model response, with item difficulty and value conformity as latent variables, where the generator co-evolves with the LLM, addressing chronoeffect. We evaluate various popular LLMs with diverse capabilities and demonstrate that GETA can create difficulty-matching testing items and more accurately assess LLMs' values, better consistent with their performance on unseen OOD and i.i.d. items, laying the groundwork for future evaluation paradigms.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# Reward Misspecification 問題としての脱獄

Jailbreaking as a Reward Misspecification Problem ( http://arxiv.org/abs/2406.14393v2 )

ライセンス: Link先を確認

Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong,

(参考訳) 大規模言語モデル(LLM)の普及は、その安全性と信頼性、特に敵対的攻撃に対する脆弱性への懸念を引き起こしている。本稿では、この脆弱性の原因をアライメント過程における報酬の誤特定(reward misspecification)に求める新たな視点を提案する。報酬の誤特定の程度を定量化する指標ReGapを導入し、有害なバックドアプロンプトの検出における有効性とロバスト性を示す。これらの知見に基づき、アライメント済みの様々なターゲットLLMに対して敵対的プロンプトを生成する自動レッドチーミングシステムReMissを提案する。ReMissは、生成されたプロンプトの人間にとっての可読性を保ちながら、AdvBenchベンチマークで最先端の攻撃成功率を達成する。詳細な分析により、提案する報酬誤特定の目的関数が従来手法と比べてもたらす独自の利点が明らかになる。

The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and demonstrate its effectiveness and robustness in detecting harmful backdoor prompts. Building upon these insights, we present ReMiss, a system for automated red teaming that generates adversarial prompts against various target aligned LLMs. ReMiss achieves state-of-the-art attack success rates on the AdvBench benchmark while preserving the human readability of the generated prompts. Detailed analysis highlights the unique advantages brought by the proposed reward misspecification objective compared to previous methods.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# ImageFlowNet:不規則にサンプリングされた縦断的医用画像による疾患進行のマルチスケール軌跡の予測

ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images ( http://arxiv.org/abs/2406.14794v3 )

ライセンス: Link先を確認

Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy,

(参考訳) 画像から疾患の進行を予測することは、臨床的意思決定の聖杯である。しかし、この課題は、縦断的な画像取得に固有の高次元性、時間的な疎性、サンプリングの不規則性により複雑になっている。既存の手法は、手作りの特徴を抽出してそのベクトル空間で時系列解析を行うことが多く、画像内の豊富な空間情報が失われる。これらの課題を克服するために、我々はImageFlowNetを導入する。これは、ニューラルODEとSDEを用いて共同埋め込み空間におけるマルチスケール表現を時間発展させる潜在空間の流れ場を学習し、画像領域で疾患の進行をモデル化する新しいフレームワークである。特に、ImageFlowNetは患者のコホートをまとめて扱うことでマルチスケールの共同表現空間を学習し、患者サンプル間で情報を伝達できるようにする。学習されたダイナミクスは進行のもっともらしい軌跡を与え、SDEは同じ出発点からの別の軌跡を与える。我々は、ODEによる定式化を支持し、高レベルの視覚的特徴、潜在空間の構造、軌跡の滑らかさに関する正則化を動機付ける理論的洞察を提供する。さらに、網膜の地図状萎縮、多発性硬化症、膠芽腫の進行を捉えた3つの縦断的医用画像データセットでの実証評価により、ImageFlowNetの有効性を示す。

The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spatial information within the images. To overcome these challenges, we introduce ImageFlowNet, a novel framework that learns latent-space flow fields that evolve multiscale representations in joint embedding spaces using neural ODEs and SDEs to model disease progression in the image domain. Notably, ImageFlowNet learns multiscale joint representation spaces by combining cohorts of patients together so that information can be transferred between the patient samples. The dynamics then provide plausible trajectories of progression, with the SDE providing alternative trajectories from the same starting point. We provide theoretical insights that support our formulation of ODEs, and motivate our regularizations involving high-level visual features, latent space organization, and trajectory smoothness. We then demonstrate ImageFlowNet's effectiveness through empirical evaluations on three longitudinal medical image datasets depicting progression in retinal geographic atrophy, multiple sclerosis, and glioblastoma.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# X-ray2CTPA:2次元X線による条件付けから3次元CTPAスキャンを生成する

X-ray2CTPA: Generating 3D CTPA scans from 2D X-ray conditioning ( http://arxiv.org/abs/2406.16109v3 )

ライセンス: Link先を確認

Noa Cahan, Eyal Klang, Galit Aviram, Yiftach Barash, Eli Konen, Raja Giryes, Hayit Greenspan,

(参考訳) 医療診断に広く用いられる胸部X線撮影(CXR)は、CTスキャンと比べて得られる画像情報が限られる。CTスキャン、特にCT肺血管造影(CTPA: CT Pulmonary Angiography)のような造影スキャンは、より詳細で正確な3次元データを提供する。しかし、CTスキャンはCXRよりもコストが高く、放射線被曝が大きく、利用しにくい。本研究では、コントラスト分解能の低い2次元X線入力から、コントラスト分解能と空間分解能の高い3次元CTPAスキャンへのクロスモーダル変換を検討する。生成AIの最近の進歩を受けて、このタスクに対する新しい拡散ベースのアプローチを導入する。定量的指標と放射線科医からの定性的フィードバックの両方を用いてモデルの性能を評価し、生成画像の診断上の妥当性を確認する。さらに、合成した3D画像を分類フレームワークに用い、最初のCXR入力を使うPE分類タスクでAUCが改善することを示す。提案手法は一般化可能であり、医用画像における他のクロスモダリティ変換にも適用できる。これにより、より利用しやすく費用対効果の高い高度な診断ツールへの道が開かれる可能性がある。プロジェクトのコードは https://github.com/NoaCahan/X-ray2CTPA で公開されている。

Chest X-rays or chest radiography (CXR), commonly used for medical diagnostics, typically enables limited imaging compared to computed tomography (CT) scans, which offer more detailed and accurate three-dimensional data, particularly contrast-enhanced scans like CT Pulmonary Angiography (CTPA). However, CT scans entail higher costs, greater radiation exposure, and are less accessible than CXRs. In this work we explore cross-modal translation from a 2D low contrast-resolution X-ray input to a 3D high contrast and spatial-resolution CTPA scan. Driven by recent advances in generative AI, we introduce a novel diffusion-based approach to this task. We evaluate the model's performance using both quantitative metrics and qualitative feedback from radiologists, ensuring diagnostic relevance of the generated images. Furthermore, we employ the synthesized 3D images in a classification framework and show improved AUC in a PE categorization task, using the initial CXR input. The proposed method is generalizable and capable of performing additional cross-modality translations in medical imaging. It may pave the way for more accessible and cost-effective advanced diagnostic tools. The code for this project is available: https://github.com/NoaCahan/X-ray2CTPA .

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# 統計的推定を超えて:シャッフルによる差分プライベートな個別計算

Beyond Statistical Estimation: Differentially Private Individual Computation via Shuffling ( http://arxiv.org/abs/2406.18145v2 )

ライセンス: Link先を確認

Shaowei Wang, Changyu Dong, Xiangfu Song, Jin Li, Zhili Zhou, Di Wang, Han Wu,

(参考訳) データ駆動アプリケーションでは、ユーザのプライバシを保護しながら有用な計算を可能にすることが、依然として重要な課題である。差分プライバシ(DP)のような技術は、こうした懸念への対処において中心的な役割を果たしてきた。DPのシャッフルモデルは、信頼できるキュレーターを必要とせず、シャッフルによるプライバシ増幅効果を活用して高い有用性を実現できる。これらの利点から、シャッフルモデルは大きな関心を集めている。しかし、シャッフルモデルで扱える計算タスクは統計的推定に限られており、各ユーザがパーソナライズされた出力を必要とする実世界のシナリオには適用できない。本稿では、Private Individual Computation(PIC)と呼ぶ新しいパラダイムを導入し、より広い範囲の置換同変な計算をサポートするようにシャッフルモデルを拡張する。PICは、プライバシを保護しながらパーソナライズされた出力を可能にし、シャッフルによるプライバシ増幅の恩恵も受ける。我々は、PICを実現する具体的なプロトコルを提案する。このプロトコルでは、ワンタイム公開鍵を用いることで、プライバシ増幅に不可欠な匿名性を損なうことなく、ユーザが自身の出力を受け取れる。さらに、有用性を高めるためにPICモデル向けに設計した最適なランダマイザであるMinkowski Responseを提示する。PICプロトコルのセキュリティおよびプライバシ特性を形式的に証明する。理論的解析と実証評価により、PICが非統計的な計算タスクを扱えること、およびPICとMinkowskiランダマイザが既存の解法より優れた有用性を達成することを示す。

In data-driven applications, preserving user privacy while enabling valuable computations remains a critical challenge. Technologies like Differential Privacy (DP) have been pivotal in addressing these concerns. The shuffle model of DP requires no trusted curators and can achieve high utility by leveraging the privacy amplification effect yielded from shuffling. These benefits have led to significant interest in the shuffle model. However, the computation tasks in the shuffle model are limited to statistical estimation, making the shuffle model inapplicable to real-world scenarios in which each user requires a personalized output. This paper introduces a novel paradigm termed Private Individual Computation (PIC), expanding the shuffle model to support a broader range of permutation-equivariant computations. PIC enables personalized outputs while preserving privacy, and enjoys privacy amplification through shuffling. We propose a concrete protocol that realizes PIC. By using one-time public keys, our protocol enables users to receive their outputs without compromising anonymity, which is essential for privacy amplification. Additionally, we present an optimal randomizer, the Minkowski Response, designed for the PIC model to enhance utility. We formally prove the security and privacy properties of the PIC protocol. Theoretical analysis and empirical evaluations demonstrate PIC's capability in handling non-statistical computation tasks, and the efficacy of PIC and the Minkowski randomizer in achieving superior utility compared to existing solutions.
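As a point of reference for the shuffle model this abstract starts from, the classic statistical-estimation setting can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the PIC protocol or the Minkowski Response: it uses plain binary randomized response as the local randomizer, and the function names, the privacy parameter value, and the toy data are all illustrative.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Local randomizer: keep the true bit with probability e^eps / (e^eps + 1)."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def shuffle_reports(reports, rng):
    """Shuffler: output a random permutation so reports cannot be linked to senders."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def estimate_mean(reports, epsilon):
    """Analyzer: debias the observed fraction of ones.

    E[observed] = (1 - p) + q * (2p - 1), where q is the true fraction of ones.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(0)                      # fixed seed, for reproducibility only
true_bits = [1] * 3000 + [0] * 7000         # toy population, true fraction 0.3
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
shuffled = shuffle_reports(reports, rng)
estimate = estimate_mean(shuffled, 1.0)     # should land near 0.3
```

The analyzer only ever sees the multiset of reports, which is why this setting naturally yields aggregate statistics; returning a personalized output to each anonymous sender is the gap the abstract says PIC addresses.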

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# 時系列の早期分類:分類学とベンチマーク

Early Classification of Time Series: Taxonomy and Benchmark ( http://arxiv.org/abs/2406.18332v2 )

ライセンス: Link先を確認

Aurélien Renault, Alexis Bondu, Antoine Cornuéjols, Vincent Lemaire,

(参考訳) 多くの状況では、対象とする現象の測定値は逐次的に得られる。そのクラスの予測は、時間的ペナルティが大きくなりすぎないようにできるだけ早く行う必要があるが、早すぎると誤分類のコストを払うリスクがある。この問題は特に時系列の場合に研究されており、時系列の早期分類(Early Classification of Time Series, ECTS)として知られている。関連文献は増え続けているが、既存の各手法の相対的な長所を比較するための、体系的で共有された評価プロトコルはいまだに存在しない。本稿ではまず、これらの手法を原理に基づく分類体系の中に位置づける。次に、評価を整理するための次元を定義し、それらの次元に沿って9つの最先端ECTSアルゴリズムを対象に行った非常に広範な実験の結果を報告する。さらに、これらの実験やその他の実験は、既存のECTSアルゴリズムの大部分を実装したオープンソースライブラリ(https://github.com/ML-EDM/ml_edm を参照)を使って実行できる。

In many situations, the measurements of a studied phenomenon are provided sequentially, and the prediction of its class needs to be made as early as possible so as not to incur too high a time penalty, but not too early and risk paying the cost of misclassification. This problem has been particularly studied in the case of time series, and is known as Early Classification of Time Series (ECTS). Although it has been the subject of a growing body of literature, there is still a lack of a systematic, shared evaluation protocol to compare the relative merits of the various existing methods. This document begins by situating these methods within a principle-based taxonomy. It defines dimensions for organizing their evaluation, and then reports the results of a very extensive set of experiments along these dimensions involving nine state-of-the-art ECTS algorithms. In addition, these and other experiments can be carried out using an open-source library in which most of the existing ECTS algorithms have been implemented (see https://github.com/ML-EDM/ml_edm).
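The trade-off described here, between the cost of waiting and the cost of a wrong early prediction, can be made concrete with a toy cost model. The sketch below is an illustration under assumptions: the linear delay cost, the use of classifier confidence as a proxy for the probability of being correct, and all function names are hypothetical, and nothing here is taken from the ml_edm library's API.

```python
def expected_cost(confidence, t, misclassification_cost, delay_cost_per_step):
    """Expected cost of predicting at step t with the given classifier confidence."""
    return (1.0 - confidence) * misclassification_cost + delay_cost_per_step * t


def best_trigger_time(confidences, misclassification_cost, delay_cost_per_step):
    """Hindsight-optimal step (1-indexed) for one series, given its confidence curve."""
    costs = [
        expected_cost(c, t, misclassification_cost, delay_cost_per_step)
        for t, c in enumerate(confidences, start=1)
    ]
    best_index = min(range(len(costs)), key=costs.__getitem__)
    return best_index + 1, costs[best_index]


def threshold_trigger(confidences, threshold):
    """Online rule: predict at the first step whose confidence reaches the threshold."""
    for t, confidence in enumerate(confidences, start=1):
        if confidence >= threshold:
            return t
    return len(confidences)  # forced decision at the end of the series


# Toy confidence curve that improves as more of the series is observed.
confidences = [0.50, 0.60, 0.80, 0.90, 0.92, 0.93]
best_t, best_cost = best_trigger_time(confidences, 1.0, 0.05)
online_t = threshold_trigger(confidences, 0.80)
```

With these toy numbers the hindsight-optimal decision is at step 4, while the threshold rule fires one step earlier and pays a slightly higher expected cost. Comparing trigger rules on this kind of combined cost is one natural evaluation axis, though the paper's actual protocol may differ.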

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# RoboUniView:ロボットマニピュレーションのための統一ビュー表現を用いた視覚言語モデル

RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation ( http://arxiv.org/abs/2406.18977v2 )

ライセンス: Link先を確認

Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma,

(参考訳) ロボットマニピュレーションに視覚言語モデル(VLM)を利用することは、新しい物体や指示へのモデルの汎化能力を高めることを目的とした新しいパラダイムである。しかし、カメラの仕様や取り付け位置の違いにより、既存の手法はロボットプラットフォーム間で大きな性能差を示す。この課題に対処するために、本稿では視覚的特徴抽出を行動学習から切り離す革新的なアプローチであるRoboUniViewを提案する。まず、入手しやすいデータで事前学習することにより多視点の画像から統一ビュー表現を学習し、次にこの統一ビュー表現から行動を導出してロボットマニピュレーションを制御する。この統一ビュー表現は物理世界をより正確に反映し、ロボットプラットフォームのカメラパラメータに制約されない。この手法により、難度の高いCALVINベンチマークで最先端の性能を達成し、成功率を$D \to D$設定で93.0%から96.2%に、$ABC \to D$設定で92.2%から94.2%に向上させた。さらに、本モデルは優れた適応性と柔軟性を示す。未知のカメラパラメータの下でも高い性能を維持し、カメラパラメータの異なる複数のデータセットを利用でき、データセットをまたいだクロスタスクの共同学習が可能である。再実装用のコードを公開している。 https://github.com/liufanfanlff/RoboUniview

Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniView in this paper, an innovative approach that decouples visual feature extraction from action learning. We first learn a unified view representation from multi-perspective views by pre-training on readily accessible data, and then derive actions from this unified view representation to control robotic manipulation. This unified view representation more accurately mirrors the physical world and is not constrained by the robotic platform's camera parameters. Thanks to this methodology, we achieve state-of-the-art performance on the demanding CALVIN benchmark, enhancing the success rate in the $D \to D$ setting from 93.0% to 96.2%, and in the $ABC \to D$ setting from 92.2% to 94.2%. Moreover, our model exhibits outstanding adaptability and flexibility: it maintains high performance under unseen camera parameters, can utilize multiple datasets with varying camera parameters, and is capable of joint cross-task learning across datasets. Code is provided for re-implementation. https://github.com/liufanfanlff/RoboUniview

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# Synthetic Cancer:LLMによるワームの強化

Synthetic Cancer -- Augmenting Worms with LLMs ( http://arxiv.org/abs/2406.19570v2 )

ライセンス: Link先を確認

Benjamin Zimmerman, David Zollikofer,

(参考訳) 大規模言語モデル(LLM)が高度化するにつれて、悪用の可能性は大きく高まる。スイスAI安全賞(Swiss AI Safety Prize)への応募として、2つの主要なプロセスにLLMを利用する新しいタイプのメタモーフィック型マルウェアを提示する。第一に、LLMは、アンチマルウェアプログラムによるシグネチャベースの検出を回避するための自動コード書き換えに使用される。第二に、このマルウェアはLLMを利用してソーシャルエンジニアリングを施した返信メールを作成し、添付されたマルウェアの実行を受信者に促すことで、自身のコピーをメール経由で拡散する。我々の応募には機能する最小限のプロトタイプが含まれており、LLMがサイバーセキュリティにもたらすリスクを明らかにするとともに、インテリジェントなマルウェアに関するさらなる研究の必要性を強調している。

With increasingly sophisticated large language models (LLMs), the potential for abuse rises drastically. As a submission to the Swiss AI Safety Prize, we present a novel type of metamorphic malware leveraging LLMs for two key processes. First, LLMs are used for automatic code rewriting to evade signature-based detection by antimalware programs. The malware then spreads its copies via email by utilizing an LLM to socially engineer email replies to encourage recipients to execute the attached malware. Our submission includes a functional minimal prototype, highlighting the risks that LLMs pose for cybersecurity and underscoring the need for further research into intelligent malware.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# ROS-LLM:タスクフィードバックと構造化推論を備えた身体性AIのためのROSフレームワーク

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning ( http://arxiv.org/abs/2406.19741v3 )

ライセンス: Link先を確認

Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar,

(参考訳) 本稿では、自然言語プロンプトとRobot Operating System(ROS)からの文脈情報を活用する、非専門家による直感的なロボットプログラミングのためのフレームワークを提案する。本システムは大規模言語モデル(LLM)を統合しており、非専門家がチャットインタフェースを通じてタスク要件をシステムに伝えられる。フレームワークの主な特徴は、多数のオープンソースおよび商用LLMに接続されたAIエージェントとROSとの統合、LLM出力からの動作(behavior)の自動抽出とROSアクション/サービスの実行、3つの動作モード(シーケンス、ビヘイビアツリー、ステートマシン)のサポート、利用可能なアクションのライブラリに新しいロボットアクションを追加するための模倣学習、人間と環境からのフィードバックによるLLMのリフレクションである。広範な実験によりフレームワークを検証し、長期(long-horizon)タスク、卓上での物体の再配置、遠隔監督制御など多様なシナリオにおける堅牢性、スケーラビリティ、汎用性を示した。フレームワークの採用を促し、結果の再現を支援するため、コードをオープンソースとして公開している。 https://github.com/huawei-noah/HEBO/tree/master/ROSLLM

We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# 事前学習済み言語モデルにおける認知的知能の発達

Development of Cognitive Intelligence in Pre-trained Language Models ( http://arxiv.org/abs/2407.01047v3 )

ライセンス: Link先を確認

Raj Sanjay Shah, Khushi Bhardwaj, Sashank Varma,

(参考訳) 近年の研究は、大規模事前学習済み言語モデル(PLM)に創発的な認知能力が存在する証拠を示している。これらのモデルの認知的アライメントが高まるにつれ、PLMは認知科学理論の候補となっている。PLMの創発的認知能力に関するこれまでの研究は、主にモデルの学習に対して経路非依存であり、すなわち中間ステップではなく最終的なモデル重みに焦点を当ててきた。しかし、PLMを用いて人間の認知のもっともらしいモデルを構築するには、学習中の性能と子どもの思考の軌跡との発達的アライメントを考慮することが有益である。人間の知能に関する心理測定テストを指針として、我々は4つのタスク群を選び、広く使われている10系統のPLMのアライメントを調べ、利用可能な中間および最終の学習ステップを評価する。これらのタスクは、数的能力、言語能力、概念理解、流動性推論である。我々は顕著な規則性を見いだした。モデルのサイズにかかわらず、PLMの発達軌跡は、人間の認知発達とのアライメントが最大となる窓を一貫して示す。その窓より前では、学習は「白紙状態(blank slate)」のモデルに、経験から素早く学習できる態勢を整えるのに必要な構造を与えているように見える。その窓より後では、学習は損失を減らすという工学的目標には寄与するが、人間の認知とのアライメントを高めるという科学的目標には寄与していないように見える。

Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models (PLMs). The increasing cognitive alignment of these models has made them candidates for cognitive science theories. Prior research into the emergent cognitive abilities of PLMs has largely been path-independent to model training, i.e., has focused on the final model weights and not the intermediate steps. However, building plausible models of human cognition using PLMs would benefit from considering the developmental alignment of their performance during training to the trajectories of children's thinking. Guided by psychometric tests of human intelligence, we choose four sets of tasks to investigate the alignment of ten popular families of PLMs and evaluate their available intermediate and final training steps. These tasks are Numerical ability, Linguistic abilities, Conceptual understanding, and Fluid reasoning. We find a striking regularity: regardless of model size, the developmental trajectories of PLMs consistently exhibit a window of maximal alignment to human cognitive development. Before that window, training appears to endow "blank slate" models with the requisite structure to be poised to rapidly learn from experience. After that window, training appears to serve the engineering goal of reducing loss but not the scientific goal of increasing alignment with human cognition.

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# Gloss2Text: LLMとSemantically Aware Label Smoothingを用いた手話グロス翻訳

Gloss2Text: Sign Language Gloss translation using LLMs and Semantically Aware Label Smoothing ( http://arxiv.org/abs/2407.01394v2 )

ライセンス: Link先を確認

Pooya Fayyazsanavi, Antonios Anastasopoulos, Jana Košecká,

(参考訳) ビデオから音声言語のテキストへの手話翻訳は、文法の違い、表現のニュアンス、話者や文脈による視覚的外観の大きなばらつきのために、特有の課題を伴う。ビデオに付与される中間的なグロス注釈は、翻訳プロセスを導くことを目的としている。本研究ではGloss2Textの翻訳段階に着目し、事前学習済みの大規模言語モデル(LLM)、データ拡張、およびグロス翻訳の曖昧性を利用する新しいラベル平滑化損失関数を活用することで、最先端手法の性能を大幅に改善するいくつかの進展を提案する。PHOENIX Weather 2014Tデータセットでの広範な実験とアブレーション研究を通じて、我々のアプローチはGloss2Text翻訳における最先端の性能を上回った。これは手話翻訳への有効性を示すとともに、将来の研究開発に向けた有望な方向性を示唆する。

Sign language translation from video to spoken text presents unique challenges owing to the distinct grammar, expression nuances, and high variation of visual appearance across different speakers and contexts. The intermediate gloss annotations of videos aim to guide the translation process. In our work, we focus on the Gloss2Text translation stage and propose several advances by leveraging pre-trained large language models (LLMs), data augmentation, and a novel label-smoothing loss function that exploits gloss translation ambiguities, significantly improving the performance of state-of-the-art approaches. Through extensive experiments and ablation studies on the PHOENIX Weather 2014T dataset, our approach surpasses state-of-the-art performance in Gloss2Text translation, indicating its efficacy in addressing sign language translation and suggesting promising avenues for future research and development.
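To illustrate what a semantically aware variant of label smoothing could look like, the sketch below contrasts uniform smoothing with a version that hands the smoothing mass to tokens in proportion to their similarity to the target. This is a guess at the general idea only: the abstract does not say how similarity is computed or how the mass is distributed, so the `similarities` input and both function names are hypothetical.

```python
import math


def uniform_label_smoothing(target, vocab_size, eps):
    """Standard label smoothing: eps is spread evenly over all non-target tokens."""
    off_value = eps / (vocab_size - 1)
    return [1.0 - eps if i == target else off_value for i in range(vocab_size)]


def similarity_label_smoothing(target, similarities, eps):
    """Spread eps over non-target tokens in proportion to similarity with the target.

    similarities[i] is a non-negative score between token i and the target token
    (a hypothetical input, e.g. derived from embedding similarity).
    """
    weights = [0.0 if i == target else max(s, 0.0) for i, s in enumerate(similarities)]
    total = sum(weights)
    if total == 0.0:  # no similarity information: fall back to uniform smoothing
        return uniform_label_smoothing(target, len(similarities), eps)
    return [1.0 - eps if i == target else eps * w / total for i, w in enumerate(weights)]


def cross_entropy(soft_targets, probs):
    """Cross-entropy between a soft target distribution and model probabilities."""
    return -sum(t * math.log(p) for t, p in zip(soft_targets, probs) if t > 0.0)


# Token 1 is a near-synonym of target token 0; tokens 2 and 3 are unrelated.
smoothed = similarity_label_smoothing(0, [1.0, 0.8, 0.1, 0.1], 0.1)
uniform = uniform_label_smoothing(0, 4, 0.1)
```

Under the similarity-weighted targets, a model that puts some probability on the near-synonym is penalized less than under uniform smoothing, which matches the intuition that several target words can be acceptable translations of one gloss.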

翻訳日:2024-07-16 04:18:12 公開日:2024-07-12

# メモリベース大規模言語モデルにおける干し草の山の中の針(Needle in the Haystack)

Needle in the Haystack for Memory Based Large Language Models ( http://arxiv.org/abs/2407.01437v2 )

ライセンス: Link先を確認

Elliot Nelson, Georgios Kollias, Payel Das, Subhajit Chaudhury, Soham Dan,

(参考訳) 現在の大規模言語モデル(LLM)は、単純な事実検索タスクでしばしば性能が低い。本稿では、動的に適応可能な外部メモリをLLMに結合することで、この問題を軽減できるかを検討する。この目的のために、外部連想メモリを用いる最近提案された言語モデルアーキテクチャであるLarimarを、パスキーテストやneedle-in-the-haystackテストを含む長文脈の想起タスクでテストする。テキストサンプルのエピソードを高速に読み書きできるLarimarの外部メモリは、テスト時に、学習中に見たものよりはるかに長い文脈を扱うために利用できることを示す。さらに、(長い文脈が書き込まれた)メモリからの潜在的な読み出しが、メモリをGPUの外に置いたままで、デコーダを正しい出力の生成へと導くことを示す。より大きなパラメータ数や修正されたアテンション機構を用いる、長文脈想起タスク向けの既存のTransformerベースのLLMアーキテクチャと比較して、比較的小さなサイズのLarimarは、タスク固有の学習やより長い文脈での学習を行わずに高い性能を維持できる。

Current large language models (LLMs) often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to a LLM can alleviate this problem. For this purpose, we test Larimar, a recently proposed language model architecture which uses an external associative memory, on long-context recall tasks including passkey and needle-in-the-haystack tests. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory (to which long contexts are written) control the decoder towards generating correct outputs, with the memory stored off of the GPU. Compared to existing transformer-based LLM architectures for long-context recall tasks that use larger parameter counts or modified attention mechanisms, a relatively smaller size Larimar is able to maintain strong performance without any task-specific training or training on longer contexts.
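For readers unfamiliar with the passkey and needle-in-the-haystack tests mentioned here, the sketch below shows how such a test case is commonly assembled: one short fact is buried at a chosen depth inside repetitive filler text, and the model is then asked to recall it. The filler sentence, the needle wording, and the scoring rule are illustrative assumptions, not the exact prompts used in the paper.

```python
def build_passkey_prompt(passkey, n_filler, depth,
                         filler="The grass is green. The sky is blue."):
    """Bury a passkey sentence among n_filler filler sentences.

    depth is a fraction in [0, 1]: 0 puts the needle first, 1 puts it last.
    """
    needle = f"The pass key is {passkey}. Remember it."
    position = min(max(round(depth * n_filler), 0), n_filler)
    sentences = [filler] * n_filler
    sentences.insert(position, needle)
    question = "What is the pass key?"
    return " ".join(sentences + [question])


def is_correct(model_answer, passkey):
    """Lenient scoring: the answer counts if it contains the passkey digits."""
    return str(passkey) in model_answer


prompt = build_passkey_prompt(48213, 10, 0.5)
```

Sweeping `n_filler` (context length) and `depth` (needle position) yields the usual grid of recall results. For a memory-augmented model, the filler and needle would presumably be written to the external memory rather than placed in the decoder's context.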

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# ブラックホール内部のテンソルネットワーク:非等長性、量子極値面、ワームホール

Tensor networks for black hole interiors: non-isometries, quantum extremal surfaces, and wormholes ( http://arxiv.org/abs/2407.01666v2 )

ライセンス: Link先を確認

Gracemarie Bueller, Oliver DeWolfe, Kenneth Higginbotham,

(参考訳) 双曲テンソルネットワークを用いてブラックホール内部のホログラフィック写像を構成し、Akers、Engelhardt、Harlow、Penington、Vardhanによって提案された非等長符号に局所性の概念を加える。これらのネットワークが提供する道具を用いて、地平線の背後における非等長性と量子極値面の関係を調べる。さらに、Akersらが導入したquditモデルに基づいて、これらの内部テンソルネットワークに限定的な動力学の概念を導入し、蒸発するブラックホールにおける量子極値面の時間発展を調べる。また、ブラックホール内部と放射を結ぶワームホールのテンソルネットワークによる記述を見いだし、Page時間以降に内部の状態と演算子が放射の中に符号化される機構を与える。特別な場合として、これらの非等長ブラックホール符号の動的な構成に非自明な有効動力学を組み込むために最近提案されたbackwards-forwards写像の、テンソルネットワークによる実現を構成する。

We use hyperbolic tensor networks to construct a holographic map for black hole interiors that adds a notion of locality to the non-isometric codes proposed by Akers, Engelhardt, Harlow, Penington, and Vardhan. We use tools provided by these networks to study the relationship between non-isometries and quantum extremal surfaces behind the horizon. Furthermore, we introduce a limited notion of dynamics for these interior tensor networks based on the qudit models introduced by Akers et al., and study the evolution of quantum extremal surfaces in an evaporating black hole. We also find a tensor network description of a wormhole connecting the black hole interior to the radiation, providing a mechanism for interior states and operators to be encoded in the radiation after the Page time. As a particular case, we construct a tensor network realization of the backwards-forwards maps recently proposed to incorporate non-trivial effective dynamics in dynamical constructions of these non-isometric black hole codes.

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# LPViT:Vision Transformer向けの低消費電力半構造化プルーニング

LPViT: Low-Power Semi-structured Pruning for Vision Transformers ( http://arxiv.org/abs/2407.02068v3 )

ライセンス: Link先を確認

Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin,

(参考訳) Vision Transformer(ViT)は、様々な画像解析タスクにおいて畳み込みニューラルネットワークに代わる有望な選択肢として登場し、同等以上の性能を示している。しかし、ViTの大きな欠点はリソースを大量に消費することであり、メモリフットプリント、計算量、消費電力の増加につながる。この高性能な技術を広く利用可能にし、より環境に優しいものにするには、ViTモデルを圧縮し、高い性能を維持しながらリソース要求を減らすことが不可欠である。本稿では、ViTのリソース消費の問題に対処する新しいブロック構造化プルーニングを導入し、精度とハードウェアアクセラレーションの間でバランスのとれたトレードオフを提供する。非構造化プルーニングやチャネル単位の構造化プルーニングとは異なり、ブロックプルーニングは線形層のブロック単位の構造を利用するため、より効率的な行列乗算が可能になる。このプルーニング方式を最適化するために、ブロックスパース構造に合わせて、推論時の高速化の最大化と消費電力の最小化を同時に行う、新しいハードウェア対応の学習目的関数を提案する。この目的関数は、経験的なルックアップテーブルを不要にし、パラメータ化された層の接続の削減のみに焦点を当てる。さらに、2次のテイラー近似と経験的最適化を用いて提案するハードウェア対応の目的関数を解き、ViTの学習後プルーニングを実現する軽量なアルゴリズムを提供する。DeiT-BやDeiT-Sを含む様々なViTアーキテクチャを対象にImageNetで広範な実験を行い、他のプルーニング手法と競合する性能を示すとともに、精度の維持と省電力の優れたバランスを達成した。特に、DeiT-Bでは専用ハードウェアで最大3.93倍、GPUで最大1.79倍の高速化を達成し、実際のGPU上で推論時の消費電力が1.4倍削減されることも確認した。

Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more environmentally friendly, it is essential to compress ViT models, reducing their resource requirements while maintaining high performance. In this paper, we introduce a new block-structured pruning to address the resource-intensive issue for ViTs, offering a balanced trade-off between accuracy and hardware acceleration. Unlike unstructured pruning or channel-wise structured pruning, block pruning leverages the block-wise structure of linear layers, resulting in more efficient matrix multiplications. To optimize this pruning scheme, our paper proposes a novel hardware-aware learning objective that simultaneously maximizes speedup and minimizes power consumption during inference, tailored to the block sparsity structure. This objective eliminates the need for empirical look-up tables and focuses solely on reducing parametrized layer connections. Moreover, our paper provides a lightweight algorithm to achieve post-training pruning for ViTs, utilizing second-order Taylor approximation and empirical optimization to solve the proposed hardware-aware objective. Extensive experiments on ImageNet are conducted across various ViT architectures, including DeiT-B and DeiT-S, demonstrating competitive performance with other pruning methods and achieving a remarkable balance between accuracy preservation and power savings. Especially, we achieve up to 3.93x and 1.79x speedups on dedicated hardware and GPUs respectively for DeiT-B, and also observe an inference power reduction by 1.4x on real-world GPUs.
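The difference between block pruning and unstructured pruning is easiest to see on a small matrix. The sketch below zeroes out whole square blocks of a linear layer's weights, ranked by a simple magnitude score. The magnitude criterion is a stand-in chosen for brevity: the paper's selection is driven by a hardware-aware objective solved with a second-order Taylor approximation, which is not reproduced here, and the function names are illustrative.

```python
def block_scores(weights, block):
    """Sum of squared weights for each block x block tile, keyed by its top-left corner."""
    rows, cols = len(weights), len(weights[0])
    scores = {}
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            scores[(r0, c0)] = sum(
                weights[r][c] ** 2
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
    return scores


def prune_blocks(weights, block, sparsity):
    """Return a copy of weights with the lowest-scoring fraction of blocks zeroed."""
    rows, cols = len(weights), len(weights[0])
    scores = block_scores(weights, block)
    n_prune = int(len(scores) * sparsity)
    to_prune = sorted(scores, key=scores.get)[:n_prune]
    pruned = [row[:] for row in weights]
    for r0, c0 in to_prune:
        for r in range(r0, min(r0 + block, rows)):
            for c in range(c0, min(c0 + block, cols)):
                pruned[r][c] = 0.0
    return pruned


weights = [
    [0.1, 0.1, 2.0, 2.0],
    [0.1, 0.1, 2.0, 2.0],
    [1.0, 1.0, 0.2, 0.2],
    [1.0, 1.0, 0.2, 0.2],
]
pruned = prune_blocks(weights, 2, 0.5)  # zero the two weakest 2x2 blocks
```

Because zeros arrive in whole tiles, a block-sparse kernel can skip entire tile multiplications, which is the source of the hardware speedups the abstract reports; unstructured sparsity with the same number of zeros gives no such regularity.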

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# 動的アルゴリズムとコンパイラの協調設計によるオンデバイス超解像のためのデータ過適合

Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design ( http://arxiv.org/abs/2407.02813v2 )

ライセンス: Link先を確認

Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma,

(参考訳) ディープニューラルネットワーク(DNN)は、様々なコンピュータビジョンアプリケーションで頻繁に使用されている。現在、ビデオ配信システムにおける新たなトレンドは、DNNの過適合特性を利用してビデオ解像度のアップスケーリングを行うことである。ビデオをチャンクに分割し、各チャンクに過適合させた超解像(SR)モデルを適用することで、SRモデルとビデオチャンクを組み合わせたこの方式は従来のビデオ伝送に取って代わり、ビデオ品質と伝送効率を向上させることができる。しかし、高い性能を保証するには多くのモデルとチャンクが必要であり、ユーザ側でのモデル切り替えとメモリフットプリントに多大なオーバーヘッドが生じる。この問題を解決するために、コンテンツを考慮したデータ処理パイプラインに支援された動的ディープニューラルネットワークによってモデル数を1つにまで削減するDy-DCAを提案する。これにより、計算資源を節約しながら性能を向上できる。さらに、ユーザ側で実際の高速化を達成するために、Dy-DCAの動的な特徴(動的な形状、サイズ、制御フローなど)を最適化し、融合コード生成や静的実行計画などの一連のコンパイル最適化を可能にするフレームワークを設計した。これらの手法により、本手法は市販の携帯電話上で、より高いPSNRとリアルタイム性能(33 FPS)を達成する。また、コンパイル最適化により、1.7$\times$の高速化と最大1.61$\times$のメモリ消費削減を達成する。コードは https://github.com/coulsonlee/Dy-DCA-ECCV2024 で公開されている。

Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks is able to replace traditional video transmission to enhance video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous overhead on model switching and memory footprints at the user end. To resolve such problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the model number down to one (Dy-DCA), which helps promote performance while conserving computational resources. Additionally, to achieve real acceleration on the user end, we designed a framework that optimizes dynamic features (e.g., dynamic shapes, sizes, and control flow) in Dy-DCA to enable a series of compilation optimizations, including fused code generation, static execution planning, etc. By employing such techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimization, we achieve a 1.7$\times$ speedup while saving up to 1.61$\times$ memory consumption. Code available in https://github.com/coulsonlee/Dy-DCA-ECCV2024.

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# 弱い潜伏因子はいつ統計的に推測できるのか?

When can weak latent factors be statistically inferred? ( http://arxiv.org/abs/2407.03616v2 )

ライセンス: Link先を確認

Jianqing Fan, Yuling Yan, Yuheng Zheng,

(参考訳) 本稿は、弱因子モデルの下での主成分分析(PCA)について、新しく包括的な推定・推論理論を確立する。この理論は、横断面方向に依存する固有(idiosyncratic)成分を許容し、ノイズレベルに対する因子強度、すなわち信号対雑音比について、ほぼ最小限の条件しか課さない。我々の理論は、横断面の次元$N$と時間の次元$T$の相対的な成長率にかかわらず適用できる。このより現実的な仮定と注目すべき結果には、まったく新しい技術的道具が必要となる。一般に用いられるleave-one-outの手法は、横断面方向の依存がある場合には適用できないためである。我々の理論のもう一つの注目すべき進展は、PCAの推論に関するものである。例えば$N\asymp T$の状況では、信号対雑音比(SNR)が$\log N$の多項式オーダーより速く増大する限り、PCAに基づく推定量の漸近正規性が成り立つことを示す。この結果は、$N$の多項式オーダーを必要とした先行研究を大きく上回る。我々の理論は完全に非漸近的であり、推定誤差と統計的推論の不確実性の水準の両方について有限標本での特徴付けを与える。注目すべき技術的革新は、PCAに基づく推定量の閉形式の1次近似であり、これが様々な統計的検定への道を開く。さらに、この理論を応用して、与えられた因子が未知の潜在因子の線形包に含まれるかどうかの検証、個々のユニットの因子負荷量における構造変化の検定、2つのユニットが同一のリスクエクスポージャーを持つかどうかの確認、システマティックリスクの信頼区間の構成を行う、実装しやすい統計量を設計する。実証研究により、検定結果と景気循環の間に示唆に富む相関があることが明らかになった。

This article establishes a new and comprehensive estimation and inference theory for principal component analysis (PCA) under the weak factor model that allows for cross-sectionally dependent idiosyncratic components under nearly minimal factor strength relative to the noise level, or signal-to-noise ratio. Our theory is applicable regardless of the relative growth rate between the cross-sectional dimension $N$ and temporal dimension $T$. This more realistic assumption and noticeable result require a completely new technical device, as the commonly-used leave-one-out trick is no longer applicable to the case with cross-sectional dependence. Another notable advancement of our theory is on PCA inference: for example, under the regime where $N\asymp T$, we show that the asymptotic normality for the PCA-based estimator holds as long as the signal-to-noise ratio (SNR) grows faster than a polynomial rate of $\log N$. This finding significantly surpasses prior work that required a polynomial rate of $N$. Our theory is entirely non-asymptotic, offering finite-sample characterizations for both the estimation error and the uncertainty level of statistical inference. A notable technical innovation is our closed-form first-order approximation of the PCA-based estimator, which paves the way for various statistical tests. Furthermore, we apply our theories to design easy-to-implement statistics for validating whether given factors fall in the linear spans of unknown latent factors, testing structural breaks in the factor loadings for an individual unit, checking whether two units have the same risk exposures, and constructing confidence intervals for systematic risks. Our empirical studies uncover insightful correlations between our test results and economic cycles.

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# 多言語ASRシステムの自己回帰デコーダの連続学習最適化

Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems ( http://arxiv.org/abs/2407.03645v2 )

ライセンス: Link先を確認

Chin Yuen Kwok, Jia Qi Yip, Eng Siong Chng,

(参考訳) 継続学習(CL)は、事前学習データでの性能を維持しながら、新しいデータで事前学習済みモデルをファインチューニングするものである。これは多言語ASR(MASR)の能力を拡張するうえで特に重要である。しかし、主にコンピュータビジョンや強化学習のタスク向けに設計された既存のCL手法は、MASRに直接適用すると最適に及ばない結果になることが多い。我々は、これはMASRモデルの自己回帰デコーダのCLが難しいためであるという仮説を立てる。これを検証するために、デコーダに対する4つの最適化を提案する。すなわち、デコーダ層の勾配手術(gradient surgery)、未使用のトークン埋め込みの凍結、新たに追加されたトークンの出力の抑制、学習率の再スケーリングである。Common VoiceデータセットからWhisperを10の未見言語に適応させる実験により、これらの最適化は、新しい言語のAWERを損なうことなく、事前学習済み言語の平均単語誤り率(AWER)をExperience Replayと比べて14.2%から12.4%に低減することを示した。

Continual Learning (CL) involves fine-tuning pre-trained models with new data while maintaining the performance on the pre-trained data. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, mainly designed for computer vision and reinforcement learning tasks, often yield sub-optimal results when directly applied to MASR. We hypothesise that this is because CL of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations on the decoder. They include decoder-layer gradient surgery, freezing unused token embeddings, suppressing output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) of pretrained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER of new languages.
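
One of the four optimizations named in the abstract, suppressing the output of newly added tokens, can be illustrated with a minimal sketch. The abstract does not describe the mechanism, so the version below assumes it amounts to masking the logits of new-language tokens while a pretrained language is being decoded. The function names and the toy vocabulary are hypothetical, and nothing here uses the Whisper API.

```python
import math

def suppress_new_tokens(logits, new_token_ids):
    """Return a copy of `logits` with the newly added token ids set to -inf.

    Assumption (not from the paper): suppression is done by logit masking,
    so masked tokens get zero probability after the softmax.
    """
    masked = list(logits)
    for idx in new_token_ids:
        masked[idx] = -math.inf
    return masked

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 5 tokens; ids 3 and 4 stand in for tokens added for a new language.
logits = [1.0, 2.0, 0.5, 3.0, 2.5]
probs = softmax(suppress_new_tokens(logits, new_token_ids=[3, 4]))
```

With the mask applied, the probability mass is shared only among the original tokens, which is one plausible way to keep decoding of pretrained languages from drifting toward the new vocabulary.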

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# SfM on-the-fly: 撮影したものからより良い3Dを得る

SfM on-the-fly: Get better 3D from What You Capture ( http://arxiv.org/abs/2407.03939v2 )

ライセンス: Link先を確認

Zhan Zongqian, Yu Yifei, Xia Rui, Gan Wentian, Xie Hong, Perda Giulio, Morelli Luca, Remondino Fabio, Wang Xin,

(参考訳) 過去20年間、Structure from Motion (SfM) はフォトグラメトリ、コンピュータビジョン、ロボティクスなどの分野において常に研究のホットスポットであったが、リアルタイム性能は最近になって関心が高まっているトピックである。本研究は、オリジナルのオンザフライSfM(Zhan et al., 2024)の上に構築され、撮影したものからより良い3Dを得るために、3つの新しい改良を加えた更新版を提示する。 (i) Hierarchical Navigable Small World (HNSW) グラフを用いることでリアルタイム画像マッチングをさらに強化し、真陽性の重複画像候補をより多く、より高速に同定する。 (ii) SfM結果を改善するために、頑健な階層的局所バンドル調整のための自己適応重み付け戦略を提案する。 (iii) 協調SfMを支援するために複数のエージェントを導入し、共通して登録された画像が現れたときに、複数の3D再構成をシームレスに完全な3Dシーンにマージする。様々な包括的実験により、提案したSfM法(on-the-fly SfMv2)は、より完全でロバストな3次元再構成を高い時間効率で生成できることを示す。コードはhttp://yifeiyu225.github.io/on-the-flySfMv2.github.io/で公開されている。

In the last twenty years, Structure from Motion (SfM) has been a constant research hotspot in the fields of photogrammetry, computer vision, robotics, etc., whereas real-time performance has only recently become a topic of growing interest. This work builds upon the original on-the-fly SfM (Zhan et al., 2024) and presents an updated version with three new advancements to get better 3D from what you capture: (i) real-time image matching is further boosted by employing Hierarchical Navigable Small World (HNSW) graphs, so that more true-positive overlapping image candidates are identified faster; (ii) a self-adaptive weighting strategy is proposed for robust hierarchical local bundle adjustment to improve the SfM results; (iii) multiple agents are included to support collaborative SfM and seamlessly merge multiple 3D reconstructions into a complete 3D scene when commonly registered images appear. Various comprehensive experiments demonstrate that the proposed SfM method (named on-the-fly SfMv2) can generate more complete and robust 3D reconstructions in a highly time-efficient way. Code is available at http://yifeiyu225.github.io/on-the-flySfMv2.github.io/.

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# 観察可能な近接面の検出:クロスドメイン3次元物体検出の新しいモデリングと評価

Detect Closer Surfaces that can be Seen: New Modeling and Evaluation in Cross-domain 3D Object Detection ( http://arxiv.org/abs/2407.04061v3 )

ライセンス: Link先を確認

Ruixiao Zhang, Yihong Wu, Juheon Lee, Adam Prugel-Bennett, Xiaohao Cai,

(参考訳) 自動運転向けの現在の3Dオブジェクト検出分野において、ドメイン適応技術の性能はまだ理想的なレベルに達していない。これは主に、ドメインをまたいで適用した際の車両サイズや走行環境の大きな違いによるものである。これらの要因が組み合わさって、特定のデータセットから学んだ知識の効果的な転移と応用を妨げている。既存の評価指標は、予測と正解(ground-truth)のバウンディングボックス間の2次元または3次元の重なりを計算することで、もともと単一ドメイン上での評価のために設計されているため、データセット間のサイズ差に起因する過適合問題に悩まされることが多い。これは、3Dオブジェクト検出モデルのクロスドメイン性能の評価に関する根本的な疑問を提起する。ドメインをまたいで適用された後も、元の3Dバウンディングボックスで優れた性能を維持することが本当にモデルに必要なのだろうか? 実用的な応用の観点からは、我々の主な関心の一つは、車両と他の障害物との衝突を防止することであり、特に車両サイズの正確な予測がはるかに難しいクロスドメインのシナリオにおいてそうである。言い換えれば、モデルがエゴ車両に最も近い表面を正確に識別できる限り、障害物を効果的に回避するには十分である。本稿では,エゴ車両のセンサに近い側の表面を検出する3Dオブジェクト検出モデルの能力を測定するための2つの指標を提案する。これらの指標は、クロスドメイン性能をより包括的かつ合理的に評価するために利用できる。さらに、EdgeHeadと呼ばれるリファインメントヘッドを提案し、学習可能な近い側の表面にモデルをより集中させることで、我々の新しい指標だけでなく、元のBEV/3D指標の下でも既存モデルのクロスドメイン性能を大きく向上させることができる。

The performance of domain adaptation technologies has not yet reached an ideal level in the current 3D object detection field for autonomous driving, mainly because of significant differences in vehicle sizes, as well as in the environments vehicles operate in, when models are applied across domains. These factors together hinder the effective transfer and application of knowledge learned from specific datasets. Since the existing evaluation metrics were originally designed for evaluation on a single domain by calculating the 2D or 3D overlap between the predicted and ground-truth bounding boxes, they often suffer from the overfitting problem caused by the size differences among datasets. This raises a fundamental question about evaluating 3D object detection models' cross-domain performance: do we really need models to maintain excellent performance on the original 3D bounding boxes after being applied across domains? From a practical application perspective, one of our main focuses is on preventing collisions between vehicles and other obstacles, especially in cross-domain scenarios where correctly predicting the size of vehicles is much more difficult. In other words, as long as a model can accurately identify the closest surfaces to the ego vehicle, that is sufficient to avoid obstacles effectively. In this paper, we propose two metrics to measure 3D object detection models' ability to detect the closer surfaces to the sensor on the ego vehicle, which can be used to evaluate their cross-domain performance more comprehensively and reasonably. Furthermore, we propose a refinement head, named EdgeHead, to guide models to focus more on the learnable closer surfaces, which can greatly improve the cross-domain performance of existing models not only under our new metrics but also under the original BEV/3D metrics.

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# Stephanie: 社会的会話における人間のインタラクションを模倣するためのステップバイステップ対話

Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations ( http://arxiv.org/abs/2407.04093v2 )

ライセンス: Link先を確認

Hao Yang, Hongyuan Lu, Xinhua Zeng, Yang Liu, Xiang Zhang, Haoran Yang, Yumeng Zhang, Shan Huang, Yiran Wei, Wai Lam,

(参考訳) 急速に発展する自然言語処理の分野では、対話システムは主に単一ステップの対話パラダイムを採用している。このパラダイムは効率的だが、人間の相互作用の深さと流動性を欠いており、自然には見えない。本稿では,人間の会話の継続的でダイナミックな性質を模倣するように設計された,新しいStep-by-Step Dialogue Paradigm(Stephanie)を紹介する。デュアルラーニング戦略と,さらに分割するポストエディット手法を用いることで,高品質なステップバイステップ対話データセットを生成し,既存の大規模言語モデルの微調整に活用して,ステップバイステップの対話を可能にした。我々はStephanieを詳細に提示する。従来の単一ステップ対話パラダイムと比較してその有効性を評価するために,専用に設計した自動評価と人手評価を行った。チャットボットの未来を促進するために、コード、Stephanieデータセット、Stephanie LLMを公開する予定である。

In the rapidly evolving field of natural language processing, dialogue systems primarily employ a single-step dialogue paradigm. Although this paradigm is efficient, it lacks the depth and fluidity of human interactions and does not appear natural. We introduce a novel Step-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing dynamic nature of human conversations. By employing a dual learning strategy and a further-split post-editing method, we generated and utilized a high-quality step-by-step dialogue dataset to fine-tune existing large language models, enabling them to perform step-by-step dialogues. We thoroughly present Stephanie. Tailored automatic and human evaluations are conducted to assess its effectiveness compared to the traditional single-step dialogue paradigm. We will release code, Stephanie datasets, and Stephanie LLMs to facilitate the future of chatbot eras.

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# NeuFair: ドロップアウトによるニューラルネットワークのフェアネス修復

NeuFair: Neural Network Fairness Repair with Dropout ( http://arxiv.org/abs/2407.04268v2 )

ライセンス: Link先を確認

Vishnu Asutosh Dasu, Ashish Kumar, Saeid Tizpaz-Niari, Gang Tan,

(参考訳) 本稿では,ディープニューラルネットワーク(DNN)における後処理バイアス緩和としてのニューロンのドロップアウトについて検討する。ニューラルネットワーク駆動のソフトウェアソリューションは、公平性に重大な影響を持つ社会的に重要な領域でますます適用されている。ニューラルネットワークは、データから統計的パターンを見つけるのに非常に優れているが、過去のデータに含まれる既存のバイアスを符号化し増幅する可能性がある。既存のバイアス軽減アルゴリズムでは、入力データセットや学習アルゴリズムを変更する必要があることが多い。我々は、ランダムにニューロンを落とすことでトレーニング中の過剰適合を防ぐ一般的なドロップアウト手法が、事前訓練されたDNNの公平性を改善するための効果的で、より侵入的でないアプローチになりうると仮定する。しかし、ドロップするニューロンの理想的な集合を見つけることは組合せ問題である。我々は,事前学習したDNNにおける不公平さを、トレーニング後の推論時のドロップアウトによって軽減する,後処理ランダム化アルゴリズムのファミリーであるNeuFairを提案する。我々のランダム化探索は、モデルの有用性を維持しながら差別を最小限に抑えるという目的によって導かれる。我々のランダム化アルゴリズムの設計は, モデルの性能劣化を最小限またはゼロに抑えつつ, 公平性を(最大69%)向上させるのに効果的かつ効率的であることを示す。本稿では,これらの現象を直感的に説明し,探索アルゴリズムの様々なハイパーパラメータが結果に与える影響を慎重に検討する。最後に、NeuFairと様々な最先端のバイアス緩和手法を経験的、概念的に比較する。

This paper investigates neuron dropout as a post-processing bias mitigation for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While neural networks are exceptionally good at finding statistical patterns from data, they may encode and amplify existing biases from the historical data. Existing bias mitigation algorithms often require modifying the input dataset or the learning algorithms. We posit that the prevalent dropout methods that prevent over-fitting during training by randomly dropping neurons may be an effective and less intrusive approach to improve the fairness of pre-trained DNNs. However, finding the ideal set of neurons to drop is a combinatorial problem. We propose NeuFair, a family of post-processing randomized algorithms that mitigate unfairness in pre-trained DNNs via dropouts during inference after training. Our randomized search is guided by an objective to minimize discrimination while maintaining the model's utility. We show that our design of randomized algorithms is effective and efficient in improving fairness (up to 69%) with minimal or no model performance degradation. We provide intuitive explanations of these phenomena and carefully examine the influence of various hyperparameters of search algorithms on the results. Finally, we empirically and conceptually compare NeuFair to different state-of-the-art bias mitigators.
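
The search described in the abstract can be pictured with a small sketch. The abstract does not give the algorithms' details, so the greedy random walk below is a stand-in and not NeuFair itself. The names `random_dropout_search` and `toy_evaluate`, the per-neuron numbers, and the accuracy floor are all hypothetical; a real setup would replace `toy_evaluate` with a forward pass of the pre-trained DNN under the candidate mask.

```python
import random

def random_dropout_search(n_neurons, evaluate, min_accuracy, steps=200, seed=0):
    """Greedy random walk over keep/drop masks.

    `evaluate(mask)` must return (unfairness, accuracy), where mask[i] is
    True if neuron i is kept at inference time.
    """
    rng = random.Random(seed)
    keep = [True] * n_neurons              # start from the unmodified network
    best_unfairness, _ = evaluate(keep)
    for _ in range(steps):
        candidate = list(keep)
        flip = rng.randrange(n_neurons)    # toggle one randomly chosen neuron
        candidate[flip] = not candidate[flip]
        unfairness, accuracy = evaluate(candidate)
        # Accept only masks that preserve utility and strictly reduce unfairness.
        if accuracy >= min_accuracy and unfairness < best_unfairness:
            keep, best_unfairness = candidate, unfairness
    return keep, best_unfairness

# Toy stand-in for a real model: made-up per-neuron effects.
BIAS = [0.30, -0.05, 0.10, 0.02]   # contribution to a group disparity score
COST = [0.01, 0.20, 0.02, 0.15]    # accuracy lost when the neuron is dropped

def toy_evaluate(mask):
    """Return (unfairness, accuracy) of the toy model under `mask`."""
    unfairness = abs(sum(b for b, kept in zip(BIAS, mask) if kept))
    accuracy = 0.90 - sum(c for c, kept in zip(COST, mask) if not kept)
    return unfairness, accuracy

mask, unfairness = random_dropout_search(4, toy_evaluate, min_accuracy=0.85)
```

Because a candidate is accepted only when it is feasible and strictly better, the returned mask never has higher unfairness than the original network and never falls below the accuracy floor. The combinatorial nature of the problem is visible in the mask space, which doubles with every additional neuron.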

翻訳日:2024-07-16 04:08:24 公開日:2024-07-12

# パーソナライズによる公正なフェデレーションデータクラスタリング - 多様なデータ分布間のギャップを埋める

Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions ( http://arxiv.org/abs/2407.04302v2 )

ライセンス: Link先を確認

Shivam Gupta, Tarushi, Tsering Wangzes, Shweta Jain,

(参考訳) エッジデバイスからのデータの急速な増加は、機械学習アルゴリズムの性能向上を促進してきた。しかしながら、生成されたデータはクライアントデバイスに存在するため、従来の機械学習パラダイムは主に2つの課題に直面する。1つはトレーニングのためのデータの集中化であり、もう1つは、生成されたデータの大部分でクラスラベルが欠落しており、高いコストと専門知識の欠如のために、クライアントが手動でデータをラベル付けするインセンティブが非常に低いことである。これらの問題を解決するために、教師なしのフェデレーションデータクラスタリングを用いて、プライバシを保護した分散方式でラベルなしデータを扱う初期の試みがなされてきた。目標は、クライアント上で利用可能なデータを、実際のデータ交換なしで、$k$個のパーティション(クラスタと呼ばれる)に分割することである。既存のアルゴリズムのほとんどは、クライアント間のデータ分布パターンに強く依存しているか、あるいは計算コストが高い。さらに、ほとんどの現実的なシナリオではクライアント間でデータが偏っているため、既存のモデルではクライアントが高いクラスタリングコストを被り、フェデレーションプロセスへの参加に消極的になる可能性がある。そこで,我々はフェデレーションクラスタリングにパーソナライゼーションの考え方を初めて導入する。目標は、より低いクラスタリングコストの達成と、クライアント間で均一なコストの達成とのバランスをとることである。サーバとクライアント間の1ラウンドの通信でこれらの目標に対処するp-FClusを提案する。我々は,様々なフェデレーションデータセットに対してp-FClusの有効性を検証し、データに依存しない性質、任意の有限$\ell$-normへの適用可能性、そしてコストと分散の同時低減を示した。

The rapid growth of data from edge devices has catalyzed the performance of machine learning algorithms. However, the generated data resides on client devices, so traditional machine learning paradigms face two major challenges: first, centralizing the data for training; and second, class labels are missing for most of the generated data, and clients have very little incentive to label their data manually owing to high cost and lack of expertise. To overcome these issues, there have been initial attempts to handle unlabelled data in a privacy-preserving, distributed manner using unsupervised federated data clustering. The goal is to partition the data available on clients into $k$ partitions (called clusters) without any actual exchange of data. Most of the existing algorithms are highly dependent on data distribution patterns across clients or are computationally expensive. Furthermore, because data is skewed across clients in most practical scenarios, existing models might leave clients with a high clustering cost, making them reluctant to participate in the federated process. To address this, we are the first to introduce the idea of personalization in federated clustering. The goal is to strike a balance between achieving a lower clustering cost and, at the same time, achieving a uniform cost across clients. We propose p-FClus, which addresses these goals in a single round of communication between server and clients. We validate the efficacy of p-FClus against a variety of federated datasets, showcasing its data-independent nature and its applicability to any finite $\ell$-norm, while simultaneously achieving lower cost and variance.
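
To make the single-round setting concrete, here is a minimal sketch of a generic one-shot federated clustering scheme: each client clusters its own data, only the local centers travel to the server, and the server clusters those centers. This is not p-FClus, whose personalization step the abstract does not describe. The 1-D data, the client names, and the deterministic initialization are assumptions chosen to keep the example short.

```python
def kmeans_1d(points, k, iters=20):
    """Lloyd's algorithm on 1-D data with a deterministic initialization."""
    pts = sorted(points)
    # Spread the initial centers across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        # Keep the previous center when a group ends up empty.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)

def clustering_cost(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points) / len(points)

# Raw data never leaves a client; only the k local centers are sent.
clients = {
    "client_a": [0.9, 1.0, 1.1, 9.8, 10.0],
    "client_b": [1.2, 0.8, 10.1, 10.3, 9.9],
}
k = 2
local_centers = [c for data in clients.values() for c in kmeans_1d(data, k)]
global_centers = kmeans_1d(local_centers, k)   # the single server-side step
per_client_cost = {name: clustering_cost(data, global_centers)
                   for name, data in clients.items()}
```

The `per_client_cost` dictionary is the quantity the abstract cares about: a fair scheme would aim to keep these values both low and close to each other across clients, which matters most when client data is skewed.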

翻訳日:2024-07-16 04:08:23 公開日:2024-07-12

# Segment any 4D Gaussians

Segment Any 4D Gaussians ( http://arxiv.org/abs/2407.04504v2 )

ライセンス: Link先を確認

Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang,

(参考訳) XR/VRでは、現実世界のモデリング、理解、再構築が不可欠である。近年,3D Gaussian Splatting (3D-GS) 法は3次元シーンのモデリングと理解において顕著な成功を収めている。同様に、様々な4D表現は、4D世界のダイナミクスを捉える能力を示している。しかし、4D表現におけるセグメンテーションに焦点を当てた研究は不足している。本稿では, 4D Gaussianに基づく4Dデジタル世界において, あらゆるものをセグメント化する最初期のフレームワークの一つである Segment Any 4D Gaussians (SA4D) を提案する。 SA4Dでは、Gaussianのドリフトを扱うために効率的な時間的アイデンティティ特徴場を導入し、ノイズの多いスパースな入力から正確なアイデンティティ特徴を学習できる可能性を持たせている。さらに, アーティファクトを除去するために, 4次元セグメンテーションのリファインメント処理を提案する。我々のSA4Dは4D Gaussianにおいて数秒以内に高精度かつ高品質なセグメンテーションを実現し、高品質なanything maskの除去、再着色、合成、レンダリングを行う能力を示す。さらなるデモは https://jsxzs.github.io/sa4d/ で公開されている。

Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks. More demos are available at: https://jsxzs.github.io/sa4d/.

翻訳日:2024-07-16 04:08:23 公開日:2024-07-12

# 強いLLMを判断する弱いLLMによるスケーラブルな監視について

On scalable oversight with weak LLMs judging strong LLMs ( http://arxiv.org/abs/2407.04622v2 )

ライセンス: Link先を確認

Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah,

(参考訳) スケーラブルな監視プロトコルは、人間が超人的AIを正確に監督できるようにすることを目的としている。本稿では,2つのAIが判定者を説得しようと競い合う討論(debate)と、単一のAIが質問を行う判定者を説得しようとするコンサルタンシー(consultancy)を調査し、AIなしで判定者が直接回答する直接質問応答のベースラインと比較する。大規模言語モデル(LLM)をAIエージェントと人間の判定者の代役の両方として使用し、判定者モデルはエージェントモデルよりも弱いものとする。我々は、判定者とエージェントの間のさまざまな非対称性についてベンチマークを行い、情報非対称性を持つ単一の抽出型QAタスクに関する以前の研究を拡張して、数学、コーディング、論理、マルチモーダル推論の非対称性も含める。コンサルタントが正しい答え/誤った答えのどちらを主張するかをランダムに割り当てられた場合、討論はすべてのタスクでコンサルタンシーを上回ることがわかった。討論と直接質問応答を比較すると、結果はタスクの種類に依存する。情報非対称性のある抽出型QAタスクでは討論が直接質問応答を上回るが、情報非対称性のない他のタスクでは結果はまちまちである。以前の研究では、討論者やコンサルタントに主張すべき答えを割り当てていた。代わりに、どの答えを主張するかを選ばせると、判定者が誤った答えに説得される頻度は、コンサルタンシーよりも討論の方が低いことがわかった。さらに、より強力な討論者モデルは判定者の精度を高めるが、その効果は従来の研究よりも控えめであることがわかった。

Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.
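
The three protocols compared in the abstract differ mainly in who speaks before the judge decides, which a short sketch can make explicit. The turn order, the number of rounds, and the callable signatures below are assumptions, and the sketch omits the judge's questions in consultancy for brevity. The toy agent and toy judge are hypothetical stand-ins for LLM calls.

```python
def direct_qa(judge, question, options):
    """Baseline: the judge answers with no AI input."""
    return judge(question, options, transcript=[])

def consultancy(judge, consultant, question, options, assigned, rounds=2):
    """A single consultant argues for its assigned answer, then the judge decides."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("consultant", consultant(question, assigned, transcript)))
    return judge(question, options, transcript=transcript)

def debate(judge, debater_a, debater_b, question, options, rounds=2):
    """Two debaters argue for opposing answers, then the judge decides."""
    transcript = []
    for _ in range(rounds):
        transcript.append(("A", debater_a(question, options[0], transcript)))
        transcript.append(("B", debater_b(question, options[1], transcript)))
    return judge(question, options, transcript=transcript)

def toy_agent(question, answer, transcript):
    """Stand-in for a strong model: simply asserts its answer."""
    return f"I argue that the answer is {answer}."

def toy_judge(question, options, transcript):
    """Stand-in for a weak judge: picks the most-mentioned option, ties go to the first."""
    counts = [sum(opt in text for _, text in transcript) for opt in options]
    return options[counts.index(max(counts))]

question, options = "What is 2 + 2?", ["4", "5"]
baseline = direct_qa(toy_judge, question, options)
misled = consultancy(toy_judge, toy_agent, question, options, assigned="5")
contested = debate(toy_judge, toy_agent, toy_agent, question, options)
```

In this toy setup the consultant assigned the wrong answer sways the judge, while in debate the opposing debater offsets it. That outcome is built into the toy judge and says nothing about real models; it only shows where the structural difference between the protocols lies.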

翻訳日:2024-07-16 04:08:23 公開日:2024-07-12

# Toucan: 150のアフリカ言語ペアのための多対多翻訳

Toucan: Many-to-Many Translation for 150 African Language Pairs ( http://arxiv.org/abs/2407.04796v2 )

ライセンス: Link先を確認

AbdelRahim Elmadany, Ife Adebara, Muhammad Abdul-Mageed,

(参考訳) 我々は、アフリカの言語に特に焦点を当て、低リソース言語のための機械翻訳(MT)を改善するために設計されたリソースのコレクションを導入することで、自然言語処理(NLP)における顕著なギャップに対処する。まず、それぞれ12億と37億のパラメータを持つ2つの言語モデル、Cheetah-1.2BとCheetah-3.7Bを紹介する。次に、これらのモデルを微調整して、156のアフリカ言語ペアをサポートするように設計された、アフロセントリックな機械翻訳モデルであるToucanを作成する。 Toucanを評価するため、我々はAfroLingu-MTと呼ばれる機械翻訳評価用の広範なベンチマークを慎重に開発した。Toucanは他のモデルを大幅に上回り、アフリカの言語のMTにおいて顕著な性能を示す。最後に、新しいモデルspBLEU-1Kを訓練し、614のアフリカ言語を含む1K言語をカバーするように翻訳評価指標を強化する。この研究は、NLP分野を前進させ、特にアフリカなど言語資源が限られた地域で、異文化間の理解と知識交換を促進することを目的としている。 ToucanプロジェクトのGitHubリポジトリはhttps://github.com/UBC-NLP/Toucanで公開されている。

We address a notable gap in Natural Language Processing (NLP) by introducing a collection of resources designed to improve Machine Translation (MT) for low-resource languages, with a specific focus on African languages. First, we introduce two language models (LMs), Cheetah-1.2B and Cheetah-3.7B, with 1.2 billion and 3.7 billion parameters respectively. Next, we finetune the aforementioned models to create Toucan, an Afrocentric machine translation model designed to support 156 African language pairs. To evaluate Toucan, we carefully develop an extensive machine translation benchmark, dubbed AfroLingu-MT, tailored for evaluating machine translation. Toucan significantly outperforms other models, showcasing its remarkable performance on MT for African languages. Finally, we train a new model, spBLEU-1K, to enhance translation evaluation metrics, covering 1K languages, including 614 African languages. This work aims to advance the field of NLP, fostering cross-cultural understanding and knowledge exchange, particularly in regions with limited language resources such as Africa. The GitHub repository for the Toucan project is available at https://github.com/UBC-NLP/Toucan.

翻訳日:2024-07-16 04:08:23 公開日:2024-07-12

# LaSe-E2V:言語誘導型セマンティック・アウェア・イベント・ビデオ再構成を目指して

LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction ( http://arxiv.org/abs/2407.05547v2 )

ライセンス: Link先を確認

Kanghao Chen, Hangyu Li, JiaZhou Zhou, Zeyu Wang, Lin Wang,

(参考訳) イベントカメラは、標準カメラと比較して低レイテンシ、高時間分解能、高ダイナミックレンジ(HDR)などの利点を持つ。撮像パラダイムが大きく異なるため、イベントベースと標準的なコンピュータビジョンを橋渡しするイベント・ツー・ビデオ(E2V)再構成が主要な研究の流れとなっている。しかし、イベントカメラはエッジと動きの情報を局所的に検出するだけであり、このタスクは本質的に不良設定であるため、依然として困難である。その結果、再構成されたビデオは、主にイベントデータの曖昧な意味情報に起因する、アーティファクトや局所的なぼけに悩まされることが多い。本稿では,言語は自然に豊富な意味情報を伝達し,E2V再構成のセマンティック一貫性を確保するうえで非常に優れていることを見出した。そこで本稿では,テキスト条件付き拡散モデルに支えられ,言語誘導の観点からセマンティックを考慮した高品質なE2V再構成を実現する,LaSe-E2Vという新しいフレームワークを提案する。しかし、拡散モデル固有の多様性とランダム性のため、それらを直接適用してE2V再構成の空間的・時間的一貫性を実現することはほとんど不可能である。そこで,まずイベントデータをデノイジングパイプラインに効果的に条件付けするイベント誘導時空間アテンション(ESA)モジュールを提案する。次に、時間的コヒーレンスを確保するためのイベント対応マスク損失と、空間的一貫性を高めるためのノイズ初期化戦略を導入する。イベント・テキスト・ビデオのペアデータが存在しないため、既存のE2Vデータセットを集約し、トレーニングと評価のためにタグ付けモデルを使用してテキスト記述を生成する。様々な難しいシナリオ(例えば、高速な動き、低照度)をカバーする3つのデータセットでの広範な実験は、我々の手法の優位性を実証している。

Event cameras harness advantages such as low latency, high temporal resolution, and high dynamic range (HDR), compared to standard cameras. Due to the distinct imaging paradigm shift, a dominant line of research focuses on event-to-video (E2V) reconstruction to bridge event-based and standard computer vision. However, this task remains challenging due to its inherently ill-posed nature: event cameras only detect the edge and motion information locally. Consequently, the reconstructed videos are often plagued by artifacts and regional blur, primarily caused by the ambiguous semantics of event data. In this paper, we find language naturally conveys abundant semantic information, rendering it stunningly superior in ensuring semantic consistency for E2V reconstruction. Accordingly, we propose a novel framework, called LaSe-E2V, that can achieve semantic-aware high-quality E2V reconstruction from a language-guided perspective, buttressed by the text-conditional diffusion models. However, due to diffusion models' inherent diversity and randomness, it is hardly possible to directly apply them to achieve spatial and temporal consistency for E2V reconstruction. Thus, we first propose an Event-guided Spatiotemporal Attention (ESA) module to condition the event data to the denoising pipeline effectively. We then introduce an event-aware mask loss to ensure temporal coherence and a noise initialization strategy to enhance spatial consistency. Given the absence of event-text-video paired data, we aggregate existing E2V datasets and generate textual descriptions using the tagging models for training and evaluation. Extensive experiments on three datasets covering diverse challenging scenarios (e.g., fast motion, low light) demonstrate the superiority of our method.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# PORCA:部分観測データによる根本原因解析

PORCA: Root Cause Analysis with Partially Observed Data ( http://arxiv.org/abs/2407.05869v2 )

ライセンス: Link先を確認

Chang Gong, Di Yao, Jin Wang, Wenbin Li, Lanting Fang, Yongtao Xie, Kaiyu Feng, Peng Han, Jingping Bi,

(参考訳) 根本原因分析(RCA)は、複雑なシステムから因果構造を発見し解析することによって、システム障害の根本的な原因を特定することを目的としている。これは多くのアプリケーションドメインで広く使われている。信頼性の高い診断結論は、システム障害と経済的損失を軽減する上で非常に重要である。しかし、以前の研究では、システムの完全な観測を暗黙的に仮定しており、部分観測(すなわち、欠損ノードと潜在的な機能不全)の影響を無視している。その結果、信頼できるRCA結果を導出できない。本稿では, 部分観測における未観測の交絡因子と不均一性の問題を明らかにし, 部分観測データを用いた根本原因分析という新たな問題を提起する。そこで本研究では,未観測の交絡因子と未観測の不均一性の両方の下で信頼性の高い根本原因を探索できる新しいRCAフレームワークであるPORCAを提案する。 PORCAは、magnified score-based causal discoveryを利用して、未観測の交絡因子の下で非巡回有向混合グラフを効率的に最適化する。さらに、適応的なサンプル重みを提供する、不均一性を考慮したスケジューリング戦略も開発している。 1つの合成データセットと2つの実世界のデータセットに対する広範な実験結果は、提案フレームワークの有効性と優位性を示している。

Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure from complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume a full observation of the system, thereby neglecting the effect of partial observation (i.e., missing nodes and latent malfunction). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and heterogeneity in partial observation and formulate a new problem of root cause analysis with partially observed data. To address it, we propose PORCA, a novel RCA framework that can explore reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# PAS:データ効率の良いPlug-and-Play Prompt Augmentation System

PAS: Data-Efficient Plug-and-Play Prompt Augmentation System ( http://arxiv.org/abs/2407.06027v3 )

ライセンス: Link先を確認

Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou,

(参考訳) 近年、Large Language Models(LLMs)の台頭により、プラグアンドプレイAIシステムへの需要が高まっている。様々なAI技術の中で、プロンプトエンジニアリングは特に重要である。しかし、学習曲線が急であり多大な時間投資が必要なため、ユーザーはプロンプトを書くことに困難を感じることが多く、既存の自動プロンプトエンジニアリング(APE)モデルも使いにくい場合がある。この問題に対処するために, LLMベースのプラグアンドプレイAPEシステムであるPASを提案する。 PASは、高品質で自動生成されたプロンプト補完データセットで訓練されたLLMを使用し、卓越した性能を実現している。総合的なベンチマークでは、PASは従来のAPEモデルと比較して平均6.09ポイントの改善を示し、最先端(SoTA)の結果を達成している。さらに、PASは非常に効率的で、わずか9000のデータポイントでSoTA性能を実現している。加えて、PASは追加の人的労力を必要とせずに、プロンプト拡張データを自律的に生成することができる。この柔軟性により、既存のすべてのLLMと互換性があり、幅広いタスクに適用できる。 PASは人間による評価でも優れており、ユーザ向けプラグインとしての適合性を裏付けている。高い性能、効率、柔軟性の組み合わせにより、PASはプロンプトエンジニアリングの改善を通じてLLMのユーザビリティと有効性を向上させる貴重なシステムとなっている。

In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficult to use. To address this issue, we propose PAS, an LLM-based plug-and-play APE system. PAS utilizes LLMs trained on high-quality, automatically generated prompt complementary datasets, resulting in exceptional performance. In comprehensive benchmarks, PAS achieves state-of-the-art (SoTA) results compared to previous APE models, with an average improvement of 6.09 points. Moreover, PAS is highly efficient, achieving SoTA performance with only 9000 data points. Additionally, PAS can autonomously generate prompt augmentation data without requiring additional human labor. Its flexibility also allows it to be compatible with all existing LLMs and applicable to a wide range of tasks. PAS excels in human evaluations, underscoring its suitability as a plug-in for users. This combination of high performance, efficiency, and flexibility makes PAS a valuable system for enhancing the usability and effectiveness of LLMs through improved prompt engineering.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 構造化された生成: 階層的クラスタを用いて拡散モデルを誘導する

Structured Generations: Using Hierarchical Clusters to guide Diffusion Models ( http://arxiv.org/abs/2407.06124v2 )

ライセンス: Link先を確認

Jorge da Silva Goncalves, Laura Manduchi, Moritz Vandenhirtz, Julia E. Vogt,

(参考訳) 本稿では,階層的クラスタリングをDenoising Diffusion Probabilistic Models (DDPMs) の枠組みに統合したDiffuse-TreeVAEを提案する。提案手法は,学習した潜在木VAE構造体の根埋め込みから新たな画像を生成し,階層的な経路を伝播し,第2段階のDDPMを用いて各データクラスタの異なる高品質な画像を洗練・生成する。その結果、画像の明瞭度を向上するだけでなく、生成されたサンプルがそれぞれのクラスタに代表されることを保証するモデルとなり、従来のVAEベースの手法の限界に対処し、クラスタリングベースの生成モデリングの状況を改善する。

This paper introduces Diffuse-TreeVAE, a deep generative model that integrates hierarchical clustering into the framework of Denoising Diffusion Probabilistic Models (DDPMs). The proposed approach generates new images by sampling from a root embedding of a learned latent tree VAE-based structure; it then propagates through hierarchical paths and utilizes a second-stage DDPM to refine and generate distinct, high-quality images for each data cluster. The result is a model that not only improves image clarity but also ensures that the generated samples are representative of their respective clusters, addressing the limitations of previous VAE-based methods and advancing the state of clustering-based generative modeling.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# SideSeeing: 歩道アセスメントのためのマルチモーダルデータセットとツールコレクション

SideSeeing: A multimodal dataset and collection of tools for sidewalk assessment ( http://arxiv.org/abs/2407.06464v2 )

ライセンス: Link先を確認

R. J. P. Damaceno, L. Ferreira, F. Miranda, M. Hosseini, R. M. Cesar Jr,

(参考訳) 本稿では、構築環境(built environment)を評価するためのツールとデータセットを提供する新しいイニシアティブであるSideSeeingを紹介する。街路レベルのデータの取得、読み込み、分析のためのフレームワークを提示する。このフレームワークを用いて,胸部に装着したモバイルデバイスで撮影した同期映像とセンサデータ(加速度計,ジャイロスコープ,磁力計,GPS)を統合した新しいデータセットを収集した。それぞれのデータサンプルは、ブラジルと米国の病院近くの歩道を撮影するユーザーが通った経路を表している。データセットは、9つの病院の周囲12kmをカバーする3時間のコンテンツからなり、325,000のビデオフレームと対応するセンサーデータを含んでいる。さらに,歩道のシーン識別のために特別に作成した新しい68要素の分類体系を提示する。 SideSeeingは、都市の専門家が詳細な歩道アクセシビリティ評価に利用できる一連のツールへの一歩である。 SideSeeingのデータとツールは https://sites.usp.br/sideseeing/ で公開されている。

This paper introduces SideSeeing, a novel initiative that provides tools and datasets for assessing the built environment. We present a framework for street-level data acquisition, loading, and analysis. Using the framework, we collected a novel dataset that integrates synchronized video footage captured from chest-mounted mobile devices with sensor data (accelerometer, gyroscope, magnetometer, and GPS). Each data sample represents a path traversed by a user filming sidewalks near hospitals in Brazil and the USA. The dataset encompasses three hours of content covering 12 kilometers around nine hospitals, and includes 325,000 video frames with corresponding sensor data. Additionally, we present a novel 68-element taxonomy specifically created for sidewalk scene identification. SideSeeing is a step towards a suite of tools that urban experts can use to perform in-depth sidewalk accessibility evaluations. SideSeeing data and tools are publicly available at https://sites.usp.br/sideseeing/.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 視覚言語モデルは盲目です

Vision language models are blind ( http://arxiv.org/abs/2407.06581v3 )

ライセンス: Link先を確認

Pooyan Rahmanzadehgervi, Logan Bolton, Mohammad Reza Taesiri, Anh Totti Nguyen,

(参考訳) 視覚機能を備えた大規模言語モデル(VLM)、例えばGPT-4oやGemini 1.5 Proは、数え切れないほどの画像テキストアプリケーションを支え、多くの視覚理解ベンチマークで高いスコアを得ている。我々はBlindTestを提案する。BlindTestは、人間にとっては極めて簡単な7つの視覚タスクのスイートであり、例えば次のものを識別する。 (a) 2つの円が重なっているか否か (b) 2つの線が交差するか否か (c) 単語の中でどの文字が丸で囲まれているか (d) オリンピックのようなロゴの円の数を数える。驚いたことに、最先端の4つのVLMは平均してベンチマークで56.20%しか正確ではなく、 Sonnet-3.5が最も正確である(73.77%)。 BlindTestでは、VLMは正確な空間情報と(0から10までの)カウントを必要とするタスクに苦労し、時には、細部がぼやけて見える近視の人が経験に基づいて推測しているような印象を与える。コードは、https://vlmsareblind.github.io/ で入手できる。

Large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, are powering countless image-text applications and scoring high on many vision-understanding benchmarks. We propose BlindTest, a suite of 7 visual tasks that are absurdly easy for humans, such as (a) identifying whether two circles overlap; (b) identifying whether two lines intersect; (c) identifying which letter is being circled in a word; and (d) counting the circles in an Olympic-like logo. Surprisingly, four state-of-the-art VLMs are, on average, only 56.20% accurate on our benchmark, with Sonnet-3.5 being the best (73.77% accuracy). On BlindTest, VLMs struggle with tasks that require precise spatial information and counting (from 0 to 10), sometimes giving the impression of a person with myopia who sees fine details as blurry and makes educated guesses. Code is available at: https://vlmsareblind.github.io/
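
Tasks (a) and (b) have exact geometric answers, which is what makes them usable as ground truth. The sketch below shows one way such labels could be computed. It is an illustration only and is not taken from the BlindTest code; the function names and the conventions for touching circles and collinear segments are assumptions.

```python
import math

def circles_overlap(center1, radius1, center2, radius2):
    """True if the two disks touch or overlap."""
    return math.dist(center1, center2) <= radius1 + radius2

def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, 0, or -1."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)

def segments_intersect(a, b, c, d):
    """True if segment ab meets segment cd.

    Simplification: two collinear overlapping segments are reported as
    non-intersecting, which keeps the test to four orientation checks.
    """
    return (_orientation(a, b, c) != _orientation(a, b, d)
            and _orientation(c, d, a) != _orientation(c, d, b))

overlapping = circles_overlap((0, 0), 1.0, (1.5, 0), 1.0)   # centers 1.5 apart, radii sum to 2
separate = circles_overlap((0, 0), 1.0, (3, 0), 1.0)        # centers 3 apart, radii sum to 2
crossing = segments_intersect((0, 0), (2, 2), (0, 2), (2, 0))
parallel = segments_intersect((0, 0), (1, 0), (0, 1), (1, 1))
```

Each check is a handful of arithmetic operations, which underlines the contrast the abstract draws between how simple these questions are to answer exactly and how often the evaluated models get them wrong.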

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# Mobius:テキスト・ビデオ生成タスクのための高能率空間時間並列学習パラダイム

Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task ( http://arxiv.org/abs/2407.06617v2 )

ライセンス: Link先を確認

Yiran Yang, Jinchao Zhang, Ying Deng, Jie Zhou,

(参考訳) テキスト・トゥ・イメージ(T2I)生成タスクの成功に触発されて、多くの研究者がテキスト・トゥ・ビデオ(T2V)生成タスクに力を注いでいる。 T2Vフレームワークの多くは、通常、T2Iモデルを継承し、動的なビデオを生成するために追加の時間層を加えて学習する。これは微調整タスクと見なすことができる。しかし、従来の3D-Unetは直列モードであり、時間層は空間層の後に続くため、その直列的な特徴フローによってGPUメモリとトレーニング時間の消費が大きくなる。我々は、大規模な拡散モデルと膨大なデータセットの下では、この直列モードはより多くのトレーニングコストをもたらし、環境に優しくなく、T2Vの発展にも適さないと考えている。そこで本稿では,T2Vタスクのための高効率な時空間並列学習パラダイムであるMobiusを提案する。我々の3D-Unetでは、時間層と空間層は並列であり、特徴フローとバックプロパゲーションを最適化する。 Mobiusは24%のGPUメモリと12%のトレーニング時間を節約し、T2Vの微調整タスクを大幅に改善し、AIGCコミュニティに新たな洞察を与える。コードは将来公開する予定である。

Inspired by the success of the text-to-image (T2I) generation task, many researchers are devoting themselves to the text-to-video (T2V) generation task. Most T2V frameworks inherit from a T2I model and add extra temporal layers that are trained to generate dynamic videos, which can be viewed as a fine-tuning task. However, the traditional 3D-Unet operates in a serial mode in which the temporal layers follow the spatial layers, and this serial feature flow results in high GPU memory and training time consumption. We believe that this serial mode will incur greater training costs with large diffusion models and massive datasets, which is neither environmentally friendly nor suitable for the development of T2V. Therefore, we propose a highly efficient spatial-temporal parallel training paradigm for T2V tasks, named Mobius. In our 3D-Unet, the temporal layers and spatial layers are parallel, which optimizes the feature flow and backpropagation. Mobius saves 24% GPU memory and 12% training time, which can greatly improve the T2V fine-tuning task and provide a novel insight for the AIGC community. We will release our code in the future.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 物理世界とサイバー空間の整合性: 体操AIに関する包括的調査

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI ( http://arxiv.org/abs/2407.06886v2 )

ライセンス: Link先を確認

Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin,

(参考訳) Embodied Artificial Intelligence (Embodied AI)は、AGI(Artificial General Intelligence)を達成するために不可欠であり、サイバースペースと物理世界を橋渡しする様々なアプリケーションの基盤として機能する。近年,MLM(Multi-modal Large Models)やWM(World Models)の出現が注目されている。しかし、MLMの時代には、Embodied AIに関する包括的な調査は行われていない。本調査では,Embodied AIの最近の進歩を包括的に調査する。まず,ロボットとシミュレータの代表的な研究の最前線をナビゲートし,研究の焦点とその限界を十分に理解する。そして、主な研究対象を4つ分析する。 1)知覚の具体化。 2) 相互作用の具体化。 3)具体化剤、及び 4)シム・トゥ・リアルな適応、最先端の手法、必須パラダイム、包括的なデータセットを網羅する。さらに,仮想および実実施エージェントにおけるMLMの複雑さを考察し,動的デジタルおよび物理環境における相互作用を促進することの重要性を強調した。最後に、具体化AIの課題と限界を要約し、今後の方向性について論じる。この調査が研究コミュニティの基礎的な参考として役立ち、継続的なイノベーションを刺激することを期待しています。関連するプロジェクトはhttps://github.com/HCPLab-SYSU/Embodied_AI_Paper_Listにある。

Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis first navigates through the forefront of representative works on embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 天皇が服を着る:パスワードマネージャによるWebユーザ認証のためのセキュアなガバナンスフレームワーク

The Emperor is Now Clothed: A Secure Governance Framework for Web User Authentication through Password Managers ( http://arxiv.org/abs/2407.07205v2 )

ライセンス: Link先を確認

Ali Cherry, Konstantinos Barmpis, Siamak F. Shahandashti,

(参考訳) パスワードマネージャとWebアプリケーション間のインタラクションを促進する既存のアプローチは、適切な機能を提供し、重要な攻撃に対する緩和戦略を提供していない。 HTML Autofillは十分な表現力がなく、Credential Management APIはブラウザ拡張パスワードマネージャをサポートしておらず、他の提案されたソリューションは確立したユーザメンタルモデルに準拠していない。本稿では,パスワードマネージャとWebアプリケーション間のインタラクションを仲介するブラウザベースのガバナンスフレームワークであるBerytusを提案する。 2つのAPIは、パスワードマネージャとWebアプリケーションの間のオーケストレータとして機能するBerytusをサポートするように設計されている。 Firefoxにおけるフレームワークの実装は、登録および認証プロセスを完全にサポートする。これは、フィッシング、クロスサイトスクリプティング、インラインコードインジェクション(例えば、悪意のあるブラウザ拡張による)、TLSプロキシに対する効果的な緩和戦略を提供するのに対して、コンテンツセキュリティポリシーやクレデンシャルトークン化のような既存の緩和戦略は部分的に有効である。フレームワーク設計は、マルチステップ、マルチファクタ、カスタム認証スキームのサポートなど、望ましい機能も提供する。包括的セキュリティと機能評価を提供し、将来的な方向性について議論する。

Existing approaches to facilitate the interaction between password managers and web applications fall short of providing adequate functionality and mitigation strategies against prominent attacks. HTML Autofill is not sufficiently expressive, the Credential Management API does not support browser extension password managers, and other proposed solutions do not conform to established user mental models. In this paper, we propose Berytus, a browser-based governance framework that mediates the interaction between password managers and web applications. Two APIs are designed to support Berytus acting as an orchestrator between password managers and web applications. An implementation of the framework in Firefox is developed that fully supports registration and authentication processes. As an orchestrator, Berytus is able to authenticate web applications and facilitate authenticated key exchange between web applications and password managers, which, as we show, can provide effective mitigation strategies against phishing, cross-site scripting, inline code injection (e.g., by a malicious browser extension), and TLS proxy-in-the-middle attacks, whereas existing mitigation strategies such as Content Security Policy and credential tokenisation are only partially effective. The framework design also provides desirable functional properties such as support for multi-step, multi-factor, and custom authentication schemes. We provide a comprehensive security and functionality evaluation and discuss possible future directions.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 強化学習による構造設計

Structural Design Through Reinforcement Learning ( http://arxiv.org/abs/2407.07288v2 )

ライセンス: Link先を確認

Thomas Rochefort-Beaudoin, Aurelian Vadean, Niels Aage, Sofiane Achiche,

(参考訳) 本稿では、トポロジ最適化(TO)における機械学習を促進するために設計された、オープンソースの強化学習(RL)環境である構造最適化ジム(SOgym)を紹介する。ソギムは、TOの物理学を報酬関数に統合することにより、物理的に実現可能で構造的に堅牢な設計をRLエージェントで生成することを可能にする。スケーラビリティを高めるために、Sogymは、環境とエージェントの間のメッシュ非依存のインターフェースとしてフィーチャーマッピングメソッドを活用し、メッシュの解像度に関わらず、設計変数との効率的なインタラクションを可能にする。ベースラインの結果はモデルフリーのプロキシポリシー最適化エージェントとモデルベースDreamerV3エージェントを使用する。 3つの観測空間が試験された。 TopOptのゲームインスパイアされた構成は、ボリューム制約下でのコンプライアンスを最小化する構造設計における学生の直感を改善するインタラクティブな教育ツールであり、性能とサンプル効率の点で最善を尽くした。 DreamerV3の100Mパラメータバージョンは、従来の最適化手法によって達成されたベースラインコンプライアンスの54%以内の構造と0%の切断率を生成した。エージェントの学習率とTopOptゲーム実験の工学生の学習率を比較すると、DreamerV3-100Mモデルは約4桁の学習率を示し、試行錯誤を通じてスクラッチからトレーニングされたポリシーにとって素晴らしい成果だ。これらの結果は、RLが継続的TO問題を解決し、多様な設計ソリューションから学び、学習する能力を持っていることを示唆している。 SOgymは複雑な構造設計の課題に対してRLエージェントを開発するためのプラットフォームを提供しており、この分野のさらなる研究を支援するために公開されている。

This paper introduces the Structural Optimization gym (SOgym), a novel open-source Reinforcement Learning (RL) environment designed to advance machine learning in Topology Optimization (TO). SOgym enables RL agents to generate physically viable and structurally robust designs by integrating the physics of TO into the reward function. To enhance scalability, SOgym leverages feature-mapping methods as a mesh-independent interface between the environment and the agent, allowing efficient interaction with the design variables regardless of mesh resolution. Baseline results use a model-free Proximal Policy Optimization agent and a model-based DreamerV3 agent. Three observation space configurations were tested. The configuration inspired by the TopOpt game, an interactive educational tool that improves students' intuition in designing structures to minimize compliance under volume constraints, performed best in terms of performance and sample efficiency. The 100M parameter version of DreamerV3 produced structures within 54% of the baseline compliance achieved by traditional optimization methods, with a 0% disconnection rate, an improvement over supervised learning approaches that often struggle with disconnected load paths. When comparing the learning rates of the agents to those of engineering students from the TopOpt game experiment, the DreamerV3-100M model shows a learning rate approximately four orders of magnitude lower, an impressive feat for a policy trained from scratch through trial and error. These results suggest RL's potential to solve continuous TO problems and its capacity to explore and learn from diverse design solutions. SOgym provides a platform for developing RL agents for complex structural design challenges and is publicly available to support further research in the field.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 大規模視線モデルに対する攻撃調査:資源・進歩・今後の動向

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends ( http://arxiv.org/abs/2407.07403v2 )

ライセンス: Link先を確認

Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu,

(参考訳) 近年の大規模モデルの発展に伴い、LVLM(Large Vision-Language Models)は多モード理解と推論タスクの幅広い分野において顕著な機能を示した。 LVLMは従来のLarge Language Models (LLMs)と比較して、マルチリソースの現実世界アプリケーションに近づき、マルチモーダル処理の複雑さのため、大きな可能性と課題を示す。しかし、LVLMsの脆弱性は比較的過小評価されており、日々の使用において潜在的なセキュリティリスクを生じさせている。本稿では,既存のLVLM攻撃の様々な形態について概説する。具体的には、まず、攻撃予備、攻撃課題、攻撃資源を含むLVLMをターゲットにした攻撃の背景を紹介する。次に,モデル出力を操作する敵攻撃,不正行為のモデル脆弱性を悪用するジェイルブレイク攻撃,プロンプト型とパターンを設計するインジェクション攻撃,モデルトレーニングに影響を与えるデータ中毒など,LVLM攻撃手法の開発を体系的に検討する。最後に,将来的な研究の方向性について論じる。我々の調査は、LVLMの脆弱性の現在の状況に関する洞察を提供し、より多くの研究者がLVLM開発における潜在的な安全性問題を探求し緩和するよう促していると信じています。 LVLM攻撃に関する最新の論文は、https://github.com/liudaizong/Awesome-LVLM-Attack で継続的に収集されている。

With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to multi-resource real-world applications and the complexity of multi-modal processing. However, the vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage. In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks. Specifically, we first introduce the background of attacks targeting LVLMs, including the attack preliminaries, attack challenges, and attack resources. Then, we systematically review the development of LVLM attack methods, such as adversarial attacks that manipulate model outputs, jailbreak attacks that exploit model vulnerabilities for unauthorized actions, prompt injection attacks that engineer the prompt type and pattern, and data poisoning that affects model training. Finally, we discuss promising future research directions. We believe that our survey provides insights into the current landscape of LVLM vulnerabilities, inspiring more researchers to explore and mitigate potential safety issues in LVLM development. The latest papers on LVLM attacks are continuously collected at https://github.com/liudaizong/Awesome-LVLM-Attack.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# Open-Vocabulary Video Instance Segmentationのための統一埋め込みアライメント

Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation ( http://arxiv.org/abs/2407.07427v2 )

ライセンス: Link先を確認

Hao Fang, Peng Wu, Yawei Li, Xinxin Zhang, Xiankai Lu,

(参考訳) Open-Vocabulary Video Instance Segmentation (VIS)は、任意のオブジェクトをセグメンテーションし追跡する能力によって、注目を集めている。しかし、最近のOpen-Vocabulary VISの試みは、特に新しいカテゴリの一般化能力に関して、不満足な結果を得た。 VLM機能(例えばCLIP)とインスタンスクエリのドメインギャップと時間的一貫性の未利用が2つの中心的な原因であることが判明した。これらの問題を緩和するために、我々はOVFormerと呼ばれる新しいOpen-Vocabulary VISベースラインを設計し、訓練する。 OVFormerは軽量なモジュールを使用して、クエリの埋め込みとCLIPイメージの埋め込みを統合してドメインギャップを修復する。従来の画像ベーストレーニングとは異なり、ビデオベースのモデルトレーニングを行い、ビデオ内の時間的一貫性を完全にマイニングする半オンライン推論スキームをデプロイする。ベルとホイッスルがなければ、OVFormerはLV-VISのResNet-50バックボーンで21.9mAPを達成した。いくつかの近接語彙VISデータセットに対する大規模な実験は、OVFormerの強いゼロショット一般化能力(YouTube-VIS 2019では7.6 mAP、OVISでは3.9 mAP)も示している。コードは https://github.com/fanghaook/OVFormer で入手できる。

Open-Vocabulary Video Instance Segmentation (VIS) is attracting increasing attention due to its ability to segment and track arbitrary objects. However, recent Open-Vocabulary VIS attempts have obtained unsatisfactory results, especially in terms of generalization to novel categories. We discover that the domain gap between the VLM features (e.g., CLIP) and the instance queries, together with the underutilization of temporal consistency, are the two central causes. To mitigate these issues, we design and train a novel Open-Vocabulary VIS baseline called OVFormer. OVFormer utilizes a lightweight module for unified embedding alignment between query embeddings and CLIP image embeddings to remedy the domain gap. Unlike previous image-based training methods, we conduct video-based model training and deploy a semi-online inference scheme to fully mine the temporal consistency in the video. Without bells and whistles, OVFormer achieves 21.9 mAP with a ResNet-50 backbone on LV-VIS, exceeding the previous state-of-the-art performance by 7.7. Extensive experiments on several Closed-Vocabulary VIS datasets also demonstrate the strong zero-shot generalization ability of OVFormer (+7.6 mAP on YouTube-VIS 2019, +3.9 mAP on OVIS). Code is available at https://github.com/fanghaook/OVFormer.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 半監督的時間的行動定位のための適応的擬似ラベル学習に向けて

Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization ( http://arxiv.org/abs/2407.07673v2 )

ライセンス: Link先を確認

Feixiang Zhou, Bryan Williams, Hossein Rahmani,

(参考訳) ノイズを緩和する擬似ラベルは、セミスーパーバイズド・テンポラル・アクション・ローカライゼーション(SS-TAL)において重要な課題である。既存の手法はしばしば厳密な条件に基づいて擬似ラベルをフィルタリングするが、典型的には分類とローカライゼーションの質を別々に評価し、最適でない擬似ラベルのランク付けと選択に繋がる。特に、選択された正のラベルの中に不正確な擬似ラベルがあり、信頼されたラベルは誤って負のラベルに割り当てられる。これらの問題に対処するため, 擬似ラベル選択を容易にするために, 適応型擬似ラベル学習(APL)フレームワークを提案する。具体的には、ランキング品質を改善するために、分類信頼性と局所化信頼性を協調的に学習し、次いで、共同スコアに基づいて擬似ラベルを動的に選択する適応ラベル品質評価(ALQA)を提案する。さらに、インスタンスレベルの一貫性判別器(ICD)を提案し、不明瞭な正と潜在的な正を同時に除去し、インスタンス間固有の一貫性に基づいて、より正確な選択をもたらす。さらに,行動と背景の区別を高めるために,一般教師なしの行動対応コントラスト事前訓練(ACP)を導入し,SS-TALの恩恵を受ける。 THUMOS14とActivityNet v1.3の広範囲な実験により,様々な半教師付き環境下での最先端性能が実証された。

Alleviating noisy pseudo labels remains a key challenge in Semi-Supervised Temporal Action Localization (SS-TAL). Existing methods often filter pseudo labels based on strict conditions, but they typically assess classification and localization quality separately, leading to suboptimal pseudo-label ranking and selection. In particular, there might be inaccurate pseudo labels within selected positives, alongside reliable counterparts erroneously assigned to negatives. To tackle these problems, we propose a novel Adaptive Pseudo-label Learning (APL) framework to facilitate better pseudo-label selection. Specifically, to improve the ranking quality, Adaptive Label Quality Assessment (ALQA) is proposed to jointly learn classification confidence and localization reliability, followed by dynamically selecting pseudo labels based on the joint score. Additionally, we propose an Instance-level Consistency Discriminator (ICD) for eliminating ambiguous positives and mining potential positives simultaneously based on inter-instance intrinsic consistency, thereby leading to a more precise selection. We further introduce a general unsupervised Action-aware Contrastive Pre-training (ACP) to enhance the discrimination both within actions and between actions and backgrounds, which benefits SS-TAL. Extensive experiments on THUMOS14 and ActivityNet v1.3 demonstrate that our method achieves state-of-the-art performance under various semi-supervised settings.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# 科学シミュレーションのためのスマートサロゲートの能動的学習の可能性

Feasibility Study on Active Learning of Smart Surrogates for Scientific Simulations ( http://arxiv.org/abs/2407.07674v2 )

ライセンス: Link先を確認

Pradeep Bajracharya, Javier Quetzalcóatl Toledo-Marín, Geoffrey Fox, Shantenu Jha, Linwei Wang,

(参考訳) 複雑なシステムを理解する上で重要な高性能な科学シミュレーションは、特に広いパラメータ空間を探索する際に計算上の問題に遭遇する。シミュレーションを加速できる代理モデルとして、ディープニューラルネットワーク(DNN)の開発への関心が高まっている。しかし、これらのDNNサロゲートをトレーニングするための既存のアプローチは、ヒューリスティックに選択され、高価な計算で生成される広範なシミュレーションデータに依存している。本稿では,DNNサロゲートトレーニングにアクティブラーニングを取り入れることの可能性を検討する。これにより、インテリジェントで客観的なトレーニングシミュレーションの選択が可能になり、広範なシミュレーションデータを生成する必要がなくなり、事前定義されたトレーニングシミュレーションに対するDNNサロゲートのパフォーマンスの依存性が軽減される。 2つの異なるDNNアーキテクチャを考慮し,拡散方程式に対するDNNサロゲート構築の問題点として,多様性と不確実性に基づくトレーニングシミュレーション選択手法の有効性を検討する。研究成果は,科学シミュレーションの効率向上を図るために,能動的学習戦略によるシミュレーションデータのオンザフライ生成を支援する,スマートサロゲートのための高性能コンピューティング基盤の開発の基礎となるものである。

High-performance scientific simulations, important for the comprehension of complex systems, encounter computational challenges, especially when exploring extensive parameter spaces. There has been increasing interest in developing deep neural networks (DNNs) as surrogate models capable of accelerating the simulations. However, existing approaches for training these DNN surrogates rely on extensive simulation data that are heuristically selected and generated with expensive computation, a challenge under-explored in the literature. In this paper, we investigate the potential of incorporating active learning into DNN surrogate training. This allows intelligent and objective selection of training simulations, reducing both the need to generate extensive simulation data and the dependency of the performance of DNN surrogates on pre-defined training simulations. In the problem context of constructing DNN surrogates for diffusion equations with sources, we examine the efficacy of diversity- and uncertainty-based strategies for selecting training simulations, considering two different DNN architectures. The results set the groundwork for developing the high-performance computing infrastructure for Smart Surrogates that supports on-the-fly generation of simulation data steered by active learning strategies to potentially improve the efficiency of scientific simulations.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12
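
The uncertainty-based selection of training simulations mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how uncertainty is measured, so the sketch uses disagreement (variance) across a small ensemble of surrogates as a stand-in, and the names `select_most_uncertain`, `candidates`, and `ensemble` are hypothetical.

```python
from statistics import pvariance


def select_most_uncertain(candidates, ensemble, k):
    """Pick the k candidate parameter settings where ensemble members disagree most.

    candidates: iterable of simulation parameter settings (any type).
    ensemble:   list of callables, each mapping a setting to a scalar prediction.
    """
    scored = []
    for params in candidates:
        predictions = [model(params) for model in ensemble]
        # Population variance of the predictions is the assumed uncertainty proxy.
        scored.append((pvariance(predictions), params))
    # Sort by the uncertainty score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [params for _, params in scored[:k]]
```

In an active learning loop, the selected settings would be passed to the expensive simulator, the new results added to the training set, and the surrogates retrained before the next selection round.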

# ケーラー非線形振動子における位相遷移

Topological Transitions in a Kerr Nonlinear Oscillator ( http://arxiv.org/abs/2407.07729v2 )

ライセンス: Link先を確認

Juan Lin, Shou-Bang Yang, Fan Wu, Zhen-Biao Yang,

(参考訳) カー非線形発振器(KNO)は、連続変数量子ビット基底状態の符号化に適した一対の定常固有状態、反対位相のコヒーレント状態をサポートする。定常状態部分空間内に閉じ込められたKNOの任意制御は、システムのクエンチ速度に対する物理的観測値の線形応答によるベリー曲率の抽出を可能にし、KNOにおける位相の効果的な評価法を提供する。代替として、KNOに「断熱へのショートカット」を採用する制御は、加速された断熱的固有状態の進化を通じてトポロジーの探索を可能にし、3つの物理観測物全てを測定する。位相遷移は、それぞれベリー曲率の積分と新しい極角関係から得られる第1チャーン数のパラメータ空間全体へのジャンプによって明らかにされる。我々の戦略は、連続変数系のトポロジカル遷移を測定する方法である。

A Kerr nonlinear oscillator (KNO) supports a pair of steady eigenstates, coherent states with opposite phases, that are good for the encoding of continuous variable qubit basis states. Arbitrary control of the KNO confined within the steady state subspace allows extraction of the Berry curvature through the linear response of the physical observable to the quench velocity of the system, providing an effective method for the characterization of topology in the KNO. As an alternative, the control adopting the "shortcut to adiabaticity" to the KNO enables the exploration of the topology through accelerated adiabatic eigenstate evolution to measure all three physical observables. Topological transitions are revealed by the jump of the first Chern number, obtained respectively from the integral of the Berry curvature and of the new polar angle relation, over the whole parameter space. Our strategy paves the way for measuring topological transitions in continuous variable systems.

翻訳日:2024-07-16 03:58:18 公開日:2024-07-12

# Mobility VLA: 長期VLMとトポロジグラフを用いたマルチモーダルインストラクションナビゲーション

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs ( http://arxiv.org/abs/2407.07775v2 )

ライセンス: Link先を確認

Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan,

(参考訳) ナビゲーション研究の究極的な目標は、自然言語や画像を含むマルチモーダル命令を理解し、有用なナビゲーションを実行するインテリジェントエージェントを構築することである。そこで本研究では,MINT (Multimodal Instruction Navigation with Demo Tours) と呼ばれる,従来記録されていたデモビデオを通じて,事前の環境を提供するナビゲーションタスクのカテゴリについて検討する。視覚言語モデル(VLM)の最近の進歩は、マルチモーダル入力の知覚と推論能力を示すものとして、この目標を達成する上で有望な道筋を示している。しかしながら、VLMは典型的にはテキスト出力を予測するために訓練されており、ナビゲーションに最適な方法に関するオープンな研究課題である。 MINT を解決するために,環境理解と長文 VLM の共通感覚推論能力とトポロジグラフに基づくロバストな低レベルナビゲーションポリシを組み合わせた階層型視覚言語行動(VLA)ナビゲーションポリシーであるモビリティ VLA を提案する。高レベルポリシーは、デモツアービデオとマルチモーダルユーザーインストラクションを入力として、ツアービデオのゴールフレームを見つけるための長文VLMで構成されている。次に、低レベルのポリシーでは、ゴールフレームとオフラインで構築されたトポロジグラフを使用して、各ステップでロボットアクションを生成する。我々は,836m^2実環境におけるモビリティVLAの評価を行い,プラスチック製の容器を持ちながら,それまで未解決であったマルチモーダル命令に対して,モビリティVLAは高いエンドツーエンドの成功率を示す。 Mobility VLAのデモビデオはこちらで見ることができる。

An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions, including natural language and images, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path toward this goal, as VLMs demonstrate capabilities in perceiving and reasoning about multimodal inputs. However, VLMs are typically trained to predict textual output, and how best to utilize them in navigation remains an open research question. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common sense reasoning power of long-context VLMs with a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline-constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in an 836 m^2 real-world environment and show that it has high end-to-end success rates on previously unsolved multimodal instructions such as "Where should I return this?" while holding a plastic bin. A video demonstrating Mobility VLA can be found here: https://youtu.be/-Tof__Q8_5s

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
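
The low-level step described in the abstract above, going from a goal frame and a topological graph to the next action, can be illustrated with a plain graph search. This is a hedged sketch rather than the paper's policy: it assumes the graph is an adjacency dictionary keyed by tour-frame index, that the robot has already been localized to a `current` node, and that the next waypoint is the first hop on a shortest path. The name `next_waypoint` is hypothetical.

```python
from collections import deque


def next_waypoint(graph, current, goal):
    """Breadth-first search; return the first node to move to on a shortest path.

    Returns `current` if already at the goal and None if the goal is unreachable.
    """
    if current == goal:
        return current
    parents = {current: None}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk back along the path until the node adjacent to `current`.
                step = neighbor
                while parents[step] != current:
                    step = parents[step]
                return step
            queue.append(neighbor)
    return None
```

Calling this at every timestep with an updated `current` node yields a sequence of waypoints; turning a waypoint into a velocity command is outside the scope of the sketch.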

# 無効機器を用いた双方向MRの同定と推定

Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments ( http://arxiv.org/abs/2407.07933v2 )

ライセンス: Link先を確認

Feng Xie, Zhen Yao, Lin Xie, Yan Zeng, Zhi Geng,

(参考訳) 両方向メンデルランダム化(MR)における純粋観測データから因果効果を推定する難しい問題について考察する。この問題に対処するために、既存のほとんどの手法は、専門家の知識によって、あるいは因果モデルが一方向MRモデルであると仮定して、対象因果効果の適切な有効器用変数(IV)を見つけようとする。そこで,本稿ではまず,観測データから双方向MRの同定を理論的に検討する。特に、一対の表現型(すなわち、治療と結果)の因果方向を含む双方向MRモデルが識別可能であるように、有効なIV集合が正しく同定される必要十分条件を提供する。さらに、同定理論に基づいて、有効なIV集合を発見し、興味の因果効果を推定するクラスタ融合のような手法を開発する。理論的に提案アルゴリズムの正しさを実証する。両方向MRの因果効果を推定するための方法の有効性を実験的に検証した。

We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming that the causal model is a one-directional MR model. As such, in this paper, we first theoretically investigate the identification of the bi-directional MR from observational data. In particular, we provide necessary and sufficient conditions under which valid IV sets are correctly identified such that the bi-directional MR model is identifiable, including the causal directions of a pair of phenotypes (i.e., the treatment and outcome). Moreover, based on the identification theory, we develop a cluster fusion-like method to discover valid IV sets and estimate the causal effects of interest. We theoretically demonstrate the correctness of the proposed algorithm. Experimental results show the effectiveness of our method for estimating causal effects in bi-directional MR.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# 潜伏条件付き要約因果グラフにおけるマクロ条件の不依存性とマクロトータル効果の同定

Identifying macro conditional independencies and macro total effects in summary causal graphs with latent confounding ( http://arxiv.org/abs/2407.07934v2 )

ライセンス: Link先を確認

Simon Ferreira, Charles K. Assaad,

(参考訳) ダイナミックシステムにおける因果関係を理解することは、疫学、経済学、生物学を含む多くの科学分野において不可欠である。因果推論法は広く研究されているが、しばしば完全に定義された因果グラフに依存しており、必ずしも複雑な力学系では利用できないかもしれない。要約因果グラフ(SCG)のような部分特定因果グラフは、因果関係の単純化、時間的情報の省略、高レベルの因果構造に焦点を当てる。グラフ内の頂点として表されるクラスタ間の関係を含むマクロクエリと、グラフの頂点を通して直接見えない変数間の関係を含むマイクロクエリである。本稿では,まず,マクロ条件の非依存性とマイクロ条件の非依存性と,マクロ効果とマイクロトータル効果を明確に区別する。次に,SCGにおけるマクロ条件の不一致を識別するために,d-セパレーションの健全性と完全性を示す。さらに,SCGにおけるマクロトータル効果を同定するために,do-calculusが健全かつ完全であることが確認された。逆に,マイクロコンディショナル・インディペンデンシーとマイクロトータル・エフェクトを考慮した場合,これらの結果は成立しないことを示す。

Understanding causal relationships in dynamic systems is essential for numerous scientific fields, including epidemiology, economics, and biology. While causal inference methods have been extensively studied, they often rely on fully specified causal graphs, which may not always be available or practical in complex dynamic systems. Partially specified causal graphs, such as summary causal graphs (SCGs), provide a simplified representation of causal relationships, omitting temporal information and focusing on high-level causal structures. This simplification introduces new challenges concerning the types of queries of interest: macro queries, which involve relationships between clusters represented as vertices in the graph, and micro queries, which pertain to relationships between variables that are not directly visible through the vertices of the graph. In this paper, we first clearly distinguish between macro conditional independencies and micro conditional independencies, and between macro total effects and micro total effects. Then, we demonstrate the soundness and completeness of d-separation for identifying macro conditional independencies in SCGs. Furthermore, we establish that do-calculus is sound and complete for identifying macro total effects in SCGs. Conversely, we also show through various examples that these results do not hold when considering micro conditional independencies and micro total effects.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# 言語モデル復号化のためのオートマタによる制約

Automata-based constraints for language model decoding ( http://arxiv.org/abs/2407.08103v2 )

ライセンス: Link先を確認

Terry Koo, Frederick Liu, Luheng He,

(参考訳) 例えば、構造化データ、API呼び出し、コードスニペットなどである。 LMは形式構文への適合性を改善するために調整できるが、特に大規模展開に適した小型のLMでは適合性は保証されない。加えて、チューニングにはかなりのリソースが必要であるため、一般的でないフォーマットやタスク固有のフォーマットでは実用的ではない。下流のパースエラーを防ぐためには、LMが有効な出力のみを生成することを理想的に制限するが、これはトークン化によって非常に複雑になる。 APIコールやスキーマ誘導JSON,YAMLなど,多くの実用的なアプリケーションを備えた多種多様な形式言語である,正規言語に対する効率的なクローズドフォームソリューションを導出する,オートマトン理論の適用により,これらの問題を解決する。また,高分岐係数問題に対処するための実用的拡張についても論じる。最後に、我々の手法を決定論的文脈自由言語に拡張し、同様に効率的な閉形式解を許容する。その柔軟性と代表的能力にもかかわらず、我々のアプローチでは、トークンごとの復号化ロジットへのアクセスしか必要とせず、LMサイズに依存しない単純な計算に抑えられるため、ほぼ全てのLMアーキテクチャに効率よく簡単に適用できる。

LMs are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-specific formats. To prevent downstream parsing errors we would ideally constrain the LM to only produce valid output, but this is severely complicated by tokenization, which is typically both ambiguous and misaligned with the formal grammar. We solve these issues through the application of automata theory, deriving an efficient closed-form solution for the regular languages, a broad class of formal languages with many practical applications, including API calls or schema-guided JSON and YAML. We also discuss pragmatic extensions for coping with the issue of high branching factor. Finally, we extend our techniques to deterministic context-free languages, which similarly admit an efficient closed-form solution. In spite of its flexibility and representative power, our approach only requires access to per-token decoding logits and lowers into simple calculations that are independent of LM size, making it both efficient and easy to apply to almost any LM architecture.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
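
The core idea in the abstract above, restricting each decoding step to tokens that keep the output inside a regular language, can be shown with a deliberately naive sketch. It is not the paper's closed-form construction: the paper works with precompiled automata, whereas this version simply re-runs a character-level DFA over every vocabulary entry at each step. The DFA encoding (a dictionary from `(state, character)` to next state) and all function names are assumptions made for illustration.

```python
def advance(dfa, state, text):
    """Run the character-level DFA over `text`; return the new state or None if rejected."""
    for ch in text:
        state = dfa.get((state, ch))
        if state is None:
            return None
    return state


def allowed_tokens(dfa, state, vocab):
    """Map each token id whose full text keeps the DFA alive to its resulting state."""
    result = {}
    for token_id, text in enumerate(vocab):
        next_state = advance(dfa, state, text)
        if next_state is not None:
            result[token_id] = next_state
    return result


def constrained_greedy_step(logits, dfa, state, vocab):
    """Pick the highest-scoring token among those the DFA permits."""
    allowed = allowed_tokens(dfa, state, vocab)
    if not allowed:
        raise ValueError("no token is valid from this state")
    best = max(allowed, key=lambda token_id: logits[token_id])
    return best, allowed[best]
```

Because tokens may span several characters, a token is accepted only if every character in it has a valid transition, which is the misalignment between tokenization and grammar that the abstract points to. The per-step scan over the vocabulary is the cost that a precompiled token-level automaton avoids.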

# 欧州連合におけるフェデレーション学習とAI規制 : 責任とは何か -- 学際的分析

Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis ( http://arxiv.org/abs/2407.08105v2 )

ライセンス: Link先を確認

Herbert Woisetschläger, Simon Mertel, Christoph Krönke, Ruben Mayer, Hans-Arno Jacobsen,

(参考訳) 欧州連合人工知能法(EU)は、膨大な罰金を回避するため、機械学習アプリケーションの開発とデプロイにおけるステークホルダーの明確な責任を委任し、その起源にあるデータによるプライベートでセキュアなデータ処理を優先する。フェデレートラーニング(FL)は、データサイロを越えた生成AIモデルのトレーニングを可能にし、データセキュリティを改善しながらモデルパラメータのみを共有する。 FLは協調学習パラダイムであるため、クライアントとサーバはFLパイプラインにおける法的責任を自然に共有する。我々の仕事は、双方の役割を明確にし、責任をサーバオペレータに移すための戦略を説明し、EU AI法の下でFLの実践的適用性を改善するために解決しなければならない、オープンな技術的課題を指摘している。

The European Union Artificial Intelligence Act mandates clear stakeholder responsibilities in developing and deploying machine learning applications to avoid substantial fines, prioritizing private and secure data processing with data remaining at its origin. Federated Learning (FL) enables the training of generative AI models across data silos, sharing only model parameters while improving data security. Since FL is a cooperative learning paradigm, clients and servers naturally share legal responsibility in the FL pipeline. Our work contributes to clarifying the roles of both parties, explains strategies for shifting responsibilities to the server operator, and points out open technical challenges that we must solve to improve FL's practical applicability under the EU AI Act.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
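
The statement in the abstract above that FL shares only model parameters can be made concrete with the server-side aggregation step. The sketch below assumes the widely used federated averaging rule (an example-weighted mean of client parameters); the abstract itself does not name an aggregation method, and `federated_average` is a hypothetical helper operating on flat lists of floats.

```python
def federated_average(client_updates):
    """Aggregate client parameters into a global model.

    client_updates: list of (weights, num_examples) pairs, where `weights`
    is a flat list of floats of equal length for every client.
    Returns the example-weighted mean of the client weights.
    """
    total_examples = sum(count for _, count in client_updates)
    dimension = len(client_updates[0][0])
    return [
        sum(weights[i] * count for weights, count in client_updates) / total_examples
        for i in range(dimension)
    ]
```

Only the weight vectors and example counts reach the server in this scheme; the raw training data stays with each client, which is the property that matters for the division of responsibilities the paper analyzes.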

# EchoMimic: 編集可能なランドマーク条件によるライブライクなオーディオ駆動のポートレートアニメーション

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions ( http://arxiv.org/abs/2407.08136v2 )

ライセンス: Link先を確認

Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma,

(参考訳) オーディオ入力によって推進されるポートレートイメージアニメーションの領域は、ライフライクでダイナミックなポートレートの生成において顕著な進歩を見せている。従来の方法では、音声または顔のキーポイントを使用して映像をビデオに駆動するに限られるが、良好な結果が得られるが、ある問題が存在する。例えば、音声のみによって駆動される手法は、比較的弱い音声信号のために時々不安定になり、一方、顔のキーポイントのみによって駆動される手法は、運転時により安定しているが、キーポイント情報の過剰な制御による不自然な結果をもたらす可能性がある。本稿では,これまでに述べた課題に対処するため,EchoMimicという新しいアプローチを提案する。 EchoMimicはオーディオと顔のランドマークの両方を使って同時にトレーニングされている。新たなトレーニング戦略の実装を通じて、EchoMimicは、オーディオと顔のランドマークを個別に生成するだけでなく、オーディオと選択された顔のランドマークを組み合わせることで、ポートレートビデオを生成することができる。 EchoMimicは、さまざまな公開データセットや収集データセットの代替アルゴリズムと比較して総合的に比較され、定量評価と定性評価の両方において優れたパフォーマンスを示している。ソースコードへのさらなる視覚化とアクセスは、EchoMimicプロジェクトページにある。

The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to using either audio or facial key points to drive images into videos. While they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audio can be unstable at times due to the relatively weak audio signal, while methods driven exclusively by facial key points, although more stable, can produce unnatural results due to excessive control by the key point information. To address these challenges, we introduce a novel approach named EchoMimic. EchoMimic is trained concurrently on both audio and facial landmarks. Through a novel training strategy, EchoMimic can generate portrait videos not only from audio or facial landmarks individually, but also from a combination of audio and selected facial landmarks. EchoMimic has been comprehensively compared with alternative algorithms across various public datasets and our collected dataset, showcasing superior performance in both quantitative and qualitative evaluations. Additional visualizations and access to the source code can be found on the EchoMimic project page.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# 距離一貫性リハーサルによる画像検索における生涯病理組織学

Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal ( http://arxiv.org/abs/2407.08153v2 )

ライセンス: Link先を確認

Xinyu Zhu, Zhiguo Jiang, Kun Wu, Jun Shi, Yushan Zheng,

(参考訳) 近年,CBHIR (Content-based Histopathological Image Search) が注目されている。しかし、臨床実践においては、WSIデータベースの連続的な拡張サイズは、現在のCBHIR法の実用化に制限される。本稿では,連続的に成長する検索データベース上でのプログレッシブモデル更新による破滅的な忘れ込みの課題を解決するために,ライフロング・ホール・スライド検索(LWSR)フレームワークを提案する。私たちのフレームワークは、継続的学習中に安定性と可塑性のバランスを達成することを目的としています。システムの可塑性を維持するため,ローカルメモリバンクと貯水池サンプリングを用いて,旧タスクと新タスクの両方の特徴空間を包括的に包括的に包括的に包括するインスタンスの保存を行う。さらに,従来のタスクに対する検索キューの整合性を確保するために,距離整合リハーサル (DCR) モジュールが設計されている。提案手法をTCGAプロジェクトの4つの公開WSIデータセット上で評価した。実験により,提案手法は有効であり,最先端手法よりも優れていることが示された。

Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images from an established database that are similar in content to the query image. However, in clinical practice, the continuously expanding size of whole slide image (WSI) databases limits the practical application of current CBHIR methods. In this paper, we propose a Lifelong Whole Slide Retrieval (LWSR) framework to address the catastrophic forgetting caused by progressive model updating on a continuously growing retrieval database. Our framework aims to balance stability and plasticity during continuous learning. To preserve system plasticity, we use a local memory bank with reservoir sampling to save instances that comprehensively cover the feature spaces of both old and new tasks. Furthermore, a distance consistency rehearsal (DCR) module is designed to ensure the consistency of the retrieval queue for previous tasks, which is regarded as stability within a lifelong CBHIR system. We evaluated the proposed method on four public WSI datasets from TCGA projects. The experimental results demonstrate that the proposed method is effective and outperforms state-of-the-art methods.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
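
The abstract above mentions a local memory bank filled by reservoir sampling. As a rough illustration only, the sketch below shows the classic reservoir sampling update (Algorithm R) for a fixed-size bank. The names `reservoir_update`, `fill_memory_bank`, and `capacity` are hypothetical, and this is not the authors' LWSR implementation; it only shows why a bounded bank can hold a uniform sample of everything seen so far.

```python
import random


def reservoir_update(memory, capacity, item, n_seen, rng):
    """Offer one item to a fixed-size memory bank (Algorithm R).

    n_seen counts the items offered so far, including this one.
    """
    if len(memory) < capacity:
        memory.append(item)            # bank not full yet: always keep the item
    else:
        j = rng.randrange(n_seen)      # uniform integer in [0, n_seen)
        if j < capacity:               # true with probability capacity / n_seen
            memory[j] = item           # overwrite the slot that was drawn


def fill_memory_bank(stream, capacity, seed=0):
    """Run the update over a whole stream and return the final bank."""
    rng = random.Random(seed)
    memory = []
    for n_seen, item in enumerate(stream, start=1):
        reservoir_update(memory, capacity, item, n_seen, rng)
    return memory


# Hypothetical usage: 1000 instance ids arrive one by one, the bank keeps 10.
bank = fill_memory_bank(range(1000), capacity=10)
print(sorted(bank))
```

Because every item ends up in the bank with the same probability, instances from old and new tasks are represented in proportion to how many of each were seen, without storing the full stream.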

# RB-SQL: テキスト・トゥ・SQLのための検索ベースのLLMフレームワーク

RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL ( http://arxiv.org/abs/2407.08273v2 )

ライセンス: Link先を確認

Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song,

(参考訳) 文脈内学習を伴う大規模言語モデル(LLM)は、テキスト対SQLタスクのパフォーマンスを大幅に改善した。これまでの作業は一般的に、LLMの推論能力を改善するために排他的なSQL生成プロンプトを使用することに重点を置いていた。しかし、多くのテーブルや列を持つ大規模なデータベースを扱うことはほとんど難しく、通常、事前処理データベースの重要性を無視し、より効率的なプロンプトエンジニアリングのために貴重な情報を抽出する。提案するRB-SQLは,簡潔なテーブルと列をスキーマとして検索する3つのモジュールと,コンテキスト内学習のためのターゲット例からなる,コンテキスト内プロンプトエンジニアリングのための新しいLLMフレームワークである。実験により,我々のモデルは,公開データセットのBIRDとSpiderの競合ベースラインよりも優れた性能が得られることが示された。

Large language models (LLMs) with in-context learning have significantly improved performance on the text-to-SQL task. Previous work generally focuses on using exclusive SQL generation prompts to improve the LLMs' reasoning ability. However, such methods mostly struggle to handle large databases with numerous tables and columns, and usually overlook the importance of pre-processing the database and extracting valuable information for more efficient prompt engineering. Based on the above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experimental results demonstrate that our model achieves better performance than several competitive baselines on the public datasets BIRD and Spider.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# 任意分解能における適応型深部虹彩特徴エクストラクタ

Adaptive Deep Iris Feature Extractor at Arbitrary Resolutions ( http://arxiv.org/abs/2407.08341v2 )

ライセンス: Link先を確認

Yuho Shoji, Yuka Ogino, Takahiro Toizumi, Atsushi Ito,

(参考訳) 本稿では,任意の解像度で虹彩認識を行うための深部特徴抽出器を提案する。分解能劣化は、高解像度画像で訓練されたディープラーニングモデルの認識性能を低下させる。高解像度画像の認識性能を犠牲にしながら、各種解像度画像のトレーニングによりモデルの堅牢性を向上させることができる。様々な解像度で高い認識性能を実現するために,自動切替ネットワークを用いた分解能適応特徴抽出法を提案する。我々のフレームワークには、ダウンサンプリングやアウト・オブ・フォーカスのぼかしなど、様々な分解能劣化に特化した分解能専門家モジュールが含まれています。入力画像の劣化条件に応じて自動的に切り替える。低解像度の専門家は、両方の専門家が共通のアイデンティティの特徴を抽出できるように、高解像度の専門家からの知識蒸留によって訓練される。従来の3つのニューラルネットワークモデルに我々のフレームワークを適用した。実験結果から,本手法は従来手法の低解像度での認識性能の向上と高解像度での認識性能の維持を図っている。

This paper proposes a deep feature extractor for iris recognition at arbitrary resolutions. Resolution degradation reduces the recognition performance of deep learning models trained on high-resolution images. Training on images of various resolutions can improve the model's robustness, but at the cost of recognition performance on high-resolution images. To achieve higher recognition performance at various resolutions, we propose a method of resolution-adaptive feature extraction with automatically switching networks. Our framework includes resolution expert modules specialized for different resolution degradations, including down-sampling and out-of-focus blurring. The framework automatically switches between them depending on the degradation condition of an input image. Lower-resolution experts are trained by knowledge distillation from the high-resolution expert so that both experts extract common identity features. We applied our framework to three conventional neural network models. The experimental results show that our method enhances the recognition performance of the conventional methods at low resolution while maintaining their performance at high resolution.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# 統計物理学の基礎のための量子熱力学的可積分性

Quantum Thermodynamic Integrability for Foundations of Statistical Physics ( http://arxiv.org/abs/2407.08344v2 )

ライセンス: Link先を確認

Ruo-Xun Zhai, C. P. Sun,

(参考訳) 第二法則のカラテオドリの原理を、体積や磁場などのマクロ変数に依存するエネルギー準位を持つ量子熱力学に拡張する。この拡張は量子熱力学的可積分性(QTI)の概念を導入し、統計力学の代替基盤を提供する。 QTIの特徴は、熱力学多様体内の仕事と熱の経路非依存性であり、この多様体はエネルギー準位と特定の熱力学パラメータによって局所的に記述される。この枠組みの中で、温度は自然に積分因子として現れ、QTIに基づくエントロピー可積分方程式(EIE)から正準分布と非正準分布の両方を導出することができる。特に、非正準状態は、熱力学限界の外側で特に重要なものとなり、有限サイズの熱力学系における情報相関の存在を明らかにしている。

We extend the Carathéodory principle of the Second Law to quantum thermodynamics with energy levels depending on macroscopic variables, such as volume and magnetic field. This extension introduces the concept of Quantum Thermodynamic Integrability (QTI), offering an alternative foundation for statistical mechanics. QTI is characterized by the path-independence of work and heat within the thermodynamic manifold, which is locally described by energy levels and specific thermodynamic parameters. Within this framework, temperature naturally emerges as an integrating factor, allowing for the derivation of both canonical and non-canonical distributions from the Entropy Integrable Equations (EIE) based on QTI. Notably, non-canonical states, which become particularly significant outside the thermodynamic limit, reveal the existence of informational correlations in finite-size thermodynamic systems.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# PredBench: さまざまな分野にわたる時空間予測のベンチマーク

PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines ( http://arxiv.org/abs/2407.08418v2 )

ライセンス: Link先を確認

ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai,

(参考訳) 本稿では,時空間予測ネットワークの全体的評価に適したベンチマークであるPredBenchを紹介する。この分野では大きな進歩があったが、様々な予測ネットワークアーキテクチャの詳細と比較分析のための標準化されたフレームワークはいまだに存在しない。 PredBenchはこのギャップに対処するため、大規模な実験を行い、標準化された適切な実験環境を維持し、多次元評価を実装する。このベンチマークは、広く採用されている12のメソッドと、複数のアプリケーションドメインにまたがる15の多様なデータセットを統合し、現代の時空間予測ネットワークを広範囲に評価する。 PredBenchは、様々なアプリケーションにわたる予測設定の厳密な校正を通じて、意図した使用に関する評価を保証し、公正な比較を可能にする。さらに、その多次元評価フレームワークは、包括的なメトリクスセットで分析を拡張し、モデルの能力に関する深い洞察を提供する。本研究から得られた知見は,今後の発展に向けての戦略的方向性を提供するものである。私たちのコードベースは https://github.com/OpenEarthLab/PredBench で公開されています。

In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and appropriate experimental settings, and implementing multi-dimensional evaluations. This benchmark integrates 12 widely adopted methods with 15 diverse datasets across multiple application domains, offering extensive evaluation of contemporary spatio-temporal prediction networks. Through meticulous calibration of prediction settings across various applications, PredBench ensures evaluations relevant to their intended use and enables fair comparisons. Moreover, its multi-dimensional evaluation framework broadens the analysis with a comprehensive set of metrics, providing deep insights into the capabilities of models. The findings from our research offer strategic directions for future developments in the field. Our codebase is available at https://github.com/OpenEarthLab/PredBench.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# 無限運動:長文命令による拡張運動生成

Infinite Motion: Extended Motion Generation via Long Text Instructions ( http://arxiv.org/abs/2407.08443v2 )

ライセンス: Link先を確認

Mengtian Li, Chengshuo Zhai, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang,

(参考訳) モーションジェネレーションの領域では、長時間で高品質なモーションシーケンスの作成は依然として重要な課題である。本稿では,長文テキストを長時間の動き生成に活用し,短時間と長時間の動き合成のギャップを効果的に埋める新しい手法である「無限運動」に関する画期的な研究について述べる。私たちの中核となる洞察は、既存の高品質なテキストモーションデータセットの戦略的拡張と再組み立てであり、それによって、拡張されたモーションシーケンスのためのモデルのトレーニングを容易にする新しいベンチマークデータセットが作成されました。我々のモデルの重要な革新は、任意の長さのテキストを入力として受け入れることであり、特定の物語やシナリオに合わせた動き列の生成を可能にする。さらに、テキストのタイムスタンプ設計を取り入れ、生成したシーケンス内の局所セグメントの正確な編集を可能にし、動き合成において比類のない制御性と柔軟性を提供する。さらに、自然言語インタラクティブな編集、長いシーケンス内の動作シーケンスの編集、独立した動きシーケンスのスプライシングという3つの応用を通して、「無限運動」の汎用性と実用性を実証する。各アプリケーションは、我々のアプローチの適応性を強調し、モーションジェネレーションにおける研究と開発の可能性の範囲を広げる。大規模な実験を通じて,既存手法と比較して長いシーケンスの動作生成におけるモデルの優れた性能を実証する。プロジェクトページ: https://shuochengzhai.github.io/Infinite-motion.github.io/

In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that uses long text to drive extended motion generation, effectively bridging the gap between short- and long-duration motion synthesis. Our core insight is the strategic extension and reassembly of existing high-quality text-motion datasets, which has led to the creation of a novel benchmark dataset to facilitate the training of models for extended motion sequences. A key innovation of our model is its ability to accept text of arbitrary length as input, enabling the generation of motion sequences tailored to specific narratives or scenarios. Furthermore, we incorporate a timestamp design for text that allows precise editing of local segments within the generated sequences, offering unparalleled control and flexibility in motion synthesis. We further demonstrate the versatility and practical utility of "Infinite Motion" through three specific applications: natural language interactive editing, motion sequence editing within long sequences, and splicing of independent motion sequences. Each application highlights the adaptability of our approach and broadens the spectrum of possibilities for research and development in motion generation. Through extensive experiments, we demonstrate the superior performance of our model in generating long motion sequences compared to existing methods. Project page: https://shuochengzhai.github.io/Infinite-motion.github.io/

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12

# Neural Poisson Solver: 自然信号ブレンディングのための普遍的で継続的なフレームワーク

Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending ( http://arxiv.org/abs/2407.08457v2 )

ライセンス: Link先を確認

Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao,

(参考訳) Implicit Neural Representation (INR)は、視覚信号(例えば、2D画像や3Dシーン)を表現し、様々なダウンストリームアプリケーションで有望な結果を示す一般的な方法となっている。視覚信号の媒体としての可能性を考えると、INRを利用したニューラルブレンディング法の開発は自然な進歩である。ニューラルブレンディングは、2つのINRをマージして、両方の元の表現から情報をカプセル化する新しいINRを作成する。直接的アプローチでは、INRレンダリングプロセスに従来の画像編集手法を適用する。しかし、この手法はしばしばブレンディングの歪み、アーティファクト、色の変化を引き起こす。主な原因は、下層の画素格子の離散化と、変分問題を解くための境界条件の導入である。この問題に対処するために,INRによって表現される視覚信号をブレンドするための,プラグアンドプレイで普遍的に適用可能なフレームワークであるNeural Poisson Solverを導入する。我々のニューラル・ポアソン・ソルバーは連続ポアソン方程式に基づく変分問題解決手法を提供し、様々な領域で例外的な性能を示す。具体的には、変分問題の解法過程を表現するための勾配誘導型ニューラルソルバを提案し、対象信号を精製して自然なブレンディング結果を得る。また,ポアソン方程式に基づく損失と最適化手法を開発し,入力されたINRシーンを効果的にブレンドし,固有の構造と意味的内容を保存する。追加の事前知識への依存の欠如により,本手法は様々なタスクカテゴリに適応しやすく,その汎用性を強調している。総合的な実験結果は、複数の次元とブレンディングタスクにわたるアプローチの頑健さを検証した。

Implicit Neural Representation (INR) has become a popular method for representing visual signals (e.g., 2D images and 3D scenes), demonstrating promising results in various downstream applications. Given its potential as a medium for visual signals, exploring the development of a neural blending method that utilizes INRs is a natural progression. Neural blending involves merging two INRs to create a new INR that encapsulates information from both original representations. A direct approach involves applying traditional image editing methods to the INR rendering process. However, this method often results in blending distortions, artifacts, and color shifts, primarily due to the discretization of the underlying pixel grid and the introduction of boundary conditions for solving variational problems. To tackle this issue, we introduce the Neural Poisson Solver, a plug-and-play and universally applicable framework across different signal dimensions for blending visual signals represented by INRs. Our Neural Poisson Solver offers a variational problem-solving approach based on the continuous Poisson equation, demonstrating exceptional performance across various domains. Specifically, we propose a gradient-guided neural solver to represent the solution process of the variational problem, refining the target signal to achieve natural blending results. We also develop a Poisson equation-based loss and optimization scheme to train our solver, ensuring it effectively blends the input INR scenes while preserving their inherent structure and semantic content. The lack of dependence on additional prior knowledge makes our method easily adaptable to various task categories, highlighting its versatility. Comprehensive experimental results validate the robustness of our approach across multiple dimensions and blending tasks.

翻訳日:2024-07-16 03:48:26 公開日:2024-07-12
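
For context on the abstract above, the sketch below illustrates the classical discrete Poisson blending idea that the paper contrasts with its continuous formulation: keep the second differences (the discrete Laplacian) of a source signal while matching the target's boundary values. It is a 1D toy solved with Gauss-Seidel sweeps, not the Neural Poisson Solver itself, and the function name `poisson_blend_1d` is hypothetical.

```python
def poisson_blend_1d(source, left, right, iterations=2000):
    """Blend `source` into a gap whose boundary values are `left` and `right`.

    Solves f[i-1] - 2*f[i] + f[i+1] = s[i-1] - 2*s[i] + s[i+1] on the interior,
    with f[0] = left and f[-1] = right, using Gauss-Seidel sweeps.
    """
    n = len(source)
    f = list(source)
    f[0], f[-1] = left, right                  # Dirichlet boundary conditions
    for _ in range(iterations):
        for i in range(1, n - 1):
            # Discrete Laplacian of the source at i (the "guidance" term).
            lap_s = source[i - 1] - 2 * source[i] + source[i + 1]
            # Rearranged update so that f has the same Laplacian as the source.
            f[i] = 0.5 * (f[i - 1] + f[i + 1] - lap_s)
    return f


# Hypothetical usage: a quadratic source pasted between boundary values 10 and 20.
src = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
print(poisson_blend_1d(src, 10.0, 20.0))
```

In 1D the exact solution is the source plus a straight line that absorbs the boundary mismatch, so the pasted signal keeps its shape while meeting the target smoothly at both ends.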

# マルチモーダル顔画像テキストデータセット1500万

15M Multimodal Facial Image-Text Dataset ( http://arxiv.org/abs/2407.08515v2 )

ライセンス: Link先を確認

Dawei Dai, YuTang Li, YingGe Liu, Mingming Jia, Zhang YuanHui, Guoyin Wang,

(参考訳) 現在、画像テキスト駆動型マルチモーダルディープラーニングモデルは、多くの分野でその顕著な可能性を実証している。実際には、顔画像を中心としたタスクは幅広い応用可能性を持っている。本稿では,自然言語記述(顔画像からテキストへ)を伴う大規模・多様・高品質な顔画像データセットであるFaceCaption-15Mを提示する。このデータセットは、顔中心タスクの研究を容易にすることを目的としている。 FaceCaption-15Mは、1500万対以上の顔画像と、それに対応する顔の特徴の自然言語記述で構成されており、これまでで最大の顔画像キャプションデータセットとなっている。画像品質, テキストの自然性, テキストの複雑さ, テキスト画像の関連性を総合的に分析し, FaceCaption-15Mの優位性を実証した。 FaceCaption-15Mの有効性を検証するために,顔画像と対応するキャプションを特徴空間で整列させるために,まず顔言語画像事前学習モデル(FLIP,CLIPと類似)を訓練した。その後、画像エンコーダとテキストエンコーダを併用し、線形層のみを微調整することで、FLIPベースのモデルでは、2つの課題のある顔中心タスクに対して最先端の結果が得られた。目的は、FaceCaption-15Mデータセットの公開を通じて、顔関連タスクの研究を促進することである。すべてのデータ、コード、モデルは公開されています。 https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M

Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M, a large-scale, diverse, and high-quality dataset of facial images accompanied by their natural language descriptions (facial image-to-text). This dataset aims to facilitate research on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and their corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date. We conducted a comprehensive analysis of image quality, text naturalness, text complexity, and text-image relevance to demonstrate the superiority of FaceCaption-15M. To validate the effectiveness of FaceCaption-15M, we first trained a facial language-image pre-training model (FLIP, similar to CLIP) to align each facial image with its corresponding caption in feature space. Subsequently, using both image and text encoders and fine-tuning only the linear layer, our FLIP-based models achieved state-of-the-art results on two challenging face-centered tasks. The purpose is to promote research in the field of face-related tasks through the availability of the proposed FaceCaption-15M dataset. All data, code, and models are publicly available. https://huggingface.co/datasets/OpenFace-CQUPT/FaceCaption-15M

翻訳日:2024-07-16 03:38:34 公開日:2024-07-12
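
The abstract above describes FLIP only as "similar to CLIP", so the exact training loss is not given here. As a hedged illustration of what aligning images and captions in feature space usually means in CLIP-style training, the sketch below computes a symmetric contrastive (InfoNCE) loss over a small batch in plain Python. The function names and the fixed `temperature=0.07` are assumptions for illustration, not details confirmed by the paper.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: pair k of images matches pair k of texts."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[r][c] = scaled cosine similarity between image r and text c.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    # Image-to-text: each row should peak on its diagonal entry.
    loss_i2t = sum(_cross_entropy(sim[k], k) for k in range(n)) / n
    # Text-to-image: each column should peak on its diagonal entry.
    loss_t2i = sum(_cross_entropy([sim[r][k] for r in range(n)], k)
                   for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Hypothetical toy embeddings: matched pairs give a much lower loss.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Minimizing this loss pulls each image embedding toward its own caption and away from the other captions in the batch, which is the sense in which the two encoders become aligned.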

# 時空間的フェデレーション学習のグラディエント・インバージョン・アタックに対するプライバシー強化

Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks ( http://arxiv.org/abs/2407.08529v2 )

ライセンス: Link先を確認

Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa,

(参考訳) 時空間フェデレーション学習は、様々な位置情報ベースのサービスにおいて、共有勾配しか持たない価値あるモデルを訓練する能力のために、近年、集中的な研究が進められている。一方、最近の研究では、画像やテキスト上での共有勾配は、勾配反転攻撃(GIA)を受ける可能性があることが示されている。しかし、現在、時空間学習における勾配反転攻撃に関する体系的な研究は行われていない。本稿では,攻撃と防衛の観点からの時空間的フェデレーション学習における勾配攻撃問題について検討する。まず、時空間学習におけるプライバシーリスクを理解するために、時空間データに適した勾配攻撃アルゴリズムである時空間勾配反転攻撃(ST-GIA)を提案する。さらに、時空間学習における勾配反転攻撃を軽減するための適応的な防御戦略を設計する。摂動レベルを動的に調整することで、さまざまなトレーニングデータに対して、適切な保護を提供することができます。実世界の3つのデータセットに対する集中的な実験分析により、提案した防衛戦略が、効果的なセキュリティ保護を備えた時空間フェデレーション学習の有用性を十分に維持できることが明らかとなった。

Spatiotemporal federated learning has recently attracted intensive study due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or text. However, so far there has not been any systematic study of gradient inversion attacks in spatiotemporal federated learning. In this paper, we explore the gradient attack problem in spatiotemporal federated learning from attack and defense perspectives. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversion Attack (ST-GIA), a gradient attack algorithm tailored to spatiotemporal data that successfully reconstructs the original location from gradients. Furthermore, we design an adaptive defense strategy to mitigate gradient inversion attacks in spatiotemporal federated learning. By dynamically adjusting the perturbation levels, we can offer tailored protection for varying rounds of training data, thereby achieving a better trade-off between privacy and utility than current state-of-the-art methods. Through intensive experimental analysis on three real-world datasets, we reveal that the proposed defense strategy can well preserve the utility of spatiotemporal federated learning with effective security protection.

翻訳日:2024-07-16 03:38:34 公開日:2024-07-12
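
To make the privacy risk in the abstract above concrete, the sketch below shows the simplest textbook case of gradient leakage: for a single linear layer with a bias and squared loss, the weight gradient is the input scaled by the bias gradient, so one division recovers the input exactly. This is a toy closed-form case under stated assumptions, not the ST-GIA algorithm; the function names, model, and example coordinates are hypothetical.

```python
def gradients(w, b, x, y):
    """Gradients of the loss (w.x + b - y)^2 for one training example."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [2.0 * residual * xi for xi in x]   # dL/dw = 2 * residual * x
    grad_b = 2.0 * residual                      # dL/db = 2 * residual
    return grad_w, grad_b


def invert(grad_w, grad_b):
    """Recover the private input from the shared gradients: x = grad_w / grad_b."""
    if grad_b == 0.0:
        raise ValueError("zero residual: the gradients carry no information")
    return [g / grad_b for g in grad_w]


# Hypothetical client data: a (latitude, longitude) pair kept on the device.
location = [35.6812, 139.7671]
grad_w, grad_b = gradients(w=[0.1, -0.2], b=0.5, x=location, y=1.0)

# The server only sees grad_w and grad_b, yet it can reconstruct the location.
print(invert(grad_w, grad_b))
```

Defenses of the kind the abstract describes perturb the shared gradients before upload; with added noise the division above no longer returns the exact input, at some cost in model utility.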

# IDAT: 対話型タスクソービングエージェントの構築と評価のためのマルチモーダルデータセットとツールキット

IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents ( http://arxiv.org/abs/2407.08898v1 )

ライセンス: Link先を確認

Shrestha Mohanty, Negar Arabzadeh, Andrea Tupini, Yuxuan Sun, Alexey Skrynnik, Artem Zholus, Marc-Alexandre Côté, Julia Kiseleva,

(参考訳) AIエージェントと自然言語を用いた人間とのシームレスな対話は、AI研究の重要な目標である。本稿では,NeurIPSにおけるIGLUコンペティションを通じて,自然言語命令の理解と実行が可能な対話型エージェントを開発する上での課題について述べる。進歩にもかかわらず、適切なデータセットの不足や効果的な評価プラットフォームの必要性といった課題が続いている。 Minecraftのような環境で対話的な接地言語命令を収集するためのスケーラブルなデータ収集ツールを導入する。さらに,人間アノテータとのマルチターン通信による定性解析とエージェント性能の比較を行うための,Human-in-the-Loopインタラクティブ評価プラットフォームを提案する。我々は、知的な対話型AIエージェントの開発を促進し、さらなる研究に不可欠なリソースを提供することを目的として、IDAT(IGLU Dataset And Toolkit)と呼ばれるこれらの資産をコミュニティに提供します。

Seamless interaction between AI agents and humans using natural language remains a key goal in AI research. This paper addresses the challenges of developing interactive agents capable of understanding and executing grounded natural language instructions through the IGLU competition at NeurIPS. Despite advancements, challenges such as a scarcity of appropriate datasets and the need for effective evaluation platforms persist. We introduce a scalable data collection tool for gathering interactive grounded language instructions within a Minecraft-like environment, resulting in a multi-modal dataset with around 9,000 utterances and over 1,000 clarification questions. Additionally, we present a Human-in-the-Loop interactive evaluation platform for qualitative analysis and comparison of agent performance through multi-turn communication with human annotators. We offer these assets, referred to as IDAT (IGLU Dataset And Toolkit), to the community; they aim to advance the development of intelligent, interactive AI agents and provide essential resources for further research.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# 自閉症児の治療における人工知能の医療専門家・介護者支援への応用

Application of Artificial Intelligence in Supporting Healthcare Professionals and Caregivers in Treatment of Autistic Children ( http://arxiv.org/abs/2407.08902v1 )

ライセンス: Link先を確認

Hossein Mohammadi Rouzbahani, Hadis Karimipour,

(参考訳) 自閉症スペクトラム障害(Autism Spectrum Disorder、ASD)は、社会的相互作用の困難、コミュニケーション障害、反復行動に特徴付けられる多面的な神経発達状態である。 ASDの理解の進展にもかかわらず、その診断と治療は、症状学の多様性と多分野の医療アプローチの必要性により、大きな課題を呈し続けている。本稿では,医療従事者や介護者のASD管理能力を高める人工知能(AI)の可能性について検討する。我々は,自閉症児と非自閉症児の日常活動における表情と身体の表情を解析するための高度なアルゴリズムを開発し,強力な深層学習に基づく自閉症検出システムの開発に繋がった。我々の研究は、AIモデル、特にXceptionとResNet50V2アーキテクチャがASDの診断において高い精度を実現していることを示した。本研究は, ASDの診断, 治療, 包括的管理の改善におけるAIの変革的可能性を強調する。

Autism Spectrum Disorder (ASD) represents a multifaceted neurodevelopmental condition marked by difficulties in social interaction, communication impediments, and repetitive behaviors. Despite progress in understanding ASD, its diagnosis and treatment continue to pose significant challenges due to the variability in symptomatology and the necessity for multidisciplinary care approaches. This paper investigates the potential of Artificial Intelligence (AI) to augment the capabilities of healthcare professionals and caregivers in managing ASD. We have developed a sophisticated algorithm designed to analyze facial and bodily expressions during daily activities of both autistic and non-autistic children, leading to the development of a powerful deep learning-based autism detection system. Our study demonstrated that AI models, specifically the Xception and ResNet50V2 architectures, achieved high accuracy in diagnosing ASD. This research highlights the transformative potential of AI in improving the diagnosis, treatment, and comprehensive management of ASD.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# TensorTEE: 安全なコラボレーション型テンソルコンピューティングのための不均一なTEE粒度の統合

TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing ( http://arxiv.org/abs/2407.08903v1 )

ライセンス: Link先を確認

Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, Ling Liang, Jianan Mu, Xiaqing Li, Tianyun Ma, Pengwei Jin, Xinkai Song, Zidong Du, Qi Guo, Xing Hu,

(参考訳) NPUとCPUによる不均一なコラボレーティブコンピューティングは、その性能上の利点から広く注目を集めている。コンピューティングにおけるデータの機密性と整合性を確保するため、Trusted Execution Environments (TEE) は比較的低いオーバーヘッドのため、有望なソリューションであると考えられている。しかし、既存の異種TEE設計は、CPUとNPUのメモリの粒度が微妙で異なるため、協調コンピューティングでは非効率である。 1) CPU TEEのキャッシュラインの粒度は、余分なメモリアクセスによるメモリ圧力を増大させ、 2)NPUのキャッシュライン粒度MACは、限られたメモリストレージの圧力を増大させる。 3) 異種エンクレーブ間のデータ転送は非セキュア領域の転送に依存しており, 煩雑な再暗号化とスケジューリングを行う。これらの問題に対処するために,効率的な協調テンソル計算のための統一テンソル粒度不均一TEEであるTensorTEEを提案する。まず,CPUTEEにおけるテンソルの粒度を仮想的にサポートし,チップ上のテンソル構造を検出し維持することにより,オフチップメタデータアクセスを除去する。第2に,オフチップMACストレージとアクセスを排除しつつ,計算停止を回避するために,予測実行を伴うテンソル粒度MAC管理を提案する。さらに、統一された粒度に基づいて、再暗号化やジレンマのスケジューリングを行わずに直接データ転送を可能にする。本評価は,改良されたGem5とサイクル精度NPUシミュレータ上に構築した。その結果、TensorTEEは、既存の作業に比べてLarge Language Model(LLM)トレーニングワークロードのパフォーマンスを4.0倍改善し、非セキュアトレーニングに比べて2.1%オーバーヘッドしか発生せず、LLMトレーニングの実践的なセキュリティ保証を提供することがわかった。

Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEEs) are considered a promising solution because of their comparatively low overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing because the CPU and NPU use fine and mismatched memory granularities: 1) the cacheline granularity of the CPU TEE intensifies memory pressure through extra memory accesses; 2) the cacheline-granularity MAC of the NPU escalates the pressure on the limited memory storage; and 3) data transfer across heterogeneous enclaves relies on transit through non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in the CPU TEE to eliminate off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
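
The abstract above argues that per-cacheline MACs put pressure on memory storage, while one MAC per tensor does not. The sketch below is a software-only illustration of that storage trade-off using HMAC-SHA256 from the Python standard library. The 64-byte cacheline, the 8-byte truncated MAC, and all function names are assumptions chosen for illustration; real TEE hardware also binds addresses and counters into each MAC, which is omitted here, and this is not TensorTEE's design.

```python
import hashlib
import hmac

CACHELINE_BYTES = 64   # assumed cacheline size
MAC_BYTES = 8          # assumed truncated MAC length


def _mac(key, data):
    """Truncated HMAC-SHA256 tag over a block of bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()[:MAC_BYTES]


def cacheline_macs(key, tensor_bytes):
    """One MAC per 64-byte line: metadata grows linearly with tensor size."""
    return [
        _mac(key, tensor_bytes[off:off + CACHELINE_BYTES])
        for off in range(0, len(tensor_bytes), CACHELINE_BYTES)
    ]


def tensor_mac(key, tensor_bytes):
    """One MAC for the whole tensor: constant metadata per tensor."""
    return _mac(key, tensor_bytes)


# Hypothetical 4 KiB tensor protected both ways.
key = b"k" * 16
tensor = bytes(range(256)) * 16
per_line = cacheline_macs(key, tensor)
print(len(per_line) * MAC_BYTES, "bytes of MACs at cacheline granularity")
print(len(tensor_mac(key, tensor)), "bytes of MAC at tensor granularity")
```

The coarser granularity saves metadata but means integrity is only verified once the whole tensor has been read, which is consistent with the abstract's mention of predictive execution to avoid stalls.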

# ローレンツ共変物理ブラウン運動:古典的および量子的

Lorentz covariant physical Brownian motion: Classical and quantum ( http://arxiv.org/abs/2407.08905v1 )

ライセンス: Link先を確認

Henryk Gzyl,

(参考訳) 本研究では,Goldstein-Kac速度スイッチングモデルについて,二つの視点から再検討する。一方では、軌道を固有時でパラメータ化したとき、確率過程の前方および後方のチャップマン・コルモゴロフ方程式がローレンツ共変であることを示す。他方では、このモデルを量子ランダム進化として再定式化するために、Goldstein-Kacモデルをハミルトニアン系として書き直し、標準対応規則を用いて量子化する。ランダムな量子進化の密度は古典的な場合と同様のチャップマン・コルモゴロフ方程式を満たすことが判明し、従ってローレンツ共変である。平均量子分散を計算する。最後に、量子モデルは特殊相対性理論とも一致しており、光円錐の外側の遷移、すなわち時空で互いに素な台を持つ状態間の遷移は起こらないことを検証する。

In this work, we re-examine the Goldstein-Kac velocity switching model from two points of view. On the one hand, we prove that the forward and backward Chapman-Kolmogorov equations of the stochastic process are Lorentz covariant when the trajectories are parameterized by their proper time. On the other hand, to recast the model as a quantum random evolution, we consider restating the Goldstein-Kac model as a Hamiltonian system, which can then be quantized using the standard correspondence rules. It turns out that the density for the random quantum evolution satisfies a Chapman-Kolmogorov equation similar to that of the classical case, and therefore, it is also Lorentz covariant. We compute the average quantum variance. To finish, we verify that the quantum model is also consistent with special relativity and that transitions outside the light cone, that is, transitions between states with disjoint supports in space-time, cannot occur.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12
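
As background for the abstract above, the classical Goldstein-Kac velocity switching model describes a particle on a line that moves at constant speed and reverses direction at the events of a Poisson process. The sketch below simulates one path of that classical model in laboratory time. The parameter names `c`, `rate`, and `t_end` are illustrative, and the proper-time parameterization and the quantization discussed in the paper are not modeled here.

```python
import random


def simulate_telegraph(c, rate, t_end, rng):
    """Return the position at time t_end of a particle that starts at 0,
    moves with velocity +c or -c, and flips sign at Poisson rate `rate`."""
    x, t = 0.0, 0.0
    v = c if rng.random() < 0.5 else -c     # random initial direction
    while True:
        dt = rng.expovariate(rate)          # waiting time until the next flip
        if t + dt >= t_end:
            return x + v * (t_end - t)      # coast to the end of the window
        x += v * dt
        t += dt
        v = -v                              # velocity switch


# Hypothetical parameters: unit speed, two flips per unit time on average.
rng = random.Random(0)
positions = [simulate_telegraph(1.0, 2.0, 5.0, rng) for _ in range(1000)]

# Every path stays inside the "light cone" |x| <= c * t.
print(max(abs(x) for x in positions) <= 5.0 + 1e-9)
```

Because the speed never exceeds `c`, no sample can leave the interval from `-c * t_end` to `c * t_end`, which is the classical counterpart of the light-cone constraint the abstract verifies for the quantum model.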

# AirSketch: スケッチのための生成モーション

AirSketch: Generative Motion to Sketch ( http://arxiv.org/abs/2407.08906v1 )

ライセンス: Link先を確認

Hui Xian Grace Lim, Xuanming Cui, Yogesh S Rawat, Ser-Nam Lim,

(参考訳) イラストレーションは人間の表現とコミュニケーションの基本的なモードである。音声に付随するある種の動きは、この説明的なコミュニケーションのモードを提供することができる。 Augmented and Virtual Reality Technologies (AR/VR) は手の動き(空気描画)を描画するツールを導入したが、通常は高価なハードウェアと追加のデジタルマーカーが必要であり、それによってアクセシビリティとポータビリティが制限される。さらに、空気描画は美的な結果を得るためにかなりの技術を必要とする。これらの課題に対処するために,手の動きから直接忠実で視覚的に整合したスケッチを生成し,複雑なヘッドセットやマーカーを必要としないAirSketchの概念を紹介した。制御可能な画像拡散モデルにより、ノイズの多い手追跡画像から、クリーンで美的なスケッチへの変換を学習し、元の追跡データから不可欠な視覚的手がかりを保ちながら、簡単な拡張ベースの自己教師付き訓練手順を考案する。この問題を研究するために,空気描画データセットを2つ提示する。以上の結果から,空間的正確な入力から写真実写画像を生成するだけでなく,制御可能な画像拡散により,ノイズの多い入力から鮮明なスケッチを効果的に作成できることが示唆された。我々の研究は、マーカーレス空気描画への最初のステップとして機能し、AirSketchやAR/VR全般に制御可能な拡散モデルの異なる応用を明らかにする。

Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# 画像が同じか?画像検索における人間とAIの協調のための概念ボトルネックモデルの適用

Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval ( http://arxiv.org/abs/2407.08908v1 )

ライセンス: Link先を確認

Vaibhav Balloli, Sara Beery, Elizabeth Bondi-Kelly,

(参考訳) 画像検索は、野生生物保護から医療への応用において、個々の動物や関連画像を見つけるために重要な役割を果たす。画像検索のためのディープラーニング技術は著しく進歩しているが、現実世界での性能は不完全であり、人間の専門知識を取り入れる必要がしばしばある。ヒューマン・イン・ザ・ループ(Human-in-the-loop)アプローチは一般的に、人間が独立してタスクを完了し、その意見をさまざまな方法でAIモデルと組み合わせる。これは、これらのモデルが解釈可能性や修正可能性をほとんど提供しないためである。代わりに人間がAIモデルに介入できるようにし、人間の時間と労力を節約するために、概念ボトルネックモデル(CBM)を適用してCHAIRを提案する。 CHAIRは (a) 人間が中間概念を修正できるようにして生成される埋め込みの改善を助け、(b) より優れた検索のために、様々なレベルの人間の専門知識に対応する柔軟な介入を可能にする。本手法は, 外部介入を伴わずに, 画像検索指標において類似モデルよりも優れた性能を示す。さらに,人間の介入によって検索性能がさらに向上し,人間とAIの相補性が達成されることを示す。

Image retrieval plays a pivotal role in applications from wildlife conservation to healthcare, for finding individual animals or relevant images to aid diagnosis. Although deep learning techniques for image retrieval have advanced significantly, their imperfect real-world performance often necessitates including human expertise. Human-in-the-loop approaches typically rely on humans completing the task independently and then combining their opinions with an AI model in various ways, as these models offer very little interpretability or correctability. To allow humans to intervene in the AI model instead, thereby saving human time and effort, we adapt the Concept Bottleneck Model (CBM) and propose CHAIR. CHAIR (a) enables humans to correct intermediate concepts, which helps improve the generated embeddings, and (b) allows for flexible levels of intervention that accommodate varying levels of human expertise for better retrieval. To show the efficacy of CHAIR, we demonstrate that our method performs better than similar models on image retrieval metrics without any external intervention. Furthermore, we also showcase how human intervention helps further improve retrieval performance, thereby achieving human-AI complementarity.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# KGpose: ポイントワイズポーズ投票によるキーポイントグラフ駆動型エンド・ツー・エンド多目的6Dポーズ推定

KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting ( http://arxiv.org/abs/2407.08909v1 )

ライセンス: Link先を確認

Andrew Jeong,

(参考訳) KGposeは、複数のオブジェクトの6Dポーズ推定のための新しいエンドツーエンドフレームワークである。提案手法は,キーポイントのグラフ表現である'keypoint-graph'を通じて,キーポイントベースの手法と学習可能なポーズ回帰を組み合わせる。 KGposeはまず、RGBとポイントクラウドの機能を融合したマルチモーダル機能を用いて、各オブジェクトの3Dキーポイントを推定する。これらのキーポイントは点雲の各点から推定され、グラフ表現に変換される。ネットワークは、グラフ畳み込みで設計されたキーポイントグラフ埋め込みと局所グラフ埋め込みのシーケンスを通じて、各ポイントの6Dパラメータを直接回帰し、その後に回転と変換ヘッドが続く。各オブジェクトの最終ポーズは、ポイントワイズ予測候補から選択される。本手法は,本モデルの有効性を実証し,ベンチマークデータセット上での競合結果を実現する。 KGposeは、ロボットアプリケーションのための複雑なシーンにおける幾何学的コンテキストを理解するための統一的で効率的なソリューションを提供するため、追加のローカライゼーションステップを必要とせずに、多目的ポーズ推定を可能にする。

This letter presents KGpose, a novel end-to-end framework for 6D pose estimation of multiple objects. Our approach combines a keypoint-based method with learnable pose regression through a 'keypoint-graph', which is a graph representation of the keypoints. KGpose first estimates 3D keypoints for each object using an attentional multi-modal feature fusion of RGB and point cloud features. These keypoints are estimated from each point of the point cloud and converted into a graph representation. The network directly regresses 6D pose parameters for each point through a sequence of keypoint-graph embedding and local graph embedding, both designed with graph convolutions, followed by rotation and translation heads. The final pose for each object is selected from the candidates of point-wise predictions. The method achieves competitive results on the benchmark dataset, demonstrating the effectiveness of our model. KGpose enables multi-object pose estimation without requiring an extra localization step, offering a unified and efficient solution for understanding geometric contexts in complex scenes for robotic applications.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# PAIL:カーボンニュートラル最適化のための性能に基づく逆模倣学習エンジン

PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization ( http://arxiv.org/abs/2407.08910v1 )

ライセンス: Link先を確認

Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong,

(参考訳) 工業運転における炭素中立化は、持続可能な開発にますます不可欠になっている。これは重要な課題であり、業界4.0における運用最適化の重要な機会でもある。近年、深層強化学習(DRL)に基づく手法は、逐次最適化プロセスの有望な拡張を提供し、二酸化炭素排出量の削減に利用できる。しかし、既存のDRL法では、各アクションが最終的な持続可能な開発目標(SDG)に与える影響を評価するために、事前に定義された報酬関数が必要である。多くの実応用において、そのような報酬関数は事前に与えられない。そこで本研究では,PAIL(Performance Based Adversarial Imitation Learning)エンジンを提案する。これは、事前に定義されたアクション報酬を伴わずに、炭素中立性のための最適な操作ポリシーを取得するための新しい方法である。具体的には、Transformerベースのポリシージェネレータを使用して、履歴情報をエンコードし、多次元空間内の後続のアクションを予測する。アクションシーケンス全体を環境シミュレータによって反復的に更新する。次に、PAILは判別器を用いて、生成されたシーケンスと高SDGの実世界のサンプルとの差を最小限にする。並行して、Qラーニングフレームワークに基づくパフォーマンス推定器は、各アクションがSDGに与える影響を推定するために設計されている。これらの推定に基づいて、PAILは識別器と性能推定器の両方の報酬で生成されたポリシーを洗練する。 PAILは、複数の実世界のアプリケーションケースとデータセットで評価される。実験結果は,他の最先端ベースラインと比較したPAILの有効性を示した。さらに、PAILは炭素中立性の最適化に有意義な解釈性を提供する。

Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in Industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods have offered promising enhancements for sequential optimization processes and can be used for reducing carbon emissions. However, existing DRL methods need a pre-defined reward function to assess the impact of each action on the final sustainable development goals (SDG). In many real applications, such a reward function cannot be given in advance. To address the problem, this study proposes a Performance based Adversarial Imitation Learning (PAIL) engine. It is a novel method to acquire optimal operational policies for carbon neutrality without any pre-defined action rewards. Specifically, PAIL employs a Transformer-based policy generator to encode historical information and predict the following actions within a multi-dimensional space. The entire action sequence is iteratively updated by an environmental simulator. Then PAIL uses a discriminator to minimize the discrepancy between generated sequences and real-world samples of high SDG. In parallel, a performance estimator based on a Q-learning framework is designed to estimate the impact of each action on SDG. Based on these estimates, PAIL refines generated policies with the rewards from both the discriminator and the performance estimator. PAIL is evaluated on multiple real-world application cases and datasets. The experimental results demonstrate the effectiveness of PAIL compared with other state-of-the-art baselines. In addition, PAIL offers meaningful interpretability for the optimization in carbon neutrality.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# 高度な機械学習による映画推薦の変換:NMF,SVD,K-Meansクラスタリングの検討

Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering ( http://arxiv.org/abs/2407.08916v1 )

ライセンス: Link先を確認

Yubing Yan, Camille Moreau, Zhuoyue Wang, Wenhan Fan, Chengqian Fu,

(参考訳) 本研究では,Non-Negative Matrix Factorization (NMF),Truncated Singular Value Decomposition (SVD),K-Means Clusteringなどの機械学習技術を用いて,ロバストな映画推薦システムを開発した。主な目的は、パーソナライズされた映画レコメンデーションを提供することでユーザーエクスペリエンスを向上させることである。この研究は、データ前処理、モデルトレーニング、評価を含み、採用手法の有効性を強調している。その結果,提案システムはレコメンデーションの精度と妥当性が高く,レコメンデーションシステムの分野に多大な貢献をしていることがわかった。

This study develops a robust movie recommendation system using various machine learning techniques, including Non-Negative Matrix Factorization (NMF), Truncated Singular Value Decomposition (SVD), and K-Means clustering. The primary objective is to enhance user experience by providing personalized movie recommendations. The research encompasses data preprocessing, model training, and evaluation, highlighting the efficacy of the employed methods. Results indicate that the proposed system achieves high accuracy and relevance in recommendations, making significant contributions to the field of recommendation systems.
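As a rough illustration of the clustering part of such a pipeline, the sketch below groups users with K-Means and recommends the unrated movie that a user's cluster rates highest. It is a minimal toy under stated assumptions, not the authors' system: the rating matrix, the choice of initial centroids, the treatment of an unrated movie as a rating of 0, and the helper names `kmeans` and `recommend` are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two rating vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain K-Means with caller-supplied initial centroids (deterministic)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def recommend(user, ratings, labels, centroids, top_n=1):
    """Rank the user's unrated movies by the mean rating in the user's cluster."""
    centroid = centroids[labels[user]]
    unseen = [i for i, r in enumerate(ratings[user]) if r == 0]
    return sorted(unseen, key=lambda i: centroid[i], reverse=True)[:top_n]


# Hypothetical data: rows are users, columns are movies, 0 means "not rated".
# Simplification: unrated entries count as 0 when distances and means are computed.
ratings = [
    [5, 4, 0, 0],
    [4, 5, 5, 0],
    [1, 0, 1, 5],
    [0, 1, 2, 4],
]
labels, centroids = kmeans(ratings, [ratings[0], ratings[2]])
suggestion = recommend(0, ratings, labels, centroids)
```

A fuller system would typically factorize the rating matrix first (NMF or truncated SVD) and cluster in the resulting latent space; that step is omitted here to keep the sketch dependency-free.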

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# 進化的多タスク最適化における知識伝達の探索--複雑ネットワークの視点から

Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective ( http://arxiv.org/abs/2407.08918v1 )

ライセンス: Link先を確認

Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu,

(参考訳) 進化的多タスク最適化(EMaTO)の分野は、繰り返し特性による最適化課題の解決を合理化し、計算資源を保存できることで、ますます認識されている。本稿では,個々のタスク評価の計算要求によって複雑化するタスクであるEMATO内で,効率的な知識伝達機構を構築することの課題に取り組む。本稿では,EMATO内のタスク間の知識伝達のダイナミクスを包括的に解析するために,複雑なネットワークを用いた新しいフレームワークを提案する。既存のEMATOアルゴリズムから知識伝達ネットワークを抽出し、精査することにより、ネットワーク修正が全体的なアルゴリズムの有効性に与える影響を評価する。その結果,これらのネットワークは多様であり,ネットワーク密度は異なるタスクセットに適応し,コミュニティ構造を指向したグラフ特性を示すことが示唆された。本研究は、複雑なネットワーク概念をEMATOに統合し、知識伝達プロセスを洗練し、将来的なドメインの進歩への道を開くことの可能性を実証するものである。

The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task evaluations. We introduce a novel framework that employs a complex network to comprehensively analyze the dynamics of knowledge transfer between tasks within EMaTO. By extracting and scrutinizing the knowledge transfer network from existing EMaTO algorithms, we evaluate the influence of network modifications on overall algorithmic efficacy. Our findings indicate that these networks are diverse, displaying community-structured directed graph characteristics, with their network density adapting to different task sets. This research underscores the viability of integrating complex network concepts into EMaTO to refine knowledge transfer processes, paving the way for future advancements in the domain.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# ナノ合成メカニズム説明のための大規模言語モデルを活用する:固体基礎か単なる予想か?

Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? ( http://arxiv.org/abs/2407.08922v1 )

ライセンス: Link先を確認

Yingming Pu, Liping Huang, Tao Lin, Hongyu Chen,

(参考訳) 人工知能(AI)の急速な発展に伴い、GPT-4のような大規模言語モデル(LLM)は科学界で大きな注目を集め、科学的発見の進展に大きな可能性を示している。この進展は重要な疑問を提起する。これらのLLMは、現実世界の物理化学的原理とよく一致しているか? 現在の評価戦略は、物質的特性予測や名前認識などの事実に基づく知識を主に重視しているが、論理的推論を必要とする基本的な物理化学的メカニズムの理解が欠如していることが多い。このギャップを埋めるために,金ナノ粒子合成のメカニズムに焦点をあてた775個の多肢選択問題からなるベンチマークを開発した。既存の評価指標を振り返り、直接的な正誤評価が単に推測を示唆するにすぎないのではないかを問う。そこで本研究では,新しい評価指標である信頼度に基づくスコア(cスコア)を提案し,出力ロジットを調べて正解の正確な確率を導出する。広範な実験の結果,金ナノ粒子合成の文脈では,LLMは推測に頼るのではなく,基礎となる物理化学的機構を理解していることが示された。本研究は,LLMが本質的な科学的メカニズムを把握する可能性を強調し,より信頼性が高く効果的なAIツールを様々な科学領域で開発するための土台を築くものである。

With the rapid development of artificial intelligence (AI), large language models (LLMs) such as GPT-4 have garnered significant attention in the scientific community, demonstrating great potential in advancing scientific discovery. This progress raises a critical question: are these LLMs well-aligned with real-world physicochemical principles? Current evaluation strategies largely emphasize fact-based knowledge, such as material property prediction or name recognition, but they often lack an understanding of fundamental physicochemical mechanisms that require logical reasoning. To bridge this gap, our study developed a benchmark consisting of 775 multiple-choice questions focusing on the mechanisms of gold nanoparticle synthesis. By reflecting on existing evaluation metrics, we question whether a direct true-or-false assessment merely suggests conjecture. Hence, we propose a novel evaluation metric, the confidence-based score (c-score), which probes the output logits to derive the precise probability for the correct answer. Based on extensive experiments, our results show that in the context of gold nanoparticle synthesis, LLMs understand the underlying physicochemical mechanisms rather than relying on conjecture. This study underscores the potential of LLMs to grasp intrinsic scientific mechanisms and sets the stage for developing more reliable and effective AI tools across various scientific domains.
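The idea of scoring a model by the probability it places on the correct option, rather than by a right-or-wrong check, can be sketched as follows. This is a minimal illustration that assumes the score is the softmax probability of the correct choice over the option logits; the abstract does not give the exact c-score formula, and the logit values and function names are hypothetical.

```python
import math


def option_probabilities(logits):
    """Softmax over the logits of the answer options (numerically stabilized)."""
    peak = max(logits.values())
    exps = {option: math.exp(value - peak) for option, value in logits.items()}
    total = sum(exps.values())
    return {option: e / total for option, e in exps.items()}


def confidence_score(logits, correct):
    """Probability mass assigned to the correct option (assumed c-score form)."""
    return option_probabilities(logits)[correct]


# Hypothetical logits for one four-option question whose correct answer is "A".
logits = {"A": 2.0, "B": 0.5, "C": 0.1, "D": -1.0}
score = confidence_score(logits, "A")

# A model that guesses uniformly would score 0.25 on a four-option question,
# so scores well above that level suggest more than conjecture.
guess = confidence_score({"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}, "B")
```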

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# LLMによる難読化実行ファイルの逆アセンブル

Disassembling Obfuscated Executables with LLM ( http://arxiv.org/abs/2407.08924v1 )

ライセンス: Link先を確認

Huanyao Rong, Yue Duan, Hang Zhang, XiaoFeng Wang, Hongbo Chen, Shengchen Duan, Shen Wang,

(参考訳) 逆アセンブルは、特に逆アセンブルエラーを引き起こすように設計されたジャンクバイトを含む難読化された実行ファイルにとって、困難なタスクである。既存のソリューションはヒューリスティックや機械学習技術を利用するが、限られた成功しか達成できない。基本的に、そのような難読化は、バイナリ実行ファイルのセマンティクスを深く理解しなければ打ち破ることができず、その理解は大規模言語モデル(LLM)の出現によって可能になった。本稿では,難読化された実行ファイルの解析における課題を克服するために,新しいLLM駆動型逆アセンブラであるDisasLLMを提案する。 DisasLLMは、アセンブリコードスニペット内の命令が正しくデコードされているかどうかを判定するLLMベースの分類器と、このモデルを利用して難読化された実行ファイルをエンドツーエンドに逆アセンブルする逆アセンブル戦略の2つのコンポーネントで構成されている。我々は、DisasLLMを高度に難読化された実行ファイルの集合で評価し、他の最先端の逆アセンブルソリューションよりも大幅に優れていることを示した。

Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which are designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but achieve only limited success. Fundamentally, such obfuscation cannot be defeated without an in-depth understanding of the binary executable's semantics, which is made possible by the emergence of large language models (LLMs). In this paper, we present DisasLLM, a novel LLM-driven disassembler to overcome the challenge of analyzing obfuscated executables. DisasLLM consists of two components: an LLM-based classifier that determines whether an instruction in an assembly code snippet is correctly decoded, and a disassembly strategy that leverages this model to disassemble obfuscated executables end-to-end. We evaluated DisasLLM on a set of heavily obfuscated executables and show that it significantly outperforms other state-of-the-art disassembly solutions.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# ハードウェアデバイス上での複数量子回路のリソース認識スケジューリング

Resource-aware scheduling of multiple quantum circuits on a hardware device ( http://arxiv.org/abs/2407.08930v1 )

ライセンス: Link先を確認

Debasmita Bhoumik, Ritajit Majumdar, Susmita Sur-Kolay,

(参考訳) 最近の量子技術と量子エラー訂正符号は、望ましくないノイズを避けるために、量子回路を所定のハードウェアデバイスにマッピングしながら、最も近い隣り合う(NN)構成で相互作用する量子ビットを配置する必要性を強調している。 n < m として合計 n 量子ビットの回路を実行する際に、m 量子ビットを持つ量子ハードウェア装置における量子ビットの無駄を最小化することも同様に重要である。 2つの回路間のクロストークを防ぐために、レイアウト間のバッファ距離が必要となる。さらに、全ての量子ビットと2量子ビット相互作用が同じノイズレベルにあるわけではない。同じハードウェア上で複数の回路をスケジューリングすると、一部の回路が他の回路よりもノイズの多いレイアウトで実行される可能性がある。本稿では,各回路について事前に定義されたレイアウト品質を維持しながら,ハードウェア上での並列実行に可能な限り多くの回路をスケジュールする最適化問題について検討する。相互作用する量子ビット間の近接配置を保ちながら、最大忠実性を確保する整数線形プログラミング定式化を示す。我々の主張は、よく知られた量子回路ベンチマークを含む包括的な調査によって支持されている。このスケジューリング問題はNP困難であることが示されるため、27量子ビットと127量子ビットのハードウェアデバイスに対して、量子ビットと時間の観点でそれぞれ2倍と3倍優れた利用率を実現する、貪欲なヒューリスティック手法も提案する。

Recent quantum technologies and quantum error-correcting codes emphasize the requirement for arranging interacting qubits in a nearest-neighbor (NN) configuration while mapping a quantum circuit onto a given hardware device, in order to avoid undesirable noise. It is equally important to minimize the wastage of qubits in a quantum hardware device with m qubits while running circuits of n qubits in total, with n < m. In order to prevent cross-talk between two circuits, a buffer distance between their layouts is needed. Furthermore, not all qubits and two-qubit interactions have the same noise level. Scheduling multiple circuits on the same hardware may result in some circuits being executed on a noisier layout than others. In this paper, we consider an optimization problem that schedules as many circuits as possible for parallel execution on the hardware, while maintaining a pre-defined layout quality for each. We present an integer linear programming formulation that ensures maximum fidelity while preserving the nearest-neighbor arrangement among interacting qubits. Our assertion is supported by comprehensive investigations involving various well-known quantum circuit benchmarks. As this scheduling problem is shown to be NP-hard, we also propose a greedy heuristic method which provides 2x and 3x better utilization for 27-qubit and 127-qubit hardware devices respectively in terms of qubits and time.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# ライダーを用いた開語彙検出のためのLLMを用いたグローバルローカル協調推論

Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection ( http://arxiv.org/abs/2407.08931v1 )

ライセンス: Link先を確認

Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu,

(参考訳) Open-Vocabulary Detection (OVD)は、事前に定義されたオブジェクトクラスなしで、あるシーンですべての興味深いオブジェクトを検出するタスクである。 2D RGB画像のOVDに対する大規模な取り組みは行われているが、3D OVDの探索はまだ限られている。直感的には、ライダーポイントクラウドはオブジェクトレベルとシーンレベルの両方の3D情報を提供し、信頼できる検出結果を生成する。しかし、従来のライダーベースのOVD手法は、シーンレベルの情報の本質を無視して、オブジェクトレベルの特徴の使用のみに焦点を当てていた。本稿では、オブジェクトレベルの検出結果を生成するローカルブランチと、シーンレベルのグローバル機能を得るグローバルブランチを含む、ライダーベースのOVDタスクのためのグローバルローカル協調スキーム(GLIS)を提案する。グローバルなローカル情報を用いて、連鎖推定にLarge Language Model(LLM)を適用し、それに応じて検出結果を洗練することができる。さらに,Reflectioned Pseudo Labels Generation (RPLG) を提案し,高品質な擬似ラベルを生成するとともに,背景認識オブジェクトローカライゼーション (BAOL) を用いて正確なオブジェクト提案を選択する。 ScanNetV2 と SUN RGB-D の大規模な実験により,本手法の優位性を実証した。コードはhttps://github.com/GradiusTwinbee/GLISで公開されている。

Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done on OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, at both the object level and the scene level, to generate trustworthy detection results. However, previous lidar-based OVD methods focus only on the usage of object-level features, ignoring the importance of scene-level information. In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task, which contains a local branch to generate object-level detection results and a global branch to obtain scene-level global features. With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference, and the detection results can be refined accordingly. We further propose Reflected Pseudo Labels Generation (RPLG) to generate high-quality pseudo labels for supervision and Background-Aware Object Localization (BAOL) to select precise object proposals. Extensive experiments on ScanNetV2 and SUN RGB-D demonstrate the superiority of our methods. Code is released at https://github.com/GradiusTwinbee/GLIS.

翻訳日:2024-07-16 01:16:30 公開日:2024-07-12

# 動的環境における自律走行車両意思決定のための深部注意駆動型強化学習(DAD-RL)

Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Vehicle Decision-Making in Dynamic Environment ( http://arxiv.org/abs/2407.08932v1 )

ライセンス: Link先を確認

Jayabrata Chowdhury, Venkataramanan Shivaraman, Sumit Dangi, Suresh Sundaram, P. B. Sujit,

(参考訳) 都市環境における自律走行車(AV)の意思決定は、周囲の車両との動的相互作用のために本質的に困難である。安全な計画のためには、AVはシーン内の様々な時空間相互作用の重み付けを理解する必要がある。現代の研究では、トラジェクトリ予測を中心に相互作用を符号化するためにコロッサルトランスフォーマーアーキテクチャを使用しており、計算複雑性が増大している。時空間的理解と性能を損なうことなくこの問題に対処するため,エゴのRL駆動意思決定プロセスにおいて,周囲車両の意義を動的に割り当て,組み込む,DADRL(Deep Attention Driven Reinforcement Learning)フレームワークを提案する。 AV中心の時空間アテンション符号化(STAE)機構を導入し,周囲の車両との動的相互作用を学習する。地図と経路のコンテキストを理解するために,コンテキストマップから特徴を抽出するためにコンテキストエンコーダを用いる。時空間表現と文脈符号化の組み合わせは、包括的な状態表現を提供する。得られたモデルは、Soft Actor Critic (SAC)アルゴリズムを用いて訓練される。我々は,交通信号のないSMARTS都市ベンチマークの枠組みを検証し,DADRLが最近の最先端手法よりも優れていることを示す。さらに、アブレーション研究は、優れた性能を達成する上で、文脈エンコーダと時空間アテンションエンコーダの重要性を強調している。

Autonomous Vehicle (AV) decision making in urban environments is inherently challenging due to the dynamic interactions with surrounding vehicles. For safe planning, the AV must understand the weightage of various spatiotemporal interactions in a scene. Contemporary works use colossal transformer architectures to encode interactions mainly for trajectory prediction, resulting in increased computational complexity. To address this issue without compromising spatiotemporal understanding and performance, we propose the simple Deep Attention Driven Reinforcement Learning (DADRL) framework, which dynamically assigns and incorporates the significance of surrounding vehicles into the ego vehicle's RL-driven decision-making process. We introduce an AV-centric spatiotemporal attention encoding (STAE) mechanism for learning the dynamic interactions with different surrounding vehicles. To understand map and route context, we employ a context encoder to extract features from context maps. The spatiotemporal representations combined with contextual encoding provide a comprehensive state representation. The resulting model is trained using the Soft Actor Critic (SAC) algorithm. We evaluate the proposed framework on the SMARTS urban benchmarking scenarios without traffic signals to demonstrate that DADRL outperforms recent state-of-the-art methods. Furthermore, an ablation study underscores the importance of the context encoder and the spatiotemporal attention encoder in achieving superior performance.

翻訳日:2024-07-16 01:06:34 公開日:2024-07-12

# 高ボリュームメディア製造における機械学習

Machine Learning in High Volume Media Manufacturing ( http://arxiv.org/abs/2407.08933v1 )

ライセンス: Link先を確認

Siddarth Reddy Karuka, Abhinav Sunderrajan, Zheng Zheng, Yong Woon Tiean, Ganesh Nagappan, Allan Luk,

(参考訳) 大量生産環境でのエラーや失敗は、時間とお金の損失をもたらす大きな影響を与える可能性がある。このような失敗を早期に特定することは製造業にとって最優先事項であり、長年にわたり様々なルールベースのアルゴリズムが開発されてきた。しかし、これらの失敗をキャッチすることは時間がかかり、そのようなアルゴリズムは設計の変化にうまく適応できない。さらに重要なのは、大量生産環境で監視するユニットの数は、手動の監視や単純なプログラムには大きすぎることだ。ここでは、ルールベースの意思決定と機械学習モデルを組み合わせて、このような日々のバリエーションや長期的なデザインの変更を学習し、適応できるだけでなく、現在使われている多くの製造ユニットにも大規模に適用できる新しいプログラムを開発する。現在の最先端技術を用いて、我々はこのプログラムを大規模に展開し、製造環境からの需要の増加に対処する。

Errors or failures in a high-volume manufacturing environment can have significant impact that can result in both the loss of time and money. Identifying such failures early has been a top priority for manufacturing industries and various rule-based algorithms have been developed over the years. However, catching these failures is time consuming and such algorithms cannot adapt well to changes in designs, and sometimes variations in everyday behavior. More importantly, the number of units to monitor in a high-volume manufacturing environment is too big for manual monitoring or for a simple program. Here we develop a novel program that combines both rule-based decisions and machine learning models that can not only learn and adapt to such day-to-day variations or long-term design changes, but also can be applied at scale to the high number of manufacturing units in use today. Using the current state-of-the-art technologies, we then deploy this program at-scale to handle the needs of ever-increasing demand from the manufacturing environment.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 神経内包と相互作用分解における組成構造

Compositional Structures in Neural Embedding and Interaction Decompositions ( http://arxiv.org/abs/2407.08934v1 )

ライセンス: Link先を確認

Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto,

(参考訳) ニューラルネットワークにおけるベクトル埋め込みにおける線形代数構造と,これらのネットワークでモデル化された確率分布に対する条件独立性制約の基本的な対応について述べる。我々のフレームワークは、データ表現における構造パターンの出現に光を当てることを目的としている。具体的には、「相互作用分解」という観点から構成構造の特徴づけを導入し、モデルの表現の中にそのような構造が存在するためには必要かつ十分な条件を確立する。

We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks and conditional independence constraints on the probability distributions modeled by these networks. Our framework aims to shed light on the emergence of structural patterns in data representations, a phenomenon widely acknowledged but arguably still lacking a solid formal grounding. Specifically, we introduce a characterization of compositional structures in terms of "interaction decompositions," and we establish necessary and sufficient conditions for the presence of such structures within the representations of a model.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# フェデレーショングラフ学習と認証ディフェンスに対する分散型バックドアアタック

Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses ( http://arxiv.org/abs/2407.08935v1 )

ライセンス: Link先を確認

Yuxin Yang, Qiang Li, Jinyuan Jia, Yuan Hong, Binghui Wang,

(参考訳) フェデレーショングラフ学習(Federated Graph Learning, FedGL)は、FLを拡張してさまざまなソースからグラフデータを学習する、新たなフェデレーション学習(FL)フレームワークである。非グラフデータのFLは、トレーニングデータに共有バックドアトリガーを注入し、トレーニングされたバックドアFLモデルが、攻撃者が望むようにトリガーを含むテストデータを予測できるように、バックドアアタックに対して脆弱であることが示されている。しかし、バックドア攻撃に対するFedGLの頑健性はほとんど探索されておらず、効果的な防御は存在しない。本稿では,このような重大な欠陥に対処することを目的とする。まず,FedGLに対する効果的な,ステルス的で永続的なバックドア攻撃を提案する。我々の攻撃では、サブグラフをトリガーとし、各グラフの効果的なトリガー位置と形状を導出できる適応トリガージェネレータを設計する。私たちの攻撃は、経験的防御が生成したトリガーを検出・削除することが難しいことを示している。これを軽減するため、任意の位置で任意の形状のトリガーに対して、バックドアのFedGLモデルに対する認証された防御を更に開発する。我々の防御は、テストグラフを複数のサブグラフに慎重に分割し、これらのサブグラフに多数決ベースのアンサンブル分類器を設計することである。次に、アンサンブル分類器に基づいて決定論的証明された堅牢性を導出し、その厳密性を証明する。 6つのグラフデータセットに対する攻撃と防御を広範囲に評価した。我々の攻撃結果は、ほぼ全てのデータセットで90%以上のバックドア精度が得られることを示している。防御の結果から,サイズ20の任意のトリガに対するクリーンなテストグラフの認証精度は,特定の場合には攻撃を受けない場合の正常な精度に近いが,他の場合では適度なギャップがあることがわかった。さらに、我々の攻撃によって生成されたバックドアテストグラフに対して、認証されたバックドア精度は常に0であり、防御が攻撃を完全に軽減できることを意味している。ソースコードは https://github.com/Yuxin104/Opt-GDBA で入手できる。

Federated graph learning (FedGL) is an emerging federated learning (FL) framework that extends FL to learn from graph data from diverse sources. FL for non-graph data has been shown to be vulnerable to backdoor attacks, which inject a shared backdoor trigger into the training data such that the trained backdoored FL model predicts testing data containing the trigger as the attacker desires. However, the robustness of FedGL against backdoor attacks is largely unexplored, and no effective defense exists. In this paper, we aim to address this significant deficiency. First, we propose an effective, stealthy, and persistent backdoor attack on FedGL. Our attack uses a subgraph as the trigger and designs an adaptive trigger generator that can derive the effective trigger location and shape for each graph. Our attack results show that empirical defenses struggle to detect or remove our generated triggers. To mitigate this, we further develop a certified defense for any backdoored FedGL model against a trigger with any shape at any location. Our defense involves carefully dividing a testing graph into multiple subgraphs and designing a majority vote-based ensemble classifier on these subgraphs. We then derive the deterministic certified robustness based on the ensemble classifier and prove its tightness. We extensively evaluate our attack and defense on six graph datasets. Our attack results show our attack can obtain > 90% backdoor accuracy in almost all datasets. Our defense results show that, in certain cases, the certified accuracy for clean testing graphs against an arbitrary trigger with size 20 can be close to the normal accuracy under no attack, while there is a moderate gap in other cases. Moreover, the certified backdoor accuracy is always 0 for backdoored testing graphs generated by our attack, implying our defense can fully mitigate the attack. Source code is available at: https://github.com/Yuxin104/Opt-GDBA.
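The divide-and-vote defense described above can be sketched in a few lines. This is only an illustration of the general idea, not the paper's construction: it assumes edges are assigned to subgraphs by hashing their endpoints, that an attacker changes at most `max_corrupted_parts` subgraphs, and it uses a conservative vote-gap condition; the function names and the placeholder classifier are hypothetical.

```python
import hashlib
from collections import Counter


def split_edges(edges, num_parts):
    """Assign each undirected edge to one subgraph by hashing its endpoints."""
    parts = [[] for _ in range(num_parts)]
    for u, v in edges:
        key = f"{min(u, v)}-{max(u, v)}".encode()
        index = int(hashlib.sha256(key).hexdigest(), 16) % num_parts
        parts[index].append((u, v))
    return parts


def majority_vote(edges, num_parts, classify):
    """Classify every subgraph and return (winning label, its votes, runner-up votes)."""
    votes = Counter(classify(part) for part in split_edges(edges, num_parts))
    # Sort by vote count (descending), breaking ties toward the smaller label.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    top_label, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_label, top_count, runner_up


def is_certified(top_count, runner_up, max_corrupted_parts):
    """Conservative check: each corrupted subgraph can shift the gap by at most 2."""
    return top_count - runner_up > 2 * max_corrupted_parts


# Hypothetical test graph and a placeholder subgraph classifier.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
label, top, second = majority_vote(edges, 3, lambda part: 0)
robust = is_certified(top, second, max_corrupted_parts=1)
```

Because the hash depends only on an edge's endpoints, adding or removing one edge changes exactly one subgraph, which is why a bounded trigger can flip only a bounded number of votes under this assumed partition.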

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 自己進化型GPT : 生涯にわたる自律的経験的学習者

Self-Evolving GPT: A Lifelong Autonomous Experiential Learner ( http://arxiv.org/abs/2407.08937v1 )

ライセンス: Link先を確認

Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin,

(参考訳) 大規模言語モデル(LLM)の性能向上のために,研究者らは,プロンプトによるテキストタスク解決エクスペリエンスを備えたLLMの提供を検討した。しかし、各タスクに対してこのような経験を習得し、適用するための手作業に頼っているため、LLMの需要の増加や様々なユーザ質問に対して実現不可能である。この問題に対処するために、LLMをベースとした生涯にわたる自律的経験学習フレームワークを設計し、LLMが人間の学習能力を模倣し、経験を活用できるかどうかを考察する。自律的に学習し、経験の伝達と帰納を通じて経験を蓄積し、入力質問の種類を分類して、どの蓄積された経験を用いるかを選択する。広く使われている6つのNLPデータセットによる実験結果から,本フレームワークは各中間段階において確実に動作し,GPT-3.5およびGPT-4の性能を効果的に向上することが示された。これは、人間の経験的学習と応用能力を模倣するためにLLMを使うことの可能性を検証する。さらに、各ステップでフレームワークの振る舞いを詳細に分析します。

To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential learning framework based on LLMs to explore whether LLMs can imitate human ability for learning and utilizing experience. It autonomously learns and accumulates experience through experience transfer and induction, categorizing the types of input questions to select which accumulated experience to employ for them. Experimental results on six widely used NLP datasets show that our framework performs reliably in each intermediate step and effectively improves the performance of GPT-3.5 and GPT-4. This validates the feasibility of using LLMs to mimic human experiential learning and application capabilities. Additionally, we provide a detailed analysis of the behavior of our framework at each step.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# LightenDiffusion: ラテン・レチネックス拡散モデルによる教師なし低光画像強調

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models ( http://arxiv.org/abs/2407.08939v1 )

ライセンス: Link先を確認

Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu,

(参考訳) 本稿では,低照度画像強調のための拡散モデルであるLightenDiffusionを用いて,物理的に説明可能なRetinex理論を組み込んだ拡散に基づく教師なしフレームワークを提案する。具体的には,画像空間ではなく遅延空間内でRetinex分解を行うコンテントトランスファー分解ネットワークを提案し,未ペアローライトおよびノーマルライト画像の特徴をコンテントリッチリフレクタンスマップとコンテントフリー照明マップに分解する。その後、低照度画像の反射率マップと通常照度画像の照度マップとを低照度特徴の誘導により教師なし復元のための拡散モデルに入力し、自己拘束的整合性損失をさらに提案して、回復結果に対する正常照度コンテンツの干渉を排除し、全体的な視覚的品質を向上させる。公開されている実世界のベンチマークに関する大規模な実験によると、提案されたLightenDiffusionは最先端の非教師付き競合よりも優れており、様々な場面でより一般化可能な教師付き手法に匹敵する。私たちのコードは https://github.com/JianghaiSCU/LightenDiffusion で公開されています。

In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded features of unpaired low-light and normal-light images to be decomposed into content-rich reflectance maps and content-free illumination maps. Subsequently, the reflectance map of the low-light image and the illumination map of the normal-light image are taken as input to the diffusion model for unsupervised restoration with the guidance of the low-light feature, where a self-constrained consistency loss is further proposed to eliminate the interference of normal-light content on the restored results to improve overall visual quality. Extensive experiments on publicly available real-world benchmarks show that the proposed LightenDiffusion outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. Our code is available at https://github.com/JianghaiSCU/LightenDiffusion.
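For readers unfamiliar with Retinex theory, the toy sketch below shows the underlying factorization, image = reflectance × illumination, and what it means to pair the reflectance of a low-light image with the illumination of a normal-light one. It is only an assumption-laden illustration: it works in pixel space with a hand-written rule (illumination taken as the per-pixel channel maximum), whereas the abstract describes a learned decomposition in latent space followed by a diffusion model; the pixel values and function names are hypothetical.

```python
def decompose(image, eps=1e-6):
    """Toy Retinex split: illumination is the per-pixel channel maximum,
    reflectance is the image divided by that illumination."""
    illumination = [[max(pixel) for pixel in row] for row in image]
    reflectance = [
        [tuple(c / (light + eps) for c in pixel) for pixel, light in zip(row, light_row)]
        for row, light_row in zip(image, illumination)
    ]
    return reflectance, illumination


def recompose(reflectance, illumination):
    """Multiply reflectance and illumination back into an image."""
    return [
        [tuple(c * light for c in pixel) for pixel, light in zip(row, light_row)]
        for row, light_row in zip(reflectance, illumination)
    ]


# Hypothetical 1x2 RGB images with values in [0, 1].
low_light = [[(0.10, 0.05, 0.02), (0.04, 0.08, 0.06)]]
normal_light = [[(0.80, 0.60, 0.40), (0.50, 0.90, 0.70)]]

reflectance_low, _ = decompose(low_light)
_, illumination_normal = decompose(normal_light)

# Content from the dark image, brightness from the well-lit one.
relit = recompose(reflectance_low, illumination_normal)
```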

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# バイオメディカル仮説生成系としての大規模言語モデル:包括的評価

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation ( http://arxiv.org/abs/2407.08940v1 )

ライセンス: Link先を確認

Biqing Qi, Kaiyan Zhang, Kai Tian, Haoxiang Li, Zhang-Ren Chen, Sihang Zeng, Ermo Hua, Hu Jinfang, Bowen Zhou,

(参考訳) 生物医学的知識の急速な成長は、洞察を効率的に抽出し、新しい仮説を創出する能力を大きく上回っている。大規模言語モデル(LLM)は、知識の相互作用を革新し、生体医学的な発見を加速するための有望なツールとして登場した。本稿では, LLMをバイオメディカル仮説生成器として包括的に評価する。バイオメディカル文献から背景と仮説のペアのデータセットを構築し、データ汚染を軽減するために、公開日に基づくトレーニング、観察、不明なテストセットに慎重に分割する。このデータセットを用いて、ゼロショット、少数ショット、微調整設定で上位層の指示されたモデルの仮説生成能力を評価する。科学的発見の重要な側面である不確実性の探索を強化するため,評価枠組みにツール利用とマルチエージェントインタラクションを取り入れた。さらに, LLMに基づく評価と人的評価の両面から, 仮説の質を評価するために, 広範な文献レビューに基礎を置く4つの新しい指標を提案する。我々の実験は2つの重要な発見をもたらす。 1)LLMは、トレーニング中に見えない文献でテストしても、新規で検証された仮説を生成できる。 2)マルチエージェントインタラクションやツール利用による不確実性の向上により,多様な候補生成が容易になり,ゼロショット仮説生成性能が向上する。しかし、数発の学習とツール使用による追加知識の統合は、必ずしもパフォーマンス向上につながるとは限りませんし、組み込まれた外部知識のタイプや範囲を慎重に検討する必要性も浮き彫りにしています。これらの知見は、LLMが生物医学的仮説生成の強力な補助となり、この分野のさらなる研究を導く貴重な洞察を与える可能性を示している。

The rapid growth of biomedical knowledge has outpaced our ability to efficiently extract insights and generate novel hypotheses. Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction and potentially accelerate biomedical discovery. In this paper, we present a comprehensive evaluation of LLMs as biomedical hypothesis generators. We construct a dataset of background-hypothesis pairs from biomedical literature, carefully partitioned into training, seen, and unseen test sets based on publication date to mitigate data contamination. Using this dataset, we assess the hypothesis generation capabilities of top-tier instructed models in zero-shot, few-shot, and fine-tuning settings. To enhance the exploration of uncertainty, a crucial aspect of scientific discovery, we incorporate tool use and multi-agent interactions in our evaluation framework. Furthermore, we propose four novel metrics grounded in extensive literature review to evaluate the quality of generated hypotheses, considering both LLM-based and human assessments. Our experiments yield two key findings: 1) LLMs can generate novel and validated hypotheses, even when tested on literature unseen during training, and 2) Increasing uncertainty through multi-agent interactions and tool use can facilitate diverse candidate generation and improve zero-shot hypothesis generation performance. However, we also observe that the integration of additional knowledge through few-shot learning and tool use may not always lead to performance gains, highlighting the need for careful consideration of the type and scope of external knowledge incorporated. These findings underscore the potential of LLMs as powerful aids in biomedical hypothesis generation and provide valuable insights to guide further research in this area.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# マルチモーダル大言語モデルに基づくニューラルネットワーク行列分解レコメンダシステムモデル

A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model ( http://arxiv.org/abs/2407.08942v1 )

ライセンス: Link先を確認

Ao Xiang, Bingjie Huang, Xinyu Guo, Haowei Yang, Tianyao Zheng,

(参考訳) 推薦システムは情報検索問題に対する重要な解決策となっている。本稿では,BoNMFと呼ばれる多モーダル大規模言語モデルに基づくニューラルネットワーク行列分解推薦システムモデルを提案する。このモデルは、自然言語処理におけるBoBERTaの強力な能力、コンピュータビジョンにおけるViT、およびニューラルマトリックス分解技術を組み合わせたものである。ユーザとアイテムの潜在的な特性をキャプチャし、ユーザとアイテムIDからなる低次元行列と対話した後、ニューラルネットワークは推薦結果を出力する。コールドスタートおよびアブレーション実験の結果,BoNMFモデルは大規模な公開データセットに対して優れた性能を示し,レコメンデーションの精度を大幅に向上させることがわかった。

Recommendation systems have become an important solution to information search problems. This article proposes a neural matrix factorization recommendation system model based on the multimodal large language model, called BoNMF. This model combines BoBERTa's powerful capabilities in natural language processing, ViT in computer vision, and neural matrix decomposition technology. By capturing the potential characteristics of users and items, and after interacting with a low-dimensional matrix composed of user and item IDs, the neural network outputs the recommendation results. Cold start and ablation experimental results show that the BoNMF model exhibits excellent performance on large public data sets and significantly improves the accuracy of recommendations.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 大規模ローカライゼーションのための展開可能な量子アクセスポイント選択アルゴリズム

A Deployable Quantum Access Points Selection Algorithm for Large-Scale Localization ( http://arxiv.org/abs/2407.08943v1 )

ライセンス: Link先を確認

Ahmed Shokry, Moustafa Youssef,

(参考訳) 効果的なアクセスポイントの選択は、ローカライズシステムにおいて重要なステップである。これは、ローカライズ精度と計算効率の両方に直接的な影響を与える。古典的なAP選択アルゴリズムは通常計算コストが高く、大規模なグローバルスケールでのローカライゼーションシステムの展開を妨げる。本稿では,大規模ローカライゼーションシステムのための量子APs選択アルゴリズムを提案する。提案アルゴリズムは、量子アニールを利用して冗長でノイズの多いAPを除去する。本稿では、量子アニールに適した2次非拘束バイナリ最適化(QUBO)問題としてAPs選択問題を定式化する方法と、完全APsセットと同じ局所化系精度を維持する最小数のAPを選択する方法について説明する。これに基づいて、最適なAP数を選択するための対数複雑度アルゴリズムを提案する。我々は,実D-Wave Systems量子マシンに量子アルゴリズムを実装し,フロアローカライズ問題に対する実テスト環境での性能評価を行う。その結果, 利用可能なAPの14%未満を環境下で選択することにより, 量子アルゴリズムは, 古典的なAP選択によるデータセットの削減よりも, 全体のAPを利用するのと同じフロアローカライズ精度と優れた精度を達成できることが判明した。さらに、提案した量子アルゴリズムは、対応する古典的なAPs選択アルゴリズムよりも1桁以上のスピードアップを実現し、大規模ローカライゼーションシステムにおける提案した量子アルゴリズムの効率性を強調した。

Effective access points (APs) selection is a crucial step in localization systems. It directly affects both localization accuracy and computational efficiency. Classical APs selection algorithms are usually computationally expensive, hindering the deployment of localization systems on a large worldwide scale. In this paper, we introduce a quantum APs selection algorithm for large-scale localization systems. The proposed algorithm leverages quantum annealing to eliminate redundant and noisy APs. We explain how to formulate the APs selection problem as a quadratic unconstrained binary optimization (QUBO) problem, suitable for quantum annealing, and how to select the minimum number of APs that maintain the same overall localization system accuracy as the complete APs set. Based on this, we further propose a logarithmic-complexity algorithm to select the optimal number of APs. We implement our quantum algorithm on a real D-Wave Systems quantum machine and assess its performance in a real test environment for a floor localization problem. Our findings reveal that by selecting fewer than 14% of the available APs in the environment, our quantum algorithm achieves the same floor localization accuracy as utilizing the entire set of APs and superior accuracy over utilizing the reduced dataset produced by classical APs selection counterparts. Moreover, the proposed quantum algorithm achieves more than an order of magnitude speedup over the corresponding classical APs selection algorithms, emphasizing the efficiency of the proposed quantum algorithm for large-scale localization systems.
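
To make the QUBO idea in this abstract concrete, here is a minimal sketch in Python. It is not the authors' formulation: the per-AP `importance` scores, the pairwise `redundancy` matrix, the target count `k`, and the `penalty` weight are all hypothetical, and a brute-force search stands in for the quantum annealer. The sketch only shows how "reward useful APs, penalize redundant pairs, select about k of them" can be written as a single quadratic function of binary variables.

```python
from itertools import product

def build_ap_qubo(importance, redundancy, k, penalty):
    """Build an upper-triangular QUBO (dict keyed by index pairs).

    Objective (hypothetical): -sum(importance_i * x_i)
                              + sum(redundancy_ij * x_i * x_j)
                              + penalty * (sum(x) - k)^2, constant term dropped.
    """
    n = len(importance)
    Q = {}
    for i in range(n):
        # Linear terms sit on the diagonal because x_i^2 == x_i for binaries.
        Q[(i, i)] = -importance[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            # The expanded cardinality penalty contributes 2 * penalty per pair.
            Q[(i, j)] = redundancy[i][j] + 2 * penalty
    return Q

def qubo_energy(x, Q):
    """Energy of binary vector x under the QUBO dictionary Q."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

def solve_brute_force(Q, n):
    """Exhaustive minimization; a stand-in for an annealer on tiny instances."""
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))
    return list(best)

# Hypothetical toy data: AP0 and AP1 are strong but redundant with each other,
# AP2 is weak, AP3 is strong and independent.
importance = [0.9, 0.8, 0.1, 0.85]
redundancy = [
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
Q = build_ap_qubo(importance, redundancy, k=2, penalty=2.0)
selected = solve_brute_force(Q, len(importance))
print(selected)
```

On this toy instance the minimizer keeps AP0 and AP3 and drops the redundant AP1. Exhaustive search is only feasible for a handful of variables, which is the scaling problem the abstract's annealing approach targets.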

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# Bora:バイオメディカルジェネリストのビデオ生成モデル

Bora: Biomedical Generalist Video Generation Model ( http://arxiv.org/abs/2407.08944v1 )

ライセンス: Link先を確認

Weixiang Sun, Xiaocao You, Ruizhe Zheng, Zhengqing Yuan, Xiang Li, Lifang He, Quanzheng Li, Lichao Sun,

(参考訳) 生成モデルは、医療教育の革新、ロボット支援手術、医療AI開発のためのデータ拡張を約束する。拡散モデルはテキストプロンプトからリアルな画像を生成できるようになったが、最近の進歩は、多種多様な高品質のビデオを作成する能力を示している。しかしながら、これらのモデルは、医療処置の正確な表現と詳細な解剖学的構造の生成に苦慮することが多い。本稿では,テキスト誘導バイオメディカルビデオ生成のための最初の時空間拡散確率モデルであるBoraを紹介する。 BoraはTransformerアーキテクチャを活用し、汎用ビデオ生成タスクで事前訓練されている。様々な医療分野のテキストビデオデータを含む,新たに確立された医用ビデオコーパスを用いて,モデルアライメントとインストラクションチューニングによって微調整を行う。私たちの知る限りでは、このような包括的な注釈付きバイオメディカルビデオデータセットを確立するための最初の試みである。 Boraは、4つの異なるバイオメディカル領域にまたがる高品質なビデオデータを生成し、医療専門家の基準に準拠し、一貫性と多様性を示す。このジェネラリストビデオ生成モデルは、特にリソース限定の設定において、医療相談や意思決定の強化に重要な可能性を秘めている。さらに、ボラは没入型医療訓練と手続き計画の道を開くことができる。内視鏡, 超音波, MRI, 細胞追跡などの異なる医用モダリティに関する広範囲な実験により, 生医学的指示を理解する上での本モデルの有効性と, 最先端の世代モデルと比較して, 被験者間での優れた性能が検証された。

Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for medical AI development. Diffusion models can now generate realistic images from text prompts, while recent advancements have demonstrated their ability to create diverse, high-quality videos. However, these models often struggle with generating accurate representations of medical procedures and detailed anatomical structures. This paper introduces Bora, the first spatio-temporal diffusion probabilistic model designed for text-guided biomedical video generation. Bora leverages Transformer architecture and is pre-trained on general-purpose video generation tasks. It is fine-tuned through model alignment and instruction tuning using a newly established medical video corpus, which includes paired text-video data from various biomedical fields. To the best of our knowledge, this is the first attempt to establish such a comprehensive annotated biomedical video dataset. Bora is capable of generating high-quality video data across four distinct biomedical domains, adhering to medical expert standards and demonstrating consistency and diversity. This generalist video generative model holds significant potential for enhancing medical consultation and decision-making, particularly in resource-limited settings. Additionally, Bora could pave the way for immersive medical training and procedure planning. Extensive experiments on distinct medical modalities such as endoscopy, ultrasound, MRI, and cell tracking validate the effectiveness of our model in understanding biomedical instructions and its superior performance across subjects compared to state-of-the-art generation models.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 拡散モデルは秘密裏にノイズ分類器であり、コントラストトレーニングの利点

Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training ( http://arxiv.org/abs/2407.08946v1 )

ライセンス: Link先を確認

Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg,

(参考訳) 拡散モデルはデータをノイズ化することを学び、訓練されたデノイザを使用してデータ分布から新しいサンプルを生成する。本稿では, 拡散サンプリングプロセスを再検討し, 試料品質劣化の根本原因を同定する。このデノイザは, トレーニング分布外(OOD)から遠く離れた地域では推定が不十分であり, これらのOOD領域ではサンプリングプロセスが必然的に評価される。これは全てのサンプリング手法において問題となり、特に並列サンプリングに移行する際には、動的の標本軌跡全体を並列に初期化および更新する必要があるため、多くのOOD評価が導かれる。この問題に対処するために,サンプルに付加される雑音のレベルを区別する新たな自己教師型学習目標を導入する。提案手法は, 拡散モデルが音量の異なる分布を識別する対数様比を暗黙的に定義することに基づいており, この表現は, 標準学習分布の外でのデノイザー性能に依存する。提案したコントラスト拡散訓練は逐次的および並列的な設定に有効であり, 並列サンプリング器の性能と速度を著しく向上することを示す。

Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings, and it improves the performance and speed of parallel samplers significantly.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 最小限の人的労力でスプリアス相関を緩和する概念ベースモデルの構築

Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort ( http://arxiv.org/abs/2407.08947v1 )

ライセンス: Link先を確認

Jeeyung Kim, Ze Wang, Qiang Qiu,

(参考訳) モデル解釈可能性の強化は、モデルがどのように予測を引き出すかを明らかにすることで、急激な相関に対処することができる。概念ボトルネックモデル(Concept Bottleneck Models, CBM)は、データアノテーションにおける人間の努力のコストが高いにもかかわらず、人間の理解可能な概念を通じてモデル行動の開示とガイドを行う、原則化された方法を提供する。本稿では,複数の基礎モデルの相乗効果を利用して,人的労力を伴わないCBMを構築する。我々は、事前学習モデル上に構築されたCBMの望ましくないバイアスを発見し、これらのバイアスに免疫を持ちながら事前学習モデルを利用するように設計された新しいフレームワークを提案する。具体的には、データセットの潜在的スパイラルな相関を評価し、画像の概念を注釈付けし、ロバスト性を改善するためのアノテーションを洗練するための基礎モデルを採用したシームレスなパイプラインを提供する。提案手法を複数のデータセット上で評価し,その解釈可能性を維持しつつ,素粒子相関によるモデル依存の低減効果を示した。

Enhancing model interpretability can address spurious correlations by revealing how models draw their predictions. Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts, albeit at a high cost of human efforts in data annotation. In this paper, we leverage a synergy of multiple foundation models to construct CBMs with nearly no human effort. We discover undesirable biases in CBMs built on pre-trained models and propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations. Specifically, our method offers a seamless pipeline that adopts foundation models for assessing potential spurious correlations in datasets, annotating concepts for images, and refining the annotations for improved robustness. We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 脳画像解析のためのDeep Learning Frameworkをコード化した対称性認識

Symmetry Awareness Encoded Deep Learning Framework for Brain Imaging Analysis ( http://arxiv.org/abs/2407.08948v1 )

ライセンス: Link先を確認

Yang Ma, Dongang Wang, Peilin Liu, Lynette Masters, Michael Barnett, Weidong Cai, Chenyu Wang,

(参考訳) 構造異常から機能障害まで、神経学的条件の不均一性は、医用画像解析タスクにおいて重要な課題である。さらに、注釈付きデータセットの可用性の制限により、ロバスト分析モデルの開発が制限される。この背景から,本研究では,ヒト脳の解剖学的対称的特徴を活用する新たなアプローチを導入し,その後の脳疾患の検出とセグメンテーション分析を強化する。左右半球の対称的特徴を符号化する新しいシンメトリー・アウェア・クロス・アテンション(SACA)モジュールが提案され、様々なMRIおよびCTで健康な脳画像と疾患の脳画像からなる広範囲な3D脳画像データセット上でネットワーク全体の事前学習を導くシンメトリー・アウェア・ヘッド(SAH)として対称的特徴を検出するプロキシタスクが提案されている。脳疾患の分類とセグメンテーションの両方を含む下流タスクの綿密な実験を通じて、我々のモデルは最先端の方法論よりも優れた性能を示し、特に対称性学習の重要性を強調している。本研究は, 事前トレーニングに対称性認識を取り入れることの有効性を提唱し, 医用画像解析のための新しいベンチマークを作成し, 正確かつ効率的な診断プロセスに向けて大きな前進を約束する。コードは https://github.com/bitMyron/sa-swin から入手できる。

The heterogeneity of neurological conditions, ranging from structural anomalies to functional impairments, presents a significant challenge in medical imaging analysis tasks. Moreover, the limited availability of well-annotated datasets constrains the development of robust analysis models. Against this backdrop, this study introduces a novel approach leveraging the inherent anatomical symmetrical features of the human brain to enhance the subsequent detection and segmentation analysis for brain diseases. A novel Symmetry-Aware Cross-Attention (SACA) module is proposed to encode symmetrical features of left and right hemispheres, and a proxy task to detect symmetrical features as the Symmetry-Aware Head (SAH) is proposed, which guides the pretraining of the whole network on a vast 3D brain imaging dataset comprising both healthy and diseased brain images across various MRI and CT. Through meticulous experimentation on downstream tasks, including both classification and segmentation for brain diseases, our model demonstrates superior performance over state-of-the-art methodologies, particularly highlighting the significance of symmetry-aware learning. Our findings advocate for the effectiveness of incorporating symmetry awareness into pretraining and set a new benchmark for medical imaging analysis, promising significant strides toward accurate and efficient diagnostic processes. Code is available at https://github.com/bitMyron/sa-swin.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# One-Shot Pose-Driving Face Animation Platform

One-Shot Pose-Driving Face Animation Platform ( http://arxiv.org/abs/2407.08949v1 )

ライセンス: Link先を確認

He Feng, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su,

(参考訳) 顔アニメーションの目的は、ビデオまたは音声入力から導かれる駆動条件を利用して、単一の参照顔から動的で表現力のある音声ヘッドビデオを生成することである。現在のアプローチでは、特定のアイデンティティを微調整する必要があることが多く、Wav2Poseモジュールの有効性が制限されているため、表現力のあるビデオの生成に失敗することが多い。ワンショットかつ連続的な音声ヘッドビデオの生成を容易にするため,Face LocatorとMotion Frame機構を統合し,既存のImage2Videoモデルを洗練する。その後、人間の顔ビデオデータセットを用いてモデルを最適化し、高品質で表現力のある音声ヘッドビデオを作成する能力を大幅に向上させた。さらに,Gradioフレームワークを用いたデモプラットフォームを開発し,プロセスの合理化を実現した。

The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, utilizing driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos due to the limited effectiveness of Wav2Pose modules. To facilitate the generation of one-shot and more consecutive talking head videos, we refine an existing Image2Video model by integrating a Face Locator and Motion Frame mechanism. We subsequently optimize the model using extensive human face video datasets, significantly enhancing its ability to produce high-quality and expressive talking head videos. Additionally, we develop a demo platform using the Gradio framework, which streamlines the process, enabling users to quickly create customized talking head videos.

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 画像復元のための周波数選択によるよりリッチで高精度な情報探索

Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration ( http://arxiv.org/abs/2407.08950v1 )

ライセンス: Link先を確認

Hu Gao, Depeng Dang,

(参考訳) 画像復元は、破損した画像から高品質な画像を復元することを目的としている。既存の多くの手法は、主に空間領域に注目し、周波数変動の理解を無視し、スキップ接続における暗騒音の影響を無視している。本稿では,空間および周波数領域の知識をシームレスに統合し,よりリッチで正確な情報を選択的に回復するマルチスケール周波数選択ネットワーク(MSFSNet)を提案する。具体的には、まずまず空間的特徴を捉え、周波数知識を統合するために異なるスケールで動的フィルタ選択モジュール(DFS)に入力する。 DFSは学習可能なフィルタを用いて高周波数・低周波情報を生成し、周波数クロスアテンション機構(FCAM)を用いて回復する最も多くの情報を決定する。マルチスケールで正確なハイブリッド特徴集合を学習するために,コンテキスト特徴を利用したスキップ特徴融合ブロック(SFF)を開発し,どの情報をスキップ接続で伝播すべきかを識別する。 DFSとSFFがジェネリックプラグインモジュールであることは注目に値する。様々な画像復元タスクに対する大規模な実験により、MSFSNetは最先端のアルゴリズムに匹敵する性能を達成できることを示した。

Image restoration aims to recover high-quality images from their corrupted counterparts. Many existing methods primarily focus on the spatial domain, neglecting frequency variations and the impact of implicit noise in skip connections. In this paper, we introduce a multi-scale frequency selection network (MSFSNet) that seamlessly integrates spatial and frequency domain knowledge, selectively recovering richer and more accurate information. Specifically, we first capture spatial features and feed them into dynamic filter selection modules (DFS) at different scales to integrate frequency knowledge. DFS utilizes learnable filters to generate high- and low-frequency information and employs a frequency cross-attention mechanism (FCAM) to determine the most informative components to recover. To learn a multi-scale and accurate set of hybrid features, we develop a skip feature fusion block (SFF) that leverages contextual features to discriminatively determine which information should be propagated in skip connections. It is worth noting that our DFS and SFF are generic plug-in modules that can be directly employed in existing networks without any adjustments, leading to performance improvements. Extensive experiments across various image restoration tasks demonstrate that our MSFSNet achieves performance that is either superior or comparable to state-of-the-art algorithms.
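
As a rough illustration of what separating "high and low-frequency information" means, the sketch below splits an image with a fixed circular mask in the Fourier domain using NumPy. This is an assumption-laden stand-in: the abstract describes learnable filters inside DFS, whereas `split_frequencies` and its `radius` parameter are hypothetical and involve no learning.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2-D image into low- and high-frequency parts.

    A fixed circular mask around the zero frequency (after fftshift) keeps the
    low band; its complement keeps the high band. The two parts sum back to
    the input image.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    # After fftshift the zero-frequency bin sits at index (h // 2, w // 2).
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

# Toy usage on random data (not a real image).
rng = np.random.default_rng(0)
image = rng.random((8, 8))
low, high = split_frequencies(image, radius=2)
reconstruction_error = float(np.abs(low + high - image).max())
print(reconstruction_error)
```

Because the two masks are complementary, no information is lost by the split itself; a selection mechanism such as the one the abstract describes would then decide how much of each band to keep.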

翻訳日:2024-07-16 01:06:33 公開日:2024-07-12

# 検知・調査・判断・決定:Few-shot Fakeニュース検出のための新しいLLMベースのフレームワーク

Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection ( http://arxiv.org/abs/2407.08952v1 )

ライセンス: Link先を確認

Ye Liu, Jiajun Zhu, Kai Zhang, Haoyu Tang, Yanghai Zhang, Xukai Liu, Qi Liu, Enhong Chen,

(参考訳) Few-Shot Fake News Detection (FS-FND) は、極めて低リソースのシナリオにおいて、非正確なニュースを実際のニュースと区別することを目的としている。ソーシャルメディア上でのフェイクニュースの拡散や有害な影響により、このタスクは注目を集めている。大きな言語モデル(LLM)は、豊富な事前知識と優れたコンテキスト内学習能力の助けを借りて、競争性能を実証している。しかし、既存の手法では、LLMの可能性を著しく損なう「理解のあいまいさ」や「インフォメーション・スカシティ」といった重大な制限に直面している。これらの欠点に対処するため、内部および外部からLLMを強化するために設計されたDual-perspective Augmented Fake News Detection (DAFND)モデルを提案する。具体的には、DAFNDはまず、検出モジュールを通じて各ニュース記事のキーワードを識別する。その後、DAFNDは、現在のニュースに関する情報の内部および外部の貴重な情報を検索するための調査モジュールを創造的に設計し、続いて別の審査モジュールがそれぞれの2つの予測結果を導出する。最後に、決定モジュールはこれらの2つの予測をさらに統合し、最終的な結果を引き出す。 2つの公開データセットに対する大規模な実験により,提案手法の有効性,特に低リソース環境での有効性が示された。

Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real news in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learning abilities. However, existing methods face significant limitations, such as Understanding Ambiguity and Information Scarcity, which significantly undermine the potential of LLMs. To address these shortcomings, we propose a Dual-perspective Augmented Fake News Detection (DAFND) model, designed to enhance LLMs from both inside and outside perspectives. Specifically, DAFND first identifies the keywords of each news article through a Detection Module. Subsequently, DAFND creatively designs an Investigation Module to retrieve valuable inside and outside information concerning the current news, followed by a Judge Module that derives a prediction from each of the two sources. Finally, a Determination Module further integrates these two predictions and derives the final result. Extensive experiments on two publicly available datasets show the efficacy of our proposed method, particularly in low-resource settings.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 資産価格における帰属方法:彼らはリスクを考慮しているか?

Attribution Methods in Asset Pricing: Do They Account for Risk? ( http://arxiv.org/abs/2407.08953v1 )

ライセンス: Link先を確認

Dangxing Chen, Yuan Gao,

(参考訳) 過去数十年間、機械学習モデルは極めて成功した。公理的帰属法の結果、特徴的貢献はより明確かつ厳密に説明されている。しかし、公理とともにドメイン知識を調べる研究はほとんどない。本研究では,リスク管理と密接に関連する金融の資産価格について検討する。したがって、機械学習モデルを適用する際には、帰属法が根底にあるリスクを正確に反映することを保証する必要がある。本研究では、資産価格ドメインの知識から導かれるいくつかの公理を提示し、研究する。シャプリー値と積分勾配は、ほとんどの公理を保存するが、どちらも全ての公理を満たすことはできない。分析的および実証的な例を用いて、帰属法がいかにリスクを反映し、いつ使用すべきでないかを実証する。

Over the past few decades, machine learning models have been extremely successful. As a result of axiomatic attribution methods, feature contributions have been explained more clearly and rigorously. There are, however, few studies that have examined domain knowledge in conjunction with the axioms. In this study, we examine asset pricing in finance, a field closely related to risk management. Consequently, when applying machine learning models, we must ensure that the attribution methods reflect the underlying risks accurately. In this work, we present and study several axioms derived from asset pricing domain knowledge. It is shown that while Shapley value and Integrated Gradients preserve most axioms, neither can satisfy all axioms. Using extensive analytical and empirical examples, we demonstrate how attribution methods can reflect risks and when they should not be used.
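
For readers unfamiliar with axiomatic attribution, the following sketch computes exact Shapley values for a tiny model and checks the efficiency (completeness) axiom: the attributions sum to the difference between the model output at the input and at a baseline. The model `toy_price`, the input `x`, and the zero baseline are hypothetical and are not the asset-pricing axioms studied in the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input.

    Features outside a coalition are set to their baseline value. The cost is
    exponential in the number of features, so this only suits tiny models.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_price(z):
    """Hypothetical model: two additive terms plus one interaction term."""
    return z[0] + 2.0 * z[1] + z[0] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_price, x, baseline)
print(phi, sum(phi), toy_price(x) - toy_price(baseline))
```

In this example the interaction term `z[0] * z[2]` is split equally between features 0 and 2. Whether such a split is appropriate for a risk-related quantity is the kind of question a domain-specific axiom would address.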

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# PriRoAgg: フェデレートラーニングのための最小限のプライバシリークでロバストモデル集約を実現する

PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning ( http://arxiv.org/abs/2407.08954v1 )

ライセンス: Link先を確認

Sizai Hou, Songze Li, Tayyebeh Jahani-Nezhad, Giuseppe Caire,

(参考訳) フェデレートラーニング(FL)は、ユーザプライバシを保ちながら、大規模な分散ユーザデータを活用できる可能性から、最近大きな勢いを増している。しかし、FLの典型的なパラダイムは、プライバシとロバスト性の両方の課題に直面している。送信されたモデル更新は、機密性の高いユーザ情報を漏洩させる可能性があるし、ローカルトレーニングプロセスの集中的な制御の欠如は、モデル更新に対する悪意のある操作の影響を受けやすいグローバルモデルを残している。ワンサーバFL設定の下で両方の問題に対処しようとする現在のソリューションは、以下の側面で不足している。 1) 高度な攻撃(例えば、個別更新の基準のチェックなど)に対して不十分な簡易な妥当性確認のために設計された。 2) より複雑なロバストな集約アルゴリズムに対する部分的なプライバシリーク(例えば、マルチスクラムではモデル更新間の距離がリークされる)。本研究では,より高度なロバストなアグリゲーションを実現するためには,ユーザ情報量の最小化を図った,新たなセキュリティ概念であるアグリゲートプライバシを,ユーザ更新の集計統計の形で形式化する。我々は、Lagrange符号化計算と分散ゼロ知識証明を利用した汎用フレームワークPriRoAggを開発し、集約されたプライバシを満たすとともに、幅広いロバストな集約アルゴリズムを実行する。 PriRoAggの具体的なインスタンス化として、最先端のロバストアルゴリズムに基づく2つのセキュアでロバストなプロトコルを構築し、セキュリティと複雑性に関する完全な理論的分析を行う。これらのプロトコルに対して大規模な実験を行い、様々なモデルの整合性攻撃に対する頑健さと、ベースラインに対する効率上の優位性を実証した。

Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data while preserving user privacy. However, the typical paradigm of FL faces challenges of both privacy and robustness: the transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. Current solutions attempting to address both problems under the one-server FL setting fall short in the following aspects: 1) designed for simple validity checks that are insufficient against advanced attacks (e.g., checking norm of individual update); and 2) partial privacy leakage for more complicated robust aggregation algorithms (e.g., distances between model updates are leaked for multi-Krum). In this work, we formalize a novel security notion of aggregated privacy that characterizes the minimum amount of user information, in the form of some aggregated statistics of users' updates, that is necessary to be revealed to accomplish more advanced robust aggregation. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy. As concrete instantiations of PriRoAgg, we construct two secure and robust protocols based on state-of-the-art robust algorithms, for which we provide full theoretical analyses on security and complexity. Extensive experiments are conducted for these protocols, demonstrating their robustness against various model integrity attacks, and their efficiency advantages over baselines.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# DeCE: バックドア攻撃の防御のために設計された欺瞞的クロスエントロピー損失

DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks ( http://arxiv.org/abs/2407.08956v1 )

ライセンス: Link先を確認

Guang Yang, Yu Zhou, Xiang Chen, Xiangyu Zhang, Terry Yue Zhuo, David Lo, Taolue Chen,

(参考訳) コード言語モデル(CLM)、特にディープラーニングを活用するものは、コードインテリジェンス領域において大きな成功を収めています。しかし、セキュリティの問題、特にバックドア攻撃は、このプロセスでは見過ごされがちである。これまでの研究では、CLMのバックドア攻撃の設計に焦点が当てられていたが、効果的な防御は適切に対処されていない。特に、自然言語処理からの既存の防御手法は、CLMに直接適用しても効果が十分ではなく、汎用性に欠けており、いくつかのモデルやシナリオではうまく機能するが、他のモデルではうまく機能しないため、バックドア攻撃を継続的に軽減するには不十分である。このギャップを埋めるために,我々はまず,CLMの訓練中に発生する「早期学習」現象を確認した。この現象は、モデルが最初はトレーニングデータの主な特徴に焦点を当てていたが、時間が経つにつれてバックドアのトリガーに敏感になり、バックドアの攻撃に対する過度な適合と感受性をもたらす可能性があることを示唆している。次に, バックドアへの過度な適合は, クロスエントロピー損失関数の使用による結果であり, クロスエントロピーの非有界性は, 有毒データの特徴にますます集中させる。そこで本研究では,知覚的分布をブレンドしてラベルスムースにラベルスムースにすることで,モデルがバックドアトリガに過度に収まることを防止し,バックドア攻撃に対するCLMの安全性を高めることで,汎用的で効果的な損失関数DeCEを提案する。本手法の有効性を検証するために,コード合成タスクを実験シナリオとして選択する。各種コード合成データセット,モデル,有毒比に対する実験により,CLMの安全性を高める上でのDeCEの適用性と有効性を示した。

Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, the issue of security, particularly backdoor attacks, is often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, but effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when directly applied to CLMs, are not effective enough and lack generality: they work well for some models and scenarios but fail for others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm that "early learning" is a general phenomenon during the training of CLMs. This phenomenon refers to the fact that a model initially focuses on the main features of the training data but may become more sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. We then show that overfitting to backdoor triggers results from the use of the cross-entropy loss function, whose unboundedness leads the model to concentrate increasingly on the features of the poisoned data. Based on this insight, we propose a general and effective loss function, DeCE (Deceptive Cross-Entropy), which blends deceptive distributions and applies label smoothing to keep the gradient bounded; this prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
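
The boundedness argument can be illustrated with a small sketch. Plain cross-entropy grows without limit as the probability assigned to the labeled class approaches zero, which is what happens when a model resists a poisoned label. One way to bound the loss, assumed here purely for illustration and not taken from the paper, is to blend the prediction toward a label-smoothed target before taking the logarithm. The function names and the `alpha` and `eps` parameters are hypothetical.

```python
import math

def smooth_labels(target, num_classes, eps):
    """Label smoothing: 1 - eps on the target class, eps shared by the rest."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if c == target else off for c in range(num_classes)]

def cross_entropy(probs, target):
    """Standard cross-entropy for one sample; unbounded as probs[target] -> 0."""
    return -math.log(probs[target])

def blended_cross_entropy(probs, target, alpha, eps):
    """Cross-entropy against smoothed labels y, using p' = alpha*p + (1-alpha)*y."""
    y = smooth_labels(target, len(probs), eps)
    blended = [alpha * p + (1.0 - alpha) * t for p, t in zip(probs, y)]
    return -sum(t * math.log(b) for t, b in zip(y, blended))

def blended_upper_bound(target, num_classes, alpha, eps):
    """Worst case of the blended loss: every p'_c is at least (1 - alpha) * y_c."""
    y = smooth_labels(target, num_classes, eps)
    return -sum(t * math.log((1.0 - alpha) * t) for t in y)

# A prediction that almost entirely rejects the (possibly poisoned) label 0.
probs = [1e-12, 0.5, 0.5]
plain = cross_entropy(probs, target=0)
bounded = blended_cross_entropy(probs, target=0, alpha=0.9, eps=0.1)
limit = blended_upper_bound(target=0, num_classes=3, alpha=0.9, eps=0.1)
print(plain, bounded, limit)
```

The plain loss is large for this prediction, while the blended loss stays below a fixed limit that depends only on `alpha`, `eps`, and the number of classes. This requires `alpha < 1` and `eps > 0`, since otherwise the bound itself diverges.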

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# デバッグのための実用的かつ便利なプログラム修復に向けて

Towards Practical and Useful Automated Program Repair for Debugging ( http://arxiv.org/abs/2407.08958v1 )

ライセンス: Link先を確認

Qi Xin, Haojun Wu, Steven P. Reiss, Jifeng Xuan,

(参考訳) 現在の自動プログラム修復(APR)技術は、現実的なデバッグに十分な実用的かつ有用なものではない。それらは、パッチ検証の正確性基準と頻繁なプログラムの再実行として、包括的なテストケースのスイートを必要とすること、高速ではないこと、プログラムの複数箇所を修正して、一般的に発生する複雑なバグを修復する能力など、非現実的な仮定に依存している。 APRの実用性、有効性、有用性を大幅に改善して、デバッグを支援したいと思っています。この目標に向けて,統合開発環境(IDE)で動作する対話型修復システムであるPracAPRを構想し,デバッグに有効な修復提案を行う。 PracAPRはテストスイートやプログラムの再実行を必要としない。開発者はIDEデバッガを使用しており、問題が観測された場所でプログラムが停止していると仮定する。問題仕様を得るために開発者と対話する。この仕様に基づいて、テストフリーでフロー分析に基づくフォールトローカライゼーション、大規模言語モデルに基づく局所的な修復と調整された戦略駆動のグローバルな修復を組み合わせたパッチ生成、およびシミュレーショントレース比較に基づくプログラム再実行不要なパッチ検証を実行し、修復を提案する。 PracAPRを使用することで、APRを便利にし、デバッグの日常的な部分へと、大きな一歩を踏み出したいと考えています。

Current automated program repair (APR) techniques are far from being practical and useful enough to be considered for realistic debugging. They rely on unrealistic assumptions, including the requirement of a comprehensive suite of test cases as the correctness criterion and frequent program re-execution for patch validation; they are not fast; and their ability to repair commonly arising complex bugs by fixing multiple locations of the program is very limited. We hope to substantially improve APR's practicality, effectiveness, and usefulness to help people debug. Towards this goal, we envision PracAPR, an interactive repair system that works in an Integrated Development Environment (IDE) to provide effective repair suggestions for debugging. PracAPR does not require a test suite or program re-execution. It assumes that the developer uses an IDE debugger and the program has suspended at a location where a problem is observed. It interacts with the developer to obtain a problem specification. Based on the specification, it performs test-free, flow-analysis-based fault localization, patch generation that combines large language model-based local repair and tailored strategy-driven global repair, and program re-execution-free patch validation based on simulated trace comparison to suggest repairs. With PracAPR, we hope to take a significant step towards making APR useful and an everyday part of debugging.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 数ショット階層テキスト分類のための反復推論の連鎖によるドメイン階層適応

Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification ( http://arxiv.org/abs/2407.08959v1 )

ライセンス: Link先を確認

Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang,

(参考訳) 近年,様々な事前学習型言語モデル (PLM) が提案されている。しかし、PLMにおける非構造的事前知識に制限されているため、特に下流データが極めて少ない場合に、階層的テキスト分類(HTC)のような複雑な構造化シナリオで一貫した性能を維持することは困難である。主な課題は、PLMの非構造化セマンティック空間を下流ドメイン階層に転送する方法である。複数ラベルの分類やグラフニューラルネットワーク(GNN)を用いてラベル階層をインジェクトする以前のHTCの作業とは異なり、本研究では、HTCの問題を数ショットの条件下で研究し、構造化されていない方法でPLMの知識を下流階層に適応させる。技術的には、階層的反復条件ランダムフィールド (HierICRF) と呼ばれる単純な手法を設計し、最もドメインが混在する方向を探索し、ドメイン階層適応を階層的反復言語モデリング問題として巧妙に構築し、推論中に階層的一貫性を自己補正し、階層的一貫性の維持による知識伝達を実現する。私たちは、さまざまなアーキテクチャ上でHierICRFを実行し、2つの人気のあるHTCデータセット上で大規模な実験を行い、HierICRFによるプロンプトによって、平均的なMicro-F1が28.80%、Macro-F1が36.29%から1.5%向上し、SOTAの階層的一貫性が保たれる一方で、以前のSOTAベースラインよりも大幅に向上することを示した。

Recently, various pre-trained language models (PLMs) have been proposed and have shown impressive performance on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance in complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy. Unlike previous work on HTC, which directly performs multi-label classification or uses a graph neural network (GNN) to inject the label hierarchy, in this work we study the HTC problem under a few-shot setting to adapt knowledge in PLMs from an unstructured manner to the downstream hierarchy. Technically, we design a simple yet effective method named Hierarchical Iterative Conditional Random Field (HierICRF) that searches for the most domain-challenging directions and crafts domain-hierarchy adaptation as a hierarchical iterative language modeling problem; it then encourages the model to perform hierarchical-consistency self-correction during inference, thereby achieving knowledge transfer with hierarchical consistency preservation. We apply HierICRF to various architectures, and extensive experiments on two popular HTC datasets demonstrate that prompting with HierICRF significantly boosts few-shot HTC performance, improving average Micro-F1 by 28.80% to 1.50% and Macro-F1 by 36.29% to 1.5% over the previous state-of-the-art (SOTA) baselines under few-shot settings, while retaining SOTA hierarchical consistency performance.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 胸部CTにおけるセグメンテーション前訓練のための組織競合性セミマスクオートエンコーダ

Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT ( http://arxiv.org/abs/2407.08961v1 )

ライセンス: Link先を確認

Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang,

(参考訳) 既存のマスク付き画像モデリング(MIM)は、未ラベル画像から物体の特徴を知覚するために空間パッチベースのマスク再構築戦略に依存しており、胸部CTに適用した場合に2つの制限に直面する可能性がある。 1)CT画像に示される複雑な解剖学的詳細による非効率な特徴学習 2)上流モデルと下流モデルとの入力格差による準最適知識伝達。これらの課題に対処するため,胸部CT画像のモデリングのためのTissue-Contrastive Semi-Masked Autoencoder (TCS-MAE)と呼ばれる新しいMIM法を提案する。私たちの手法には2つの新しい設計があります。 1)より微細な解剖学的特徴を捉えるための組織ベースのマスキング・リコンストラクション戦略 2) 上流モデルと下流モデルのギャップを埋めるために,マスクとオリジナル画像ビューの対比学習を施したデュアルAEアーキテクチャ。本手法の有効性を検証するために, 肺炎, 縦隔腫瘍, 各種臓器の分節化に関わる課題に対して, 代表的コントラスト, 生成的, ハイブリッド自己教師型学習法を体系的に検討した。その結果,既存の手法と比較して,TCS-MAEは組織認識表現をより効果的に学習し,全タスクのセグメンテーション性能を大幅に向上させることができた。

Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to the complex anatomical details present in CT images, and 2) suboptimal knowledge transfer owing to the input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap between the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on tasks involving the segmentation of pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 熱力学エントロピーと情報

Information vs Thermodynamic Entropy ( http://arxiv.org/abs/2407.08962v1 )

ライセンス: Link先を確認

Phil Attard,

(参考訳) シャノン情報は熱力学のエントロピーと異なり、熱力学の第二法則とは無関係である。

The Shannon information is shown to be different to the thermodynamic entropy, and indifferent to the Second Law of Thermodynamics.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 多様性最適化における局所最適性:非自明なオフスプリング人口は不可欠である

Local Optima in Diversity Optimization: Non-trivial Offspring Population is Essential ( http://arxiv.org/abs/2407.08963v1 )

ライセンス: Link先を確認

Denis Antipov, Aneta Neumann, Frank Neumann,

(参考訳) 多様性最適化の主目的は、適合性にある程度の低い制約を満たす多様なソリューションを見つけることである。進化的アルゴリズム(EA)は、自然に解の集団を最適化するために設計されているため、そのようなタスクにしばしば使用される。 EDOと呼ばれるこの多様性最適化のアプローチは、以前は理論的な観点から研究されてきたが、ほとんどの研究は、(\mu + 1)$ EAのような自明な子孫を持つEAのみを考慮に入れている。そこで本論文では,従来の単一目的最適化と多様性最適化の重大な違い,すなわち,少なくとも2つの個人を一度に置き換えることによってのみ逃れることのできる局所的最適集団の存在,すなわち$(\mu + 1)$アルゴリズムでは不可能である,という問題を例に挙げる。また、$(\mu + \lambda)$ EA with $\lambda \ge \mu$ は、ブランソン・アンド・サットン(TCS 2023)にインスパイアされた突然変異演算子を使用する場合、$k$-vertex カバー上の多様な集団を効果的に見つけることができることを示した。多様性を最適化するとき、$(\mu + \lambda)$ EAに生じる部分集合選択の問題を避けるために、人口に対する$(1 + 1)$ EAの類似である$(1_\mu + 1_\mu)$ EA$_D$も提案する。

The main goal of diversity optimization is to find a diverse set of solutions which satisfy some lower bound on their fitness. Evolutionary algorithms (EAs) are often used for such tasks, since they are naturally designed to optimize populations of solutions. This approach to diversity optimization, called EDO, has been previously studied from a theoretical perspective, but most studies considered only EAs with a trivial offspring population such as the $(\mu + 1)$ EA. In this paper we give an example instance of a $k$-vertex cover problem, which highlights a critical difference between diversity optimization and regular single-objective optimization, namely that there might be a locally optimal population from which we can escape only by replacing at least two individuals at once, which the $(\mu + 1)$ algorithms cannot do. We also show that the $(\mu + \lambda)$ EA with $\lambda \ge \mu$ can effectively find a diverse population on $k$-vertex cover when using a mutation operator inspired by Branson and Sutton (TCS 2023). To avoid the problem of subset selection which arises in the $(\mu + \lambda)$ EA when it optimizes diversity, we also propose the $(1_\mu + 1_\mu)$ EA$_D$, which is an analogue of the $(1 + 1)$ EA for populations, and which is also efficient at optimizing diversity on the $k$-vertex cover problem.
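
The idea of a population that no single replacement can improve can be seen on a toy instance, sketched below. This is not the instance constructed in the paper: the graph (a path on three vertices), the cover-size bound `K`, the diversity measure (sum of pairwise Hamming distances), and the strict-improvement acceptance rule are all assumptions made for illustration.

```python
from itertools import combinations, product

EDGES = [(0, 1), (1, 2)]  # hypothetical path graph 0 - 1 - 2
K = 2                     # covers may use at most K vertices

def is_cover(bits):
    """True if bits selects at most K vertices and touches every edge."""
    return sum(bits) <= K and all(bits[u] or bits[v] for u, v in EDGES)

def diversity(population):
    """Sum of pairwise Hamming distances between individuals."""
    return sum(
        sum(a != b for a, b in zip(p, q))
        for p, q in combinations(population, 2)
    )

def improving_single_swaps(population, candidates):
    """All (index, candidate) replacements that strictly increase diversity."""
    base = diversity(population)
    moves = []
    for idx in range(len(population)):
        for cand in candidates:
            trial = population[:idx] + [cand] + population[idx + 1:]
            if diversity(trial) > base:
                moves.append((idx, cand))
    return moves

# Enumerate every feasible cover of the toy graph.
feasible = [bits for bits in product((0, 1), repeat=3) if is_cover(bits)]

stuck = [(1, 1, 0), (0, 1, 1)]   # diversity 2
better = [(0, 1, 0), (1, 0, 1)]  # diversity 3, differs from stuck in both slots
print(improving_single_swaps(stuck, feasible), diversity(stuck), diversity(better))
```

Here no single replacement strictly improves `stuck`, yet replacing both individuals reaches a more diverse population. If equal-diversity moves were accepted, this small instance could still be escaped one step at a time across a plateau, so it is weaker than the local optimum the abstract describes; it only illustrates why producing several offspring at once can matter.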

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 協調型適応型クルーズ制御のためのコミュニケーション・アウェア強化学習

Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control ( http://arxiv.org/abs/2407.08964v1 )

ライセンス: Link先を確認

Sicong Jiang, Seongjin Choi, Lijun Sun,

(参考訳) コラボレーティブ・アダプティブ・クルーズ・コントロール(CACC)は、コネクテッド・アンド・オートマチック・ビークルズ(CAV)における交通効率と安全性を高める上で重要な役割を担っている。強化学習(RL)は、CACCにおける複雑な意思決定プロセスの最適化に有効であることが証明され、システム性能と適応性が改善された。 RLのアプローチの中で、マルチエージェント強化学習(MARL)は、分散実行による集中訓練(CTDE)を通じて、複数のCAV間で協調的な行動を可能にすることで、顕著な可能性を示している。しかし、MARLはスケーラビリティの問題に直面することが多く、特にCACC車両が突然小隊に加わるか去ると性能が低下する。これらの課題に対処するため,コミュニケーション・アウェア・強化学習(CA-RL)を提案する。 CA-RLは、前方および後方情報伝送モジュールを介して車両の通信情報を抽出し、圧縮する通信対応モジュールを含む。これにより、CACCトラフィックフロー内での効率的な循環情報伝搬が可能となり、ポリシーの整合性を確保し、CACCにおけるMARLのスケーラビリティ問題を軽減できる。実験の結果,CA-RLは様々な交通シナリオにおいてベースライン法よりも優れており,車両数の変化にもかかわらず信頼性の高い性能を維持しつつ,優れたスケーラビリティ,堅牢性,システム全体の性能を実現していることがわかった。

Cooperative Adaptive Cruise Control (CACC) plays a pivotal role in enhancing traffic efficiency and safety in Connected and Autonomous Vehicles (CAVs). Reinforcement Learning (RL) has proven effective in optimizing complex decision-making processes in CACC, leading to improved system performance and adaptability. Among RL approaches, Multi-Agent Reinforcement Learning (MARL) has shown remarkable potential by enabling coordinated actions among multiple CAVs through Centralized Training with Decentralized Execution (CTDE). However, MARL often faces scalability issues, particularly when CACC vehicles suddenly join or leave the platoon, resulting in performance degradation. To address these challenges, we propose Communication-Aware Reinforcement Learning (CA-RL). CA-RL includes a communication-aware module that extracts and compresses vehicle communication information through forward and backward information transmission modules. This enables efficient cyclic information propagation within the CACC traffic flow, ensuring policy consistency and mitigating the scalability problems of MARL in CACC. Experimental results demonstrate that CA-RL significantly outperforms baseline methods in various traffic scenarios, achieving superior scalability, robustness, and overall system performance while maintaining reliable performance despite changes in the number of participating vehicles.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# Lite-SAMは、あらゆるセグメンテーションに必要なもの

Lite-SAM Is Actually What You Need for Segment Everything ( http://arxiv.org/abs/2407.08965v1 )

ライセンス: Link先を確認

Jianhai Fu, Yuanjie Yu, Ningchuan Li, Yi Zhang, Qichao Chen, Jianping Xiong, Jun Yin, Zhiyu Xiang,

(参考訳) 本稿では、計算コストと冗長性を低減するために設計されたSegEveryタスクの効率的なエンドツーエンドソリューションであるLite-SAMを紹介する。 Lite-SAMは、CNN-Transformerハイブリッドエンコーダ(LiteViT)、自動プロンプトプロポーザルネットワーク(AutoPPN)、従来のプロンプトエンコーダ、マスクデコーダの4つの主要コンポーネントで構成されている。これらのコンポーネントはすべてSAMフレームワークに統合されます。我々のLiteViTは、高性能で軽量なバックボーンネットワークであり、1.16Mのパラメータしか持たない。また,AutoPPNを導入し,プロンプトボックスとポイント生成のための革新的なエンドツーエンド手法を提案する。これは従来のグリッドサーチサンプリング法よりも改善され、そのユニークな設計により、SAMシリーズのアルゴリズムに容易に統合でき、使い勝手を向上させることができる。公開データセットとプライベートデータセットの両方で、Lite-SAMを徹底的にベンチマークしました。評価には、パラメータの数、SegEveryの実行時間、精度など、幅広い普遍的な指標が含まれていた。その結果、Lite-SAMはリーン4.2Mパラメータで動作しており、SAM、MobileSAM、Edge-SAM、EfficientViT-SAM、MobileSAM-v2よりも43x、31x、20x、21x、1.6xのパフォーマンス改善を示しながら、競争の正確さを維持していることがわかった。これにより、Lite-SAMは、パフォーマンスと精度の最適な均衡を達成し、ドメインに新しい最先端(SOTA)ベンチマークを設定できる。

This paper introduces Lite-SAM, an efficient end-to-end solution for the SegEvery task designed to reduce computational costs and redundancy. Lite-SAM is composed of four main components: a streamlined CNN-Transformer hybrid encoder (LiteViT), an automated prompt proposal network (AutoPPN), a traditional prompt encoder, and a mask decoder. All these components are integrated within the SAM framework. Our LiteViT, a high-performance lightweight backbone network, has only 1.16M parameters, which is a 23% reduction compared to the lightest existing backbone network Shufflenet. We also introduce AutoPPN, an innovative end-to-end method for generating prompt boxes and points. This is an improvement over traditional grid search sampling methods, and its unique design allows for easy integration into any SAM series algorithm, extending its usability. We have thoroughly benchmarked Lite-SAM across a plethora of both public and private datasets. The evaluation encompassed a broad spectrum of universal metrics, including the number of parameters, SegEvery execution time, and accuracy. The findings reveal that Lite-SAM, operating with a lean 4.2M parameters, significantly outpaces its counterparts, demonstrating performance improvements of 43x, 31x, 20x, 21x, and 1.6x over SAM, MobileSAM, Edge-SAM, EfficientViT-SAM, and MobileSAM-v2 respectively, all the while maintaining competitive accuracy. This underscores Lite-SAM's prowess in achieving an optimal equilibrium between performance and precision, thereby setting a new state-of-the-art (SOTA) benchmark in the domain.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# LAPT:視覚言語モデルを用いたOOD検出のためのラベル駆動型自動プロンプトチューニング

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models ( http://arxiv.org/abs/2407.08966v1 )

ライセンス: Link先を確認

Yabin Zhang, Wenjie Zhu, Chenhang He, Lei Zhang,

(参考訳) 分布外(Out-of-Distribution, OOD)検出は、未知のクラスからのサンプルを特定し、予期しない入力によるエラーを低減するため、モデルの信頼性に不可欠である。 CLIPのような視覚言語モデル(VLM)は、マルチモーダル情報を統合することで、OOD検出の強力なツールとして現れている。しかし、そのようなシステムの実用化は、ドメインの専門知識を必要とし、言語的なニュアンスに敏感な手動プロンプトエンジニアリングによって妨げられている。本稿では,手動プロンプトエンジニアリングの必要性を低減させるOOD検出の新しいアプローチである,ラベル駆動型自動プロンプトチューニング(LAPT)を提案する。 分布内(In-distribution, ID)クラス名と自動的にマイニングした負ラベルを用いて、分布を考慮したプロンプトを開発する。これらのクラスラベルに関連付けられたトレーニングサンプルは、画像合成と検索によって自律的に収集され、手作業なしでプロンプト学習が可能である。簡単なクロスエントロピー損失をプロンプト最適化に利用し、クロスモーダルとクロスディストリビューションの混合戦略を用いて、それぞれ画像ノイズを低減し、分布間の中間空間を探索する。 LAPTフレームワークは自律的に動作し、IDクラス名のみを入力として必要とし、手動による介入を不要とする。広範な実験により、LAPTは手作業によるプロンプトを一貫して上回り、OOD検出の新しい標準を設定した。さらに、LAPTは、IDとOODの区別を強化するだけでなく、ID分類精度も向上し、共変量シフトに対する一般化ロバスト性を強化し、困難なフルスペクトルOOD検出タスクにおいて優れたパフォーマンスをもたらす。コードは https://github.com/YBZh/LAPT で公開されている。

Out-of-distribution (OOD) detection is crucial for model reliability, as it identifies samples from unknown classes and reduces errors due to unexpected inputs. Vision-Language Models (VLMs) such as CLIP are emerging as powerful tools for OOD detection by integrating multi-modal information. However, the practical application of such systems is challenged by manual prompt engineering, which demands domain expertise and is sensitive to linguistic nuances. In this paper, we introduce Label-driven Automated Prompt Tuning (LAPT), a novel approach to OOD detection that reduces the need for manual prompt engineering. We develop distribution-aware prompts with in-distribution (ID) class names and negative labels mined automatically. Training samples linked to these class labels are collected autonomously via image synthesis and retrieval methods, allowing for prompt learning without manual effort. We utilize a simple cross-entropy loss for prompt optimization, with cross-modal and cross-distribution mixing strategies to reduce image noise and explore the intermediate space between distributions, respectively. The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention. With extensive experiments, LAPT consistently outperforms manually crafted prompts, setting a new standard for OOD detection. Moreover, LAPT not only enhances the distinction between ID and OOD samples, but also improves the ID classification accuracy and strengthens the generalization robustness to covariate shifts, resulting in outstanding performance in challenging full-spectrum OOD detection tasks. Code is available at https://github.com/YBZh/LAPT.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# 従来のRE手法と大規模言語モデルの統合によるFew-Shot関係抽出

Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models ( http://arxiv.org/abs/2407.08967v1 )

ライセンス: Link先を確認

Ye Liu, Kai Zhang, Aoran Gan, Linan Yue, Feng Hu, Qi Liu, Enhong Chen,

(参考訳) Few-Shot Relation extract (FSRE)は、限られたトレーニングインスタンスを利用するリレーショナル抽出(RE)のサブタスクであり、非常に低リソースのシナリオでテキスト情報を抽出する能力により、自然言語処理(NLP)の研究者にアピールする。 FSREの主要な手法は、事前学習言語モデル(PLM)に基づく微調整または即時チューニング技術である。近年,大規模言語モデル (LLM) の出現により,多くの研究者が文脈学習 (ICL) を通じてFSREを探求している。しかし、従来のREモデルやLLMに基づいたメソッドには、かなりの制限がある。従来のREモデルは、必要な事前知識の欠如によって妨げられ、一方LLMは、REのタスク固有の能力に不足しています。これらの欠点に対処するため,従来のREモデルとLLMを相乗的に組み合わせたデュアルシステム拡張関係エクストラクタ(DSARE)を提案する。具体的には、DSARE は従来の RE モデルに LLM の以前の知識を革新的に注入し、関係抽出による RE に対する LLM のタスク固有の適性を向上させる。さらに、統合予測モジュールを用いて、これらの2つの予測を共同で検討し、最終的な結果を導出する。大規模実験により提案手法の有効性が示された。

Few-Shot Relation Extraction (FSRE), a subtask of Relation Extraction (RE) that utilizes limited training instances, appeals to more researchers in Natural Language Processing (NLP) due to its capability to extract textual information in extremely low-resource scenarios. The primary methodologies employed for FSRE have been fine-tuning or prompt tuning techniques based on Pre-trained Language Models (PLMs). Recently, the emergence of Large Language Models (LLMs) has prompted numerous researchers to explore FSRE through In-Context Learning (ICL). However, there are substantial limitations associated with methods based on either traditional RE models or LLMs. Traditional RE models are hampered by a lack of necessary prior knowledge, while LLMs fall short in their task-specific capabilities for RE. To address these shortcomings, we propose a Dual-System Augmented Relation Extractor (DSARE), which synergistically combines traditional RE models with LLMs. Specifically, DSARE innovatively injects the prior knowledge of LLMs into traditional RE models, and conversely enhances LLMs' task-specific aptitude for RE through relation extraction augmentation. Moreover, an Integrated Prediction module is employed to jointly consider these two respective predictions and derive the final results. Extensive experiments demonstrate the efficacy of our proposed method.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# SlideGCD:全スライド画像分類のための知識蒸留によるグラフ協調学習

SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification ( http://arxiv.org/abs/2407.08968v1 )

ライセンス: Link先を確認

Tong Shu, Jun Shi, Dongdong Sun, Zhiguo Jiang, Yushan Zheng,

(参考訳) 既存のWSI分析法は、腫瘍の病理組織学的特徴ががん診断の重要な指針である、という結論に基づいている。特に、がんの進化は連続的なプロセスであるため、様々な段階、解剖学的位置、患者との相関や差異を考慮する必要がある。しかし、最近の研究は主にスライド間の相関を無視して、単一のWSIの内部コンテキスト情報に焦点を当てている。スライド相互相関の導入がWSI表現学習の改善をもたらすかどうかを検証するため,既存のマルチインスタンス学習(MIL)手法をバックボーンとして考慮し,WSI分類タスクをノード分類問題としてフォッジする,汎用的なWSI解析パイプラインであるSlideGCDを提案する。より具体的には、SlideGCDは、その後の広範なスライドベースのグラフ構築のために、以前のスライド埋め込みを格納するノードバッファを宣言し、グラフ学習を実施して、スライドベースのグラフに暗示される相関関係を探索する。さらに、MIL分類器とグラフ学習を2つの並列ワークフローに分類し、知識蒸留をデプロイして、識別可能な情報をグラフニューラルネットワークに転送する。 2つのTCGAベンチマークデータセットで、これまでの4つの最先端MILメソッドのSlideGCDによる一貫したパフォーマンス向上が観察された。コードはhttps://github.com/HFUT-miaLab/SlideGCDで入手できる。

Existing whole slide image (WSI) analysis methods rest on the consensus that histopathological characteristics of tumors provide significant guidance for cancer diagnostics. Particularly, as the evolution of cancers is a continuous process, the correlations and differences across various stages, anatomical locations and patients should be taken into account. However, recent research mainly focuses on the inner-contextual information in a single WSI, ignoring the correlations between slides. To verify whether introducing the slide inter-correlations can bring improvements to WSI representation learning, we propose a generic WSI analysis pipeline, SlideGCD, that uses existing multi-instance learning (MIL) methods as the backbone and casts the WSI classification task as a node classification problem. More specifically, SlideGCD declares a node buffer that stores previous slide embeddings for subsequent extensive slide-based graph construction and conducts graph learning to explore the inter-correlations implied in the slide-based graph. Moreover, we frame the MIL classifier and graph learning as two parallel workflows and deploy knowledge distillation to transfer the differentiable information to the graph neural network. Consistent performance gains from SlideGCD are observed for four previous state-of-the-art MIL methods on two TCGA benchmark datasets. The code is available at https://github.com/HFUT-miaLab/SlideGCD.

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# Llamaの検出 - 大規模言語モデルによるスマートコントラクトの脆弱性検出

Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models ( http://arxiv.org/abs/2407.08969v1 )

ライセンス: Link先を確認

Peter Ince, Xiapu Luo, Jiangshan Yu, Joseph K. Liu, Xiaoning Du,

(参考訳) 本稿では, OpenAI の GPT-4 は一般に良好に動作するものの, オープンソースモデルを微調整すればスマートコントラクトの脆弱性検出において GPT-4 を上回ることができる, という仮説を検証する。Meta の Code Llama と 17k プロンプトのデータセットを用いて Detect Llama - Foundation と Detect Llama - Instruct の2つのモデルを微調整し、さらに OpenAI の GPT-3.5 Turbo モデル (GPT-3.5FT) も微調整する。次に、これらのモデルとランダムなベースラインを、我々が開発したテストセット上で GPT-4 および GPT-4 Turbo と比較し、データセット中の8つの脆弱性の検出と、特定された上位2つの脆弱性の検出について、重み付きF1スコアで評価する。バイナリ分類(つまり、このスマートコントラクトは脆弱か?)では、最も性能の高い2つのモデルである GPT-3.5FT と Detect Llama - Foundation が F1 スコア $0.776$ と $0.68$ を達成し、GPT-4 の $0.66$ と GPT-4 Turbo の $0.675$ を上回った。個々の脆弱性の識別に関する評価でも、この2つのモデルは GPT-4 と GPT-4 Turbo を大きく上回り、全脆弱性に対する重み付きF1はそれぞれ $0.61$ と $0.56$(GPT-4 は $0.218$、GPT-4 Turbo は $0.243$)、上位2つの脆弱性に対する重み付きF1は GPT-3.5FT が $0.719$、Detect Llama - Foundation が $0.674$(GPT-4 は $0.363$、GPT-4 Turbo は $0.429$)であった。

In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. Using Meta's Code Llama and a dataset of 17k prompts, we fine-tune two models, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evaluate these models, plus a random baseline, against GPT-4 and GPT-4 Turbo on a test set we develop, comparing weighted F1 scores for the detection of eight vulnerabilities from the dataset and of the two top identified vulnerabilities. We find that for binary classification (i.e., is this smart contract vulnerable?), our two best-performing models, GPT-3.5FT and Detect Llama - Foundation, achieve F1 scores of $0.776$ and $0.68$, outperforming both GPT-4 and GPT-4 Turbo, $0.66$ and $0.675$. For the evaluation against individual vulnerability identification, our top two models, GPT-3.5FT and Detect Llama - Foundation, both significantly outperformed GPT-4 and GPT-4 Turbo in both weighted F1 for all vulnerabilities ($0.61$ and $0.56$ respectively against GPT-4's $0.218$ and GPT-4 Turbo's $0.243$) and weighted F1 for the top two identified vulnerabilities ($0.719$ for GPT-3.5FT, $0.674$ for Detect Llama - Foundation against GPT-4's $0.363$ and GPT-4 Turbo's $0.429$).
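The weighted F1 score used in the comparison above can be illustrated with a minimal Python sketch. It assumes the common definition (per-class F1 averaged with weights proportional to each class's support) and a single-label setting; the abstract does not spell out the exact protocol, and the function name and the example labels (`reentrancy`, `overflow`, `safe`) are hypothetical, chosen only for illustration.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (single-label, multi-class)."""
    support = Counter(y_true)          # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1  # weight each class by its support
    return score

# Hypothetical labels, not taken from the paper's dataset.
y_true = ["reentrancy", "reentrancy", "overflow", "safe"]
y_pred = ["reentrancy", "overflow", "overflow", "safe"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means that frequent vulnerability classes dominate the score, which is one reason a per-class breakdown is usually reported alongside it.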

翻訳日:2024-07-16 00:56:38 公開日:2024-07-12

# ソフトプロンプトは難しい - 隠れたメタ命令でビジュアル言語モデルをステアリングする

Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions ( http://arxiv.org/abs/2407.08970v1 )

ライセンス: Link先を確認

Tingwei Zhang, Collin Zhang, John X. Morris, Eugene Bagdasaryan, Vitaly Shmatikov,

(参考訳) 隠れた"メタインストラクション"は、モデルがどのようにイメージを解釈し、モデルのアウトプットを操り、逆長線スタイル、感情、視点を表現する。ソフトプロンプトとして機能する画像を生成することによってメタ命令を生成する方法について説明する。ジェイルブレイク攻撃や敵の例とは異なり、これらの画像から得られる出力は、画像の視覚的内容に基づいて可視であり、敵の指示に従う。誤情報やスピンを含むこれらの攻撃のリスクについて述べるとともに、複数の視覚言語モデルや敵対的メタオブジェクトに対する有効性を評価し、明示的なテキスト命令によって利用できない基盤となる言語モデルの能力を「アンロック」する方法を実証する。最後に、これらの攻撃に対する防御について論じる。

We introduce a new type of indirect injection vulnerabilities in language models that operate on images: hidden "meta-instructions" that influence how the model interprets the image and steer the model's outputs to express an adversary-chosen style, sentiment, or point of view. We explain how to create meta-instructions by generating images that act as soft prompts. Unlike jailbreaking attacks and adversarial examples, the outputs resulting from these images are plausible and based on the visual content of the image, yet follow the adversary's (meta-)instructions. We describe the risks of these attacks, including misinformation and spin, evaluate their efficacy for multiple visual language models and adversarial meta-objectives, and demonstrate how they can "unlock" the capabilities of the underlying language models that are unavailable via explicit text instructions. Finally, we discuss defenses against these attacks.

翻訳日:2024-07-16 00:46:39 公開日:2024-07-12

# 弱教師付き時間行動定位のためのフルステージ擬似ラベル品質向上

Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization ( http://arxiv.org/abs/2407.08971v1 )

ライセンス: Link先を確認

Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen,

(参考訳) 微弱に監督された時間的行動局所化(WSTAL)は、ビデオレベルの監督のみを使用して、未編集ビデオのアクションをローカライズすることを目的としている。最新のWSTAL手法は、分類に基づくトレーニングとローカライゼーションにおける推論ターゲットのギャップを埋め、最先端の結果を得るための擬似ラベル学習フレームワークを導入している。これらのフレームワークでは、回帰に基づく学生モデルのために、分類に基づくモデルを使用して擬似ラベルを生成し、そこから学習する。しかし、最終結果の鍵となるフレームワークにおける擬似ラベルの品質は慎重に研究されていない。本稿では,FuSTALフレームワークを構築するための簡易かつ効率的な擬似ラベル品質向上機構を提案する。 FuSTALは擬似ラベルの品質を3段階で強化する: 提案生成段階におけるクロスビデオコントラスト学習、提案した選択段階における事前フィルタリング、訓練段階におけるEMAベースの蒸留。これらの設計は、フレームワークの異なる段階で擬似ラベルの品質を高め、より情報的で、偽りがなく、よりスムーズなアクション提案を生み出すのに役立つ。これらの総合的な設計の助けを借りて、FuSTALはTHUMOS'14で平均50.8%のmAPを達成し、以前のベストメソッドを1.2%上回った。

Weakly-supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos using only video-level supervision. The latest WSTAL methods introduce a pseudo-label learning framework to bridge the gap between classification-based training and the localization target at inference, and achieve cutting-edge results. In these frameworks, a classification-based model is used to generate pseudo labels for a regression-based student model to learn from. However, the quality of pseudo labels in the framework, which is a key factor in the final result, has not been carefully studied. In this paper, we propose a set of simple yet efficient pseudo label quality enhancement mechanisms to build our FuSTAL framework. FuSTAL enhances pseudo label quality at three stages: cross-video contrastive learning at proposal Generation-Stage, prior-based filtering at proposal Selection-Stage and EMA-based distillation at Training-Stage. These designs enhance pseudo label quality at different stages in the framework, and help produce more informative, less false and smoother action proposals. With the help of these comprehensive designs at all stages, FuSTAL achieves an average mAP of 50.8% on THUMOS'14, outperforming the previous best method by 1.2%, and becomes the first method to reach the milestone of 50%.
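The abstract names EMA-based distillation without giving details. As a rough sketch of the general idea only, assuming the usual exponential-moving-average teacher update and not the FuSTAL implementation itself, the teacher's weights can be moved a small step toward the student's after each training step. The function name, the flat weight lists, and the momentum value are illustrative assumptions.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return teacher weights moved a small step toward the student weights."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

# Toy demonstration: the student weights are held fixed here, so the
# teacher converges toward them; in real training the student keeps changing.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(200):
    teacher = ema_update(teacher, student, momentum=0.9)
print([round(w, 6) for w in teacher])
```

A high momentum makes the teacher a slowly varying average of past students, which is typically why EMA teachers give smoother targets than the raw student.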

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 極大カーネルネットワークの暗黒秘密をロバスト性に発見する

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness ( http://arxiv.org/abs/2407.08972v1 )

ライセンス: Link先を確認

Honghao Chen, Yurong Zhang, Xiaokun Feng, Xiangxiang Chu, Kaiqi Huang,

(参考訳) ロバスト性は、ディープラーニングモデルを野生に展開する上で、考慮すべき重要な側面である。ビジョントランスフォーマー(ViT)の堅牢性の研究に多くの研究が注がれており、2020年代以降、視覚タスクの主要なバックボーン選択として支配されてきた。近年、一部の大規模なカーネル・コンブネットは、性能と効率性で復活している。しかし、大規模なカーネルネットワークが堅牢なのか、その堅牢性に起因するのかはまだ不明である。本稿では,6つの多種多様なロバスト性ベンチマークデータセット上で,大カーネルのロバスト性と,典型的な小カーネルのロバスト性とViTとの相違点を総合的に評価する。そして、その強靭性の背後にある要因を分析するために、定量的および質的な視点から実験を設計し、典型的な共振器とは全く異なる大きなカーネル共振器の興味深い特性を明らかにする。我々の実験は、純粋なCNNが、ViTに匹敵する、あるいはそれよりも優れた、非常に堅牢性を達成できることを初めて実証した。本研究は,オクルージョンの不変性,カーネルの注意パターン,周波数特性について解析し,ロバスト性の原因に関する新たな知見を提供する。

Robustness is a vital aspect to consider when deploying deep learning models in the wild. Numerous studies have been dedicated to the robustness of vision transformers (ViTs), which have dominated as the mainstream backbone choice for vision tasks since the dawn of the 2020s. Recently, some large kernel convnets have made a comeback with impressive performance and efficiency. However, it remains unclear whether large kernel networks are robust and what their robustness can be attributed to. In this paper, we first conduct a comprehensive evaluation of large kernel convnets' robustness and their differences from typical small kernel counterparts and ViTs on six diverse robustness benchmark datasets. Then, to analyze the underlying factors behind their strong robustness, we design experiments from both quantitative and qualitative perspectives to reveal large kernel convnets' intriguing properties that are completely different from typical convnets. Our experiments demonstrate for the first time that pure CNNs can achieve exceptional robustness comparable or even superior to that of ViTs. Our analysis of occlusion invariance, kernel attention patterns and frequency characteristics provides novel insights into the source of robustness.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 解釈可能な機械学習のための白黒ボックス技術の統合

Integrating White and Black Box Techniques for Interpretable Machine Learning ( http://arxiv.org/abs/2407.08973v1 )

ライセンス: Link先を確認

Eric M. Vernon, Naoki Masuyama, Yusuke Nojima,

(参考訳) 機械学習アルゴリズム設計では、アルゴリズムの解釈可能性と性能の間にトレードオフが存在する。一般に、より単純で人間が理解しやすいアルゴリズムは、より複雑で透明性の低いアルゴリズムよりも性能が劣る傾向がある。例えば、ランダムフォレスト分類器は単純な決定木よりも正確である可能性が高いが、解釈可能性が犠牲になる。本稿では,より簡単な入力は解釈性の高い分類器(ホワイトボックスモデル)で分類し,より難しい入力はより強力だが解釈性の低い分類器(ブラックボックスモデル)で分類するアンサンブル分類器の設計を提案する。

In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms which are simpler and easier for humans to comprehend tend to show worse performance than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly-interpretable classifier (i.e., white box model), and more difficult inputs using a more powerful, but less interpretable classifier (i.e., black box model).
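The routing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how inputs are judged easy or hard, so the sketch assumes the white box reports a confidence and a fixed threshold decides whether to defer. Both toy models, the threshold, and all names are hypothetical.

```python
def make_gated_classifier(white_box, black_box, threshold=0.8):
    """Use the interpretable model when it is confident, else defer."""
    def classify(x):
        label, confidence = white_box(x)
        if confidence >= threshold:
            return label, "white_box"   # easy input: explainable decision
        return black_box(x), "black_box"  # hard input: stronger, opaque model
    return classify

def white_box(x):
    """Toy one-rule model: confidence grows with distance from the cut at 5.0."""
    margin = abs(x[0] - 5.0)
    return int(x[0] > 5.0), min(1.0, 0.5 + margin / 10.0)

def black_box(x):
    """Stand-in for a more powerful model that uses both features."""
    return int(x[0] + x[1] > 6.0)

classify = make_gated_classifier(white_box, black_box, threshold=0.8)
print(classify([9.0, 0.0]))  # far from the cut, handled by the white box
print(classify([5.5, 0.2]))  # near the cut, deferred to the black box
```

Returning the name of the model that made the decision keeps a record of which predictions come with an interpretable rule and which do not.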

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 抗がんペプチド予測のためのトポロジー強化機械学習モデル(Top-ML)

Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction ( http://arxiv.org/abs/2407.08974v1 )

ライセンス: Link先を確認

Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia,

(参考訳) 近年,治療ペプチドは癌治療に大いに期待されている。強力な抗がんペプチドを探索するために、人工知能(AI)ベースのアプローチが、潜在的な候補を体系的にスクリーニングするために開発されている。しかし、これらの機械学習モデルでは、ペプチドの効率的な分解の欠如がボトルネックとなっている。本稿では,抗がんペプチド予測のためのトポロジー強化機械学習モデル(Top-ML)を提案する。筆者らのTop-MLでは, ベクターおよびスペクトル記述子を特徴とする配列"接続"情報から得られるペプチドトポロジ的特徴を用いている。我々のTop-MLモデルは、広く使われている2つのAntiCP 2.0ベンチマークデータセットで検証され、最先端のパフォーマンスを達成した。本研究は,抗がんペプチドの同定を促進するために,新規なトポロジを基盤とした創製の可能性を強調した。

Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by vector and spectral descriptors. Our Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and has achieved state-of-the-art performance. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# ランダムフーリエ特徴を持つカーネル2サンプルテストにおける計算統計的トレードオフ

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features ( http://arxiv.org/abs/2407.08976v1 )

ライセンス: Link先を確認

Ikjun Choi, Ilmun Kim,

(参考訳) 近年,2標本検定の手法が急増しており,その中でも最大平均差異(Maximum Mean Discrepancy, MMD)検定は,複雑で高次元のデータを扱うための有効なツールとして登場している。成功と広い普及にもかかわらず、MMD検定の主な制限は2次時間の計算量であり、大規模な分析の課題となっている。手順の高速化には様々なアプローチが提案されているが、MMD検定と同等の検出力の保証を準2次時間で達成できるかどうかは不明であった。このギャップを埋めるために、ランダムフーリエ特徴を用いた近似MMD検定を再検討し、その計算統計的トレードオフについて検討する。まず,ランダム特徴の数が無限大に近づく場合にのみ,近似MMD検定が検出力において各点で一致性を持つことを明らかにする。次に、検定の一様検出力を検討し、ミニマックス検定の枠組みの下で時間と検出力のトレードオフを研究する。その結果, ランダム特徴の数を慎重に選択することにより, MMD検定と同一のミニマックス分離率を準2次時間で達成できることが示された。我々は、ソボレフ球内の密度のような異なる分布仮定の下でこの点を実証する。我々の理論的発見はシミュレーション研究によって裏付けられている。

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.
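To make the random Fourier feature approximation concrete, here is a minimal standard-library Python sketch of the test statistic only. It assumes a Gaussian kernel and the biased squared-MMD estimate, and it omits the threshold calibration that an actual test needs. The function names, the bandwidth, and the number of features are illustrative assumptions, not the authors' choices.

```python
import math
import random

def rff_features(x, omegas, offsets):
    """Map a point x to D random Fourier features of a Gaussian kernel."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, offsets)]

def rff_mmd2(xs, ys, num_features=200, bandwidth=1.0, seed=0):
    """Biased squared-MMD estimate; cost is linear in the sample sizes."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Frequencies drawn from the kernel's spectral density N(0, I / bandwidth^2).
    omegas = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
              for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]

    def mean_embedding(sample):
        total = [0.0] * num_features
        for x in sample:
            for j, value in enumerate(rff_features(x, omegas, offsets)):
                total[j] += value
        return [t / len(sample) for t in total]

    mean_x, mean_y = mean_embedding(xs), mean_embedding(ys)
    # Squared distance between the two approximate mean embeddings.
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))

rng = random.Random(1)
same_a = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
same_b = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
shifted = [[rng.gauss(2.0, 1.0)] for _ in range(200)]
print(rff_mmd2(same_a, same_b), rff_mmd2(same_a, shifted))
```

With n samples per group and D features the statistic costs on the order of nD kernel-free operations instead of n squared kernel evaluations, so the number of features is the knob behind the time-power trade-off that the abstract analyzes.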

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# CURE:プライバシ保護のスプリット学習を正しく行う

CURE: Privacy-Preserving Split Learning Done Right ( http://arxiv.org/abs/2407.08977v1 )

ライセンス: Link先を確認

Halil Ibrahim Kanpak, Aqsa Shabbir, Esra Genç, Alptekin Küpçü, Sinem Sav,

(参考訳) ディープニューラルネットワークのトレーニングには、計算上の制約により、大規模なデータセット、ストレージ、クラウドサーバでの処理が必要になることが少なくない。この手続きは、医療などの領域で厳格なプライバシー規制に従う必要がある。モデル層をクライアントとサーバに分割するフレームワークであるスプリットラーニング(SL)は、分散モデルトレーニングに広く採用されている。 Split Learningは、完全なパラメータセットへのサーバアクセスを制限することによって、プライバシのリスクを低減するが、以前の調査では、サーバとクライアントの間で交換された中間出力が、クライアントのデータプライバシを損なう可能性がある。このシナリオには、同型暗号化(HE)ベースのソリューションが存在するが、しばしば禁止的な計算負担を課す。これらの課題に対処するために,モデルサーバ側と任意にデータのみを暗号化するHEに基づく新しいシステムであるCUREを提案する。 CUREはセキュアなSLを実現すると同時に、高度なパッキング技術による通信と並列化を大幅に改善する。我々は,1層ネットワークのHEレベルを1つ消費し,n層ニューラルネットワークへの解を一般化する2つのパッキング手法を提案する。 CUREは,従来のプライバシ保存方式に比べて,実行時の16倍の効率で,平文SLと同等の精度を達成できることを実証した。

Training deep neural networks often requires large-scale datasets, necessitating storage and processing on cloud servers due to computational constraints. The procedures must follow strict privacy regulations in domains like healthcare. Split Learning (SL), a framework that divides model layers between client(s) and server(s), is widely adopted for distributed model training. While Split Learning reduces privacy risks by limiting server access to the full parameter set, previous research has identified that intermediate outputs exchanged between server and client can compromise the client's data privacy. Homomorphic encryption (HE)-based solutions exist for this scenario but often impose prohibitive computational burdens. To address these challenges, we propose CURE, a novel system based on HE that encrypts only the server side of the model and optionally the data. CURE enables secure SL while substantially improving communication and parallelization through advanced packing techniques. We propose two packing schemes that consume one HE level for one-layer networks and generalize our solutions to n-layer neural networks. We demonstrate that CURE can achieve similar accuracy to plaintext SL while being 16x more efficient in terms of runtime compared to the state-of-the-art privacy-preserving alternatives.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 大規模言語モデルによる章単位(Chapter-to-Chapter)の文脈対応文学翻訳に向けて

Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models ( http://arxiv.org/abs/2407.08978v1 )

ライセンス: Link先を確認

Linghao Jin, Li An, Xuezhe Ma,

(参考訳) 既存の文書レベルの翻訳データセットにおける談話現象は希少であり、文脈対応機械翻訳モデルの開発において根本的な障害となっている。さらに、既存の文書レベルのコーパスや文脈対応機械翻訳手法の多くは、文レベルのアライメントに関する非現実的な仮定に依存している。これらの問題を緩和するために、我々はまず、複雑な談話構造を持つ160冊の本からなる中国語・英語の文学作品の新しいデータセットをキュレートする。次に,章単位(chapter-to-chapter, Ch2Ch)翻訳と呼ぶ,より実用的で挑戦的な文脈対応翻訳の設定を提案し,この設定の下で一般的に用いられる機械翻訳モデルの性能を調査する。さらに,Ch2Ch 文学翻訳の領域で大規模言語モデル(LLM)を微調整するアプローチを導入し,ベースラインに対して顕著な改善を得る。包括的分析を通して、モデル学習法と翻訳復号アルゴリズムの両方に関して、Ch2Ch設定における文学翻訳が本質的に困難であることを明らかにする。

Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of Chinese-English literature, which consists of 160 books with intricate discourse structures. Then, we propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation, and investigate the performance of commonly-used machine translation models under this setting. Furthermore, we introduce a potential approach of finetuning large language models (LLMs) within the domain of Ch2Ch literary translation, yielding impressive improvements over baselines. Through our comprehensive analysis, we unveil that literary translation under the Ch2Ch setting is challenging in nature, with respect to both model learning methods and translation decoding algorithms.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 構文記述によるより信頼しやすく解釈可能なLLMのコード化に向けて

Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations ( http://arxiv.org/abs/2407.08983v1 )

ライセンス: Link先を確認

David N. Palacio, Daniel Rodriguez-Cardenas, Alejandro Velasco, Dipin Khati, Kevin Moran, Denys Poshyvanyk,

(参考訳) 信頼性と解釈性は、LLMにとって、厳密には結びついている概念である。 LLMが解釈可能なほど、より信頼できるものになります。しかし、コード関連タスクに適用する場合のLLMの解釈技術は、精度の測定、モデルがどのように変化に反応するかの計測、あるいはより深い解釈可能性のために予測時に必要とされる詳細な説明ではなく個々のタスクパフォーマンスに重点を置いている。本稿では,モデル信頼度とプログラミング言語の構文構造との関係を基礎とした説明を生成する,LLMの解釈可能性手法であるASTrustを紹介する。 ASTrustは抽象構文木に基づく構文カテゴリのコンテキストで生成されたコードを説明し、ローカル(個別のコードスニペット)とグローバル(コードのより大きなデータセット)のレベルでモデル予測を理解する実践者を支援する。 AST内に存在するよく知られた構文構造にモデルの信頼度スコアを分配し割り当てることにより、当社のアプローチは、開発者が親しんだプログラミング言語の概念と直接一致するモデル信頼度ビューを提供することにより、トークンレベルの信頼度マッピングを実行する従来の技術を超えた。 ASTrustを実践するために,ASTからの構文構造のシーケンス,熱マップ,グラフに基づく可視化に重畳されたモデル信頼性スコアを自動可視化する手法を開発した。我々は、GitHubリポジトリのキュレートされたセット上での12の人気のあるLCMに関するデータサイエンス研究を通じてASTrustがもたらす実用的なメリットと、人間による研究によるASTrustの有用性について検討する。

Trustworthiness and interpretability are inextricably linked concepts for LLMs. The more interpretable an LLM is, the more trustworthy it becomes. However, current techniques for interpreting LLMs when applied to code-related tasks largely focus on accuracy measurements, measures of how models react to change, or individual task performance instead of the fine-grained explanations needed at prediction time for greater interpretability, and hence trust. To improve upon this status quo, this paper introduces ASTrust, an interpretability method for LLMs of code that generates explanations grounded in the relationship between model confidence and syntactic structures of programming languages. ASTrust explains generated code in the context of syntax categories based on Abstract Syntax Trees and aids practitioners in understanding model predictions at both local (individual code snippets) and global (larger datasets of code) levels. By distributing and assigning model confidence scores to well-known syntactic structures that exist within ASTs, our approach moves beyond prior techniques that perform token-level confidence mapping by offering a view of model confidence that directly aligns with programming language concepts with which developers are familiar. To put ASTrust into practice, we developed an automated visualization that illustrates the aggregated model confidence scores superimposed on sequence, heat-map, and graph-based visuals of syntactic structures from ASTs. We examine both the practical benefit that ASTrust can provide through a data science study on 12 popular LLMs on a curated set of GitHub repos and the usefulness of ASTrust through a human study.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 高等教育における創発的AI政策の探求--中国,日本,モンゴル,米国からの比較

Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA ( http://arxiv.org/abs/2407.08986v1 )

ライセンス: Link先を確認

Qin Xie, Ming Li, Ariunaa Enkhtur,

(参考訳) 本研究は,中国,日本,モンゴル,米国の4カ国における生成AIに関する国家政策の比較分析を行った。質的比較分析(QCA)手法を用いて、これらの国々の高等教育環境における生成AIへの対応を調査し、このグループ内でのアプローチの多様性を精査する。4カ国はいずれも高等教育における生成AIに対して肯定的な態度を示しているが、日本と米国は人間中心のアプローチを優先し、教育と学習に関する直接的なガイダンスを提供している。対照的に、中国とモンゴルは国家安全保障に関する懸念を優先し、そのガイドラインは教育に特化したものではなく、社会レベルに重点を置いている。さらに、4カ国すべてが多様性、公平性、包摂性を強調しているにもかかわらず、デジタル格差に対処する措置を明確に議論したり実施したりすることは一貫してできていない。これらの国々の高等教育における生成AIに関する態度と政策の包括的な比較分析を提供することにより、本研究は既存の文献を豊かにし、政策立案者にグローバルな視点を提供し、この領域の政策が排除ではなく包摂を促進することを確実にする。

This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positive attitude toward Generative AI in higher education, Japan and the USA prioritize a human-centered approach and provide direct guidance in teaching and learning. In contrast, China and Mongolia prioritize national security concerns, with their guidelines focusing more on the societal level rather than being specifically tailored to education. Additionally, despite all four countries emphasizing diversity, equity, and inclusion, they consistently fail to clearly discuss or implement measures to address the digital divide. By offering a comprehensive comparative analysis of attitudes and policies regarding Generative AI in higher education across these countries, this study enriches existing literature and provides policymakers with a global perspective, ensuring that policies in this domain promote inclusion rather than exclusion.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 非定常未知過程からのパラメータ推定

Parameter inference from a non-stationary unknown process ( http://arxiv.org/abs/2407.08987v1 )

ライセンス: Link先を確認

Kieran S. Owens, Ben D. Fulcher,

(参考訳) 非定常系は、二酸化炭素濃度の変動による気候パターンから、上昇する神経調節によって引き起こされる脳のダイナミクスまで、世界中で見られる。したがって、非定常過程を解析する手法が必要であるが、科学や産業における重要な問題において実際に使用されるほとんどの時系列解析手法は、定常性の仮定を単純化する。非定常システムの解析における重要な問題は、非定常未知プロセス(PINUP)からのパラメータ推論と呼ばれる問題クラスである。観測された時系列が与えられた場合、基礎となるシステムの数学的モデルに関する知識や推論を必要とせず、時系列の非定常性を駆動するパラメータを推測する。ここでは、PINUPのための多様なアルゴリズムの文献をレビューし、統一する。問題を定式化し、様々なアルゴリズムの貢献を分類する。この合成により、研究者は文献のギャップを特定でき、異なる方法の体系的な比較が可能になる。また、既存の手法がテストされている最も一般的なシステム(特に静止しないLorenzプロセスとロジスティックマップ)は、ウィンドウ付き平均や分散のような単純な統計的特徴を使用することで驚くほど簡単に動作できることを示し、アルゴリズム性能の証拠としてこれらのシステムで優れたパフォーマンスを使用するプラクティスを損なう。そして、多くの既存手法が不十分に動作し、この分野の方法論的進歩を促進するために使用できる、より困難な問題を特定する。本研究は,非定常系解析への科学的貢献を統一し,PINUP問題と非定常現象のより広範な研究の進展に向けた新たな方向性を提案する。

Non-stationary systems are found throughout the world, from climate patterns under the influence of variation in carbon dioxide concentration, to brain dynamics driven by ascending neuromodulation. Accordingly, there is a need for methods to analyze non-stationary processes, and yet most time-series analysis methods that are used in practice, on important problems across science and industry, make the simplifying assumption of stationarity. One important problem in the analysis of non-stationary systems is the problem class that we refer to as Parameter Inference from a Non-stationary Unknown Process (PINUP). Given an observed time series, this involves inferring the parameters that drive non-stationarity of the time series, without requiring knowledge or inference of a mathematical model of the underlying system. Here we review and unify a diverse literature of algorithms for PINUP. We formulate the problem, and categorize the various algorithmic contributions. This synthesis will allow researchers to identify gaps in the literature and will enable systematic comparisons of different methods. We also demonstrate that the most common systems that existing methods are tested on - notably the non-stationary Lorenz process and logistic map - are surprisingly easy to perform well on using simple statistical features like windowed mean and variance, undermining the practice of using good performance on these systems as evidence of algorithmic performance. We then identify more challenging problems that many existing methods perform poorly on and which can be used to drive methodological advances in the field. Our results unify disjoint scientific contributions to analyzing non-stationary systems and suggest new directions for progress on the PINUP problem and the broader study of non-stationary phenomena.
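
The simple baseline mentioned above, windowed mean and variance, can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the paper's benchmark code: the function names, the window and step sizes, and the linearly drifting logistic-map parameter `r` are all hypothetical choices made here for demonstration.

```python
from statistics import fmean, pvariance


def windowed_features(series, window, step):
    """Return a (mean, variance) pair for each sliding window of the series."""
    features = []
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        features.append((fmean(chunk), pvariance(chunk)))
    return features


def logistic_map_with_drift(n, r_start=3.6, r_end=3.9, x0=0.4):
    """Logistic map x <- r*x*(1-x) whose parameter r drifts linearly in time."""
    xs, x = [], x0
    for t in range(n):
        r = r_start + (r_end - r_start) * t / (n - 1)
        x = r * x * (1 - x)
        xs.append(x)
    return xs


series = logistic_map_with_drift(1000)
features = windowed_features(series, window=100, step=100)
```

Each entry of `features` is a candidate low-dimensional summary that can be compared against the hidden drift in `r`; the abstract's point is that such features already perform well on these common test systems.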

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# テキスト中の摂動に対するLLMのロバスト性

Robustness of LLMs to Perturbations in Text ( http://arxiv.org/abs/2407.08989v1 )

ライセンス: Link先を確認

Ayush Singh, Navpreet Singh, Shubham Vatsal,

(参考訳) クリーンなデータセットを持つことは、ほとんどの自然言語処理(NLP)システムの基本的な前提となっている。しかし、適切に書かれたテキストは現実のシナリオではほとんど見つからないため、しばしばこの前提は成り立たない。最近、Large Language Model (LLM) は目覚ましい性能を示しているが、現実のデータでは避けられないノイズを処理できるだろうか? この研究は、テキストの形態的な変化に対するLLMのレジリエンスを調査することによって、この重要な問題に取り組む。そこで本研究では,異なるレベルのノイズを多種多様なデータセットに人工的に導入し,オリジナルテキストの劣化したバリエーションに対するLLMの頑健さを体系的に評価する。以上の結果から, 生成型LLMは, 一般の考えとは対照的に, テキスト中のノイズの多い摂動に対してかなり頑健であることが明らかとなった。これは、ノイズの多いテキストの劣化に性能が敏感であることが示されているBERTやRoBERTaのような事前訓練済みモデルとは異なる。さらに、実際によく見られる誤りを忠実に模倣した複数の実世界のベンチマークでLLMのレジリエンスをテストする。最小限のプロンプトにより、LLMはGrammar Error Correction (GEC) と Lexical Semantic Change (LSC) のベンチマークタスクにおいて、新たな最先端を実現する。今後の研究を促進するために、LLMによる補正出力と人間による補正出力のどちらを好むかを人間がアノテーションしたデータセットを、結果を再現するコードとともにリリースする。

Having a clean dataset has been the foundational assumption of most natural language processing (NLP) systems. However, properly written text is rarely found in real-world scenarios and hence, oftentimes invalidates the aforementioned foundational assumption. Recently, Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data? This work tackles this critical question by investigating LLMs' resilience against morphological variations in text. To that end, we artificially introduce varying levels of noise into a diverse set of datasets and systematically evaluate LLMs' robustness against the corrupt variations of the original text. Our findings show that contrary to popular belief, generative LLMs are quite robust to noisy perturbations in text. This is a departure from pre-trained models like BERT or RoBERTa whose performance has been shown to be sensitive to deteriorating noisy text. Additionally, we test LLMs' resilience on multiple real-world benchmarks that closely mimic commonly found errors in the wild. With minimal prompting, LLMs achieve a new state-of-the-art on the benchmark tasks of Grammar Error Correction (GEC) and Lexical Semantic Change (LSC). To empower future research, we also release a dataset annotated by humans stating their preference for LLM vs. human-corrected outputs along with the code to reproduce our results.
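
Artificially introducing a controlled level of noise into text can be sketched as follows. The abstract does not specify the perturbation operations, so the three character-level edits used here (delete, duplicate, swap with the next character) and the `noise_level` parameter are assumptions chosen only to illustrate the general setup.

```python
import random


def perturb(text, noise_level, seed=0):
    """Corrupt alphabetic characters with probability `noise_level`.

    Each selected character is deleted, duplicated, or swapped with the
    character that follows it. A fixed seed keeps the corruption reproducible.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < noise_level:
            op = rng.choice(("delete", "duplicate", "swap"))
            if op == "duplicate":
                out.extend((c, c))
            elif op == "swap" and i + 1 < len(chars):
                out.extend((chars[i + 1], c))
                i += 1  # the next character has already been emitted
            elif op == "swap":
                out.append(c)  # nothing to swap with at the end of the text
            # "delete" emits nothing
        else:
            out.append(c)
        i += 1
    return "".join(out)


clean = "Large language models handle noisy text."
noisy_variants = [perturb(clean, level, seed=42) for level in (0.0, 0.1, 0.3)]
```

Sweeping `noise_level` yields progressively more corrupted copies of the same input, which can then be fed to a model to measure how performance changes with noise.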

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# メムリシブCIMとCAMを用いた2次元・3次元視覚のための動的ニューラルネットワーク

Dynamic neural network with memristive CIM and CAM for 2D and 3D vision ( http://arxiv.org/abs/2407.08990v1 )

ライセンス: Link先を確認

Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu,

(参考訳) 脳はダイナミックで、連想的で、効率的である。メモリと処理が融合しており、入力と過去の経験を関連付けることで再構成する。対照的に、AIモデルは静的であり、入力と過去の経験を関連付けることができず、物理的に分離されたメモリと処理を備えたデジタルコンピュータ上で実行される。メムリスタを用いたセマンティックメモリベースの動的ニューラルネットワーク(DNN)であるハードウェア・ソフトウェア共同設計を提案する。ネットワークは、受信したデータとセマンティックベクトルとして格納された過去の経験を関連付ける。ネットワークとセマンティックメモリは,それぞれノイズに頑健な3値メムリスタベースのCIM(Computing-In-Memory)回路とCAM(Content-Addressable Memory)回路上に物理的に実装されている。我々は、40nmのメムリスタマクロを用いて、MNISTとModelNetのデータセットから画像と3Dポイントを分類するResNetとPointNet++で共同設計を検証し、ソフトウェアと同等の精度を達成するだけでなく、計算予算を48.1%と15.9%削減した。さらに、エネルギー消費を77.6%と93.3%削減している。

The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with past experiences stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets, which not only achieve accuracy on par with software but also reduce the computational budget by 48.1% and 15.9%. Moreover, they deliver a 77.6% and 93.3% reduction in energy consumption.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# 効率的な量子化手法によるDNN話者検証モデルの最適化

Optimization of DNN-based speaker verification model through efficient quantization technique ( http://arxiv.org/abs/2407.08991v1 )

ライセンス: Link先を確認

Yeona Hong, Woo-Jin Chung, Hong-Goo Kang,

(参考訳) ディープニューラルネットワーク(Deep Neural Networks, DNN)は、音声検証を含む様々な分野で急速に進歩しているが、一般的には高い計算コストとかなりのメモリ消費を伴い、モバイルシステムでは管理が難しい。ディープモデルの量子化は、計算コストとメモリコストの両方を削減する手段を提供する。本研究では,話者検証モデルの量子化のための最適化フレームワークを提案する。事前学習話者検証モデルの各層における性能変化とモデルサイズ削減を解析することにより、モデルサイズを大幅に削減しつつ、性能劣化を効果的に最小化した。我々の量子化アルゴリズムは、モデルサイズを大幅に圧縮しつつ、最先端の事前学習話者検証モデル ECAPA-TDNN の性能を維持する最初の試みである。全体として、我々の量子化アプローチはモデルのサイズを半分に減らし、EERの増加は0.07%に抑えられた。

As Deep Neural Networks (DNNs) rapidly advance in various fields, including speech verification, they typically involve high computational costs and substantial memory consumption, which can be challenging to manage on mobile systems. Quantization of deep models offers a means to reduce both computational and memory expenses. Our research proposes an optimization framework for the quantization of the speaker verification model. By analyzing performance changes and model size reductions in each layer of a pre-trained speaker verification model, we have effectively minimized performance degradation while significantly reducing the model size. Our quantization algorithm is the first attempt to maintain the performance of the state-of-the-art pre-trained speaker verification model, ECAPA-TDNN, while significantly compressing its model size. Overall, our quantization approach resulted in reducing the model size by half, with an increase in EER limited to 0.07%.
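
As background for the abstract above, the basic building block of weight quantization can be sketched as below. This is generic symmetric uniform quantization, not the paper's per-layer optimization framework; the function names and the 8-bit default are assumptions for illustration.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map float weights to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


layer_weights = [-1.0, -0.25, 0.0, 0.3, 1.0]
codes, scale = quantize_symmetric(layer_weights)
restored = dequantize(codes, scale)
# Largest per-weight rounding error; it is at most about scale / 2.
max_error = max(abs(w - r) for w, r in zip(layer_weights, restored))
```

Measuring something like `max_error` (or a task metric) layer by layer at different bit widths is one way to decide which layers tolerate stronger compression, in the spirit of the per-layer analysis the abstract mentions.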

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# Emotion Talk: 心理的支援のための音声メッセージによる感情的サポート

Emotion Talk: Emotional Support via Audio Messages for Psychological Assistance ( http://arxiv.org/abs/2407.08992v1 )

ライセンス: Link先を確認

Fabrycio Leite Nakano Almada, Kauan Divino Pouso Mariano, Maykon Adriell Dutra, Victor Emanuel da Silva Monteiro,

(参考訳) 本稿では,心理的支援のための音声メッセージを通じて,継続的な感情的支援を提供するシステムである「感情講演」を提案する。主な目的は、音声メッセージを分析し、感情を検出し、適切な反応を生成することによって、従来のセラピーセッション以外の患者に一貫したサポートを提供することである。このソリューションはポルトガル語話者に焦点を合わせ、システムが言語的かつ文化的に関連があることを保証する。本システムはセラピストが行う心理的フォローアッププロセスを補完し、特に迅速な対応が不可欠な緊急時において、即時かつアクセス可能な支援を提供することを目的としている。実験により,提案システムの有効性を実証し,心理的サポートの適用の可能性を明らかにする。

This paper presents "Emotion Talk," a system designed to provide continuous emotional support through audio messages for psychological assistance. The primary objective is to offer consistent support to patients outside traditional therapy sessions by analyzing audio messages to detect emotions and generate appropriate responses. The solution focuses on Portuguese-speaking users, ensuring that the system is linguistically and culturally relevant. This system aims to complement and enhance the psychological follow-up process conducted by therapists, providing immediate and accessible assistance, especially in emergency situations where rapid response is crucial. Experimental results demonstrate the effectiveness of the proposed system, highlighting its potential in applications of psychological support.

翻訳日:2024-07-16 00:46:38 公開日:2024-07-12

# タスク駆動型単一画像による文書スキャンの超解像再構成

Task-driven single-image super-resolution reconstruction of document scans ( http://arxiv.org/abs/2407.08993v1 )

ライセンス: Link先を確認

Maciej Zyrek, Michal Kawulok,

(参考訳) 超分解能再構成は、低分解能観測から高分解能の画像を生成することを目的としている。ディープラーニングに根ざした最先端の超解像技術は、目立った視覚的品質の結果を得ることができるが、それらが特定のコンピュータビジョンアプリケーションにとって貴重な情報源であるかどうかはほとんど検証されていない。本稿では,文書スキャンから光学的文字認識を改善するために,超解像を前処理ステップとして活用する可能性を検討する。そこで本研究では,単一画像の超解像のための深層ネットワークをタスク駆動方式で訓練し,テキスト検出のための適応性を高めることを提案する。特定のタスクに限られる問題は重大な欠陥があるため、画像類似性によって導かれるテキスト検出に関連するコンポーネントを取り入れたマルチタスク損失関数を導入する。本稿では,文書画像のリアルタイム超解像化に向けた重要なステップであることを示す。

Super-resolution reconstruction is aimed at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned by deep learning allow for obtaining results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that embraces components related to text detection coupled with those guided by image similarity. The obtained results reported in this paper are encouraging and they constitute an important step towards real-world super-resolution of document images.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# グローバルアテンション誘導型デュアルドメイン・ポイント・クラウド特徴学習による分類とセグメンテーション

Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation ( http://arxiv.org/abs/2407.08994v1 )

ライセンス: Link先を確認

Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul,

(参考訳) 従来の研究では、ポイントクラウド分析タスクにおけるポイントベースニューラルモデルの有効性が実証されている。しかし, 原点座標に対する効率的な入力埋め込みを実現する上で, 依然として重要な課題が残っている。さらに、もう1つの問題は、ネットワークの幹において重要な要素である隣り合う集約の効率の制限にある。本稿では,上記の課題に対処するため,グローバルアテンション誘導型デュアルドメイン特徴学習ネットワーク(GAD)を提案する。我々はまず,グローバルアテンション機構を改良したCPTモジュールを考案し,その後のアグリゲーションのガイダンスとして機能するグローバル・アウェア・インプット・埋め込みを開発した。次に、Dual-domain K-nearest neighbor Feature Fusion (DKFF)をカスケードして、局所幾何学的関係と長距離意味的接続の両方を高く評価する新しい二重ドメイン特徴学習を通して効果的な特徴集約を行う。マルチポイントクラウド解析タスク(例えば分類,部分セグメンテーション,シーンセグメンテーション)における広範囲な実験により,提案手法の優れた性能と,考案したモジュールの有効性が示された。

Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Attention-guided Dual-domain Feature Learning network (GAD) to address the above-mentioned issues. We first devise the Contextual Position-enhanced Transformer (CPT) module, which is armed with an improved global attention mechanism, to produce a global-aware input embedding that serves as the guidance to subsequent aggregations. Then, the Dual-domain K-nearest neighbor Feature Fusion (DKFF) is cascaded to conduct effective feature aggregation through novel dual-domain feature learning which appreciates both local geometric relations and long-distance semantic connections. Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# セルフプロンプトチューニング: LLMでの自律的なロールプレイを可能にする

Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs ( http://arxiv.org/abs/2407.08995v1 )

ライセンス: Link先を確認

Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Jiaming Zhou, Haoqin Sun,

(参考訳) LLMの最近の進歩は、異なる指示や文脈に基づいて、様々な役割の対話スタイルと認知過程を正確にシミュレートできる、目覚ましいロールプレイング能力を示してきた。研究は、LLMに専門家の役割を割り当てること、すなわちロールプレイプロンプトとして知られる戦略が、対応する領域における性能を高めることを示唆している。しかし、プロンプトは与えられた問題に対して手動で設計する必要があり、一定の専門知識と反復的な修正を要する。この目的のために,微調整によってLLM自体がロールプレイプロンプトを生成するようにするセルフ・プロンプト・チューニングを提案する。 LIMAデータセットを基本コーパスとして活用し、GPT-4を用いて各データポイントにロールプレイプロンプトをアノテートすることで、LIMA-Roleデータセットを作成する。次に、Llama-2-7BやMistral-7BのようなLLMをLIMA-Role上で微調整する。その結果、セルフ・プロンプト・チューニングされたLLMは、任意の質問に対して専門家のロールプロンプトを自動的に生成することができる。我々は、広く使われているNLPベンチマークとオープンエンド質問テストに基づいて、セルフ・プロンプト・チューニングされたLLMを広範囲に評価した。実験結果から,セルフ・プロンプト・チューニングされたLLMは,ほとんどのデータセットにおいて,標準的な命令チューニングのベースラインよりも優れていることが示された。このことは、微調整を利用してLLMに自己プロンプトを行わせ、複雑なプロンプト戦略を自動化することの大きな可能性を強調している。データセット、モデル、コードは https://anonymous.4open.science/r/Self-Prompt-Tuning-739E/ でリリースする。

Recent advancements in LLMs have showcased their remarkable role-playing capabilities, able to accurately simulate the dialogue styles and cognitive processes of various roles based on different instructions and contexts. Studies indicate that assigning LLMs the roles of experts, a strategy known as role-play prompting, can enhance their performance in the corresponding domains. However, the prompt needs to be manually designed for the given problem, requiring certain expertise and iterative modifications. To this end, we propose self-prompt tuning, making LLMs themselves generate role-play prompts through fine-tuning. Leveraging the LIMA dataset as our foundational corpus, we employ GPT-4 to annotate role-play prompts for each data point, resulting in the creation of the LIMA-Role dataset. We then fine-tune LLMs like Llama-2-7B and Mistral-7B on LIMA-Role. Consequently, the self-prompt tuned LLMs can automatically generate expert role prompts for any given question. We extensively evaluate self-prompt tuned LLMs on widely used NLP benchmarks and an open-ended question test. Our empirical results illustrate that self-prompt tuned LLMs outperform standard instruction tuned baselines across most datasets. This highlights the great potential of utilizing fine-tuning to enable LLMs to self-prompt, thereby automating complex prompting strategies. We release the dataset, models, and code at https://anonymous.4open.science/r/Self-Prompt-Tuning-739E/.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# ソース項を持つ場の量子論の音響アナログ

Acoustic Analogue for Quantum Field Theory with a Source term ( http://arxiv.org/abs/2407.08999v1 )

ライセンス: Link先を確認

Akshat Pandey,

(参考訳) 古典的ソースに結合したスカラー場に対する非相対論的流体アナログモデルを提案する。一般的なアナログ重力モデルは、音響計量に結合されたフォノン場を含む。我々は音響アナログの特殊相対論的極限で考察する。流体系に時間依存の外部ポテンシャルを仮定することにより、スカラー場のソース項をモデル化することができる。量子化の後、ソースによるフォノン生成を調べる。

We propose a non-relativistic fluid analogue model for a scalar field coupled to a classical source. The generic analogue gravity model involves the phonon field which is coupled to the acoustic metric. We work in the special relativity limit of the acoustic analogue. By assuming a time dependent external potential on the fluid system, we are able to model a source term for the scalar field. Upon quantisation, phonon creation due to the source is studied.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# 大規模言語モデルによるFew-Shot株価トレンド予測の強化

Enhancing Few-Shot Stock Trend Prediction with Large Language Models ( http://arxiv.org/abs/2407.09003v1 )

ライセンス: Link先を確認

Yiqi Deng, Xingwei He, Jiahao Hu, Siu-Ming Yiu,

(参考訳) 株価トレンド予測の目標は、情報に基づく投資判断のために将来の市場の動きを予測することである。既存の手法は主に、広範囲な注釈付きデータに基づいてトレーニングされた教師付きモデルによる株価トレンドの予測に重点を置いている。しかし、人間のアノテーションはリソース集約的であり、注釈付きデータは簡単には利用できない。 LLM(Large Language Models)の印象的なfew-shot能力に触発されて,ラベル付きデータの不足を克服し,投資家にとって予測をより実現可能にするために,LLMをfew-shot設定で使用することを提案する。従来の研究は通常、複数の金融ニュースを統合して株価トレンドを予測しており、LLMを使用する際に(1)統合ニュースにノイズが含まれる、(2)LLMの入力限界を超える可能性があり性能が低下する、という2つの重大な問題を引き起こす。これらの問題を克服するため、我々は2段階の「denoising-then-voting」手法を提案する。具体的には、「Irrelevant」(無関係)カテゴリーを導入し、統合ニュースの代わりに個別ニュースごとに株価トレンドを予測する。次に、多数決を用いてこれらの予測を集約する。提案手法には2つの利点がある。 (1)ノイズのあるニュースを無関係に分類することで、最終予測への影響を除去する。 (2)個別ニュースごとに予測することで、LLMの入力長制限を緩和する。本手法は,S&P 500で66.59%,CSI-100で62.17%,HK株で61.17%の精度を達成し,標準的なfew-shot手法を約7%,4%,4%上回った。さらに,提案手法は最先端の教師付き手法と同等の性能を示す。

The goal of stock trend prediction is to forecast future market movements for informed investment decisions. Existing methods mostly focus on predicting stock trends with supervised models trained on extensive annotated data. However, human annotation can be resource-intensive and the annotated data are not readily available. Inspired by the impressive few-shot capability of Large Language Models (LLMs), we propose using LLMs in a few-shot setting to overcome the scarcity of labeled data and make prediction more feasible for investors. Previous works typically merge multiple financial news for predicting stock trends, causing two significant problems when using LLMs: (1) Merged news contains noise, and (2) it may exceed LLMs' input limits, leading to performance degradation. To overcome these issues, we propose a two-step method, 'denoising-then-voting'. Specifically, we introduce an 'Irrelevant' category, and predict stock trends for individual news instead of merged news. Then we aggregate these predictions using majority voting. The proposed method offers two advantages: (1) Classifying noisy news as irrelevant removes its impact on the final prediction. (2) Predicting for individual news mitigates LLMs' input length limits. Our method achieves 66.59% accuracy in S&P 500, 62.17% in CSI-100, and 61.17% in HK stock prediction, outperforming the standard few-shot counterparts by around 7%, 4%, and 4%. Furthermore, our proposed method performs on par with state-of-the-art supervised methods.
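
The aggregation step of the two-step method can be sketched as follows. Only the voting logic is shown; the per-news labels would come from few-shot LLM calls, which are outside this sketch. The label strings "Up" and "Down", the `default` fallback, and the tie-breaking rule are assumptions made here, not details given in the abstract.

```python
from collections import Counter


def denoise_then_vote(per_news_labels, irrelevant="Irrelevant", default=None):
    """Drop news labelled irrelevant, then return the majority label.

    Returns `default` when every item was filtered out. Ties are broken in
    favour of the label that appears first among the kept items.
    """
    kept = [label for label in per_news_labels if label != irrelevant]
    if not kept:
        return default
    return Counter(kept).most_common(1)[0][0]


# Hypothetical per-article predictions for one stock on one day.
daily_labels = ["Up", "Irrelevant", "Down", "Up", "Irrelevant"]
daily_trend = denoise_then_vote(daily_labels)
```

Because each article is classified on its own, no single prompt has to hold all the news for a day, and articles marked 'Irrelevant' never reach the vote.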

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# プライバシ保護型協調ゲノム研究 : 実生活の展開とビジョン

Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision ( http://arxiv.org/abs/2407.09004v1 )

ライセンス: Link先を確認

Zahra Rahmani, Nahal Shahini, Nadav Gat, Zebin Yun, Yuzhou Jiang, Ofir Farchy, Yaniv Harel, Vipin Chaudhary, Mahmood Sharif, Erman Ayday,

(参考訳) データ革命は、医療セクターにとって大きな可能性を秘めている。個人から収集された膨大なデータは、知識、AIモデル、予測システム、ベストプラクティスに変換される。健康分野の1つにゲノム領域がある。 AI、機械学習、データサイエンスの進歩は、ゲノム研究の新しい機会を開き、パーソナライズドメディカルのブレークスルーを約束している。しかし、プライバシーとサイバーセキュリティに対する意識の高まりは、協調研究において機密データを保護するための堅牢なソリューションを必要としている。本稿では、健康データコラボレーションのためのプラットフォームであるLynx.MDと共同で開発された、ゲノム研究のためのプライバシ保護フレームワークの実践的展開について述べる。このフレームワークは、重要なサイバーセキュリティとプライバシの課題に対処し、データ漏洩に伴うリスクを軽減しつつ、プライバシ保護によるゲノムデータの共有と分析を可能にする。高度なプライバシ保護アルゴリズムを統合することで、このソリューションは、データユーティリティを損なうことなく、個々のプライバシを保護する。このシステムのユニークな特徴は、データ共有とプライバシのトレードオフのバランスをとる能力であり、ステークホルダーがプライバシのリスクを定量化し、情報的な決定を行うためのツールを提供する。 Lynx.MD内でのフレームワークの実装には、ゲノムデータをバイナリ形式に符号化し、制御された摂動技術を通じてノイズを適用することが含まれる。このアプローチはデータの本質的な統計特性を保ち、効果的な研究と分析を容易にする。さらに、リアルタイムデータ監視と高度な可視化ツールが組み込まれ、ユーザエクスペリエンスと意思決定が向上する。この論文は、ゲノムデータに特有のプライバシー攻撃と防衛の必要性を強調している。これらの課題に対処することで、ゲノム研究の協力が促進され、パーソナライズされた医療と公衆衛生が推進される。

The data revolution holds significant promise for the health sector. Vast amounts of data collected from individuals will be transformed into knowledge, AI models, predictive systems, and best practices. One area of health that stands to benefit greatly is the genomic domain. Progress in AI, machine learning, and data science has opened new opportunities for genomic research, promising breakthroughs in personalized medicine. However, increasing awareness of privacy and cybersecurity necessitates robust solutions to protect sensitive data in collaborative research. This paper presents a practical deployment of a privacy-preserving framework for genomic research, developed in collaboration with Lynx.MD, a platform for secure health data collaboration. The framework addresses critical cybersecurity and privacy challenges, enabling the privacy-preserving sharing and analysis of genomic data while mitigating risks associated with data breaches. By integrating advanced privacy-preserving algorithms, the solution ensures the protection of individual privacy without compromising data utility. A unique feature of the system is its ability to balance trade-offs between data sharing and privacy, providing stakeholders tools to quantify privacy risks and make informed decisions. Implementing the framework within Lynx.MD involves encoding genomic data into binary formats and applying noise through controlled perturbation techniques. This approach preserves essential statistical properties of the data, facilitating effective research and analysis. Moreover, the system incorporates real-time data monitoring and advanced visualization tools, enhancing user experience and decision-making. The paper highlights the need for tailored privacy attacks and defenses specific to genomic data. Addressing these challenges fosters collaboration in genomic research, advancing personalized medicine and public health.
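
Encoding genomic data in binary form and perturbing it in a controlled way can be illustrated with a small sketch. The abstract does not give the actual encoding or noise mechanism used in the deployment, so everything below is an assumption: genotypes are taken to be minor-allele counts (0, 1, 2) encoded as two bits, and the noise is independent bit flipping with probability `p` (a randomized-response style mechanism), for which aggregate bit frequencies remain recoverable.

```python
import random


def encode_genotype(minor_allele_count):
    """Encode a genotype 0/1/2 as two bits: 0 -> [0, 0], 1 -> [1, 0], 2 -> [1, 1]."""
    return [1 if minor_allele_count >= 1 else 0, 1 if minor_allele_count >= 2 else 0]


def flip_bits(bits, p, rng):
    """Flip each bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]


def debias_frequency(observed_frequency, p):
    """Estimate the true frequency of 1-bits from the noisy frequency (needs p != 0.5)."""
    return (observed_frequency - p) / (1 - 2 * p)


rng = random.Random(0)
genotypes = [0, 1, 2, 1, 0, 2, 2, 1]
noisy = [flip_bits(encode_genotype(g), p=0.1, rng=rng) for g in genotypes]
observed = sum(bits[0] for bits in noisy) / len(noisy)
estimated_carrier_frequency = debias_frequency(observed, p=0.1)
```

The flip probability `p` is the knob for the trade-off the abstract describes: a larger `p` hides individual genotypes better but makes the debiased population statistics noisier.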

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# VaDAの導入:新しいデータセットを用いた海上物体分割のための新しい画像分割モデル

Introducing VaDA: Novel Image Segmentation Model for Maritime Object Segmentation Using New Dataset ( http://arxiv.org/abs/2407.09005v1 )

ライセンス: Link先を確認

Yongjin Kim, Jinbum Park, Sanha Kang, Hanguen Kim,

(参考訳) 海上輸送産業は、コンピュータビジョン人工知能(AI)の進歩によって急速に進化している。その結果、海上輸送のためのAIベースの物体認識モデルの研究は着実に増加しており、センサー技術とコンピュータ性能の進歩を活用している。しかし、海洋環境における物体認識は、光の反射、干渉、激しい照明、様々な気象条件といった課題に直面している。これらの課題に対処するためには、海洋画像に適した高性能ディープラーニングアルゴリズムと海洋シーンに特化した高品質データセットが不可欠である。既存のAI認識モデルとデータセットは、自律ナビゲーションシステムを構成するのに限定的に適している。そこで本稿では,海洋オブジェクトセグメンテーションのためのVaDAモデルと新たなモデル評価手法であるIFCP(Integrated Figure of Compute Performance)を提案する。さらに、様々な海洋環境におけるモデルパフォーマンス評価を標準化するために、ベンチマーク海事データセットOASI(Ocean AI Segmentation Initiatives)を導入する。 OASIsデータセットと詳細は、私たちのWebサイトにある。

The maritime shipping industry is undergoing rapid evolution driven by advancements in computer vision artificial intelligence (AI). Consequently, research on AI-based object recognition models for maritime transportation is steadily growing, leveraging advancements in sensor technology and computing performance. However, object recognition in maritime environments faces challenges such as light reflection, interference, intense lighting, and various weather conditions. To address these challenges, high-performance deep learning algorithms tailored to maritime imagery and high-quality datasets specialized for maritime scenes are essential. Existing AI recognition models and datasets have limited suitability for composing autonomous navigation systems. Therefore, in this paper, we propose a Vertical and Detail Attention (VaDA) model for maritime object segmentation and a new model evaluation method, the Integrated Figure of Calculation Performance (IFCP), to verify its suitability for the system in real-time. Additionally, we introduce a benchmark maritime dataset, OASIs (Ocean AI Segmentation Initiatives) to standardize model performance evaluation across diverse maritime environments. OASIs dataset and details are available at our website: https://www.navlue.com/dataset

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# ベンチマーク言語モデルの創造性:コード生成のケーススタディ

Benchmarking Language Model Creativity: A Case Study on Code Generation ( http://arxiv.org/abs/2407.09007v1 )

ライセンス: Link先を確認

Yining Lu, Dixuan Wang, Tianjian Li, Dongwei Jiang, Daniel Khashabi,

(参考訳) LLMが普及するにつれて、これらのモデルがいかに「創造的」でありうるかを考えることは興味深い。認知科学によれば、創造性は少なくとも2つの重要な特徴から構成される: 収束的(convergent)思考(与えられた目標を達成するための目的性)と発散的(divergent)思考(新しい環境や制約への適応性)である(Runco, 2003)。本稿では,この2つの特徴を取り入れたLLMの創造性を定量化する枠組みを提案する。これは,(1) 従来の解に新たな制約を段階的に課すことでLLMに新たな戦略の採用を迫り, 与えられた問題に対してより創造的な解を導き出すよう促すDenial Prompting,および (2) LLMが生成した創造的応答における収束的思考と発散的思考の両方を検査するNeoGaugeメトリクスの定義と計算によって達成される。我々は,人間のコーディングソリューションを収集する自然なデータソースであるCodeforces問題に対して,提案したフレームワークを適用した。さまざまなプロプライエタリモデルとオープンソースモデルに対してNeoGaugeを定量化し、最も創造的なモデルであるGPT-4でさえ、人間のような創造性を示すには至っていないことを発見した。また、高度な推論戦略(MCTS、自己修正など)も試したが、創造性に大きな改善は見られなかった。分析の副産物として、将来のモデルで結果を再現するためのNeoCoderデータセットをリリースする。

As LLMs become increasingly prevalent, it is interesting to consider how "creative" these models can be. In cognitive science, creativity consists of at least two key characteristics: convergent thinking (purposefulness to achieve a given goal) and divergent thinking (adaptability to new environments or constraints) (Runco, 2003). In this work, we introduce a framework for quantifying LLM creativity that incorporates the two characteristics. This is achieved by (1) Denial Prompting, which pushes LLMs to come up with more creative solutions to a given problem by incrementally imposing new constraints on the previous solution, compelling LLMs to adopt new strategies, and (2) defining and computing the NeoGauge metric, which examines both convergent and divergent thinking in the creative responses generated by LLMs. We apply the proposed framework on Codeforces problems, a natural data source for collecting human coding solutions. We quantify NeoGauge for various proprietary and open-source models and find that even the most creative model, GPT-4, still falls short of demonstrating human-like creativity. We also experiment with advanced reasoning strategies (MCTS, self-correction, etc.) and observe no significant improvement in creativity. As a by-product of our analysis, we release the NeoCoder dataset for reproducing our results on future models.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# 一石四鳥: 教師付きコントラスト学習を用いたQAシステムの包括的ソリューション

One Stone, Four Birds: A Comprehensive Solution for QA System Using Supervised Contrastive Learning ( http://arxiv.org/abs/2407.09011v1 )

ライセンス: Link先を確認

Bo Wang, Tsunenori Mine,

(参考訳) 本稿では,教師付きコントラスト学習(SCL)によって質問応答(QA)システムの堅牢性と効率性の両方を高める,新しい包括的なソリューションを提案する。事前訓練された言語モデルにより、少量のデータと単純な微調整だけで高性能なQAシステムをトレーニングすることが容易になっている。しかし、近年の進歩にもかかわらず、既存のQAシステムは機能と訓練効率に依然として重大な欠陥がある。ユーザ入力意図分類、ドメイン外入力検出、新しい意図発見、継続学習の4つの重要なタスクを定義することで、機能の問題に対処する。次に,SCLに基づく統一的な表現学習手法を活用し,クラス内はコンパクトでクラス間は分散した特徴空間を効率的に構築し,既知の意図の分類と未知の意図の検出・発見の両方を容易にする。その結果、下流タスクに最小限の追加チューニングを施すだけで、モデル効率を大幅に改善し、全てのタスクで新しい最先端性能を実現する。

This paper presents a novel and comprehensive solution to enhance both the robustness and efficiency of question answering (QA) systems through supervised contrastive learning (SCL). Training a high-performance QA system has become straightforward with pre-trained language models, requiring only a small amount of data and simple fine-tuning. However, despite recent advances, existing QA systems still exhibit significant deficiencies in functionality and training efficiency. We address the functionality issue by defining four key tasks: user input intent classification, out-of-domain input detection, new intent discovery, and continual learning. We then leverage a unified SCL-based representation learning method to efficiently build an intra-class compact and inter-class scattered feature space, facilitating both known intent classification and unknown intent detection and discovery. Consequently, with minimal additional tuning on downstream tasks, our approach significantly improves model efficiency and achieves new state-of-the-art performance across all tasks.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# TCAN:拡散モデルを用いた時間的視点誘導による人物画像のアニメーション

TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models ( http://arxiv.org/abs/2407.09012v1 )

ライセンス: Link先を確認

Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo,

(参考訳) ポーズ駆動型人像アニメーション拡散モデルは、リアルな人間のビデオ合成において顕著な能力を示した。従来のアプローチによる有望な結果にもかかわらず、時間的に一貫したアニメーションの実現と、市販のポーズ検出器による堅牢性の確保には課題が続いている。本稿では,ポーズ駆動型人間画像アニメーション法であるTCANについて述べる。従来の手法とは対照的に,事前学習したControlNetを微調整なしで利用し,多数のポーズ・イメージ・ペアから取得した膨大な知識を活用する。 ControlNetを凍結に保つために、LoRAをUNet層に適応させ、ポーズと外観の特徴の間に潜伏した空間を調整できるようにします。さらに、ControlNetに追加の時間層を導入することで、ポーズ検出器の外れ値に対する堅牢性を高める。時間軸上のアテンションマップの解析を通じて、ポーズ情報を利用した新しい温度マップを設計し、より静的な背景を実現する。大規模な実験により,チビのような様々なポーズを含む映像合成タスクにおいて,提案手法が期待できる結果が得られることが示された。プロジェクトページ: https://eccv2024tcan.github.io/

Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption pairs. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer to the ControlNet, we enhance robustness against outliers of the pose detector. Through the analysis of attention maps over the temporal axis, we also designed a novel temperature map leveraging pose information, allowing for a more static background. Extensive experiments demonstrate that the proposed method can achieve promising results in video synthesis tasks encompassing various poses, like chibi. Project Page: https://eccv2024tcan.github.io/

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# 生成人工知能による手続き的コンテンツ生成

Procedural Content Generation via Generative Artificial Intelligence ( http://arxiv.org/abs/2407.09013v1 )

ライセンス: Link先を確認

Xinyu Mao, Wanli Yu, Kazunori D Yamada, Michael R. Zielewski,

(参考訳) PCGで機械学習を活用する試みは過去にも行われてきた。そこで本研究では,2010年代中盤に注目が集まってきた生成人工知能(AI)がPCGにどのように利用されているかを検討する。我々は、地形、アイテム、さらにはストーリーラインを含む様々なタイプのコンテンツを作成するための生成AIの応用についてレビューする。生成AIはPCGに有効だが、それが直面する重要な問題は、高性能生成AIの構築には膨大なトレーニングデータが必要であることだ。コンテンツは一般的に高度にカスタマイズされているため、ドメイン固有のトレーニングデータは少なく、生成AIモデルへの直接的なアプローチはうまく機能しないかもしれない。 PCG研究をさらに進めるためには、限られたトレーニングデータに関連する問題を克服する必要がある。このように、限られたトレーニングデータによってもたらされる課題に対処する研究についても、特別に検討する。

Attempts to utilize machine learning in PCG have been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is effective for PCG, one significant issue it faces is that building high-performance generative AI requires vast amounts of training data. Because content is generally highly customized, domain-specific training data is scarce, and straightforward approaches to generative AI models may not work well. For PCG research to advance further, issues related to limited training data must be overcome. Thus, we also give special consideration to research that addresses the challenges posed by limited training data.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# CompAct: 質問応答のために検索した文書をアクティブに圧縮する

CompAct: Compressing Retrieved Documents Actively for Question Answering ( http://arxiv.org/abs/2407.09014v1 )

ライセンス: Link先を確認

Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang,

(参考訳) Retrieval-augmented Generationは、言語モデルをサポートし、外部コンテキストを提供することで、実際の基盤を強化する。しかし、言語モデルは、広範囲な情報を与えるとしばしば課題に直面し、問題の解決においての有効性を低下させる。コンテキスト圧縮は、無関係な情報をフィルタリングすることでこの問題に対処するが、現在の手法は、単一ステップのアプローチで重要な情報をキャプチャできない現実的なシナリオで依然として苦労している。この制限を克服するために、キー情報を失うことなく広範囲の文書を凝縮するアクティブな戦略を取り入れた新しいフレームワークCompActを紹介する。本実験は,マルチホップ質問応答(QA)ベンチマークにおいて,CompActが性能と圧縮速度の両方に大幅な改善をもたらすことを示した。 CompActは、様々なオフザシェルフレトリバーやリーダーを備えたコスト効率のよいプラグインモジュールとして柔軟に動作し、非常に高い圧縮率(47倍)を達成する。

Retrieval-augmented generation supports language models to strengthen their factual groundings by providing external contexts. However, language models often face challenges when given extensive information, diminishing their effectiveness in solving questions. Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios where crucial information cannot be captured with a single-step approach. To overcome this limitation, we introduce CompAct, a novel framework that employs an active strategy to condense extensive documents without losing key information. Our experiments demonstrate that CompAct brings significant improvements in both performance and compression rate on multi-hop question-answering (QA) benchmarks. CompAct flexibly operates as a cost-efficient plug-in module with various off-the-shelf retrievers or readers, achieving exceptionally high compression rates (47x).

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# ブールネットワークを用いた論理プログラムの静的解析

Static Analysis of Logic Programs via Boolean Networks ( http://arxiv.org/abs/2407.09015v1 )

ライセンス: Link先を確認

Van-Giang Trinh, Belaid Benhamou,

(参考訳) 解答集合プログラミング(Answer Set Programming, ASP)は、仮定された問題の解に対応する安定モデルを持つ論理プログラムとして組合せ問題の符号化に使用できる宣言的問題解決パラダイムである。 ASPはAIなどのさまざまな領域に広く適用されています。「静的情報から論理プログラムの安定モデルについて何が言えるのか?」という疑問が研究され、多くの状況で有用であることが証明されている。本研究では,論理プログラムとBooleanネットワークを接続させることにより,この方向をさらに深く掘り下げる。提案されたコネクションは、Booleanネットワークの静的解析に関する豊富な歴史に既存の結果をもたらし、ASPの静的解析をさらに研究するための統一的で強力なツールとなる。特に、新しく得られた洞察は、ASPの分野で多くの問題を解決する可能性がある。

Answer Set Programming (ASP) is a declarative problem solving paradigm that can be used to encode a combinatorial problem as a logic program whose stable models correspond to the solutions of the considered problem. ASP has been widely applied to various domains in AI and beyond. The question "What can be said about stable models of a logic program from its static information?" has been investigated and proved useful in many circumstances. In this work, we dive into this direction more deeply by making the connection between a logic program and a Boolean network, which is a prominent modeling framework with applications to various areas. The proposed connection can bring the existing results in the rich history on static analysis of Boolean networks to explore and prove more theoretical results on ASP, making it become a unified and powerful tool to further study the static analysis of ASP. In particular, the newly obtained insights have the potential to benefit many problems in the field of ASP.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# Microsoft Copilotによるセキュリティ運用センターのためのAI駆動ガイド応答

AI-Driven Guided Response for Security Operation Centers with Microsoft Copilot for Security ( http://arxiv.org/abs/2407.09017v1 )

ライセンス: Link先を確認

Scott Freitas, Jovan Kalajdjieski, Amir Gharib, Rob McCann,

(参考訳) セキュリティオペレーションセンターは、単純なものから非常に複雑なものまで、セキュリティインシデントの絶え間ないストリームに対処している。この問題を解決するために、業界規模のMLアーキテクチャであるCopilot Guided Response(CGR)を開発した。これは、(1)調査:類似のインシデントを特定することで必須の歴史的コンテキストを提供する、(2)トリアージ:インシデントが真陽性、偽陽性、良性陽性のいずれであるかを確認する、(3)修復:状況に合わせた封じ込めアクションを推奨する、という3つの重要なタスクにわたって、セキュリティアナリストをガイドするものだ。 CGRはMicrosoft Defender XDR製品に統合され、世界中でデプロイされ、何千もの顧客に対して数百万のレコメンデーションを生成する。内部評価、セキュリティ専門家とのコラボレーション、顧客からのフィードバックを取り入れた大規模な評価は、CGRが3つのタスクすべてにわたって高品質なレコメンデーションを提供することを示すものです。我々は、CGRアーキテクチャの概要を包括的に紹介し、このような詳細でこれらの機能をオープンに議論した最初のサイバーセキュリティ企業として、先例を定めている。さらに、現実のセキュリティインシデントに関する最大の公開コレクションであるGUIDEは、100万件の注釈付きインシデントにわたる1300万件のエビデンスを含んでいます。研究者や実践者が現実世界のデータの研究を行うことで、GUIDEはサイバーセキュリティの状態を前進させ、次世代の機械学習システムの開発をサポートする。

Security operation centers contend with a constant stream of security incidents, ranging from straightforward to highly complex. To address this, we developed Copilot Guided Response (CGR), an industry-scale ML architecture that guides security analysts across three key tasks -- (1) investigation, providing essential historical context by identifying similar incidents; (2) triaging to ascertain the nature of the incident -- whether it is a true positive, false positive, or benign positive; and (3) remediation, recommending tailored containment actions. CGR is integrated into the Microsoft Defender XDR product and deployed worldwide, generating millions of recommendations across thousands of customers. Our extensive evaluation, incorporating internal evaluation, collaboration with security experts, and customer feedback, demonstrates that CGR delivers high-quality recommendations across all three tasks. We provide a comprehensive overview of the CGR architecture, setting a precedent as the first cybersecurity company to openly discuss these capabilities in such depth. Additionally, we introduce GUIDE, the largest public collection of real-world security incidents, spanning 13M evidences across 1M annotated incidents. By enabling researchers and practitioners to conduct research on real-world data, GUIDE advances the state of cybersecurity and supports the development of next-generation machine learning systems.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# AUITestAgent: 自動要件指向GUI機能テスト

AUITestAgent: Automatic Requirements Oriented GUI Function Testing ( http://arxiv.org/abs/2407.09018v1 )

ライセンス: Link先を確認

Yongxiang Hu, Xuan Wang, Yingchuan Wang, Yu Zhang, Shiyu Guo, Chaoyi Chen, Xin Wang, Yangfan Zhou,

(参考訳) Graphical User Interface (GUI)は、ユーザがモバイルアプリと対話する方法である。適切に機能するためには、テストエンジニアは、通常自然言語で書かれたテスト要件に基づいて、意図した通りに機能するようにする必要がある。広く採用されている手動テストとスクリプトベースの手法は有効であるが、モダンなモバイルアプリではGUIページの多さと迅速なイテレーションのため、かなりの努力を要する。本稿では,モバイル向け初の自動自然言語駆動型GUIテストツールであるAUITestAgentについて紹介する。テスト要件は通常、インタラクションコマンドと検証オラクルを含む。 AUITestAgentは動的に整理されたエージェントを介してテスト要件からGUIインタラクションを抽出できる。次に、AUITestAgentは多次元データ抽出戦略を使用して、インタラクショントレースからテスト要件に関連するデータを検索し、検証を行う。カスタマイズされたベンチマークの実験では、AUITestAgentは生成されたGUIインタラクションの品質で既存のツールよりも優れており、検証の精度は94%に達した。さらに、Meituanのフィールドデプロイメントでは、AUITestAgentの実用的ユーザビリティが示されており、2ヶ月で10回の回帰テスト中に4つの新しい機能バグを検出する。

The Graphical User Interface (GUI) is how users interact with mobile apps. To ensure it functions properly, testing engineers have to make sure it functions as intended, based on test requirements that are typically written in natural language. While widely adopted manual testing and script-based methods are effective, they demand substantial effort due to the vast number of GUI pages and rapid iterations in modern mobile apps. This paper introduces AUITestAgent, the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. Since test requirements typically contain interaction commands and verification oracles, AUITestAgent can extract GUI interactions from test requirements via dynamically organized agents. Then, AUITestAgent employs a multi-dimensional data extraction strategy to retrieve data relevant to the test requirements from the interaction trace and perform verification. Experiments on customized benchmarks demonstrate that AUITestAgent outperforms existing tools in the quality of generated GUI interactions and achieves a verification accuracy of 94%. Moreover, field deployment in Meituan has shown AUITestAgent's practical usability, with it detecting 4 new functional bugs during 10 regression tests in two months.

翻訳日:2024-07-16 00:36:46 公開日:2024-07-12

# ソーシャルメディア上での解釈型抑うつ検出のためのプロンプト学習を用いた不均質なサブグラフネットワーク

Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media ( http://arxiv.org/abs/2407.09019v1 )

ライセンス: Link先を確認

Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin,

(参考訳) ソーシャルメディアの膨大なデータは、人々の真正な思考、感情、コミュニケーションなどを反映し、うつ病などの精神疾患の早期発見のために分析することができる。ソーシャルメディアにおける初期うつ病検出に関する既存の研究は、解釈可能性に欠け、ソーシャルメディアデータの異質性を無視した。さらに、ユーザ間のグローバルなインタラクションも見落としていた。これらの課題に対処するために,不均質なサブグラフネットワークとPrompt Learning(HSNPL)とコントラスト学習機構を活用した新しい手法を開発した。具体的には、ユーザの暗黙的な心理的シンボルを解釈しやすくマッピングするために、迅速な学習が使用され、深い意味と多様な行動特徴が異種情報ネットワークに組み込まれている。そして、二重注意機構を有する異種グラフネットワークを構築し、特徴レベルにおける異種社会情報間の関係をモデル化する。さらに、ユーザレベルでのユーザとグループ間の複雑な相互作用を探索するために、サブグラフ注意と自己教師付きコントラスト学習を統合した異種サブグラフネットワークを開発した。その結果,提案手法はソーシャルメディア上での抑うつ検出の最先端手法よりも優れていた。

Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users. To address these issues, we develop a novel method that leverages a Heterogeneous Subgraph Network with Prompt Learning (HSNPL) and contrastive learning mechanisms. Specifically, prompt learning is employed to map users' implicit psychological symbols with excellent interpretability while deep semantic and diverse behavioral features are incorporated by a heterogeneous information network. Then, the heterogeneous graph network with a dual attention mechanism is constructed to model the relationships among heterogeneous social information at the feature level. Furthermore, the heterogeneous subgraph network integrating subgraph attention and self-supervised contrastive learning is developed to explore complicated interactions among users and groups at the user level. Extensive experimental results demonstrate that our proposed method significantly outperforms state-of-the-art methods for depression detection on social media.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 3M-Health:メンタルヘルス検出のためのマルチモーダル・マルチティーチャー知識蒸留

3M-Health: Multimodal Multi-Teacher Knowledge Distillation for Mental Health Detection ( http://arxiv.org/abs/2407.09020v1 )

ライセンス: Link先を確認

Rina Carines Cabral, Siwen Luo, Soyeon Caren Han, Josiah Poon,

(参考訳) メンタルヘルスの分類の重要性は現代社会において最重要であり、デジタルプラットフォームは個人の健康をモニタリングするための重要な情報源となっている。しかし、既存のソーシャルメディアのメンタルヘルスデータセットは、主にテキストのみのサンプルで構成されており、そのようなデータに基づいてトレーニングされたモデルの有効性を制限する可能性がある。人間は複雑な状況や問題を理解するためにクロスモーダルな情報を活用することを認識して、現在の方法論の限界に対処するための新しいアプローチを提案する。本研究では, メンタルヘルス分類のためのマルチモーダル・マルチ教師知識蒸留モデルを提案する。多様な特徴を統合するための単純な結合にしばしば依存する従来のアプローチとは異なり、我々のモデルは様々な性質(例えばテキストや音)の入力を適切に表現するという課題に対処する。すべての機能をひとつのモデルに統合する際の計算複雑性を軽減するために,マルチモーダル・マルチ教師アーキテクチャを採用する。複数の教師にまたがって学習過程を分散し、それぞれが特定の特徴抽出の側面に特化することにより、メンタルヘルスの全体的分類性能を向上させる。実験により,性能向上のためのモデルの有効性を実証した。関連するすべてのコードは、出版時に利用可能になる。

The significance of mental health classification is paramount in contemporary society, where digital platforms serve as crucial sources for monitoring individuals' well-being. However, existing social media mental health datasets primarily consist of text-only samples, potentially limiting the efficacy of models trained on such data. Recognising that humans utilise cross-modal information to comprehend complex situations or issues, we present a novel approach to address the limitations of current methodologies. In this work, we introduce a Multimodal and Multi-Teacher Knowledge Distillation model for Mental Health Classification, leveraging insights from cross-modal human understanding. Unlike conventional approaches that often rely on simple concatenation to integrate diverse features, our model addresses the challenge of appropriately representing inputs of varying natures (e.g., texts and sounds). To mitigate the computational complexity associated with integrating all features into a single model, we employ a multimodal and multi-teacher architecture. By distributing the learning process across multiple teachers, each specialising in a particular feature extraction aspect, we enhance the overall mental health classification performance. Through experimental validation, we demonstrate the efficacy of our model in achieving improved performance. All relevant codes will be made available upon publication.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 効率的な連続制御のためのQ関数付き拡散挙動の調整

Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control ( http://arxiv.org/abs/2407.09024v1 )

ライセンス: Link先を確認

Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu,

(参考訳) 言語モデルアライメントの最近の進歩に基づき、オフライン強化学習を2段階最適化問題として定式化します。まず、報酬のない行動データセットに対して表現豊かな生成ポリシーを事前訓練し、次に、これらのポリシーをQ値のようなタスク固有のアノテーションに合わせるように微調整します。この戦略により、多種多様な行動データを活用し、一般化を強化し、最小限のアノテーションを使って下流タスクへの迅速な適応を可能にする。特に,連続制御問題を解くための効率的な拡散アライメント(EDA)を導入する。 EDAは拡散モデルを用いて行動モデリングを行う。しかし、従来のアプローチとは異なり、我々は拡散ポリシーを行動入力に対するスカラーニューラルネットワークの微分として表現する。この表現は拡散モデルの直接密度計算を可能にするため、既存のLLMアライメント理論と互換性がある。ポリシーの微調整中に、直接優先度最適化(DPO)のような嗜好に基づくアライメント手法を拡張して、拡散挙動を連続的なQ-関数と整合させる。 D4RL ベンチマークによる評価の結果,EDA は全体の性能においてすべての基準手法を超越していることがわかった。特に、微調整中にQラベル付きデータのわずか1%しか与えられなくても、EDAは95%程度のパフォーマンスを維持し、いくつかのベースラインを上回ります。

Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generalization and enable rapid adaptation to downstream tasks using minimal annotations. In particular, we introduce Efficient Diffusion Alignment (EDA) for solving continuous control problems. EDA utilizes diffusion models for behavior modeling. However, unlike previous approaches, we represent diffusion policies as the derivative of a scalar neural network with respect to action inputs. This representation is critical because it enables direct density calculation for diffusion models, making them compatible with existing LLM alignment theories. During policy fine-tuning, we extend preference-based alignment methods like Direct Preference Optimization (DPO) to align diffusion behaviors with continuous Q-functions. Our evaluation on the D4RL benchmark shows that EDA exceeds all baseline methods in overall performance. Notably, EDA maintains about 95% of performance and still outperforms several baselines given only 1% of Q-labelled data during fine-tuning.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# SpreadsheetLLM: 大規模言語モデルのためのスプレッドシートのエンコード

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models ( http://arxiv.org/abs/2407.09025v1 )

ライセンス: Link先を確認

Yuzhang Tian, Jianbo Zhao, Haoyu Dong, Junyu Xiong, Shiyu Xia, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang,

(参考訳) 広範な2次元グリッド、様々なレイアウト、多様なフォーマットオプションを備えたスプレッドシートは、大きな言語モデル(LLM)において顕著な課題を提示する。そこで我々は,スプレッドシート上でのLLMの強力な理解と推論能力の解放と最適化を目的とした,効率的な符号化手法であるSpreadsheetLLMを紹介した。まず、セルアドレス、値、フォーマットを組み込んだバニラシリアライズ手法を提案する。しかし、このアプローチはLLMのトークン制約によって制限され、ほとんどのアプリケーションでは実用的ではない。この課題に対処するために,LLMのスプレッドシートを効果的に圧縮する革新的な符号化フレームワークである SheetCompressor を開発した。構造アンカーベースの圧縮、逆インデックス変換、データフォーマット対応アグリゲーションの3つのモジュールで構成されている。これはスプレッドシートテーブル検出タスクのパフォーマンスを大幅に改善し、GPT4のコンテキスト内学習環境ではバニラアプローチを25.6%上回った。さらに、SheetCompressorを用いて微調整したLLMは、圧縮率が平均25倍でありながら最先端の78.9%のF1スコアを達成し、既存の最良モデルを12.3%上回っている。最後に、スプレッドシート理解の下流タスクのためのChain of Spreadsheetを提案し、新しく要求の厳しいスプレッドシートQAタスクで検証する。我々はスプレッドシートのレイアウトと構造を体系的に利用し、SpreadsheetLLMが様々なスプレッドシートタスクにおいて極めて有効であることを示す。

Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs' token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance in the spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT4's in-context learning setting. Moreover, a fine-tuned LLM with SheetCompressor has an average compression ratio of 25 times, yet achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate it in a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12
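
The SpreadsheetLLM abstract above names "inverse index translation" as one of the SheetCompressor modules without describing it. As a rough illustration only, and not the authors' implementation, the general idea of an inverted index over cells can be sketched as follows: instead of listing every cell address with its value, each distinct non-empty value is listed once with the addresses that hold it. The function name `invert_cells` and the sample sheet are hypothetical.

```python
from collections import defaultdict


def invert_cells(cells):
    """Build a value -> [addresses] index from an address -> value mapping.

    Hypothetical helper for illustration; empty cells are dropped so that
    large blank regions cost no tokens in the serialized form.
    """
    index = defaultdict(list)
    for address, value in cells.items():
        if value is None or value == "":
            continue  # skip empty cells entirely
        index[value].append(address)
    return dict(index)


# Hypothetical sheet fragment: the value 100 appears twice, C1 is empty.
cells = {
    "A1": "Year", "B1": "Sales", "C1": "",
    "A2": 2023, "B2": 100,
    "A3": 2024, "B3": 100,
}
inverted = invert_cells(cells)
```

Repeated values collapse into a single entry, which is where the saving comes from on sheets with many duplicate or empty cells; merging adjacent addresses into ranges would be a further step that this sketch does not attempt.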

# HPC: ボリュームビデオのための階層的プログレッシブコーディングフレームワーク

HPC: Hierarchical Progressive Coding Framework for Volumetric Video ( http://arxiv.org/abs/2407.09026v1 )

ライセンス: Link先を確認

Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang,

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)に基づくボリュームビデオは、様々な3Dアプリケーションにとって大きな可能性を秘めている。現在のNeRF圧縮は、様々なネットワークとデバイス容量のための単一のモデル内でビデオ品質とビットレートを調整する柔軟性に欠ける。これらの問題に対処するために,HPCを提案する。HPCは,単一のモデルを用いて可変ビットレートを実現する新しい階層的なプログレッシブボリュームビデオ符号化フレームワークである。具体的には、HPCは、多分解能残留放射場を持つ階層表現を導入し、様々な詳細レベルを同時に生成しながら、長期化シーケンスにおける時間的冗長性を減少させる。そこで本稿では,階層的表現と圧縮の両面を協調的に最適化するマルチレート歪み損失関数を用いたエンドツーエンドのプログレッシブ・ラーニング手法を提案する。我々のHPCは一度だけ複数の圧縮レベルを実現することができるが、現在の手法では異なるレート歪み(RD)トレードオフのために複数の固定ビットレートモデルをトレーニングする必要がある。大規模な実験により、HPCは可変ビットレートの柔軟な品質レベルを単一モデルで達成し、競争力のあるRD性能を示し、また様々なデータセットで固定ビットレートモデルよりも優れていた。

Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework achieving variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field to reduce temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. Then, we propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize both hierarchical representation and compression. Our HPC trained only once can realize multiple compression levels, while the current methods need to train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate by a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 異方性量子ラビスタークモデルによる量子オットーサイクルにおける臨界性の役割の探索

Exploring the role of criticality in the quantum Otto cycle fueled by the anisotropic quantum Rabi-Stark model ( http://arxiv.org/abs/2407.09027v1 )

ライセンス: Link先を確認

He-Guang Xu, Jiasen Jin, Norton G. de Almeida, G. D. de Moraes Neto,

(参考訳) 熱エンジン、冷蔵庫、ヒーター、加速器を含む量子熱機械は、量子熱力学の最前線を象徴し、熱エネルギーを有用な機械作業に変換する新しいパラダイムを提供する。量子力学の原理を活用することで、これらの機械は、再生可能エネルギーと量子コンピューティングの潜在的な応用により、古典的な機械よりも優れた効率と性能を約束する。本稿では, 理想と有限時間の両方のシナリオで動作する量子オットーエンジンについて検討し, 異方性量子Rabi-Starkモデル(AQRSM)のフレームワーク内の調和振動子と相互作用する2レベルシステムを用いて検討する。このモデルは、一階法と連続量子相転移の両方を示すことで有名である。量子熱機関に着目して、これらの相転移がAQRSMベースのエンジンの効率とパワーをクリティカルに調節し、ハーモニックスペクトルを持つ作業媒体によって駆動される量子エンジンよりも優れていることを示した。さらに, 有限時間運転における量子摩擦の影響について検討し, 実用化に向けた量子熱エンジンの最適化に関する知見を提供する。

Quantum heat machines, encompassing heat engines, refrigerators, heaters, and accelerators, represent the forefront of quantum thermodynamics, offering a novel paradigm for converting heat energy into useful mechanical work. Leveraging quantum mechanical principles, these machines promise superior efficiency and performance compared to classical counterparts, with potential applications in renewable energy and quantum computing. This paper investigates a quantum Otto engine operating in both ideal and finite-time scenarios, employing a two-level system interacting with a harmonic oscillator within the framework of the anisotropic quantum Rabi-Stark model (AQRSM) as the working medium. This model is notable for exhibiting both first-order and continuous quantum phase transitions. By focusing on quantum heat engines, our study reveals that these phase transitions critically modulate the efficiency and power of AQRSM-based engines, outperforming quantum engines fueled by working medium with harmonic spectrum. Additionally, we explore the impacts of quantum friction and conduct limit cycle analysis in finite-time operations, providing insights into optimizing quantum heat engines for practical implementation.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 不完全データにおける感情認識の強化:新しいクロスモーダルアライメント,リコンストラクション,リファインメントフレームワーク

Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework ( http://arxiv.org/abs/2407.09029v1 )

ライセンス: Link先を確認

Haoqin Sun, Shiwan Zhao, Shaokai Li, Xiangyu Kong, Xuechen Wang, Aobo Kong, Jiaming Zhou, Yong Chen, Wenjia Zeng, Yong Qin,

(参考訳) マルチモーダル感情認識システムは、モーダルデータの完全利用に大きく依存しており、モーダルデータが不完全である場合に顕著な性能低下を被る。この問題に対処するために,クロスモーダルアライメント,リコンストラクション,リファインメント(CM-ARR)フレームワークを提案する。このフレームワークは、教師なし分布に基づくコントラスト学習を利用して、不均一なモーダル分布を整列させ、相違を低減し、意味的不確実性を効果的にモデル化する。再構成フェーズは、これらの整列分布を変換し、欠落したモダリティを回復するために、正規化フローモデルを適用する。改善フェーズでは、教師付きポイントベースのコントラスト学習を用いて、意味的相関を乱し、感情的特徴をアクセントし、再構成された表現の感情的内容を強化する。 IEMOCAP と MSP-IMPROV データセットの大規模な実験により、CM-ARR の欠落と完全モダリティの両方の条件下での優れた性能が確認された。 CM-ARRは6つのモダリティのシナリオの平均として、IEMOCAPデータセットではWARが2.11%、UARが2.12%、MSP-IMPROVデータセットではWARが1.71%、UARが1.96%という絶対的な改善を実現している。

Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle missing modalities and enhance emotion recognition. This framework utilizes unsupervised distribution-based contrastive learning to align heterogeneous modal distributions, reducing discrepancies and modeling semantic uncertainty effectively. The reconstruction phase applies normalizing flow models to transform these aligned distributions and recover missing modalities. The refinement phase employs supervised point-based contrastive learning to disrupt semantic correlations and accentuate emotional traits, thereby enriching the affective content of the reconstructed representations. Extensive experiments on the IEMOCAP and MSP-IMPROV datasets confirm the superior performance of CM-ARR under conditions of both missing and complete modalities. Notably, averaged across six scenarios of missing modalities, CM-ARR achieves absolute improvements of 2.11% in WAR and 2.12% in UAR on the IEMOCAP dataset, and 1.71% and 1.96% in WAR and UAR, respectively, on the MSP-IMPROV dataset.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# CAMP: 病理学における継続的かつ適応的な学習モデル

CAMP: Continuous and Adaptive Learning Model in Pathology ( http://arxiv.org/abs/2407.09030v1 )

ライセンス: Link先を確認

Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, Jin Tae Kwak,

(参考訳) 病理には多くの診断課題がある。従来の計算病理学は、それらを独立および個別の画像分類問題として定式化し、それによって計算の非効率性と高いコストをもたらす。この課題に対処するために,病理画像分類のための連続的適応学習モデル (CAMP) と呼ばれる汎用的,統一的,普遍的なフレームワークを提案する。 CAMPは、どんな分類タスクにも継続的に適応できる生成的、効率的、適応的な分類モデルであり、病理学固有の事前知識を活用し、タスク固有の知識を最小の計算コストで学習し、既存のタスクからの知識を忘れることなく得る。我々はCAMPを17の分類タスクに対して,1,171,526のパッチと11,811の病理スライドを含む22のデータセットで評価した。 CAMPは、パッチレベルとスライドレベルの両方で、幅広いデータセットとタスクに対して最先端の分類性能を達成し、従来の分類モデルと比較して、計算時間の94%とストレージメモリの85%を削減した。以上の結果から,CAMPは画像分類の根本的な変換を図り,完全にデジタル化されコンピュータ化された病理学の実践の道を開くことができることが示された。

There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pathology image classification. CAMP is a generative, efficient, and adaptive classification model that can continuously adapt to any classification task by leveraging pathology-specific prior knowledge and learning task-specific knowledge with minimal computational cost and without forgetting the knowledge from the existing tasks. We evaluated CAMP on 22 datasets, including 1,171,526 patches and 11,811 pathology slides, across 17 classification tasks. CAMP achieves state-of-the-art classification performance on a wide range of datasets and tasks at both patch- and slide-levels and reduces up to 94% of computation time and 85% of storage memory in comparison to the conventional classification models. Our results demonstrate that CAMP can offer a fundamental transformation in pathology image classification, paving the way for the fully digitized and computerized pathology practice.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# DRMの再検討:完全なエラー分析

DRM Revisited: A Complete Error Analysis ( http://arxiv.org/abs/2407.09032v1 )

ライセンス: Link先を確認

Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang,

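(参考訳) 本研究では、過剰パラメータ化領域におけるDeep Ritz Method (DRM) の理論解析における基本的な問いに取り組む。目標精度レベルが与えられた場合、勾配降下過程の出力が基礎となる偏微分方程式の真の解を指定した精度で近似するように、トレーニングサンプルの適切な数、ニューラルネットワークの鍵となるアーキテクチャパラメータ、射影勾配降下最適化手順のステップサイズ、および必要な反復回数をどのように決定できるか。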

In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number of iterations, such that the output of the gradient descent process closely approximates the true solution of the underlying partial differential equation to the specified precision?

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# ドメイン一般化セグメンテーションのためのテキストクエリ駆動型マスク変換器

Textual Query-Driven Mask Transformer for Domain Generalized Segmentation ( http://arxiv.org/abs/2407.09033v1 )

ライセンス: Link先を確認

Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim,

(参考訳) 本稿では,視覚言語モデルのテキスト埋め込みから,ドメイン不変の意味知識を活用することによって,ドメイン一般化セマンティックセグメンテーション(DGSS)に取り組む手法を提案する。我々は、変換器ベースのセグメンテーションフレームワーク内で、オブジェクトクエリとしてテキスト埋め込みを使用します(テキストオブジェクトクエリ)。これらのクエリは、DGSSにおけるピクセルグループ化のドメイン不変基底と見なされる。テキスト・オブジェクト・クエリのパワーを活用するために,テキスト・クエリ・ドリブン・マスク・トランスフォーマ (tqdm) と呼ばれる新しいフレームワークを導入する。 tqdmの目的は,(1)ドメイン不変セマンティクスを最大限にエンコードするテキストオブジェクトクエリを生成し,(2)高密度な視覚的特徴のセマンティクスを明確にすることである。さらに,視覚的特徴とテキスト的特徴の整合により,tqdmの有効性を向上させるために3つの正規化損失を提案する。本手法を用いることで,本モデルは興味のあるクラスに固有の意味情報を理解し,極端なドメイン(スケッチスタイルなど)に一般化することができる。我々のtqdmはGTA5$\rightarrow$Cityscapes上で68.9 mIoUを達成し,従来の最先端手法を2.5 mIoU上回った。プロジェクトのページは https://byeonghyunpak.github.io/tqdm で公開されている。

In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses to improve the efficacy of tqdm by aligning between visual and textual features. By utilizing our method, the model can comprehend inherent semantic information for classes of interest, enabling it to generalize to extreme domains (e.g., sketch style). Our tqdm achieves 68.9 mIoU on GTA5$\rightarrow$Cityscapes, outperforming the prior state-of-the-art method by 2.5 mIoU. The project page is available at https://byeonghyunpak.github.io/tqdm.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# GPC:ジェネレーティブおよび一般的な病理画像分類器

GPC: Generative and General Pathology Image Classifier ( http://arxiv.org/abs/2407.09035v1 )

ライセンス: Link先を確認

Anh Tien Nguyen, Jin Tae Kwak,

(参考訳) ディープラーニングは、その効率、正確性、堅牢性を改善するために、様々な計算病理アプリケーションに組み込まれている。画像分類における従来のアプローチは成功したが、重大な欠点がある。病理学には多くのタスクがあるが、タスクごとにモデルを構築する必要があり、モデル数、学習リソース、コストが増加する。さらに、任意のタスク固有のモデルを別のタスクに転送することは、依然として難しい問題である。本稿では,多種多様な病理画像から学習し,多数の分類タスクを統一モデルで実行することを目的とした,GPCと呼ばれるタスク非依存型の生成的かつ汎用的な病理画像分類器を提案する。 GPCは、畳み込みニューラルネットワークとトランスフォーマーベースの言語モデルを備え、病理画像を高次元の特徴空間にマッピングし、画像からテキストへの分類機構を介して、関連するクラスラベルをテキストとして生成する。我々は,4つの病理画像分類タスクに対して,GPCを6つのデータセットで評価した。実験の結果,GPCは病理画像解析のための効果的かつ効率的な普遍的モデルの開発にかなりの可能性を秘めていることが明らかとなった。

Deep learning has been increasingly incorporated into various computational pathology applications to improve its efficiency, accuracy, and robustness. Although successful, most previous approaches for image classification have crucial drawbacks. There exist numerous tasks in pathology, but one needs to build a model per task, i.e., a task-specific model, thereby increasing the number of models, training resources, and cost. Moreover, transferring an arbitrary task-specific model to another task is still a challenging problem. Herein, we propose a task-agnostic generative and general pathology image classifier, so-called GPC, that aims at learning from diverse kinds of pathology images and conducting numerous classification tasks in a unified model. GPC, equipped with a convolutional neural network and a Transformer-based language model, maps pathology images into a high-dimensional feature space and generates pertinent class labels as texts via the image-to-text classification mechanism. We evaluate GPC on six datasets for four different pathology image classification tasks. Experimental results show that GPC holds considerable potential for developing an effective and efficient universal model for pathology image analysis.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 表形式データ分類における破滅的忘却の克服--擬似リハーサルに基づくアプローチ

Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach ( http://arxiv.org/abs/2407.09039v1 )

ライセンス: Link先を確認

Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo,

(参考訳) 継続学習(CL)は、それまで獲得した知識を忘れずに、新たな知識を統合しながら、データ分布の変化に適応するという重要な課題を提起する。本稿では,表形式データの分類問題における破滅的忘却に対処するために設計された,Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3) と呼ばれる新しい手法を提案する。 TRIL3は、プロトタイプベースのインクリメンタル生成モデルXuILVQを使用して、古い知識を保存するための合成データを生成し、インクリメンタルに動作するよう修正したDNDFアルゴリズムを使用して、古いサンプルを保存せずに表形式データの分類タスクを学習する。合成データの適切な割合を求め, TRIL3 を他の CL 手法と比較する各種の試験を行った結果, TRIL3 は合成データを 50% しか使わずに文献の他の手法を上回ると結論できる。

Continual learning (CL) poses the important challenge of adapting to evolving data distributions without forgetting previously acquired knowledge while consolidating new knowledge. In this paper, we introduce a new methodology, called the Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address the phenomenon of catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data that preserves old knowledge, and the DNDF algorithm, modified to run incrementally, to learn classification tasks for tabular data without storing old samples. After different tests to determine the adequate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms the other options in the literature while using only 50% of synthetic data.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 拡張ペアを用いた分子言語モデルとエキスパートトランスファー

Molecule Language Model with Augmented Pairs and Expertise Transfer ( http://arxiv.org/abs/2407.09043v1 )

ライセンス: Link先を確認

Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun,

(参考訳) 最近、分子言語モデル(MoLM)による分子とそのテキスト記述の理解が、研究者の間で注目を集めている。しかし、MoLMの分野には独自の課題が存在する。 1)分子とテキストのペアデータの量が限られていること、 2)専門家ごとに専門分野が異なることに起因する専門知識の欠落である。この目的のために,我々はAMOLEを提案する。 AMOLEは、1)構造的類似性保持損失を用いて分子とテキストのペアを拡張し、 2) 専門知識を分子間で伝達する。様々な下流タスクに関する大規模な実験は、分子とその記述の理解におけるAMOLEの優位性を示し、現実世界の薬物発見への応用の可能性を強調している。

Understanding molecules and their textual descriptions via molecule language models (MoLM) has recently attracted a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise caused by the specialized areas of focus among experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with a structural similarity preserving loss, and 2) transfers expertise between molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# 人物再同定のための可変長WiFi CSI信号の時間周波数解析

Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification ( http://arxiv.org/abs/2407.09045v1 )

ライセンス: Link先を確認

Chen Mao, Chong Tan, Jingqi Hu, Min Zheng,

(参考訳) セキュリティ分野における重要な技術である人物再識別(ReID)は、セキュリティ検出や数え方において重要な役割を担っている。現在のセキュリティと監視システムは、主に視覚情報に依存しており、個人のプライバシーを侵害し、特定のシナリオにおける歩行者の外観や衣服からの干渉を受けやすい可能性がある。一方、ルータの普及はReIDに新たな可能性をもたらす。本文では, 歩行者の特徴を識別するための基礎として, WiFi信号のマルチパス伝搬特性を活用する, WiFiチャネル状態情報(CSI)を用いた手法を紹介する。本稿では、WiFi信号の周波数領域における時間領域と位相の振幅を解析し、連続的な横方向接続を通して時間周波数情報を融合し、表現とメートル法学習のための高度な目的関数を用いる可変長データを処理する2ストリームネットワーク構造を提案する。実世界で収集されたデータセットを用いて実験し、93.68%のmAPと98.13%のランク-1を達成した。

Person re-identification (ReID), as a crucial technology in the field of security, plays an important role in security detection and people counting. Current security and monitoring systems largely rely on visual information, which may infringe on personal privacy and be susceptible to interference from pedestrian appearances and clothing in certain scenarios. Meanwhile, the widespread use of routers offers new possibilities for ReID. This letter introduces a method using WiFi Channel State Information (CSI), leveraging the multipath propagation characteristics of WiFi signals as a basis for distinguishing different pedestrian features. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude in the time domain and the phase in the frequency domain of WiFi signals, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1.

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# Cs2K:インクリメンタルセマンティックセグメンテーションのためのクラス固有およびクラス共有知識ガイダンス

Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation ( http://arxiv.org/abs/2407.09047v1 )

ライセンス: Link先を確認

Wei Cong, Yang Cong, Yuyang Liu, Gan Sun,

(参考訳) 増分的セグメンテーションは、古いクラスの知識を維持しながら、新しく遭遇したクラスをセグメンテーションする。しかし、既存の方法もある。 1) クラス固有の知識(例えば、古いクラスプロトタイプ)からのガイダンスが欠如し、新しいクラスへの偏見につながるか、 2) クラス共有知識(すなわち古いモデルウェイト)を過度に差別せずに拘束し, 古いクラスを優先する。本稿では,モデル性能をトレードオフするために,段階的セマンティックセグメンテーションのためのクラス固有およびクラス共有知識(Cs2K)ガイダンスを提案する。具体的には、クラス固有の知識の観点から、プロトタイプからの特徴近接を利用して擬似ラベルを補正し、破滅的な忘れを克服するプロトタイプ誘導擬似ラベルを設計する。一方,従来の拡張プロトタイプを学習することで,データセット間のクラス分布を整合させるプロトタイプ誘導型クラス適応を開発した。さらに,クラス共有の知識的側面から,新しいメモリを維持しつつ,古いクラスの重みと新しいモデルの重みを統合することで,古いメモリの強化を図るための重み付き選択的統合を提案する。公開データセットの実験により,提案したCs2Kはセグメンテーション性能を著しく向上し,プラグアンドプレイであることが示された。

Incremental semantic segmentation endeavors to segment newly encountered classes while maintaining knowledge of old classes. However, existing methods either 1) lack guidance from class-specific knowledge (i.e., old class prototypes), leading to a bias towards new classes, or 2) constrain class-shared knowledge (i.e., old model weights) excessively without discrimination, resulting in a preference for old classes. In this paper, to trade off model performance, we propose the Class-specific and Class-shared Knowledge (Cs2K) guidance for incremental semantic segmentation. Specifically, from the class-specific knowledge aspect, we design a prototype-guided pseudo labeling that exploits feature proximity from prototypes to correct pseudo labels, thereby overcoming catastrophic forgetting. Meanwhile, we develop a prototype-guided class adaptation that aligns class distribution across datasets via learning old augmented prototypes. Moreover, from the class-shared knowledge aspect, we propose a weight-guided selective consolidation to strengthen old memory while maintaining new memory by integrating old and new model weights based on weight importance relative to old classes. Experiments on public datasets demonstrate that our proposed Cs2K significantly improves segmentation performance and is plug-and-play.
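
The prototype-guided pseudo labeling described above can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the correction rule, so the nearest-prototype test, the function name `correct_pseudo_labels`, and the `max_dist` threshold are all assumptions made for illustration.

```python
import math


def correct_pseudo_labels(features, pseudo_labels, prototypes, max_dist=1.0):
    """Relabel each feature to its nearest old-class prototype if close enough.

    features: list of feature vectors (one per pixel or region).
    pseudo_labels: labels predicted by the old model for those features.
    prototypes: dict mapping old-class label -> prototype vector.
    max_dist: hypothetical distance threshold, not a value from the paper.
    """
    corrected = []
    for feat, label in zip(features, pseudo_labels):
        # Find the closest old-class prototype in Euclidean distance.
        nearest, dist = min(
            ((cls, math.dist(feat, proto)) for cls, proto in prototypes.items()),
            key=lambda pair: pair[1],
        )
        # Trust the prototype only when the feature lies close to it;
        # otherwise keep the old model's pseudo label unchanged.
        corrected.append(nearest if dist <= max_dist else label)
    return corrected
```

With prototypes for two old classes, a feature close to a prototype is relabeled to that class, while a feature far from every prototype keeps the label the old model assigned.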

翻訳日:2024-07-16 00:26:50 公開日:2024-07-12

# KUNPENG:Intelligent Maritimeのためのエンボディード大型モデル

KUNPENG: An Embodied Large Model for Intelligent Maritime ( http://arxiv.org/abs/2407.09048v1 )

ライセンス: Link先を確認

Naiyao Wang, Tongbang Jiang, Ye Wang, Shaoyang Qiu, Bo Zhang, Xinqiang Xie, Munan Li, Chunliu Wang, Yiyang Wang, Hongxiang Ren, Ruili Wang, Hongjun Shan, Hongbo Liu,

(参考訳) インテリジェントな海運は、スマート海洋構築の不可欠なコンポーネントとして、高度な人工知能技術とデータ分析手法を深く統合し、スマート船舶、ルート最適化、安全な航行といった複数の側面を網羅し、海洋資源利用の効率化と輸送ネットワークの知性向上を目指している。しかし、複雑でダイナミックな海洋環境は、多種多様で異質な大規模データソースとともに、インテリジェントな海運におけるリアルタイムな意思決定の課題を提示している。本稿では,6つのシステムからなる,スマート海洋構築におけるインテリジェントな海運のための初のエンボディード大型モデルであるKUNPENGを提案する。このモデルは、環境相互作用の認識のためにマルチソースの異種データを知覚し、自律的な意思決定戦略を立てる。この戦略は、インテリジェントな船舶が安全と緊急の保証の下で航行行動を行い、継続的に電力を最適化して、海上でのエンボディドインテリジェンスを達成するために用いられる。総合的な海上作業評価において、KUNPENGは優れた性能を示した。

Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods. It covers multiple aspects such as smart vessels, route optimization, and safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, presents challenges for real-time decision-making in intelligent maritime. In this paper, we propose KUNPENG, the first-ever embodied large model for intelligent maritime in smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data for the cognition of environmental interaction and makes autonomous decision strategies, which intelligent vessels use to perform navigation behaviors under safety and emergency guarantees and to continuously optimize power, achieving embodied intelligence in maritime. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# マルチモーダル大規模言語モデルに対する安全なプロンプトの拒否

Refusing Safe Prompts for Multi-modal Large Language Models ( http://arxiv.org/abs/2407.09050v1 )

ライセンス: Link先を確認

Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong,

(参考訳) マルチモーダルな大規模言語モデル(MLLM)は、今日の生成AIエコシステムの基盤となり、テック大企業やスタートアップの間で激しい競争を巻き起こしている。特に、MLLMは、画像と質問からなるプロンプトが与えられるとテキスト応答を生成する。最先端のMLLMは安全フィルタとアライメント技術を用いて安全でないプロンプトを拒否するが,本研究では,安全なプロンプトに対する拒否を誘導する最初の手法であるMLLM-Refusalを紹介する。特に、MLLM-Refusalは、ほとんど知覚できない拒否摂動を最適化して画像に付加し、ターゲットMLLMが、摂動画像と安全な質問を含む安全なプロンプトを高い確率で拒否するようにする。具体的には,MLLM-Refusalを制約付き最適化問題として定式化し,その解法を提案する。本手法は,競合するMLLMのユーザエクスペリエンスを損なう可能性があるため,MLLMモデルプロバイダに競争上の優位性を提供する。競合するMLLMのユーザは、摂動画像をそれと知らずにプロンプトに使用すると、予期しない拒否を受けることになる。 4つのデータセットにわたって4つのMLLMに対するMLLM-Refusalの評価を行い、非競合MLLMに影響を与えずに、競合するMLLMに安全なプロンプトを拒否させる効果を示した。さらに,ガウス雑音の付加,DiffPure,敵対的訓練の3つの潜在的な対策について検討した。結果は、これらの対策が不十分であることを示している。 MLLM-Refusalの有効性を緩和できるものの、競合するMLLMの精度や効率を犠牲にする。コードはhttps://github.com/Sadcardation/MLLM-Refusalで入手できる。

Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-Refusal, the first method that induces refusals for safe prompts. In particular, our MLLM-Refusal optimizes a nearly imperceptible refusal perturbation and adds it to an image, making target MLLMs likely to refuse a safe prompt containing the perturbed image and a safe question. Specifically, we formulate MLLM-Refusal as a constrained optimization problem and propose an algorithm to solve it. Our method offers competitive advantages for MLLM model providers by potentially disrupting user experiences of competing MLLMs, since competing MLLMs' users will receive unexpected refusals when they unwittingly use these perturbed images in their prompts. We evaluate MLLM-Refusal on four MLLMs across four datasets, demonstrating its effectiveness in causing competing MLLMs to refuse safe prompts while not affecting non-competing MLLMs. Furthermore, we explore three potential countermeasures -- adding Gaussian noise, DiffPure, and adversarial training. Our results show that they are insufficient: though they can mitigate MLLM-Refusal's effectiveness, they also sacrifice the accuracy and/or efficiency of the competing MLLM. The code is available at https://github.com/Sadcardation/MLLM-Refusal.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# DroneMOT:ドローンと物体の同時移動と検出困難を考慮したドローンによる多物体追跡

DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects ( http://arxiv.org/abs/2407.09051v1 )

ライセンス: Link先を確認

Peng Wang, Yongcai Wang, Deying Li,

(参考訳) 監視カメラのような静的なプラットフォーム上でのマルチオブジェクトトラッキング(MOT)は、様々なパラダイムが魅力的なパフォーマンスを提供するなど、大きな進歩を遂げている。しかし、ドローンのような動的プラットフォームに関しては、従来のMOT手法の有効性は著しく低下している。この減少は、(1)物体は画像平面において一般的に小さく、ぼやけており、しばしば隠蔽されているため、検出と認識が困難である、(2)ドローンが異なる角度から物体を移動して見ることにより、予測された位置の信頼性が低下し、物体の特徴が埋もれてしまうという、MOT-on-droneシナリオの特異な課題に起因している。本稿では,DroneMOTを提案する。DroneMOTは,ドローンによる物体検出の高速化と,小型でぼやけた,隠蔽された物体に対する特徴埋め込みを実現するために,ドローンの高速動作を考慮したDual-domain Integrated Attention (DIA)モジュールを提案する。次に、ドローンと物体の同時動作を考慮した、革新的な動き駆動協会(MDA)方式を導入する。 MDAでは、異なる角度から見る物体の特徴を更新するために、適応的特徴同期(AFS)技術が提示されている。さらに、物体の位置を予測するために、デュアルモーションベース予測(DMP)手法を用いる。最後に、改良された特徴埋め込みと予測された位置を統合して、オブジェクトアソシエーションを強化する。 VisDrone2019-MOTとUAVDTデータセットの総合的な評価によると、DroneMOTは、ドローン上のMOTの領域における最先端技術よりも大幅にパフォーマンスが改善されている。

Multi-object tracking (MOT) on static platforms, such as surveillance cameras, has achieved significant progress, with various paradigms providing attractive performance. However, the effectiveness of traditional MOT methods is significantly reduced on dynamic platforms like drones. This decrease is attributed to the distinctive challenges in the MOT-on-drone scenario: (1) objects are generally small in the image plane, blurred, and frequently occluded, making them challenging to detect and recognize; (2) drones move and see objects from different angles, making the predicted positions and feature embeddings of the objects unreliable. This paper proposes DroneMOT, which first introduces a Dual-domain Integrated Attention (DIA) module that considers the fast movements of drones to enhance drone-based object detection and feature embedding for small-sized, blurred, and occluded objects. Then, an innovative Motion-Driven Association (MDA) scheme is introduced, considering the concurrent movements of both the drone and the objects. Within MDA, an Adaptive Feature Synchronization (AFS) technique is presented to update the object features seen from different angles. Additionally, a Dual Motion-based Prediction (DMP) method is employed to forecast the object positions. Finally, both the refined feature embeddings and the predicted positions are integrated to enhance the object association. Comprehensive evaluations on the VisDrone2019-MOT and UAVDT datasets show that DroneMOT provides substantial performance improvements over the state-of-the-art in the domain of MOT on drones.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# MIDIからリッチタブラチュアへ:リードギター奏者の指とスティリスティックな選択を取り入れた自動生成システム

From MIDI to Rich Tablatures: an Automatic Generative System incorporating Lead Guitarists' Fingering and Stylistic choices ( http://arxiv.org/abs/2407.09052v1 )

ライセンス: Link先を確認

Pierluigi Bontempi, Daniele Manerba, Alexandre D'Hooge, Sergio Canazza,

(参考訳) フレット付き弦楽器のメロディ演奏に最適な運指の自動識別は、文献では(少なくとも部分的には)既に扱われているが、リードエレキギターに関する特定のケースには、専用のアプローチが必要である。簡単なMIDIメロディから,運指や調音,表現技術に富んだタブ譜を生成できるシステムを提案する。基本的な運指は,各瞬間に使用する指だけでなく,フレッティングハンドの最適な位置を導出する制約付き多属性最適化問題を解くことで得られる。次に,mySongBookコーパスの統計データ,最も一般的なクリシェ,および生体力学的実現性を分析することにより,調音と表現技術を導入する。最後に、得られた出力はMusicXMLフォーマットに変換され、視覚化と使用が容易になる。得られるタブ譜の質と提案手法の高い設定可能性は、特に器楽教育、作曲・編曲支援、および計算的表現力のある演奏モデルにおいて、いくつかの影響を及ぼす可能性がある。

Although the automatic identification of the optimal fingering for the performance of melodies on fretted string instruments has already been addressed (at least partially) in the literature, the specific case of lead electric guitar requires a dedicated approach. We propose a system that can generate, from simple MIDI melodies, tablatures enriched by fingerings, articulations, and expressive techniques. The basic fingering is derived by solving a constrained and multi-attribute optimization problem, which derives the best position of the fretting hand, not just the finger used at each moment. Then, by analyzing statistical data from the mySongBook corpus, the most common clichés, and biomechanical feasibility, articulations and expressive techniques are introduced. Finally, the obtained output is converted into MusicXML format, which allows for easy visualization and use. The quality of the tablatures derived and the high configurability of the proposed approach can have several impacts, in particular in the fields of instrumental teaching, assisted composition and arranging, and computational expressive music performance models.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# 高度なグラフクラスタリング手法: 包括的で詳細な分析

Advanced Graph Clustering Methods: A Comprehensive and In-Depth Analysis ( http://arxiv.org/abs/2407.09055v1 )

ライセンス: Link先を確認

Timothé Watteau, Aubin Bonnefoy, Simon Illouz-Laurent, Joaquim Jusseau, Serge Iovleff,

(参考訳) グラフクラスタリングは、グラフを複数の均質なグループに分割することを目的としており、ソーシャルネットワーク分析、バイオインフォマティクス、イメージセグメンテーションといった様々な分野にまたがるアプリケーションにおいて重要な研究領域である。本稿では,従来のグラフクラスタリング手法と最近のグラフクラスタリング手法について検討する。まず、グラフ理論における重要な概念と定義を紹介する。背景のセクションでは、グラフラプラシアンやグラフ解析におけるディープラーニングの統合など、重要なトピックが取り上げられている。論文では、スペクトルクラスタリングやライデンアルゴリズムなど、従来のクラスタリング手法について論じる。次に,ディープラーニングを活用した最先端クラスタリング手法について検討した。これらの手法の総合的な比較は実験を通じて行われる。本稿では,グラフクラスタリングの実用化と今後の研究の方向性について論じる。

Graph clustering, which aims to divide a graph into several homogeneous groups, is a critical area of study with applications that span various fields such as social network analysis, bioinformatics, and image segmentation. This paper explores both traditional and more recent approaches to graph clustering. Firstly, key concepts and definitions in graph theory are introduced. The background section covers essential topics, including graph Laplacians and the integration of Deep Learning in graph analysis. The paper then delves into traditional clustering methods, including Spectral Clustering and the Leiden algorithm. Following this, state-of-the-art clustering techniques that leverage deep learning are examined. A comprehensive comparison of these methods is made through experiments. The paper concludes with a discussion of the practical applications of graph clustering and potential future research directions.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# 高エネルギー物理実験におけるジェットクラスタリングの新しい量子実現

A Novel Quantum Realization of Jet Clustering in High-Energy Physics Experiments ( http://arxiv.org/abs/2407.09056v1 )

ライセンス: Link先を確認

Yongfeng Zhu, Weifeng Zhuang, Chen Qian, Yunheng Ma, Dong E. Liu, Manqi Ruan, Chen Zhou,

(参考訳) 量子技術の基礎科学への応用を探求することは、双方にとってイノベーションを育む鍵となる。高エネルギー粒子衝突ではクォークとグルーオンが生成され、すぐにジェットとして知られる衝突粒子噴霧を形成する。正確なジェット・クラスタリングは、起源のクォークやグルーオンの情報を保持し、亜原子粒子の質量生成の機構を基盤とするヒッグス粒子の性質の研究の基礎を形成するため、重要である。衝突イベントをノードとして、角分離をエッジとしてグラフにマッピングすることで、利用可能な量子資源と古典的な組合せ最適化問題に対処するハイブリッド量子古典アルゴリズムであるQuantum Approximate Optimization Algorithm (QAOA)を用いてジェットクラスタリングを実現する。量子コンピュータシミュレータの30量子ビットと量子コンピュータハードウェアの6量子ビットから得られた本研究では,QAOAを用いたジェットクラスタリング性能が,小型問題に対する古典的アルゴリズムと同等かそれ以上に優れていることを示す。この研究は、ジェットクラスタリングに革命をもたらす量子コンピューティングの可能性を強調し、高エネルギー物理実験における量子コンピューティングの実践的応用を一歩近づいた。

Exploring the application of quantum technologies to fundamental sciences holds the key to fostering innovation for both sides. In high-energy particle collisions, quarks and gluons are produced and immediately form collimated particle sprays known as jets. Accurate jet clustering is crucial as it retains the information of the originating quark or gluon and forms the basis for studying properties of the Higgs boson, which underlies the mechanism of mass generation for subatomic particles. For the first time, by mapping collision events into graphs--with particles as nodes and their angular separations as edges--we realize jet clustering using the Quantum Approximate Optimization Algorithm (QAOA), a hybrid quantum-classical algorithm for addressing classical combinatorial optimization problems with available quantum resources. Our results, derived from 30 qubits on a quantum computer simulator and 6 qubits on quantum computer hardware, demonstrate that jet clustering performance with QAOA is comparable with or even better than classical algorithms for a small-sized problem. This study highlights the feasibility of quantum computing to revolutionize jet clustering, bringing the practical application of quantum computing in high-energy physics experiments one step closer.
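
The graph mapping described above (particles as nodes, angular separations as edges) can be sketched classically. The abstract does not state the cost function handed to QAOA, so the sketch below assumes a weighted Max-Cut objective over two jets and solves it by exhaustive search as a classical stand-in for QAOA; the function names and the 3-momentum input format are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def angle(p, q):
    """Opening angle in radians between two 3-momentum vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def build_graph(particles):
    """Complete graph: nodes are particle indices, edge weights are angles."""
    n = len(particles)
    return {(i, j): angle(particles[i], particles[j])
            for i in range(n) for j in range(i + 1, n)}


def brute_force_two_jets(particles):
    """Exhaustive weighted Max-Cut (assumed objective), standing in for QAOA."""
    edges = build_graph(particles)
    n = len(particles)
    best_cut, best_labels = -1.0, None
    # Pin particle 0 to jet 0 so mirror-image assignments are not repeated.
    for rest in itertools.product((0, 1), repeat=n - 1):
        labels = (0,) + rest
        cut = sum(w for (i, j), w in edges.items() if labels[i] != labels[j])
        if cut > best_cut:
            best_cut, best_labels = cut, labels
    return best_labels
```

Under this assumed objective, particles separated by large angles are pushed into different jets, so two back-to-back sprays end up with different labels. The exhaustive search scales as 2^(n-1) and is only practical for a handful of particles.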

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# PersonificationNet:カスタマイズされた被写体を人のように振る舞わせる

PersonificationNet: Making customized subject act like a person ( http://arxiv.org/abs/2407.09057v1 )

ライセンス: Link先を確認

Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua,

(参考訳) 近年、カスタマイズされた生成は大きな可能性を示しており、3～5枚のユーザ提供画像を用いて、特定の被写体の新たな画像を合成するようモデルを訓練する。その後のアプリケーションは、カスタマイズされた生成の柔軟性と多様性を高めるが、被写体に人のポーズをとらせるようなきめ細かい制御は、まだ十分に研究されていない。本稿では,漫画キャラクタやぬいぐるみなどの特定の被写体を制御し,参照する人物画像と同じポーズをとらせることができるPersonificationNetを提案する。カスタマイズブランチ、ポーズ条件ブランチ、構造アライメントモジュールで構成される。具体的には、まず、カスタマイズブランチが特定の被写体の外観を模倣する。第2に、ポーズ条件ブランチは、人体の構造情報をさまざまなインスタンスに転送する。最後に、構造アライメントモジュールは、推論段階における人と特定の被写体の間の構造ギャップを橋渡しする。実験の結果,提案するPersonificationNetは最先端の手法よりも優れていた。

Customized generation, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject, has recently shown significant potential. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control that makes the given subject adopt a person's pose remains understudied. In this paper, we propose PersonificationNet, which can control a specified subject, such as a cartoon character or plush toy, to adopt the same pose as a given reference image of a person. It contains a customized branch, a pose condition branch, and a structure alignment module. Specifically, first, the customized branch mimics the appearance of the specified subject. Second, the pose condition branch transfers the body structure information from the human to variant instances. Last, the structure alignment module bridges the structure gap between the human and the specified subject at the inference stage. Experimental results show that our proposed PersonificationNet outperforms state-of-the-art methods.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# テスト時ぼかしによるドメイン適応型ビデオデブラーリング

Domain-adaptive Video Deblurring via Test-time Blurring ( http://arxiv.org/abs/2407.09059v1 )

ライセンス: Link先を確認

Jin-Ting He, Fu-Jen Tsai, Jia-Hao Wu, Yan-Tsung Peng, Chung-Chi Tsai, Chia-Wen Lin, Yen-Yu Lin,

(参考訳) ダイナミックなシーンビデオのデブロアリングは、露光プロセス中にキャプチャーされた望ましくないぼやけたアーティファクトを取り除くことを目的としている。従来のビデオデブロアリング手法は目覚ましい結果を得たが、トレーニングとテストビデオの領域差、特に実世界のシナリオで捉えた場合、大きなパフォーマンス低下に悩まされている。そこで本研究では,未確認領域におけるデブロアリングモデルに対するテスト時間微調整を実現するために,ぼやけたモデルに基づくドメイン適応方式を提案する。そこで本手法では, 対象領域の劣化モデルを校正するために, ドメイン適応型トレーニングペアを生成することができる。まず、ぼやけた入力画像から比較的シャープな領域を識別し、擬似シャープ画像とみなすための相対シャープネス検出モジュールを提案する。次に、テスト中に抽出した擬似シャープ画像に基づいて、ぼやけた画像を生成するために、ぼやけたモデルを用いる。対象データ分布に応じてぼやけた画像を合成するために, ドメイン適応型ブラ条件生成モジュールを提案し, ぼやけたモデルに対して, ドメイン固有なぼやけた条件を作成する。最後に、生成された擬似シャープとぼやけたペアを使用して、より優れた性能を得るためにデブロアリングモデルを微調整する。大規模な実験結果から,本手法は最先端のビデオデブロアリング法を大幅に改善し,実世界のビデオデブロアリングデータセットに対して最大7.54dBの性能向上を達成できることが示された。ソースコードはhttps://github.com/Jin-Ting-He/DADeblurで入手できる。

Dynamic scene video deblurring aims to remove undesirable blurry artifacts captured during the exposure process. Although previous video deblurring methods have achieved impressive results, they suffer from significant performance drops due to the domain gap between training and testing videos, especially for those captured in real-world scenarios. To address this issue, we propose a domain adaptation scheme based on a blurring model to achieve test-time fine-tuning for deblurring models in unseen domains. Since blurred and sharp pairs are unavailable for fine-tuning during inference, our scheme can generate domain-adaptive training pairs to calibrate a deblurring model for the target domain. First, a Relative Sharpness Detection Module is proposed to identify relatively sharp regions from the blurry input images and regard them as pseudo-sharp images. Next, we utilize a blurring model to produce blurred images based on the pseudo-sharp images extracted during testing. To synthesize blurred images in compliance with the target data distribution, we propose a Domain-adaptive Blur Condition Generation Module to create domain-specific blur conditions for the blurring model. Finally, the generated pseudo-sharp and blurred pairs are used to fine-tune a deblurring model for better performance. Extensive experimental results demonstrate that our approach can significantly improve state-of-the-art video deblurring methods, providing performance gains of up to 7.54dB on various real-world video deblurring datasets. The source code is available at https://github.com/Jin-Ting-He/DADeblur.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# スペクトル自己監督的特徴選択

Spectral Self-supervised Feature Selection ( http://arxiv.org/abs/2407.09061v1 )

ライセンス: Link先を確認

Daniel Segal, Ofir Lindenbaum, Ariel Jaffe,

(参考訳) 教師なし環境での高次元観測から有意義な特徴のサブセットを選択することは、クラスタリングや次元減少といった下流分析の精度を大幅に向上させ、与えられたデータセットの不均一性の原因に関する貴重な洞察を提供する。本稿では,教師なし特徴選択のための自己教師付きグラフベースアプローチを提案する。提案手法のコアは,グラフラプラシアンの固有ベクトルに単純な処理ステップを適用することで,ロバストな擬似ラベルを計算することである。擬似ラベル計算に使用される固有ベクトルのサブセットは、モデル安定性基準に基づいて選択される。次に,観測結果から擬似ラベルを予測するために代理モデルを訓練することにより,各特徴の重要性を測定する。我々のアプローチは、外れ値や複雑な部分構造の存在など、困難なシナリオに対して堅牢であることが示されている。実世界のデータセットを用いた実験を通して,本手法の有効性を実証し,その堅牢性,特に生物学的データセットにおける有効性を示す。

Choosing a meaningful subset of features from high-dimensional observations in unsupervised settings can greatly enhance the accuracy of downstream analysis, such as clustering or dimensionality reduction, and provide valuable insights into the sources of heterogeneity in a given dataset. In this paper, we propose a self-supervised graph-based approach for unsupervised feature selection. Our method's core involves computing robust pseudo-labels by applying simple processing steps to the graph Laplacian's eigenvectors. The subset of eigenvectors used for computing pseudo-labels is chosen based on a model stability criterion. We then measure the importance of each feature by training a surrogate model to predict the pseudo-labels from the observations. Our approach is shown to be robust to challenging scenarios, such as the presence of outliers and complex substructures. We demonstrate the effectiveness of our method through experiments on real-world datasets, showing its robustness across multiple domains, particularly its effectiveness on biological datasets.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# DICOM構造化レポートを用いたフェデレーテッドラーニングのためのマルチモーダルデータセット作成

Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports ( http://arxiv.org/abs/2407.09064v1 )

ライセンス: Link先を確認

Malte Tölle, Lukas Burger, Halvar Kelm, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Philipp Garthe, Stefan Groß, Anja Hennemuth, Lars Kaderali, Nina Krüger, Andreas Leha, Simon Martin, Alexander Meyer, Eike Nagel, Stefan Orwat, Clemens Scherer, Moritz Seiffert, Jan Moritz Seliger, Stefan Simm, Tim Friede, Tim Seidler, Sandy Engelhardt,

(参考訳) 目的: フェデレーショントレーニングは,多種多様なデータストレージオプション,一貫性のない命名方式,さまざまなアノテーション手順,ラベル品質の相違などにより,不均一なデータセットによって妨げられることが多い。これは、均一なデータ表現とフィルタリングオプションを含むデータセット調和が最重要となる、新興のマルチモーダル学習パラダイムにおいて特に顕著である。メソッド: DICOM構造化レポートは、イメージングドメインを超えて任意の情報の標準化されたリンクを可能にする。これに基づいて、マルチモーダルデータセットの組み立てプロセスを簡単にする、データ統合と対話型フィルタリング機能のためのオープンプラットフォームを開発した。結果: 本研究は,ドイツにある8つの大学病院のコンソーシアムにおけるフェデレーショントレーニングのためのデータセットの合理化とともに, より多種多様なデータタイプに適用可能性を示すことによって, これまでの作業を拡張した。最小侵襲心弁置換術後の結果を予測するため,全部位に調和したマルチモーダルデータセットを作成した。データはDICOMデータ(CT画像、心電図スキャン)、アノテーション(石灰化セグメンテーション、ポイントセット、ペースメーカー依存性)、メタデータ(補綴、診断)を含む。結論: 構造化レポートは、画像システムと情報システムの間の伝統的なギャップを橋渡しする。固有のDICOM参照システムを利用することで、任意のデータ型を同時にクエリして、臨床的研究に意味のあるコホートを作成することができる。グラフィカルインターフェースと構造化レポートテンプレートの例が公開される予定だ。

Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We prove its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data includes DICOM data (i.e. computed tomography images, electrocardiography scans) as well as annotations (i.e. calcification segmentations, pointsets and pacemaker dependency), and metadata (i.e. prosthesis and diagnoses). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system, arbitrary data types can be queried concurrently to create meaningful cohorts for clinical studies. The graphical interface as well as example structured report templates will be made publicly available.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# 直接選好最適化のための新しい要件

New Desiderata for Direct Preference Optimization ( http://arxiv.org/abs/2407.09072v1 )

ライセンス: Link先を確認

Xiangkun Hu, Tong He, David Wipf,

(参考訳) これまでの大規模言語モデルは、モデル応答を人間の嗜好により良く整合させるために、人間のフィードバックによる強化学習(RLHF)のある種の形式に依存してきた。しかし、これらのRLHFパイプラインを実装する際にしばしば観測される不安定性のため、RL報酬モデルを個別に学習する必要性を回避するために、近年様々な再パラメータ化技術が導入されている。代わりに、人間の嗜好に対する直接的な微調整は、単一の閉形式の訓練目的の最小化によって達成される。このプロセスは元々、直接選好最適化(DPO)と呼ばれ、その後いくつかの顕著な派生手法が続いている。実世界の特定の環境では有効であるが、我々は、既存のDPO手法が事前訓練された参照モデルと人間の嗜好の経験的尺度との間を補間する能力における未解決の欠点、ならびに低品質および高品質の応答の正則化と制約の扱いにおける避けられないトレードオフを浮き彫りにする新たな評価基準を導入する。我々の洞察は、これらの制限を証明可能な形で緩和する代替のDPO類似の損失を動機付ける。経験的結果は、我々の分析の顕著な側面を裏付けるものである。

Large language models in the past have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences. However, because of oft-observed instabilities when implementing these RLHF pipelines, various reparameterization techniques have recently been introduced to sidestep the need for separately learning an RL reward model. Instead, directly fine-tuning for human preferences is achieved via the minimization of a single closed-form training objective, a process originally referred to as direct preference optimization (DPO) and followed by several notable descendants. Although effective in certain real-world settings, we introduce new evaluation criteria that serve to highlight unresolved shortcomings in the ability of existing DPO methods to interpolate between a pre-trained reference model and empirical measures of human preferences, as well as unavoidable trade-offs in how low- and high-quality responses are regularized and constraints are handled. Our insights then motivate an alternative DPO-like loss that provably mitigates these limitations. Empirical results serve to corroborate notable aspects of our analyses.
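
For orientation, the "single closed-form training objective" mentioned above can be written down for one preference pair. The sketch below shows the standard DPO loss, not the alternative loss this paper proposes (the abstract does not specify it). It assumes sequence-level log-probabilities are already computed; the function name `dpo_loss` and the default `beta` are illustrative choices.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Each argument is a log-probability of a full response given the prompt,
    under either the policy being trained or the frozen reference model.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response's log-ratio relative to the rejected one.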

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# Open Vocabulary Multi-Label Video 分類

Open Vocabulary Multi-Label Video Classification ( http://arxiv.org/abs/2407.09073v1 )

ライセンス: Link先を確認

Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi,

(参考訳) 事前学習された視覚言語モデル(VLM)は、画像分類、オブジェクト検出、画像セグメント化などのオープン語彙コンピュータビジョンタスクにおいて大きな進歩をもたらした。いくつかの最近の研究はVLMを拡張し、ビデオ内の単一のラベルのアクション分類をオープンにすることに焦点を当てている。しかし、従来の手法では、複数のアクションやエンティティを同時に認識する能力、例えば、ビデオ内のオブジェクトをオープンな語彙設定で認識する能力を必要とする、全体論的ビデオ理解では不足していた。この問題をオープン語彙多ラベルビデオ分類として定式化し、CLIPなどの事前学習VLMを適用してこの問題を解決する方法を提案する。大規模言語モデル(LLM)を活用して,クラスラベルに関するVLMのセマンティックガイダンスを提供し,そのオープンな語彙性能を2つの重要なコントリビューションで改善する。まず、LLMにCLIPテキストエンコーダのソフト属性を生成して、新しいクラスを認識できるようにする、エンドツーエンドのトレーニング可能なアーキテクチャを提案する。第2に、時間モデリングモジュールをCLIPの視覚エンコーダに統合し、ビデオ概念の時空間的ダイナミクスを効果的にモデル化し、ビデオ領域における強力なオープン語彙分類性能を保証するための新しい正規化微調整手法を提案する。大規模な実験では、複数のベンチマークデータセットに対するアプローチの有効性を示す。

Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding which requires the ability to simultaneously recognize multiple actions and entities e.g., objects in the video in an open vocabulary setting. We formulate this problem as open vocabulary multilabel video classification and propose a method to adapt a pre-trained VLM such as CLIP to solve this task. We leverage large language models (LLMs) to provide semantic guidance to the VLM about class labels to improve its open vocabulary performance with two key contributions. First, we propose an end-to-end trainable architecture that learns to prompt an LLM to generate soft attributes for the CLIP text-encoder to enable it to recognize novel classes. Second, we integrate a temporal modeling module into CLIP's vision encoder to effectively model the spatio-temporal dynamics of video concepts as well as propose a novel regularized finetuning technique to ensure strong open vocabulary classification performance in the video domain. Our extensive experimentation showcases the efficacy of our approach on multiple benchmark datasets.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# BKDSNN: ぼやけた知識蒸留による学習ベースのスパイキングニューラルネットワークトレーニングの性能向上

BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation ( http://arxiv.org/abs/2407.09083v1 )

ライセンス: Link先を確認

Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He,

(参考訳) 生物学的神経系を模倣して離散スパイクを介して情報を伝達するスパイキングニューラルネットワーク(SNN)は、優れた計算効率を持つ脳にインスパイアされたモデルとしてよく知られている。離散スパイクに対する代理勾配推定を利用して、超低推論遅延(時間ステップ数)を達成できる学習ベースのSNNトレーニング手法が最近登場している。それでも、学習ベースの手法では離散スパイクの正確な勾配推定を導出することが難しいため、SNNとそれに対応する人工ニューラルネットワーク(ANN)との間には明確な精度のギャップが残っている。この問題に対処するために、ランダムにぼかしたSNN特徴を用いてANN特徴を復元・模倣する、ぼやけた知識蒸留(BKD)手法を提案する。なお、BKDはSNNの最終層直前の特徴マップに適用され、従来のロジットベースの知識蒸留と併用することで精度向上を最大化できる。我々の知る限り、学習ベースの手法のカテゴリにおいて、本研究は静的データセットとニューロモルフィックデータセットの両方でSNNのトレーニングにおける最先端の性能を達成する。ImageNetデータセットでは、BKDSNNはCNNとTransformerのネットワークトポロジにおいて、従来の最良結果をそれぞれ4.51%、0.93%上回る。

Spiking neural networks (SNNs), which mimic biological neural systems to convey information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-steps) have emerged recently. Nevertheless, due to the difficulty of deriving precise gradient estimates for discrete spikes with learning-based methods, a distinct accuracy gap persists between SNNs and their artificial neural network (ANN) counterparts. To address this issue, we propose a blurred knowledge distillation (BKD) technique, which leverages randomly blurred SNN features to restore and imitate the ANN features. Note that our BKD is applied to the feature map right before the last layer of the SNN, and it can also be combined with prior logits-based knowledge distillation for a maximized accuracy boost. To the best of our knowledge, in the category of learning-based methods, our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets. On the ImageNet dataset, BKDSNN outperforms prior best results by 4.51% and 0.93% with the network topologies of CNN and Transformer, respectively.
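
A rough sketch of how a feature-level term and a logits-based term might be combined, as the abstract above describes, is given below. Only the logits part is the standard temperature-softened knowledge distillation loss. The blur (here: zeroing elements at random), the `restore` callable, the weighting `alpha`, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature * temperature * kl

def random_blur(feature, keep_prob, rng):
    """Assumed blur: zero each element with probability 1 - keep_prob."""
    return [x if rng.random() < keep_prob else 0.0 for x in feature]

def feature_loss(student_feat, teacher_feat, restore, keep_prob=0.6, seed=0):
    """MSE between the restored, blurred student feature and the teacher feature."""
    rng = random.Random(seed)
    restored = restore(random_blur(student_feat, keep_prob, rng))
    return sum((r - t) ** 2 for r, t in zip(restored, teacher_feat)) / len(teacher_feat)

def combined_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                  restore, alpha=1.0):
    """Feature term plus weighted logits term (alpha is a hypothetical weight)."""
    return (feature_loss(student_feat, teacher_feat, restore)
            + alpha * logits_kd_loss(student_logits, teacher_logits))

# Toy usage: the identity function stands in for a learned restoration block.
identity = lambda feature: feature
loss = combined_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
                     [2.0, 0.5, -1.0], [2.5, 0.0, -1.5], identity)
```

In a real training loop the student features would come from the layer right before the SNN's last layer, and `restore` would be a trainable module rather than the identity.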

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# 視覚表現学習における離散的トークン化の役割について

On the Role of Discrete Tokenization in Visual Representation Learning ( http://arxiv.org/abs/2407.09087v1 )

ライセンス: Link先を確認

Tianqi Du, Yifei Wang, Yisen Wang,

(参考訳) 自己教師付き学習(SSL)の分野では、マスク付き画像モデリング(MIM)が対照学習手法と並んで人気を集めている。MIMは、入力画像のマスクされた領域を、マスクされていない部分を用いて再構成する。MIM手法の注目すべき一群は離散トークンを再構成ターゲットとして採用しているが、この選択の理論的基盤は十分に解明されていない。本稿では、これらの離散トークンの役割を考察し、その利点と限界を明らかにすることを目指す。MIMと対照学習の関連性に基づいて、離散的トークン化がモデルの一般化能力にどのように影響するかについて包括的な理論的理解を与える。さらに、MIMフレームワークにおける離散トークンの有効性を評価するために設計された、TCASと呼ばれる新しい指標を提案する。この指標に着想を得て、革新的なトークン化器の設計を提示し、それに対応するMIM手法であるClusterMIMを提案する。ClusterMIMは、さまざまなベンチマークデータセットとViTバックボーンにおいて優れた性能を示す。コードはhttps://github.com/PKU-ML/ClusterMIMで入手できる。

In the realm of self-supervised learning (SSL), masked image modeling (MIM) has gained popularity alongside contrastive learning methods. MIM involves reconstructing masked regions of input images using their unmasked portions. A notable subset of MIM methodologies employs discrete tokens as the reconstruction target, but the theoretical underpinnings of this choice remain underexplored. In this paper, we explore the role of these discrete tokens, aiming to unravel their benefits and limitations. Building upon the connection between MIM and contrastive learning, we provide a comprehensive theoretical understanding of how discrete tokenization affects the model's generalization capabilities. Furthermore, we propose a novel metric named TCAS, which is specifically designed to assess the effectiveness of discrete tokens within the MIM framework. Inspired by this metric, we contribute an innovative tokenizer design and propose a corresponding MIM method named ClusterMIM. It demonstrates superior performance on a variety of benchmark datasets and ViT backbones. Code is available at https://github.com/PKU-ML/ClusterMIM.
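
To make "discrete tokens as the reconstruction target" concrete, the sketch below builds one MIM training example: each patch is mapped to the index of its nearest centroid, a random subset of patches is masked, and the token ids of the masked patches become the prediction targets. The nearest-centroid tokenizer, the hand-picked centroids, the 50% mask ratio, and the function names are assumptions for illustration. This is not presented as the ClusterMIM tokenizer, and the TCAS metric is not implemented.

```python
import random

def nearest_centroid(patch, centroids):
    """Index of the closest centroid under squared Euclidean distance."""
    dists = [sum((p - c) ** 2 for p, c in zip(patch, cen)) for cen in centroids]
    return dists.index(min(dists))

def tokenize(patches, centroids):
    """Map every patch to a discrete token id."""
    return [nearest_centroid(p, centroids) for p in patches]

def make_mim_example(patches, centroids, mask_ratio=0.5, seed=0):
    """Return the visible patches and a {masked position: target token id} map."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(patches) * mask_ratio))
    masked = sorted(rng.sample(range(len(patches)), n_mask))
    tokens = tokenize(patches, centroids)
    visible = [p for i, p in enumerate(patches) if i not in masked]
    targets = {i: tokens[i] for i in masked}
    return visible, targets

# Toy data (hypothetical): four 2-d "patches" and two centroids.
patches = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.0, 0.8]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
visible, targets = make_mim_example(patches, centroids)
```

A model would then be trained to predict `targets` from `visible`, typically with a cross-entropy loss over the token vocabulary instead of a pixel-level regression loss.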

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# FD-SOS: 口腔内画像からの骨開窓・裂開検出のための視覚言語オープンセット検出器

FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images ( http://arxiv.org/abs/2407.09088v1 )

ライセンス: Link先を確認

Marawan Elbatel, Keyuan Liu, Yanqi Yang, Xiaomeng Li,

(参考訳) 骨の開窓・裂開(FD)の正確な検出は、歯科における効果的な治療計画の立案に不可欠である。コーンビームCT(CBCT)はFD評価のゴールドスタンダードであるが、放射線被曝、アクセシビリティの制限、口腔内画像と比較して高いコストといった制約がある。口腔内画像では、歯科医はFDの鑑別診断に困難を抱えている。本論文は、口腔内画像のみからFDを検出するという、新規かつ臨床的に重要な応用を提示する。そのために、口腔内画像からのFD検出のための新しいオープンセットオブジェクト検出器FD-SOSを提案する。FD-SOSは、条件付きコントラストデノイジング(CCDN)と歯特異的マッチング割り当て(TMA)という2つの新しい構成要素を持つ。これらのモジュールにより、FD-SOSは外部の歯科的意味情報を効果的に活用できる。実験の結果、本手法は既存の検出手法を上回り、同じ適合率(precision)の水準で歯科専門家を再現率(recall)で35%上回った。コードはhttps://github.com/xmed-lab/FD-SOSで入手できる。

Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis of FD. This paper presents a novel and clinically significant application of FD detection solely from intraoral images. To achieve this, we propose FD-SOS, a novel open-set object detector for FD detection from intraoral images. FD-SOS has two novel components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA). These modules enable FD-SOS to effectively leverage external dental semantics. Experimental results showed that our method outperformed existing detection methods and surpassed dental professionals by 35% recall under the same level of precision. Code is available at: https://github.com/xmed-lab/FD-SOS.

翻訳日:2024-07-16 00:17:04 公開日:2024-07-12

# 波動関数スナップショットからの量子輸送の診断

Diagnosing quantum transport from wave function snapshots ( http://arxiv.org/abs/2407.09092v1 )

ライセンス: Link先を確認

Devendra Singh Bhakuni, Roberto Verdel, Cristiano Muzzi, Riccardo Andreoni, Monika Aidelsburger, Marcello Dalmonte,

(参考訳) 波動関数スナップショットのデータセットに主成分分析(PCA)を適用してスピン鎖の非平衡量子ダイナミクスを研究し、これらのデータセット内で情報がどのように伝播するかを調べる。我々が用いる量は、データセットから直接構築した標本2次モーメント行列のスペクトルから導かれる。スピン輸送やエネルギー輸送の性質が異なる複数の相互作用スピン鎖を調べた結果、データ情報の広がりの成長は、背後にあるスピンやエネルギーの量子輸送と同じ動的指数に従うことが明らかになった。特に本手法は、限られたサンプル数でエネルギー輸送を追跡するための、簡便でデータ駆動型の、そして重要なことに解釈可能な診断を可能にする。これは通常、ハミルトニアンの形を仮定しなければ困難である。これらの観測は控えめな有限サイズと発展時間で得られており、実験的および数値的な制約に合致している。本フレームワークは、古典的なシミュレーション手法が通常大きな制約に直面する高次元系のダイナミクスに関する実験的量子シミュレーターのデータセットに直接適用でき、平衡に近いクエンチにも平衡から遠いクエンチにも等しく適用できる。

We study nonequilibrium quantum dynamics of spin chains by employing principal component analysis (PCA) on data sets of wave function snapshots and examine how information propagates within these data sets. The quantities we employ are derived from the spectrum of the sample second moment matrix, built directly from data sets. Our investigations of several interacting spin chains featuring distinct spin or energy transport reveal that the growth of data information spreading follows the same dynamical exponents as that of the underlying quantum transport of spin or energy. Specifically, our approach enables an easy, data-driven, and importantly interpretable diagnostic to track energy transport with a limited number of samples, which is usually challenging without any assumption on the Hamiltonian form. These observations are obtained at a modest finite size and evolution time, which aligns with experimental and numerical constraints. Our framework directly applies to experimental quantum simulator data sets of dynamics in higher-dimensional systems, where classical simulation methods usually face significant limitations, and applies equally to both near- and far-from-equilibrium quenches.
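
The sample second moment matrix mentioned in the abstract above can be sketched as follows, assuming each snapshot is a list of spin values encoded as +1 or -1 (an encoding chosen here for illustration). The example computes the matrix without mean subtraction and the share of its trace carried by the leading eigenvalue, using power iteration so that only the standard library is needed. Which spectral quantities the paper actually tracks is not stated in the abstract, so `leading_weight` is a hypothetical diagnostic, not the authors' definition.

```python
import math
import random

def second_moment_matrix(snapshots):
    """M[i][j] = (1/N) * sum over snapshots of s_i * s_j (no mean subtraction)."""
    n, size = len(snapshots), len(snapshots[0])
    return [[sum(s[i] * s[j] for s in snapshots) / n for j in range(size)]
            for i in range(size)]

def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def leading_eigenvalue(M, iters=500, seed=0):
    """Power iteration for a symmetric positive semidefinite matrix."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(M))]  # random start vector
    lam = 0.0
    for _ in range(iters):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(M, v)))  # Rayleigh quotient
    return lam

def leading_weight(snapshots):
    """Fraction of the trace carried by the largest eigenvalue."""
    M = second_moment_matrix(snapshots)
    trace = sum(M[i][i] for i in range(len(M)))
    return leading_eigenvalue(M) / trace

# Identical snapshots: all weight sits in one principal component.
ordered = [[1, -1, 1, -1]] * 3
# All four configurations of two spins: no correlations, weight spread evenly.
uncorrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
```

Evaluating such a quantity on snapshot sets taken at successive evolution times is one way to watch correlations build up in the data; for the full spectrum, a linear algebra library would be the more natural tool than power iteration.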