Fugu-MT: arXiv Paper Translations

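This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.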

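The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).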

The papers listed here have a publication date of 20241106.

Title	Authors	Abstract	Publication date / Translation date
# DeNetDM: Debiasing by Network Depth Modulation ( http://arxiv.org/abs/2403.19863v2 ) License: see link	Silpa Vadakkeeveetil Sreelatha, Adarsh Kappiyath, Abhra Chaudhuri, Anjan Dutta	When neural networks are trained on biased datasets, they tend to inadvertently learn spurious correlations, leading to challenges in achieving strong generalization and robustness. Current approaches to address such biases typically involve utilizing bias annotations, reweighting based on pseudo-bias labels, or enhancing diversity within bias-conflicting data points through augmentation techniques. We introduce DeNetDM, a novel debiasing method based on the observation that shallow neural networks prioritize learning core attributes, while deeper ones emphasize biases when tasked with acquiring distinct information. Using a training paradigm derived from Product of Experts, we create both biased and debiased branches with deep and shallow architectures and then distill knowledge to produce the target debiased model. Extensive experiments and analyses demonstrate that our approach outperforms current debiasing techniques, achieving a notable improvement of around 5% on three datasets, encompassing both synthetic and real-world data. Remarkably, DeNetDM accomplishes this without requiring annotations pertaining to bias labels or bias types, while still delivering performance on par with supervised counterparts. Furthermore, our approach effectively harnesses the diversity of bias-conflicting points within the data, surpassing previous methods and obviating the need for explicit augmentation-based methods to enhance the diversity of such bias-conflicting points. The source code will be available upon acceptance.	Translated: 2024-11-09 03:37:09 Published: 2024-11-06
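The DeNetDM abstract above mentions a training paradigm derived from Product of Experts. As a minimal sketch of that general idea only (not the authors' training code), two classifiers can be combined by multiplying their class probabilities and renormalizing, which is the same as adding their logits before the softmax. The function names and the example logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def product_of_experts(logits_a, logits_b):
    """Combine two experts: p(y) is proportional to p_a(y) * p_b(y).

    Multiplying probabilities corresponds to adding logits before the softmax.
    """
    return softmax([a + b for a, b in zip(logits_a, logits_b)])

# Hypothetical logits from a deep branch and a shallow branch (illustrative values).
deep_logits = [2.0, 0.5, -1.0]
shallow_logits = [0.1, 1.5, 0.3]
combined = product_of_experts(deep_logits, shallow_logits)

# Cross-check against the explicit normalized product of the two softmaxes.
raw = [p * q for p, q in zip(softmax(deep_logits), softmax(shallow_logits))]
explicit = [r / sum(raw) for r in raw]
```

How the combined distribution enters the loss, and how the branches are later distilled, is not specified in the abstract and is left out of this sketch.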
# The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis ( http://arxiv.org/abs/2404.01204v2 ) License: see link	Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Jie Fu, Ge Zhang	Uncovering early-stage metrics that reflect final model performance is one core principle for large-scale pretraining. The existing scaling law demonstrates the power-law correlation between pretraining loss and training FLOPs, which serves as an important indicator of the current training state for large language models. However, this principle only focuses on the model's compression properties on the training data, resulting in an inconsistency with the ability improvements on the downstream tasks. Some follow-up works attempted to extend the scaling law to more complex metrics (such as hyperparameters), but still lacked a comprehensive analysis of the dynamic differences among various capabilities during pretraining. To address the aforementioned limitations, this paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints. Through this analysis, we confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes, up to 67 billion parameters. In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints. This initiative offers valuable resources to the research community and facilitates the verification and exploration of LLM pretraining by open-source researchers. Besides, we provide empirical summaries, including performance comparisons of different models and capabilities, and tuition of key metrics for different training phases. Based on these findings, we provide a more user-friendly strategy for evaluating the optimization state, offering guidance for establishing a stable pretraining process.	Translated: 2024-11-09 03:37:09 Published: 2024-11-06
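The abstract above refers to a power-law relation between pretraining loss and training compute. As a hedged illustration of what such a relation looks like in practice, the sketch below assumes the form L = a * C^(-b) and recovers a and b by ordinary least squares in log-log space. The data points are synthetic, generated from made-up constants, and are not results from the paper.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) via least squares on (log compute, log loss)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic, noise-free points from assumed constants a=10.0, b=0.05 (illustrative only).
true_a, true_b = 10.0, 0.05
compute = [1e18, 1e19, 1e20, 1e21, 1e22]
loss = [true_a * c ** (-true_b) for c in compute]

a_hat, b_hat = fit_power_law(compute, loss)
```

On real training logs the points are noisy, so the fitted exponent would only be an estimate; the abstract's point is that such a loss curve alone does not track downstream capability.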
# Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models ( http://arxiv.org/abs/2407.06004v3 ) License: see link	Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim	While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors (perception inference and perception-to-belief inference) in LLMs. We introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, respectively. Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference. Experimental results demonstrate that PercepToM significantly enhances LLMs' performance, especially in false belief scenarios.	Translated: 2024-11-08 23:13:33 Published: 2024-11-06
# Interpretable Differential Diagnosis with Dual-Inference Large Language Models ( http://arxiv.org/abs/2407.07330v2 ) License: see link	Shuang Zhou, Mingquan Lin, Sirui Ding, Jiashuo Wang, Genevieve B. Melton, James Zou, Rui Zhang	Automatic differential diagnosis (DDx) is an essential medical task that generates a list of potential diseases as differentials based on patient symptom descriptions. In practice, interpreting these differential diagnoses yields significant value but remains under-explored. Given the powerful capabilities of large language models (LLMs), we investigated using LLMs for interpretable DDx. Specifically, we curated the first DDx dataset with expert-derived interpretation on 570 clinical notes. Besides, we proposed Dual-Inf, a novel framework that enabled LLMs to conduct bidirectional inference (i.e., from symptoms to diagnoses and vice versa) for DDx interpretation. Both human and automated evaluation validated its efficacy in predicting and elucidating differentials across four base LLMs. In addition, Dual-Inf could reduce interpretation errors and hold promise for rare disease explanations. To the best of our knowledge, it is the first work that customizes LLMs for DDx explanation and comprehensively evaluates their interpretation performance. Overall, our study bridges a critical gap in DDx interpretation and enhances clinical decision-making.	Translated: 2024-11-08 22:40:08 Published: 2024-11-06
# The Selective G-Bispectrum and its Inversion: Applications to G-Invariant Networks ( http://arxiv.org/abs/2407.07655v2 ) License: see link	Simon Mataigne, Johan Mathe, Sophia Sanborn, Christopher Hillar, Nina Miolane	An important problem in signal processing and deep learning is to achieve invariance to nuisance factors not relevant for the task. Since many of these factors are describable as the action of a group $G$ (e.g. rotations, translations, scalings), we want methods to be $G$-invariant. The $G$-Bispectrum extracts every characteristic of a given signal up to group action: for example, the shape of an object in an image, but not its orientation. Consequently, the $G$-Bispectrum has been incorporated into deep neural network architectures as a computational primitive for $G$-invariance, akin to a pooling mechanism, but with greater selectivity and robustness. However, the computational cost of the $G$-Bispectrum ($\mathcal{O}(\|G\|^2)$, with $\|G\|$ the size of the group) has limited its widespread adoption. Here, we show that the $G$-Bispectrum computation contains redundancies that can be reduced into a selective $G$-Bispectrum with $\mathcal{O}(\|G\|)$ complexity. We prove desirable mathematical properties of the selective $G$-Bispectrum and demonstrate how its integration in neural networks enhances accuracy and robustness compared to traditional approaches, while enjoying considerable speed-ups compared to the full $G$-Bispectrum.	Translated: 2024-11-08 22:40:08 Published: 2024-11-06
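To make the invariance and the cost gap in the abstract above concrete, here is a small sketch for the simplest case, the cyclic group Z_n acting on a 1-D signal by circular shifts. The full bispectrum B[a][b] = F[a] * F[b] * conj(F[a+b mod n]) has n^2 entries and is unchanged by any shift. The reduced set in `selective_bispectrum` (B[0][0] plus the row B[1][k]) is an illustrative O(n) selection chosen for this sketch; it should not be read as the exact coefficient set defined in the paper. All function names are hypothetical.

```python
import cmath

def dft(signal):
    """Discrete Fourier transform on the cyclic group Z_n (plain O(n^2) sum)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def full_bispectrum(signal):
    """All n*n coefficients B[a][b] = F[a] * F[b] * conj(F[(a + b) mod n])."""
    n = len(signal)
    F = dft(signal)
    return [[F[a] * F[b] * F[(a + b) % n].conjugate() for b in range(n)] for a in range(n)]

def selective_bispectrum(signal):
    """An O(n) subset: B[0][0] followed by the row B[1][k], k = 0..n-1 (assumed selection)."""
    n = len(signal)
    F = dft(signal)
    coeffs = [F[0] * F[0] * F[0].conjugate()]
    coeffs += [F[1] * F[k] * F[(1 + k) % n].conjugate() for k in range(n)]
    return coeffs

def cyclic_shift(signal, s):
    """Group action of Z_n: move every sample s positions to the right, wrapping around."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)

signal = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1, 0.0, 0.9]
shifted = cyclic_shift(signal, 3)

# Largest change in any full-bispectrum coefficient caused by the shift.
max_diff_full = max(
    abs(u - v)
    for row_u, row_v in zip(full_bispectrum(signal), full_bispectrum(shifted))
    for u, v in zip(row_u, row_v)
)
# Same check for the reduced coefficient set.
max_diff_selective = max(
    abs(u - v) for u, v in zip(selective_bispectrum(signal), selective_bispectrum(shifted))
)
```

The shift multiplies F[k] by a phase that cancels in the triple product, which is why both differences are zero up to floating-point error. For n = 8 the full version stores 64 coefficients and the reduced one stores 9.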
# Self-supervised 3D Point Cloud Completion via Multi-view Adversarial Learning ( http://arxiv.org/abs/2407.09786v2 ) License: see link	Lintai Wu, Xianjing Cheng, Yong Xu, Huanqiang Zeng, Junhui Hou	In real-world scenarios, scanned point clouds are often incomplete due to occlusion issues. The task of self-supervised point cloud completion involves reconstructing missing regions of these incomplete objects without the supervision of complete ground truth. Current self-supervised methods either rely on multiple views of partial observations for supervision or overlook the intrinsic geometric similarity that can be identified and utilized from the given partial point clouds. In this paper, we propose MAL-SPC, a framework that effectively leverages both object-level and category-specific geometric similarities to complete missing structures. Our MAL-SPC does not require any 3D complete supervision and only necessitates a single partial point cloud for each object. Specifically, we first introduce a Pattern Retrieval Network to retrieve similar position and curvature patterns between the partial input and the predicted shape, then leverage these similarities to densify and refine the reconstructed results. Additionally, we render the reconstructed complete shape into multi-view depth maps and design an adversarial learning module to learn the geometry of the target shape from category-specific single-view depth images. To achieve anisotropic rendering, we design a density-aware radius estimation algorithm to improve the quality of the rendered images. Our MAL-SPC yields the best results compared to current state-of-the-art methods. We will make the source code publicly available at https://github.com/ltwu6/malspc	Translated: 2024-11-08 21:54:45 Published: 2024-11-06
# CIBench: Evaluating Your LLMs with a Code Interpreter Plugin ( http://arxiv.org/abs/2407.10499v3 ) License: see link	Chuyu Zhang, Songyang Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen	While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation framework includes an evaluation dataset and two evaluation modes. The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions. The two evaluation modes assess LLMs' ability with and without human assistance. We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.	Translated: 2024-11-08 21:32:38 Published: 2024-11-06
# No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations ( http://arxiv.org/abs/2407.10964v2 ) License: see link	Walter Simoncini, Spyros Gidaris, Andrei Bursuc, Yuki M. Asano	This paper introduces FUNGI, Features from UNsupervised GradIents, a method to enhance the features of transformer encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also show that using FUNGI features can benefit linear classification, clustering and image retrieval, and that they significantly improve the retrieval-based in-context scene understanding abilities of pretrained models, for example improving upon DINO by +17% for semantic segmentation - without any training.	Translated: 2024-11-08 21:32:38 Published: 2024-11-06
# Universal Sound Separation with Self-Supervised Audio Masked Autoencoder ( http://arxiv.org/abs/2407.11745v2 ) License: see link	Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang	Universal sound separation (USS) is the task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its separation performance. We employ two strategies to utilize SSL embeddings: freezing or updating the parameters of A-MAE during fine-tuning. The SSL embeddings are concatenated with the short-time Fourier transform (STFT) to serve as input features for the separation model. We evaluate our methods on the AudioSet dataset, and the experimental results indicate that the proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.	Translated: 2024-11-08 20:59:00 Published: 2024-11-06
# GPT-4V Cannot Generate Radiology Reports Yet ( http://arxiv.org/abs/2407.12176v2 ) License: see link	Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan	GPT-4V's purported strong multimodal abilities have raised interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (ground-truth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which ground-truth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given ground-truth conditions in report synthesis, its generated reports are less correct and less natural-sounding than those of a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.	Translated: 2024-11-08 20:48:00 Published: 2024-11-06
# GPT-4V Cannot Generate Radiology Reports Yet ( http://arxiv.org/abs/2407.12176v3 ) License: see link	Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan	GPT-4V's purported strong multimodal abilities have raised interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (ground-truth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which ground-truth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given ground-truth conditions in report synthesis, its generated reports are less correct and less natural-sounding than those of a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.	Translated: 2024-11-08 20:48:00 Published: 2024-11-06
# Visuospatial navigation without distance, prediction, or maps ( http://arxiv.org/abs/2407.13535v2 ) License: see link	Patrick Govoni, Pawel Romanczuk	Navigation is controlled by at least two partially dissociable, concurrently developed systems in the brain. The cognitive map informs an organism of its location, bearing, and distances between environmental features, enabling shortcuts. Response-based navigation, on the other hand, the process of combining percept-action pairs into routes, is regarded as inaccurate and inflexible, ultimately subserving map-based representation. As such, navigation models tend to assume the primacy of maps, top-down constructed via predictive control and distance perception, while neglecting response-based strategies. Here we show the sufficiency of a minimal feedforward framework in a classic visual navigation task. Our agents, directly translating visual perception to movement, navigate to a hidden goal in an open field, an environment often assumed to require a map-based representation. While visual distance enables direct trajectories to the goal, two distinct algorithms develop to robustly navigate using visual angles alone. Each of the three confers unique contextual tradeoffs as well as aligns with movement behavior observed in rodents, insects, fish, and sperm cells, suggesting the widespread significance of response-based strategies. We advocate further study of navigation from the bottom up without assuming online access to computationally expensive top-down representations, which may better explain behavior under energetic or attentional constraints.	Translated: 2024-11-08 20:14:30 Published: 2024-11-06
# Persistent de Rham-Hodge Laplacians in Eulerian representation for manifold topological learning ( http://arxiv.org/abs/2408.00220v2 ) License: see link	Zhe Su, Yiying Tong, Guo-Wei Wei	Recently, topological data analysis has become a trending topic in data science and engineering. However, the key technique of topological data analysis, i.e., persistent homology, is defined on point cloud data, which does not work directly for data on manifolds. Although earlier evolutionary de Rham-Hodge theory deals with data on manifolds, it is inconvenient for machine learning applications because of the numerical inconsistency caused by remeshing the involved manifolds in the Lagrangian representation. In this work, we introduce persistent de Rham-Hodge Laplacian, or persistent Hodge Laplacian (PHL) as an abbreviation, for manifold topological learning. Our PHLs are constructed in the Eulerian representation via structure-preserving Cartesian grids, avoiding the numerical inconsistency over the multiscale manifolds. To facilitate the manifold topological learning, we propose a persistent Hodge Laplacian learning algorithm for data on manifolds or volumetric data. As a proof-of-principle application of the proposed manifold topological learning model, we consider the prediction of protein-ligand binding affinities with two benchmark datasets. Our numerical experiments highlight the power and promise of the proposed method.	Translated: 2024-11-08 20:01:00 Published: 2024-11-06
# Beyond Preferences in AI Alignment ( http://arxiv.org/abs/2408.16984v2 ) License: see link	Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton	The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values.	Translated: 2024-11-08 19:50:01 Published: 2024-11-06
# Improving Context-Aware Preference Modeling for Language Models ( http://arxiv.org/abs/2407.14916v2 ) License: see link	Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni	While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B on tested datasets, and (3) investigate the value of context-aware preference modeling.	Translated: 2024-11-08 19:27:32 Published: 2024-11-06
# マルチスポットホログラフィーツイーザーのフィードバック強度等化アルゴリズム Feedback Intensity Equalization Algorithm for Multi-Spots Holographic Tweezer ( http://arxiv.org/abs/2407.17049v2 ) ライセンス: Link先を確認	Shaoxiong Wang, Yifei Hu, Yaoting Zhou, Peng Lan, Heng Shen, Zhongxiao Xu,	(参考訳) 高い調整性のおかげで、ホログラフィックツイーザーアレイは任意の幾何配置の原子配列を作るのに最適な選択であることが証明されている。ホログラフィックツイーザーアレイ実験では、通常、空間光変調器(SLM)によって生成された光ツイーザーが静的ツイーザーアレイとして使用される。交流(AC)シュタルクシフトにより、トラップ間の強度差は異なる光シフトを引き起こす。したがって、強度等化の最適化は、単一原子からなる多体系において非常に重要である。本稿では,強度等化アルゴリズムの研究について報告する。このアルゴリズムにより、ツイーザーの数が1000を超える場合でも、ツイーザーの均一性が96%を超える。解析により、さらなる均一性には光学系のさらなる最適化が必要であることが示された。強度等化アルゴリズムの実現は、単一原子配列に基づく多体実験において非常に重要である。 Thanks to their high degree of adjustability, holographic tweezer arrays have proven to be the best choice for creating atomic arrays of arbitrary geometry. In holographic tweezer array experiments, optical tweezers generated by a spatial light modulator (SLM) are usually used as a static tweezer array. Due to the alternating-current (AC) Stark shift, intensity differences between traps cause different light shifts. The optimization of intensity equalization is therefore very important in many-body systems consisting of single atoms. Here we report a study of an intensity equalization algorithm. With this algorithm, the uniformity of the tweezers can exceed 96% when the number of tweezers is larger than 1000. Our analysis shows that further uniformity requires further optimization of the optical system. The realization of the intensity equalization algorithm is of great significance to many-body experiments based on single-atom arrays.	翻訳日:2024-11-08 15:23:20 公開日:2024-11-06
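The abstract above does not spell out the feedback rule, so the following is only a minimal sketch of one common form of feedback equalization, under stated assumptions: each trap has a hypothetical optical efficiency that is unknown to the algorithm, a camera measurement is simulated as efficiency times target weight, and the uniformity metric 1 - (max - min)/(max + min) is an assumed definition rather than the authors' one. The names `equalize`, `uniformity`, `gain`, and `rounds` are illustrative.

```python
def equalize(efficiencies, gain=0.5, rounds=20):
    """Iteratively reweight trap targets until measured intensities are uniform.

    efficiencies: hypothetical per-trap optical efficiencies. They are used
    only to simulate the camera measurement; the update rule never reads them.
    """
    n = len(efficiencies)
    weights = [1.0] * n
    for _ in range(rounds):
        # Simulated measurement of each trap's intensity.
        measured = [e * w for e, w in zip(efficiencies, weights)]
        mean = sum(measured) / n
        # Boost dim traps and attenuate bright ones; gain < 1 damps the step.
        weights = [w * (mean / m) ** gain for w, m in zip(weights, measured)]
    return [e * w for e, w in zip(efficiencies, weights)]


def uniformity(intensities):
    """Assumed metric: 1 - (max - min) / (max + min)."""
    hi, lo = max(intensities), min(intensities)
    return 1.0 - (hi - lo) / (hi + lo)


before = [0.6, 0.8, 1.0, 1.2]   # made-up efficiencies for illustration
after = equalize(before)
```

In this toy model each round shrinks the spread of the log-intensities by a factor of (1 - gain), which is why a damped gain still converges quickly. A real optical system adds measurement noise and crosstalk between spots, which this sketch ignores.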
# 視覚表現学習のためのマルチラベルクラスタ識別 Multi-label Cluster Discrimination for Visual Representation Learning ( http://arxiv.org/abs/2407.17331v2 ) ライセンス: Link先を確認	Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng,	(参考訳) コントラスト言語画像事前学習(CLIP)は、画像テキストのコントラスト学習によって強化された優れた特徴表現により、様々なタスクで成功した。しかし、CLIPが使用するインスタンス識別手法では、トレーニングデータのセマンティック構造をほとんどエンコードできない。この制限に対処するため、反復的なクラスタ割り当てと分類によってクラスタ識別が提案されている。しかしながら、ほとんどのクラスタ識別アプローチは、画像内の複数ラベル信号を無視して、各画像に対して1つの擬似ラベルを定義するだけである。本稿では,MLCDと呼ばれる新しいマルチラベルクラスタ識別手法を提案する。クラスタリングのステップでは、まず大規模なLAION-400Mデータセットを、オフザシェルフの埋め込み機能に基づいて100万のセンタにクラスタ化します。自然画像には複数の視覚的対象や属性が頻繁に含まれており、補助的なクラスラベルとして複数の最も近い中心を選択する。識別段階において、我々は、正のクラスと負のクラスから損失を優雅に分離し、決定境界の曖昧さを軽減する、新しい多ラベル分類損失を設計する。モデルと事前学習データセットの異なるスケールの実験により,提案手法の有効性を検証した。実験の結果,線形プローブ,ゼロショット分類,画像テキスト検索など,複数の下流タスクにおける最先端性能が得られた。コードとモデルはhttps://github.com/deepglint/unicom でリリースされた。 Contrastive Language Image Pre-training (CLIP) has recently demonstrated success across various tasks due to superior feature representation empowered by image-text contrastive learning. However, the instance discrimination method used by CLIP can hardly encode the semantic structure of training data. To handle this limitation, cluster discrimination has been proposed through iterative cluster assignment and classification. Nevertheless, most cluster discrimination approaches only define a single pseudo-label for each image, neglecting multi-label signals in the image. In this paper, we propose a novel Multi-Label Cluster Discrimination method named MLCD to enhance representation learning. In the clustering step, we first cluster the large-scale LAION-400M dataset into one million centers based on off-the-shelf embedding features. Considering that natural images frequently contain multiple visual objects or attributes, we select the multiple closest centers as auxiliary class labels. 
In the discrimination step, we design a novel multi-label classification loss, which elegantly separates losses from positive classes and negative classes, and alleviates ambiguity on decision boundary. We validate the proposed multi-label cluster discrimination method with experiments on different scales of models and pre-training datasets. Experimental results show that our method achieves state-of-the-art performance on multiple downstream tasks including linear probe, zero-shot classification, and image-text retrieval. Code and models have been released at https://github.com/deepglint/unicom .	翻訳日:2024-11-08 15:23:20 公開日:2024-11-06
# BetterDepth:ゼロショット単眼深度推定のためのプラグアンドプレイ拡散精錬器 BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation ( http://arxiv.org/abs/2407.17952v2 ) ライセンス: Link先を確認	Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, Christopher Schroers,	(参考訳) 大規模データセット上でのトレーニングにより、ゼロショット単眼深度推定(MDE)手法は、野生では堅牢な性能を示すが、詳細が不十分な場合が多い。拡散に基づく最近のMDE手法は、詳細を抽出する優れた能力を示すが、より多様な3Dデータに基づいて訓練された、幾何学的に複雑なシーンに苦しむ。両世界の相補的な利点を活用するため,細部を捉えながら幾何的に正しいアフィン不変のMDEを実現するためのBetterDepthを提案する。具体的には、BetterDepthは、事前訓練されたMDEモデルからの予測を深度条件付けとして、大域的な深度レイアウトを適切にキャプチャし、入力画像に基づいて詳細を反復的に洗練する条件拡散ベースの精錬機である。このようなリファインダのトレーニングのために,細かなシーンの詳細を学習しながら,BetterDepthが奥行き条件に忠実であることを保証するために,グローバルな事前調整と局所パッチマスキング手法を提案する。小規模の合成データセットの効率的なトレーニングにより、BetterDepthは、さまざまな公開データセットと、その中のシーンで、最先端のゼロショットMDEパフォーマンスを達成する。さらに、BetterDepthはプラグアンドプレイ方式で他のMDEモデルの性能を向上させることができる。 By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods show robust performance in the wild but often suffer from insufficient detail. Although recent diffusion-based MDE approaches exhibit a superior ability to extract details, they struggle in geometrically complex scenes that challenge their geometry prior, trained on less diverse 3D data. To leverage the complementary merits of both worlds, we propose BetterDepth to achieve geometrically correct affine-invariant MDE while capturing fine details. Specifically, BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning, in which the global depth layout is well-captured, and iteratively refines details based on the input image. For the training of such a refiner, we propose global pre-alignment and local patch masking methods to ensure BetterDepth remains faithful to the depth conditioning while learning to add fine-grained scene details. 
With efficient training on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and on in-the-wild scenes. Moreover, BetterDepth can improve the performance of other MDE models in a plug-and-play manner without further re-training.	翻訳日:2024-11-08 15:01:09 公開日:2024-11-06
# 効率的かつ効果的に:交通分類のための平文と暗号化テキストのバランスをとるための2段階的アプローチ Efficiently and Effectively: A Two-stage Approach to Balance Plaintext and Encrypted Text for Traffic Classification ( http://arxiv.org/abs/2407.19687v3 ) ライセンス: Link先を確認	Wei Peng, Lei Cui, Wei Cai, Zhenquan Ding, Zhiyu Hao, Xiaochun Yun,	(参考訳) 暗号化されたトラフィック分類は、暗号化されたネットワークトラフィックに関連するアプリケーションまたはサービスを特定するタスクである。このタスクの効果的なアプローチは、ディープラーニングを使って生のトラフィックバイトを直接エンコードし、分類のための機能(バイトベースモデル)を自動的に抽出することである。しかし、現在のバイトベースのモデルでは、平文や暗号化されたテキストのいずれでも、平文や暗号化されたテキストが下流タスクに与える影響を無視して、自動的な特徴抽出のために生のトラフィックバイトを入力している。さらに、これらのモデルは主に分類精度の改善に重点を置いており、モデルの効率にはほとんど重点を置いていない。本稿では,原文と暗号化されたテキストがモデルの有効性と効率に与える影響を初めて分析する。そこで本研究では,トラフィック分類における平文と暗号化テキストのトレードオフを両立させる2段階の手法を提案する。具体的には、提案したDPCセレクタを用いて、Plainテキストが正確に分類(DPC)できるかどうかを決定する。この段階では、平文で分類できるサンプルを素早く特定し、平文で明示的なバイト機能を活用してモデルの効率を高める。ステージ2は、ステージ1の結果を適応的に分類することを目的としている。この段階では、平文だけで分類できないサンプルに対して暗号化されたテキスト情報を組み込み、トラフィック分類タスクにおけるモデルの有効性を保証する。 2つのデータセットに対する実験により,提案モデルが有効性と効率の両面で最先端の結果が得られることを示した。 Encrypted traffic classification is the task of identifying the application or service associated with encrypted network traffic. One effective approach for this task is to use deep learning methods to encode the raw traffic bytes directly and automatically extract features for classification (byte-based models). However, current byte-based models input raw traffic bytes, whether plaintext or encrypted text, for automated feature extraction, neglecting the distinct impacts of plaintext and encrypted text on downstream tasks. Additionally, these models primarily focus on improving classification accuracy, with little emphasis on the efficiency of models. In this paper, for the first time, we analyze the impact of plaintext and encrypted text on the model's effectiveness and efficiency. Based on our observations and findings, we propose a two-phase approach to balance the trade-off between plaintext and encrypted text in traffic classification. 
Specifically, stage one Determines whether the Plaintext alone is enough for a sample to be accurately Classified (DPC), using the proposed DPC Selector. This stage quickly identifies samples that can be classified using plaintext, leveraging explicit byte features in plaintext to enhance the model's efficiency. Stage two adaptively makes a classification based on the result from stage one. This stage incorporates encrypted text information for samples that cannot be classified using plaintext alone, ensuring the model's effectiveness on traffic classification tasks. Experiments on two datasets demonstrate that our proposed model achieves state-of-the-art results in both effectiveness and efficiency.	翻訳日:2024-11-08 14:27:29 公開日:2024-11-06
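The two stages described above amount to a cascade: a cheap check routes easy samples to a plaintext-only model, and the rest fall through to a model that also reads the encrypted bytes. The sketch below shows only that control flow. Every function and field name in it is a hypothetical stand-in, and the toy selector and models are not the paper's DPC Selector or classifiers.

```python
def two_stage_classify(sample, selector, plaintext_model, combined_model):
    """Classify one traffic sample with a cheap path and a fallback path.

    All four arguments are hypothetical stand-ins: `sample` is a dict with
    "plaintext" and "encrypted" byte strings, `selector` returns True when
    plaintext alone is judged sufficient, and the two models return a label.
    """
    if selector(sample["plaintext"]):
        # Stage one: plaintext is enough, so skip the encrypted bytes entirely.
        return plaintext_model(sample["plaintext"])
    # Stage two: fall back to a model that also reads the encrypted bytes.
    return combined_model(sample["plaintext"], sample["encrypted"])


# Toy stand-ins, for illustration only.
def toy_selector(plaintext):
    return b"example.com" in plaintext


def toy_plaintext_model(plaintext):
    return "web"


def toy_combined_model(plaintext, encrypted):
    return "vpn" if len(encrypted) > 4 else "other"


easy = {"plaintext": b"SNI=example.com", "encrypted": b"\x17\x03"}
hard = {"plaintext": b"", "encrypted": b"\x17\x03\x03\x00\x40\x9a"}
labels = [
    two_stage_classify(s, toy_selector, toy_plaintext_model, toy_combined_model)
    for s in (easy, hard)
]
```

The efficiency gain in such a cascade comes from how often the first branch is taken, so the selector's accuracy matters as much as either classifier's.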
# Virchow2: 病理学における自己監督型混合拡大モデルのスケーリング Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology ( http://arxiv.org/abs/2408.00738v3 ) ライセンス: Link先を確認	Eric Zimmermann, Eugene Vorontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi, Thomas Fuchs, Nicolo Fusi, Siqi Liu, Kristen Severson,	(参考訳) 基礎モデルは、計算病理学の応用のために急速に開発されている。しかし、データスケールと多様性、モデルサイズ、トレーニングアルゴリズムなど、ダウンストリームのパフォーマンスにおいて、どの要素がもっとも重要かは、まだ明らかな疑問である。本研究では,病理学に適したアルゴリズム的修正を提案するとともに,データサイズとモデルサイズの両方をスケールした結果を,両次元の先行研究を超越した結果として提示する。 6億3200万パラメータのビジョントランスフォーマーであるVirchow2、19億パラメータのビジョントランスフォーマーであるVirchow2G、Virchow2Gを蒸留した2200万パラメータのVirchow2G Miniの3つの新しいモデルを紹介する。いずれも、多様な組織、由来機関、染色を含む310万枚の病理組織学的全スライド画像で訓練されている。上位の競合モデルと比較して,12のタイルレベルのタスクで最先端のパフォーマンスを実現する。以上の結果から,データ多様性とドメイン固有の手法は,パラメータ数のみをスケールするモデルよりも優れているが,平均的には,ドメイン固有の手法,データスケール,モデルスケールの組み合わせによるパフォーマンス上のメリットが期待できる。 Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors are most important for downstream performance, with data scale and diversity, model size, and training algorithm all playing a role. In this work, we propose algorithmic modifications, tailored for pathology, and we present the result of scaling both data and model size, surpassing previous studies in both dimensions. We introduce three new models: Virchow2, a 632 million parameter vision transformer, Virchow2G, a 1.9 billion parameter vision transformer, and Virchow2G Mini, a 22 million parameter distillation of Virchow2G, each trained with 3.1 million histopathology whole slide images, with diverse tissues, originating institutions, and stains. We achieve state-of-the-art performance on 12 tile-level tasks, as compared to the top performing competing models. 
Our results suggest that data diversity and domain-specific methods can outperform models that only scale in the number of parameters, but, on average, performance benefits from the combination of domain-specific methods, data scale, and model scale.	翻訳日:2024-11-08 13:29:21 公開日:2024-11-06
# 半導体Si-SiGeスピンビットにおけるフォノン誘起交換ゲート不均一性 Phonon-Induced Exchange Gate Infidelities in Semiconducting Si-SiGe Spin Qubits ( http://arxiv.org/abs/2408.02742v2 ) ライセンス: Link先を確認	Matthew Brooks, Rex Lundgren, Charles Tahan,	(参考訳) 半導体スピン量子ビット間のスピン-スピン交換相互作用は、高速な単一および2量子ゲートを可能にする。交換の間、クォービットと周囲のフォノン浴のカップリングは、結果として生じるゲートに誤りを引き起こす可能性がある。ここでは、有限温度フォノン浴に結合したSi-SiGeヘテロ構造における半導体二重量子ドットスピン量子ビットとの交換操作の忠実さを考察する。マスター方程式を用いて、各スピンフォノン結合項の孤立効果と符号化量子ビット演算の漏れ誤差を解くことができる。温度が上昇するにつれて、2つの電子スピン状態のフォノン誘起摂動に起因する一次誤差の源となる部分と、励起軌道状態へのフォノン誘起結合が支配的誤差となる部分との交差が観察される。さらに, パルス形状と長さの単純なトレードオフにより, ゲート操作時のスピンフォノン誘起誤差に対して, 最大で1桁の堅牢性を向上できることが示されている。以上の結果から,200-300mK以内の高温では,交換ゲートの動作はバルクフォノンで制限されていないことが示唆された。これは最近の実験と一致している。 Spin-spin exchange interactions between semiconductor spin qubits allow for fast single and two-qubit gates. During exchange, coupling of the qubits to a surrounding phonon bath may cause errors in the resulting gate. Here, the fidelities of exchange operations with semiconductor double quantum dot spin qubits in a Si-SiGe heterostructure coupled to a finite temperature phonon bath are considered. By employing a master equation approach, the isolated effect of each spin-phonon coupling term may be resolved, as well as leakage errors of encoded qubit operations. As the temperature is increased, a crossover is observed from where the primary source of error is due to a phonon induced perturbation of the two electron spin states, to one where the phonon induced coupling to an excited orbital state becomes the dominant error. Additionally, it is shown that a simple trade-off in pulse shape and length can improve robustness to spin-phonon induced errors during gate operations by up to an order of magnitude. Our results suggest that for elevated temperatures within 200-300 mK, exchange gate operations are not currently limited by bulk phonons. This is consistent with recent experiments.	翻訳日:2024-11-08 12:55:50 公開日:2024-11-06
# コードのための大規模言語モデルのホットフィックス Hotfixing Large Language Models for Code ( http://arxiv.org/abs/2408.05727v3 ) ライセンス: Link先を確認	Zhou Yang, David Lo,	(参考訳) コードのための大規模言語モデル(LLM4Code)は開発者のワークフローの不可欠な部分となり、コード補完や生成などのタスクを支援している。しかし、これらのモデルは、バグの多いコードを含む大量のソースコードを広範囲にトレーニングしたために、バグの多いコードを生成するなど、リリース後に望ましくない振る舞いを示す。トレーニングデータ(通常、オープンソースソフトウェアから来る)は進化を続けており、例えば、開発者はバグの多いコードを修正します。しかしながら、LLM4Codeの望ましくない振る舞いを軽減するためにこのような進化を適用することは、簡単ではない。このことは、LLM4Codeの望ましくない振る舞いを最小限の負の効果で効果的かつ効率的に緩和する、LLM4Codeのホットフィックスの概念を提案する動機である。本稿では,LLM4Codeをホットフィックスすることで,バグの少ないコードとより固定的なコードを生成することに焦点を当てる。私たちは、人気のあるCodeGenファミリのモデルが頻繁にバグのあるコードを生成することを実証することから始めます。そこで,本研究では,(1)所望の動作を学習し,(2)望ましくない動作を学習し,(3)他のコードの知識を保持する,という3つの学習目標を定義した。モデルをホットフィックスするための4つの異なる微調整手法を評価し,以下の知見を得た。 LoRA(低ランク適応)を用いてこれら3つの学習目標を同時に最適化することは、モデルの振る舞いに効果的に影響を及ぼす。具体的には、固定コードの生成を最大108.42%増加させ、バグコードの生成を最大50.47%減少させる。統計テストでは、HumanEvalベンチマークにおいてホットフィックスがモデルの機能的正しさに悪影響を及ぼさないことが確認された。さらに、メールアドレスの露出を99.30%減らし、ホットフィックスの一般化性を評価する。 Large Language Models for Code (LLM4Code) have become an integral part of developers' workflows, assisting with tasks such as code completion and generation. However, these models are found to exhibit undesired behaviors after their release, like generating buggy code, due to their extensive training on vast amounts of source code that contain such buggy code. The training data (usually coming from open-source software) keeps evolving, e.g., developers fix the buggy code. However, adapting such evolution to mitigate LLM4Code's undesired behaviors is non-trivial, as retraining models on the updated dataset usually takes much time and resources. This motivates us to propose the concept of hotfixing LLM4Code, mitigating LLM4Code's undesired behaviors effectively and efficiently with minimal negative effects. This paper mainly focuses on hotfixing LLM4Code to make them generate less buggy code and more fixed code. We begin by demonstrating that models from the popular CodeGen family frequently generate buggy code. 
Then, we define three learning objectives in hotfixing and design multiple loss functions for each objective: (1) learn the desired behaviors, (2) unlearn the undesired behaviors, and (3) retain knowledge of other code. We evaluate four different fine-tuning techniques for hotfixing the models and gain the following insights. Optimizing these three learning goals together, using LoRA (low-rank adaptation), effectively influences the model's behavior. Specifically, it increases the generation of fixed code by up to 108.42% and decreases the generation of buggy code by up to 50.47%. Statistical tests confirm that hotfixing does not significantly affect the models' functional correctness on the HumanEval benchmark. Additionally, we evaluate the generalizability of hotfixing by reducing the exposure of email addresses by 99.30%.	翻訳日:2024-11-08 11:49:24 公開日:2024-11-06
# コードのための大規模言語モデルのホットフィックス Hotfixing Large Language Models for Code ( http://arxiv.org/abs/2408.05727v4 ) ライセンス: Link先を確認	Zhou Yang, David Lo,	(参考訳) コードのための大規模言語モデル(LLM4Code)は開発者のワークフローの不可欠な部分となり、コード補完や生成などのタスクを支援している。しかし、これらのモデルは、バグの多いコードを含む大量のソースコードを広範囲にトレーニングしたために、バグの多いコードを生成するなど、リリース後に望ましくない振る舞いを示す。トレーニングデータ(通常、オープンソースソフトウェアから来る)は進化を続けており、例えば、開発者はバグの多いコードを修正します。しかしながら、LLM4Codeの望ましくない振る舞いを軽減するためにこのような進化を適用することは、簡単ではない。このことは、LLM4Codeの望ましくない振る舞いを最小限の負の効果で効果的かつ効率的に緩和する、LLM4Codeのホットフィックスの概念を提案する動機である。本稿では,LLM4Codeをホットフィックスすることで,バグの少ないコードとより固定的なコードを生成することに焦点を当てる。私たちは、人気のあるCodeGenファミリのモデルが頻繁にバグのあるコードを生成することを実証することから始めます。そこで,本研究では,(1)所望の動作を学習し,(2)望ましくない動作を学習し,(3)他のコードの知識を保持する,という3つの学習目標を定義した。モデルをホットフィックスするための4つの異なる微調整手法を評価し,以下の知見を得た。 LoRA(低ランク適応)を用いてこれら3つの学習目標を同時に最適化することは、モデルの振る舞いに効果的に影響を及ぼす。具体的には、固定コードの生成を最大108.42%増加させ、バグコードの生成を最大50.47%減少させる。統計テストでは、HumanEvalベンチマークにおいてホットフィックスがモデルの機能的正しさに悪影響を及ぼさないことが確認された。さらに、メールアドレスの露出を99.30%減らし、ホットフィックスの一般化性を評価する。 Large Language Models for Code (LLM4Code) have become an integral part of developers' workflows, assisting with tasks such as code completion and generation. However, these models are found to exhibit undesired behaviors after their release, like generating buggy code, due to their extensive training on vast amounts of source code that contain such buggy code. The training data (usually coming from open-source software) keeps evolving, e.g., developers fix the buggy code. However, adapting such evolution to mitigate LLM4Code's undesired behaviors is non-trivial, as retraining models on the updated dataset usually takes much time and resources. This motivates us to propose the concept of hotfixing LLM4Code, mitigating LLM4Code's undesired behaviors effectively and efficiently with minimal negative effects. This paper mainly focuses on hotfixing LLM4Code to make them generate less buggy code and more fixed code. We begin by demonstrating that models from the popular CodeGen family frequently generate buggy code. 
Then, we define three learning objectives in hotfixing and design multiple loss functions for each objective: (1) learn the desired behaviors, (2) unlearn the undesired behaviors, and (3) retain knowledge of other code. We evaluate four different fine-tuning techniques for hotfixing the models and gain the following insights. Optimizing these three learning goals together, using LoRA (low-rank adaptation), effectively influences the model's behavior. Specifically, it increases the generation of fixed code by up to 108.42% and decreases the generation of buggy code by up to 50.47%. Statistical tests confirm that hotfixing does not significantly affect the models' functional correctness on the HumanEval benchmark. Additionally, we evaluate the generalizability of hotfixing by reducing the exposure of email addresses by 99.30%.	翻訳日:2024-11-08 11:49:24 公開日:2024-11-06
# 量子的ビットによる量子情報 Quantum information with quantum-like bits ( http://arxiv.org/abs/2408.06485v2 ) ライセンス: Link先を確認	Graziano Amati, Gregory D. Scholes,	(参考訳) これまでの研究で我々は、例えば発振器のような大型で複雑な古典的システムに、デコヒーレンスによって損なわれない量子的機能を与えうる量子的ビットの構築を提案してきた。本稿では、量子的状態のこのプラットフォームをさらに検討する。まず,創発的な状態を許容する同期ネットワークの構築方法に関する一般的なプロトコルについて議論する。次に、これらの状態に対してゲートをどのように実装できるかを研究する。これは、特別に構築された古典的ネットワーク上での量子ライクな計算の可能性を示している。最後に、我々のモデルを古典的確率システムから分離する特徴である非コルモゴロフ干渉を可能にする測定の概念を定義する。本稿では,量子的資源の数学的構造を探究し,これらのシステムにおける創発的状態を操作することで任意のゲートをどのように実現できるかを示す。 In previous work we have proposed a construction of quantum-like bits that could endow a large, complex classical system, for example of oscillators, with quantum-like function that is not compromised by decoherence. In the present paper we investigate further this platform of quantum-like states. Firstly, we discuss a general protocol on how to construct synchronizing networks that allow for emergent states. We then study how gates can be implemented on those states. This suggests the possibility of quantum-like computing on specially-constructed classical networks. Finally, we define a notion of measurement that allows for non-Kolmogorov interference, a feature that separates our model from a classical probabilistic system. This paper aims to explore the mathematical structure of quantum-like resources, and shows how arbitrary gates can be implemented by manipulating emergent states in those systems.	翻訳日:2024-11-08 11:26:46 公開日:2024-11-06
# ELASTIC:シーケンシャルな関心圧縮のための効率的な線形アテンション ELASTIC: Efficient Linear Attention for Sequential Interest Compression ( http://arxiv.org/abs/2408.09380v3 ) ライセンス: Link先を確認	Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou,	(参考訳) 最先端のシーケンシャルレコメンデーションモデルは、トランスフォーマーの注意機構に大きく依存している。しかし、自己注意の二次計算とメモリの複雑さは、ユーザの長距離動作シーケンスをモデル化するためのスケーラビリティを制限している。この問題に対処するために、線形時間複雑性のみを必要とし、モデル容量を計算コストから切り離す、SequenTial Interest Compressionのための効率的な線形アテンションであるELASTICを提案する。具体的には、線形ディスパッチアテンション機構を備えた固定長の関心エキスパートを導入し、長期の動作シーケンスをよりコンパクトな表現に圧縮することで、GPUメモリ使用量を最大90%削減し、推論を2.7倍高速化する。提案した線形ディスパッチアテンション機構は2次複雑性を著しく低減し、非常に長いシーケンスを適切にモデル化できるモデルを実現する。さらに、多様なユーザ関心をモデル化する能力を維持するため、ELASTICは、膨大な学習可能な関心メモリバンクを初期化し、圧縮されたユーザ関心を、無視可能な計算オーバーヘッドでメモリから疎に検索する。提案手法は,同じ計算コストを維持しつつ,利用可能な関心空間の濃度を著しく拡張し,推奨精度と効率のトレードオフを実現する。提案するELASTICの有効性を検証するため,様々な公開データセットに対する広範囲な実験を行い,複数の強力なシーケンシャルなレコメンデータと比較した。実験結果から、ELASTICはベースラインを一貫してかなりのマージンで上回り、長いシーケンスをモデル化する際の計算効率も示された。実装コードを公開します。 State-of-the-art sequential recommendation models heavily rely on the transformer's attention mechanism. However, the quadratic computational and memory complexities of self-attention have limited its scalability for modeling users' long range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, requiring only linear time complexity and decoupling model capacity from computational cost. Specifically, ELASTIC introduces fixed-length interest experts with a linear dispatcher attention mechanism, which compresses the long-term behaviour sequences into a significantly more compact representation and reduces GPU memory usage by up to 90% with a 2.7x inference speedup. The proposed linear dispatcher attention mechanism significantly reduces the quadratic complexity and makes the model feasible for adequately modeling extremely long sequences. 
Moreover, in order to retain the capacity for modeling various user interests, ELASTIC initializes a vast learnable interest memory bank and sparsely retrieves compressed user's interests from the memory with a negligible computational overhead. The proposed interest memory retrieval technique significantly expands the cardinality of available interest space while keeping the same computational cost, thereby striking a trade-off between recommendation accuracy and efficiency. To validate the effectiveness of our proposed ELASTIC, we conduct extensive experiments on various public datasets and compare it with several strong sequential recommenders. Experimental results demonstrate that ELASTIC consistently outperforms baselines by a significant margin and also highlight the computational efficiency of ELASTIC when modeling long sequences. We will make our implementation code publicly available.	翻訳日:2024-11-08 07:07:05 公開日:2024-11-06
# CIPHER: サイバーセキュリティのインテリジェントな侵入テスト支援者 CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher ( http://arxiv.org/abs/2408.11650v2 ) ライセンス: Link先を確認	Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, Howon Kim,	(参考訳) サイバーセキュリティの重要なコンポーネントである侵入テストは、脆弱性を見つけるのに広範囲な時間と労力を必要とする。この分野の初心者は、しばしばコミュニティや専門家との協力的なアプローチの恩恵を受ける。そこで我々は、侵入テストタスクを支援するために特別に訓練された大規模言語モデルであるCIPHER(Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers)を開発した。私たちは、脆弱なマシンに関する300以上の高品質なライトアップ、ハッキングテクニック、オープンソースの侵入テストツールのドキュメントを使用してCIPHERをトレーニングしました。さらに我々は、大規模言語モデルに適した完全自動ペンテストシミュレーションベンチマークを確立するために、侵入テストのライトアップを拡張する新しい手法であるFARR(Findings, Action, Reasoning, and Results)フロー拡張を導入した。このアプローチは、従来のサイバーセキュリティのQ&Aベンチマークにおける大きなギャップを埋め、AIの技術知識、推論能力、動的侵入テストシナリオにおける実用性を評価するための、現実的で厳格な標準を提供する。我々の評価では、CIPHERは、同じ大きさの他のオープンソース侵入テストモデルや、Llama 3 70BやQwen1.5 72B Chatのようなさらに大きな最先端モデルと比較して、特に難易度「insane」のマシン設定において、正確な提案応答を提供することで、最高の全体的なパフォーマンスを達成しました。このことは、汎用LLMの現在の能力が、侵入テストプロセスを通じてユーザを効果的に導くのに不十分であることを示している。また、スケーリングによる改善の可能性や、FARR Flow Augmentationの結果を用いたより良いベンチマークの開発についても論じる。私たちのベンチマークは https://github.com/ibndias/CIPHER で公開されます。 Penetration testing, a critical component of cybersecurity, typically requires extensive time and effort to find vulnerabilities. Beginners in this field often benefit from collaborative approaches with the community or experts. To address this, we develop CIPHER (Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers), a large language model specifically trained to assist in penetration testing tasks. We trained CIPHER using over 300 high-quality write-ups of vulnerable machines, hacking techniques, and documentation of open-source penetration testing tools. Additionally, we introduced the Findings, Action, Reasoning, and Results (FARR) Flow augmentation, a novel method to augment penetration testing write-ups to establish a fully automated pentesting simulation benchmark tailored for large language models. 
This approach fills a significant gap in traditional cybersecurity Q&A benchmarks and provides a realistic and rigorous standard for evaluating AI's technical knowledge, reasoning capabilities, and practical utility in dynamic penetration testing scenarios. In our assessments, CIPHER achieved the best overall performance in providing accurate suggestion responses compared to other open-source penetration testing models of similar size and even larger state-of-the-art models like Llama 3 70B and Qwen1.5 72B Chat, particularly on insane difficulty machine setups. This demonstrates that the current capabilities of general LLMs are insufficient for effectively guiding users through the penetration testing process. We also discuss the potential for improvement through scaling and the development of better benchmarks using FARR Flow augmentation results. Our benchmark will be released publicly at https://github.com/ibndias/CIPHER.	翻訳日:2024-11-08 06:11:36 公開日:2024-11-06
# OpenFactCheck: LLMのファクチュアリティ評価のための統一フレームワーク OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs ( http://arxiv.org/abs/2408.11832v2 ) ライセンス: Link先を確認	Hasan Iqbal, Yuxia Wang, Minghan Wang, Georgi Georgiev, Jiahui Geng, Iryna Gurevych, Preslav Nakov,	(参考訳) 様々な現実世界のアプリケーションにまたがる大規模言語モデル(LLM)の利用が増加しており、LLMはしばしば幻覚を起こすため、アウトプットの事実の正確性をチェックするための自動ツールが求められている。自由形式のオープンドメイン応答の事実性を評価する必要があるため、これは難しい。この話題について多くの研究が行われてきたが、異なる論文では異なる評価ベンチマークと測定方法を使用しているため、比較が難しく、将来の進歩を妨げている。これらの問題を緩和するため、私たちは3つのモジュールを持つ統一フレームワークであるOpenFactCheckを開発しました。 (i)RESPONSEEVALは、ユーザが自動事実確認システムを容易にカスタマイズし、そのシステムを用いて入力文書中のすべてのクレームの事実性を評価できるようにする。(ii)LLMEVALは、LLMの全体的な事実性を評価する。(iii)CHECKEREVALは、自動事実確認システムを評価するためのモジュールである。 OpenFactCheckはオープンソース(https://github.com/mbzuai-nlp/openfactcheck)で、Pythonライブラリ(https://pypi.org/project/openfactcheck/)として、またWebサービス(http://app.openfactcheck.com)として公開されている。システムを記述するビデオは https://youtu.be/-i9VKL0HleI で公開されている。 The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. 
OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.	翻訳日:2024-11-08 06:00:04 公開日:2024-11-06
# 線形ニューラルネットワークの講義ノート:ディープラーニングにおける最適化と一般化の物語 Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning ( http://arxiv.org/abs/2408.13767v2 ) ライセンス: Link先を確認	Nadav Cohen, Noam Razin,	(参考訳) これらのノートは、深層学習の数学的理解に関するプリンストン大学の上級講座の一部として、2021年3月にNCが行った講義に基づいている。本ノートは、ディープラーニングの最適化と一般化の研究における基礎モデルである線形ニューラルネットワークの理論(NC、NR、共同研究者によって開発された)を提示する。提示された理論から生まれた実践的応用についても論じる。この理論は、本質的に動的な数学的ツールに基づいている。これは、ディープラーニングにおける最適化と一般化の理解の限界を押し広げるための、そのようなツールの可能性を示している。このテキストは統計学習理論の基礎に関する知識を前提としている。演習問題(解答なし)も含まれる。 These notes are based on a lecture delivered by NC in March 2021, as part of an advanced course at Princeton University on the mathematical understanding of deep learning. They present a theory (developed by NC, NR and collaborators) of linear neural networks -- a fundamental model in the study of optimization and generalization in deep learning. Practical applications born from the presented theory are also discussed. The theory is based on mathematical tools that are dynamical in nature. It showcases the potential of such tools to push the envelope of our understanding of optimization and generalization in deep learning. The text assumes familiarity with the basics of statistical learning theory. Exercises (without solutions) are included.	翻訳日:2024-11-08 05:15:13 公開日:2024-11-06
# BCDNet: 浸潤性乳管癌検出のための高速残差ニューラルネットワーク BCDNet: A Fast Residual Neural Network For Invasive Ductal Carcinoma Detection ( http://arxiv.org/abs/2408.13800v3 ) ライセンス: Link先を確認	Yujia Lin, Aiwei Lian, Mingyu Liao, Shuangjie Yuan,	(参考訳) 乳がんの最も一般的な亜型である浸潤性乳管癌(IDC)を早期に診断することは非常に重要である。 CAD(Computer-Aided Diagnosis)システムの強力なモデルは有望な結果をもたらすが、他の医療機器と統合したり、十分な計算資源なしに使用することは依然として困難である。本稿では,まず入力画像を残差ブロックでアップサンプリングし,より小さな畳み込みブロックと特別なMLPを用いて特徴を学習するBCDNetを提案する。 BCDNetは、病理組織学的RGB画像におけるIDCを91.6%の平均精度で効果的に検出し、ResNet 50やViT-B-16と比較してトレーニング消費を効果的に削減することが示されている。 It is of great significance to diagnose Invasive Ductal Carcinoma (IDC), the most common subtype of breast cancer, at an early stage. Although the powerful models in Computer-Aided Diagnosis (CAD) systems provide promising results, it is still difficult to integrate them into other medical devices or to use them without sufficient computational resources. In this paper, we propose BCDNet, which first upsamples the input image with a residual block and then uses a smaller convolutional block and a special MLP to learn features. BCDNet is shown to effectively detect IDC in histopathological RGB images with an average accuracy of 91.6% and to reduce training resource consumption compared to ResNet 50 and ViT-B-16.	翻訳日:2024-11-08 05:15:13 公開日:2024-11-06
# 人的介入を伴わない手術器具分割の再検討:グラフ分割の視点 Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View ( http://arxiv.org/abs/2408.14789v2 ) ライセンス: Link先を確認	Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, Weidong Cai,	(参考訳) 内視鏡画像における手術器具のセグメンテーション(SIS)は,低侵襲手術を増強するためのコンピュータ支援的介入の文脈において,長年の重要課題である。近年の深層学習の方法論とデータ・ハングリーの性質の高まりを踏まえ、大規模な専門家による注釈に基づくニューラル予測モデルを訓練することは、この分野における既成のアプローチとして主流となっているが、収集された外科的ビデオフレームに対応する微細なピクセル単位のラベルを作成するために、臨床医に禁止的な負担を課す可能性がある。本研究では,ビデオフレーム分割をグラフ分割問題として再検討し,画像画素をグラフノードとして扱う教師なし手法を提案する。自己教師付き事前学習モデルは、まず、高レベルな意味的特徴をキャプチャする特徴抽出器として活用される。次に、ラプラシアン行列が特徴量から計算され、グラフ分割のために固有分解される。「ディープ」固有ベクトル上では、手術用ビデオフレームは、ツールや組織などの異なるモジュールに意味的に分割され、位置、クラス、関係などの区別可能な意味情報を提供する。セグメンテーション問題は、固有ベクトルにクラスタリングやしきい値処理を適用することで自然に取り組むことができる。様々なデータセット(例:EndoVis2017、EndoVis2018、UCLなど)で、異なる臨床エンドポイントに対して広範囲にわたる実験が実施されている。難解なシナリオのすべてにおいて,本手法は,教師なしの最先端(SOTA)手法よりも優れた性能と堅牢性を示す。コードは https://github.com/MingyuShengSMY/GraphClusteringSIS.git で公開されている。 Surgical instrument segmentation (SIS) on endoscopic images stands as a long-standing and essential task in the context of computer-assisted interventions for boosting minimally invasive surgery. Given the recent surge of deep learning methodologies and their data-hungry nature, training a neural predictive model based on massive expert-curated annotations has been dominating and served as an off-the-shelf approach in the field, which could, however, impose a prohibitive burden on clinicians for preparing fine-grained pixel-wise labels corresponding to the collected surgical video frames. In this work, we propose an unsupervised method by reframing the video frame segmentation as a graph partitioning problem and regarding image pixels as graph nodes, which is significantly different from the previous efforts. A self-supervised pre-trained model is firstly leveraged as a feature extractor to capture high-level semantic features. Then, Laplacian matrices are computed from the features and are eigendecomposed for graph partitioning. 
On the "deep" eigenvectors, a surgical video frame is meaningfully segmented into different modules such as tools and tissues, providing distinguishable semantic information like locations, classes, and relations. The segmentation problem can then be naturally tackled by applying clustering or threshold on the eigenvectors. Extensive experiments are conducted on various datasets (e.g., EndoVis2017, EndoVis2018, UCL, etc.) for different clinical endpoints. Across all the challenging scenarios, our method demonstrates outstanding performance and robustness higher than unsupervised state-of-the-art (SOTA) methods. The code is released at https://github.com/MingyuShengSMY/GraphClusteringSIS.git.	翻訳日:2024-11-08 04:52:58 公開日:2024-11-06
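The graph-partitioning step described in the entry above (build a Laplacian from features, eigendecompose it, threshold an eigenvector) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the function name `fiedler_partition`, the toy one-dimensional features, the Gaussian affinity, the unnormalized Laplacian, and the use of power iteration in place of a full eigendecomposition.

```python
import math

def fiedler_partition(features, iterations=200):
    """Split nodes into two groups using the sign of the Fiedler vector."""
    n = len(features)
    # Affinity matrix W: Gaussian similarity between 1-D node features (toy choice).
    W = [[0.0 if i == j else math.exp(-(features[i] - features[j]) ** 2)
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in W]
    # Unnormalized graph Laplacian L = D - W.
    L = [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    # Power iteration on (c*I - L), restricted to vectors orthogonal to the
    # constant vector, converges to the eigenvector of the second-smallest
    # eigenvalue of L (the Fiedler vector). c bounds the largest eigenvalue.
    c = 2.0 * max(degree)
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Thresholding the eigenvector at zero yields the two-way partition.
    return [1 if x > 0 else 0 for x in v]

# Two tight groups of hypothetical "pixel features": {0.0, 0.1} and {5.0, 5.1}.
labels = fiedler_partition([0.0, 0.1, 5.0, 5.1])
```

The sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which group receives label 1. A practical pipeline would presumably use features from a pre-trained network and a library eigensolver instead of this hand-rolled iteration.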
# Quditシャドウトモグラフィにおける効率的な後処理による量子アドバンテージ Quantum Advantage via Efficient Post-processing on Qudit Shadow tomography ( http://arxiv.org/abs/2408.16244v2 ) ライセンス: Link先を確認	Yu Wang,	(参考訳) 量子科学や人工知能などの分野において \(\text{tr}(AB)\) の計算は不可欠であるが、\(A \) と \(B \) が \(d \) 次元行列であるとき、古典的な計算複雑性は \(O(d^2) \) である。さらに、 \(A \) と \(B \) を格納するには \(O(d^2) \) のメモリが必要であり、指数関数的に高次元な系ではさらなる課題となる。我々は、quditシャドウトモグラフィの枠組みによる量子的アプローチを提案し、広範な行列のクラス \(A \) と、\(\text{tr}(B)\) が既知の有界ノルムエルミート行列 \(B \) に対して、計算と記憶の複雑さの両方を指数関数的に \(O(\text{poly}(\log d)) \) まで削減する。ランダムなクリフォード測定によるシャドウトモグラフィと比較すると,本手法は測定ごとの後処理の計算複雑性を指数関数的な最悪ケースから定数に減らし,任意の次元 \(d \) に適用可能である。この進歩は、効率的な高次元データ解析と複雑なシステムモデリングのための新しい道を開く。 The calculation of \(\text{tr}(AB)\) is essential in fields like quantum science and artificial intelligence, but the classical computational complexity is \( O(d^2) \) when \( A \) and \( B \) are \( d \)-dimensional matrices. Moreover, storing \( A \) and \( B \) requires \( O(d^2) \) memory, which poses additional challenges for exponentially high-dimensional systems. We propose a quantum approach through a qudit shadow tomography framework to exponentially reduce both the computational and storage complexity to \( O(\text{poly}(\log d)) \) for a broad class of matrices \( A \) and for bounded-norm Hermitian matrices \( B \) with known \(\text{tr}(B)\). Compared to shadow tomography via random Clifford measurements, our method reduces the computational complexity of post-processing per measurement from an exponential worst-case scenario to a constant, and it is applicable across arbitrary dimensions \( d \). This advancement opens new pathways for efficient high-dimensional data analysis and complex system modeling.	翻訳日:2024-11-08 04:19:50 公開日:2024-11-06
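To make the classical \( O(d^2) \) baseline mentioned in the entry above concrete, the sketch below computes \(\text{tr}(AB)\) directly from the identity \(\text{tr}(AB)=\sum_{i,j} A_{ij}B_{ji}\) without forming the product. The function name `trace_of_product` and the example matrices are illustrative assumptions. This is only the classical reference computation, not the quantum method of the paper.

```python
def trace_of_product(A, B):
    """Return tr(AB) for two d x d matrices without forming the product AB."""
    d = len(A)
    # tr(AB) = sum_i sum_j A[i][j] * B[j][i]: d*d multiply-adds, i.e. O(d^2).
    return sum(A[i][j] * B[j][i] for i in range(d) for j in range(d))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
result = trace_of_product(A, B)
```

For these two matrices the diagonal entries of AB are 19 and 50, so the trace is 69. Both the loop count and the storage for A and B grow as d squared, which is the cost the abstract says becomes prohibitive when d is exponentially large.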
# Krawtchouk鎖におけるフェルミオン対数ネガティビティ Fermionic logarithmic negativity in the Krawtchouk chain ( http://arxiv.org/abs/2408.16531v2 ) ライセンス: Link先を確認	Gabrielle Blanchet, Gilles Parez, Luc Vinet,	(参考訳) 非相補的な領域間のエンタングルメントを、フェルミオン対数ネガティビティの観点から、不均一な自由フェルミオン鎖において調べる。対象はKrawtchouk鎖であり、同名の直交多項式との関係により、厳密な対角化と、ある種の相関関数の解析的計算が可能となる。隣接する領域については、ネガティビティのスケーリングは中心電荷$c=1$の共形場理論のそれに対応し、Krawtchouk鎖における二部エンタングルメントに関する先行研究と一致する。互いに離れた領域については、各領域が1つのサイトに縮小するスケルタル領域(skeletal regime)に焦点をあてる。この領域は、遠距離での主要な振る舞いを抽出するのに十分である。バルクでは、ネガティビティは$d^{-4 \Delta_f}$($\Delta_f=1/2$)で減衰する。ここで$d$は領域間の距離である。これは、1次元の自由ディラックフェルミオンの均質な場合の結果と一致する。驚くべきことに、一方のサイトが境界に近いとき、この指数は変化し、境界サイト$m=0,1,2,\dots$のパリティに依存して、$\Delta_f^{\textrm{even}}=3/8$および$\Delta_f^{\textrm{odd}}=5/8$となる。結果は数値計算と解析計算によって裏付けられている。 The entanglement of non-complementary regions is investigated in an inhomogeneous free-fermion chain through the lens of the fermionic logarithmic negativity. Focus is on the Krawtchouk chain, whose relation to the eponymous orthogonal polynomials allows for exact diagonalization and analytical calculations of certain correlation functions. For adjacent regions, the negativity scaling corresponds to that of a conformal field theory with central charge $c=1$, in agreement with previous studies on bipartite entanglement in the Krawtchouk chain. For disjoint regions, we focus on the skeletal regime where each region reduces to a single site. This regime is sufficient to extract the leading behaviour at large distances. In the bulk, the negativity decays as $d^{-4 \Delta_f}$ with $\Delta_f=1/2$, where $d$ is the separation between the regions. This is in agreement with the homogeneous result of free Dirac fermions in one dimension. Surprisingly, when one site is close to the boundary, this exponent changes and depends on the parity of the boundary site $m=0,1,2,\dots$, with $\Delta_f^{\textrm{even}}=3/8$ and $\Delta_f^{\textrm{odd}}=5/8$. The results are supported by numerics and analytical calculations.	翻訳日:2024-11-08 04:19:50 公開日:2024-11-06
# 深層学習を用いた高アスペクト比核融合デバイスの設計 Using Deep Learning to Design High Aspect Ratio Fusion Devices ( http://arxiv.org/abs/2409.00564v2 ) ライセンス: Link先を確認	P. Curvo, D. R. Ferreira, R. Jorge,	(参考訳) 融合装置の設計は一般に計算コストのかかるシミュレーションに基づいている。これは、特に、大きなパラメータ空間を持つ非軸対称磁場が特定の性能基準を満たすように最適化されたステラレータ最適化の場合において、自由パラメータの少ない高アスペクト比モデルを用いて緩和することができる。しかし、低伸長、高回転変換、有限プラズマベータ、良好な高速粒子閉じ込めなどの特性を持つ構成を見つけるためには、依然として最適化が必要である。本研究では,機械学習モデルを用いて,所望の特性に対するモデル入力パラメータの集合を求める逆設計問題の解を求めることにより,良好な閉じ込め特性を持つ構成を構築することを訓練する。逆問題の解は非一様であるため、混合密度ネットワークに基づく確率論的アプローチが用いられる。この方法で最適化された構成を確実に生成できることが示されている。 The design of fusion devices is typically based on computationally expensive simulations. This can be alleviated using high aspect ratio models that employ a reduced number of free parameters, especially in the case of stellarator optimization where non-axisymmetric magnetic fields with a large parameter space are optimized to satisfy certain performance criteria. However, optimization is still required to find configurations with properties such as low elongation, high rotational transform, finite plasma beta, and good fast particle confinement. In this work, we train a machine learning model to construct configurations with favorable confinement properties by finding a solution to the inverse design problem, that is, obtaining a set of model input parameters for given desired properties. Since the solution of the inverse problem is non-unique, a probabilistic approach, based on mixture density networks, is used. It is shown that optimized configurations can be generated reliably using this method.	翻訳日:2024-11-08 03:46:24 公開日:2024-11-06
# TabEBM:クラスごとに個別のエネルギーベースモデルを用いた表形式データ拡張手法 TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models ( http://arxiv.org/abs/2409.16118v3 ) ライセンス: Link先を確認	Andrei Margeloiu, Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik,	(参考訳) データ収集は、医学、物理学、化学といった重要な分野においてしばしば困難である。その結果、分類法は通常これらの小さなデータセットではうまく機能せず、予測性能が低くなる。画像におけるデータ拡張と同様に、追加の合成データで訓練セットを増やすことは、下流の分類性能を改善すると一般に考えられている。しかしながら、結合分布 $ p(\mathbf{x}, y) $ またはクラス条件分布 $ p(\mathbf{x} \mid y) $ を学習する現在の表形式データ生成法は、しばしば小さなデータセットに過適合し、品質の低い合成データを生成するため、実データのみを使用する場合よりも分類性能が悪化することが多い。これらの課題を解決するために,エネルギーベースモデル(EBM)を用いた新しいクラス条件生成手法であるTabEBMを紹介する。全てのクラス条件密度を近似するために共有モデルを使用する既存の方法とは異なり、我々の重要な革新は、クラスごとに別々のEBM生成モデルを作成し、各モデルがクラス固有のデータ分布を個別にモデル化することである。このアプローチは、あいまいなクラス分布であっても、堅牢なエネルギーランドスケープを生み出す。実験の結果,TabEBMは既存の手法よりも高品質で統計的忠実度の高い合成データを生成することがわかった。データ拡張に使用する場合、我々の合成データは、様々なサイズの多様なデータセット、特に小さなデータセットの分類性能を一貫して改善する。コードはhttps://github.com/andreimargeloiu/TabEBMで入手できる。 Data collection is often difficult in critical fields such as medicine, physics, and chemistry. As a result, classification methods usually perform poorly with these small datasets, leading to weak predictive performance. Increasing the training set with additional synthetic data, similar to data augmentation in images, is commonly believed to improve downstream classification performance. However, current tabular generative methods that learn either the joint distribution $ p(\mathbf{x}, y) $ or the class-conditional distribution $ p(\mathbf{x} \mid y) $ often overfit on small datasets, resulting in poor-quality synthetic data, usually worsening classification performance compared to using real data alone. To solve these challenges, we introduce TabEBM, a novel class-conditional generative method using Energy-Based Models (EBMs). 
Unlike existing methods that use a shared model to approximate all class-conditional densities, our key innovation is to create distinct EBM generative models for each class, each modelling its class-specific data distribution individually. This approach creates robust energy landscapes, even in ambiguous class distributions. Our experiments show that TabEBM generates synthetic data with higher quality and better statistical fidelity than existing methods. When used for data augmentation, our synthetic data consistently improves the classification performance across diverse datasets of various sizes, especially small ones. Code is available at https://github.com/andreimargeloiu/TabEBM.	翻訳日:2024-11-08 03:46:24 公開日:2024-11-06
# 中性子の$β$崩壊から生じる反ニュートリノは、異なる質量固有状態のコヒーレントな重ね合わせ状態にはなり得ない Antineutrinos produced from $β$ decays of neutrons cannot be in coherent superpositions of different mass eigenstates ( http://arxiv.org/abs/2410.03133v3 ) ライセンス: Link先を確認	Shi-Biao Zheng,	(参考訳) 中性子の$\beta$崩壊によって生じる反ニュートリノ-陽子-電子系の波動関数全体を解析する。反ニュートリノは、中性子の初期運動量分布に関係なく、異なる質量固有状態のコヒーレントな重ね合わせ状態にはなり得ないことが証明される。 The entire wavefunction of the antineutrino-proton-electron system, produced by the $\beta$ decay of a neutron, is analyzed. It is proven that the antineutrino cannot be in coherent superpositions of different mass eigenstates, irrespective of the initial momentum distribution of the neutron.	翻訳日:2024-11-08 03:46:24 公開日:2024-11-06
# 敵対的音声の文脈における雑音拡張手法の再評価 Reassessing Noise Augmentation Methods in the Context of Adversarial Speech ( http://arxiv.org/abs/2409.01813v2 ) ライセンス: Link先を確認	Karla Pizzi, Matías Pizarro, Asja Fischer,	(参考訳) 本研究では,雑音を付加した訓練が自動音声認識(ASR)システムの敵対的頑健性も同時に改善できるかどうかを検討する。 4つの最先端ASRアーキテクチャの敵対的頑健性を比較分析する。各アーキテクチャは3つの異なる拡張条件、すなわち背景雑音・速度変動・残響を加えたもの、速度変動のみのもの、データ拡張を一切行わないものの下で訓練される。その結果,雑音拡張は雑音を含む音声に対するモデル性能を向上させるだけでなく,敵対的攻撃に対するモデルの頑健性も向上させることが示された。 In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different state-of-the-art ASR architectures, where each of the ASR architectures is trained under three different augmentation conditions: one subject to background noise, speed variations, and reverberations, another subject to speed variations only, and a third without any form of data augmentation. The results demonstrate that noise augmentation not only improves model performance on noisy speech but also increases the model's robustness to adversarial attacks.	翻訳日:2024-11-08 03:23:46 公開日:2024-11-06
# 敵対的音声の文脈における雑音拡張手法の再評価 Reassessing Noise Augmentation Methods in the Context of Adversarial Speech ( http://arxiv.org/abs/2409.01813v3 ) ライセンス: Link先を確認	Karla Pizzi, Matías Pizarro, Asja Fischer,	(参考訳) 本研究では,雑音を付加した訓練が自動音声認識(ASR)システムの敵対的頑健性も同時に改善できるかどうかを検討する。 4つの最先端ASRアーキテクチャの敵対的頑健性を比較分析する。各アーキテクチャは3つの異なる拡張条件、すなわち背景雑音・速度変動・残響を加えたもの、速度変動のみのもの、データ拡張を一切行わないものの下で訓練される。その結果,雑音拡張は雑音を含む音声に対するモデル性能を向上させるだけでなく,敵対的攻撃に対するモデルの頑健性も向上させることが示された。 In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different state-of-the-art ASR architectures, where each of the ASR architectures is trained under three different augmentation conditions: one subject to background noise, speed variations, and reverberations, another subject to speed variations only, and a third without any form of data augmentation. The results demonstrate that noise augmentation not only improves model performance on noisy speech but also increases the model's robustness to adversarial attacks.	翻訳日:2024-11-08 03:23:46 公開日:2024-11-06
# IFAdapter: 接地テキスト・画像生成のためのインスタンス特徴制御 IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation ( http://arxiv.org/abs/2409.08240v3 ) ライセンス: Link先を確認	Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang,	(参考訳) テキスト・ツー・イメージ(T2I)拡散モデルは個々のインスタンスの視覚的に魅力的な画像を生成するのに優れていますが、複数のインスタンスの特徴の生成を正確に位置決めし制御するのに苦労しています。 Layout-to-Image(L2I)タスクは、境界ボックスを空間制御信号として組み込むことによって位置決めの問題に対処するために導入された。そこで本研究では,生成インスタンスにおける位置精度と特徴の忠実度を両立することを目的としたIFGタスクを提案する。 IFGタスクに対処するために、インスタンス・フィーチャー・アダプタ(IFAdapter)を導入します。 IFAdapterは、追加の外観トークンを導入し、インスタンスレベルの機能を空間的位置と整列するためにインスタンスセマンティックマップを活用することで、機能描写を強化する。 IFAdapterは、拡散プロセスをプラグアンドプレイモジュールとしてガイドし、様々なコミュニティモデルに適応できるようにする。評価のために、IFGベンチマークにコントリビュートし、正確な位置決めと特徴を持つインスタンスを生成するためのモデルの能力を客観的に比較する検証パイプラインを開発する。実験の結果,IFAdapterは定量評価と定性評価の両方において,他のモデルよりも優れていた。 While Text-to-Image (T2I) diffusion models excel at generating visually appealing images of individual instances, they struggle to accurately position and control the features generation of multiple instances. The Layout-to-Image (L2I) task was introduced to address the positioning challenges by incorporating bounding boxes as spatial control signals, but it still falls short in generating precise instance features. In response, we propose the Instance Feature Generation (IFG) task, which aims to ensure both positional accuracy and feature fidelity in generated instances. To address the IFG task, we introduce the Instance Feature Adapter (IFAdapter). The IFAdapter enhances feature depiction by incorporating additional appearance tokens and utilizing an Instance Semantic Map to align instance-level features with spatial locations. The IFAdapter guides the diffusion process as a plug-and-play module, making it adaptable to various community models. For evaluation, we contribute an IFG benchmark and develop a verification pipeline to objectively compare models' abilities to generate instances with accurate positioning and features. 
Experimental results demonstrate that IFAdapter outperforms other models in both quantitative and qualitative evaluations.	翻訳日:2024-11-07 21:20:36 公開日:2024-11-06
# $f$-divergence最小化によるテキスト・画像生成のアライメントパラダイムの一般化 Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization ( http://arxiv.org/abs/2409.09774v2 ) ライセンス: Link先を確認	Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang,	(参考訳) 直接選好最適化(DPO)は、最近、大きな言語モデル(LLM)の整合化から、テキストから画像モデルと人間の選好の整合化まで、その成功例を拡張した。しかし, これらの手法は, 微調整モデルと参照モデルとのアライメント過程において, 逆クルバック・リーブラー分岐の最小化にのみ依存している。本研究では,テキスト・ツー・イメージ・モデルのアライメントパラダイムにおける逆のKullback-Leibler分散を$f$-divergenceに拡張することに着目し,優れたアライメント性能と優れた世代多様性を実現することを目的とした。我々は、$f$-divergence条件下でのアライメントパラダイムの一般化式を提供し、勾配場の観点から異なる分散制約がアライメントプロセスに与える影響を徹底的に分析する。本研究では, 画像テキストアライメント性能, 人的価値アライメント性能, 世代多様性パフォーマンスを, 異なる分散制約下で総合的に評価し, イェンセン=シャノンの発散に基づくアライメントが, それらの間に最高のトレードオフをもたらすことを示す。テキストと画像のアライメントに使用する分散オプションは、アライメント性能(特に人的価値アライメント)とジェネレーション多様性のトレードオフに大きく影響する。 Direct Preference Optimization (DPO) has recently expanded its successful application from aligning large language models (LLMs) to aligning text-to-image models with human preferences, which has generated considerable interest within the community. However, we have observed that these approaches rely solely on minimizing the reverse Kullback-Leibler divergence during alignment process between the fine-tuned model and the reference model, neglecting the incorporation of other divergence constraints. In this study, we focus on extending reverse Kullback-Leibler divergence in the alignment paradigm of text-to-image models to $f$-divergence, which aims to garner better alignment performance as well as good generation diversity. We provide the generalized formula of the alignment paradigm under the $f$-divergence condition and thoroughly analyze the impact of different divergence constraints on alignment process from the perspective of gradient fields. 
We conduct comprehensive evaluation on image-text alignment performance, human value alignment performance and generation diversity performance under different divergence constraints, and the results indicate that alignment based on Jensen-Shannon divergence achieves the best trade-off among them. The option of divergence employed for aligning text-to-image models significantly impacts the trade-off between alignment performance (especially human value alignment) and generation diversity, which highlights the necessity of selecting an appropriate divergence for practical applications.	翻訳日:2024-11-07 20:46:36 公開日:2024-11-06
# $f$-divergence最小化によるテキスト・画像生成のアライメントパラダイムの一般化 Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization ( http://arxiv.org/abs/2409.09774v3 ) ライセンス: Link先を確認	Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang,	(参考訳) 直接選好最適化(DPO)は、最近、大きな言語モデル(LLM)の整合化から、テキストから画像モデルと人間の選好の整合化まで、その成功例を拡張した。しかし, これらの手法は, 微調整モデルと参照モデルとのアライメント過程において, 逆クルバック・リーブラー分岐の最小化にのみ依存している。本研究では,テキスト・ツー・イメージ・モデルのアライメントパラダイムにおける逆のKullback-Leibler分散を$f$-divergenceに拡張することに着目し,優れたアライメント性能と優れた世代多様性を実現することを目的とした。我々は、$f$-divergence条件下でのアライメントパラダイムの一般化式を提供し、勾配場の観点から異なる分散制約がアライメントプロセスに与える影響を徹底的に分析する。本研究では, 画像テキストアライメント性能, 人的価値アライメント性能, 世代多様性パフォーマンスを, 異なる分散制約下で総合的に評価し, イェンセン=シャノンの発散に基づくアライメントが, それらの間に最高のトレードオフをもたらすことを示す。テキストと画像のアライメントに使用する分散オプションは、アライメント性能(特に人的価値アライメント)とジェネレーション多様性のトレードオフに大きく影響する。 Direct Preference Optimization (DPO) has recently expanded its successful application from aligning large language models (LLMs) to aligning text-to-image models with human preferences, which has generated considerable interest within the community. However, we have observed that these approaches rely solely on minimizing the reverse Kullback-Leibler divergence during alignment process between the fine-tuned model and the reference model, neglecting the incorporation of other divergence constraints. In this study, we focus on extending reverse Kullback-Leibler divergence in the alignment paradigm of text-to-image models to $f$-divergence, which aims to garner better alignment performance as well as good generation diversity. We provide the generalized formula of the alignment paradigm under the $f$-divergence condition and thoroughly analyze the impact of different divergence constraints on alignment process from the perspective of gradient fields. 
We conduct comprehensive evaluation on image-text alignment performance, human value alignment performance and generation diversity performance under different divergence constraints, and the results indicate that alignment based on Jensen-Shannon divergence achieves the best trade-off among them. The option of divergence employed for aligning text-to-image models significantly impacts the trade-off between alignment performance (especially human value alignment) and generation diversity, which highlights the necessity of selecting an appropriate divergence for practical applications.	翻訳日:2024-11-07 20:46:36 公開日:2024-11-06
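As a small illustration of the $f$-divergence family discussed in the entry above, the sketch below evaluates $D_f(P\,\|\,Q)=\sum_x q(x)\,f\!\left(p(x)/q(x)\right)$ for discrete distributions with three standard generator functions. The function names and the example distributions are assumptions chosen for illustration. The sketch shows only the divergences themselves, not the alignment objective or the gradient-field analysis of the paper.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_p_q_generator(t):
    # f(t) = t log t gives KL(P || Q).
    return t * math.log(t)

def kl_q_p_generator(t):
    # f(t) = -log t gives KL(Q || P).
    return -math.log(t)

def js_generator(t):
    # This generator gives the Jensen-Shannon divergence between P and Q.
    return 0.5 * (t * math.log(t) - (t + 1.0) * math.log((t + 1.0) / 2.0))

# Hypothetical distributions over three outcomes (all probabilities > 0).
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
kl_pq = f_divergence(p, q, kl_p_q_generator)
kl_qp = f_divergence(p, q, kl_q_p_generator)
js = f_divergence(p, q, js_generator)
```

The two KL directions generally differ, while the Jensen-Shannon divergence is symmetric in its arguments and bounded above by log 2. Swapping the generator is the only change needed to move between members of the family, which is the sense in which the $f$-divergence view generalizes a single fixed divergence.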
# 海上サイバーセキュリティ:総合的なレビュー Maritime Cybersecurity: A Comprehensive Review ( http://arxiv.org/abs/2409.11417v2 ) ライセンス: Link先を確認	Meixuan Li, Jianying Zhou, Sudipta Chattopadhyay, Mark Goh,	(参考訳) 海上産業は重大な岐路に立っており、技術的進歩の必要性と、堅牢なサイバーセキュリティ対策への差し迫った必要性とが交差している。海上サイバーセキュリティとは、海事産業におけるコンピュータシステムとデジタル資産、および海事エコシステムを構成する相互接続されたコンポーネントの広範なネットワークの保護を指す。本調査では,海上サイバーセキュリティの重要領域を特定し,その有効性を評価することを目的とする。 AIS, GNSS, ECDIS, VDR, RADAR, VSAT, GMDSSを含む主要な海事システムにおける脅威を詳細に分析するとともに、この分野に影響を与えた現実のサイバーインシデントを調査した。海上サイバー攻撃の多次元分類を提示し、脅威アクター、動機、影響に関する洞察を提供する。統合ソリューションからコンポーネント固有のソリューションまで、さまざまなセキュリティソリューションも評価した。最後に、未解決の課題と将来の解決策を共有する。補足セクションでは,本調査で論じた船舶コンポーネントの定義と脆弱性を示す。これらの重要な問題に相互に関連する主要な側面から取り組むことで、本レビューはより強靭な海事エコシステムの育成を目指す。 The maritime industry stands at a critical juncture, where the imperative for technological advancement intersects with the pressing need for robust cybersecurity measures. Maritime cybersecurity refers to the protection of computer systems and digital assets within the maritime industry, as well as the broader network of interconnected components that make up the maritime ecosystem. In this survey, we aim to identify the significant domains of maritime cybersecurity and measure their effectiveness. We have provided an in-depth analysis of threats in key maritime systems, including AIS, GNSS, ECDIS, VDR, RADAR, VSAT, and GMDSS, while exploring real-world cyber incidents that have impacted the sector. A multi-dimensional taxonomy of maritime cyber attacks is presented, offering insights into threat actors, motivations, and impacts. We have also evaluated various security solutions, from integrated solutions to component-specific solutions. Finally, we have shared open challenges and future solutions. In the supplementary section, we have presented definitions and vulnerabilities of vessel components that are discussed in this survey. 
By addressing all these critical issues with key interconnected aspects, this review aims to foster a more resilient maritime ecosystem.	翻訳日:2024-11-07 20:01:55 公開日:2024-11-06
# 海上サイバーセキュリティ:総合的なレビュー Maritime Cybersecurity: A Comprehensive Review ( http://arxiv.org/abs/2409.11417v3 ) ライセンス: Link先を確認	Meixuan Li, Jianying Zhou, Sudipta Chattopadhyay, Mark Goh,	(参考訳) 海上産業は重大な岐路に立っており、技術的進歩の必要性と、堅牢なサイバーセキュリティ対策への差し迫った必要性とが交差している。海上サイバーセキュリティとは、海事産業におけるコンピュータシステムとデジタル資産、および海事エコシステムを構成する相互接続されたコンポーネントの広範なネットワークの保護を指す。本調査では,海上サイバーセキュリティの重要領域を特定し,その有効性を評価することを目的とする。 AIS, GNSS, ECDIS, VDR, RADAR, VSAT, GMDSSを含む主要な海事システムにおける脅威を詳細に分析するとともに、この分野に影響を与えた現実のサイバーインシデントを調査した。海上サイバー攻撃の多次元分類を提示し、脅威アクター、動機、影響に関する洞察を提供する。統合ソリューションからコンポーネント固有のソリューションまで、さまざまなセキュリティソリューションも評価した。最後に、未解決の課題と将来の解決策を共有する。補足セクションでは,本調査で論じた船舶コンポーネントの定義と脆弱性を示す。これらの重要な問題に相互に関連する主要な側面から取り組むことで、本レビューはより強靭な海事エコシステムの育成を目指す。 The maritime industry stands at a critical juncture, where the imperative for technological advancement intersects with the pressing need for robust cybersecurity measures. Maritime cybersecurity refers to the protection of computer systems and digital assets within the maritime industry, as well as the broader network of interconnected components that make up the maritime ecosystem. In this survey, we aim to identify the significant domains of maritime cybersecurity and measure their effectiveness. We have provided an in-depth analysis of threats in key maritime systems, including AIS, GNSS, ECDIS, VDR, RADAR, VSAT, and GMDSS, while exploring real-world cyber incidents that have impacted the sector. A multi-dimensional taxonomy of maritime cyber attacks is presented, offering insights into threat actors, motivations, and impacts. We have also evaluated various security solutions, from integrated solutions to component-specific solutions. Finally, we have shared open challenges and future solutions. In the supplementary section, we have presented definitions and vulnerabilities of vessel components that are discussed in this survey. 
By addressing all these critical issues with key interconnected aspects, this review aims to foster a more resilient maritime ecosystem.	翻訳日:2024-11-07 20:01:55 公開日:2024-11-06
# カルタン移動フレームとデータ多様体 Cartan moving frames and the data manifolds ( http://arxiv.org/abs/2409.12057v2 ) ライセンス: Link先を確認	Eliot Tron, Rita Fioresi, Nicolas Couellan, Stéphane Puechmorel,	(参考訳) 本研究の目的は,データ情報量とデータ点の曲率を用いて,カルタン移動フレームの言語を用いて,データ多様体とそのリーマン構造の幾何学を研究することである。このフレームワークと実験を通じて、ニューラルネットワークの応答に関する説明は、与えられた入力から容易に到達可能な出力クラスを指摘することによって与えられる。このことは、ネットワークの出力と入力の幾何学との間の数学的関係が、説明可能な人工知能ツールとしてどのように活用できるかを強調している。 The purpose of this paper is to employ the language of Cartan moving frames to study the geometry of the data manifolds and its Riemannian structure, via the data information metric and its curvature at data points. Using this framework and through experiments, explanations on the response of a neural network are given by pointing out the output classes that are easily reachable from a given input. This emphasizes how the proposed mathematical relationship between the output of the network and the geometry of its inputs can be exploited as an explainable artificial intelligence tool.	翻訳日:2024-11-07 19:50:48 公開日:2024-11-06
# TalkMosaic:マルチモーダルLLMQ&Aインタラクションによる対話型フォトモザイク TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions ( http://arxiv.org/abs/2409.13941v2 ) ライセンス: Link先を確認	Kevin Li, Fulu Li,	(参考訳) 本研究では, 環境保護のテーマとして, 鳥やライオンなどの動物のイメージを構成するために, 幅広い種類の車両の画像を用いて, 合成画像中の車に関する情報を最大化し, 環境問題に対する意識を高める。本稿では,写真モザイク画像中のタイル画像とそれに対応する原車画像とのインタラクティブな切り替えをデスクトップ上に自動的に保存する「クリック・アンド・ディスプレイ」という簡単な操作を用いて,芸術的に構成されたフォトモザイク画像とのインタラクションを示す。カーイメージ情報と関連する知識をChatGPTに組み込むことで,TalkMosaicというマルチモーダルカスタムGPTを構築する。元のカーイメージをTalkMosaicにアップロードすることで、与えられたカーイメージについて質問し、高い環境基準を満たす車イメージのタイヤの購入場所など、効率よく、かつ効果的に回答を得ることができる。スパースアテンションと量子化技術を用いてマルチモーダル LLM の推論を高速化する方法を,提案した確率的 FlashAttention (PrFlashAttention) 法とStaircase Adaptive Quantization (SAQ) 法を用いて詳細に解析する。実装されたプロトタイプは,提案手法の有効性と有効性を示す。 We use images of cars of a wide range of varieties to compose an image of an animal such as a bird or a lion for the theme of environmental protection to maximize the information about cars in a single composed image and to raise the awareness about environmental challenges. We present a novel way of image interaction with an artistically-composed photomosaic image, in which a simple operation of "click and display" is used to demonstrate the interactive switch between a tile image in a photomosaic image and the corresponding original car image, which will be automatically saved on the Desktop. We build a multimodal custom GPT named TalkMosaic by incorporating car images information and the related knowledge to ChatGPT. By uploading the original car image to TalkMosaic, we can ask questions about the given car image and get the corresponding answers efficiently and effectively such as where to buy the tire in the car image that satisfies high environmental standards. We give an in-depth analysis on how to speed up the inference of multimodal LLM using sparse attention and quantization techniques with presented probabilistic FlashAttention (PrFlashAttention) and Staircase Adaptive Quantization (SAQ) methods. 
The implemented prototype demonstrates the feasibility and effectiveness of the presented approach.	翻訳日:2024-11-07 19:50:48 公開日:2024-11-06
# 電気自動車インターネットにおける生成人工知能の役割 The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles ( http://arxiv.org/abs/2409.15750v2 ) ライセンス: Link先を確認	Hanwen Zhang, Dusit Niyato, Wei Zhang, Changyuan Zhao, Hongyang Du, Abbas Jamalipour, Sumei Sun, Yiyang Pei,	(参考訳) 生成人工知能(GenAI)モデルの進歩により、その能力はコンテンツ生成を超えて大幅に拡大し、さまざまなアプリケーションにまたがってモデルの利用が増えている。特にGenAIは、充電管理からサイバー攻撃防止まで、電気自動車(EV)エコシステムの課題に対処する大きな可能性を示している。本稿では、電気自動車のインターネット(IoEV)を具体的に検討し、IoEV用のGenAIを、EVのバッテリ層、個々のEV層、スマートグリッド層、セキュリティ層という4つの異なるレイヤに分類する。 IoEVアプリケーションの各レイヤで使用されるさまざまなGenAI技術を紹介します。その後、GenAIモデルをトレーニングするための公開データセットが要約される。最後に、今後の方向性について推奨する。この調査は、異なるレイヤにわたるIoEVにおけるGenAIの応用を分類するだけでなく、各レイヤにおける設計と実装の課題を強調することで、研究者や実践者にとって貴重なリソースとして役立ちます。さらに、将来の研究方向性のロードマップを提供し、より堅牢で効率的なIoEVシステムの開発を可能にする。 With the advancements of generative artificial intelligence (GenAI) models, their capabilities are expanding significantly beyond content generation and the models are increasingly being used across diverse applications. Particularly, GenAI shows great potential in addressing challenges in the electric vehicle (EV) ecosystem ranging from charging management to cyber-attack prevention. In this paper, we specifically consider Internet of electric vehicles (IoEV) and we categorize GenAI for IoEV into four different layers namely, EV's battery layer, individual EV layer, smart grid layer, and security layer. We introduce various GenAI techniques used in each layer of IoEV applications. Subsequently, public datasets available for training the GenAI models are summarized. Finally, we provide recommendations for future directions. This survey not only categorizes the applications of GenAI in IoEV across different layers but also serves as a valuable resource for researchers and practitioners by highlighting the design and implementation challenges within each layer. 
Furthermore, it provides a roadmap for future research directions, enabling the development of more robust and efficient IoEV systems through the integration of advanced GenAI techniques.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# Gaussian Deja-vu: 一般化とパーソナライズ能力の強化による制御可能な3次元ガウスヘッドアバターの作成 Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities ( http://arxiv.org/abs/2409.16147v3 ) ライセンス: Link先を確認	Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du,	(参考訳) 近年の3Dガウス・スプラッティング(3DGS)の進歩は、3Dヘッドアバターのモデリングに大きな可能性をもたらし、メッシュベースの手法よりも高い柔軟性と、NeRFベースの手法よりも効率的なレンダリングを実現している。これらの進歩にもかかわらず、制御可能な3DGSベースのヘッドアバターの作成には依然として時間がかかり、しばしば数十分から数時間を要する。このプロセスを高速化するために、まず頭部アバターの一般化モデルを取得し、その結果をパーソナライズする「Gaussian Deja-vu」フレームワークを導入する。一般化モデルは、大規模な2D(合成および実)画像データセットで訓練される。このモデルは十分に初期化された3Dガウスヘッドを提供し、これを単眼ビデオを用いてさらに洗練することで、パーソナライズされた頭部アバターを実現する。パーソナライズのために,初期の3Dガウシアンを補正する学習可能な表情認識補正ブレンドマップを提案し、ニューラルネットワークに頼ることなく迅速な収束を確保する。実験により,提案手法が目的を満たすことを示す。最先端の3Dガウシアンヘッドアバターをフォトリアリスティックな品質で上回り、トレーニング時間を既存の方法の少なくとも4分の1に短縮し、数分でアバターを生成する。 Recent advancements in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, providing greater flexibility than mesh-based methods and more efficient rendering compared to NeRF-based approaches. Despite these advancements, the creation of controllable 3DGS-based head avatars remains time-intensive, often requiring tens of minutes to hours. To expedite this process, we here introduce the "Gaussian Deja-vu" framework, which first obtains a generalized model of the head avatar and then personalizes the result. The generalized model is trained on large 2D (synthetic and real) image datasets. This model provides a well-initialized 3D Gaussian head that is further refined using a monocular video to achieve the personalized head avatar. For personalizing, we propose learnable expression-aware rectification blendmaps to correct the initial 3D Gaussians, ensuring rapid convergence without the reliance on neural networks. Experiments demonstrate that the proposed method meets its objectives. 
It outperforms state-of-the-art 3D Gaussian head avatars in terms of photorealistic quality as well as reduces training time consumption to at least a quarter of the existing methods, producing the avatar in minutes.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# 大規模言語モデルにおける反実仮想トークン生成 Counterfactual Token Generation in Large Language Models ( http://arxiv.org/abs/2409.17027v2 ) ライセンス: Link先を確認	Ivi Chatzi, Nina Corvelo Benz, Eleni Straitouri, Stratis Tsirtsis, Manuel Gomez-Rodriguez,	(参考訳) 「もちろん、喜んで物語を作りましょう。リラ船長は信頼する船『マエルストロムの怒り』の舵を取り、果てしない海を見つめていた。[...] リラの目には、苦い真実に気づいて涙があふれた。彼女はつかの間の富のためにすべてを犠牲にし、乗組員、家族、そして自分自身の愛を失ったのだ。」大規模言語モデルが生成したこの物語は魅力的だが、もしモデルが主人公に「Captain Maeve」を選んでいたら物語はどう展開しただろうか、と思うかもしれない。それを知ることはできない。最先端の大規模言語モデルはステートレスであり、内部メモリや状態を保持しない。プロンプトが与えられると、自己回帰プロセスを用いてトークン列を出力として生成する。結果として、過去に生成したトークンに対する反実仮想的な代替について推論することができない。本研究の目的は、この機能をモデルに付与することである。そこで我々は,Gumbel-Max構造因果モデルに基づくトークン生成の因果モデルを構築した。我々のモデルにより、任意の大規模言語モデルが、通常のトークン生成と比べてほとんど追加コストなしで反実仮想トークン生成を実行でき、実装は驚くほど簡単で、微調整もプロンプトエンジニアリングも必要としない。 Llama 3 8B-Instruct と Ministral-8B-Instruct にモデルを実装し,反実仮想的に生成されたテキストの定性的・定量的分析を行う。最後に,バイアス検出への反実仮想トークン生成の実証的応用を示し,大規模言語モデルが構築する世界モデルに関する興味深い洞察を明らかにする。 "Sure, I am happy to generate a story for you: Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth - she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself." Although this story, generated by a large language model, is captivating, one may wonder -- how would the story have unfolded if the model had chosen "Captain Maeve" as the protagonist instead? We cannot know. State-of-the-art large language models are stateless -- they maintain no internal memory or state. Given a prompt, they generate a sequence of tokens as an output using an autoregressive process. As a consequence, they cannot reason about counterfactual alternatives to tokens they have generated in the past. In this work, our goal is to enhance them with this functionality. To this end, we develop a causal model of token generation that builds upon the Gumbel-Max structural causal model. 
Our model allows any large language model to perform counterfactual token generation at almost no cost in comparison with vanilla token generation, it is embarrassingly simple to implement, and it does not require any fine-tuning nor prompt engineering. We implement our model on Llama 3 8B-Instruct and Ministral-8B-Instruct and conduct a qualitative and a quantitative analysis of counterfactually generated text. We conclude with a demonstrative application of counterfactual token generation for bias detection, unveiling interesting insights about the model of the world constructed by large language models.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
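The Gumbel-Max idea named in the entry above can be sketched in a few lines: a token is sampled as the argmax of logits plus Gumbel noise, and a counterfactual token is obtained by reusing the same noise with altered logits. The function names, the three-token vocabulary, and the logit values below are hypothetical, and this is a generic illustration of the Gumbel-Max trick rather than the authors' implementation.

```python
import math
import random

def sample_gumbel_noise(vocab_size, seed):
    """Draw one Gumbel(0, 1) noise value per token in the vocabulary."""
    rng = random.Random(seed)
    noise = []
    for _ in range(vocab_size):
        # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise.append(-math.log(-math.log(u)))
    return noise

def gumbel_max_token(logits, noise):
    """Pick argmax_i (logit_i + noise_i); equivalent to softmax sampling."""
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)

# Factual step: hypothetical logits over a 3-token vocabulary.
noise = sample_gumbel_noise(vocab_size=3, seed=0)
factual_logits = [2.0, 1.0, 0.5]
factual_token = gumbel_max_token(factual_logits, noise)

# Counterfactual step: the logits change (for example because an earlier
# token was edited), but the noise recorded for the factual step is reused.
counterfactual_logits = [0.0, 0.0, 100.0]
counterfactual_token = gumbel_max_token(counterfactual_logits, noise)
```

Because the noise is held fixed, any difference between the factual and counterfactual token is attributable to the change in logits alone. Storing the noise per step is the only extra state this sketch needs on top of ordinary sampling.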
# TFS-NeRF:動的シーンのセマンティック3次元再構成のためのテンプレートフリーNeRF TFS-NeRF: Template-Free NeRF for Semantic 3D Reconstruction of Dynamic Scene ( http://arxiv.org/abs/2409.17459v2 ) ライセンス: Link先を確認	Sandika Biswas, Qianyi Wu, Biplab Banerjee, Hamid Rezatofighi,	(参考訳) 3次元表面再構成のためのニューラル陰関数モデルの発展にもかかわらず、任意の剛体、非剛体、変形可能なエンティティを含む動的環境の扱いは依然として困難である。多くのテンプレートベースの手法は人間を対象とするなどエンティティ固有であり、一方でこのような動的シーンに適応可能な汎用的な再構成手法は、深度やオプティカルフローのような追加入力を必要とするか、妥当な結果を得るために事前学習済みの画像特徴に依存することが多い。これらの手法は通常、フレームごとの変形を捉えるために潜在コードを使用する。対照的に、一部のテンプレートフリー手法はこれらの要件を回避し、変形可能な物体の動きを詳細に表現するために従来のLBS(Linear Blend Skinning)重みを採用するが、複雑な最適化を伴うため訓練時間が長くなる。そこで本稿では、疎な視点または単一視点のRGBビデオから取得した、さまざまなエンティティ間の相互作用を含む動的シーンを対象とし、他のLBSベースの手法よりも時間効率に優れたテンプレートフリーの3DセマンティックNeRFであるTFS-NeRFを紹介する。我々のフレームワークは、LBS予測にInvertible Neural Network(INN)を使用し、訓練プロセスを簡素化する。複数のエンティティの動きを分離し、エンティティごとのスキニング重みを最適化することにより、本手法は高精度で意味的に分離可能な形状を効率的に生成する。広範な実験により、本手法は複雑な相互作用の中でも変形可能な物体と変形しない物体の両方を高品質に再構成し、既存手法と比較して訓練効率が向上することを示した。 Despite advancements in Neural Implicit models for 3D surface reconstruction, handling dynamic environments with arbitrary rigid, non-rigid, or deformable entities remains challenging. Many template-based methods are entity-specific, focusing on humans, while generic reconstruction methods adaptable to such dynamic scenes often require additional inputs like depth or optical flow or rely on pre-trained image features for reasonable outcomes. These methods typically use latent codes to capture frame-by-frame deformations. In contrast, some template-free methods bypass these requirements and adopt traditional LBS (Linear Blend Skinning) weights for a detailed representation of deformable object motions, although they involve complex optimizations leading to lengthy training times. As a remedy, this paper introduces TFS-NeRF, a template-free 3D semantic NeRF for dynamic scenes that are captured from sparse or single-view RGB videos and feature interactions among various entities; it is more time-efficient than other LBS-based approaches. 
Our framework uses an Invertible Neural Network (INN) for LBS prediction, simplifying the training process. By disentangling the motions of multiple entities and optimizing per-entity skinning weights, our method efficiently generates accurate, semantically separable geometries. Extensive experiments demonstrate that our approach produces high-quality reconstructions of both deformable and non-deformable objects in complex interactions, with improved training efficiency compared to existing methods.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# 高速なランダム化動的デカップリング Faster Randomized Dynamical Decoupling ( http://arxiv.org/abs/2409.18369v2 ) ライセンス: Link先を確認	Changhao Yi, Leeseok Kim, Milad Marvian,	(参考訳) 本稿では、高々2つの追加パルスを用いるだけで任意の決定論的DDの性能を改善できる、ランダム化された動的デカップリング(DD)プロトコルを提案する。提案手法はパルス列を確率的に適用することで実現され、それらを組み合わせることで、システムと環境の結合強度に対して線形にスケールする誤差項を効果的に除去する。その結果、数個のパルスを用いたランダム化プロトコルが、かなり多くのパルスを必要とする決定論的DDプロトコルを上回りうることを示す。さらに、従来最適と考えられていたUhrig DDなど、システムのヒルベルト空間における誤差の低減を目的とした決定論的DDシーケンスと比較しても、ランダム化プロトコルが改善をもたらすことを証明する。性能を厳密に評価するために、高次DDプロトコルの解析に適した新しい解析手法を導入する。この手法はそれ自体独立した興味の対象となりうる。また、広く使われている決定論的プロトコルと比較して、ランダム化プロトコルに大きな利点があることを裏付ける数値シミュレーションも示す。 We present a randomized dynamical decoupling (DD) protocol that can improve the performance of any given deterministic DD by using no more than two additional pulses. Our construction is implemented by probabilistically applying sequences of pulses, which, when combined, effectively eliminate the error terms that scale linearly with the system-environment coupling strength. As a result, we show that a randomized protocol using a few pulses can outperform deterministic DD protocols that require considerably more pulses. Furthermore, we prove that the randomized protocol provides an improvement compared to deterministic DD sequences that aim to reduce the error in the system's Hilbert space, such as Uhrig DD, which had previously been regarded as optimal. To rigorously evaluate the performance, we introduce new analytical methods suitable for analyzing higher-order DD protocols that might be of independent interest. We also present numerical simulations confirming the significant advantage of using randomized protocols compared to widely used deterministic protocols.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# シングルオーディオを超えて:オーディオ大言語モデルにおけるマルチオーディオ処理の改善 Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models ( http://arxiv.org/abs/2409.18680v3 ) ライセンス: Link先を確認	Yiming Chen, Xianghu Yue, Xiaoxue Gao, Chen Zhang, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li,	(参考訳) 近年、単一の統一モデルでさまざまなオーディオタスクを同時に扱うために、多様なオーディオLLM(ALLM)が研究されている。ALLMの既存の評価は主に単一オーディオタスクに焦点を当てているが、現実のアプリケーションでは複数のオーディオストリームを同時に処理することが多い。このギャップを埋めるために、音声(speech)と音響(sound)の両方のシナリオを含む11のマルチオーディオタスクにわたる20のデータセットで構成される、初のマルチオーディオ評価(MAE)ベンチマークを提案する。MAEに関する包括的な実験により、既存のALLMは個々のオーディオ入力における主要なオーディオ要素の理解には優れているものの、マルチオーディオのシナリオの扱いには苦戦することが示された。そこで、提案する合成データに対する識別学習を用いて、複数の類似したオーディオ間のオーディオコンテキストを捉える新しいマルチオーディオLLM(MALLM)を提案する。その結果、提案するMALLMはすべてのベースラインを上回り、人手によるアノテーションを必要とせずに合成データを用いて高いデータ効率を達成することが示された。提案するMALLMは、ALLMにマルチオーディオ処理時代への扉を開き、人間の聴覚能力を機械で再現することに一歩近づける。 Various audio-LLMs (ALLMs) have been explored recently for tackling different audio tasks simultaneously using a single, unified model. While existing evaluations of ALLMs primarily focus on single-audio tasks, real-world applications often involve processing multiple audio streams simultaneously. To bridge this gap, we propose the first multi-audio evaluation (MAE) benchmark that consists of 20 datasets from 11 multi-audio tasks encompassing both speech and sound scenarios. Comprehensive experiments on MAE demonstrate that the existing ALLMs, while being powerful in comprehending primary audio elements in individual audio inputs, struggle to handle multi-audio scenarios. To this end, we propose a novel multi-audio-LLM (MALLM) to capture audio context among multiple similar audios using discriminative learning on our proposed synthetic data. The results demonstrate that the proposed MALLM outperforms all baselines and achieves high data efficiency using synthetic data without requiring human annotations. 
The proposed MALLM opens the door for ALLMs toward the multi-audio processing era and brings us closer to replicating human auditory capabilities in machines.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# 知覚圧縮機:長期シナリオにおける訓練不要なプロンプト圧縮法 Perception Compressor:A training-free prompt compression method in long context scenarios ( http://arxiv.org/abs/2409.19272v2 ) ライセンス: Link先を確認	Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng,	(参考訳) 大規模言語モデル(LLM)は、様々なシナリオにおいて例外的な能力を示す。しかし、それらは非常に冗長な情報に悩まされており、長いコンテキストシナリオにおけるキー情報の位置(入力問題に関連する)に敏感であり、性能が劣る。これらの課題に対処するために、トレーニング不要なプロンプト圧縮手法であるPerception Compressorを提案する。もっとも関連性の高いデモンストレーションを検索するための指導的質問と指示を利用する知覚検索器と、圧縮率とオープンブック比を動的に割り当てるデュアルスロープ比アロケータと、LLMを邪魔するトークンを除去しながらトークンレベルでキー情報を保持する半誘導反復圧縮とを含む。長い文脈のベンチマーク、すなわちNaturalQuestions、LongBench、MuSiQueについて広範な実験を行う。実験の結果, パーセプション圧縮機は既存手法よりも高い性能を示し, 最先端性能を実現している。 Large Language Models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and are sensitive to the position of key information (relevant to the input question) in long context scenarios, leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It includes a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator to dynamically allocate compression ratios and open-book ratios, and a semi-guided iterative compression that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experiment results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# 早起きは漏洩を捕らえる: LLMサービングシステムにおけるタイミングサイドチャネルの解明 The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems ( http://arxiv.org/abs/2409.20002v2 ) ライセンス: Link先を確認	Linke Song, Zixuan Pang, Wenhao Wang, Zihao Wang, XiaoFeng Wang, Hongbo Chen, Wei Song, Yier Jin, Dan Meng, Rui Hou,	(参考訳) 大規模言語モデル(LLM)の広範な展開により、推論性能の最適化に対する強い需要が生まれている。この目的のための今日の技術は、主にアルゴリズムとハードウェアの強化によるレイテンシの削減とスループットの向上に重点を置いており、特にマルチユーザ環境におけるプライバシー上の副作用はほとんど見過ごされている。本研究では、共有キャッシュとGPUメモリ割り当てに起因する新たなタイミングサイドチャネル群をLLMシステムにおいて初めて発見した。これらは、機密のシステムプロンプトと他のユーザが発行したプロンプトの両方を推測するために悪用されうる。これらの脆弱性は、従来のコンピューティングシステムで観測されてきたセキュリティ上の課題と共通しており、LLMサービングインフラストラクチャにおける潜在的な情報漏洩に対処する緊急の必要性を浮き彫りにしている。本稿では、LLMの推論性能を高めるために広く使われているKey-Value(KV)キャッシュとセマンティックキャッシュを標的として、LLMのデプロイメントに内在するこうしたタイミングサイドチャネルを悪用するよう設計された新たな攻撃戦略を報告する。提案手法では、タイミング測定と分類モデルを用いてキャッシュヒットを検出し、攻撃者がプライベートなプロンプトを高精度に推測できるようにする。また、キャッシュ内の共有プロンプトプレフィックスを効率よく復元するトークン単位の探索アルゴリズムを提案し、システムプロンプトや他のユーザが生成したプロンプトを窃取できることを示す。広く使われているオンラインLLMサービスに対するブラックボックステストの実験により、このようなプライバシーリスクが十分に現実的であり、重大な結果をもたらすことを実証する。我々の知見は、このような新たな脅威からLLMシステムを保護するための堅牢な緩和策の必要性を強調している。 The wide deployment of Large Language Models (LLMs) has given rise to strong demands for optimizing their inference performance. Today's techniques serving this purpose primarily focus on reducing latency and improving throughput through algorithmic and hardware enhancements, while largely overlooking their privacy side effects, particularly in a multi-user environment. In our research, for the first time, we discovered a set of new timing side channels in LLM systems, arising from shared caches and GPU memory allocations, which can be exploited to infer both confidential system prompts and those issued by other users. These vulnerabilities echo security challenges observed in traditional computing systems, highlighting an urgent need to address potential information leakage in LLM serving infrastructures. 
In this paper, we report novel attack strategies designed to exploit such timing side channels inherent in LLM deployments, specifically targeting the Key-Value (KV) cache and semantic cache widely used to enhance LLM inference performance. Our approach leverages timing measurements and classification models to detect cache hits, allowing an adversary to infer private prompts with high accuracy. We also propose a token-by-token search algorithm to efficiently recover shared prompt prefixes in the caches, showing the feasibility of stealing system prompts and those produced by peer users. Our experimental studies on black-box testing of popular online LLM services demonstrate that such privacy risks are completely realistic, with significant consequences. Our findings underscore the need for robust mitigation to protect LLM systems against such emerging threats.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# 大規模言語モデルにおけるアンラーニングとアライメントの確率論的視点 A Probabilistic Perspective on Unlearning and Alignment for Large Language Models ( http://arxiv.org/abs/2410.03523v3 ) ライセンス: Link先を確認	Yan Scholten, Stephan Günnemann, Leo Schwinn,	(参考訳) 大規模言語モデル(LLM)の包括的評価はオープンな研究課題である。既存の評価は、グリーディ復号によって生成される決定論的点推定に依存している。しかし、決定論的評価では、モデル全体の出力分布を捉えることができず、モデル機能の不正確な推定結果が得られることがわかった。これは、正確なモデル評価が不可欠であるアンラーニングやアライメントのような重要なコンテキストにおいて特に問題となる。そこで本研究では,LLMにおける最初の形式的確率的評価フレームワークを提案する。すなわち、モデルの出力分布に関する高い確率保証を持つ新しいメトリクスを導出する。私たちのメトリクスはアプリケーションに依存しないので、デプロイ前にモデル機能についてより信頼性の高い見積を行うことができます。アンラーニングに焦点を当てたケーススタディを通じて、決定論的評価は未学習の成功を誤って示すのに対し、確率論的評価は、未学習と思われる情報が全てではないとしても、これらのモデルでアクセス可能であることを示す。さらに,エントロピー最適化と適応温度スケーリングに基づく新しいアンラーニング損失を提案する。提案手法は, 点推定から出力分布の確率的評価へのシフトが, LLMの包括的評価への重要な一歩である。コードはhttps://github.com/yascho/probabilistic-unlearningで公開されている。 Comprehensive evaluation of Large Language Models (LLMs) is an open research problem. Existing evaluations rely on deterministic point estimates generated via greedy decoding. However, we find that deterministic evaluations fail to capture the whole output distribution of a model, yielding inaccurate estimations of model capabilities. This is particularly problematic in critical contexts such as unlearning and alignment, where precise model evaluations are crucial. To remedy this, we introduce the first formal probabilistic evaluation framework in LLMs. Namely, we derive novel metrics with high-probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment. Through a case study focused on unlearning, we reveal that deterministic evaluations falsely indicate successful unlearning, whereas our probabilistic evaluations demonstrate that most if not all of the supposedly unlearned information remains accessible in these models. 
Additionally, we propose a novel unlearning loss based on entropy optimization and adaptive temperature scaling, which significantly improves unlearning in probabilistic settings on recent benchmarks. Our proposed shift from point estimates to probabilistic evaluations of output distributions represents an important step toward comprehensive evaluations of LLMs. Code available at https://github.com/yascho/probabilistic-unlearning.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
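The contrast between a single greedy decode and a sampling-based evaluation can be sketched with a textbook concentration bound. The Python below is a generic illustration under stated assumptions, not the metrics derived in the paper: `toy_model_leaks` is a hypothetical stand-in for sampling one model output and checking it for supposedly unlearned information, and the one-sided Hoeffding inequality stands in for the paper's own high-probability guarantees.

```python
import math
import random


def leak_upper_bound(num_leaks, num_samples, delta=0.05):
    """One-sided Hoeffding bound: with probability at least 1 - delta,
    the true leak probability is at most the returned value."""
    empirical = num_leaks / num_samples
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * num_samples))
    return min(1.0, empirical + margin)


def toy_model_leaks(rng, leak_probability=0.03):
    """Hypothetical check: does one sampled output reveal unlearned content?"""
    return rng.random() < leak_probability


rng = random.Random(0)
num_samples = 1000
num_leaks = sum(toy_model_leaks(rng) for _ in range(num_samples))
bound = leak_upper_bound(num_leaks, num_samples)
print(f"observed leaks: {num_leaks}/{num_samples}, 95% upper bound: {bound:.3f}")
```

With a single sample the margin exceeds 1, so the bound is vacuous. That is one way to see why a lone deterministic output says little about the whole output distribution.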
# Timer-XL: 統合時系列予測のためのLong-Context Transformer Timer-XL: Long-Context Transformers for Unified Time Series Forecasting ( http://arxiv.org/abs/2410.04803v2 ) ライセンス: Link先を確認	Yong Liu, Guo Qin, Xiangdong Huang, Jianmin Wang, Mingsheng Long,	(参考訳) 我々は時系列の統一予測のための生成変換器Timer-XLを提案する。 1Dおよび2D時系列を均一に予測するために、主に1Dシーケンスの因果生成に採用された次のトークン予測を一般化し、次のトークン予測を多変量化する。提案手法は,長文生成問題として様々な予測シナリオを均一に定式化する。非定常性, 複雑な動的および相関を持つ多変量時系列, 内因性および外因性の両方を含む共変量インフォームド・コンテクストを特徴とする一変量系列の統一予測を実装する。本稿では,時系列における生成トランスフォーマーの高速化を目的としたTimeAttentionを提案する。これは,フラット化された時系列トークン(パッチ)の細粒度内および系列間依存性を効果的に把握し,時間次元と変動次元の両方に位置埋め込みを組み込むことにより,さらに強化される。 Timer-XLは、統一されたアプローチにより、挑戦的な予測ベンチマークで最先端のパフォーマンスを達成する。大規模時系列モデルとして、大規模事前訓練による顕著なモデル転送性、およびトークン長の文脈的柔軟性を示し、一対一の予測器として位置づける。 We present Timer-XL, a generative Transformer for unified time series forecasting. To uniformly predict 1D and 2D time series, we generalize next token prediction, predominantly adopted for causal generation of 1D sequences, to multivariate next token prediction. The proposed paradigm uniformly formulates various forecasting scenarios as a long-context generation problem. We opt for the generative Transformer, which can capture global-range and causal dependencies while providing contextual flexibility, to implement unified forecasting on univariate series characterized by non-stationarity, multivariate time series with complicated dynamics and correlations, and covariate-informed contexts that include both endogenous and exogenous variables. Technically, we propose a universal TimeAttention to facilitate generative Transformers on time series, which can effectively capture fine-grained intra- and inter-series dependencies of flattened time series tokens (patches) and is further strengthened by position embeddings in both temporal and variable dimensions. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach. 
As a large time series model, it demonstrates notable model transferability by large-scale pre-training, as well as contextual flexibility in token lengths, positioning it as a one-for-all forecaster.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# ディープニューラルネットワークを用いたスマートフォン画像からの偽造品の検出 Deep neural network-based detection of counterfeit products from smartphone images ( http://arxiv.org/abs/2410.05969v2 ) ライセンス: Link先を確認	Hugo Garcia-Cotte, Dorra Mellouli, Abdul Rehman, Li Wang, David G. Stork,	(参考訳) 医薬品やワクチンなどの偽造品、および高級ハンドバッグ、時計、宝飾品、衣料品、化粧品などの高級品の偽造品は、正規の製造業者や販売業者に多大な直接的収益損失をもたらすとともに、社会全体にも間接的なコストを課している。我々は、このような偽造に対抗する、世界初の純粋にコンピュータビジョンに基づくシステムを提示する。このシステムは、特別なセキュリティタグなど製品への改変や、サプライチェーン追跡の変更を必要としない。我々のディープニューラルネットワークシステムは、小売店、税関の検査場、倉庫、屋外など、自然で管理の緩い条件下で撮影された画像を用いて、最初に試験した製造業者のブランド衣料品に対して高い精度(3.06%の棄却後に99.71%)を示す。本システムは、少数の偽物と本物を用いて適切に転移学習することで、ファッションアクセサリ、香水の箱、医薬品など、他の製品カテゴリにも応用できるはずである。 Counterfeit products, such as drugs and vaccines as well as luxury items such as high-fashion handbags, watches, jewelry, garments, and cosmetics, represent significant direct losses of revenue to legitimate manufacturers and vendors, as well as indirect costs to societies at large. We present the world's first purely computer-vision-based system to combat such counterfeiting, one that does not require special security tags or other alterations to the products or modifications to supply chain tracking. Our deep neural network system shows high accuracy on branded garments from our first manufacturer tested (99.71% after 3.06% rejections) using images captured under natural, weakly controlled conditions, such as in retail stores, customs checkpoints, warehouses, and outdoors. Our system, suitably transfer-trained on a small number of fake and genuine articles, should find application in additional product categories as well, for example fashion accessories, perfume boxes, medicines, and more.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# LeanAgent: 形式的定理証明のための生涯学習 LeanAgent: Lifelong Learning for Formal Theorem Proving ( http://arxiv.org/abs/2410.06209v6 ) ライセンス: Link先を確認	Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, Anima Anandkumar,	(参考訳) 大規模言語モデル(LLM)は、Leanのような対話型証明支援系と統合することで、形式的定理証明などの数学的推論タスクで成功を収めている。既存のアプローチでは、学部レベルの数学のような特定の領域で高い性能を発揮するために、特定のデータセットでLLMを訓練または微調整する。これらの手法は、高度な数学への汎化に苦戦する。根本的な限界は、これらのアプローチが静的な領域上で動作し、数学者が複数の領域やプロジェクトを同時に、あるいは循環的に行き来しながら研究することが多いという実態を捉えられないことである。我々は、定理証明のための新しい生涯学習フレームワークであるLeanAgentを提示する。LeanAgentは、以前に学習した知識を忘れることなく、拡大し続ける数学的知識へ継続的に汎化し、性能を改善していく。LeanAgentは、数学的難易度の観点から学習の軌跡を最適化するカリキュラム学習戦略、進化する数学的知識を効率的に管理するための動的データベース、安定性と可塑性のバランスをとる漸進的訓練など、いくつかの重要な革新を導入している。LeanAgentは、23の多様なLeanリポジトリにわたって、これまで人間が証明していなかった162の定理の証明に成功しており、その多くは高度な数学に属する。静的なLLMベースラインよりも大幅に優れた性能を示し、抽象代数や代数的位相幾何学などの領域における難しい定理を証明するとともに、基礎概念から高度なトピックへと学習が明確に進展する様子を示している。さらに、主要な生涯学習指標におけるLeanAgentの優れた性能を分析する。LeanAgentは、安定性と、新しいタスクの学習によって以前に学習したタスクの性能が向上する後方転移において、卓越したスコアを達成する。これはLeanAgentの継続的な汎化能力と改善を裏付けるものであり、その優れた定理証明性能を説明している。 Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. 
LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# News Reporter: 放送T.Vニュースのための多言語LLMフレームワーク News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News ( http://arxiv.org/abs/2410.07520v2 ) ライセンス: Link先を確認	Tarun Jain, Yufei Gao, Sridhar Vanga, Karan Singla,	(参考訳) 大規模言語モデル(LLM)は、さまざまな質問に対して一貫性のある回答を提供できることから、多くの対話型チャットボットにとって急速に不可欠なツールとなっている。これらのLLMの訓練に使われるデータセットは、多くの場合、汎用的なサンプルと合成サンプルの混合であり、T.V.ニュースについて正確で検証可能な回答を提供するために必要な検証を欠いている。我々は、米国各地のさまざまなニュースチャンネルのニュース録画の書き起こしから抽出した大規模なQAペアのコレクションを収集し、共有する。得られたQAペアは、既製のLLMの微調整に用いる。我々のモデルは、複数のオープンなLLMベンチマークにおいて、同程度のサイズのベースモデルを上回る。さらに、回答の文脈付けを改善し、検証可能なニュース録画を参照先として示すためのRAG手法を提案し、統合する。 Large Language Models (LLMs) have fast become essential tools for many conversational chatbots due to their ability to provide coherent answers to varied queries. Datasets used to train these LLMs are often a mix of generic and synthetic samples, thus lacking the verification needed to provide correct and verifiable answers for T.V. News. We collect and share a large collection of QA pairs extracted from transcripts of news recordings from various news channels across the United States. The resulting QA pairs are then used to fine-tune an off-the-shelf LLM. Our model surpasses base models of similar size on several open LLM benchmarks. We further propose and integrate a RAG method to improve the contextualization of our answers and to point them to a verifiable news recording.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# 計算の関数表現モデル The Function-Representation Model of Computation ( http://arxiv.org/abs/2410.07928v3 ) ライセンス: Link先を確認	Alfredo Ibias, Hector Antona, Guillem Ramirez-Miranda, Enric Guinovart, Eduard Alarcon,	(参考訳) 認知アーキテクチャは、人工的な認知を開発する研究の最前線にある。しかし、それらはメモリとプログラムを分離した計算モデルから問題に取り組んでいる。この計算モデルには、知識検索ヒューリスティックという根本的な問題がある。本稿では、メモリとプログラムを統合した新しい計算モデルであるFunction-Representationを用いて、この問題を解決することを提案する。この計算モデルでは、汎用的なFunction-Representationを定義し、そのインスタンスを複数生成する。本稿では、数学的な定義と証明を通じて、この新しい計算モデルの可能性を探る。また、Function-Representationが実装できる関数の種類についても検討し、Function-Representationの複数のインスタンスを組織化するさまざまな方法を提示する。 Cognitive Architectures are at the forefront of research into developing artificial cognition. However, they approach the problem from a model of computation in which memory and program are separated. This model of computation poses a fundamental problem: the knowledge retrieval heuristic. In this paper we propose to solve this problem by using a novel model of computation, one where memory and program are merged: the Function-Representation. This model of computation involves defining a generic Function-Representation and creating multiple instances of it. In this paper we explore the potential of this novel model of computation through mathematical definitions and proofs. We also explore the kinds of functions a Function-Representation can implement, and present different ways to organise multiple instances of a Function-Representation.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
# Packing Analysis: 教師ありファインチューニングでは大規模なモデルやデータセットにパッキングがより適している Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning ( http://arxiv.org/abs/2410.08081v3 ) ライセンス: Link先を確認	Shuhe Wang, Guoyin Wang, Yizhong Wang, Jiwei Li, Eduard Hovy, Chen Guo,	(参考訳) パッキングは、当初は事前学習フェーズで用いられていた最適化技術で、異なる訓練シーケンスを組み合わせてモデルの最大入力長に収めることにより、ハードウェアリソースの効率を最大化する。事前学習では有効性が示されているが、教師あり微調整(SFT)の段階については、次の点に関する包括的な分析が不足している。(1)パッキングは性能を維持しつつ訓練効率を効果的に高められるか、(2)パッキング手法による微調整に適したモデルとデータセットの規模、(3)無関係または関連する訓練サンプルをパッキングすることで、モデルがコンテキストを過度に無視したり、過度に依存したりする可能性があるか。本稿では、69Kから1.2MまでのSFTデータセットと8Bから70Bまでのモデルを対象に、パディングとパッキングを用いたSFT手法を広範に比較する。これは、パッキングとパディングの利点と限界に関する初の包括的な分析であり、さまざまな訓練シナリオでパッキングを実装する際の実践的な考慮事項も提供する。我々の分析は、知識、推論、コーディングなどのさまざまなベンチマークに加え、GPTに基づく評価、時間効率、その他の微調整パラメータを対象としている。また、微調整と評価のためのコードをオープンソースとして公開し、さまざまなサイズのデータセットで微調整したチェックポイントを提供することで、パッキング手法に関する今後の研究の進展を目指す。コードは https://github.com/ShuheWang1998/Packing-Analysis?tab=readme-ov-file で公開されている。 Packing, initially utilized in the pre-training phase, is an optimization technique designed to maximize hardware resource efficiency by combining different training sequences to fit the model's maximum input length. Although it has demonstrated effectiveness during pre-training, there remains a lack of comprehensive analysis for the supervised fine-tuning (SFT) stage on the following points: (1) whether packing can effectively enhance training efficiency while maintaining performance, (2) the suitable size of the model and dataset for fine-tuning with the packing method, and (3) whether packing unrelated or related training samples might cause the model to either excessively disregard or over-rely on the context. In this paper, we perform extensive comparisons between SFT methods using padding and packing, covering SFT datasets ranging from 69K to 1.2M and models from 8B to 70B. 
This provides the first comprehensive analysis of the advantages and limitations of packing versus padding, as well as practical considerations for implementing packing in various training scenarios. Our analysis covers various benchmarks, including knowledge, reasoning, and coding, as well as GPT-based evaluations, time efficiency, and other fine-tuning parameters. We also open-source our code for fine-tuning and evaluation and provide checkpoints fine-tuned on datasets of different sizes, aiming to advance future research on packing methods. Code is available at: https://github.com/ShuheWang1998/Packing-Analysis?tab=readme-ov-file.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06
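The difference between padding and packing described in this abstract can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' code: `PAD_ID`, the function names, and the simple in-order greedy strategy are all chosen here for clarity. A real training pipeline would also need attention masks or position resets so that packed samples do not attend to each other.

```python
PAD_ID = 0  # assumed padding token id


def pad_batch(sequences, max_len):
    """Padding: one sequence per row, filled up to max_len with PAD_ID."""
    rows = []
    for seq in sequences:
        seq = seq[:max_len]
        rows.append(seq + [PAD_ID] * (max_len - len(seq)))
    return rows


def pack_batch(sequences, max_len):
    """Packing: concatenate sequences in order and start a new row
    whenever the next sequence would not fit."""
    rows, current = [], []
    for seq in sequences:
        seq = seq[:max_len]
        if len(current) + len(seq) > max_len:
            rows.append(current + [PAD_ID] * (max_len - len(current)))
            current = []
        current = current + seq
    if current:
        rows.append(current + [PAD_ID] * (max_len - len(current)))
    return rows


def pad_fraction(rows):
    """Share of positions in the batch that hold PAD_ID."""
    total = sum(len(row) for row in rows)
    return sum(token == PAD_ID for row in rows for token in row) / total


sequences = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
padded = pad_batch(sequences, max_len=6)
packed = pack_batch(sequences, max_len=6)
print(len(padded), pad_fraction(padded))
print(len(packed), pad_fraction(packed))
```

In this toy batch, padding spends 14 of 24 positions on pad tokens, while packing needs two rows and wastes 2 of 12 positions. That wasted share is the hardware-efficiency gap that packing is meant to close.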
# 適応的な対称性考慮型ベイズ戦略による冷却原子実験での5倍の精度向上 Five-fold precision enhancement in a cold atom experiment via adaptive symmetry-informed Bayesian strategies ( http://arxiv.org/abs/2410.10615v2 ) ライセンス: Link先を確認	Matt Overton, Jesús Rubio, Nathan Cooper, Daniele Baldolini, David Johnson, Janet Anders, Lucia Hackermüller,	(参考訳) ベイズ的手法は、デバイス性能の向上とデータ収集の高速化を約束する。我々は、量子技術の実験において、対称性を考慮した(symmetry-informed)損失関数を利用し、原子数推定のための適応的ベイズ測定戦略を実証する。標準的な最適化されていない戦略と比較して、本手法は原子数推定値の相対分散を5分の1に低減する。言い換えれば、従来必要だったデータ点の3分の1で目標精度を達成する。対称性を考慮した戦略を適用できる任意の量について、最適な推定量と誤差の一般式を与え、量子コンピューティング、通信、計測学、さらに広く量子技術分野でこれらの戦略を適用しやすくする。 Bayesian methods promise enhanced device performance and accelerated data collection. We demonstrate an adaptive Bayesian measurement strategy for atom number estimation in a quantum technology experiment, utilising a symmetry-informed loss function. Compared to a standard unoptimised strategy, our method yields a five-fold reduction in the fractional variance of the atom number estimate. Equivalently, it achieves the target precision with a third of the data points previously required. We provide general expressions for the optimal estimator and error for any quantity amenable to symmetry-informed strategies, facilitating the application of these strategies in quantum computing, communication, metrology, and the wider quantum technology sector.	翻訳日:2024-11-07 19:39:48 公開日:2024-11-06

# DeNetDM: ネットワーク深さ変調によるデバイアス

DeNetDM: Debiasing by Network Depth Modulation ( http://arxiv.org/abs/2403.19863v2 )

ライセンス: Link先を確認

Silpa Vadakkeeveetil Sreelatha, Adarsh Kappiyath, Abhra Chaudhuri, Anjan Dutta,

(参考訳) ニューラルネットワークをバイアスのあるデータセットで訓練すると、意図せず疑似相関(spurious correlation)を学習する傾向があり、強力な汎化と頑健性の達成を難しくする。このようなバイアスに対処する現在のアプローチは、一般に、バイアスアノテーションの利用、疑似バイアスラベルに基づく再重み付け、またはデータ拡張によるバイアス相反(bias-conflicting)データ点の多様性の向上を伴う。我々が提案するDeNetDMは、異なる情報の獲得を課されたとき、浅いニューラルネットワークは中核的な属性の学習を優先し、より深いネットワークはバイアスを強調するという観察に基づく、新しいデバイアス手法である。Product of Expertsに由来する訓練パラダイムを用いて、深いアーキテクチャと浅いアーキテクチャによりバイアスのあるブランチとデバイアスされたブランチの両方を作成し、その後、知識蒸留によって目的のデバイアスされたモデルを生成する。広範な実験と分析により、我々のアプローチは現在のデバイアス手法を上回り、合成データと実世界データの両方を含む3つのデータセットで約5%の顕著な改善を達成することを示す。注目すべきことに、DeNetDMはバイアスラベルやバイアスの種類に関するアノテーションを必要とせずにこれを達成し、それでも教師ありの手法と同等の性能を発揮する。さらに、本手法はデータ内のバイアス相反データ点の多様性を効果的に活用して従来手法を上回り、そうしたデータ点の多様性を高めるための明示的なデータ拡張ベースの手法を不要にする。ソースコードは採択時に公開される予定である。

When neural networks are trained on biased datasets, they tend to inadvertently learn spurious correlations, leading to challenges in achieving strong generalization and robustness. Current approaches to address such biases typically involve utilizing bias annotations, reweighting based on pseudo-bias labels, or enhancing diversity within bias-conflicting data points through augmentation techniques. We introduce DeNetDM, a novel debiasing method based on the observation that shallow neural networks prioritize learning core attributes, while deeper ones emphasize biases when tasked with acquiring distinct information. Using a training paradigm derived from Product of Experts, we create both biased and debiased branches with deep and shallow architectures and then distill knowledge to produce the target debiased model. Extensive experiments and analyses demonstrate that our approach outperforms current debiasing techniques, achieving a notable improvement of around 5% in three datasets, encompassing both synthetic and real-world data. Remarkably, DeNetDM accomplishes this without requiring annotations pertaining to bias labels or bias types, while still delivering performance on par with supervised counterparts. Furthermore, our approach effectively harnesses the diversity of bias-conflicting points within the data, surpassing previous methods and obviating the need for explicit augmentation-based methods to enhance the diversity of such bias-conflicting points. The source code will be available upon acceptance.

翻訳日:2024-11-09 03:37:09 公開日:2024-11-06
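The Product of Experts combination mentioned in the DeNetDM abstract can be sketched in a few lines of Python. This shows only the generic PoE step, under stated assumptions: the logits are made-up numbers, the function names are illustrative, and the actual DeNetDM training objective and distillation stage are not specified in the abstract and are not reproduced here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def product_of_experts(logits_deep, logits_shallow):
    """Multiply the two experts' class distributions and renormalise."""
    joint = [a + b for a, b in zip(log_softmax(logits_deep), log_softmax(logits_shallow))]
    return [math.exp(x) for x in log_softmax(joint)]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under the joint prediction."""
    return -math.log(probs[label])


deep_logits = [2.0, 0.0, -1.0]     # hypothetical output of the deep branch
shallow_logits = [0.5, 1.5, 0.0]   # hypothetical output of the shallow branch
probs = product_of_experts(deep_logits, shallow_logits)
loss = cross_entropy(probs, label=1)
print(probs, loss)
```

Training both branches through the joint loss means each expert only needs to explain what the other does not, which is the usual motivation for using PoE to separate bias from core attributes.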

# ファインライン:ダウンストリーム能力分析による大規模言語モデルの事前学習

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis ( http://arxiv.org/abs/2404.01204v2 )

ライセンス: Link先を確認

Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Jie Fu, Ge Zhang,

(参考訳) 最終的なモデル性能を反映する初期段階の指標を明らかにすることは、大規模事前学習における中核的な原則の一つである。既存のスケーリング則は、事前学習の損失と訓練FLOPsとの間のべき乗則の相関を示しており、これは大規模言語モデルの現在の訓練状態を示す重要な指標となっている。しかし、この原則は訓練データに対するモデルの圧縮特性のみに着目しており、下流タスクにおける能力向上とは一致しない。いくつかの後続研究は、スケーリング則をより複雑な指標(ハイパーパラメータなど)に拡張しようとしたが、事前学習中のさまざまな能力間の動的な違いについての包括的な分析はなお欠けていた。これらの限界に対処するため、本稿では、事前学習のさまざまな中間チェックポイントにおけるモデル能力を包括的に比較する。この分析を通じて、特定の下流指標が、最大670億パラメータまでの異なるサイズのモデルにわたって類似した訓練ダイナミクスを示すことを確認する。中核となる発見に加えて、AmberとOpenLLaMAを再現し、その中間チェックポイントを公開した。この取り組みは、研究コミュニティに貴重なリソースを提供し、オープンソースの研究者によるLLM事前学習の検証と探究を容易にする。さらに、異なるモデルと能力の性能比較や、異なる訓練フェーズにおける主要指標に関する指針など、経験的な要約も提供する。これらの知見に基づき、最適化の状態を評価するためのより使いやすい戦略を提示し、安定した事前学習プロセスを確立するための指針を与える。

Uncovering early-stage metrics that reflect final model performance is one core principle for large-scale pretraining. The existing scaling law demonstrates the power-law correlation between pretraining loss and training flops, which serves as an important indicator of the current training state for large language models. However, this principle only focuses on the model's compression properties on the training data, resulting in an inconsistency with the ability improvements on the downstream tasks. Some follow-up works attempted to extend the scaling-law to more complex metrics (such as hyperparameters), but still lacked a comprehensive analysis of the dynamic differences among various capabilities during pretraining. To address the aforementioned limitations, this paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints. Through this analysis, we confirm that specific downstream metrics exhibit similar training dynamics across models of different sizes, up to 67 billion parameters. In addition to our core findings, we've reproduced Amber and OpenLLaMA, releasing their intermediate checkpoints. This initiative offers valuable resources to the research community and facilitates the verification and exploration of LLM pretraining by open-source researchers. Besides, we provide empirical summaries, including performance comparisons of different models and capabilities, and tuition of key metrics for different training phases. Based on these findings, we provide a more user-friendly strategy for evaluating the optimization state, offering guidance for establishing a stable pretraining process.

翻訳日:2024-11-09 03:37:09 公開日:2024-11-06

# 知覚から信念へ:大規模言語モデルにおける心の理論の前駆的推論の探究

Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models ( http://arxiv.org/abs/2407.06004v3 )

ライセンス: Link先を確認

Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim,

(参考訳) 人間は心の理論(ToM)を自然に開発するが、他者の精神状態や信念を理解する能力は、単純なToMベンチマークでは性能が劣る。我々は、人間のToM前駆体$-$perception inferenceと知覚-to-belief inference$-$in LLMsを評価することで、LLMのToM能力に対する理解を深めることができると仮定する。本稿では2つのデータセット,Percept-ToMi と Percept-FANToM を導入し,ToMi と FANToM に対する文字の認識をアノテートすることで,LLM におけるこれらのToM の前駆的推論を評価する。 8種類のLLMを評価した結果, モデルが知覚的推論において良好に機能し, 知覚的信頼的推論(例えば, 抑制的制御の欠如)の能力に限界があることが判明した。これらの結果に基づいて,LLMの強い知覚推論能力を活用しつつ,限られた知覚と信頼の推論を補完する新しいToM手法であるPercepToMを提案する。実験結果から,PercepToM は LLM の性能を著しく向上させることが明らかとなった。

While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors (perception inference and perception-to-belief inference) in LLMs. We introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, respectively. Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference. Experimental results demonstrate that PercepToM significantly enhances LLMs' performance, especially in false belief scenarios.

翻訳日:2024-11-08 23:13:33 公開日:2024-11-06

# Dual-Inference Large Language Modelを用いた解釈可能な微分診断

Interpretable Differential Diagnosis with Dual-Inference Large Language Models ( http://arxiv.org/abs/2407.07330v2 )

ライセンス: Link先を確認

Shuang Zhou, Mingquan Lin, Sirui Ding, Jiashuo Wang, Genevieve B. Melton, James Zou, Rui Zhang,

(参考訳) DDx(Automatic differential diagnosis)は、患者の症状記述に基づいて、潜在的な疾患のリストをディファレンシャルとして生成する重要な医療課題である。実際には、これらの差分診断を解釈することは大きな価値をもたらすが、未発見のままである。大規模言語モデル (LLM) の強力な機能を考えると, DDx の解釈に LLM を用いて検討した。具体的には, 570 個の臨床ノートに専門家由来の解釈を用いた最初の DDx データセットをキュレートした。さらに, DDx 解釈のために LLM が双方向の推論(症状から診断まで, 逆も含む)を可能にする新しいフレームワークである Dual-Inf を提案する。人および自動評価は, 4基LLMにおける差分予測および解法の有効性を検証した。さらに、Dual-Infは解釈エラーを減らし、稀な疾患の説明を約束する。我々の知る限りでは、DDx説明のためにLLMをカスタマイズし、その解釈性能を総合的に評価する最初の作品である。本研究は,DDxの解釈において重要なギャップを埋め,臨床的意思決定を促進するものである。

Automatic differential diagnosis (DDx) is an essential medical task that generates a list of potential diseases as differentials based on patient symptom descriptions. In practice, interpreting these differential diagnoses yields significant value but remains under-explored. Given the powerful capabilities of large language models (LLMs), we investigated using LLMs for interpretable DDx. Specifically, we curated the first DDx dataset with expert-derived interpretation on 570 clinical notes. Besides, we proposed Dual-Inf, a novel framework that enabled LLMs to conduct bidirectional inference (i.e., from symptoms to diagnoses and vice versa) for DDx interpretation. Both human and automated evaluation validated its efficacy in predicting and elucidating differentials across four base LLMs. In addition, Dual-Inf could reduce interpretation errors and hold promise for rare disease explanations. To the best of our knowledge, it is the first work that customizes LLMs for DDx explanation and comprehensively evaluates their interpretation performance. Overall, our study bridges a critical gap in DDx interpretation and enhances clinical decision-making.

翻訳日:2024-11-08 22:40:08 公開日:2024-11-06

# 選択的G-双スペクトルとその逆変換:G-不変ネットワークへの応用

The Selective G-Bispectrum and its Inversion: Applications to G-Invariant Networks ( http://arxiv.org/abs/2407.07655v2 )

ライセンス: Link先を確認

Simon Mataigne, Johan Mathe, Sophia Sanborn, Christopher Hillar, Nina Miolane,

(参考訳) 信号処理と深層学習において重要な問題は、タスクに関係のないニュアンス要因に対して「textit{invariance}」を達成することである。これらの因子の多くは群$G$(例えば回転、変換、スケーリング)の作用として記述できるので、メソッドは$G$不変であることが望まれる。 G$-Bispectrumは、与えられた信号のすべての特性をグループアクションまで抽出する。その結果、$G$-Bispectrumは、プール機構に似た$G$-invariance\textemdashの計算プリミティブとしてディープニューラルネットワークアーキテクチャに組み込まれている。しかしながら、$G$-Bispectrum ($\mathcal{O}(|G|^2)$, with $|G|$ の計算コストは広く採用されている。ここでは、$G$-Bispectrum計算は、$\mathcal{O}(|G|)$ complexity で \textit{selective $G$-Bispectrum} に還元できる冗長性を含むことを示す。我々は、選択的な$G$-Bispectrumの数学的特性を証明し、ニューラルネットワークへの統合が従来のアプローチと比較して精度と堅牢性を向上し、フルの$G$-Bispectrumと比較してかなりのスピードアップを享受することを示した。

An important problem in signal processing and deep learning is to achieve invariance to nuisance factors not relevant for the task. Since many of these factors are describable as the action of a group $G$ (e.g. rotations, translations, scalings), we want methods to be $G$-invariant. The $G$-Bispectrum extracts every characteristic of a given signal up to group action: for example, the shape of an object in an image, but not its orientation. Consequently, the $G$-Bispectrum has been incorporated into deep neural network architectures as a computational primitive for $G$-invariance, akin to a pooling mechanism, but with greater selectivity and robustness. However, the computational cost of the $G$-Bispectrum ($\mathcal{O}(|G|^2)$, with $|G|$ the size of the group) has limited its widespread adoption. Here, we show that the $G$-Bispectrum computation contains redundancies that can be reduced into a selective $G$-Bispectrum with $\mathcal{O}(|G|)$ complexity. We prove desirable mathematical properties of the selective $G$-Bispectrum and demonstrate how its integration in neural networks enhances accuracy and robustness compared to traditional approaches, while enjoying considerable speedups compared to the full $G$-Bispectrum.

翻訳日:2024-11-08 22:40:08 公開日:2024-11-06
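
To make the invariance property in the abstract above concrete, here is a small sketch for the simplest case, the cyclic group of shifts acting on a 1D signal. The full bispectrum uses the standard definition B(k1, k2) = F(k1) F(k2) conj(F(k1 + k2)) and has n^2 coefficients. The `selected_bispectrum` function keeps only n + 1 of them. That particular subset is an assumption chosen for illustration and is not claimed to be the exact selection used in the paper.

```python
import cmath

def dft(signal):
    """Plain O(n^2) discrete Fourier transform, enough for a small demo."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def full_bispectrum(signal):
    """All n*n coefficients B[k1][k2] = F[k1] * F[k2] * conj(F[(k1 + k2) % n])."""
    f = dft(signal)
    n = len(f)
    return [[f[k1] * f[k2] * f[(k1 + k2) % n].conjugate() for k2 in range(n)]
            for k1 in range(n)]

def selected_bispectrum(signal):
    """Illustrative O(n) subset: B[0][0] followed by the row k1 = 1."""
    f = dft(signal)
    n = len(f)
    return [f[0] * f[0] * f[0].conjugate()] + [
        f[1] * f[k] * f[(1 + k) % n].conjugate() for k in range(n)]

def cyclic_shift(signal, s):
    """Group action: rotate the signal by s positions."""
    s %= len(signal)
    return signal[-s:] + signal[:-s]

x = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5]
shifted = cyclic_shift(x, 2)
# Largest change in any kept coefficient after shifting the input.
max_diff = max(abs(u - v) for u, v in zip(selected_bispectrum(x), selected_bispectrum(shifted)))
```

A shift multiplies F(k) by a phase that cancels in the triple product, so `max_diff` stays at floating-point noise level while the raw Fourier coefficients of `x` and `shifted` differ.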

# 多視点逆学習による自己教師付き3Dポイントクラウドコンプリート

Self-supervised 3D Point Cloud Completion via Multi-view Adversarial Learning ( http://arxiv.org/abs/2407.09786v2 )

ライセンス: Link先を確認

Lintai Wu, Xianjing Cheng, Yong Xu, Huanqiang Zeng, Junhui Hou,

(参考訳) 現実のシナリオでは、スキャンされた点雲はしばしば閉塞問題のために不完全である。自己監督点雲完備化の課題は、完全な地底の真実を監督することなく、これらの不完全な物体の欠落した領域を再構築することである。現在の自己監督法は、監視のために部分観測の複数の視点に依存するか、または与えられた部分点雲から特定され、利用することができる固有の幾何学的類似性を見渡すかのいずれかである。本稿では,オブジェクトレベルとカテゴリ固有の幾何学的類似性を効果的に活用するフレームワークであるMAL-SPCを提案する。私たちのMAL-SPCは3Dの完全な監視を一切必要とせず、各オブジェクトに1つの部分点クラウドを必要とするだけです。具体的には、まず、部分入力と予測形状との間の類似した位置と曲率パターンを検索し、これらの類似性を活用して再構成結果の密度化と精査を行うパターン検索ネットワークを提案する。さらに、再構成された完全形状を多視点深度マップに描画し、カテゴリ固有の一視点深度画像から対象形状の幾何学を学習するための対角学習モジュールを設計する。異方性レンダリングを実現するために,レンダリング画像の品質向上を目的とした密度認識半径推定アルゴリズムを設計する。私たちのMAL-SPCは、現在の最先端のメソッドと比較して、最高の結果をもたらします。

In real-world scenarios, scanned point clouds are often incomplete due to occlusion issues. The task of self-supervised point cloud completion involves reconstructing missing regions of these incomplete objects without the supervision of complete ground truth. Current self-supervised methods either rely on multiple views of partial observations for supervision or overlook the intrinsic geometric similarity that can be identified and utilized from the given partial point clouds. In this paper, we propose MAL-SPC, a framework that effectively leverages both object-level and category-specific geometric similarities to complete missing structures. Our MAL-SPC does not require any 3D complete supervision and only necessitates a single partial point cloud for each object. Specifically, we first introduce a Pattern Retrieval Network to retrieve similar position and curvature patterns between the partial input and the predicted shape, then leverage these similarities to densify and refine the reconstructed results. Additionally, we render the reconstructed complete shape into multi-view depth maps and design an adversarial learning module to learn the geometry of the target shape from category-specific single-view depth images. To achieve anisotropic rendering, we design a density-aware radius estimation algorithm to improve the quality of the rendered images. Our MAL-SPC yields the best results compared to current state-of-the-art methods. We will make the source code publicly available at https://github.com/ltwu6/malspc

翻訳日:2024-11-08 21:54:45 公開日:2024-11-06

# CIBench: コードインタープリタプラグインによるLLMの評価

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin ( http://arxiv.org/abs/2407.10499v3 )

ライセンス: Link先を確認

Chuyu Zhang, Songyang Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen,

(参考訳) 複雑な問題を解決するために外部ツールを使用するLCMベースのエージェントは大きな進歩を遂げているが、それらの能力のベンチマークは困難であり、それによってそれらの制限を明確に理解するのを妨げる。本稿では,データサイエンスタスクにコードインタプリタを利用するLLMの能力を総合的に評価する,CIBenchという対話型評価フレームワークを提案する。評価フレームワークは評価データセットと2つの評価モードを含む。評価データセットは,LLM-人的協調手法を用いて構築され,連続的かつ対話的なIPythonセッションを活用することによって,実際のワークフローをシミュレートする。 2つの評価モードは、LLMの人的援助なしでの能力を評価する。コードインタプリタの利用において, CIBench 上で 24 個の LLM の能力を解析し, 将来の LLM に対する貴重な洞察を提供するため, 広範囲にわたる実験を行った。

While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation framework includes an evaluation dataset and two evaluation modes. The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions. The two evaluation modes assess LLMs' ability with and without human assistance. We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.

翻訳日:2024-11-08 21:32:38 公開日:2024-11-06

# 列車なし、全利得:自己監督のグラディエントは深い凍結表現を改善する

No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations ( http://arxiv.org/abs/2407.10964v2 )

ライセンス: Link先を確認

Walter Simoncini, Spyros Gidaris, Andrei Bursuc, Yuki M. Asano,

(参考訳) 本稿では、自己教師付き勾配を利用してトランスフォーマーエンコーダの特徴を高める方法であるUNsupervised GradIentsの機能であるFUNGIを紹介する。事前訓練されたモデルがあれば、まず入力毎に様々な自己教師対象からの勾配を計算します。これらの勾配は低次元に投影され、その後モデルの出力埋め込みと連結される。得られた特徴は、視覚からの11データセット、自然言語処理からの5データセット、オーディオからの2データセットの k-nearest 隣の分類に基づいて評価される。さまざまなサイズと事前トレーニング戦略にまたがるバックボーン全体において、FUNGI機能は埋め込みよりも一貫したパフォーマンス改善を提供する。また,FUNGI機能の使用は,線形分類,クラスタリング,画像検索に有効であり,事前訓練されたモデルの検索に基づくコンテキスト内シーン理解能力,例えば意味的セグメンテーションにおいて,DINOを+17%向上させるなどを大幅に向上することを示した。

This paper introduces FUNGI, Features from UNsupervised GradIents, a method to enhance the features of transformer encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also show that using FUNGI features can benefit linear classification, clustering and image retrieval, and that they significantly improve the retrieval-based in-context scene understanding abilities of pretrained models, for example improving upon DINO by +17% for semantic segmentation - without any training.

翻訳日:2024-11-08 21:32:38 公開日:2024-11-06

# 自己監督型音響マスクオートエンコーダを用いたユニバーサル音源分離

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder ( http://arxiv.org/abs/2407.11745v2 )

ライセンス: Link先を確認

Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang,

(参考訳) ユニバーサルサウンド分離(Universal Sound separation, USS)は、任意の音源の混合物を分離するタスクである。通常、普遍的な分離モデルは、ラベル付きデータを使用して、監督された方法でスクラッチから訓練される。自己教師付き学習(SSL)は、ラベルのないデータを活用してタスクに依存しない表現を得る、新たなディープラーニングアプローチである。本稿では,音声マスク付きオートエンコーダ(A-MAE)の自己教師付き事前学習モデルについて,その分離性能を高めるため,普遍的な音源分離システムに統合することを提案する。 A-MAEのパラメータを微調整中に凍結または更新するSSL埋め込みを利用するための2つの戦略を採用している。 SSL埋め込みは、短時間フーリエ変換(STFT)と結合され、分離モデルの入力機能として機能する。提案手法をAudioSetデータセット上で評価した結果,提案手法は最先端のResUNetベースUSSモデルの分離性能を向上させることができた。

Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its separation performance. We employ two strategies to utilize SSL embeddings: freezing or updating the parameters of A-MAE during fine-tuning. The SSL embeddings are concatenated with the short-time Fourier transform (STFT) to serve as input features for the separation model. We evaluate our methods on the AudioSet dataset, and the experimental results indicate that the proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.

翻訳日:2024-11-08 20:59:00 公開日:2024-11-06

# GPT-4Vは放射線学のレポートをまだ生成できない

GPT-4V Cannot Generate Radiology Reports Yet ( http://arxiv.org/abs/2407.12176v2 )

ライセンス: Link先を確認

Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan,

(参考訳) GPT-4Vの強いマルチモーダル能力は、放射線学レポート作成の自動化に関心を喚起するが、徹底的な評価は得られていない。本研究では,2つの胸部X線レポートデータセット(MIMIC-CXRとIU X-Ray)について,GPT-4Vの系統的評価を行った。我々は, GPT-4V を用いた報告を異なるプロンプト戦略により直接生成し, 語彙指標と臨床効果指標の両方で異常を生じさせることを試みた。低パフォーマンスを理解するために、タスクを2つのステップに分解します。 1)画像から医療条件ラベルを予測するための医用画像推論ステップ 2)(地中)条件から報告を生成するための報告合成ステップ。画像推論におけるGPT-4Vの性能は、異なるプロンプト間で一貫して低いことを示す。実際、モデル予測ラベルの分布は、画像上にどの基底条件が存在するかに関わらず一定であり、モデルが胸部X線を有意に解釈していないことを示唆している。レポート合成における基底条件が与えられたとしても、その生成した報告は微調整されたLLaMA-2よりも正確で自然音の少ないものである。また,GPT-4Vを放射線学のワークフローで用いる可能性についても疑念を呈していた。

GPT-4V's purported strong multimodal abilities raise interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.

翻訳日:2024-11-08 20:48:00 公開日:2024-11-06

# GPT-4Vは放射線学のレポートをまだ生成できない

GPT-4V Cannot Generate Radiology Reports Yet ( http://arxiv.org/abs/2407.12176v3 )

ライセンス: Link先を確認

Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan,

翻訳日:2024-11-08 20:48:00 公開日:2024-11-06

# 距離・予測・地図のない視覚空間ナビゲーション

Visuospatial navigation without distance, prediction, or maps ( http://arxiv.org/abs/2407.13535v2 )

ライセンス: Link先を確認

Patrick Govoni, Pawel Romanczuk,

(参考訳) ナビゲーションは、少なくとも2つの部分的に解離可能な、同時に開発された脳のシステムによって制御される。認知地図は、生物がその位置、軸受、環境特徴間の距離を知らせ、ショートカットを可能にする。一方、応答に基づくナビゲーションは、知覚作用対を経路に結合するプロセスは、不正確で柔軟性に欠け、最終的に地図ベースの表現を保存していると見なされる。このように、ナビゲーションモデルは、応答に基づく戦略を無視しながら、予測制御と距離知覚によって構築されたトップダウンの地図の優位性を仮定する傾向にある。ここでは、従来の視覚ナビゲーションタスクにおける最小限のフィードフォワードフレームワークの有効性を示す。我々のエージェントは、直接視覚を移動に翻訳し、オープンフィールドの隠れた目標に向かって移動します。視覚的距離は目標への直接的軌跡を可能にするが、2つの異なるアルゴリズムは視覚的角度だけで頑健にナビゲートするように開発されている。それぞれに独自の文脈的トレードオフが与えられ、またげっ歯類、昆虫、魚、精子の細胞で観察される運動行動と一致し、反応に基づく戦略の広範な重要性が示唆される。計算コストの高いトップダウン表現へのオンラインアクセスを前提とせず、ボトムアップからのナビゲーションのさらなる研究を提唱する。

Navigation is controlled by at least two partially dissociable, concurrently developed systems in the brain. The cognitive map informs an organism of its location, bearing, and distances between environmental features, enabling shortcuts. Response-based navigation, on the other hand, the process of combining percept-action pairs into routes, is regarded as inaccurate and inflexible, ultimately subserving map-based representation. As such, navigation models tend to assume the primacy of maps, top-down constructed via predictive control and distance perception, while neglecting response-based strategies. Here we show the sufficiency of a minimal feedforward framework in a classic visual navigation task. Our agents, directly translating visual perception to movement, navigate to a hidden goal in an open field, an environment often assumed to require a map-based representation. While visual distance enables direct trajectories to the goal, two distinct algorithms develop to robustly navigate using visual angles alone. Each of the three confers unique contextual tradeoffs and aligns with movement behavior observed in rodents, insects, fish, and sperm cells, suggesting the widespread significance of response-based strategies. We advocate further study of navigation from the bottom up without assuming online access to computationally expensive top-down representations, which may better explain behavior under energetic or attentional constraints.

翻訳日:2024-11-08 20:14:30 公開日:2024-11-06

# 多様体位相学習のためのユーレアン表現における永続ド・ラム=ホッジ・ラプラシアン

Persistent de Rham-Hodge Laplacians in Eulerian representation for manifold topological learning ( http://arxiv.org/abs/2408.00220v2 )

ライセンス: Link先を確認

Zhe Su, Yiying Tong, Guo-Wei Wei,

(参考訳) 近年、トポロジカルデータ分析はデータサイエンスとエンジニアリングのトレンドとなっている。しかし、トポロジカルデータ解析の鍵となる技術、すなわち永続ホモロジーは、多様体上のデータに対して直接作用しない点クラウドデータ上で定義される。初期の進化的ド・ラム=ホッジ理論は多様体に関するデータを扱うが、ラグランジュ表現における多様体の補間による数値的な矛盾のため、機械学習の応用には不都合である。本稿では, 多様体位相学習の略として, 永続的ド・ラム・ホッジ・ラプラシアン, または持続的ホッジ・ラプラシアン(PHL)を導入する。我々のPHLは、多スケール多様体上の数値的不整合を回避し、構造パーバーするカルト格子を通してユーレアン表現内に構築される。多様体トポロジ学習を容易にするために,多様体や体積データ上のデータに対する持続的ホッジラプラシアン学習アルゴリズムを提案する。提案した多様体トポロジカル学習モデルの原理的応用として、2つのベンチマークデータセットによるタンパク質-リガンド結合親和性の予測を考察する。提案手法のパワーと将来性を明らかにする数値実験を行った。

Recently, topological data analysis has become a trending topic in data science and engineering. However, the key technique of topological data analysis, i.e., persistent homology, is defined on point cloud data, which does not work directly for data on manifolds. Although earlier evolutionary de Rham-Hodge theory deals with data on manifolds, it is inconvenient for machine learning applications because of the numerical inconsistency caused by remeshing the involved manifolds in the Lagrangian representation. In this work, we introduce the persistent de Rham-Hodge Laplacian, or persistent Hodge Laplacian (PHL) as an abbreviation, for manifold topological learning. Our PHLs are constructed in the Eulerian representation via structure-preserving Cartesian grids, avoiding the numerical inconsistency over the multiscale manifolds. To facilitate the manifold topological learning, we propose a persistent Hodge Laplacian learning algorithm for data on manifolds or volumetric data. As a proof-of-principle application of the proposed manifold topological learning model, we consider the prediction of protein-ligand binding affinities with two benchmark datasets. Our numerical experiments highlight the power and promise of the proposed method.

翻訳日:2024-11-08 20:01:00 公開日:2024-11-06

# AIアライメントにおける嗜好を超えて

Beyond Preferences in AI Alignment ( http://arxiv.org/abs/2408.16984v2 )

ライセンス: Link先を確認

Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton,

(参考訳) AIアライメントの主流の実践は、(1)嗜好が人間の価値観の適切な表現であること、(2)人間の合理性は嗜好の満足度を最大化すること、(3)AIシステムは1人以上の人の嗜好と整合して、我々の価値観に従って安全に行動することを保証するべきであることを前提としている。暗黙的に従うか、明示的に支持されるかにかかわらず、これらのコミットメントは、私たちがAIアライメントに対する優先的なアプローチと呼ぶものを構成する。本稿では,さらなる研究に欠かせない概念的・技術的選択肢を記述し,優先主義的アプローチを特徴付け,挑戦する。本稿はまず,有理選択理論の限界を記述的モデルとして調査し,人的価値の厚い意味的内容の獲得に優先権が如何に失敗するか,実用的表現がそれらの価値の不可避性を如何に無視するかを説明する。次に、我々は、人間とAIに対する期待されたユーティリティ理論(EUT)の規範性を批判し、合理的エージェントがEUTに準拠すべきでないことの議論を引き合いに出し、EUTがどの規範的に受け入れられるかについて沈黙しているかを強調した。最後に、これらの制限がAIアライメントの目標の再フレーミングを動機付けていると論じる: 人間のユーザ、開発者、あるいは人間性に富んだ大きな好みに合わせる代わりに、AIシステムは、汎用アシスタントの役割など、彼らの社会的役割に適する規範的な標準に適合すべきである。さらに、これらの標準は、関連するすべてのステークホルダーによって交渉され、合意されるべきです。この代替的なアライメントの概念では、AIシステムの多種多様さは、複数の値と分岐した値に関わらず、相互利益を促進し、害を制限する規範的な標準と整合して、多様な目的を達成することができる。

The dominant practice of AI alignment assumes (1) that preferences are an adequate representation of human values, (2) that human rationality can be understood in terms of maximizing the satisfaction of preferences, and (3) that AI systems should be aligned with the preferences of one or more humans to ensure that they behave safely and in accordance with our values. Whether implicitly followed or explicitly endorsed, these commitments constitute what we term a preferentist approach to AI alignment. In this paper, we characterize and challenge the preferentist approach, describing conceptual and technical alternatives that are ripe for further research. We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values. We then critique the normativity of expected utility theory (EUT) for humans and AI, drawing upon arguments showing how rational agents need not comply with EUT, while highlighting how EUT is silent on which preferences are normatively acceptable. Finally, we argue that these limitations motivate a reframing of the targets of AI alignment: Instead of alignment with the preferences of a human user, developer, or humanity-writ-large, AI systems should be aligned with normative standards appropriate to their social roles, such as the role of a general-purpose assistant. Furthermore, these standards should be negotiated and agreed upon by all relevant stakeholders. On this alternative conception of alignment, a multiplicity of AI systems will be able to serve diverse ends, aligned with normative standards that promote mutual benefit and limit harm despite our plural and divergent values.

翻訳日:2024-11-08 19:50:01 公開日:2024-11-06

# 言語モデルにおける文脈認識の嗜好モデルの改善

Improving Context-Aware Preference Modeling for Language Models ( http://arxiv.org/abs/2407.14916v2 )

ライセンス: Link先を確認

Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni,

(参考訳) ペアの選好から言語モデルを微調整することは極めて効果的であることが証明されているが、自然言語の未特定の性質は重要な課題を呈している。直接の嗜好フィードバックは解釈不能であり、多次元の基準が適用可能な場所を提供するのが困難であり、不完全な指示に基づくものであるか、様々なプリンシパルによって提供されるため、しばしば矛盾する。これらの課題に対処するために、まず、コンテキストを選択し、選択したコンテキストに対して好みを評価する2段階の選好モデリング手法を検討する。これら2つのステップに従って報酬モデリング誤差を分解し、文脈固有の嗜好に加えて文脈を監督することは、モデルと多様な人間の嗜好を整合させるための実行可能なアプローチである可能性を示唆している。これを実行するためには、コンテキスト固有の嗜好を評価するモデルの能力が不可欠である。この目的のために、文脈条件付き嗜好データセットと、文脈固有の嗜好を評価する言語モデルの有効性を調査する伴奏実験をコントリビュートする。我々は(1)既存の嗜好モデルの利点を示すためにデータセットを使用し、(2)テストデータセット上でのGPT-4およびLlama 370Bを超える文脈特異的なパフォーマンスを持つ文脈対応報酬モデルを作成し、(3)文脈対応嗜好モデルの価値を調査する。

While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B on tested datasets, and (3) investigate the value of context-aware preference modeling.

翻訳日:2024-11-08 19:27:32 公開日:2024-11-06

# マルチスポットホログラフィーツイーザーのフィードバック強度等化アルゴリズム

Feedback Intensity Equalization Algorithm for Multi-Spots Holographic Tweezer ( http://arxiv.org/abs/2407.17049v2 )

ライセンス: Link先を確認

Shaoxiong Wang, Yifei Hu, Yaoting Zhou, Peng Lan, Heng Shen, Zhongxiao Xu,

(参考訳) 高度調整性のおかげで、ホログラフィック・ツイーザーアレイは任意のジオメトリ原子配列を作るのに最適な選択であることが証明された。ホログラフィックトウィーザーアレイ実験では、通常、空間光変調器(SLM)によって生成された光トウィーザーが静的トウィーザーアレイとして使用される。交流スタークシフト効果により、トラップの強度差は異なる光シフトを引き起こす。したがって、強度等化の最適化は、単原子からなる多体系において非常に重要である。本稿では,強度等化アルゴリズムの研究について報告する。このアルゴリズムにより、ツイーザーの大きさが1000より大きい場合、ツイーザーの均一性が96%を超える。解析により、さらなる均一性には光学系のさらなる最適化が必要であることが示された。強度等化アルゴリズムの実現は、単一原子配列に基づく多体実験において非常に重要である。

Thanks to their high degree of adjustability, holographic tweezer arrays have proven to be the best choice for creating atomic arrays of arbitrary geometry. In holographic tweezer array experiments, optical tweezers generated by a spatial light modulator (SLM) are usually used as a static tweezer array. Because of the AC Stark shift, intensity differences between traps cause different light shifts. The optimization of intensity equalization is therefore very important in many-body systems consisting of single atoms. Here we report a study of an intensity equalization algorithm. With this algorithm, the uniformity of the tweezers can exceed 96% when the number of tweezers is greater than 1000. Our analysis shows that improving the uniformity further requires further optimization of the optical system. The realization of the intensity equalization algorithm is of great significance to many-body experiments based on single-atom arrays.

翻訳日:2024-11-08 15:23:20 公開日:2024-11-06
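
The abstract above does not spell out the update rule, so the following is only a generic sketch of feedback intensity equalization: measure the trap intensities, then rescale each trap's target weight by how far it is from the mean. The linear `measure` model stands in for the real hologram computation and camera readout, and the names `equalize`, `gain`, and the uniformity definition 1 - (max - min) / (max + min) are assumptions made for this illustration.

```python
def measure(weights, efficiency):
    """Stand-in for a camera measurement: intensity = efficiency * target weight."""
    return [e * w for e, w in zip(efficiency, weights)]

def uniformity(intensities):
    """One common uniformity measure; equals 1.0 when all traps are equal."""
    hi, lo = max(intensities), min(intensities)
    return 1.0 - (hi - lo) / (hi + lo)

def equalize(efficiency, iterations=20, gain=0.5):
    """Feedback loop: boost weak traps and damp strong ones after every measurement."""
    weights = [1.0] * len(efficiency)
    for _ in range(iterations):
        intensities = measure(weights, efficiency)
        mean_i = sum(intensities) / len(intensities)
        weights = [w * (mean_i / i) ** gain for w, i in zip(weights, intensities)]
    return weights

# Made-up per-trap efficiencies representing optical imperfections.
efficiency = [0.6, 0.8, 1.0, 1.2, 0.9]
before = uniformity(measure([1.0] * len(efficiency), efficiency))
after = uniformity(measure(equalize(efficiency), efficiency))
```

A gain below 1 makes each correction partial, which is a common way to keep such a loop stable when the measurement is noisy. In this idealized linear model the loop converges quickly, which a real optical system would not necessarily do.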

# 視覚表現学習のためのマルチラベルクラスタ識別

Multi-label Cluster Discrimination for Visual Representation Learning ( http://arxiv.org/abs/2407.17331v2 )

ライセンス: Link先を確認

Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng,

(参考訳) コントラスト言語画像事前学習(CLIP)は、画像テキストのコントラスト学習によって強化された優れた特徴表現により、様々なタスクで成功した。しかし、CLIPが使用するインスタンス識別手法では、トレーニングデータのセマンティック構造をほとんどエンコードできない。この制限に対処するため、反復的なクラスタ割り当てと分類によってクラスタ識別が提案されている。しかしながら、ほとんどのクラスタ識別アプローチは、画像内の複数ラベル信号を無視して、各画像に対して1つの擬似ラベルを定義するだけである。本稿では,MLCDと呼ばれる新しいマルチラベルクラスタ識別手法を提案する。クラスタリングのステップでは、まず大規模なLAION-400Mデータセットを、オフザシェルフの埋め込み機能に基づいて100万のセンタにクラスタ化します。自然画像には複数の視覚的対象や属性が頻繁に含まれており、補助的なクラスラベルとして複数の最も近い中心を選択する。識別段階において、我々は、正のクラスと負のクラスから損失を優雅に分離し、決定境界の曖昧さを軽減する、新しい多ラベル分類損失を設計する。モデルと事前学習データセットの異なるスケールの実験により,提案手法の有効性を検証した。実験の結果,線形プローブ,ゼロショット分類,画像テキスト検索など,複数の下流タスクにおける最先端性能が得られた。コードとモデルはhttps://github.com/deepglint/unicom でリリースされた。

Contrastive Language Image Pre-training (CLIP) has recently demonstrated success across various tasks due to superior feature representation empowered by image-text contrastive learning. However, the instance discrimination method used by CLIP can hardly encode the semantic structure of training data. To handle this limitation, cluster discrimination has been proposed through iterative cluster assignment and classification. Nevertheless, most cluster discrimination approaches only define a single pseudo-label for each image, neglecting multi-label signals in the image. In this paper, we propose a novel Multi-Label Cluster Discrimination method named MLCD to enhance representation learning. In the clustering step, we first cluster the large-scale LAION-400M dataset into one million centers based on off-the-shelf embedding features. Considering that natural images frequently contain multiple visual objects or attributes, we select the multiple closest centers as auxiliary class labels. In the discrimination step, we design a novel multi-label classification loss, which elegantly separates losses from positive classes and negative classes, and alleviates ambiguity on decision boundary. We validate the proposed multi-label cluster discrimination method with experiments on different scales of models and pre-training datasets. Experimental results show that our method achieves state-of-the-art performance on multiple downstream tasks including linear probe, zero-shot classification, and image-text retrieval. Code and models have been released at https://github.com/deepglint/unicom .

翻訳日:2024-11-08 15:23:20 公開日:2024-11-06
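
The clustering step described in the abstract above assigns each image several of its closest cluster centers as labels. A minimal sketch of that assignment, assuming cosine similarity and a fixed number `k` of labels per image. The toy 2D centers, the function name `closest_centers`, and `k=2` are illustrative assumptions; the loss used in the discrimination step is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def closest_centers(embedding, centers, k):
    """Return the indices of the k centers most similar to the embedding."""
    ranked = sorted(range(len(centers)),
                    key=lambda i: cosine(embedding, centers[i]),
                    reverse=True)
    return ranked[:k]

# Toy cluster centers and one image embedding (made-up values).
centers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]]
image_embedding = [0.9, 0.3]
labels = closest_centers(image_embedding, centers, k=2)
```

With one million centers, as in the abstract, an exhaustive sort like this would be replaced by an approximate nearest-neighbor search. The sketch only shows what "multiple closest centers" means.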

# BetterDepth:ゼロショット単眼深度推定のためのプラグアンドプレイ拡散精錬器

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation ( http://arxiv.org/abs/2407.17952v2 )

ライセンス: Link先を確認

Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, Christopher Schroers,

(参考訳) 大規模データセット上でのトレーニングにより、ゼロショット単眼深度推定(MDE)手法は、野生では堅牢な性能を示すが、詳細が不十分な場合が多い。拡散に基づく最近のMDE手法は、詳細を抽出する優れた能力を示すが、より多様な3Dデータに基づいて訓練された、幾何学的に複雑なシーンに苦しむ。両世界の相補的な利点を活用するため,細部を捉えながら幾何的に正しいアフィン不変のMDEを実現するためのBetterDepthを提案する。具体的には、BetterDepthは、事前訓練されたMDEモデルからの予測を深度条件付けとして、大域的な深度レイアウトを適切にキャプチャし、入力画像に基づいて詳細を反復的に洗練する条件拡散ベースの精錬機である。このようなリファインダのトレーニングのために,細かなシーンの詳細を学習しながら,BetterDepthが奥行き条件に忠実であることを保証するために,グローバルな事前調整と局所パッチマスキング手法を提案する。小規模の合成データセットの効率的なトレーニングにより、BetterDepthは、さまざまな公開データセットと、その中のシーンで、最先端のゼロショットMDEパフォーマンスを達成する。さらに、BetterDepthはプラグアンドプレイ方式で他のMDEモデルの性能を向上させることができる。

By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods show robust performance in the wild but often suffer from insufficient detail. Although recent diffusion-based MDE approaches exhibit a superior ability to extract details, they struggle in geometrically complex scenes that challenge their geometry prior, trained on less diverse 3D data. To leverage the complementary merits of both worlds, we propose BetterDepth to achieve geometrically correct affine-invariant MDE while capturing fine details. Specifically, BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning, in which the global depth layout is well-captured, and iteratively refines details based on the input image. For the training of such a refiner, we propose global pre-alignment and local patch masking methods to ensure BetterDepth remains faithful to the depth conditioning while learning to add fine-grained scene details. With efficient training on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and on in-the-wild scenes. Moreover, BetterDepth can improve the performance of other MDE models in a plug-and-play manner without further re-training.

翻訳日:2024-11-08 15:01:09 公開日:2024-11-06
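
The abstract above mentions affine-invariant depth and a global pre-alignment step. A common way to align an affine-invariant depth prediction to a reference is a least-squares fit of one scale and one shift. The sketch below shows that fit in closed form. It is an assumption that the pre-alignment in the paper is similar in spirit; the function name `align_scale_shift` and the sample values are hypothetical.

```python
def align_scale_shift(pred, ref):
    """Least-squares s, t minimizing sum((s * p + t - r) ** 2) over all pixels."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    s = cov / var_p
    t = mean_r - s * mean_p
    return s, t

# Flattened toy depth maps: the reference is an exact affine map of the prediction.
pred = [0.1, 0.4, 0.5, 0.9]
ref = [2.0 * p + 3.0 for p in pred]
s, t = align_scale_shift(pred, ref)
aligned = [s * p + t for p in pred]
```

After alignment, remaining differences between `aligned` and `ref` reflect local detail rather than a global scale or offset mismatch.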

# 効率的かつ効果的に:交通分類のための平文と暗号化テキストのバランスをとるための2段階的アプローチ

Efficiently and Effectively: A Two-stage Approach to Balance Plaintext and Encrypted Text for Traffic Classification ( http://arxiv.org/abs/2407.19687v3 )

ライセンス: Link先を確認

Wei Peng, Lei Cui, Wei Cai, Zhenquan Ding, Zhiyu Hao, Xiaochun Yun,

(参考訳) 暗号化されたトラフィック分類は、暗号化されたネットワークトラフィックに関連するアプリケーションまたはサービスを特定するタスクである。このタスクの効果的なアプローチは、ディープラーニングを使って生のトラフィックバイトを直接エンコードし、分類のための機能(バイトベースモデル)を自動的に抽出することである。しかし、現在のバイトベースのモデルでは、平文や暗号化されたテキストのいずれでも、平文や暗号化されたテキストが下流タスクに与える影響を無視して、自動的な特徴抽出のために生のトラフィックバイトを入力している。さらに、これらのモデルは主に分類精度の改善に重点を置いており、モデルの効率にはほとんど重点を置いていない。本稿では,原文と暗号化されたテキストがモデルの有効性と効率に与える影響を初めて分析する。そこで本研究では,トラフィック分類における平文と暗号化テキストのトレードオフを両立させる2段階の手法を提案する。具体的には、提案したDPCセレクタを用いて、Plainテキストが正確に分類(DPC)できるかどうかを決定する。この段階では、平文で分類できるサンプルを素早く特定し、平文で明示的なバイト機能を活用してモデルの効率を高める。ステージ2は、ステージ1の結果を適応的に分類することを目的としている。この段階では、平文だけで分類できないサンプルに対して暗号化されたテキスト情報を組み込み、トラフィック分類タスクにおけるモデルの有効性を保証する。 2つのデータセットに対する実験により,提案モデルが有効性と効率の両面で最先端の結果が得られることを示した。

Encrypted traffic classification is the task of identifying the application or service associated with encrypted network traffic. One effective approach for this task is to use deep learning methods to encode the raw traffic bytes directly and automatically extract features for classification (byte-based models). However, current byte-based models input raw traffic bytes, whether plaintext or encrypted text, for automated feature extraction, neglecting the distinct impacts of plaintext and encrypted text on downstream tasks. Additionally, these models primarily focus on improving classification accuracy, with little emphasis on model efficiency. In this paper, for the first time, we analyze the impact of plaintext and encrypted text on the model's effectiveness and efficiency. Based on our observations and findings, we propose a two-stage approach to balance the trade-off between plaintext and encrypted text in traffic classification. Specifically, stage one is to Determine whether the Plain text is enough to be accurately Classified (DPC) using the proposed DPC Selector. This stage quickly identifies samples that can be classified using plaintext, leveraging explicit byte features in plaintext to enhance the model's efficiency. Stage two aims to adaptively make a classification with the result from stage one. This stage incorporates encrypted text information for samples that cannot be classified using plaintext alone, ensuring the model's effectiveness on traffic classification tasks. Experiments on two datasets demonstrate that our proposed model achieves state-of-the-art results in both effectiveness and efficiency.
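
The two-stage flow described in this abstract can be pictured as a gated cascade. The sketch below is only an illustration under stated assumptions: the function names, the confidence threshold, and the toy classifiers are hypothetical and are not the authors' DPC Selector. A confidence gate is just one simple way to decide whether plaintext is "enough".

```python
from typing import Callable, Tuple

# Hypothetical interface: a classifier maps bytes to (label, confidence in [0, 1]).
Classifier = Callable[[bytes], Tuple[str, float]]


def two_stage_classify(plaintext: bytes, encrypted: bytes,
                       plain_clf: Classifier, full_clf: Classifier,
                       threshold: float = 0.9) -> Tuple[str, int]:
    """Return (label, stage_used) for one traffic sample."""
    # Stage 1: try the cheap plaintext-only path first.
    label, conf = plain_clf(plaintext)
    if conf >= threshold:
        return label, 1
    # Stage 2: fall back to a model that also sees the encrypted bytes.
    label, _ = full_clf(plaintext + encrypted)
    return label, 2


def toy_plain_clf(data: bytes) -> Tuple[str, float]:
    # Toy rule (assumption): a host-name-like marker is treated as decisive.
    if b"video.example" in data:
        return "video", 0.99
    return "unknown", 0.10


def toy_full_clf(data: bytes) -> Tuple[str, float]:
    # Toy rule standing in for a byte-based model over all bytes.
    return ("chat" if len(data) < 64 else "file-transfer"), 0.80
```

In this sketch, samples resolved in stage 1 never touch the more expensive model, which is where the efficiency gain would come from.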

翻訳日:2024-11-08 14:27:29 公開日:2024-11-06

# Virchow2: 病理学における自己監督型混合拡大モデルのスケーリング

Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology ( http://arxiv.org/abs/2408.00738v3 )

ライセンス: Link先を確認

Eric Zimmermann, Eugene Vorontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi, Thomas Fuchs, Nicolo Fusi, Siqi Liu, Kristen Severson,

(参考訳) 基礎モデルは、計算病理学の応用のために急速に開発されている。しかし、データの規模と多様性、モデルサイズ、トレーニングアルゴリズムのいずれもが影響するなかで、ダウンストリームのパフォーマンスにとってどの要素が最も重要かは、依然として未解決の問題である。本研究では、病理学に適したアルゴリズム的修正を提案するとともに、データサイズとモデルサイズの両方をスケールした結果を提示する。その規模は両次元で先行研究を上回る。 6億3200万パラメータのビジョントランスフォーマーであるVirchow2、19億パラメータのビジョントランスフォーマーであるVirchow2G、Virchow2Gを蒸留した2200万パラメータのVirchow2G Miniという3つの新しいモデルを紹介する。いずれも、多様な組織、由来機関、染色を含む310万枚の病理組織全スライド画像で学習されている。上位の競合モデルと比較して、12のタイルレベルのタスクで最先端のパフォーマンスを実現する。以上の結果から、データの多様性とドメイン固有の手法は、パラメータ数のみをスケールするモデルを上回りうるが、平均的には、ドメイン固有の手法、データスケール、モデルスケールの組み合わせがパフォーマンスに寄与することが示唆される。

Foundation models are rapidly being developed for computational pathology applications. However, it remains an open question which factors are most important for downstream performance, with data scale and diversity, model size, and training algorithm all playing a role. In this work, we propose algorithmic modifications tailored for pathology, and we present the result of scaling both data and model size, surpassing previous studies in both dimensions. We introduce three new models: Virchow2, a 632 million parameter vision transformer, Virchow2G, a 1.9 billion parameter vision transformer, and Virchow2G Mini, a 22 million parameter distillation of Virchow2G, each trained with 3.1 million histopathology whole slide images, with diverse tissues, originating institutions, and stains. We achieve state-of-the-art performance on 12 tile-level tasks, as compared to the top-performing competing models. Our results suggest that data diversity and domain-specific methods can outperform models that only scale in the number of parameters, but, on average, performance benefits from the combination of domain-specific methods, data scale, and model scale.

翻訳日:2024-11-08 13:29:21 公開日:2024-11-06

# 半導体Si-SiGeスピンビットにおけるフォノン誘起交換ゲート不均一性

Phonon-Induced Exchange Gate Infidelities in Semiconducting Si-SiGe Spin Qubits ( http://arxiv.org/abs/2408.02742v2 )

ライセンス: Link先を確認

Matthew Brooks, Rex Lundgren, Charles Tahan,

(参考訳) 半導体スピン量子ビット間のスピン-スピン交換相互作用は、高速な単一および2量子ゲートを可能にする。交換の間、クォービットと周囲のフォノン浴のカップリングは、結果として生じるゲートに誤りを引き起こす可能性がある。ここでは、有限温度フォノン浴に結合したSi-SiGeヘテロ構造における半導体二重量子ドットスピン量子ビットとの交換操作の忠実さを考察する。マスター方程式を用いて、各スピンフォノン結合項の孤立効果と符号化量子ビット演算の漏れ誤差を解くことができる。温度が上昇するにつれて、2つの電子スピン状態のフォノン誘起摂動に起因する一次誤差の源となる部分と、励起軌道状態へのフォノン誘起結合が支配的誤差となる部分との交差が観察される。さらに, パルス形状と長さの単純なトレードオフにより, ゲート操作時のスピンフォノン誘起誤差に対して, 最大で1桁の堅牢性を向上できることが示されている。以上の結果から,200-300mK以内の高温では,交換ゲートの動作はバルクフォノンで制限されていないことが示唆された。これは最近の実験と一致している。

Spin-spin exchange interactions between semiconductor spin qubits allow for fast single and two-qubit gates. During exchange, coupling of the qubits to a surrounding phonon bath may cause errors in the resulting gate. Here, the fidelities of exchange operations with semiconductor double quantum dot spin qubits in a Si-SiGe heterostructure coupled to a finite temperature phonon bath are considered. By employing a master equation approach, the isolated effect of each spin-phonon coupling term may be resolved, as well as leakage errors of encoded qubit operations. As the temperature is increased, a crossover is observed from where the primary source of error is due to a phonon induced perturbation of the two electron spin states, to one where the phonon induced coupling to an excited orbital state becomes the dominant error. Additionally, it is shown that a simple trade-off in pulse shape and length can improve robustness to spin-phonon induced errors during gate operations by up to an order of magnitude. Our results suggest that for elevated temperatures within 200-300 mK, exchange gate operations are not currently limited by bulk phonons. This is consistent with recent experiments.

翻訳日:2024-11-08 12:55:50 公開日:2024-11-06

# コードのための大規模言語モデルのホットフィックス

Hotfixing Large Language Models for Code ( http://arxiv.org/abs/2408.05727v3 )

ライセンス: Link先を確認

Zhou Yang, David Lo,

(参考訳) コードのための大規模言語モデル(LLM4Code)は開発者のワークフローの不可欠な部分となり、コード補完や生成などのタスクを支援している。しかし、これらのモデルは、バグの多いコードを含む大量のソースコードを広範囲にトレーニングしたために、バグの多いコードを生成するなど、リリース後に望ましくない振る舞いを示す。トレーニングデータ(通常、オープンソースソフトウェアから来る)は進化を続けており、例えば、開発者はバグの多いコードを修正します。しかしながら、LLM4Codeの望ましくない振る舞いを軽減するためにこのような進化を適用することは、簡単ではない。このことは、LLM4Codeの望ましくない振る舞いを最小限の負の効果で効果的かつ効率的に緩和する、LLM4Codeのホットフィックスの概念を提案する動機である。本稿では,LLM4Codeをホットフィックスすることで,バグの少ないコードとより固定的なコードを生成することに焦点を当てる。私たちは、人気のあるCodeGenファミリのモデルが頻繁にバグのあるコードを生成することを実証することから始めます。そこで,本研究では,(1)所望の動作を学習し,(2)望ましくない動作を学習し,(3)他のコードの知識を保持する,という3つの学習目標を定義した。モデルをホットフィックスするための4つの異なる微調整手法を評価し,以下の知見を得た。 LoRA(低ランク適応)を用いてこれら3つの学習目標を同時に最適化することは、モデルの振る舞いに効果的に影響を及ぼす。具体的には、固定コードの生成を最大108.42%増加させ、バグコードの生成を最大50.47%減少させる。統計テストでは、HumanEvalベンチマークにおいてホットフィックスがモデルの機能的正しさに悪影響を及ぼさないことが確認された。さらに、メールアドレスの露出を99.30%減らし、ホットフィックスの一般化性を評価する。

Large Language Models for Code (LLM4Code) have become an integral part of developers' workflows, assisting with tasks such as code completion and generation. However, these models are found to exhibit undesired behaviors after their release, like generating buggy code, due to their extensive training on vast amounts of source code that contain such buggy code. The training data (usually coming from open-source software) keeps evolving, e.g., developers fix the buggy code. However, adapting such evolution to mitigate LLM4Code's undesired behaviors is non-trivial, as retraining models on the updated dataset usually takes much time and resources. This motivates us to propose the concept of hotfixing LLM4Code, mitigating LLM4Code's undesired behaviors effectively and efficiently with minimal negative effects. This paper mainly focuses on hotfixing LLM4Code to make them generate less buggy code and more fixed code. We begin by demonstrating that models from the popular CodeGen family frequently generate buggy code. Then, we define three learning objectives in hotfixing and design multiple loss functions for each objective: (1) learn the desired behaviors, (2) unlearn the undesired behaviors, and (3) retain knowledge of other code. We evaluate four different fine-tuning techniques for hotfixing the models and gain the following insights. Optimizing these three learning goals together, using LoRA (low-rank adaptation), effectively influences the model's behavior. Specifically, it increases the generation of fixed code by up to 108.42% and decreases the generation of buggy code by up to 50.47%. Statistical tests confirm that hotfixing does not significantly affect the models' functional correctness on the HumanEval benchmark. Additionally, we evaluate the generalizability of hotfixing by using it to reduce the exposure of email addresses, achieving a 99.30% reduction.
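
The three learning objectives listed in this abstract (learn, unlearn, retain) can be combined into a single scalar loss. The sketch below is an assumption-laden illustration, not the loss functions the authors designed: it picks negative log-likelihood for "learn", an unlikelihood-style penalty for "unlearn", and a KL term for "retain", and it works on plain token probabilities rather than a real model. All function names and weights are hypothetical.

```python
import math


def nll(p_target: float) -> float:
    """Learn: negative log-likelihood of the desired (fixed) token."""
    return -math.log(p_target)


def unlikelihood(p_undesired: float) -> float:
    """Unlearn: penalize probability mass placed on the undesired (buggy) token."""
    return -math.log(1.0 - p_undesired)


def kl_divergence(p_ref, p_new) -> float:
    """Retain: KL(ref || new) between token distributions on unrelated code.

    Assumes every entry of p_new is strictly positive.
    """
    return sum(r * math.log(r / n) for r, n in zip(p_ref, p_new) if r > 0)


def hotfix_loss(p_fixed, p_buggy, p_ref, p_new, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three objectives (the weights are an assumption)."""
    w_learn, w_unlearn, w_retain = weights
    return (w_learn * nll(p_fixed)
            + w_unlearn * unlikelihood(p_buggy)
            + w_retain * kl_divergence(p_ref, p_new))
```

The loss is zero only when the fixed token has probability 1, the buggy token has probability 0, and the distribution on other code is unchanged. Any drift on one of the three axes raises it.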

翻訳日:2024-11-08 11:49:24 公開日:2024-11-06

# コードのための大規模言語モデルのホットフィックス

Hotfixing Large Language Models for Code ( http://arxiv.org/abs/2408.05727v4 )

ライセンス: Link先を確認

Zhou Yang, David Lo,

翻訳日:2024-11-08 11:49:24 公開日:2024-11-06

# 量子ビットを用いた量子情報

Quantum information with quantum-like bits ( http://arxiv.org/abs/2408.06485v2 )

ライセンス: Link先を確認

Graziano Amati, Gregory D. Scholes,

(参考訳) これまでの研究で我々は、例えば発振器のような大型で複雑な古典的システムと、デコヒーレンスによって損なわれない量子的機能を実現する量子的ビットの構築を提案してきた。本稿では、量子状態のこのプラットフォームをさらに検討する。まず,創発的な状態を許容する同期ネットワークの構築方法に関する一般的なプロトコルについて議論する。次に、これらの状態に対してゲートをどのように実装できるかを研究する。これは、特別に構築された古典的ネットワーク上での量子ライクな計算の可能性を示している。最後に、我々のモデルを古典的確率システムから分離する特徴である非コルモゴロフ干渉を可能にする測定の概念を定義する。本稿では,量子的資源の数学的構造を探究し,これらのシステムにおける創発的状態を操作することで任意のゲートをどのように実現できるかを示す。

In previous work we have proposed a construction of quantum-like bits that could endow a large, complex classical system, for example of oscillators, with quantum-like function that is not compromised by decoherence. In the present paper we investigate further this platform of quantum-like states. Firstly, we discuss a general protocol on how to construct synchronizing networks that allow for emergent states. We then study how gates can be implemented on those states. This suggests the possibility of quantum-like computing on specially-constructed classical networks. Finally, we define a notion of measurement that allows for non-Kolmogorov interference, a feature that separates our model from a classical probabilistic system. This paper aims to explore the mathematical structure of quantum-like resources, and shows how arbitrary gates can be implemented by manipulating emergent states in those systems.

翻訳日:2024-11-08 11:26:46 公開日:2024-11-06

# ELASTIC:シークエンス圧縮のための効率的な線形アテンション

ELASTIC: Efficient Linear Attention for Sequential Interest Compression ( http://arxiv.org/abs/2408.09380v3 )

ライセンス: Link先を確認

Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou,

(参考訳) 最先端のシーケンシャルレコメンデーションモデルは、トランスフォーマーの注意機構に大きく依存している。しかし、自己注意の二次計算とメモリの複雑さは、ユーザの長距離動作シーケンスをモデル化するためのスケーラビリティを制限している。この問題に対処するために、線形時間複雑性と計算コストからのモデルキャパシティの分離を必要とせず、SequenTial Interest Compressionの効率的な線形アテンションであるELASTICを提案する。具体的には、線形ディスパッチアテンション機構を備えた固定長関心の専門家を導入し、長期の動作シーケンスをよりコンパクトな表現に圧縮し、x2.7推論速度で最大90%のGPUメモリ使用量を削減した。提案した線形ディスパッチアテンション機構は2次複雑性を著しく低減し、非常に長いシーケンスを適切にモデル化できるモデルを実現する。さらに、多様なユーザ関心をモデル化する能力を維持するため、ELASTICは、膨大な学習可能な関心記憶バンクを初期化し、圧縮されたユーザ関心を、無視可能な計算オーバーヘッドでメモリからわずかに回収する。提案手法は,同じ計算コストを維持しつつ,利用可能な関心空間の濃度を著しく拡張し,推奨精度と効率のトレードオフを生じさせる。提案するELASTICの有効性を検証するため,様々な公開データセットに対する広範囲な実験を行い,複数の強力なシーケンシャルなレコメンデータと比較した。実験結果から、ELASTICはベースラインをかなりのマージンで一貫した性能を示し、長いシーケンスをモデル化する際の計算効率を強調した。実装コードを公開します。

State-of-the-art sequential recommendation models rely heavily on the transformer's attention mechanism. However, the quadratic computational and memory complexities of self-attention have limited its scalability for modeling users' long-range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, requiring only linear time complexity and decoupling model capacity from computational cost. Specifically, ELASTIC introduces fixed-length interest experts with a linear dispatcher attention mechanism, which compresses the long-term behaviour sequences into a significantly more compact representation, reducing GPU memory usage by up to 90% with a 2.7x inference speed-up. The proposed linear dispatcher attention mechanism significantly reduces the quadratic complexity and makes the model feasible for adequately modeling extremely long sequences. Moreover, in order to retain the capacity for modeling various user interests, ELASTIC initializes a vast learnable interest memory bank and sparsely retrieves compressed user interests from the memory with negligible computational overhead. The proposed interest memory retrieval technique significantly expands the cardinality of the available interest space while keeping the same computational cost, thereby striking a trade-off between recommendation accuracy and efficiency. To validate the effectiveness of our proposed ELASTIC, we conduct extensive experiments on various public datasets and compare it with several strong sequential recommenders. Experimental results demonstrate that ELASTIC consistently outperforms baselines by a significant margin and also highlight the computational efficiency of ELASTIC when modeling long sequences. We will make our implementation code publicly available.

翻訳日:2024-11-08 07:07:05 公開日:2024-11-06

# CIPHER: サイバーセキュリティのインテリジェントな侵入テスト支援者

CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher ( http://arxiv.org/abs/2408.11650v2 )

ライセンス: Link先を確認

Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, Howon Kim,

(参考訳) サイバーセキュリティの重要な構成要素である侵入テストは、脆弱性を見つけるために通常多大な時間と労力を必要とする。この分野の初心者は、コミュニティや専門家との協力的なアプローチの恩恵を受けることが多い。そこで我々は、侵入テストのタスクを支援するために特別に訓練された大規模言語モデルであるCIPHER(Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers)を開発した。脆弱なマシンに関する300以上の高品質なライトアップ、ハッキング技術、オープンソースの侵入テストツールのドキュメントを使用してCIPHERを訓練した。さらに、侵入テストのライトアップを拡張する新しい手法であるFindings, Action, Reasoning, and Results(FARR)Flow augmentationを導入し、大規模言語モデル向けの完全自動ペンテストシミュレーションベンチマークを確立した。このアプローチは、従来のサイバーセキュリティのQ&Aベンチマークにおける大きなギャップを埋め、動的な侵入テストシナリオにおけるAIの技術知識、推論能力、実用性を評価するための、現実的で厳格な標準を提供する。我々の評価では、CIPHERは、同程度の規模の他のオープンソース侵入テストモデルや、Llama 3 70BやQwen1.5 72B Chatのようなさらに大きな最先端モデルと比較して、特にinsane難易度のマシン設定において、正確な提案応答を提供する点で最高の総合性能を達成した。このことは、汎用LLMの現在の能力が、侵入テストプロセスを通じてユーザを効果的に導くには不十分であることを示している。また、スケーリングによる改善の可能性や、FARR Flow augmentationの結果を用いたより良いベンチマークの開発についても論じる。ベンチマークはhttps://github.com/ibndias/CIPHERで公開される予定である。

Penetration testing, a critical component of cybersecurity, typically requires extensive time and effort to find vulnerabilities. Beginners in this field often benefit from collaborative approaches with the community or experts. To address this, we develop CIPHER (Cybersecurity Intelligent Penetration-testing Helper for Ethical Researchers), a large language model specifically trained to assist in penetration testing tasks. We trained CIPHER using over 300 high-quality write-ups of vulnerable machines, hacking techniques, and documentation of open-source penetration testing tools. Additionally, we introduced the Findings, Action, Reasoning, and Results (FARR) Flow augmentation, a novel method to augment penetration testing write-ups to establish a fully automated pentesting simulation benchmark tailored for large language models. This approach fills a significant gap in traditional cybersecurity Q&A benchmarks and provides a realistic and rigorous standard for evaluating AI's technical knowledge, reasoning capabilities, and practical utility in dynamic penetration testing scenarios. In our assessments, CIPHER achieved the best overall performance in providing accurate suggestion responses compared to other open-source penetration testing models of similar size and even larger state-of-the-art models like Llama 3 70B and Qwen1.5 72B Chat, particularly on insane-difficulty machine setups. This demonstrates that the current capabilities of general LLMs are insufficient for effectively guiding users through the penetration testing process. We also discuss the potential for improvement through scaling and the development of better benchmarks using FARR Flow augmentation results. Our benchmark will be released publicly at https://github.com/ibndias/CIPHER.

翻訳日:2024-11-08 06:11:36 公開日:2024-11-06

# OpenFactCheck: LLMのファクチュアリティ評価のための統一フレームワーク

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs ( http://arxiv.org/abs/2408.11832v2 )

ライセンス: Link先を確認

Hasan Iqbal, Yuxia Wang, Minghan Wang, Georgi Georgiev, Jiahui Geng, Iryna Gurevych, Preslav Nakov,

(参考訳) 様々な現実世界のアプリケーションで大規模言語モデル(LLM)の利用が増加している。LLMはしばしば幻覚を起こすため、出力の事実的な正確性を確認する自動ツールが求められている。これは、自由形式のオープンドメイン応答の事実性を評価する必要があるため難しい。この話題については多くの研究が行われてきたが、論文ごとに異なる評価ベンチマークと尺度が使われているため、比較が難しく、今後の進展を妨げている。これらの問題を緩和するため、我々は3つのモジュールを持つ統一フレームワークであるOpenFactCheckを開発した。 (i) RESPONSEEVALは、自動事実確認システムを容易にカスタマイズし、そのシステムを用いて入力文書中のすべてのクレームの事実性を評価できるようにする。 (ii) LLMEVALは、LLMの全体的な事実性を評価する。 (iii) CHECKEREVALは、自動事実確認システムを評価するためのモジュールである。 OpenFactCheckはオープンソース(https://github.com/mbzuai-nlp/openfactcheck)であり、Pythonライブラリ(https://pypi.org/project/openfactcheck/)およびWebサービス(http://app.openfactcheck.com)として公開されている。システムを紹介するビデオはhttps://youtu.be/-i9VKL0HleIで公開されている。

The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/mbzuai-nlp/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (http://app.openfactcheck.com). A video describing the system is available at https://youtu.be/-i9VKL0HleI.

翻訳日:2024-11-08 06:00:04 公開日:2024-11-06

# 線形ニューラルネットワークの講義ノート:ディープラーニングにおける最適化と一般化の物語

Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning ( http://arxiv.org/abs/2408.13767v2 )

ライセンス: Link先を確認

Nadav Cohen, Noam Razin,

(参考訳) これらのノートは、深層学習の数学的理解に関するプリンストン大学の上級講座の一部として、2021年3月にNCが行った講義に基づいている。彼らは線形ニューラルネットワークの理論(NC、NR、共同研究者によって開発された)を提示し、ディープラーニングの最適化と一般化の研究における基礎モデルである。提示された理論から生まれた実践的応用についても論じる。この理論は、自然界で動的である数学的ツールに基づいている。これは、ディープラーニングにおける最適化と一般化の理解のエンベロープを推し進めるための、そのようなツールの可能性を示している。このテキストは統計学習理論の基礎に精通している。エクササイズは(ソリューションなしで)含まれます。

These notes are based on a lecture delivered by NC in March 2021, as part of an advanced course at Princeton University on the mathematical understanding of deep learning. They present a theory (developed by NC, NR, and collaborators) of linear neural networks, a fundamental model in the study of optimization and generalization in deep learning. Practical applications born from the presented theory are also discussed. The theory is based on mathematical tools that are dynamical in nature. It showcases the potential of such tools to push the envelope of our understanding of optimization and generalization in deep learning. The text assumes familiarity with the basics of statistical learning theory. Exercises (without solutions) are included.

翻訳日:2024-11-08 05:15:13 公開日:2024-11-06

# BCDNet: 浸潤性乳管癌検出のための高速残差ニューラルネットワーク

BCDNet: A Fast Residual Neural Network For Invasive Ductal Carcinoma Detection ( http://arxiv.org/abs/2408.13800v3 )

ライセンス: Link先を確認

Yujia Lin, Aiwei Lian, Mingyu Liao, Shuangjie Yuan,

(参考訳) 乳がんの最も一般的な亜型である浸潤性乳管癌(IDC)を早期に診断することは非常に重要である。 CAD(Computer-Aided Diagnosis)システムの強力なモデルは有望な結果をもたらすが、他の医療機器に統合したり、十分な計算資源なしで使用したりすることは依然として困難である。本稿では、まず残差ブロックで入力画像をアップサンプリングし、より小さな畳み込みブロックと特別なMLPを用いて特徴を学習するBCDNetを提案する。 BCDNetは、病理組織学的RGB画像におけるIDCを平均精度91.6%で効果的に検出し、ResNet 50やViT-B-16と比較してトレーニング消費を効果的に削減することが示されている。

It is of great significance to diagnose Invasive Ductal Carcinoma (IDC), the most common subtype of breast cancer, at an early stage. Although the powerful models in Computer-Aided Diagnosis (CAD) systems provide promising results, it is still difficult to integrate them into other medical devices or to use them without sufficient computational resources. In this paper, we propose BCDNet, which first upsamples the input image with a residual block and then uses a smaller convolutional block and a special MLP to learn features. BCDNet is shown to detect IDC effectively in histopathological RGB images with an average accuracy of 91.6% and to reduce training consumption compared to ResNet 50 and ViT-B-16.

翻訳日:2024-11-08 05:15:13 公開日:2024-11-06

# 人的介入を伴わない手術器具分割の再検討:グラフ分割

Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View ( http://arxiv.org/abs/2408.14789v2 )

ライセンス: Link先を確認

Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, Weidong Cai,

(参考訳) 内視鏡画像における手術器具のセグメンテーション(SIS)は,低侵襲手術を増強するためのコンピュータ支援的介入の文脈において,長年の重要課題である。近年の深層学習の方法論とデータ・ハングリーの性質の高まりを踏まえ、大規模な専門家による注釈に基づく神経予測モデルを訓練することは、この分野における既成のアプローチとして支配され、しかしながら、収集された外科的ビデオフレームに対応する微細なピクセル単位のラベルを作成するために、臨床医に禁止的な負担を課す可能性がある。本研究では,ビデオフレーム分割をグラフ分割問題として再検討し,画像画素をグラフノードとして扱う教師なし手法を提案する。自己教師付き事前学習モデルは、まず、高レベルな意味的特徴をキャプチャする特徴抽出器として活用される。すると、ラプラシア行列は特徴量から計算され、グラフ分割のために固有分解される。ディープ」固有ベクトルでは、手術用ビデオフレームは、ツールや組織などの異なるモジュールに意味的に分割され、位置、クラス、関係などの区別可能な意味情報を提供する。セグメンテーション問題は、固有ベクトルにクラスタリングやしきい値を適用することで自然に取り組むことができる。様々な臨床エンドポイント(例:EndoVis2017、EndoVis2018、UCLなど)で広範囲にわたる実験が実施されている。難解なシナリオのすべてにおいて,本手法は,教師なしの最先端(SOTA)手法よりも優れた性能と堅牢性を示す。コードはhttps://github.com/MingyuShengSMY/GraphClusteringSIS.gitで公開されている。

Surgical instrument segmentation (SIS) on endoscopic images stands as a long-standing and essential task in the context of computer-assisted interventions for boosting minimally invasive surgery. Given the recent surge of deep learning methodologies and their data-hungry nature, training a neural predictive model on massive expert-curated annotations has dominated the field and served as its off-the-shelf approach. This could, however, impose a prohibitive burden on clinicians, who must prepare fine-grained pixel-wise labels for the collected surgical video frames. In this work, we propose an unsupervised method that reframes video frame segmentation as a graph partitioning problem and regards image pixels as graph nodes, which is significantly different from previous efforts. A self-supervised pre-trained model is first leveraged as a feature extractor to capture high-level semantic features. Then, Laplacian matrices are computed from the features and are eigendecomposed for graph partitioning. On the "deep" eigenvectors, a surgical video frame is meaningfully segmented into different modules such as tools and tissues, providing distinguishable semantic information like locations, classes, and relations. The segmentation problem can then be naturally tackled by applying clustering or thresholding to the eigenvectors. Extensive experiments are conducted on various datasets (e.g., EndoVis2017, EndoVis2018, UCL, etc.) for different clinical endpoints. Across all the challenging scenarios, our method demonstrates performance and robustness superior to those of unsupervised state-of-the-art (SOTA) methods. The code is released at https://github.com/MingyuShengSMY/GraphClusteringSIS.git.
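
The pipeline in this abstract (features, then a graph Laplacian, then eigenvectors, then thresholding) can be illustrated on a toy scale. The sketch below is a minimal stand-in under stated assumptions, not the released code. It uses hand-written 2-D feature vectors instead of a self-supervised backbone, a cosine-similarity affinity, the unnormalized Laplacian, and plain power iteration to approximate the second-smallest eigenvector (the Fiedler vector), which is then thresholded at zero. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def fiedler_partition(features, iters=200):
    """Split nodes into two groups by the sign of an approximate Fiedler vector."""
    n = len(features)
    # Affinity matrix: cosine similarity clipped at zero, no self-loops.
    W = [[max(cosine(features[i], features[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Unnormalized graph Laplacian L = D - W.
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    shift = 2.0 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    v = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        # Power iteration on (shift*I - L): its dominant eigenvectors are the
        # eigenvectors of L with the smallest eigenvalues.
        v = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the trivial constant eigenvector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]


# Toy "pixel features": the first three resemble each other, as do the last three.
toy_features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]]
labels = fiedler_partition(toy_features)
```

A real implementation would use a sparse eigensolver over thousands of pixel or patch nodes. The toy version only shows why thresholding an eigenvector yields a segmentation.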

翻訳日:2024-11-08 04:52:58 公開日:2024-11-06

# 量子シャドウトモグラフィによる効率的な後処理による量子アドバンテージ

Quantum Advantage via Efficient Post-processing on Qudit Shadow tomography ( http://arxiv.org/abs/2408.16244v2 )

ライセンス: Link先を確認

Yu Wang,

(参考訳) $\text{tr}(AB)$ の計算は量子科学や人工知能などの分野において不可欠であるが、 $A$ と $B$ が $d$ 次元行列であるとき、古典的な計算複雑性は $O(d^2)$ である。さらに、 $A$ と $B$ を格納するには $O(d^2)$ のメモリが必要であり、指数関数的に高次元のシステムではさらなる課題となる。我々は、qudit シャドウトモグラフィーの枠組みによる量子的アプローチを提案し、広いクラスの行列 $A$ と、 $\text{tr}(B)$ が既知である有界ノルムのエルミート行列 $B$ に対して、計算と記憶の複雑さをともに $O(\text{poly}(\log d))$ へと指数関数的に削減する。ランダムなクリフォード測定によるシャドウトモグラフィーと比較すると、本手法は測定ごとの後処理の計算複雑性を最悪の場合の指数関数的なものから定数に減らし、任意の次元 $d$ に適用可能である。この進歩は、効率的な高次元データ解析と複雑なシステムモデリングのための新しい道を開く。

The calculation of $\text{tr}(AB)$ is essential in fields like quantum science and artificial intelligence, but the classical computational complexity is $ O(d^2) $ when $ A $ and $ B $ are $ d $-dimensional matrices. Moreover, storing $ A $ and $ B $ requires $ O(d^2) $ memory, which poses additional challenges for exponential high-dimensional systems. We propose a quantum approach through a qudit shadow tomography framework to exponentially reduce both the computational and storage complexity to $ O(\text{poly}(\log d)) $ for a broad class of matrices $ A $ and for bounded-norm Hermitian matrices $ B $ with known $\text{tr}(B)$. Compared to shadow tomography via random Clifford measurements, our method reduces the computational complexity of post-processing per measurement from an exponential worst-case scenario to a constant, and it is applicable across arbitrary dimensions $ d $. This advancement opens new pathways for efficient high-dimensional data analysis and complex system modeling.
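
For reference, the classical $O(d^2)$ baseline mentioned in this abstract follows from the identity $\text{tr}(AB) = \sum_{i,j} A_{ij} B_{ji}$, which avoids forming the full product $AB$. The short sketch below shows only this classical baseline (the function name is an illustrative choice). It does not attempt to represent the quantum shadow-tomography procedure.

```python
def trace_of_product(A, B):
    """Compute tr(AB) = sum over i, j of A[i][j] * B[j][i] for square matrices.

    Visits each of the d*d entries once, so time is O(d^2), and both
    matrices must be stored in full, which is also O(d^2) memory.
    """
    d = len(A)
    return sum(A[i][j] * B[j][i] for i in range(d) for j in range(d))


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
result = trace_of_product(A, B)  # 1*5 + 2*7 + 3*6 + 4*8 = 69
```

When $d$ grows exponentially with the number of subsystems, even this linear pass over the entries becomes infeasible, which is the regime the abstract targets.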

翻訳日:2024-11-08 04:19:50 公開日:2024-11-06

# Krawtchouk鎖におけるフェルミオン対数否定性

Fermionic logarithmic negativity in the Krawtchouk chain ( http://arxiv.org/abs/2408.16531v2 )

ライセンス: Link先を確認

Gabrielle Blanchet, Gilles Parez, Luc Vinet,

(参考訳) 非補体領域の絡み合いは、フェルミオン対数ネガティビティのレンズを介して不均一な自由フェルミオン鎖で研究される。クラウチョーク連鎖(Krawtchouk chain)は、同名の直交多項式との関係により、ある相関関数の正確な対角化と解析的な計算が可能となる。隣り合う地域では、負性スケーリングは、クローチョーク連鎖における二部鎖の絡み合いに関する以前の研究と一致して、中心電荷$c=1$の共形場理論のそれに対応する。解離した地域については,各地域が1つの場所に縮小する骨格体制に焦点をあてる。この体制は、遠くで先導的な行動を取り出すのに十分である。バルクにおいて、負性は$d^{-4 \Delta_f}$と$\Delta_f=1/2$で崩壊する。これは、自由ディラックフェルミオンの1次元における均質な結果と一致する。驚いたことに、あるサイトが境界に近いとき、この指数は$m=0,1,2,\dots$と$\Delta_f^{\textrm{even}}=3/8$と$\Delta_f^{\textrm{odd}}=5/8$のパリティに依存する。結果は数値計算と解析計算によって支えられている。

The entanglement of non-complementary regions is investigated in an inhomogeneous free-fermion chain through the lens of the fermionic logarithmic negativity. Focus is on the Krawtchouk chain, whose relation to the eponymous orthogonal polynomials allows for exact diagonalization and analytical calculations of certain correlation functions. For adjacent regions, the negativity scaling corresponds to that of a conformal field theory with central charge $c=1$, in agreement with previous studies on bipartite entanglement in the Krawtchouk chain. For disjoint regions, we focus on the skeletal regime where each region reduces to a single site. This regime is sufficient to extract the leading behaviour at large distances. In the bulk, the negativity decays as $d^{-4 \Delta_f}$ with $\Delta_f=1/2$, where $d$ is the separation between the regions. This is in agreement with the homogeneous result of free Dirac fermions in one dimension. Surprisingly, when one site is close to the boundary, this exponent changes and depends on the parity of the boundary site $m=0,1,2,\dots$, with $\Delta_f^{\textrm{even}}=3/8$ and $\Delta_f^{\textrm{odd}}=5/8$. The results are supported by numerics and analytical calculations.

翻訳日:2024-11-08 04:19:50 公開日:2024-11-06

# 深層学習を用いた高アスペクト比核融合デバイスの設計

Using Deep Learning to Design High Aspect Ratio Fusion Devices ( http://arxiv.org/abs/2409.00564v2 )

ライセンス: Link先を確認

P. Curvo, D. R. Ferreira, R. Jorge,

(参考訳) 融合装置の設計は一般に計算コストのかかるシミュレーションに基づいている。これは、特に、大きなパラメータ空間を持つ非軸対称磁場が特定の性能基準を満たすように最適化されたステラレータ最適化の場合において、自由パラメータの少ない高アスペクト比モデルを用いて緩和することができる。しかし、低伸長、高回転変換、有限プラズマベータ、良好な高速粒子閉じ込めなどの特性を持つ構成を見つけるためには、依然として最適化が必要である。本研究では,機械学習モデルを用いて,所望の特性に対するモデル入力パラメータの集合を求める逆設計問題の解を求めることにより,良好な閉じ込め特性を持つ構成を構築することを訓練する。逆問題の解は非一様であるため、混合密度ネットワークに基づく確率論的アプローチが用いられる。この方法で最適化された構成を確実に生成できることが示されている。

The design of fusion devices is typically based on computationally expensive simulations. This can be alleviated using high aspect ratio models that employ a reduced number of free parameters, especially in the case of stellarator optimization where non-axisymmetric magnetic fields with a large parameter space are optimized to satisfy certain performance criteria. However, optimization is still required to find configurations with properties such as low elongation, high rotational transform, finite plasma beta, and good fast particle confinement. In this work, we train a machine learning model to construct configurations with favorable confinement properties by finding a solution to the inverse design problem, that is, obtaining a set of model input parameters for given desired properties. Since the solution of the inverse problem is non-unique, a probabilistic approach, based on mixture density networks, is used. It is shown that optimized configurations can be generated reliably using this method.
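
The reason a probabilistic model suits a non-unique inverse problem can be shown with a toy case. In the sketch below, the forward map is $y = x^2$, so a target $y$ has two valid inputs. A hand-written function stands in for a trained mixture density network and returns Gaussian mixture parameters for the target, and candidates are then sampled from that mixture. Everything here is a hypothetical illustration and is unrelated to the stellarator model used in the paper.

```python
import math
import random


def toy_mixture_for_target(y: float):
    """Hypothetical stand-in for a trained MDN.

    Returns (weights, means, sigmas) of a Gaussian mixture over inputs x
    that satisfy x**2 == y; a real MDN would predict these from data.
    """
    root = math.sqrt(y)
    return [0.5, 0.5], [-root, root], [0.01, 0.01]


def sample_candidates(y: float, n: int, rng: random.Random):
    """Draw n candidate inputs for the desired output y."""
    weights, means, sigmas = toy_mixture_for_target(y)
    samples = []
    for _ in range(n):
        # Pick a mixture component, then sample from its Gaussian.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], sigmas[k]))
    return samples


candidates = sample_candidates(4.0, 200, random.Random(0))
```

A single-output regressor trained on this problem would tend to average the two branches and predict something near zero, which satisfies neither. Sampling from a mixture keeps both solution families available.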

翻訳日:2024-11-08 03:46:24 公開日:2024-11-06

# TabEBM:個別クラス特化エネルギーモデルを用いた語彙データ拡張手法

TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models ( http://arxiv.org/abs/2409.16118v3 )

ライセンス: Link先を確認

Andrei Margeloiu, Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik,

(参考訳) データ収集は、医学、物理学、化学といった重要な分野においてしばしば困難である。その結果、分類法は通常これらの小さなデータセットでは性能が悪く、予測性能が低下する。画像のデータ拡張と同様に、追加の合成データでトレーニングセットを増やすことは、下流の分類性能を改善すると一般的に信じられている。しかしながら、結合分布 $ p(\mathbf{x}, y) $ またはクラス条件分布 $ p(\mathbf{x} \mid y) $ を学習する現在の表形式データ生成法は、しばしば小さなデータセットに過度に適合し、結果として品質の悪い合成データとなり、実際のデータのみを使用する場合よりも分類性能が悪化することが多い。これらの課題を解決するために、エネルギーベースモデル(EBM)を用いた新しいクラス条件生成手法であるTabEBMを紹介する。全てのクラス条件密度を近似するために共有モデルを使用する既存の方法とは異なり、我々の重要な革新は、クラスごとに別々のEBM生成モデルを作成し、各クラス固有のデータ分布を個別にモデル化することである。このアプローチは、あいまいなクラス分布であっても、堅牢なエネルギーランドスケープを生み出す。実験の結果、TabEBMは既存の手法よりも高品質で統計的忠実度の高い合成データを生成することがわかった。データ拡張に使用する場合、我々の合成データは、様々なサイズのデータセット、特に小さなデータセットの分類性能を一貫して改善する。コードはhttps://github.com/andreimargeloiu/TabEBMで入手できる。

Data collection is often difficult in critical fields such as medicine, physics, and chemistry. As a result, classification methods usually perform poorly with these small datasets, leading to weak predictive performance. Increasing the training set with additional synthetic data, similar to data augmentation in images, is commonly believed to improve downstream classification performance. However, current tabular generative methods that learn either the joint distribution $ p(\mathbf{x}, y) $ or the class-conditional distribution $ p(\mathbf{x} \mid y) $ often overfit on small datasets, resulting in poor-quality synthetic data, usually worsening classification performance compared to using real data alone. To solve these challenges, we introduce TabEBM, a novel class-conditional generative method using Energy-Based Models (EBMs). Unlike existing methods that use a shared model to approximate all class-conditional densities, our key innovation is to create distinct EBM generative models for each class, each modelling its class-specific data distribution individually. This approach creates robust energy landscapes, even in ambiguous class distributions. Our experiments show that TabEBM generates synthetic data with higher quality and better statistical fidelity than existing methods. When used for data augmentation, our synthetic data consistently improves the classification performance across diverse datasets of various sizes, especially small ones. Code is available at https://github.com/andreimargeloiu/TabEBM.

翻訳日:2024-11-08 03:46:24 公開日:2024-11-06

# 中性子の$β$崩壊から生じる反ニュートリノは、異なる質量固有状態のコヒーレントな重ね合わせには含まれない

Antineutrinos produced from $β$ decays of neutrons cannot be in coherent superpositions of different mass eigenstates ( http://arxiv.org/abs/2410.03133v3 )

ライセンス: Link先を確認

Shi-Biao Zheng,

(参考訳) 中性子の$\beta$崩壊によって生じる反ニュートリノ-陽子-電子系の全波動関数を解析する。反ニュートリノは、中性子の初期運動量分布に関係なく、異なる質量固有状態のコヒーレントな重ね合わせにはなり得ないことが証明される。

The entire wavefunction of the antineutrino-proton-electron system produced by the $\beta$ decay of a neutron is analyzed. It is proven that the antineutrino cannot be in coherent superpositions of different mass eigenstates, irrespective of the initial momentum distribution of the neutron.

Translated: 2024-11-08 03:46:24 Published: 2024-11-06

# Reassessing Noise Augmentation Methods in the Context of Adversarial Speech ( http://arxiv.org/abs/2409.01813v2 )

License: see the linked page

Karla Pizzi, Matías Pizarro, Asja Fischer,

In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysis of the adversarial robustness of four different state-of-the-art ASR architectures, each trained under three different augmentation conditions: one with background noise, speed variations, and reverberations; another with speed variations only; and a third without any form of data augmentation. The results demonstrate that noise augmentation improves not only model performance on noisy speech but also the model's robustness to adversarial attacks.

Translated: 2024-11-08 03:23:46 Published: 2024-11-06

# Reassessing Noise Augmentation Methods in the Context of Adversarial Speech ( http://arxiv.org/abs/2409.01813v3 )

License: see the linked page

Karla Pizzi, Matías Pizarro, Asja Fischer,

Translated: 2024-11-08 03:23:46 Published: 2024-11-06

# IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation ( http://arxiv.org/abs/2409.08240v3 )

License: see the linked page

Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang,

While Text-to-Image (T2I) diffusion models excel at generating visually appealing images of individual instances, they struggle to accurately position multiple instances and control the generation of their features. The Layout-to-Image (L2I) task was introduced to address the positioning challenges by incorporating bounding boxes as spatial control signals, but it still falls short in generating precise instance features. In response, we propose the Instance Feature Generation (IFG) task, which aims to ensure both positional accuracy and feature fidelity in generated instances. To address the IFG task, we introduce the Instance Feature Adapter (IFAdapter). The IFAdapter enhances feature depiction by incorporating additional appearance tokens and utilizing an Instance Semantic Map to align instance-level features with spatial locations. The IFAdapter guides the diffusion process as a plug-and-play module, making it adaptable to various community models. For evaluation, we contribute an IFG benchmark and develop a verification pipeline to objectively compare models' abilities to generate instances with accurate positioning and features. Experimental results demonstrate that IFAdapter outperforms other models in both quantitative and qualitative evaluations.

Translated: 2024-11-07 21:20:36 Published: 2024-11-06

# Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization ( http://arxiv.org/abs/2409.09774v2 )

License: see the linked page

Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang,

Direct Preference Optimization (DPO) has recently expanded its successful application from aligning large language models (LLMs) to aligning text-to-image models with human preferences, which has generated considerable interest within the community. However, we have observed that these approaches rely solely on minimizing the reverse Kullback-Leibler divergence between the fine-tuned model and the reference model during the alignment process, neglecting other divergence constraints. In this study, we focus on extending the reverse Kullback-Leibler divergence in the alignment paradigm of text-to-image models to $f$-divergence, with the aim of achieving better alignment performance as well as good generation diversity. We provide the generalized formula of the alignment paradigm under the $f$-divergence condition and thoroughly analyze the impact of different divergence constraints on the alignment process from the perspective of gradient fields. We conduct a comprehensive evaluation of image-text alignment performance, human value alignment performance, and generation diversity performance under different divergence constraints, and the results indicate that alignment based on Jensen-Shannon divergence achieves the best trade-off among them. The choice of divergence employed for aligning text-to-image models significantly impacts the trade-off between alignment performance (especially human value alignment) and generation diversity, which highlights the necessity of selecting an appropriate divergence for practical applications.
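
For orientation, the textbook definition of an $f$-divergence between a fine-tuned model and a reference model can be sketched as follows. The symbols $\pi_\theta$ (fine-tuned model), $\pi_{\mathrm{ref}}$ (reference model), $c$ (text prompt), and $x$ (generated image) are notation assumed here, not taken from the abstract, and the paper's own formulation may differ in constants or conventions.

$$
D_f\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right) = \mathbb{E}_{x \sim \pi_{\mathrm{ref}}(\cdot \mid c)}\left[ f\!\left( \frac{\pi_\theta(x \mid c)}{\pi_{\mathrm{ref}}(x \mid c)} \right) \right], \qquad f \text{ convex},\quad f(1) = 0 .
$$

Under this definition, $f(u) = u \log u$ gives $\mathrm{KL}(\pi_\theta \,\|\, \pi_{\mathrm{ref}})$, which is the quantity usually called the reverse KL divergence in the DPO setting; $f(u) = -\log u$ gives $\mathrm{KL}(\pi_{\mathrm{ref}} \,\|\, \pi_\theta)$; and $f(u) = \tfrac{1}{2}\left[ u \log u - (u + 1) \log \tfrac{u + 1}{2} \right]$ gives the Jensen-Shannon divergence.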

Translated: 2024-11-07 20:46:36 Published: 2024-11-06

# Maritime Cybersecurity: A Comprehensive Review ( http://arxiv.org/abs/2409.11417v2 )

License: see the linked page

Meixuan Li, Jianying Zhou, Sudipta Chattopadhyay, Mark Goh,

The maritime industry stands at a critical juncture, where the imperative for technological advancement intersects with the pressing need for robust cybersecurity measures. Maritime cybersecurity refers to the protection of computer systems and digital assets within the maritime industry, as well as the broader network of interconnected components that make up the maritime ecosystem. In this survey, we aim to identify the significant domains of maritime cybersecurity and measure their effectiveness. We provide an in-depth analysis of threats in key maritime systems, including AIS, GNSS, ECDIS, VDR, RADAR, VSAT, and GMDSS, while exploring real-world cyber incidents that have impacted the sector. A multi-dimensional taxonomy of maritime cyber attacks is presented, offering insights into threat actors, motivations, and impacts. We also evaluate various security solutions, from integrated solutions to component-specific solutions. Finally, we share open challenges and future solutions. In the supplementary section, we present definitions and vulnerabilities of the vessel components that are discussed in this survey. By addressing all these critical issues with key interconnected aspects, this review aims to foster a more resilient maritime ecosystem.

Translated: 2024-11-07 20:01:55 Published: 2024-11-06

# Maritime Cybersecurity: A Comprehensive Review ( http://arxiv.org/abs/2409.11417v3 )

License: see the linked page

Meixuan Li, Jianying Zhou, Sudipta Chattopadhyay, Mark Goh,

Translated: 2024-11-07 20:01:55 Published: 2024-11-06

# Cartan moving frames and the data manifolds ( http://arxiv.org/abs/2409.12057v2 )

License: see the linked page

Eliot Tron, Rita Fioresi, Nicolas Couellan, Stéphane Puechmorel,

The purpose of this paper is to employ the language of Cartan moving frames to study the geometry of data manifolds and their Riemannian structure, via the data information metric and its curvature at data points. Using this framework and through experiments, explanations of the response of a neural network are given by pointing out the output classes that are easily reachable from a given input. This emphasizes how the proposed mathematical relationship between the output of the network and the geometry of its inputs can be exploited as an explainable artificial intelligence tool.

Translated: 2024-11-07 19:50:48 Published: 2024-11-06

# TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions ( http://arxiv.org/abs/2409.13941v2 )

License: see the linked page

Kevin Li, Fulu Li,

We use images of a wide variety of cars to compose an image of an animal, such as a bird or a lion, on the theme of environmental protection, both to maximize the information about cars in a single composed image and to raise awareness of environmental challenges. We present a novel way of interacting with an artistically composed photomosaic image, in which a simple "click and display" operation switches interactively between a tile image in the photomosaic and the corresponding original car image, which is automatically saved on the Desktop. We build a multimodal custom GPT named TalkMosaic by incorporating car image information and the related knowledge into ChatGPT. By uploading the original car image to TalkMosaic, we can ask questions about the given car image and get the corresponding answers efficiently and effectively, such as where to buy the tire shown in the car image that satisfies high environmental standards. We give an in-depth analysis of how to speed up the inference of multimodal LLMs using sparse attention and quantization techniques, with the presented probabilistic FlashAttention (PrFlashAttention) and Staircase Adaptive Quantization (SAQ) methods. The implemented prototype demonstrates the feasibility and effectiveness of the presented approach.

Translated: 2024-11-07 19:50:48 Published: 2024-11-06

# The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles ( http://arxiv.org/abs/2409.15750v2 )

License: see the linked page

Hanwen Zhang, Dusit Niyato, Wei Zhang, Changyuan Zhao, Hongyang Du, Abbas Jamalipour, Sumei Sun, Yiyang Pei,

With the advancement of generative artificial intelligence (GenAI) models, their capabilities are expanding significantly beyond content generation, and the models are increasingly being used across diverse applications. In particular, GenAI shows great potential in addressing challenges in the electric vehicle (EV) ecosystem, ranging from charging management to cyber-attack prevention. In this paper, we specifically consider the Internet of electric vehicles (IoEV) and categorize GenAI for IoEV into four different layers, namely the EV's battery layer, the individual EV layer, the smart grid layer, and the security layer. We introduce various GenAI techniques used in each layer of IoEV applications. Subsequently, public datasets available for training the GenAI models are summarized. Finally, we provide recommendations for future directions. This survey not only categorizes the applications of GenAI in IoEV across different layers but also serves as a valuable resource for researchers and practitioners by highlighting the design and implementation challenges within each layer. Furthermore, it provides a roadmap for future research directions, enabling the development of more robust and efficient IoEV systems through the integration of advanced GenAI techniques.