Fugu-MT: arXiv Paper Translations

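This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.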

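The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).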

Papers with a publication date of 20240716.

Title	Authors	Abstract	Publication date / Translation date
# Hybrid Deep Learning Framework for Enhanced Melanoma Detection ( http://arxiv.org/abs/2408.00772v1 ) License: see link	Peng Zhang, Divya Chaudhary	Cancer is a leading cause of death worldwide, necessitating advancements in early detection and treatment technologies. In this paper, we present a novel and highly efficient melanoma detection framework that synergistically combines the strengths of U-Net for segmentation and EfficientNet for the classification of skin images. The primary objective of our study is to enhance the accuracy and efficiency of melanoma detection through an innovative hybrid approach. We utilized the HAM10000 dataset to meticulously train the U-Net model, enabling it to precisely segment cancerous regions. Concurrently, we employed the ISIC 2020 dataset to train the EfficientNet model, optimizing it for the binary classification of skin cancer. Our hybrid model demonstrates a significant improvement in performance, achieving a remarkable accuracy of 99.01% on the ISIC 2020 dataset. This exceptional result underscores the superiority of our approach compared to existing model structures. By integrating the precise segmentation capabilities of U-Net with the advanced classification prowess of EfficientNet, our framework offers a comprehensive solution for melanoma detection. The results of our extensive experiments highlight the high accuracy and reliability of our method in both segmentation and classification tasks. This indicates the potential of our hybrid approach to significantly enhance cancer detection, providing a robust tool for medical professionals in the early diagnosis and treatment of melanoma. We believe that our framework can set a new benchmark in the field of automated skin cancer detection, encouraging further research and development in this crucial area of medical imaging.	Translated: 2024-08-19 05:28:21 Published: 2024-07-16
# Fuzzy Logic Approach For Visual Analysis Of Websites With K-means Clustering-based Color Extraction ( http://arxiv.org/abs/2408.00774v1 ) License: see link	Tamiris Abildayeva, Pakizar Shamoi	Websites form the foundation of the Internet, serving as platforms for disseminating information and accessing digital resources. They allow users to engage with a wide range of content and services, enhancing the Internet's utility for all. The aesthetics of a website play a crucial role in its overall effectiveness and can significantly impact user experience, engagement, and satisfaction. This paper examines the importance of website design aesthetics in enhancing user experience, given the increasing number of internet users worldwide. It emphasizes the significant impact of first impressions, often formed within 50 milliseconds, on users' perceptions of a website's appeal and usability. We introduce a novel method for measuring website aesthetics based on color harmony and font popularity, using fuzzy logic to predict aesthetic preferences. We collected our own dataset, consisting of nearly 200 popular and frequently used website designs, to ensure relevance and adaptability to the dynamic nature of web design trends. Dominant colors from website screenshots were extracted using k-means clustering. The findings aim to improve understanding of the relationship between aesthetics and usability in website design.	Translated: 2024-08-19 05:28:21 Published: 2024-07-16
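The dominant-color step mentioned in the abstract above can be illustrated with a small k-means sketch. This is not the authors' pipeline: the pixel list, the farthest-point initialisation, and the function names `sq_dist`, `init_centers`, and `dominant_colors` are assumptions made for illustration, and a real screenshot would first be loaded into a list of RGB tuples with an image library.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(pixels, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centers = [pixels[0]]
    while len(centers) < k:
        # Pick the pixel that is farthest from its nearest existing center.
        centers.append(max(pixels, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def dominant_colors(pixels, k=3, max_iter=50):
    """Return k cluster centers as rounded RGB tuples, largest cluster first."""
    centers = init_centers(pixels, k)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each pixel joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    order = sorted(range(k), key=lambda i: len(clusters[i]), reverse=True)
    return [tuple(round(v) for v in centers[i]) for i in order]


# Hypothetical "screenshot": mostly near-white, some blue, a little red.
pixels = (
    [(250, 250, 250)] * 30 + [(254, 254, 254)] * 30
    + [(0, 0, 200)] * 15 + [(0, 0, 220)] * 15
    + [(220, 30, 30)] * 10
)
print(dominant_colors(pixels, k=3))
```

Sorting the centers by cluster size gives a rough "dominance" ranking. How the paper maps these colors into its fuzzy color-harmony measure is not stated in the abstract and is not modelled here.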
# Dilated convolution neural operator for multiscale partial differential equations ( http://arxiv.org/abs/2408.00775v1 ) License: see link	Bo Xu, Xinliang Liu, Lei Zhang	This paper introduces a data-driven operator learning method for multiscale partial differential equations, with a particular emphasis on preserving high-frequency information. Drawing inspiration from the representation of multiscale parameterized solutions as a combination of low-rank global bases (such as low-frequency Fourier modes) and localized bases over coarse patches (analogous to dilated convolution), we propose the Dilated Convolutional Neural Operator (DCNO). The DCNO architecture effectively captures both high-frequency and low-frequency features while maintaining a low computational cost through a combination of convolution and Fourier layers. We conduct experiments to evaluate the performance of DCNO on various datasets, including the multiscale elliptic equation, its inverse problem, Navier-Stokes equation, and Helmholtz equation. We show that DCNO strikes an optimal balance between accuracy and computational cost and offers a promising solution for multiscale operator learning.	Translated: 2024-08-19 05:28:21 Published: 2024-07-16
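For readers unfamiliar with the dilated convolution that the abstract above refers to, here is a minimal one-dimensional sketch. It shows only the generic operation, not the DCNO architecture; the function name `dilated_conv1d` and the toy signal are assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with `dilation - 1` skipped samples between taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output value
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # each output sees 3 samples
print(dilated_conv1d(signal, kernel, dilation=2))  # each output sees 5 samples
```

With the same three weights, raising `dilation` widens the receptive field without adding parameters, which is the usual motivation for dilated layers when features at several scales matter.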
# CATD: Unified Representation Learning for EEG-to-fMRI Cross-Modal Generation ( http://arxiv.org/abs/2408.00777v1 ) License: see link	Weiheng Yao, Shuqiang Wang	Multi-modal neuroimaging analysis is crucial for a comprehensive understanding of brain function and pathology, as it allows for the integration of different imaging techniques, thus overcoming the limitations of individual modalities. However, the high costs and limited availability of certain modalities pose significant challenges. To address these issues, this paper proposed the Condition-Aligned Temporal Diffusion (CATD) framework for end-to-end cross-modal synthesis of neuroimaging, enabling the generation of functional magnetic resonance imaging (fMRI)-detected Blood Oxygen Level Dependent (BOLD) signals from more accessible Electroencephalography (EEG) signals. By constructing Conditionally Aligned Block (CAB), heterogeneous neuroimages are aligned into a potential space, achieving a unified representation that provides the foundation for cross-modal transformation in neuroimaging. The combination with the constructed Dynamic Time-Frequency Segmentation (DTFS) module also enables the use of EEG signals to improve the temporal resolution of BOLD signals, thus augmenting the capture of the dynamic details of the brain. Experimental validation demonstrated the effectiveness of the framework in improving the accuracy of neural activity prediction, identifying abnormal brain regions, and enhancing the temporal resolution of BOLD signals. The proposed framework establishes a new paradigm for cross-modal synthesis of neuroimaging by unifying heterogeneous neuroimaging data into a potential representation space, showing promise in medical applications such as improving Parkinson's disease prediction and identifying abnormal brain regions.	Translated: 2024-08-19 05:28:21 Published: 2024-07-16
# Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions ( http://arxiv.org/abs/2408.00778v1 ) License: see link	Qinshi Zhang, Latisha Besariani Hendra, Mohan Chi, Zijian Ding	The emergence of Generative AI is catalyzing a paradigm shift in user interfaces from command-based to intent-based outcome specification. In this paper, we explore abstract-to-detailed task transitions in the context of frontend code generation as a step towards intent-based user interfaces, aiming to bridge the gap between abstract user intentions and concrete implementations. We introduce Frontend Diffusion, an end-to-end LLM-powered tool that generates high-quality websites from user sketches. The system employs a three-stage task transition process: sketching, writing, and coding. We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks. Our work also opens avenues for exploring similar approaches in other domains, potentially extending to more complex, interdependent tasks such as video production.	Translated: 2024-08-19 05:28:21 Published: 2024-07-16
# Siamese Transformer Networks for Few-shot Image Classification ( http://arxiv.org/abs/2408.01427v1 ) License: see link	Weihao Jiang, Shuoxi Zhang, Kun He	Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples. This ability is attributed to their capacity to focus on details and identify common features between previously seen and new images. In contrast, existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both. To address this limitation, we propose a novel approach based on the Siamese Transformer Network (STN). Our method employs two parallel branch networks utilizing the pre-trained Vision Transformer (ViT) architecture to extract global and local features, respectively. Specifically, we implement the ViT-Small network architecture and initialize the branch networks with pre-trained model parameters obtained through self-supervised learning. We apply the Euclidean distance measure to the global features and the Kullback-Leibler (KL) divergence measure to the local features. To integrate the two metrics, we first employ L2 normalization and then weight the normalized results to obtain the final similarity score. This strategy leverages the advantages of both global and local features while ensuring their complementary benefits. During the training phase, we adopt a meta-learning approach to fine-tune the entire network. Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules and enhancing the model's generalization ability. Extensive experiments demonstrate that our framework is simple yet effective, achieving superior performance compared to state-of-the-art baselines on four popular few-shot classification benchmarks in both 5-shot and 1-shot scenarios.	Translated: 2024-08-19 05:08:48 Published: 2024-07-16
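The metric fusion described in the abstract above (Euclidean distance on global features, KL divergence on local features, L2 normalization, then a weighted sum) can be sketched as follows. The abstract does not say over which axis the normalization is applied or how the weight is chosen, so this sketch assumes normalization across the candidate classes and a hypothetical `weight` of 0.5. All function names and the toy feature vectors are illustrative, and the ViT branches that would produce the features are omitted.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def l2_normalize(values):
    """Scale a list of numbers to unit L2 norm (returned unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def combined_distance(q_global, q_local, protos_global, protos_local, weight=0.5):
    """Weighted sum of the two normalized per-class distances (lower means more similar)."""
    d_global = l2_normalize([euclidean(q_global, g) for g in protos_global])
    d_local = l2_normalize([kl_divergence(q_local, l) for l in protos_local])
    return [weight * g + (1 - weight) * l for g, l in zip(d_global, d_local)]


def predict(q_global, q_local, protos_global, protos_local, weight=0.5):
    """Index of the class prototype with the smallest combined distance."""
    scores = combined_distance(q_global, q_local, protos_global, protos_local, weight)
    return min(range(len(scores)), key=scores.__getitem__)


# Toy 2-way example: the query is close to class 0 under both metrics.
protos_global = [[1.0, 0.1], [0.0, 1.0]]
protos_local = [[0.6, 0.4], [0.1, 0.9]]
print(predict([1.0, 0.0], [0.7, 0.3], protos_global, protos_local))
```

Normalizing each metric before mixing keeps one of them from dominating simply because its raw values live on a larger scale, which is one plausible reading of why the abstract normalizes first.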
# Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC) ( http://arxiv.org/abs/2407.18264v1 ) License: see link	Seyed Nima Omidsajedi, Rekha Reddy, Jianming Yi, Jan Herbst, Christoph Lipps, Hans Dieter Schotten	Almost in every heavily computation-dependent application, from 6G communication systems to autonomous driving platforms, a large portion of computing should be near to the client side. Edge computing (AI at Edge) in mobile devices is one of the optimized approaches for addressing this requirement. Therefore, in this work, the possibilities and challenges of implementing a low-latency and power-optimized smart mobile system are examined. Utilizing Field Programmable Gate Array (FPGA) based solutions at the edge will lead to bandwidth-optimized designs and as a consequence can boost the computational effectiveness at a system-level deadline. Moreover, various performance aspects and implementation feasibilities of Neural Networks (NNs) on both embedded FPGA edge devices (using Xilinx Multiprocessor System on Chip (MPSoC)) and Cloud are discussed throughout this research. The main goal of this work is to demonstrate a hybrid system that uses the deep learning programmable engine developed by Xilinx Inc. as the main component of the hardware accelerator. Then based on this design, an efficient system for mobile edge computing is represented by utilizing an embedded solution.	Translated: 2024-08-05 01:35:56 Published: 2024-07-16
# NudgeRank: Digital Algorithmic Nudging for Personalized Health ( http://arxiv.org/abs/2407.20241v1 ) License: see link	Jodi Chiam, Aloysius Lim, Ankur Teredesai	In this paper we describe NudgeRank, an innovative digital algorithmic nudging system designed to foster positive health behaviors on a population-wide scale. Utilizing a novel combination of Graph Neural Networks augmented with an extensible Knowledge Graph, this Recommender System is operational in production, delivering personalized and context-aware nudges to over 1.1 million care recipients daily. This enterprise deployment marks one of the largest AI-driven health behavior change initiatives, accommodating diverse health conditions and wearable devices. Rigorous evaluation reveals statistically significant improvements in health outcomes, including a 6.17% increase in daily steps and 7.61% more exercise minutes. Moreover, user engagement and program enrollment surged, with a 13.1% open rate compared to baseline systems' 4%. Demonstrating scalability and reliability, NudgeRank operates efficiently on commodity compute resources while maintaining automation and observability standards essential for production systems.	Translated: 2024-08-05 00:56:24 Published: 2024-07-16
# BadRobot: Jailbreaking LLM-based Embodied AI in the Physical World ( http://arxiv.org/abs/2407.20242v1 ) License: see link	Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Shengshan Hu, Leo Yu Zhang	Embodied artificial intelligence (AI) represents an artificial intelligence system that interacts with the physical world through sensors and actuators, seamlessly integrating perception and action. This design enables AI to learn from and operate within complex, real-world environments. Large Language Models (LLMs) deeply explore language instructions, playing a crucial role in devising plans for complex tasks. Consequently, they have progressively shown immense potential in empowering embodied AI, with LLM-based embodied AI emerging as a focal point of research within the community. It is foreseeable that, over the next decade, LLM-based embodied AI robots are expected to proliferate widely, becoming commonplace in homes and industries. However, a critical safety issue that has long been hiding in plain sight is: could LLM-based embodied AI perpetrate harmful behaviors? Our research investigates for the first time how to induce threatening actions in embodied AI, confirming the severe risks posed by these soon-to-be-marketed robots, which starkly contravene Asimov's Three Laws of Robotics and threaten human safety. Specifically, we formulate the concept of embodied AI jailbreaking and expose three critical security vulnerabilities: first, jailbreaking robotics through compromised LLM; second, safety misalignment between action and language spaces; and third, deceptive prompts leading to unaware hazardous behaviors. We also analyze potential mitigation measures and advocate for community awareness regarding the safety of embodied AI applications in the physical world.	Translated: 2024-08-05 00:56:24 Published: 2024-07-16
# Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis ( http://arxiv.org/abs/2407.15862v1 ) License: see link	Qiuhong Wei, Ying Cui, Mengwei Ding, Yanqin Wang, Lingling Xiang, Zhengxiong Yao, Ceran Chen, Ying Long, Zhezhen Jin, Ximing Xu	Large language models (LLMs) have demonstrated potential applications in medicine, yet data privacy and computational burden limit their deployment in healthcare institutions. Open-source and lightweight versions of LLMs emerge as potential solutions, but their performance, particularly in pediatric settings remains underexplored. In this cross-sectional study, 250 patient consultation questions were randomly selected from a public online medical forum, with 10 questions from each of 25 pediatric departments, spanning from December 1, 2022, to October 30, 2023. Two lightweight open-source LLMs, ChatGLM3-6B and Vicuna-7B, along with a larger-scale model, Vicuna-13B, and the widely-used proprietary ChatGPT-3.5, independently answered these questions in Chinese between November 1, 2023, and November 7, 2023. To assess reproducibility, each inquiry was replicated once. We found that ChatGLM3-6B demonstrated higher accuracy and completeness than Vicuna-13B and Vicuna-7B (P < .001), but all were outperformed by ChatGPT-3.5. ChatGPT-3.5 received the highest ratings in accuracy (65.2%) compared to ChatGLM3-6B (41.2%), Vicuna-13B (11.2%), and Vicuna-7B (4.4%). Similarly, in completeness, ChatGPT-3.5 led (78.4%), followed by ChatGLM3-6B (76.0%), Vicuna-13B (34.8%), and Vicuna-7B (22.0%) in highest ratings. ChatGLM3-6B matched ChatGPT-3.5 in readability, both outperforming Vicuna models (P < .001). In terms of empathy, ChatGPT-3.5 outperformed the lightweight LLMs (P < .001). In safety, all models performed comparably well (P > .05), with over 98.4% of responses being rated as safe. Repetition of inquiries confirmed these findings. In conclusion, lightweight LLMs demonstrate promising application in pediatric healthcare. However, the observed gap between lightweight and large-scale proprietary LLMs underscores the need for continued development efforts.	Translated: 2024-07-28 18:29:13 Published: 2024-07-16
# Overfitting In Contrastive Learning? ( http://arxiv.org/abs/2407.15863v1 ) License: see link	Zachary Rabin, Jim Davis, Benjamin Lewis, Matthew Scherreik	Overfitting describes a machine learning phenomenon where the model fits too closely to the training data, resulting in poor generalization. While this occurrence is thoroughly documented for many forms of supervised learning, it is not well examined in the context of unsupervised learning. In this work we examine the nature of overfitting in unsupervised contrastive learning. We show that overfitting can indeed occur and the mechanism behind overfitting.	Translated: 2024-07-28 18:29:13 Published: 2024-07-16
# Mitigating biases in big mobility data: a case study of monitoring large-scale transit systems ( http://arxiv.org/abs/2407.14541v1 ) License: see link	Feilong Wang, Xuegang Ban, Peng Chen, Chenxi Liu, Rong Zhao	Big mobility datasets (BMD) have shown many advantages in studying human mobility and evaluating the performance of transportation systems. However, the quality of BMD remains poorly understood. This study evaluates biases in BMD and develops mitigation methods. Using Google and Apple mobility data as examples, this study compares them with benchmark data from governmental agencies. Spatio-temporal discrepancies between BMD and benchmark are observed and their impacts on transportation applications are investigated, emphasizing the urgent need to address these biases to prevent misguided policymaking. This study further proposes and tests a bias mitigation method. It is shown that the mitigated BMD could generate valuable insights into large-scale public transit systems across 100+ US counties, revealing regional disparities of the recovery of transit systems from the COVID-19. This study underscores the importance of caution when using BMD in transportation research and presents effective mitigation strategies that would benefit practitioners.	Translated: 2024-07-23 22:03:21 Published: 2024-07-16
# Towards consistency of rule-based explainer and black box model -- fusion of rule induction and XAI-based feature importance ( http://arxiv.org/abs/2407.14543v1 ) License: see link	Michał Kozielski, Marek Sikora, Łukasz Wawrowski	Rule-based models offer a human-understandable representation, i.e. they are interpretable. For this reason, they are used to explain the decisions of non-interpretable complex models, referred to as black box models. The generation of such explanations involves the approximation of a black box model by a rule-based model. To date, however, it has not been investigated whether the rule-based model makes decisions in the same way as the black box model it approximates. Decision making in the same way is understood in this work as the consistency of decisions and the consistency of the most important attributes used for decision making. This study proposes a novel approach ensuring that the rule-based surrogate model mimics the performance of the black box model. The proposed solution performs an explanation fusion involving rule generation and taking into account the feature importance determined by the selected XAI methods for the black box model being explained. The result of the method can be both global and local rule-based explanations. The quality of the proposed solution was verified by extensive analysis on 30 tabular benchmark datasets representing classification problems. Evaluation included comparison with the reference method and an illustrative case study. In addition, the paper discusses the possible pathways for the application of the rule-based approach in XAI and how rule-based explanations, including the proposed method, meet the user perspective and requirements for both content and presentation. The software created and a detailed report containing the full experimental results are available on the GitHub repository (https://github.com/ruleminer/FI-rules4XAI).	Translated: 2024-07-23 22:03:21 Published: 2024-07-16
# Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models ( http://arxiv.org/abs/2407.13796v1 ) License: see link	Zihao Xu, Yi Liu, Gelei Deng, Kailong Wang, Yuekang Li, Ling Shi, Stjepan Picek	Security concerns for large language models (LLMs) have recently escalated, focusing on thwarting jailbreaking attempts in discrete prompts. However, the exploration of jailbreak vulnerabilities arising from continuous embeddings has been limited, as prior approaches primarily involved appending discrete or continuous suffixes to inputs. Our study presents a novel channel for conducting direct attacks on LLM inputs, eliminating the need for suffix addition or specific questions provided that the desired output is predefined. We additionally observe that extensive iterations often lead to overfitting, characterized by repetition in the output. To counteract this, we propose a simple yet effective strategy named CLIP. Our experiments show that for an input length of 40 at iteration 1000, applying CLIP improves the ASR from 62% to 83%.	Translated: 2024-07-22 21:39:27 Published: 2024-07-16
# Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash ( http://arxiv.org/abs/2111.06628v5 ) License: see link	Lukas Struppek, Dominik Hintersdorf, Daniel Neider, Kristian Kersting	Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system's reliability. In this paper, we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically, we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes in images, either induced by gradient-based approaches or simply by performing standard image transformations, forcing or preventing hash collisions. Such attacks permit malicious actors easily to exploit the detection system: from hiding abusive material to framing innocent users, everything is possible. Moreover, using the hash values, inferences can still be made about the data stored on user devices. In our view, based on our results, deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective.	Translated: 2024-07-20 00:38:23 Published: 2024-07-16
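To make the idea of a perceptual hash and of "slight changes" altering it more concrete, here is a toy sketch using a classical average hash. This is not NeuralHash, which is a neural network; the function names `average_hash` and `hamming` and the 3x3 grayscale "image" are assumptions chosen purely for illustration.

```python
def average_hash(gray):
    """One bit per pixel of a 2D grayscale grid: 1 if the pixel is above the mean."""
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming(h1, h2):
    """Number of bit positions in which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))


image = [[0, 200, 0], [200, 98, 200], [0, 200, 0]]

# A uniform brightness shift leaves the hash unchanged (intended robustness).
brighter = [[v + 10 for v in row] for row in image]

# Nudging a single pixel that sits near the mean flips one hash bit.
tweaked = [[0, 200, 0], [200, 102, 200], [0, 200, 0]]

print(hamming(average_hash(image), average_hash(brighter)))
print(hamming(average_hash(image), average_hash(tweaked)))
```

The toy hash shows both sides of the trade-off the abstract discusses: perceptual hashes are designed to survive benign transformations, yet a small targeted edit near a decision boundary can still change the hash. The attacks in the paper target a learned hash and are considerably more involved.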
# SELF-GUIDE: 自己合成ファインタニングによるタスク特定指導の改善 SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning ( http://arxiv.org/abs/2407.12874v1 ) ライセンス: Link先を確認	Chenyang Zhao, Xueying Jia, Vijay Viswanathan, Tongshuang Wu, Graham Neubig,	(参考訳) 大規模言語モデル(LLM)は、適切な自然言語プロンプトを提供する際に、多様なタスクを解決するという約束を持っている。しかしながら、モデルのプロンプトは、十分なトレーニングデータでモデルを微調整するよりも、精度の低い予測をすることがしばしばある。一方、タスク固有のデータ上でのLCMの微調整は、一般的にそのパフォーマンスを改善するが、豊富な注釈付きデータセットは全てのタスクで利用できない。従来の研究では、最先端のLLMからタスク固有のデータを生成して、このデータを使ってより小さなモデルを微調整する方法が検討されてきたが、このアプローチでは、トレーニング対象以外の言語モデルへのアクセスが必要となり、コスト、スケーラビリティの課題、より強力なLLMに継続的に依存する法的なハードルがもたらされる。これに対応して,学生LLMからタスク固有の入出力ペアを合成し,これらの入出力ペアを用いて学生LLM自体を微調整する多段階メカニズムであるSELF-GUIDEを提案する。本研究では,Natural Instructions V2ベンチマークを実証的に評価した結果,SELF-GUIDEによりLLMの性能が大幅に向上することが確認された。具体的には,分類タスクが約15%,生成タスクが18%の絶対的な改善をベンチマークの指標で報告する。このことは、LLMが外部の学習信号なしでタスク固有の専門家になるための自己合成データの約束に光を当てている。 Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. However, prompting often leads models to make predictions with lower accuracy compared to finetuning a model with ample training data. On the other hand, while finetuning LLMs on task-specific data generally improves their performance, abundant annotated datasets are not available for all tasks. Previous work has explored generating task-specific data from state-of-the-art LLMs and using this data to finetune smaller models, but this approach requires access to a language model other than the one being trained, which introduces cost, scalability challenges, and legal hurdles associated with continuously relying on more powerful LLMs. In response to these, we propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM, then use these input-output pairs to finetune the student LLM itself. 
In our empirical evaluation on the Natural Instructions V2 benchmark, we find that SELF-GUIDE improves the performance of LLMs by a substantial margin. Specifically, we report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics. This sheds light on the promise of self-synthesized data guiding LLMs towards becoming task-specific experts without any external learning signals.	翻訳日:2024-07-19 20:02:37 公開日:2024-07-16
# ChatBCG:AIはあなたのスライドデッキを読むことができるか? ChatBCG: Can AI Read Your Slide Deck? ( http://arxiv.org/abs/2407.12875v1 ) ライセンス: Link先を確認	Nikita Singh, Rob Balian, Lukas Martinelli,	(参考訳) GPT4oやGemini Flashのようなマルチモーダルモデルは、人間レベルのパフォーマンスにアプローチする推論および要約タスクにおいて例外的である。しかし、これらのモデルは、特にビジネスデッキのビジュアルチャートの文脈において、非常に具体的な「読み上げと推定」タスクを行うよう依頼されたとき、人間に比べて性能が劣っていることがわかった。本稿では,GPT 4o と Gemini Flash-1.5 の精度を評価し,ラベル付きチャート(グラフ上にデータが明確に注釈付けされている場合)およびラベルなしチャート(データが明確に注釈付けされておらず,X軸とY軸から推測する必要がある場合)に関する簡単な質問に答える。これらのモデルは、複雑なグラフやラベル付けされていないグラフを含む場合、現在、デッキを正確にエンドツーエンドに読むことはできないと結論付けています。たとえユーザーがラベル付きチャートのみのデッキを作ったとしても、このモデルは15のラベル付きチャートのうち7～8個しか読めない。スライドデッキのフィギュアの全リストについては、https://www.repromptai.com/chat_bcgを参照してください。 Multimodal models like GPT4o and Gemini Flash are exceptional at inference and summarization tasks, which approach human-level in performance. However, we find that these models underperform compared to humans when asked to do very specific 'reading and estimation' tasks, particularly in the context of visual charts in business decks. This paper evaluates the accuracy of GPT 4o and Gemini Flash-1.5 in answering straightforward questions about data on labeled charts (where data is clearly annotated on the graphs), and unlabeled charts (where data is not clearly annotated and has to be inferred from the X and Y axis). We conclude that these models aren't currently capable of reading a deck accurately end-to-end if it contains any complex or unlabeled charts. Even if a user created a deck of only labeled charts, the model would only be able to read 7-8 out of 15 labeled charts perfectly end-to-end. For full list of slide deck figures visit https://www.repromptai.com/chat_bcg	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# Civitaiにおける乱用生成AIモデルの利用を探る Exploring the Use of Abusive Generative AI Models on Civitai ( http://arxiv.org/abs/2407.12876v1 ) ライセンス: Link先を確認	Yiluo Wei, Yiming Zhu, Pan Hui, Gareth Tyson,	(参考訳) 生成AIの台頭はデジタル画像の風景を変え、オンラインクリエイティブコミュニティに大きな影響を与えている。これにより、CivitaiのようなAIGC(AI-Generated Content)ソーシャルプラットフォームが誕生した。これらのユニークなソーシャルプラットフォームにより、ユーザーは独自の生成AIモデルを構築し、共有することができ、それによってより多様な芸術的表現の可能性を高めることができる。ソーシャルネットワークの中でデザインされた彼らは、アーチストたちに自分たちの創造(モデルから生成される)を披露する手段を提供し、議論を行い、フィードバックを得て、コミュニティの感覚を育む。しかし、このオープン性は、例えば、偽りのディープフェイクを広めたり、著作権を侵害したりするモデルの使用など、そのようなプラットフォームの悪用に対する懸念も引き起こす。これを探るため,我々はAIGCソーシャルプラットフォームに関する総合的な実証的研究を行い,乱用コンテンツの生成に利用することに焦点を当てた。例として、利用可能なAIGCソーシャルプラットフォームとして最大であるCivitaiをカバーする包括的データセットを構築した。この87Kモデルと2M画像のデータセットに基づいて、コンテンツの特徴を調査し、これらのプラットフォームをよりよく管理するためのモデレーション戦略について議論する。 The rise of generative AI is transforming the landscape of digital imagery, and exerting a significant influence on online creative communities. This has led to the emergence of AI-Generated Content (AIGC) social platforms, such as Civitai. These distinctive social platforms allow users to build and share their own generative AI models, thereby enhancing the potential for more diverse artistic expression. Designed in the vein of social networks, they also provide artists with the means to showcase their creations (generated from the models), engage in discussions, and obtain feedback, thus nurturing a sense of community. Yet, this openness also raises concerns about the abuse of such platforms, e.g., using models to disseminate deceptive deepfakes or infringe upon copyrights. To explore this, we conduct the first comprehensive empirical study of an AIGC social platform, focusing on its use for generating abusive content. As an exemplar, we construct a comprehensive dataset covering Civitai, the largest available AIGC social platform. 
Based on this dataset of 87K models and 2M images, we explore the characteristics of content and discuss strategies for moderation to better govern these platforms.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# Review-Feedback-Reason (ReFeR): NLG評価と推論のための新しいフレームワーク Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning ( http://arxiv.org/abs/2407.12877v1 ) ライセンス: Link先を確認	Yaswanth Narsupalli, Abhranil Chandra, Sreevatsa Muppirala, Manish Gupta, Pawan Goyal,	(参考訳) 大規模言語モデル(LLM)によって生成されるような自然言語生成(NLG)出力の品質を評価することは、大きな課題となる。従来のアプローチでは、リソース集約的な人的評価と自動メトリクスの両方が関係しており、しばしば人間の判断と相関が低い。本研究では,LPM エージェントを用いた NLG 評価フレームワークである Review-Feedback-Reason (ReFeR) を提案する。 NLGタスクの2つの既存のベンチマークデータセットを使用して、ReFeRを厳格にテストする。提案フレームワークは,NLG評価の精度を高め,従来のベンチマークを$\sim$20\%以上越えるだけでなく,構成的フィードバックを生成し,集合的推論を大幅に改善する。このフィードバックは、Mistral-7Bのような小さなモデルを微調整するために使用する命令チューニングデータセットの作成に利用される。また,GPT-3.5 Turbo を$\sim$11.67\% ,GPT-4 を$\sim$1\% で評価する。 Assessing the quality of Natural Language Generation (NLG) outputs, such as those produced by large language models (LLMs), poses significant challenges. Traditional approaches involve either resource-intensive human evaluations or automatic metrics, which often exhibit a low correlation with human judgment. In this study, we propose Review-Feedback-Reason (ReFeR), a novel evaluation framework for NLG using LLM agents. We rigorously test ReFeR using two pre-existing benchmark datasets on diverse NLG tasks. The proposed framework not only enhances the accuracy of NLG evaluation, surpassing previous benchmarks by $\sim$20\%, but also generates constructive feedback and significantly improves collective reasoning. This feedback is then leveraged for the creation of instruction-tuning datasets, which, when used to fine-tune smaller models like Mistral-7B, makes them extremely good evaluators, yielding a better correlation with human evaluations and performance nearly on par with GPT-3.5. 
We highlight the effectiveness of our methodology through its application on three reasoning benchmarks, where it outperforms most of the state-of-the-art methods, and also outperforms the reasoning capabilities of models like GPT-3.5 Turbo by $\sim$11.67\% and GPT-4 by $\sim$1\% on average.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# LLMには一貫性のある価値はあるか? Do LLMs have Consistent Values? ( http://arxiv.org/abs/2407.12878v1 ) ライセンス: Link先を確認	Naama Rozen, Gal Elidan, Amir Globerson, Ella Daniel,	(参考訳) 価値は人間の行動の基礎となる基本的な原動力である。大規模言語モデル(LLM)技術は、人間のような対話に向けて常に改善されている。しかし、LLMが生成したテキストで表される値についての研究はほとんど行われていない。ここでは、心理学における価値構造に関する豊富な文献に目を向けることで、この問題を研究する。我々は,LLMが,値のランク付けや値の相関など,人間で実証されたのと同じ値構造を示すかどうかを問う。この分析の結果は, LLMの推進方法に強く依存しており, 特定の促進戦略(「値アンチョリング」と呼ぶ)の下では, 人的データとの合意が極めて説得力があることが示されている。この結果は,LLMにおける値の理解の向上と,LLM応答の一貫性を評価する新しい手法の導入に寄与する。 Values are a basic driving force underlying human behavior. Large Language Models (LLM) technology is constantly improving towards human-like dialogue. However, little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values. We show that the results of this analysis strongly depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as 'Value Anchoring') the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs, as well as introduce novel methods for assessing consistency in LLM responses.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# 大規模視覚言語モデルも良い分類法である:インテクストマルチモーダルフェイクニュース検出の検討 Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection ( http://arxiv.org/abs/2407.12879v1 ) ライセンス: Link先を確認	Ye Jiang, Yimin Wang,	(参考訳) 大規模視覚言語モデル(LVLM)は、多種多様なクロスモーダルベンチマークにおいて、視覚言語推論において例外的な性能を示す。これらの進歩にもかかわらず、最近の研究は、GPT-3.5-turboのような大規模言語モデル(LLM)が、Fake News Detection (FND)においてBERTのようなよく訓練された小型モデルと比較され、FNDタスクにおけるLVLMsの有効性を問うことが示唆されている。微調整のLVLMにより性能は向上するが、かなりのパラメータと必要な事前訓練の重み付けにより、FNDアプリケーションのためのリソース重み付けの取り組みとなった。本稿は,CLIPモデルと比較し,まず2つの有名なLVLM(CagVLMとGPT4V)のFND能力を評価する。以上の結果から,LVLMは小型モデルと競合する性能が得られることが示された。次に,標準文脈学習(ICL)をLVLMと統合し,FND性能の向上に言及する。この問題に対処するため、我々は、よく訓練された小さなモデルからの予測と対応する確率で、文脈内例とテストインプットを豊かにすることで、textbf{I}n-context \textbf{M}ultimodal \textbf{F}ake \textbf{N}ews \textbf{D}etection (IMFND) フレームワークを導入する。この戦略的統合により、LVLMは高い確率に関連するニュースセグメントに焦点を向け、分析精度を向上させることができる。実験結果から,IMFNDフレームワークはLVLMのFND効率を大幅に向上し,3つのFNDデータセットの標準ICLアプローチよりも精度が向上したことが示唆された。 Large visual-language models (LVLMs) exhibit exceptional performance in visual-language reasoning across diverse cross-modal benchmarks. Despite these advances, recent research indicates that Large Language Models (LLMs), like GPT-3.5-turbo, underachieve compared to well-trained smaller models, such as BERT, in Fake News Detection (FND), prompting inquiries into LVLMs' efficacy in FND tasks. Although performance could improve through fine-tuning LVLMs, the substantial parameters and requisite pre-trained weights render it a resource-heavy endeavor for FND applications. This paper initially assesses the FND capabilities of two notable LVLMs, CogVLM and GPT4V, in comparison to a smaller yet adeptly trained CLIP model in a zero-shot context. The findings demonstrate that LVLMs can attain performance competitive with that of the smaller model. Next, we integrate standard in-context learning (ICL) with LVLMs, noting improvements in FND performance, though limited in scope and consistency. 
To address this, we introduce the In-context Multimodal Fake News Detection (IMFND) framework, enriching in-context examples and test inputs with predictions and corresponding probabilities from a well-trained smaller model. This strategic integration directs the LVLMs' focus towards news segments associated with higher probabilities, thereby improving their analytical accuracy. The experimental results suggest that the IMFND framework significantly boosts the FND efficiency of LVLMs, achieving enhanced accuracy over the standard ICL approach across three publicly available FND datasets.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# Few-Shot Multimodal Fake News 検出のためのクロスモーダル拡張 Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection ( http://arxiv.org/abs/2407.12880v1 ) ライセンス: Link先を確認	Ye Jiang, Taihang Wang, Xiaoman Xu, Yimin Wang, Xingyi Song, Diana Maynard,	(参考訳) 偽ニュースの初期段階のトピックは、限られた注釈付きサンプルから素早く学習する自動検出方法を必要とする。そのため,早期のフェイクニュースの検出には,限られた指導力,あるいは少数ショットラーニング(英語版)としても知られる新しいタスクにおいて,急速に習熟する能力が不可欠である。既存のアプローチでは、多数のパラメータを伴ってトレーニング済みの言語モデルを微調整するか、大規模な注釈付きデータセットでスクラッチから複雑なニューラルネットワークをトレーニングする。本稿では,一様特徴を用いたマルチモーダル特徴を付加したマルチモーダルフェイクニュース検出モデルを提案する。この目的のために,Nショット分類をより堅牢な (n $\times$ z) ショット問題に変換することで,マルチモーダルな複数モーダルな偽ニュースの検出を簡易に行うCMA(Cross-Modal Augmentation)を導入する。提案したCMAは3つのベンチマークデータセット上でSOTA結果を達成し、驚くほど単純な線形探索法を用いて、少数のトレーニングサンプルでマルチモーダルフェイクニュースを分類する。さらに,本手法は従来手法よりもはるかに軽量であり,特に訓練可能なパラメータの数やエポック時間の観点からも顕著である。コードはここで入手できる。 \url{https://github.com/zgjiangtoby/FND_fewshot} The nascent topic of fake news requires automatic detection methods to quickly learn from limited annotated samples. Therefore, the capacity to rapidly acquire proficiency in a new task with limited guidance, also known as few-shot learning, is critical for detecting fake news in its early stages. Existing approaches either involve fine-tuning pre-trained language models which come with a large number of parameters, or training a complex neural network from scratch with large-scale annotated datasets. This paper presents a multimodal fake news detection model which augments multimodal features using unimodal features. For this purpose, we introduce Cross-Modal Augmentation (CMA), a simple approach for enhancing few-shot multimodal fake news detection by transforming n-shot classification into a more robust (n $\times$ z)-shot problem, where z represents the number of supplementary features. The proposed CMA achieves SOTA results over three benchmark datasets, utilizing a surprisingly simple linear probing method to classify multimodal fake news with only a few training samples. 
Furthermore, our method is significantly more lightweight than prior approaches, particularly in terms of the number of trainable parameters and epoch times. The code is available here: \url{https://github.com/zgjiangtoby/FND_fewshot}	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# BinaryAlign:バイナリシーケンスラベリングとしての単語アライメント BinaryAlign: Word Alignment as Binary Sequence Labeling ( http://arxiv.org/abs/2407.12881v1 ) ライセンス: Link先を確認	Gaetan Lopez Latouche, Marc-André Carbonneau, Ben Swanson,	(参考訳) 単語アライメントの現実的な展開は、高リソース言語と低リソース言語の両方をカバーすることがほぼ確実である。しかし、このタスクの最先端は、特定の言語ペアに対するゴールドアライメントトレーニングデータの可用性に応じて、異なるモデルクラスを推奨する。両シナリオの既存手法よりも優れたバイナリシーケンスラベリングに基づく新しい単語アライメント手法であるBinaryAlignを提案する。さらに,多言語基盤モデルの具体的選択に違いがあり,アライメントエラー型よりも階層化された誤り解析を行い,非英語言語対上でのBinaryAlignの性能について検討する。ソースコードを公開しています。 Real world deployments of word alignment are almost certain to cover both high and low resource languages. However, the state-of-the-art for this task recommends a different model class depending on the availability of gold alignment training data for a particular language pair. We propose BinaryAlign, a novel word alignment technique based on binary sequence labeling that outperforms existing approaches in both scenarios, offering a unifying approach to the task. Additionally, we vary the specific choice of multilingual foundation model, perform stratified error analysis over alignment error type, and explore the performance of BinaryAlign on non-English language pairs. We make our source code publicly available.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# InstructAV:オーサリング検証のためのインストラクションファインタニング大型言語モデル InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification ( http://arxiv.org/abs/2407.12882v1 ) ライセンス: Link先を確認	Yujia Hu, Zhiqiang Hu, Chun-Wei Seah, Roy Ka-Wei Lee,	(参考訳) 大規模言語モデル(LLM)は、幅広いNLPタスクにおいて顕著な習熟性を示している。しかし、2つのテキストが同じ著者シップを共有しているかどうかを判断するオーサシップ検証(AV)タスクに関しては、ChatGPTのような先進的なモデルでさえ、顕著な制限がある。本稿では,著者確認のための新しいアプローチであるInstructAVを紹介する。このアプローチでは,パラメータ効率の細かいチューニング(PEFT)手法と併用して,精度と説明可能性の向上を図る。 InstructAVの特徴は、分類決定を透明で理解可能な説明と整合させる能力にある。さまざまなデータセットにわたる包括的な実験を通じて、InstructAVはAVタスクにおける最先端のパフォーマンスを示し、高い分類精度と説明信頼性の強化を提供する。 Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This approach utilizes LLMs in conjunction with a parameter-efficient fine-tuning (PEFT) method to simultaneously improve accuracy and explainability. The distinctiveness of InstructAV lies in its ability to align classification decisions with transparent and understandable explanations, representing a significant progression in the field of authorship verification. Through comprehensive experiments conducted across various datasets, InstructAV demonstrates its state-of-the-art performance on the AV task, offering high classification accuracy coupled with enhanced explanation reliability.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# BRIGHT: 推論集約検索のための現実的で挑戦的なベンチマーク BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ( http://arxiv.org/abs/2407.12883v1 ) ライセンス: Link先を確認	Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu,	(参考訳) 既存の検索ベンチマークは主に、キーワードまたは意味に基づく検索が通常十分である情報検索クエリ(例えば、検索エンジンからの集約された質問)で構成されている。しかし、多くの複雑な現実世界のクエリは、サーフェスフォームマッチングを超える関連ドキュメントを特定するために、詳細な推論を必要とする。例えば、コーディング問題のためのドキュメントを見つけるには、関連する関数のロジックと構文を理解する必要がある。このような難解なクエリに対する検索のベンチマークを改善するために,関係文書の検索に集中的推論を必要とする最初のテキスト検索ベンチマークBRIGHTを導入する。 BRIGHTは、さまざまな領域(経済学、心理学、ロボット工学、ソフトウェア工学、地球科学など)から収集された1,398の現実世界のクエリから構築されている。広範囲な評価により,最先端の検索モデルでさえBRIGHTでは性能が良くないことが明らかとなった。 MTEBリーダーボード[38]のリードモデルは59.0 nDCG@10であり、BRIGHTでは18.0 nDCG@10のスコアを生成する。さらに,大規模言語モデル(LLM)が生成するChain-of-Thought推論によるクエリの強化により,最大12.2ポイントの性能向上が図られている。さらに、BRIGHTは、トレーニングデータにベンチマークからの文書が含まれている場合でも、同様の性能を示すことによって、ベンチマークされたモデルの事前トレーニング中にデータ漏洩に対して堅牢である。 BRIGHTは、より現実的で困難な環境での検索システムに関する将来の研究の道を開くものだと考えています。私たちのコードとデータは https://brightbenchmark.github.io で公開されています。 Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. 
BRIGHT is constructed from 1,398 real-world queries collected from diverse domains (such as economics, psychology, robotics, software engineering, earth sciences, etc.), sourced from naturally occurring or carefully curated human data. Extensive evaluation reveals that even state-of-the-art retrieval models perform poorly on BRIGHT. The leading model on the MTEB leaderboard [38], which achieves a score of 59.0 nDCG@10, produces an nDCG@10 score of only 18.0 on BRIGHT. We further demonstrate that augmenting queries with Chain-of-Thought reasoning generated by large language models (LLMs) improves performance by up to 12.2 points. Moreover, BRIGHT is robust against data leakage during pretraining of the benchmarked models, as we validate by showing similar performance even when documents from the benchmark are included in the training data. We believe that BRIGHT paves the way for future research on retrieval systems in more realistic and challenging settings. Our code and data are available at https://brightbenchmark.github.io.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
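The BRIGHT abstract above reports scores as nDCG@10. As a reading aid, here is a minimal sketch of how nDCG@k is commonly computed for a single query. The function names, the `qrels` dictionary layout, and the linear-gain formulation are assumptions made for illustration; this is not the benchmark's own evaluation code, and published scores are usually averaged over queries and multiplied by 100.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of relevance grades in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: hypothetical dict mapping doc id -> relevance grade (missing means 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    if ideal == 0:
        return 0.0  # no relevant documents exist for this query
    return dcg(gains) / ideal
```

A ranking that places every relevant document first scores 1.0, and each relevant document pushed lower in the list is discounted logarithmically by its rank.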
# Surro Flow:パラメータ空間探索と不確実性定量のためのフローベースサロゲートモデル SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification ( http://arxiv.org/abs/2407.12884v1 ) ライセンス: Link先を確認	Jingyi Shen, Yuhan Duan, Han-Wei Shen,	(参考訳) 既存のディープラーニングベースのサロゲートモデルは、効率的なデータ生成を容易にするが、不確実な定量化、効率的なパラメータ空間探索、逆予測に不足する。本研究では,新しいフローベース代理モデルであるSurroFlowを導入し,シミュレーションパラメータとシミュレーション出力の間の可逆変換を学習する。このモデルは、与えられたシミュレーションパラメータのシミュレーション結果の正確な予測を可能にするだけでなく、データ生成プロセスにおける不確実な定量化もサポートする。さらに、効率的なシミュレーションパラメータのレコメンデーションと探索を可能にする。我々は,SurroFlowと遺伝的アルゴリズムを視覚インタフェースのバックエンドとして統合し,効果的なユーザ誘導アンサンブルシミュレーション探索と可視化を支援する。本フレームワークは,科学的サロゲートモデルの信頼性と探索能力を向上しつつ,計算コストを大幅に削減する。 Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
# LLMの分類課題に推奨されない白化 Whitening Not Recommended for Classification Tasks in LLMs ( http://arxiv.org/abs/2407.12886v1 ) ライセンス: Link先を確認	Ali Forooghi, Shaghayegh Sadeghi, Jianguo Lu,	(参考訳) 文の埋め込みはNLPの基盤となる。ホワイトニングは、LLM(Large Language Models)から得られる埋め込み品質を改善する効果的な操作であると言われている。しかし,ホワイトニングの有効性はモデルに依存し,タスクに依存していることがわかった。特に、ホワイトニングは分類タスクの埋め込みを退化させる。結論は広範な実験によって支持される。また, PCA, ZCA, PCA-Cor, ZCA-Cor, Cholesky などの白化処理についても検討した。我々の研究の副産物は、SentEval+と呼ばれるLLMの埋め込み評価プラットフォームである。 Sentence embedding is a cornerstone in NLP. Whitening has been claimed to be an effective operation to improve embedding quality obtained from Large Language Models (LLMs). However, we find that the efficacy of whitening is model-dependent and task-dependent. In particular, whitening degenerates embeddings for classification tasks. The conclusion is supported by extensive experiments. We also explored a variety of whitening operations, including PCA, ZCA, PCA-Cor, ZCA-Cor and Cholesky whitenings. A by-product of our research is an embedding evaluation platform for LLMs called SentEval+.	翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
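The whitening abstract above names PCA and ZCA whitening without describing them. The following is a minimal sketch of the textbook definitions applied to a matrix of sentence embeddings, assuming NumPy is available. The function name `whiten` and the `eps` stabiliser are illustrative choices, and the paper's own implementation (including the PCA-Cor, ZCA-Cor and Cholesky variants) may differ.

```python
import numpy as np


def whiten(embeddings, method="zca", eps=1e-8):
    """Return whitened embeddings with (approximately) identity covariance.

    embeddings: array of shape (n_samples, dim).
    method: "pca" rotates onto principal axes and rescales;
            "zca" additionally rotates back to the original axes.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (x.shape[0] - 1)
    # eigh is used because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = 1.0 / np.sqrt(np.clip(eigvals, 0.0, None) + eps)
    w_pca = eigvecs * scale  # U @ diag(1 / sqrt(lambda))
    if method == "pca":
        w = w_pca
    elif method == "zca":
        w = w_pca @ eigvecs.T  # U @ diag(1 / sqrt(lambda)) @ U^T
    else:
        raise ValueError(f"unknown whitening method: {method}")
    return centered @ w
```

Both variants decorrelate the dimensions and give them unit variance. They differ only by a rotation, so distances between whitened vectors are the same under either.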
# UAVを利用した宇宙空間統合ネットワーク:最近の学習アルゴリズムの技術レビュー UAV-Assisted Space-Air-Ground Integrated Networks: A Technical Review of Recent Learning Algorithms ( http://arxiv.org/abs/2211.14931v2 ) ライセンス: Link先を確認	Atefeh H. Arani, Peng Hu, Yeying Zhu,	(参考訳) 近年の宇宙・空気・地上機器の技術進歩により、宇宙空地上統合ネットワーク(SAGIN)と呼ばれる新しいネットワークパラダイムが実現されている。無人航空機(UAV)はSAGINにおいて重要な役割を果たしている。しかし、UAVの高ダイナミック性と複雑さのため、SAGINの実際の展開は、そのようなSAGINを実現する上で重要な障壁となる。 UAVは、限られた操作性と宇宙および地上コンポーネントの資源を備えた重要な性能要件を満たすことが期待されている。したがって、様々な利用シナリオでUAVを採用するには、アルゴリズム的なアプローチで十分に設計された計画が必要である。本稿では,UAV支援SAGINにおける最近の学習アルゴリズムのレビューと分析を行う。報奨関数について検討し、Q-ラーニング、深層Q-ラーニング、マルチアームバンディット、粒子群最適化、満足度に基づく学習アルゴリズムなどの報酬関数を最適化するための最先端アルゴリズムについて議論する。他の調査論文とは異なり、最適化問題における方法論的視点に注目し、SAGIN上の様々なミッションに適用する。実世界の構成と2次元(2次元)と3次元(3次元)のUAV軌跡を配置事例の反映として検討する。シミュレーションの結果,3次元満足度に基づく学習アルゴリズムは,ほとんどの場合,他の手法よりも優れていたことが示唆された。最後に,UAV支援型SAGINの設計・展開ガイドラインについて述べる。 Recent technological advancements in space, air, and ground components have made possible a new network paradigm called space-air-ground integrated network (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, real-world deployment of a SAGIN becomes a significant barrier to realizing such SAGINs. UAVs are expected to meet key performance requirements with limited maneuverability and resources with space and terrestrial components. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. This paper provides an essential review and analysis of recent learning algorithms in a UAV-assisted SAGIN. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit, particle swarm optimization, and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, applicable to various missions on a SAGIN. 
We consider real-world configurations and the 2-dimensional (2D) and 3-dimensional (3D) UAV trajectories to reflect deployment cases. Our simulations suggest the 3D satisfaction-based learning algorithm outperforms other approaches in most cases. With open challenges discussed at the end, we aim to provide design and deployment guidelines for UAV-assisted SAGINs.	翻訳日:2024-07-19 03:58:48 公開日:2024-07-16
# 量子力学の非線形拡張の幾何学的解釈 Geometric Interpretation of a nonlinear extension of Quantum Mechanics ( http://arxiv.org/abs/2405.07289v3 ) ライセンス: Link先を確認	Alan Chodos, Fred Cooper,	(参考訳) 我々は最近、通常の線形量子力学問題のハミルトニアンの固有値と固有関数の観点から正確に解ける性質を持つ特定の非線形量子力学の一般化を導入した。本稿では,波動関数の2つの成分が時空の2つの異なる漸近領域におけるハミルトニアンHによって記述された系を表すことを示唆し,非線型項が重力効果をもたらすと考えられることを示す。 We recently introduced a particular nonlinear generalization of quantum mechanics which has the property that it is exactly solvable in terms of the eigenvalues and eigenfunctions of the Hamiltonian of the usual linear quantum mechanics problem. In this paper we suggest that the two components of the wave function represent the system described by the Hamiltonian H in two different asymptotic regions of spacetime and we show that the non-linear terms can be viewed as giving rise to gravitational effects.	翻訳日:2024-07-19 03:51:44 公開日:2024-07-16
# 拡張エッジを持つ量子ホール系におけるホーキング放射-異常法の適用 Hawking radiation in quantum Hall system with an expanding edge: application of anomaly method ( http://arxiv.org/abs/2407.02796v2 ) ライセンス: Link先を確認	Riku Yoshimoto, Yasusada Nambu,	(参考訳) 重力異常とブラックホールのホーキング放射の関係はウィルチェックとロビンソンによって明らかにされた。本研究では,拡張エッジを持つ量子ホール系におけるド・ジッター時空のアナログにそれらの手法を適用した。この系はキラルであるため、元々の方法で仮定した地平線付近での進入モードの条件を課す必要はない。さらに、この系は、ド・ジッター空間が2つの平坦空間の間に挟まれるように構成されており、この異常の影響は通常のド・ジッター時空には現れないが、ド・ジッターと平坦領域の境界条件として現れる。これらの境界条件下での計算により、ド・ジッター地平線のギボンズ・ホーキング温度で外側の平坦領域におけるホーキング放射のフラックスを求める。 The relationship between gravitational anomalies and Hawking radiation of black holes was revealed by Wilczek and Robinson. In this study, we apply their method to an analogue de Sitter spacetime in the quantum Hall system with an expanding edge. Because this system is chiral, there is no need to impose the condition of ingoing modes near the horizon, which was assumed in the original method. Moreover, this system is structured so that the de Sitter space is sandwiched between two flat spaces, and although the effects of the anomaly would not appear in an ordinary de Sitter spacetime, they manifest themselves as boundary conditions between the de Sitter and the flat regions. By performing calculations under these boundary conditions, we obtain the flux of Hawking radiation in the outer flat region with the Gibbons-Hawking temperature of the de Sitter horizon.	翻訳日:2024-07-19 03:51:44 publicated:2024-07-16
# パッチ空間下マンニフォルドの測地線図を用いた画像デノーミング Image Denoising Using the Geodesics' Gramian of the Manifold Underlying Patch-Space ( http://arxiv.org/abs/2010.07769v3 ) ライセンス: Link先を確認	Kelum Gajamannage,	(参考訳) 現代社会における高度なカメラの普及に伴い、正確で視覚的な画像の需要が高まっている。しかし、カメラが捉えた画像の品質はノイズによって劣化する可能性がある。したがって, 画像の特徴を損なうことなく, ノイズを除去するためには, 画像の処理が必要である。現在の文献では様々な復調法が提供されているが、その正当性や効力性は不確かである。そこで本研究では,精度の高い画像を生成することができる新しい,計算効率の良い画像復号法を提案する。画像の滑らか性を維持するため、画素ではなく画像から分割されたパッチを入力する。そして、画像領域ではなく、パッチ空間の裏にある多様体をデノナイズして、画像全体の機能をよりよく保存する。本稿では,この手法の性能をベンチマーク画像処理法に対して検証する。 With the proliferation of sophisticated cameras in modern society, the demand for accurate and visually pleasing images is increasing. However, the quality of an image captured by a camera may be degraded by noise. Thus, some processing of images is required to filter out the noise without losing vital image features. Even though the current literature offers a variety of denoising methods, the fidelity and efficacy of their denoising are sometimes uncertain. Thus, here we propose a novel and computationally efficient image denoising method that is capable of producing accurate images. To preserve image smoothness, this method inputs patches partitioned from the image rather than pixels. Then, it performs denoising on the manifold underlying the patch-space rather than that in the image domain to better preserve the features across the whole image. We validate the performance of this method against benchmark image processing methods.	翻訳日:2024-07-19 00:05:30 公開日:2024-07-16
# 観測・干渉データを用いた文脈特異的因果関係モデルの表現 Representation of Context-Specific Causal Models with Observational and Interventional Data ( http://arxiv.org/abs/2101.09271v4 ) ライセンス: Link先を確認	Eliana Duarte, Liam Solus,	(参考訳) CStreesと呼ばれるコンテキスト固有独立モデルの新たなファミリを導入することにより、一般に(例えば硬さや柔らかい)介入によって収集された観測データと実験データの両方に基づいて、文脈固有因果モデルを表現する問題に対処する。この族は、一般的な介入DAGモデルを定義する因子化特性の一般化を可能にする新しい分解基準によって定義される。 DAGのVermaとPearlの基準を拡張した観測CSツリーのモデル等価性のグラフィカルな特徴を導出する。この特徴は、一般にコンテキスト特異的な介入の下でCStreeモデルに拡張される。これらの結果を得るために、CStreeモデルの簡潔なグラフィカル表現に組み込むことができる文脈依存的介入の概念を定式化する。 CSツリーと他の文脈特化モデルとを関連づけ,DAG,CSツリー,ラベル付きDAG,ステージ付きツリーが厳密な包摂連鎖を形成することを示す。 CStreeモデルを実際のデータセットに適用し、データ依存構造とソフトな介入摂動の文脈固有の性質を明らかにする。 We address the problem of representing context-specific causal models based on both observational and experimental data collected under general (e.g. hard or soft) interventions by introducing a new family of context-specific conditional independence models called CStrees. This family is defined via a novel factorization criterion that allows for a generalization of the factorization property defining general interventional DAG models. We derive a graphical characterization of model equivalence for observational CStrees that extends the Verma and Pearl criterion for DAGs. This characterization is then extended to CStree models under general, context-specific interventions. To obtain these results, we formalize a notion of context-specific intervention that can be incorporated into concise graphical representations of CStree models. We relate CStrees to other context-specific models, showing that the families of DAGs, CStrees, labeled DAGs and staged trees form a strict chain of inclusions. We end with an application of interventional CStree models to a real data set, revealing the context-specific nature of the data dependence structure and the soft, interventional perturbations.	翻訳日:2024-07-19 00:00:34 公開日:2024-07-16
# 非互換性を超えた: 機械学習と法における相互排他的公正基準のトレードオフ Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law ( http://arxiv.org/abs/2212.00469v4 ) ライセンス: Link先を確認	Meike Zehlike, Alex Loosley, Håkan Jonsson, Emil Wiedemann, Philipp Hacker,	(参考訳) 公正で信頼できるAIは、マシンラーニングと法的なドメインの両方において、ますます重要になっている。重要な結果の1つは、意思決定者は「公正」すなわち非差別的、アルゴリズム的な決定手順を保証する必要があることである。しかし、現実的な事実的仮定の下で相互に相容れないことが示されているアルゴリズム的公正性のいくつかの競合する概念がある。この懸念は、例えば「グループ内の校正」と「正・負のクラスに対する均衡」の広く使われている公平度尺度である。本稿では,これら3つのフェアネス基準を補間する新しいアルゴリズム(FAir Interpolation Method: FAIM)を提案する。したがって、初期不公平な予測は、少なくとも部分的には、各公正条件の所望の重み付けされた組み合わせを満たすように修正することができる。我々は,合成データ,CompASデータセット,電子商取引部門による新たな実世界のデータセットに適用した場合のアルゴリズムの有効性を実証する。最後に、FAIMが相反する法的義務を満たすためにどの程度活用できるかについて議論する。この分析は、信用スコアリングや刑事司法手続といった従来の法分野における業務を運用するだけでなく、デジタル市場法や最近制定されたAI法など、EUで実施された最新のAI規制についても運用する可能性があることを示唆している。 Fair and trustworthy AI is becoming ever more important in both machine learning and legal domains. One important consequence is that decision makers must seek to guarantee a 'fair', i.e., non-discriminatory, algorithmic decision procedure. However, there are several competing notions of algorithmic fairness that have been shown to be mutually incompatible under realistic factual assumptions. This concerns, for example, the widely used fairness measures of 'calibration within groups' and 'balance for the positive/negative class'. In this paper, we present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between these three fairness criteria. Thus, an initially unfair prediction can be remedied to, at least partially, meet a desired, weighted combination of the respective fairness conditions. We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector. Finally, we discuss to what extent FAIM can be harnessed to comply with conflicting legal obligations. 
The analysis suggests that it may operationalize duties in traditional legal fields, such as credit scoring and criminal justice proceedings, but also for the latest AI regulations put forth in the EU, like the Digital Markets Act and the recently enacted AI Act.	翻訳日:2024-07-19 00:00:34 公開日:2024-07-16
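The fairness abstract above refers to 'calibration within groups' and 'balance for the positive/negative class'. The sketch below shows one way these quantities could be measured on scored data. It is not the FAIM algorithm, and the function names, the equal-width score bins, and the 0/1 label encoding are assumptions made for illustration.

```python
from collections import defaultdict


def class_balance(scores, labels, groups, target_label):
    """Mean score per group among members whose true label is target_label.

    Balance for the positive (or negative) class holds when these means
    are equal across groups.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, label, group in zip(scores, labels, groups):
        if label == target_label:
            totals[group] += score
            counts[group] += 1
    return {group: totals[group] / counts[group] for group in counts}


def calibration_within_groups(scores, labels, groups, n_bins=5):
    """Map (group, bin) -> (mean score, fraction of positives) in that bin.

    Scores are assumed to lie in [0, 1] and labels to be 0 or 1. A group is
    well calibrated when the two numbers are close in every bin.
    """
    cells = defaultdict(lambda: [0.0, 0, 0])  # score sum, positives, count
    for score, label, group in zip(scores, labels, groups):
        bin_index = min(int(score * n_bins), n_bins - 1)
        cell = cells[(group, bin_index)]
        cell[0] += score
        cell[1] += label
        cell[2] += 1
    return {key: (total / n, pos / n) for key, (total, pos, n) in cells.items()}
```

Comparing the per-group outputs of the two functions makes the tension concrete: when base rates differ between groups, adjusting scores to equalise the class-balance means generally moves the per-bin calibration pairs apart, and the reverse also holds.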
# 自転車のフレーム:ニュースの中のサイクリストの欠かせないポートレイダルを理解する Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News ( http://arxiv.org/abs/2301.06178v2 ) ライセンス: Link先を確認	Xingmeng Zhao, Dan Schumacher, Sashank Nalluri, Xavier Walton, Suhana Shrestha, Anthony Rios,	(参考訳) 輸送やレクリエーションのためのサイクリングの増加は、健康を増し、車両の環境への影響を減少させる。しかし、報道機関のイデオロギーや報告スタイルは、しばしばサイクリングに対する大衆の認識に影響を及ぼす。例えば、報道機関がサイクリング事故を過度に報告すると、人々はサイクリストを「危険な」と認識させ、サイクリングを選択したサイクリストの数を減少させる可能性がある。さらに、サイクリングの減少は、安全なインフラに対する政府資金の削減につながる可能性がある。本稿では,ニュース見出し中のサイクリストの知覚を検知する手法を開発する。これを達成するために ``Bike Frames'' と呼ばれる新しいデータセットを導入します。データセットは31,480のニュース見出しと1500のアノテーションで構成されている。私たちの焦点は、米国からの11,385の見出しを分析することです。 BikeFrame Chain-of-Codeフレームワークを導入し、サイクリストの知覚を予測し、事故に関連する見出しを特定し、欠陥を判定する。このフレームワークは、正確な論理に擬似コードを使用し、ニューズエージェンシーのバイアス分析を統合して、大規模言語モデルにおける従来のチェーン・オブ・シークレット推論に対する予測を改善する。提案手法は,他の手法よりも優れており,特に,ニュースバイアス情報の導入がパフォーマンスに大きく影響を与え,平均F1が.739から.815に向上することがわかった。最後に,米国発ニュースの見出しを包括的に分析し,報道機関とサイクリング特化ウェブサイトの相違や,サイクリストの性別による報告の相違を見出した。 WARNING: 本論文では事故と死亡について記述する。 Increasing cycling for transportation or recreation can boost health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example, if news agencies overly report cycling accidents, it may make people perceive cyclists as "dangerous," reducing the number of cyclists who opt to cycle. Additionally, a decline in cycling can result in less government funding for safe infrastructure. In this paper, we develop a method for detecting the perceived perception of cyclists within news headlines. We introduce a new dataset called ``Bike Frames'' to accomplish this. The dataset consists of 31,480 news headlines and 1,500 annotations. Our focus is on analyzing 11,385 headlines from the United States. We also introduce the BikeFrame Chain-of-Code framework to predict cyclist perception, identify accident-related headlines, and determine fault. 
This framework uses pseudocode for precise logic and integrates news agency bias analysis for improved predictions over traditional chain-of-thought reasoning in large language models. Our method substantially outperforms other methods, and most importantly, we find that incorporating news bias information substantially impacts performance, improving the average F1 from 0.739 to 0.815. Finally, we perform a comprehensive case study on US-based news headlines, finding reporting differences between news agencies and cycling-specific websites as well as differences in reporting depending on the gender of cyclists. WARNING: This paper contains descriptions of accidents and death.	Translation date: 2024-07-19 00:00:34 Publication date: 2024-07-16
# SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning ( http://arxiv.org/abs/2303.09079v3 ) License: see link	Mengxin Zheng, Jiaqi Xue, Zihao Wang, Xun Chen, Qian Lou, Lei Jiang, Xiaofeng Wang	Self-supervised learning (SSL) is a prevalent approach for encoding data representations. By using a pre-trained SSL image encoder and subsequently training a downstream classifier, one can achieve impressive performance on various tasks with very little labeled data. The growing adoption of SSL has led to an increase in security research on SSL encoders and associated Trojan attacks. Trojan attacks embedded in SSL encoders can operate covertly, spreading across multiple users and devices. Backdoor behavior in Trojaned encoders can be inadvertently inherited by downstream classifiers, making it even more difficult to detect and mitigate the threat. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. 
This challenge arises because downstream tasks might be unknown, dataset labels may be unavailable, and the original unlabeled training dataset might be inaccessible during Trojan detection in SSL encoders. We introduce SSL-Cleanse as a solution to identify and mitigate backdoor threats in SSL encoders. We evaluated SSL-Cleanse on various datasets using 1200 encoders, achieving an average detection success rate of 82.2% on ImageNet-100. After backdoor mitigation, backdoored encoders achieve an average attack success rate of 0.3% without substantial accuracy loss, demonstrating the effectiveness of SSL-Cleanse. The source code of SSL-Cleanse is available at https://github.com/UCF-ML-Research/SSL-Cleanse.	Translation date: 2024-07-19 00:00:34 Publication date: 2024-07-16
# PlayBest: Professional Basketball Player Behavior Synthesis via Planning with Diffusion ( http://arxiv.org/abs/2306.04090v3 ) License: see link	Xiusi Chen, Wei-Yao Wang, Ziniu Hu, David Reynoso, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang	Dynamic planning in complex systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing context-dependent decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes make it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we formulate the sequential decision-making process as a conditional trajectory generation process. Based on this formulation, we introduce PlayBest (PLAYer BEhavior SynThesis), a method to improve player decision-making. We extend the diffusion probabilistic model to learn challenging environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained with corresponding rewards. 
To accomplish reward-guided trajectory generation, we condition the diffusion model on the value function via classifier-guided sampling. We validate the effectiveness of PlayBest through simulation studies, contrasting the generated trajectories with those employed by professional basketball teams. Our results reveal that the model excels at generating reasonable basketball trajectories that produce efficient plays. Moreover, the synthesized play strategies align with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.	Translation date: 2024-07-18 23:50:47 Publication date: 2024-07-16
# Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models ( http://arxiv.org/abs/2308.00304v3 ) License: see link	Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen	We investigate how to elicit compositional generalization capabilities in large language models (LLMs). Compositional generalization empowers LLMs to solve complex problems by combining foundational skills, a critical reasoning ability akin to human intelligence. However, even the most advanced LLMs currently struggle with this form of reasoning. We examine this problem within the framework of in-context learning and find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial. We refer to this prompt structure as skills-in-context (SKiC). With as few as two exemplars, this in-context learning structure enables LLMs to tackle more challenging problems requiring innovative skill combinations, achieving near-perfect systematic generalization across a broad range of tasks. Intriguingly, SKiC also unlocks the latent potential of LLMs, allowing them to more actively utilize pre-existing internal skills acquired during earlier pretraining stages to solve complex reasoning problems. 
The SKiC structure is robust across different skill constructions and exemplar choices and demonstrates strong transferability to new tasks. Finally, inspired by our in-context learning study, we show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization, enabling the models to solve much harder problems directly with standard prompting.	Translation date: 2024-07-18 23:28:28 Publication date: 2024-07-16
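The prompt layout described in the SKiC abstract (foundational skills and compositional exemplars in one context) could be assembled roughly as follows. This is only an illustrative sketch: the function name `build_skic_prompt`, the section labels, and the toy skills are assumptions made here, not the authors' prompt format.

```python
def build_skic_prompt(skills, exemplars, question):
    """Assemble one prompt: skill descriptions, then exemplars that
    explicitly name the skills they compose, then the new question.
    (Hypothetical layout, not taken from the paper.)"""
    parts = ["Skills:"]
    for name, description in skills.items():
        parts.append(f"- {name}: {description}")
    parts.append("")
    parts.append("Examples:")
    for ex in exemplars:
        parts.append(f"Q: {ex['question']}")
        # Each reasoning step is tagged with the skill it relies on.
        for skill, step in ex["steps"]:
            parts.append(f"  [{skill}] {step}")
        parts.append(f"A: {ex['answer']}")
        parts.append("")
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


# Toy usage with made-up skills and a single exemplar.
demo_skills = {
    "add": "add two integers",
    "multiply": "multiply two integers",
}
demo_exemplars = [
    {
        "question": "What is 2 * 3 + 4?",
        "steps": [("multiply", "2 * 3 = 6"), ("add", "6 + 4 = 10")],
        "answer": "10",
    }
]
demo_prompt = build_skic_prompt(demo_skills, demo_exemplars, "What is 5 * 2 + 1?")
```

The point of the layout is that the exemplar steps refer back to skills defined earlier in the same context, which is the property the abstract identifies as crucial.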
# EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation ( http://arxiv.org/abs/2309.03244v3 ) License: see link	Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller	We introduce EGIC, an enhanced generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. EGIC is based on two novel building blocks: i) OASIS-C, a conditional pre-trained semantic segmentation-guided discriminator, which provides both spatially and semantically-aware gradient feedback to the generator, conditioned on the latent image distribution, and ii) Output Residual Prediction (ORP), a retrofit solution for multi-realism image compression that allows control over the synthesis process by adjusting the impact of the residual between an MSE-optimized and GAN-optimized decoder output on the GAN-based reconstruction. Together, these components make EGIC a powerful codec, outperforming state-of-the-art diffusion and GAN-based methods (e.g., HiFiC, MS-ILLM, and DIRAC-100), while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight, and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.	Translation date: 2024-07-18 23:28:28 Publication date: 2024-07-16
# Active Suppression of Quantum Dephasing in Resonantly Driven Ensembles ( http://arxiv.org/abs/2310.10525v3 ) License: see link	Chengxing He, Robert R. Jones	We have used quantum control to suppress the impact of random atom positions on coherent population transfer within atom pairs, enabling the observation of dipole-dipole driven Rabi oscillations in a Rydberg gas with hundreds of atoms. The method exploits the reduced coupling-strength sensitivity of the off-resonant Rabi frequency, and coherently amplifies the achievable population transfer in analogy to quasi-phase-matching in non-linear optics. Simulations reproduce the experimental results and demonstrate the potential benefits of the technique to other many-body quantum control applications.	Translation date: 2024-07-18 23:18:25 Publication date: 2024-07-16
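The "reduced coupling-strength sensitivity of the off-resonant Rabi frequency" can be illustrated with the textbook two-level relation below. The symbols are notation assumed here rather than taken from the paper: $\Omega$ is the resonant coupling (for an atom pair, set by the dipole-dipole interaction and hence by the random separation), $\Delta$ is the detuning, and $\Omega_{\mathrm{eff}}$ is the generalized Rabi frequency.

```latex
\Omega_{\mathrm{eff}} = \sqrt{\Omega^{2} + \Delta^{2}},
\qquad
\frac{\partial \Omega_{\mathrm{eff}}}{\partial \Omega}
  = \frac{\Omega}{\sqrt{\Omega^{2} + \Delta^{2}}} \le 1,
\qquad
P_{\max} = \frac{\Omega^{2}}{\Omega^{2} + \Delta^{2}} .
```

For $|\Delta| \gg \Omega$, a spread $\delta\Omega$ in coupling changes the oscillation frequency by only about $(\Omega/|\Delta|)\,\delta\Omega$, so pairs with different separations dephase more slowly. The price is a smaller transfer amplitude $P_{\max}$ per cycle, which is presumably why the abstract pairs this with a scheme that coherently accumulates the transfer.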
# An Explainable Deep Learning-Based Method For Schizophrenia Diagnosis Using Generative Data-Augmentation ( http://arxiv.org/abs/2310.16867v2 ) License: see link	Mehrshad Saadatinia, Armin Salimi-Badr	In this study, we leverage a deep learning-based method for the automatic diagnosis of schizophrenia using EEG brain recordings. This approach utilizes generative data augmentation, a powerful technique that enhances the accuracy of the diagnosis. To enable the utilization of time-frequency features, spectrograms were extracted from the raw signals. After exploring several neural network architectural setups, a suitable convolutional neural network (CNN) was used for the initial diagnosis. Subsequently, using Wasserstein GAN with Gradient Penalty (WGAN-GP) and Variational Autoencoder (VAE), two different synthetic datasets were generated in order to augment the initial dataset and address the over-fitting issue. The dataset augmented using the VAE achieved a 3.0% improvement in accuracy, reaching up to 99.0%, and yielded a lower loss value as well as faster convergence. Finally, we addressed the lack of trust in black-box models using the Local Interpretable Model-agnostic Explanations (LIME) algorithm to determine the most important superpixels (frequencies) in the diagnosis process.	Translation date: 2024-07-18 23:18:25 Publication date: 2024-07-16
# Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling ( http://arxiv.org/abs/2311.02879v2 ) License: see link	Wonho Bae, Jing Wang, Danica J. Sutherland	Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided. In some settings, however, it is feasible to actively select which points to label; the potential gain from a careful choice is substantial, but the setting requires major differences from typical active learning setups. We clarify the ways in which active meta-learning can be used to label a context set, depending on which parts of the meta-learning process use active learning. Within this framework, we propose a natural algorithm based on fitting Gaussian mixtures for selecting which points to label; though simple, the algorithm also has theoretical motivation. The proposed algorithm outperforms state-of-the-art active learning methods when used with various meta-learning algorithms across several benchmark datasets.	Translation date: 2024-07-18 23:18:25 Publication date: 2024-07-16
# Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge ( http://arxiv.org/abs/2311.12889v2 ) License: see link	Bowen Jiang, Zhijun Zhuang, Shreyas S. Shivakumar, Camillo J. Taylor	This work introduces an enhanced approach to generating scene graphs by incorporating both a relationship hierarchy and commonsense knowledge. Specifically, we begin by proposing a hierarchical relation head that exploits an informative hierarchical structure. It jointly predicts the relation super-category between object pairs in an image, along with detailed relations under each super-category. Following this, we implement a robust commonsense validation pipeline that harnesses foundation models to critique the results from the scene graph prediction system, removing nonsensical predicates even with a small language-only model. Extensive experiments on the Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be seamlessly integrated as plug-and-play enhancements to existing scene graph generation algorithms. The results show significant improvements with an extensive set of reasonable predictions beyond dataset annotations. Code is available at https://github.com/bowen-upenn/scene_graph_commonsense.	Translation date: 2024-07-18 23:08:39 Publication date: 2024-07-16
# Revisiting Supervision for Continual Representation Learning ( http://arxiv.org/abs/2311.13321v2 ) License: see link	Daniel Marczak, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski	In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use of the vast amounts of unlabeled data. Recent studies have highlighted the strengths of unsupervised methods, particularly self-supervised learning, in providing robust representations. The improved transferability of those representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning. This highlights the importance of the multi-layer perceptron projector in shaping feature transferability across a sequence of tasks in continual learning. The code is available on GitHub: https://github.com/danielm1405/sl-vs-ssl-cl.	Translation date: 2024-07-18 23:08:38 Publication date: 2024-07-16
# Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models ( http://arxiv.org/abs/2311.15908v2 ) License: see link	Claudio Rota, Marco Buzzelli, Joost van de Weijer	In this paper, we address the problem of enhancing perceptual quality in video super-resolution (VSR) using Diffusion Models (DMs) while ensuring temporal consistency among frames. We present StableVSR, a VSR method based on DMs that can significantly enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details. We introduce the Temporal Conditioning Module (TCM) into a pre-trained DM for single image super-resolution to turn it into a VSR method. TCM uses the novel Temporal Texture Guidance, which provides it with spatially-aligned and detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality and temporally-consistent results. In addition, we introduce the novel Frame-wise Bidirectional Sampling strategy to encourage the use of information from past to future and vice-versa. This strategy improves the perceptual quality of the results and the temporal consistency across frames. 
We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR. The project page is available at https://github.com/claudiom4sir/StableVSR.	Translation date: 2024-07-18 23:08:38 Publication date: 2024-07-16
# Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images ( http://arxiv.org/abs/2311.16094v3 ) License: see link	Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik	Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. 
We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, with a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks.	Translation date: 2024-07-18 23:08:38 Publication date: 2024-07-16
# ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image ( http://arxiv.org/abs/2312.07381v3 ) License: see link	Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca	Biomedical image segmentation is a crucial part of both scientific research and clinical care. With enough labelled data, deep learning models can be trained to accurately automate specific biomedical image segmentation tasks. However, manually segmenting images to create training data is highly labor intensive and requires domain expertise. We present ScribblePrompt, a flexible neural network based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. Through rigorous quantitative experiments, we demonstrate that given comparable amounts of interaction, ScribblePrompt produces more accurate segmentations than previous methods on datasets unseen during training. In a user study with domain experts, ScribblePrompt reduced annotation time by 28% while improving Dice by 15% compared to the next best method. ScribblePrompt's success rests on a set of careful design decisions. 
These include a training strategy that incorporates both a highly diverse set of images and tasks, novel algorithms for simulated user interactions and labels, and a network that enables fast inference. We showcase ScribblePrompt in an interactive demo, provide code, and release a dataset of scribble annotations at https://scribbleprompt.csail.mit.edu	Translation date: 2024-07-18 22:58:48 Publication date: 2024-07-16
# Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation ( http://arxiv.org/abs/2312.14223v2 ) License: see link	Nina Weng, Paraskevas Pegios, Eike Petersen, Aasa Feragen, Siavash Bigdeli	Shortcut learning is when a model (e.g., a cardiac disease classifier) exploits correlations between the target label and a spurious shortcut feature (e.g., a pacemaker) to predict the target label based on the shortcut rather than real discriminative features. This is common in medical imaging, where treatment and clinical annotations correlate with disease labels, making them easy shortcuts to predict disease. We propose a novel detection and quantification of the impact of potential shortcut features via a fast diffusion-based counterfactual image generation that can synthetically remove or add shortcuts. Via a novel inpainting-based modification, we spatially limit the changes made with no extra inference step, encouraging the removal of spatially constrained shortcut features while ensuring that the shortcut-free counterfactuals preserve their remaining image features to a high degree. Using these, we assess how shortcut features influence model predictions. This is enabled by our second contribution: an efficient diffusion-based counterfactual explanation method with significant inference speed-up at image quality comparable to the state of the art. 
We confirm this on two large chest X-ray datasets, a skin lesion dataset, and CelebA. Our code is publicly available at fastdime.compute.dtu.dk.	Translation date: 2024-07-18 22:58:48 Publication date: 2024-07-16
# Hierarchical Multigrid Ansatz for Variational Quantum Algorithms ( http://arxiv.org/abs/2312.15048v2 ) License: see link	Christo Meriwether Keller, Stephan Eidenbenz, Andreas Bärtschi, Daniel O'Malley, John Golden, Satyajayant Misra	Quantum computing is an emerging topic in engineering that promises to enhance supercomputing using fundamental physics. In the near term, the best candidate algorithms for achieving this advantage are variational quantum algorithms (VQAs). We design and numerically evaluate a novel ansatz for VQAs, focusing in particular on the variational quantum eigensolver (VQE). As our ansatz is inspired by classical multigrid hierarchy methods, we call it the "multigrid" ansatz. The multigrid ansatz creates a parameterized quantum circuit for a quantum problem on $n$ qubits by successively building and optimizing circuits for smaller qubit counts $j < n$, reusing optimized parameter values as initial solutions for the next level of the hierarchy at $j+1$. We show through numerical simulation that the multigrid ansatz outperforms the standard hardware-efficient ansatz in terms of solution quality for the Laplacian eigensolver as well as for a large class of combinatorial optimization problems with specific examples for MaxCut and Maximum $k$-Satisfiability. 
Our studies establish the multigrid ansatz as a viable candidate for many VQAs and in particular present a promising alternative to the QAOA approach for combinatorial optimization problems.	Translation date: 2024-07-18 22:58:48 Publication date: 2024-07-16
# Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following ( http://arxiv.org/abs/2402.06559v2 ) License: see link	Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki	Diffusion models excel at modeling complex and multimodal trajectory distributions for decision-making and control. Reward-gradient guided denoising has been recently proposed to generate trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model. However, reward-gradient guided denoising requires a differentiable reward function fitted to both clean and noised samples, limiting its applicability as a general trajectory optimizer. In this paper, we propose Diffusion-ES, a method that combines gradient-free optimization with trajectory denoising to optimize black-box non-differentiable objectives while staying on the data manifold. Diffusion-ES samples trajectories during evolutionary search from a diffusion model and scores them using a black-box reward function. 
It mutates high-scoring trajectories using a truncated diffusion process that applies a small number of noising and denoising steps, allowing for much more efficient exploration of the solution space. We show that Diffusion-ES achieves state-of-the-art performance on nuPlan, an established closed-loop planning benchmark for autonomous driving. Diffusion-ES outperforms existing sampling-based planners, reactive deterministic or diffusion-based policies, and reward-gradient guidance. Additionally, we show that unlike prior guidance methods, our method can optimize non-differentiable language-shaped reward functions generated by few-shot LLM prompting. When guided by a human teacher that issues instructions to follow, our method can generate novel, highly complex behaviors, such as aggressive lane weaving, which are not present in the training data. This allows us to solve the hardest nuPlan scenarios, which are beyond the capabilities of existing trajectory optimization methods and driving policies.	Translation date: 2024-07-18 22:48:58 Publication date: 2024-07-16
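The search loop in this abstract (sample, score with a black-box reward, mutate the best candidates by a few noising and denoising steps) can be sketched in a toy setting. Everything concrete below is an assumption for illustration: trajectories are plain lists of floats, `reward` is a made-up non-differentiable objective, and `denoise` is a neighbour-averaging placeholder standing in for a trained diffusion model's reverse process. It is not the authors' implementation.

```python
import random


def reward(traj):
    """Hypothetical black-box reward: number of waypoints inside a target band."""
    return sum(1 for x in traj if 0.9 <= x <= 1.1)


def denoise(traj, steps):
    """Placeholder for the diffusion model's reverse process (neighbour averaging)."""
    for _ in range(steps):
        last = len(traj) - 1
        traj = [
            (traj[max(i - 1, 0)] + traj[i] + traj[min(i + 1, last)]) / 3.0
            for i in range(len(traj))
        ]
    return traj


def mutate(traj, rng, steps=2, sigma=0.2):
    """Truncated diffusion: a few noising steps, then the same number of denoising steps."""
    noised = list(traj)
    for _ in range(steps):
        noised = [x + rng.gauss(0.0, sigma) for x in noised]
    return denoise(noised, steps)


def diffusion_es(pop_size=32, horizon=10, n_elite=8, generations=20, seed=0):
    rng = random.Random(seed)
    # Initial population: noise pushed through the placeholder denoiser,
    # standing in for samples drawn from the diffusion model.
    population = [
        denoise([rng.gauss(0.0, 1.0) for _ in range(horizon)], 3)
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=reward, reverse=True)
        history.append(reward(ranked[0]))
        elites = ranked[:n_elite]
        # Keep the elites and refill the population with mutated copies of them.
        children = [mutate(rng.choice(elites), rng) for _ in range(pop_size - n_elite)]
        population = elites + children
    best = max(population, key=reward)
    history.append(reward(best))
    return best, history


best_traj, best_history = diffusion_es()
```

Because the elites are carried over unchanged, the best reward per generation never decreases. Only reward evaluations are used, never reward gradients, which is the property that lets this style of search handle non-differentiable objectives.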
# 拡散生成モデルにおける最近近傍スコア推定器 Nearest Neighbour Score Estimators for Diffusion Generative Models ( http://arxiv.org/abs/2402.08018v2 ) ライセンス: Link先を確認	Matthew Niedoba, Dylan Green, Saeid Naderiparizi, Vasileios Lioutas, Jonathan Wilder Lavington, Xiaoxuan Liang, Yunpeng Liu, Ke Zhang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood,	(参考訳) スコア関数推定は拡散生成モデルからのトレーニングとサンプリングの両方の基礎となる。この事実にもかかわらず、最もよく使われる推定器は、バイアス付きニューラルネットワーク近似または条件スコアに基づく高分散モンテカルロ推定器である。トレーニングセットから複数のサンプルを抽出し,推定値の分散を劇的に低減する新しい近傍スコア関数推定器を提案する。低分散推定器を2つの説得力のある応用に活用する。推定器による整合性モデルの訓練を行い, 収束速度と試料品質の両面で有意な増加が報告された。拡散モデルでは,確率フローODE統合のための学習ネットワークを推定器で置き換えることができ,将来的な研究の新たな道が開かれる。 Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research.	翻訳日:2024-07-18 22:48:58 公開日:2024-07-16
# ELECTRAの文は修復以上のものなのか? : 意味的テクスチャ類似性の事例 Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity ( http://arxiv.org/abs/2402.13130v2 ) ライセンス: Link先を確認	Ivan Rep, David Dukić, Jan Šnajder,	(参考訳) BERTは高品質な文埋め込みを生成するが、事前学習の計算コストは大きな欠点である。これとは対照的に、ELECTRAはコスト効率のよい事前学習目標と下流タスクのパフォーマンスの改善を提供するが、文の埋め込みとしては機能しない。コミュニティは、セマンティックテキスト類似性(STS)にELECTRAの文を埋め込むことを熱心に止めた。 ELECTRAディスクリミネータの最後の層を以前の層と比較すると,性能が著しく低下していることが分かる。我々はこの落下を探索し、ELECTRAの埋め込みを修復する方法を考案し、新しいTMFT法を提案する。 TMFTは、STSベンチマークデータセットのパラメータ効率を高めながら、スピアマン相関係数を8点以上改善する。我々は分析を様々なモデルサイズと言語に拡張する。さらに,BERTと同等に動作するELECTRAのジェネレータモデルに対して,パラメータが大幅に小さく,埋め込みサイズも大幅に小さくなった。最後に、TMFTと単語類似性タスク、ドメイン適応型事前学習を組み合わせることで、さらなる向上を観察する。 While BERT produces high-quality sentence embeddings, its pre-training computational cost is a significant drawback. In contrast, ELECTRA delivers a cost-effective pre-training objective and downstream task performance improvements, but not as performant sentence embeddings. The community tacitly stopped utilizing ELECTRA's sentence embeddings for semantic textual similarity (STS). We notice a significant drop in performance when using the ELECTRA discriminator's last layer in comparison to earlier layers. We explore this drop and devise a way to repair ELECTRA's embeddings, proposing a novel truncated model fine-tuning (TMFT) method. TMFT improves the Spearman correlation coefficient by over 8 points while increasing parameter efficiency on the STS benchmark dataset. We extend our analysis to various model sizes and languages. Further, we discover the surprising efficacy of ELECTRA's generator model, which performs on par with BERT, using significantly fewer parameters and a substantially smaller embedding size. Finally, we observe further boosts by combining TMFT with a word similarity task or domain adaptive pre-training.	翻訳日:2024-07-18 22:39:10 公開日:2024-07-16
# MedContext:効率的なボリューム・メディカル・セグメンテーションのためのコンテキストキューの学習 MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation ( http://arxiv.org/abs/2402.17725v2 ) ライセンス: Link先を確認	Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan,	(参考訳) ボリューム・メディカル・セグメンテーションは、異なる意味領域を規定する3次元医用画像解析の重要な構成要素である。ディープニューラルネットワークは、ボリューム医学のセグメンテーションを大幅に改善したが、一般的には、パフォーマンス向上のために大規模な注釈付きデータを必要とするため、高価で入手が禁止される可能性がある。この制限に対処するために、既存の研究は典型的には伝達学習や、代表的特徴を学習するための専用の訓練済みファインタニングステージの設計を行う。しかし、ソースとターゲットドメインのミスマッチにより、ボリュームデータの最適な表現を学習することは困難になり、マルチステージトレーニングでは、より高い計算と、ステージ固有の設計選択を慎重に選択する必要がある。対照的に、アーキテクチャに依存しない、既存の医用セグメンテーションのトレーニングフレームワークに組み込むことのできる、MedContextと呼ばれる普遍的なトレーニングフレームワークを提案する。本手法は,大規模注釈付ボリューム・メディカル・データや専用トレーニング前ファインタニング・ステージを必要とせず,教師付きボクセルセグメンテーション・タスクと協調して自己指導型コンテキストキューを効果的に学習する。提案手法は,出力セグメンテーション空間における臓器の欠損部分や臓器の再構築を学習することで,ネットワーク内の文脈知識を誘導する。 MedContextの有効性は、複数の3D医療データセットと4つの最先端モデルアーキテクチャで検証されている。このアプローチは、数ショットのデータシナリオであっても、データセットと異なるアーキテクチャ間でセグメンテーションパフォーマンスが一貫して向上していることを示します。私たちのコードと事前訓練されたモデルはhttps://github.com/hananshafi/MedContextで利用可能です。 Volumetric medical segmentation is a critical component of 3D medical image analysis that delineates different semantic regions. Deep neural networks have significantly improved volumetric medical segmentation, but they generally require large-scale annotated data to achieve better performance, which can be expensive and prohibitive to obtain. To address this limitation, existing works typically perform transfer learning or design dedicated pretraining-finetuning stages to learn representative features. However, the mismatch between the source and target domain can make it challenging to learn optimal representation for volumetric data, while the multi-stage training demands higher compute as well as careful selection of stage-specific design choices. 
In contrast, we propose a universal training framework called MedContext that is architecture-agnostic and can be incorporated into any existing training framework for 3D medical segmentation. Our approach effectively learns self supervised contextual cues jointly with the supervised voxel segmentation task without requiring large-scale annotated volumetric medical data or dedicated pretraining-finetuning stages. The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ or parts of an organ in the output segmentation space. The effectiveness of MedContext is validated across multiple 3D medical datasets and four state-of-the-art model architectures. Our approach demonstrates consistent gains in segmentation performance across datasets and different architectures even in few-shot data scenarios. Our code and pretrained models are available at https://github.com/hananshafi/MedContext	翻訳日:2024-07-18 22:39:10 公開日:2024-07-16
# アルゴリズムはいつ辞任すべきか? AIガバナンスの提案 When Should Algorithms Resign? A Proposal for AI Governance ( http://arxiv.org/abs/2402.18326v2 ) ライセンス: Link先を確認	Umang Bhatt, Holli Sargeant,	(参考訳) アルゴリズムの辞任は、ガバナンスを直接AIシステムに埋め込むことによって、人工知能(AI)の使用を管理する戦略的アプローチである。特定のシナリオにおいて、AIの適切かつ効果的な使用を支援するために、AI出力へのアクセス制限や性能に関する免責事項の表示など、AIからの意図的かつインフォームドな切り離しが含まれる。アルゴリズムの辞任をガバナンスメカニズムとして統合することにより、組織はAIの使用タイミングと使い方をよりよく制御し、自動化のメリットと人間の監視の必要性のバランスを取ることができる。 Algorithmic resignation is a strategic approach for managing the use of artificial intelligence (AI) by embedding governance directly into AI systems. It involves deliberate and informed disengagement from AI, such as restricting access to AI outputs or displaying performance disclaimers, in specific scenarios to aid the appropriate and effective use of AI. By integrating algorithmic resignation as a governance mechanism, organizations can better control when and how AI is used, balancing the benefits of automation with the need for human oversight.	翻訳日:2024-07-18 22:39:10 公開日:2024-07-16
# 空間コヒーレンス損失:正当性およびカモフラージュ性物体検出における全ての物体 Spatial Coherence Loss: All Objects Matter in Salient and Camouflaged Object Detection ( http://arxiv.org/abs/2402.18698v2 ) ライセンス: Link先を確認	Ziyun Yang, Kevin Choy, Sina Farsiu,	(参考訳) ジェネリックオブジェクト検出は、オブジェクトの正確なモデリングに依存するカテゴリに依存しないタスクである。正確な意味分析を行うには,事前定義された基底真理(GT)オブジェクトや,ネットワークが前景と誤認する曖昧なデコイオブジェクトを含む,学習の任意の段階で現れるオブジェクトレベルの予測を学習する必要がある。しかし、最も関連するモデルは、主にGTオブジェクトの学習を改善することに焦点を当てた。デコイオブジェクトを考えるいくつかの方法は、単一の不明瞭なピクセルの損失応答にのみ焦点をあてるロス関数を利用するため、オブジェクトレベルのあいまいさ学習設計が提供できる豊富な情報から恩恵を受けない。人間の視覚システムに触発され,まず意味を掘り下げる前に曖昧な領域の境界を識別し,隣接する画素間の相互応答を広義に用いた新しい損失関数である空間コヒーレンス損失(SCLoss)を提案する。提案するSCLosは, 自己適応的に境界を検出, 強調することにより, あいまいな領域を徐々に学習できることを実証する。総合的な実験により、一般的な損失関数をSCLosに置き換えることで、現在の最先端(SOTA)サラリアンまたはカモフラージュされたオブジェクト検出(SODまたはCOD)モデルの性能が向上することを示した。また、SCLosと他の損失関数を組み合わせることで、パフォーマンスが向上し、異なるアプリケーションに対してSOTA結果が得られることを示す。 Generic object detection is a category-independent task that relies on accurate modeling of objectness. We show that for accurate semantic analysis, the network needs to learn all object-level predictions that appear at any stage of learning, including the pre-defined ground truth (GT) objects and the ambiguous decoy objects that the network misidentifies as foreground. Yet, most relevant models focused mainly on improving the learning of the GT objects. A few methods that consider decoy objects utilize loss functions that only focus on the single-response, i.e., the loss response of a single ambiguous pixel, and thus do not benefit from the wealth of information that an object-level ambiguity learning design can provide. Inspired by the human visual system, which first discerns the boundaries of ambiguous regions before delving into the semantic meaning, we propose a novel loss function, Spatial Coherence Loss (SCLoss), that incorporates the mutual response between adjacent pixels into the widely-used single-response loss functions. 
We demonstrate that the proposed SCLoss can gradually learn the ambiguous regions by detecting and emphasizing their boundaries in a self-adaptive manner. Through comprehensive experiments, we demonstrate that replacing popular loss functions with SCLoss can improve the performance of current state-of-the-art (SOTA) salient or camouflaged object detection (SOD or COD) models. We also demonstrate that combining SCLoss with other loss functions can further improve performance and result in SOTA outcomes for different applications.	翻訳日:2024-07-18 22:39:10 公開日:2024-07-16
# ゼロショットインスタンスナビゲーションのための優先順位付きセマンティック学習 Prioritized Semantic Learning for Zero-shot Instance Navigation ( http://arxiv.org/abs/2403.11650v2 ) ライセンス: Link先を確認	Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang,	(参考訳) 我々はゼロショットのインスタンスナビゲーションについて研究し、エージェントはトレーニングにオブジェクトアノテーションを使わずに特定のオブジェクトにナビゲートする。従来のオブジェクトナビゲーション手法では、事前トレーニングのためにImage-goal Navigation (ImageNav) タスクを適用し、エージェントを移動して視覚言語モデルを用いてオブジェクト目標を達成する。しかし、これらのアプローチは意味的無視の問題を招き、モデルが意味的な意味的アライメントを学ばない。本稿では,ナビゲーションエージェントのセマンティック理解能力を向上させるために,優先度付き意味学習(PSL)手法を提案する。具体的には、セマンティック強化PSLエージェントを提案し、明確なセマンティックインスペクションを示すゴールイメージを選択し、厳密な正確なビューマッチングから報酬関数を緩和するために、優先順位付けされたセマンティックトレーニング戦略を導入する。推論時には、目標セマンティクスの粒度レベルをトレーニングと同一に保つために意味拡張推論スキームが設計される。さらに、一般的なHM3D環境では、目的が単にオブジェクトカテゴリによって定義されるObject Navigation(ObjectNav)タスクとは対照的に、特定のオブジェクトインスタンスに詳細な説明をする必要のあるインスタンスナビゲーション(InstanceNav)タスクを提示します。我々のPSLエージェントは、0ショットのObjectNavにおいて、0ショットのObjectNavを66%上回り、新しいInstanceNavタスクよりも優れている。コードはhttps://github.com/XinyuSun/PSL-InstanceNav.comでリリースされる。 We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training. Previous object navigation approaches apply the image-goal navigation (ImageNav) task (go to the location of an image) for pretraining, and transfer the agent to achieve object goals using a vision-language model. However, these approaches lead to issues of semantic neglect, where the model fails to learn meaningful semantic alignments. In this paper, we propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents. Specifically, a semantic-enhanced PSL agent is proposed and a prioritized semantic training strategy is introduced to select goal images that exhibit clear semantic supervision and relax the reward function from strict exact view matching. At inference time, a semantic expansion inference scheme is designed to preserve the same granularity level of the goal semantic as training. 
Furthermore, for the popular HM3D environment, we present an Instance Navigation (InstanceNav) task that requires going to a specific object instance with detailed descriptions, as opposed to the Object Navigation (ObjectNav) task where the goal is defined merely by the object category. Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task. Code will be released at https://github.com/XinyuSun/PSL-InstanceNav.	翻訳日:2024-07-18 22:29:24 公開日:2024-07-16
# テキスト分類のためのアクティブ学習者の脆弱性について On the Fragility of Active Learners for Text Classification ( http://arxiv.org/abs/2403.15744v4 ) ライセンス: Link先を確認	Abhishek Ghose, Emma Thuong Nguyen,	(参考訳) アクティブラーニング(AL)技術は、学習に最も価値のあるインスタンスを反復的に選択することで、ラベル付け予算を最適に活用する。しかし、それらは「prerequisite checks」を欠いている。すなわち、データセットに最も適したALアルゴリズムを選択するための所定の基準はない。実践者は、事前に報告された結果に基づいて、ランダムサンプリングを上回ると信頼できるテクニックを選択し、データセット、ラベル付け予算、予測パイプラインといった、環境内の多くの変数に対してレジリエンスを期待する必要があります。平均してどのくらいの頻度で、任意のALテクニックが、計算的に安価で実装が容易なランダムサンプリングという戦略を確実に打ち負かすことを期待できるだろうか? ALを予測パイプラインの「Always ON」モードで使用するのは,少なくとも意味があるのだろうか? ALの成功において、予測パイプラインはどの程度の役割を担っているのだろうか? 本稿では,現在ユビキタスな事前学習表現を用いたテキスト分類タスクについて,これらの質問を詳細に検討する。ここでの私たちの主な貢献は、データセット、テキスト表現、分類器が異なるさまざまなセットアップにわたる、新旧のAL手法の厳密な評価です。これはウォームアップ時間(すなわち、ALからの利得が見られるまでに必要なラベルの数)、「Always ON」モードの実現可能性、および異なる要因の相対的重要性に関する複数の洞察をもたらす。さらに,テキスト分類のためのAL手法の厳密なベンチマークを行うためのフレームワークもリリースした。 Active learning (AL) techniques optimally utilize a labeling budget by iteratively selecting instances that are most valuable for learning. However, they lack "prerequisite checks", i.e., there are no prescribed criteria to pick an AL algorithm best suited for a dataset. A practitioner must pick a technique they trust would beat random sampling, based on prior reported results, and hope that it is resilient to the many variables in their environment: dataset, labeling budget, and prediction pipelines. The important questions then are: how often, on average, do we expect any AL technique to reliably beat the computationally cheap and easy-to-implement strategy of random sampling? Does it at least make sense to use AL in an "Always ON" mode in a prediction pipeline, so that while it might not always help, it never under-performs random sampling? How much of a role does the prediction pipeline play in AL's success? We examine these questions in detail for the task of text classification using pre-trained representations, which are ubiquitous today. 
Our primary contribution here is a rigorous evaluation of AL techniques, old and new, across setups that vary with respect to datasets, text representations, and classifiers. This yields multiple insights around warm-up times (i.e., the number of labels needed before gains from AL are seen), the viability of an "Always ON" mode, and the relative significance of different factors. Additionally, we release a framework for rigorous benchmarking of AL techniques for text classification.	翻訳日:2024-07-18 22:29:24 公開日:2024-07-16
# フェデレートラーニングにおけるマルチモーダルトランスフォーマー Towards Multi-modal Transformers in Federated Learning ( http://arxiv.org/abs/2404.12467v2 ) ライセンス: Link先を確認	Guangyu Sun, Matias Mendieta, Aritra Dutta, Xin Li, Chen Chen,	(参考訳) マルチモーダルトランスは、異なる領域で顕著な進歩を示すが、サイロ化された高品質なデータは、さらなる改善を妨げる。これを解決するために、フェデレートラーニング(FL)は、異なるクライアントが保持する生データに直接アクセスすることなく、モデルをトレーニングする上で有望なプライバシー保護パラダイムとして登場した。その可能性にもかかわらず、未実装のユニモーダルクライアントとFLのトランスフォーマーアーキテクチャに関するかなりの研究の方向性は未解明のままである。このギャップを埋めるために,クライアントが異なるデータセットに分散した様々なモダリティのデータを保有する視覚言語領域内でのマルチモーダル・フェデレート・ラーニング(MFL)シナリオについて検討する。我々は,トランスフォーマーアーキテクチャを利用する場合の既存手法の性能を体系的に評価し,クライアント間の非モダリティと相互モダリティのギャップに対処することで,FedCola(Federated modality complementary and collaboration)と呼ばれる新しいフレームワークを導入する。さまざまなFL設定にわたる広範な実験を通じて、FedColaは従来のアプローチよりも優れたパフォーマンスを示し、将来のマルチモーダルトランスのフェデレーショントレーニングに関する新たな視点を提供する。 Multi-modal transformers mark significant progress in different domains, but siloed high-quality data hinders their further improvement. To remedy this, federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models without direct access to the raw data held by different clients. Despite its potential, a considerable research direction regarding the unpaired uni-modal clients and the transformer architecture in FL remains unexplored. To fill this gap, this paper explores a transfer multi-modal federated learning (MFL) scenario within the vision-language domain, where clients possess data of various modalities distributed across different datasets. We systematically evaluate the performance of existing methods when a transformer architecture is utilized and introduce a novel framework called Federated modality complementary and collaboration (FedCola) by addressing the in-modality and cross-modality gaps among clients. Through extensive experiments across various FL settings, FedCola demonstrates superior performance over previous approaches, offering new perspectives on future federated training of multi-modal transformers.	
翻訳日:2024-07-18 22:07:40 公開日:2024-07-16
# 教師付き学習のためのMRP定式化:一般化された時間差学習モデル An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models ( http://arxiv.org/abs/2404.15518v3 ) ライセンス: Link先を確認	Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr,	(参考訳) 従来の統計的学習では、データポイントは通常、未知の確率分布に従い、独立同分布(i.i.d.)であると仮定される。本稿では、データポイントを相互接続したものとして認識し、データモデリングにマルコフ報酬プロセス(MRP)を用いる、対照的な視点を示す。我々は、典型的な教師付き学習を強化学習(RL)におけるオンポリシーの方策評価問題として再構成し、一般化時間差(TD)学習アルゴリズムを解法として導入する。理論的には、線形TD学習の解と通常の最小二乗(OLS)の間の関係を導出する。また、特定の条件下では、特にノイズが相関している場合、TDの解はOLSよりも効果的な推定量となることを示す。さらに,線形関数近似の下で一般化されたTDアルゴリズムの収束性を確立する。実験的な研究により、我々の理論的結果を検証し、我々のTDアルゴリズムの重要設計を検証し、回帰や深層学習による画像分類といったタスクを含む様々なデータセットで実用性を示す。 In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a solution. Theoretically, our analysis draws connections between the solutions of linear TD learning and ordinary least squares (OLS). We also show that under specific conditions, particularly when noises are correlated, the TD solution proves to be a more effective estimator than OLS. Furthermore, we establish the convergence of our generalized TD algorithms under linear function approximation. Empirical studies verify our theoretical results, examine the vital design of our TD algorithm, and show practical utility across various datasets, encompassing tasks such as regression and image classification with deep learning.	翻訳日:2024-07-18 22:07:40 公開日:2024-07-16
# Sim-Grasp: 合成ベンチマークによるクラスタリング環境のための6-DOF Grasp ポリシの学習 Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark ( http://arxiv.org/abs/2405.00841v2 ) ライセンス: Link先を確認	Juncheng Li, David J. Cappelleri,	(参考訳) そこで本稿では, オブジェクト操作の強化を目的とした高度な言語モデルを統合する, 頑健な6-DOF2指グリップシステムであるSim-Graspを提案する。我々はSim-Grasp-Datasetを紹介し、500のシナリオに7.9百万のアノテートラベルを持つ1,550のオブジェクトを含み、ポイントクラウドから把握ポーズを生成するSim-GraspNetを開発した。 Sim-Grasp-Policesは1つのオブジェクトで97.14%、Levels 1-2とLevels 3-4の混合クラッタシナリオで87.43%、83.33%の達成率を達成した。テキストとボックスプロンプトを通じてターゲット識別のための言語モデルを統合することで、Sim-Graspはオブジェクト非依存とターゲットピッキングの両方を可能にし、インテリジェントなロボットシステムのバウンダリを押し上げる。 In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments. We introduce the Sim-Grasp-Dataset, which includes 1,550 objects across 500 scenarios with 7.9 million annotated labels, and develop Sim-GraspNet to generate grasp poses from point clouds. The Sim-Grasp-Polices achieve grasping success rates of 97.14% for single objects and 87.43% and 83.33% for mixed clutter scenarios of Levels 1-2 and Levels 3-4 objects, respectively. By incorporating language models for target identification through text and box prompts, Sim-Grasp enables both object-agnostic and target picking, pushing the boundaries of intelligent robotic systems.	翻訳日:2024-07-18 21:57:43 公開日:2024-07-16
# Phylotrack:シリコ系統追跡のためのC++およびPythonライブラリ Phylotrack: C++ and Python libraries for in silico phylogenetic tracking ( http://arxiv.org/abs/2405.09389v2 ) ライセンス: Link先を確認	Emily Dolson, Santiago Rodriguez-Papa, Matthew Andres Moreno,	(参考訳) ケイ素進化(英: silico evolution)は、コンピュータエージェントのデジタル集団における遺伝、変異、微分生殖成功の過程(自然選択による進化のための3つの「独立」)をインスタンス化する。その結果、これらの個体群は進化し、進化力学を研究するための仮想モデルシステムとして利用することができる。この実験パラダイムは、生物学的モデリング、人工生命、進化的計算にまたがって使用され、実験室やフィールドで不可能な実験を可能にすることで、in vitroおよびin vivoシステムを用いて行われた研究を補完する。ひとつ大きなメリットは、完全な、正確な可観測性です。例えば、シミュレーションの歴史を通してすべての親子関係を完璧に記録し、完全な系統(系統樹)を作り出すことができる。この情報は、いつ特性が得られたか、失われたかを明らかにし、根底にある進化力学の推論を促進する。 Phylotrackプロジェクトは、シリコの進化における系統の追跡と解析のためのライブラリを提供する。プロジェクトは構成されています 1) Phylotracklib: Empiricalプロジェクトの傘下で開発されたヘッダのみのC++ライブラリ。 2) Phylotrackpy: Phylotracklibを囲むPythonラッパー。両方のコンポーネントは、デジタル進化システムに系統追跡を付加する公開APIと、さまざまな一般的な系統トポロジーメトリクスを測定するスタンドアロンインターフェースを提供する。設計とC++の実装は効率を優先し、数万のエージェントの数を高速に世代交代できる。系統情報のメモリフットプリントを低減するために、いくつかの明示的な特徴(例えば、系統解析や抽象化など)を提供する。 In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three "ingredients" for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution, and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm -- used across biological modeling, artificial life, and evolutionary computation -- complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics. 
The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. Underlying design and C++ implementation prioritizes efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction, etc.) are provided for reducing the memory footprint of phylogenetic information.	翻訳日:2024-07-18 21:57:43 公開日:2024-07-16
# GenMix:医療画像分類のための生成データと混合データの統合 GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification ( http://arxiv.org/abs/2405.20650v2 ) ライセンス: Link先を確認	Hansang Lee, Haeil Lee, Helen Hong,	(参考訳) 本稿では、生成的手法と混合的手法を組み合わせて、両方の手法の強みを利用するGenMixと呼ばれる新しいデータ拡張手法を提案する。生成モデルは新たなデータパターンの作成に優れていますが、GANのモード崩壊や、拡散モデルのトレーニングの困難、特に限られた医療画像データといった課題に直面しています。一方、混合モデルはクラス境界領域を強化するが、クラス不均衡のシナリオでは主要なクラスを好む傾向にある。これらの制限に対処するため、GenMixは両方のアプローチを統合して相互補完する。 GenMix は,(1) 合成画像を生成するために生成モデルを訓練し,(2) 合成データと実データとの混合を行う。このプロセスは、生成モデルの新たなパターン学習と混合モデルのバウンダリ強化の恩恵を受けながら、合成データの質と多様性を向上させる。局所肝病変(FLL)をCT画像で分類する作業において,本法の有効性を検証した。この結果から,GenMix は DCGAN, StyleGAN, Textual Inversion, Diffusion Models など,様々な生成モデルの性能を向上させることが示された。特に、テキスト・インバージョンを用いた提案手法は、FLLデータセット上での微調整拡散モデルなしで他の手法よりも優れている。 In this paper, we propose a novel data augmentation technique called GenMix, which combines generative and mixture approaches to leverage the strengths of both methods. While generative models excel at creating new data patterns, they face challenges such as mode collapse in GANs and difficulties in training diffusion models, especially with limited medical imaging data. On the other hand, mixture models enhance class boundary regions but tend to favor the major class in scenarios with class imbalance. To address these limitations, GenMix integrates both approaches to complement each other. GenMix operates in two stages: (1) training a generative model to produce synthetic images, and (2) performing mixup between synthetic and real data. This process improves the quality and diversity of synthetic data while simultaneously benefiting from the new pattern learning of generative models and the boundary enhancement of mixture models. We validate the effectiveness of our method on the task of classifying focal liver lesions (FLLs) in CT images. Our results demonstrate that GenMix enhances the performance of various generative models, including DCGAN, StyleGAN, Textual Inversion, and Diffusion Models. 
Notably, the proposed method with Textual Inversion outperforms other methods without fine-tuning diffusion model on the FLL dataset.	翻訳日:2024-07-18 21:57:43 公開日:2024-07-16
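The second GenMix stage, mixup between synthetic and real data, can be illustrated with a short sketch. The code below shows the standard mixup operation (a convex combination of two samples and their one-hot labels); how GenMix pairs samples or chooses the mixing weight is not stated in the abstract, so the Beta-distributed weight and the function names `mixup` and `sample_lambda` are assumptions made for illustration.

```python
import random


def mixup(x_real, y_real, x_syn, y_syn, lam):
    """Convex combination of a real and a synthetic sample.

    x_* are flat feature vectors and y_* are one-hot labels. Pairing one
    real with one synthetic sample is an assumption about how GenMix mixes.
    """
    x = [lam * r + (1.0 - lam) * s for r, s in zip(x_real, x_syn)]
    y = [lam * r + (1.0 - lam) * s for r, s in zip(y_real, y_syn)]
    return x, y


def sample_lambda(alpha=1.0, rng=random):
    # Mixup commonly draws the mixing weight from a Beta(alpha, alpha) distribution.
    return rng.betavariate(alpha, alpha)
```

A call such as `mixup(real_image, real_label, synthetic_image, synthetic_label, sample_lambda())` yields a soft-labelled training sample that lies between the two inputs, which is how mixup strengthens class boundary regions.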
# 製品検索における関連判断のための大規模言語モデル Large Language Models for Relevance Judgment in Product Search ( http://arxiv.org/abs/2406.00247v2 ) ライセンス: Link先を確認	Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao,	(参考訳) 検索クエリに対する検索および再ランク項目の高関連性は、製品検索の成功の基盤であるが、クエリに対するアイテムの関連性の測定は、製品情報検索において最も困難な課題の1つであり、製品検索の品質は、利用可能な関連ラベル付きデータの正確性とスケールの影響を強く受けている。本稿では,大規模言語モデル (LLM) を利用したクエリ-イム対 (QIP) の関連判断を大規模に行うための一連の手法を提案する。マルチミリオンQIPのユニークなデータセットを用いて,低ランク適応 (LoRA) と低ランク適応 (LoRA) を併用した10億パラメトリックLCMの微調整のためのハイパーパラメータのテストと最適化を行い,LCMファインタニングにおけるアイテム属性の結合と促進の様々なモードについて検討し,関連性予測の品質に対するアイテム属性の包摂性を考慮したトレードオフを検討する。我々は,従来のLLMのベースライン,および市販のモデルに対して,人間の関連性評価値と同等の関連アノテーションに対して,大幅に改善されていることを示す。本研究は,製品検索における関連判断の自動化の分野への直接的な影響を示唆するものである。 High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale. Using a unique dataset of multi-million QIPs, annotated by human evaluators, we test and optimize hyper parameters for finetuning billion-parameter LLMs with and without Low Rank Adaption (LoRA), as well as various modes of item attribute concatenation and prompting in LLM finetuning, and consider trade offs in item attribute inclusion for quality of relevance predictions. We demonstrate considerable improvement over baselines of prior generations of LLMs, as well as off-the-shelf models, towards relevance annotations on par with the human relevance evaluators. 
Our findings have immediate implications for the growing field of relevance judgment automation in product search.	翻訳日:2024-07-18 21:57:43 公開日:2024-07-16
# テリック制御可能な状態表現の学習 Learning telic-controllable state representations ( http://arxiv.org/abs/2406.14476v2 ) ライセンス: Link先を確認	Nadav Amir, Stas Tiomkin, Angela Langdon,	(参考訳) 目的的行動の計算的記述は、記述的側面と規範的側面の両方から構成される。前者は、世界の現在(または未来)の状態を確認するために使用され、後者は、ある目標の下でこれらの状態の望ましさ、またはその欠如を評価するために使用される。強化学習(Reinforcement Learning)では、規範的側面(報酬関数と価値関数)は、事前定義された、固定された記述的側面(状態表現)に依存すると仮定される。ゴールは状態依存の報酬関数によって近似されるが、取得した状態表現自体を形作ることもできる。本稿では,有界エージェントにおける状態表現学習のための新しい計算フレームワークを提案する。本稿では, テリック状態表現の粒度と, 全てのテリック状態に到達するために必要なポリシの複雑さとのトレードオフを特徴付ける, テリック制御可能性の概念を紹介する。制御可能な状態表現を学習するためのアルゴリズムを提案する。当社のフレームワークは、目標の柔軟性とポリシの複雑さのバランスをとる状態表現を学習する上で、意図的な無知(どのエクスペリエンスを無視すべきかを知る)という重要な役割を強調しています。より広範に、我々の研究は、自然エージェントと人工エージェントの目標指向状態表現学習に関する統一的な理論的視点を推し進めている。 Computational descriptions of purposeful behavior comprise both descriptive and normative aspects. The former are used to ascertain current (or future) states of the world and the latter to evaluate the desirability, or lack thereof, of these states under some goal. In Reinforcement Learning, the normative aspect (reward and value functions) is assumed to depend on a predefined and fixed descriptive one (state representation). Alternatively, these two aspects may emerge interdependently: goals can be, and indeed often are, approximated by state-dependent reward functions, but they may also shape the acquired state representations themselves. Here, we present a novel computational framework for state representation learning in bounded agents, where descriptive and normative aspects are coupled through the notion of goal-directed, or telic, states. We introduce the concept of telic controllability to characterize the tradeoff between the granularity of a telic state representation and the policy complexity required to reach all telic states. We propose an algorithm for learning controllable state representations, illustrating it using a simple navigation task with shifting goals. 
Our framework highlights the crucial role of deliberate ignorance -- knowing which features of experience to ignore -- for learning state representations that balance goal flexibility and policy complexity. More broadly, our work advances a unified theoretical perspective on goal-directed state representation learning in natural and artificial agents.	翻訳日:2024-07-18 21:47:53 公開日:2024-07-16
# OPT-Tree:適応的なドラフトツリー構造を持つ投機的デコーディング OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure ( http://arxiv.org/abs/2406.17276v2 ) ライセンス: Link先を確認	Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang,	(参考訳) 自己回帰言語モデルは、様々なシナリオにおいて優れたパフォーマンスを示す。しかし,1ステップ1ワードの生成方式により推論効率が制限されており,モデルの大規模化に伴い近年は喫緊の課題となっている。投機的復号法では、複数のトークンを1ステップで生成できる「ドラフト・アンド・検証」機構を採用し、損失のない加速を実現する。既存の手法は主に固定ヒューリスティックなドラフト構造を採用しており、検証中の受理長を最大化するために異なる状況に適応できない。このジレンマを緩和するために、適応的でスケーラブルなドラフトツリーを構築するアルゴリズムであるOPT-Treeを提案する。各復号ステップにおける受理長の数学的期待を最大化する最適な木構造を探索する。実験結果から, OPT-Treeは既存のドラフト構造より優れており, 自己回帰復号と比較して最大3.2の高速化率を実現していることがわかった。ドラフトモデルが十分に強力で、ノード予算が十分であれば、1ステップで10以上のトークンを生成することができる。私たちのコードは https://github.com/Jikai0Wang/OPT-Tree で公開されています。 Autoregressive language models demonstrate excellent performance in various scenarios. However, their inference efficiency is limited by the one-step-one-word generation mode, which has become a pressing problem recently as the models become increasingly larger. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless acceleration. Existing methods mainly adopt fixed heuristic draft structures, which fail to adapt to different situations to maximize the acceptance length during verification. To alleviate this dilemma, we propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees. It searches for the optimal tree structure that maximizes the mathematical expectation of the acceptance length in each decoding step. Experimental results reveal that OPT-Tree outperforms the existing draft structures and achieves a speed-up ratio of up to 3.2 compared with autoregressive decoding. If the draft model is powerful enough and the node budget is sufficient, it can generate more than ten tokens in a single step. Our code is available at https://github.com/Jikai0Wang/OPT-Tree.	翻訳日:2024-07-18 21:47:53 公開日:2024-07-16
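The idea of choosing a draft tree that maximizes the expected acceptance length under a node budget can be sketched as follows. This is an illustrative toy, not the released OPT-Tree code. It assumes that the probability of accepting a drafted node can be approximated by the product of draft-model probabilities along its path, so that the expected acceptance length is the sum of those products over the tree. `toy_draft_probs` and `build_draft_tree` are hypothetical names.

```python
import heapq


def toy_draft_probs(path):
    # Hypothetical draft model: the same next-token distribution at every node.
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def build_draft_tree(draft_probs, node_budget):
    """Greedily select the node_budget nodes with the highest path probability.

    A child's path probability never exceeds its parent's, so best-first
    expansion always yields a connected tree rooted at the current prefix.
    """
    # heapq is a min-heap, so path probabilities are stored negated.
    heap = [(-p, (tok,)) for tok, p in draft_probs(()).items()]
    heapq.heapify(heap)
    tree, expected_length = [], 0.0
    while heap and len(tree) < node_budget:
        neg_p, path = heapq.heappop(heap)
        tree.append(path)
        expected_length += -neg_p
        for tok, p in draft_probs(path).items():
            heapq.heappush(heap, (neg_p * p, path + (tok,)))
    return tree, expected_length
```

With a real draft model, `draft_probs` would return that model's top next-token probabilities for the given path, so the tree shape adapts to each decoding step rather than following a fixed heuristic.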
# AnatoMask:リコンストラクション誘導型セルフマスキングによる医用画像セグメンテーションの強化 AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking ( http://arxiv.org/abs/2407.06468v2 ) ライセンス: Link先を確認	Yuheng Li, Tianyu Luan, Yizhou Wu, Shaoyan Pan, Yenho Chen, Xiaofeng Yang,	(参考訳) ラベル付きデータの不足により、ラベルなしデータから意味表現を抽出する自己教師付き学習(SSL)が、3次元医用画像のセグメンテーションにおいて大きな注目を集めている。 SSL戦略の中で、Masked Image Modeling (MIM)は、ランダムにマスクされた画像を再構成して詳細な表現を学習することで有効性を示している。しかし, 従来のMIM法では, 良好な成績を収めるために広範囲なトレーニングデータが必要であり, これは医用画像においては依然として課題である。ランダムマスキングは医療画像内の全ての領域を均一にサンプリングするため、重要な解剖学的領域を見落とし、事前学習効率を低下させる可能性がある。本稿では,再構成損失を利用して解剖学的に重要な領域を動的に識別・マスキングし,事前トレーニングの有効性を向上させる新しいMIM手法であるAnatoMaskを提案する。 AnatoMaskは自己蒸留アプローチを採用し、より重要なマスク領域を見つける方法と、これらのマスクされた領域を再構築する方法の両方を学ぶ。準最適学習を避けるため、AnatoMaskはマスキングダイナミクス関数を用いて事前学習の難しさを段階的に調整する。我々は,複数の画像モダリティ(CT,MRI,PET)を含む4つのパブリックデータセットで本手法を評価した。 AnatoMaskは既存のSSLメソッドよりも優れたパフォーマンスとスケーラビリティを示している。コードは https://github.com/ricklisz/AnatoMask で入手できる。 Due to the scarcity of labeled data, self-supervised learning (SSL) has gained much attention in 3D medical image segmentation, by extracting semantic representations from unlabeled data. Among SSL strategies, masked image modeling (MIM) has shown effectiveness by reconstructing randomly masked images to learn detailed representations. However, conventional MIM methods require extensive training data to achieve good performance, which still poses a challenge for medical imaging. Since random masking uniformly samples all regions within medical images, it may overlook crucial anatomical regions and thus degrade the pretraining efficiency. We propose AnatoMask, a novel MIM method that leverages reconstruction loss to dynamically identify and mask out anatomically significant regions to improve pretraining efficacy. AnatoMask takes a self-distillation approach, where the model learns both how to find more significant regions to mask and how to reconstruct these masked regions. 
To avoid suboptimal learning, AnatoMask adjusts the pretraining difficulty progressively using a masking dynamics function. We have evaluated our method on 4 public datasets with multiple imaging modalities (CT, MRI, and PET). AnatoMask demonstrates superior performance and scalability compared to existing SSL methods. The code is available at https://github.com/ricklisz/AnatoMask.	翻訳日:2024-07-18 21:38:02 公開日:2024-07-16
# データ汚染下における分割コンフォーマル予測 Split Conformal Prediction under Data Contamination ( http://arxiv.org/abs/2407.07700v2 ) ライセンス: Link先を確認	Jase Clarkson, Wenkai Xu, Mihai Cucuringu, Gesine Reinert,	(参考訳) コンフォーマル予測(Conformal prediction)とは、データが交換可能であるという仮定の下で、任意の予測モデルから予測区間や予測集合を構築するための非パラメトリック手法である。予測集合の周辺被覆率に関する理論的保証が伴い、分割コンフォーマル予測はモデルトレーニングと比較して計算コストが極めて低いことから人気がある。本研究ではデータ汚染下での分割コンフォーマル予測のロバスト性について検討し、キャリブレーションスコアのごく一部が大部分とは異なる分布から引き出されると仮定する。「クリーンな」テストポイントで評価した場合に, 汚染されたデータが構築された集合の被覆率と効率に与える影響を定量化し, 数値実験によって結果を検証する。さらに,分類設定における調整として汚染ロバスト・コンフォーマル予測(Contamination Robust Conformal Prediction)を提案し,合成データと実データの両方を用いて本手法の有効性を検証する。 Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on the marginal coverage of the prediction sets and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on "clean" test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal Prediction, and verify the efficacy of our approach using both synthetic and real datasets.	翻訳日:2024-07-18 21:28:12 公開日:2024-07-16
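For readers unfamiliar with the calibration step the abstract above refers to, here is a minimal sketch of standard split conformal prediction for regression. It is not the paper's Contamination Robust adjustment. It assumes absolute-residual scores and a clean, exchangeable calibration set; the score values and function names are made up for illustration.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Split-conformal threshold for miscoverage level `alpha`.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)) over the
    sorted calibration scores.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_interval(point_prediction, threshold):
    """Interval implied by absolute-residual scores |y - f(x)|."""
    return (point_prediction - threshold, point_prediction + threshold)


# Hypothetical calibration residuals |y_i - f(x_i)|.
scores = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
q = conformal_quantile(scores, alpha=0.2)
interval = prediction_interval(2.0, q)
print(q, interval)
```

Because the threshold is an order statistic of the calibration scores, a small fraction of scores drawn from a different distribution can shift it, which is the effect the paper sets out to quantify.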
# LiteGPT: 胸部X線像の局所化と分類作業のための大規模視覚言語モデル LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task ( http://arxiv.org/abs/2407.12064v1 ) ライセンス: Link先を確認	Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy,	(参考訳) 視覚言語モデルは幅広いタスクにわたって広範囲に探索され、良好な性能を保っているが、医療画像への応用は未解明のままである。本研究では,医用画像用統合フレームワークLiteGPTを提案する。複数の事前学習されたビジュアルエンコーダを利用して情報を強化し、視覚言語モデルの性能を向上させる。我々の知る限りでは、医用画像における共同局所化と分類の新たな課題に視覚言語モデルを利用した最初の研究である。また, 胸部X線における疾患局在の基準線を提供する先駆者でもある。最後に、よくベンチマークされたVinDr-CXRデータセット上で、画像分類タスクに新しい最先端性能を設定した。すべてのコードとモデルはオンラインで公開されている。 Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework - LiteGPT - for the medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To the best of our knowledge, this is the first study to utilize vision-language models for the novel task of joint localization and classification in medical images. Besides, we are pioneers in providing baselines for disease localization in chest X-rays. Finally, we set new state-of-the-art performance in the image classification task on the well-benchmarked VinDr-CXR dataset. All code and models are publicly available online: https://github.com/leduckhai/LiteGPT	翻訳日:2024-07-18 21:28:12 公開日:2024-07-16
# 自動運転車評価のためのデータ選択手法 Data selection method for assessment of autonomous vehicles ( http://arxiv.org/abs/2407.12065v1 ) ライセンス: Link先を確認	Linh Trinh, Ali Anwar, Siegfried Mercelis,	(参考訳) 自動運転車の人気が高まるにつれて、ISO、NHTSA、Euro NCAPといった多くの標準や規制機関は、実際の世界に配備する前に十分なレベルの安全性を確保するために、安全性の検証を必要としている。製造業者は、この目的のために大量の公道データを収集します。しかしながら、これらのバリデーション活動の大部分は、人間が手作業で行います。さらに、各駆動特性を検証するために使用されるデータが異なる場合がある。その結果、検証プロセスの高速化を図りつつ、柔軟かつ動的に検証・検証に使用できる効率的なデータ選択方法を持つことが不可欠である。本稿では,自律走行車の評価を行う上で,実用的で柔軟かつ効率的なデータ選択手法を提案する。我々の考えは、選択したデータのメタデータ分布と、バリデーションに期待される事前定義されたメタデータ分布との類似性を最適化することである。 BDD100Kを用いた大規模データセット実験により,提案手法が効率的にデータ選択タスクを実行できることを示す。これらの結果から,本手法は信頼性が高く,各種安全機能の検証に有効なデータ選択に有用であることが示唆された。 As the popularity of autonomous vehicles has grown, many standards and regulators, such as ISO, NHTSA, and Euro NCAP, require safety validation to ensure a sufficient level of safety before deploying them in the real world. Manufacturers gather a large amount of public road data for this purpose. However, the majority of these validation activities are done manually by humans. Furthermore, the data used to validate each driving feature may differ. As a result, it is essential to have an efficient data selection method that can be used flexibly and dynamically for verification and validation while also accelerating the validation process. In this paper, we present a data selection method that is practical, flexible, and efficient for assessment of autonomous vehicles. Our idea is to optimize the similarity between the metadata distribution of the selected data and a predefined metadata distribution that is expected for validation. Our experiments on the large dataset BDD100K show that our method can perform data selection tasks efficiently. These results demonstrate that our methods are highly reliable and can be used to select appropriate data for the validation of various safety functions.	翻訳日:2024-07-18 21:28:12 公開日:2024-07-16
# 非拘束映像における時間的グラウンドインストラクショナルダイアグラム Temporally Grounding Instructional Diagrams in Unconstrained Videos ( http://arxiv.org/abs/2407.12066v1 ) ライセンス: Link先を確認	Jiahao Zhang, Frederic Z. Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian, Stephen Gould,	(参考訳) ビデオ中の命令図の形式でクエリのシーケンスを同時にローカライズするという課題について検討する。これは個々のクエリだけでなく、相互関係も理解する必要がある。しかし、既存のほとんどの手法は、汎用的な相互排他性や時間的順序といったクエリの固有の構造を無視して、一度に1つのクエリを基底にすることに焦点を当てている。これにより、異なるステップダイアグラムの予測タイムパンが著しく重複したり、時間順序に反したりし、精度を損なう可能性がある。本稿では,一連のステップ図を同時に構築することにより,この問題に対処する。具体的には、ステップダイアグラムの視覚的特徴と学習可能な定数の位置埋め込みとを徹底的に組み合わせて構築した複合クエリを提案する。コンテントの特徴が異なる複合クエリ間の自己アテンションが抑制され,予測の時間的重複が減少するのに対して,クロスアテンションはコンテンツと位置ジョイントガイダンスによって時間的ミスアライメントを補正する。ステップダイアグラムのグラウンド化のためのIAWデータセットと自然言語クエリのグラウンド化のためのYouCook2ベンチマークに対するアプローチの有効性を示す。 We study the challenging problem of simultaneously localizing a sequence of queries in the form of instructional diagrams in a video. This requires understanding not only the individual queries but also their interrelationships. However, most existing methods focus on grounding one query at a time, ignoring the inherent structures among queries such as the general mutual exclusiveness and the temporal order. Consequently, the predicted timespans of different step diagrams may overlap considerably or violate the temporal order, thus harming the accuracy. In this paper, we tackle this issue by simultaneously grounding a sequence of step diagrams. Specifically, we propose composite queries, constructed by exhaustively pairing up the visual content features of the step diagrams and a fixed number of learnable positional embeddings. Our insight is that self-attention among composite queries carrying different content features suppress each other to reduce timespan overlaps in predictions, while the cross-attention corrects the temporal misalignment via content and position joint guidance. 
We demonstrate the effectiveness of our approach on the IAW dataset for grounding step diagrams and the YouCook2 benchmark for grounding natural language queries, significantly outperforming existing methods while simultaneously grounding multiple queries.	翻訳日:2024-07-18 21:28:12 公開日:2024-07-16
# MaskVD: 効率的なビデオオブジェクト検出のための領域マスキング MaskVD: Region Masking for Efficient Video Object Detection ( http://arxiv.org/abs/2407.12067v1 ) ライセンス: Link先を確認	Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel,	(参考訳) ビデオタスクは計算量が多いため、特に最先端のビジョントランスフォーマー(ViT)を必要とするタスクにおいて、リアルタイムアプリケーションにデプロイする際の課題となる。いくつかの研究は、ビデオの大部分がフレーム間でほとんど変化せず、フレームベースのビデオ処理における冗長な計算に繋がるという事実を活用することで、この問題に対処しようとしている。特に、フレーム間のピクセルやセマンティックな違いを利用する研究もあるが、メモリオーバーヘッドが大幅に増加するため、レイテンシのメリットは限られている。一方,本論文では,画像中の意味情報とフレーム間の時間的相関を利用して,ビデオフレーム内の領域をマスキングする手法を提案する。特に、以前のフレームから抽出した特徴を活用することで、ViTバックボーンは、領域マスキングから直接恩恵を受け、入力領域の80%をスキップし、FLOPとレイテンシを3.14倍、1.5倍改善することを示した。我々は、同様の検出性能を維持しながら、最新技術(SOTA)のメモリとレイテンシを2.3倍と1.14倍改善する。さらに,提案手法は畳み込みニューラルネットワーク(CNN)の有望な結果を示し,特殊計算カーネルを用いたSOTAの最大1.3倍のレイテンシ向上を実現する。 Video tasks are compute-heavy and thus pose a challenge when deploying in real-time applications, particularly for tasks that require state-of-the-art Vision Transformers (ViTs). Several research efforts have tried to address this challenge by leveraging the fact that large portions of the video undergo very little change across frames, leading to redundant computations in frame-based video processing. In particular, some works leverage pixel or semantic differences across frames, however, this yields limited latency benefits with significantly increased memory overhead. This paper, in contrast, presents a strategy for masking regions in video frames that leverages the semantic information in images and the temporal correlation between frames to significantly reduce FLOPs and latency with little to no penalty in performance over baseline models. In particular, we demonstrate that by leveraging extracted features from previous frames, ViT backbones directly benefit from region masking, skipping up to 80% of input regions, improving FLOPs and latency by 3.14x and 1.5x. 
We improve memory and latency over the state-of-the-art (SOTA) by 2.3x and 1.14x, while maintaining similar detection performance. Additionally, our approach demonstrates promising results on convolutional neural networks (CNNs) and provides latency improvements over the SOTA up to 1.3x using specialized computational kernels.	翻訳日:2024-07-18 21:28:12 公開日:2024-07-16
# 大規模言語モデル(LLM)を用いたグラフの学習 : モデルロバストネスの深層化 Learning on Graphs with Large Language Models(LLMs): A Deep Dive into Model Robustness ( http://arxiv.org/abs/2407.12068v1 ) ライセンス: Link先を確認	Kai Guo, Zewen Liu, Zhikai Chen, Hongzhi Wen, Wei Jin, Jiliang Tang, Yi Chang,	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクにおいて顕著な性能を示している。近年,テキスト属性を持つグラフの学習を向上し,有望な性能を示すLLMベースのパイプラインがいくつか開発されている。しかし、グラフは敵対的攻撃の影響を受けやすいことがよく知られており、LLMがグラフ上での学習において堅牢性を示すかどうかは不明である。このギャップに対処するため,本研究は,グラフに対する敵対的攻撃の文脈におけるLLMの可能性を探究することを目的としている。具体的には, LLMs-as-Enhancers と LLMs-as-Predictors という2つの観点から,グラフ構造およびテキストの摂動に対する頑健性について検討する。広範な実験により,LLMs-as-EnhancersとLLMs-as-Predictorsは,浅層モデルと比較して,構造的およびテキスト的攻撃に対して優れた堅牢性を有することが明らかとなった。これらの結果に基づき,その要因を調べるための追加分析を行った。さらに、我々のベンチマークライブラリを公開して、迅速かつ公平な評価を容易にし、この分野で進行中の革新的な研究を促進するようにしました。 Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks. Recently, several LLM-based pipelines have been developed to enhance learning on graphs with text attributes, showcasing promising performance. However, graphs are well-known to be susceptible to adversarial attacks and it remains unclear whether LLMs exhibit robustness in learning on graphs. To address this gap, our work aims to explore the potential of LLMs in the context of adversarial attacks on graphs. Specifically, we investigate the robustness against graph structural and textual perturbations in terms of two dimensions: LLMs-as-Enhancers and LLMs-as-Predictors. Through extensive experiments, we find that, compared to shallow models, both LLMs-as-Enhancers and LLMs-as-Predictors offer superior robustness against structural and textual attacks. Based on these findings, we carried out additional analyses to investigate the underlying causes. Furthermore, we have made our benchmark library openly available to facilitate quick and fair evaluations, and to encourage ongoing innovative research in this field.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 個人的アイデンティティのワンショットアンラーニング One-Shot Unlearning of Personal Identities ( http://arxiv.org/abs/2407.12069v1 ) ライセンス: Link先を確認	Thomas De Min, Subhankar Roy, Massimiliano Mancini, Stéphane Lathuilière, Elisa Ricci,	(参考訳) マシン・アンラーニング(MU)は、トレーニング中に見たことのないようなモデルからデータを消去することを目的としている。この範囲で、既存のMUアプローチはトレーニングデータへの完全または部分的なアクセスを前提としており、これはプライバシー規制のために時間とともに制限される可能性がある。しかし、そのようなシナリオにおけるMUメソッドの有効性を調査するための設定やベンチマークは存在しない。このギャップを埋めるために、トレーニングデータにアクセスできない場合の未学習モデルを評価できるOne-Shot Unlearning of Personal Identities (O-UPI) と呼ばれる新しいタスクを提案する。具体的には、トレーニング後のデータ削除を要求される現在の規制が関係しているIDアンラーニングケースに焦点を当てる。データの欠如に対処するため,利用者は未学習のポートレート画像の提供を期待する。 O-UPIの手法を評価するため,異なる未学習データセットサイズでCelebAとCelebA-HQデータセットの誤りをベンチマークした。我々は、この挑戦的なベンチマークで適用可能な手法を検証し、メタ学習者が1つの画像からアイデンティティを忘れる効果的な方法を提案する。得られたサンプルとトレーニング時に使用するデータとの相違点がある場合,データ可用性が制限された場合,既存のアプローチは困難であることが示唆された。受け入れ次第、コードとベンチマークをリリースします。 Machine unlearning (MU) aims to erase data from a model as if it never saw them during training. To this extent, existing MU approaches assume complete or partial access to the training data, which can be limited over time due to privacy regulations. However, no setting or benchmark exists to probe the effectiveness of MU methods in such scenarios, i.e. when training data is missing. To fill this gap, we propose a novel task we call One-Shot Unlearning of Personal Identities (O-UPI) that evaluates unlearning models when the training data is not accessible. Specifically, we focus on the identity unlearning case, which is relevant due to current regulations requiring data deletion after training. To cope with data absence, we expect users to provide a portraiting picture to perform unlearning. To evaluate methods in O-UPI, we benchmark the forgetting on CelebA and CelebA-HQ datasets with different unlearning set sizes. We test applicable methods on this challenging benchmark, proposing also an effective method that meta-learns to forget identities from a single image. 
Our findings indicate that existing approaches struggle when data availability is limited, with greater difficulty when there is dissimilarity between provided samples and data used at training time. We will release the code and benchmark upon acceptance.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# エッジ展開効率向上のための二元化変圧器とハードウェア加速器の共設計 Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment ( http://arxiv.org/abs/2407.12070v1 ) ライセンス: Link先を確認	Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang,	(参考訳) トランスフォーマーモデルはAIタスクに革命をもたらしたが、その大きなサイズはリソース制約やレイテンシクリティカルなエッジデバイスへの実際のデプロイメントを妨げる。バイナライズされたトランスフォーマーは、モデルサイズを大幅に削減することで、有望なソリューションを提供するが、既存のアプローチでは、アルゴリズムとハードウェアのミスマッチに悩まされ、コデザイン探索が制限され、エッジデバイス上でのサブ最適化のパフォーマンスが向上する。そこで本研究では,アルゴリズム,ハードウェア,共同最適化の3つの側面から,トランスフォーマーのエンドツーエンド配置を効率的に行うための設計手法を提案する。まず、最適化された量子化手法とコンポーネントを備えたハードウェアフレンドリなバイナライズトランスであるBMTを提案し、重み付き三重分割トレーニング技術を活用することにより、モデル精度をさらに向上する。第2に,二項変換器を効率よく推定するための専用ユニットとスケジューリングパイプラインを備えたストリーミングプロセッサ混合二項変換器アクセラレータ,すなわちBATを開発した。最後に、我々は設計空間探索アプローチを通じてアルゴリズムとハードウェアを協調して最適化し、現実世界のデプロイメントにおける正確性、レイテンシ、堅牢性の間のグローバルなトレードオフを実現する。実験結果から,2.14-49.37倍のスループット向上と3.72-88.53倍のエネルギー効率を実現し,エンドツーエンドのエッジ展開を効果的に実現した。 Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model size, existing approaches suffer from algorithm-hardware mismatches with limited co-design exploration, leading to suboptimal performance on edge devices. Hence, we propose a co-design method for efficient end-to-end edge deployment of Transformers from three aspects: algorithm, hardware, and joint optimization. First, we propose BMT, a novel hardware-friendly binarized Transformer with optimized quantization methods and components, and we further enhance its model accuracy by leveraging the weighted ternary weight splitting training technique. Second, we develop a streaming processor mixed binarized Transformer accelerator, namely BAT, which is equipped with specialized units and scheduling pipelines for efficient inference of binarized Transformers. 
Finally, we co-optimize the algorithm and hardware through a design space exploration approach to achieve a global trade-off between accuracy, latency, and robustness for real-world deployments. Experimental results show our co-design achieves up to 2.14-49.37x throughput gains and 3.72-88.53x better energy efficiency over state-of-the-art Transformer accelerators, enabling efficient end-to-end edge deployment.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# リレーショナル表現蒸留 Relational Representation Distillation ( http://arxiv.org/abs/2407.12073v1 ) ライセンス: Link先を確認	Nikolaos Giakoumoglou, Tania Stathaki,	(参考訳) 知識蒸留(KD)は、十分に訓練された大規模な教師モデルから、より小さく効率的な生徒モデルに知識を移す効果的な方法である。その成功にもかかわらず、KDの主な課題の1つは、生徒の計算効率を維持しながら、複雑な知識の効率的な伝達を保証することである。明示的な負例を用いる対照学習の目的関数を適用した以前の研究とは異なり、本研究ではリレーショナル表現蒸留(RRD)を導入する。本手法は,教師モデルと生徒モデルの関係を探索し,強化するために,ペアワイズな類似性を利用する。自己教師付き学習の原則に触発されて、正確な複製よりも類似性に焦点を当てた、緩和された対照損失を使用する。本手法は,大規模なメモリバッファ内の教師サンプルの出力分布を整合させ,厳密な負例の区別を必要とせずに生徒モデルの堅牢性と性能を向上させる。提案手法はCIFAR-100で優れた性能を示し,従来のKD技術より優れ,13の最先端手法を上回る。 Tiny ImageNetやSTL-10といった他のデータセットへの転移にも成功している。コードはまもなく公開されます。 Knowledge distillation (KD) is an effective method for transferring knowledge from a large, well-trained teacher model to a smaller, more efficient student model. Despite its success, one of the main challenges in KD is ensuring the efficient transfer of complex knowledge while maintaining the student's computational efficiency. Unlike previous works that applied contrastive objectives promoting explicit negative instances, we introduce Relational Representation Distillation (RRD). Our approach leverages pairwise similarities to explore and reinforce the relationships between the teacher and student models. Inspired by self-supervised learning principles, it uses a relaxed contrastive loss that focuses on similarity rather than exact replication. This method aligns the output distributions of teacher samples in a large memory buffer, improving the robustness and performance of the student model without the need for strict negative instance differentiation. Our approach demonstrates superior performance on CIFAR-100, outperforming traditional KD techniques and surpassing 13 state-of-the-art methods. It also transfers successfully to other datasets like Tiny ImageNet and STL-10. The code will be made public soon.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 大規模モデルにおけるパラメータ効率と一般化の促進--正則化とマスク付き低ランク適応アプローチ Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach ( http://arxiv.org/abs/2407.12074v1 ) ライセンス: Link先を確認	Yuzhu Mao, Siqi Ping, Zihao Zhao, Yang Liu, Wenbo Ding,	(参考訳) 大規模言語モデル(LLM)のような大規模事前学習モデルでは、特にモバイルシステムでの応用において、パラメータ数が膨大であるため、微調整において重大なリソース課題が生じる。これを解決するため、低ランク適応(LoRA)は、良好な微調整結果を維持しつつ、資源消費を減らすために開発された。その効果にもかかわらず、オリジナルのLoRA法は準最適な性能と過学習という課題に直面している。本稿では,LoRA法により近似された行列更新の本質的な次元について検討し,本質的な次元を増大させることによる性能上の利点を明らかにする。正則化と,より高い本質的次元を促す勾配マスキング法を用いることで,Regularized and Masked LoRA (RM-LoRA) と呼ばれる提案手法は,様々なオープンソースのビジョンおよび言語データセットにおいて,オリジナルのLoRAやその最新のバリエーションと比較して,同じあるいはより少ないトレーニング可能パラメータ予算で優れた一般化性能を実現する。 Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces challenges of suboptimal performance and overfitting. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
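As background for the abstract above, the following sketch shows the plain LoRA update it builds on: a frozen weight matrix W is adapted by a low-rank product, W + (alpha / r) * B A, so that only r * (d_out + d_in) parameters are trained. It does not implement the regularization or gradient masking of RM-LoRA. The tiny matrices and helper names are illustrative assumptions, and pure Python lists stand in for tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def trainable_params(d_out, d_in, rank):
    """Number of trainable entries in B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight
b = [[1.0], [2.0]]                       # trainable 2x1 factor
a = [[0.5, 0.0, -0.5]]                   # trainable 1x3 factor

w_eff = lora_effective_weight(w, b, a, alpha=2.0)
print(w_eff, trainable_params(2, 3, 1))
```

The update B A has rank at most r, which is why the abstract's question about the intrinsic dimension of the approximated matrix update matters for how much the adapter can express.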
# Tiled Bit Networks:学習可能なバイナリベクトルの再利用によるサブビットニューラルネットワーク圧縮 Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors ( http://arxiv.org/abs/2407.12075v1 ) ライセンス: Link先を確認	Matt Gorbett, Hossein Shirazi, Indrakshi Ray,	(参考訳) バイナリニューラルネットワーク(BNN)は、ストレージと計算コストを節約して効率的なディープラーニングを実現する。しかしながら、ニューラルネットワークのサイズが拡大し続けるにつれて、計算要求を満たすことは依然として困難である。本研究では,2値重みニューラルネットワークのサブビット圧縮を実現するために,ニューラルネットワーク層をビット列でタイル状に埋める新しい量子化方式を提案する。この方法は2値ベクトル(すなわちタイル)を学習し、集約とリシェイプ操作を通じてモデルの各層を埋める。推論中、この方法は全テンソルを表すために層ごとに1つのタイルを再利用する。我々は全結合層と畳み込み層の両方にこのアプローチを適用する。経験的に、このアプローチは、様々なアーキテクチャ(CNN、トランスフォーマー、MLP)とタスク(分類、セグメンテーション、時系列予測)において、2値重みモデルと比較して最大8倍のサイズ削減で、完全精度に近い性能を達成する。我々は、Tiled Bit Networksに2つの実装を提供している。 1) 資源制約環境におけるその実現可能性を評価するため, マイクロコントローラにモデルを展開する。 2) GPU互換の推論カーネルで、メモリ内の1層当たりのタイルの再利用を容易にする。 Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e. tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. 
We provide two implementations for Tiled Bit Networks: 1) we deploy the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) we provide a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
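The inference-time idea in the Tiled Bit Networks abstract, reusing one binary tile to represent a whole layer, can be sketched as follows. This is an assumption-laden illustration, not the authors' kernel: it only repeats a fixed tile and reshapes it into a weight matrix, and it omits how the tile is learned. The tile values, layer shape, and function name are hypothetical.

```python
def tile_to_matrix(tile, rows, cols):
    """Fill a rows x cols weight matrix by repeating a single binary tile."""
    total = rows * cols
    if total % len(tile) != 0:
        raise ValueError("tile length must divide the number of weights")
    flat = tile * (total // len(tile))          # repeat the stored tile
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]  # reshape


tile = [1, -1, -1, 1]                 # the only weights kept in memory
weights = tile_to_matrix(tile, rows=2, cols=8)
bits_per_weight = len(tile) / (2 * 8)  # stored bits divided by weights represented
print(weights, bits_per_weight)
```

Here four stored bits stand in for sixteen binary weights, i.e. 0.25 bits per weight, which is what "sub-bit" compression refers to in this toy setting.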
# GoldFinch:Linear Pre-FillとExtreme KV-Cache圧縮を備えた高性能RWKV/Transformerハイブリッド GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression ( http://arxiv.org/abs/2407.12077v1 ) ライセンス: Link先を確認	Daniel Goldstein, Fares Obeid, Eric Alcaide, Guangyu Song, Eugene Cheah,	(参考訳) 我々は,線形時間と空間において高圧縮・再利用可能なKVキャッシュを効率よく生成する新しい手法を用いたハイブリッド線形アテンション/トランスフォーマーシーケンスモデルGoldFinchを紹介する。 GoldFinchは、Finch(RWKV-6)アーキテクチャの拡張版の上に、新しいGOLDトランスフォーマーを積み重ねています。我々は、Finch、Llama、GoldFinchアーキテクチャの1.5Bパラメータクラスモデルをトレーニングし、FinchおよびLlamaと比較して、劇的に改善されたモデリング性能を見出した。キャッシュサイズ削減はモデル層数とともに線形的に増加し,従来型のトランスフォーマーキャッシュの756～2550倍の小型化が可能となり,限られたハードウェア上でも極めて大きなコンテキスト長の推測が可能となった。自己回帰生成はトークン毎のO(n)時間複雑性を持つが、このキャッシュを生成するためにリカレントニューラルネットワーク(RNN)を使用するため、送信されたコンテキストに対する初期キャッシュ状態全体のプリフィル計算はトークン毎のO(1)時間しかかからない。コミュニティ利用のためのApache 2.0ライセンスの下で、トレーニングされたウェイトとトレーニングコードをリリースしています。 We introduce GoldFinch, a hybrid Linear Attention/Transformer sequence model that uses a new technique to efficiently generate a highly compressed and reusable KV-Cache in linear time and space with respect to sequence length. GoldFinch stacks our new GOLD transformer on top of an enhanced version of the Finch (RWKV-6) architecture. We train up to 1.5B parameter class models of the Finch, Llama, and GoldFinch architectures, and find dramatically improved modeling performance relative to both Finch and Llama. Our cache size savings increase linearly with model layer count, ranging from 756-2550 times smaller than the traditional transformer cache for common sizes, enabling inference of extremely large context lengths even on limited hardware. Although autoregressive generation has O(n) time complexity per token because of attention, pre-fill computation of the entire initial cache state for a submitted context costs only O(1) time per token due to the use of a recurrent neural network (RNN) to generate this cache. We release our trained weights and training code under the Apache 2.0 license for community use.	
翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 非ガウス状態の絡み合い構造とその測定方法 Entanglement Structure of Non-Gaussian States and How to Measure It ( http://arxiv.org/abs/2407.12083v1 ) ライセンス: Link先を確認	Henry Froland, Torsten V. Zache, Robert Ott, Niklas Mueller,	(参考訳) 量子シミュレーターの急速に成長する量子多体現象を探索する能力は、ますます複雑な状態を特徴づける新しい方法を必要とする。本稿では,システムサイズと多項式的にしかスケールしない相関関数を実験的に測定することで,量子状態の制約を行うプロトコルを提案する。この方法は量子状態の絡み合い構造の測定を可能にし、絡み合いに関連する現象を研究するための新しい経路を開く。我々の手法は高次相関を体系的に組み込むことでガウス状態パラメータ化を拡張する。本稿では,提案プロトコルが現在および今後の実験能力とともに有用であることを示し,概念実証として弱い相互作用を持つフェルミオンに着目した。ここでは、最も低い非自明な展開は、ハミルトンの絡み合いによって示される量子カオスのオンセットのシグナルを含む初期の熱化ダイナミクスを定量的に予測する。 Rapidly growing capabilities of quantum simulators to probe quantum many-body phenomena require new methods to characterize increasingly complex states. We present a protocol that constrains quantum states by experimentally measured correlation functions which only scales polynomially with system size. This method enables measurement of a quantum state's entanglement structure, opening a new route to study entanglement-related phenomena. Our approach extends Gaussian state parameterizations by systematically incorporating higher-order correlations. We show the protocol's usefulness in conjunction with current and forthcoming experimental capabilities, focusing on weakly interacting fermions as a proof of concept. Here, the lowest non-trivial expansion quantitatively predicts early time thermalization dynamics, including signaling the on-set of quantum chaos indicated by the entanglement Hamiltonian.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# キャビティ埋め込みによる高品質なpoor man's マヨラナ束縛状態 High-quality poor man's Majorana bound states from cavity embedding ( http://arxiv.org/abs/2407.12088v1 ) ライセンス: Link先を確認	Álvaro Gómez-León, Marco Schirò, Olesia Dmytruk,	(参考訳) Poor man's マヨラナ束縛状態(MBS)は、パラメータがスイートスポットに微調整されたときに、最小のキタエフ鎖に現れる。単一モードキャビティに結合した相互作用のある2サイトのキタエフ鎖を考え, スイートスポット条件がキャビティ周波数とサイト間のホッピングによって制御可能であることを示す。さらに,光子を介した有効相互作用を用いて固有の相互作用を遮蔽し,MBSの本来の品質を向上できることを示す。MBSの存在と品質を検出するための,キャビティ透過における実験的なシグネチャを記述する。我々の研究は、キャビティに結合された量子ドットアレイでpoor man's MBSをチューニングする新しい方法を提案する。 Poor man's Majorana Bound States (MBS) arise in minimal Kitaev chains when the parameters are fine-tuned to a sweet spot. We consider an interacting two-site Kitaev chain coupled to a single-mode cavity and show that the sweet spot condition can be controlled with the cavity frequency and the hopping between sites. Furthermore, we demonstrate that photon-mediated effective interactions can be used to screen intrinsic interactions, improving the original quality of the MBS. We describe experimental signatures in the cavity transmission to detect their presence and quality. Our work proposes a new way to tune poor man's MBS in a quantum dot array coupled to a cavity.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 近藤効果における異方性の関係-シンプレクティックケースからの教訓- Relevance of Anisotropy in the Kondo Effect -- Lessons From the Symplectic Case ( http://arxiv.org/abs/2407.12093v1 ) ライセンス: Link先を確認	Matan Lotem, Sarath Sankar, Tianhao Ren, Moshe Goldstein, Elio. J. König, Andreas Weichselbaum, Eran Sela, Alexei M. Tsvelik,	(参考訳) シンプレクティック対称性を持つ近藤模型は, 複数のリードに結合した超伝導アイランドデバイスの有効低エネルギー理論として最近提案された。非フェルミ液体物理と有効エニオンを持つこのモデルは、トポロジカル近藤効果のクラスに属すると論じられた。ここでは、ボゾン化と共形場理論とともに摂動的および数値的くりこみ群を用いて、そのエキゾチックな不動点の安定性の程度を明らかにする。従来の主張とは対照的に、リードとの結合における非対称性が非フェルミ液体を不安定化することを示す。その他の不安定化をもたらす摂動には、超伝導ペアリングの非対称性や、アイランド内の個々の量子ドットの内部エネルギーの非対称性が含まれる。それでもこれらの摂動は、すべて同じレリバントな演算子を生成する。したがって、個別に調整する必要のある結合は少数にとどまり、これらは実験的な利便性に応じて選択できる。本結果は,近藤結合における異方性は常にイレリバントであるというよくある誤解を浮き彫りにしている。示したように、群の生成子が不純物演算子の全空間を張らないときには必ずレリバントな項が現れる。これは、大スピン不純物やSO(M)近藤モデルのような、この性質を示すモデルのより詳細な検討を求めるものである。 A Kondo model with symplectic symmetry was recently put forward as the effective low-energy theory of a superconducting-island device coupled to multiple leads. This model, which possesses non-Fermi liquid physics and effective anyons, was argued to belong to the class of topological Kondo effects. Here, we clarify the extent of stability of its exotic fixed point using perturbative and numerical renormalization group in conjunction with bosonization and conformal field theory. In contrast to previous claims, we show that asymmetry in the coupling to the leads destabilizes the non-Fermi liquid. Other destabilizing perturbations include asymmetry in the superconducting pairing or internal energy of the individual quantum dots in the island. Nevertheless, these perturbations all generate the same relevant operators. Thus, only a small number of couplings need to be tuned individually, and these can be selected according to experimental convenience. Our results highlight a common misconception that anisotropy in the Kondo coupling is always irrelevant. As demonstrated, relevant terms will emerge whenever the group generators do not span the full space of impurity operators. 
This calls for a more detailed inspection of models that exhibit this property, such as large-spin impurities and SO(M) Kondo models.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 対話文中の話者の識別:事前学習型言語モデルを用いたテキストベースアプローチ Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models ( http://arxiv.org/abs/2407.12094v1 ) ライセンス: Link先を確認	Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, Thien Huu Nguyen,	(参考訳) 本稿では,デジタルメディアアーカイブにおけるコンテンツアクセシビリティと検索可能性を高めるための重要な課題である,対話テキスト中の話者名同定手法を提案する。音声認識の進歩にもかかわらず、テキストベースの話者識別(SpeakerID)のタスクには、効果的なモデルトレーニングのための大規模で多様なデータセットが欠如している。これらのギャップに対処するために,メディアサムコーパスから派生した,幅広いメディアソースからの転写を含む,新しい大規模データセットを提案する。本稿では,話者名を正確に属性付けるために,対話中の文脈的手がかりを活用する,話者IDに適したトランスフォーマーモデルを提案する。広範囲な実験を通して、我々の最良のモデルは 80.3% の精度を達成し、SpeakerID のベンチマークを新たに設定する。データとコードはここで公開されている。 https://github.com/adobe-research/speaker-identification We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite the advancements in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a great precision of 80.3%, setting a new benchmark for SpeakerID. The data and code are publicly available here: https://github.com/adobe-research/speaker-identification	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 正則化ワッサースタイン距離を用いたシミュレーション出力分布の凝集型クラスタリング Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance ( http://arxiv.org/abs/2407.12100v1 ) ライセンス: Link先を確認	Mohammadmahdi Ghasemloo, David J. Eckman,	(参考訳) 本稿では,確率シミュレータが生成するデータに対するクラスタリング手法の適用について検討し,異常検出,事前最適化,オンラインモニタリングへの応用について述べる。正則化ワッサースタイン距離を用いて多変量の経験分布をクラスタリングする凝集型クラスタリングアルゴリズムを導入し,提案手法をコールセンタモデルに適用する。 We investigate the use of clustering methods on data produced by a stochastic simulator, with applications in anomaly detection, pre-optimization, and online monitoring. We introduce an agglomerative clustering algorithm that clusters multivariate empirical distributions using the regularized Wasserstein distance and apply the proposed methodology to a call-center model.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 関連情報ゲインを用いたRAGの改善 Better RAG using Relevant Information Gain ( http://arxiv.org/abs/2407.12101v1 ) ライセンス: Link先を確認	Marc Pickett, Jeremy Hartman, Ayan Kumar Bhowmick, Raquib-ul Alam, Aditya Vempaty,	(参考訳) 大きな言語モデル(LLM)のメモリを拡張する一般的な方法は、より大きなメモリから取得したテキストをLLMのコンテキストウィンドウに挿入する検索拡張生成(RAG)である。しかし、コンテキストウィンドウは通常数千のトークンに制限されており、モデルが応答したことを知らせる検索されたパスの数を制限する。このため,検索したパス間の多様性の度合いを確保することにより,冗長な情報によるコンテキストウィンドウの占有を回避することが重要である。同時に、情報は現在のタスクにも関係するべきです。 MMR(Maximal Marginal Relevance)のような、得られた結果の多様性を促進する最も以前の手法は、多様性と妥当性を明確に取り除く目的を組み込むことによって実現している。本稿では,検索結果の集合に対するクエリに関連する総情報の確率的尺度である,関連情報ゲインに基づく新しい単純な最適化指標を提案する。この計量を最適化することで、多様性は我々のシステムから有機的に現れる。 RAGシステムの検索コンポーネントのドロップイン置換として使用すると、RGB(Retrieval Augmented Generation Benchmark)から質問応答タスクの最先端性能が得られ、妥当性と多様性を直接最適化する既存の指標よりも優れる。 A common way to extend the memory of large language models (LLMs) is by retrieval augmented generation (RAG), which inserts text retrieved from a larger memory into an LLM's context window. However, the context window is typically limited to several thousand tokens, which limits the number of retrieved passages that can inform a model's response. For this reason, it's important to avoid occupying context window space with redundant information by ensuring a degree of diversity among retrieved passages. At the same time, the information should also be relevant to the current task. Most prior methods that encourage diversity among retrieved results, such as Maximal Marginal Relevance (MMR), do so by incorporating an objective that explicitly trades off diversity and relevance. We propose a novel simple optimization metric based on relevant information gain, a probabilistic measure of the total information relevant to a query for a set of retrieved results. By optimizing this metric, diversity organically emerges from our system. 
When used as a drop-in replacement for the retrieval component of a RAG system, this method yields state-of-the-art performance on question answering tasks from the Retrieval Augmented Generation Benchmark (RGB), outperforming existing metrics that directly optimize for relevance and diversity.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
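For context, the Maximal Marginal Relevance baseline mentioned in the entry above can be sketched as a greedy selection. This is the prior method the abstract contrasts against, not the paper's relevant information gain metric. The sketch assumes passages and the query are already embedded as vectors and uses cosine similarity; `mmr_select` and the trade-off weight `lam` are illustrative names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mmr_select(query, passages, k, lam=0.5):
    """Greedily pick k passage indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(passages)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, passages[i])
            # Redundancy is the similarity to the closest already-chosen passage.
            redundancy = max((cosine(passages[i], passages[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 1.0]
passages = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # passage 1 nearly duplicates passage 0
diverse = mmr_select(query, passages, k=2, lam=0.5)
relevance_only = mmr_select(query, passages, k=2, lam=1.0)
```

With `lam=1.0` the near-duplicate is chosen second; with `lam=0.5` the redundancy penalty pushes the selection to the less similar passage. The explicit weight `lam` is exactly the hand-tuned trade-off that the abstract says its metric avoids.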
# 大規模合成テキスト生成のためのプライベート予測 Private prediction for large-scale synthetic text generation ( http://arxiv.org/abs/2407.12108v1 ) ライセンス: Link先を確認	Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii,	(参考訳) 本稿では,大規模言語モデル (LLM) を用いた個人用テキスト生成手法を提案する。プライベートな予測フレームワークでは、差分プライバシー保証を満たすために出力された合成データのみを必要とする。これは、潜在的に敏感なユーザ供給ソースデータに対して生成モデルをトレーニングし、モデル自体が安全にリリースできるようにするアプローチとは対照的である。我々は、ソースデータで事前訓練されたLLMを起動するが、次の注意すべき予測が、異なるプライバシ保証で実行されることを保証する。このパラダイムの以前の研究は、適切なプライバシーレベルで少数の例(10)を生成したと報告していた。対照的に、私たちは何千もの高品質な合成データポイントを生成できるように変更し、潜在的なアプリケーションセットを大きく拡大します。我々の改善は、LLMのトークンをサンプリングするソフトマックス層と指数的なメカニズムとの等価性を活用することで、プライバシー分析の改善と、より優れたプライベート選択機構によって実現されている。さらに、機密データなしで予測可能なトークンに対して、プライバシコストを払わないスパースベクター手法によるパブリック予測を新たに導入し、構造化データに特に有効であることが判明した。 We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (<10) at reasonable privacy levels, an amount of data that is useful only for downstream in-context learning or prompting. In contrast, we make changes that allow us to generate thousands of high-quality synthetic data points, greatly expanding the set of potential applications. Our improvements come from an improved privacy analysis and a better private selection mechanism, which makes use of the equivalence between the softmax layer for sampling tokens in LLMs and the exponential mechanism. 
Furthermore, we introduce a novel use of public predictions via the sparse vector technique, in which we do not pay privacy costs for tokens that are predictable without sensitive data; we find this to be particularly effective for structured data.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
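The equivalence between softmax sampling and the exponential mechanism that the entry above relies on can be checked numerically. The sketch below assumes the logits play the role of the utility function with a known sensitivity; how the paper bounds that sensitivity (for example by clipping or aggregating logits) is not stated in the abstract, so `sensitivity` is simply a parameter here. Under that assumption, sampling at temperature `2 * sensitivity / epsilon` gives the same distribution as the exponential mechanism with privacy parameter `epsilon`.

```python
import math

def softmax(logits, temperature):
    """Token distribution from logits at a given sampling temperature."""
    shifted = [z / temperature for z in logits]
    top = max(shifted)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in shifted]
    total = sum(weights)
    return [w / total for w in weights]

def exponential_mechanism(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 0.5, -1.0]       # illustrative next-token logits
epsilon, sensitivity = 1.0, 2.0  # illustrative privacy parameter and utility sensitivity
temperature = 2.0 * sensitivity / epsilon

p_softmax = softmax(logits, temperature)
p_expmech = exponential_mechanism(logits, epsilon, sensitivity)
```

Read in this direction, a lower temperature corresponds to a larger per-token `epsilon`, which is why temperature acts as a privacy knob in this framing.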
# 公正なグラフ学習のためのベンチマーク A Benchmark for Fairness-Aware Graph Learning ( http://arxiv.org/abs/2407.12112v1 ) ライセンス: Link先を確認	Yushun Dong, Song Wang, Zhenyu Lei, Zaiyi Zheng, Jing Ma, Chen Chen, Jundong Li,	(参考訳) 公正なグラフ学習は近年注目を集めている。それにもかかわらず、さまざまな公正を意識したグラフ学習手法の評価と比較を行うための包括的なベンチマークが欠けているため、実践者がより広い現実世界のアプリケーションに適切なものを選択するのを妨げている。本稿では,10の代表的な公正性を考慮したグラフ学習手法に関する広範なベンチマークを示す。具体的には、グループフェアネス、個人フェアネス、異なるフェアネス基準間のバランス、計算効率など、複数の視点からこれらの手法を評価するために、体系的な評価プロトコルを設計し、7つの実世界のデータセット上で実験を行う。我々の詳細な分析は、既存の手法の強みと限界に関する重要な洞察を明らかにしている。さらに,フェアネスを考慮したグラフ学習手法を応用するための実践的ガイダンスを提供する。我々の知識を最大限に活用するために、本研究は、この分野の今後の進歩を促進するために、代表的公正を意識したグラフ学習手法を包括的に理解するための最初のステップとなる。 Fairness-aware graph learning has gained increasing attention in recent years. Nevertheless, there lacks a comprehensive benchmark to evaluate and compare different fairness-aware graph learning methods, which blocks practitioners from choosing appropriate ones for broader real-world applications. In this paper, we present an extensive benchmark on ten representative fairness-aware graph learning methods. Specifically, we design a systematic evaluation protocol and conduct experiments on seven real-world datasets to evaluate these methods from multiple perspectives, including group fairness, individual fairness, the balance between different fairness criteria, and computational efficiency. Our in-depth analysis reveals key insights into the strengths and limitations of existing methods. Additionally, we provide practical guidance for applying fairness-aware graph learning methods in applications. To the best of our knowledge, this work serves as an initial step towards comprehensively understanding representative fairness-aware graph learning methods to facilitate future advancements in this area.	翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
# 都市空調における信頼性・リアルタイムフリートスケジューリングのためのグラフベース逆模倣学習フレームワーク A Graph-based Adversarial Imitation Learning Framework for Reliable & Realtime Fleet Scheduling in Urban Air Mobility ( http://arxiv.org/abs/2407.12113v1 ) ライセンス: Link先を確認	Prithvi Poddar, Steve Paul, Souma Chowdhury,	(参考訳) UAM(Urban Air Mobility)の出現は、都市交通の領域における変革的シフトの範囲を示す。しかし、その普及と経済性は、空域の混雑、気象条件の変化、および様々な要求に起因する不確実性の下で、UAMネットワーク内のバーチポートを横断する航空機の艦隊を最適にスケジュールする能力に部分的に依存している。そこで本論文では, 整数型非線形計画問題の直接解法は, 日次スケジューリングでは計算が不可能であるため, フラッグスケジューリング問題の総合的な最適化を図りながら, 代替解法の必要性を同定する。従来の研究は、(グラフ)強化学習(RL)アプローチを用いて、艦隊スケジューリングのためのリアルタイム実行可能なポリシーモデルを訓練することの有効性を示した。しかし、そのようなポリシーは、アウト・オブ・ディストリビューションのシナリオやエッジのケースでは不安定であることが多い。さらに、問題の複雑さ(例えば制約の数)が増加するにつれて、トレーニングパフォーマンスも悪化する。これらの問題に対処するために,RLに基づくポリシーは,遺伝的アルゴリズムを用いて正確な最適化を解くことで得られる専門家の実証を活用できる模擬学習手法を提案する。ポリシーモデルは、バーティポートと航空機の空間を埋め込むグラフニューラルネットワーク(GNN)ベースのエンコーダ、需要、乗客運賃、輸送コストプロファイルをエンコードするトランスフォーマーネットワーク、マルチヘッドアテンション(MHA)ベースのデコーダを含む。専門家によるデモンストレーションは、GAIL(Generative Adversarial Imitation Learning)アルゴリズムを通じて行われている。 8機と40機からなるUAMシミュレーション環境と対話し、毎日の利益が報われるという観点から、新しい模倣アプローチは、純粋なRL結果と比較して、目に見えない最悪のシナリオの場合において、より良い平均性能と顕著な改善を達成する。 The advent of Urban Air Mobility (UAM) presents the scope for a transformative shift in the domain of urban transportation. However, its widespread adoption and economic viability depends in part on the ability to optimally schedule the fleet of aircraft across vertiports in a UAM network, under uncertainties attributed to airspace congestion, changing weather conditions, and varying demands. This paper presents a comprehensive optimization formulation of the fleet scheduling problem, while also identifying the need for alternate solution approaches, since directly solving the resulting integer nonlinear programming problem is computationally prohibitive for daily fleet scheduling. Previous work has shown the effectiveness of using (graph) reinforcement learning (RL) approaches to train real-time executable policy models for fleet scheduling. 
However, such policies can often be brittle on out-of-distribution scenarios or edge cases. Moreover, training performance also deteriorates as the complexity (e.g., number of constraints) of the problem increases. To address these issues, this paper presents an imitation learning approach where the RL-based policy exploits expert demonstrations yielded by solving the exact optimization using a Genetic Algorithm. The policy model comprises Graph Neural Network (GNN) based encoders that embed the space of vertiports and aircraft, Transformer networks to encode demand, passenger fare, and transport cost profiles, and a Multi-head attention (MHA) based decoder. Expert demonstrations are used through the Generative Adversarial Imitation Learning (GAIL) algorithm. Interfaced with a UAM simulation environment involving 8 vertiports and 40 aircraft, and measured by the daily-profit reward, the new imitative approach achieves better mean performance and remarkable improvement in the case of unseen worst-case scenarios, compared to pure RL results.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# Wigner関数のモーメントを用いた連続変数状態の非古典性の効率的な検出 Efficient detection of non-classicality of continuous variable states using moments of Wigner function ( http://arxiv.org/abs/2407.12116v1 ) ライセンス: Link先を確認	Bivas Mallick, Sudip Chakrabarty, Saheli Mukherjee, Ananda G. Maity, A. S. Majumdar,	(参考訳) 非古典状態の重要なサブクラスである負のウィグナー関数を持つ状態は、様々な量子情報処理タスクの貴重な資源として機能する。ここでは、負のウィグナー関数を示すような量子状態を検出するための基準を提供する。本手法は, 単純な関数を計算し, 完全な状態トモグラフィやウィグナー関数再構成を必要とせずに実実験で実装できるWigner関数のモーメントを評価することに依存する。次に、検出方式をサポートするための明示的な例を示す。さらに,連続変数SWAP演算子を用いて実実験でこれらのモーメントを実現する実験手法を提案する。 States with negative Wigner function, a significant subclass of non-classical states, serve as a valuable resource for various quantum information processing tasks. Here, we provide a criterion for detecting such quantum states exhibiting negative Wigner function. Our method relies on evaluating moments of the Wigner function which involves computing simple functionals and can be implemented in a real experiment without the need for full state tomography or Wigner function reconstruction. We then provide explicit examples to support our detection scheme. Further, we propose an experimental method utilizing the continuous variable SWAP operator to realize these moments in a real experiment.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 8GPU上で100万シーケンスの7B LLMを効率的にトレーニングする Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs ( http://arxiv.org/abs/2407.12117v1 ) ライセンス: Link先を確認	Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui,	(参考訳) 現在、LLM(Large Language Models)は、よりクリエイティブなアプリケーションを促進するために、拡張コンテキスト長を使用して訓練されている。しかし、長いコンテキストトレーニングはGPUメモリの制約を考慮すると大きな課題となる。トレーニング中にメモリ消費が相当に活性化されるだけでなく、メモリの断片化も生じる。長期のコンテキストトレーニングを容易にするため、既存のフレームワークでは、再計算や様々な形式の並列処理といった戦略を採用している。しかしながら、これらの手法は冗長な計算や広範囲な通信に依存しており、結果としてモデルFLOPS(MFU)が低くなる。本稿では,メモリ管理の微粒化を目的とした新しいLCMトレーニングフレームワークMEMOを提案する。 FlashAttentionを使用する場合、メモリの2次スケーリングとシーケンス長の線形スケーリングを考慮し、各レイヤの前方通過後にメモリ消費の活性化をCPUメモリにオフロードし、後方通過時にそれらをフェッチする。演算を邪魔することなくアクティベーションのスワップを最大化し、限られたCPUメモリの浪費を避けるため、トークン単位のアクティベーション再計算とスワップ機構を実装した。さらに,2レベル混合整数プログラミング(MIP)アプローチを採用し,トランスフォーマー層間のメモリ再利用を最適化することで,メモリ断片化の問題に取り組む。実験の結果、MEMOはMegatron-LMとDeepSpeedと比較して平均2.42倍、平均2.26倍のMFUを達成することが示された。この改善は、メモリの断片化を最小限に抑え、再計算と集中的な通信を減らし、断片化によるメモリ再編成プロセスに伴う遅延を回避できるMEMOの能力に起因している。きめ細かいアクティベーションメモリ管理を活用することで、MEMOはわずか8A800 GPU上で100万のシーケンス長を持つ7B LLMの効率的なトレーニングを可能にし、52.30%のMFUを達成する。 Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing frameworks have adopted strategies such as recomputation and various forms of parallelisms. Nevertheless, these techniques rely on redundant computation or extensive communication, resulting in low Model FLOPS Utilization (MFU). In this paper, we propose MEMO, a novel LLM training framework designed for fine-grained activation memory management. 
Given the quadratic scaling of computation and linear scaling of memory with sequence lengths when using FlashAttention, we offload memory-consuming activations to CPU memory after each layer's forward pass and fetch them during the backward pass. To maximize the swapping of activations without hindering computation, and to avoid exhausting limited CPU memory, we implement a token-wise activation recomputation and swapping mechanism. Furthermore, we tackle the memory fragmentation issue by employing a bi-level Mixed Integer Programming (MIP) approach, optimizing the reuse of memory across transformer layers. Empirical results demonstrate that MEMO achieves an average of 2.42x and 2.26x MFU compared to Megatron-LM and DeepSpeed, respectively. This improvement is attributed to MEMO's ability to minimize memory fragmentation, reduce recomputation and intensive communication, and circumvent the delays associated with the memory reorganization process due to fragmentation. By leveraging fine-grained activation memory management, MEMO facilitates efficient training of 7B LLM with 1 million sequence length on just 8 A800 GPUs, achieving an MFU of 52.30%.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# FoodMem:リアルタイムと精密なフードビデオセグメンテーション FoodMem: Near Real-time and Precise Food Video Segmentation ( http://arxiv.org/abs/2407.12121v1 ) ライセンス: Link先を確認	Ahmad AlMughrabi, Adrián Galán, Ricardo Marques, Petia Radeva,	(参考訳) ビデオを含む食品のセグメンテーションは、現実世界の健康、農業、食品バイオテクノロジーの問題に対処するために不可欠である。現在の制限は、不正確な栄養分析、非効率な作物管理、最適な食品加工につながり、食料安全保障と公衆衛生に影響を及ぼす。セグメンテーション技術の改善は、食物アセスメント、農業生産性、および食品生産プロセスを向上させることができる。本研究では、最小限のハードウェアリソースを用いて、高品質でほぼリアルタイムなセグメンテーションとビデオ内の食品の追跡のための堅牢なフレームワークの開発を紹介する。私たちは、360度無境界シーンのビデオシーケンスから食品を分割する新しいフレームワーク、FoodMemを紹介します。 FoodMemは、ビデオ処理コンテキストにおけるフリッカリングや禁止推論速度といった、既存のセマンティックセグメンテーションモデルの制限を克服して、ビデオシーケンス内の食品部分のマスクを一貫して生成することができる。これらの問題に対処するため、FoodMemは、トランスフォーマーセグメンテーションフェーズを使用して、初期セグメンテーションマスクと、複雑なシーンにおけるフードマスクを監視するメモリベースのトラッキングフェーズを生成する。われわれのフレームワークは、現在の最先端食品セグメンテーションモデルより優れており、カメラアングル、照明、反射、シーンの複雑さ、食品の多様性など、様々な条件で優れたパフォーマンスが得られる。これにより、セグメンテーションノイズの低減、アーティファクトの除去、欠落セグメントの完成が実現される。ここでは、以前のベンチマークにない挑戦的なシナリオを含む、新しい注釈付き食品データセットについても紹介する。 Nutrition5k と Vegetables & Fruits のデータセットで実施された大規模な実験は、FoodMem が食品ビデオのセグメンテーションにおける平均精度を2.5%向上し、平均で58倍高速であることを示した。 Food segmentation, including in videos, is vital for addressing real-world health, agriculture, and food biotechnology issues. Current limitations lead to inaccurate nutritional analysis, inefficient crop management, and suboptimal food processing, impacting food security and public health. Improving segmentation techniques can enhance dietary assessments, agricultural productivity, and the food production process. This study introduces the development of a robust framework for high-quality, near-real-time segmentation and tracking of food items in videos, using minimal hardware resources. We present FoodMem, a novel framework designed to segment food items from video sequences of 360-degree unbounded scenes. 
FoodMem can consistently generate masks of food portions in a video sequence, overcoming the limitations of existing semantic segmentation models, such as flickering and prohibitive inference speeds in video processing contexts. To address these issues, FoodMem leverages a two-phase solution: a transformer segmentation phase to create initial segmentation masks and a memory-based tracking phase to monitor food masks in complex scenes. Our framework outperforms current state-of-the-art food segmentation models, yielding superior performance across various conditions, such as camera angles, lighting, reflections, scene complexity, and food diversity. This results in reduced segmentation noise, elimination of artifacts, and completion of missing segments. Here, we also introduce a new annotated food dataset encompassing challenging scenarios absent in previous benchmarks. Extensive experiments conducted on Nutrition5k and Vegetables & Fruits datasets demonstrate that FoodMem enhances the state-of-the-art by 2.5% mean average precision in food video segmentation and is 58x faster on average.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# LLMs-in-the-loop Part-1:バイオメディカルテキスト翻訳のためのエキスパート・スモールAIモデル LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation ( http://arxiv.org/abs/2407.12126v1 ) ライセンス: Link先を確認	Bunyamin Keles, Murat Gunay, Serdar I. Caglar,	(参考訳) 機械翻訳は、言語にまたがる医療知識のグローバルな普及を可能にするために、医療において不可欠である。しかし、複雑な医学用語は、適切な翻訳品質と精度を達成するために固有の課題を生んでいる。本研究では,医療用テキストに最適化された教師ありニューラルマシン翻訳モデルを開発するために,新しい"LLMs-in-the-loop"アプローチを提案する。大規模言語モデル(LLM)は強力な能力を示しているが、この研究は、高品質なドメイン(主に合成された)データに基づいて訓練された小さな特殊なモデルの方が、さらに大きなLLMよりも優れていることを示している。 6つの言語での独自の平行コーパスは、科学論文、人工的に生成された臨床文書、医療文書から編纂された。 LLM-in-the-loop法では,データ生成,厳密な評価,エージェントオーケストレーションを用いて性能を向上させる。 MarianMTベースモデルを用いた小さな医療用翻訳モデルを開発した。この領域での評価を標準化するための新しい医療翻訳試験データセットを導入する。このテストセットでBLEU、METEOR、ROUGE、BERTのスコアを用いて評価すると、MarianMTベースのモデルはGoogle Translate、DeepL、GPT-4-Turboより優れています。その結果、LLM-in-the-loopアプローチと、微調整された高品質なドメイン固有データを組み合わせることで、汎用システムや大規模システムよりも優れた性能を発揮することが示された。この研究は、専門家の小さなモデルに関するより広範なシリーズの一部であり、身元特定やバイオメディカルな実体抽出モデルを含む、将来の医療関連AI開発への道を開く。本研究は,データ生成,評価,エージェント,モデリング技術の改善を通じて,ニューラルネットワークモデルの改良とLLM-in-the-loop法の可能性を明らかにする。 Machine translation is indispensable in healthcare for enabling the global dissemination of medical knowledge across languages. However, complex medical terminology poses unique challenges to achieving adequate translation quality and accuracy. This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized specifically for medical texts. While large language models (LLMs) have demonstrated powerful capabilities, this research shows that small, specialized models trained on high-quality in-domain (mostly synthetic) data can outperform even vastly larger LLMs. Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts. Our LLMs-in-the-loop methodology employs synthetic data generation, rigorous evaluation, and agent orchestration to enhance performance. 
We developed small medical translation models using the MarianMT base model. We introduce a new medical translation test dataset to standardize evaluation in this domain. Assessed using BLEU, METEOR, ROUGE, and BERT scores on this test set, our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo. Results demonstrate that our LLMs-in-the-loop approach, combined with fine-tuning on high-quality, domain-specific data, enables specialized models to outperform general-purpose and some larger systems. This research, part of a broader series on expert small models, paves the way for future healthcare-related AI developments, including deidentification and bio-medical entity extraction models. Our study underscores the potential of tailored neural translation models and the LLMs-in-the-loop methodology to advance the field through improved data generation, evaluation, agent, and modeling techniques.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 動的オンラインデータストリームを用いた完全テスト時間適応のための配電アライメント Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams ( http://arxiv.org/abs/2407.12128v1 ) ライセンス: Link先を確認	Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos Plataniotis, Yang Wang,	(参考訳) ソースデータに基づいてトレーニングされたモデルによって、テスト時間適応(TTA)は、ソースからのドメインシフトを伴うテストデータストリームの適応と推論を可能にする。現在の手法では、自己学習損失を使用して、入ってくるテストデータバッチ毎にモデルを最適化している。これらの手法は、バッチが独立して、ターゲットの分布から同一にサンプリングされる理想的なテストデータストリームに変換可能な結果をもたらすが、より実用的なテストデータストリームは、独立で、同一に分散されていない(非i.d.)。非i.d.ストリームのデータバッチは、相互に顕著なラベルシフトを表示する。 TTAプロセスの間、バッチ間で最適化の目標が矛盾することになります。ソースモデルを予測不能なテスト時間分布に適応させる固有のリスクを考慮し、適応過程を逆転させ、新しいTTA分布アライメント損失を提案する。これにより、十分に訓練されたソースモデルとの互換性を確保し、矛盾する最適化目標に関連する落とし穴を取り除くことができる。さらに、連続的なドメインシフトシナリオにおいて、提案したTTA手法の成功を拡大するためのドメインシフト検出機構を考案する。本研究では,本手法の論理と有効性を検証した。 6つのベンチマークデータセットでは、非i.d.シナリオで既存の手法を超越し、理想的なi.d.仮定の下で競争性能を維持する。 Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from the target distribution, they falter under more practical test data streams that are not independent and identically distributed (non-i.i.d.). The data batches in a non-i.i.d. stream display prominent label shifts relative to each other. It leads to conflicting optimization objectives among batches during the TTA process. Given the inherent risks of adapting the source model to unpredictable test-time distributions, we reverse the adaptation process and propose a novel Distribution Alignment loss for TTA. 
This loss guides the distributions of test-time features back towards the source distributions, which ensures compatibility with the well-trained source model and eliminates the pitfalls associated with conflicting optimization objectives. Moreover, we devise a domain shift detection mechanism to extend the success of our proposed TTA method in the continual domain shift scenarios. Our extensive experiments validate the logic and efficacy of our method. On six benchmark datasets, we surpass existing methods in non-i.i.d. scenarios and maintain competitive performance under the ideal i.i.d. assumption.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
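The idea of pulling test-time feature distributions back towards the source distribution can be sketched as a moment-matching penalty. The abstract does not give the exact form of the paper's Distribution Alignment loss, so this is only one plausible instance, assuming per-dimension source means and variances were stored after training; `feature_stats` and `alignment_loss` are hypothetical names.

```python
def feature_stats(batch):
    """Per-dimension mean and (biased) variance of a batch of feature vectors."""
    n, d = len(batch), len(batch[0])
    means = [sum(row[k] for row in batch) / n for k in range(d)]
    variances = [sum((row[k] - means[k]) ** 2 for row in batch) / n for k in range(d)]
    return means, variances

def alignment_loss(batch, source_means, source_vars):
    """Squared gap between test-batch moments and stored source moments."""
    means, variances = feature_stats(batch)
    mean_gap = sum((m - s) ** 2 for m, s in zip(means, source_means))
    var_gap = sum((v - s) ** 2 for v, s in zip(variances, source_vars))
    return (mean_gap + var_gap) / len(means)

source_batch = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
source_means, source_vars = feature_stats(source_batch)

# A test batch with the same spread but a shifted mean, standing in for domain shift.
shifted_batch = [[x + 1.5 for x in row] for row in source_batch]

loss_same = alignment_loss(source_batch, source_means, source_vars)
loss_shifted = alignment_loss(shifted_batch, source_means, source_vars)
```

Because the target of this penalty is the fixed source statistics rather than each batch's own pseudo-labels, every batch is pushed in the same direction, which is the property the abstract credits with avoiding conflicting objectives under non-i.i.d. streams.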
# ゲームとブロックチェーン: ハイプと現実 Gaming and Blockchain: Hype and Reality ( http://arxiv.org/abs/2407.12134v1 ) ライセンス: Link先を確認	Max McGuinness,	(参考訳) 本稿では,ゲーム産業におけるブロックチェーン技術の導入について考察する。支持者は、分散台帳技術はゲーム経済に革命をもたらし、プレイヤーに仮想資産のコントロールを提供する可能性があると断言する一方で、エネルギー消費やユーザの採用といった現実的な課題に対処し、デトラクタはブロックチェーンの統合がさらに必要かどうかを疑問視している。このレポートでは、EnjinやAxie Infinityといった一般的なブロックチェーンベースのゲームプロジェクトを特徴付け、トランザクションコストやプレイヤーのフィードバックといったメトリクスを比較して、ブロックチェーン統合ゲーム全体の長寿を評価する。 This paper explores the adoption of blockchain technology in the gaming industry. While supporters affirm that distributed ledger technology has potential to revolutionize gaming economies and provide players with control over their virtual assets, there are practical challenges such as energy consumption and user adoption to be addressed, and detractors question whether blockchain integration is even necessary. This report characterises popular blockchain-based gaming projects like Enjin and Axie Infinity, then compares metrics such as transaction cost and player feedback to evaluate the longevity of blockchain-integrated gaming as a whole.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 分子トポロジープロファイル(MOLTOP) -- 分子グラフ分類のための単純で強力なベースライン Molecular Topological Profile (MOLTOP) -- Simple and Strong Baseline for Molecular Graph Classification ( http://arxiv.org/abs/2407.12136v1 ) ライセンス: Link先を確認	Jakub Adamczyk, Wojciech Czech,	(参考訳) 分子グラフ分類におけるトポロジカル記述子の有効性を再検討し、単純で強力なベースラインを設計する。本稿では,エッジディスクリプタのヒストグラムアグリゲーションと原子番号と結合型のワンホットエンコーディングを併用した機能工学への簡単なアプローチが,ランダムフォレスト分類器と組み合わせることで,グラフニューラルネットワーク(GNN)の強力なベースラインを確立することを実証する。新たなアルゴリズムである分子トポロジカルプロファイル(MOLTOP)は、エッジ間の中央性、調整されたランダムインデックス、SCAN構造類似度スコアを統合している。このアプローチは、現代的なGNNと比較して、非常に競争力がある一方で、単純で、高速で、低分散で、ハイパーパラメータフリーであることを示す。提案手法は, Open Graph Benchmark による公正な評価プロトコルを用いて, MoleculeNet データセット上で厳密に検証されている。また、Long Range Graph Benchmarkのペプチド分類タスクにおいて、ドメインのアウトオブドメイン生成機能を示す。 11のベンチマークデータセットに対する評価では、MOLTOPの強力な識別能力が、グラフのクラスで1ドル=WLテスト、さらに3ドル=WLテストを超えていることが明らかになった。我々の結論は、GNNドメインの進歩を正確に評価するためには、記述子ベースのベースライン(例えば、提案するもの)が依然として不可欠であるということだ。 We revisit the effectiveness of topological descriptors for molecular graph classification and design a simple, yet strong baseline. We demonstrate that a simple approach to feature engineering - employing histogram aggregation of edge descriptors and one-hot encoding for atomic numbers and bond types - when combined with a Random Forest classifier, can establish a strong baseline for Graph Neural Networks (GNNs). The novel algorithm, Molecular Topological Profile (MOLTOP), integrates Edge Betweenness Centrality, Adjusted Rand Index and SCAN Structural Similarity score. This approach proves to be remarkably competitive when compared to modern GNNs, while also being simple, fast, low-variance and hyperparameter-free. Our approach is rigorously tested on MoleculeNet datasets using fair evaluation protocol provided by Open Graph Benchmark. We additionally show out-of-domain generation capabilities on peptide classification task from Long Range Graph Benchmark. 
The evaluations across eleven benchmark datasets reveal MOLTOP's strong discriminative capabilities, surpassing the $1$-WL test and even $3$-WL test for some classes of graphs. Our conclusion is that descriptor-based baselines, such as the one we propose, are still crucial for accurately assessing advancements in the GNN domain.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 開腹手術における整形外科器具の単眼的ポーズ推定 Monocular pose estimation of articulated surgical instruments in open surgery ( http://arxiv.org/abs/2407.12138v1 ) ライセンス: Link先を確認	Robert Spektor, Tom Friedman, Itay Or, Gil Bolotin, Shlomi Laufer,	(参考訳) 本研究は, 開腹手術における手術器具の単眼6Dポーズ推定に対する新しいアプローチとして, 物体調音, 対称性, 閉塞, 注釈付き実世界のデータの欠如といった課題に対処する。この手法は、これらの障害を克服するために合成データ生成とドメイン適応技術を利用する。提案手法は,(1)調音リギングと物理的レンダリングを用いた外科的ツールの3次元モデリングを用いた合成データ生成,(2)ポーズ推定とハイブリッドな幾何学的融合戦略を組み合わせた適切なポーズ推定フレームワーク,(3)合成データと実際の注釈データの両方を利用したトレーニング戦略,および(3)自動生成擬似ラベルを用いた実ビデオデータへのドメイン適応を用いたトレーニング戦略からなる。オープン手術の映像で行った評価は,提案手法の優れた性能と実世界の応用性を示し,医療用拡張現実およびロボットシステムへの統合の可能性を強調した。このアプローチは、実際の外科的データの広範な手動アノテーションを不要にする。 This work presents a novel approach to monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, symmetries, occlusions, and lack of annotated real-world data. The method leverages synthetic data generation and domain adaptation techniques to overcome these obstacles. The proposed approach consists of three main components: (1) synthetic data generation using 3D modeling of surgical tools with articulation rigging and physically-based rendering; (2) a tailored pose estimation framework combining object detection with pose estimation and a hybrid geometric fusion strategy; and (3) a training strategy that utilizes both synthetic and real unannotated data, employing domain adaptation on real video data using automatically generated pseudo-labels. Evaluations conducted on videos of open surgery demonstrate the good performance and real-world applicability of the proposed method, highlighting its potential for integration into medical augmented reality and robotic systems. The approach eliminates the need for extensive manual annotation of real surgical data.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 傾斜クエンチ量子相転移における長距離相互作用と雑音の競合--長距離ペアの北エフ鎖の場合- Competition of long-range interactions and noise at ramped quench dynamical quantum phase transition: The case of the long-range pairing Kitaev chain ( http://arxiv.org/abs/2407.12140v1 ) ライセンス: Link先を確認	R. Baghran, R. Jafari, A. Langari,	(参考訳) 動的量子相転移 (DQPTs) のフレームワークにおいて, ノイズのない線形時間依存性の化学ポテンシャルを持つ長距離ペアリング北エフモデルの非平衡ダイナミクスについて検討した。短距離ペアリング北エフモデルでは1つの臨界時間尺度が示され、長距離ペアリングは3つのDQPT時間尺度を持つ領域を誘導する。 3つのDQPT時間スケールを持つ領域はノイズの存在下で縮小することがわかった。さらに,DQPTが消滅する臨界スイープ速度であるクエンチが2つの臨界点を横切ることを明らかにした。数値シミュレーションに基づいて,雑音が長距離ペアリング誘導を減少させることを示した。 The nonequilibrium dynamics of the long-range pairing Kitaev model with a noiseless or noisy linear time-dependent chemical potential is investigated in the framework of dynamical quantum phase transitions (DQPTs). We show that when the ramp crosses a single quantum critical point, the short-range pairing Kitaev model displays a single critical time scale, whereas long-range pairing induces a region with three DQPT time scales. We find that the region with three DQPT time scales shrinks in the presence of noise. In addition, for a quench that crosses two critical points, we find that the critical sweep velocity above which the DQPTs disappear is enhanced by the long-range pairing exponent, while it decreases in the presence of noise. On the basis of numerical simulations, we show that noise diminishes the effects induced by long-range pairing.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# ポーランドの政治文における感情強度の予測:資源不足言語における教師付きモデルと大規模言語モデルの比較 Predicting Emotion Intensity in Polish Political Texts: Comparing Supervised Models and Large Language Models in a Resource-Poor Language ( http://arxiv.org/abs/2407.12141v1 ) ライセンス: Link先を確認	Hubert Plisiecki, Piotr Koc, Maria Flakus, Artur Pokropek,	(参考訳) 本研究では,ポーランドの政治文における感情強度の予測に大規模言語モデル(LLM)を用いることを検討した。この研究は、専門家による感情の強さを評価するために、1万のソーシャルメディアテキストの注釈付きコーパスで訓練された教師付きモデルと比較した。これらの結果から, 教師付きモデルはLLMよりも優れ, 精度が高く, 分散度も低いが, LLMは特にデータアノテーションに関連するコストが高いため, 有効な代替手段となることが示唆された。この研究は、低リソース言語設定におけるLLMの可能性を強調し、感情の強度予測とその異なる言語と連続した特徴に対するさらなる研究の必要性を強調している。これらの意味は、リソースの可用性とタスクの特定の要求に基づいて、研究者や実践者にとっての感情予測への正しいアプローチを選択するための、曖昧な意思決定プロセスが示唆されている。 This study explores the use of large language models (LLMs) to predict emotion intensity in Polish political texts, a resource-poor language context. The research compares the performance of several LLMs against a supervised model trained on an annotated corpus of 10,000 social media texts, evaluated for the intensity of emotions by expert judges. The findings indicate that while the supervised model generally outperforms LLMs, offering higher accuracy and lower variance, LLMs present a viable alternative, especially given the high costs associated with data annotation. The study highlights the potential of LLMs in low-resource language settings and underscores the need for further research on emotion intensity prediction and its application across different languages and continuous features. The implications suggest a nuanced decision-making process to choose the right approach to emotion prediction for researchers and practitioners based on resource availability and the specific requirements of their tasks.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 投票結果の予測でオンライン関係を超越した物理パルチザン Physical partisan proximity outweighs online ties in predicting US voting outcomes ( http://arxiv.org/abs/2407.12146v1 ) ライセンス: Link先を確認	Marco Tonin, Bruno Lepri, Michele Tizzoni,	(参考訳) 影響力のある分極と社会的分裂の増大は、社会の混合や、オンラインや物理的な空間における情報の拡散、社会的・選挙的分裂の強化、政治的結果への影響に影響を及ぼす。ここでは、集約された非特定コロケーションとオンラインネットワークデータを用いて、米国におけるパルチザン曝露と投票パターンの関係を、同じ社会的文脈への物理的近接と露出、オンライン社会関係、住宅分類の3次元で比較検討する。様々な統計的モデリングアプローチを活用することで、コロケーションパターンによって捉えられた物理空間におけるパルチザン露光が、米国郡の選挙結果をより正確に予測し、大都市圏や非都市圏でのオンライン・住宅露光よりも優れていたことを一貫して見出す。さらに, 投票結果が不確実なスウィング郡では, 投票パターンの予測に物理パルチザンが最適であることが示唆された。また、郡レベルの経験者分離を推定し、個人の人口動態と社会経済特性との関係について検討した。本稿は、大都市圏を中心に、米国における大規模なパルチザン分離の存在を確認し、オフラインのパルチザン分離が、物理的出会いや住宅の選別の両方を考慮して、オンラインのセグメンテーションよりも高く、主に教育的達成と関連していることを示す。本研究は,ソーシャル・ネットワークと政治行動の関係を理解する上での物理空間の重要性を強調し,オンライン・ソーシャルネットワークと選挙に焦点を絞った厳しい精査とは対照的である。 Affective polarization and increasing social divisions affect social mixing and the spread of information across online and physical spaces, reinforcing social and electoral cleavages and influencing political outcomes. Here, using aggregated and de-identified co-location and online network data, we investigate the relationship between partisan exposure and voting patterns in the USA by comparing three dimensions of partisan exposure: physical proximity and exposure to the same social contexts, online social ties, and residential sorting. By leveraging various statistical modeling approaches, we consistently find that partisan exposure in the physical space, as captured by co-location patterns, more accurately predicts electoral outcomes in US counties, outperforming online and residential exposures across metropolitan and non-metro areas. Moreover, our results show that physical partisan proximity is the best predictor of voting patterns in swing counties, where the election results are most uncertain. 
We also estimate county-level experienced partisan segregation and examine its relationship with individuals' demographic and socioeconomic characteristics. Focusing on metropolitan areas, our results confirm the presence of extensive partisan segregation in the US and show that offline partisan isolation, whether measured by physical encounters or by residential sorting, is higher than online segregation and is primarily associated with educational attainment. Our findings emphasize the importance of physical space in understanding the relationship between social networks and political behavior, in contrast to the intense scrutiny focused on online social networks and elections.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
# 生物活性予測のための深層化学言語処理のためのヒッチハイカーガイド A Hitchhiker's Guide to Deep Chemical Language Processing for Bioactivity Prediction ( http://arxiv.org/abs/2407.12152v1 ) ライセンス: Link先を確認	Rıza Özçelik, Francesca Grisoni,	(参考訳) 深層学習は薬物発見を著しく加速させ、顕著なアプローチとして「化学言語」処理(CLP)が出現した。 CLPは、分子文字列表現(例えば、Simplified Molecular Input Line Entry Systems(SMILES)とSelf-Reference Embedded Strings(SELFIES))から、自然言語処理に似たメソッドで学習する。その重要性は増しているが、予測型CLPモデルは、多くの「鐘と笛」を含むため、決して自明ではない。ここでは,CLPトレーニングの重要な要素を分析し,新参者や専門家のガイドラインを提供する。我々の研究は、分類と回帰の両方のために、3つのニューラルネットワークアーキテクチャ、2つの文字列表現、3つの埋め込み戦略、10の生物活性データセットにまたがる。この「ヒッチハイカーのガイド」は、特定の方法論的選択の重要性を浮き彫りにするだけでなく、ニューラルネットワークアーキテクチャ、分子表現、ハイパーパラメータ最適化といった、理想的な選択に関する実践的な勧告を研究者に与えている。 Deep learning has significantly accelerated drug discovery, with 'chemical language' processing (CLP) emerging as a prominent approach. CLP learns from molecular string representations (e.g., Simplified Molecular Input Line Entry Systems [SMILES] and Self-Referencing Embedded Strings [SELFIES]) with methods akin to natural language processing. Despite their growing importance, training predictive CLP models is far from trivial, as it involves many 'bells and whistles'. Here, we analyze the key elements of CLP training, to provide guidelines for newcomers and experts alike. Our study spans three neural network architectures, two string representations, three embedding strategies, across ten bioactivity datasets, for both classification and regression purposes. This 'hitchhiker's guide' not only underscores the importance of certain methodological choices, but it also equips researchers with practical recommendations on ideal choices, e.g., in terms of neural network architectures, molecular representations, and hyperparameter optimization.	翻訳日:2024-07-18 19:18:21 公開日:2024-07-16
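As a concrete picture of what treating molecules as a "chemical language" involves, the sketch below splits SMILES strings into tokens and maps them to integer ids, the usual first step before an embedding layer. The regular expression is a simplified pattern that covers bracket atoms, the two-letter halogens Cl and Br, ring-closure digits, and common bond and branch symbols; it is an assumption for illustration, not the tokenizer used in the paper, and `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical names.

```python
import re

# Order matters: bracket atoms and two-letter elements must be tried before single letters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\@.:~*$]")

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, rejecting characters the pattern misses."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens

def build_vocab(corpus):
    """Assign an integer id to every token seen in the corpus; id 0 is padding."""
    vocab = {"<pad>": 0}
    for smiles in corpus:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(smiles, vocab, length):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab[t] for t in tokenize_smiles(smiles)][:length]
    return ids + [vocab["<pad>"]] * (length - len(ids))

acetyl_chloride = tokenize_smiles("CC(=O)Cl")
nitrobenzene = tokenize_smiles("c1ccccc1[N+](=O)[O-]")
vocab = build_vocab(["CCO", "CCl"])
encoded = encode("CCl", vocab, 5)
```

Token-level splitting keeps `Cl` and bracket atoms such as `[N+]` as single symbols, whereas a purely character-level split would break them apart; which granularity works better is the kind of methodological choice the guide sets out to evaluate.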
# Parity-deformed $su(2)$ and $so(3)$ Algebras: a Basis for Quantum Optics and Quantum Communications Applications ( http://arxiv.org/abs/2407.12157v1 ) License: see link	W. S. Chung, H. Hassanabadi, L. M. Nieto, S. Zarrinkamar	Having in mind the significance of parity (reflection) in various areas of physics, the single-mode and two-mode Wigner algebras are considered with a reflection operator added to them. The associated deformed $su(2)$ algebra, $su_{\nu}(2)$, and the deformed $so(3)$ algebra, $so_{\nu}(3)$, are constructed for the widely used Jordan-Schwinger and Holstein-Primakoff realizations, commenting on various aspects and ingredients of the formalism for both single-mode and two-mode cases. Finally, in this framework the parity-deformed $so_{\nu}(3)$ representation is analyzed, due to its potential application in the study of qubit and qutrit systems.	Translated: 2024-07-18 19:18:21 Published: 2024-07-16
# The IoT Breaches your Household Again ( http://arxiv.org/abs/2407.12159v1 ) License: see link	Davide Bonaventura, Sergio Esposito, Giampaolo Bella	Despite their apparent simplicity, devices like smart light bulbs and electrical plugs are often perceived as exempt from rigorous security measures. However, this paper challenges this misconception, uncovering how vulnerabilities in these seemingly innocuous devices can expose users to significant risks. This paper extends the findings outlined in previous work, introducing a novel attack scenario. This new attack allows malicious actors to obtain sensitive credentials, including the victim's Tapo account email and password, as well as the SSID and password of her local network. Furthermore, we demonstrate how these findings can be replicated, either partially or fully, across other smart devices within the same IoT ecosystem, specifically those manufactured by Tp-Link. Our investigation focused on the Tp-Link Tapo range, encompassing smart bulbs (Tapo L530E, Tapo L510E V2, and Tapo L630), a smart plug (Tapo P100), and a smart camera (Tapo C200). Utilizing similar communication protocols, or slight variants thereof, we found that the Tapo L530E, Tapo L510E V2, and Tapo L630 are susceptible to complete exploitation of all attack scenarios, including the newly identified one. Conversely, the Tapo P100 and Tapo C200 exhibit vulnerabilities to only a subset of attack scenarios. In conclusion, by highlighting these vulnerabilities and their potential impact, we aim to raise awareness and encourage proactive steps towards mitigating security risks in smart device deployment.	Translated: 2024-07-18 19:18:21 Published: 2024-07-16
# Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent ( http://arxiv.org/abs/2407.12161v1 ) License: see link	Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake Richards, Irina Rish, Özgür Şimşek	Understanding the mechanisms behind decisions taken by large foundation models in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task, crafting a diamond pickaxe. The agent pays attention to the last four frames and several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Secondly, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager is positioned stationary under green tree leaves, and punches it to death.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# Bellman Diffusion Models ( http://arxiv.org/abs/2407.12163v1 ) License: see link	Liam Schramm, Abdeslam Boularias	Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
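The abstract above does not spell out the Bellman flow constraint, so the following is only a tabular sketch of the underlying object, under stated assumptions: a finite state space, a fixed policy folded into a transition matrix `P`, and the normalized convention in which the current state counts at time zero. The function name `successor_measure` is hypothetical, and the paper's diffusion-based update is not reproduced here.

```python
def successor_measure(P, gamma=0.9, iters=500):
    """Fixed-point iteration M <- (1 - gamma) * I + gamma * P @ M.

    P[s][k] is the probability of moving from state s to state k under the
    policy. M[s][t] converges to the discounted, normalized probability mass
    that a trajectory started in s places on state t.
    """
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        M = [
            [
                (1.0 - gamma) * (1.0 if s == t else 0.0)
                + gamma * sum(P[s][k] * M[k][t] for k in range(n))
                for t in range(n)
            ]
            for s in range(n)
        ]
    return M
```

For a two-state cycle `P = [[0, 1], [1, 0]]`, each row of the result sums to one and the mass on the starting state tends to `1 / (1 + gamma)`, which is a quick sanity check on the recursion.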
# Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning ( http://arxiv.org/abs/2407.12164v1 ) License: see link	Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li	Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant progress in this area. Yet, both approaches primarily focus on enhancing similarity to reference images and require expensive setups, often overlooking the need for efficient training and avoiding overfitting to the reference images. In this work, we present the $\lambda$-Harmonic reward function, which provides a reliable reward signal and enables early stopping for faster training and effective regularization. Combined with the Bradley-Terry preference model, the $\lambda$-Harmonic reward function also provides preference labels for subject-driven generation tasks. We propose Reward Preference Optimization (RPO), which offers a simpler setup (requiring only $3\%$ of the negative samples used by DreamBooth) and fewer gradient steps for fine-tuning. Unlike most existing methods, our approach does not require training a text encoder or optimizing text embeddings and achieves text-image alignment by fine-tuning only the U-Net component. Empirically, $\lambda$-Harmonic proves to be a reliable approach for model selection in subject-driven generation tasks. Based on preference labels and early stopping validation from the $\lambda$-Harmonic reward function, our algorithm achieves a state-of-the-art CLIP-I score of 0.833 and a CLIP-T score of 0.314 on DreamBench.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
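The abstract names a $\lambda$-Harmonic reward and the Bradley-Terry model but does not define the reward, so the sketch below makes a loud assumption: the reward is taken to be a weighted harmonic mean of an image-similarity score and a text-alignment score. Only the Bradley-Terry formula is standard; `weighted_harmonic`, `lam`, and `preference_label` are hypothetical names for illustration.

```python
import math


def weighted_harmonic(image_sim: float, text_sim: float, lam: float = 0.5) -> float:
    """Weighted harmonic mean of two positive scores (ASSUMED reward form)."""
    return 1.0 / (lam / image_sim + (1.0 - lam) / text_sim)


def bradley_terry(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that sample A is preferred over sample B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def preference_label(reward_a: float, reward_b: float) -> int:
    """Return 1 if A is at least as likely to be preferred as B, else 0."""
    return 1 if bradley_terry(reward_a, reward_b) >= 0.5 else 0
```

A harmonic mean is dominated by the smaller of its two inputs, so under this assumed form a sample cannot score well by copying the reference image while ignoring the prompt, or the reverse.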
# Building AI Agents for Autonomous Clouds: Challenges and Design Principles ( http://arxiv.org/abs/2407.12165v1 ) License: see link	Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan	The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently require significant human effort and domain knowledge. There is a growing interest in AI for IT Operations (AIOps), which aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions that satisfy them. We also propose AIOpsLab, a prototype implementation leveraging an agent-cloud interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults. We report promising results and lay the groundwork to build a modular and robust framework for building, evaluating, and improving agents for autonomous clouds.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics ( http://arxiv.org/abs/2407.12168v1 ) License: see link	Junqi Yin, Siming Liang, Siyan Liu, Feng Bao, Hristo G. Chipilski, Dan Lu, Guannan Zhang	The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not yet ready for operational use in weather forecasting or climate prediction. This is due to the lack of a data assimilation method as part of their workflow to enable the assimilation of incoming Earth system observations in real time. This limitation affects their effectiveness in predicting complex atmospheric phenomena such as tropical cyclones and atmospheric rivers. To overcome these obstacles, we introduce a generic real-time data assimilation framework and demonstrate its end-to-end performance on the Frontier supercomputer. This framework comprises two primary modules: an ensemble score filter (EnSF), which significantly outperforms the state-of-the-art data assimilation method, namely, the Local Ensemble Transform Kalman Filter (LETKF); and a vision transformer-based surrogate capable of real-time adaptation through the integration of observational data. The ViT surrogate can represent either physics-based models or AI-based foundation models. We demonstrate both the strong and weak scaling of our framework up to 1024 GPUs on the Exascale supercomputer, Frontier. Our results not only illustrate the framework's exceptional scalability on high-performance computing systems, but also demonstrate the importance of supercomputers in real-time data assimilation for weather and climate predictions. Even though the proposed framework is tested only on a benchmark surface quasi-geostrophic (SQG) turbulence system, it has the potential to be combined with existing AI-based foundation models, making it suitable for future operational implementations.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# The Latency Price of Threshold Cryptosystem in Blockchains ( http://arxiv.org/abs/2407.12172v1 ) License: see link	Zhuolun Xiang, Sourav Das, Zekun Li, Zhoujun Ma, Alexander Spiegelman	Threshold cryptography is essential for many blockchain protocols. For example, many protocols rely on a threshold common coin to implement asynchronous consensus and leader elections and to provide support for randomized applications. Similarly, threshold signature schemes are frequently used for protocol efficiency and state certification, and threshold decryption and threshold time-lock puzzles are often necessary for privacy. In this paper, we study the interplay between threshold cryptography and a class of blockchains that use Byzantine-fault tolerant (BFT) consensus protocols, with a focus on latency. More specifically, we focus on blockchain-native threshold cryptosystems, where the blockchain validators seek to run a threshold cryptographic protocol once for every block with the block contents as an input to the threshold cryptographic protocol. All existing approaches for blockchain-native threshold cryptosystems introduce a latency overhead of at least one message delay for running the threshold cryptographic protocol. In this paper, we first propose a mechanism to eliminate this overhead for blockchain-native threshold cryptosystems with tight thresholds, i.e., in threshold cryptographic protocols where the secrecy and reconstruction thresholds are the same. However, many real-world proof-of-stake-based blockchain-native threshold cryptosystems rely on ramp thresholds, where reconstruction thresholds are strictly greater than secrecy thresholds. For these blockchains, we formally demonstrate that the additional delay is unavoidable. We then introduce a mechanism to minimize this delay in the optimistic case. We implement our optimistic protocol for the proof-of-stake distributed randomness scheme on the Aptos blockchain. Our measurements from the Aptos mainnet show that the optimistic approach reduces latency overhead by 71%.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
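To illustrate what a tight threshold means, the sketch below uses textbook Shamir secret sharing over a prime field: any `k` shares reconstruct the secret, while `k - 1` shares determine nothing about it. This is a generic example chosen for clarity, not the protocol the paper implements on Aptos; the names `split` and `reconstruct` and the field size are assumptions, and the `random` module stands in for a cryptographically secure source only so that the example is reproducible.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a toy field


def split(secret: int, k: int, n: int, rng: random.Random) -> list[tuple[int, int]]:
    """Create n shares such that any k of them reconstruct the secret."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    # NOTE: use the `secrets` module, not `random`, outside of a demo.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `k = 3` and `n = 5`, any three shares recover the secret and two shares interpolate to an unrelated value. A ramp scheme, by contrast, leaves a gap between the number of shares that leak nothing and the number needed to reconstruct.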
# Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis ( http://arxiv.org/abs/2407.12173v1 ) License: see link	Haeil Lee, Hansang Lee, Seoyeon Gye, Junmo Kim	Generative diffusion models have emerged as a powerful tool for high-quality image synthesis, yet their iterative nature demands significant computational resources. This paper proposes an efficient time step sampling method based on an image spectral analysis of the diffusion process, aimed at optimizing the denoising process. Instead of the traditional uniform distribution-based time step sampling, we introduce a Beta distribution-like sampling technique that prioritizes critical steps in the early and late stages of the process. Our hypothesis is that certain steps exhibit significant changes in image content, while others contribute minimally. We validated our approach using Fourier transforms to measure frequency response changes at each step, revealing substantial low-frequency changes early on and high-frequency adjustments later. Experiments with ADM and Stable Diffusion demonstrated that our Beta Sampling method consistently outperforms uniform sampling, achieving better FID and IS scores, and offers competitive efficiency relative to state-of-the-art methods like AutoDiffusion. This work provides a practical framework for enhancing diffusion model efficiency by focusing computational resources on the most impactful steps, with potential for further optimization and broader application.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
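A rough sketch of Beta-like time step selection follows. The abstract does not state the Beta parameters, so this example assumes Beta(0.5, 0.5), chosen only because its quantile function has the closed form $\sin^2(\pi u / 2)$ and it concentrates mass near both ends of the schedule. The function names and the default of 1000 total steps are likewise assumptions.

```python
import math


def beta_like_timesteps(num_steps: int, total_steps: int = 1000) -> list[int]:
    """Pick up to `num_steps` timesteps in [0, total_steps - 1], dense near both ends."""
    picked = []
    for i in range(num_steps):
        u = (i + 0.5) / num_steps              # evenly spaced quantile levels
        x = math.sin(math.pi * u / 2.0) ** 2   # Beta(0.5, 0.5) inverse CDF
        picked.append(round(x * (total_steps - 1)))
    return sorted(set(picked))                 # duplicates collapse after rounding


def uniform_timesteps(num_steps: int, total_steps: int = 1000) -> list[int]:
    """Baseline: evenly spaced timesteps over the same range."""
    return [round((i + 0.5) / num_steps * (total_steps - 1)) for i in range(num_steps)]
```

Comparing the gaps between consecutive values from the two functions shows the intended effect: the Beta-like schedule spends more of its budget on the first and last stretches of the process and fewer steps in the middle.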
# GPT-4V Cannot Generate Radiology Reports Yet ( http://arxiv.org/abs/2407.12176v1 ) License: see link	Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan	GPT-4V's purported strong multimodal abilities have raised interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than those of a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# Are Linear Regression Models White Box and Interpretable? ( http://arxiv.org/abs/2407.12177v1 ) License: see link	Ahmed M Salih, Yuhe Wang	Explainable artificial intelligence (XAI) is a set of tools and algorithms that are applied to or embedded in machine learning models in order to understand and interpret them. They are recommended especially for complex or advanced models, including deep neural networks, because these are not interpretable from a human point of view. On the other hand, simple models, including linear regression, are easy to implement, have lower computational complexity, and produce output that is easy to visualize. The common notion in the literature is that simple models, including linear regression, are considered "white box" because they are more interpretable and easier to understand. This is based on the idea that linear regression models have several favorable properties, including exposing the effect of each feature in the model and whether it affects the model output positively or negatively. Moreover, the uncertainty of the model can be measured or estimated using confidence intervals. However, we argue that this perception is not accurate and that linear regression models are neither easy to interpret nor easy to understand, considering common XAI metrics and the challenges they may face. These include linearity, local explanation, multicollinearity, covariates, normalization, uncertainty, feature contribution, and fairness. Consequently, we recommend that so-called simple models be treated the same as complex models when it comes to explainability and interpretability.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
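Multicollinearity is one of the challenges listed above, and a small calculation shows why it undermines the reading of coefficients as feature effects. The sketch below computes the variance inflation factor (VIF) for a two-feature model, where it reduces to $1 / (1 - r^2)$ with $r$ the Pearson correlation between the features. The function names and sample values are made up for illustration and do not come from the paper.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_features(x1: list[float], x2: list[float]) -> float:
    """VIF of either feature in a two-feature linear model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)
```

Uncorrelated features give a VIF of 1, while nearly duplicated features such as `[1, 2, 3, 4]` and `[1.1, 1.9, 3.2, 3.9]` give a VIF far above the commonly quoted rule-of-thumb cutoff of 10, meaning the individual coefficient estimates are unstable even though the model is linear.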
# Exploration Unbound ( http://arxiv.org/abs/2407.12178v1 ) License: see link	Dilip Arumugam, Wanqiao Xu, Benjamin Van Roy	A sequential decision-making agent balances between exploring to gain new knowledge about an environment and exploiting current knowledge to maximize immediate reward. For environments studied in the traditional literature, optimal decisions gravitate over time toward exploitation as the agent accumulates sufficient knowledge and the benefits of further exploration vanish. What if, however, the environment offers an unlimited amount of useful knowledge and there is large benefit to further exploration no matter how much the agent has learned? We offer a simple, quintessential example of such a complex environment. In this environment, rewards are unbounded and an agent can always increase the rate at which rewards accumulate by exploring to learn more. Consequently, an optimal agent forever maintains a propensity to explore.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# The object detection method aids in image reconstruction evaluation and clinical interpretation of meniscal abnormalities ( http://arxiv.org/abs/2407.12184v1 ) License: see link	Natalia Konovalova, Aniket Tolpadi, Felix Liu, Zehra Akkaya, Felix Gassert, Paula Giesler, Johanna Luitjens, Misung Han, Emma Bahroos, Sharmila Majumdar, Valentina Pedoia	This study investigates the relationship between deep learning (DL) image reconstruction quality and anomaly detection performance, and evaluates the efficacy of an artificial intelligence (AI) assistant in enhancing radiologists' interpretation of meniscal anomalies on reconstructed images. A retrospective study was conducted using an in-house reconstruction and anomaly detection pipeline to assess knee MR images from 896 patients. The original and 14 sets of DL-reconstructed images were evaluated using standard reconstruction and object detection metrics, alongside newly developed box-based reconstruction metrics. Two clinical radiologists reviewed a subset of 50 patients' images, both original and AI-assisted reconstructed, with subsequent assessment of their accuracy and performance characteristics. Results indicated that the structural similarity index (SSIM) showed a weaker correlation with anomaly detection metrics (mAP, r=0.64, p=0.01; F1 score, r=0.38, p=0.18), while box-based SSIM had a stronger association with detection performance (mAP, r=0.81, p<0.01; F1 score, r=0.65, p=0.01). Minor SSIM fluctuations did not affect detection outcomes, but significant changes reduced performance. Radiologists' AI-assisted evaluations demonstrated improved accuracy (86.0% without assistance vs. 88.3% with assistance, p<0.05) and interrater agreement (Cohen's kappa, 0.39 without assistance vs. 0.57 with assistance). An additional review led to the incorporation of 17 more lesions into the dataset. The proposed anomaly detection method shows promise in evaluating reconstruction algorithms for automated tasks and aiding radiologists in interpreting DL-reconstructed MR images.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
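The interrater agreement figures above are Cohen's kappa values, which correct raw agreement for the agreement two raters would reach by chance. The sketch below computes kappa from two label lists; the function name and any labels passed to it are illustrative and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Two raters who agree on three of four items with marginals of 3:1 and 2:2 reach 75% raw agreement but a kappa of only 0.5, which is why kappa is reported alongside accuracy.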
# Satisficing Exploration for Deep Reinforcement Learning ( http://arxiv.org/abs/2407.12185v1 ) License: see link	Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy	A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas with function approximation and high-dimensional observations. In this work, we remedy this issue by extending an agent that directly represents uncertainty over the optimal value function, allowing it both to bypass the need for model-based planning and to learn satisficing policies. We provide simple yet illustrative experiments that demonstrate how our algorithm enables deep reinforcement-learning agents to achieve satisficing behaviors. In keeping with previous work on this setting for multi-armed bandits, we additionally find that our algorithm is capable of synthesizing optimal behaviors, when feasible, more efficiently than its non-information-theoretic counterpart.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# L2AI: lightweight three-factor authentication and authorization in IOMT blockchain-based environment ( http://arxiv.org/abs/2407.12187v1 ) License: see link	Laleh Khajehzadeh, Hamid Barati, Ali Barati	Medical Internet of Things (IoMT) is the next frontier in the digital revolution and is utilized in healthcare. In this context, IoT enables individuals to remotely manage their essential activities with minimal interaction. However, the limitations of network resources and the challenges of establishing a secure channel, as well as sharing and collecting sensitive information through an insecure public channel, pose security challenges for the medical IoT. This paper presents a lightweight multi-factor authentication and anonymous user authentication scheme to access real-time data in a blockchain-based environment. The scheme, called L2AI, operates over an insecure channel. L2AI ensures security and efficiency while enhancing user anonymity through the use of pseudo-identity and dynamic indexing. The proposed method supports highly scalable systems with an efficient user registration process, allowing authenticated users to access both existing and newly added system entities without additional processes. Although the scheme is primarily designed for large systems, such as health infrastructure, it is also suitable for resource-constrained devices. The scheme relies on one-way cryptographic hashing functions and bitwise XOR operations. Additionally, a fuzzy mining algorithm is employed on the user side to verify the user's biometric information. L2AI adopts the "Real-Or-Random (ROR)" model for security proof and employs BAN logic for proof of authenticity. Formal security verification is conducted using the "Automatic Validation of Internet Security Protocols and Programs" (Proverif) tool, complemented by informal security analysis demonstrating the proper functionality of L2AI.	Translated: 2024-07-18 19:08:36 Published: 2024-07-16
# CroMo-Mixup: 継続的自己教師付き学習のためのクロスモデル表現の拡張 CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning ( http://arxiv.org/abs/2407.12188v1 ) ライセンス: Link先を確認	Erum Mushtaq, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Jie Ding, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr,	(参考訳) 継続的自己教師付き学習(CSSL)は、ラベルのないデータに基づいて一連のタスクを逐次的に学習する。継続学習の2つの主な課題は、破滅的忘却とタスクの混乱である。 CSSL問題は破滅的忘却の課題に対処するために研究されてきたが、タスクの混乱に対処する研究はほとんど行われていない。本研究では、自己教師付き学習(SSL)によってCSSLがタスク混乱問題の影響をより受けやすくなりうることを、広範な実験を通じて示す。これは、異なるタスクに属する異なるクラスが同時に訓練されないため、特に多様性の低いクラスインクリメンタル学習の設定において顕著である。この課題に触発され、2つの重要なコンポーネントを通じてこの問題に対処する新しいクロスモデル特徴Mixup(CroMo-Mixup)フレームワークを提案する。 1)タスク間でサンプルを混合して負のサンプルの多様性を高めるクロスタスクデータMixup 2) 混合サンプルと原画像について現在のモデルと古いモデルから得られた埋め込み間の類似性を学習し、クロスタスクのクラス対比学習と古い知識の検索を容易にするクロスモデル特徴Mixup。 CIFAR10, CIFAR100, tinyImageNetの3つのデータセットにおいて, 異なるクラスインクリメンタル学習の設定の下で, タスクID予測と全タスクの平均線形精度の両方を改善するCroMo-Mixupの有効性を評価した。最先端の4つのSSL目的関数に対して,CroMo-Mixupの互換性を検証する。コードは \url{https://github.com/ErumMushtaq/CroMo-Mixup} で公開されている。 Continual self-supervised learning (CSSL) learns a series of tasks sequentially on the unlabeled data. Two main challenges of continual learning are catastrophic forgetting and task confusion. While the CSSL problem has been studied to address the catastrophic forgetting challenge, little work has been done to address the task confusion aspect. In this work, we show through extensive experiments that self-supervised learning (SSL) can make CSSL more susceptible to the task confusion problem, particularly in less diverse settings of class incremental learning, because different classes belonging to different tasks are not trained concurrently. 
Motivated by this challenge, we present a novel cross-model feature Mixup (CroMo-Mixup) framework that addresses this issue through two key components: 1) Cross-Task data Mixup, which mixes samples across tasks to enhance negative sample diversity; and 2) Cross-Model feature Mixup, which learns similarities between embeddings obtained from current and old models of the mixed sample and the original images, facilitating cross-task class contrast learning and old knowledge retrieval. We evaluate the effectiveness of CroMo-Mixup to improve both Task-ID prediction and average linear accuracy across all tasks on three datasets, CIFAR10, CIFAR100, and tinyImageNet under different class-incremental learning settings. We validate the compatibility of CroMo-Mixup on four state-of-the-art SSL objectives. Code is available at \url{https://github.com/ErumMushtaq/CroMo-Mixup}.	翻訳日:2024-07-18 19:08:36 公開日:2024-07-16
# MASIVE: 英語とスペイン語におけるオープンエンドな感情状態の識別 MASIVE: Open-Ended Affective State Identification in English and Spanish ( http://arxiv.org/abs/2407.12196v1 ) ライセンス: Link先を確認	Nicholas Deas, Elsbeth Turcan, Iván Pérez Mejía, Kathleen McKeown,	(参考訳) 感情分析の分野では、多くのNLP研究は、限られた数の離散的な感情カテゴリを特定することに焦点を当てており、それらはしばしば言語をまたいで適用される。しかし、これらの基本セットはテキストデータを念頭に置いて設計されることはめったになく、文化、言語、方言は特定の感情がどのように解釈されるかに影響を与える。本研究では、人間が感情体験を記述するために使用するあらゆる言葉を含む、事実上無制限の「感情状態(affective states)」の集合へと対象範囲を広げる。我々は、英語とスペイン語のReddit投稿からなり、それぞれ1,000以上の固有の感情状態を含むデータセットMASIVEを収集し、公開する。次に、マスク付きスパン予測タスクとして定式化した、言語生成モデルに対する「感情状態識別(affective state identification)」という新しい問題を定義する。このタスクでは、より小さな微調整された多言語モデルの方が、地域固有のスペイン語の感情状態においても、はるかに大きなLLMより優れていることが分かる。さらに,MASIVEでの事前学習により,既存の感情ベンチマークにおけるモデル性能が向上することを示す。最後に, 機械翻訳実験により, このタスクで良好な性能を得るには, ネイティブスピーカーが書いたデータが不可欠であることが確認された。 In the field of emotion analysis, much NLP research focuses on identifying a limited number of discrete emotion categories, often applied across languages. These basic sets, however, are rarely designed with textual data in mind, and culture, language, and dialect can influence how particular emotions are interpreted. In this work, we broaden our scope to a practically unbounded set of \textit{affective states}, which includes any terms that humans use to describe their experiences of feeling. We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each. We then define the new problem of \textit{affective state identification} for language generation models framed as a masked span prediction task. On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific Spanish affective states. Additionally, we show that pretraining on MASIVE improves model performance on existing emotion benchmarks. Finally, through machine translation experiments, we find that native speaker-written data is vital to good performance on this task.	翻訳日:2024-07-18 19:08:36 公開日:2024-07-16
# ソフトロボットインタラクションのための解釈可能なビジュオ触覚予測モデルを目指して Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions ( http://arxiv.org/abs/2407.12197v1 ) ライセンス: Link先を確認	Enrico Donato, Thomas George Thuruthel, Egidio Falotico,	(参考訳) 自律システムは予測不可能な環境をナビゲートし、外部オブジェクトと対話するという、複雑な課題に直面します。ロボットエージェントを現実世界の状況にうまく統合することは、世界モデルと予測スキルの融合を含む知覚能力に依存している。効果的な知覚モデルは、周囲を探索するために様々な感覚モダリティの融合の上に構築される。生の感覚モダリティに応用されたディープラーニングは、実行可能な選択肢を提供する。しかし、学習に基づく知覚表現は解釈が困難になる。この課題はソフトロボットにおいて特に顕著であり、構造や素材のコンプライアンスが予測をさらに困難にしている。我々の研究は、生成モデルを利用してソフトロボットのためのマルチモーダル認識モデルを構築し、対外物体との接触を予測・解釈するために、受容的・視覚的情報を活用することで、この複雑さに対処する。知覚モデルを理解するための一連のツールが提供され、学習段階の後に複数の感覚入力の融合と予測プロセスに光を当てる。我々は、知覚モデルとその制御目的への含意の展望を掘り下げる。 Autonomous systems face the intricate challenge of navigating unpredictable environments and interacting with external objects. The successful integration of robotic agents into real-world situations hinges on their perception capabilities, which involve amalgamating world models and predictive skills. Effective perception models build upon the fusion of various sensory modalities to probe the surroundings. Deep learning applied to raw sensory modalities offers a viable option. However, learning-based perceptive representations become difficult to interpret. This challenge is particularly pronounced in soft robots, where the compliance of structures and materials makes prediction even harder. Our work addresses this complexity by harnessing a generative model to construct a multi-modal perception model for soft robots and to leverage proprioceptive and visual information to anticipate and interpret contact interactions with external objects. A suite of tools to interpret the perception model is furnished, shedding light on the fusion and prediction processes across multiple sensory inputs after the learning phase. We will delve into the outlooks of the perception model and its implications for control purposes.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# これはおそらくあれにそっくり: 可逆プロトタイプネットワーク This Probably Looks Exactly Like That: An Invertible Prototypical Network ( http://arxiv.org/abs/2407.12200v1 ) ライセンス: Link先を確認	Zachariah Carmichael, Timothy Redgrave, Daniel Gonzalez Cedre, Walter J. Scheirer,	(参考訳) 概念に基づくニューラルネットワークと、フローベースの生成的分類器を組み合わせることで、教師あり学習に対する、本質的に説明可能で厳密に可逆な新しいアプローチを実現する。概念に基づくニューラルネットワークの一種であるプロトタイプニューラルネットワークは、概念アノテーションを使わずに人間が理解可能な機械学習を実現する上で有望な方向性を示しているが、人間と機械のセマンティックギャップは、現在のアプローチを悩ませ続けている。プロトタイプによる説明において間接的な解釈関数に依存することは、プロトタイプの情報伝達力に厳しい制限を課すことが判明した。このことから、潜在空間上の分布としてプロトタイプを可逆的に学習することで、より堅牢で表現力が高く解釈可能なモデリングが可能になると仮定する。本稿では,正規化フローとガウス混合モデルを組み合わせることにより,ProtoFlowと呼ばれるそのようなモデルを提案する。 ProtoFlowは、(1)生成と予測の同時モデリングにおいて新たな最先端を達成し、(2)既存のプロトタイプニューラルネットワークに匹敵する予測性能を達成しつつ、より豊かな解釈を可能にする。 We combine concept-based neural networks with generative, flow-based classifiers into a novel, intrinsically explainable, exactly invertible approach to supervised learning. Prototypical neural networks, a type of concept-based neural network, represent an exciting way forward in realizing human-comprehensible machine learning without concept annotations, but a human-machine semantic gap continues to haunt current approaches. We find that reliance on indirect interpretation functions for prototypical explanations imposes a severe limit on prototypes' informative power. From this, we posit that invertibly learning prototypes as distributions over the latent space provides more robust, expressive, and interpretable modeling. We propose one such model, called ProtoFlow, by composing a normalizing flow with Gaussian mixture models. ProtoFlow (1) sets a new state-of-the-art in joint generative and predictive modeling and (2) achieves predictive performance comparable to existing prototypical neural networks while enabling richer interpretation.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# ダイアクリティカルマークを用いないヘブライ語TTSのための言語モデリング手法 A Language Modeling Approach to Diacritic-Free Hebrew TTS ( http://arxiv.org/abs/2407.12206v1 ) ライセンス: Link先を確認	Amit Roth, Arnon Turetzky, Yossi Adi,	(参考訳) 我々はヘブライ語におけるテキスト音声合成(TTS)の課題に取り組む。伝統的なヘブライ語には、与えられた単語の発音の仕方を指示するダイアクリティカルマーク(発音区別符号)が含まれているが、現代のヘブライ語ではほとんど使われていない。現代のヘブライ語ではダイアクリティカルマークが欠けているため、読者は文脈に基づいて正しい発音を推測し、どの音素を使うべきかを理解することが求められる。これにより、テキストから音声への正確なマッピングを行うという根本的な課題がTTSシステムに生じる。本研究では,ヘブライ語TTSの課題に対して,ダイアクリティカルマークを用いない言語モデリングアプローチを採用することを提案する。モデルは離散的な音声表現上で動作し、ワードピーストークナイザで条件付けされる。本稿では,実環境で収集した弱教師付きデータを用いて提案手法を最適化し,複数のダイアクリティカルマークベースのTTSシステムと比較する。その結果,提案手法は,生成音声の内容保存と自然性の両方において,評価したベースラインよりも優れていることが示唆された。サンプルは次のリンクで参照できる: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/ We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics, which dictate the way individuals should pronounce given words; modern Hebrew, however, rarely uses them. The lack of diacritics in modern Hebrew means readers are expected to infer the correct pronunciation and understand which phonemes to use based on context. This poses a fundamental challenge for TTS systems, which must accurately map text to speech. In this work, we propose to adopt a diacritics-free language modeling approach for the task of Hebrew TTS. The model operates on discrete speech representations and is conditioned on a word-piece tokenizer. We optimize the proposed method using in-the-wild weakly supervised data and compare it to several diacritic-based TTS systems. Results suggest the proposed method is superior to the evaluated baselines considering both content preservation and naturalness of the generated speech. Samples can be found under the following link: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# NeuSurfEmb: CADモデルなしの密な対応関係に基づく6次元物体姿勢推定のための完全パイプライン NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models ( http://arxiv.org/abs/2407.12207v1 ) ライセンス: Link先を確認	Francesco Milano, Jen Jen Chung, Hermann Blum, Roland Siegwart, Lionel Ott,	(参考訳) 6次元物体姿勢推定のための最先端のアプローチはCADモデルの可用性を前提としており、ユーザーは合成トレーニングデータ生成のために物理ベースレンダリング(PBR)パイプラインを手動でセットアップする必要がある。どちらの要因も実際のシナリオにおけるこれらの手法の適用を制限する。本研究では,CADモデルを必要とせず,少数の実画像のみを入力として最先端の姿勢推定器を訓練できるパイプラインを提案する。提案手法は,Structure-from-Motion (SfM) と物体に依存しないセグメンテーションに基づく半自動の手順で学習するNeuS2物体表現に基づいている。我々は、NeuS2の新規視点合成能力とシンプルなカット・アンド・ペーストによるデータ拡張を利用して、フォトリアリスティックな物体レンダリングを自動的に生成し、対応関係ベースのSurfEmb姿勢推定器の訓練に用いる。提案手法をLINEMOD-Occlusionデータセット上で評価し,各コンポーネントの影響を広範囲に検討し,CADモデルとPBRデータに基づくアプローチに対して競争力のある性能を示す。さらに、自ら収集した実世界の物体に対するパイプラインの使いやすさと有効性を実証し、本手法が最先端のCADモデル不要の手法を上回り、より高い精度と軽度の遮蔽に対するロバスト性を示すことを確認した。ロボットコミュニティがこのシステムの恩恵を受けられるよう、https://www.github.com/ethz-asl/neusurfemb で公開する予定である。 State-of-the-art approaches for 6D object pose estimation assume the availability of CAD models and require the user to manually set up physically-based rendering (PBR) pipelines for synthetic training data generation. Both factors limit the application of these methods in real-world scenarios. In this work, we present a pipeline that does not require CAD models and allows training a state-of-the-art pose estimator requiring only a small set of real images as input. Our method is based on a NeuS2 object representation that we learn through a semi-automated procedure based on Structure-from-Motion (SfM) and object-agnostic segmentation. We exploit the novel-view synthesis ability of NeuS2 and simple cut-and-paste augmentation to automatically generate photorealistic object renderings, which we use to train the correspondence-based SurfEmb pose estimator. 
We evaluate our method on the LINEMOD-Occlusion dataset, extensively studying the impact of its individual components and showing competitive performance with respect to approaches based on CAD models and PBR data. We additionally demonstrate the ease of use and effectiveness of our pipeline on self-collected real-world objects, showing that our method outperforms state-of-the-art CAD-model-free approaches, with better accuracy and robustness to mild occlusions. To allow the robotics community to benefit from this system, we will publicly release it at https://www.github.com/ethz-asl/neusurfemb.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# 画像分類による自己教師付き事前学習のベンチマークの再検討 A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification ( http://arxiv.org/abs/2407.12210v1 ) ライセンス: Link先を確認	Markus Marks, Manuel Knott, Neehar Kondapaneni, Elijah Cole, Thijs Defraeye, Fernando Perez-Cruz, Pietro Perona,	(参考訳) 自己教師付き学習(SSL)は、データ自体が監視を提供する機械学習アプローチであり、外部ラベルの必要性を排除している。モデルは、プリテキストタスクを解くことで、データ構造やコンテキストについて学ぶことを余儀なくされる。 SSLでは、モデルが豊富で安価なラベルなしデータから学習でき、ラベルが高価または入手困難な場合のモデルの訓練コストが大幅に削減される。コンピュータビジョンでは、SSLは事前学習として広く使用され、その後に教師付き転移、より小さなラベル付きデータセットでの少数ショット学習、および/または教師なしクラスタリングなどの下流タスクが続く。残念ながら、すべての可能な下流タスクに対してSSL手法を評価し、学習した表現の質を客観的に測定することは不可能である。代わりに、SSL手法は、ファインチューニング、線形探索、k近傍法(kNN)などのドメイン内評価プロトコルを用いて評価される。しかし、これらの評価プロトコルが、データセット、評価指標、モデルアーキテクチャといった異なる条件下で、異なる下流タスクに対する事前学習済みモデルの表現品質をどの程度うまく推定できるかはよく分かっていない。 SSLの分類に基づく評価プロトコルが互いにどのように相関し、異なる種類のデータセットにおける下流性能をどの程度予測できるかを検討する。我々の研究には、11の一般的な画像データセットと、異なるSSL手法で事前学習された、または異なるモデルバックボーンを持つ26のモデルが含まれる。ドメイン内の線形/kNN探索プロトコルは,平均してドメイン外性能の予測器として最適であることがわかった。さらに、バッチ正規化の重要性について検討し、異なる種類のデータセットドメインシフトに対して相関がどの程度頑健であるかを評価する。識別的自己教師付き手法と生成的自己教師付き手法の関係に関する仮定に異議を唱え,その性能差の大部分は,モデルバックボーンの変更によって説明できることを示した。 Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels are expensive or inaccessible. In Computer Vision, SSL is widely used as pre-training followed by a downstream task, such as supervised transfer, few-shot learning on smaller labeled data sets, and/or unsupervised clustering. Unfortunately, it is infeasible to evaluate SSL methods on all possible downstream tasks and objectively measure the quality of the learned representation. 
Instead, SSL methods are evaluated using in-domain evaluation protocols, such as fine-tuning, linear probing, and k-nearest neighbors (kNN). However, it is not well understood how well these evaluation protocols estimate the representation quality of a pre-trained model for different downstream tasks under different conditions, such as dataset, metric, and model architecture. We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types. Our study includes eleven common image datasets and 26 models that were pre-trained with different SSL methods or have different model backbones. We find that in-domain linear/kNN probing protocols are, on average, the best general predictors for out-of-domain performance. We further investigate the importance of batch normalization and evaluate how robust correlations are for different kinds of dataset domain shifts. We challenge assumptions about the relationship between discriminative and generative self-supervised methods, finding that most of their performance differences can be explained by changes to model backbones.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# 認識論的不確実性の校正について: 原理・パラドックス・Conflictual Loss On the Calibration of Epistemic Uncertainty: Principles, Paradoxes and Conflictual Loss ( http://arxiv.org/abs/2407.12211v1 ) ライセンス: Link先を確認	Mohammed Fellaji, Frédéric Pennerath, Brieuc Conan-Guez, Miguel Couceiro,	(参考訳) 予測分布の校正は、ディープラーニングにおいて広く研究されてきたが、Deep Ensembles、Bayesian Deep Networks、Evidential Deep Networksによって生み出される、より特定的な認識論的不確実性については、同じことは言えない。測定可能ではあるが、この形式の不確実性は、様々な選択肢が存在する事前分布に依存するため、客観的に校正することは困難である。それにもかかわらず、すべてのケースにおいて、認識論的不確実性は2つの形式的要件を満たす必要がある: 第一に、トレーニングデータセットが大きくなると減少し、第二に、モデルの表現力が大きくなると増大しなければならない。これらの期待にもかかわらず、我々の実験的研究は、いくつかの基準データセットやモデルにおいて、認識論的不確実性の尺度がこれらの要件に違反し、時には予想とは全く逆の傾向を示すことを示している。期待と現実の間のこれらのパラドックスは、これらのモデルによって推定される認識論的不確実性の真の有用性に関する疑問を提起する。形式的な議論は、この不一致が尺度自体の欠陥ではなく、事後分布の近似が不十分なためであることを示唆している。この観察に基づき、上記の要件に沿った、conflictual lossと呼ぶ深層アンサンブルのための正則化関数を提案する。深層アンサンブルの性能や校正を犠牲にすることなく、認識論的不確実性の両方の要件を回復できることを実験的に示すことで、その強みを強調する。 The calibration of predictive distributions has been widely studied in deep learning, but the same cannot be said about the more specific epistemic uncertainty as produced by Deep Ensembles, Bayesian Deep Networks, or Evidential Deep Networks. Although measurable, this form of uncertainty is difficult to calibrate on an objective basis as it depends on the prior for which a variety of choices exist. Nevertheless, epistemic uncertainty must in all cases satisfy two formal requirements: first, it must decrease when the training dataset gets larger and, second, it must increase when the model expressiveness grows. Despite these expectations, our experimental study shows that on several reference datasets and models, measures of epistemic uncertainty violate these requirements, sometimes presenting trends completely opposite to those expected. These paradoxes between expectation and reality raise the question of the true utility of epistemic uncertainty as estimated by these models. 
A formal argument suggests that this disagreement is due to a poor approximation of the posterior distribution rather than to a flaw in the measure itself. Based on this observation, we propose a regularization function for deep ensembles, called conflictual loss, in line with the above requirements. We emphasize its strengths by showing experimentally that it restores both requirements of epistemic uncertainty, without sacrificing either the performance or the calibration of the deep ensembles.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# よりロバストな低予算能動学習のための一般化被覆 Generalized Coverage for More Robust Low-Budget Active Learning ( http://arxiv.org/abs/2407.12212v1 ) ライセンス: Link先を確認	Wonho Bae, Junhyug Noh, Danica J. Sutherland,	(参考訳) Yehudaらの ProbCover 法は、低予算領域における能動学習のための十分に動機付けられたアルゴリズムであり、選択したデータ点を中心とする所与の半径の球でデータ分布を「被覆」しようとするものである。しかし,本アルゴリズムの性能は,この半径ハイパーパラメータの選択に極めて敏感であり,そのチューニングは非常に困難で,元のヒューリスティックは頻繁に失敗することを示す。そこで、ProbCoverの目的関数を特殊ケースとして含みつつ、ハイパーパラメータの選択に対してはるかに頑健な、より滑らかな概念も許容する、一般化された「被覆」の概念を導入する(そして理論的に動機づける)。本稿では、この被覆を最適化し、ProbCoverのアルゴリズムを一般化する効率的な貪欲法を提案する。カーネルハーディングとの密接な関係から、これを「MaxHerding」と呼ぶ。この目的関数は、$k$-medoidsの変種によって非貪欲に最適化することもでき、他の低予算能動学習手法との関係を明確にする。包括的な実験において、MaxHerdingは複数の低予算画像分類ベンチマークで既存の能動学習手法を上回り、しかも競合する手法の大半よりも計算コストが低い。 The ProbCover method of Yehuda et al. is a well-motivated algorithm for active learning in low-budget regimes, which attempts to "cover" the data distribution with balls of a given radius at selected data points. We demonstrate, however, that the performance of this algorithm is extremely sensitive to the choice of this radius hyper-parameter, and that tuning it is quite difficult, with the original heuristic frequently failing. We thus introduce (and theoretically motivate) a generalized notion of "coverage," including ProbCover's objective as a special case, but also allowing smoother notions that are far more robust to hyper-parameter choice. We propose an efficient greedy method to optimize this coverage, generalizing ProbCover's algorithm; due to its close connection to kernel herding, we call it "MaxHerding." The objective can also be optimized non-greedily through a variant of $k$-medoids, clarifying the relationship to other low-budget active learning methods. In comprehensive experiments, MaxHerding surpasses existing active learning methods across multiple low-budget image classification benchmarks, and does so with less computational cost than most competitive methods.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# VideoClusterNet: ビデオのための自己教師付き適応的クラスタリング VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos ( http://arxiv.org/abs/2407.12214v1 ) ライセンス: Link先を確認	Devesh Walawalkar, Pablo Garrido,	(参考訳) デジタルメディアのコンテンツ制作が拡大するにつれ、映画やテレビシリーズのエピソードを分析して主要な登場人物を正確に特定する必要性が高まっている。特にビデオ顔クラスタリングは、検出されたビデオ上の顔トラックを共通の顔のアイデンティティごとにまとめることを目的としている。この問題は、ビデオフレームにまたがる特定の顔のポーズ、表情、外観、照明のバリエーションが多岐にわたるため、非常に難しい。汎用の事前学習済み顔識別(ID)モデルは、高ダイナミックレンジのコンテンツと独特な映画的スタイルのために、ビデオ制作領域にうまく適応できない。さらに、従来のクラスタリングアルゴリズムは、データセットごとに個別のチューニングを必要とするハイパーパラメータに依存している。本稿では,汎用の顔IDモデルを新しいビデオ顔トラックへ完全に自己教師付きで適応させることを学習する、新しいビデオ顔クラスタリング手法を提案する。また,任意の入力ビデオに対して,微調整されたモデルの埋め込み空間に自動的に適応できるパラメータフリーのクラスタリングアルゴリズムを提案する。包括的な映画顔クラスタリングベンチマークが欠如しているため、この種のものとしては初の映画データセットであるMovieFaceClusterも提示する。このデータセットは、映画業界の専門家によって厳選されており、非常に困難な顔識別シナリオを含んでいる。実験により、本手法が我々のベンチマークデータセットにおける難易度の高いメインストリーム映画のシーンを効果的に処理でき、従来のテレビシリーズのデータセットでも最先端の性能を達成することが示された。 With the rise of digital media content production, the need for analyzing movies and TV series episodes to locate the main cast of characters precisely is gaining importance. Specifically, Video Face Clustering aims to group together detected video face tracks with common facial identities. This problem is very challenging due to the large range of pose, expression, appearance, and lighting variations of a given face across video frames. Generic pre-trained Face Identification (ID) models fail to adapt well to the video production domain, given its high dynamic range content and also unique cinematic style. Furthermore, traditional clustering algorithms depend on hyperparameters requiring individual tuning across datasets. In this paper, we present a novel video face clustering approach that learns to adapt a generic face ID model to new video face tracks in a fully self-supervised fashion. We also propose a parameter-free clustering algorithm that is capable of automatically adapting to the finetuned model's embedding space for any input video. 
Due to the lack of comprehensive movie face clustering benchmarks, we also present a first-of-its-kind movie dataset: MovieFaceCluster. Our dataset is handpicked by film industry professionals and contains extremely challenging face ID scenarios. Experiments show our method's effectiveness in handling difficult mainstream movie scenes on our benchmark dataset and state-of-the-art performance on traditional TV series datasets.	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# AFIDAF: ViT におけるアテンションの効率的な代替手段としてのフーリエと画像ドメイン適応フィルタの代替 AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs ( http://arxiv.org/abs/2407.12217v1 ) ライセンス: Link先を確認	Yunling Zheng, Zeyi Xu, Fanghui Xue, Biao Yang, Jiancheng Lyu, Shuai Zhang, Yingyong Qi, Jack Xin,	(参考訳) 本稿では,視覚バックボーン構築の代替として,特徴抽出のためのFourier と Image Domain Filtering の交互なアプローチを提案する。軽量モデル間の性能は、ImageNet-1K分類の最先端レベルに達し、オブジェクト検出やセグメンテーションの下流タスクも一貫して改善する。我々のアプローチは、視覚変換器(ViT)を圧縮するための新しいツールとしても機能する。 We propose and demonstrate an alternating Fourier and image domain filtering approach for feature extraction as an efficient alternative to build a vision backbone without using the computationally intensive attention. The performance among the lightweight models reaches the state-of-the-art level on ImageNet-1K classification, and improves downstream tasks on object detection and segmentation consistently as well. Our approach also serves as a new tool to compress vision transformers (ViTs).	翻訳日:2024-07-18 18:58:45 公開日:2024-07-16
# 静止画をダイナミックビデオに変える「Animate Your Motion」 Animate Your Motion: Turning Still Images into Dynamic Videos ( http://arxiv.org/abs/2403.10179v3 ) ライセンス: Link先を確認	Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars,	(参考訳) 近年、拡散モデルはテキスト・ビデオ生成において顕著な進歩を遂げており、ユーザの意図をより正確に反映するために、ビデオ出力の制御を強化しようと試みている。従来の取り組みは主に、画像や深度マップのようなセマンティックな手がかりや、スケッチやオブジェクト境界ボックスの移動といったモーションベースの条件の採用に重点を置いている。セマンティックな入力はリッチなシーンコンテキストを提供するが、詳細な動きの特異性は欠く; 逆に、モーションインプットは正確な軌跡情報を提供するが、より広いセマンティックな物語を見逃す。図1に示すように、ビデオ生成のための拡散モデルにおいて、セマンティックキューとモーションキューの両方を初めて統合する。この目的のために,マルチモーダル入力を管理する新しい手法であるScene and Motion Conditional Diffusion (SMCD)を紹介した。認識された動作条件モジュールを組み込み、シーン条件を統合する様々なアプローチを調査し、異なるモーダル間のシナジーを促進する。モデルトレーニングでは、2つのモードの条件を分離し、2段階のトレーニングパイプラインを導入します。実験により,映像品質,動作精度,セマンティックコヒーレンスを著しく向上させることが示された。 In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing either semantic cues, like images or depth maps, or motion-based conditions, like moving sketches or object bounding boxes. Semantic inputs offer a rich scene context but lack detailed motion specificity; conversely, motion inputs provide precise trajectory information but miss the broader semantic narrative. For the first time, we integrate both semantic and motion cues within a diffusion model for video generation, as demonstrated in Fig 1. To this end, we introduce the Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. It incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions, promoting synergy between different modalities. For model training, we separate the conditions for the two modalities, introducing a two-stage training pipeline. 
Experimental results demonstrate that our design significantly enhances video quality, motion precision, and semantic coherence.	翻訳日:2024-07-18 11:56:44 公開日:2024-07-16
# BraTS-PED:2023年度国際小児脳腫瘍研究会議報告 BraTS-PEDs: Results of the Multi-Consortium International Pediatric Brain Tumor Segmentation Challenge 2023 ( http://arxiv.org/abs/2407.08855v2 ) ライセンス: Link先を確認	Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Anna Zapaishchykova, Julija Pavaine, Lubdha M. Shah, Blaise V. Jones, Nakul Sheth, Sanjay P. Prabhu, Aaron S. McAllister, Wenxin Tu, Khanak K. Nandolia, Andres F. Rodriguez, Ibraheem Salman Shaikh, Mariana Sanchez Montano, Hollie Anne Lai, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Hannah Anderson, Syed Muhammed Anwar, Alejandro Aristizabal, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, James Eddy, Ivan Ezhov, Ariana M. Familiar, Keyvan Farahani, Deep Gandhi, Anurag Gottipati, Shuvanjan Haldar, Juan Eugenio Iglesias, Anastasia Janas, Elaine Elaine, Alexandros Karargyris, Hasan Kassem, Neda Khalili, Florian Kofler, Dominic LaBella, Koen Van Leemput, Hongwei B. Li, Nazanin Maleki, Zeke Meier, Bjoern Menze, Ahmed W. Moawad, Sarthak Pati, Marie Piraud, Tina Poussaint, Zachary J. Reitman, Jeffrey D. Rudie, Rachit Saluja, MIcah Sheller, Russell Takeshi Shinohara, Karthik Viswanathan, Chunhao Wang, Benedikt Wiestler, Walter F. Wiggins, Christos Davatzikos, Phillip B. Storm, Miriam Bornhorst, Roger Packer, Trent Hummel, Peter de Blank, Lindsey Hoffman, Mariam Aboian, Ali Nabavizadeh, Jeffrey B. Ware, Benjamin H. 
Kann, Brian Rood, Adam Resnick, Spyridon Bakas, Arastoo Vossough, Marius George Linguraru,	(参考訳) 小児中枢神経系腫瘍は、小児のがん関連死亡の主な原因である。小児の高悪性度グリオーマの5年生存率は20%未満である。新しい治療法の開発は、再現可能で正確な中央集約的な治療反応評価を必要とする多施設共同臨床試験に依存している。小児脳腫瘍に焦点を当てた初のBrain Tumor Segmentation (BraTS) チャレンジであるBraTS-PEDs 2023チャレンジの結果を報告する。このチャレンジは、小児神経腫瘍学と臨床試験に特化した複数の国際コンソーシアムから取得したデータを利用した。 BraTS-PEDs 2023は、BraTS 2023の各チャレンジで共通して用いられる標準化された定量的性能評価指標を用いて、磁気共鳴画像から小児脳グリオーマのボリュームセグメンテーションアルゴリズムを評価することを目的とした。小児腫瘍分析において最高性能を示したAIアプローチには、nnU-NetとSwin UNETRのアンサンブル、Auto3DSeg、あるいは自己教師付きフレームワークを用いたnnU-Netが含まれていた。 BraTS-PEDs 2023チャレンジは、臨床医(神経腫瘍医、神経放射線科医)とAI/画像科学者とのコラボレーションを促進し、より迅速なデータ共有と自動ボリューム分析技術の開発を後押しした。これらの進歩は臨床試験に大きく貢献し、脳腫瘍の子供のケアを改善する可能性がある。 Pediatric central nervous system tumors are the leading cause of cancer-related deaths in children. The five-year survival rate for high-grade glioma in children is less than 20%. The development of new treatments is dependent upon multi-institutional collaborative clinical trials requiring reproducible and accurate centralized response assessment. We present the results of the BraTS-PEDs 2023 challenge, the first Brain Tumor Segmentation (BraTS) challenge focused on pediatric brain tumors. This challenge utilized data acquired from multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. BraTS-PEDs 2023 aimed to evaluate volumetric segmentation algorithms for pediatric brain gliomas from magnetic resonance imaging using standardized quantitative performance evaluation metrics employed across the BraTS 2023 challenges. The top-performing AI approaches for pediatric tumor analysis included ensembles of nnU-Net and Swin UNETR, Auto3DSeg, or nnU-Net with a self-supervised framework. The BraTS-PEDs 2023 challenge fostered collaboration between clinicians (neuro-oncologists, neuroradiologists) and AI/imaging scientists, promoting faster data sharing and the development of automated volumetric analysis techniques. 
These advancements could significantly benefit clinical trials and improve the care of children with brain tumors.	翻訳日:2024-07-18 11:56:44 公開日:2024-07-16
# 情報検索と製品検索のギャップを埋める:eコマースへのQ&A勧告 Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce ( http://arxiv.org/abs/2407.09653v2 ) ライセンス: Link先を確認	Saar Kuzi, Shervin Malmasi,	(参考訳) ショッピングミッションの消費者は、商品の理解を深め、購入決定に達するための反復的なプロセスにおいて、Web検索エンジンや質問回答(QA)システムのような製品検索と情報検索システムの両方を利用することが多い。商品検索は、購入者が自分の要求を満たす実際の商品をカタログで見つけるのに有用であるが、情報検索システムは、それらの要求を洗練させるために必要なあらゆる質問に答えるために利用することができる。最近、LLM(Large Language Models)の成功により、顧客が目標を迅速に効果的に達成するための2つのタスク間のギャップを埋める機会が開かれた。本稿では,ユーザに対して,製品検索に関連する質問応答(Q&A)ペアを推薦し,購入決定を支援することを提案する。本稿では、Q&Aペアの要件と特性、その生成、Q&Aレコメンデーションタスクの最適化など、問題のさまざまな側面について論じる。我々は、この新興分野における今後の研究を促進するための課題、オープンな課題、そして解決策を提案する。 Consumers on a shopping mission often leverage both product search and information seeking systems, such as web search engines and Question Answering (QA) systems, in an iterative process to improve their understanding of available products and reach a purchase decision. While product search is useful for shoppers to find the actual products meeting their requirements in the catalog, information seeking systems can be utilized to answer any questions they may have to refine those requirements. The recent success of Large Language Models (LLMs) has opened up an opportunity to bridge the gap between the two tasks to help customers achieve their goals quickly and effectively by integrating conversational QA within product search. In this paper, we propose to recommend users Question-Answer (Q&A) pairs that are relevant to their product search and can help them make a purchase decision. We discuss the different aspects of the problem including the requirements and characteristics of the Q&A pairs, their generation, and the optimization of the Q&A recommendation task. We highlight the challenges, open problems, and suggested solutions to encourage future research in this emerging area.	翻訳日:2024-07-18 11:56:44 公開日:2024-07-16
# 正・未ラベルデータ:モデル、推定、推論、分類 Positive and Unlabeled Data: Model, Estimation, Inference, and Classification ( http://arxiv.org/abs/2407.09735v2 ) ライセンス: Link先を確認	Siyan Liu, Chi-Kuang Yeh, Xin Zhang, Qinglong Tian, Pengfei Li,	(参考訳) 本研究では,二重指数傾斜モデル(DETM)による正・ラベルなし(PU)データへの新たなアプローチを提案する。従来の手法は、ラベル付きの正例とラベルなしの正例が同じ分布から来ると仮定される selected completely at random (SCAR) のPUデータにしか適用できないため、しばしば不十分である。対照的に、DETMの二重構造は、ラベル付きの正例とラベルなしの正例が異なる分布から得られうる、より複雑で十分に研究されていない selected at random のPUデータを効果的に扱うことができる。識別可能性,パラメータ推定,漸近的性質など,DETMの理論的基礎を厳密に確立する。さらに、SCAR条件に対する適合度検定を開発し、対象領域における正例の割合に対する信頼区間を構築することにより、統計的推論へと議論を進める。分類タスクには近似ベイズ分類器を利用し、予測におけるDETMの頑健な性能を実証する。本研究は、理論的洞察と実用的応用を通じて、PUデータの課題に対処するための包括的なフレームワークとしてDETMを位置づける。 This study introduces a new approach to addressing positive and unlabeled (PU) data through the double exponential tilting model (DETM). Traditional methods often fall short because they only apply to selected completely at random (SCAR) PU data, where the labeled positive and unlabeled positive data are assumed to be from the same distribution. In contrast, our DETM's dual structure effectively accommodates the more complex and underexplored selected at random PU data, where the labeled and unlabeled positive data can be from different distributions. We rigorously establish the theoretical foundations of DETM, including identifiability, parameter estimation, and asymptotic properties. Additionally, we move forward to statistical inference by developing a goodness-of-fit test for the SCAR condition and constructing confidence intervals for the proportion of positive instances in the target domain. We leverage an approximated Bayes classifier for classification tasks, demonstrating DETM's robust performance in prediction. Through theoretical insights and practical applications, this study highlights DETM as a comprehensive framework for addressing the challenges of PU data.	翻訳日:2024-07-18 11:56:44 公開日:2024-07-16
# 安全性ファインチューニングを成立させるものと破るもの:メカニスティックな研究 What Makes and Breaks Safety Fine-tuning? A Mechanistic Study ( http://arxiv.org/abs/2407.10264v2 ) ライセンス: Link先を確認	Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H. S. Torr, Amartya Sanyal, Puneet K. Dokania,	(参考訳) 安全性の微調整は、大規模言語モデル(LLM)を、安全なデプロイメントのために人間の好みに合わせるのに役立つ。安全性の微調整によってモデルが安全になる要因をよりよく理解するために、モデルに実行を求めるタスク(例えば「設計」)と、そのタスクの対象となる特定の概念(例えば「サイクル」対「爆弾」)との相互作用をモデル化することで、安全でない入力の顕著な側面を捉える合成データ生成フレームワークを設計する。これを用いて、教師付き安全微調整、直接選好最適化、アンラーニングの3つの有名な安全微調整手法を調査し、これらの手法がMLP重みを最小限に変換し、安全でない入力をその重みの零空間に具体的に整列させることを示す重要な証拠を提供する。これにより、モデルが安全とみなすかどうかに基づいた入力のクラスタリングが生じる。それに対応して、敵対的入力(例えばジェイルブレイク)が与えられると、その活性化は安全なサンプルに近くなり、モデルはその入力を安全であるかのように処理してしまう。可能な限り、実世界のモデル、特にLlama-2 7BとLlama-3 8Bでこの結果を検証する。 Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") versus the specific concepts the task is asked to be performed upon (e.g., a "cycle" vs. a "bomb"). Using this, we investigate three well-known safety fine-tuning methods -- supervised safety fine-tuning, direct preference optimization, and unlearning -- and provide significant evidence demonstrating that these methods minimally transform MLP weights to specifically align unsafe inputs into its weights' null space. This yields a clustering of inputs based on whether the model deems them safe or not. Correspondingly, when an adversarial input (e.g., a jailbreak) is provided, its activations are closer to safer samples, leading to the model processing such an input as if it were safe. We validate our findings, wherever possible, on real-world models -- specifically, Llama-2 7B and Llama-3 8B.	翻訳日:2024-07-18 11:56:44 公開日:2024-07-16
# RecGS:リカレントガウススプラッティングによる水面コースティクスの除去 RecGS: Removing Water Caustic with Recurrent Gaussian Splatting ( http://arxiv.org/abs/2407.10318v2 ) ライセンス: Link先を確認	Tianyi Zhang, Weiming Zhi, Kaining Huang, Joshua Mangelson, Corina Barbalata, Matthew Johnson-Roberson,	(参考訳) 水面のコースティクス(集光模様)は、浅海域の海底画像データでよく見られる。画像からコースティクスのパターンを除去する従来の方法は、2Dフィルタリングや注釈付きデータセットでの事前トレーニングに依存することが多く、3D構造を持つ現実世界の海底データに一般化する際の性能を妨げている。本稿では,近年のフォトリアリスティックな3次元再構成技術である3DGSを利用して,海底画像からコースティクスを分離する新たな手法であるRecurrent Gaussian Splatting(RecGS)を提案する。水中ロボットによって撮影された一連の画像を用いて、3DGSを再帰的に構築し、各イテレーションでローパスフィルタによりコースティクスを分解する。実験では, 共同最適化, 2次元フィルタリング, 深層学習など, 様々な手法を解析・比較した。以上の結果から,本手法は海底からコースティクスを効果的に分離して視覚的外観を改善し,照明が不整合な他の問題にも適用できる可能性が示唆された。 Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering the performance when generalizing to real-world seafloor data with 3D structures. In this paper, we present a novel method Recurrent Gaussian Splatting (RecGS), which takes advantage of today's photorealistic 3D reconstruction technology, 3DGS, to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustic with low-pass filtering in each iteration. In the experiments, we analyze and compare with different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our method can effectively separate the caustic from the seafloor, improving the visual appearance, and can be potentially applied on more problems with inconsistent illumination.	翻訳日:2024-07-18 11:56:44 公開日:2024-07-16
# 計算木論理によるシーケンシャルプランニングにおけるMCTS説明可能性の実現 Enabling MCTS Explainability for Sequential Planning Through Computation Tree Logic ( http://arxiv.org/abs/2407.10820v2 ) ライセンス: Link先を確認	Ziyan An, Hendrik Baier, Abhishek Dubey, Ayan Mukhopadhyay, Meiyi Ma,	(参考訳) モンテカルロ木探索(MCTS)は、シーケンシャルな計画タスクのための最も有能なオンライン検索アルゴリズムの1つであり、資源配分やトランジット計画といった分野において重要な応用がある。実世界のデプロイメントのパフォーマンスは高いが、MCTSの本質的な複雑さは、技術的なバックグラウンドのないユーザにとって理解を困難にしている。本稿では,MCTSを交通ルーティングサービスに利用し,最適化された経路計画を構築するためにアルゴリズムを統合することを検討する。これらの計画は、様々な制約と要件を同時に満たし、現実の文脈でアルゴリズムの操作を説明するタスクをさらに複雑にする必要がある。この重要な研究ギャップに対処するために、MCTSのための新しい計算木論理ベースの説明器を導入する。私たちのフレームワークは、ユーザ定義の要件を言語テンプレートを使って厳密なロジック仕様に翻訳することから始まります。そこで,本論文では,MCTSアルゴリズムでトラバースされた状態と動作を検証する論理検証と定量的評価モジュールを組み込んだ。この分析の結果は、第2の言語テンプレートを使用して、人間可読な記述テキストに変換される。アプローチのユーザ満足度を82名を対象に調査した。その結果,説明的アプローチはユーザの嗜好において,他のベースラインよりも有意に優れていた。 Monte Carlo tree search (MCTS) is one of the most capable online search algorithms for sequential planning tasks, with significant applications in areas such as resource allocation and transit planning. Despite its strong performance in real-world deployment, the inherent complexity of MCTS makes it challenging to understand for users without technical background. This paper considers the use of MCTS in transportation routing services, where the algorithm is integrated to develop optimized route plans. These plans are required to meet a range of constraints and requirements simultaneously, further complicating the task of explaining the algorithm's operation in real-world contexts. To address this critical research gap, we introduce a novel computation tree logic-based explainer for MCTS. Our framework begins by taking user-defined requirements and translating them into rigorous logic specifications through the use of language templates. Then, our explainer incorporates a logic verification and quantitative evaluation module that validates the states and actions traversed by the MCTS algorithm. 
The outcomes of this analysis are then rendered into human-readable descriptive text using a second set of language templates. The user satisfaction of our approach was assessed through a survey with 82 participants. The results indicated that our explanatory approach significantly outperforms other baselines in user preference.	翻訳日:2024-07-18 11:42:46 公開日:2024-07-16
# 地形モデルを用いたマラリアベクター飼育地の検出 Detection of Malaria Vector Breeding Habitats using Topographic Models ( http://arxiv.org/abs/2011.13714v2 ) ライセンス: Link先を確認	Aishwarya Jadhav,	(参考訳) マラリアベクターの繁殖地として機能する停滞した水域の処理は、ほとんどのマラリア除去キャンペーンの基本的なステップである。しかし、大規模な水域の特定は高価であり、労働集約的で時間を要するため、資源が限られている国では困難である。水体を効率的に発見できる実用的なモデルは、現場労働者がスキャンする必要がある領域を大幅に減らし、限られた資源を標的にすることができる。そこで本研究では,可能でグローバルで高解像度なDEMデータに基づく実用的な地形モデルを提案する。ガーナのオプアシ地域を調査し,様々な地形特性が異なる水域に与える影響を調査し,水生生物形成に大きな影響を及ぼす特徴を明らかにする。複数のモデルの有効性をさらに評価する。我々の最良モデルは、衛星画像データを利用し、異なる設定で堅牢性を示すものでさえも、小さな水面の検出に地形変数を用いた以前の試みよりも大幅に優れています。 Treatment of stagnant water bodies that act as a breeding site for malarial vectors is a fundamental step in most malaria elimination campaigns. However, identification of such water bodies over large areas is expensive, labour-intensive and time-consuming and hence, challenging in countries with limited resources. Practical models that can efficiently locate water bodies can target the limited resources by greatly reducing the area that needs to be scanned by the field workers. To this end, we propose a practical topographic model based on easily available, global, high-resolution DEM data to predict locations of potential vector-breeding water sites. We surveyed the Obuasi region of Ghana to assess the impact of various topographic features on different types of water bodies and uncover the features that significantly influence the formation of aquatic habitats. We further evaluate the effectiveness of multiple models. Our best model significantly outperforms earlier attempts that employ topographic variables for detection of small water sites, even the ones that utilize additional satellite imagery data and demonstrates robustness across different settings.	翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
# 量子ヤン・ミルズ理論の公理 -- 1. ユークリッド公理(不完全) Axioms for Quantum Yang-Mills Theories -- 1. Euclidean Axioms (incomplete) ( http://arxiv.org/abs/2112.08575v6 ) ライセンス: Link先を確認	Min C. Lee,	(参考訳) 本稿では、シュウィンガー関数の概念を量子ヤン・ミルズ理論に拡張し、それらが満たすべき公理を提案する。この公理スキームの2つの主な特徴は、ゲージ不変な共位置シュウィンガー関数の存在を仮定し、それらにのみ鏡映正値性を課すことである。これはゲージ不変量のみが物理的意味を与えられるというゲージ理論の基本原理に従っている。 This paper extends the notion of Schwinger functions to quantum Yang-Mills theories and proposes the axioms they should satisfy. The two main features of this axiom scheme are that we assume existence of gauge-invariant co-located Schwinger functions and impose reflection positivity only on them. This is in accordance with the fundamental principle of gauge theories that only gauge-invariant quantities can be given physical meaning.	翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
# ガウス過程回帰による未知の力学系の形式的検証 Formal Verification of Unknown Dynamical Systems via Gaussian Process Regression ( http://arxiv.org/abs/2201.00655v2 ) ライセンス: Link先を確認	John Skovbekk, Luca Laurenti, Eric Frew, Morteza Lahijanian,	(参考訳) 安全クリティカルなシナリオにおける自律システムの活用には、システムのダイナミクスに影響を与える不確実性やブラックボックスコンポーネントの存在下での行動を検証する必要がある。本研究では,インプット・アウトプット・データセットから,時間論理仕様に対する非モデル化された力学と雑音測定による離散時間力学システムの検証を行うフレームワークを開発する。検証フレームワークはガウス過程(GP)回帰を用いてデータセットから未知のダイナミクスを学習し、連続空間システムを有限状態で不確実なマルコフ決定過程(MDP)として抽象化する。この抽象化は、再現可能なカーネルヒルベルト空間解析を用いて、GP回帰の誤差による不確かさを捉える空間の離散化と遷移確率間隔、および離散化によって引き起こされる不確かさに依存する。このフレームワークは、既存のモデルチェックツールを使用して、特定の時間論理仕様に対して不確実なMDP抽象化を検証する。ノイズ測定結果から基礎システムへの抽象化結果の拡張の正当性を確立した。フレームワークの計算複雑性は、データセットのサイズと離散抽象の多項式であることを示す。複雑性分析は、検証結果の品質と、より大きなデータセットとより詳細な抽象化を扱うための計算負荷との間のトレードオフを示している。最後に,線形・非線形・切替力学系を用いたいくつかのケーススタディにおいて,学習・検証フレームワークの有効性を実証した。 Leveraging autonomous systems in safety-critical scenarios requires verifying their behaviors in the presence of uncertainties and black-box components that influence the system dynamics. In this work, we develop a framework for verifying discrete-time dynamical systems with unmodelled dynamics and noisy measurements against temporal logic specifications from an input-output dataset. The verification framework employs Gaussian process (GP) regression to learn the unknown dynamics from the dataset and abstracts the continuous-space system as a finite-state, uncertain Markov decision process (MDP). This abstraction relies on space discretization and transition probability intervals that capture the uncertainty due to the error in GP regression by using reproducible kernel Hilbert space analysis as well as the uncertainty induced by discretization. The framework utilizes existing model checking tools for verification of the uncertain MDP abstraction against a given temporal logic specification. We establish the correctness of extending the verification results on the abstraction created from noisy measurements to the underlying system. 
We show that the computational complexity of the framework is polynomial in the size of the dataset and discrete abstraction. The complexity analysis illustrates a trade-off between the quality of the verification results and the computational burden to handle larger datasets and finer abstractions. Finally, we demonstrate the efficacy of our learning and verification framework on several case studies with linear, nonlinear, and switched dynamical systems.	翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
# Infinityにおける凸解析:アストラル空間入門 Convex Analysis at Infinity: An Introduction to Astral Space ( http://arxiv.org/abs/2205.03260v3 ) ライセンス: Link先を確認	Miroslav Dudík, Robert E. Schapire, Matus Telgarsky,	(参考訳) $\mathbb{R}^n$ 上の凸函数は、有限の最小化子を持つわけではない。本研究は,無限大におけるそのような最小化要因を理解するための理論を開発することを目的としている。無限遠点が加わったような$\mathbb{R}^n$のコンパクトな拡張であるアストラル空間について研究する。アストラル空間はできるだけ小さいように構成され、すべての線型函数が新しい空間へ連続的に拡張されることを保証する。アストラル空間は$\mathbb{R}^n$のすべてを含むが、これはベクトル空間ではない。しかし、凸性、共役性、および部分微分の概念の有用かつ有意義な拡張を可能にするには十分に構造化されている。我々はこれらの概念を開発し、アストラル空間上の凸関数の様々な特性を解析し、それらの最小化器の詳細な構造、連続性の正確な特徴付け、降下アルゴリズムの収束を含む。 Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still ensuring that all linear functions can be continuously extended to the new space. Although astral space includes all of $\mathbb{R}^n$, it is not a vector space, nor even a metric space. However, it is sufficiently well-structured to allow useful and meaningful extensions of concepts of convexity, conjugacy, and subdifferentials. We develop these concepts and analyze various properties of convex functions on astral space, including the detailed structure of their minimizers, exact characterizations of continuity, and convergence of descent algorithms.	翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
# 安全な強化学習によるコンティンジェンシー制約付き経済負荷配分 Contingency-constrained economic dispatch with safe reinforcement learning ( http://arxiv.org/abs/2205.06212v3 ) ライセンス: Link先を確認	Michael Eichelbeck, Hannah Markgraf, Matthias Althoff,	(参考訳) 将来の電力システムは、分散化された再生可能エネルギー源とエネルギー貯蔵システムを多く含むマイクログリッドに大きく依存する。この文脈における高い複雑さと不確実性により、従来の電力ディスパッチ戦略が実行不可能になる可能性がある。強化学習(RL)ベースのコントローラはこの課題に対処できるが、それ自体は安全保証を提供できず、実運用への展開が妨げられている。この制限を克服するために、経済負荷配分のための形式的に検証されたRLコントローラを提案する。従来の制約を、単独運転(アイランディング)のコンティンジェンシーを表す時間依存制約によって拡張する。コンティンジェンシー制約は集合ベースの後方到達可能性解析を用いて計算し、RLエージェントの行動は安全層を介して検証する。安全でない行動は安全な行動空間に射影され、その際に計算効率のために制約付きゾノトープによる集合表現を活用する。本手法は実世界の実測値を用いた住宅のユースケースで実証された。 Future power systems will rely heavily on micro grids with a high share of decentralised renewable energy sources and energy storage systems. The high complexity and uncertainty in this context might make conventional power dispatch strategies infeasible. Reinforcement learning (RL) based controllers can address this challenge but cannot themselves provide safety guarantees, preventing their deployment in practice. To overcome this limitation, we propose a formally validated RL controller for economic dispatch. We extend conventional constraints by a time-dependent constraint encoding the islanding contingency. The contingency constraint is computed using set-based backwards reachability analysis and actions of the RL agent are verified through a safety layer. Unsafe actions are projected into the safe action space while leveraging constrained zonotope set representations for computational efficiency. The developed approach is demonstrated on a residential use case using real-world measurements.	翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
# インタラクティブな固定効果を用いた線形多次元回帰 Linear multidimensional regression with interactive fixed-effects ( http://arxiv.org/abs/2209.11691v3 ) ライセンス: Link先を確認	Hugo Freeman,	(参考訳) 本稿では,3次元以上の多次元パネルデータに対する線形かつ付加的に分離可能なモデルについて検討する。 2つのアプローチは、観測された共変量に対する係数を推定する際に、これらの観測されていないインタラクティブな固定効果を考慮に入れていると考えられる。第一に、モデルは標準的な2次元パネルの枠組みに埋め込まれており、Bai (2009) における因子構造法がモデルパラメータの一貫した推定に繋がる制約を形成するが、収束速度は遅い。第2のアプローチでは、カーネル重み付き固定効果法を開発し、この問題の多次元的性質に対してより堅牢であり、特定の条件下での一貫性のパラメトリック速度を達成することができる。理論的な結果とシミュレーションは、インタラクティブな固定効果項の構造が知られている場合の標準的な2次元パネル法にいくつかの利点を示す一方で、カーネル重み付け法がこの構造を知らずにどのように機能するかを強調している。ビールの需要弾力性を推定する手法が提案されている。 This paper studies a linear and additively separable model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. Two approaches are considered to account for these unobserved interactive fixed-effects when estimating coefficients on the observed covariates. First, the model is embedded within the standard two dimensional panel framework and restrictions are formed under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters, but at slow rates of convergence. The second approach develops a kernel weighted fixed-effects method that is more robust to the multidimensional nature of the problem and can achieve the parametric rate of consistency under certain conditions. Theoretical results and simulations show some benefits to standard two-dimensional panel methods when the structure of the interactive fixed-effect term is known, but also highlight how the kernel weighted method performs well without knowledge of this structure. The methods are implemented to estimate the demand elasticity for beer.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# 量子シミュレーションのための高次積公式の改良 Greatly improved higher-order product formulae for quantum simulation ( http://arxiv.org/abs/2210.15817v2 ) ライセンス: Link先を確認	Mauro E. S. Morales, Pedro C. S. Costa, Giacomo Pantaleoni, Daniel K. Burgarth, Yuval R. Sanders, Dominic W. Berry,	(参考訳) ハミルトン進化のシミュレーションのための量子アルゴリズムは、しばしば積公式に基づいている。スズキのフラクタル法は、任意の高次積公式を見つける体系的な方法を与えるが、多くの指数関数をもたらす。一方、指数関数の少ない積公式は、同時非線形方程式の数値解によって見つけることができる。また、カーネルを繰り返し、プロセッサをシミュレーションの開始と終了にのみ適用する必要があるような処理によって、長時間シミュレーションのコストを削減することもできる。本研究では,8位と10位の両方の新しい積公式を数千個発見し,これらの式と先行文献の多くの公式を数値的に検証した。異なる長さと異なる順序の積公式を適切に比較する方法を提供する。システムパラメータ$T$ (time) と$\epsilon$ (allowable error) の8桁の精度で、他のテスト済み製品式よりも優れた性能を持つ8階目の製品公式が発見された。これには、量子アルゴリズムで使用されるパラメータの最も合理的な組み合わせが含まれる。 Quantum algorithms for simulation of Hamiltonian evolution are often based on product formulae. The fractal method of Suzuki gives a systematic way to find arbitrarily high-order product formulae, but results in a large number of exponentials. On the other hand, product formulae with fewer exponentials can be found by numerical solution of simultaneous nonlinear equations. It is also possible to reduce the cost of long-time simulations by processing, where a kernel is repeated and a processor need only be applied at the beginning and end of the simulation. In this work, we found thousands of new product formulae of both 8th and 10th order, and numerically tested these formulae, together with many formulae from prior literature. We provide methods to fairly compare product formulae of different lengths and different orders. We have found a new 8th order processed product formula with exceptional performance, that outperforms all other tested product formulae for about eight orders of magnitude in system parameters $T$ (time) and $\epsilon$ (allowable error). That includes most reasonable combinations of parameters to be used in quantum algorithms.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# コンセンサストラッキングとアグリゲーションゲームにおける結合制約による差分プライバシと収束精度の確保 Ensure Differential Privacy and Convergence Accuracy in Consensus Tracking and Aggregative Games with Coupling Constraints ( http://arxiv.org/abs/2210.16395v4 ) ライセンス: Link先を確認	Yongqiang Wang,	(参考訳) 共有結合制約を持つ完全分散集約ゲームに対する差分プライバシに対処する。一般化ナッシュ平衡(GNE)探索機構と微分プライバシ雑音注入機構を共同設計することにより、GNEへの証明可能な収束と厳密なエプシロン差分プライバシーを両立できる最初のGNE探索アルゴリズムを提案する。共同設計の基盤として,我々の知る限りでは達成されていない正確な追跡性能を維持しつつ,厳密なエプシロン差分プライバシーを実現するための新たなコンセンサス追跡アルゴリズムを提案する。収束解析を容易にするために,多数の最適化と変分問題の中核に位置する確率論的に摂動された非定常不動点反復過程に対する一般化結果も確立する。数値シミュレーションの結果,提案手法の有効性が確認された。 We address differential privacy for fully distributed aggregative games with shared coupling constraints. By co-designing the generalized Nash equilibrium (GNE) seeking mechanism and the differential-privacy noise injection mechanism, we propose the first GNE seeking algorithm that can ensure both provable convergence to the GNE and rigorous epsilon-differential privacy, even with the number of iterations tending to infinity. As a basis of the co-design, we also propose a new consensus-tracking algorithm that can achieve rigorous epsilon-differential privacy while maintaining accurate tracking performance, which, to our knowledge, has not been achieved before. To facilitate the convergence analysis, we also establish a general convergence result for stochastically-perturbed nonstationary fixed-point iteration processes, which lie at the core of numerous optimization and variational problems. Numerical simulation results confirm the effectiveness of the proposed approach.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# グローバルモーメント初期化による敵攻撃の伝達性向上 Boosting the Transferability of Adversarial Attacks with Global Momentum Initialization ( http://arxiv.org/abs/2211.11236v3 ) ライセンス: Link先を確認	Jiafeng Wang, Zhaoyu Chen, Kaixun Jiang, Dingkang Yang, Lingyi Hong, Pinxue Guo, Haijing Guo, Wenqiang Zhang,	(参考訳) ディープニューラルネットワーク(Deep Neural Networks, DNN)は、敵対的な例に対して脆弱である。同時に、敵の例はモデル間での転送可能性を示し、実用的なブラックボックス攻撃を可能にした。しかし、既存の手法では所望の転送攻撃性能を達成できない。本研究では、勾配最適化と整合性に着目し、勾配除去現象と局所運動量最適ジレンマを解析する。これらの課題に対処するために,Global Momentum Initialization (GI)を導入し,勾配除去を緩和するためのグローバルな運動量知識を提供する。具体的には、攻撃前に勾配前収束を行い、この段階でグローバル検索を行う。 GIは既存の転送方式とシームレスに統合され、最先端の防御機構により平均6.4%の転送攻撃の成功率を大幅に向上させる。最終的に、GIは画像とビデオの両方の攻撃領域で強力な転送可能性を示す。特に、画像領域における高度な防御方法を攻撃する場合、平均的な攻撃成功率は95.4%に達する。コードは$\href{https://github.com/Omenzychen/Global-Momentum-Initialization}{https://github.com/Omenzychen/Global-Momentum-Initialization}$で入手できる。 Deep Neural Networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to the benign inputs. Simultaneously, adversarial examples exhibit transferability across models, enabling practical black-box attacks. However, existing methods are still incapable of achieving the desired transfer attack performance. In this work, focusing on gradient optimization and consistency, we analyse the gradient elimination phenomenon as well as the local momentum optimum dilemma. To tackle these challenges, we introduce Global Momentum Initialization (GI), providing global momentum knowledge to mitigate gradient elimination. Specifically, we perform gradient pre-convergence before the attack and a global search during this stage. GI seamlessly integrates with existing transfer methods, significantly improving the success rate of transfer attacks by an average of 6.4% under various advanced defense mechanisms compared to the state-of-the-art method. Ultimately, GI demonstrates strong transferability in both image and video attack domains. 
Particularly, when attacking advanced defense methods in the image domain, it achieves an average attack success rate of 95.4%. The code is available at https://github.com/Omenzychen/Global-Momentum-Initialization.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# 連続学習のための潜在スペクトル規則化 Latent Spectral Regularization for Continual Learning ( http://arxiv.org/abs/2301.03345v4 ) ライセンス: Link先を確認	Emanuele Frascaroli, Riccardo Benaglia, Matteo Boschini, Luca Moschella, Cosimo Fiorini, Emanuele Rodolà, Simone Calderara,	(参考訳) 生物の知性は、新しい知識が生涯にわたって収集されるにつれて有機的に成長するが、ニューラルネットワークは、変化するトレーニングデータ分布に直面すると破滅的なことを忘れる。リハーサルベースの連続学習(CL)アプローチは、この制限を克服するための汎用的で信頼性の高いソリューションとして確立されているが、突然の入力障害とメモリ制約は、それらの予測の一貫性を変えることが知られている。本研究では,学習者の潜伏空間の幾何学的特徴を調べた結果,異なるクラスにおけるリプレイされたデータポイントが次第に混在し,分類に干渉していることが判明した。そこで我々は,ラプラシアンスペクトルの弱要求を強制する幾何正則化器を提案し,分割挙動を推し進める。提案手法はCaSpeR-IL(Continuous Spectral Regularizer for Incremental Learning)と呼ばれ,任意のリハーサルベースのCLアプローチと簡単に組み合わせることができる。 While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face a changing training data distribution. Rehearsal-based Continual Learning (CL) approaches have been established as a versatile and reliable solution to overcome this limitation; however, sudden input disruptions and memory constraints are known to alter the consistency of their predictions. We study this phenomenon by investigating the geometric characteristics of the learner's latent space and find that replayed data points of different classes increasingly mix up, interfering with classification. Hence, we propose a geometric regularizer that enforces weak requirements on the Laplacian spectrum of the latent space, promoting a partitioning behavior. Our proposal, called Continual Spectral Regularizer for Incremental Learning (CaSpeR-IL), can be easily combined with any rehearsal-based CL approach and improves the performance of SOTA methods on standard benchmarks.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# ODIM:過少適合させた生成モデルの尤度による外れ値検出 ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models ( http://arxiv.org/abs/2301.04257v2 ) ライセンス: Link先を確認	Dongha Kim, Jaesung Hwang, Jongjin Lee, Kunwoong Kim, Yongdai Kim,	(参考訳) 教師なし外れ値検出(UOD)問題とは、インライアと外れ値の両方を含む訓練データから、どちらに関するラベル情報も用いずにインライアを識別するタスクである。完全に訓練された尤度ベースの深層生成モデル(DGM)を用いると、インライアと外れ値の識別性能が低くなることが多いと広く認識されている。本研究は、DGMを注意深く過少適合させれば、尤度そのものがUODタスクでインライアを識別する強力な証拠になりうると主張する。我々のアプローチは、inlier-memorization(IM)効果と呼ぶ新しい観察から始まる。すなわち、外れ値を含むデータで深層生成モデルを訓練すると、モデルは外れ値より先にインライアを記憶する。この知見に基づき, IM効果による外れ値検出法(ODIM)を開発した。注目すべきことに、ODIMは数回の更新しか必要とせず、他の深層学習ベースのアルゴリズムより少なくとも数十倍高速で計算効率が高い。また、ODIMは、表、画像、テキストデータを含むデータの種類にかかわらず、外れ値を良好に除去する。 The unsupervised outlier detection (UOD) problem refers to a task to identify inliers given training data which contain outliers as well as inliers, without any labeled information about inliers and outliers. It has been widely recognized that using fully-trained likelihood-based deep generative models (DGMs) often results in poor performance in distinguishing inliers from outliers. In this study, we claim that the likelihood itself could serve as powerful evidence for identifying inliers in UOD tasks, provided that DGMs are carefully under-fitted. Our approach begins with a novel observation called the inlier-memorization (IM) effect: when training a deep generative model with data including outliers, the model initially memorizes inliers before outliers. Based on this finding, we develop a new method called the outlier detection via the IM effect (ODIM). Remarkably, the ODIM requires only a few updates, making it computationally efficient, at least tens of times faster than other deep-learning-based algorithms. Also, the ODIM filters out outliers excellently, regardless of the data type, including tabular, image, and text data. 
To validate the superiority and efficiency of our method, we provide extensive empirical analyses on close to 60 datasets.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# NeSIG: 計画問題生成のためのニューロシンボリックな学習方法 NeSIG: A Neuro-Symbolic Method for Learning to Generate Planning Problems ( http://arxiv.org/abs/2301.10280v2 ) ライセンス: Link先を確認	Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares,	(参考訳) 自動計画(Automated Planning)の分野では、マシンラーニングのトレーニングデータや、計画競合のベンチマークとして使用されるような、特定のドメインからの計画上の問題セットが必要になることが多い。ほとんどの場合、これらの問題は手動かドメイン固有のジェネレータによって生成され、人間の設計者に負担がかかる。本稿では,NeSIGを提案する。この知識を最大限に活用するために,有効で多種多様で解決が難しい計画問題を自動的に生成する,ドメインに依存しない最初の手法を提案する。マルコフ決定プロセスとして問題生成を定式化し、Deep Reinforcement Learning を用いて2つの生成ポリシーを訓練し、所望の特性の問題を発生させる。我々は3つの古典的ドメインについて実験を行い、手工芸のドメイン固有のインスタンスジェネレータと様々なアブリケーションに対するアプローチを比較した。結果は、NeSIGがドメイン固有のジェネレータよりもはるかに困難(幾何平均の15.5倍)な、有効で多様な問題を自動生成できることを示している。さらに、トレーニング中に見られる問題よりも大きな問題に一般化することができる。 In the field of Automated Planning there is often the need for a set of planning problems from a particular domain, e.g., to be used as training data for Machine Learning or as benchmarks in planning competitions. In most cases, these problems are created either by hand or by a domain-specific generator, putting a burden on the human designers. In this paper we propose NeSIG, to the best of our knowledge the first domain-independent method for automatically generating planning problems that are valid, diverse and difficult to solve. We formulate problem generation as a Markov Decision Process and train two generative policies with Deep Reinforcement Learning to generate problems with the desired properties. We conduct experiments on three classical domains, comparing our approach against handcrafted, domain-specific instance generators and various ablations. Results show NeSIG is able to automatically generate valid and diverse problems of much greater difficulty (15.5 times more on geometric average) than domain-specific generators, while simultaneously reducing human effort when compared to them. Additionally, it can generalize to larger problems than those seen during training.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# 深層学習を用いた医用画像セグメンテーションのための、マスク処理を伴う追加ピクセル補間の評価 Evaluation of Extra Pixel Interpolation with Mask Processing for Medical Image Segmentation with Deep Learning ( http://arxiv.org/abs/2302.11522v4 ) ライセンス: Link先を確認	Olivier Rukundo,	(参考訳) 現在のマスク処理操作は、最近傍(NN)補間のような追加ピクセルを生成しない補間アルゴリズムに依存しており、バイキュービック(BIC)やバイリニア(BIL)補間のような追加ピクセルを生成するアルゴリズムとは対照的である。以前の研究では,NNベースのマスク処理に代わる手法を提案し,それが深層学習の訓練結果に及ぼす影響を評価した。本研究では,BICベースの画像・マスク処理と、BICとNNを併用した画像・マスク処理のそれぞれについて、NNベースの画像・マスク処理と比較した場合の効果を評価した。評価の結果、BIC-BICモデルは8.9578 %(画像サイズ256 x 256)、1.0496 %(画像サイズ384 x 384)、NN-NNネットワークは8.3127 %(画像サイズ256 x 256)、0.2887 %(画像サイズ384 x 384)であった。 Current mask processing operations rely on interpolation algorithms that do not produce extra pixels, such as nearest neighbor (NN) interpolation, as opposed to algorithms that do produce extra pixels, like bicubic (BIC) or bilinear (BIL) interpolation. In a previous study, the author proposed an alternative approach to NN-based mask processing and evaluated its effects on deep learning training outcomes. In this study, the author evaluated the effects of both BIC-based image and mask processing and BIC-and-NN-based image and mask processing versus NN-based image and mask processing. The evaluation revealed that the BIC-BIC model/network was an 8.9578 % (with image size 256 x 256) and a 1.0496 % (with image size 384 x 384) increase of the NN-NN network compared to the NN-BIC network which was an 8.3127 % (with image size 256 x 256) and a 0.2887 % (with image size 384 x 384) increase of the NN-NN network.	翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
# CompoDiff:Versatileの合成画像検索と遅延拡散 CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion ( http://arxiv.org/abs/2303.11916v4 ) ライセンス: Link先を確認	Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun,	(参考訳) 本稿では,ゼロショット合成画像検索(ZS-CIR)を遅延拡散で解くための新しい拡散モデルCompoDiffを提案する。また,CIRモデルをトレーニングするための1880万の参照画像,条件,および対応するターゲット画像三重項を含む,SynthTriplets18Mという新しい合成データセットも紹介した。 CompoDiffとSynthTriplets18Mは、小さなデータセットスケールと限られた条件による一般化性の低下など、従来のCIRアプローチの不足に対処している。 CompoDiffは、FashionIQ、CIRR、CIRCO、GeneCISを含む4つのZS-CIRベンチマークで新たな最先端を達成しているだけでなく、ネガティブテキストやイメージマスク条件などのさまざまな条件を受け入れることで、より汎用的で制御可能なCIRを実現している。 CompoDiffはまた、テキストと画像クエリ間の条件強度の制御性と、既存のCIRメソッドでは利用できない推論速度と性能のトレードオフも示す。コードとデータセットはhttps://github.com/navervision/CompoDiffで公開されている。 This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion. This paper also introduces a new synthetic dataset, named SynthTriplets18M, with 18.8 million reference images, conditions, and corresponding target image triplets to train CIR models. CompoDiff and SynthTriplets18M tackle the shortages of the previous CIR approaches, such as poor generalizability due to the small dataset scale and the limited types of conditions. CompoDiff not only achieves a new state-of-the-art on four ZS-CIR benchmarks, including FashionIQ, CIRR, CIRCO, and GeneCIS, but also enables a more versatile and controllable CIR by accepting various conditions, such as negative text, and image mask conditions. CompoDiff also shows the controllability of the condition strength between text and image queries and the trade-off between inference speed and performance, which are unavailable with existing CIR methods. The code and dataset are available at https://github.com/navervision/CompoDiff	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# LMExplainer: 知識の基盤と言語モデル LMExplainer: Grounding Knowledge and Explaining Language Models ( http://arxiv.org/abs/2303.16537v3 ) ライセンス: Link先を確認	Zichen Chen, Jianda Chen, Yuanyuan Chen, Han Yu, Ambuj K Singh, Misha Sra,	(参考訳) GPT-4のような言語モデル(LM)は、AIアプリケーションにおいて重要であるが、不透明な意思決定プロセスは、特に安全クリティカルな領域において、ユーザの信頼を低下させる。 LMExplainerは,人間の直感的,理解可能な説明を通じて,LMの推論過程を明らかにする新しい知識基盤説明器である。大規模知識グラフ(KG)を用いたグラフアテンションネットワーク(GAT)を活用することで、LMExplainerは推論空間を正確に狭め、最も関連する知識にフォーカスするだけでなく、幻覚を減らし、解釈可能性を高めるために、構造化された検証可能な知識にその推論を基礎付ける。 LMExplainerは、透明性を高め、意思決定プロセスを合理化するために、人間の理解可能な説明を効果的に生成する。さらに、デバッグを説明に組み込むことで、開発の観点からLMを改善する専門的な提案を提供する。したがって、LMExplainerは、LMをユーザにとってよりアクセスしやすく、理解しやすいものにするための拡張である。我々は、CommonsenseQAやOpenBookQAといったベンチマークデータセット上でLMExplainerを評価し、既存のメソッドよりも優れていることを示す。 LMExplainerが生成した説明と他のモデルの説明を比較することで、我々のアプローチは推論プロセスのより包括的で明確な説明を提供することを示す。 LMExplainerは、LMの内部動作をより深く理解し、より信頼性が高く、透明で、公平なAIに向かっている。 Language models (LMs) like GPT-4 are important in AI applications, but their opaque decision-making process reduces user trust, especially in safety-critical areas. We introduce LMExplainer, a novel knowledge-grounded explainer that clarifies the reasoning process of LMs through intuitive, human-understandable explanations. By leveraging a graph attention network (GAT) with a large-scale knowledge graph (KG), LMExplainer not only precisely narrows the reasoning space to focus on the most relevant knowledge but also grounds its reasoning in structured, verifiable knowledge to reduce hallucinations and enhance interpretability. LMExplainer effectively generates human-understandable explanations to enhance transparency and streamline the decision-making process. Additionally, by incorporating debugging into the explanation, it offers expertise suggestions that improve LMs from a developmental perspective. Thus, LMExplainer stands as an enhancement in making LMs more accessible and understandable to users. 
We evaluate LMExplainer on benchmark datasets such as CommonsenseQA and OpenBookQA, demonstrating that it outperforms most existing methods. By comparing the explanations generated by LMExplainer with those of other models, we show that our approach offers more comprehensive and clearer explanations of the reasoning process. LMExplainer provides a deeper understanding of the inner workings of LMs, advancing towards more reliable, transparent, and equitable AI.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 全距離イジングモデルによる$N$ spin-$1/2$システムにおける量子絡み合い、幾何学的および動的外観の相補性 Complementarity between quantum entanglement, geometrical and dynamical appearances in $N$ spin-$1/2$ system under all-range Ising model ( http://arxiv.org/abs/2304.05278v2 ) ライセンス: Link先を確認	Jamal Elfakir, Brahim Amghar, Abdallah Slaoui, Mohammed Daoud,	(参考訳) 幾何学科学の成長に伴い、現代の幾何学によって情報の世界を探索する手法を含め、幾何学的・位相的・動的特性と量子的絡み合いとの間には謎の曖昧な関係が常にある。幾何学は距離や曲率などの要素間の相互関係を研究するため、積分可能量子系の実用的で理解可能な記述をもたらす強力な構造を持つ情報科学を提供する。ここでは、これらの構造を全範囲イジングモデルの下で$N$個の相互作用するスピン-$1/2$の物理系で探索する。系の力学により、関連する量子状態空間を定義するフビニ・スタディ計量を決定する。ガウス・ボンネットの定理の範囲内でガウス曲率を適用することで、ダンベル型構造と球面位相の両方を持つ閉2次元多様体上でその力学が生じることを証明した。系の進化過程に現れる幾何学的位相と位相的位相を十分に議論する。その後、時間-最適進化を達成して量子ブラキストクロン問題を解く。系全体を2つのスピン-$1/2$系に制限することで、関連する絡み合いを2つの観点から調べる。一つ目は幾何学的な性質であり、その絡み合いレベルがフビニ・スタディ計量、ガウス曲率、幾何学的位相などの導出した幾何学的構造にどのように影響するかを探求する。 2つ目は動的性質であり、進化速度と関連するフビニ・スタディ距離に対する絡み合い効果に対処する。さらに、絡み合いの度合いにより、量子ブラキストクロン問題を解く。 With the growth of geometric science, including the methods of exploring the world of information by means of modern geometry, there has always been a mysterious and fascinating ambiguous link between geometric, topological and dynamical characteristics with quantum entanglement. Since geometry studies the interrelations between elements such as distance and curvature, it provides the information sciences with powerful structures that yield practically useful and understandable descriptions of integrable quantum systems. We explore here these structures in a physical system of $N$ interacting spin-$1/2$ under all-range Ising model. By performing the system dynamics, we determine the Fubini-Study metric defining the relevant quantum state space. Applying Gaussian curvature within the scope of the Gauss-Bonnet theorem, we proved that the dynamics happens on a closed two-dimensional manifold having both a dumbbell-shape structure and a spherical topology. The geometric and topological phases appearing during the system evolution processes are sufficiently discussed. 
Subsequently, we resolve the quantum brachistochrone problem by achieving the time-optimal evolution. By restricting the whole system to a two spin-$1/2$ system, we investigate the relevant entanglement from two viewpoints: the first is of geometric nature and explores how the entanglement level affects derived geometric structures such as the Fubini-Study metric, the Gaussian curvature, and the geometric phase. The second is of dynamic nature and addresses the entanglement effect on the evolution speed and the related Fubini-Study distance. Further, depending on the degree of entanglement, we resolve the quantum brachistochrone problem.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 基礎モデルに基づくシステム設計のための参照アーキテクチャ A Reference Architecture for Designing Foundation Model based Systems ( http://arxiv.org/abs/2304.11090v5 ) ライセンス: Link先を確認	Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Jon Whittle,	(参考訳) ChatGPT、Gemini、その他の大規模言語モデルのリリースは、基礎モデルに大きな関心を集めている。ファンデーションモデルが将来のAIシステムの基本的なビルディングブロックになる、という広いコンセンサスがある。しかし、アーキテクチャ設計に関する体系的なガイダンスが不足している。特に、ファンデーションモデルの急速な機能向上は、最終的にはAIシステムの他のコンポーネントを吸収し、アーキテクチャ設計における境界の移動とインターフェースの進化の課題を提起する。さらに、基礎モデルをAIシステムに組み込むことは、不透明な性質と急速に進歩するインテリジェンスのために、責任と安全性に関する重要な懸念を提起する。これらの課題に対処するため,本論文ではまず,基礎モデル時代におけるAIシステムのアーキテクチャ進化を,"基礎モデル・アズ・ア・コネクタ"から"基礎モデル・アズ・ア・モノリシックアーキテクチャ"への移行として示す。そこで本論文では,設計上の重要な決定事項を特定し,基礎モデルに基づくシステム設計のためのパターン指向参照アーキテクチャを提案する。このパターンは、関連するリスクに対処しながら、ファンデーションモデルの可能性を可能にする。 The release of ChatGPT, Gemini, and other large language models has drawn huge interest in foundation models. There is a broad consensus that foundation models will be the fundamental building blocks for future AI systems. However, there is a lack of systematic guidance on the architecture design. Particularly, the rapidly growing capabilities of foundation models can eventually absorb other components of AI systems, posing challenges of moving boundary and interface evolution in architecture design. Furthermore, incorporating foundation models into AI systems raises significant concerns about responsible and safe AI due to their opaque nature and rapidly advancing intelligence. To address these challenges, the paper first presents an architecture evolution of AI systems in the era of foundation models, transitioning from "foundation-model-as-a-connector" to "foundation-model-as-a-monolithic architecture". The paper then identifies key design decisions and proposes a pattern-oriented reference architecture for designing responsible foundation-model-based systems. The patterns can enable the potential of foundation models while addressing the associated risks.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 量子オブジェクト間の変換のキャラクタリゼーション、量子特性の「完全性」、固定因数順序のない変換 Characterising transformations between quantum objects, 'completeness' of quantum properties, and transformations without a fixed causal order ( http://arxiv.org/abs/2305.01247v2 ) ライセンス: Link先を確認	Simon Milz, Marco Túlio Quintino,	(参考訳) 量子力学における多くの基本的および鍵的対象は、特定のアフィン/線型空間間の線型写像である。この構造には、状態、測定、チャネル、機器、非シグナリングチャネル、メモリを持つチャネルといった基本的な量子要素や、スーパーチャネル、量子コム、n時間プロセス、テスタ、プロセス行列といった特定の因果順序を尊重しない高次演算が含まれる。線形および半定値制約の観点でそれらの構造特性を導出し特徴付けることは、基礎的関連性だけでなく、量子オブジェクトの集合に対する数値最適化を可能にし、異なる概念とオブジェクト間のより単純な接続を可能にする上で重要な役割を担っている。ここでは、これらのプロパティを直接的で使いやすい方法で推論する一般的なフレームワークを提供する。現実的な量子力学的考察によって導かれるが、一般線型/ファイン空間間の写像に解析を拡張し、それらの性質を導出し、量子論によって明示的に禁止されていないが、まだあまり研究されていない集合を解析する可能性を開く。これらの結果と合わせて、量子力学などにおいて線形変換の特徴付けを必要とする全てのタスクに対して、汎用的で容易に適用可能なツールが得られる。提案手法の適用例として、高次量子変換において不定因性の存在が自然に出現し、入力空間の部分のみに非自明に振る舞う場合の「完全」な意味での性質を保たなければならない写像のキャラクタリゼーションのための簡単な戦略について論じる。 Many fundamental and key objects in quantum mechanics are linear mappings between particular affine/linear spaces. This structure includes basic quantum elements such as states, measurements, channels, instruments, non-signalling channels and channels with memory, and also higher-order operations such as superchannels, quantum combs, n-time processes, testers, and process matrices which may not respect a definite causal order. Deducing and characterising their structural properties in terms of linear and semidefinite constraints is not only of foundational relevance, but plays an important role in enabling the numerical optimisation over sets of quantum objects and allowing simpler connections between different concepts and objects. Here, we provide a general framework to deduce these properties in a direct and easy to use way. 
While primarily guided by practical quantum mechanical considerations, we also extend our analysis to mappings between general linear/affine spaces and derive their properties, opening the possibility for analysing sets which are not explicitly forbidden by quantum theory, but are still not much explored. Together, these results yield versatile and readily applicable tools for all tasks that require the characterisation of linear transformations, in quantum mechanics and beyond. As an application of our methods, we discuss how the existence of indefinite causality naturally emerges in higher-order quantum transformations and provide a simple strategy for the characterisation of mappings that have to preserve properties in a 'complete' sense, i.e., when acting non-trivially only on parts of an input space.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 室内のエレファント:自然言語処理研究におけるビッグテックの存在分析 The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research ( http://arxiv.org/abs/2305.02797v4 ) ライセンス: Link先を確認	Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karën Fort,	(参考訳) 自然言語処理(NLP)の深層学習手法の最近の進歩は、新たなビジネス機会を生み出し、NLP研究を産業発展に欠かせないものにしている。産業は政府や大学とともにNLP分野の大きなプレーヤーの1つであるため、産業が研究に与える影響を追跡することが重要である。本研究では,NLPコミュニティにおける産業の存在を時間とともに定量化し,特徴付けることを目的とする。 78,187件のNLP出版物の包括的なメタデータと、NLP出版物の著者701人の履歴書からなるコーパスを用いて,90年代初め以降の分野における業界の存在を探求する。 NLP作家の業界における存在感は、過去5年間で急激な増加(2017年から2022年までの180%)を前に着実に推移している。いくつかの企業は出版物の大半を占め、助成金やインターンシップを通じて学術研究者に資金を提供している。本研究は,自然言語処理研究における産業の存在と影響が重要かつ急速に成長していることを示している。この研究は、この分野における産業の影響の透明性を高めることを求めている。 Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. Because industry is one of the big players in the field of NLP, together with governments and universities, it is important to track its influence on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors was steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of the industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 自由電子干渉計を用いたコヒーレント増幅超高速イメージング Coherently amplified ultrafast imaging using a free-electron interferometer ( http://arxiv.org/abs/2305.04877v2 ) ライセンス: Link先を確認	Tomer Bucher, Harel Nahari, Hanan Herzig Sheinfux, Ron Ruimy, Arthur Niedermayr, Raphael Dahan, Qinghui Yan, Yuval Adiv, Michael Yannai, Jialin Chen, Yaniv Kurman, Sang Tae Park, Daniel J. Masiel, Eli Janzen, James H. Edgar, Fabrizio Carbone, Guy Bartal, Shai Tsesses, Frank H. L. Koppens, Giovanni Maria Vanacore, Ido Kaminer,	(参考訳) 高い空間分解能と時間分解能で同時に物質とその分極子の低エネルギー非平衡ダイナミクスにアクセスすることは、近年の電子顕微鏡の大胆なフロンティアである。主な課題の1つは、振幅と位相情報を同時に切り離しながら非常に弱い信号を取得する能力である。本稿では、光誘起電子変調に基づく顕微鏡手法であるFree-Electron Ramsey Imaging(FERI)を提案する。六方晶窒化ホウ素膜から作製したマイクロドラムの時間・空間・位相同時分解測定を行い、2次元偏光子波束のサブサイクルダイナミクスを可視化した。位相分解測定により、ポラリトン波面上の渦反渦特異点と、定常波の振幅プロファイルを模倣する走行波の興味深い現象が明らかになった。実験では, 従来の電子近接場イメージングと比較して20倍のコヒーレント増幅を行い, 数kV/mの磁場振幅に対応するピーク場強度を ~W/cm2 の順に解消した。その結果、我々の研究は、生体試料や量子材料を時空間電子顕微鏡で観察する方法を練り上げました。 Accessing the low-energy non-equilibrium dynamics of materials and their polaritons with simultaneous high spatial and temporal resolution has been a bold frontier of electron microscopy in recent years. One of the main challenges lies in the ability to retrieve extremely weak signals while simultaneously disentangling amplitude and phase information. Here, we present Free-Electron Ramsey Imaging (FERI), a microscopy approach based on light-induced electron modulation that enables coherent amplification of optical near-fields in electron imaging. We provide simultaneous time-, space-, and phase-resolved measurements of a micro-drum made from a hexagonal boron nitride membrane visualizing the sub-cycle dynamics of 2D polariton wavepackets therein. The phase-resolved measurements reveal vortex-anti-vortex singularities on the polariton wavefronts, together with an intriguing phenomenon of a traveling wave mimicking the amplitude profile of a standing wave. 
Our experiments show a 20-fold coherent amplification of the near-field signal compared to conventional electron near-field imaging, resolving peak field intensities on the order of ~W/cm$^2$, corresponding to field amplitudes of a few kV/m. As a result, our work paves the way for spatio-temporal electron microscopy of biological specimens and quantum materials, exciting yet delicate samples that are currently difficult to investigate.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 大規模言語モデルの時代における関係抽出の再検討 Revisiting Relation Extraction in the era of Large Language Models ( http://arxiv.org/abs/2305.05003v2 ) ライセンス: Link先を確認	Somin Wadhwa, Silvio Amir, Byron C. Wallace,	(参考訳) 関係抽出(RE)は、テキストからエンティティ間の意味的関係を推測する中核的なNLPタスクである。標準教師付きRE技術は、エンティティスパンを構成するトークンをタグ付けし、それらの関係を予測するためのトレーニングモジュールを提供する。最近の研究は、この問題を sequence-to-sequence タスクとして扱い、入力に条件付けされたターゲット文字列としてエンティティ間の関係を線形化する。ここでは、従来の作業よりも大きい言語モデル(GPT-3とFlan-T5)を用いて、標準的なREタスクの性能を様々なレベルの監督下で評価し、このアプローチの限界を推し進める。我々は、正確なマッチングに頼る代わりに、人間による評価を行うことにより、REに対する生成的アプローチを評価することに固有の問題に対処する。改良された評価では,(1) GPT-3 を用いた few-shot プロンプトは SOTA に近い性能,すなわち,既存の完全教師付きモデルとほぼ同等である。(2) Flan-T5 は,few-shot 設定ではあまり機能しないが,チェーン・オブ・ソート(CoT) スタイルの説明(GPT-3 で生成)でそれを監視・微調整することで SOTA の結果が得られる。私たちはこのモデルをREタスクの新しいベースラインとしてリリースします。 Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# ParamNet: 高速マルチツーワンステン正規化のための動的パラメータネットワーク ParamNet: A Dynamic Parameter Network for Fast Multi-to-One Stain Normalization ( http://arxiv.org/abs/2305.06511v3 ) ライセンス: Link先を確認	Hongtao Kang, Die Luo, Li Chen, Junbo Hu, Tingwei Quan, Shaoqun Zeng, Shenghua Cheng, Xiuli Liu,	(参考訳) 実際には、デジタル病理画像は様々な要因に影響され、色と明るさに大きな違いをもたらすことが多い。 Stain normalizationは、デジタル病理画像の色と明るさの違いを効果的に低減し、コンピュータ支援診断システムの性能を向上させる。従来の染色正規化法は1つまたは複数の参照画像に依存しているが、1つまたは複数の画像はデータセット全体を適切に表現していない。学習に基づく染色正規化法は一般的な手法であるが、複雑なディープネットワークを使用し、計算効率を大幅に低下させるだけでなく、アーティファクトを導入するリスクもある。特殊なネットワーク構造を用いて計算効率と信頼性を向上させる研究もあるが、これらの手法はネットワーク容量が不足しているため、複数対1の染色正規化に適用することは困難である。本研究では,動的パラメータネットワークを導入し,ParamNetと呼ばれる新しい染色正規化法を提案する。 ParamNetは、ネットワーク設計に動的パラメータ(畳み込み層の重みとバイアス)を導入することで、限られたネットワーク容量と計算効率の課題に対処する。これらのパラメータを効果的に活用することにより、ParamNetは、計算効率を維持しながら、染色正規化における優れた性能を達成する。その結果、ParamNetは25秒以内に100,000×100,000の全スライド画像(WSI)1枚を正規化できることがわかった。コードは、https://github.com/khtao/ParamNet で入手できる。 In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images, but one or several images may not adequately represent the entire dataset. Although learning-based stain normalization methods are a general approach, they use complex deep networks, which not only greatly reduce computational efficiency, but also risk introducing artifacts. Some studies use specialized network structures to enhance computational efficiency and reliability, but these methods are difficult to apply to multi-to-one stain normalization due to insufficient network capacity. In this study, we introduce a dynamic-parameter network and propose a novel method for stain normalization, called ParamNet. 
ParamNet addresses the challenges of limited network capacity and computational efficiency by introducing dynamic parameters (weights and biases of convolutional layers) into the network design. By effectively leveraging these parameters, ParamNet achieves superior performance in stain normalization while maintaining computational efficiency. Results show ParamNet can normalize one whole slide image (WSI) of 100,000x100,000 within 25s. The code is available at: https://github.com/khtao/ParamNet.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# 言語間QA: コンテキスト内の言語間パフォーマンスをアンロックする鍵 Cross-lingual QA: A Key to Unlocking In-context Cross-lingual Performance ( http://arxiv.org/abs/2305.15233v3 ) ライセンス: Link先を確認	Sunkyoung Kim, Dayeon Ki, Yireun Kim, Jinsik Lee,	(参考訳) MLLM(Multilingual Large Language Model)は、コンテキスト内学習を通じて、言語間の重要な機能を示す。既存のアプローチは、典型的には、ソースまたはターゲット言語のいずれかで、モノリンガルなインコンテキストの例を構築します。しかし、コンテキスト内サンプル全体を対象言語に翻訳することは、コンテキスト整合性を損なう可能性があり、長いコンテキストパスの場合、コストがかかる。そこで本研究では,質問部と回答部のみを翻訳する言語間プロンプト手法であるクロスランガルQAを導入し,翻訳コストを削減した。 4つの類型的多言語ベンチマークの実験により、クロスランガルQAがモデルに効果的に刺激を与え、それらの言語間知識を引き出すことが示され、以前の単言語間プロンプトアプローチよりも優れていた。さらに,言語間実例を用いたオープンソースMLLMの高速化により,モデルスケールの増大とともに性能が向上することを示す。 Multilingual large language models (MLLMs) have demonstrated significant cross-lingual capabilities through in-context learning. Existing approaches typically construct monolingual in-context examples, either in the source or target language. However, translating entire in-context examples into the target language might compromise contextual integrity and be costly in the case of long-context passages. To address this, we introduce Cross-lingual QA, a cross-lingual prompting method that translates only the question and answer parts, thus reducing translation costs. Experiments on four typologically diverse multilingual benchmarks show that Cross-lingual QA prompting effectively stimulates models to elicit their cross-lingual knowledge, outperforming prior monolingual prompting approaches. Furthermore, we show that prompting open-source MLLMs with cross-lingual in-context examples enhances performance as the model scale increases.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
# FlexRound: トレーニング後の量子化のための要素分割に基づく学習可能なラウンドリング FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization ( http://arxiv.org/abs/2306.00317v2 ) ライセンス: Link先を確認	Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee,	(参考訳) トレーニング後の量子化(PTQ)は、量子化対応のトレーニングとは異なり、完全なトレーニングデータセットもエンドツーエンドトレーニングもまったく必要としないため、リソース制限されたデバイスへのディープニューラルネットワークのデプロイで人気が高まっている。近年, 各層やブロック出力を再構成したPTQスキームは, 定量化モデルの性能向上に有効であることが判明し, 各層やブロック出力をより良く再構築するための新しい重み付きスキームを考案し, 学習するアルゴリズムが開発されている。そこで本研究では,FlexRoundが共通の量子化グリッドサイズと,事前学習した各ウェイトに対する異なるスケールを共同学習できるように,従来の要素ごとの加算ではなく,要素ごとの除算をベースとしたPTQの簡易かつ効果的な新しいウェイトラウンド機構を提案する。要素分割によって誘導される微分の相互規則により、FlexRoundは本質的に、対応するスケールを更新する際に事前トレーニングされた重みを利用することができ、したがって、その大きさに応じて柔軟に事前トレーニングされた重みを定量化することができる。幅広いモデルやタスクにおいてFlexRoundの有効性を実証的に検証する。我々の知識を最大限に活用するために、画像分類と自然言語理解だけでなく、自然言語生成に関する総合的な実験を初めて行った。さらに,大規模言語モデルをブロック単位で再構築することで,半精度のベースラインと比較して,性能に無視できる影響しか持たず,効率的に定量化できることを実証した。私たちのコードは https://github.com/onliwad101/FlexRound_LRQ で利用可能です。 Post-training quantization (PTQ) has been gaining popularity for the deployment of deep neural networks on resource-limited devices since unlike quantization-aware training, neither a full training dataset nor end-to-end training is required at all. As PTQ schemes based on reconstructing each layer or block output turn out to be effective to enhance quantized model performance, recent works have developed algorithms to devise and learn a new weight-rounding scheme so as to better reconstruct each layer or block output. In this work, we propose a simple yet effective new weight-rounding mechanism for PTQ, coined FlexRound, based on element-wise division instead of typical element-wise addition such that FlexRound enables jointly learning a common quantization grid size as well as a different scale for each pre-trained weight. 
Thanks to the reciprocal rule of derivatives induced by element-wise division, FlexRound is inherently able to exploit pre-trained weights when updating their corresponding scales, and thus, flexibly quantize pre-trained weights depending on their magnitudes. We empirically validate the efficacy of FlexRound on a wide range of models and tasks. To the best of our knowledge, our work is the first to carry out comprehensive experiments on not only image classification and natural language understanding but also natural language generation. Moreover, we demonstrate, for the first time, that large language models can be efficiently quantized, with only a negligible impact on performance compared to half-precision baselines, achieved by reconstructing the output in a block-by-block manner. Our code is available at https://github.com/onliwad101/FlexRound_LRQ.	翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
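The idea of rounding by element-wise division, with one shared grid size and one scale per weight, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the grid size `s1`, and the per-weight `scales` list are assumptions made for this example, and the learning of the scales is left out entirely.

```python
def round_to_nearest(weights, s1):
    """Baseline: snap each weight to the nearest multiple of the grid size s1."""
    return [s1 * round(w / s1) for w in weights]


def division_based_round(weights, s1, scales):
    """Sketch of division-based rounding (hypothetical helper).

    Each weight is divided by the shared grid size s1 times its own scale
    before rounding, then mapped back onto the shared grid.
    """
    return [s1 * round(w / (s1 * s)) for w, s in zip(weights, scales)]


weights = [0.26, -0.74, 1.31]
s1 = 0.5  # shared quantization grid size (illustrative value)

# With every scale equal to 1 the scheme reduces to round-to-nearest.
baseline = division_based_round(weights, s1, [1.0, 1.0, 1.0])

# A scale slightly below 1 moves the second weight to the next grid point.
adjusted = division_based_round(weights, s1, [1.0, 0.95, 1.0])
```

Because the scale sits in the denominator, the same relative change in scale shifts a large weight across more grid steps than a small one, which is one way to read the abstract's remark about quantizing weights depending on their magnitudes.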
# 対実予測セットを用いた意思決定支援システムの設計 Designing Decision Support Systems Using Counterfactual Prediction Sets ( http://arxiv.org/abs/2306.03928v3 ) ライセンス: Link先を確認	Eleni Straitouri, Manuel Gomez Rodriguez,	(参考訳) 分類タスクの意思決定支援システムは主に、基底真理ラベルの価値を予測するために設計されている。しかし、予測が完璧ではないため、これらのシステムは、いつ、どのようにして予測を更新するかを人間の専門家に理解させる必要がある。残念なことに、これは挑戦的だった。この文脈では最近、代替的な意思決定支援システムがこの課題を回避できるかもしれないと論じられている。これらのシステムは、単一のラベル予測を提供するのではなく、共形予測器を用いて構築されたラベル予測値のセット、すなわち予測セットを提供し、専門家に予測セットからラベル値を予測するように強制的に要求する。しかしながら、これらのシステムの設計と評価は、これまでのところ、形式化された専門家モデルに依存しており、彼らの約束に疑問を呈している。本稿では,このようなシステムの設計をオンライン学習の観点から再考し,専門家モデルを必要としない方法論を開発する。提案手法は,任意の共形予測器によって提供される予測セットのネスト構造と自然な反ファクト的単調性仮定を利用して,バニラバンディットアルゴリズムと比較して,後悔の指数的な改善を実現する。我々は、我々の方法論をいくつかの競争基準と比較するために、大規模な人体研究(n = 2{,}751$)を行う。その結果, 予測セットに基づく意思決定支援システムにおいて, 専門家のエージェントレベルを制限することは, 専門家が常に自分自身のエージェンシーを行使することよりも, 高いパフォーマンスをもたらすことがわかった。我々は、人間の主題研究に集められたデータと、我々のシステムのオープンソース実装をhttps://github.com/Networks-Learning/counterfactual-prediction-setsで公開しました。 Decision support systems for classification tasks are predominantly designed to predict the value of the ground truth labels. However, since their predictions are not perfect, these systems also need to make human experts understand when and how to use these predictions to update their own predictions. Unfortunately, this has been proven challenging. In this context, it has been recently argued that an alternative type of decision support systems may circumvent this challenge. Rather than providing a single label prediction, these systems provide a set of label prediction values constructed using a conformal predictor, namely a prediction set, and forcefully ask experts to predict a label value from the prediction set. However, the design and evaluation of these systems have so far relied on stylized expert models, questioning their promise. 
In this paper, we revisit the design of this type of systems from the perspective of online learning and develop a methodology that neither requires nor assumes an expert model. Our methodology leverages the nested structure of the prediction sets provided by any conformal predictor and a natural counterfactual monotonicity assumption to achieve an exponential improvement in regret in comparison to vanilla bandit algorithms. We conduct a large-scale human subject study ($n = 2{,}751$) to compare our methodology to several competitive baselines. The results show that, for decision support systems based on prediction sets, limiting experts' level of agency leads to greater performance than allowing experts to always exercise their own agency. We have made available the data gathered in our human subject study as well as an open source implementation of our system at https://github.com/Networks-Learning/counterfactual-prediction-sets.	翻訳日:2024-07-18 00:10:39 公開日:2024-07-16
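A prediction set of the kind this abstract refers to can be sketched with split conformal prediction. This is a minimal sketch under stated assumptions, not the system released by the authors: the nonconformity score `1 - p(label)`, the function names, and the toy numbers are all illustrative.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the calibration quantile of the scores 1 - p(true label)."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank; if it exceeds n, every label is kept.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Keep every label whose score 1 - p(label) does not exceed the threshold."""
    return {label for label, p in enumerate(probs) if 1.0 - p <= threshold}


# Toy calibration data: class probabilities and true labels (illustrative).
cal_probs = [
    [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.05, 0.05, 0.9],
    [0.5, 0.4, 0.1], [0.2, 0.2, 0.6], [0.4, 0.5, 0.1],
]
cal_labels = [0, 1, 2, 0, 1, 2, 1, 2, 0]

tight = conformal_threshold(cal_probs, cal_labels, alpha=0.3)
loose = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

new_probs = [0.55, 0.35, 0.10]
small_set = prediction_set(new_probs, tight)
large_set = prediction_set(new_probs, loose)
```

The threshold can only grow as `alpha` shrinks, so `small_set` is always contained in `large_set`. This containment is the nested structure that the abstract says the methodology leverages.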
# PromptRobust: 対話型プロンプトにおける大規模言語モデルのロバスト性評価に向けて PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts ( http://arxiv.org/abs/2306.04528v5 ) ライセンス: Link先を確認	Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie,	(参考訳) 学界や業界全体にわたる大規模言語モデル(LLM)への依存度の増加は、その堅牢さをプロンプトに包括的に理解する必要がある。この重要なニーズに対応するために,LLMの弾力性を測定するために設計された頑健性ベンチマークであるPromptRobustを導入する。本研究は、文字、単語、文、意味といった複数のレベルにわたるプロンプトを標的とした、敵対的なテキスト攻撃を多用する。逆のプロンプトは、タイプミスやシノニムなどのユーザエラーを模倣することを目的としており、意味的整合性を維持しながら、LCMの結果にわずかな偏差がどの程度影響するかを評価することを目的としている。これらのプロンプトは、感情分析、自然言語推論、読書理解、機械翻訳、数学の問題解決など様々なタスクに使用される。本研究は,8つのタスクと13のデータセットに対して慎重に評価した4,788の逆のプロンプトを生成する。以上の結果から,現代のLDMは敵のプロンプトに対して堅牢ではないことが示唆された。さらに,素早い強靭性と伝達性の背後にあるミステリーを理解するための包括的解析を行った。次に、洞察に富んだ堅牢性分析と、即興的な構成のための実用的なレコメンデーションを提供し、研究者と日々のユーザーの両方に有益である。 The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors like typos or synonyms, aim to evaluate how slight deviations can affect LLM outcomes while maintaining semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. 
Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users.	翻訳日:2024-07-18 00:10:39 公開日:2024-07-16
# ロバストなセマンティックセグメンテーションモデルの信頼性評価と高速訓練に向けて Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models ( http://arxiv.org/abs/2306.12941v2 ) ライセンス: Link先を確認	Francesco Croce, Naman D Singh, Matthias Hein,	(参考訳) 画像分類において、特に$\ell_\infty$-threatモデルにおいて、敵対的ロバスト性は広範囲に研究されてきたが、オブジェクト検出やセマンティックセグメンテーションといった関連するタスクでは、画像分類よりもはるかに難しい最適化問題であることが判明した。我々は,mIoUとmIoUの精度の異なる指標を最小化する,いくつかの問題固有の新規攻撃を提案する。攻撃のアンサンブルであるSEAは、既存の攻撃がセマンティックセグメンテーションモデルの堅牢性を大幅に過大評価していることを示している。驚くべきことに、セマンティックセグメンテーションモデルに対する既存の敵の訓練の試みは、弱かったり、全く損なわれなかったりしている。従来の逆行訓練のセマンティックセグメンテーションへの適応が失敗した理由を考察し、最近提案された堅牢なImageNetバックボーンを用いて、PASCAL-VOCとADE20kのトレーニング時間の最大6倍の堅牢なセマンティックセグメンテーションモデルを得ることができることを示す。関連コードとロバストモデルはhttps://github.com/nmndeep/robust-segmentationで公開されている。 Adversarial robustness has been studied extensively in image classification, especially for the $\ell_\infty$-threat model, but significantly less so for related tasks such as object detection and semantic segmentation, where attacks turn out to be a much harder optimization problem than for image classification. We propose several problem-specific novel attacks minimizing different metrics in accuracy and mIoU. The ensemble of our attacks, SEA, shows that existing attacks severely overestimate the robustness of semantic segmentation models. Surprisingly, existing attempts of adversarial training for semantic segmentation models turn out to be weak or even completely non-robust. We investigate why previous adaptations of adversarial training to semantic segmentation failed and show how recently proposed robust ImageNet backbones can be used to obtain adversarially robust semantic segmentation models with up to six times less training time for PASCAL-VOC and the more challenging ADE20k. The associated code and robust models are available at https://github.com/nmndeep/robust-segmentation	翻訳日:2024-07-18 00:10:39 公開日:2024-07-16
# DP-SGDでは感度が過大評価される Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD ( http://arxiv.org/abs/2307.00310v3 ) ライセンス: Link先を確認	Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot,	(参考訳) 差分プライベート確率的勾配降下法(DP-SGD)は、プライベートな深層学習における標準的アプローチである。 DP-SGDの現在のプライバシ分析は、いくつかの設定では厳密であることが知られているが、いくつかの実証的な結果は、一般的なベンチマークデータセットでトレーニングされたモデルが、多くのデータポイントのプライバシを著しく減らすことを示唆している。しかし、過去の試みにもかかわらず、なぜこれがそうなるのかという厳格な説明は得られていない。これは、これらのデータセット設定に制限された場合、より厳格なプライバシー上限が存在するためなのか、あるいは特定のデータポイントに対して、我々の攻撃は不十分なのか? 本稿では,DP-SGD の初のインスタンスごと(すなわち「データ依存」)の DP 解析を行う。我々の分析では、データセット内の類似の隣人が、アウトリージよりもデータ依存のプライバシを享受していることを示す直感を捉えています。形式的には、DP-SGDのステップごとのプライバシー分析を変更して、トレーニングデータセットから計算されたモデル更新の分布に依存するようにする。我々はさらに、この新たなステップごとの分析を効果的に活用して、トレーニングの実行全体について推論する新しい合成定理を開発した。まとめると、この新たなDP-SGD分析により、DP-SGDのリークが、現在のデータ非依存保証よりも多くのデータポイント(一般的なベンチマークでトレーニングされた場合)のプライバシーを著しく少なくすることを示すことができる。これは、敵がトレーニングデータセットを十分にコントロールしていない場合、プライバシ攻撃が多くのデータポイントに対して必ず失敗することを意味する。 Differentially private stochastic gradient descent (DP-SGD) is the canonical approach to private deep learning. While the current privacy analysis of DP-SGD is known to be tight in some settings, several empirical results suggest that models trained on common benchmark datasets leak significantly less privacy for many datapoints. Yet, despite past attempts, a rigorous explanation for why this is the case has not been reached. Is it because there exist tighter privacy upper bounds when restricted to these dataset settings, or are our attacks not strong enough for certain datapoints? In this paper, we provide the first per-instance (i.e., "data-dependent") DP analysis of DP-SGD. Our analysis captures the intuition that points with similar neighbors in the dataset enjoy better data-dependent privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. 
We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints (when trained on common benchmarks) than the current data-independent guarantee. This implies privacy attacks will necessarily fail against many datapoints if the adversary does not have sufficient control over the possible training datasets.	翻訳日:2024-07-18 00:10:39 公開日:2024-07-16
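For readers unfamiliar with the mechanism under analysis, one DP-SGD step can be sketched as per-example gradient clipping followed by Gaussian noise. This is a minimal sketch of the generic mechanism, assuming gradients are plain Python lists. The names `clip_norm`, `noise_multiplier`, and both function names are illustrative and not taken from the paper, and the per-instance privacy analysis itself is not implemented here.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy update: clip each gradient, sum, add Gaussian noise, average."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    batch_size = len(clipped)
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    noisy_mean = [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / batch_size
        for i in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient exceeds the clipping bound
updated = dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0,
                      noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single example can change the summed gradient, and that worst-case bound is what a data-independent analysis works with. The abstract's argument is that the realised change can be much smaller for points whose gradients resemble those of their neighbours.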
# MOCA:masked Online Codebook Assignmentsの予測による自己指導型表現学習 MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments ( http://arxiv.org/abs/2307.09361v2 ) ライセンス: Link先を確認	Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez,	(参考訳) 自己教師付き学習は、非常に大きな完全注釈付きデータセットに対するビジョントランスフォーマーネットワークの欲求を緩和するために使用することができる。自己教師付き学習の異なるクラスは、例えば、マスク付き画像モデリング戦略を使用する、あるいは、コントラスト的手法で画像摂動に不変な、文脈的推論特性を持つ表現を提供する。そこで本研究では,高レベルの特徴(ピクセルレベルの細部ではなく)で定義された新しいマスク・アンド・予測目標を用いて,所望のプロパティを統一するMOCAを提案する。さらに,学習パラダイムを相乗的かつ計算効率のよい方法で効果的に活用する方法を示す。そこで我々は,従来の手法よりも3倍高速なトレーニングを施した各種評価プロトコルにおいて,低照度設定における最先端の新たな結果と強力な実験結果を得る。実装コードはhttps://github.com/valeoai/MOCAで提供します。 Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods. We provide the implementation code at https://github.com/valeoai/MOCA.	翻訳日:2024-07-18 00:10:39 公開日:2024-07-16
# HeightFormer:バードアイビューにおけるカメラのみの3次元物体検出のための余分なデータのない明示的な高さモデリング HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View ( http://arxiv.org/abs/2307.13510v3 ) ライセンス: Link先を確認	Yiming Wu, Ruixiang Li, Zequn Qin, Xinhai Zhao, Xi Li,	(参考訳) 視覚に基づくバードアイビュー(Bird's Eye View, BEV)の表現は、自律運転のための新たな知覚定式化である。最大の課題は、マルチカメラ機能を備えたBEVスペースを構築することだ。従来のBEV表現生成手法に分割すると,そのほとんどはイメージビューの深度をモデル化するか,BEV空間の高さをモデル化するかの2つのタイプに分類される。本研究では、LiDARのような余分なデータを必要としないBEV空間における高さを明示的にモデル化し、モデリング深度と比較して任意のカメラリグやタイプを適合させることができることを提案する。理論的には,高さに基づく手法と深さに基づく手法の等価性を示す。自己再帰的手法で高さと不確実性をモデル化するHeightFormerを提案する。追加のデータがなければ、提案されたHeightFormerはBEVの高度を正確に見積もることができる。ベンチマークの結果,HeightFormerの性能はカメラのみの手法と比較してSOTAを実現していることがわかった。 Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Diving into all previous BEV representation generation methods, we found that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and can fit arbitrary camera rigs and types compared to modeling depths. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data, the proposed HeightFormer could estimate heights in BEV accurately. Benchmark results show that the performance of HeightFormer achieves SOTA compared with those camera-only methods.	翻訳日:2024-07-18 00:10:39 公開日:2024-07-16
# Generalized Unbiased Scene Graph Generation ( http://arxiv.org/abs/2308.04802v2 ) License: see link	Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen	Existing Unbiased Scene Graph Generation (USGG) methods only focus on addressing the predicate-level imbalance, in which high-frequency classes dominate predictions of rare ones, while overlooking the concept-level imbalance. Actually, even if predicates themselves are balanced, there is still a significant concept-imbalance within them due to the long-tailed distribution of contexts (i.e., subject-object combinations). This concept-level imbalance poses a more pervasive and challenging issue compared to the predicate-level imbalance since subject-object pairs are inherently complex in combinations. Hence, we introduce a novel research problem: Generalized Unbiased Scene Graph Generation (G-USGG), which takes into account both predicate-level and concept-level imbalance. To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare/uncommon/common concepts. MCL first quantifies the concept-level imbalance across predicates in terms of different amounts of concepts, represented as multiple concept-prototypes within the same class. It then effectively learns concept-prototypes by applying the Concept Regularization (CR) technique. Furthermore, to achieve balanced learning over different concepts, we introduce the Balanced Prototypical Memory (BPM), which guides SGG models to generate balanced representations for concept-prototypes. Extensive experiments demonstrate the remarkable efficacy of our model-agnostic strategy in enhancing the performance of benchmark models on both VG-SGG and OI-SGG datasets, leading to new state-of-the-art achievements in two key aspects: predicate-level unbiased relation recognition and concept-level compositional generability.	Translated: 2024-07-18 00:10:39 Published: 2024-07-16
# Multi-variable integration with a variational quantum circuit ( http://arxiv.org/abs/2308.05657v2 ) License: see link	Juan M. Cruz-Martinez, Matteo Robbiati, Stefano Carrazza	In this work we present a novel strategy to evaluate multi-variable integrals with quantum circuits. The procedure first encodes the integration variables into a parametric circuit. The obtained circuit is then differentiated with respect to the integration variables using the parameter shift rule technique. The observable representing the derivative is then used as the predictor of the target integrand function following a quantum machine learning approach. The integral is then estimated using the fundamental theorem of integral calculus by evaluating the original circuit. Embedding data according to a reuploading strategy, multi-dimensional variables can be easily encoded into the circuit's gates and then individually taken as targets while differentiating the circuit. These techniques can be exploited to partially integrate a function or to quickly compute parametric integrands within the training hyperspace.	Translated: 2024-07-18 00:10:39 Published: 2024-07-16
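
The procedure in the abstract above (differentiate the circuit with the parameter-shift rule, fit that derivative to the integrand, then read the integral off the original circuit at the bounds) can be illustrated with a small classical sketch. This is not the authors' implementation: `circuit_expectation` is a hypothetical stand-in that uses the closed form cos(x), which is the expectation of Z after applying RY(x) to |0>, instead of running a real quantum circuit.

```python
import math


def circuit_expectation(x: float) -> float:
    """Assumed toy 'circuit': <Z> after RY(x)|0>, which equals cos(x)."""
    return math.cos(x)


def parameter_shift_derivative(f, x: float) -> float:
    """Parameter-shift rule for a gate generated by a Pauli operator:
    df/dx = (f(x + pi/2) - f(x - pi/2)) / 2."""
    return 0.5 * (f(x + math.pi / 2) - f(x - math.pi / 2))


def integral_via_circuit(f, a: float, b: float) -> float:
    """Fundamental theorem of calculus: if the derivative of f matches the
    integrand g, the integral of g over [a, b] is f(b) - f(a)."""
    return f(b) - f(a)


if __name__ == "__main__":
    x = 0.3
    # The derivative of the toy circuit plays the role of the integrand g(x) = -sin(x).
    print(parameter_shift_derivative(circuit_expectation, x), -math.sin(x))
    # Integral of g over [0, pi/2], obtained from two evaluations of the circuit.
    print(integral_via_circuit(circuit_expectation, 0.0, math.pi / 2))
```

In the paper's setting the circuit parameters would be trained so that the shifted-circuit derivative approximates a given integrand; here the "training" is skipped because the toy circuit's derivative is known exactly.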
# CUPID: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection ( http://arxiv.org/abs/2308.10022v3 ) License: see link	Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, David Lo	Duplicate bug report detection (DBRD) is a long-standing challenge in both academia and industry. Over the past decades, researchers have proposed various approaches to detect duplicate bug reports more accurately. With the recent advancement of deep learning, researchers have also proposed several deep learning-based approaches to address the DBRD task. In the bug repositories with many bug reports, deep learning-based approaches have shown promising performance. However, in the bug repositories with a smaller number of bug reports, i.e., around 10k, the existing deep learning approaches show worse performance than the traditional approaches. Traditional approaches have limitations, too, e.g., they are usually based on the bag-of-words model, which cannot capture the semantics of bug reports. To address these challenges, we seek to leverage a state-of-the-art large language model (LLM) to improve the performance of the traditional DBRD approach. In this paper, we propose an approach called CUPID, which combines the best-performing traditional DBRD approach (i.e., REP) with the state-of-the-art LLM (i.e., ChatGPT). We conducted an evaluation by comparing CUPID with three existing approaches on three datasets. The experimental results show that CUPID achieves state-of-the-art results, reaching Recall Rate@10 scores ranging from 0.602 to 0.654 across all the datasets analyzed. In particular, CUPID improves over the prior state-of-the-art approach by 5% to 8% in terms of Recall Rate@10 in the datasets. CUPID also surpassed the state-of-the-art deep learning-based DBRD approach by up to 82%.	Translated: 2024-07-18 00:10:39 Published: 2024-07-16
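
The abstract above reports results as Recall Rate@10. A minimal sketch of how such a metric is commonly computed follows, assuming the usual definition (the fraction of duplicate reports for which at least one correct master report appears among the top-k retrieved candidates). The function name, the dictionary layout, and the report ids are illustrative assumptions, not taken from the CUPID code.

```python
def recall_rate_at_k(rankings, masters, k=10):
    """Fraction of queries with a correct master report in the top-k candidates.

    rankings: dict mapping a query report id to its ranked candidate ids (best first).
    masters:  dict mapping a query report id to the set of correct master ids.
    Both structures are hypothetical; a real pipeline would build them from a
    retrieval model's output and the bug tracker's duplicate links.
    """
    if not rankings:
        return 0.0
    hits = 0
    for query_id, ranked in rankings.items():
        # A query counts as a hit if any of its top-k candidates is a true master.
        if any(candidate in masters[query_id] for candidate in ranked[:k]):
            hits += 1
    return hits / len(rankings)


if __name__ == "__main__":
    rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
    masters = {"q1": {"b"}, "q2": {"z"}}
    print(recall_rate_at_k(rankings, masters, k=2))
```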
# GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction ( http://arxiv.org/abs/2308.14423v3 ) License: see link	Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson	Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph relations in the computation of attention weights. We introduce a joint text-graph Transformer model and a graph-assisted declarative pooling (GADePo) specification of the input, which provides explicit and high-level instructions for information aggregation. GADePo allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customisable pooling strategies. We evaluate our method across diverse datasets and models and show that our approach yields promising results that are consistently better than those achieved by the hand-coded pooling functions.	Translated: 2024-07-18 00:10:39 Published: 2024-07-16
# Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding ( http://arxiv.org/abs/2309.04561v3 ) License: see link	Ozan Unal, Christos Sakaridis, Suman Saha, Luc Van Gool	3D visual grounding is the task of localizing the object in a 3D scene that is referred to by a description in natural language. With a wide range of applications ranging from autonomous indoor robotics to AR/VR, the task has recently risen in popularity. A common formulation to tackle 3D visual grounding is grounding-by-detection, where localization is done via bounding boxes. However, for real-life applications that require physical interactions, a bounding box insufficiently describes the geometry of an object. We therefore tackle the problem of dense 3D visual grounding, i.e. referral-based 3D instance segmentation. We propose a dense 3D grounding network ConcreteNet, featuring four novel stand-alone modules that aim to improve grounding performance for challenging repetitive instances, i.e. instances with distractors of the same semantic class. First, we introduce a bottom-up attentive fusion module that aims to disambiguate inter-instance relational cues. Next, we construct a contrastive training scheme to induce separation in the latent space. We then resolve view-dependent utterances via a learned global camera token. Finally, we employ multi-view ensembling to improve referred mask quality. ConcreteNet ranks 1st on the challenging ScanRefer online benchmark and has won the ICCV 3rd Workshop on Language for 3D Scenes "3D Object Localization" challenge. Our code is available at ouenal.github.io/concretenet/.	Translated: 2024-07-18 00:10:39 Published: 2024-07-16
# Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors ( http://arxiv.org/abs/2309.06782v6 ) License: see link	Joosep Pata, Eric Wulff, Farouk Mokhtar, David Southwick, Mengke Zhang, Maria Girone, Javier Duarte	Efficient and accurate algorithms are necessary to reconstruct particles in the highly granular detectors anticipated at the High-Luminosity Large Hadron Collider and the Future Circular Collider. We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation. Particle-flow reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters. We compare a graph neural network and a kernel-based transformer and demonstrate that we can avoid quadratic operations while achieving realistic reconstruction. We show that hyperparameter tuning significantly improves the performance of the models. The best graph neural network model shows improvement in the jet transverse momentum resolution by up to 50% compared to the rule-based algorithm. The resulting model is portable across Nvidia, AMD and Habana hardware. Accurate and fast machine-learning-based reconstruction can significantly improve future measurements at colliders.	Translated: 2024-07-18 00:10:39 Published: 2024-07-16
# Physics of Language Models: Part 3.1, Knowledge Storage and Extraction ( http://arxiv.org/abs/2309.14316v3 ) License: see link	Zeyuan Allen-Zhu, Yuanzhi Li	Large language models (LLMs) can store a vast amount of world knowledge, often extractable via question-answering (e.g., "What is Abraham Lincoln's birthday?"). However, do they answer such questions based on exposure to similar questions during training (i.e., cheating), or by genuinely learning to extract knowledge from sources like Wikipedia? In this paper, we investigate this issue using a controlled biography dataset. We find a strong correlation between the model's ability to extract knowledge and various diversity measures of the training data. $\textbf{Essentially}$, for knowledge to be reliably extracted, it must be sufficiently augmented (e.g., through paraphrasing, sentence shuffling, translations) $\textit{during pretraining}$. Without such augmentation, knowledge may be memorized but not extractable, leading to 0% accuracy, regardless of subsequent instruction fine-tuning. To understand why this occurs, we employ (nearly) linear probing to demonstrate a strong connection between the observed correlation and how the model internally encodes knowledge -- whether it is linearly encoded in the hidden embeddings of entity names or distributed across other token embeddings in the training text. This paper provides $\textbf{several key recommendations for LLM pretraining in the industry}$: (1) rewrite the pretraining data -- using small, auxiliary models -- to provide knowledge augmentation, and (2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# Physics of Language Models: Part 3.2, Knowledge Manipulation ( http://arxiv.org/abs/2309.14402v2 ) License: see link	Zeyuan Allen-Zhu, Yuanzhi Li	Language models can store vast factual knowledge, yet their ability to flexibly use this knowledge for downstream tasks (e.g., via instruction finetuning) remains questionable. This paper investigates four fundamental knowledge manipulation tasks: retrieval (e.g., "What is person A's attribute X?"), classification (e.g., "Is A's attribute X even or odd?"), comparison (e.g., "Is A greater than B in attribute X?"), and inverse search (e.g., "Which person's attribute X equals T?"). We show that language models excel in knowledge retrieval but struggle even in the simplest classification or comparison tasks unless Chain of Thoughts (CoTs) are employed during both training and inference. Moreover, their performance in inverse knowledge search is virtually 0%, regardless of the prompts. Our primary contribution is a controlled, synthetic experiment that confirms these weaknesses are inherent to language models: they cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored in the models, despite adequate training and sufficient model size. Our findings also apply to modern pretrained language models such as GPT-4, thus giving rise to many Turing tests to distinguish humans from contemporary AIs.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# UltraFeedback: Boosting Language Models with Scaled AI Feedback ( http://arxiv.org/abs/2310.01377v2 ) License: see link	Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun	Learning from human feedback has become a pivotal technique in aligning large language models (LLMs) with human preferences. However, acquiring vast and premium human feedback is bottlenecked by time, labor, and human capability, resulting in small sizes or limited topics of current datasets. This further hinders feedback learning as well as alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and collect high-quality \textit{AI feedback} automatically for a scalable alternative. Specifically, we identify \textbf{scale and diversity} as the key factors for feedback data to take effect. Accordingly, we first broaden instructions and responses in both amount and breadth to encompass a wider range of user-assistant interactions. Then, we meticulously apply a series of techniques to mitigate annotation biases for more reliable AI feedback. We finally present \textsc{UltraFeedback}, a large-scale, high-quality, and diversified AI feedback dataset, which contains over 1 million GPT-4 feedback for 250k user-assistant conversations from various aspects. Built upon \textsc{UltraFeedback}, we align a LLaMA-based model by best-of-$n$ sampling and reinforcement learning, demonstrating its exceptional performance on chat benchmarks. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research. Our data and models are available at https://github.com/thunlp/UltraFeedback.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
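
The abstract above mentions aligning a model by best-of-n sampling. As a rough illustration of that general technique (not the UltraFeedback code), the sketch below draws n candidate responses and keeps the one a reward function scores highest. `toy_generate` and `toy_reward` are hypothetical placeholders for a language model sampler and a learned reward model.

```python
import random


def best_of_n(prompt, generate, reward, n=4, seed=0):
    """Sample n candidate responses and return the one with the highest reward.

    generate(prompt, rng) and reward(prompt, response) are caller-supplied
    callables; in practice they would wrap a language model and a reward model.
    """
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))


def toy_generate(prompt, rng):
    # Hypothetical generator: picks one canned reply at random.
    return rng.choice(["ok", "a longer answer", "the most detailed answer here"])


def toy_reward(prompt, response):
    # Hypothetical reward: longer replies score higher.
    return len(response)


if __name__ == "__main__":
    print(best_of_n("How do I sort a list?", toy_generate, toy_reward, n=8))
```

Larger n trades extra generation cost for a better chance that one candidate scores well under the reward function.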
# Information Scrambling in Free Fermion Systems with a Sole Interaction ( http://arxiv.org/abs/2310.07043v2 ) License: see link	Qucheng Gao, Pengfei Zhang, Xiao Chen	It is well established that the presence of a single impurity can have a substantial impact on the transport properties of quantum many-body systems at low temperature. In this work, we investigate a close analog of this problem from the perspective of quantum information dynamics. We construct Brownian circuits and Clifford circuits consisting of a free fermion hopping term and a sole interaction. In both circuits, our findings reveal the emergence of operator scrambling. Notably, the growth of the operator can be mapped to the symmetric exclusion process in the presence of a source term localized at a single point. We demonstrate that in the one-dimensional system, both the operator and entanglement exhibit diffusive scaling. Conversely, in scenarios characterized by all-to-all hopping, the operator's size undergoes exponential growth, while the entanglement exhibits a linear increase over time.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# A Comparative Study of Image Restoration Networks for General Backbone Network Design ( http://arxiv.org/abs/2310.11881v4 ) License: see link	Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, Chao Dong	Despite the significant progress made by deep models in various image restoration tasks, existing image restoration networks still face challenges in terms of task generality. An intuitive manifestation is that networks which excel in certain tasks often fail to deliver satisfactory results in others. To illustrate this point, we select five representative networks and conduct a comparative study on five classic image restoration tasks. First, we provide a detailed explanation of the characteristics of different image restoration tasks and backbone networks. Following this, we present the benchmark results and analyze the reasons behind the performance disparity of different models across various tasks. Drawing from this comparative study, we propose that a general image restoration backbone network needs to meet the functional requirements of diverse tasks. Based on this principle, we design a new general image restoration backbone network, X-Restormer. Extensive experiments demonstrate that X-Restormer possesses good task generality and achieves state-of-the-art performance across a variety of tasks.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# Image Clustering with External Guidance ( http://arxiv.org/abs/2310.11989v3 ) License: see link	Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, Xi Peng	The core of clustering is incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. At present, substantial efforts have been devoted to mining internal supervision signals from data. Nevertheless, the abundant external knowledge such as semantic descriptions, which naturally conduces to clustering, is regrettably overlooked. In this work, we propose leveraging external knowledge as a new supervision signal to guide clustering, even though it seems irrelevant to the given data. To implement and validate our idea, we design an externally guided clustering method (Text-Aided Clustering, TAC), which leverages the textual semantics of WordNet to facilitate image clustering. Specifically, TAC first selects and retrieves WordNet nouns that best distinguish images to enhance the feature discriminability. Then, to improve image clustering performance, TAC makes the text and image modalities collaborate by mutually distilling cross-modal neighborhood information. Experiments demonstrate that TAC achieves state-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective ( http://arxiv.org/abs/2310.14227v2 ) License: see link	Kun Fang, Qinghua Tao, Xiaolin Huang, Jie Yang	Existing Out-of-Distribution (OoD) detection methods aim to detect OoD samples from In-Distribution (InD) data mainly by exploring differences in features, logits and gradients in Deep Neural Networks (DNNs). In this work, we propose a new perspective based on the loss landscape and mode ensemble to investigate OoD detection. In the optimization of DNNs, there exist many local optima in the parameter space, namely modes. Interestingly, we observe that these independent modes all reach low-loss regions with InD data (training and test data), yet yield significantly different loss landscapes with OoD data. Such an observation provides a novel view to investigate OoD detection from the loss landscape, and further suggests significantly fluctuating OoD detection performance across these modes. For instance, FPR values of the RankFeat method can range from 46.58% to 84.70% among 5 modes, showing uncertain detection performance evaluations across independent modes. Motivated by such diversities on the OoD loss landscape across modes, we revisit the deep ensemble method for OoD detection through mode ensemble, leading to improved performance and benefiting the OoD detector with reduced variances. Extensive experiments covering varied OoD detectors and network structures illustrate high variances across modes and validate the superiority of mode ensemble in boosting OoD detection. We hope this work could attract attention to independent modes in the loss landscape of OoD data and to more reliable evaluations of OoD detectors.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
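
To make the idea of a mode ensemble in the abstract above concrete, here is a small sketch under stated assumptions: each "mode" is an independently trained network represented only by its logits for one input, the detector is the maximum softmax probability, and the ensemble averages softmax outputs across modes. The paper evaluates several detectors and may combine modes differently; the averaging rule here is an illustrative choice, not the authors' method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability of one model; higher suggests in-distribution."""
    return max(softmax(logits))


def mode_ensemble_score(logits_per_mode):
    """Average the softmax outputs of several modes, then take the maximum.

    logits_per_mode: list with one logit vector per independently trained mode
    (assumed layout). Disagreement between modes lowers the score.
    """
    probs = [softmax(logits) for logits in logits_per_mode]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(mean)


if __name__ == "__main__":
    agreeing = [[5.0, 0.0], [5.0, 0.0]]     # modes agree: confident score
    disagreeing = [[5.0, 0.0], [0.0, 5.0]]  # modes disagree: score drops
    print(mode_ensemble_score(agreeing), mode_ensemble_score(disagreeing))
```

A single mode would rate both inputs as equally confident; the ensemble separates them because the modes disagree on the second one.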
# Paraphrase Types for Generation and Detection ( http://arxiv.org/abs/2310.14863v3 ) License: see link	Jan Philip Wahle, Bela Gipp, Terry Ruas	Current approaches in paraphrase generation and detection heavily rely on a single general similarity score, ignoring the intricate linguistic properties of language. This paper introduces two new tasks to address this shortcoming by considering paraphrase types - specific linguistic perturbations at particular text positions. We name these tasks Paraphrase Type Generation and Paraphrase Type Detection. Our results suggest that while current techniques perform well in a binary classification scenario, i.e., paraphrased or not, the inclusion of fine-grained paraphrase types poses a significant challenge. While most approaches are good at generating and detecting generally semantically similar content, they fail to understand the intrinsic linguistic variables they manipulate. Models trained in generating and identifying paraphrase types also show improvements in tasks without them. In addition, scaling these models further improves their ability to understand paraphrase types. We believe paraphrase types can unlock a new paradigm for developing paraphrase models and solving tasks in the future.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields ( http://arxiv.org/abs/2310.14870v3 ) License: see link	Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad	Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# VMAF Re-implementation on PyTorch: Some Experimental Results ( http://arxiv.org/abs/2310.15578v4 ) License: see link	Kirill Aistov, Maxim Koroteev	Based on the standard VMAF implementation, we propose an implementation of VMAF using the PyTorch framework. For this implementation, comparisons with the standard (libvmaf) show a discrepancy of $\lesssim 10^{-2}$ in VMAF units. We investigate gradient computation when using VMAF as an objective function and demonstrate that training using this function does not result in ill-behaving gradients. The implementation is then used to train a preprocessing filter. It is demonstrated that its performance is superior to the unsharp masking filter. The resulting filter is also easy to implement and can be applied in video processing tasks for video compression improvement. This is confirmed by the results of numerical experiments.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# Self Attention with Temporal Prior: Can We Learn More from Arrow of Time? ( http://arxiv.org/abs/2310.18932v3 ) License: see link	Kyung Geun Kim, Byeong Tak Lee	Many diverse phenomena in nature inherently encode both short- and long-term temporal dependencies, which result in particular from the direction of the flow of time. In this respect, we discovered experimental evidence suggesting that the interrelations of these events are higher for closer time stamps. However, for attention-based models to learn these regularities in short-term dependencies, large amounts of data are required, which is often infeasible. This is because, while they are good at learning piece-wise temporal dependencies, attention-based models lack structures that encode biases in time series. As a resolution, we propose a simple and efficient method that enables attention layers to better encode the short-term temporal bias of these data sets by applying learnable, adaptive kernels directly to the attention matrices. We chose various prediction tasks for the experiments using Electronic Health Records (EHR) data sets since they are great examples with underlying long- and short-term temporal dependencies. Our experiments show exceptional classification results compared to the best-performing models on most tasks and data sets.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
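The idea of applying a kernel directly to the attention matrix can be sketched in a few lines. The abstract does not specify the kernel, so the exponential decay in the time gap, the element-wise multiplication, the row renormalization, and the name `attention_with_temporal_prior` are assumptions chosen only to illustrate a short-term temporal bias; in a trained model the scale `tau` would be a learnable parameter.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_temporal_prior(q, k, v, timestamps, tau):
    """Scaled dot-product attention reweighted by an assumed decay kernel in time."""
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    gaps = np.abs(timestamps[:, None] - timestamps[None, :])
    kernel = np.exp(-gaps / tau)              # closer time stamps get larger weight
    weighted = weights * kernel
    weighted = weighted / weighted.sum(axis=-1, keepdims=True)
    return weighted @ v, weighted

# Hypothetical inputs: identical content scores, irregular time stamps.
q = k = np.zeros((3, 4))
v = np.eye(3)
t = np.array([0.0, 1.0, 5.0])
out, w = attention_with_temporal_prior(q, k, v, t, tau=1.0)
print(w.round(3))
```

With identical content scores, the first event attends more to the event one time unit away than to the one five units away, which is the bias the abstract argues plain attention has to learn from data.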
# TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise ( http://arxiv.org/abs/2310.19019v3 ) License: see link	Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan	Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes annotation more than just an answer, thus allowing other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught various student models with different parameter counts from the OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM has brought significant benefits. We will release the TeacherLM series of models and augmented datasets as open-source.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# A Survey on Federated Unlearning: Challenges, Methods, and Future Directions ( http://arxiv.org/abs/2310.20448v4 ) License: see link	Ziyao Liu, Yu Jiang, Jiyuan Shen, Minyi Peng, Kwok-Yan Lam, Xingliang Yuan, Xiaoning Liu	In recent years, the notion of "the right to be forgotten" (RTBF) has become a crucial aspect of data privacy for digital trust and AI safety, requiring the provision of mechanisms that support the removal of personal data of individuals upon their requests. Consequently, machine unlearning (MU) has gained considerable attention, as it allows an ML model to selectively eliminate identifiable information. Evolving from MU, federated unlearning (FU) has emerged to confront the challenge of data erasure within federated learning (FL) settings, which empowers the FL model to unlearn an FL client or identifiable information pertaining to the client. Nevertheless, the distinctive attributes of federated learning introduce specific challenges for FU techniques. These challenges necessitate a tailored design when developing FU algorithms. While various concepts and numerous federated unlearning schemes exist in this field, the unified workflow and tailored design of FU are not yet well understood. Therefore, this comprehensive survey delves into the techniques and methodologies in FU, providing an overview of fundamental concepts and principles, evaluating existing federated unlearning algorithms, and reviewing optimizations tailored to federated learning. Additionally, it discusses practical applications and assesses their limitations. Finally, it outlines promising directions for future research.	Translated: 2024-07-18 00:00:40 Published: 2024-07-16
# Towards the Law of Capacity Gap in Distilling Language Models ( http://arxiv.org/abs/2311.07052v2 ) License: see link	Chen Zhang, Dawei Song, Zheyu Ye, Yan Gao	Language model (LM) distillation is a trending area that aims to distill the knowledge residing in a large teacher LM into a small student one. While various methods have been proposed to maximize the effectiveness of the distillation, significant challenges persist, particularly when there is a substantial capacity gap between the teacher and student LMs. This issue, often referred to as the curse of capacity gap, suggests that a larger teacher does not necessarily result in a superior student compared to one distilled from a smaller teacher. In other words, there is likely an optimal teacher yielding the best student along the scaling course of the teacher. Even worse, the curse of capacity gap cannot be lifted without additional compute, as indicated in previous studies. In the context of large LMs (LLMs), previously viable approaches become much less meaningful, as it is impossible to distill a large teacher into a good student without notable additional compute. However, the tale is not one-sided. It is never too late to recognize that using a large teacher is resource-demanding. Consequently, instead of insisting on lifting the curse, leaving the curse as is and using a small yet adequate teacher should arguably be fine. Even better, in this paper, we follow the spirit of scaling laws and reveal that the optimal teacher scale is almost consistently and linearly correlated with the student scale across different model architectures and data scales, fortunately turning the curse into a law of capacity gap. The law then guides us to distill a 3B student LM (termed MiniMA) from LLaMA2-7B. MiniMA is demonstrated to outperform a wide range of 3B competitors and can even compete with several 7B models.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# A Central Motor System Inspired Pre-training Reinforcement Learning for Robotic Control ( http://arxiv.org/abs/2311.07822v3 ) License: see link	Pei Zhang, Zhaobo Hua, Jinliang Ding	Designing controllers that achieve natural motor capabilities for multi-joint robots is a significant challenge. Animals, however, are born with basic motor abilities and can master various complex motor skills through acquired learning. Based on an analysis of the mechanism of the central motor system in mammals, we propose a novel pre-training reinforcement learning algorithm that enables robots to learn rich motor skills and apply them to complex task environments without relying on external data. We first design a skill-based network similar to the cerebellum by utilizing the selection mechanism of voluntary movements in the basal ganglia and the basic motor regulation ability of the cerebellum. Subsequently, by imitating the structure of advanced centers in the central motor system, we propose a high-level policy to generate different skill combinations, thereby enabling the robot to acquire natural motor abilities. We conduct experiments on 4 types of robots and 22 task environments, and the results show that the proposed method can enable different types of robots to achieve flexible motor skills. Overall, our research provides a promising framework for the design of neural network motor controllers.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# Frequentist Guarantees of Distributed (Non)-Bayesian Inference ( http://arxiv.org/abs/2311.08214v4 ) License: see link	Bohan Wu, César A. Uribe	Motivated by the need to analyze large, decentralized datasets, distributed Bayesian inference has become a critical research area across multiple fields, including statistics, electrical engineering, and economics. This paper establishes frequentist properties, such as posterior consistency, asymptotic normality, and posterior contraction rates, for the distributed (non-)Bayesian inference problem among agents connected via a communication network. Our results show that, under appropriate assumptions on the communication graph, distributed Bayesian inference retains parametric efficiency while enhancing robustness in uncertainty quantification. We also explore the trade-off between statistical efficiency and communication efficiency by examining how the design and size of the communication graph impact the posterior contraction rate. Furthermore, we extend our analysis to time-varying graphs and apply our results to exponential family models, distributed logistic regression, and decentralized detection models.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
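For readers unfamiliar with the setting, the sketch below simulates one commonly studied distributed (non-)Bayesian rule on a finite hypothesis set: each agent geometrically averages its neighbors' beliefs and then applies a local Bayes step. The abstract does not state the update rule analyzed in the paper, so this log-linear form, the three-agent mixing matrix, the two candidate means, and the function names are assumptions for illustration only.

```python
import numpy as np

def distributed_update(log_beliefs, log_likelihoods, weights):
    """One round of an assumed log-linear rule.

    log_beliefs:     (agents, hypotheses) current log-beliefs
    log_likelihoods: (agents, hypotheses) log-likelihood of each agent's new sample
    weights:         (agents, agents) row-stochastic mixing matrix of the graph
    """
    mixed = weights @ log_beliefs        # geometric averaging over neighbors
    updated = mixed + log_likelihoods    # local Bayes step in log space
    # Normalize so that each agent's belief sums to one.
    return updated - np.logaddexp.reduce(updated, axis=1, keepdims=True)

def simulate(rounds=200, seed=0):
    rng = np.random.default_rng(seed)
    thetas = np.array([0.0, 1.0])        # hypothetical candidate means
    true_theta = 1.0
    weights = np.array([[0.50, 0.25, 0.25],
                        [0.25, 0.50, 0.25],
                        [0.25, 0.25, 0.50]])
    log_beliefs = np.log(np.full((3, 2), 0.5))   # uniform priors
    for _ in range(rounds):
        x = rng.normal(true_theta, 1.0, size=3)  # one private sample per agent
        log_lik = -0.5 * (x[:, None] - thetas[None, :]) ** 2
        log_beliefs = distributed_update(log_beliefs, log_lik, weights)
    return np.exp(log_beliefs)

beliefs = simulate()
print(beliefs.round(3))
```

In this toy run every agent's belief concentrates on the true mean, which is the finite-hypothesis analogue of the posterior consistency the abstract refers to; how fast that happens as the graph changes is the kind of question the contraction-rate results address.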
# Continuous Management of Machine Learning-Based Application Behavior ( http://arxiv.org/abs/2311.12686v2 ) License: see link	Marco Anisetti, Claudio A. Ardagna, Nicola Bena, Ernesto Damiani, Paolo G. Panero	Modern applications are increasingly driven by Machine Learning (ML) models whose non-deterministic behavior affects the entire application life cycle from design to operation. The pervasive adoption of ML urgently calls for approaches that guarantee a stable non-functional behavior of ML-based applications over time and across model changes. To this aim, non-functional properties of ML models, such as privacy, confidentiality, fairness, and explainability, must be monitored, verified, and maintained. Existing approaches mostly focus on (i) implementing solutions for classifier selection according to the functional behavior of ML models, and (ii) finding new algorithmic solutions, such as continuous re-training. In this paper, we propose a multi-model approach that aims to guarantee a stable non-functional behavior of ML-based applications. An architectural and methodological approach is provided to compare multiple ML models showing similar non-functional properties and select the model supporting stable non-functional behavior over time according to (dynamic and unpredictable) contextual changes. Our approach goes beyond the state of the art by providing a solution that continuously guarantees a stable non-functional behavior of ML-based applications, is ML algorithm-agnostic, and is driven by non-functional properties assessed on the ML models themselves. It consists of a two-step process working during application operation, where model assessment verifies non-functional properties of ML models trained and selected at development time, and model substitution guarantees continuous and stable support of non-functional properties. We experimentally evaluate our solution in a real-world scenario focusing on the non-functional property fairness.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# Asymmetric Bethe Ansatz ( http://arxiv.org/abs/2311.15155v2 ) License: see link	Steven G. Jackson, Hélène Perrin, Gregory E. Astrakharchik, Maxim Olshanii	The recently proposed exact quantum solution for two delta-function-interacting particles with a mass ratio of 3:1 in a hard-wall box [Y. Liu, F. Qi, Y. Zhang and S. Chen, iScience 22, 181 (2019)] violates the conventional necessary condition for Bethe Ansatz integrability, the condition being that the system must be reducible to a superposition of semi-transparent mirrors that is invariant under all the reflections it generates. In this article, we find a way to relax this condition: some of the semi-transparent mirrors of a known self-invariant mirror superposition can be replaced by perfectly reflecting ones, thus breaking the self-invariance. The proposed name for the method is Asymmetric Bethe Ansatz (Asymmetric BA). As a worked example, we study in detail the bound states of the nominally non-integrable system comprised of a bosonic dimer in a delta-well. Finally, we show that the exact solution of the Liu-Qi-Zhang-Chen problem is a particular instance of the Asymmetric BA.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs ( http://arxiv.org/abs/2312.02638v3 ) License: see link	Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella	We consider the problem of transferring a temporal action segmentation system initially designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras capture video data. The conventional supervised approach requires the collection and labeling of a new set of egocentric videos to adapt the model, which is costly and time-consuming. Instead, we propose a novel methodology which performs the adaptation by leveraging existing labeled exocentric videos and a new set of unlabeled, synchronized exocentric-egocentric video pairs, for which temporal action segmentation annotations do not need to be collected. We implement the proposed methodology with an approach based on knowledge distillation, which we investigate at both the feature level and the temporal action segmentation model level. Experiments on Assembly101 and EgoExo4D demonstrate the effectiveness of the proposed method against classic unsupervised domain adaptation and temporal alignment approaches. Without bells and whistles, our best model performs on par with supervised approaches trained on labeled egocentric data, without ever seeing a single egocentric label, achieving a +15.99 improvement in the edit score (28.59 vs 12.60) on the Assembly101 dataset compared to a baseline model trained solely on exocentric data. In similar settings, our method also improves the edit score by +3.32 on the challenging EgoExo4D benchmark. Code is available here: https://github.com/fpv-iplab/synchronization-is-all-you-need.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? ( http://arxiv.org/abs/2312.02672v3 ) License: see link	Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella	In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction (HOI) detection. Via extensive experiments and comparative analyses on three egocentric datasets, VISOR, EgoHOS, and ENIGMA-51, our findings reveal how to exploit synthetic data for the HOI detection task when real labeled data are scarce or unavailable. Specifically, by leveraging only 10% of real labeled data, we achieve the following improvements in Overall AP compared to baselines trained exclusively on real data: +5.67% on EPIC-KITCHENS VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51. Our analysis is supported by a novel data generation pipeline and the newly introduced HOI-Synth benchmark, which augments existing datasets with synthetic images of hand-object interactions automatically labeled with hand-object contact states, bounding boxes, and pixel-wise segmentation masks. Data, code, and data generation tools to support future research are released at: https://fpv-iplab.github.io/HOI-Synth/.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# MotionCtrl: A Unified and Flexible Motion Controller for Video Generation ( http://arxiv.org/abs/2312.03641v2 ) License: see link	Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan	Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and trajectories, which are appearance-free and minimally impact the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that can adapt to a wide array of camera poses and trajectories once trained. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of MotionCtrl over existing methods. Project Page: https://wzhouxiff.github.io/projects/MotionCtrl/	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# MEVG: Multi-event Video Generation with Text-to-Video Models ( http://arxiv.org/abs/2312.04086v2 ) License: see link	Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim	We introduce a novel diffusion-based video generation method that generates a video showing multiple events, given multiple individual sentences from the user. Our method does not require a large-scale video dataset, since it uses a pre-trained diffusion-based text-to-video generative model without a fine-tuning process. Specifically, we propose a last frame-aware diffusion process to preserve visual coherence between consecutive videos, where each video consists of different events, by initializing the latent and simultaneously adjusting noise in the latent to enhance the motion dynamics in a generated video. Furthermore, we find that the iterative update of latent vectors by referring to all the preceding frames maintains the global appearance across the frames in a video clip. To handle dynamic text input for video generation, we utilize a novel prompt generator that converts coarse text messages from the user into multiple optimal prompts for the text-to-video diffusion model. Extensive experiments and user studies show that our proposed method is superior to other video-generative models in terms of temporal coherency of content and semantics. Video examples are available on our project page: https://kuai-lab.github.io/eccv2024mevg.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation ( http://arxiv.org/abs/2312.06409v3 ) License: see link	Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Yifan Chen, Jianjiang Feng, Jie Zhou	Several methods have been proposed to estimate 3D human pose from multi-view images, achieving satisfactory performance on public datasets collected under relatively simple conditions. However, few approaches study the extraction of 3D human skeletons from multimodal inputs, such as RGB and point cloud data. To address this gap, we introduce LiCamPose, a pipeline that integrates multi-view RGB and sparse point cloud information to estimate robust 3D human poses from a single frame. We demonstrate the effectiveness of the volumetric architecture in combining these modalities. Furthermore, to circumvent the need for manually labeled 3D human pose annotations, we develop a synthetic dataset generator for pretraining and design an unsupervised domain adaptation strategy to train a 3D human pose estimator without manual annotations. To validate the generalization capability of our method, LiCamPose is evaluated on four datasets, including two public datasets, one synthetic dataset, and one challenging self-collected dataset named BasketBall, covering diverse scenarios. The results demonstrate that LiCamPose exhibits great generalization performance and significant application potential. The code, generator, and datasets will be made available upon acceptance of this paper.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# Advanced Model Consistency Restoration with Higher-Order Short-Cut Rules ( http://arxiv.org/abs/2312.09828v3 ) License: see link	Lars Fritsche, Jens Kosiol, Alexander Lauer, Adrian Möller, Andy Schürr	Sequential model synchronisation is the task of propagating changes from one model to another correlated one to restore consistency. It is challenging to perform this propagation in a least-changing way that avoids unnecessary deletions (which might cause information loss). From a theoretical point of view, so-called short-cut (SC) rules have been developed that enable provably correct propagation of changes while avoiding information loss. However, to be able to react to every possible change, an infinite set of such rules might be necessary. In practice, only small sets of pre-computed basic SC rules have been used, severely restricting the kind of changes that can be propagated without loss of information. In this work, we close that gap by developing an approach to compute the more complex SC rules that are required on-the-fly during synchronisation. These higher-order SC rules allow us to cope with more complex scenarios when multiple changes must be handled in one step. We implemented our approach in the model transformation tool eMoflon. An evaluation shows that the overhead of computing higher-order SC rules on-the-fly is tolerable and at times even improves the overall performance. Beyond that, completely new scenarios can be dealt with without loss of information.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
# Path integral for the quartic oscillator: An accurate analytic formula for the partition function ( http://arxiv.org/abs/2312.09859v3 ) License: see link	Michel Caffarel	In this work, an approximate analytic expression for the quantum partition function of the quartic oscillator described by the potential $V(x) = \frac{1}{2} \omega^2 x^2 + g x^4$ is presented. Using a path integral formalism, the exact partition function is approximated by the partition function of a harmonic oscillator with an effective frequency depending on both the temperature and the coupling constant $g$. By invoking a Principle of Minimal Sensitivity (PMS) of the path integral to the effective frequency, we derive a mathematically well-defined analytic formula for the partition function. Quite remarkably, the formula reproduces qualitatively and quantitatively the key features of the exact partition function. The free energy is accurate to a few percent over the entire range of temperatures and coupling strengths $g$. Both the harmonic ($g\rightarrow 0$) and classical (high-temperature) limits are exactly recovered. The divergence of the power series of the ground-state energy at weak coupling, characterized by a factorial growth of the perturbational energies, is reproduced, as well as the functional form of the strong-coupling expansion along with accurate coefficients. Explicit accurate expressions for the ground- and first-excited state energies, $E_0(g)$ and $E_1(g)$, are also presented.	Translated: 2024-07-17 23:50:29 Published: 2024-07-16
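The approximation described above replaces the exact partition function with that of a harmonic oscillator at an effective frequency. The sketch below evaluates only that harmonic building block, in units where $\hbar = 1$, and checks its low- and high-temperature limits. The effective frequency selected by the PMS condition is not given in the abstract, so `omega` is a free input here and the function names are hypothetical.

```python
import math

def harmonic_partition_function(beta, omega):
    """Z = 1 / (2 sinh(beta * omega / 2)) for a harmonic oscillator, with hbar = 1."""
    return 1.0 / (2.0 * math.sinh(0.5 * beta * omega))

def free_energy(beta, omega):
    """F = -ln(Z) / beta."""
    return -math.log(harmonic_partition_function(beta, omega)) / beta

# Low temperature: F approaches the zero-point energy omega / 2.
print(free_energy(beta=50.0, omega=1.0))

# High temperature: Z approaches the classical value 1 / (beta * omega).
print(harmonic_partition_function(beta=1e-3, omega=1.0) * 1e-3)
```

In the scheme the abstract outlines, `omega` would be replaced by a temperature- and coupling-dependent effective frequency, and these two limits are the ones the abstract says are recovered exactly.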
# SPIRE: セマンティックプロンプト駆動画像復元 SPIRE: Semantic Prompt-Driven Image Restoration ( http://arxiv.org/abs/2312.11595v2 ) ライセンス: Link先を確認	Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi,	(参考訳) テキスト駆動拡散モデルは、インペイント、スタイリゼーション、オブジェクト置換など、様々な画像編集タスクでますます人気が高まっている。しかし、この言語ビジョンパラダイムを、より精細な画像処理タスク(例えば、デノイング、超解像、デブロアリング、圧縮アーティファクト削除など)に採用することは、依然としてオープンな研究課題である。本稿では,自然言語をユーザフレンドリなインタフェースとして活用し,画像復元プロセスを制御する,セマンティック・プロンプト駆動型画像復元フレームワークであるSPIREを開発する。本稿では,2次元における情報伝達能力について考察する。まず、コンテンツ関連プロンプトを用いてセマンティックアライメントを強化し、修復結果におけるアイデンティティの曖昧さを効果的に軽減する。第2に,本手法は,タスク固有の明示的な設計を必要とせず,言語に基づく復元強度の定量的な仕様化による細粒度指導を支援する最初のフレームワークである。さらに,既存のControlNetアーキテクチャを拡張した新しい融合機構を導入し,生成前の再スケールを学習することで,復元精度の向上を実現した。我々は,SPIREの回復性能を最先端技術と比較し,回復効果に対するテキストベース制御の柔軟性を実証した。 Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of prompt information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. 
Our extensive experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art, while also offering flexible text-based control over the restoration effects.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# Qスコアマッチングによるリワードからの拡散モデルポリシーの学習 Learning a Diffusion Model Policy from Rewards via Q-Score Matching ( http://arxiv.org/abs/2312.11752v3 ) ライセンス: Link先を確認	Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma,	(参考訳) 拡散モデルは、行動クローニングとオフライン強化学習においてアクターポリシーを表現するために一般的な選択肢となっている。これは、連続空間上の表現的分布のクラスを最適化する自然な能力のためである。しかし、以前の作品では楽譜に基づく拡散モデルの構造を活用できず、代わりに単純な行動クローニング用語を使用してアクターを訓練し、アクター批判的な設定におけるそれらの能力を制限する。本稿では,拡散モデルポリシの構造を学習されたQ-関数にリンクする理論的枠組みを提案する。本稿では, 外部強化学習に着目し, この理論からQスコアマッチングを示す新しいポリシー更新手法を提案する。特に、このアルゴリズムは拡散モデル全体の評価よりもデノナイジングモデルを通してしか区別する必要がなく、Qスコアマッチングによる収束ポリシーは、連続的なドメインにおいて暗黙的に多重モーダルかつ爆発的である。シミュレーション環境で実験を行い,提案手法の有効性を実証し,一般的なベースラインと比較した。ソースコードはプロジェクトのWebサイト(https://michaelpsenka.io/qsm)から入手できる。 Diffusion models have become a popular choice for representing actor policies in behavior cloning and offline reinforcement learning. This is due to their natural ability to optimize an expressive class of distributions over a continuous space. However, previous works fail to exploit the score-based structure of diffusion models, and instead utilize a simple behavior cloning term to train the actor, limiting their ability in the actor-critic setting. In this paper, we present a theoretical framework linking the structure of diffusion model policies to a learned Q-function, by linking the structure between the score of the policy to the action gradient of the Q-function. We focus on off-policy reinforcement learning and propose a new policy update method from this theory, which we denote Q-score matching. Notably, this algorithm only needs to differentiate through the denoising model rather than the entire diffusion model evaluation, and converged policies through Q-score matching are implicitly multi-modal and explorative in continuous domains. We conduct experiments in simulated environments to demonstrate the viability of our proposed method and compare to popular baselines. Source code is available from the project website: https://michaelpsenka.io/qsm.	
翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
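The link described in the abstract above, between the score of the policy and the action gradient of the Q-function, can be written schematically as follows. This is a hedged sketch of the stated idea, not the paper's exact objective: the proportionality constant $\alpha$, the score network $\Psi_\theta$, and the squared-error form of the loss are assumptions introduced for illustration.

```latex
% Assumed notation: \pi(a \mid s) is the policy, Q(s,a) the learned critic,
% \Psi_\theta(s,a) a network modelling the policy score, \alpha > 0 a scale.
\nabla_a \log \pi(a \mid s) \;\approx\; \alpha \, \nabla_a Q(s, a),
\qquad
\mathcal{L}(\theta) \;=\; \mathbb{E}_{(s,a)}
  \left\| \Psi_\theta(s, a) - \alpha \, \nabla_a Q(s, a) \right\|^2 .
```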
# LiDAR領域の一般化を再考する:多重密度領域としての単一ソース Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains ( http://arxiv.org/abs/2312.12098v2 ) ライセンス: Link先を確認	Jaeyeul Kim, Jungwan Woo, Jeonghoon Kim, Sunghoon Im,	(参考訳) LiDARに基づく認識の領域では、重要な進歩がなされているが、領域の一般化は依然として重大な課題である。この性能は、異なるLiDARセンサーを持つ未知のデータセットにモデルを適用する場合や、主に点雲密度分布の変化のために新しい環境にデプロイする場合に劣化することが多い。この課題に対処するために、単一ソースのLiDAR点雲が密度のスペクトルを包含しているという観測に乗じて、DDFE(Divate Discriminative Feature Embedding)モジュールを提案する。 DDFEモジュールは、単一のソースドメイン内で密度固有の特徴を抽出し、異なるLiDARセンサー間で類似した密度特性を共有するオブジェクトの認識を容易にするように設計されている。さらに、ソースデータの密度スペクトルを拡大し、DDFEの能力を高めることを目的とした、シンプルで効果的な密度拡張手法を導入する。 DDFEは汎用的で軽量なドメイン一般化モジュールとして際立っている。様々な3Dバックボーンネットワークにシームレスに統合することができ、現在の最先端ドメイン一般化法よりも優れた性能を示している。コードはhttps://github.com/dgist-cvlab/MultiDensityDGで入手できる。 In the realm of LiDAR-based perception, significant strides have been made, yet domain generalization remains a substantial challenge. The performance often deteriorates when models are applied to unfamiliar datasets with different LiDAR sensors or deployed in new environments, primarily due to variations in point cloud density distributions. To tackle this challenge, we propose a Density Discriminative Feature Embedding (DDFE) module, capitalizing on the observation that a single source LiDAR point cloud encompasses a spectrum of densities. The DDFE module is meticulously designed to extract density-specific features within a single source domain, facilitating the recognition of objects sharing similar density characteristics across different LiDAR sensors. In addition, we introduce a simple yet effective density augmentation technique aimed at expanding the spectrum of density in source data, thereby enhancing the capabilities of the DDFE. Our DDFE stands out as a versatile and lightweight domain generalization module. It can be seamlessly integrated into various 3D backbone networks, where it has demonstrated superior performance over current state-of-the-art domain generalization methods. 
Code is available at https://github.com/dgist-cvlab/MultiDensityDG.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# NeRF-VO:ニューラルラジアンス場を用いたリアルタイムスパース視覚計測 NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields ( http://arxiv.org/abs/2312.13471v2 ) ライセンス: Link先を確認	Jens Naumann, Binbin Xu, Stefan Leutenegger, Xingxing Zuo,	(参考訳) 我々は,低遅延カメラ追跡のための学習ベーススパース視覚計測システムNeRF-VOと,微細な高密度再構成と新しいビュー合成のためのニューラルラディアンスシーン表現を統合した新しいモノクロ視覚計測システムNeRF-VOを導入する。本システムでは,スパース・ビジュアル・オドメトリーを用いてカメラのポーズを初期化し,モノラルな予測ネットワークからビュー依存の高密度な幾何学的先行情報を得る。我々は、ポーズのスケールと密な幾何学を調和させ、それらを神経暗黙のシーン表現を訓練するための監督的手がかりとして扱う。 NeRF-VOは、キーフレームされたポーズのスライドウィンドウと、ボリュームレンダリングによるラディアンスフィールドのトレーニングによって達成される下層の密度幾何を共同最適化することにより、シーン表現の測度と幾何学的忠実度の両方において、例外的な性能を示す。我々は、高いカメラトラッキング周波数を実現し、GPUメモリの消費を抑えつつ、ポーズ推定精度、新しいビュー合成忠実度、および様々な合成および実世界のデータセットにおける密度の高い再構成品質をSOTA法を超えている。 We introduce a novel monocular visual odometry (VO) system, NeRF-VO, that integrates learning-based sparse visual odometry for low-latency camera tracking and a neural radiance scene representation for fine-detailed dense reconstruction and novel view synthesis. Our system initializes camera poses using sparse visual odometry and obtains view-dependent dense geometry priors from a monocular prediction network. We harmonize the scale of poses and dense geometry, treating them as supervisory cues to train a neural implicit scene representation. NeRF-VO demonstrates exceptional performance in both photometric and geometric fidelity of the scene representation by jointly optimizing a sliding window of keyframed poses and the underlying dense geometry, which is accomplished through training the radiance field with volume rendering. We surpass SOTA methods in pose estimation accuracy, novel view synthesis fidelity, and dense reconstruction quality across a variety of synthetic and real-world datasets while achieving a higher camera tracking frequency and consuming less GPU memory.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# 正規化誤り訂正によるフェデレーション学習のためのスパーストレーニング Sparse Training for Federated Learning with Regularized Error Correction ( http://arxiv.org/abs/2312.13795v2 ) ライセンス: Link先を確認	Ran Greidi, Kobi Cohen,	(参考訳) Federated Learning(FL)は、ディープニューラルネットワーク(DNN)モデルをトレーニングする上で大きなメリットがあるため、大きな関心を集めている。しかし、通信資源や計算資源は限られているため、FLシステムにおけるDNNモデルの訓練は、複雑なタスクにおける計算コストや通信コストの増大などの課題に直面している。スパーストレーニングスキームは、各クライアント(すなわちノード)送信の寸法を縮小するために注目される。具体的には,重要な更新のみをパラメータサーバ(PS)に送信し,残りをローカルに蓄積するという,エラー訂正手法によるスペーシングが有望な手法である。誤り訂正法は収束を損なうことなくクライアント対PSメッセージの大幅なスペーサー化レベルを達成することが示されているが、スペーサー化は安定化効果によりさらに未解決のままである。本稿では,FLARE(Federated Learning with Accumulated Regularized Embeddings)と呼ばれる新しいアルゴリズムを提案する。 FLAREでは,更新モデルの蓄積とFLプロセスへの埋め込みの正規化によるスパーストレーニング手法を提案する。 FLAREの性能は、多種多様な複雑なモデルに関する広範な実験を通じて検証され、顕著なスパーシリティレベル(現在の最先端の10倍以上の)を達成するとともに、精度が大幅に向上した。さらに、研究者や関連分野の開発者の利益のために、オープンソースのソフトウェアパッケージが開発されている。 Federated Learning (FL) has attracted much interest due to the significant advantages it brings to training deep neural network (DNN) models. However, since communications and computation resources are limited, training DNN models in FL systems face challenges such as elevated computational and communication costs in complex tasks. Sparse training schemes gain increasing attention in order to scale down the dimensionality of each client (i.e., node) transmission. Specifically, sparsification with error correction methods is a promising technique, where only important updates are sent to the parameter server (PS) and the rest are accumulated locally. While error correction methods have shown to achieve a significant sparsification level of the client-to-PS message without harming convergence, pushing sparsity further remains unresolved due to the staleness effect. In this paper, we propose a novel algorithm, dubbed Federated Learning with Accumulated Regularized Embeddings (FLARE), to overcome this challenge. 
FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process, providing a powerful solution to the staleness effect, and pushing sparsity to an exceptional level. The performance of FLARE is validated through extensive experiments on diverse and complex models, achieving a remarkable sparsity level (10 times and more beyond the current state-of-the-art) along with significantly improved accuracy. Additionally, an open-source software package has been developed for the benefit of researchers and developers in related fields.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
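The background technique mentioned in the abstract above, sparsification with error correction, sends only the important updates to the parameter server and accumulates the rest locally. The sketch below shows the generic top-k variant of that idea in plain Python. It is not the FLARE algorithm, and the function name and the choice of "largest magnitude" as the importance criterion are assumptions made for illustration.

```python
def sparsify_with_error_feedback(update, residual, k):
    """Return (message, new_residual) for one client round.

    update   : list of floats, the fresh local model update
    residual : list of floats, entries withheld in earlier rounds
    k        : number of entries the client is allowed to transmit
    """
    # Error correction: add back what was withheld previously.
    corrected = [u + r for u, r in zip(update, residual)]
    # Pick the k entries with the largest absolute value.
    order = sorted(range(len(corrected)),
                   key=lambda i: abs(corrected[i]), reverse=True)
    keep = set(order[:k])
    # Transmit the selected entries; everything else is zeroed in the message.
    message = [c if i in keep else 0.0 for i, c in enumerate(corrected)]
    # Whatever was not transmitted is accumulated locally for later rounds.
    new_residual = [0.0 if i in keep else c for i, c in enumerate(corrected)]
    return message, new_residual


if __name__ == "__main__":
    residual = [0.0, 0.0, 0.0, 0.0]
    for update in ([0.5, -0.1, 0.05, 2.0], [0.5, 0.0, 0.0, 0.1]):
        message, residual = sparsify_with_error_feedback(update, residual, k=1)
        print(message, residual)
```

In the second round the first coordinate is transmitted only because its withheld value from round one was accumulated, which is the point of the error-correction step.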
# 古典最適化ユニタリ回路による非平衡量子力学のスケーラブルシミュレーション Scalable simulation of non-equilibrium quantum dynamics via classically optimised unitary circuits ( http://arxiv.org/abs/2312.14245v2 ) ライセンス: Link先を確認	Luke Causer, Felix Jung, Asimpunya Mitra, Frank Pollmann, Adam Gammon-Smith,	(参考訳) 短期的なデジタル量子コンピュータの出現は、古典的コンピューティング以上の量子多体現象を研究するエキサイティングな機会になるかもしれない。ハードウェアを最大限に活用するためには、限られた回路深さに対してハミルトン力学を正確にシミュレートする手法が最重要である。本稿では,量子時間進化演算子を近似するために,一元的ブロックウォール回路を古典的に最適化する手法を提案する。本手法はテンソルネットワークを用いてシステムサイズを拡張可能である。様々な3体ハミルトニアンに対して、我々の手法は、その精度と力学を実装するために必要な量子回路深さの両方でトロタライズを上回る量子回路を生成し、正確な詳細はハミルトニアンに依存することを示した。また、量子デバイスとブロックウォール回路の近似の組合せ誤差を最小限に抑える最適な時間ステップを選択する方法についても説明する。 The advent of near-term digital quantum computers could offer us an exciting opportunity to investigate quantum many-body phenomena beyond that of classical computing. To make the best use of the hardware available, it is paramount that we have methods that accurately simulate Hamiltonian dynamics for limited circuit depths. In this paper, we propose a method to classically optimise unitary brickwall circuits to approximate quantum time evolution operators. Our method is scalable in system size through the use of tensor networks. We demonstrate that, for various three-body Hamiltonians, our approach produces quantum circuits that can outperform Trotterization in both their accuracy and the quantum circuit depth needed to implement the dynamics, with the exact details being dependent on the Hamiltonian. We also explain how to choose an optimal time step that minimises the combined errors of the quantum device and the brickwall circuit approximation.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
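Trotterization is the baseline that the abstract above compares against. As a toy illustration only, the sketch below applies first-order Trotterization to a single-qubit Hamiltonian $H = aX + bZ$ and compares it with the closed-form evolution operator. The single-qubit model and all function names are assumptions chosen to keep the example self-contained; the paper's brickwall-circuit optimisation and tensor-network machinery are not reproduced.

```python
import cmath
import math


def matmul(A, B):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_x(theta):
    """exp(-i * theta * X) for the Pauli X matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]


def exp_z(theta):
    """exp(-i * theta * Z) for the Pauli Z matrix."""
    return [[cmath.exp(-1j * theta), 0], [0, cmath.exp(1j * theta)]]


def exact_evolution(a, b, t):
    """exp(-i t (aX + bZ)), using (aX + bZ)^2 = (a^2 + b^2) I."""
    r = math.hypot(a, b)
    c, s = math.cos(r * t), math.sin(r * t) / r
    return [[c - 1j * s * b, -1j * s * a], [-1j * s * a, c + 1j * s * b]]


def trotter_evolution(a, b, t, n):
    """First-order Trotter product with n equal time steps."""
    step = matmul(exp_x(a * t / n), exp_z(b * t / n))
    U = [[1, 0], [0, 1]]
    for _ in range(n):
        U = matmul(U, step)
    return U


def max_error(A, B):
    """Largest entrywise absolute difference between two 2x2 matrices."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    exact = exact_evolution(1.0, 1.0, 1.0)
    for n in (10, 100, 1000):
        print(n, max_error(exact, trotter_evolution(1.0, 1.0, 1.0, n)))
```

The printed error shrinks as the number of steps grows, which is the depth-versus-accuracy trade-off that motivates looking for shallower circuits.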
# 深層学習法により得られた包括的電子-炭素散乱データへの経験的適合 Empirical fits to inclusive electron-carbon scattering data obtained by deep-learning methods ( http://arxiv.org/abs/2312.17298v2 ) ライセンス: Link先を確認	Beata E. Kowal, Krzysztof M. Graczyk, Artur M. Ankowski, Rwik Dharmapal Banerjee, Hemant Prasad, Jan T. Sobczyk,	(参考訳) ニューラルネットワークの枠組みを応用して、準弾性ピークから共鳴励起を経て深い非弾性散乱の開始まで、広い運動領域上の炭素の電子散乱断面積に実験的に適合することを示す。このようなモデル非依存のパラメトリゼーションとそれに対応する不確実性を得る2つの異なる方法を考える:ブートストラップ法とモンテカルロのドロップアウト法に基づく。解析において、$\chi^2$は、各独立した測定セットに対する点対点と正規化の不確かさを含む損失関数を定義する。我々の統計的アプローチは、同等の品質と、同様の不確実性の7ドル%の順序に適合する。これらのモデルをテストするために、これらの予測を、トレーニングプロセスから除外されたデータセットとスペクトル関数アプローチで得られた理論的予測と比較する。両方のモデルの予測は、実験的な測定と理論的な計算と一致している。また,対象キネマティック領域を超えたデータセットとの比較を行い,ブートストラップ手法は,ドロップアウトアルゴリズムに基づくデータセットよりも,補間能力と補間性能が優れていることを示した。 Employing the neural network framework, we obtain empirical fits to the electron-scattering cross sections for carbon over a broad kinematic region, extending from the quasielastic peak through resonance excitation to the onset of deep-inelastic scattering. We consider two different methods of obtaining such model-independent parametrizations and the corresponding uncertainties: based on the bootstrap approach and the Monte Carlo dropout approach. In our analysis, the $\chi^2$ defines the loss function, including point-to-point and normalization uncertainties for each independent set of measurements. Our statistical approaches lead to fits of comparable quality and similar uncertainties of the order of $7$%. To test these models, we compare their predictions to test datasets excluded from the training process and theoretical predictions obtained within the spectral function approach. The predictions of both models agree with experimental measurements and theoretical calculations. We also perform a comparison to a dataset lying beyond the covered kinematic region, and find that the bootstrap approach shows better interpolation and extrapolation abilities than the one based on the dropout algorithm.	
翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# ストリートガウシアン:ガウシアンスプレイティングによる動的都市景観のモデル化 Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting ( http://arxiv.org/abs/2401.01339v2 ) ライセンス: Link先を確認	Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng,	(参考訳) 本稿では,自律走行シーンの動的街路をモデル化する問題に取り組むことを目的とする。近年の手法では、車両のアニメーション化に追従した車両のポーズを取り入れてNeRFを拡張し、ダイナミックな街路シーンの写実的なビュー合成を可能にしている。しかし、トレーニングの遅さとレンダリングのスピードには大きな制限がある。この制限に対処する新たな明示的なシーン表現であるStreet Gaussiansを紹介します。具体的には、ダイナミックアーバンシーンは、セマンティックロジットと3Dガウスアンを備えた点雲の集合として表現され、それぞれが前景車両または背景に関連付けられている。前景の物体車両の動力学をモデル化するために、各物体点雲は、動的外観のための4次元球面調和モデルとともに、最適化可能な追跡されたポーズで最適化される。明示的な表現は、オブジェクト車両と背景の簡単な構成を可能にし、30分以内のトレーニングで、シーン編集操作とレンダリングを135 FPS (1066$\times$1600 resolution)で行うことができる。提案手法は、KITTIやWaymo Openデータセットなど、複数の挑戦的なベンチマークで評価される。実験の結果,提案手法はすべてのデータセットで常に最先端の手法よりも優れていた。再現性を確保するために、コードはリリースされます。 This paper aims to tackle the problem of modeling dynamic urban streets for autonomous driving scenes. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, significant limitations are their slow training and rendering speed. We introduce Street Gaussians, a new explicit scene representation that tackles these limitations. Specifically, the dynamic urban scene is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a 4D spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn allows for scene editing operations and rendering at 135 FPS (1066 $\times$ 1600 resolution) within half an hour of training. 
The proposed method is evaluated on multiple challenging benchmarks, including KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. The code will be released to ensure reproducibility.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# 要求工学における自然言語処理技術の選択と評価に関する実践的ガイドライン Practical Guidelines for the Selection and Evaluation of Natural Language Processing Techniques in Requirements Engineering ( http://arxiv.org/abs/2401.01508v3 ) ライセンス: Link先を確認	Mehrdad Sabetzadeh, Chetan Arora,	(参考訳) 自然言語処理(NLP)が要求自動化の基礎になった。要求工学(RE)におけるNLPの採用の増加の背景にある重要な要因の1つは、業界における要求を特定するために自然言語(NL)が普及していることである。 NLP技術は、要求を自動的に分類し、重要な情報、例えばドメインモデルや用語を抽出し、曖昧性処理や完全性チェックなどの品質保証タスクを実行するために一般的に用いられる。多くの異なるNLPソリューション戦略が利用可能であり、機械学習を同時に適用することが可能であるため、特定のREタスクの適切な戦略を選択し、結果のソリューションを経験的に厳密な方法で評価することは困難である。本章では,NLP技術の選択に関するガイドラインと,REの文脈における評価について述べる。特に,従来のNLP,特徴ベース機械学習,言語モデルに基づく手法など,さまざまな戦略を選択する方法について議論する。この章の究極の希望は、NLP4REへの新規参入者を支援し、RE分野に最も関係のあるNLP技術に迅速に参入することである。 Natural Language Processing (NLP) is now a cornerstone of requirements automation. One compelling factor behind the growing adoption of NLP in Requirements Engineering (RE) is the prevalent use of natural language (NL) for specifying requirements in industry. NLP techniques are commonly used for automatically classifying requirements, extracting important information, e.g., domain models and glossary terms, and performing quality assurance tasks, such as ambiguity handling and completeness checking. With so many different NLP solution strategies available and the possibility of applying machine learning alongside, it can be challenging to choose the right strategy for a specific RE task and to evaluate the resulting solution in an empirically rigorous manner. In this chapter, we present guidelines for the selection of NLP techniques as well as for their evaluation in the context of RE. In particular, we discuss how to choose among different strategies such as traditional NLP, feature-based machine learning, and language-model-based methods. Our ultimate hope for this chapter is to serve as a stepping stone, assisting newcomers to NLP4RE in quickly initiating themselves into the NLP technologies most pertinent to the RE field.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# パワーロー減衰下における分析スペクトルアルゴリズムの一般化誤差曲線 Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay ( http://arxiv.org/abs/2401.01599v2 ) ライセンス: Link先を確認	Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin,	(参考訳) カーネル回帰法の一般化誤差曲線は,極小率ではなく,様々な音源条件,雑音レベル,正規化パラメータの選択による一般化誤差の正確な順序を決定することを目的としている。本研究では、軽微な仮定の下で、カーネル回帰におけるカーネル勾配勾配法(および分析スペクトルアルゴリズムの大規模なクラス)の一般化誤差曲線を厳格に評価する。その結果、カーネル補間の不整合性を明確化し、より高い資格を有するカーネル回帰アルゴリズムの飽和効果を明らかにすることができた。ニューラル・タンジェント・カーネル理論により、これらの結果は広義のニューラルネットワークを訓練する際の一般化行動の理解を大幅に改善する。解析的機能論という新しい技術的貢献は、独立した関心事であるかもしれない。 The generalization error curve of a kernel regression method aims to determine the exact order of the generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we can sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification. Thanks to neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# 認知領域における量子機械学習 : アルツハイマー病研究 Quantum Machine Learning in the Cognitive Domain: Alzheimer's Disease Study ( http://arxiv.org/abs/2401.06697v2 ) ライセンス: Link先を確認	Emine Akpinar,	(参考訳) アルツハイマー病(英語: Alzheimer's disease、AD)は、特に高齢者において認知障害となる神経変性性脳疾患である。認知障害は、集中力、記憶力、その他の高次認知能力などの様々な精神能力の低下として現れる。これらの欠陥は、個人が情報を理解し、新しい知識を取得し、効果的にコミュニケーションする能力に大きな影響を及ぼす可能性がある。認知障害による影響の1つは手書きである。圧力、速度、空間的な組織など、手書きのさまざまな側面を分析することで、研究者は早期の認知障害、特にADを示す微妙な変化を検出することができる。近年,高齢者のADを手書き解析により検出するための古典的人工知能(AI)手法がいくつか提案されている。しかし、高度なAI手法は、データのサイズが大きくなるにつれて、より多くの計算能力を必要とする。さらに、診断は古典的ベクトル空間の制限や特徴間の相関などの影響を受けうる。近年の研究では、医療における量子コンピューティング技術の使用は、これらの問題に対処するだけでなく、複雑なデータ分析を加速し、大規模データセットをより効率的に処理できることが示されている。本研究では,手書きデータに基づく高齢者のAD早期診断を容易にするため,回路要素が少ない変分量子分類器を提案する。機能のエンコーディングにはZZFeatureMapを使用しました。 ADを分類するために、繰り返しRyとRzの回転ゲートとCYとCZの2量子エンタングルゲートからなるパラメータ化量子回路を設計、実装した。提案したモデルはADの分類において0.75の精度を達成した。 Alzheimer's disease (AD) is the most prevalent neurodegenerative brain disorder, which results in significant cognitive impairments, especially in the elderly population. Cognitive impairments can manifest as a decline in various mental faculties, such as concentration, memory, and other higher-order cognitive abilities. These deficits can significantly impact an individual's capacity to comprehend information, acquire new knowledge, and communicate effectively. One of the affected activities due to cognitive impairments is handwriting. By analyzing different aspects of handwriting, including pressure, velocity, and spatial organization, researchers can detect subtle alterations that might indicate early-stage cognitive impairments, especially AD. Recently, several classical artificial intelligence (AI) approaches have been proposed for detecting AD in elderly individuals through handwriting analysis. However, advanced AI methods require more computational power as the size of the data increases. 
Additionally, diagnoses can be influenced by factors such as limited relevant classical vector space and correlations between features. Recent studies have shown that using quantum computing technologies in healthcare can not only address these problems but also accelerate complex data analysis and process large datasets more efficiently. In this study, we introduced a variational quantum classifier with fewer circuit elements to facilitate the early diagnosis of AD in elderly individuals based on handwriting data. We employed ZZFeatureMap for encoding features. To classify AD, a parameterized quantum circuit consisting of repeated Ry and Rz rotation gates, as well as CY and CZ two-qubit entangling gates, was designed and implemented. The proposed model achieved an accuracy of 0.75 in classifying AD.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# 暗黙のニューラルキャンバスの解説:その貢献を追究して、レンズとニューロンを繋ぐ Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions ( http://arxiv.org/abs/2401.10217v2 ) ライセンス: Link先を確認	Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava,	(参考訳) ニューラルネットワークが信号の連続的な表現として訓練されるインプリシトニューラルネットワーク表現(INR)の多くのバリエーションは、新しいビュー合成、ビデオ圧縮、画像超解像といった下流タスクに極めて実用的である。残念なことに、これらのネットワークの内部構造は、あまり研究されていない。我々の研究であるeXplaining the Implicit Neural Canvas (XINC)は、各ニューロンの出力画素への寄与の強さを調べることによって、INRの特性を説明する統一的なフレームワークである。これらのコントリビューションの集合をImplicit Neural Canvasと呼び、この概念を使って、私たちが研究しているINRが、彼らの表現するフレームを驚くべき方法で"見る"ことを学ぶことを実証します。例えば、INRは高度に分散した表現を持つ傾向がある。高レベルのオブジェクトセマンティクスを欠いているが、色とエッジには大きなバイアスがあり、ほとんど完全に空間に依存しない。我々は、ビデオINRにおいてオブジェクトがどのように時間にわたって表現されるかを調べ、クラスタリングを使用して、レイヤやアーキテクチャにわたって類似したニューロンを視覚化し、これが動きに支配されていることを示す、という結論に達した。これらの知見は分析フレームワークの汎用性を示している。私たちのプロジェクトページはhttps://namithap10.github.io/xinc.comで公開されている。 The many variations of Implicit Neural Representations (INRs), where a neural network is trained as a continuous representation of a signal, have tremendous practical utility for downstream tasks including novel view synthesis, video compression, and image super-resolution. Unfortunately, the inner workings of these networks are seriously under-studied. Our work, eXplaining the Implicit Neural Canvas (XINC), is a unified framework for explaining properties of INRs by examining the strength of each neuron's contribution to each output pixel. We call the aggregate of these contribution maps the Implicit Neural Canvas and we use this concept to demonstrate that the INRs we study learn to "see" the frames they represent in surprising ways. For example, INRs tend to have highly distributed representations. While lacking high-level object semantics, they have a significant bias for color and edges, and are almost entirely space-agnostic. 
We arrive at our conclusions by examining how objects are represented across time in video INRs, using clustering to visualize similar neurons across layers and architectures, and show that this is dominated by motion. These insights demonstrate the general usefulness of our analysis framework. Our project page is available at https://namithap10.github.io/xinc.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# 簡易潜時拡散法によるパノプティカルセグメンテーションとマスク塗布 A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting ( http://arxiv.org/abs/2401.10227v2 ) ライセンス: Link先を確認	Wouter Van Gansbeke, Bert De Brabandere,	(参考訳) パノプティクスとインスタンスセグメンテーションネットワークは、しばしば、特殊なオブジェクト検出モジュール、複雑な損失関数、およびインスタンスマスクの置換不変性を管理するためのアドホックな後処理ステップで訓練される。この研究は安定拡散の上に構築され、汎視的セグメンテーションの潜在拡散アプローチを提案し、その結果、これらの複雑さを省略する単純なアーキテクチャをもたらす。トレーニングは,(1)部分分割マスクを潜伏空間に投影する浅層オートエンコーダの訓練,(2)潜伏空間における画像条件付きサンプリングを可能にする拡散モデルの訓練,の2段階からなる。この生成的アプローチは、マスクの完成または塗装の探索を解き放つ。 COCOとADE20kに関する実験的検証は、強いセグメンテーション結果をもたらす。最後に,学習可能なタスク埋め込みを導入することで,マルチタスクへのモデルの適応性を実証する。 Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to manage the permutation-invariance of the instance masks. This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation, resulting in a simple architecture that omits these complexities. Our training consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space. This generative approach unlocks the exploration of mask completion or inpainting. The experimental validation on COCO and ADE20k yields strong segmentation results. Finally, we demonstrate our model's adaptability to multi-tasking by introducing learnable task embeddings.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# Yoneda Lemmaを用いた完全同型暗号スキームの構築 Constructing a fully homomorphic encryption scheme with the Yoneda Lemma ( http://arxiv.org/abs/2401.13255v3 ) ライセンス: Link先を確認	Rémy Tuyéras,	(参考訳) 本稿では, Yoneda Lemmaの適用を通じて, 非対称暗号の同型暗号システムの基盤を再定義する。これは、ElGamal、RSA、Benaloh、RegevのLWE、NTRUEncryptといった広く採用されているシステムが、Yoneda Lemmaの原則から直接派生していることを示している。この合成により、Yoneda Encryption Schemeと呼ばれる全体論的同型暗号化フレームワークが生まれる。このスキームの中では、暗号は Yoneda Lemma 同型(英語版)の単射写像を通して解明され、復号はこれらの写像の自然性からシームレスに従う。この統合は統一モデル理論フレームワークの予想を示唆し、同型および完全同型暗号(FHE)スキームの推論の基礎を提供する。実演として、スキャッシングやブートストレッピングといった追加の調整を必要とせず、暗号化された乗算と加算の任意の有限列を処理できるFHE方式を提案する。このことは、提案された理論の進歩の実践的な意味を浮き彫りにするだけでなく、FHEスキームの設計を容易にするために、モデル理論と暗号の強制技術を活用する新たな可能性ももたらしている。 This paper redefines the foundations of asymmetric cryptography's homomorphic cryptosystems through the application of the Yoneda Lemma. It explicitly illustrates that widely adopted systems, including ElGamal, RSA, Benaloh, Regev's LWE, and NTRUEncrypt, directly derive from the principles of the Yoneda Lemma. This synthesis gives rise to a holistic homomorphic encryption framework named the Yoneda Encryption Scheme. Within this scheme, encryption is elucidated through the bijective maps of the Yoneda Lemma Isomorphism, and decryption seamlessly follows from the naturality of these maps. This unification suggests a conjecture for a unified model theory framework, providing a basis for reasoning about both homomorphic and fully homomorphic encryption (FHE) schemes. As a practical demonstration, the paper introduces an FHE scheme capable of processing arbitrary finite sequences of encrypted multiplications and additions without the need for additional tweaking techniques, such as squashing or bootstrapping. This not only underscores the practical implications of the proposed theoretical advancements but also introduces new possibilities for leveraging model theory and forcing techniques in cryptography to facilitate the design of FHE schemes.	翻訳日:2024-07-17 23:40:44 公開日:2024-07-16
# ConTextual:大規模マルチモーダルモデルにおけるコンテキスト感性テキストリッチビジュアル推論の評価 ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models ( http://arxiv.org/abs/2401.13311v3 ) ライセンス: Link先を確認	Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng,	(参考訳) 多くの実世界のタスクでは、エージェントがテキストとビジュアルオブジェクト(例えば、公共空間をナビゲートする)を共同で推論する必要がある。具体的には、これらのタスクは、テキストが画像内の視覚的要素と相互作用するコンテキストを理解する必要がある。しかし、文脈に敏感なテキストリッチな視覚的推論に対して、最先端のマルチモーダルモデルの能力をベンチマークする既存のデータセットが欠如している。本稿では,テキストリッチな画像に対する文脈依存推論を必要とする人為的命令を特徴とする新しいデータセットであるConTextualを紹介する。我々は,14の基礎モデル(GPT-4V,Gemini-Pro-Vision,LLaVA-Next)の性能評価実験を行い,人間のパフォーマンスベースラインを確立する。さらに、モデル応答の人的評価を行い、GPT-4V(現在の最高性能の大規模マルチモーダルモデル)と人的性能の30.8%の顕著な性能ギャップを観察する。 GPT-4Vは時間関連データやインフォグラフィックの解釈が困難であることが明らかとなった。しかし、ミームや引用文のような抽象的な視覚的文脈を解釈する能力を示す。最後に、質的分析により、視覚の正確な知覚や幻覚の欠如など、パフォーマンスの低下に寄与する様々な要因が明らかになった。私たちのデータセット、コード、リーダーボードはプロジェクトページ https://con-textual.github.io/ で確認できます。 Many real-world tasks require an agent to reason jointly over text and visual objects, (e.g., navigating in public spaces), which we refer to as context-sensitive text-rich visual reasoning. Specifically, these tasks require an understanding of the context in which the text interacts with visual elements within an image. However, there is a lack of existing datasets to benchmark the state-of-the-art multimodal models' capability on context-sensitive text-rich visual reasoning. In this paper, we introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images. We conduct experiments to assess the performance of 14 foundation models (GPT-4V, Gemini-Pro-Vision, LLaVA-Next) and establish a human performance baseline. Further, we perform human evaluations of the model responses and observe a significant performance gap of 30.8% between GPT-4V (the current best-performing Large Multimodal Model) and human performance. 
Our fine-grained analysis reveals that GPT-4V encounters difficulties interpreting time-related data and infographics. However, it demonstrates proficiency in comprehending abstract visual contexts such as memes and quotes. Finally, our qualitative analysis uncovers various factors contributing to poor performance including lack of precise visual perception and hallucinations. Our dataset, code, and leaderboard can be found on the project page https://con-textual.github.io/	翻訳日:2024-07-17 23:30:59 公開日:2024-07-16
# 無限次元のChoi形式主義から完全正の動的半群の生成元の一意分解へ From the Choi Formalism in Infinite Dimensions to Unique Decompositions of Generators of Completely Positive Dynamical Semigroups ( http://arxiv.org/abs/2401.14344v3 ) ライセンス: Link先を確認	Frederik vom Ende,	(参考訳) 任意の可分複素ヒルベルト空間が与えられたとき、純粋に虚トレースを持たないトレースクラス作用素$B$と、全正写像のノルム連続一パラメータ半群の任意の生成元$L$は、一意有界作用素$K$と一意完全正写像$\Phi$が存在することを証明する。 (i) $L=K(\cdot)+(\cdot)K^*+\Phi$, (ii) Superoperator $\Phi(B^*(\cdot)B)$はトレースクラスであり、トレースが消滅する。 (iii) ${\rm tr}(B^*K)$は実数である。私たちの証明の中心は、正の半定値作用素に完全正の写像を関連付けるチェ形式論の修正版である。この対応がそれぞれ単射かつ全射であるときの特徴付けを行い、その結果、主結果の証明アイデアが非分離ヒルベルト空間に拡張できない理由を説明する。特に、上述のヒルベルト空間が無限次元となるとすぐに、チェイ形式の下で空の事前像を持つ正半定値作用素の例が見つかる。 Given any separable complex Hilbert space, any trace-class operator $B$ which does not have purely imaginary trace, and any generator $L$ of a norm-continuous one-parameter semigroup of completely positive maps we prove that there exists a unique bounded operator $K$ and a unique completely positive map $\Phi$ such that (i) $L=K(\cdot)+(\cdot)K^*+\Phi$, (ii) the superoperator $\Phi(B^*(\cdot)B)$ is trace class and has vanishing trace, and (iii) ${\rm tr}(B^*K)$ is a real number. Central to our proof is a modified version of the Choi formalism which relates completely positive maps to positive semi-definite operators. We characterize when this correspondence is injective and surjective, respectively, which in turn explains why the proof idea of our main result cannot extend to non-separable Hilbert spaces. In particular, we find examples of positive semi-definite operators which have empty pre-image under the Choi formalism as soon as the underlying Hilbert space is infinite-dimensional.	翻訳日:2024-07-17 23:30:59 公開日:2024-07-16
# Expressive Power of ReLU and Step Networks under Floating-Point Operations ( http://arxiv.org/abs/2401.15121v2 ) License: see link	Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park	The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations, so most existing results do not apply to neural networks used in practice. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations as in practice. Our first set of results assumes floating-point operations where the significand of a float is represented by finitely many bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite set of input/output pairs and can approximate any continuous function within an arbitrary error. In particular, the number of parameters in our constructions for universal approximation and memorization coincides with that in classical results assuming exact mathematical operations. We also show similar results on memorization and universal approximation when floating-point operations use finitely many bits for both significand and exponent; these results are applicable to many popular floating-point formats such as those defined in the IEEE 754 standard (e.g., 32-bit single-precision format) and bfloat16.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
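The gap between exact and floating-point evaluation that motivates this abstract can be seen in a few lines. The sketch below is not the paper's construction; it only emulates binary32 rounding with the standard `struct` module and evaluates one ReLU unit both ways. The helper names and the chosen weights are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def neuron_float64(w: float, b: float, x: float) -> float:
    """One ReLU unit evaluated in binary64, used here as the reference."""
    return relu(w * x + b)


def neuron_float32(w: float, b: float, x: float) -> float:
    """The same unit with inputs and each intermediate result rounded to binary32."""
    w32, b32, x32 = to_float32(w), to_float32(b), to_float32(x)
    product = to_float32(w32 * x32)
    return relu(to_float32(product + b32))


# Rounding already breaks familiar identities in binary64.
absorbed = (1e16 + 1.0) - 1e16       # the added 1.0 is lost to rounding
inexact_sum = (0.1 + 0.2) == 0.3     # False

ref = neuron_float64(0.1, 1e-9, 3.0)
low = neuron_float32(0.1, 1e-9, 3.0)
print(absorbed, inexact_sum, abs(ref - low))
```

The two neuron outputs differ even for this tiny example, which is the kind of discrepancy that results assuming exact real arithmetic do not account for.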
# Diffusion-based Graph Generative Methods ( http://arxiv.org/abs/2401.15617v2 ) License: see link	Hongyang Chen, Can Xu, Lingyu Zheng, Qiang Zhang, Xuemin Lin	As some of the most cutting-edge generative methods, diffusion methods have shown great advances in a wide range of generation tasks. Among them, graph generation attracts significant research attention for its broad application in real life. In our survey, we systematically and comprehensively review diffusion-based graph generative methods. We first review three mainstream paradigms of diffusion methods: denoising diffusion probabilistic models, score-based generative models, and stochastic differential equations. Then we further categorize and introduce the latest applications of diffusion models on graphs. In the end, we point out some limitations of current studies and directions for future exploration. The summary of existing methods mentioned in this survey is available at https://github.com/zhejiangzhuque/Diffusion-based-Graph-Generative-Methods.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
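As a rough illustration of the first paradigm named above, the forward (noising) step of a denoising diffusion probabilistic model can be sketched on a toy adjacency matrix. This is a generic sketch rather than any method from the survey: the linear schedule, its default endpoints, and the function names are assumptions, and graph-specific details such as keeping the noise symmetric are deliberately left out.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances; requires num_steps >= 2."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I)."""
    a = alpha_bars[t]
    return [
        [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in row]
        for row in x0
    ]


betas = linear_beta_schedule(10)
alpha_bars = cumulative_alphas(betas)
adjacency = [[0.0, 1.0], [1.0, 0.0]]   # toy 2-node graph
noisy = q_sample(adjacency, 5, alpha_bars, random.Random(0))
print(alpha_bars[-1], noisy)
```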
# Augmenting Replay in World Models for Continual Reinforcement Learning ( http://arxiv.org/abs/2401.16650v3 ) License: see link	Luke Yang, Levin Kuhlmann, Gideon Kowadlo	Continual RL requires an agent to learn new tasks without forgetting previous ones, while improving on both past and future tasks. The most common approaches use model-free algorithms with replay buffers, which help to mitigate catastrophic forgetting but often struggle with scalability due to large memory requirements. Biologically inspired replay suggests replay to a world model, aligning with model-based RL, as opposed to the common setting of replay in model-free algorithms. Model-based RL offers benefits for continual RL by leveraging knowledge of the environment, independent of policy. We introduce WMAR (World Models with Augmented Replay), a model-based RL algorithm with a memory-efficient distribution-matching replay buffer. WMAR extends the well-known DreamerV3 algorithm, which employs a simple FIFO buffer and was not tested in continual RL. We evaluated WMAR and DreamerV3 with replay buffers of the same size. They were tested on two scenarios: tasks with shared structure using OpenAI Procgen and tasks without shared structure using the Atari benchmark. WMAR demonstrated favourable properties for continual RL considering metrics for forgetting as well as skill transfer on past and future tasks. Compared to DreamerV3, WMAR showed slight benefits in tasks with shared structure and substantially better forgetting characteristics on tasks without shared structure. Our results suggest that model-based RL with a memory-efficient replay buffer can be an effective approach to continual RL, justifying further research.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
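To make the contrast between a FIFO buffer and a distribution-matching buffer concrete, here is a minimal sketch. The abstract does not say how WMAR implements its buffer; reservoir sampling is used below only as one common way to keep a fixed-size sample that matches the whole stream, and the class and variable names are hypothetical.

```python
import random
from collections import deque


class ReservoirBuffer:
    """Fixed-size buffer holding a uniform random sample of everything seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Keep the new item with probability capacity / seen.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


# Stream of (task_id, step) pairs: 500 steps of task 0, then 500 of task 1.
stream = [(0, i) for i in range(500)] + [(1, i) for i in range(500)]

fifo = deque(maxlen=100)
reservoir = ReservoirBuffer(capacity=100)
for transition in stream:
    fifo.append(transition)
    reservoir.add(transition)

fifo_tasks = {task for task, _ in fifo}
reservoir_tasks = {task for task, _ in reservoir.items}
print(fifo_tasks, reservoir_tasks)
```

With equal capacity, the FIFO buffer ends up holding only the most recent task, while the reservoir still holds experience from both, which is the property that matters for limiting forgetting.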
# Learning Network Representations with Disentangled Graph Auto-Encoder ( http://arxiv.org/abs/2402.01143v2 ) License: see link	Di Fan, Chuanhou Gao	The (variational) graph auto-encoder is widely used to learn representations for graph-structured data. However, the formation of real-world graphs is a complicated and heterogeneous process influenced by latent factors. Existing encoders are fundamentally holistic, neglecting the entanglement of latent factors. This reduces the effectiveness of graph analysis tasks, while also making it more difficult to explain the learned representations. As a result, learning disentangled graph representations with the (variational) graph auto-encoder poses significant challenges and remains largely unexplored in the current research. In this paper, we introduce the Disentangled Graph Auto-Encoder (DGA) and the Disentangled Variational Graph Auto-Encoder (DVGA) to learn disentangled representations. Specifically, we first design a disentangled graph convolutional network with multi-channel message-passing layers to serve as the encoder. This allows each channel to aggregate information about each latent factor. The disentangled variational graph auto-encoder's expressive capability is then enhanced by applying a component-wise flow to each channel. In addition, we construct a factor-wise decoder that takes into account the characteristics of disentangled representations. We improve the independence of representations by imposing independence constraints on the mapping channels for distinct latent factors. Empirical experiments on both synthetic and real-world datasets demonstrate the superiority of our proposed method compared to several state-of-the-art baselines.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced Setting ( http://arxiv.org/abs/2402.01240v2 ) License: see link	Wolf Rieder, Philip Raschke, Thomas Cory	The World Wide Web's connectivity is greatly attributed to the HTTP protocol, with HTTP messages offering informative header fields that appeal to disciplines like web security and privacy, especially concerning web tracking. Despite existing research employing HTTP request messages to identify web trackers, HTTP response headers are often overlooked. This study endeavors to design effective machine learning classifiers for web tracker detection using binarized HTTP response headers. Data from the Chrome, Firefox, and Brave browsers, obtained through the traffic monitoring browser extension T.EX, serve as our dataset. Ten supervised models were trained on Chrome data and tested across all browsers, including a Chrome dataset from a year later. The results demonstrated high accuracy, F1-score, precision, recall, and minimal log-loss error for Chrome and Firefox, but subpar performance on Brave, potentially due to its distinct data distribution and feature set. The research suggests that these classifiers are viable for web tracker detection. However, real-world application testing remains pending, and the distinction between tracker types and broader label sources could be explored in future studies.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
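The phrase "binarized HTTP response headers" can be illustrated with a small feature-extraction sketch. It assumes that binarization means one presence/absence feature per header name, which the abstract does not spell out; the example responses and function names are hypothetical and not taken from the T.EX dataset.

```python
def build_vocabulary(responses):
    """Collect every header name seen, case-insensitively, in a stable order."""
    names = set()
    for headers in responses:
        names.update(name.lower() for name in headers)
    return sorted(names)


def binarize(headers, vocabulary):
    """Return 1 for each vocabulary header present in the response, else 0."""
    present = {name.lower() for name in headers}
    return [1 if name in present else 0 for name in vocabulary]


# Two made-up responses: a pixel-like response and an ordinary page.
responses = [
    {"Content-Type": "image/gif", "Set-Cookie": "id=abc", "P3P": "CP=NOI"},
    {"Content-Type": "text/html", "Cache-Control": "max-age=3600", "ETag": '"x1"'},
]

vocabulary = build_vocabulary(responses)
features = [binarize(headers, vocabulary) for headers in responses]
print(vocabulary)
print(features)
```

The resulting 0/1 vectors are the kind of input a standard supervised classifier could be trained on; header values are ignored entirely in this sketch.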
# Noise mitigation in quantum teleportation ( http://arxiv.org/abs/2402.02343v2 ) License: see link	Zi-Jian Xu, Jun-Hong An	Permitting the transmission of unknown quantum states over long distances by using entanglement, quantum teleportation serves as an important building block for many quantum technologies. However, in the noisy intermediate-scale quantum era, the practical realization of quantum teleportation is inevitably challenged by noise-induced decoherence. Here we propose a noise-mitigation mechanism applicable in both the discrete- and continuous-variable quantum teleportation schemes. By investigating the non-Markovian decoherence dynamics of the two types of quantum teleportation schemes, we find that, as long as a bound state is formed in the energy spectrum of the total system consisting of the involved subsystems and their respective reservoirs, the quantum superiority of the fidelity is persistently recovered. Supplying an insightful understanding of the noise-mitigation protocols, our result paves the way to the practical realization of noise-tolerant quantum teleportation.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model ( http://arxiv.org/abs/2402.02544v4 ) License: see link	Dilxat Muhtar, Zhenshi Li, Feng Gu, Xueliang Zhang, Pengfeng Xiao	The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across various specialized domains. In the remote sensing (RS) field, however, the diverse geographical landscapes and varied objects in RS imagery are not adequately considered in recent MLLM endeavors. To bridge this gap, we construct a large-scale RS image-text dataset, LHRS-Align, and an informative RS-specific instruction dataset, LHRS-Instruct, leveraging the extensive volunteered geographic information (VGI) and globally available RS images. Building on this foundation, we introduce LHRS-Bot, an MLLM tailored for RS image understanding through a novel multi-level vision-language alignment strategy and a curriculum learning method. Additionally, we introduce LHRS-Bench, a benchmark for thoroughly evaluating MLLMs' abilities in RS image understanding. Comprehensive experiments demonstrate that LHRS-Bot exhibits a profound understanding of RS images and the ability to perform nuanced reasoning within the RS domain.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector ( http://arxiv.org/abs/2402.03094v3 ) License: see link	Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang	This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors, such as DE-ViT, show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: 1) Can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can models be enhanced when facing huge domain gaps? To answer the first question, we employ measures including style, inter-class variance (ICV), and indefinable boundaries (IB) to understand the domain gap. Based on these measures, we establish a new benchmark named CD-FSOD to evaluate object detection methods, revealing that most of the current approaches fail to generalize across domains. Technically, we observe that the performance decline is associated with our proposed measures: style, ICV, and IB. Consequently, we propose several novel modules to address these issues. First, the learnable instance features align initial fixed instances with target categories, enhancing feature distinctiveness. Second, the instance reweighting module assigns higher importance to high-quality instances with slight IB. Third, the domain prompter encourages features resilient to different styles by synthesizing imaginary domains without altering semantic contents. These techniques collectively contribute to the development of the Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), significantly improving upon the base DE-ViT. Experimental results validate the efficacy of our model.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model ( http://arxiv.org/abs/2402.03631v3 ) License: see link	Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu	The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. We develop two representative tuning strategies for the image encoder, which lead to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants consistently achieve superior target segmentation performance, even under the very challenging one-shot adaptation setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# A Survey on Safe Multi-Modal Learning System ( http://arxiv.org/abs/2402.05355v6 ) License: see link	Tianyi Zhao, Liangliang Zhang, Yao Ma, Lu Cheng	In the rapidly evolving landscape of artificial intelligence, multimodal learning systems (MMLS) have gained traction for their ability to process and integrate information from diverse modality inputs. Their expanding use in vital sectors such as healthcare has made safety assurance a critical concern. However, the absence of systematic research into their safety is a significant barrier to progress in this field. To bridge the gap, we present the first taxonomy that systematically categorizes and assesses MMLS safety. This taxonomy is structured around four fundamental pillars that are critical to ensuring the safety of MMLS: robustness, alignment, monitoring, and controllability. Leveraging this taxonomy, we review existing methodologies, benchmarks, and the current state of research, while also pinpointing the principal limitations and gaps in knowledge. Finally, we discuss unique challenges in MMLS safety. In illuminating these challenges, we aim to pave the way for future research, proposing potential directions that could lead to significant advancements in the safety protocols of MMLS.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# Real-time Holistic Robot Pose Estimation with Unknown States ( http://arxiv.org/abs/2402.05655v4 ) License: see link	Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang	Estimating robot pose from RGB images is a crucial problem in computer vision and robotics. While previous methods have achieved promising performance, most of them presume full knowledge of robot internal states, e.g. ground-truth robot joint angles. However, this assumption is not always valid in practical situations. In real-world applications such as multi-robot collaboration or human-robot interaction, the robot joint states might not be shared or could be unreliable. On the other hand, existing approaches that estimate robot pose without joint state priors suffer from heavy computation burdens and thus cannot support real-time applications. This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states. Our method estimates camera-to-robot rotation, robot state parameters, keypoint locations, and root depth, employing a neural network module for each task to facilitate learning and sim-to-real transfer. Notably, it achieves inference in a single feed-forward pass without iterative optimization. Our approach offers a 12-fold speed increase with state-of-the-art accuracy, enabling real-time holistic robot pose estimation for the first time. Code and models are available at https://github.com/Oliverbansk/Holistic-Robot-Pose-Estimation.	Translated: 2024-07-17 23:30:59 Published: 2024-07-16
# Decay-protected superconducting qubit with fast control enabled by integrated on-chip filters ( http://arxiv.org/abs/2402.08906v2 ) License: see link	Aashish Sah, Suman Kundu, Heikki Suominen, Qiming Chen, Mikko Möttönen	Achieving fast gates and long coherence times for superconducting qubits presents challenges, typically requiring either a stronger coupling of the drive line or an excessively strong microwave signal to the qubit. To address this, we introduce on-chip filters of the qubit drive exhibiting a stopband at the qubit frequency, thus enabling long coherence times and strong coupling at the subharmonic frequency, facilitating fast single-qubit gates, and reduced thermal load. The filters exhibit an extrinsic relaxation time of a few seconds while enabling sub-10-ns gates with subharmonic control. Here we show up to 200-fold improvement in the measured relaxation time at the stopband. Furthermore, we implement subharmonic driving of Rabi oscillations with a $\pi$ pulse duration of 12 ns. Our demonstration of on-chip filters and efficient subharmonic driving in a two-dimensional quantum processor paves the way for a scalable qubit architecture with reduced thermal load and noise from the control line.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
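To put the quoted $\pi$ pulse duration in more familiar units, the corresponding Rabi frequency can be estimated with a line of arithmetic. This assumes an idealised resonant drive of constant amplitude, so that the rotation angle is simply $\Omega t$; the symbols $\Omega$ and $t_\pi$ are introduced here and the real pulse shape may differ.

```latex
\[
\Omega\, t_\pi = \pi
\quad\Longrightarrow\quad
\frac{\Omega}{2\pi} = \frac{1}{2\,t_\pi}
= \frac{1}{2 \times 12\,\mathrm{ns}} \approx 41.7\,\mathrm{MHz}
\]
```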
# Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling ( http://arxiv.org/abs/2402.10211v2 ) License: see link	Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto	Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found at https://hiss-csp.github.io.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
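The idea of stacking state-space models into a temporal hierarchy can be sketched with scalar linear recurrences. This is a toy illustration, not the HiSS architecture: splitting the input into fixed-size chunks and passing each chunk's last state upward is an assumption made here, and the coefficients and function names are hypothetical.

```python
def linear_ssm(xs, a=0.9, b=0.1):
    """Scalar linear state-space recurrence: h_t = a * h_{t-1} + b * x_t, y_t = h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(h)
    return ys


def hierarchical_ssm(xs, chunk_size, low=(0.9, 0.1), high=(0.5, 0.5)):
    """Low-level model summarises each chunk; high-level model runs over the summaries."""
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    # One feature per chunk: the low-level state after the chunk's final step.
    chunk_features = [linear_ssm(chunk, *low)[-1] for chunk in chunks]
    return linear_ssm(chunk_features, *high)


signal = [1.0] * 8                      # constant toy sensor reading
outputs = hierarchical_ssm(signal, chunk_size=4)
print(outputs)
```

The high-level recurrence sees one step per chunk, so it operates on a coarser time scale than the raw sensor stream, which is the sense in which the stack forms a hierarchy.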
# Bayesian Online Multiple Testing: A Resource Allocation Approach ( http://arxiv.org/abs/2402.11425v4 ) License: see link	Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu	We consider the problem of sequentially conducting multiple experiments where each experiment corresponds to a hypothesis testing task. At each time point, the experimenter must make an irrevocable decision of whether to reject the null hypothesis (or equivalently claim a discovery) before the next experimental result arrives. The goal is to maximize the number of discoveries while maintaining a low error rate at all time points measured by Local False Discovery Rate (LFDR). We formulate the problem as an online knapsack problem with exogenous random budget replenishment. We start with general arrival distributions and show that a simple policy achieves $O(\sqrt{T})$ regret. We complement the result by showing that such a regret rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, although they achieve bounded loss in canonical settings, may incur $\Omega(\sqrt{T})$ or even $\Omega(T)$ regret. With the observation that canonical policies tend to be too optimistic and over-claim discoveries, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency: small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. From a practical perspective, we extend the policy to the scenario with continuous arrival distributions, time-dependent information structures, as well as unknown $T$. We conduct both synthetic experiments and empirical applications on time series data from New York City taxi passengers to validate the performance of our proposed policies. Our results emphasize how effective policies should be designed in online resource allocation problems with exogenous budget replenishment.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
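The knapsack reading of the problem can be made concrete with a small sketch. It is not the paper's policy: it assumes the error constraint is "the average LFDR of claimed discoveries stays at or below a target `alpha`", so that a low-LFDR arrival replenishes a budget and a high-LFDR arrival spends it, and it uses a constant safety buffer where the abstract describes logarithmic ones. All names and numbers are hypothetical.

```python
def claim_discoveries(lfdrs, alpha, buffer=0.0):
    """Greedy online policy: claim a discovery whenever the budget stays above the buffer.

    budget = sum over claimed discoveries of (alpha - lfdr), which is kept >= 0,
    so the average lfdr of claimed discoveries never exceeds alpha.
    """
    budget = 0.0
    decisions = []
    for lfdr in lfdrs:
        cost = lfdr - alpha
        if cost <= 0.0:
            budget -= cost            # cheap arrival: claim it and replenish the budget
            decisions.append(True)
        elif budget - cost >= buffer:
            budget -= cost            # expensive arrival: claim only if affordable
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


arrivals = [0.01, 0.15, 0.02, 0.30, 0.08]   # made-up posterior null probabilities
greedy = claim_discoveries(arrivals, alpha=0.1)
cautious = claim_discoveries(arrivals, alpha=0.1, buffer=0.05)

claimed = [w for w, d in zip(arrivals, greedy) if d]
print(greedy, cautious, sum(claimed) / len(claimed))
```

With a positive buffer the policy declines a borderline arrival that the plain greedy rule accepts, which mirrors the abstract's point that canonical policies can be too optimistic.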
# FairProof : Confidential and Certifiable Fairness for Neural Networks ( http://arxiv.org/abs/2402.12572v2 ) License: see link	Chhavi Yadav, Amrita Roy Chowdhury, Dan Boneh, Kamalika Chaudhuri	Machine learning models are increasingly used in societal applications, yet legal and privacy concerns demand that they very often be kept confidential. Consequently, there is a growing distrust about the fairness properties of these models in the minds of consumers, who are often at the receiving end of model predictions. To this end, we propose FairProof, a system that uses Zero-Knowledge Proofs (a cryptographic primitive) to publicly verify the fairness of a model, while maintaining confidentiality. We also propose a fairness certification algorithm for fully-connected neural networks which is suited to ZKPs and is used in this system. We implement FairProof in Gnark and demonstrate empirically that our system is practically feasible. Code is available at https://github.com/infinite-pursuits/FairProof.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
# Unsupervised Concept Discovery Mitigates Spurious Correlations ( http://arxiv.org/abs/2402.13368v2 ) License: see link	Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi	Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases. Addressing this challenge typically involves methods relying on prior knowledge and group annotation to remove spurious correlations, which may not be readily available in many applications. In this paper, we establish a novel connection between unsupervised object-centric learning and mitigation of spurious correlations. Instead of directly inferring subgroups with varying correlations with labels, our approach focuses on discovering concepts: discrete ideas that are shared across input samples. Leveraging existing object-centric representation learning, we introduce CoBalT: a concept balancing technique that effectively mitigates spurious correlations without requiring human labeling of subgroups. Evaluation across the benchmark datasets for sub-population shifts demonstrates superior or competitive performance compared to state-of-the-art baselines, without the need for group annotation. Code is available at https://github.com/rarefin/CoBalT.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
# Automating psychological hypothesis generation with AI: when large language models meet causal graph ( http://arxiv.org/abs/2402.14424v3 ) License: see link	Song Tong, Kai Mao, Zhen Huang, Yukun Zhao, Kaiping Peng	Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using an LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on 'well-being', then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of an LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p=0.007 and t(59) = 4.32, p<0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLMs with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
# Effective Bayesian Causal Inference via Structural Marginalisation and Autoregressive Orders ( http://arxiv.org/abs/2402.14781v2 ) License: see link	Christian Toth, Christian Knoll, Franz Pernkopf, Robert Peharz	Bayesian causal inference (BCI) naturally incorporates epistemic uncertainty about the true causal model into downstream causal reasoning tasks by posterior averaging over causal models. However, this poses a tremendously hard computational problem due to the intractable number of causal structures to marginalise over. In this work, we decompose the structure learning problem into inferring (i) a causal order and (ii) a parent set for each variable given a causal order. By limiting the number of parents per variable, we can exactly marginalise over the parent sets in polynomial time, which leaves only the causal order to be marginalised. To this end, we propose a novel autoregressive model over causal orders (ARCO) learnable with gradient-based methods. Our method yields state-of-the-art results in structure learning on simulated non-linear additive noise benchmarks with scale-free and Erdős-Rényi graph structures, and competitive results on real-world data. Moreover, we illustrate that our method accurately infers interventional distributions, which allows us to estimate posterior average causal effects and many other causal quantities of interest.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
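The polynomial-time claim in the entry above rests on a counting argument: once a causal order is fixed, a variable can only take parents among the variables that precede it, and with at most K parents there are sum over k = 0..K of C(i, k) candidate sets for the variable at position i. The following is a minimal sketch of that enumeration only, not the ARCO implementation; the function names and the `max_parents` parameter are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def candidate_parent_sets(order, max_parents):
    """Map each variable to the parent sets permitted by a causal order.

    A variable may only choose parents among the variables that precede it
    in `order`, and at most `max_parents` of them (hypothetical parameter).
    """
    allowed = {}
    for position, variable in enumerate(order):
        predecessors = order[:position]
        sets = []
        for size in range(min(max_parents, len(predecessors)) + 1):
            sets.extend(combinations(predecessors, size))
        allowed[variable] = sets
    return allowed


def count_parent_sets(num_predecessors, max_parents):
    """Closed-form count: polynomial in num_predecessors for fixed max_parents."""
    return sum(comb(num_predecessors, k) for k in range(max_parents + 1))


# Four variables in a fixed order, at most two parents each.
sets = candidate_parent_sets(["a", "b", "c", "d"], max_parents=2)
```

Because the per-variable count is bounded by a polynomial of degree `max_parents`, summing a score over all candidate sets stays tractable, whereas the number of orders still grows factorially; that is presumably why the order is the part handled by a learned model.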
# Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding ( http://arxiv.org/abs/2402.16844v2 ) License: see link	Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi	Large language models (LLMs) have become ubiquitous in practice and are widely used for generation tasks such as translation, summarization and instruction following. However, their enormous size and reliance on autoregressive decoding increase deployment costs and complicate their use in latency-critical applications. In this work, we propose a hybrid approach that combines language models of different sizes to increase the efficiency of autoregressive decoding while maintaining high performance. Our method utilizes a pretrained frozen LLM that encodes all prompt tokens once in parallel, and uses the resulting representations to condition and guide a small language model (SLM), which then generates the response more efficiently. We investigate the combination of encoder-decoder LLMs with both encoder-decoder and decoder-only SLMs from different model families and only require fine-tuning of the SLM. Experiments with various benchmarks show substantial speedups of up to $4\times$, with minor performance penalties of $1-2\%$ for translation and summarization tasks compared to the LLM.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
# Universal neural network potentials as descriptors: Towards scalable chemical property prediction using quantum and classical computers ( http://arxiv.org/abs/2402.18433v2 ) License: see link	Tomoya Shiota, Kenji Ishihara, Wataru Mizukami	Accurate prediction of diverse chemical properties is crucial for advancing molecular design and materials discovery. Here we present a versatile approach that uses the intermediate information of a universal neural network potential as a general-purpose descriptor for chemical property prediction. Our method is based on the insight that a sophisticated neural network architecture trained as a universal force field learns transferable representations of atomic environments. We show that transfer learning with graph neural network potentials such as M3GNet and MACE, using quantum machine learning as well as a standard classical regression model, achieves accuracy comparable to state-of-the-art methods for predicting NMR chemical shifts, despite the compactness of its descriptors. In particular, the MACE descriptor demonstrates the highest accuracy to date on the ${^{13}}$C NMR chemical shift benchmarks for drug molecules. This work provides an efficient way to accurately predict properties, potentially accelerating the discovery of new molecules and materials.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
# Stabilizer ground states: theory, algorithms and applications ( http://arxiv.org/abs/2403.08441v2 ) License: see link	Jiace Sun, Lixue Cheng, Shi-Xin Zhang	Stabilizer states have been commonly utilized in quantum information, quantum error correction, and quantum circuit simulation due to their simple mathematical structure. In this work, we apply stabilizer states to tackle quantum many-body problems and introduce the concept of stabilizer ground states. We establish an equivalence formalism for identifying stabilizer ground states of general Pauli Hamiltonians. Moreover, we develop an exact, linear-scaling algorithm to obtain stabilizer ground states of 1D local Hamiltonians, which is therefore free from discrete optimization. The proposed equivalence formalism and linear-scaling algorithm are not only applicable to finite-size systems, but also adaptable to infinite periodic systems. The scalability and efficiency of the algorithms are numerically benchmarked on different Hamiltonians. Finally, we demonstrate that stabilizer ground states are promising not only as tools for qualitative understanding of quantum systems, but also as cornerstones of more advanced quantum state ansatzes on both classical and quantum computers.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
# Strategizing against Q-learners: A Control-theoretical Approach ( http://arxiv.org/abs/2403.08906v3 ) License: see link	Yuksel Arslantas, Ege Yuceel, Muhammed O. Sayin	In this paper, we explore the susceptibility of independent Q-learning algorithms (a classical and widely used multi-agent reinforcement learning method) to strategic manipulation by sophisticated opponents in normal-form games played repeatedly. We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents' Q-learning algorithm. To this end, we formulate the strategic actors' interactions as a stochastic game (whose state encompasses Q-function estimates of the Q-learners) as if the Q-learning algorithms are the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space and analyze its performance for two competing strategic actors and a single strategic actor both analytically and numerically.	Translated: 2024-07-17 21:30:11 Published: 2024-07-16
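To make concrete why the Q-function estimates can serve as the state of a dynamical system, here is a minimal sketch of a stateless Q-learner in a repeated two-action game. It is an illustration under stated assumptions, not the paper's model: the payoff matrix, the greedy action rule, the learning rate and all function names are hypothetical.

```python
def q_update(q_values, action, reward, learning_rate=0.1, discount=0.0):
    """One step of stateless Q-learning for a repeated normal-form game."""
    target = reward + discount * max(q_values)
    updated = list(q_values)
    updated[action] += learning_rate * (target - updated[action])
    return updated


# Payoff to the Q-learner (row action) against the opponent (column action).
# Illustrative numbers only.
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def greedy_action(q_values):
    """Greedy choice; ties go to the lowest action index."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def simulate(opponent_actions, q_values=(0.0, 0.0)):
    """Play the learner against a fixed opponent sequence and log the Q-values."""
    q = list(q_values)
    history = []
    for opponent_action in opponent_actions:
        action = greedy_action(q)
        q = q_update(q, action, PAYOFF[action][opponent_action])
        history.append((action, opponent_action, tuple(q)))
    return history
```

The learner's next action here is a deterministic function of its current Q-values, and each update depends only on those values plus the opponent's move. An opponent who knows the update rule can therefore treat the Q-vector as a controllable state, which is the viewpoint the abstract describes.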
# PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation ( http://arxiv.org/abs/2403.09192v3 ) License: see link	Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding	Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, especially for large-scale models. Model compression requires significant training costs for structure searching and re-training. Consequently, a simple combination of them cannot guarantee accomplishing both training efficiency and inference efficiency with minimal costs. In this paper, we propose a novel Parallel Yielding Re-Activation (PYRA) method for this challenge of training-inference efficient task adaptation. PYRA first utilizes parallel yielding adaptive weights to comprehensively perceive the data distribution in downstream tasks. A re-activation strategy for token modulation is then applied for tokens to be merged, leading to calibrated token features. Extensive experiments demonstrate that PYRA outperforms all competing methods under both low compression rate and high compression rate, demonstrating its effectiveness and superiority in maintaining both training efficiency and inference efficiency for large-scale foundation models. Our code is available at https://github.com/THU-MIG/PYRA.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis ( http://arxiv.org/abs/2403.09638v2 ) License: see link	Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao	Semantic image synthesis (SIS) shows good promise for sensor simulation. However, current best practices in this field, based on GANs, have not yet reached the desired level of quality. As latent diffusion models make significant strides in image generation, we are prompted to evaluate ControlNet, a method notable for its dense control capabilities. Our investigation uncovered two primary issues with its results: the presence of weird sub-structures within large semantic areas and the misalignment of content with the semantic mask. Through empirical study, we pinpointed the cause of these problems as a mismatch between the noised training data distribution and the standard normal prior applied at the inference stage. To address this challenge, we developed specific noise priors for SIS, encompassing spatial, categorical, and a novel spatial-categorical joint prior for inference. This approach, which we have named SCP-Diff, has set new state-of-the-art results in SIS on Cityscapes, ADE20K and COCO-Stuff, yielding an FID as low as 10.53 on Cityscapes. The code and models can be accessed via the project page.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# Towards Neuro-Symbolic Video Understanding ( http://arxiv.org/abs/2403.11021v2 ) License: see link	Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali	The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. Hence, decoupling but co-designing semantic understanding and temporal reasoning is essential for efficient scene identification. We propose a system that leverages vision-language models for semantic understanding of individual frames but effectively reasons about the long-term evolution of events using state machines and temporal logic (TL) formulae that inherently capture memory. Our TL-based reasoning improves the F1 score of complex event identification by 9-15% compared to benchmarks that use GPT4 for reasoning on state-of-the-art self-driving datasets such as Waymo and NuScenes.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection ( http://arxiv.org/abs/2403.11561v2 ) License: see link	Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang	In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination. Consequently, the model becomes unable to reconstruct genuine anomalies as normal instances, resulting in a failure of anomaly detection. To counter this issue, we present a novel unified feature reconstruction-based anomaly detection framework termed RLR (Reconstruct features from a Learnable Reference representation). Unlike previous methods, RLR utilizes learnable reference representations to compel the model to learn normal feature patterns explicitly, thereby preventing the model from succumbing to the "learning shortcuts" issue. Additionally, RLR incorporates locality constraints into the learnable reference to facilitate more effective normal pattern capture and utilizes a masked learnable key attention mechanism to enhance robustness. Evaluation of RLR on the 15-category MVTec-AD dataset and the 12-category VisA dataset shows superior performance compared to state-of-the-art methods under the unified setting. The code of RLR will be publicly available.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# GVGEN: Text-to-3D Generation with Volumetric Representation ( http://arxiv.org/abs/2403.12957v2 ) License: see link	Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He	In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points into a structured form, GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed ($\sim$7 seconds), effectively striking a balance between quality and efficiency. Our project page is: https://gvgen.github.io/	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# Rotary Position Embedding for Vision Transformer ( http://arxiv.org/abs/2403.13298v2 ) License: see link	Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun	Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to ViTs, utilizing practical implementations of RoPE for 2D vision data. The analysis reveals that RoPE demonstrates impressive extrapolation performance, i.e., maintaining precision while increasing image resolution at inference. It eventually leads to performance improvement for ImageNet-1k, COCO detection, and ADE-20k segmentation. We believe this study provides thorough guidelines to apply RoPE to ViT, promising improved backbone performance with minimal extra computational overhead. Our code and pre-trained models are available at https://github.com/naver-ai/rope-vit	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
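One common way to extend rotary embeddings to 2D patch grids is an axial scheme: half of the channel pairs are rotated by the patch's x coordinate and the other half by its y coordinate. The sketch below illustrates that idea in plain Python. It is an assumption-laden illustration rather than the implementation in the linked repository; the `base` value, the frequency schedule and the function names are all hypothetical.

```python
import math


def rotate_pairs(vector, angles):
    """Rotate consecutive channel pairs (v[2i], v[2i+1]) by angles[i]."""
    rotated = []
    for i, angle in enumerate(angles):
        a, b = vector[2 * i], vector[2 * i + 1]
        c, s = math.cos(angle), math.sin(angle)
        rotated.extend([a * c - b * s, a * s + b * c])
    return rotated


def axial_rope_2d(vector, x, y, base=100.0):
    """Apply a rotary embedding for patch position (x, y).

    The first half of the channel pairs is rotated by x, the second half by y.
    Assumes len(vector) is a multiple of 4.
    """
    num_pairs = len(vector) // 2
    half = num_pairs // 2
    freqs = [base ** (-i / half) for i in range(half)]
    angles = [x * f for f in freqs] + [y * f for f in freqs]
    return rotate_pairs(vector, angles)


def dot(u, v):
    """Plain inner product, standing in for an attention logit."""
    return sum(a * b for a, b in zip(u, v))
```

Since each pair is rotated by an angle linear in position, the inner product of a rotated query and a rotated key depends only on the offset between their positions. That relative-position property is what makes applying the scheme at resolutions unseen in training plausible.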
# DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics ( http://arxiv.org/abs/2403.14353v3 ) License: see link	Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park	Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations in state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# Qu8its for Quantum Simulations of Lattice Quantum Chromodynamics ( http://arxiv.org/abs/2403.14537v2 ) License: see link	Marc Illa, Caroline E. P. Robin, Martin J. Savage	We explore the utility of $d=8$ qudits, qu8its, for quantum simulations of the dynamics of 1+1D SU(3) lattice quantum chromodynamics, including a mapping for arbitrary numbers of flavors and lattice size and a re-organization of the Hamiltonian for efficient time-evolution. Recent advances in parallel gate applications, along with the shorter application times of single-qudit operations compared with two-qudit operations, lead to significant projected advantages in quantum simulation fidelities and circuit depths using qu8its rather than qubits. The number of two-qudit entangling gates required for time evolution using qu8its is found to be more than a factor of five fewer than for qubits. We anticipate that the developments presented in this work will enable improved quantum simulations to be performed using emerging quantum hardware.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# Toward Tiny and High-quality Facial Makeup with Data Amplify Learning ( http://arxiv.org/abs/2403.15033v3 ) License: see link	Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni	Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve the above problems, we propose a brand-new learning paradigm, termed "Data Amplify Learning (DAL)," alongside a compact makeup model named "TinyBeauty." The core idea of DAL lies in employing a Diffusion-based Data Amplifier (DDA) to "amplify" limited images for model training, thereby enabling accurate pixel-to-pixel supervision with merely a handful of annotations. Two pivotal innovations in DDA facilitate the above training approach: (1) A Residual Diffusion Model (RDM) is designed to generate high-fidelity detail and circumvent the detail vanishing problem in vanilla diffusion models; (2) A Fine-Grained Makeup Module (FGMM) is proposed to achieve precise makeup control and combination while retaining face identity. Coupled with DAL, TinyBeauty requires merely 80K parameters to achieve state-of-the-art performance without intricate face prompts. Meanwhile, TinyBeauty achieves a remarkable inference speed of up to 460 fps on the iPhone 13. Extensive experiments show that DAL can produce highly competitive makeup models using only 5 image pairs.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech ( http://arxiv.org/abs/2403.15449v2 ) License: see link	Ghadi Alyahya, Abeer Aldayel	Examining the factors that counterspeech uses is at the core of understanding the optimal methods for confronting hate speech online. Various studies have assessed the emotional base factors used in counterspeech, such as emotional empathy, offensiveness, and hostility. To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility and evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) concerning racism, sexism, and religious bigotry. The evaluation covers the distinct behaviors seen with human-sourced as opposed to machine-generated counterspeech. It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech. Notably, we observe nuanced differences in the counterspeech persuasion modes used in open and closed interactions, especially in terms of the topic, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The machine-generated counterspeech tends to exhibit an emotional persuasion mode, while human counters lean toward reason. Furthermore, our study shows that reason tends to obtain more supportive replies than other persuasion modes. The findings highlight the potential for incorporating persuasion modes into studies about countering hate speech, as they can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what comprises the optimal counterspeech.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# InterFusion: Text-Driven Generation of 3D Human-Object Interaction ( http://arxiv.org/abs/2403.15612v2 ) License: see link	Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu, Ruizhen Hu	In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with complex spatial relationships. To effectively address these issues, we present InterFusion, a two-stage framework specifically designed for HOI generation. InterFusion involves human pose estimations derived from text as geometric priors, which simplifies the text-to-3D conversion process and introduces additional constraints for accurate object generation. In the first stage, InterFusion extracts 3D human poses from a synthesized image dataset depicting a wide range of interactions, subsequently mapping these poses to interaction descriptions. The second stage of InterFusion capitalizes on the latest developments in text-to-3D generation, enabling the production of realistic and high-quality 3D HOI scenes. This is achieved through a local-global optimization process, where the generation of human body and object is optimized separately, and jointly refined with a global optimization of the entire scene, ensuring a seamless and contextually coherent integration. Our experimental results affirm that InterFusion significantly outperforms existing state-of-the-art methods in 3D HOI generation.	Translated: 2024-07-17 21:18:43 Published: 2024-07-16
# 空間・時間・脳におけるバックプロパゲーション Backpropagation through space, time, and the brain ( http://arxiv.org/abs/2403.16933v2 ) ライセンス: Link先を確認	Benjamin Ellenberger, Paul Haider, Jakob Jordan, Kevin Max, Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici,	(参考訳) 時空間的局所性制約によって束縛された神経細胞の物理的ネットワークが、いかに効率的なクレジット割り当てを行うことができるかは、大きな疑問が残る。機械学習では、その答えは空間と時間の両方を通して、ほとんど普遍的にエラーのバックプロパゲーションアルゴリズムによって与えられる。しかし、このアルゴリズムは生物学的に証明できない仮定、特に時空間(非局所性)に頼っていることはよく知られている。リアルタイム・リカレント・ラーニングのような別のフォワード・プロパゲーション・モデルでは、局所性の問題が部分的に解決されるが、ストレージの要求が禁止されているため、スケーリングのコストに限られる。本稿では,ニューロンの物理的,動的ネットワークにおける完全局所的時空間クレジット割り当てのための計算フレームワークであるGeneralized Latent Equilibrium (GLE)を紹介する。まず、ニューロン局所的なミスマッチに基づいてエネルギーを定義し、そこから定常性による神経力学と勾配降下によるパラメータ力学の両方を導出する。結果として生じる力学は、連続的な活動的な局所シナプス可塑性を持つ深部皮質神経回路網における空間と時間を通しての、リアルタイムで生物学的に妥当なバックプロパゲーションの近似と解釈できる。特に、GLEは樹状体の木の形態を利用して、単一ニューロンにおけるより複雑な情報保存と処理を可能にし、情報伝達の両方向において必須である膜電位に関して、生物学的ニューロンが出力速度を位相シフトさせる能力も持っている。前方の計算では、時間連続入力をニューロン空間にマッピングすることができ、時空間の時空間畳み込みを効果的に行うことができる。後ろ向きの計算では、フィードバック信号の時間反転を許容し、結果として有用なパラメータ更新に必要な随伴変数を近似する。 How physical networks of neurons, bound by spatio-temporal locality constraints, can perform efficient credit assignment, remains, to a large extent, an open question. In machine learning, the answer is almost universally given by the error backpropagation algorithm, through both space and time. However, this algorithm is well-known to rely on biologically implausible assumptions, in particular with respect to spatio-temporal (non-)locality. Alternative forward-propagation models such as real-time recurrent learning only partially solve the locality problem, but only at the cost of scaling, due to prohibitive storage requirements. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. 
We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of backpropagation through space and time in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the morphology of dendritic trees to enable more complex information storage and processing in single neurons, as well as the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, effectively performing a spatio-temporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint variables necessary for useful parameter updates.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# 内視鏡映像からの単眼深度推定のための近接場照明の活用 Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos ( http://arxiv.org/abs/2403.17915v2 ) ライセンス: Link先を確認	Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta,	(参考訳) 内視鏡ビデオにおける単眼深度推定は、補助手術やロボット手術によって臓器のより良いカバレッジと様々な健康問題の検出を可能にする。主流である自然画像深度推定の進歩は期待できるが、強力な幾何学的特徴の欠如と難解な照明効果のため、内視鏡画像では技術が不十分である。本稿では, 内視鏡から放射される光を表面から反射する光学的手がかりを用いて, 単分子深度推定を改善する。まず、画素ごとのシェーディング表現を利用した教師付きおよび自己監督型の2つの新しい損失関数を作成する。次に、同じピクセルごとのシェーディング表現を利用する新しい深度改善ネットワーク(PPSNet)を提案する。最後に,教師学生の移動学習を導入し,自己監督型と臨床データを用いた合成データから,より深い深度マップを作成する。我々は,臨床データから高品質な深度マップを推定しながら,C3VDデータセットの最先端結果を得る。私たちのコード、事前訓練されたモデル、補足的な資料は、プロジェクトのページで確認できます。 Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# ボース・アインシュタイン凝縮体を用いた重力の量子的性質の探索 Probing the quantum nature of gravity using a Bose-Einstein condensate ( http://arxiv.org/abs/2403.18460v3 ) ライセンス: Link先を確認	Soham Sen, Sunandan Gangopadhyay,	(参考訳) ボース・アインシュタイン凝縮体を用いてグラビトンによる騒音の影響について検討した。重力波の摂動は運動量空間における離散フーリエモードの和と見なされる。作用素表現と、全系の重力とボゾン部分に対応する正準共役変数の間の正準可換関係を通じて位相空間変数を量子化し、適切な量子重力設定を得る。次に, 擬ゴールドストーン粒子の時間依存性部分の解からボゴリューボフ係数を求め, 初期懸濁状態にあるボソンの共分散測定値を構成する。フィッシャー情報の確率平均を用いて重力波の振幅パラメータの低い値を求める。計算全体をゼロ温度で行うと、ボゾン系は建設によってボース=アインシュタイン凝縮体として振る舞う。ボース=アインシュタインが1つのモードで凝縮すると、振幅測定における不確実性の平方の期待値の低い境界は、全観測項が0に近づくと無限にならない。すべての運動量モードをまとめるために、次は時間とともに減衰する適切なガウス重み係数を持つ雑音項を考える。次に、振幅パラメータの分散の正方形の最終的な期待値に対する下界を求める。重力波によって誘導されるノイズのため、ボース・アインシュタイン凝縮体を用いて重力波を検出できない測定時間の最小値が存在する。最後に、ボース・アインシュタイン凝縮体のフォノンモード間の相互作用を考察し、デコヒーレンスをもたらす。この脱コヒーレンス効果は, 最小のスクイージングを有するグラビトンに対して重要であることが観察された。 The effect of noise induced by gravitons has been investigated using a Bose-Einstein condensate. The gravitational wave perturbation is then considered as a sum of discrete Fourier modes in the momentum space. Coming to an operatorial representation and quantizing the phase space variables via appropriately introduced canonical commutation relations between the canonically conjugate variables corresponding to the graviton and bosonic part of the total system, one obtains a proper quantum gravity setup. Then we obtain the Bogoliubov coefficients from the solution of the time-dependent part of the pseudo-Goldstone boson and construct the covariance metric for the bosons initially being in a squeezed state. Using the stochastic average of the Fisher information, we obtain a lower bound on the amplitude parameter of the gravitational wave. As the entire calculation is done at zero temperature, the bosonic system, by construction, will behave as a Bose-Einstein condensate. 
For a Bose-Einstein condensate with a single mode, we observe that the lower bound of the expectation value of the square of the uncertainty in the amplitude measurement does not become infinite when the total observational term approaches zero. In order to sum over all possible momentum modes, we next consider a noise term with a suitable Gaussian weight factor which decays over time. We then obtain the lower bound on the final expectation value of the square of the variance in the amplitude parameter. Because of the noise induced by the graviton, there is a minimum value of the measurement time below which it is impossible to detect any gravitational wave using a Bose-Einstein condensate. Finally, we consider interaction between the phonon modes of the Bose-Einstein condensate which results in the decoherence. We observe that the decoherence effect becomes significant for gravitons with minimal squeezing.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# 動的コンテキスト内:詩列を用いた慣性認識型3次元人体モデリング Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence ( http://arxiv.org/abs/2403.19160v2 ) ライセンス: Link先を確認	Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng,	(参考訳) ニューラルレンダリング技術は、3次元の人体モデリングを著しく進歩させた。しかし、従来のアプローチでは、運動慣性などの要因によって引き起こされるダイナミクスを見落とし、回転後の突然停止のようなシナリオでは、ポーズが変化しながら静止している。この制限は、1つのポーズを条件入力として依存することから生じ、1つのポーズを複数の外観にマッピングするあいまいさをもたらす。本研究では、現在のフレームのポーズ状態だけでなく、過去のポーズ状態にも人間の外観の変化が依存していることを明らかにする。そこで本稿では,非剛性変形と標準空間にデルタポーズシーケンス表現を応用し,時間変動を効果的にモデル化するDycoを提案する。新たなポーズに対するモデルの一般化能力の低下を防止するため、不要なボディ間の依存関係を減らすための低次元グローバルコンテキストと、モデルによるデルタポーズシーケンスのオーバーフィッティングを軽減するための量子化操作を提案する。 I3D-Human という新しいデータセットを収集し,衣服の外観の時間的変化を近似的なポーズで捉えた。 I3D-Humanおよび既存のデータセットに関する広範な実験を通じて,本手法は質的かつ定量的な性能を示す。さらに, 慣性を考慮した3次元人間の手法は, 異なる速度での慣性による外観変化を前例なくシミュレートすることができる。 Neural rendering techniques have significantly advanced 3D human body modeling. However, previous approaches often overlook dynamics induced by factors such as motion inertia, leading to challenges in scenarios like abrupt stops after rotation, where the pose remains static while the appearance changes. This limitation arises from reliance on a single pose as conditional input, resulting in ambiguity in mapping one pose to multiple appearances. In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states. Therefore, we introduce Dyco, a novel method utilizing the delta pose sequence representation for non-rigid deformations and canonical space to effectively model temporal appearance variations. To prevent a decrease in the model's generalization ability to novel poses, we further propose low-dimensional global context to reduce unnecessary inter-body part dependencies and a quantization operation to mitigate overfitting of the delta pose sequence by the model. 
To validate the effectiveness of our approach, we collected a novel dataset named I3D-Human, with a focus on capturing temporal changes in clothing appearance under approximate poses. Through extensive experiments on both I3D-Human and existing datasets, our approach demonstrates superior qualitative and quantitative performance. In addition, our inertia-aware 3D human method can unprecedentedly simulate appearance changes caused by inertia at different velocities.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# 一次元非相互格子における非エルミート皮膚効果と拡張状態の共存 Coexistence of non-Hermitian skin effect and extended states in one-dimensional nonreciprocal lattices ( http://arxiv.org/abs/2403.19430v2 ) ライセンス: Link先を確認	Han Xiao, Qi-Bo Zeng,	(参考訳) 本研究では,スタッガードなオンサイト変調と次近接(NNN)サイトまでの非相反ホッピングをもつ1次元非エルミート格子を調べる。 NNN の非相反性のため、開境界条件 (OBC) 下での系の非エルミート皮膚効果 (NHSE) はエネルギー依存性があり、格子の反対側に局在する固有状態を分離する固有エネルギースペクトルに NHSE エッジが存在する。非相反ホッピングとオンサイト変調の相互作用は、皮膚効果の方向を逆転させ、NHSEエッジの位置を変更することができる。さらに、システムパラメータをチューニングすることにより、OBCの下での固有状態のいくつかは完全に拡張され、対応する固有エネルギーは開境界条件と周期境界条件の両方で虚数となる。したがって、拡張状態は同じシステムでNHSEと共存することができる。 NHSEは、変調が虚構であるときに全ての固有状態が拡張されて完全に溶解する。本研究は,非エルミート系におけるオンサイト変調と非相互ホッピングの複雑な相互作用を明らかにする。 We study the one-dimensional non-Hermitian lattices with staggered onsite modulations and nonreciprocal hopping up to the next-nearest-neighboring (NNN) sites. Due to the NNN nonreciprocity, the non-Hermitian skin effect (NHSE) in the system under open boundary conditions (OBC) can be energy dependent, and there will be NHSE edges in the eigenenergy spectrum, which separates the eigenstates localized at the opposite ends of the lattice. We find that the interplay between the nonreciprocal hopping and onsite modulations can reverse the direction of the skin effect and modify the position of the NHSE edge. Moreover, by tuning the system parameters, some of the eigenstates under OBC will become fully extended with the corresponding eigenenergies being imaginary under both open and periodic boundary conditions. Thus, the extended states can coexist with the NHSE in the same system. The NHSE can even be completely dissolved with all the eigenstates being extended when the modulation is imaginary. Our work unveils the intricate interplay between onsite modulations and nonreciprocal hopping in non-Hermitian systems.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# Change-Agent: 対話型総合的リモートセンシング変化解釈と分析を目指して Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis ( http://arxiv.org/abs/2403.19646v3 ) ライセンス: Link先を確認	Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi,	(参考訳) 地球表面における変化のモニタリングは、自然の過程や人間の影響を理解するために不可欠であり、精密で包括的な解釈手法を必要とする。リモートセンシング衛星画像は、これらの変化を監視するためのユニークな視点を提供し、重要な研究焦点としてリモートセンシング画像変化解釈(RSICI)の出現につながった。現在のRSICI技術は、変更検出と変更キャプションを包含しており、それぞれに包括的な解釈を提供する限界がある。そこで本稿では,変更検出や変更キャプション,変更対象のカウント,変更原因分析など,包括的な変更解釈と洞察に富んだ分析を実現するためのユーザ指示に従うインタラクティブなChange-Agentを提案する。 Change-Agentは、マルチレベル変化解釈(MCI)モデルを目として、大きな言語モデル(LLM)を脳として統合する。 MCIモデルには2つのピクセルレベルの変化検出とセマンティックレベルの変化キャプションが含まれており、BI時間的反復相互作用(BI3)層がモデルの識別的特徴表現能力を高めるために提案されている。 MCIモデルのトレーニングを支援するため、多数の変更マスクと変更のキャプションを備えたLEVIR-MCIデータセットを構築した。実験では,変化検出と変化記述を同時に達成する上で,MCIモデルのSOTA性能を実証し,表面変化の包括的解釈を容易にする上で,我々のChange-Agentの有望な応用価値を強調し,インテリジェントなリモートセンシングアプリケーションのための新たな道を開く。 MCIモデルとChange-Agentのデータセットとコードベースをhttps://github.com/Chen-Yang-Liu/Change-Agentで公開します。 Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, necessitating precise and comprehensive interpretation methodologies. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, leading to the emergence of remote sensing image change interpretation (RSICI) as a significant research focus. Current RSICI technology encompasses change detection and change captioning, each with its limitations in providing comprehensive interpretation. To address this, we propose an interactive Change-Agent, which can follow user instructions to achieve comprehensive change interpretation and insightful analysis, such as change detection and change captioning, change object counting, change cause analysis, etc. The Change-Agent integrates a multi-level change interpretation (MCI) model as the eyes and a large language model (LLM) as the brain. 
The MCI model contains two branches of pixel-level change detection and semantic-level change captioning, in which the BI-temporal Iterative Interaction (BI3) layer is proposed to enhance the model's discriminative feature representation capabilities. To support the training of the MCI model, we build the LEVIR-MCI dataset with a large number of change masks and captions of changes. Experiments demonstrate the SOTA performance of the MCI model in achieving both change detection and change description simultaneously, and highlight the promising application value of our Change-Agent in facilitating comprehensive interpretation of surface changes, which opens up a new avenue for intelligent remote sensing applications. To facilitate future research, we will make our dataset and codebase of the MCI model and Change-Agent publicly available at https://github.com/Chen-Yang-Liu/Change-Agent	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# Diff-Reg v1: 登録問題に対する拡散マッチングモデル Diff-Reg v1: Diffusion Matching Model for Registration Problem ( http://arxiv.org/abs/2403.19919v2 ) ライセンス: Link先を確認	Qianliang Wu, Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding, Jin Xie, Jian Yang,	(参考訳) 3Dや2D3Dの登録のような登録タスクには、信頼できる対応を確立することが不可欠である。既存の手法では、幾何学的あるいは意味的な特徴を利用して潜在的な対応を生成する。しかし、これらの特徴は大きな変形、スケールの不整合、曖昧なマッチング問題(例えば対称性)といった課題に直面している可能性がある。さらに、シングルパス予測に依存する多くの従来の手法は、複雑なシナリオにおいて局所最小に陥りやすい可能性がある。これらの課題を軽減するために,ロバスト対応構築のための拡散マッチングモデルを提案する。提案手法は,対応推定を二重確率行列空間内のデノイジング拡散過程として扱い,二重確率マッチング行列を徐々にデノイズ(洗練)して真値のマッチング行列に近づけることで,高品質な対応推定を行う。これは、ガウス雑音を真値マッチング行列に徐々に導入する前方拡散過程と、雑音マッチング行列を反復的に洗練する逆デノイジング過程を含む。特に、バックボーンからの特徴抽出は推論フェーズ中に1回だけ発生する。我々の軽量デノイジングモジュールは、各逆サンプリングステップで同じ特徴を利用する。 3Dおよび2D3Dの登録タスクにおける本手法の有効性を検証した。コードはhttps://github.com/wuqianliang/Diff-Regで公開されている。 Establishing reliable correspondences is essential for registration tasks such as 3D and 2D3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching problems (e.g., symmetry). Additionally, many previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix to the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground truth matching matrix and a reverse denoising process that iteratively refines the noisy matching matrix. In particular, the feature extraction from the backbone occurs only once during the inference phase. 
Our lightweight denoising module utilizes the same feature at each reverse sampling step. Evaluation of our method on both 3D and 2D3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# 振動子に対する量子ランゲヴィン方程式の弱結合限界 Weak-coupling limits of the quantum Langevin equation for an oscillator ( http://arxiv.org/abs/2404.01285v2 ) ライセンス: Link先を確認	Aritra Ghosh, Sushanta Dattagupta,	(参考訳) 独立振動子モデルから得られる量子ランゲヴィン方程式は、ゴリーニ=コサコフスキー=スダルシャン=リンドブラッド方程式の文脈で用いられるボルン=マルコフ近似を欠いた強い結合状態を記述する。この問題は、変動散逸定理を満たす雑音項を持つ高調波発振器に対して、量子ランゲヴィン方程式のレベルでそのような'Born-Markov'のような近似を実装するとどうなるかということである。この背景には、回転波近似についてもコメントする。 The quantum Langevin equation as obtained from the independent-oscillator model describes a strong-coupling situation, devoid of the Born-Markov approximation that is employed in the context of the Gorini-Kossakowski-Sudarshan-Lindblad equation. The question we address is what happens when we implement such 'Born-Markov'-like approximations at the level of the quantum Langevin equation for a harmonic oscillator which carries a noise term satisfying a fluctuation-dissipation theorem. In this backdrop, we also comment on the rotating-wave approximation.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# JailbreakBench: 大規模言語モデルのジェイルブレークのためのオープンなロバストネスベンチマーク JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models ( http://arxiv.org/abs/2404.01318v4 ) ライセンス: Link先を確認	Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong,	(参考訳) ジェイルブレイク攻撃は、大きな言語モデル(LLM)が有害、非倫理的、またはその他の不快なコンテンツを生成する原因となる。これらの攻撃を評価することは、現在のベンチマークや評価技術が適切に対処していない多くの課題を示す。第一に、脱獄評価に関する明確な基準はない。第二に、既存の作業はコストと成功率を相容れない方法で計算します。そして第3に、多くの著作物は再現不可能で、敵のプロンプトを無視したり、クローズドソースのコードに関わったり、プロプライエタリなAPIの進化に依存している。これらの課題に対処するために、(1) ジェイルブレイクアーティファクトと呼ばれる最先端の敵対的プロンプトの進化したリポジトリ、(2) 以前の作業(Zou et al , 2023; Mazeika et al , 2023, 2024)から生まれた100の行動からなるジェイルブレイクデータセット、(3) https://github.com/JailbreakBench/jailbreakbenchの標準化された評価フレームワークで、明確に定義された脅威モデル、システムプロンプト、チャットテンプレート、スコアリング機能を含む。我々は、このベンチマークのリリースによる倫理的影響を慎重に検討し、コミュニティにとってプラスになると考えている。 Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. 
To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work (Zou et al., 2023; Mazeika et al., 2023, 2024) -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. We have carefully considered the potential ethical implications of releasing this benchmark, and believe that it will be a net positive for the community.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# HSViT:水平にスケーラブルな視覚変換器 HSViT: Horizontally Scalable Vision Transformer ( http://arxiv.org/abs/2404.05196v2 ) ライセンス: Link先を確認	Chenhao Xu, Chang-Tsun Li, Chee Peng Lim, Douglas Creighton,	(参考訳) 事前知識(帰納バイアス)が不足しているため、ViT(Vision Transformer)は大規模データセットの事前トレーニングを必要としている。さらに、ViTモデルの成長するレイヤとパラメータは、限られたコンピューティングリソースを持つデバイスへの適用性を妨げている。上記の課題を緩和するため,本稿では,新しい水平拡張型視覚変換器(HSViT)方式を提案する。具体的には、ViTに新しいイメージレベルの機能埋め込みが導入され、保存された帰納バイアスにより、小さなデータセットでパフォーマンスを向上しながら、事前トレーニングの必要性を排除できる。さらに、水平にスケーラブルな新しいアーキテクチャが設計され、複数のコンピューティングデバイス間で協調的なモデルトレーニングと推論を容易にする。実験結果は、事前トレーニングなしで、HSViTは、小さなデータセット上の最先端のスキームよりも最大10%高いトップ1精度を達成する一方で、ImageNet上で既存のCNNバックボーンを最大3.1%改善することを示している。コードはhttps://github.com/xuchenhao001/HSViTで入手できる。 Due to its deficiency in prior knowledge (inductive bias), Vision Transformer (ViT) requires pre-training on large-scale datasets to perform well. Moreover, the growing layers and parameters in ViT models impede their applicability to devices with limited computing resources. To mitigate the aforementioned challenges, this paper introduces a novel horizontally scalable vision transformer (HSViT) scheme. Specifically, a novel image-level feature embedding is introduced to ViT, where the preserved inductive bias allows the model to eliminate the need for pre-training while outperforming on small datasets. Besides, a novel horizontally scalable architecture is designed, facilitating collaborative model training and inference across multiple computing devices. The experimental results depict that, without pre-training, HSViT achieves up to 10% higher top-1 accuracy than state-of-the-art schemes on small datasets, while providing existing CNN backbones up to 3.1% improvement in top-1 accuracy on ImageNet. The code is available at https://github.com/xuchenhao001/HSViT.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# PromptAD:Few-Shot 異常検出のための正規サンプルのみを用いた学習プロンプト PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection ( http://arxiv.org/abs/2404.05231v2 ) ライセンス: Link先を確認	Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma,	(参考訳) 視覚言語モデルは、数発の産業異常検出に大きな改善をもたらしており、通常はプロンプトエンジニアリングを通じて数百のプロンプトを設計する必要がある。自動シナリオでは,まず従来のプロンプト学習をベースラインとして多クラスパラダイムを用いて,プロンプトを自動的に学習するが,一クラス異常検出ではうまく動作しないことがわかった。そこで本研究では,PromptADと呼ばれる,数発の異常検出のための一クラスプロンプト学習手法を提案する。まず,正常なプロンプトと異常なサフィックスを連結することにより,通常のプロンプトを異常なプロンプトに変換できるセマンティック・コンカネーションを提案する。さらに,異常画像の欠如によるトレーニング課題を軽減するために,正常プロンプト特徴と異常プロンプト特徴とのマージンをハイパーパラメータで明示的に制御する明示的異常マージンの概念を導入する。画像レベル/ピクセルレベルの異常検出のために、PromptADはMVTecとVisAで11/12の数ショット設定で1位を達成した。 The vision-language model has brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with the many-class paradigm as the baseline to automatically learn prompts, but find that it does not work well in one-class anomaly detection. To address the above problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples used to guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which is used to explicitly control the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# 3D-COCO:画像検出用MS-COCOデータセットと3D再構成モジュールの拡張 3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules ( http://arxiv.org/abs/2404.05641v3 ) ライセンス: Link先を確認	Maxence Bideaux, Alice Phe, Mohamed Chaouch, Bertrand Luvison, Quoc-Cuong Pham,	(参考訳) 3Dモデルと2D-3Dアライメントアノテーションを提供するMS-COCOデータセットの拡張である3D-COCOを紹介する。 3D-COCOは、テキスト、2D画像、および3DCADモデルクエリで構成可能な3D再構成や画像検出などのコンピュータビジョンタスクを実現するように設計されている。既存のMS-COCOデータセットは、ShapeNetとObjaverseで収集された28Kの3Dモデルで完結する。 IoUをベースとした手法により,各MS-COCOアノテーションと最適な3Dモデルとをマッチングし,2D-3Dアライメントを実現する。 3D-COCOのオープンソース性は、新しい3D関連トピック研究の道を開くためのプレミアである。データセットとそのソースコードはhttps://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/で公開されている。 We introduce 3D-COCO, an extension of the original MS-COCO dataset providing 3D models and 2D-3D alignment annotations. 3D-COCO was designed to achieve computer vision tasks such as 3D reconstruction or image detection configurable with textual, 2D image, and 3D CAD model queries. We complete the existing MS-COCO dataset with 28K 3D models collected on ShapeNet and Objaverse. By using an IoU-based method, we match each MS-COCO annotation with the best 3D models to provide a 2D-3D alignment. The open-source nature of 3D-COCO is a premiere that should pave the way for new research on 3D-related topics. The dataset and its source code are available at https://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# 球面面表現による安定3次元フルヘッド合成 SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation ( http://arxiv.org/abs/2404.05680v2 ) ライセンス: Link先を確認	Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han,	(参考訳) 近年のGAN(Generative Adversarial Networks)の進歩は,ヒトの顔合成の発達に寄与しているが,すべての角度から視認できる完全な3D頭部を包括的に合成するという課題は今も続いている。 PanoHeadは、正面と後方の両方のビューをイメージした大規模なデータセットをフルヘッド合成に使用する可能性を証明しているが、多くの場合、バックビューのアーティファクトを発生させる。詳細な分析の結果,主に2つの理由が判明した。まず、ネットワークアーキテクチャの観点から、利用した三平面/三格子表現空間の各平面は、両面から特徴を混乱させる傾向があり、「ミラーリング」アーティファクト(例えば、眼鏡が後ろに現れる)が生じる。第2に、データ監視の観点から、既存の3D GANにおける差別化訓練は、レンダリング画像自体の品質に重点を置いており、レンダリングされた視点では、その妥当性をあまり気にしていないことがわかった。これにより、差別者を騙すのが簡単であるため、前向きでない視点で「顔」を生成できる。球面座標系における新しい三面面表現であるSphereHeadを提案し,人間の頭部の幾何学的特徴に適合し,生成した人工物の多くを効率的に緩和する。さらに、カメラパラメータと画像の対応性を強調するために、識別器の視像整合性損失を導入する。これらの取り組みを組み合わせることで、視覚的に優れた成果が得られ、成果物は著しく少ない。私たちのコードとデータセットはhttps://lhyfst.github.io/sphereheadで公開されています。 While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead proves the possibilities of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often causes artifacts for back views. Based on our in-depth analysis, we found the reasons are mainly twofold. First, from network architecture perspective, we found each plane in the utilized tri-plane/tri-grid representation space tends to confuse the features from both sides, causing "mirroring" artifacts (e.g., the glasses appear in the back). Second, from data supervision aspect, we found that existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself, and does not care much about its plausibility with the perspective from which it was rendered. 
This makes it possible to generate "face" in non-frontal views, due to its easiness to fool the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence of the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts. Our code and dataset are publicly available at https://lhyfst.github.io/spherehead.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# フェルミオン熱場理論における量子計算 Quantum computation in fermionic thermal field theories ( http://arxiv.org/abs/2404.07912v2 ) ライセンス: Link先を確認	Wenyang Qian, Bin Wu,	(参考訳) 有限温度での量子場の熱的性質は、強く相互作用する物質を理解するために不可欠であり、量子コンピューティングにおける最近の発展は、代替的で有望な研究の道筋となった。本研究では,量子アルゴリズムを用いてフェルミオンのみを含む熱場理論を研究する。まず、汎用量子場理論の熱的性質を評価するために用いられる量子想像時間進化のような量子アルゴリズムとともに、デジタル量子コンピュータ上の量子ビットによるフェルミオン場のプレゼンテーションを探索する。具体的には、Majoranaフェルミオンの熱分布やエネルギー密度などの数値計算結果を量子シミュレーターを用いて1+1次元で示す。自由場理論に加えて、空間的に均質なマヨナ場との結合から生じる相互作用の効果についても検討する。どちらの場合も、位相空間分布を用いて系の熱的性質を記述できることを解析的に示し、量子シミュレーションの結果は解析的および半古典的期待値と一致することを示す。我々の研究は、熱的固定点を理解するための重要なステップであり、リアルタイムの熱化の量子シミュレーションの準備である。 Thermal properties of quantum fields at finite temperature are crucial to understanding strongly interacting matter and recent development in quantum computing has provided an alternative and promising avenue of study. In this work, we study thermal field theories involving only fermions using quantum algorithms. We first delve into the presentations of fermion fields via qubits on digital quantum computers alongside the quantum algorithms such as quantum imaginary time evolutions employed to evaluate thermal properties of generic quantum field theories. Specifically, we show numerical results such as the thermal distribution and the energy density of thermal field theories for Majorana fermions in 1+1 dimensions using quantum simulators. In addition to free field theory, we also study the effects of interactions resulting from coupling with a spatially homogeneous Majorana field. In both cases, we show analytically that thermal properties of the system can be described using phase-space distributions, and the quantum simulation results agree with analytical and semiclassical expectations. Our work is an important step to understand thermal fixed points, preparing for quantum simulation of thermalization in real time.	翻訳日:2024-07-17 21:08:58 公開日:2024-07-16
# LaVy: ベトナムのマルチモーダル大言語モデル LaVy: Vietnamese Multimodal Large Language Model ( http://arxiv.org/abs/2404.07922v6 ) ライセンス: Link先を確認	Chi Tran, Huong Le Thanh,	(参考訳) LLM(Large Language Models)とMLLM(Multimodal Large Language Models)は、複雑な推論と言語理解において印象的な能力を持つ嵐によって世界を席巻している。一方、ベトナムの大規模言語モデルに関連する多くの作品があり、マルチモーダリティにおける高品質な資源の欠如はベトナムのMLLMの進歩を妨げている。本稿では,現在最先端のベトナム語MLLMであるLaVyを導入することでこの問題に対処し,また,MLLMのベトナム語視覚言語タスクに対する理解を評価するためのLaVy-Benchベンチマークも導入する。私たちのプロジェクトはhttps://github.com/baochi0212/LaVyで公開されています。 Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality multimodal resources limits the progress of Vietnamese MLLMs. In this paper, we pioneer in addressing this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce the LaVy-Bench benchmark, designed for evaluating MLLMs' understanding of Vietnamese visual language tasks. Our project is public at https://github.com/baochi0212/LaVy	翻訳日:2024-07-17 20:59:06 公開日:2024-07-16
# Physics-informed tracking of qubit fluctuations ( http://arxiv.org/abs/2404.09212v2 ) License: see the linked page	Fabrizio Berritta, Jan A. Krzywda, Jacob Benestad, Joost van der Heijden, Federico Fedele, Saeed Fallahi, Geoffrey C. Gardner, Michael J. Manfra, Evert van Nieuwenburg, Jeroen Danon, Anasua Chatterjee, Ferdinand Kuemmeth,	Environmental fluctuations degrade the performance of solid-state qubits but can in principle be mitigated by real-time Hamiltonian estimation down to time scales set by the estimation efficiency. We implement a physics-informed and an adaptive Bayesian estimation strategy and apply them in real time to a semiconductor spin qubit. The physics-informed strategy propagates a probability distribution inside the quantum controller according to the Fokker-Planck equation, appropriate for describing the effects of nuclear spin diffusion in gallium arsenide. Evaluating and narrowing the anticipated distribution with a predetermined qubit probe sequence enables improved dynamical tracking of the uncontrolled magnetic field gradient within the singlet-triplet qubit. The adaptive strategy replaces the probe sequence with a small number of qubit probe cycles, with each probe time conditioned on the previous measurement outcomes, thereby further increasing the estimation efficiency. 
The combined real-time estimation strategy efficiently tracks low-frequency nuclear spin fluctuations in solid-state qubits, and can be applied to other qubit platforms by tailoring the appropriate update equation to capture their distinct noise sources.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
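The two ingredients of the physics-informed strategy in the abstract above, a Bayesian update after each probe and a diffusive propagation of the distribution between probes, can be sketched on a frequency grid. This is only an illustrative toy, not the authors' controller code. It assumes an idealized likelihood P(1 | f, t) = (1 - cos(2πft))/2 with no readout error, and a pure-diffusion Fokker-Planck equation with no drift term. The names `bayes_update`, `diffuse`, and `variance` are hypothetical.

```python
import math

def normalize(p):
    """Rescale a list of non-negative weights so that it sums to 1."""
    total = sum(p)
    return [x / total for x in p]

def bayes_update(p, freqs, t_probe, outcome):
    """Multiply the prior by an idealized likelihood (assumption) and renormalize."""
    post = []
    for pf, f in zip(p, freqs):
        p1 = 0.5 * (1.0 - math.cos(2.0 * math.pi * f * t_probe))
        post.append(pf * (p1 if outcome == 1 else 1.0 - p1))
    return normalize(post)

def diffuse(p, df, diffusion, dt):
    """One explicit finite-difference step of dp/dt = D d^2p/df^2, no-flux boundaries."""
    r = diffusion * dt / df ** 2
    assert r <= 0.5, "explicit scheme is unstable for r > 0.5"
    n = len(p)
    out = []
    for i in range(n):
        left = p[i - 1] if i > 0 else p[i]        # mirror at the lower edge
        right = p[i + 1] if i < n - 1 else p[i]   # mirror at the upper edge
        out.append(p[i] + r * (left - 2.0 * p[i] + right))
    return out

def variance(p, freqs):
    """Variance of the gridded distribution."""
    mean = sum(pf * f for pf, f in zip(p, freqs))
    return sum(pf * (f - mean) ** 2 for pf, f in zip(p, freqs))
```

In a tracking loop one would alternate `diffuse` (the distribution broadens while the qubit is not probed) and `bayes_update` (each measurement narrows it again). The adaptive strategy would additionally choose `t_probe` from the current distribution, which this sketch does not attempt.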
# A replica analysis of under-bagging ( http://arxiv.org/abs/2404.09779v3 ) License: see the linked page	Takashi Takahashi,	Under-bagging (UB), which combines under-sampling and bagging, is a popular ensemble learning method for training classifiers on imbalanced data. Using bagging to reduce the increased variance caused by the reduction in sample size due to under-sampling is a natural approach. However, it has recently been pointed out that in generalized linear models, naive bagging, which does not consider the class imbalance structure, and ridge regularization can produce the same results. Therefore, it is not obvious whether it is better to use UB, which requires an increased computational cost proportional to the number of under-sampled data sets, when training linear models. Given this situation, in this study we heuristically derive sharp asymptotics of UB and use them to compare UB with several other popular methods for learning from imbalanced data, in the scenario where a linear classifier is trained on two-component mixture data. 
The methods compared include the under-sampling (US) method, which trains a model using a single realization of the under-sampled data, and the simple weighting (SW) method, which trains a model with a weighted loss on the entire data. It is shown that the performance of UB is improved by increasing the size of the majority class while keeping the size of the minority class fixed, even though the class imbalance can be large, especially when the size of the minority class is small. This is in contrast to US, whose performance is almost independent of the majority class size. In this sense, bagging and simple regularization differ as methods to reduce the variance increased by under-sampling. On the other hand, the performance of SW with the optimal weighting coefficients is almost equal to that of UB, indicating that the combination of reweighting and regularization may be similar to UB.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
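The under-bagging procedure described in the abstract above (draw several balanced subsets by under-sampling the majority class, train one classifier per subset, then aggregate) can be sketched as follows. This is a minimal toy under stated assumptions, not the paper's setup. A nearest-centroid rule stands in for the linear classifier, class 1 is assumed to be the minority, and `make_toy_data`, `under_bagging`, and `predict_ub` are hypothetical names.

```python
import random

def make_toy_data(n_major=200, n_minor=10, seed=1):
    """Two Gaussian blobs in 2-D: majority (label 0) at (0,0), minority (label 1) at (4,4)."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_major)]
    X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y

def train_centroid(X, y):
    """Nearest-centroid classifier: returns [centroid of class 0, centroid of class 1]."""
    dims = len(X[0])
    cents = []
    for label in (0, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        cents.append([sum(p[d] for p in pts) / len(pts) for d in range(dims)])
    return cents

def predict_centroid(cents, x):
    """Predict the label of the closer centroid (a linear decision rule)."""
    d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in cents]
    return 0 if d[0] <= d[1] else 1

def under_bagging(X, y, n_bags=11, seed=0):
    """Train one model per balanced subset: all minority points plus a random majority subset."""
    rng = random.Random(seed)
    minority = [i for i, t in enumerate(y) if t == 1]
    majority = [i for i, t in enumerate(y) if t == 0]
    models = []
    for _ in range(n_bags):
        idx = rng.sample(majority, len(minority)) + minority
        models.append(train_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_ub(models, x):
    """Aggregate the ensemble by majority vote."""
    votes = sum(predict_centroid(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0
```

Setting `n_bags=1` gives the single-realization under-sampling (US) baseline from the abstract. The cost of UB grows linearly with `n_bags`, which is the trade-off the paper examines.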
# Extending the Tavis-Cummings model for molecular ensembles -- Exploring the effects of dipole self energies and static dipole moments ( http://arxiv.org/abs/2404.10680v2 ) License: see the linked page	Lucas Borges, Thomas Schnappinger, Markus Kowalewski,	Strong coupling of organic molecules to the vacuum field of a nanoscale cavity can be used to modify their chemical and physical properties. We extend the Tavis-Cummings model for molecular ensembles and show that the often-neglected interaction terms arising from the static dipole moment and the dipole self-energy are essential for a correct description of the light-matter interaction in polaritonic chemistry. On the basis of a full quantum description, we simulate the excited-state dynamics and spectroscopy of MgH$^+$ molecules resonantly coupled to an optical cavity. We show that the inclusion of static dipole moments and the dipole self-energy is necessary to obtain a consistent model. We construct an efficient two-level system approach that reproduces the main features of the real molecular system and may be used to simulate larger molecular ensembles.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# StyleCity: Large-Scale 3D Urban Scenes Stylization ( http://arxiv.org/abs/2404.10681v2 ) License: see the linked page	Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung,	Creating large-scale virtual urban scenes in varied styles is inherently challenging. To facilitate prototyping in virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generates a harmonious omnidirectional sky background. To achieve that, we propose to stylize a neural texture field by transferring 2D vision-and-text priors to 3D globally and locally. During 3D stylization, we progressively scale the planned training views of the input 3D scene at different levels in order to preserve high-quality scene content. We then optimize the scene style globally by adapting the scale of the style image to the scale of the training views. 
Moreover, we enhance local semantic consistency with a semantics-aware style loss, which is crucial for photo-realistic stylization. Besides texture stylization, we further adopt a generative diffusion model to synthesize a style-consistent omnidirectional sky image, which offers a more immersive atmosphere and assists the semantic stylization process. The stylized neural texture field can be baked into an arbitrary-resolution texture, enabling seamless integration into conventional rendering pipelines and significantly easing the virtual production prototyping process. Extensive experiments demonstrate the superiority of our stylized scenes in qualitative and quantitative performance and in user preference.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# Physics-informed active learning for accelerating quantum chemical simulations ( http://arxiv.org/abs/2404.11811v2 ) License: see the linked page	Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral,	Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here we introduce end-to-end AL for constructing robust, data-efficient potentials with an affordable investment of time and resources and minimal human interference. Our AL protocol is based on physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, a conformer search of a key biochemical molecule, and the time-resolved mechanism of the Diels-Alder reactions. These investigations took us days instead of weeks of pure quantum chemical calculations on a high-performance computing cluster. The code in MLatom and tutorials are available at https://github.com/dralgroup/mlatom.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# RetailOpt: Opt-In, Easy-to-Deploy Trajectory Estimation from Smartphone Motion Data and Retail Facility Information ( http://arxiv.org/abs/2404.12548v2 ) License: see the linked page	Ryo Yonetani, Jun Baba, Yasutaka Furukawa,	We present RetailOpt, a novel opt-in, easy-to-deploy system for tracking customer movements offline in indoor retail environments. The system uses readily accessible information from customer smartphones and retail apps, including motion data, store maps, and purchase records. This eliminates the need for additional hardware installation and maintenance and gives customers full control over their data. Specifically, RetailOpt first uses inertial navigation to recover relative trajectories from smartphone motion data. The store map and purchase records are cross-referenced to identify a list of visited shelves, providing anchors to localize the relative trajectories in a store through continuous and discrete optimization. We demonstrate the effectiveness of our system in five diverse environments. The system, if successful, would produce accurate customer movement data, essential for a broad range of retail applications including customer behavior analysis and in-store navigation.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# Entanglement-Based Artificial Topology: Neighboring Remote Network Nodes ( http://arxiv.org/abs/2404.16204v2 ) License: see the linked page	SiYi Chen, Jessica Illiano, Angela Sara Cacciapuoti, Marcello Caleffi,	Entanglement is unanimously recognized as the key communication resource of the Quantum Internet. Yet, the possibility of implementing novel network functionalities by exploiting the marvels of entanglement has been poorly investigated so far, with attention mainly restricted to bipartite entanglement. Conversely, in this paper, we aim at exploiting multipartite entanglement as an inter-network resource. Specifically, we consider the interconnection of different Quantum Local Area Networks (QLANs), and we show that multipartite entanglement makes it possible to dynamically generate, by means of local operations only, an inter-QLAN artificial topology that overcomes the limitations of the physical QLAN topologies. To this end, we first design the multipartite entangled state to be distributed within each QLAN. Then, we show how such a state can be engineered to: i) interconnect nodes belonging to different QLANs, and ii) dynamically adapt to different inter-QLAN traffic patterns. Our contribution aims at providing the network engineering community with a hands-on guideline towards the concepts of artificial topology and artificial neighborhood.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts ( http://arxiv.org/abs/2404.17507v2 ) License: see the linked page	Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun,	In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our approach leverages hyperbolic embeddings and the concept of entailment cones to evaluate and filter out samples with meaningless or underspecified semantics, focusing on enhancing the specificity of each data sample. HYPE not only demonstrates a significant improvement in filtering efficiency but also sets a new state of the art on the DataComp benchmark when combined with existing filtering techniques. This breakthrough showcases the potential of HYPE to refine the data selection process, thereby contributing to the development of more accurate and efficient self-supervised learning models. 
Additionally, the image specificity $\epsilon_{i}$ can be applied independently to induce an image-only dataset from an image-text or image-only data pool for training image-only self-supervised models, and it shows superior performance compared with a dataset induced by CLIP score.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# Non-stabilizerness versus entanglement in matrix product states ( http://arxiv.org/abs/2404.18768v2 ) License: see the linked page	M. Frau, P. S. Tarabunga, M. Collura, M. Dalmonte, E. Tirrito,	In this paper, we investigate the relationship between entanglement and non-stabilizerness (also known as magic) in matrix product states (MPSs). We study the relation between magic and the bond dimension used to approximate the ground state of a many-body system in two different contexts: full-state magic and mutual magic (the non-stabilizer analogue of mutual information, thus free of boundary effects) of spin-1 anisotropic Heisenberg chains. Our results indicate that obtaining converged results for non-stabilizerness is typically considerably easier than for entanglement. For full-state magic at critical points and at sufficiently large volumes, we observe convergence with $1/\chi^2$, with $\chi$ being the MPS bond dimension. At small volumes, magic saturation is so quick that, within error bars, we cannot appreciate any finite-$\chi$ correction. Mutual magic also shows a fast convergence with bond dimension, whose specific functional form is however obscured by sampling errors. As a by-product of our study, we show how Pauli-Markov chains (originally formulated to evaluate magic) reset the state of the art in terms of computing mutual information for MPS. 
We illustrate this last fact by verifying the logarithmic increase of mutual information between connected partitions at critical points. By comparing mutual information and mutual magic, we observe that, for connected partitions, the latter typically scales much more slowly with the partition size, if at all, while for disconnected partitions both are constant in size.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# Distributed Representations Enable Robust Multi-Timescale Symbolic Computation in Neuromorphic Hardware ( http://arxiv.org/abs/2405.01305v2 ) License: see the linked page	Madison Cotteret, Hugh Greatorex, Alpha Renner, Junren Chen, Emre Neftci, Huaqiang Wu, Giacomo Indiveri, Martin Ziegler, Elisabetta Chicca,	Programming recurrent spiking neural networks (RSNNs) to robustly perform multi-timescale computation remains a difficult challenge. To address this, we describe a single-shot weight learning scheme to embed robust multi-timescale dynamics into attractor-based RSNNs, by exploiting the properties of high-dimensional distributed representations. We embed finite state machines into the RSNN dynamics by superimposing a symmetric autoassociative weight matrix and asymmetric transition terms, which are each formed by the vector binding of an input and heteroassociative outer products between states. Our approach is validated through simulations with highly non-ideal weights; an experimental closed-loop memristive hardware setup; and on Loihi 2, where it scales seamlessly to large state machines. This work introduces a scalable approach to embed robust symbolic computation through recurrent dynamics into neuromorphic hardware, without requiring parameter fine-tuning or significant platform-specific optimisation. 
Moreover, it demonstrates that distributed symbolic representations serve as a highly capable representation-invariant language for cognitive algorithms in neuromorphic hardware.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# A Single Online Agent Can Efficiently Learn Mean Field Games ( http://arxiv.org/abs/2405.03718v2 ) License: see the linked page	Chenyu Zhang, Xu Chen, Xuan Di,	Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach where the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which could be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn MFNE using online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q), while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that they efficiently approximate FPI and provide a sample complexity guarantee. 
The efficacy of our methods is confirmed by numerical experiments.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
# Characteristic Learning for Provable One Step Generation ( http://arxiv.org/abs/2405.05512v4 ) License: see the linked page	Zhao Ding, Chenguang Duan, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang, Pingwen Zhang,	We propose the characteristic generator, a novel one-step generative model that combines the sampling efficiency of Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field through nonparametric regression and use the Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, ensuring a one-step mapping that effectively pushes the prior distribution towards the target distribution. On the theoretical side, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis for simulation-free one-step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. 
We apply our method to both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of the neural network.	Translated: 2024-07-17 20:59:06 Published: 2024-07-16
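The pipeline in the abstract above (obtain a velocity field, follow the characteristics with the Euler method, then fit a one-step map from the prior sample to the endpoint) can be illustrated on a 1-D Gaussian toy problem. The sketch makes strong simplifying assumptions that are not in the paper. The velocity field is the closed-form one for a linear interpolant between N(0, 1) and N(MU, SIGMA^2) rather than a nonparametric estimate, and an ordinary least-squares line stands in for the deep neural network. All names and constants are hypothetical.

```python
import random

MU, SIGMA = 3.0, 2.0  # toy target N(MU, SIGMA^2); the prior is N(0, 1)

def velocity(x, t):
    """Closed-form velocity field for the Gaussian toy case (assumption, not estimated)."""
    var_t = (1.0 - t) ** 2 + (t * SIGMA) ** 2
    return MU + (t * SIGMA ** 2 - (1.0 - t)) / var_t * (x - t * MU)

def euler_characteristic(x0, n_steps=200):
    """Follow one characteristic of dx/dt = velocity(x, t) from t=0 to t=1 with forward Euler."""
    h = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        x += h * velocity(x, k * h)
    return x

def fit_one_step_map(n_samples=500, seed=0):
    """Least-squares line through (prior sample, Euler endpoint) pairs: the one-step generator."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    x1 = [euler_characteristic(v) for v in z]
    mz, mx = sum(z) / n_samples, sum(x1) / n_samples
    slope = sum((a - mz) * (b - mx) for a, b in zip(z, x1)) / sum((a - mz) ** 2 for a in z)
    return slope, mx - slope * mz
```

For this toy case the exact characteristic maps `x0` to `MU + SIGMA * x0`, so the fitted slope and intercept should land close to `SIGMA` and `MU`. The small gap in the slope is the Euler discretization error, one of the three error sources the paper analyzes.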
# Dance Any Beat: Blending Beats with Visuals in Dance Video Generation ( http://arxiv.org/abs/2405.09266v2 ) License: see the linked page	Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai,	Automated choreography advances by generating dance from music. Current methods create skeleton keypoint sequences, not full dance videos, and cannot make specific individuals dance, limiting their real-world use. These methods also need precise keypoint annotations, making data collection difficult and restricting the use of self-made video datasets. To overcome these challenges, we introduce a novel task: generating dance videos directly from images of individuals guided by music. This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations. Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos featuring various dance types and choreographies. 
The music is analyzed by our specially designed music encoder, which identifies essential features including dance style, movement, and rhythm. DabFusion excels in generating dance videos not only for individuals in the training dataset but also for any previously unseen person. This versatility stems from its approach of generating latent optical flow, which contains all necessary motion information to animate any person in the image. We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment. We propose a 2D Motion-Music Alignment Score (2D-MM Align), which builds on the Beat Alignment Score to more effectively evaluate motion-music alignment for this new task. Experiments show that DabFusion establishes a solid baseline for this innovative task. Video results can be found on our project page: https://DabFusion.github.io.	Translated: 2024-07-17 20:49:21 Published: 2024-07-16
# A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts ( http://arxiv.org/abs/2405.10246v2 ) License: see the linked page	Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthew, Chuyang Ye, Wenjia Bai,	Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner: a specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. 
A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.	Translated: 2024-07-17 20:49:21 Published: 2024-07-16
# KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions ( http://arxiv.org/abs/2405.10852v2 ) License: see the linked page	Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer,	The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well known that the SV yields an optimal approximation of any game via a weighted least squares (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.	Translated: 2024-07-17 20:49:21 Published: 2024-07-16
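The well-known starting point mentioned in the abstract above, that the Shapley value solves a weighted least squares problem, can be checked numerically on a tiny game. The sketch below covers only this first-order case, not KernelSHAP-IQ itself. It enumerates all coalitions, uses the Shapley kernel weight (n-1) / (C(n,s) s (n-s)) for coalition size s, and approximates the two hard constraints (empty and full coalition) with a large finite weight `big`, which is an implementation shortcut assumed here. The function names and the example game are hypothetical.

```python
from itertools import combinations
from math import comb, factorial

def subsets(players):
    """Yield every subset of `players` as a frozenset."""
    players = list(players)
    for r in range(len(players) + 1):
        for c in combinations(players, r):
            yield frozenset(c)

def exact_shapley(n, v):
    """Shapley values from the definition: weighted average of marginal contributions."""
    phi = [0.0] * n
    for i in range(n):
        for T in subsets(j for j in range(n) if j != i):
            w = factorial(len(T)) * factorial(n - len(T) - 1) / factorial(n)
            phi[i] += w * (v(T | {i}) - v(T))
    return phi

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def wls_shapley(n, v, big=1e7):
    """Fit v(S) ~ phi_0 + sum_{i in S} phi_i by WLS with Shapley kernel weights."""
    dim = n + 1
    AtWA = [[0.0] * dim for _ in range(dim)]
    AtWy = [0.0] * dim
    for S in subsets(range(n)):
        s = len(S)
        w = big if s in (0, n) else (n - 1) / (comb(n, s) * s * (n - s))
        row = [1.0] + [1.0 if i in S else 0.0 for i in range(n)]
        for a in range(dim):
            AtWy[a] += w * row[a] * v(S)
            for b in range(dim):
                AtWA[a][b] += w * row[a] * row[b]
    return solve(AtWA, AtWy)[1:]  # drop the intercept phi_0

def example_game(S):
    """Additive worths 1, 2, 3 plus a bonus of 4 when players 0 and 1 cooperate."""
    return sum((1.0, 2.0, 3.0)[i] for i in S) + (4.0 if {0, 1} <= S else 0.0)
```

For `example_game` the bonus is split evenly between players 0 and 1, so both routes should give values close to 3, 4, and 3. The additive fit cannot represent the bonus as a separate term, which is the gap that interaction indices such as SII are meant to fill.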
# Sociotechnical Implications of Generative Artificial Intelligence for Information Access ( http://arxiv.org/abs/2405.11612v2 ) License: see the linked page	Bhaskar Mitra, Henriette Cramer, Olya Gurevich,	Robust access to trustworthy information is a critical need for society with implications for knowledge production, public health education, and promoting informed citizenry in democratic societies. Generative AI technologies may enable new ways to access information and improve the effectiveness of existing information retrieval systems, but we are only starting to understand and grapple with their long-term social implications. In this chapter, we present an overview of some of the systemic consequences and risks of employing generative AI in the context of information access. We also provide recommendations for evaluation and mitigation, and discuss challenges for future research.	Translated: 2024-07-17 20:49:21 Published: 2024-07-16
# Text-Video Retrieval with Global-Local Semantic Consistent Learning ( http://arxiv.org/abs/2405.12710v3 ) License: see the linked page	Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen,	Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state of the art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent shared semantics across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. 
Furthermore, an Inter-Consistency Loss (ICL) is devised to accomplish the concept alignment between the visual query and the corresponding textual query, and an Intra-Diversity Loss (IDL) is developed to repulse the distribution within visual (textual) queries to generate more discriminative concepts. Extensive experiments on five widely used benchmarks (i.e., MSR-VTT, MSVD, DiDeMo, LSMDC, and ActivityNet) substantiate the superior effectiveness and efficiency of the proposed method. Remarkably, our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost. Code is available at: https://github.com/zchoi/GLSCL.	Translated: 2024-07-17 20:49:21 Published: 2024-07-16
# FLIPHAT:高次元スパルスリニアバンドの差分プライバシー FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits ( http://arxiv.org/abs/2405.14038v2 ) ライセンス: Link先を確認	Sunrit Chakraborty, Saptarshi Roy, Debabrota Basu,	(参考訳) 高次元スペアリニアバンドは、ユーザの高次元特徴(例えばゲノムデータ)が利用できるが、そのごく一部だけが関連している、シーケンシャルな意思決定問題(例えばパーソナライズドメディカル)の効率的なモデルとして機能する。これらのアプリケーションにおけるデータプライバシの懸念により、我々は、報酬と文脈の両方をプライベートデータとみなす、差分的にプライベートな高次元の疎線形帯域について検討する。まず、プライバシのコストを定量化するために、この設定で達成可能な後悔の限界を低くする。さらにこの問題に対処するため、計算効率の良い帯域幅アルゴリズムである \textbf{F}orgetfu\textbf{L} \textbf{I}terative \textbf{P}rivate \textbf{HA}rd \textbf{T}hresholding (FLIPHAT) を設計する。 FLIPHATはエピソードの倍増とエピソード的忘れ込みとともに、プライバシと後悔の最適性の両方を保証するために、疎線形回帰オラクルとしてノイズイテレーティブ・ハード・スレッショニング(N-IHT)アルゴリズムの亜種をデプロイする。また,FLIPHATは対数的要因を最適に再現できることが示唆された。並列利害関係であるN-IHTの推定誤差を, より精巧に解析することで, 後悔の分析を行う。 High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems (e.g. personalized medicine), where high dimensional features (e.g. genomic data) on the users are available, but only a small subset of them are relevant. Motivated by data privacy concerns in these applications, we study the joint differentially private high dimensional sparse linear bandits, where both rewards and contexts are considered as private data. First, to quantify the cost of privacy, we derive a lower bound on the regret achievable in this setting. To further address the problem, we design a computationally efficient bandit algorithm, \textbf{F}orgetfu\textbf{L} \textbf{I}terative \textbf{P}rivate \textbf{HA}rd \textbf{T}hresholding (FLIPHAT). Along with doubling of episodes and episodic forgetting, FLIPHAT deploys a variant of Noisy Iterative Hard Thresholding (N-IHT) algorithm as a sparse linear regression oracle to ensure both privacy and regret-optimality. We show that FLIPHAT achieves optimal regret up to logarithmic factors. We analyze the regret by providing a novel refined analysis of the estimation error of N-IHT, which is of parallel interest.	
翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
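The FLIPHAT entry above names noisy iterative hard thresholding (N-IHT) as its sparse regression oracle. As a rough illustration of that idea only, here is a minimal pure-Python sketch: a gradient step on the least-squares loss, optional Gaussian noise, then projection onto s-sparse vectors. The function names, step size, and noise scale are assumptions made for this sketch. The noise is not calibrated to any privacy guarantee, and this is not the specific variant analyzed in the paper.

```python
import random


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:s])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def noisy_iht(X, y, s, steps=50, lr=0.1, noise_std=0.0, seed=0):
    """Generic noisy IHT for sparse least squares (illustrative only).

    X is a list of n rows with d features each, y a list of n targets.
    noise_std is a hypothetical knob; a real private algorithm would
    derive it from the privacy budget and gradient sensitivity.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(steps):
        # Residuals of the current estimate.
        residual = [
            sum(X[i][j] * theta[j] for j in range(d)) - y[i] for i in range(n)
        ]
        # Gradient of (1 / 2n) * ||X theta - y||^2.
        grad = [
            sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)
        ]
        # Noisy gradient step followed by projection onto s-sparse vectors.
        theta = [
            theta[j] - lr * grad[j] + rng.gauss(0.0, noise_std)
            for j in range(d)
        ]
        theta = hard_threshold(theta, s)
    return theta
```

With `noise_std=0.0` this reduces to plain iterative hard thresholding. Raising the noise trades estimation accuracy for perturbation, which is the kind of privacy versus regret trade-off the abstract refers to.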
# ニューラルPDEサロゲートを用いた二相流の加速シミュレーション Accelerating Simulation of Two-Phase Flows with Neural PDE Surrogates ( http://arxiv.org/abs/2405.17260v2 ) ライセンス: Link先を確認	Yoeri Poels, Koen Minartz, Harshit Bansal, Vlado Menkovski,	(参考訳) シミュレーションは物理系をよりよく理解するための強力なツールであるが、一般に計算に高価な数値法を必要とする。このようなシミュレーションの下流の応用は、例えば多くの自由度を持つ逆設計の場合など、多くの前方解を必要とする場合、計算不可能となる。本研究では,2相流問題に対するスケーリングシミュレーションを支援するツールとして,ニューラルPDEソルバを検討・拡張し,特に孔内からの油流出のシミュレーションを行う。この問題に対する既存の数値的手法を、ドメインの様々なジオメトリを含むより複雑な設定に拡張し、挑戦的なデータセットを生成する。さらに,UNet,DRN,U-FNOの3つの顕著なPDE解法について検討し,油流出問題の特徴として,(1)幾何学上の空間条件,(2)境界における周期性,(3)近似質量保存について検討した。我々は全ての手法をスケールし、その速度精度トレードオフをベンチマークし、質的特性を評価し、アブレーション研究を行う。提案手法は, 最大3桁の速さで液滴力学を正確にモデル化し, 拡張によりベースラインよりも性能が向上し, 導入した様々なジオメトリーが, 従来検討されていた油流出問題よりもはるかに困難であることがわかった。 Simulation is a powerful tool to better understand physical systems, but generally requires computationally expensive numerical methods. Downstream applications of such simulations can become computationally infeasible if they require many forward solves, for example in the case of inverse design with many degrees of freedom. In this work, we investigate and extend neural PDE solvers as a tool to aid in scaling simulations for two-phase flow problems, and simulations of oil expulsion from a pore specifically. We extend existing numerical methods for this problem to a more complex setting involving varying geometries of the domain to generate a challenging dataset. Further, we investigate three prominent neural PDE solver methods, namely the UNet, DRN, and U-FNO, and extend them for characteristics of the oil-expulsion problem: (1) spatial conditioning on the geometry; (2) periodicity in the boundary; (3) approximate mass conservation. We scale all methods and benchmark their speed-accuracy trade-off, evaluate qualitative properties, and perform an ablation study. 
We find that the investigated methods can accurately model the droplet dynamics with up to three orders of magnitude speed-up, that our extensions improve performance over the baselines, and that the introduced varying geometries constitute a significantly more challenging setting over the previously considered oil expulsion problem.	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
# 視覚強化学習における非有界データ強化の試み A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning ( http://arxiv.org/abs/2405.17416v2 ) ライセンス: Link先を確認	Abdulaziz Almuzairee, Nicklas Hansen, Henrik I. Christensen,	(参考訳) Q-learningアルゴリズムは、データ効率のために現実世界のアプリケーションにアピールしていますが、視覚的な観察からトレーニングされた場合、過度に適合し、トレーニングする傾向があります。以前の研究、すなわちSVEAは、データ拡張の選択的応用は、トレーニングを不安定にすることなく、RLエージェントの視覚的一般化を改善することができることを示した。我々は、データ拡張のためのレシピを再検討し、その効果を測光特性の増強に制限する仮定を求める。これらの制限に対処し、より広い種類の拡張を扱う一般化されたレシピであるSADAを提案する。提案するDMControl Generalization Benchmark と Meta-World および Distracting Control Suite のタスクを拡張した DMC-GB2 にその効果をベンチマークし,その方法である SADA が,多種多様な拡張セットにおけるトレーニング安定性と RL エージェントの一般化を大幅に改善することを発見した。視覚化、コード、ベンチマークについてはhttps://aalmuzairee.github.io/SADA/を参照してください。 Q-learning algorithms are appealing for real-world applications due to their data-efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 - our proposed extension of the popular DMControl Generalization Benchmark - as well as tasks from Meta-World and the Distracting Control Suite, and find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations. For visualizations, code and benchmark: see https://aalmuzairee.github.io/SADA/	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
# 画像コピー検出のためのコンパクトディスクリプタによる自己教師付き蒸留 Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection ( http://arxiv.org/abs/2405.17928v4 ) ライセンス: Link先を確認	Juntae Kim, Sungwon Woo, Jongho Nang,	(参考訳) 画像コピー検出は、参照データベース内の任意の画像から編集されたコピーを検出するタスクである。従来のアプローチは目覚ましい進歩を見せたが、ネットワークと記述子の大きさは依然として不利であり、実用的応用を複雑にしている。本稿では,軽量ネットワークとコンパクトディスクリプタを用いて,競争性能を実現する手法を提案する。大規模ネットワークから小さなネットワークへ知識を伝達するために,リレーショナル自己教師型蒸留を利用することで,少ない記述子サイズの軽量ネットワークのトレーニングを可能にする。より小さな特徴空間におけるフレキシブルな表現のためのリレーショナル自己教師型蒸留を導入し, 次元崩壊を防止するために, 強負損失を伴うコントラスト学習を適用した。 DISC2021ベンチマークでは、ResNet-50/EfficientNet-B0を教師と学生それぞれに使用し、ベースライン法と比較して64/128/256ディスクリプタサイズのマイクロ平均精度を5.0%/4.9%/5.9%改善した。 Image copy detection is a task of detecting edited copies from any image within a reference database. While previous approaches have shown remarkable progress, the large size of their networks and descriptors remains disadvantage, complicating their practical application. In this paper, we propose a novel method that achieves a competitive performance by using a lightweight network and compact descriptors. By utilizing relational self-supervised distillation to transfer knowledge from a large network to a small network, we enable the training of lightweight networks with a small descriptor size. We introduce relational self-supervised distillation for flexible representation in a smaller feature space and applies contrastive learning with a hard negative loss to prevent dimensional collapse. For the DISC2021 benchmark, ResNet-50/EfficientNet-B0 are used as a teacher and student respectively, the micro average precision improved by 5.0%/4.9%/5.9% for 64/128/256 descriptor sizes compared to the baseline method.	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
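The image copy detection entry above reports gains in micro average precision. A small sketch of how that metric is commonly computed may help when reading the numbers: all query-reference predictions are pooled, ranked by score, and a single precision-recall curve is accumulated. The function name and input layout are assumptions for illustration. The exact DISC2021 evaluation protocol is not given in the abstract, so treat this as a generic definition rather than the benchmark's official scorer.

```python
def micro_average_precision(predictions, num_positives):
    """Average precision over predictions pooled across all queries.

    predictions: list of (score, is_correct) pairs, one per predicted
        query-reference match, from every query together.
    num_positives: total number of ground-truth matches, including any
        the system failed to predict at all.
    """
    if num_positives <= 0:
        raise ValueError("num_positives must be positive")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank, weighted by a recall step of 1/num_positives.
            total += hits / rank
    return total / num_positives
```

Because the ranking is global, scores must be comparable across queries. That differs from a per-query mean average precision, which ranks each query's results separately.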
# DeMamba: 数百万台のGenVideoベンチマークでAIが生成したビデオ検出 DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark ( http://arxiv.org/abs/2405.19707v2 ) ライセンス: Link先を確認	Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong Li,	(参考訳) 近年,映像生成技術は急速に進歩している。ソーシャルメディアプラットフォームでの動画コンテンツの人気を考えると、これらのモデルは偽情報の拡散に対する懸念を強めている。したがって、偽のAI生成ビデオを区別し、偽の情報による潜在的な害を軽減できる検出器の需要が高まっている。しかし、最も先進的なビデオジェネレータからの大規模なデータセットの欠如は、そのような検出器の開発に障壁をもたらす。このギャップに対処するために、最初のAI生成ビデオ検出データセットであるGenVideoを紹介する。 1)AIが生成した100万以上の実ビデオを含む大量のビデオ、(2)ビデオカテゴリと生成テクニックの幅広い範囲をカバーする、生成されたコンテンツと方法論の豊富な多様性。そこで,本研究では,実世界のシナリオに合わせた2つの評価手法を提案する。クロスジェネレータビデオ分類タスクは,ジェネレータ上での訓練された検出器の一般化性を評価する。さらに,デテール・マンバ (DeMamba, DeMamba) というプラグイン・アンド・プレイ・モジュールを導入し,時間次元と空間次元の矛盾を解析することにより,AI生成した映像を識別することで検出器の強化を図った。我々の大規模な実験は、既存の検出器と比較して、DeMambaのGenVideoにおける優れた一般化性とロバスト性を示している。我々は、GenVideoデータセットとDeMambaモジュールがAI生成ビデオ検出の分野を大幅に前進させると考えている。コードとデータセットは \url{https://github.com/chenhaoxing/DeMamba} でアビリザブルになります。 Recently, video generation techniques have advanced rapidly. Given the popularity of video content on social media platforms, these models intensify concerns about the spread of fake information. Therefore, there is a growing demand for detectors capable of distinguishing between fake AI-generated videos and mitigating the potential harm caused by fake information. However, the lack of large-scale datasets from the most advanced video generators poses a barrier to the development of such detectors. To address this gap, we introduce the first AI-generated video detection dataset, GenVideo. It features the following characteristics: (1) a large volume of videos, including over one million AI-generated and real videos collected; (2) a rich diversity of generated content and methodologies, covering a broad spectrum of video categories and generation techniques. 
We conducted extensive studies of the dataset and proposed two evaluation methods tailored to real-world-like scenarios to assess detector performance: the cross-generator video classification task assesses the generalizability of trained detectors across generators; the degraded video classification task evaluates the robustness of detectors on videos whose quality has degraded during dissemination. Moreover, we introduced a plug-and-play module, named Detail Mamba (DeMamba), designed to enhance detectors by identifying AI-generated videos through the analysis of inconsistencies in the temporal and spatial dimensions. Our extensive experiments demonstrate DeMamba's superior generalizability and robustness on GenVideo compared to existing detectors. We believe that the GenVideo dataset and the DeMamba module will significantly advance the field of AI-generated video detection. Our code and dataset will be available at https://github.com/chenhaoxing/DeMamba.	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
# エゴセントリックな行動認識のためのマルチモーダルなクロスドメインFew-Shot学習 Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition ( http://arxiv.org/abs/2405.19917v3 ) ライセンス: Link先を確認	Masashi Hatano, Ryo Hachiuma, Ryo Fujii, Hideo Saito,	(参考訳) マルチモーダル入力とラベルなしターゲットデータを用いた,エゴセントリックな行動認識のための新しいクロスドメイン少ショット学習タスク(CD-FSL)について検討する。本稿では,CD-FSL設定におけるエゴセントリックなアクション認識に関わる2つの重要な課題について,(1)エゴセントリックなビデオ(例えば,日々の生活と産業の領域)における極端なドメインギャップ,(2)実世界のアプリケーションにおける計算コストの2つを同時に解決する。本稿では,対象領域への適応性を向上し,推論コストを改善するために,ドメイン適応的で効率的なアプローチであるMM-CDFSLを提案する。最初の課題に対処するために,教師モデルを用いた学生RGBモデルへのマルチモーダル蒸留の導入を提案する。各教師モデルは、それぞれのモダリティのソースデータとターゲットデータに基づいて、独立して訓練される。マルチモーダル蒸留における未ラベルのターゲットデータのみを活用すると、学生モデルのターゲット領域への適応性が向上する。さらに,マスクによる入力トークン数を削減する手法であるアンサンブルマスク推論を導入する。このアプローチでは、アンサンブル予測はマスキングによる性能劣化を緩和し、2つ目の問題に効果的に対処する。当社のアプローチは、最先端のCD-FSLアプローチよりも優れており、複数のエゴセントリックデータセットに対してかなりのマージンを有し、平均6.12/6.10ポイントの1ショット/5ショット設定で改善され、推論速度は2.2ドルの速さで達成された。プロジェクトページ:https://masashi-hatano.github.io/MM-CDFSL/ We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domain) and (2) the computational cost for real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach designed to enhance adaptability to the target domain and improve inference cost. To address the first challenge, we propose the incorporation of multimodal distillation into the student RGB model using teacher models. Each teacher model is trained independently on source and target data for its respective modality. Leveraging only unlabeled target data during multimodal distillation enhances the student model's adaptability to the target domain. 
We further introduce ensemble masked inference, a technique that reduces the number of input tokens through masking. In this approach, ensemble prediction mitigates the performance degradation caused by masking, effectively addressing the second challenge. Our approach outperforms state-of-the-art CD-FSL approaches by a substantial margin on multiple egocentric datasets, improving by an average of 6.12/6.10 points in the 1-shot/5-shot settings while achieving 2.2 times faster inference. Project page: https://masashi-hatano.github.io/MM-CDFSL/	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
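The MM-CDFSL entry above describes ensemble masked inference: masking cuts the number of input tokens, and averaging over several masked views offsets the accuracy lost to masking. The sketch below shows one plausible reading of that idea. The `model` callable, `keep_ratio`, `num_views`, and the uniform random masking are all hypothetical choices for illustration; the abstract does not specify how masks are drawn or how predictions are combined.

```python
import random


def ensemble_masked_inference(tokens, model, keep_ratio=0.5, num_views=4, seed=0):
    """Average class probabilities over several randomly masked views.

    tokens: sequence of input tokens (any objects).
    model: hypothetical callable mapping a list of tokens to a list of
        class probabilities; stands in for the real network.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    rng = random.Random(seed)
    num_keep = max(1, int(len(tokens) * keep_ratio))
    summed = None
    for _ in range(num_views):
        # Sample which token positions survive masking, keeping original order.
        kept_idx = sorted(rng.sample(range(len(tokens)), num_keep))
        probs = model([tokens[i] for i in kept_idx])
        if summed is None:
            summed = list(probs)
        else:
            summed = [a + b for a, b in zip(summed, probs)]
    # Ensemble prediction: mean of the per-view probabilities.
    return [p / num_views for p in summed]
```

Each forward pass sees only `keep_ratio` of the tokens. Whether the ensemble is cheaper overall than one unmasked pass depends on how the model's cost scales with token count and on how many views are used.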
# スーパーガウシアン:3Dスーパーレゾリューションのためにビデオモデルを再購入 SuperGaussian: Repurposing Video Models for 3D Super Resolution ( http://arxiv.org/abs/2406.00609v4 ) ライセンス: Link先を確認	Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Frühstück,	(参考訳) 本稿では,幾何学的および外観的詳細を付加することにより,粗い3次元モデルをアップサンプルする,単純でモジュラーで汎用的な手法を提案する。生成的な3Dモデルは現在存在するが、画像やビデオの領域におけるそれらのモデルの品質とはまだ一致していない。既存の(事前訓練済み)ビデオモデルを3次元超解像に直接再利用することは可能であり、高品質な3次元トレーニングモデルの大規模なリポジトリ不足の問題を副次的に解決できることを実証する。本稿では,3次元整合性のない映像アップサンプリングモデルを再利用し,それらを3次元整合化と組み合わせて3次元整合性のある結果を生成する方法について述べる。出力として、オブジェクト中心で有効である高品質なガウススプラモデルを生成する。本手法はカテゴリ非依存であり,既存の3Dワークフローに容易に組み込むことができる。提案したSuperGaussianを,複雑性と表現の両面で多種多様な3次元インプット(例えばガウススプレートやNeRF)で評価し,本手法が最終3次元モデルの忠実度を著しく向上させることを示す。詳細はプロジェクトのWebサイトをご覧ください。 We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
# LanEvil: レーン検出のロバストさを環境問題にベンチマークする LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions ( http://arxiv.org/abs/2406.00934v4 ) ライセンス: Link先を確認	Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao,	(参考訳) レーン検出(LD)は自律走行システムにおいて不可欠な要素であり、適応型クルーズ制御や自動車線センターなどの基本的な機能を提供している。既存のLDベンチマークは主に、道路上の影やタイヤマークのような環境錯覚に対するLDモデルの堅牢性を無視し、一般的なケースを評価することに焦点を当てている。この研究のギャップは、現実世界の交通状況に自然に存在するため、重要な安全上の課題を生じさせる。本稿では,これらの環境錯覚によるLDに対する潜在的脅威を初めて研究し,この自然破壊に対するLDの堅牢性を評価するための総合的な指標であるLanEvilを確立する。 LDタスクにおける実世界の影響要因を幅広くカバーする,14種類の重要かつ重要な環境錯覚(例えば,影,反射)を体系的に設計する。実世界の環境をベースとして、広く使われているCARLAシミュレータを用いて、94の現実的でカスタマイズ可能な3Dケースを作成し、90,292枚のサンプル画像からなるデータセットを作成する。大規模な実験を通じて、LanEvilを用いた一般的なLD手法の堅牢性をベンチマークし、性能劣化(平均5.37%の精度と10.70%のF1スコア)を明らかにし、シャドーエフェクトが最もリスクが高い(7.39%の精度)。さらに、協調シミュレーションにより商用自動運転システムOpenPilotとApolloの性能を評価し、提案した環境錯覚が誤った判断や交通事故につながることを実証する。環境イリュージョンに対する対策として,照明条件下でのロバスト性向上(+3.76%)を目立たせる厳密な例を用いた注意領域混合(AAM)手法を提案する。われわれの論文が今後、より堅牢な自動運転システムに貢献できることを願っている。ウェブサイト: https://lanevil.github.io/.com Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functionalities like adaptive cruise control and automated lane centering. Existing LD benchmarks primarily focus on evaluating common cases, neglecting the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety challenges since these illusions exist naturally in real-world traffic situations. For the first time, this paper studies the potential threats caused by these environmental illusions to LD and establishes the first comprehensive benchmark LanEvil for evaluating the robustness of LD against this natural corruption. We systematically design 14 prevalent yet critical types of environmental illusions (e.g., shadow, reflection) that cover a wide spectrum of real-world influencing factors in LD tasks. 
Based on real-world environments, we create 94 realistic and customizable 3D cases using the widely used CARLA simulator, resulting in a dataset comprising 90,292 sampled images. Through extensive experiments, we benchmark the robustness of popular LD methods using LanEvil, revealing substantial performance degradation (-5.37% Accuracy and -10.70% F1-Score on average), with shadow effects posing the greatest risk (-7.39% Accuracy). Additionally, we assess the performance of the commercial auto-driving systems OpenPilot and Apollo through collaborative simulations, demonstrating that the proposed environmental illusions can lead to incorrect decisions and potential traffic accidents. To defend against environmental illusions, we propose the Attention Area Mixing (AAM) approach using hard examples, which yields a significant robustness improvement (+3.76%) under illumination effects. We hope our paper can contribute to advancing more robust auto-driving systems in the future. Website: https://lanevil.github.io/.	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
# オンラインデータの重要性 : カバレッジによる選好の微調整を理解する The Importance of Online Data: Understanding Preference Fine-tuning via Coverage ( http://arxiv.org/abs/2406.01462v2 ) ライセンス: Link先を確認	Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun,	(参考訳) 人間の嗜好データからの学習が,大規模言語モデル (LLM) を微調整する主要なパラダイムとして浮上している。 PPO(Proximal Policy Optimization)のようなオンライン強化学習(RL)と、DPO(Direct Preference Optimization)のようなオフラインのコントラスト的手法は、どちらも同一のオフライン優先データセットから開始する必要があるため、以前の作業では同等と位置づけられていた。選好微調整のためのオンラインとオフラインの技法の類似点と相違点に関する理論的理解をさらに深めるため、データセットカバレッジのレンズを通して厳密な分析を行い、トレーニングデータがテスト分布をどのようにカバーしているかを捉え、RLで広く使われている概念である。グローバルなカバレッジ条件は,オフラインのコントラスト手法が最適ポリシーに収束するのに必要かつ十分であることを示すが,オンラインRL手法ではより弱い部分カバレッジ条件で十分である。この分離によって、オンラインRLメソッドがオフラインメソッドよりも優れたパフォーマンスを得られる理由が説明できる。最後に, 従来の理論的観測をベースとして, オフラインデータをコントラッシブな選好最適化に用いるハイブリッド選好最適化(HyPO)アルゴリズムと, KL正則化のためのオンラインデータを導出する。理論的かつ実証的に、HyPOは純粋なオフラインのDPOよりも高性能でありながら、その計算とメモリ効率を保っていることを実証する。 Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. 
This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
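The HyPO entry above combines a contrastive preference loss on offline data with KL regularization estimated from online samples. The sketch below illustrates that split with the standard DPO loss and a simple Monte Carlo estimate of KL(policy || reference). The function names, the `kl_weight` value, and this particular KL estimator are assumptions for illustration. The abstract does not give the exact objective, so this should not be read as the authors' implementation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def kl_estimate(online_samples):
    """Monte Carlo estimate of KL(policy || reference).

    online_samples: list of (logp, ref_logp) pairs for responses that were
        sampled from the current policy (hence "online").
    """
    return sum(lp - ref_lp for lp, ref_lp in online_samples) / len(online_samples)


def hybrid_objective(offline_pairs, online_samples, beta=0.1, kl_weight=0.05):
    """Offline contrastive term plus an online KL penalty (hypothetical weighting)."""
    contrastive = sum(
        dpo_loss(*pair, beta=beta) for pair in offline_pairs
    ) / len(offline_pairs)
    return contrastive + kl_weight * kl_estimate(online_samples)
```

Only the KL term needs fresh samples from the policy. No reward model or online preference labels appear in this sketch, which is in line with the abstract's claim that the method keeps DPO's computation and memory efficiency.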
# CommonPowerを用いた系統制御のための安全強化学習の強化 Empowering Safe Reinforcement Learning for Power System Control with CommonPower ( http://arxiv.org/abs/2406.03231v2 ) ライセンス: Link先を確認	Michael Eichelbeck, Hannah Markgraf, Matthias Althoff,	(参考訳) 電力系統管理の複雑さの増大により、強化学習(RL)への関心が高まっている。しかしながら、バニラRLコントローラはシステム制約の満足度を保証することはできない。したがって, 電力系統管理のためのRL研究において, 公式に正しい保護機構と組み合わせることが重要である。複雑なユースケースにセーフガードを統合するには、ツールのサポートが必要だ。このニーズに対処するために、PythonツールのCommonPowerを紹介します。 CommonPowerのユニークな貢献は、RLコントローラの柔軟なモデルベースの保護を可能にするシンボリックモデリングアプローチにある。さらにCommonPowerは、単一エージェントRL、マルチエージェントRL、最適制御のための統一インターフェースを提供し、異なる予測メソッドをシームレスに統合する。これにより、ユーザは、さまざまなケーススタディで安全なRLコントローラの有効性を検証し、全体的なパフォーマンスに対する特定の側面の影響を調べることができる。我々は、異なる安全ガードを特徴とするRLエージェントと、エネルギー管理のコンテキストにおけるモデル予測制御器を比較した数値ケーススタディにより、CommonPowerの汎用性を実証する。 The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). However, vanilla RL controllers cannot themselves ensure satisfaction of system constraints. Therefore, combining them with formally correct safeguarding mechanisms is an important aspect when studying RL for power system management. Integrating safeguarding into complex use cases requires tool support. To address this need, we introduce the Python tool CommonPower. CommonPower's unique contribution lies in its symbolic modeling approach, which enables flexible, model-based safeguarding of RL controllers. Moreover, CommonPower offers a unified interface for single-agent RL, multi-agent RL, and optimal control, with seamless integration of different forecasting methods. This allows users to validate the effectiveness of safe RL controllers across a large variety of case studies and investigate the influence of specific aspects on overall performance. We demonstrate CommonPower's versatility through a numerical case study that compares RL agents featuring different safeguards with a model predictive controller in the context of building energy management.	翻訳日:2024-07-17 20:49:21 公開日:2024-07-16
# マルチエージェント流れのオンライン・ジョイント微調整 Online Joint Fine-tuning of Multi-Agent Flows ( http://arxiv.org/abs/2406.04516v3 ) ライセンス: Link先を確認	Paul Mineiro,	(参考訳) フローはコンポーネントモデルの集合("Agents")であり、反復的なコミュニケーションを通じて複雑な問題の解を構築する。フローはコード生成のための最先端アーキテクチャとして登場し、Autogenのようなフレームワークの存在理由となっている。しかし、現在、フローは手動のプロンプト工学と段階的な教師あり学習技術の組み合わせで構築されており、後者はノード単位の細かな教師信号を持つ非巡回フローに限定される。本稿では,Learning to Searchフレームワークに着想を得た,フロー全体のオンライン共同微調整の手順について述べる。このアプローチはシミュレータアクセスを利用して、エピソード全体に対する選好を個々のノード出力に対する選好に帰着させる。コンポーネントが言語モデルである場合、後者はよく研究された問題である。このアプローチは、エピソード評価モデルが利用可能であれば、報酬のない設定(例えば、テキストフィードバック)にも適用できる。この手法をマルチホップQAデータセットMusiqueに適用し、最先端の結果を達成した。 A Flow is a collection of component models ("Agents") which constructs the solution to a complex problem via iterative communication. Flows have emerged as state-of-the-art architectures for code generation, and are the raison d'etre for frameworks like Autogen. However, flows are currently constructed via a combination of manual prompt engineering and stagewise supervised learning techniques; the latter is limited to acyclic flows with granular node supervision. In this writeup I describe a procedure for online joint fine-tuning of an entire flow inspired by the Learning to Search framework. The approach leverages simulator access to reduce preferences over entire episodes to preferences over individual node outputs; when the components are language models the latter is a well-studied problem. The approach is applicable to reward-free settings (e.g., text feedback) if an episode evaluator model is available. I apply it to the multi-hop QA dataset Musique, achieving a state-of-the-art result.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# LLMによる因果ビジネスプロセス推論のベンチマークに向けて Towards a Benchmark for Causal Business Process Reasoning with LLMs ( http://arxiv.org/abs/2406.05506v2 ) ライセンス: Link先を確認	Fabiana Fournier, Lior Limonad, Inna Skarbovsky,	(参考訳) 大きな言語モデル(LLM)は、組織の効率向上やタスクの自動化にますます使われています。もともとは複雑な認知プロセスのために設計されたものではないが、近年の取り組みは、推論、計画、意思決定といった活動にLLMを採用するように拡張されている。ビジネスプロセスにおいて、そのような能力は、そのようなプロセスの深い理解を得るために訓練された巨大なコーパスLLMを活用する上で、貴重なものになり得る。本研究は, LLMの因果的・プロセス的視点を推論する能力を評価するため, ベンチマーク開発のための種子を植え付けるものである。この見解を、BP^C(Causally-augmented Business Processes)と呼ぶ。ベンチマークのコアは、BP^C関連の一連の状況と、これらの状況に関する一連の質問と、これらの質問に対する基礎的な真実の答えを体系的に解決するために使用される導出規則から構成される。また、LLMの力により、種子はより大規模なドメイン固有の状況や問題にインスタンス化される。 BP^Cの推論は、プロセスの介入とプロセス改善にとって重要である。我々のベンチマークはhttps://huggingface.co/datasets/ibm/BPCでアクセス可能であり、任意のLLMの性能をテストし、BP^Cを推論するためにLLMを訓練する、2つの可能なモダリティの1つに利用できる。 Large Language Models (LLMs) are increasingly used for boosting organizational efficiency and automating tasks. While not originally designed for complex cognitive processes, recent efforts have further extended to employ LLMs in activities such as reasoning, planning, and decision-making. In business processes, such abilities could be invaluable for leveraging on the massive corpora LLMs have been trained on for gaining deep understanding of such processes. In this work, we plant the seeds for the development of a benchmark to assess the ability of LLMs to reason about causal and process perspectives of business operations. We refer to this view as Causally-augmented Business Processes (BP^C). The core of the benchmark comprises a set of BP^C related situations, a set of questions about these situations, and a set of deductive rules employed to systematically resolve the ground truth answers to these questions. Also with the power of LLMs, the seed is then instantiated into a larger-scale set of domain-specific situations and questions. Reasoning on BP^C is of crucial importance for process interventions and process improvement. 
Our benchmark, accessible at https://huggingface.co/datasets/ibm/BPC, can be used in one of two possible modalities: testing the performance of any target LLM and training an LLM to advance its capability to reason about BP^C.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# ジェットタグ用粒子多軸変圧器 Particle Multi-Axis Transformer for Jet Tagging ( http://arxiv.org/abs/2406.06638v2 ) ライセンス: Link先を確認	Muhammad Usman, M Husnain Shahid, Maheen Ejaz, Ummay Hani, Nayab Fatima, Abdul Rehman Khan, Asifullah Khan, Nasir Majid Mirza,	(参考訳) ジェットタグは高エネルギー物理学において重要な分類問題である。近年、Deep Learningはジェットタグ付けの課題に発展しただけでなく、パフォーマンスも大幅に向上した。本稿では,新しいアーキテクチャであるParticle Multi-Axis transformer (ParMAT)を提案する。 ParMATは単一ユニット内の局所的およびグローバルな空間的相互作用を含み、様々な入力長を扱う能力を向上させる。 JETCLASSは10種類の粒子からなる1億基のジェットを含む,公開可能な大規模データセットである。 ParMATは、パラレルアテンション機構と粒子のペアワイズ相互作用を統合することにより、ParTとParticleNetに対するロバスト性と高い精度を実現する。巨大なデータセットへのモデルのスケーラビリティと、重要な特徴を自動的に抽出する能力は、ジェットタグの強化の可能性を示している。 Jet tagging is an essential categorization problem in high energy physics. In recent times, Deep Learning has not only risen to the challenge of jet tagging but also significantly improved its performance. In this article, we proposed an idea of a new architecture, Particle Multi-Axis transformer (ParMAT) which is a modified version of Particle transformer (ParT). ParMAT contains local and global spatial interactions within a single unit which improves its ability to handle various input lengths. We trained our model on JETCLASS, a publicly available large dataset that contains 100M jets of 10 different classes of particles. By integrating a parallel attention mechanism and pairwise interactions of particles in the attention mechanism, ParMAT achieves robustness and higher accuracy over the ParT and ParticleNet. The scalability of the model to huge datasets and its ability to automatically extract essential features demonstrate its potential for enhancing jet tagging.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# 次世代データベースインタフェース: LLM-based Text-to-SQL の調査 Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL ( http://arxiv.org/abs/2406.08426v3 ) ライセンス: Link先を確認	Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang,	(参考訳) 自然言語の質問(text-to-SQL)から正確なSQLを生成することは、ユーザ質問の理解、データベーススキーマの理解、SQL生成といった複雑さのため、長年にわたる課題である。人間のエンジニアリングとディープニューラルネットワークからなる従来のテキスト-SQLシステムは、かなりの進歩を遂げた。その後、事前訓練された言語モデル(PLM)が開発され、テキストからSQLまでのタスクに利用され、有望なパフォーマンスを実現している。現代のデータベースが複雑化するにつれて、対応するユーザの質問もますます難しくなり、パラメータ制約のあるPLMが誤ったSQLを生成するようになる。これはより洗練された最適化手法を必要とし、PLMベースのシステムの適用を制限する。近年,大規模言語モデル (LLM) は,モデルスケールが増大するにつれて,自然言語理解において重要な能力を発揮している。したがって、LLMベースの実装を統合することで、テキスト-SQL研究にユニークな機会、改善、ソリューションをもたらすことができる。本稿では LLM ベースのテキスト-to-SQL の総合的なレビューを行う。具体的には,テキスト・トゥ・SQLの技術的課題と進化過程について概説する。次に、テキスト・トゥ・SQLシステムを評価するために設計されたデータセットとメトリクスの詳細を紹介する。その後、LLMベースのテキスト・トゥ・SQLの最近の進歩を体系的に分析する。最後に,この分野での課題について考察し,今後の研究の方向性を期待する。 Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing challenge due to the complexities in user question understanding, database schema comprehension, and SQL generation. Conventional text-to-SQL systems, comprising human engineering and deep neural networks, have made substantial progress. Subsequently, pre-trained language models (PLMs) have been developed and utilized for text-to-SQL tasks, achieving promising performance. As modern databases become more complex, the corresponding user questions also grow more challenging, causing PLMs with parameter constraints to produce incorrect SQL. This necessitates more sophisticated and tailored optimization methods, which, in turn, restricts the applications of PLM-based systems. Recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding as the model scale increases. Therefore, integrating LLM-based implementation can bring unique opportunities, improvements, and solutions to text-to-SQL research. 
In this survey, we present a comprehensive review of LLM-based text-to-SQL. Specifically, we give a brief overview of the technical challenges and the evolutionary process of text-to-SQL. Then, we provide a detailed introduction to the datasets and metrics designed to evaluate text-to-SQL systems. After that, we present a systematic analysis of recent advances in LLM-based text-to-SQL. Finally, we discuss the remaining challenges in this field and outline expectations for future research directions.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# EgoExo-Fitness:Egocentric and Exocentric Full-Body Action Understandingに向けて EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding ( http://arxiv.org/abs/2406.08877v2 ) ライセンス: Link先を確認	Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, Wei-Shi Zheng,	(参考訳) EgoExo-Fitnessは、新しいフルボディアクション理解データセットで、同期型エゴセントリックカメラと固定型エゴセントリックカメラ(3人称)カメラから記録されたフィットネスシーケンスを特徴とする。既存のフルボディのアクション理解データセットと比較すると、EgoExo-Fitnessは一人称視点のビデオだけでなく、リッチなアノテーションも提供する。具体的には、各アクションのサブステップとともに、単一のアクションビデオをローカライズするために、2段階の時間境界が提供される。さらに重要なのは、EgoExo-Fitnessは、技術的キーポイント検証、アクション実行に関する自然言語コメント、アクション品質スコアを含む、解釈可能なアクション判断のための革新的なアノテーションを導入している。これらすべてを組み合わせることで、EgoExo-Fitnessは、エゴセントリックでエゴセントリックなフルボディの行動理解を"What"、"When"、"How well"の次元で研究するための新たなリソースを提供する。本研究では,行動分類,行動ローカライゼーション,クロスビューシーケンス検証,クロスビュースキル決定,新たに提案されたガイダンスに基づく実行検証タスクなどの一連のタスクのベンチマークを,詳細な分析とともに構築する。コードとデータはhttps://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/mainで入手できる。 We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". 
To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# SHMamba:オーディオ・ビジュアル質問応答のための構造的双曲的状態空間モデル SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering ( http://arxiv.org/abs/2406.09833v3 ) ライセンス: Link先を確認	Zhe Yang, Wenrui Li, Guanghui Cheng,	(参考訳) AVQA(Audio-Visual Question Answering)タスクは、アプリケーションにとって大きな可能性を秘めている。従来のユニモーダルアプローチと比較して、AVQAのマルチモーダル入力は特徴抽出と融合プロセスをより困難にする。ユークリッド空間は、データの多次元関係を効果的に表現することは困難である。特に木構造や階層構造でデータを抽出・処理する場合、ユークリッド空間は埋め込み空間には適さない。さらに、トランスフォーマーの自己保持機構は、シーケンス内の要素間の動的関係を捉えるのに有効である。しかし、ウィンドウモデリングと2次計算複雑性における自己注意機構の限界は、長いシーケンスをモデル化する際の効率を低下させる。これらの制約に対処するため、我々はSHMamba: Structured Hyperbolic State Space Modelを提案し、双曲幾何学と状態空間モデルの利点を統合する。具体的には、SHMambaは双曲空間の内在的性質を利用して、階層構造と音声・視覚データにおける複雑な関係を表現する。一方、状態空間モデルは、全シーケンスをグローバルにモデル化することで、時間とともに動的な変化を捉えます。さらに,適応的な曲率双曲アライメントモジュールとクロスフュージョンブロックを導入し,階層構造の理解とクロスモーダル情報の動的交換を強化する。 SHMambaはより少ないパラメータと計算コストで従来の手法より優れていることを示した。学習可能なパラメータは78.12\%削減され、平均性能は2.53\%向上した。実験の結果,本手法は現在のすべての主要な手法よりも優れており,実用的なアプリケーションシナリオに適していることがわかった。 The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature extraction and fusion processes more challenging. Euclidean space is difficult to effectively represent multi-dimensional relationships of data. Especially when extracting and processing data with a tree structure or hierarchical structure, Euclidean space is not suitable as an embedding space. Additionally, the self-attention mechanism in Transformers is effective in capturing the dynamic relationships between elements in a sequence. However, the self-attention mechanism's limitations in window modeling and quadratic computational complexity reduce its effectiveness in modeling long sequences. To address these limitations, we propose SHMamba: Structured Hyperbolic State Space Model to integrate the advantages of hyperbolic geometry and state space models. 
Specifically, SHMamba leverages the intrinsic properties of hyperbolic space to represent hierarchical structures and complex relationships in audio-visual data. Meanwhile, the state space model captures dynamic changes over time by globally modeling the entire sequence. Furthermore, we introduce an adaptive curvature hyperbolic alignment module and a cross fusion block to enhance the understanding of hierarchical structures and the dynamic exchange of cross-modal information, respectively. Extensive experiments demonstrate that SHMamba outperforms previous methods with fewer parameters and computational costs. Our learnable parameters are reduced by 78.12\%, while the average performance improves by 2.53\%. Experiments show that our method demonstrates superiority among all current major methods and is more suitable for practical application scenarios.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
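The SHMamba abstract above argues that hyperbolic space suits tree-like or hierarchical data better than Euclidean space. A minimal sketch of why, assuming the standard Poincaré ball model with fixed curvature -1 (the abstract does not say which model the authors use, and their module adapts the curvature; the function name `poincare_distance` is hypothetical):

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball (curvature -1)."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form: arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2))).
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))
```

Distances grow without bound as points approach the boundary of the ball, so a parent node can sit near the origin while exponentially many descendants spread out near the rim. This is the usual intuition behind hyperbolic embeddings of hierarchies, not a description of the paper's implementation.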
# ブロックベースアテンションマスクを用いた効果的かつ効率的な非自己回帰復号化に向けて Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask ( http://arxiv.org/abs/2406.10034v2 ) ライセンス: Link先を確認	Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu,	(参考訳) 本稿では,Conformer ASRシステムにおける性能と効率のトレードオフを柔軟に調整する,新しい非自己回帰(NAR)ブロックベースのアテンションマスクデコーダ(AMD)を提案する。 AMDは、アテンションマスクを用いて隠蔽される出力ラベルの連続ブロック内で並列なNAR推論を行い、ブロック間では左から右へのAR予測と履歴コンテキストの統合を行う。ビームサーチアルゴリズムは、CTC、ARデコーダ、AMDの確率の動的融合を利用するように設計されている。 LibriSpeech-100hrコーパスの実験では、AMDモジュールを組み込んだ三者構成のデコーダは、ベースラインのCTC+ARデコードに対して最大1.73倍のデコード高速化を達成し、テストセットにおいて統計的に有意な単語誤り率(WER)の増加は生じないことが示唆された。デコードのリアルタイム係数を同一にした場合、CTC+ARベースラインに対して統計的に有意なWERの削減が、絶対値で最大0.7%と0.3%(相対値で5.3%と6.1%)得られた。 This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam search algorithm is designed to leverage a dynamic fusion of CTC, AR Decoder, and AMD probabilities. Experiments on the LibriSpeech-100hr corpus suggest the tripartite Decoder incorporating the AMD module produces a maximum decoding speed-up ratio of 1.73x over the baseline CTC+AR decoding, while incurring no statistically significant word error rate (WER) increase on the test sets. When operating with the same decoding real time factors, statistically significant WER reductions of up to 0.7% and 0.3% absolute (5.3% and 6.1% relative) were obtained over the CTC+AR baseline.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
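The AMD abstract above describes parallel prediction inside a block of concealed labels, with left-to-right dependence between blocks. One plausible reading can be sketched as a block-wise visibility mask. This is an illustration only: the function name `block_history_mask` and the exact visibility rule are assumptions, and the paper's actual mask may differ (for example, in how concealed positions within a block see one another).

```python
def block_history_mask(num_labels, block_size):
    """Return mask where mask[i][j] is True when label position i may attend to label j."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    mask = []
    for i in range(num_labels):
        block_start = (i // block_size) * block_size  # first position of i's block
        # Labels from earlier blocks form the visible history; labels in the
        # current block (and all later ones) are concealed from position i.
        mask.append([j < block_start for j in range(num_labels)])
    return mask
```

With `block_size=1` every position sees all previous labels, which matches ordinary autoregressive decoding; with `block_size=num_labels` nothing is visible, which matches fully parallel decoding. The block size is therefore the knob that trades accuracy against speed in this sketch.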
# 言葉を超えて: ミッションクリティカルリスク分析における大規模言語モデルでの行動可能性 Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis ( http://arxiv.org/abs/2406.10273v2 ) ライセンス: Link先を確認	Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi,	(参考訳) コンテキスト。リスク分析は特定のシナリオにおける潜在的なリスクを評価する。リスク分析の原則は、コンテキストレスであり、同じ方法論を、健康や情報技術のセキュリティに関連するリスクに適用することができる。リスク分析には、国内外の規制や基準に関する膨大な知識が必要であり、時間と努力が集中している。大きな言語モデルは、人間よりも少ない時間で情報を素早く要約することができ、特定のタスクに微調整することができる。エイム。本研究は,リスク分析における検索・拡張型LLMと微調整型LLMの有効性を検討することを目的とした実証研究である。我々の知る限り、リスク分析の能力について事前の研究は行われていない。方法。我々は過去5年間に産業状況チームによってアーカイブされた50以上のミッションクリティカルな分析結果から,‘totalscenarios’というユニークなシナリオを手作業でキュレートした。基本モデルであるGPT-3.5とGPT-4とRetrieval-Augmented Generationおよび微調整モデルを比較した。我々は、モデルの競合相手として2人の人間専門家と、3人の人間専門家を雇い、モデルと以前の人間専門家の分析をレビューします。審査員は5000のシナリオ分析を行った。結果と結論。 HEsは高い精度を示したが、LSMsはより速く、より実用的な。さらに,RAG支援LSMが最も低い幻覚率を示し,隠れたリスクを効果的に発見し,人間の専門知識を補完することを示した。したがって、モデルの選択は、正確性のためのFTM、隠れたリスク発見のためのRAG、包括性と行動可能性のためのベースモデルなど、特定のニーズに依存する。したがって、専門家はLLMを、凝縮した時間枠内でのリスク分析を効果的に補完するコンパニオンとして活用することができる。また、不当な対策の実施に伴う不要な費用を回避することでコストを削減できる。 Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time and effort-intensive. A large language model can quickly summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation and fine-tuned LLM in Risk analysis. To our knowledge, no prior study has explored its capabilities in risk analysis. Method. We manually curated \totalscenarios unique scenarios leading to \totalsamples representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. 
We compared the base GPT-3.5 and GPT-4 models against their Retrieval-Augmented Generation (RAG) and fine-tuned counterparts. We employed two human experts as competitors of the models and three other human experts to review the analyses produced by the models and by the former two experts. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. Human experts (HEs) demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden-risk discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# SparseRadNet:サブサンプルレーダデータに基づくスパース知覚ニューラルネットワーク SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data ( http://arxiv.org/abs/2406.10600v4 ) ライセンス: Link先を確認	Jialong Wu, Mirko Meuter, Markus Schoeler, Matthias Rottmann,	(参考訳) レーダーに基づく認識は自律走行において注目を集めているが、レーダーの空間性は課題を生じさせている。レーダー生データは、しばしば過剰なノイズを含むが、レーダー点雲は限られた情報しか保持しない。本研究では,レーダ信号のグローバルおよびローカルな依存関係を発見するために,空間パターンを利用した適応型サブサンプリング手法と,適応型ネットワークアーキテクチャを導入することで,レーダデータの疎結合性を均質に扱う。我々のサブサンプリングモジュールは、下流の知覚タスクに最も寄与するレンジドップラー(RD)スペクトルから画素のサブセットを選択する。スパースサブサンプリングデータの特徴抽出を改善するために,レーダデータにグラフニューラルネットワークを適用する新しい手法を提案する。両方のブランチの機能を組み合わせるために、注意深い融合モジュールが適用される。 RADIalデータセットを用いた実験により,SparseRadNetはオブジェクト検出における最先端(SOTA)性能を超え,空間分割におけるSOTA精度に近づき,スパースサブサンプル入力データを用いた。 Radar-based perception has gained increasing attention in autonomous driving, yet the inherent sparsity of radars poses challenges. Radar raw data often contains excessive noise, whereas radar point clouds retain only limited information. In this work, we holistically treat the sparse nature of radar data by introducing an adaptive subsampling method together with a tailored network architecture that exploits the sparsity patterns to discover global and local dependencies in the radar signal. Our subsampling module selects a subset of pixels from range-doppler (RD) spectra that contribute most to the downstream perception tasks. To improve the feature extraction on sparse subsampled data, we propose a new way of applying graph neural networks on radar data and design a novel two-branch backbone to capture both global and local neighbor information. An attentive fusion module is applied to combine features from both branches. Experiments on the RADIal dataset show that our SparseRadNet exceeds state-of-the-art (SOTA) performance in object detection and achieves close to SOTA accuracy in freespace segmentation, meanwhile using sparse subsampled input data.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# WebCanvas: オンライン環境におけるWebエージェントのベンチマーク WebCanvas: Benchmarking Web Agents in Online Environments ( http://arxiv.org/abs/2406.12373v3 ) ライセンス: Link先を確認	Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu,	(参考訳) Webエージェントが実用的に有用であるためには、ユーザインターフェースやコンテンツへの頻繁な更新を特徴とする、継続的な進化するWeb環境に適応する必要がある。しかし、既存のベンチマークのほとんどは、Webの静的な側面のみをキャプチャしている。このギャップを埋めるために、WebCanvasはWebエージェントのための革新的なオンライン評価フレームワークであり、Webインタラクションの動的な性質を効果的に解決する。現実的な評価を促進するために, WebCanvas には3つの主要な要素がある。(1) 重要な中間動作やタスク完了に必要な状態を確実に捉えつつ,重要イベントや変更された Web 要素によるノイズを無視した,新たな評価指標。 2) Mind2Web-Liveと呼ばれるベンチマークデータセットは、オリジナルのMind2Web静的データセットの洗練されたバージョンで、2439の中間評価状態を持つ542のタスクを含む。 WebCanvas上に構築したエージェントフレームワークは,推論のための拡張可能なモジュールを備えたオープンソースであり,コミュニティがオンライン推論と評価を行うための基盤を提供する。ベストパフォーマンスエージェントは,Mind2Web-Liveテストセット上でのタスク成功率23.1%,タスク完了率48.8%を達成する。さらに,様々なWebサイト,ドメイン,実験環境におけるパフォーマンスの相違について分析する。我々は、オンラインエージェント評価に関するさらなる知見をコミュニティに提供し、この研究分野を前進させることを奨励する。 For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions. WebCanvas contains three main components to facilitate realistic assessments: (1) A novel evaluation metric which reliably capture critical intermediate actions or states necessary for task completions while disregarding noise caused by insignificant events or changed web-elements. (2) A benchmark dataset called Mind2Web-Live, a refined version of original Mind2Web static dataset containing 542 tasks with 2439 intermediate evaluation states; (3) Lightweight and generalizable annotation tools and testing pipelines that enables the community to collect and maintain the high-quality, up-to-date dataset. 
Building on WebCanvas, we open-source an agent framework with extensible modules for reasoning, providing a foundation for the community to conduct online inference and evaluations. Our best-performing agent achieves a task success rate of 23.1% and a task completion rate of 48.8% on the Mind2Web-Live test set. Additionally, we analyze the performance discrepancies across various websites, domains, and experimental environments. We encourage the community to contribute further insights on online agent evaluation, thereby advancing this field of research.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# QOG:言語モデルに基づくクエスチョンとオプション生成 QOG:Question and Options Generation based on Language Model ( http://arxiv.org/abs/2406.12381v3 ) ライセンス: Link先を確認	Jincheng Zhou,	(参考訳) 質問-オプション生成(QOG)は、与えられたコンテキストから一連の質問-オプションペアを生成するタスクである。このタスクには、大規模モデルの微調整、情報検索、教育用複数選択質問の自動生成など、さまざまな応用がある。本稿では,シーケンス・ツー・シーケンス言語モデル(LM)の微調整に基づく3つの異なる手法を用いてQOGモデルを開発する。実験により、エンドツーエンドのQOGモデルは、トレーニングと推論の両方において計算効率が良く、安定であり、他の手法よりも優れていることが示された。さらに,我々のQOGモデルは,大規模言語モデルであるLlama 3-8Bと比較して,QOGタスクにおいて競争力があることを示す。 Question-Options Generation (QOG) is a task that involves generating a set of question-options pairs given a context. This task has various applications, including fine-tuning large models, information retrieval, and automated multiple-choice question generation for education. In this paper, we develop QOG models using three different methods based on fine-tuning sequence-to-sequence language models (LMs). Experiments demonstrate that the end-to-end QOG model is computationally efficient and stable during both training and inference, outperforming other methods. Furthermore, our analysis indicates that our QOG models are competitive on the QOG task compared to the large language model Llama 3-8B.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# ボソニックボゴリューボフ準粒子の量子幾何学 Quantum geometry of bosonic Bogoliubov quasiparticles ( http://arxiv.org/abs/2406.12981v2 ) ライセンス: Link先を確認	Isaac Tesfaye, André Eckardt,	(参考訳) ボソニックなBogoliubov-de Gennes(BdG)系で生じる位相的および幾何学的特徴は、ベリー曲率の一般化されたシンプレクティック版と関連するチャーン数を用いて主に研究されている。ここではシンプレクティック量子幾何テンソル(SQGT)を提案し、その虚部が以前に研究されたシンプレクティックベリー曲率を導く一方、実部はシンプレクティック量子計量を生じさせ、ボゴリューボフモードの空間における自然な距離測度を与える。本稿では,SQGTのパラメータの周期的変調に応答して励起率を抽出し,SQGTのすべての成分を測定する方法を提案する。さらに、シンプレクティックベリー曲率をボゴリューボフ・ブロッホ波パケットの一般化されたシンプレクティック異常速度項に接続する。ボソニックなボゴリューボフ・ハルダンモデルについて実験を行った。 Topological and geometrical features arising in bosonic Bogoliubov-de Gennes (BdG) systems have mainly been studied by utilizing a generalized symplectic version of the Berry curvature and related Chern numbers. Here, we propose a symplectic quantum geometric tensor (SQGT), whose imaginary part leads to the previously studied symplectic Berry curvature, while the real part gives rise to a symplectic quantum metric, providing a natural distance measure in the space of bosonic Bogoliubov modes. We propose how to measure all components of the SQGT by extracting excitation rates in response to periodic modulations of the systems' parameters. Moreover, we connect the symplectic Berry curvature to a generalized symplectic anomalous velocity term for Bogoliubov Bloch wave packets. We test our results for a bosonic Bogoliubov-Haldane model.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# 臨床適応を考慮した生体医用ビジュアルインストラクションチューニング Biomedical Visual Instruction Tuning with Clinician Preference Alignment ( http://arxiv.org/abs/2406.13173v3 ) ライセンス: Link先を確認	Hejie Cui, Lingjun Mao, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang,	(参考訳) マルチモーダル基礎モデルの最近の進歩は、視覚情報やテキスト情報による理解と推論において、印象的な能力を示した。これらの基礎モデルをバイオメディシンのような特殊なドメインに適用するには、大規模なドメイン固有の命令データセットが必要である。既存の作業では、そのようなデータセットを自動的にキュレーションする方法が検討されているが、結果のデータセットは、ドメインの専門知識と明確に一致していない。本研究では,臨床医の嗜好をバイオメディカル・マルチモーダル基礎モデルのチューニングのための指導データの生成と選択の両段階に組み込むデータ中心型ビオメディカル・ビジュアル・インストラクション・チューニング(BioMed-VITAL)を提案する。まず,GPT-4Vジェネレータに,好みに整合したデータ候補生成のための多種多様なクリニック選択による実演を誘導する。そして、選択期間中に、臨床医と政策指導を受けたモデルの選別を評価関数に明示的に蒸留して、医用指導のための高品質なデータを選択する別個の選別モデルを訓練する。その結果,提案手法から得られた指示追従データに調整したモデルでは,オープン・ビジュアル・チャット(18.5%)と医療用VQA(81.73%)の大幅な改善が見られた。 BioMed-VITAL.github.ioでは、インストラクション追跡データとモデルが利用可能です。 Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, Biomedical Visual Instruction Tuning with Clinician Preference Alignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. 
Then, during the selection phase, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (18.5% relatively) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at BioMed-VITAL.github.io.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# 空間ボット:視覚言語モデルを用いた精密空間理解 SpatialBot: Precise Spatial Understanding with Vision Language Models ( http://arxiv.org/abs/2406.13642v3 ) ライセンス: Link先を確認	Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao,	(参考訳) 視覚言語モデル(VLM)は2次元画像理解において目覚ましい性能を達成しているが、Embodied AIの基盤である空間的理解に苦慮している。本稿では,RGB画像と深度画像の両方をフィードすることで,空間的理解を向上させるためのSpatialBotを提案する。さらに、深度理解のためのVLMを訓練するために、多段階の深度関連質問を含むSpatialQAデータセットを構築した。最後に、異なるレベルでの空間理解におけるVLMの能力を総合的に評価するために、SpatialBenchを提案する。我々の空間理解ベンチマーク、一般的なVLMベンチマーク、Embodied AIタスクに関する大規模な実験は、SpatialQAでトレーニングされたSpatialBotの顕著な改善を実証している。モデル、コード、データはhttps://github.com/BAAI-DCAI/SpatialBotで入手できる。 Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding, however they are still struggling with spatial understanding which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial understanding by feeding both RGB and depth images. Additionally, we have constructed the SpatialQA dataset, which involves multi-level depth-related questions to train VLMs for depth understanding. Finally, we present SpatialBench to comprehensively evaluate VLMs' capabilities in spatial understanding at different levels. Extensive experiments on our spatial-understanding benchmark, general VLM benchmarks and Embodied AI tasks, demonstrate the remarkable improvements of SpatialBot trained on SpatialQA. The model, code and data are available at https://github.com/BAAI-DCAI/SpatialBot.	翻訳日:2024-07-17 20:39:37 公開日:2024-07-16
# REVEAL-IT:InTerpretabilityのための進化エージェントpoLicyの可視性を用いた強化学習 REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability ( http://arxiv.org/abs/2406.14214v4 ) ライセンス: Link先を確認	Shuang Ao, Simon Khan, Haris Aziz, Flora D. Salim,	(参考訳) エージェントの学習過程、特にその成功や訓練後の失敗に寄与する要因を理解することは、エージェントの意思決定プロセスの背後にある根拠を理解するために重要である。従来の手法では、構造因果モデル(SCM)を作成したり、価値関数の分布を視覚的に表現することで学習過程を明らかにする。しかしながら、これらのアプローチは2次元環境や複雑でない遷移力学でのみ機能するので制約がある。複雑な環境やタスクでエージェントの学習プロセスを理解することはより難しい。本稿では,複雑な環境下でエージェントの学習過程を説明するための新しいフレームワークであるREVEAL-ITを提案する。まず,様々な学習課題に対する政策構造とエージェントの学習過程を可視化する。これらの知見を可視化することにより、特定のトレーニングタスクやステージがテストにおけるエージェントのパフォーマンスにどの程度影響するかを理解することができる。そして、GNNベースの説明者がポリシーの最も重要な部分を強調することを学び、エージェントの学習プロセスについてより明確で堅牢な説明を提供する。実験により,本フレームワークから導出した説明は,学習効率の向上と最終性能の向上に有効であることが示された。 Understanding the agent's learning process, particularly the factors that contribute to its success or failure post-training, is crucial for comprehending the rationale behind the agent's decision-making process. Prior methods clarify the learning process by creating a structural causal model (SCM) or visually representing the distribution of value functions. Nevertheless, these approaches have constraints as they exclusively function in 2D-environments or with uncomplicated transition dynamics. Understanding the agent's learning process in complicated environments or tasks is more challenging. In this paper, we propose REVEAL-IT, a novel framework for explaining the learning process of an agent in complex environments. Initially, we visualize the policy structure and the agent's learning process for various training tasks. By visualizing these findings, we can understand how much a particular training task or stage affects the agent's performance in test. Then, a GNN-based explainer learns to highlight the most important section of the policy, providing a more clear and robust explanation of the agent's learning process. 
The experiments demonstrate that explanations derived from this framework can effectively help in the optimization of the training tasks, resulting in improved learning efficiency and final performance.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
# アップロード可能な機械学習のためのLoRAエキスパートの検索・拡張混合 Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning ( http://arxiv.org/abs/2406.16989v2 ) ライセンス: Link先を確認	Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu,	(参考訳) Low-Rank Adaptation (LoRA)は、大規模言語モデル(LLM)を微調整する効率的な方法を提供する。モジュール性とプラグアンドプレイ性により、様々なドメイン固有のLoRAの統合が可能になり、LLMの能力が向上する。 HuggingfaceやModelscopeのようなオープンソースのプラットフォームは、新しい計算パラダイムであるUploadable Machine Learning (UML)を導入した。 UMLでは、コントリビュータは専用のアダプタをトレーニングするために分散データを使用し、LLMを改善するために中央プラットフォームにアップロードされる。このプラットフォームでは、ドメイン固有のアダプタを使用して、パーソナライズされたサービスを必要とする混合タスク要求を処理する。 LoRAの以前の研究は、特定のタスクに焦点を当てたり、トレーニング中のLoRAの選択を修正したりしていた。しかしUMLでは、LoRAのプールは動的に更新され、新しいアップロードが加えられる。さらに、ダウンストリームリクエストの混在する性質は、パーソナライズされたサービスを必要とします。これらの課題に対処するために、入力プロンプトに基づいて複数のLoRAを適応的に検索・構成するフレームワークであるLora Experts (RAMoLE)を提案する。 RAMoLEには、関連するLoRAを特定して検索するLoraRetriever、取得したLoRAをコーディネートするオンザフライのMoLEメカニズム、異種リクエストを処理するための効率的なバッチ推論の3つの主要コンポーネントがある。実験の結果、RAMoLEはベースラインを一貫して上回り、その有効性とスケーラビリティを強調している。 Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to train specialized adapters, which are then uploaded to a central platform to improve LLMs. This platform uses these domain-specific adapters to handle mixed-task requests requiring personalized service. Previous research on LoRA composition either focuses on specific tasks or fixes the LoRA selection during training. However, in UML, the pool of LoRAs is dynamically updated with new uploads, requiring a generalizable selection mechanism for unseen LoRAs. Additionally, the mixed-task nature of downstream requests necessitates personalized services. 
To address these challenges, we propose Retrieval-Augmented Mixture of LoRA Experts (RAMoLE), a framework that adaptively retrieves and composes multiple LoRAs based on input prompts. RAMoLE has three main components: LoraRetriever for identifying and retrieving relevant LoRAs, an on-the-fly MoLE mechanism for coordinating the retrieved LoRAs, and efficient batch inference for handling heterogeneous requests. Experimental results show that RAMoLE consistently outperforms baselines, highlighting its effectiveness and scalability.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
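The RAMoLE abstract above says that relevant LoRAs are identified and retrieved from input prompts. A minimal sketch of one common way to do this, assuming each adapter and each prompt is already represented by an embedding vector and that relevance is measured by cosine similarity (the abstract does not specify how LoraRetriever scores adapters; `cosine` and `retrieve_adapters` are hypothetical names):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve_adapters(prompt_vec, adapter_vecs, k):
    """Return the names of the k adapters whose embeddings best match the prompt."""
    ranked = sorted(
        adapter_vecs,
        key=lambda name: cosine(prompt_vec, adapter_vecs[name]),
        reverse=True,
    )
    return ranked[:k]
```

Because the pool is just a dictionary from adapter name to embedding, a newly uploaded adapter can be added without retraining anything, which is the property the abstract asks of a selection mechanism for unseen LoRAs.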
# 反断熱駆動による超低温原子によるNOON状態の加速生成 Accelerated creation of NOON states with ultracold atoms via counterdiabatic driving ( http://arxiv.org/abs/2406.17545v3 ) ライセンス: Link先を確認	Simon Dengis, Sandro Wimberger, Peter Schlagheck,	(参考訳) 量子制御プロトコルは、2つのモードでN$ウルトラコールドボソニック原子を持つNOON状態を生成するために提案され、コヒーレントな重ね合わせ $\vert N,0\rangle + \vert 0,N\rangle$ に対応する。この状態は、最初に全てのボソンが配置され、他の2つのモードと対称に結合された第3モードを用いて作成することができる。この第3モードのエネルギーを他のモードのエネルギーレベルに調整することで、NOON状態の断熱的な生成が可能になる。通常、このプロセスは実用性には時間がかかりすぎるが、関連するスペクトルギャップの小さいため、効率的なギャップ工学を可能にする反断熱駆動によって劇的に加速することができる。このプロセスは、超低温量子ガスで実験的に実現可能な静的パラメータ適応の観点で実装可能であることを実証する。要求されるプロトコル速度における利得因子は、関与する原子の数と指数関数的に増加し、したがって、この断熱遷移の根底にある指数関数的に遅い集団トンネル過程と相反する。 A quantum control protocol is proposed for the creation of NOON states with $N$ ultracold bosonic atoms on two modes, corresponding to the coherent superposition $\vert N,0\rangle + \vert 0,N\rangle$. This state can be prepared by using a third mode where all bosons are initially placed and which is symmetrically coupled to the two other modes. Tuning the energy of this third mode across the energy level of the other modes allows the adiabatic creation of the NOON state. While this process normally takes too much time to be of practical usefulness, due to the smallness of the involved spectral gap, it can be drastically boosted through counterdiabatic driving which allows for efficient gap engineering. We demonstrate that this process can be implemented in terms of static parameter adaptations that are experimentally feasible with ultracold quantum gases. Gain factors in the required protocol speed are obtained that increase exponentially with the number of involved atoms and thus counterbalance the exponentially slow collective tunneling process underlying this adiabatic transition.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
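The NOON-state abstract above writes the target superposition without a normalization factor. For reference, the normalized form is given below, assuming orthonormal two-mode Fock states and the relative phase convention used in the abstract:

```latex
\vert \mathrm{NOON} \rangle
  = \frac{1}{\sqrt{2}} \left( \vert N,0 \rangle + \vert 0,N \rangle \right),
\qquad
\langle N,0 \vert 0,N \rangle = 0 \quad (N \ge 1).
```

The two components are orthogonal for any $N \ge 1$, so the prefactor $1/\sqrt{2}$ gives unit norm.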
# 小結晶の接地状態と接地状態の分離 Fast Ground State to Ground State Separation of Small Ion Crystals ( http://arxiv.org/abs/2406.17750v3 ) ライセンス: Link先を確認	Tyler H. Guglielmo, Dietrich Leibfried, Stephen B. Libby, Daniel H. Slichter,	(参考訳) 捕捉されたイオンの線形結晶を異なるサブセットに素早く分離することは、捕捉されたイオン量子コンピューティングアーキテクチャを実現する上で重要である。我々は,同種結晶と混合種結晶のより小さな部分集合への分離を記述するのに使用できる一般的な理論的枠組みを紹介する。この枠組みは二次ハミルトニアンの下でのガウス運動状態の進化の効率的な記述に依存しており、時間依存の応用ポテンシャルとイオンの相互クーロン反発の影響の下で量子進化を記述するために、イオンの古典的な運動方程式の特別な解のみを必要とする。本研究では, 混合種3イオン結晶の分離に適した時間依存性応用電位について, クーロン反発による自由膨張と同様の時間スケールで示し, 結晶軸に沿った全てのモードが基底状態に近づき, 終了することを示す。 3つの分離された混合種イオンは、この分離過程の時間反転によってエネルギーのゲインなしで1つの井戸に保持される結晶に結合することができる。 Rapid separation of linear crystals of trapped ions into different subsets is critical for realizing trapped ion quantum computing architectures where ions are rearranged in trap arrays to achieve all-to-all connectivity between qubits. We introduce a general theoretical framework that can be used to describe the separation of same-species and mixed-species crystals into smaller subsets. The framework relies on an efficient description of the evolution of Gaussian motional states under quadratic Hamiltonians that only requires a special solution of the classical equations of motion of the ions to describe their quantum evolution under the influence of a time-dependent applied potential and the ions' mutual Coulomb repulsion. We provide time-dependent applied potentials suitable for separation of a mixed species three-ion crystal on timescales similar to that of free expansion driven by Coulomb repulsion, with all modes along the crystal axis starting and ending close to their ground states. Three separately-confined mixed species ions can be combined into a crystal held in a single well without energy gain by time-reversal of this separation process.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
# UniRec:シーケンスレコメンデーションにおける均一性と頻度の二重強化 UniRec: A Dual Enhancement of Uniformity and Frequency in Sequential Recommendations ( http://arxiv.org/abs/2406.18470v3 ) ライセンス: Link先を確認	Yang Liu, Yitong Wang, Chenyue Feng,	(参考訳) ユーザのインタラクションパターンを正確にモデル化し、レコメンデーション精度を向上させるためには、シーケンシャルなレコメンデーションでの表現学習が重要である。しかし、既存のアプローチは主にアイテム間遷移を強調しており、しばしば行動パターンの変化と密接に関連する相互作用間の時間間隔を無視している。さらに、アイテム頻度などのより広範な相互作用属性は、しばしば見過ごされる。その結果,より均一な時間間隔を持つシーケンスと頻度の高いアイテムの両方で予測性能が向上することが判明した。逆に、一様でないシーケンスはユーザーの関心のドリフトを悪化させ、頻度の低いアイテムはサンプリングが疎であるためモデル化が困難であり、現在の手法では十分に対処されていない固有の課題が提示される。本稿では,新しい双方向拡張シーケンシャルレコメンデーション手法であるUniRecを提案する。 UniRecは、シーケンスの均一性とアイテムの頻度を活用してパフォーマンスを高め、特に一様でないシーケンスや頻度の低いアイテムの表現を改善している。これら2つのブランチは相互に強化され、複雑なシーケンシャルなレコメンデーションシナリオにおける包括的なパフォーマンス最適化を推進します。さらに,適応性をさらに向上する多次元時間モジュールを提案する。我々の知る限り、UniRecは特徴増強のための均一性と頻度の特性を利用する最初の方法である。 4つのデータセットにまたがる11の高度なモデルと比較して、UniRecがSOTAモデルを大幅に上回っていることを示す。コードは https://github.com/Linxi000/UniRec で入手できる。 Representation learning in sequential recommendation is critical for accurately modeling user interaction patterns and improving recommendation precision. However, existing approaches predominantly emphasize item-to-item transitions, often neglecting the time intervals between interactions, which are closely related to behavior pattern changes. Additionally, broader interaction attributes, such as item frequency, are frequently overlooked. We found that both sequences with more uniform time intervals and items with higher frequency yield better prediction performance. Conversely, non-uniform sequences exacerbate user interest drift and less-frequent items are difficult to model due to sparse sampling, presenting unique challenges inadequately addressed by current methods. In this paper, we propose UniRec, a novel bidirectional enhancement sequential recommendation method. UniRec leverages sequence uniformity and item frequency to enhance performance, particularly improving the representation of non-uniform sequences and less-frequent items. 
These two branches mutually reinforce each other, driving comprehensive performance optimization in complex sequential recommendation scenarios. Additionally, we present a multidimensional time module to further enhance adaptability. To the best of our knowledge, UniRec is the first method to utilize the characteristics of uniformity and frequency for feature augmentation. Comparing with eleven advanced models across four datasets, we demonstrate that UniRec outperforms SOTA models significantly. The code is available at https://github.com/Linxi000/UniRec.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
# 人間-AI協調型分類体系の構築--専門的な書記アシスタントを事例として Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants ( http://arxiv.org/abs/2406.18675v2 ) ライセンス: Link先を確認	Minhwa Lee, Zae Myung Kim, Vivek Khetan, Dongyeop Kang,	(参考訳) LLM(Large Language Models)は、テキストのリビジョンやストーリー生成など、複数の作業において人間を支援する。しかし、ドメイン固有の記述、特にビジネスコンテキストにおけるサポートの有効性は、比較的調査されていない。業界専門家とのフォーマティブな研究により、このようなドメイン固有の文章のニュアンスに対する現在のLLMの理解の限界が明らかになった。このギャップに対処するため、我々は、ドメイン固有書記アシスタントのガイドラインとして機能する人間-AI協調分類開発手法を提案する。この手法は、ドメインの専門家からの反復的なフィードバックと、これらの専門家とLSM間の複数の相互作用を統合し、分類学を洗練させる。大規模な実験を通じて、我々はこの方法論を検証し、LCMを活用した筆記支援を改善することを目指しており、異なる利害関係者のニーズのユニークな要件を満たすように調整している。 Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific writing. To address this gap, we propose an approach of human-AI collaborative taxonomy development to perform as a guideline for domain-specific writing assistants. This method integrates iterative feedback from domain experts and multiple interactions between these experts and LLMs to refine the taxonomy. Through larger-scale experiments, we aim to validate this methodology and thus improve LLM-powered writing assistance, tailoring it to meet the unique requirements of different stakeholder needs.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
# Zボソンの仮想励起から生じるニュートリノ振動 Neutrino oscillations originate from virtual excitation of Z bosons ( http://arxiv.org/abs/2407.00954v2 ) ライセンス: Link先を確認	Shi-Biao Zheng,	(参考訳) ニュートリノ振動を説明するために、ニュートリノは無消滅質量を持ち、各フレーバー固有状態は3つの異なる質量固有状態によって形成され、確率振幅はその伝播中に互いに干渉する。しかし、エネルギー保存法則は、もし存在するならば、ニュートリノと同じ弱い相互作用によって生成された他の粒子の異なる結合エネルギー固有状態と絡み合わなければならない。この絡み合いによってニュートリノの質量固有状態間の量子コヒーレンスが破壊され、前述の仮定の下でのフレーバーの振動の原因となる。ニュートリノ振動は、実際に空間上を拡散するZボゾン場の仮想励起に由来する。伝播中、ニュートリノは継続的に励起し、すぐに仮想Zボソンを吸収する。この仮想ボゾン励起はニュートリノに逆作用を起こし、3つのフレーバーの間で振動する。ニュートリノが物質中に伝播するとき、その挙動は散乱に起因するコヒーレントフレーバー変換とデコヒーレンス効果の競合によって決定される。 To account for neutrino oscillations, it is postulated that the neutrino has nonvanishing mass and each flavor eigenstate is formed by three distinct mass eigenstates, whose probability amplitudes interfere with each other during its propagation. However, I find that the energy conservation law requires these mass eigenstates, if they exist, to be entangled with distinct joint energy eigenstates of the other particles produced by the same weak interaction as the neutrino. This entanglement destroys the quantum coherence among the neutrino's mass eigenstates, which is responsible for flavor oscillations under the aforementioned postulation. I reveal that the neutrino oscillations actually originate from virtual excitation of the Z bosonic field diffusing over the space. During the propagation, the neutrino can continually excite and then immediately re-absorb a virtual Z boson. This virtual bosonic excitation produces a backaction on the neutrino, enabling it to oscillate among three flavors. When the neutrino propagates in matter, its behavior is determined by the competition between the coherent flavor transformation and decoherence effect resulting from scatterings.	翻訳日:2024-07-17 20:29:52 公開日:2024-07-16
# Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints ( http://arxiv.org/abs/2407.01991v2 ) License: see linked page	Kazumi Kasaura	To find the shortest paths for all pairs on manifolds with infinitesimally defined metrics, we propose generating them by recursively predicting midpoints, together with an actor-critic method for learning midpoint prediction. We prove the soundness of our approach and show experimentally that the proposed method outperforms existing methods on both local and global path planning tasks.	Translated: 2024-07-17 20:29:52 Published: 2024-07-16
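The recursive midpoint idea in the entry above can be illustrated with a small sketch. This is not the authors' implementation: `euclidean_midpoint` is a hypothetical stand-in for the learned midpoint predictor, and the names `generate_path` and `depth` are assumptions made for this example only.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean_midpoint(a: Point, b: Point) -> List[float]:
    """Hypothetical stand-in for a learned midpoint predictor (flat space only)."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def generate_path(
    start: Point,
    goal: Point,
    predict_midpoint: Callable[[Point, Point], List[float]],
    depth: int,
) -> List[List[float]]:
    """Build a path by recursively inserting predicted midpoints.

    depth=0 returns just the two endpoints; each extra level doubles
    the number of segments.
    """
    if depth == 0:
        return [list(start), list(goal)]
    mid = predict_midpoint(start, goal)
    left = generate_path(start, mid, predict_midpoint, depth - 1)
    right = generate_path(mid, goal, predict_midpoint, depth - 1)
    # Drop the duplicated midpoint when joining the two halves.
    return left + right[1:]


path = generate_path([0.0, 0.0], [1.0, 1.0], euclidean_midpoint, depth=2)
```

With a learned predictor in place of the Euclidean stand-in, the same recursion would produce waypoints that follow the manifold's metric rather than a straight line.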
# Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion ( http://arxiv.org/abs/2407.02077v3 ) License: see linked page	Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng	Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. The existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work involves decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. Firstly, to separate critical relevant context from redundant information, we introduce the pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on initially identified locations with high affinity and their neighboring relevant regions. Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU on the OpenOccupancy benchmark. Our code is available at https://github.com/Arlo0o/HTCL.	Translated: 2024-07-17 20:29:52 Published: 2024-07-16
# TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference ( http://arxiv.org/abs/2407.02095v2 ) License: see linked page	Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng	Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, designed to effectively handle Python's diverse type categories. TIGER leverages fine-tuned pre-trained code models to train a generative model with a span masking objective and a similarity model with a contrastive training objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics in the generating stage, and accurately rank them with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories, notably improving accuracy in inferring user-defined and unseen types by 11.2% and 20.1%, respectively, in Top-5 Exact Match. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency, but also underscore the significance of its generating and ranking stages in enhancing automated type inference.	Translated: 2024-07-17 20:29:52 Published: 2024-07-16
# Improving Explainability of Softmax Classifiers Using a Prototype-Based Joint Embedding Method ( http://arxiv.org/abs/2407.02271v2 ) License: see linked page	Hilarie Sit, Brendan Keith, Karianne Bergen	We propose a prototype-based approach for improving explainability of softmax classifiers that provides an understandable prediction confidence, generated through stochastic sampling of prototypes, and demonstrates potential for out-of-distribution (OOD) detection. By modifying the model architecture and training to make predictions using similarities to any set of class examples from the training dataset, we acquire the ability to sample for prototypical examples that contributed to the prediction, which provide an instance-based explanation for the model's decision. Furthermore, by learning relationships between images from the training dataset through relative distances within the model's latent space, we obtain a metric for uncertainty that is better able to detect out-of-distribution data than softmax confidence.	Translated: 2024-07-17 20:29:52 Published: 2024-07-16
# Black box work extraction and composite hypothesis testing ( http://arxiv.org/abs/2407.03400v2 ) License: see linked page	Kaito Watanabe, Ryuji Takagi	Work extraction is one of the most central processes in quantum thermodynamics. However, the prior analysis of optimal extractable work has been restricted to a limited operational scenario where complete information about the initial state is given. Here, we introduce a general framework of black box work extraction, which addresses the inaccessibility of information on the initial state. We show that the optimal extractable work in the black box setting is completely characterized by the performance of a composite hypothesis testing task, a fundamental problem in information theory. We employ this general relation to reduce the asymptotic black box work extraction to the quantum Stein's lemma in composite hypothesis testing, allowing us to provide their exact characterization in terms of the Helmholtz free energy. We also show a new quantum Stein's lemma motivated by this physical setting, where a composite hypothesis contains a certain correlation. Our work exhibits the importance of information about the initial state and gives a new interpretation of the quantities in composite quantum hypothesis testing, encouraging the interplay between physical settings and information theory.	Translated: 2024-07-17 20:29:52 Published: 2024-07-16
# Improving LLM Abilities in Idiomatic Translation ( http://arxiv.org/abs/2407.03518v2 ) License: see linked page	Sundesh Donthi, Maximilian Spencer, Om Patel, Joon Doh, Eid Rodan	For large language models (LLMs) like NLLB and GPT, translating idioms remains a challenge. Our goal is to enhance translation fidelity by improving LLM processing of idiomatic language while preserving the original linguistic style. This has a significant social impact, as it preserves cultural nuances and ensures translated texts retain their intent and emotional resonance, fostering better cross-cultural communication. Previous work has utilized knowledge bases like IdiomKB by providing the LLM with the meaning of an idiom to use in translation. Although this method yielded better results than a direct translation, it is still limited in its ability to preserve idiomatic writing style across languages. In this research, we expand upon the knowledge base to find corresponding idioms in the target language. Our research performs translations using two methods. The first method employs the SentenceTransformers model to generate cosine similarity scores between the meanings of the original and target language idioms, selecting the best idiom (Cosine Similarity method). The second method uses an LLM to find a corresponding idiom in the target language for use in the translation (LLM-generated idiom method). As a baseline, we performed a direct translation without providing additional information. Human evaluations on English -> Chinese and Chinese -> English translation show that the Cosine Similarity Lookup method outperformed the others in all GPT4o translations. To further build upon IdiomKB, we developed a low-resource Urdu dataset containing Urdu idioms and their translations. Despite dataset limitations, the Cosine Similarity Lookup method shows promise, potentially overcoming language barriers and enabling the exploration of diverse literary works in Chinese and Urdu.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
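The cosine-similarity lookup described in the entry above can be sketched roughly as follows. The embeddings here are toy vectors and the candidate keys are placeholders; in the paper the vectors come from a SentenceTransformers model applied to idiom meanings, which this dependency-free sketch does not reproduce. The function names are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matching_idiom(
    source_meaning_vec: Sequence[float],
    candidate_meaning_vecs: Dict[str, Sequence[float]],
) -> str:
    """Return the candidate idiom whose meaning embedding is closest to the source."""
    return max(
        candidate_meaning_vecs,
        key=lambda idiom: cosine_similarity(source_meaning_vec, candidate_meaning_vecs[idiom]),
    )


# Toy, made-up embeddings standing in for encoded idiom meanings.
source_vec = [1.0, 0.0, 0.2]
candidates = {
    "target_idiom_a": [0.9, 0.1, 0.3],
    "target_idiom_b": [0.0, 1.0, 0.0],
}
chosen = best_matching_idiom(source_vec, candidates)
```

The selected idiom would then be supplied to the LLM alongside the sentence to translate; how that prompt is phrased is not specified in the abstract.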
# Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier ( http://arxiv.org/abs/2407.04036v2 ) License: see linked page	Prantik Howlader, Srijan Das, Hieu Le, Dimitris Samaras	Incorporating pixel contextual information is critical for accurate segmentation. In this paper, we show that an effective way to incorporate contextual information is through a patch-based classifier. This patch classifier is trained to identify classes present within an image region, which facilitates the elimination of distractors and enhances the classification of small object segments. Specifically, we introduce Multi-scale Patch-based Multi-label Classifier (MPMC), a novel plug-in module designed for existing semi-supervised segmentation (SSS) frameworks. MPMC offers patch-level supervision, enabling the discrimination of pixel regions of different classes within a patch. Furthermore, MPMC learns an adaptive pseudo-label weight, using patch-level classification to alleviate the impact of the teacher's noisy pseudo-label supervision on the student. This lightweight module can be integrated into any SSS framework, significantly enhancing its performance. We demonstrate the efficacy of our proposed MPMC by integrating it into four SSS methodologies and improving them across two natural image datasets and one medical segmentation dataset, notably improving the segmentation results of the baselines across all three datasets.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# Knowledge-based Drug Samples' Comparison ( http://arxiv.org/abs/2407.04317v2 ) License: see linked page	Sébastien Guillemin, Ana Roxin, Laurence Dujourdy, Ludovic Journaux	Drug sample comparison is a process used by the French National Police to identify drug distribution networks. The current approach is based on manual comparison done by forensic experts. In this article, we present our approach to acquire, formalise, and specify expert knowledge to improve the current process. For modelling the underlying knowledge, we use an ontology coupled with logical rules. The different steps of our approach are designed to be reused in other application domains. The results obtained are explainable, making them usable by experts in different fields.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games ( http://arxiv.org/abs/2407.04467v2 ) License: see linked page	Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li	Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or prompt changes. In this work we investigate LLMs' behaviour in strategic games, Stag Hunt and Prisoner's Dilemma, analyzing performance variations under different settings and prompts. Our results show that the tested state-of-the-art LLMs exhibit at least one of the following systematic biases: (1) positional bias, (2) payoff bias, or (3) behavioural bias. Subsequently, we observed that the LLMs' performance drops when the game configuration is misaligned with the affecting biases. Performance is assessed based on the selection of the correct action, one which agrees with the prompted preferred behaviours of both players. Alignment refers to whether the LLM's bias aligns with the correct action. For example, GPT-4o's average performance drops by 34% when misaligned. Additionally, the current trend of "bigger and newer is better" does not hold for the above, where GPT-4o (the current best-performing LLM) suffers the most substantial performance drop. Lastly, we note that while chain-of-thought prompting does reduce the effect of the biases on most models, it is far from solving the problem at the fundamental level.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# LoRA-GA: Low-Rank Adaptation with Gradient Approximation ( http://arxiv.org/abs/2407.05000v2 ) License: see linked page	Shaowen Wang, Linxi Yu, Jian Li	Fine-tuning large-scale pretrained models is prohibitively expensive in terms of computational and memory costs. LoRA, as one of the most popular Parameter-Efficient Fine-Tuning (PEFT) methods, offers a cost-effective alternative by fine-tuning an auxiliary low-rank model that has significantly fewer parameters. Although LoRA reduces the computational and memory requirements significantly at each iteration, extensive empirical evidence indicates that it converges at a considerably slower rate compared to full fine-tuning, ultimately leading to increased overall compute and often worse test performance. In our paper, we perform an in-depth investigation of the initialization method of LoRA and show that careful initialization (without any change of the architecture and the training algorithm) can significantly enhance both efficiency and performance. In particular, we introduce a novel initialization method, LoRA-GA (Low Rank Adaptation with Gradient Approximation), which aligns the gradients of the low-rank matrix product with those of full fine-tuning at the first step. Our extensive experiments demonstrate that LoRA-GA achieves a convergence rate comparable to that of full fine-tuning (hence being significantly faster than vanilla LoRA as well as various recent improvements) while simultaneously attaining comparable or even better performance. For example, on the subset of the GLUE dataset with T5-Base, LoRA-GA outperforms LoRA by 5.69% on average. On larger models such as Llama 2-7B, LoRA-GA shows performance improvements of 0.34, 11.52%, and 5.05% on MT-bench, GSM8K, and Human-eval, respectively. Additionally, we observe up to 2-4 times convergence speed improvement compared to vanilla LoRA, validating its effectiveness in accelerating convergence and enhancing model performance. Code is available at https://github.com/Outsider565/LoRA-GA.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# On the importance of learning non-local dynamics for stable data-driven climate modeling: A 1D gravity wave-QBO testbed ( http://arxiv.org/abs/2407.05224v2 ) License: see linked page	Hamid A. Pahlavan, Pedram Hassanzadeh, M. Joan Alexander	Machine learning (ML) techniques, especially neural networks (NNs), have shown promise in learning subgrid-scale parameterizations for climate models. However, a major problem with data-driven parameterizations, particularly those learned with supervised algorithms, is model instability. Current remedies are often ad hoc and lack a theoretical foundation. Here, we combine ML theory and climate physics to address a source of instability in NN-based parameterization. We demonstrate the importance of learning spatially $\textit{non-local}$ dynamics using a 1D model of the quasi-biennial oscillation (QBO) with gravity wave (GW) parameterization as a testbed. While common offline metrics fail to identify shortcomings in learning non-local dynamics, we show that the concept of receptive field (RF) can identify instability a priori. We find that NN-based parameterizations that seem to accurately predict GW forcings from wind profiles ($\mathbf{R^2 \approx 0.99}$) cause unstable simulations when the RF is too small to capture the non-local dynamics, while NNs of the same size but with a large-enough RF are stable. We examine three broad classes of architectures, namely convolutional NNs, Fourier neural operators, and fully-connected NNs; the latter two have inherently large RFs. We also demonstrate that learning non-local dynamics is crucial for the stability and accuracy of a data-driven spatiotemporal emulator of the zonal wind field. Given the ubiquity of non-local dynamics in the climate system, we expect the use of effective RF, which can be computed for any NN architecture, to be important for many applications. This work highlights the necessity of integrating ML theory with physics to design and analyze data-driven algorithms for weather and climate modeling.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
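To make the receptive field notion in the entry above concrete, here is a minimal sketch that computes the theoretical RF of a stack of 1D convolution layers. Note the assumptions: this is the standard architectural (theoretical) RF formula, not the effective RF the authors refer to, which may be measured differently; the layer tuples and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Theoretical receptive field of stacked 1D convolutions.

    Each layer is (kernel_size, stride, dilation). The RF grows by
    (kernel_size - 1) * dilation input-grid steps per layer, scaled by
    the product of the strides of all earlier layers.
    """
    rf = 1    # a single output point sees one input point before any layer
    jump = 1  # spacing, in input grid points, between adjacent features
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


small_rf = receptive_field([(3, 1, 1)] * 3)                      # three plain 3-wide convs
dilated_rf = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 1)])  # middle layer dilated
```

Comparing such a number against the spatial extent of the dynamics being parameterized is one simple way to check, before training, whether a convolutional design could capture non-local dependence at all.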
# Violation of the Leggett-Garg Inequality for Dynamics of a Bose-Einstein Condensate in a Double-Well Potential ( http://arxiv.org/abs/2407.05304v2 ) License: see linked page	Tsubasa Sakamoto, Ryosuke Yoshii, Shunji Tsuchiya	The Leggett-Garg inequality (LGI) serves as a criterion to determine the adherence of macroscopic system dynamics to macrorealism, as postulated by Leggett and Garg. A violation of this inequality implies either the absence of a realistic description of the system or the impracticality of noninvasive measurements. In this work, we investigate the violation of the LGI for the system of bosons in a double-well potential. Specifically, we explore the violation of the LGI in the dynamics of bosons in a double-well potential in the Bose-Einstein condensation (BEC) regime, where the system can be considered as two weakly coupled Bose condensates, and in the single-particle regime to establish the conditions under which the violation of the LGI occurs. Our analysis reveals that the LGI is violated due to Josephson oscillations, while it remains unviolated in the strong coupling regime, attributed to the self-trapping phenomena. Notably, we observe that the violation of the LGI becomes increasingly significant as the particle number increases. These findings provide valuable insights into the macrorealistic behavior of Bose condensates and highlight the effect of measurements on the dynamics of a macroscopic system.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
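For readers unfamiliar with the LGI, the following sketch evaluates the simplest three-time form, K3 = C12 + C23 - C13, which macrorealism bounds by K3 <= 1. It assumes the textbook case of a coherently oscillating two-level system with correlator C(t) = cos(omega * t) and equally spaced measurements; this is an illustration of how oscillations can violate the bound, not the double-well BEC calculation of the entry above.

```python
import math


def lgi_k3(omega_tau: float) -> float:
    """K3 = C12 + C23 - C13 for C(t) = cos(omega * t) and equal spacing tau.

    C12 = C23 = cos(omega * tau) and C13 = cos(2 * omega * tau).
    Macrorealism requires K3 <= 1.
    """
    return 2.0 * math.cos(omega_tau) - math.cos(2.0 * omega_tau)


k3_at_max = lgi_k3(math.pi / 3.0)  # spacing that maximizes K3 in this model
is_violated = k3_at_max > 1.0
```

In this toy model the bound is exceeded for any spacing with 0 < omega * tau < pi / 2, which is the sense in which coherent oscillations are incompatible with macrorealism.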
# CPM: Class-conditional Prompting Machine for Audio-visual Segmentation ( http://arxiv.org/abs/2407.05358v2 ) License: see linked page	Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro	Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging transformer-based segmentation architecture due to its inherent ability to capture long-range dependencies and flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy combining class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is upgraded with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# OneDiff: A Generalist Model for Image Difference Captioning ( http://arxiv.org/abs/2407.05645v2 ) License: see linked page	Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu	In computer vision, Image Difference Captioning (IDC) is crucial for accurately describing variations between closely related images. Traditional IDC methods often rely on specialist models, which restrict their applicability across varied contexts. This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module. This innovative configuration allows for the precise detection and articulation of fine-grained differences between image pairs. OneDiff is trained through a dual-phase strategy, encompassing Coupled Sample Training and multi-task learning across a diverse array of data types, supported by our newly developed DiffCap Dataset. This dataset merges real-world and synthetic data, enhancing the training process and bolstering the model's robustness. Extensive testing on diverse IDC benchmarks, such as Spot-the-Diff, CLEVR-Change, and Birds-to-Words, shows that OneDiff consistently outperforms existing state-of-the-art models in accuracy and adaptability, achieving improvements of up to 85% CIDEr points on average. By setting a new benchmark in IDC, OneDiff paves the way for more versatile and effective applications in detecting and describing visual differences. The code, models, and data will be made publicly available.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# Scaling Exponents Across Parameterizations and Optimizers ( http://arxiv.org/abs/2407.05872v2 ) License: see linked page	Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington	Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parameters and data and derive new theoretical results under weaker assumptions and a broader set of optimizers. Our extensive empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, several alignment assumptions, more than a dozen learning rates, and fourteen model sizes up to 26.8B parameters. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work. Our results show that all parameterizations, not just maximal update parameterization (muP), can achieve hyperparameter transfer; moreover, our novel per-layer learning rate prescription for standard parameterization outperforms muP. Finally, we demonstrate that an overlooked aspect of parameterization, the epsilon parameter in Adam, must be scaled correctly to avoid gradient underflow and propose Adam-atan2, a new numerically stable, scale-invariant version of Adam that eliminates the epsilon hyperparameter entirely.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
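The scale invariance claimed for Adam-atan2 in the entry above can be illustrated with a scalar sketch. The assumption here is that the usual Adam ratio m / (sqrt(v) + epsilon) is replaced by atan2(m, sqrt(v)); any additional scaling constants the authors may use are omitted, and the function name and default hyperparameters are choices made for this example.

```python
import math
from typing import Tuple


def adam_atan2_step(
    param: float,
    grad: float,
    m: float,
    v: float,
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
) -> Tuple[float, float, float]:
    """One scalar update step (t is the 1-based step count).

    Uses atan2(m_hat, sqrt(v_hat)) instead of m_hat / (sqrt(v_hat) + eps),
    so no epsilon is needed and a zero gradient history yields a zero update.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias correction, as in standard Adam
    v_hat = v / (1.0 - beta2 ** t)
    update = math.atan2(m_hat, math.sqrt(v_hat))
    return param - lr * update, m, v


# The same first step with gradients that differ by twelve orders of magnitude.
p_large, _, _ = adam_atan2_step(0.0, 1.0, 0.0, 0.0, t=1)
p_tiny, _, _ = adam_atan2_step(0.0, 1e-12, 0.0, 0.0, t=1)
p_zero, _, _ = adam_atan2_step(0.0, 0.0, 0.0, 0.0, t=1)
```

Because atan2 depends only on the ratio of its arguments, rescaling every gradient by a positive constant leaves the update unchanged, whereas a fixed epsilon in standard Adam would dominate the denominator once gradients become small enough.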
# Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models ( http://arxiv.org/abs/2407.05897v2 ) License: see linked page	Reza Abbasi, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah	CLIP models have recently been shown to exhibit Out of Distribution (OoD) generalization capabilities. However, Compositional Out of Distribution (C-OoD) generalization, which is a crucial aspect of a model's ability to understand unseen compositions of known concepts, is relatively unexplored for the CLIP models. Our goal is to address this problem and identify the factors that contribute to C-OoD generalization in CLIPs. We noted that previous studies regarding compositional understanding of CLIPs frequently fail to ensure that test samples are genuinely novel relative to the CLIP training data. To this end, we carefully synthesized a large and diverse dataset in the single object setting, comprising attributes for objects that are highly unlikely to be encountered in the combined training datasets of various CLIP models. This dataset enables an authentic evaluation of C-OoD generalization. Our observations reveal varying levels of C-OoD generalization across different CLIP models. We propose that the disentanglement of CLIP representations serves as a critical indicator in this context. By utilizing our synthesized datasets and other existing datasets, we assess various disentanglement metrics of text and image representations. Our study reveals that the disentanglement of image and text representations, particularly with respect to their compositional elements, plays a crucial role in improving the generalization of CLIP models in out-of-distribution settings. This finding suggests promising opportunities for advancing out-of-distribution generalization in CLIPs.	Translated: 2024-07-17 20:20:06 Published: 2024-07-16
# 量子作用素に対する局所同変表現の学習 Learning local equivariant representations for quantum operators ( http://arxiv.org/abs/2407.06053v3 ) ライセンス: Link先を確認	Zhanghao Zhouyin, Zixi Gan, Shishir Kumar Pandey, Linfeng Zhang, Qiangqiang Gu,	(参考訳) 密度汎関数理論(DFT)フレームワークにおけるハミルトン行列、重なり合い、密度行列などの量子作用素行列の予測は、材料特性を理解するために重要である。現在の手法は個々の演算子に焦点を合わせ、大規模システムの効率性とスケーラビリティに苦慮することが多い。本稿では、複数の量子演算子を予測するための新しい深層学習モデルSLEM(厳密な局所化同変メッセージパス)を紹介し、計算効率を劇的に向上させながら最先端の精度を実現する。 SLEMの重要な革新は、その厳密な局所性に基づく設計であり、物理対称性を維持しながら量子テンソルの局所的同変表現を構築することである。これにより、効果的な受容場を拡張することなく複雑な多体依存が可能となり、データ効率と転送性が向上する。革新的なSO(2)畳み込み法を用いて、SLEMは高次テンソル積の計算複雑性を低減し、従って基底集合に$f$と$g$の軌道を必要とするシステムを扱うことができる。 SLEMの能力は多種多様な2次元および3次元材料にまたがって実証し,限られた訓練データでも高い精度を達成できることを示した。 SLEMの設計は効率的な並列化を促進し、DFTシミュレーションをデバイスレベルのサイズを持つシステムに拡張し、大規模量子シミュレーションと高スループット材料発見の新たな可能性を開く。 Predicting quantum operator matrices such as Hamiltonian, overlap, and density matrices in the density functional theory (DFT) framework is crucial for understanding material properties. Current methods often focus on individual operators and struggle with efficiency and scalability for large systems. Here we introduce a novel deep learning model, SLEM (strictly localized equivariant message-passing) for predicting multiple quantum operators, that achieves state-of-the-art accuracy while dramatically improving computational efficiency. SLEM's key innovation is its strict locality-based design, constructing local, equivariant representations for quantum tensors while preserving physical symmetries. This enables complex many-body dependence without expanding the effective receptive field, leading to superior data efficiency and transferability. Using an innovative SO(2) convolution technique, SLEM reduces the computational complexity of high-order tensor products and is therefore capable of handling systems requiring the $f$ and $g$ orbitals in their basis sets. We demonstrate SLEM's capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data. 
SLEM's design facilitates efficient parallelization, potentially extending DFT simulations to systems with device-level sizes, opening new possibilities for large-scale quantum simulations and high-throughput materials discovery.	翻訳日:2024-07-17 20:20:06 公開日:2024-07-16
# PerlDiff:パースペクティブレイアウト拡散モデルを用いた制御可能なストリートビュー合成 PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models ( http://arxiv.org/abs/2407.06109v2 ) ライセンス: Link先を確認	Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu,	(参考訳) 制御可能な生成は3次元データのアノテートという課題に対処するための潜在的に不可欠なアプローチと考えられており、このような制御可能な生成の精度は、自律運転のデータ生産の文脈において特に不可欠である。既存の手法は、GLIGENやControlNetといったフレームワークを利用して、様々な生成情報を制御入力に統合することに注力し、制御可能な生成において優れた結果を生成する。しかし、そのようなアプローチは、本質的には、事前に定義されたネットワークアーキテクチャの学習能力に、生成性能を制限している。本稿では,制御情報の統合について検討し、パースペクティブ3次元幾何学的情報を完全に活用したストリートビュー画像生成手法であるPerlDiff(Perspective-Layout Diffusion Models)を導入する。我々のPerlDiffは、ネットワーク学習プロセス内で正確なオブジェクトレベル制御でストリートビュー画像の生成をガイドするために、3次元の幾何学的事前情報を用いており、その結果、より堅牢で制御可能な出力が得られる。さらに、代替レイアウト制御法よりも優れた制御性を示す。 PerlDiffはNuScenesとKITTIデータセットの生成精度を著しく向上させる。私たちのコードとモデルはhttps://github.com/LabShuHangGU/PerlDiffで公開されています。 Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or ControlNet, to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of controlling information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. Our PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in a more robust and controllable output.
Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results justify that our PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our codes and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.	翻訳日:2024-07-17 20:20:06 公開日:2024-07-16
# 自動運転における安全性の向上--エンド・ツー・エンドナビゲーションにおける潜在状態拡散モデルの統合 Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation ( http://arxiv.org/abs/2407.06317v3 ) ライセンス: Link先を確認	Detian Chu, Linyuan Bai, Jianuo Huang, Zhenlong Fang, Peng Zhang, Wei Kang,	(参考訳) 自動運転の進歩により、移動計画やナビゲーションにおける安全性の確保がますます重要になっている。しかし、ほとんどのエンドツーエンドの計画手法は安全性の欠如に悩まされている。本研究は、CMDP(Constrained Markov Decision Processs)として定式化された自動運転の制御最適化問題における安全性問題に対処する。複雑な高次元状態空間における制約を効果的に管理するために,条件付きバリュー・アット・リスクに基づくソフト・アクター・クリティカルを用いて,ポリシー最適化のための新しいモデルベースアプローチを提案する。本手法では, 安全探索を誘導する最悪のアクターを導入し, 予測不可能なシナリオにおいても, 安全要件の厳密な遵守を確保する。政策最適化は拡張ラグランジアン法を採用し、遅延拡散モデルを利用して将来の軌道を予測しシミュレーションする。この2つのアプローチは、環境を安全にナビゲートするだけでなく、環境の不確実性を考慮した流通モデルを統合することで、政策のパフォーマンスを向上する。シミュレーションと実環境の両方で実施した実証評価では,既存の手法よりも安全性,効率,意思決定能力が優れていた。 With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based approach for policy optimization, utilizing a conditional Value-at-Risk based Soft Actor Critic to manage constraints in complex, high-dimensional state spaces effectively. Our method introduces a worst-case actor to guide safe exploration, ensuring rigorous adherence to safety requirements even in unpredictable scenarios. The policy optimization employs the Augmented Lagrangian method and leverages latent diffusion models to predict and simulate future trajectories. This dual approach not only aids in navigating environments safely but also refines the policy's performance by integrating distribution modeling to account for environmental uncertainties. 
Empirical evaluations conducted in both simulated and real environments demonstrate that our approach outperforms existing methods in terms of safety, efficiency, and decision-making capabilities.	翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
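The row above relies on conditional Value-at-Risk (CVaR) as its risk measure. The following is a minimal sketch of an empirical CVaR over sampled safety costs, assuming one common convention in which CVaR at level alpha is the mean of the costs at or above the alpha-quantile. The helper names and sample values are hypothetical, and the paper's actor-critic machinery is not reproduced.

```python
import math

def value_at_risk(costs, alpha):
    """Empirical alpha-quantile: smallest cost with at least alpha of the mass at or below it."""
    ordered = sorted(costs)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]

def conditional_value_at_risk(costs, alpha):
    """Mean of the costs in the worst tail, i.e. those at or above the alpha-level VaR."""
    var = value_at_risk(costs, alpha)
    tail = [c for c in costs if c >= var]
    return sum(tail) / len(tail)

sampled_costs = [1.0, 2.0, 3.0, 10.0]   # hypothetical per-episode safety costs
print(value_at_risk(sampled_costs, 0.75))
print(conditional_value_at_risk(sampled_costs, 0.75))
```

Constraining this tail average instead of the plain mean makes rare but severe outcomes weigh more heavily, which is the usual motivation for CVaR in safety-constrained settings.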
# あらゆるものを追跡する分解 Decomposition Betters Tracking Everything Everywhere ( http://arxiv.org/abs/2407.06531v2 ) ライセンス: Link先を確認	Rui Li, Dong Liu,	(参考訳) 動き推定に関する最近の研究は、ビデオ全体、好ましくは各ピクセルに対して一様に一貫した、最適化された動き表現を提唱している。均一な表現は、自然ビデオの複雑で多様な動きや外観を考慮しないため、これは難しい。この問題に対処し,画素単位かつ長距離の動きを推定するための、DecoMotionという新しいテスト時間最適化手法を提案する。 DecoMotionはビデオコンテンツを静的シーンと動的オブジェクトに明示的に分解し、いずれも準3次元の標準ボリュームで表現する。 DecoMotionは局所空間と標準空間の間の変換を別々に調整し、カメラの動きに対応する静的シーンに対するアフィン変換を容易にする。ダイナミックボリュームに対しては、DecoMotionは識別的かつ時間的に一貫した特徴を活用して、非剛体変換を是正する。最終的に2つのボリュームは、動きと外観を完全に表現するために融合される。この分割統治戦略は、遮蔽や変形を通じたより堅牢な追跡につながり、同時に分解された外観を得る。我々はTAP-Vidベンチマークで評価を行う。その結果,提案手法は点追跡精度を高いマージンで向上させ,最先端の専用点追跡ソリューションと同等に動作することを示した。 Recent studies on motion estimation have advocated an optimized motion representation that is globally consistent across the entire video, preferably for every pixel. This is challenging as a uniform representation may not account for the complex and diverse motion and appearance of natural videos. We address this problem and propose a new test-time optimization method, named DecoMotion, for estimating per-pixel and long-range motion. DecoMotion explicitly decomposes video content into static scenes and dynamic objects, each of which is represented by a quasi-3D canonical volume. DecoMotion separately coordinates the transformations between local and canonical spaces, facilitating an affine transformation for the static scene that corresponds to camera motion. For the dynamic volume, DecoMotion leverages discriminative and temporally consistent features to rectify the non-rigid transformation. The two volumes are finally fused to fully represent motion and appearance. This divide-and-conquer strategy leads to more robust tracking through occlusions and deformations and meanwhile obtains decomposed appearances. We conduct evaluations on the TAP-Vid benchmark. The results demonstrate our method boosts the point-tracking accuracy by a large margin and performs on par with some state-of-the-art dedicated point-tracking solutions.	翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
# QAOA上の多段階量子ウォークの利点 Advantages of multistage quantum walks over QAOA ( http://arxiv.org/abs/2407.06663v2 ) ライセンス: Link先を確認	Lasse Gerblich, Tamanna Dasanjh, Horatio Q. X. Wong, David Ross, Leonardo Novo, Nicholas Chancellor, Viv Kendon,	(参考訳) イジング・ハミルトニアンに符号化された最適化問題の解状態を見つける方法は、現在の研究の非常に活発な領域である。本研究では、量子近似最適化アルゴリズム(QAOA)とマルチステージ量子ウォーク(MSQW)を比較する。どちらも変分量子アルゴリズムとして使用することができ、制御パラメータは古典的に最適化される。公正な比較では、量子的資源と古典的資源の両方を評価する必要がある。あるいは、この作業で行ったようにパラメータをヒューリスティックに選択して、比較の簡単な設定を提供することもできます。数値的手法と解析的手法の両方を用いて,MSQWが等価資源を用いてQAOAより優れていることを示す。また,MSQWが古典的最適化を伴わずに,少数の段階やヒューリスティックパラメータに対しても良好に動作するようなランダムなスピングラス基底状態問題についても数値的に示す。 Methods to find the solution state for optimization problems encoded into Ising Hamiltonians are a very active area of current research. In this work we compare the quantum approximate optimization algorithm (QAOA) with multi-stage quantum walks (MSQW). Both can be used as variational quantum algorithms, where the control parameters are optimized classically. A fair comparison requires both quantum and classical resources to be assessed. Alternatively, parameters can be chosen heuristically, as we do in this work, providing a simpler setting for comparisons. Using both numerical and analytical methods, we obtain evidence that MSQW outperforms QAOA, using equivalent resources. We also show numerically for random spin glass ground state problems that MSQW performs well even for few stages and heuristic parameters, with no classical optimization.	翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
# 3DGS.zip:3次元ガウススプラッティング圧縮法に関する調査 3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods ( http://arxiv.org/abs/2407.09510v2 ) ライセンス: Link先を確認	Milena T. Bagdasarian, Paul Knoll, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern,	(参考訳) 本稿では,3次元ガウススプラッティング圧縮法について,様々なベンチマークにおける統計的性能に着目した、進行中の調査を提示する。本調査は,異なる圧縮手法の鍵となる統計データを表形式で要約することにより,比較可能性の向上を目的とする。評価されたデータセットには、TanksAndTemples、MipNeRF360、DeepBlending、SyntheticNeRFがある。各手法について,各著者が提供するPeak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), およびメガバイト(MB)単位の結果サイズを報告する。これは進行中のオープンなプロジェクトであり、GitHubのissueやプルリクエストとして、リサーチコミュニティからのコントリビューションを募集しています。詳細およびソート可能な表についてはhttp://w-m.github.io/3dgs-compression-survey/を参照してください。 We present a work-in-progress survey on 3D Gaussian Splatting compression methods, focusing on their statistical performance across various benchmarks. This survey aims to facilitate comparability by summarizing key statistics of different compression approaches in a tabulated format. The datasets evaluated include TanksAndTemples, MipNeRF360, DeepBlending, and SyntheticNeRF. For each method, we report the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and the resultant size in megabytes (MB), as provided by the respective authors. This is an ongoing, open project, and we invite contributions from the research community as GitHub issues or pull requests. Please visit http://w-m.github.io/3dgs-compression-survey/ for more information and a sortable version of the table.	翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
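Of the metrics listed in the row above, PSNR has a simple closed form, 10 * log10(MAX^2 / MSE). The sketch below computes it for flat lists of pixel intensities. The function names and the 8-bit peak value of 255 are assumptions for illustration, and the survey itself reports numbers supplied by the respective authors, not values computed this way.

```python
import math

def mean_squared_error(reference, test):
    """Average squared difference between two equally sized pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)

def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite when the inputs are identical."""
    error = mean_squared_error(reference, test)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)

original = [10, 20, 30, 40]   # hypothetical pixel values
decoded = [11, 21, 31, 41]    # each pixel off by one, so MSE is 1
print(round(psnr(original, decoded), 2))
```

Higher PSNR means a closer reconstruction, so compression surveys typically read it alongside the resulting file size.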
# AIとリスクの反復的エピストピー AI and the Iterable Epistopics of Risk ( http://arxiv.org/abs/2407.10236v2 ) ライセンス: Link先を確認	Andy Crabtree, Glenn McGarry, Lachlan Urquhart,	(参考訳) AIが社会に提示するリスクは、一般的な計算、すなわち、AI開発に関わる人々がAI影響評価、倫理的枠組み、新興国際標準、規制などのリスクを認識・管理できるように設計された一般的なフレームワークを通じて、管理可能であると広く理解されている。本稿では、規制当局、開発者、サイバーセキュリティの専門家によるリスクの把握と管理について詳述する。リスクとリスクマネジメントは、一般的な計算にカプセル化されていない日常的な場所にあるプラクティスに依存していることが明らかになった。 Situated practiceは反復可能なエピストピーを表面化し、AI開発に関わる人々がどのようにしてリスクを理解し、その後に反応し、仕事における大きな課題を明らかにするかを明らかにする。 AIにおけるリスクのエピストピーの発見と解明は、a) 学際的調査の潜在的なプログラムを提供し、b) AI開発者にリスクを認識させる手段を提供し、c) 一般計算の現在進行中の進化を知らせる。 The risks AI presents to society are broadly understood to be manageable through general calculus, i.e., general frameworks designed to enable those involved in the development of AI to apprehend and manage risk, such as AI impact assessments, ethical frameworks, emerging international standards, and regulations. This paper elaborates how risk is apprehended and managed by a regulator, developer and cyber-security expert. It reveals that risk and risk management is dependent on mundane situated practices not encapsulated in general calculus. Situated practice surfaces iterable epistopics, revealing how those involved in the development of AI know and subsequently respond to risk and uncover major challenges in their work. The ongoing discovery and elaboration of epistopics of risk in AI a) furnishes a potential program of interdisciplinary inquiry, b) provides AI developers with a means of apprehending risk, and c) informs the ongoing evolution of general calculus.	翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
# 逆問題における拡散モデルの近似後サンプリングのためのゼロショット適応 Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems ( http://arxiv.org/abs/2407.11288v1 ) ライセンス: Link先を確認	Yaşar Utku Alçalar, Mehmet Akçakaya,	(参考訳) 拡散モデルは、逆問題を解決するための強力な生成技術として登場した。画像の様々な逆問題で成功したにもかかわらず、これらのモデルは収束するために多くのステップを必要とし、推論時間が遅くなる。近年,低騒音レベルにおける時間経過の頻繁な反復を伴い,画像生成や収束速度が向上する,高度なノイズスケジュールを利用するための拡散モデルが流行している。しかしながら、これらのアイデアの拡散モデルによる逆問題への応用は、前方モデルの対数的項重みに対する経験的チューニングを使用する場合、これらのノイズスケジュールがうまく機能しないため、依然として困難である。これらの課題に対処するために、ゼロショット物理駆動深層学習への接続を利用するゼロショット近似後方サンプリング(ZAPS)を提案する。 ZAPSはサンプリングステップの数を修正し、物理学誘導損失関数によるゼロショットトレーニングを使用して、不規則な時間ステップ毎にログライクな重みを学習する。本稿では,最近提案した拡散後サンプリング法をベースラインとしてZAPSを適用した。さらに,学習可能な対角成分を用いた対角化手法を用いて,先行対数ヘシアンを近似し,計算効率を向上する。これらのパラメータは、所定の計算予算を持つ固定数のエポックに対して最適化される。ガウス, 運動遅延, 塗装, 超解像などの様々な雑音逆問題に対する結果から, ZAPSは推定時間を短縮し, 不規則な騒音スケジュールに対して頑健性を提供し, 再現性の向上を図っている。コードはhttps://github.com/ualcalar17/ZAPSで入手できる。 Diffusion models have emerged as powerful generative techniques for solving inverse problems. Despite their success in a variety of inverse problems in imaging, these models require many steps to converge, leading to slow inference time. Recently, there has been a trend in diffusion models for employing sophisticated noise schedules that involve more frequent iterations of timesteps at lower noise levels, thereby improving image generation and convergence speed. However, application of these ideas for solving inverse problems with diffusion models remain challenging, as these noise schedules do not perform well when using empirical tuning for the forward model log-likelihood term weights. To tackle these challenges, we propose zero-shot approximate posterior sampling (ZAPS) that leverages connections to zero-shot physics-driven deep learning. ZAPS fixes the number of sampling steps, and uses zero-shot training with a physics-guided loss function to learn log-likelihood weights at each irregular timestep. 
We apply ZAPS to the recently proposed diffusion posterior sampling method as baseline, though ZAPS can also be used with other posterior sampling diffusion models. We further approximate the Hessian of the logarithm of the prior using a diagonalization approach with learnable diagonal entries for computational efficiency. These parameters are optimized over a fixed number of epochs with a given computational budget. Our results for various noisy inverse problems, including Gaussian and motion deblurring, inpainting, and super-resolution show that ZAPS reduces inference time, provides robustness to irregular noise schedules and improves reconstruction quality. Code is available at https://github.com/ualcalar17/ZAPS	翻訳日:2024-07-17 19:02:01 公開日:2024-07-16
# LoRA-PT:主テンソル特異値とベクトルを用いた海馬セグメンテーションのための低ランク適応UNETR LoRA-PT: Low-Rank Adapting UNETR for Hippocampus Segmentation Using Principal Tensor Singular Values and Vectors ( http://arxiv.org/abs/2407.11292v1 ) ライセンス: Link先を確認	Guanghua He, Wangang Cheng, Hancan Zhu, Gaohang Yu,	(参考訳) 海馬は様々な精神疾患に関連する重要な脳構造であり、その自動的かつ正確なセグメンテーションはこれらの疾患の研究に不可欠である。近年,深層学習に基づく手法は海馬セグメンテーションにおいて大きな進歩を遂げている。しかし、深層ニューラルネットワークモデルのトレーニングには、大量のラベル付きトレーニングデータだけでなく、かなりの計算資源と時間が必要です。そこで本研究では,LoRA-PTと呼ばれるパラメータ効率の高いファインチューニング手法を提案する。この方法は、BraTS2021データセット上の事前訓練されたUNETRモデルを、海馬セグメンテーションタスクに転送する。特に、LoRA-PT法は変圧器構造のパラメータ行列を3つのサイズに分類し、3つの3次元テンソルを形成する。テンソル特異値分解により、これらのテンソルは分解され、主特異値と特異ベクトルを持つ低ランクテンソルを生成し、残りの特異値とベクトルは残留テンソルを形成する。 LoRA法と同様に、パラメータの微調整の間は、残留テンソルを一定に保ちながら、主テンソル特異値とベクトルの低ランクテンソルのみを更新する。提案手法を3つの公開海馬データセットで検証した。実験結果から,LoRA-PTは,パラメータ更新回数を大幅に削減しつつ,既存のパラメータ効率変換学習手法よりもセグメンテーション精度が高いことがわかった。 The hippocampus is a crucial brain structure associated with various psychiatric disorders, and its automatic and precise segmentation is essential for studying these diseases. In recent years, deep learning-based methods have made significant progress in hippocampus segmentation. However, training deep neural network models requires substantial computational resources and time, as well as a large amount of labeled training data, which is often difficult to obtain in medical image segmentation. To address this issue, we propose a new parameter-efficient fine-tuning method called LoRA-PT. This method transfers the pre-trained UNETR model on the BraTS2021 dataset to the hippocampus segmentation task. Specifically, the LoRA-PT method categorizes the parameter matrix of the transformer structure into three sizes, forming three 3D tensors. Through tensor singular value decomposition, these tensors are decomposed to generate low-rank tensors with the principal singular values and singular vectors, while the remaining singular values and vectors form the residual tensor. 
Similar to the LoRA method, during parameter fine-tuning, we only update the low-rank tensors, i.e. the principal tensor singular values and vectors, while keeping the residual tensor unchanged. We validated the proposed method on three public hippocampus datasets. Experimental results show that LoRA-PT outperforms existing parameter-efficient transfer learning methods in segmentation accuracy while significantly reducing the number of parameter updates.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# COHO: 文脈に敏感な都市規模の階層的都市レイアウト生成 COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation ( http://arxiv.org/abs/2407.11294v1 ) ライセンス: Link先を確認	Liu He, Daniel Aliaga,	(参考訳) 大規模な都市レイアウトの生成は、様々な分野において大きな関心を集めてきた。従来の手法では、手動のルールコーディングを必要とする手続き生成や、豊富なデータを必要とするディープラーニングが利用されていた。しかし、従来の手法では、都市レイアウト生成の文脈に敏感な性質は考慮されていない。提案手法は, 都市全体の標準グラフ表現を活用することで, 拡張性を高め, 都市レイアウトに固有の多層セマンティクスを捉えることで, このギャップに対処する。都市規模の都市レイアウト生成のための新しいグラフベースのマスク付きオートエンコーダ(GMAE)を提案する。この手法は、属性付き建物、都市ブロック、コミュニティ、都市を統一的なグラフ構造に符号化し、グラフオートエンコーダのための自己教師付きマスクトレーニングを可能にする。さらに,重要な都市ブロックや建物の生成を優先する、2.5次元レイアウト生成のためのスケジュールされた反復サンプリングも採用している。提案手法は,米国330都市における異質な都市スタイルにおける良好な現実性,意味的整合性,正当性を実現する。コードとデータセットはhttps://github.com/Arking1995/COHOで公開されている。 The generation of large-scale urban layouts has garnered substantial interest across various disciplines. Prior methods have utilized procedural generation requiring manual rule coding or deep learning needing abundant data. However, prior approaches have not considered the context-sensitive nature of urban layout generation. Our approach addresses this gap by leveraging a canonical graph representation for the entire city, which facilitates scalability and captures the multi-layer semantics inherent in urban layouts. We introduce a novel graph-based masked autoencoder (GMAE) for city-scale urban layout generation. The method encodes attributed buildings, city blocks, communities and cities into a unified graph structure, enabling self-supervised masked training for graph autoencoder. Additionally, we employ scheduled iterative sampling for 2.5D layout generation, prioritizing the generation of important city blocks and buildings. Our approach achieves good realism, semantic consistency, and correctness across the heterogeneous urban styles in 330 US cities. Codes and datasets are released at https://github.com/Arking1995/COHO.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# 格子Thirringモデルにおける動的量子相転移と熱平衡 Dynamical Quantum Phase Transition and Thermal Equilibrium in the Lattice Thirring Model ( http://arxiv.org/abs/2407.11295v1 ) ライセンス: Link先を確認	Mari Carmen Bañuls, Krzysztof Cichy, Hao-Ti Hung, Ying-Jer Kao, C. -J. David Lin, Amit Singh,	(参考訳) テンソルネットワーク法を用いて、臨界相と有質量相の両方で平衡から外れるようクエンチされた格子Thirringモデルの実時間発展をシミュレートし、Loschmidtレートの非解析性として動的量子相転移の出現を研究する。モデルにおける動的量子相転移の存在は、平衡相図の臨界線を0温度で横断するクエンチとは対応しないが、動的量子相転移が起こるために必要な初期状態のエネルギー密度の閾値を同定する。さらに、ギャップ付きクエンチハミルトニアンの場合、このしきい値と有限温度位相図内の異なる領域間の遷移との接続を明らかにする。 Using tensor network methods, we simulate the real-time evolution of the lattice Thirring model quenched out of equilibrium in both the critical and massive phases, and study the appearance of dynamical quantum phase transitions, as non-analyticities in the Loschmidt rate. Whereas the presence of a dynamical quantum phase transition in the model does not correspond to quenches across the critical line of the equilibrium phase diagram at zero temperature, we identify a threshold in the energy density of the initial state, necessary for a dynamical quantum phase transition to be present. Moreover, in the case of the gapped quench Hamiltonian, we unveil a connection of this threshold to a transition between different regions in the finite temperature phase diagram.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# FR-SLAM:フロアプラン登録に基づくSLAM改善手法 FR-SLAM: A SLAM Improvement Method Based on Floor Plan Registration ( http://arxiv.org/abs/2407.11299v1 ) ライセンス: Link先を確認	Jiantao Feng, Xinde Li, HyunCheol Park, Juan Liu, Zhentong Zhang,	(参考訳) SLAM技術は,移動ロボットの屋内自律走行における重要な技術として,環境マップの構築と位置決めを可能にする。従来のSLAM法は、完全な地図を得るためには、屋内ナビゲーション中にすべての部屋を徹底的に横断する必要があるため、長い経路計画時間と目標地点に到達するのに長い時間がかかる。さらに,動作中の累積誤差によりロボットの自己位置推定が不正確になり,ナビゲーション効率に影響を及ぼす。本稿では,フロアプラン登録に基づく改良されたSLAM法であるFR-SLAMを提案し,元のフロアプランの整列と変換にモルフォロジーに基づくフロアプラン登録アルゴリズムを用いる。このアプローチにより、包括的なモーションマップの迅速な取得と効率的な経路計画が実現され、より短い時間枠内での迅速なナビゲーションが可能となる。登録とロボット動作のローカライゼーションの精度を高めるために、現在位置の建物構造を地図と比較し、正確なローカライゼーションのためのフロアプラン登録結果を動的に更新するリアルタイム更新戦略を採用する。実データとシミュレーションデータの比較実験により, 他のベンチマークアルゴリズムと比較して, フロアプランの登録精度が向上し, 目標位置に到達するまでの所要時間が短縮された。 Simultaneous Localization and Mapping (SLAM) technology enables the construction of environmental maps and localization, serving as a key technique for indoor autonomous navigation of mobile robots. Traditional SLAM methods typically require exhaustive traversal of all rooms during indoor navigation to obtain a complete map, resulting in lengthy path planning times and prolonged time to reach target points. Moreover, cumulative errors during motion lead to inaccurate robot localization, impacting navigation efficiency. This paper proposes an improved SLAM method, FR-SLAM, based on floor plan registration, utilizing a morphology-based floor plan registration algorithm to align and transform original floor plans. This approach facilitates the rapid acquisition of comprehensive motion maps and efficient path planning, enabling swift navigation to target positions within a shorter timeframe. To enhance registration and robot motion localization accuracy, a real-time update strategy is employed, comparing the current position's building structure with the map and dynamically updating floor plan registration results for precise localization.
Comparative tests conducted on real and simulated datasets demonstrate that, compared to other benchmark algorithms, this method achieves higher floor plan registration accuracy and shorter time consumption to reach target positions.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# 文脈認識における感情認識としての大規模視覚言語モデル Large Vision-Language Models as Emotion Recognizers in Context Awareness ( http://arxiv.org/abs/2407.11300v1 ) ライセンス: Link先を確認	Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang,	(参考訳) 文脈対応感情認識(CAER)は、様々な文脈から感情を知覚する必要がある複雑で重要なタスクである。以前のアプローチは主に、イメージから感情的な手がかりを抽出する洗練されたアーキテクチャを設計することに焦点を当てていた。しかし、それらの知識は特定の訓練データセットに限定されており、アノテータの主観的な感情バイアスを反映する可能性がある。さらに、大量のラベル付きデータを取得することは、現実世界のアプリケーションではしばしば困難である。本稿では、3つのパラダイムからCAERタスクを強化するためにLVLM(Large Vision-Language Models)を活用する可能性について体系的に検討する。 1) 大規模モデルを下流タスクに転送する最も一般的な方法である2つのCAERデータセット上でLVLMを微調整する。 2) 限られたデータや全く見えないシナリオにおいて, LVLMの性能を評価するため, ゼロショットと少数ショットのパターンを設計する。この場合、LVLMのIn-Context Learning(ICL)機能を完全に活用するために、トレーニング不要のフレームワークが提案されている。具体的には、画像類似度に基づくランキングアルゴリズムを開発し、サンプルを検索し、次に命令、サンプルを検索し、テスト例を組み合わせてLVLMをフィードし、対応する感情判断を得る。 3) LVLMの豊富な知識基盤を活用するため, モデルの推論能力を高め, 解釈可能な結果を提供するために, フレームワークにChain-of-Thought(CoT)を組み込んだ。大規模な実験と分析により、LVLMは異なるパラダイムにわたるCAERタスクにおいて競争性能を達成することを示した。特に、数ショット設定での優れた性能は、広範囲のトレーニングを伴わずに特定のタスクを達成するためのLVLMの実現可能性を示している。 Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore, acquiring large amounts of labeled data is often challenging in real-world applications. In this paper, we systematically explore the potential of leveraging Large Vision-Language Models (LVLMs) to empower the CAER task from three paradigms: 1) We fine-tune LVLMs on two CAER datasets, which is the most common way to transfer large models to downstream tasks. 2) We design zero-shot and few-shot patterns to evaluate the performance of LVLMs in scenarios with limited data or even completely unseen. 
In this case, a training-free framework is proposed to fully exploit the In-Context Learning (ICL) capabilities of LVLMs. Specifically, we develop an image similarity-based ranking algorithm to retrieve examples; subsequently, the instructions, retrieved examples, and the test example are combined to feed LVLMs to obtain the corresponding sentiment judgment. 3) To leverage the rich knowledge base of LVLMs, we incorporate Chain-of-Thought (CoT) into our framework to enhance the model's reasoning ability and provide interpretable results. Extensive experiments and analyses demonstrate that LVLMs achieve competitive performance in the CAER task across different paradigms. Notably, the superior performance in few-shot settings indicates the feasibility of LVLMs for accomplishing specific tasks without extensive training.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# ゼーマンモデルによるロデオアルゴリズムの解法 Unraveling Rodeo Algorithm Through the Zeeman Model ( http://arxiv.org/abs/2407.11301v1 ) ライセンス: Link先を確認	Raphael Fortes Infante Gomes, Julio Cesar Siqueira Rocha, Wallon Anderson Tadaiesky Nogueira, Rodrigo Alves Dias,	(参考訳) 任意の初期状態を考慮したハミルトニアン一般に対する固有状態と固有値スペクトルを決定するために、ロデオアルゴリズムを解く。新たな方法論を提示することにより,固有状態に関する事前の知識を必要とせずに,元の手法を詳述し,すべての特性をどのように定義するかを示す。この目的のために、我々はPennylane と Qiskit のプラットフォームリソースを利用して、ハミルトニアンが1つのスピンと2つのスピンに対してゼーマンモデルによって記述されるシナリオを分析する。また本研究では,本質的なパラメータを調整し,データ分布に固有のゆらぎを低減し,アルゴリズムの性能向上のための戦略や手法についても紹介する。まず,Xanaduシミュレータ上の単一キュービットのダイナミクスを探索し,メソッド性能を最適化するパラメータを設定し,アルゴリズムを実行するための最善の戦略を選択する。そこで本研究では,両部システムの方法論を拡張し,縮退や絡み合いを考慮した場合のアルゴリズムの動作について検討する。最後に、IBM Q Experienceプログラムによって提供される実超伝導デバイス上で得られた結果と比較し、マルチキュービットシステムにおけるプロトコル効率を向上させる条件を確立する。 We unravel the Rodeo Algorithm to determine the eigenstates and eigenvalues spectrum for a general Hamiltonian considering arbitrary initial states. By presenting a novel methodology, we detail the original method and show how to define all properties without having prior knowledge regarding the eigenstates. To this end, we exploit Pennylane and Qiskit platforms resources to analyze scenarios where the Hamiltonians are described by the Zeeman model for one and two spins. We also introduce strategies and techniques to improve the algorithm's performance by adjusting its intrinsic parameters and reducing the fluctuations inherent to data distribution. First, we explore the dynamics of a single qubit on Xanadu simulators to set the parameters that optimize the method performance and select the best strategies to execute the algorithm. On the sequence, we extend the methodology for bipartite systems to discuss how the algorithm works when degeneracy and entanglement are taken into account. 
Finally, we compare the predictions with the results obtained on a real superconducting device provided by the IBM Q Experience program, establishing the conditions to increase the protocol efficiency for multi-qubit systems.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# PADRe:効率的なVision Transformerのための統一的な多項式アテンション・ドロップイン置換 PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer ( http://arxiv.org/abs/2407.11306v1 ) ライセンス: Link先を確認	Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli,	(参考訳) 本稿では,トランスフォーマーモデルにおける従来の自己注意機構を置き換えるために設計された,新規で統一的なフレームワークであるPADRe(Polynomial Attention Drop-in Replacement)を提案する。特に、Hyena、Mamba、SimA、Conv2Former、Castling-ViTといった最近の別の注意機構は、当社のPADReフレームワークの特定のインスタンスと見なすことができます。 PADReは多項式関数を利用し、近似理論から確立された結果を導き、精度を損なうことなく計算効率を向上する。 PADReの鍵となるコンポーネントは乗法的非線形性であり、アダマール積のような単純でハードウェアフレンドリーな操作を用いて実装し、線形計算とメモリコストのみを発生させる。 PADReはさらに、Softmaxのような複雑な関数の使用を回避しているが、従来の自己アテンションと同等または優れた精度を維持している。多様なコンピュータビジョンタスクにおける自己注意の代替手段としてのPADReの有効性を評価する。これらのタスクには、画像分類、画像ベースの2Dオブジェクト検出、および3Dポイントクラウドオブジェクト検出が含まれる。実験結果から、PADReは従来の自己注意(サーバGPUやモバイルNPUでは11x〜43倍高速)よりもはるかに高速に動作し、トランスフォーマーモデルに自己注意を代用する場合も同様の精度を維持した。 We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models. Notably, several recent alternative attention mechanisms, including Hyena, Mamba, SimA, Conv2Former, and Castling-ViT, can be viewed as specific instances of our PADRe framework. PADRe leverages polynomial functions and draws upon established results from approximation theory, enhancing computational efficiency without compromising accuracy. PADRe's key components include multiplicative nonlinearities, which we implement using straightforward, hardware-friendly operations such as Hadamard products, incurring only linear computational and memory costs. PADRe further avoids the need for using complex functions such as Softmax, yet it maintains comparable or superior accuracy compared to traditional self-attention. We assess the effectiveness of PADRe as a drop-in replacement for self-attention across diverse computer vision tasks.
These tasks include image classification, image-based 2D object detection, and 3D point cloud object detection. Empirical results demonstrate that PADRe runs significantly faster than the conventional self-attention (11x ~ 43x faster on server GPU and mobile NPU) while maintaining similar accuracy when substituting self-attention in the transformer models.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# デバイス間通信による分散IoTエッジ上のグローバル異常の検出 Detection of Global Anomalies on Distributed IoT Edges with Device-to-Device Communication ( http://arxiv.org/abs/2407.11308v1 ) ライセンス: Link先を確認	Hideya Ochiai, Riku Nishihata, Eisuke Tomiyama, Yuwei Sun, Hiroshi Esaki,	(参考訳) 異常検出は、異常事象によって引き起こされる外れ値を見つけるためのIoTアプリケーションにおいて重要な機能である。異常検出には、クラウドではなくエッジデバイスで実施すべき高周波データサンプリングが伴うことがある。本稿では,複数のIoTデバイスを1つのリモートサイトに設置し,デバイス間通信による観測から異常を共同検出する事例について考察する。そこで本研究では,無線アドホックフェデレートラーニング(WAFL-Autoencoder)を用いた分散異常検知器のトレーニングを行うための,完全分散協調方式を提案する。サンプルは局所的なデバイスに限らず,対象領域のすべてのデバイスに稀である,Global Anomalyの概念を導入する。また,グローバル異常検出のための分散しきい値探索アルゴリズムを提案する。標準ベンチマークによる評価により、我々はデバイス全体で完全に異常検出を訓練したことを確認した。また, 偽陽性率の低いGlobal Anomaly検出のしきい値が, 例外が少なく, 真陽性率の高い値が得られたことも確認した。 Anomaly detection is an important function in IoT applications for finding outliers caused by abnormal events. Anomaly detection sometimes comes with high-frequency data sampling which should be carried out at Edge devices rather than Cloud. In this paper, we consider the case that multiple IoT devices are installed in a single remote site and that they collaboratively detect anomalies from the observations with device-to-device communications. For this, we propose a fully distributed collaborative scheme for training distributed anomaly detectors with Wireless Ad Hoc Federated Learning, namely "WAFL-Autoencoder". We introduce the concept of Global Anomaly which sample is not only rare to the local device but rare to all the devices in the target domain. We also propose a distributed threshold-finding algorithm for Global Anomaly detection. With our standard benchmark-based evaluation, we have confirmed that our scheme trained anomaly detectors perfectly across the devices. We have also confirmed that the devices collaboratively found thresholds for Global Anomaly detection with low false positive rates while achieving high true positive rates with few exceptions.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# ガウス散乱LK Gaussian Splatting LK ( http://arxiv.org/abs/2407.11309v1 ) ライセンス: Link先を確認	Liuyue Xie, Joel Julin, Koichiro Niinuma, Laszlo A. Jeni,	(参考訳) 2D画像からダイナミックな3Dシーンを再構築し、時間とともに多様なビューを生成することは、固有の複雑さと時間的ダイナミクスによって大きな課題となる。ニューラル暗黙的モデルと動的ガウススプラッティングの最近の進歩は有望であるが、特に高度にダイナミックなシーンの基礎となる幾何学を正確に捉える際に制限は持続している。いくつかのアプローチは、拡散モデルを通して強い意味論と幾何学的先入観を組み込むことによってこの問題に対処する。しかし,動的ガウススティングフレームワークにおいて,ネイティブワープフィールドの正規化の可能性を検討することによって,異なる経路を探索する。本手法は, 正確なワープ場が連続した時空運動を発生させるという重要な直観に基づいている。ワープフィールドの運動制限は簡単ではないものの,解析速度場を導出するためにフォワードワープフィールドネットワークに固有の知識を生かして,シーンフローの時間積分を行い,ガウスの2次元運動と3次元位置の両方を効果的に拘束できることが示される。このルーカス・カナーデ型解析正規化により、最小限のカメラ動作下であっても、非常にダイナミックなシーンを再構成し、既存の動的ガウス・スプレイティングフレームワークが達成できる範囲を広げる上で、優れた性能を実現することができる。 Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time presents a significant challenge due to the inherent complexity and temporal dynamics involved. While recent advancements in neural implicit models and dynamic Gaussian Splatting have shown promise, limitations persist, particularly in accurately capturing the underlying geometry of highly dynamic scenes. Some approaches address this by incorporating strong semantic and geometric priors through diffusion models. However, we explore a different avenue by investigating the potential of regularizing the native warp field within the dynamic Gaussian Splatting framework. Our method is grounded on the key intuition that an accurate warp field should produce continuous space-time motions. While enforcing the motion constraints on warp fields is non-trivial, we show that we can exploit knowledge innate to the forward warp field network to derive an analytical velocity field, then time integrate for scene flows to effectively constrain both the 2D motion and 3D positions of the Gaussians. 
This derived Lucas-Kanade style analytical regularization enables our method to achieve superior performance in reconstructing highly dynamic scenes, even under minimal camera movement, extending the boundaries of what existing dynamic Gaussian Splatting frameworks can achieve.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# Digital Twin Vehicular Edge Computing Network: Task Offloading と Resource Allocation Digital Twin Vehicular Edge Computing Network: Task Offloading and Resource Allocation ( http://arxiv.org/abs/2407.11310v1 ) ライセンス: Link先を確認	Yu Xie, Qiong Wu, Pingyi Fan,	(参考訳) 車両のインターネット上の複数のアプリケーションに対する需要が高まっている。車両は複数の計算タスクをリアルタイムで実行する必要がある。しかし、車両自体の計算能力が不足しているため、車両エッジコンピューティング(VEC)サーバにタスクをオフロードし、コンピュータリソースをタスクに割り当てることは困難である。本稿では,マルチタスクディジタルツイン(DT)VECネットワークを構築した。 DTを用いて、各車両の複数のタスクに対するオフロード戦略とリソース割り当て戦略を1つのスロットで開発することにより、最適化問題を構築する。そこで本研究では,タスクオフロードとリソース割り当てに関するマルチエージェント強化学習手法を提案する。多数の実験により,本手法は他のベンチマークアルゴリズムと比較して有効であることが示された。 With the increasing demand for multiple applications on the Internet of Vehicles, vehicles are required to carry out multiple computing tasks in real time. However, due to the insufficient computing capability of the vehicles themselves, offloading tasks to vehicular edge computing (VEC) servers and allocating computing resources to tasks become a challenge. In this paper, a multi-task digital twin (DT) VEC network is established. By using DT to develop offloading strategies and resource allocation strategies for multiple tasks of each vehicle in a single slot, an optimization problem is constructed. To solve it, we propose a multi-agent reinforcement learning method for task offloading and resource allocation. Numerous experiments demonstrate that our method is effective compared to other benchmark algorithms.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# COMET:数学問題生成のための大規模マルチモーダルモデルの強化 COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation ( http://arxiv.org/abs/2407.11315v1 ) ライセンス: Link先を確認	Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun,	(参考訳) 高品質な数学問題の自動生成は、多くの教育シナリオにおいて事実上価値のあるものである。大規模マルチモーダルモデルは、クロスモーダルデータシナリオで広く成功しているため、数学的問題生成のための新しい技術的アプローチを提供する。しかし、問題生成から問題解決を分離する従来の手法と、一様学習目的を持つ単調データ構造を主軸とした微調整フレームワークは、数学的な問題生成における大規模マルチモーダルモデルの適用を制限している。これらの課題に対処するため,本論文では,数学的問題生成のための大規模マルチモーダルモデルであるCOMETを提案する。まず、相互能力の促進と応用論理の観点から、問題文(stem)生成と問題解決を数学的問題生成に統合する。次に、"Cone of Experience"によってガイドされた3段階のファインチューニングフレームワークを提案する。このフレームワークは、微調整データを記号的経験、映像的経験、直接的経験に分割し、教師のキャリア成長における経験と類似性を引き出す。このフレームワークでは、いくつかのきめ細かいデータ構築および注入方法が設計されている。最後に、この分野における中国のマルチモーダルデータの空白を満たすために、中国のマルチモーダル数学問題データセットを構築した。客観的および主観的な指標と組み合わせて、提案したフレームワークとモデルの有効性を複数のデータセットで完全に検証する。 The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal models provide a novel technical approach to mathematical problem generation because of their wide success in cross-modal data scenarios. However, the traditional method of separating problem solving from problem generation and the mainstream fine-tuning framework of monotonous data structure with homogeneous training objectives limit the application of large multimodal models in mathematical problem generation. To address these challenges, this paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. Firstly, from the perspective of mutual ability promotion and application logic, we unify stem generation and problem solving into mathematical problem generation. Secondly, a three-stage fine-tuning framework guided by the "Cone of Experience" is proposed. 
The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience to draw parallels with experiences in the career growth of teachers. Several fine-grained data construction and injection methods are designed in this framework. Finally, we construct a Chinese multimodal mathematical problem dataset to fill the vacancy of Chinese multimodal data in this field. Combined with objective and subjective indicators, experiments on multiple datasets fully verify the effectiveness of the proposed framework and model.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# BUSClean:医療用AIのための乳房超音波画像前処理と知識抽出のためのオープンソースソフトウェア BUSClean: Open-source software for breast ultrasound image pre-processing and knowledge extraction for medical AI ( http://arxiv.org/abs/2407.11316v1 ) ライセンス: Link先を確認	Arianna Bunnell, Kailee Hung, John A. Shepherd, Peter Sadowski,	(参考訳) 医療画像のための人工知能(AI)の開発は、数十万の画像からなる大規模な臨床データセットのキュレーションとクリーニングを要求する。マンモグラフィーのようないくつかのモダリティは、高度に標準化されたイメージングを含んでいる。対照的に、乳房超音波画像(BUS)は、拡張スキャンモード、超音波検査技師のアノテーション、追加のビューなど、スキャンメタデータによって示されない多くの不規則性を含むことができる。臨床BUSデータセットを自動処理するオープンソースソフトウェアソリューションを提案する。このアルゴリズムは、BUSスキャンのフィルタリング、クリーニング、および超音波検査技師のアノテーションからの知識抽出を行う。モジュラーデザインにより、ユーザーは新しい設定に適応できる。 430枚の臨床BUS画像からなる内部テストデータセットでの実験では、あらゆる種類のテキストアノテーションの検出で95%超の感度と98%超の特異度、血流強調表示、代替スキャンモード、無効スキャンの検出で98%超の感度と特異度を達成した。完全に外部の公開BUSスキャンデータセットに関するケーススタディでは、BUSCleanがテキストアノテーションと血流強調表示のあるスキャンを、それぞれ感度88.6%と90.9%、特異度98.3%と99.9%で識別した。ケーススタディに特有のキャリパーの種類を考慮に入れた病変キャリパー検出法の適応は、新しいデータ分布におけるBUSCleanの意図された使い方を示し、病変キャリパー検出の性能を初期状態の感度43.3%、特異度93.3%から感度92.1%、特異度92.3%に向上させる。ソースコード、サンプルノートブック、サンプルデータは https://github.com/hawaii-ai/bus-cleaning で公開されている。 Development of artificial intelligence (AI) for medical imaging demands curation and cleaning of large-scale clinical datasets comprising hundreds of thousands of images. Some modalities, such as mammography, contain highly standardized imaging. In contrast, breast ultrasound imaging (BUS) can contain many irregularities not indicated by scan metadata, such as enhanced scan modes, sonographer annotations, or additional views. We present an open-source software solution for automatically processing clinical BUS datasets. The algorithm performs BUS scan filtering, cleaning, and knowledge extraction from sonographer annotations. Its modular design enables users to adapt it to new settings. 
Experiments on an internal testing dataset of 430 clinical BUS images achieve >95% sensitivity and >98% specificity in detecting every type of text annotation, and >98% sensitivity and specificity in detecting scans with blood flow highlighting, alternative scan modes, or invalid scans. A case study on a completely external, public dataset of BUS scans found that BUSClean identified text annotations and scans with blood flow highlighting with 88.6% and 90.9% sensitivity and 98.3% and 99.9% specificity, respectively. Adapting the lesion caliper detection method to account for a type of caliper specific to the case study demonstrates the intended use of BUSClean on new data distributions and improves lesion caliper detection from 43.3% sensitivity and 93.3% specificity out-of-the-box to 92.1% sensitivity and 92.3% specificity. Source code, example notebooks, and sample data are available at https://github.com/hawaii-ai/bus-cleaning.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
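The sensitivity and specificity figures quoted in the BUSClean abstract follow the standard confusion-matrix definitions. The sketch below shows that arithmetic only; the counts are hypothetical and are not taken from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: detected positives over all actual positives."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: correctly rejected negatives over all actual negatives."""
    return tn / (tn + fp)


# Hypothetical counts for one detector (for example, a text-annotation detector).
tp, fn = 90, 10   # annotated scans: found vs. missed
tn, fp = 294, 6   # clean scans: correctly passed vs. wrongly flagged

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

Reporting both numbers matters for a cleaning tool: high sensitivity alone could be reached by flagging every scan, which the specificity would expose.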
# マヨラナ・クリフォード群の構造 The Structure of the Majorana Clifford Group ( http://arxiv.org/abs/2407.11319v1 ) ライセンス: Link先を確認	Valérie Bettaque, Brian Swingle,	(参考訳) 量子情報科学において、クリフォード作用素と安定化符号は量子ビット(または量子ビット)系において中心的な役割を果たす。本稿では,マヨラナフェルミオン系の類似物について検討する。決定的な役割はフェルミオンパリティ対称性 (fermion parity symmetric) によって演じられる。パリティ保存型フェルミオンクリフォードの部分群は二進体 $\mathbb{F}_2$ 上の直交群で表せることを証明し、演算子をブレイディングして生成し、任意の(偶数の)マヨラナ安定化符号を構成する方法を示す。また、このいわゆる p-クリフォード群に対するフレームポテンシャルを解析し、これはヒルベルト空間の固定パリティセクターで作用する通常のクリフォード群のフレームポテンシャルと同値であることを示した。 In quantum information science, Clifford operators and stabilizer codes play a central role for systems of qubits (or qudits). In this paper, we study the analogous objects for systems of Majorana fermions. A crucial role is played by fermion parity symmetry, which is an unbreakable symmetry present in any system in which the fundamental degrees of freedom are fermionic. We prove that the subgroup of parity-preserving fermionic Cliffords can be represented by the orthogonal group over the binary field $\mathbb{F}_2$, and we show how it can be generated by braiding operators and used to construct any (even-parity) Majorana stabilizer code. We also analyze the frame potential for this so-called p-Clifford group, proving that it is equivalent to the frame potential of the ordinary Clifford group acting on a fixed-parity sector of the Hilbert space.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
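The statement that parity-preserving fermionic Cliffords are represented by the orthogonal group over $\mathbb{F}_2$ can be illustrated with a small membership check. This is a sketch under an assumed convention, not the paper's construction: row i of a binary matrix is taken to list which Majorana operators appear in the image of the i-th Majorana operator (signs dropped), and membership is tested as $M M^{T} = I \pmod 2$.

```python
from typing import List

Matrix = List[List[int]]


def is_orthogonal_f2(m: Matrix) -> bool:
    """Return True if m is square and m * m^T equals the identity modulo 2."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(m[i][k] * m[j][k] for k in range(n)) % 2
            if dot != (1 if i == j else 0):
                return False
    return True


# Swapping two Majorana modes (what a braid does once signs are ignored).
swap = [[0, 1],
        [1, 0]]

# Each of four modes is mapped to the product of the other three.
all_but_self = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

# A shear: the first row has even weight, so it fails the check.
shear = [[1, 1],
         [0, 1]]

print(is_orthogonal_f2(swap))
print(is_orthogonal_f2(all_but_self))
print(is_orthogonal_f2(shear))
```

The diagonal condition forces every row to have odd weight, which matches the requirement that a single Majorana operator is sent to an odd-parity product.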
# A2E:ドライバーレスタクシーサービスへのアクセスのための属性に基づく匿名化認証 A2E: Attribute-based Anonymity-Enhanced Authentication for Accessing Driverless Taxi Service ( http://arxiv.org/abs/2407.11320v1 ) ライセンス: Link先を確認	Yanwei Gong, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić,	(参考訳) タクシーとしての無人車は、都市交通効率を高める可能性から注目を集めている。しかし、未管理の物理的利用者の無人タクシー(DT)による予期せぬ事故と、DTに乗る場合の個人化のニーズの両方が、ユーザアイデンティティと属性の認証を必要としている。さらに、ユーザIDのプライバシを保護し、DTの採用を強化する必要があれば、悪意のあるユーザを迅速にトレースすることは、依然として課題である。本稿では,DTサービスにアクセスするためのA2E(Attribute-based Anonymity Enhanced)認証方式を提案する。セキュリティ面から、A2Eは属性検証可能性を持ち、再実行可能なシグネチャに基づいてユーザ属性クレデンシャルを設計することで達成される。一方、この属性クレデンシャルはリンク不能と偽造不能も満足している。さらに、A2Eは、リングシグネチャとシークレット共有を利用した分散型クレデンシャル発行機構を設計し、匿名IDとの関連性からユーザ属性を保護することで、匿名性を向上した。さらに、このメカニズムはユーザに対してトレーサビリティと非フレーム性を提供します。パフォーマンス面では、悪意のあるユーザをトレースし、資格情報を更新する場合、A2Eはオーバーヘッドを低くする。さらに、スケーラビリティも軽量さも満足しており、A2Eの実践性に貢献している。我々は,A2Eのセキュリティと性能について,セキュリティ分析と性能評価を行う。 Driverless vehicle as a taxi is gaining more attention due to its potential to enhance urban transportation efficiency. However, both unforeseen incidents led by unsupervised physical users' driverless taxi (DT) rides and personalized needs of users when riding in a DT necessitate the authentication of user identity and attributes. Moreover, safeguarding user identity privacy and quickly tracing malicious users if necessary to enhance the adoption of DTs remains a challenge. This paper proposes a novel Attribute-based Anonymity Enhanced (A2E) authentication scheme for users to access DT service. From the security aspect, A2E has attribute verifiability, which is achieved by designing a user attribute credential based on redactable signature. Meanwhile, this attribute credential also satisfies unlinkability and unforgeability. In addition, A2E has enhanced anonymity, which is achieved by designing a decentralized credential issuance mechanism utilizing ring signature and secret sharing, safeguarding user attributes from association with anonymous identities. 
Moreover, this mechanism provides traceability and non-frameability to users. From the performance aspect, A2E incurs low overhead when tracing malicious users and updating credentials. It is also scalable and lightweight, which contributes to its practicality. We conduct a security analysis and a performance evaluation to assess the security and performance of A2E.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# TCFormer:Token Clustering Transformerによる視覚認識 TCFormer: Visual Recognition via Token Clustering Transformer ( http://arxiv.org/abs/2407.11321v1 ) ライセンス: Link先を確認	Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang,	(参考訳) トランスフォーマーはコンピュータビジョン領域で広く使われており、大きな成功を収めている。ほとんどの最先端のアプローチでは、イメージを通常のグリッドに分割し、各グリッド領域を視覚トークンで表現する。しかし、固定されたトークン分布は、異なる画像領域の意味を無視し、結果として準最適性能をもたらす。この問題に対処するために,意味的意味に基づく動的視覚トークンを生成するToken Clustering Transformer (TCFormer)を提案する。ダイナミックトークンには2つの重要な特徴がある:(1)類似の意味を持つ画像領域を、それらの領域が隣接していない場合でも同じ視覚トークンを用いて表現し、(2)貴重な詳細を持つ領域に集中し、細かなトークンを用いてそれらを表現する。画像分類,人物ポーズ推定,セマンティックセグメンテーション,オブジェクト検出など,さまざまな応用の広範な実験を通じて,TCFormerの有効性を実証する。この作業のコードとモデルは https://github.com/zengwang430521/TCFormer で公開されている。 Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Transformer (TCFormer), which generates dynamic vision tokens based on semantic meaning. Our dynamic tokens possess two crucial characteristics: (1) representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and representing them using fine tokens. Through extensive experimentation across various applications, including image classification, human pose estimation, semantic segmentation, and object detection, we demonstrate the effectiveness of our TCFormer. The code and models for this work are available at https://github.com/zengwang430521/TCFormer.	翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
# VISA:大規模言語モデルによるビデオオブジェクトのセグメンテーションの推論 VISA: Reasoning Video Object Segmentation via Large Language Models ( http://arxiv.org/abs/2407.11325v1 ) ライセンス: Link先を確認	Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves,	(参考訳) 既存のビデオオブジェクトセグメンテーション(VOS)は、カテゴリ、マスク、ショートフレーズなどの明示的なユーザー指示に依存しており、世界知識の推論を必要とする複雑なビデオセグメンテーションを実行する能力を制限する。本稿では,新しいタスクであるReasoning Video Object Segmentation(ReasonVOS)を紹介する。この課題は、世界知識とビデオコンテキストに基づく複雑な推論能力を必要とする暗黙のテキストクエリに応答して、セグメンテーションマスクのシーケンスを生成することを目的としている。 ReasonVOSに取り組むために,マスクデコーダを用いたビデオ内のオブジェクトのセグメンテーションと追跡機能を有しつつ,マルチモーダルLCMの世界の知識推論能力を活用するためのVISA(ビデオベース大規模言語命令セグメンテーションアシスタント)を導入する。さらに、1,042の多様なビデオから35,074の命令マスクシーケンスペアからなる総合ベンチマークを構築し、複雑な世界知識推論をReasonVOSモデルの命令チューニングと評価のためのセグメンテーションタスクに組み込む。 8つのデータセットで行った実験は、ビデオ領域と画像領域の両方において、複雑な推論セグメンテーションとバニラ参照セグメンテーションに取り組む上で、VISAの有効性を示す。コードとデータセットはhttps://github.com/cilinyan/VISAで公開されている。 Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS). This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities based on world knowledge and video contexts, which is crucial for structured environment understanding and object-centric interactions, pivotal in the development of embodied AI. To tackle ReasonVOS, we introduce VISA (Video-based large language Instructed Segmentation Assistant), to leverage the world knowledge reasoning capabilities of multi-modal LLMs while possessing the ability to segment and track objects in videos with a mask decoder. 
Moreover, we establish a comprehensive benchmark consisting of 35,074 instruction-mask sequence pairs from 1,042 diverse videos, which incorporates complex world knowledge reasoning into segmentation tasks for instruction-tuning and evaluation purposes of ReasonVOS models. Experiments conducted on 8 datasets demonstrate the effectiveness of VISA in tackling complex reasoning segmentation and vanilla referring segmentation in both video and image domains. The code and dataset are available at https://github.com/cilinyan/VISA.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# スペクトルテンソルトレインを用いたオープン量子系の高精度数値シミュレーション Accurate Numerical Simulations of Open Quantum Systems Using Spectral Tensor Trains ( http://arxiv.org/abs/2407.11327v1 ) ライセンス: Link先を確認	Ryan T. Grimm, Joel D. Eaves,	(参考訳) 量子ビット間のデコヒーレンス(英語版)は量子計算における主要なボトルネックである。デコヒーレンス(decoherence)は、内在的な量子および熱ゆらぎと、測定および準備プロセスを実行する外部磁場のノイズから生じる。固有・外部雑音に対する所定の色付き雑音スペクトルを用いて、固有・外部雑音の存在下での時間依存性雑音平均低減密度行列を解くために、量子加速度確率伝搬器評価法(Q-ASPEN)を提案する。 Q-ASPENは任意に正確であり、誤り訂正量子計算に必要なリソースを推定するために適用することができる。我々は、テンソルネットワークと疑似スペクトル法の利点を組み合わせたスペクトルテンソルトレインを、量子緩和問題に対する変分アンザッツとして使用し、ニューラルネットワークのトレーニングに一般的に使用される手法を用いてアンザッツを最適化する。 Q-ASPENのスペクトルテンソルは、数十の量子レベルで正確に計算できる。スピンボソンモデルにおけるQ-ASPENのベンチマークについて,内在ノイズの存在下でのQ-ASPENのベンチマーク,外在ノイズの存在下での最大32箇所の量子連鎖について述べる。本ベンチマークでは,Q-ASPENのメモリコストが基底関数の数よりも大きい場合,システムサイズと線形にスケールする。 Decoherence between qubits is a major bottleneck in quantum computations. Decoherence results from intrinsic quantum and thermal fluctuations as well as noise in the external fields that perform the measurement and preparation processes. With prescribed colored noise spectra for intrinsic and extrinsic noise, we present a numerical method, Quantum Accelerated Stochastic Propagator Evaluation (Q-ASPEN), to solve the time-dependent noise-averaged reduced density matrix in the presence of intrinsic and extrinsic noise. Q-ASPEN is arbitrarily accurate and can be applied to provide estimates for the resources needed to error-correct quantum computations. We employ spectral tensor trains, which combine the advantages of tensor networks and pseudospectral methods, as a variational ansatz to the quantum relaxation problem and optimize the ansatz using methods typically used to train neural networks. The spectral tensor trains in Q-ASPEN make accurate calculations with tens of quantum levels feasible. We present benchmarks for Q-ASPEN on the spin-boson model in the presence of intrinsic noise and on a quantum chain of up to 32 sites in the presence of extrinsic noise. 
In our benchmark, the memory cost of Q-ASPEN scales linearly with the system size once the number of states is larger than the number of basis functions.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# Swarmをナビゲートする:Deep Neural Networkは創発的行動を制御する Navigating the swarm: Deep neural networks command emergent behaviours ( http://arxiv.org/abs/2407.11330v1 ) ライセンス: Link先を確認	Dongjo Kim, Jeongsu Lee, Ho-Young Kim,	(参考訳) 複雑な系の相互作用する個人はしばしば、協調した大域構造を示すコヒーレントな運動を引き起こす。このような現象は、細胞移動、細菌群集、動物や昆虫群、さらには人間社会など、自然界で広く見られる。集団行動の出現に寄与する主要なメカニズムは、平均的または相対的な速度に基づく局所的なアライメント、距離に基づく電位のような非局所的な相互反発的相互作用、局所的な相互作用と非局所的な相互作用の相互作用、認知に基づく不均一な相互作用などである。しかし、これらのメカニズムを創発的行動に適応させる方法を見つけることは、いまだ解明されていない。ここでは、エージェント間相互作用ルールを微調整することにより、目的とする大域的パターンを用いて、所望のタイミングで集団行動の協調構造を生成できることを実証する。我々の戦略は、望ましい集合構造を指示する相互作用規則を見つけるために、ダイナミックスの法則に従うディープニューラルネットワークを用いています。相互作用規則の分散と整合力への分解は、多項式級数で表される、望ましい相互作用モデルを提案するためにニューラルネットワークのトレーニングを促進する。代表的な例としては、渦群におけるクラスターの平均半径と大きさの変更、ランダム状態から順序状態への遷移のタイミング、集団運動の典型的なモードの連続的なシフトがある。この戦略は、集合的なモードを重畳するためにも利用でき、その結果、探索されていないが、保護的なセキュリティ形成のような非常に実用的なハイブリッドな集団パターンが生まれる。本研究は, ロボット群操作, アクティブ物質組織, 生体システムにおける不明瞭な相互作用ルールの解明における新たな応用の道を開くことを目的とした, 集団動作の生成と制御のための革新的な戦略を明らかにするものである。 Interacting individuals in complex systems often give rise to coherent motion exhibiting coordinated global structures. Such phenomena are ubiquitously observed in nature, from cell migration, bacterial swarms, animal and insect groups, and even human societies. Primary mechanisms responsible for the emergence of collective behavior have been extensively identified, including local alignments based on average or relative velocity, non-local pairwise repulsive-attractive interactions such as distance-based potentials, interplay between local and non-local interactions, and cognitive-based inhomogeneous interactions. However, discovering how to adapt these mechanisms to modulate emergent behaviours remains elusive. Here, we demonstrate that it is possible to generate coordinated structures in collective behavior at desired moments with intended global patterns by fine-tuning an inter-agent interaction rule. 
Our strategy employs deep neural networks, obeying the laws of dynamics, to find interaction rules that command desired collective structures. The decomposition of interaction rules into distancing and aligning forces, expressed by polynomial series, facilitates the training of neural networks to propose desired interaction models. Presented examples include altering the mean radius and size of clusters in vortical swarms, timing of transitions from random to ordered states, and continuously shifting between typical modes of collective motions. This strategy can even be leveraged to superimpose collective modes, resulting in hitherto unexplored but highly practical hybrid collective patterns, such as protective security formations. Our findings reveal innovative strategies for creating and controlling collective motion, paving the way for new applications in robotic swarm operations, active matter organisation, and for the uncovering of obscure interaction rules in biological systems.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# LaMI-DETR:言語モデル命令によるオープン語彙検出 LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction ( http://arxiv.org/abs/2407.11335v1 ) ライセンス: Link先を確認	Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu,	(参考訳) 既存の手法は、CLIPのような視覚言語モデル(VLM)の頑健なオープン語彙認識能力を活用することで、オープン語彙オブジェクト検出を強化している。しかし、主に2つの課題がある。(1) 概念表現の不足。CLIPのテキスト空間内のカテゴリ名はテキスト的・視覚的知識を欠いている。(2) 基本カテゴリへの過剰適合傾向。VLMから検出器への転移の際、オープン語彙の知識が基本カテゴリに偏る。これらの課題に対処するため、視覚的概念間の関係を活用する言語モデル命令(LaMI)戦略を提案し、LaMI-DETRと呼ぶ単純かつ効果的なDETR型検出器に適用する。LaMIはGPTを利用して視覚的概念を構築し、T5を用いてカテゴリ間の視覚的類似性を調査する。これらのカテゴリ間関係は概念表現を洗練し、基本カテゴリへの過剰適合を回避する。包括的な実験により、外部のトレーニングリソースに依存しない同一の厳密な設定において、本手法が既存手法より優れた性能を示すことを確認した。LaMI-DETRはOV-LVISで43.4のレアボックスAPを達成し、従来の最良値を7.8レアボックスAP上回った。 Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vocabulary knowledge biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector, termed LaMI-DETR. LaMI utilizes GPT to construct visual concepts and employs T5 to investigate visual similarities across categories. These inter-category relationships refine concept representation and avoid overfitting to base categories. Comprehensive experiments validate our approach's superior performance over existing methods in the same rigorous setting without reliance on external training resources. LaMI-DETR achieves a rare box AP of 43.4 on OV-LVIS, surpassing the previous best by 7.8 rare box AP.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# オンラインCenterLineグラフ学習のための連続性保存 Continuity Preserving Online CenterLine Graph Learning ( http://arxiv.org/abs/2407.11337v1 ) ライセンス: Link先を確認	Yunhui Han, Kun Yu, Zhiwei Li,	(参考訳) 通常、中心線グラフによってモデル化されるレーントポロジーは、ハイレベルな自律運転には不可欠である。高品質グラフでは、トポロジー接続性と中心線セグメントの空間連続性の両方が重要である。しかし、既存のアプローチのほとんどは、連続性を無視しながら接続性に注意を払っている。このような中心線グラフは、通常、自律運転の計画に問題を引き起こす。この問題を解決するために,3つの主要なモジュールを持つエンドツーエンドネットワークCGNetを提案する。1) 接合点を正確に予測するための位置の事前情報を提供するJunction Aware Query Enhancementモジュール,2) トポロジー的に接続された任意の2つのセグメントにベジエ空間で連続性制約を課すBézier Space Connectionモジュール,3) 予測されたトポロジー接続を反復的に改善する,メモリを備えたグラフベースのネットワークであるIterative Topology Refinementモジュールである。 CGNetはnuScenesとArgoverse2データセットの両方で最先端のパフォーマンスを実現している。 Lane topology, which is usually modeled by a centerline graph, is essential for high-level autonomous driving. For a high-quality graph, both topology connectivity and spatial continuity of centerline segments are critical. However, most existing approaches pay more attention to connectivity while neglecting continuity. Such centerline graphs usually cause problems for the planning of autonomous driving. To overcome this problem, we present an end-to-end network, CGNet, with three key modules: 1) a Junction Aware Query Enhancement module, which provides a positional prior to accurately predict junction points; 2) a Bézier Space Connection module, which enforces continuity constraints on any two topologically connected segments in a Bézier space; 3) an Iterative Topology Refinement module, which is a graph-based network with memory to iteratively refine the predicted topological connectivity. CGNet achieves state-of-the-art performance on both nuScenes and Argoverse2 datasets.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# 結合回転共振器チェーンにおける非相互単光子バンド構造 Nonreciprocal Single-Photon Band Structure in a Coupled-Spinning-Resonator chain ( http://arxiv.org/abs/2407.11339v1 ) ライセンス: Link先を確認	Jing Li, Ya Yang, Xun Wei Xu, Jing Lu, Hui Jing, Lan Zhou,	(参考訳) 単光子バンド構造と1次元結合回転共振器チェーンにおける単一光子の輸送を解析した。共振器チェーンの時間反転対称性は、外部磁場や合成磁場の代わりに共振器の回転によって破られる。結合回転共振器チェーンでは、回転共振器の角速度に幅が依存する2つの非相互単光子バンドギャップが得られる。非相互バンドギャップに基づいて、複数の周波数窓に単一光子サーキュレータを実装でき、異なるバンドギャップに対して光子の循環方向が反対である。さらに、すべての共振器が等角速度で同じ方向に回転する場合、結合回転共振器チェーンで相互的な単光子バンド構造を実現できる。我々の研究は、非相互または相互の単光子バンド構造を達成、操作、切り替える新しい経路を開き、新しい単光子デバイスを実現する新しい機会を提供する。 We analyze the single-photon band structure and the transport of a single photon in a one-dimensional coupled-spinning-resonator chain. The time-reversal symmetry of the resonator chain is broken by the spinning of the resonators, rather than by an external or synthetic magnetic field. Two nonreciprocal single-photon band gaps can be obtained in the coupled-spinning-resonator chain, whose widths depend on the angular velocity of the spinning resonators. Based on the nonreciprocal band gaps, we can implement a single-photon circulator at multiple frequency windows, and the direction of photon cycling is opposite for different band gaps. In addition, reciprocal single-photon band structures can also be realized in the coupled-spinning-resonator chain when all resonators rotate in the same direction with equal angular velocity. Our work opens a new route to achieve, manipulate, and switch nonreciprocal or reciprocal single-photon band structures, and provides new opportunities to realize novel single-photon devices.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# Ev-GS:高効率かつ高精度な放射場レンダリングのためのイベントベースガウススプラッティング Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering ( http://arxiv.org/abs/2407.11343v1 ) ライセンス: Link先を確認	Jingqian Wu, Shuo Zhu, Chutian Wang, Edmund Y. Lam,	(参考訳) イベントカメラを用いたコンピュータニューロモルフィックイメージング(CNI)は、従来のフレームベースの手法と比較して、最小の運動ぼけや拡張されたダイナミックレンジのような利点を提供する。既存のイベントベースレーダランス場レンダリング手法は, 計算的に重く, 再構成速度が遅いニューラルレーダランス場上に構築されている。この2つの側面を動機として,単眼イベントカメラから3次元ガウススプラッティングを推定する初のCNIインフォームドスキームであるEv-GSを導入する。純粋なイベントベースの監視で3Dガウシアンを活用することで、Ev-GSは高速移動物体の検出や照明不足といった課題を克服する。実験結果から,フレームベースの信号を入力として,ぼやけを低減し,視覚的品質を向上させたリアルなビューをレンダリングすることで,Ev-GSの精度が向上することが示された。さらに,信号処理に高効率なCNIアプローチを採用する既存手法と比較して,競争力のある再構成品質と計算能力の低下を示す。 Computational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance field, which is computationally heavy and slow in reconstruction speed. Motivated by the two aspects, we introduce Ev-GS, the first CNI-informed scheme to infer 3D Gaussian splatting from a monocular event camera, enabling efficient novel view synthesis. Leveraging 3D Gaussians with pure event-based supervision, Ev-GS overcomes challenges such as the detection of fast-moving objects and insufficient lighting. Experimental results show that Ev-GS outperforms the method that takes frame-based signals as input by rendering realistic views with reduced blurring and improved visual quality. Moreover, it demonstrates competitive reconstruction quality and reduced computing occupancy compared to existing methods, which paves the way to a highly efficient CNI approach for signal processing.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# あらゆるモダリティの価値を集中する:効率的かつ弾力的なモダリティ非依存セマンティックセマンティックセマンティックセグメンテーションを目指して Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation ( http://arxiv.org/abs/2407.11344v1 ) ライセンス: Link先を確認	Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang,	(参考訳) 任意の数のモダリティを融合させることは、セマンティックセグメンテーションの堅牢なマルチモーダル融合を実現する上で不可欠である。最近の試みでは、RGBのモダリティを中心とみなし、その他を補助的とみなし、2つの枝を持つ非対称なアーキテクチャを生み出している。しかし、RGBのモダリティは特定の状況、例えば夜間、他の状況、例えばイベントデータ、それらのメリットを所有する状況で苦労する可能性があるため、融合モデルが堅牢で脆弱なモダリティを識別し、回復力のあるマルチモーダルフレームワークを学ぶために最も堅牢で脆弱なモダリティを組み込むのは必須である。そこで本研究では,コンパクトモデルから高性能モデルに至るまで,様々なバックボーンと柔軟にペアリングできるMAGICという新しい手法を提案する。本手法は2つの重要なプラグアンドプレイモジュールから構成される。まず,マルチモーダルバッチの特徴を効率的に処理し,補完的なシーン情報を抽出する多モーダルアグリゲーションモジュールを提案する。さらに、類似度スコアに基づいて、複数のモーダル特徴をランク付けするベンチマークとして、集約された特徴を利用するために、統一された任意のモーダル選択モジュールを提案する。このようにして、RGBのモダリティへの依存を排除し、セグメンテーション性能を確保しつつ、センサの故障を克服することができる。一般に検討されているマルチモーダル設定では,モデルパラメータを60%削減しつつ,最先端の性能を実現する。さらに,本手法は,<19.41% mIoU>の大きなマージンで先行芸術を上回り,モダリティに依存しない新しい環境において優れている。 Fusing an arbitrary number of modalities is vital for achieving robust multi-modal fusion of semantic segmentation yet remains less explored to date. Recent endeavors regard RGB modality as the center and the others as the auxiliary, yielding an asymmetric architecture with two branches. However, the RGB modality may struggle in certain circumstances, e.g., nighttime, while others, e.g., event data, own their merits; thus, it is imperative for the fusion model to discern robust and fragile modalities, and incorporate the most robust and fragile ones to learn a resilient multi-modal framework. To this end, we propose a novel method, named MAGIC, that can be flexibly paired with various backbones, ranging from compact to high-performance models. Our method comprises two key plug-and-play modules. Firstly, we introduce a multi-modal aggregation module to efficiently process features from multi-modal batches and extract complementary scene information. 
On top of this, a unified arbitrary-modal selection module is proposed that uses the aggregated features as the benchmark to rank the multi-modal features by their similarity scores. This way, our method can eliminate the dependence on the RGB modality and better overcome sensor failures while ensuring the segmentation performance. Under the commonly considered multi-modal setting, our method achieves state-of-the-art performance while reducing the model parameters by 60%. Moreover, our method is superior in the novel modality-agnostic setting, where it outperforms prior art by a large margin of +19.41% mIoU.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
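The selection step described in the MAGIC abstract (rank each modality's features by similarity to the aggregated features) can be sketched as below. This is only an illustration under stated assumptions: aggregation is taken to be a plain mean, similarity is taken to be cosine similarity, and the feature vectors and the `rank_modalities` helper are hypothetical rather than the authors' modules.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_modalities(features: Dict[str, List[float]]) -> List[str]:
    """Rank modalities by similarity to the mean feature (most similar first)."""
    names = list(features)
    dim = len(features[names[0]])
    # Assumed aggregation: element-wise mean over all modalities.
    aggregated = [sum(features[n][d] for n in names) / len(names) for d in range(dim)]
    return sorted(names, key=lambda n: cosine(features[n], aggregated), reverse=True)


# Hypothetical 2-D features for three modalities of the same scene.
features = {
    "rgb": [1.0, 0.0],
    "depth": [0.8, 0.6],
    "event": [0.0, 1.0],
}
print(rank_modalities(features))
```

Because no modality is hard-coded as the reference, the ranking still works when the RGB entry is missing or degraded, which is the property the abstract emphasizes.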
# Beyond Binary: Generative Pretrained Transformer と End-to-End Model を用いたマルチクラスパラパシア検出 Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models ( http://arxiv.org/abs/2407.11345v1 ) ライセンス: Link先を確認	Matthew Perez, Aneesha Sampath, Minxue Niu, Emily Mower Provost,	(参考訳) 失語症(英: Aphasia)は、言語障害の一種で、失語、置換、または単語の発明を含むパラ失語と呼ばれる言語エラーを引き起こす。自動失語症検出は、臨床評価と治療計画の選択肢を促進することで失語症の患者を助けることができる。しかし、ほとんどの自動失語症検出作業はバイナリー検出のみに焦点を当てており、失語症の有無のみを認識する必要がある。マルチクラス失語症検出は、複数のタイプの失語症を特定し、特定の音声セグメントでそれらがどこで起こるかに焦点を当てた、探索されていない研究領域である。本稿では、生成事前学習型トランスフォーマー(GPT)を用いて書き起こしから失語を識別する手法と、自動音声認識(ASR)と失語症分類の両方を1つのシーケンスに対して複数のシーケンスとしてモデル化することに焦点を当てた2つのエンドツーエンドアプローチを提案する。単一シーケンスモデルはマルチクラスパラパシア検出においてGPTベースラインより優れていることを示す。 Aphasia is a language disorder that can lead to speech errors known as paraphasias, which involve the misuse, substitution, or invention of words. Automatic paraphasia detection can help those with Aphasia by facilitating clinical assessment and treatment planning options. However, most automatic paraphasia detection works have focused solely on binary detection, which involves recognizing only the presence or absence of a paraphasia. Multiclass paraphasia detection represents an unexplored area of research that focuses on identifying multiple types of paraphasias and where they occur in a given speech segment. We present novel approaches that use a generative pretrained transformer (GPT) to identify paraphasias from transcripts as well as two end-to-end approaches that focus on modeling both automatic speech recognition (ASR) and paraphasia classification as multiple sequences vs. a single sequence. We demonstrate that a single sequence model outperforms GPT baselines for multiclass paraphasia detection.	翻訳日:2024-07-17 18:42:16 公開日:2024-07-16
# I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM ( http://arxiv.org/abs/2407.11347v1 ) License: see link	Gwangtak Bae, Changwoon Choi, Hyeongjun Heo, Sang Min Kim, Young Min Kim,	We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines for casually captured scenarios. Casual video captures often suffer from motion blur and varying appearances, which degrade the final quality of the coherent 3D visual representation. We propose integrating the physical imaging process into the SLAM system, which employs linear HDR radiance maps to collect measurements. Specifically, individual frames aggregate images of multiple poses along the camera trajectory to explain prevalent motion blur in hand-held videos. Additionally, we accommodate per-frame appearance variation by dedicating explicit variables for image formation steps, namely white balance, exposure time, and camera response function. Through joint optimization of additional variables, the SLAM pipeline produces high-quality images with more accurate trajectories. Extensive experiments demonstrate that our approach can be incorporated into recent visual SLAM pipelines using various scene representations, such as neural radiance fields or Gaussian splatting.	Translated: 2024-07-17 18:42:16 Published: 2024-07-16
# Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation ( http://arxiv.org/abs/2407.11348v1 ) License: see link	Seo-Bin Hwang, Han-Young Kim, Chae-Yeon Heo, Hie-Yong Jung, Sung-Ju Jung, Yeong-Jun Cho,	The flatfish is a major farmed species consumed globally in large quantities. However, due to the densely populated farming environment, flatfish are susceptible to injuries and diseases, making early disease detection crucial. Traditionally, diseases were detected through visual inspection, but observing large numbers of fish is challenging. Automated approaches based on deep learning technologies have been widely used to address this problem, but accurate detection remains difficult due to the diversity of the fish and the lack of fish disease datasets. This study augments fish disease images using generative adversarial networks and image harmonization methods. Next, disease detectors are trained separately for three body parts (head, fins, and body) to address individual diseases properly. In addition, a flatfish disease image dataset called FlatIMG is created, and the proposed methods are verified on it. A flash salmon disease dataset is also tested to validate the generalizability of the proposed methods. The results achieved 12% higher performance than the baseline framework. 
This study is the first attempt to create a large-scale flatfish disease image dataset and propose an effective disease detection framework. Automatic disease monitoring could be achieved in farming environments based on the proposed methods and dataset.	Translated: 2024-07-17 18:42:16 Published: 2024-07-16
# Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities ( http://arxiv.org/abs/2407.11351v1 ) License: see link	Xu Zheng, Yuanhuiyi Lyu, Lin Wang,	Image modality is not perfect as it often fails in certain conditions, e.g., night and fast motion. This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) semantic segmentation methods when confronting modality absence or failure, as often occurs in real-world applications. Inspired by the open-world learning capability of multi-modal vision-language models (MVLMs), we explore a new direction in learning the modality-agnostic representation via knowledge distillation (KD) from MVLMs. Intuitively, we propose Any2Seg, a novel framework that can achieve robust segmentation from any combination of modalities in any visual conditions. Specifically, we first introduce a novel language-guided semantic correlation distillation (LSCD) module to transfer both inter-modal and intra-modal semantic knowledge in the embedding space from MVLMs, e.g., LanguageBind. This enables us to minimize the modality gap and alleviate semantic ambiguity to combine any modalities in any visual conditions. 
Then, we introduce a modality-agnostic feature fusion (MFF) module that reweights the multi-modal features based on the inter-modal correlation and selects the fine-grained feature. This way, our Any2Seg finally yields an optimal modality-agnostic representation. Extensive experiments on two benchmarks with four modalities demonstrate that Any2Seg achieves the state of the art under the multi-modal setting (+3.54 mIoU) and excels in the challenging modality-incomplete setting (+19.79 mIoU).	Translated: 2024-07-17 18:42:16 Published: 2024-07-16
# Preconditioned Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression ( http://arxiv.org/abs/2407.11353v1 ) License: see link	Yingzhen Yang,	We consider nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) or its variant in this paper. We show that, if the neural network is trained with a novel Preconditioned Gradient Descent (PGD) with early stopping and the target function has spectral bias widely studied in the deep learning literature, the trained network renders a particularly sharp generalization bound with a minimax optimal rate of $\mathcal{O}({1}/{n^{4\alpha/(4\alpha+1)}})$, which is sharper than the current standard rate of $\mathcal{O}({1}/{n^{2\alpha/(2\alpha+1)}})$ with $2\alpha = d/(d-1)$ when the data is distributed uniformly on the unit sphere in $\mathbb{R}^d$ and $n$ is the size of the training data. 
When the target function has no spectral bias, we prove that a neural network trained with regular GD with early stopping still enjoys the minimax optimal rate, and in this case our results do not require distributional assumptions, in contrast with the currently known results. Our results are built upon two significant technical contributions. First, uniform convergence to the NTK is established during the training process by PGD or GD, so that we can have a nice decomposition of the neural network function at any step of GD or PGD into a function in the RKHS and an error function with a small $L^{\infty}$-norm. Second, local Rademacher complexity is employed to tightly bound the Rademacher complexity of the function class comprising all the possible neural network functions obtained by GD or PGD. Our results also indicate that PGD can be another way of avoiding the usual linear regime of NTK and obtaining a sharper generalization bound, because PGD induces a different kernel with lower kernel complexity during the training than the regular NTK induced by the network architecture trained by regular GD.	Translated: 2024-07-17 18:42:16 Published: 2024-07-16
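As a quick check on the rate comparison quoted in the entry above, the two exponents can be simplified by substituting $2\alpha = d/(d-1)$. This is a worked substitution under that stated assumption (uniform data on the unit sphere, $d > 1$), not a derivation taken from the paper.

```latex
\[
\frac{4\alpha}{4\alpha+1}
  = \frac{2d/(d-1)}{2d/(d-1)+1}
  = \frac{2d}{3d-1},
\qquad
\frac{2\alpha}{2\alpha+1}
  = \frac{d/(d-1)}{d/(d-1)+1}
  = \frac{d}{2d-1}.
\]
\[
\frac{2d}{3d-1} - \frac{d}{2d-1}
  = \frac{2d(2d-1) - d(3d-1)}{(3d-1)(2d-1)}
  = \frac{d(d-1)}{(3d-1)(2d-1)} > 0
  \quad \text{for } d > 1 .
\]
```

Since the first exponent is strictly larger, $n^{-4\alpha/(4\alpha+1)}$ decays faster than $n^{-2\alpha/(2\alpha+1)}$; for example, $d = 2$ gives exponents $4/5$ versus $2/3$.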
# The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2407.11356v1 ) License: see link	Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao,	Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions; its success depends on precisely harnessing unlabeled data while simultaneously improving generalization. In this work, we observe that domain shifts between medical institutions cause disparate feature statistics, which significantly deteriorates pseudo-label quality due to an unexpected normalization process. Nevertheless, this phenomenon could be exploited to facilitate unseen domain generalization. Therefore, we propose 1) multiple statistics-individual branches to mitigate the impact of domain shifts for reliable pseudo-labels and 2) one statistics-aggregated branch for domain-invariant feature learning. 
Furthermore, to simulate unseen domains with statistics difference, we approach this from two aspects, i.e., a perturbation with histogram matching at the image level and a random batch normalization selection strategy at the feature level, producing diverse statistics to expand the training distribution. Evaluation results on three medical image datasets demonstrate the effectiveness of our method compared with recent SOTA methods. The code is available at https://github.com/qiumuyang/SIAB.	Translated: 2024-07-17 18:42:16 Published: 2024-07-16
# SES: Bridging the Gap Between Explainability and Prediction of Graph Neural Networks ( http://arxiv.org/abs/2407.11358v1 ) License: see link	Zhenhua Huang, Kunhao Li, Shaojie Wang, Zhaohong Jia, Wentao Zhu, Sharad Mehrotra,	Despite the proficiency of Graph Neural Networks (GNNs) in analyzing graph data, achieving high-accuracy and interpretable predictions remains challenging. Existing GNN interpreters typically provide post-hoc explanations disjointed from GNNs' predictions, resulting in misrepresentations. Self-explainable GNNs offer built-in explanations during the training process. However, they cannot exploit the explanatory outcomes to augment prediction performance, they fail to provide high-quality explanations of node features, and they require additional processes to generate explainable subgraphs, which is costly. To address the aforementioned limitations, we propose a self-explained and self-supervised graph neural network (SES) to bridge the gap between explainability and prediction. SES comprises two processes: explainable training and enhanced predictive learning. During explainable training, SES employs a global mask generator co-trained with a graph encoder and directly produces crucial structure and feature masks, reducing time consumption and providing node feature and subgraph explanations. 
In the enhanced predictive learning phase, mask-based positive-negative pairs are constructed utilizing the explanations to compute a triplet loss and enhance the node representations by contrastive learning.	Translated: 2024-07-17 18:42:16 Published: 2024-07-16
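The triplet loss mentioned in the SES entry above can be sketched in its common margin-based form. This is a minimal illustration only: the Euclidean distance, the margin value, and the function names are assumptions, and the abstract does not say how SES builds its mask-based pairs or which distance it uses.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (hypothetical helper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Positive is far (distance 5), negative is close (distance 1): large loss.
hard_case = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

# Positive is close (distance 1), negative is far (distance 5): no loss.
easy_case = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

Minimizing this quantity pulls the anchor representation toward the positive sample and pushes it away from the negative one, which is the general mechanism behind contrastive refinement of node embeddings.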
# Feature Inference Attack on Shapley Values ( http://arxiv.org/abs/2407.11359v1 ) License: see link	Xinjian Luo, Yangfan Jiang, Xiaokui Xiao,	As a solution concept in cooperative game theory, the Shapley value is highly recognized in model interpretability studies and widely adopted by the leading Machine Learning as a Service (MLaaS) providers, such as Google, Microsoft, and IBM. However, although Shapley value-based model interpretability methods have been thoroughly studied, few researchers consider the privacy risks incurred by Shapley values, even though interpretability and privacy are two foundations of machine learning (ML) models. In this paper, we investigate the privacy risks of Shapley value-based model interpretability methods using feature inference attacks: reconstructing the private model inputs based on their Shapley value explanations. Specifically, we present two adversaries. The first adversary can reconstruct the private inputs by training an attack model based on an auxiliary dataset and black-box access to the model interpretability services. 
The second adversary, even without any background knowledge, can successfully reconstruct most of the private features by exploiting the local linear correlations between the model inputs and outputs. We perform the proposed attacks on the leading MLaaS platforms, i.e., Google Cloud, Microsoft Azure, and IBM aix360. The experimental results demonstrate the vulnerability of the state-of-the-art Shapley value-based model interpretability methods used in the leading MLaaS platforms and highlight the significance and necessity of designing privacy-preserving model interpretability methods in future studies. To the best of our knowledge, this is also the first work that investigates the privacy risks of Shapley values.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
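To make the leakage intuition in the entry above concrete, the following toy sketch computes exact Shapley values for a small linear model and then inverts them. It is not the paper's attack: the model, weights, baseline, and helper names are all hypothetical, and the inversion assumes the attacker knows the weights and baseline, which the paper's adversaries do not require. It only illustrates that, for a linear model, each Shapley value is a linear function of the corresponding private feature.

```python
from itertools import permutations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous_output = f(current)
        for i in order:
            current[i] = x[i]            # feature i joins the coalition
            output = f(current)
            phi[i] += output - previous_output  # marginal contribution of i
            previous_output = output
    return [total / factorial(n) for total in phi]


# Hypothetical linear model: f(x) = w . x + b
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = 0.25


def linear_model(features):
    return sum(w * v for w, v in zip(WEIGHTS, features)) + BIAS


def invert_linear_explanation(phi, weights, baseline):
    """For a linear model phi_i = w_i * (x_i - baseline_i), so x_i is recoverable.

    Requires every weight to be nonzero.
    """
    return [b + p / w for p, w, b in zip(phi, weights, baseline)]


private_x = [1.5, -2.0, 4.0]
baseline = [0.0, 1.0, 2.0]

phi = shapley_values(linear_model, private_x, baseline)
recovered = invert_linear_explanation(phi, WEIGHTS, baseline)
```

Here `recovered` matches `private_x`, which shows why an explanation vector can act as a side channel for the inputs when the model is locally close to linear.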
# Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias ( http://arxiv.org/abs/2407.11360v1 ) License: see link	Waqar Hussain,	The interplay between humans and Generative AI (Gen AI) draws an insightful parallel with the dynamic relationship between giraffes and acacias on the African Savannah. Just as giraffes navigate the acacia's thorny defenses to gain nourishment, humans engage with Gen AI, maneuvering through ethical and operational challenges to harness its benefits. This paper explores how, like young giraffes that are still mastering their environment, humans are in the early stages of adapting to and shaping Gen AI. It delves into the strategies humans are developing and refining to help mitigate risks such as bias, misinformation, and privacy breaches, strategies that influence and shape Gen AI's evolution. While the giraffe-acacia analogy aptly frames human-AI relations, it contrasts nature's evolutionary perfection with the inherent flaws of human-made technology and the tendency of humans to misuse it, giving rise to many ethical dilemmas. Through the HHH framework we identify pathways to embed values of helpfulness, honesty, and harmlessness in AI development, fostering safety-aligned agents that resonate with human values. 
This narrative presents a cautiously optimistic view of human resilience and adaptability, illustrating our capacity to harness technologies and implement safeguards effectively, without succumbing to their perils. It emphasises a symbiotic relationship where humans and AI continually shape each other for mutual benefit.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
# Graph Structure Prompt Learning: A Novel Methodology to Improve Performance of Graph Neural Networks ( http://arxiv.org/abs/2407.11361v1 ) License: see link	Zhenhua Huang, Kunhao Li, Shaojie Wang, Zhaohong Jia, Wentao Zhu, Sharad Mehrotra,	Graph neural networks (GNNs) are widely applied in graph data modeling. However, existing GNNs are often trained in a task-driven manner that fails to fully capture the intrinsic nature of the graph structure, resulting in sub-optimal node and graph representations. To address this limitation, we propose a novel Graph structure Prompt Learning method (GPL) to enhance the training of GNNs, which is inspired by prompt mechanisms in natural language processing. GPL employs task-independent graph structure losses to encourage GNNs to learn intrinsic graph characteristics while simultaneously solving downstream tasks, producing higher-quality node and graph representations. In extensive experiments on eleven real-world datasets, after being trained by GPL, GNNs significantly outperform their original performance on node classification, graph classification, and edge prediction tasks (up to 10.28%, 16.5%, and 24.15%, respectively). 
By allowing GNNs to capture the inherent structural prompts of graphs in GPL, they can alleviate the issue of over-smoothing and achieve new state-of-the-art performance, which introduces a novel and effective direction for GNN research with potential applications in various domains.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
# Learning-augmented Maximum Independent Set ( http://arxiv.org/abs/2407.11364v1 ) License: see link	Vladimir Braverman, Prathamesh Dharangutte, Vihan Shah, Chen Wang,	We study the Maximum Independent Set (MIS) problem on general graphs within the framework of learning-augmented algorithms. The MIS problem is known to be NP-hard and is also NP-hard to approximate to within a factor of $n^{1-\delta}$ for any $\delta>0$. We show that we can break this barrier in the presence of an oracle obtained through predictions from a machine learning model that answers vertex membership queries for a fixed MIS with probability $1/2+\varepsilon$. In the first setting we consider, the oracle can be queried once per vertex to know if a vertex belongs to a fixed MIS, and the oracle returns the correct answer with probability $1/2 + \varepsilon$. Under this setting, we show an algorithm that obtains an $\tilde{O}(\sqrt{\Delta}/\varepsilon)$-approximation in $O(m)$ time where $\Delta$ is the maximum degree of the graph. In the second setting, we allow multiple queries to the oracle for a vertex, each of which is correct with probability $1/2 + \varepsilon$. For this setting, we show an $O(1)$-approximation algorithm using $O(n/\varepsilon^2)$ total queries and $\tilde{O}(m)$ runtime.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
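The second setting in the entry above, where each vertex may be queried repeatedly, suggests the familiar idea of amplifying a noisy answer by majority vote. The sketch below illustrates that idea only and is not the paper's algorithm: the simulated oracle, the greedy clean-up step, the number of repetitions, and all names are assumptions made for illustration, and this naive version uses far more queries than the $O(n/\varepsilon^2)$ total reported in the abstract.

```python
import random


def make_noisy_oracle(fixed_mis, eps, rng):
    """Simulated oracle: answers 'is v in the fixed MIS?' correctly w.p. 1/2 + eps."""
    def oracle(v):
        truth = v in fixed_mis
        return truth if rng.random() < 0.5 + eps else not truth
    return oracle


def majority_vote(oracle, v, repetitions):
    """Amplify a noisy answer by asking `repetitions` times (use an odd number)."""
    yes_votes = sum(1 for _ in range(repetitions) if oracle(v))
    return 2 * yes_votes > repetitions


def greedy_independent_set(adj, candidates):
    """Keep candidates greedily so the output is always an independent set."""
    chosen = set()
    for v in sorted(candidates):
        if all(u not in chosen for u in adj[v]):
            chosen.add(v)
    return chosen


# Path graph 0-1-2-3-4-5 with the fixed MIS {0, 2, 4}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rng = random.Random(0)
oracle = make_noisy_oracle({0, 2, 4}, eps=0.3, rng=rng)

predicted = {v for v in adj if majority_vote(oracle, v, repetitions=201)}
independent = greedy_independent_set(adj, predicted)
```

The greedy pass guarantees that the returned set is independent even if some votes come out wrong; the quality of the set then depends on how many predictions were correct.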
# Perceived Importance of ICT Proficiency for Teaching, Learning, and Career Progression among Physical Education Teachers in Pampanga ( http://arxiv.org/abs/2407.11366v1 ) License: see link	Kristine Joy D. Magallanes, Mark Brianne C. Carreon, Kristalyn C. Miclat, Niña Vina V. Salita, Gino A. Sumilhig, Raymart Christopher C. Guevarra, John Paul P. Miranda,	The integration of information and communication technology (ICT) has become increasingly vital across various educational fields, including physical education (PE). This study aimed to evaluate the proficiency levels of PE teachers in using various ICT applications and to examine the relationship between the perceived importance of ICT proficiency for teaching and learning, career advancement, and actual proficiency among senior high school PE teachers in the municipality of Mexico, Pampanga. This study employed a quantitative descriptive approach. PE teachers from the municipality of Mexico, Pampanga, were selected as the respondents. This study used a two-part survey. 
The first section collected demographic data, such as age, gender, rank/position, and years of teaching experience, and the second section assessed ICT skill levels and the perceived importance of ICT in teaching, learning, and career progression. The results revealed that the majority of PE teachers had access to ICT resources. However, their proficiency levels with these tools varied significantly. Factors such as age, teaching experience, and professional position were found to significantly influence teachers' proficiency and their perceptions of the benefits of ICT integration in PE instruction. The study provided a glimpse of the current state of ICT integration among senior high school PE teachers in Mexico, Pampanga, Philippines. It also highlights areas for improvement. The study suggests that policymakers, administrators, and training program developers should focus on enhancing the ICT proficiency of PE teachers to improve teaching practices and student engagement. Enhancing the ICT proficiency of PE teachers is recommended to foster better teaching experiences, increase student engagement, and promote overall educational outcomes.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
# Enhancement of nonclassical properties of two-mode squeezed vacuum state with postselected von Neumann measurement ( http://arxiv.org/abs/2407.11367v1 ) License: see link	Janarbek Yuanbek, Yi-Fang Ren, Ahmad Abliz, Yusuf Turek,	We investigate the effects of weak value amplification on the nonclassical properties of the two-mode squeezed vacuum state. We also show the advantages of postselected weak measurements based on the two-mode squeezed vacuum state.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
# Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach ( http://arxiv.org/abs/2407.11368v1 ) License: see link	Sojung Lucia Kim, Taehong Jang, Joonmo Ahn,	This study aims to compare three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) a proposed inter-methodological approach, namely a statistical machine translation method using sentence piece tokens derived from a unified source-target corpus. The proposed approach achieves a BLEU score of 36.71, surpassing the scores of SOLAR-10.7B in-context learning and the best existing Seq2Seq model. Further analysis and discussion are presented.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
# A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora ( http://arxiv.org/abs/2407.11370v1 ) License: see link	Kentaro Onda, Joonyong Park, Nobuaki Minematsu, Daisuke Saito,	We propose a method of simulating the human process of foreign accentuation using a Generative Spoken Language Model (GSLM) with only native speech corpora. When one listens to spoken words of a foreign language and repeats them, the repeated speech often carries the accent of that listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units of the L1, and those units are used for oral reproduction. We simulate this process by inputting speech of language A into a GSLM of language B to add B's accent onto the input speech. The process of running ASR of the L1 for foreign input speech and giving the ASR result to TTS of the L1 can be viewed as a naive implementation of this approach. The results of our experiments show that the synthesized accent of the output speech is highly natural, compared to real samples of A generated by speakers whose L1 is B, and that the degree of accentuation is controllable.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
# Estimating Agreement by Chance for Sequence Annotation ( http://arxiv.org/abs/2407.11371v1 ) License: see link	Diya Li, Carolyn Rosé, Ao Yuan, Chunxiao Zhou,	In the field of natural language processing, correction of performance assessment for chance agreement plays a crucial role in evaluating the reliability of annotations. However, there is a notable dearth of research focusing on chance correction for assessing the reliability of sequence annotation tasks, despite their widespread prevalence in the field. To address this gap, this paper introduces a novel model for generating random annotations, which serves as the foundation for estimating chance agreement in sequence annotation tasks. Utilizing the proposed randomization model and a related comparison approach, we successfully derive the analytical form of the distribution, enabling the computation of the probable location of each annotated text segment and subsequent chance agreement estimation. Through a combination of simulation and corpus-based evaluation, we successfully assess its applicability and validate its accuracy and efficacy.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
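For readers unfamiliar with chance correction, the sketch below computes classic Cohen's kappa over per-token labels from two annotators. This is the textbook coefficient, shown only as a familiar reference point; it is not the randomization model proposed in the entry above, and the function name and the example tags are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Fraction of positions where the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected if each annotator labeled at random
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical BIO tags from two annotators over the same eight tokens.
annotator_a = ["O", "B", "I", "O", "O", "B", "O", "O"]
annotator_b = ["O", "B", "I", "I", "O", "O", "O", "O"]

kappa = cohens_kappa(annotator_a, annotator_b)
```

In this example the annotators agree on 6 of 8 tokens (0.75), the expected chance agreement is 29/64, and kappa works out to 19/35, roughly 0.54. Treating every token as an independent item is exactly the kind of simplification that segment-level sequence annotation makes questionable.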
# UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening ( http://arxiv.org/abs/2407.11372v1 ) License: see link	Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang,	Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique, UNIT, that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.	Translated: 2024-07-17 18:32:32 Published: 2024-07-16
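The core idea in the entry above, bounding each neuron's activation and suppressing values beyond that bound, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the bound here is an empirical quantile of activations on clean samples, whereas the abstract only says the boundaries are approximated per neuron; the function names and numbers are hypothetical, and the actual implementation lives in the linked repository.

```python
def estimate_upper_bounds(clean_activations, quantile=0.95):
    """Per-neuron upper bound taken as an empirical quantile (an assumption).

    `clean_activations` is a list of samples; each sample is a list with
    one activation value per neuron.
    """
    n_neurons = len(clean_activations[0])
    bounds = []
    for j in range(n_neurons):
        column = sorted(sample[j] for sample in clean_activations)
        index = min(len(column) - 1, int(quantile * len(column)))
        bounds.append(column[index])
    return bounds


def clip_activations(activations, bounds):
    """Clamp each activation to its neuron's bound; smaller values pass through."""
    return [min(a, b) for a, b in zip(activations, bounds)]


# Activations of two neurons observed on four clean samples.
clean = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.4, 1.1]]
bounds = estimate_upper_bounds(clean)

# A suspicious input drives the first neuron far outside its clean range.
clipped = clip_activations([5.0, 1.0], bounds)
```

With these numbers the bounds are `[0.4, 1.2]`, so the outlier activation 5.0 is clamped to 0.4 while the in-range value 1.0 is left untouched.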
# 自然言語を超えた信頼性の高い推論 Reliable Reasoning Beyond Natural Language ( http://arxiv.org/abs/2407.11373v1 ) ライセンス: Link先を確認	Nasim Borazjanizadeh, Steven T. Piantadosi,	(参考訳) 言語能力にもかかわらず、Large Language Model (LLM) はしばしば、信頼性と柔軟に推論する能力の限界を示す。そこで本稿では,問題文からすべての関連情報を論理コード文として抽出・エンコードし,論理プログラム言語(Prolog)を用いて明示的帰納的推論の反復計算を行うニューロシンボリックアプローチを提案する。提案手法は,標準的な数学的推論ベンチマークであるGSM8kと,BIG-benchデータセットからのNavigateデータセット上でのLCMの性能を大幅に向上させる。さらに,LLMの次のトークン予測パラダイムの欠点を目標とし,複雑な非線形推論を必要とするが,解くための基本的な算術的スキルのみを必要とする,55のユニークな単語問題からなる新しいデータセットであるNon-Linear Reasoning (NLR)データセットを導入する。以上の結果から,Prologの統合により,最上級言語モデル(GPT4を含む)でもテキストのみを用いて解けないNLRデータセット上でのLLMの高性能化が可能であることが示唆された。 Despite their linguistic competence, Large Language models (LLMs) often exhibit limitations in their ability to reason reliably and flexibly. To address this, we propose a neurosymbolic approach that prompts LLMs to extract and encode all relevant information from a problem statement as logical code statements, and then use a logic programming language (Prolog) to conduct the iterative computations of explicit deductive reasoning. Our approach significantly enhances the performance of LLMs on the standard mathematical reasoning benchmark, GSM8k, and the Navigate dataset from the BIG-bench dataset. Additionally, we introduce a novel dataset, the Non-Linear Reasoning (NLR) dataset, consisting of 55 unique word problems that target the shortcomings of the next token prediction paradigm of LLMs and require complex non-linear reasoning but only basic arithmetic skills to solve. Our findings demonstrate that the integration of Prolog enables LLMs to achieve high performance on the NLR dataset, which even the most advanced language models (including GPT4) fail to solve using text only.	翻訳日:2024-07-17 18:32:32 公開日:2024-07-16
# 医療領域におけるニューラルネットワーク解釈のためのマスクフリーニューロン概念アノテーション Mask-Free Neuron Concept Annotation for Interpreting Neural Networks in Medical Domain ( http://arxiv.org/abs/2407.11375v1 ) ライセンス: Link先を確認	Hyeon Bae Kim, Yong Hyun Ahn, Seong Tae Kim,	(参考訳) ディープニューラルネットワークの最近の進歩は、病気の診断と医療的意思決定を支援することの公約を示している。しかし、規則に準拠したAIモデルの透明な意思決定プロセスを保証するには、モデルの内部動作の包括的な理解が必要である。しかし、従来の手法は、モデルを解釈するための高価なピクセル単位の注釈付きデータセットに大きく依存しており、医療領域において大きな欠点が示される。本稿では,Mask-free Medical Model Interpretation (MAMMI) という新しい医療ニューロン概念アノテーション手法を提案する。視覚言語モデルを用いて,ニューロン概念アノテーションのためのピクセルレベルのマスクの必要性を緩和する。 MAMMIは他の解釈法に比べて優れた性能を示し、医用画像解析においてニューロンに豊かな表現を提供することの有効性を示した。 NIH胸部X線で訓練したモデルを用いて,MAMMIの有効性を検証し,医療領域における透明な臨床診断の可能性を示した。コードはhttps://github.com/ailab-kyunghee/MAMMIで公開されている。 Recent advancements in deep neural networks have shown promise in aiding disease diagnosis and medical decision-making. However, ensuring transparent decision-making processes of AI models in compliance with regulations requires a comprehensive understanding of the model's internal workings. Moreover, previous methods rely heavily on expensive pixel-wise annotated datasets for interpreting the model, presenting a significant drawback in medical domains. In this paper, we propose a novel medical neuron concept annotation method, named Mask-free Medical Model Interpretation (MAMMI), to address these challenges. By using a vision-language model, our method relaxes the need for pixel-level masks for neuron concept annotation. MAMMI achieves superior performance compared to other interpretation methods, demonstrating its efficacy in providing rich representations for neurons in medical image analysis. Our experiments on a model trained on NIH chest X-rays validate the effectiveness of MAMMI, showcasing its potential for transparent clinical decision-making in the medical domain. The code is available at https://github.com/ailab-kyunghee/MAMMI.	翻訳日:2024-07-17 18:32:32 公開日:2024-07-16
# 量子リピータネットワークシナリオの解析的性能推定 Analytical Performance Estimations for Quantum Repeater Network Scenarios ( http://arxiv.org/abs/2407.11376v1 ) ライセンス: Link先を確認	Allen Zang, Joaquin Chung, Rajkumar Kettimuthu, Martin Suchara, Tian Zhong,	(参考訳) 量子リピータチェーンは将来の量子ネットワークのバックボーンを形成し、ネットワークノード間の絡み合いを分散する。したがって、量子リピータチェーンの絡み合い分布性能、特にスループットとレイテンシを理解することが重要である。量子リピータ連鎖の確率力学をマルコフ連鎖を用いてモデル化することにより、長期スループットと連続エンタングルメント分布のオンデマンドレイテンシの解析的推定を行う。まず、一般的な多元的プロトコルを用いたシングルリンク絡み合わせ生成について検討する。次に,2つのリンクを交互に切り換えたエンタングルメント分布を,単一あるいは二重のエンタングルメント生成プロトコルを用いてモデル化する。また、2つのリンクの結果が、一般的な$2^k$-linkネストされたリピータチェーンの性能について、どのように洞察を与えるかを実証する。その結果,量子リピータネットワークの性能,特にシステムパラメータ依存性の定量的理解が深まる。解析公式自体は、量子ネットワークコミュニティにとって貴重な参照リソースである。量子ネットワークシミュレーション検証のベンチマークや、マルコフ連鎖形式を用いた量子ネットワークダイナミクスモデリングの例として機能する。 Quantum repeater chains will form the backbone of future quantum networks that distribute entanglement between network nodes. Therefore, it is important to understand the entanglement distribution performance of quantum repeater chains, especially their throughput and latency. By using Markov chains to model the stochastic dynamics in quantum repeater chains, we offer analytical estimations for long-run throughput and on-demand latency of continuous entanglement distribution. We first study single-link entanglement generation using general multiheralded protocols. We then model entanglement distribution with entanglement swapping over two links, using either a single- or a double-heralded entanglement generation protocol. We also demonstrate how the two-link results offer insights into the performance of general $2^k$-link nested repeater chains. Our results enrich the quantitative understanding of quantum repeater network performance, especially the dependence on system parameters. The analytical formulae themselves are valuable reference resources for the quantum networking community. 
They can serve as benchmarks for quantum network simulation validation or as examples of quantum network dynamics modeling using the Markov chain formalism.	翻訳日:2024-07-17 18:32:32 公開日:2024-07-16
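As a toy illustration of the Markov chain formalism mentioned in the abstract, the sketch below computes the expected number of time slots until two links both hold entanglement. The model is a deliberately simplified assumption and not the protocol analysed in the paper: both links attempt generation in synchronized slots, each attempt succeeds independently with probability `p`, memories are ideal with no cutoff, and swapping is ignored. The function name is hypothetical.

```python
import numpy as np


def expected_slots_two_links(p):
    """Expected slots until both links are ready, via an absorbing Markov chain.

    Transient states: 0 = no link ready, 1 = exactly one link ready.
    The absorbing state (both links ready) is left out of Q.
    """
    q = 1.0 - p
    Q = np.array([
        [q * q, 2.0 * p * q],  # from state 0: stay in 0, or move to state 1
        [0.0,   q],            # from state 1: the remaining link fails again
    ])
    # Expected absorption times t solve (I - Q) t = 1.
    t = np.linalg.solve(np.eye(2) - Q, np.ones(2))
    return t[0]


p = 0.5
slots = expected_slots_two_links(p)
closed_form = (3.0 - 2.0 * p) / (p * (2.0 - p))  # E[max of two geometric variables]
```

Under these assumptions the linear solve reproduces the closed form $(3-2p)/(p(2-p))$, and its reciprocal gives a long-run rate in pairs per slot if the process restarts immediately after each success.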
# 医用画像におけるスペクトル解析と伝達学習の関連性を探る Exploring connections of spectral analysis and transfer learning in medical imaging ( http://arxiv.org/abs/2407.11379v1 ) ライセンス: Link先を確認	Yucheng Lu, Dovile Juodelyte, Jonathan Victor, Veronika Cheplygina,	(参考訳) 本稿では, 医用画像における周波数ショートカットに対する伝達学習とモデル感度について, スペクトル分析を用いて検討する。予め訓練されたモデル勾配と微調整されたモデル勾配と人工的に生成された周波数ショートカットのパワースペクトル密度を解析することにより、自然画像と医用画像に事前訓練されたモデル間の学習優先度の顕著な差を観察する。モデルの学習優先度がアーティファクトのパワースペクトル密度と一致した場合、そのアーティファクトに過度に適合する。これらの観測から,情報源データ編集が学習のショートカットに対するモデルの抵抗を変化させることを示す。 In this paper, we use spectral analysis to investigate transfer learning and study model sensitivity to frequency shortcuts in medical imaging. By analyzing the power spectrum density of both pre-trained and fine-tuned model gradients, as well as artificially generated frequency shortcuts, we observe notable differences in learning priorities between models pre-trained on natural vs medical images, which generally persist during fine-tuning. We find that when a model's learning priority aligns with the power spectrum density of an artifact, it results in overfitting to that artifact. Based on these observations, we show that source data editing can alter the model's resistance to shortcut learning.	翻訳日:2024-07-17 18:32:32 公開日:2024-07-16
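The power spectral density that the abstract analyses can be computed for any 2-D array with a discrete Fourier transform. The sketch below is a generic illustration under stated assumptions: radial averaging into integer-radius bins is a common convention chosen here, not something the abstract specifies, and the sinusoidal test image is synthetic.

```python
import numpy as np


def radial_psd(image):
    """Radially averaged power spectrum of a 2-D array."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # zero frequency moved to the centre
    power = np.abs(spectrum) ** 2
    h, w = image.shape
    y, x = np.indices((h, w))
    # Integer distance of every frequency bin from the zero-frequency bin.
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    total = np.bincount(r.ravel(), weights=power.ravel())
    count = np.bincount(r.ravel())
    return total / np.maximum(count, 1)


# Synthetic image: a horizontal sinusoid with 8 cycles across 64 pixels.
size = 64
cols = np.arange(size)
image = np.tile(np.sin(2.0 * np.pi * 8.0 * cols / size), (size, 1))
psd = radial_psd(image)
peak_radius = int(np.argmax(psd))
```

A periodic artifact of this kind shows up as a sharp peak at its spatial frequency, which is the sort of signature a frequency shortcut would leave in the spectrum.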
# NAMER:手書き数式認識のための非自己回帰モデリング NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition ( http://arxiv.org/abs/2407.11380v1 ) ライセンス: Link先を確認	Chenyu Liu, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du, Qingfeng Liu,	(参考訳) 近年,文書理解における多種多様な応用のために,手書き数式認識(HMER)が注目されている。現在のメソッドは通常、オートレグレッシブ(AR)エンコーダ・デコーダフレームワーク内のイメージ・ツー・シーケンス生成タスクとしてHMERにアプローチする。しかし、これらのアプローチにはいくつかの欠点がある。 1) 全体的な言語文脈の欠如,現在の復号段階を超えて情報利用を制限すること。 2)AR復号時のエラー蓄積,及び 3)復号速度が遅い。これらの問題に対処するため,本研究では,NAMERと呼ばれるHMERのためのボトムアップ非自己回帰モデリング手法を初めて構築する。 NAMERはVisual Aware Tokenizer (VAT)とParallel Graph Decoder (PGD)で構成されている。当初、VATは目に見えるシンボルと局所的な関係を粗いレベルでトークン化する。その後、PGDは全てのトークンを洗練し、相互接続性を確立し、包括的な視覚的および言語的コンテキストを活用する。 CROHME 2014/2016/2019およびHME100Kデータセットの実験では、NAMERはExpRate上の現在の最先端(SOTA)メソッドを1.93%/2.35%/1.49%/0.62%上回るだけでなく、復号時間とFPSの13.7倍と6.7倍の大幅な高速化を実現し、NAMERの有効性と効率を実証している。 Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall language context, limiting information utilization beyond the current decoding step; 2) error accumulation during AR decoding; and 3) slow decoding speed. To tackle these problems, this paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER. NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD). Initially, the VAT tokenizes visible symbols and local relations at a coarse level. Subsequently, the PGD refines all tokens and establishes connectivities in parallel, leveraging comprehensive visual and linguistic contexts. 
Experiments on CROHME 2014/2016/2019 and HME100K datasets demonstrate that NAMER not only outperforms the current state-of-the-art (SOTA) methods on ExpRate by 1.93%/2.35%/1.49%/0.62%, but also achieves significant speedups of 13.7x in decoding time and 6.7x in overall FPS, proving the effectiveness and efficiency of NAMER.	翻訳日:2024-07-17 18:32:32 公開日:2024-07-16
# 人道支援のための衛星画像からの難民キャンプ(SAM4Refugee)内の建物識別におけるセグメンションの活用 Leveraging Segment Anything Model in Identifying Buildings within Refugee Camps (SAM4Refugee) from Satellite Imagery for Humanitarian Operations ( http://arxiv.org/abs/2407.11381v1 ) ライセンス: Link先を確認	Yunya Gao,	(参考訳) 高解像度の衛星画像から避難キャンプのある建物の足跡が更新され、関連する人道支援が可能になった。本研究では,衛星画像から建物を抽出する際のセグメンテーションのセグメンテーションにおける「セグメンテーション・アシング・モデル」と,その1つの枝であるSAM-Adapterの利用について検討する。 SAM-AdapterはSAMの軽量な適応であり、様々な難民キャンプでこの抽出作業の強力なツールとして登場した。我々の研究は、SAM-Adapterが、他の古典的(例えば、U-Net)や高度なセマンティックセグメンテーションモデル(例えば、Transformer)と比較して、データの可用性が制限されるシナリオで優れていることを証明している。さらに,モデル性能向上に有効な超解像(SR)モデルなどの手法を用いて,アップスケーリング手法がモデル性能に与える影響を強調した。さらに、この研究は、トレーニングのためにスケールアップされた画像データを使用する最初の訓練時代におけるモデルの急速な収束を含む興味深い現象を明らかにし、将来の研究の機会を示唆している。データ準備、モデルトレーニング、モデル推論、予測マスクのためのShapefileの生成の各ステップをカバーするコードは、拡張された科学コミュニティと人道活動の恩恵を受けるためにGitHubリポジトリで公開されている。 Updated building footprints with refugee camps from high-resolution satellite imagery can support related humanitarian operations. This study explores the utilization of the "Segment Anything Model" (SAM) and one of its branches, SAM-Adapter, for semantic segmentation tasks in the building extraction from satellite imagery. SAM-Adapter is a lightweight adaptation of the SAM and emerges as a powerful tool for this extraction task across diverse refugee camps. Our research proves that SAM-Adapter excels in scenarios where data availability is limited compared to other classic (e.g., U-Net) or advanced semantic segmentation models (e.g., Transformer). Furthermore, the impact of upscaling techniques on model performance is highlighted, with methods like super-resolution (SR) models proving invaluable for improving model performance. Additionally, the study unveils intriguing phenomena, including the model's rapid convergence in the first training epoch when using upscaled image data for training, suggesting opportunities for future research. 
The code covering each step, from data preparation, model training, and model inference to the generation of Shapefiles for predicted masks, is available in a GitHub repository to benefit the wider scientific community and humanitarian operations.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# セグメント、リフト、フィット:2Dプロンプからの自動3D形状ラベル Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts ( http://arxiv.org/abs/2407.11382v1 ) ライセンス: Link先を確認	Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo,	(参考訳) 本稿では2Dポイントやボックスプロンプトから3Dオブジェクトを自動的にラベル付けするアルゴリズムを提案する。従来のアートとは異なり、自動ラベルはバウンディングボックスの代わりに3D形状を予測し、特定のデータセットのトレーニングを必要としない。この目的を達成するために、Segment, Lift, and Fit(SLF)パラダイムを提案する。まず、Segment Anything Model(SAM)を用いてプロンプトから高品質なインスタンスマスクを分割し、残りの問題を与えられた2次元マスクから3次元形状を予測する。この問題の性質が不明確であるため、複数の3次元形状が同一のマスクに投影できるため、大きな課題となる。この問題に対処するため、我々は2Dマスクを3D形状に上げ、その姿勢と形状を調整するために勾配勾配を利用して、プロジェクションがマスクと表面が周囲のLiDAR点に適合するまでに配置する。注目すべきなのは、特定のデータセットをトレーニングしないため、SLF自動ラベルラは他のメソッドと同じように、トレーニングセット内のバイアス付きアノテーションパターンに過度に適合しないことです。これにより、異なるデータセット間の一般化能力が改善される。 KITTIデータセットによる実験結果から,SLFオートラベルは高品質なバウンディングボックスアノテーションを生成し,AP@0.5 IoUの90%近くを達成した。生成された擬似ラベルで訓練されたディテクターは、実際の接頭辞アノテーションで訓練されたディテクターとほぼ同等に機能する。さらに、SLFオートラベルは、詳細な形状予測の有望な結果を示し、動的オブジェクトの占有アノテーションの潜在的な代替手段を提供する。 This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, especially focusing on applications in autonomous driving. Unlike previous arts, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. Firstly, we segment high-quality instance masks from the prompts using the Segment Anything Model (SAM) and transform the remaining problem into predicting 3D shapes from given 2D masks. Due to the ill-posed nature of this problem, it presents a significant challenge as multiple 3D shapes can project into an identical mask. To tackle this issue, we then lift 2D masks to 3D forms and employ gradient descent to adjust their poses and shapes until the projections fit the masks and the surfaces conform to surrounding LiDAR points. 
Notably, since we do not train on a specific dataset, the SLF auto-labeler does not overfit to biased annotation patterns in the training set as other methods do. Thus, the generalization ability across different datasets improves. Experimental results on the KITTI dataset demonstrate that the SLF auto-labeler produces high-quality bounding box annotations, achieving an AP@0.5 IoU of nearly 90%. Detectors trained with the generated pseudo-labels perform nearly as well as those trained with actual ground-truth annotations. Furthermore, the SLF auto-labeler shows promising results in detailed shape predictions, providing a potential alternative for the occupancy annotation of dynamic objects.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# TM-PATHVQA:90000以上のテキストレス多言語質問 TM-PATHVQA:90000+ Textless Multilingual Questions for Medical Visual Question Answering ( http://arxiv.org/abs/2407.11383v1 ) ライセンス: Link先を確認	Tonmoy Rajkhowa, Amartya Roy Chowdhury, Sankalp Nagaonkar, Achyut Mani Tripathi,	(参考訳) 医療や医療の分野では、複雑な医療画像の分析が正確な診断に重要になるシナリオにおいて、視覚的質問回答(VQA)が有用である。現行のテキストベースのVQAシステムは、タスク実行中にハンズフリーのインタラクションとアクセシビリティが不可欠であるシナリオにおいて、その実用性を制限している。音声ベースのVQAシステムは、タスクを同時に実行しながら情報にアクセス可能な、よりよい対話手段を提供することができる。この目的のために、この研究は、英語、ドイツ語、フランス語の音声質問を含むPathVQAデータセットの拡張であるTextless Multilingual Pathological VQA(TMPathVQA)データセットを導入して、音声ベースのVQAシステムを実装した。このデータセットは5,004の病理画像と70時間の音声に基づいて、98,397の多言語音声質問と回答からなる。最後に、様々な音響的特徴と視覚的特徴の組み合わせを用いて実装されたTMPathVQAシステムをベンチマークし比較する。 In healthcare and medical diagnostics, Visual Question Answering (VQA) may emerge as a pivotal tool in scenarios where analysis of intricate medical images becomes critical for accurate diagnoses. Current text-based VQA systems limit their utility in scenarios where hands-free interaction and accessibility are crucial while performing tasks. A speech-based VQA system may provide a better means of interaction where information can be accessed while performing tasks simultaneously. To this end, this work implements a speech-based VQA system by introducing a Textless Multilingual Pathological VQA (TMPathVQA) dataset, an expansion of the PathVQA dataset, containing spoken questions in English, German & French. This dataset comprises 98,397 multilingual spoken questions and answers based on 5,004 pathological images along with 70 hours of audio. Finally, this work benchmarks and compares TMPathVQA systems implemented using various combinations of acoustic and visual features.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# InvAgent:サプライチェーンにおけるインベントリマネジメントのための大規模言語モデルに基づくマルチエージェントシステム InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains ( http://arxiv.org/abs/2407.11384v1 ) ライセンス: Link先を確認	Yinzhu Quan, Zefang Liu,	(参考訳) サプライチェーン管理(SCM)は、商品を効率的に届けるために、商品、情報、財務のフローを調整する。現在の揮発性、不確実性、複雑、曖昧性(VUCA)の世界では、効果的な在庫管理が不可欠である。これまでの研究では,在庫管理におけるヒューリスティック手法と強化学習の優位性を実証してきた。しかし、在庫管理のための多エージェントシステムにおいて、大規模言語モデル(LLM)を自律エージェントとして適用することは、まだ未定である。本研究では,マルチエージェントインベントリシステムを管理するためにLLMを用いた新しい手法を提案する。ゼロショット学習機能を活用することで、当社のモデルであるInvAgentはレジリエンスを高め、サプライチェーンネットワーク全体の効率を向上します。我々の貢献は、ゼロショット学習にLLMを活用して、事前訓練をせずに適応的かつ情報的意思決定を可能にすること、CoT(Chain-of-Thought)を通じて大きな説明可能性と明確性を提供し、コストを最小化し、在庫を回避しながら、様々な需要シナリオに動的適応性を示すことである。さまざまなシナリオにわたる広範囲な評価は、SCMにおける私たちのモデルの効率を浮き彫りにします。 Supply chain management (SCM) involves coordinating the flow of goods, information, and finances across various entities to deliver products efficiently. Effective inventory management is crucial in today's volatile, uncertain, complex, and ambiguous (VUCA) world. Previous research has demonstrated the superiority of heuristic methods and reinforcement learning applications in inventory management. However, the application of large language models (LLMs) as autonomous agents in multi-agent systems for inventory management remains underexplored. This study introduces a novel approach using LLMs to manage multi-agent inventory systems. Leveraging their zero-shot learning capabilities, our model, InvAgent, enhances resilience and improves efficiency across the supply chain network. Our contributions include utilizing LLMs for zero-shot learning to enable adaptive and informed decision-making without prior training, providing significant explainability and clarity through Chain-of-Thought (CoT), and demonstrating dynamic adaptability to varying demand scenarios while minimizing costs and avoiding stockouts. Extensive evaluations across different scenarios highlight the efficiency of our model in SCM.	
翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# サブワイブル分布の指数傾き Exponential tilting of subweibull distributions ( http://arxiv.org/abs/2407.11386v1 ) ライセンス: Link先を確認	F. William Townes,	(参考訳) 部分ワイブル分布のクラスは、最近、部分指数および部分ガウス確率変数の重要な性質を一般化することが示されている。本報告では, サブワイブル分布の代替特性について述べるとともに, 指数傾斜後, 尾の挙動が保たれる条件について詳述する。 The class of subweibull distributions has recently been shown to generalize the important properties of subexponential and subgaussian random variables. We describe alternative characterizations of subweibull distributions and detail the conditions under which their tail behavior is preserved after exponential tilting.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# 2フレーバーシュウィンガーモデルにおけるテータ依存性質量スペクトルのDMRGによる研究 DMRG study of the theta-dependent mass spectrum in the 2-flavor Schwinger model ( http://arxiv.org/abs/2407.11391v1 ) ライセンス: Link先を確認	Etsuko Itou, Akira Matsumoto, Yuya Tanizaki,	(参考訳) 我々は密度行列再正規化群(DMRG)を用いてハミルトン形式における$2$フレーバーシュウィンガーモデルの$\theta$依存質量スペクトルを研究する。複合粒子の質量、パイ中間子とシグマ中間子は2つの独立した方法で計算される。 1つは改良された一点関数スキームであり、そこでは、境界状態に結合した局所中間子作用素を測定し、その指数的崩壊から質量を抽出する。 $\theta$ 項は非自明な作用素混合を引き起こすので、相関行列を対角化して中間子作用素を定義することでそれを解き放つ。もう1つは分散関係スキームであり、ハミルトン形式主義に特有のヒューリスティックなアプローチである。我々は励起状態のエネルギーと運動量を測定することによって直接分散関係を得る。符号問題はこれらの方法で回避され、その結果は大きな$\theta$であっても互いに一致する。パイ中間子質量の$m/g=0.1$の$\theta$-dependenceがボゾン化モデルによる予測と一致していることを明らかにする。また、シグマ中間子の質量は半古典式 $M_{\sigma}/M_{\pi}=\sqrt{3}$ を満たす。この関係によりシグマ中間子は安定粒子であるが、eta中間子はもはや$G$-parityによって保護されず、$\theta\neq 0$に対して不安定となる。 We study the $\theta$-dependent mass spectrum of the massive $2$-flavor Schwinger model in the Hamiltonian formalism using the density-matrix renormalization group (DMRG). The masses of the composite particles, the pion and sigma meson, are computed by two independent methods. One is the improved one-point-function scheme, where we measure the local meson operator coupled to the boundary state and extract the mass from its exponential decay. Since the $\theta$ term causes a nontrivial operator mixing, we unravel it by diagonalizing the correlation matrix to define the meson operator. The other is the dispersion-relation scheme, a heuristic approach specific to Hamiltonian formalism. We obtain the dispersion relation directly by measuring the energy and momentum of the excited states. The sign problem is circumvented in these methods, and their results agree with each other even for large $\theta$. We reveal that the $\theta$-dependence of the pion mass at $m/g=0.1$ is consistent with the prediction by the bosonized model. We also find that the mass of the sigma meson satisfies the semi-classical formula, $M_{\sigma}/M_{\pi}=\sqrt{3}$, for almost the entire region of $\theta$. 
While the sigma meson is a stable particle thanks to this relation, the eta meson is no longer protected by the $G$-parity and becomes unstable for $\theta\neq 0$.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
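Extracting a mass from an exponential decay, as described for the one-point-function scheme, amounts to fitting the slope of the logarithm of the measured signal. The sketch below uses purely synthetic data and assumes a clean single-exponential form $C(x) \approx A e^{-Mx}$; the amplitudes and the value of `m_pi` are invented for illustration and are not results from the paper. Real data would typically need a fit window and possibly extra prefactors.

```python
import numpy as np


def fit_mass(x, corr):
    """Estimate M from C(x) ~ A * exp(-M * x) by a linear fit to log C(x)."""
    slope, _intercept = np.polyfit(x, np.log(corr), 1)
    return -slope


x = np.arange(1, 21, dtype=float)  # distance from the boundary (lattice units)
m_pi = 0.4                         # synthetic value, NOT a result from the paper
m_sigma = np.sqrt(3.0) * m_pi      # semi-classical relation quoted in the abstract

# Noise-free synthetic signals with arbitrary amplitudes.
c_pi = 2.0 * np.exp(-m_pi * x)
c_sigma = 0.5 * np.exp(-m_sigma * x)

ratio = fit_mass(x, c_sigma) / fit_mass(x, c_pi)
```

With noise-free input the fitted ratio recovers $\sqrt{3}$, so the same routine applied to measured data gives a direct check of the relation $M_{\sigma}/M_{\pi}=\sqrt{3}$.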
# CIC-BART-SSA:構造化セマンティック拡張による制御可能な画像キャプション CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation ( http://arxiv.org/abs/2407.11393v1 ) ライセンス: Link先を確認	Kalliopi Basioti, Mohamed A. Abdelsalam, Federico Fancellu, Vladimir Pavlovic, Afsaneh Fazly,	(参考訳) Controllable Image Captioning (CIC)は、エンドユーザ、例えばリージョン、エンティティ、興味のあるイベントなどの情報に基づいて、画像の自然言語記述を生成することを目的としている。しかし、利用可能な画像言語データセットは、主に画像全体を記述するキャプションを含んでいるため、任意の領域や関係のサブセットに参加可能なCICモデルのトレーニングには効果がない。この課題に対処するために、画像に関連付けられた既存の字幕セットの上に構築された統一的な構造的意味表現を用いて、集中型および視覚的接地された字幕をサンプリングする、新しい完全自動手法を提案する。我々は、抽象的意味表現(AMR)を利用して、現在の手法の典型的な空間関係のみの焦点を超えて、エンティティ間の空間-意味関係を符号化する。本研究では,SSA(Structured Semantic Augmentation)フレームワークを用いて,既存の画像キャプチャデータセットを制御キャプションで拡張し,空間的・意味的多様性と焦点範囲を増大させる。次に、CICタスクに適した新しいモデルであるCIC-BART-SSAを開発し、その制御信号をSSAに分散したデータセットから出力する。我々は、SOTA CICモデルと比較して、CIC-BART-SSAは、多様性とテキスト品質に優れたキャプションを生成し、制御性に競争力があり、また、難易度の高いシナリオに効率よく一般化することで、広範と高度に焦点を絞ったキャプション性能のギャップを最小化できることを実証的に示す。コードはhttps://github.com/SamsungLabs/CIC-BART-SSAで公開されている。 Controllable Image Captioning (CIC) aims at generating natural language descriptions for an image, conditioned on information provided by end users, e.g., regions, entities or events of interest. However, available image--language datasets mainly contain captions that describe the entirety of an image, making them ineffective for training CIC models that can potentially attend to any subset of regions or relationships. To tackle this challenge, we propose a novel, fully automatic method to sample additional focused and visually grounded captions using a unified structured semantic representation built on top of the existing set of captions associated with an image. We leverage Abstract Meaning Representation (AMR), a cross-lingual graph-based semantic formalism, to encode all possible spatio-semantic relations between entities, beyond the typical spatial-relations-only focus of current methods. 
We use this Structured Semantic Augmentation (SSA) framework to augment existing image--caption datasets with the grounded controlled captions, increasing their spatial and semantic diversity and focal coverage. We then develop a new model, CIC-BART-SSA, specifically tailored for the CIC task, that sources its control signals from SSA-diversified datasets. We empirically show that, compared to SOTA CIC models, CIC-BART-SSA generates captions that are superior in diversity and text quality, are competitive in controllability, and, importantly, minimize the gap between broad and highly focused controlled captioning performance by efficiently generalizing to the challenging highly focused scenarios. Code is available at https://github.com/SamsungLabs/CIC-BART-SSA.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# DreamCatalyst: 編集可能性とアイデンティティ保存の制御による高速かつ高品質な3D編集 DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation ( http://arxiv.org/abs/2407.11394v1 ) ライセンス: Link先を確認	Jiwook Kim, Seonho Lee, Jaeyo Shin, Jiho Choi, Hyunjung Shim,	(参考訳) スコア蒸留サンプリング(SDS)は,本質的な3D一貫性のため,テキスト駆動型3D編集作業において有効なフレームワークとして登場した。しかし、既存のSDSベースの3D編集手法は、広範囲なトレーニング時間に悩まされ、主に拡散モデルのサンプリング力学から逸脱するため、低品質な結果をもたらす。本稿では,SDSベースの編集を拡散逆過程として解釈する新しいフレームワークであるDreamCatalystを提案する。目的関数はサンプリングのダイナミクスを考慮し,DreamCatalystの最適化プロセスは編集作業における拡散逆過程の近似となる。 DreamCatalystは、トレーニング時間を短縮し、編集品質を改善することを目的としている。 DreamCatalystは,(1)NeRFシーンを25分程度で編集する高速モード,(2)高品質モード,そして,70分未満で優れた結果が得られる。具体的には、我々の高品質モードは、現在の最先端のNeRF編集方法よりも、スピードと品質の両面で優れています。より広範な結果については、プロジェクトのページを参照してください。 Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks due to its inherent 3D consistency. However, existing SDS-based 3D editing methods suffer from extensive training time and lead to low-quality results, primarily because these methods deviate from the sampling dynamics of diffusion models. In this paper, we propose DreamCatalyst, a novel framework that interprets SDS-based editing as a diffusion reverse process. Our objective function considers the sampling dynamics, thereby making the optimization process of DreamCatalyst an approximation of the diffusion reverse process in editing tasks. DreamCatalyst aims to reduce training time and improve editing quality. DreamCatalyst presents two modes: (1) a faster mode, which edits the NeRF scene in only about 25 minutes, and (2) a high-quality mode, which produces superior results in less than 70 minutes. Specifically, our high-quality mode outperforms current state-of-the-art NeRF editing methods both in terms of speed and quality. See more extensive results on our project page: https://dream-catalyst.github.io.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# Animate3D:マルチビュービデオ拡散によるどんな3Dモデルでもアニメーション化 Animate3D: Animating Any 3D Model with Multi-view Video Diffusion ( http://arxiv.org/abs/2407.11398v1 ) ライセンス: Link先を確認	Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao,	(参考訳) 近年の4D生成技術は、事前訓練されたテキストや単一ビューの画像条件付きモデルを蒸留することによって、主に4Dコンテンツを生成することに重点を置いている。多視点特性を持つオフ・ザ・シェルフの3Dアセットを利用するのは不便であり、それらの結果は、監視信号の固有のあいまいさによる時空間的不整合に悩まされる。本稿では,静的な3Dモデルをアニメーションする新しいフレームワークであるAnimate3Dを紹介する。中心となる考え方は2つあります。 1) 静的な3Dオブジェクトの多視点レンダリングを前提とした新しい多視点ビデオ拡散モデル(MV-VDM)を提案し, 提案した大規模多視点ビデオデータセット(MV-Video)をトレーニングした。 2) MV-VDMをベースとした4次元スコア蒸留サンプリング(4D-SDS)と4次元スコア蒸留サンプリング(4D-SDS)を組み合わせたフレームワークを導入し,3次元オブジェクトのアニメーション化に多視点ビデオ拡散の先駆けを生かした。具体的には,MV-VDMに対して,空間的・時間的整合性を高めるために3次元およびビデオ拡散モデルを統合することで,新しい時空間アテンションモジュールを設計する。さらに,静的な3次元モデルのマルチビューレンダリングを条件として,そのアイデンティティを保持する。まず,生成したマルチビュービデオから直接動きを再構成し,次に4D-SDSを導入して外観と動きを改良する。定性的かつ定量的な実験は、Animate3Dが以前のアプローチよりも大幅に優れていることを示した。データ、コード、モデルは公開されます。 Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Animate3D, a novel framework for animating any static 3D model. The core idea is two-fold: 1) We propose a novel multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object, which is trained on our presented large-scale multi-view video dataset (MV-Video). 2) Based on MV-VDM, we introduce a framework combining reconstruction and 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. Specifically, for MV-VDM, we design a new spatiotemporal attention module to enhance spatial and temporal consistency by integrating 3D and video diffusion models. 
Additionally, we leverage the static 3D model's multi-view renderings as conditions to preserve its identity. For animating 3D models, an effective two-stage pipeline is proposed: we first reconstruct motions directly from generated multi-view videos, followed by the introduced 4D-SDS to refine both appearance and motion. Qualitative and quantitative experiments demonstrate that Animate3D significantly outperforms previous approaches. Data, code, and models will be open-released.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# EndoFinder: 説明可能な大腸ポリープ診断のためのオンライン画像検索 EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis ( http://arxiv.org/abs/2407.11401v1 ) ライセンス: Link先を確認	Ruijie Yang, Yan Zhu, Peiyao Fu, Yizhe Zhang, Zhihua Wang, Quanlin Li, Pinghong Zhou, Xian Yang, Shuo Wang,	(参考訳) 大腸内視鏡検査で悪性ポリープを切除する必要性を判断することは患者に不可欠であるが,病理組織学的検査の時間的・費用的な性質から困難である。深層学習に基づく分類モデルは、内視鏡画像を用いた光学的生検の達成を約束する一方で、説明可能性の欠如に悩まされることが多い。この制限を克服するため,コンテンツベースの画像検索フレームワークであるEndoFinderを導入する。新しいポリプの臨床的意味は、一致したポリプを参照して推測することができる。 EndoFinderは、自己教師付き方法で大規模なPolypデータセット上で事前トレーニングされた、ポリプ対応の画像エンコーダの先駆者であり、マスク付きイメージモデリングとコントラスト学習を組み合わせたものだ。これにより、画像検索に基づいて、下流の様々な臨床タスクに対応できる汎用的な埋め込み空間が得られる。我々は,ポリプ再同定と光学バイオプシータスクの枠組みを検証し,EndoFinderが説明可能な診断を達成できるだけでなく,教師付き分類モデルの性能に適合することを示す広範な実験を行った。 EndoFinderのイメージ検索への依存は、リアルタイム大腸内視鏡手術中に様々な下流決定タスクをサポートする可能性がある。 Determining the necessity of resecting malignant polyps during colonoscopy screen is crucial for patient outcomes, yet challenging due to the time-consuming and costly nature of histopathology examination. While deep learning-based classification models have shown promise in achieving optical biopsy with endoscopic images, they often suffer from a lack of explainability. To overcome this limitation, we introduce EndoFinder, a content-based image retrieval framework to find the 'digital twin' polyp in the reference database given a newly detected polyp. The clinical semantics of the new polyp can be inferred referring to the matched ones. EndoFinder pioneers a polyp-aware image encoder that is pre-trained on a large polyp dataset in a self-supervised way, merging masked image modeling with contrastive learning. This results in a generic embedding space ready for different downstream clinical tasks based on image retrieval. We validate the framework on polyp re-identification and optical biopsy tasks, with extensive experiments demonstrating that EndoFinder not only achieves explainable diagnostics but also matches the performance of supervised classification models. 
EndoFinder's reliance on image retrieval has the potential to support diverse downstream decision-making tasks during real-time colonoscopy procedures.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# 多種多種ドローンと超スペクトルEnMAPデータによるサバンナ林植生の種レベルでのマッピング Mapping savannah woody vegetation at the species level with multispecral drone and hyperspectral EnMAP data ( http://arxiv.org/abs/2407.11404v1 ) ライセンス: Link先を確認	Christina Karakizi, Akpona Okujeni, Eleni Sofikiti, Vasileios Tsironis, Athina Psalta, Konstantinos Karantzalos, Patrick Hostert, Elias Symeonakis,	(参考訳) サバンナは重要な生態系であり、その持続性は木質植物の普及によって脅かされている。本研究は,EnMAPハイパースペクトルデータを用いて,南アフリカのサバンナの種レベルでの分画木質被覆(FWC)の正確なマッピングを目標とする。フィールドアノテーションは、非常に高解像度のマルチスペクトルドローンデータと組み合わせて、3つの木質種を含む土地被覆マップを生成した。その後、高解像度のラベル付き地図を用いて、EnMAPの30m空間解像度で、各木質種のFWCサンプルを生成した。乾季EnMAP画像のFWCマッピングにおいて, 4つの機械学習回帰アルゴリズムを検証した。また, 乾季および湿季のSentinel-2データから, 新たな回帰特性, 分光時間指標を付加することにより, 多時期情報の寄与も評価した。その結果,FWCを種レベルで正確にマッピングする手法の妥当性が示された。 EnMAPとSentinel-2の組み合わせ実験で得られた最も高い精度は、種レベルでの植生マッピングにおける相乗的ポテンシャルを強調した。 Savannahs are vital ecosystems whose sustainability is endangered by the spread of woody plants. This research targets the accurate mapping of fractional woody cover (FWC) at the species level in a South African savannah, using EnMAP hyperspectral data. Field annotations were combined with very high-resolution multispectral drone data to produce land cover maps that included three woody species. The high-resolution labelled maps were then used to generate FWC samples for each woody species class at the 30-m spatial resolution of EnMAP. Four machine learning regression algorithms were tested for FWC mapping on dry season EnMAP imagery. The contribution of multitemporal information was also assessed by incorporating as additional regression features, spectro-temporal metrics from Sentinel-2 data of both the dry and wet seasons. The results demonstrated the suitability of our approach for accurately mapping FWC at the species level. The highest accuracy rates achieved from the combined EnMAP and Sentinel-2 experiments highlighted their synergistic potential for species-level vegetation mapping.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
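The step that turns a high-resolution labelled map into fractional cover samples at a coarser sensor resolution can be sketched as block averaging of one-hot class masks. The class ids, the tiny 4x4 map, and the block size of 2 are placeholders; in the setting of the abstract the block would span one 30-m EnMAP pixel, and the drone map's actual pixel size is not given here.

```python
import numpy as np


def fractional_cover(class_map, n_classes, block):
    """Per-class cover fraction inside each block x block window.

    class_map: 2-D integer array of class ids in [0, n_classes).
    Returns an array of shape (H // block, W // block, n_classes).
    """
    h, w = class_map.shape
    if h % block or w % block:
        raise ValueError("map dimensions must be multiples of the block size")
    onehot = np.eye(n_classes)[class_map]  # shape (H, W, n_classes)
    blocks = onehot.reshape(h // block, block, w // block, block, n_classes)
    return blocks.mean(axis=(1, 3))


# Hypothetical labels: 0 = background, 1 and 2 = two woody species.
labels = np.array([
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])
fwc = fractional_cover(labels, n_classes=3, block=2)
```

Each coarse pixel then carries one fraction per class, and the fractions sum to one, which is the form of target a regression model for fractional woody cover would be trained on.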
# 深部生成モデルによるカバー分離型ニューラルネットワークステレオグラフィ Cover-separable Fixed Neural Network Steganography via Deep Generative Models ( http://arxiv.org/abs/2407.11405v1 ) ライセンス: Link先を確認	Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang,	(参考訳) 画像ステガノグラフィー(英: Image steganography)は、微妙な摂動によって秘密データを隠蔽する過程である。近年の研究では、データ埋め込みと抽出に固定されたニューラルネットワークを使用することが可能であることが示されている。このようなFNNS(Fixed Neural Network Steganography)は、トレーニングネットワークを必要とせずに良好なパフォーマンスを示し、現実世界のアプリケーションでより実用的である。しかし、既存のFNNS法で生成されたステゴイメージングは、高い歪みを示し、ステガナリシスツールによって検出される傾向にある。この問題に対処するため、我々はCs-FNNSという、カバー分離可能な固定ニューラルネットワークステレオグラフィーを提案する。 Cs-FNNSでは,シークレットデータを認識不能な摂動にエンコードするSPSアルゴリズムを提案する。同じ深層生成モデルにアクセスすることで、レシーバは事前認識キーを使用してカバーイメージを再生し、データ復号のためのステゴイメージ内の摂動を分離できる。このような符号化/復号戦略は、秘密データに焦点を当て、カバー画像の乱れを排除し、より良い性能を達成する。我々は,Cs-FNNSを隠蔽画像内に隠蔽するステガノグラフィー領域に適用する。総合的な実験を通じて,提案手法の視覚的品質と検出不能性の観点から,優れた性能を示す。さらに,複数のシークレットイメージを異なる受信機に隠蔽することで,Cs-FNNSの柔軟性を示す。 Image steganography is the process of hiding secret data in a cover image by subtle perturbation. Recent studies show that it is feasible to use a fixed neural network for data embedding and extraction. Such Fixed Neural Network Steganography (FNNS) demonstrates favorable performance without the need for training networks, making it more practical for real-world applications. However, the stego-images generated by the existing FNNS methods exhibit high distortion, which is prone to be detected by steganalysis tools. To deal with this issue, we propose a Cover-separable Fixed Neural Network Steganography, namely Cs-FNNS. In Cs-FNNS, we propose a Steganographic Perturbation Search (SPS) algorithm to directly encode the secret data into an imperceptible perturbation, which is combined with an AI-generated cover image for transmission. Through accessing the same deep generative models, the receiver could reproduce the cover image using a pre-agreed key, to separate the perturbation in the stego-image for data decoding. 
Such an encoding/decoding strategy focuses on the secret data and eliminates the disturbance of the cover images, hence achieving better performance. We apply our Cs-FNNS to the steganographic task of hiding secret images within cover images. Through comprehensive experiments, we demonstrate the superior performance of the proposed method in terms of visual quality and undetectability. Moreover, we show the flexibility of our Cs-FNNS in terms of hiding multiple secret images for different receivers.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# コード生成におけるモジュール性獲得の影響を再考する Revisiting the Impact of Pursuing Modularity for Code Generation ( http://arxiv.org/abs/2407.11406v1 ) ライセンス: Link先を確認	Deokyeong Kang, Ki Jung Seo, Taeuk Kim,	(参考訳) より小さな独立したビルディングブロックを統合することで最終プログラムを構築することを目的としたモジュールプログラミングは、ソフトウェア開発において望ましい実践とみなされてきた。しかし、最近、大きな言語モデル(LLM)上に構築されたコード生成エージェントの台頭により、この伝統的な実践がこれらの新しいツールに対しても同様に有効なのかという疑問が浮かび上がっている。本研究では,モジュラリティを定量的に測定するための新しい指標を導入することで,コード生成におけるモジュラリティの影響を評価する。驚くべきことに、このトピックに関する従来の知恵とは異なり、モジュラリティはコード生成モデルのパフォーマンスを改善するための中核的な要素ではない。また、LLMが非モジュラーコードと比べてモジュラーコードを好まない理由についても検討する。 Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# 交通流予測におけるワークゾーン崩壊の会計 Accounting for Work Zone Disruptions in Traffic Flow Forecasting ( http://arxiv.org/abs/2407.11407v1 ) ライセンス: Link先を確認	Yuanjie Lu, Amarda Shehu, David Lattanzi,	(参考訳) 交通速度予測はインテリジェント交通システム管理において重要な課題である。現在の計算研究の多くは予測速度と実際の速度の差を最小限に抑えることを目的としているが、速度先行以外の情報モダリティは考慮されていない。特に、グラフニューラルネットワーク手法による速度予測において、最先端のパフォーマンスが達成されているが、これらの手法は道路整備作業区域に関する情報と予測される交通流への影響を取り入れていない。本稿では、畳み込みグラフニューラルネットワークアーキテクチャを構築し、新しいデータ融合機構と、トラフィック状態間の時空間依存性を考慮したワークゾーン情報に対応するヘテロジニアスグラフ集約手法を含む、新しい「道路作業ゾーンのためのグラフ畳み込みネットワーク」モデルを提案する。このモデルは、バージニア共和国のワークゾーンの存在下でトラフィックフローをキャプチャする2つのデータセットで評価される。特にワークゾーンイベントにおける交通流の予測において,交通路を横断する複雑で非線形な時空間的関係を抽出し,ベースラインモデルより優れていることを示す。 Traffic speed forecasting is an important task in intelligent transportation system management. The objective of much of the current computational research is to minimize the difference between predicted and actual speeds, but information modalities other than speed priors are largely not taken into account. In particular, though state of the art performance is achieved on speed forecasting with graph neural network methods, these methods do not incorporate information on roadway maintenance work zones and their impacts on predicted traffic flows; yet, the impacts of construction work zones are of significant interest to roadway management agencies, because they translate to impacts on the local economy and public well-being. In this paper, we build over the convolutional graph neural network architecture and present a novel ``Graph Convolutional Network for Roadway Work Zones" model that includes a novel data fusion mechanism and a new heterogeneous graph aggregation methodology to accommodate work zone information in spatio-temporal dependencies among traffic states. The model is evaluated on two data sets that capture traffic flows in the presence of work zones in the Commonwealth of Virginia. 
Extensive comparative evaluation and ablation studies show that the proposed model can capture complex and nonlinear spatio-temporal relationships across a transportation corridor, outperforming baseline models, particularly when predicting traffic flow during a work zone event.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# 大規模言語モデルを用いた政治サンプルシミュレーションにおける表現バイアス Representation Bias in Political Sample Simulations with Large Language Models ( http://arxiv.org/abs/2407.11409v1 ) ライセンス: Link先を確認	Weihong Qi, Hanjia Lyu, Jiebo Luo,	(参考訳) 本研究は,大規模言語モデルを用いた政治サンプルのシミュレーションにおけるバイアスの同定と定量化を目的としており,特に投票選択と世論に焦点を当てている。 GPT-3.5-Turboモデルを用いて、投票行動や世論をシミュレートするために、米国選挙研究、ドイツ縦割り選挙研究、ズオビアオデータセット、中国家族パネル研究のデータを活用する。本手法により,国語,人口集団,政治体制の3種類の表現バイアスを検討することができる。この結果は、世論よりもシミュレーション性能の方が概して投票選択に適しており、英語圏ではより正確であり、多党制よりも両党制の方が効果的であり、独裁政権よりも民主的な状況の方が強いことを示している。これらの結果は、計算社会科学の分野におけるAI応用におけるバイアスを軽減するための理解と戦略の発展に寄与する。 This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion. Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao Dataset, and China Family Panel Studies to simulate voting behaviors and public opinions. This methodology enables us to examine three types of representation bias: disparities based on the country's language, demographic groups, and political regime types. The findings reveal that simulation performance is generally better for vote choice than for public opinions, more accurate in English-speaking countries, more effective in bipartisan systems than in multi-partisan systems, and stronger in democratic settings than in authoritarian regimes. These results contribute to enhancing our understanding and developing strategies to mitigate biases in AI applications within the field of computational social science.	翻訳日:2024-07-17 18:22:47 公開日:2024-07-16
# SDPT:融合型ビジュアルランゲージ事前学習モデルのための同期デュアルプロンプトチューニング SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models ( http://arxiv.org/abs/2407.11414v1 ) ライセンス: Link先を確認	Yang Zhou, Yongjian Wu, Jiya Saiyin, Bingzheng Wei, Maode Lai, Eric Chang, Yan Xu,	(参考訳) プロンプトチューニング法は、大きな事前訓練されたモデルにおけるパラメータ効率の良い微調整において顕著な成功を収めた。しかし、GLIPのようなデュアルモーダル融合に基づく視覚言語事前訓練モデル(VLPM)への応用は問題に直面している。既存のプロンプトチューニング手法は、異なるモダリティのトークンに対するモダルマッピングやアライメントの問題に効果的に対処していないため、転送一般化は不十分である。この問題に対処するため,Synchronous Dual Prompt Tuning (SDPT)を提案する。 SDPTは、確立されたモーダル整合空間における学習可能な統一されたプロトタイプトークンのセットを初期化して、下流タスクのテキストと画像のモダリティの整合性を表現する。さらにSDPTは、異なるモダリティの入力空間に統一されたプロトタイプトークンの情報を埋め込む訓練を必要としない逆線形射影を確立する。逆線形射影により、統一されたプロトタイプトークンは2つのモダリティを同期的に表現し、SDPTは異なるモダリティプロンプトで下流タスクのためにテキストと画像の統一的なセマンティクスを共有することができる。実験の結果,SDPTは核融合型VLPMを補助し,モデルパラメータの0.04\%しか得られず,他の単モード法や双モード法よりも優れていることがわかった。コードはhttps://github.com/wuyongjianCODE/SDPTで公開される。 Prompt tuning methods have achieved remarkable success in parameter-efficient fine-tuning on large pre-trained models. However, their application to dual-modal fusion-based visual-language pre-trained models (VLPMs), such as GLIP, has encountered issues. Existing prompt tuning methods have not effectively addressed the modal mapping and aligning problem for tokens in different modalities, leading to poor transfer generalization. To address this issue, we propose Synchronous Dual Prompt Tuning (SDPT). SDPT initializes a single set of learnable unified prototype tokens in the established modal aligning space to represent the aligned semantics of text and image modalities for downstream tasks. Furthermore, SDPT establishes inverse linear projections that require no training to embed the information of unified prototype tokens into the input space of different modalities. 
The inverse linear projections allow the unified prototype token to synchronously represent the two modalities and enable SDPT to share the unified semantics of text and image for downstream tasks across different modal prompts. Experimental results demonstrate that SDPT assists fusion-based VLPMs to achieve superior outcomes with only 0.04\% of model parameters for training across various scenarios, outperforming other single- or dual-modal methods. The code will be released at https://github.com/wuyongjianCODE/SDPT.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# SPINACH: SPARQLによるリアルタイム質問のマッチングのための情報ナビゲーション SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions ( http://arxiv.org/abs/2407.11417v1 ) ライセンス: Link先を確認	Shicheng Liu, Sina J. Semnani, Harold Triedman, Jialiang Xu, Isaac Dan Zhao, Monica S. Lam,	(参考訳) 近年,Large Language Models (LLMs) の統合作業は,KBQA(Knowledge Base Question Answering)タスクの大幅な改善につながっている。しかし,既存のKBQAデータセットは,単純な質問や合成論理形式,あるいは小さな知識ベース(KB)スキーマに基づくものであり,KBQAタスクの真の複雑さを捉えていないと仮定する。そこで本稿では,Wikidata の "Request a Query" フォーラムでのフォーラムディスカッションから収集した KBQA データセットである SPINACH データセットを紹介する。既存のデータセットよりもはるかに複雑で、SPINACHはKBスキーマを学ぶためにトレーニングデータに頼るのではなく、大規模で多くの場合不完全なスキーマを動的に探索し、それらについて推論できる強力なKBQAシステムを求めている。データセットに加えて、このような難しい問題に対して、人間の専門家がどのようにSPARQLを書くかを模した、KBQAアプローチであるSPINACHエージェントも導入しています。既存のデータセットの実験では、KBQAにおけるSPINACHの能力を示し、QALD-7、QALD-9 Plus、QALD-10データセットでそれぞれ30.1%、27.0%、F1で10.0%、WikiWebQuestionsで微調整されたLLaMA SOTAモデルの1.6%に到達した。我々の新しいSPINACHデータセットでは、SPINACHエージェントは、最高のGPT-4ベースのKBQAエージェントを含む全てのベースラインを38.1%上回る。 Recent work integrating Large Language Models (LLMs) has led to significant improvements in the Knowledge Base Question Answering (KBQA) task. However, we posit that existing KBQA datasets that either have simple questions, use synthetically generated logical forms, or are based on small knowledge base (KB) schemas, do not capture the true complexity of KBQA tasks. To address this, we introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from forum discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. Much more complex than existing datasets, SPINACH calls for strong KBQA systems that do not rely on training data to learn the KB schema, but can dynamically explore large and often incomplete schemas and reason about them. Along with the dataset, we introduce the SPINACH agent, a new KBQA approach that mimics how a human expert would write SPARQLs for such challenging questions. 
Experiments on existing datasets show SPINACH's capability in KBQA, achieving a new state of the art on the QALD-7, QALD-9 Plus and QALD-10 datasets by 30.1%, 27.0%, and 10.0% in F1, respectively, and coming within 1.6% of the fine-tuned LLaMA SOTA model on WikiWebQuestions. On our new SPINACH dataset, SPINACH agent outperforms all baselines, including the best GPT-4-based KBQA agent, by 38.1% in F1.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# LOTUS:非構造化および構造化データのテーブル上でのLLMによるセマンティッククエリの実現 LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data ( http://arxiv.org/abs/2407.11418v1 ) ライセンス: Link先を確認	Liana Patel, Siddharth Jha, Carlos Guestrin, Matei Zaharia,	(参考訳) 言語モデル(LM)のセマンティック能力は、豊富な知識コーパスに対するリッチな分析と推論を可能にする可能性がある。残念ながら、既存のシステムは、大規模にセマンティッククエリを実行するためのハイレベルな抽象化を欠いている。我々は、データセット上のセマンティッククエリ(例えば、自然言語の基準を用いたレコードのソートや集約など)のための構成可能なAIベースの操作により、リレーショナルモデルを拡張する宣言型プログラミングインターフェースであるセマンティック演算子を紹介する。各オペレータは、複数の方法で実装および最適化することができ、リレーショナル演算子に似た実行計画のための豊富なスペースを開放する。我々は,PandasライクなAPIを備えたオープンソースのクエリエンジンであるLOTUSで,演算子といくつかの最適化を実装した。我々は,ファクトチェック,極端なマルチラベル分類,検索など,一連の実アプリケーションにおいてLOTUSの有効性を実証する。 LOTUSのプログラミングモデルは非常に表現力が高く、開発オーバーヘッドの少ない最先端のクエリパイプラインをキャプチャする。具体的には、FEVERデータセット上で、LOTUSのプログラムは、最近の最先端のファクトチェックパイプラインであるFacToolを数行のコードで再現でき、さらに精度を$9.5\%$向上し、実行時間を$7-34\times$短縮する新しいパイプラインを実装できる。 BioDEXデータセットの極端なマルチラベル分類タスクでは、LOTUSはジョイン演算子を使って最先端の結果の品質を再現すると同時に、単純なジョインよりも$800\times$高速な効率的なアルゴリズムを提供する。検索とランキングアプリケーションでは、LOTUSは演算子の単純な組み合わせにより、バニラレトリバーやリランカよりも$5.9 - 49.4\%$高いnDCG@10を達成でき、実行時間も先行研究で用いられたLMベースのランキング手法より$1.67 - 10\times$短い。 LOTUSはhttps://github.com/stanford-futuredata/lotusで公開されている。 The semantic capabilities of language models (LMs) have the potential to enable rich analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems lack high-level abstractions to perform semantic queries at scale. We introduce semantic operators, a declarative programming interface that extends the relational model with composable AI-based operations for semantic queries over datasets (e.g., sorting or aggregating records using natural language criteria). Each operator can be implemented and optimized in multiple ways, opening a rich space for execution plans similar to relational operators. We implement our operators and several optimizations for them in LOTUS, an open-source query engine with a Pandas-like API. 
We demonstrate LOTUS' effectiveness across a series of real applications, including fact-checking, extreme multi-label classification, and search. We find that LOTUS' programming model is highly expressive, capturing state-of-the-art query pipelines with low development overhead. Specifically, on the FEVER dataset, LOTUS' programs can reproduce FacTool, a recent state-of-the-art fact-checking pipeline, in a few lines of code, and implement a new pipeline that improves accuracy by $9.5\%$, while offering $7-34\times$ lower execution time. In the extreme multi-label classification task on the BioDEX dataset, LOTUS reproduces state-of-the-art result quality with its join operator, while providing an efficient algorithm that runs $800\times$ faster than a naive join. In the search and ranking application, LOTUS allows a simple composition of operators to achieve $5.9 - 49.4\%$ higher nDCG@10 than the vanilla retriever and re-ranker, while also providing query efficiency, with $1.67 - 10\times$ lower execution time than LM-based ranking methods used by prior works. LOTUS is publicly available at https://github.com/stanford-futuredata/lotus.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# 歯のドレマー : 5枚の口腔内写真からの3次元歯の再構築 TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs ( http://arxiv.org/abs/2407.11419v1 ) ライセンス: Link先を確認	Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui,	(参考訳) 矯正治療は通常、患者の歯の状態を監視するために、顔と顔の定期的な検査を必要とする。人体診断が不可能な場合、遠隔歯科監視に5枚の口腔内写真を使用する方法がある。しかし, 3D 情報がないため, 疎視写真から 3D モデルをどのように再構築するかが課題である。そこで本研究では,上下顎歯の形状と位置の復元を目的とした3次元再構築フレームワークTeethDreamerを提案する。口腔内5枚の写真から,まず大きな拡散モデルの先行知識を生かして,スパース入力に対処する新しい多視点画像を生成し,その後,神経表面再構成により高品質な3次元歯形を再構築する。生成したビュー間の3D整合性を確保するために,逆拡散プロセスに3D対応機能アテンション機構を統合する。さらに、歯の再建工程には、幾何学的認識の正常な損失が組み込まれ、幾何学的精度が向上する。広汎な実験により,術式が現状よりも優れていることが示され,遠隔での矯正治療の監視が可能となった。私たちのコードはhttps://github.com/ShanghaiTech-IMPACT/TeethDreamerで利用可能です。 Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, such photographs lack 3D information, and reconstructing 3D dental models from such sparse-view photographs is a challenging problem. In this study, we propose a 3D teeth reconstruction framework, named TeethDreamer, aiming to restore the shape and position of the upper and lower teeth. Given five intra-oral photographs, our approach first leverages a large diffusion model's prior knowledge to generate novel multi-view images with known poses to address sparse inputs and then reconstructs high-quality 3D teeth models by neural surface reconstruction. To ensure the 3D consistency across generated views, we integrate a 3D-aware feature attention mechanism in the reverse diffusion process. Moreover, a geometry-aware normal loss is incorporated into the teeth reconstruction process to enhance geometry accuracy. Extensive experiments demonstrate the superiority of our method over current state-of-the-art approaches, giving the potential to monitor orthodontic treatment remotely. 
Our code is available at https://github.com/ShanghaiTech-IMPACT/TeethDreamer	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# 隠れた州に隠れた州: LLMは国家表現を重要視 States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly ( http://arxiv.org/abs/2407.11421v1 ) ライセンス: Link先を確認	Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun,	(参考訳) 大きな言語モデル(LLM)は、様々な創発的な能力を示す。これらの能力の中には、モデルの内部動作機構を明らかにするものもある。本稿では,モデルにおける新たな創発的能力,すなわち,チェーン・オブ・ステップ・バイ・ステップの解に頼らずに計算列を拡張できる本質的な能力を明らかにする。注目すべきは、最も先進的なモデルでは、2桁の加算結果を直接出力できることだ。我々は,本モデルが隠れ状態内にインプリシット離散状態表現(IDSR)を出現させ,内部でシンボル計算を行うという仮説を立てる。この仮説をテストするために、隠れた状態を調べる一連の実験を設計する。具体的には、IDSRが存在することを最初に確認する。次に,レイヤ,ディジット,シーケンスの観点からのIDSRの生成について興味深い観察を行った。最後に,モデルがIDSRを用いて最終回答を生成することを確認した。しかし、これらの状態表現は、現在のオープンソースモデルでは損失のないものではないことが分かり、最終的な性能が不正確であることが判明した。本研究は,LLMの記号計算能力とその基礎となるメカニズムを新たに探求するものである。 Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output the results of two-digit number additions with lengths extending up to 15 addends. We hypothesize that the model emerges Implicit Discrete State Representations (IDSRs) within its hidden states and performs symbolic calculations internally. To test this hypothesis, we design a sequence of experiments that look into the hidden states. Specifically, we first confirm that IDSRs exist. Then, we provide interesting observations about the formation of IDSRs from layer, digit, and sequence perspectives. Finally, we confirm that models indeed use IDSRs to produce the final answers. However, we also discover that these state representations are far from lossless in current open-sourced models, leading to inaccuracies in their final performance. Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.	
翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# リフレクティブ・インストラクション・チューニング:大規模視覚言語モデルにおける幻覚の緩和 Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ( http://arxiv.org/abs/2407.11422v1 ) ライセンス: Link先を確認	Jinrui Zhang, Teng Wang, Haigang Zhang, Ping Lu, Feng Zheng,	(参考訳) 大規模視覚言語モデル(LVLM)は様々な視覚言語タスクにおいて有望な性能を示す。しかし、幻覚の影響を受け難いままであり、視覚内容や指示と不一致な出力を生成する。様々な緩和戦略が提案されているが、彼らはしばしば幻覚への重要な貢献を無視している。中間的推論ステップがなければ、モデルは命令と応答の間の表面的なショートカットを確立することができ、固有の推論ロジックの内部化に失敗する。この課題に対処するために,合理化学習を視覚的指導調律に統合した反射的指導調律を提案する。反応のみから学習する従来の方法とは異なり、我々の手法はなぜ応答が正しいのか、正しくないのかを正当化する合理性を予測するモデルを必要とする。これにより、各応答の根底にあるきめ細かい推論とのより深い関わりが促進され、モデルの推論習熟度が向上する。このアプローチを容易にするために,ReflEctiVE RatIonalEアノテーションを用いた最初の大規模命令チューニングデータセットであるREVERIEを提案する。 REVERIEは、115kの機械生成推論命令からなり、それぞれの応答の正当性や誤当性の背後にある正当性を解明する包括的論理とともに、対応する正当性と紛らわしい応答のペアに細心の注意を払って注釈付けされる。複数のLVLMベンチマークによる実験結果から,REVERIEデータセットによる反射的命令チューニングがベースラインモデルよりも顕著な性能向上を達成し,有理数からの反射の有効性が示された。プロジェクトページはhttps://zjr2000.github.io/projects/reverieにある。 Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks. However, they remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions. While various mitigation strategies have been proposed, they often neglect a key contributor to hallucinations: lack of fine-grained reasoning supervision during training. Without intermediate reasoning steps, models may establish superficial shortcuts between instructions and responses, failing to internalize the inherent reasoning logic. To address this challenge, we propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning. Unlike previous methods that learn from responses only, our approach entails the model predicting rationales justifying why responses are correct or incorrect. This fosters a deeper engagement with the fine-grained reasoning underlying each response, thus enhancing the model's reasoning proficiency. 
To facilitate this approach, we propose REVERIE, the first large-scale instruction-tuning dataset with ReflEctiVE RatIonalE annotations. REVERIE comprises 115k machine-generated reasoning instructions, each meticulously annotated with a corresponding pair of correct and confusing responses, alongside comprehensive rationales elucidating the justification behind the correctness or erroneousness of each response. Experimental results on multiple LVLM benchmarks reveal that reflective instruction tuning with the REVERIE dataset yields noticeable performance gain over the baseline model, demonstrating the effectiveness of reflecting from the rationales. Project page is at https://zjr2000.github.io/projects/reverie.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# 目標条件付き拡散モデルによるモデル反転攻撃 Model Inversion Attacks Through Target-Specific Conditional Diffusion Models ( http://arxiv.org/abs/2407.11424v1 ) ライセンス: Link先を確認	Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng,	(参考訳) モデル反転攻撃(MIA)は、ターゲット分類器のトレーニングセットからプライベートイメージを再構築することを目的としており、それによってAIアプリケーションにおけるプライバシー上の懸念が高まる。従来のGANベースのMIAは、GANの固有の欠陥と潜伏空間における最適化の偏りにより、劣った遺伝子的忠実度に悩まされる傾向にある。これらの問題を緩和し,拡散モデルの顕著な合成機能を活用するために,拡散型モデル反転(Diff-MI)攻撃を提案する。具体的には,ターゲット特定条件拡散モデル(CDM)を導入し,ターゲット分類器の個人分布を意図的に近似し,精度・忠実バランスを向上する。本手法は2段階の学習パラダイムを含む。 Step-1は、対象の分類器を、事前訓練と微調整を行う際にモデル条件として擬似ラベルを作成することによって、訓練前のファイントゥン方式でCDM学習全体に組み込む。ステップ2では、拡散先行と目標知識の組み合わせにより、攻撃性能をさらに向上する反復画像再構成手法を提案する。さらに,最大値を最大値に置き換え,特徴情報とソフトラベルをターゲット分類器から完全に活用する改良された最大値損失を提案する。大規模な実験により、Diff-MIは、様々なデータセットやモデルにわたる最先端の手法と比較して、競合攻撃精度を維持しながら、FIDの平均20%の低下で生成忠実度を著しく向上することが示された。コードとモデルをリリースします。 Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. To alleviate these issues, leveraging on diffusion models' remarkable synthesis capabilities, we propose Diffusion-based Model Inversion (Diff-MI) attacks. Specifically, we introduce a novel target-specific conditional diffusion model (CDM) to purposely approximate target classifier's private distribution and achieve superior accuracy-fidelity balance. Our method involves a two-step learning paradigm. Step-1 incorporates the target classifier into the entire CDM learning under a pretrain-then-finetune fashion, with creating pseudo-labels as model conditions in pretraining and adjusting specified layers with image predictions in fine-tuning. Step-2 presents an iterative image reconstruction method, further enhancing the attack performance through a combination of diffusion priors and target knowledge. 
Additionally, we propose an improved max-margin loss that replaces the hard max with top-k maxes, fully leveraging feature information and soft labels from the target classifier. Extensive experiments demonstrate that Diff-MI significantly improves generative fidelity with an average decrease of 20% in FID while maintaining competitive attack accuracy compared to state-of-the-art methods across various datasets and models. We will release our code and models.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# ロバストな非現実的説明のための一般発生モデル変化 Generally-Occurring Model Change for Robust Counterfactual Explanations ( http://arxiv.org/abs/2407.11426v1 ) ライセンス: Link先を確認	Ao Xu, Tieru Wu,	(参考訳) アルゴリズムによる意思決定が人間の生活に与える影響の増大に伴い、モデルの解釈可能性は機械学習において重要な問題となっている。これは、機械学習モデルがなぜ特定の決定を下すのかをユーザーが理解するのに役立つだけでなく、ユーザーがこれらの決定をどう変えるかを理解するのに役立つ。当然のことながら、変化をモデル化するための反実的説明生成アルゴリズムの堅牢性を研究することは重要な課題である。これまでの文献では、自然発生モデル変更の概念が提案されており、モデル変更に対する堅牢性についてより深く理解されています。本稿では, モデルパラメータ変化のより一般的な概念である, 適用範囲の広いモデルパラメータ変化を提唱し, 自然発生モデル変化の概念をさらに一般化する。また、それに対応する確率的保証も証明する。さらに、より具体的な問題、データセットの摂動を考察し、最適化理論を組み合わせることで関連する理論的結果を与える。 With the increasing impact of algorithmic decision-making on human lives, the interpretability of models has become a critical issue in machine learning. Counterfactual explanation is an important method in the field of interpretable machine learning, which can not only help users understand why machine learning models make specific decisions, but also help users understand how to change these decisions. Naturally, it is an important task to study the robustness of counterfactual explanation generation algorithms to model changes. Previous literature has proposed the concept of Naturally-Occurring Model Change, which has given us a deeper understanding of robustness to model change. In this paper, we first further generalize the concept of Naturally-Occurring Model Change, proposing a more general concept of model parameter changes, Generally-Occurring Model Change, which has a wider range of applicability. We also prove the corresponding probabilistic guarantees. In addition, we consider a more specific problem, data set perturbation, and give relevant theoretical results by combining optimization theory.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# 半監督型疾患軌跡生成モデル : 全身性硬化症を事例として Semi-Supervised Generative Models for Disease Trajectories: A Case Study on Systemic Sclerosis ( http://arxiv.org/abs/2407.11427v1 ) ライセンス: Link先を確認	Cécile Trottet, Manuel Schürch, Ahmed Allam, Imon Barua, Liubov Petelytska, Oliver Distler, Anna-Maria Hoffmann-Vold, Michael Krauthammer, the EUSTAR collaborators,	(参考訳) 複雑な疾患の軌跡をモデル化・全体解析するために潜時過程を用いた深部生成法を提案し,特に全身性硬化症(SSc)に焦点を当てた。本研究の目的は、患者疾患の軌跡を解釈可能かつ包括的に説明するための、根底にある生成過程の時間的潜在表現を学習することである。そこで我々は,これらの潜伏時間過程の解釈可能性を高めるために,確立された医療知識を用いて潜伏空間を遠ざけるための半教師付きアプローチを開発した。 SScの異なる特徴の医学的定義と生成的アプローチを組み合わせることで,病の新たな側面の発見が容易になる。本研究は, SSc患者軌跡を新たなサブタイプに分類するなど, さらにデータ分析や臨床仮説の検証に, 学習時潜伏過程を活用できることを示唆する。さらに、不確実な定量化を伴う多変量時系列のパーソナライズされたオンラインモニタリングと予測を可能にする。 We propose a deep generative approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories, with a particular focus on Systemic Sclerosis (SSc). We aim to learn temporal latent representations of the underlying generative process that explain the observed patient disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical knowledge. By combining the generative approach with medical definitions of different characteristics of SSc, we facilitate the discovery of new aspects of the disease. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering SSc patient trajectories into novel sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series with uncertainty quantification.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# Unrolled Neural Networksによる共同データ塗装とグラフ学習 Joint Data Inpainting and Graph Learning via Unrolled Neural Networks ( http://arxiv.org/abs/2407.11429v1 ) ライセンス: Link先を確認	Subbareddy Batreddy, Pushkal Mishra, Yaswanth Kakarla, Aditya Siripuram,	(参考訳) 時間変化グラフ信号の部分的な測定を考慮し、基礎となるグラフトポロジと欠測値の両方を同時に推定するアルゴリズムを提案する。提案アルゴリズムは、アンローリングフレームワークから設計された解釈可能なニューラルネットワークをトレーニングすることによって動作する。提案手法はグラフ学習とグラフ信号再構成アルゴリズムの両方に利用できる。この研究は、基礎となるグラフを未知にすることで、グラフ信号再構成における先行作業を強化するとともに、学習したグラフを信号再構成タスクに合わせることにより、グラフ学習における先行作業を構築する。 Given partial measurements of a time-varying graph signal, we propose an algorithm to simultaneously estimate both the underlying graph topology and the missing measurements. The proposed algorithm operates by training an interpretable neural network, designed from the unrolling framework. The proposed technique can be used both as a graph learning and a graph signal reconstruction algorithm. This work enhances prior work in graph signal reconstruction by allowing the underlying graph to be unknown; and also builds on prior work in graph learning by tailoring the learned graph to the signal reconstruction task.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# MRIo3DS-Net: モデル適応屋内3次元再構成のための3次元表面RNNライクなフレームワークへの画像の相互強化 MRIo3DS-Net: A Mutually Reinforcing Images to 3D Surface RNN-like framework for model-adaptation indoor 3D reconstruction ( http://arxiv.org/abs/2407.11431v1 ) ライセンス: Link先を確認	Chang Li, Jiao Guo, Yufei Zhao, Yongjun Zhang,	(参考訳) 本稿では, モデル適応型屋内3次元再構成のための3次元面再帰型ニューラルネットワークに画像を相互に補強するエンドツーエンドのフレームワークを提案する。マルチビュー密集型マッチングモジュールでは, 多視点密集型マッチングモジュールを用いて, トランスフォーマーベースのマルチビュー密集型マッチングDNNを微調整・最適化し, マッチングと詳細化のために高い画像特徴を有するようにし, ポイントクラウド表面最適化モジュールでは, 3次元面再帰型ニューラルネットワークをモデル適応型戦略を用いて最適化し, 3次元面再帰型ニューラルネットワークを最適化し, 3次元面再帰型ニューラルネットワークを最適化する。 This paper is the first to propose an end-to-end framework of mutually reinforcing images to 3D surface recurrent neural network-like for model-adaptation indoor 3D reconstruction, where multi-view dense matching and point cloud surface optimization are mutually reinforced by a RNN-like structure rather than being treated as a separate issue. The characteristics are as follows: In the multi-view dense matching module, the model-adaptation strategy is used to fine-tune and optimize a Transformer-based multi-view dense matching DNN, so that it has the higher image feature for matching and detail expression capabilities; In the point cloud surface optimization module, the 3D surface reconstruction network based on 3D implicit field is optimized by using model-adaptation strategy, which solves the problem of point cloud surface optimization without knowing normal vector of 3D surface. To improve and finely reconstruct 3D surfaces from point cloud, smooth loss is proposed and added to this module; The MRIo3DS-Net is a RNN-like framework, which utilizes the finely optimized 3D surface obtained by PCSOM to recursively reinforce the differentiable warping for optimizing MVDMM. This refinement leads to achieving better dense matching results, and better dense matching results leads to achieving better 3D surface results recursively and mutually. Hence, model-adaptation strategy can better collaborate the 
differences between the two network modules, so that they complement each other to achieve the better effect; To accelerate the transfer learning and training convergence from source domain to target domain, a multi-task loss function based on Bayesian uncertainty is used to adaptively adjust the weights between the two networks loss functions of MVDMM and PCSOM; In this multi-task cascade network framework, any modules can be replaced by any state-of-the-art networks to achieve better 3D reconstruction results.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# CycleHOI: 検出・生成のサイクル整合性による人間と物体の相互作用検出の改善 CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation ( http://arxiv.org/abs/2407.11433v1 ) ライセンス: Link先を確認	Yisen Wang, Yao Teng, Limin Wang,	(参考訳) 認識と生成は、コンピュータビジョンにおける2つの基本的なタスクであり、既存の文献ではしばしば別々に研究されている。しかし、これら2つのタスクは、いずれも視覚概念の根底にある意味を理解する必要があるため、本質的に強く相関している。本稿では,DETRに基づく検出パイプラインと事前学習したテキスト・画像拡散モデルを橋渡しすることで,人間と物体の相互作用(HOI)の検出性能を向上させる新しい学習フレームワークCycleHOIを提案する。我々のキーとなる設計は、HOI検出器のトレーニングに新たなサイクル整合性損失を導入することであり、これにより強力な拡散モデルが獲得した知識を明示的に活用してHOI検出器のトレーニングを導くことができる。具体的には、HOI検出器からデコードされたインスタンス表現の上に追加の生成タスクを構築し、検出・生成のサイクル整合性を強制する。さらに,拡散モデルから検出器エンコーダへの特徴蒸留を行い,その表現力を高める。加えて,拡散モデルの生成力を利用して、ラベル補正とサンプル生成の両面でトレーニングセットを拡張する。 HICO-DETとV-COCOの2つの公開データセット上で,3つのHOI検出フレームワークを用いて,CycleHOIの有効性と汎化性能を検証した。実験の結果,CycleHOIは最先端のHOI検出器の性能を大幅に向上させることができることがわかった。 Recognition and generation are two fundamental tasks in computer vision, which are often investigated separately in the existing literature. However, these two tasks are highly correlated in essence as they both require understanding the underlying semantics of visual concepts. In this paper, we propose a new learning framework, coined as CycleHOI, to boost the performance of human-object interaction (HOI) detection by bridging the DETR-based detection pipeline and the pre-trained text-to-image diffusion model. Our key design is to introduce a novel cycle consistency loss for the training of HOI detector, which is able to explicitly leverage the knowledge captured in the powerful diffusion model to guide the HOI detector training. Specifically, we build an extra generation task on top of the decoded instance representations from HOI detector to enforce a detection-generation cycle consistency. Moreover, we perform feature distillation from diffusion model to detector encoder to enhance its representation power. 
In addition, we further utilize the generation power of diffusion model to augment the training set in both aspects of label correction and sample generation. We perform extensive experiments to verify the effectiveness and generalization power of our CycleHOI with three HOI detection frameworks on two public datasets: HICO-DET and V-COCO. The experimental results demonstrate our CycleHOI can significantly improve the performance of the state-of-the-art HOI detectors.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# ゲノム言語モデル:機会と課題 Genomic Language Models: Opportunities and Challenges ( http://arxiv.org/abs/2407.11435v1 ) ライセンス: Link先を確認	Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song,	(参考訳) 大規模言語モデル(LLM)は、幅広い科学分野、特に生物医学分野において、変革的な影響を及ぼしている。自然言語処理の目的が単語の列を理解することにあるように、生物学の主要な目的は生物学的配列を理解することである。ゲノム言語モデル(gLM)は、DNA配列に基づいて訓練されたLLMであり、ゲノムの理解を大きく進め、様々なスケールのDNA要素がどのように相互作用して複雑な機能を生み出すかを明らかにする可能性がある。本レビューでは、適応度予測、配列設計、転移学習など、gLMの主要な応用を紹介することで、この可能性を示す。しかし、最近の顕著な進歩にもかかわらず、効果的かつ効率的なgLMの開発は、特に大型で複雑なゲノムを持つ種に対して多くの課題を呈している。本稿では,gLMの開発と評価における主要な考慮事項について論じる。 Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to significantly advance our understanding of genomes and how DNA elements at various scales interact to give rise to complex functions. In this review, we showcase this potential by highlighting key applications of gLMs, including fitness prediction, sequence design, and transfer learning. Despite notable recent progress, however, developing effective and efficient gLMs presents numerous challenges, especially for species with large, complex genomes. We discuss major considerations for developing and evaluating gLMs.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# Trust No Bot: 実環境における人間とLLMの会話での個人情報開示の発見 Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild ( http://arxiv.org/abs/2407.11438v1 ) ライセンス: Link先を確認	Niloofar Mireshghallah, Maria Antoniak, Yash More, Yejin Choi, Golnoosh Farnadi,	(参考訳) 人間とチャットボットのやり取りにおける個人情報の開示を測定することで、ユーザのAIリテラシーをよりよく理解し、大規模言語モデル(LLM)のプライバシー研究を促進できる。我々は、実際のユーザによる商用GPTモデルへの個人情報開示を広範かつ詳細に分析し、個人を特定可能な情報や機微な情報の漏洩を調査した。ユーザがチャットボットに開示する文脈を理解するために,自然に発生した会話の定性的・定量的分析に基づいて,タスクと機微なトピックの分類体系を開発する。これらの潜在的なプライバシー上の害について議論し、次のことを観察した。(1)個人を特定可能な情報(PII)は,翻訳やコード編集などの予期せぬ文脈に現れる(それぞれ48%と16%の割合)。(2)PII検出だけでは,詳細な性的嗜好や特定の薬物使用習慣など,人間とチャットボットのやり取りによく見られる機微なトピックを捉えるには不十分である。これらの高い開示率は研究者やデータキュレーターにとって非常に重要であると考えており、ユーザが自身のやり取りを調整できるよう支援する適切なナッジ機構の設計を求める。 Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# Repurformer: リパーパシングを考慮した分子生成のためのTransformer Repurformer: Transformers for Repurposing-Aware Molecule Generation ( http://arxiv.org/abs/2407.11439v1 ) ライセンス: Link先を確認	Changhun Lee, Gyumin Lee,	(参考訳) 所望の特性を持つ可能な限り多様な分子を生成することは創薬研究に不可欠であり、今日では深層生成モデルに基づく多くのアプローチが用いられている。これらのモデル、特に変分オートエンコーダ(VAE)、敵対的生成ネットワーク(GAN)、Transformer、拡散モデルにおける近年の進歩にもかかわらず、サンプルバイアス問題として知られる重要な課題が残っている。この問題は、同じタンパク質を標的とする生成分子が構造的に類似する傾向にあり、生成の多様性が低下する場合に生じる。そこで本研究では,タンパク質と化合物の間のマルチホップ関係を活用することを提案する。我々のモデルであるRepurformerは、双方向事前学習を高速フーリエ変換(FFT)およびローパスフィルタリング(LPF)と統合し、複雑な相互作用を捉えて多様な分子を生成する。 BindingDBデータセットに関する一連の実験により、Repurformerが正例化合物に類似したアンカー化合物の代替物をうまく生成し、アンカー化合物と生成化合物の間の多様性を高めることを確認した。 Generating as diverse molecules as possible with desired properties is crucial for drug discovery research, which invokes many approaches based on deep generative models today. Despite recent advancements in these models, particularly in variational autoencoders (VAEs), generative adversarial networks (GANs), Transformers, and diffusion models, a significant challenge known as the sample bias problem remains. This problem occurs when generated molecules targeting the same protein tend to be structurally similar, reducing the diversity of generation. To address this, we propose leveraging multi-hop relationships among proteins and compounds. Our model, Repurformer, integrates bi-directional pretraining with Fast Fourier Transform (FFT) and low-pass filtering (LPF) to capture complex interactions and generate diverse molecules. A series of experiments on the BindingDB dataset confirms that Repurformer successfully creates substitutes for anchor compounds that resemble positive compounds, increasing diversity between the anchor and generated compounds.	翻訳日:2024-07-17 16:22:29 公開日:2024-07-16
# スマートコントラクトにおける転送リスクのエンドユーザー理解 End-user Comprehension of Transfer Risks in Smart Contracts ( http://arxiv.org/abs/2407.11440v1 ) ライセンス: Link先を確認	Yustynn Panicker, Ezekiel Soremekun, Sumei Sun, Sudipta Chattopadhyay,	(参考訳) スマートコントラクトは、重要なユースケース(例えば、金融トランザクション)でますます使われている。したがって、エンドユーザーがスマートコントラクトにおける転送リスクを確実に理解することが重要となる。これに対処するため、最も人気のあるEthereumスマートコントラクト(USD Tether(USDT))におけるリスクのエンドユーザーによる理解と、上位ERC-20スマートコントラクトにおけるそれらのリスクの普及状況を調査する。我々は、ユーザーのブラックリスト登録、コントラクトの一時停止、コントラクトの任意のアップグレードを含む、転送結果とユーザーの目的に深刻な影響を与える5つの転送リスクに焦点を当てる。第1に,110名の参加者とUSDT/MetaMaskを用いて、スマートコントラクトの転送リスクに対するエンドユーザーの理解を調べるユーザー調査を行った。第2に,(USDTに次ぐ)上位78のERC-20スマートコントラクトのソースコードを手動および自動で解析し,これらのリスクの普及状況を特定した。その結果、エンドユーザーは実際のリスクを理解していないことが示された。ほとんどのユーザー(最大71.8%)は、コントラクトのアップグレードとブラックリスト登録を非常に深刻または予想外だと考えている。さらに重要なことに、USDT/MetaMaskのUIフローでは、リスクのある結果よりも成功した結果の方が見つけやすいと感じるユーザーが2倍多い。これらの結果は、参加者が自己評価したプログラミングおよびWeb3の習熟度に関わらず成り立つ。さらに,ソースコード解析の結果,調査したリスクは上位ERC-20コントラクトの最大19.2%に存在することがわかった。加えて、これらのコントラクトにおいて最大25.6%の普及率を持つ他の3つのリスクを発見した。この研究は、説明可能なスマートコントラクト、理解しやすいUI、およびリスクのある結果に関する適切な情報を提供する必要性を示している。 Smart contracts are increasingly used in critical use cases (e.g., financial transactions). Thus, it is pertinent to ensure that end-users understand the transfer risks in smart contracts. To address this, we investigate end-user comprehension of risks in the most popular Ethereum smart contract (i.e., USD Tether (USDT)) and their prevalence in the top ERC-20 smart contracts. We focus on five transfer risks with severe impact on transfer outcomes and user objectives, including users being blacklisted, contract being paused, and contract being arbitrarily upgraded. Firstly, we conducted a user study investigating end-user comprehension of smart contract transfer risks with 110 participants and USDT/MetaMask. Secondly, we performed manual and automated source code analysis of the next top (78) ERC-20 smart contracts (after USDT) to identify the prevalence of these risks. 
Results show that end-users do not comprehend real risks: most (up to 71.8% of) users believe contract upgrade and blacklisting are highly severe/surprising. More importantly, twice as many users find it easier to discover successful outcomes than risky outcomes using the USDT/MetaMask UI flow. These results hold regardless of the self-rated programming and Web3 proficiency of participants. Furthermore, our source code analysis demonstrates that the examined risks are prevalent in up to 19.2% of the top ERC-20 contracts. Additionally, we discovered (three) other risks with up to 25.6% prevalence in these contracts. This study informs the need to provide explainable smart contracts, understandable UI and relevant information for risky outcomes.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# EARN Fairness: ステークホルダー間でのAI公平性メトリクスの説明、質問、レビュー、交渉 EARN Fairness: Explaining, Asking, Reviewing and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders ( http://arxiv.org/abs/2407.11442v1 ) ライセンス: Link先を確認	Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf,	(参考訳) バイアスを定量的に測定し、AIモデルの公平性を定義するために、人工知能(AI)の専門家によって多くの公平性メトリクスが提案され、採用されている。利害関係者の多様な公平性の理解に対応する必要性が認識され、彼らの意見を求める取り組みが進んでいる。しかし、AIの専門知識を持たない利害関係者に公平性メトリクスを伝え、個人の好みを捉え、集団としての合意を求めることは、依然として困難であり十分に研究されていない。このギャップを埋めるために、AIの専門知識を必要とせずに利害関係者間の集団的なメトリクス決定を促進する新しいフレームワークEARN Fairnessを提案する。このフレームワークは、適応可能な対話型システムと、利害関係者を中心としたEARN Fairnessプロセスを備えている。このプロセスでは、公平性メトリクスを説明し(Explain)、利害関係者の個人的なメトリクスの好みを尋ね(Ask)、メトリクスを集団でレビューし(Review)、メトリクス選択に関する合意を交渉する(Negotiate)。実証的な結果を得るために,この枠組みを信用格付けのシナリオに適用し,AIの知識を持たない18人の意思決定対象者を対象としたユーザスタディを行った。個別セッションでは、個人のメトリクスの好みと許容できる不公平さの水準を特定した。その後、チームセッションで彼らがどのようにメトリクスの合意に達したかを明らかにした。我々の研究は、EARN Fairnessフレームワークによって利害関係者が個人の好みを表明して合意に達することが可能になり、高リスクな状況で人間中心のAI公平性を実装するための実践的な指針が得られることを示している。このアプローチを通じて、多様な利害関係者の公平性に対する期待を調和させ、より公平で包括的なAI公平性を育むことを目指している。 Numerous fairness metrics have been proposed and employed by artificial intelligence (AI) experts to quantitatively measure bias and define fairness in AI models. Recognizing the need to accommodate stakeholders' diverse fairness understandings, efforts are underway to solicit their input. However, conveying AI fairness metrics to stakeholders without AI expertise, capturing their personal preferences, and seeking a collective consensus remain challenging and underexplored. To bridge this gap, we propose a new framework, EARN Fairness, which facilitates collective metric decisions among stakeholders without requiring AI expertise. The framework features an adaptable interactive system and a stakeholder-centered EARN Fairness process to Explain fairness metrics, Ask stakeholders' personal metric preferences, Review metrics collectively, and Negotiate a consensus on metric selection. 
To gather empirical results, we applied the framework to a credit rating scenario and conducted a user study involving 18 decision subjects without AI knowledge. We identify their personal metric preferences and their acceptable level of unfairness in individual sessions. Subsequently, we uncovered how they reached metric consensus in team sessions. Our work shows that the EARN Fairness framework enables stakeholders to express personal preferences and reach consensus, providing practical guidance for implementing human-centered AI fairness in high-risk contexts. Through this approach, we aim to harmonize fairness expectations of diverse stakeholders, fostering more equitable and inclusive AI fairness.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# マイクロ波駆動型トラップイオン用超電導表面トラップチップ Superconducting surface trap chips for microwave-driven trapped ions ( http://arxiv.org/abs/2407.11443v1 ) ライセンス: Link先を確認	Yuta Tsuchimoto, Ippei Nakamura, Shotaro Shirai, Atsushi Noguchi,	(参考訳) マイクロ波駆動のトラップイオン論理ゲートは、レーザーベースの論理操作を超えて前進するための有望な道を提供する。しかし、将来のマイクロ波ベースの操作では、狭いマイクロ波電極を流れる大きなマイクロ波電流によって発生するジュール熱が、ゲート速度と忠実度の向上を妨げる可能性がある。さらに、特に極低温のトラップイオン系におけるスケーラビリティは、過剰なジュール熱によって妨げられる。これらの課題に対処するために,大電流容量を持つ高$Q$マイクロ波共振器を集積した超伝導表面トラップチップを提案する。超伝導Nb共振器におけるサブアンペアのマイクロ波電流を用いることで、従来の金属チップに比べて損失を大幅に抑えながら、大きな磁場勾配を生成する。超伝導共振器の高い$Q$値を利用して,1kHzのゲートラビ周波数においてサブミリワットの外部マイクロ波入力電力を実現できる、電力効率の良い2量子ビットゲート方式を提案する。 Microwave-driven trapped ion logic gates offer a promising avenue for advancing beyond laser-based logic operations. In future microwave-based operations, however, the joule heat produced by large microwave currents flowing through narrow microwave electrodes would potentially hinder improvements in gate speed and fidelity. Moreover, scalability, particularly in cryogenic trapped ion systems, is impeded by the excessive joule heat. To address these challenges, we present a novel approach: superconducting surface trap chips that integrate high-$Q$ microwave resonators with large current capacities. Utilizing sub-ampere microwave currents in superconducting Nb resonators, we generate substantial magnetic field gradients with significantly reduced losses compared to conventional metal chips. By harnessing the high $Q$ factors of superconducting resonators, we propose a power-efficient two-qubit gate scheme capable of achieving a sub-milliwatt external microwave input power at a gate Rabi frequency of 1 kHz.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# 物質波干渉計のためのファインマン図 Feynman Diagrams for Matter Wave Interferometry ( http://arxiv.org/abs/2407.11446v1 ) ライセンス: Link先を確認	Jonah Glick, Tim Kovachy,	(参考訳) 物質波干渉法における位相シフトを計算するために、ファインマン図に基づく新しい理論フレームワークを導入する。この方法は、従来の半古典近似を超えた高次の量子補正の解析計算を可能にする。これらの追加項は、初期物質波動関数の有限サイズに依存するか、$\hbar$に対して高次の依存性を持つ(あるいはその両方である)。本研究では,べき乗則ポテンシャルおよび任意の空間依存性を持つポテンシャルに対する物質波干渉計の応答を計算するために,この手法を適用する。解析式は数値シミュレーションとの比較により検証され、地球の重力場、非調和トラップポテンシャル、局所的な証明質量からの重力場に対する位相シフト応答への量子補正の大きさの推定値が提供される。実験的に実現可能な特定のパラメータについて、これらの補正は測定できるほど大きく、考慮されない場合は系統誤差を引き起こす可能性があることが判明した。我々は,これらの補正が,トラップされた物質波干渉計や証明質量の存在下での自由空間物質波干渉計にとって特に重要になると予想する。これらの干渉計は、移動体慣性センシング、重力探査、重力およびその量子力学との相互作用の検証、ダークエネルギーの探索のための、ますます高感度なツールになりつつある。 We introduce a new theoretical framework based on Feynman diagrams to compute phase shifts in matter wave interferometry. The method allows for analytic computation of higher order quantum corrections, beyond the traditional semi-classical approximation. These additional terms depend on the finite size of the initial matter wavefunction and/or have higher order dependence on $\hbar$. We apply the method to compute the response of matter wave interferometers to power law potentials and potentials with an arbitrary spatial dependence. The analytic expressions are validated by comparing to numerical simulations, and estimates are provided for the scale of the quantum corrections to the phase shift response to the gravitational field of the earth, anharmonic trapping potentials, and gravitational fields from local proof masses. We find that for certain experimentally feasible parameters, these corrections are large enough to be measured, and could lead to systematic errors if not accounted for. We anticipate these corrections will be especially important for trapped matter wave interferometers and for free-space matter wave interferometers in the presence of proof masses. 
These interferometers are becoming increasingly sensitive tools for mobile inertial sensing, gravity surveying, tests of gravity and its interplay with quantum mechanics, and searches for dark energy.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# cDP-MIL:カスケードディリクレプロセスによるロバストな複数インスタンス学習 cDP-MIL: Robust Multiple Instance Learning via Cascaded Dirichlet Process ( http://arxiv.org/abs/2407.11448v1 ) ライセンス: Link先を確認	Yihang Chen, Tsai Hor Chan, Guosheng Yin, Yuming Jiang, Lequan Yu,	(参考訳) マルチプル・インスタンス・ラーニング(MIL)は全スライド病理画像(WSI)解析に広く応用されている。 MILの既存の集約戦略は、主にインスタンス間の一階距離(平均差など)に依存しており、各インスタンスの真の特徴分布を正確に近似することができず、バイアスのあるスライドレベルの表現をもたらす。さらに、WSI観測の不足はモデルオーバーフィッティングを容易にし、不安定な試験性能と限定的な一般化性をもたらす。このような課題に対処するために、我々は、複数のインスタンス学習のための新しいベイズ非パラメトリックフレームワークを提案し、WSIのインスタンス・ツー・バッグ特性を組み込むために、ディリクレ・プロセスのカスケード(cDP)を採用する。パッチ特徴の共分散を取り入れ,より代表的なクラスタを形成するDirichletプロセスによって形成された潜在クラスタに基づいて,特徴集約を行う。次に、バッグ上の別のディリクレプロセスモデルを用いてバッグレベルの予測を行い、学習に自然な正規化を課し、過度な適合を防止し、一般化性を高める。さらに、ベイズ非パラメトリック法として、cDPモデルは後方の不確かさを正確に生成することができ、異常サンプルの検出と腫瘍の局在が可能である。 5つのWSIベンチマークの大規模な実験は、我々の手法の優れた性能と、その一般化可能性と不確実性を推定する能力を検証する。コードはhttps://github.com/HKU-MedAI/cDPMILで入手できる。 Multiple instance learning (MIL) has been extensively applied to whole slide histopathology image (WSI) analysis. The existing aggregation strategy in MIL, which primarily relies on the first-order distance (e.g., mean difference) between instances, fails to accurately approximate the true feature distribution of each instance, leading to biased slide-level representations. Moreover, the scarcity of WSI observations easily leads to model overfitting, resulting in unstable testing performance and limited generalizability. To tackle these challenges, we propose a new Bayesian nonparametric framework for multiple instance learning, which adopts a cascade of Dirichlet processes (cDP) to incorporate the instance-to-bag characteristic of the WSIs. We perform feature aggregation based on the latent clusters formed by the Dirichlet process, which incorporates the covariances of the patch features and forms more representative clusters. 
We then perform bag-level prediction with another Dirichlet process model on the bags, which imposes a natural regularization on learning to prevent overfitting and enhance generalizability. Moreover, as a Bayesian nonparametric method, the cDP model can accurately generate posterior uncertainty, which allows for the detection of outlier samples and tumor localization. Extensive experiments on five WSI benchmarks validate the superior performance of our method, as well as its generalizability and ability to estimate uncertainties. Codes are available at https://github.com/HKU-MedAI/cDPMIL.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# 制御可能なコンテクスト化画像キャプション: ユーザ定義ハイライトによるビジュアルナラティブの指示 Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights ( http://arxiv.org/abs/2407.11449v1 ) ライセンス: Link先を確認	Shunqi Mao, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai,	(参考訳) コンテクスト化画像キャプション(CIC)は、従来の画像キャプションをより複雑な領域へと発展させたもので、マルチモーダル推論の能力を必要とする。特定の文脈情報が与えられたうえで画像キャプションを生成することを目的としている。本稿ではさらに、制御可能なコンテクスト化画像キャプション(Ctrl-CIC)という新たな領域を導入する。広い文脈のみに依存するCICとは異なり、Ctrl-CICはユーザ定義のハイライトを重視し、文脈中のハイライトされた側面に沿ったキャプションをモデルに生成させる。集中的なキャプションを生成するために, Prompting-based Controller (P-Ctrl) と Recalibration-based Controller (R-Ctrl) の2つのアプローチを提案する。 P-Ctrlはハイライトに基づくプレフィックスをキャプションの先頭に付加することでモデルの生成をハイライトに条件付ける一方、R-Ctrlはハイライトされたトークンに対するエンコーダ埋め込みを選択的に再調整するようにモデルをチューニングする。さらに,標準的な評価手法に加えて、制御されたキャプションの品質を評価するためのGPT-4Vを用いた評価器を設計する。広範な実験結果は,提案手法の効率的かつ効果的な制御性を示し,ユーザ適応型画像キャプションの実現に向けた新たな方向性を示している。コードはhttps://github.com/ShunqiM/Ctrl-CICで入手できる。 Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentuates a user-defined highlight, compelling the model to tailor captions that resonate with the highlighted aspects of the context. We present two approaches, Prompting-based Controller (P-Ctrl) and Recalibration-based Controller (R-Ctrl), to generate focused captions. P-Ctrl conditions the model generation on highlight by prepending captions with highlight-driven prefixes, whereas R-Ctrl tunes the model to selectively recalibrate the encoder embeddings for highlighted tokens. Additionally, we design a GPT-4V empowered evaluator to assess the quality of the controlled captions alongside standard assessment methods. 
Extensive experimental results demonstrate the efficient and effective controllability of our method, charting a new direction in achieving user-adaptive image captioning. Code is available at https://github.com/ShunqiM/Ctrl-CIC .	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# 拡散モデルのもつれのない(disentangled)潜在空間のための等長表現学習 Isometric Representation Learning for Disentangled Latent Space of Diffusion Models ( http://arxiv.org/abs/2407.11451v1 ) ライセンス: Link先を確認	Jaehoon Hahm, Junho Lee, Sunghyun Kim, Joonseok Lee,	(参考訳) 拡散モデルの潜在空間は、生成モデリングの分野での大きな成功と可能性にもかかわらず、いまだほとんど探究されていない。実際、既存の拡散モデルの潜在空間はもつれており、潜在空間から画像空間への写像は歪んでいる。この問題に対処するため,拡散モデルに幾何正則化器を組み込み,訓練データ多様体の幾何学的に健全な潜在空間を学習させるIsometric Diffusionを提案する。このアプローチにより、拡散モデルはよりもつれの少ない潜在空間を学習でき、より滑らかな補間、より正確な反転、潜在空間内での属性のより精密な制御が可能になる。画像補間, 画像反転, 線形編集からなる広範な実験により, 提案手法の有効性が示された。 The latent space of diffusion models remains mostly unexplored, despite its great success and potential in the field of generative modeling. In fact, the latent space of existing diffusion models is entangled, with a distorted mapping from its latent space to image space. To tackle this problem, we present Isometric Diffusion, equipping a diffusion model with a geometric regularizer to guide the model to learn a geometrically sound latent space of the training data manifold. This approach allows diffusion models to learn a more disentangled latent space, which enables smoother interpolation, more accurate inversion, and more precise control over attributes directly in the latent space. Our extensive experiments consisting of image interpolations, image inversions, and linear editing show the effectiveness of our method.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# クラウドベースの半量子マネー Cloud-based Semi-Quantum Money ( http://arxiv.org/abs/2407.11454v1 ) ライセンス: Link先を確認	Yichi Zhang, Siyuan Jin, Yuhan Huang, Bei Zeng, Qiming Shao,	(参考訳) 1970年代、ヴィースナーは量子マネーの概念を導入した。そこでは、特定の規則に従って生成された量子状態が通貨として機能する。これらの状態は、量子チャネルや対面でのやり取りを通じて、量子リソースを持つユーザの間で流通する。量子力学は量子マネーに物理レベルの偽造不可能性を与えるが、その発行、保存、流通を極めて困難にもする。現在、量子マネーの発行と保存が可能な量子コンピュータはまだ登場しておらず、既存の量子チャネルは量子マネーのための量子状態の効率的な伝送を支えるほど安定していないため、実用性に限界がある。半量子マネー方式は、完全に古典的な取引と完全に古典的な銀行をサポートし、量子リソースへの依存を減らして実現可能性を高める。システムの量子リソースへの依存をさらに最小化するために,クラウドベースの半量子マネー(CSQM)方式を提案する。この方式は半正直なサードパーティの量子クラウドにのみ依存し、残りのシステムは完全に古典的なままである。また、この方式のために量子クラウドが必要とする計算能力の推定について議論し、セキュリティ分析を行う。 In the 1970s, Wiesner introduced the concept of quantum money, where quantum states generated according to specific rules function as currency. These states circulate among users with quantum resources through quantum channels or face-to-face interactions. Quantum mechanics grants quantum money physical-level unforgeability but also makes minting, storing, and circulating it significantly challenging. Currently, quantum computers capable of minting and preserving quantum money have not yet emerged, and existing quantum channels are not stable enough to support the efficient transmission of quantum states for quantum money, limiting its practicality. Semi-quantum money schemes support fully classical transactions and complete classical banks, reducing dependence on quantum resources and enhancing feasibility. To further minimize the system's reliance on quantum resources, we propose a cloud-based semi-quantum money (CSQM) scheme. This scheme relies only on semi-honest third-party quantum clouds, while the rest of the system remains entirely classical. We also discuss estimating the computational power required by the quantum cloud for the scheme and conduct a security analysis.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# 両半球RLエージェントによるグレースフルタスク適応 Graceful task adaptation with a bi-hemispheric RL agent ( http://arxiv.org/abs/2407.11456v1 ) ライセンス: Link先を確認	Grant Nicholas, Levin Kuhlmann, Gideon Kowadlo,	(参考訳) 人間では、タスクを実行する責任は徐々に右半球から左半球へ移行する。 Novelty-Routine Hypothesis(NRH)によれば,右半球と左半球はそれぞれ新規なタスクと日常的なタスクの遂行に用いられ,これによりタスクを適切に遂行しながら多様な新規タスクを学習できる。 NRHに基づき,右半球の汎用的な知識を活用して新規タスクにおける初期性能の低さを回避できる,特化した半球を持つ強化学習エージェントを開発した。さらに,この設計は新規タスクを学習する能力にほとんど影響を与えないことがわかった。最後に、エージェントの改善点を特定し、継続学習設定への拡張の可能性を探る。 In humans, responsibility for performing a task gradually shifts from the right hemisphere to the left. The Novelty-Routine Hypothesis (NRH) states that the right and left hemispheres are used to perform novel and routine tasks respectively, enabling us to learn a diverse range of novel tasks while performing the task capably. Drawing on the NRH, we develop a reinforcement learning agent with specialised hemispheres that can exploit generalist knowledge from the right-hemisphere to avoid poor initial performance on novel tasks. In addition, we find that this design has minimal impact on its ability to learn novel tasks. We conclude by identifying improvements to our agent and exploring potential expansion to the continual learning setting.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# RIMformer: FMCWレーダ干渉軽減のためのエンドツーエンド変換器 RIMformer: An End-to-End Transformer for FMCW Radar Interference Mitigation ( http://arxiv.org/abs/2407.11459v1 ) ライセンス: Link先を確認	Ziang Zhang, Guangzhi Chen, Youlong Weng, Shunchuan Yang, Zhiyu Jia, Jingxuan Chen,	(参考訳) 周波数変調連続波(FMCW)レーダーはリモートセンシングの分野において重要な役割を果たす。 FMCWレーダー配備の度合いの増大は相互干渉を増大させ、レーダーの検出能力を弱め、システムの信頼性と安全性を脅かす。本稿では, RIMformerと呼ばれる新しいFMCWレーダ干渉緩和法について, エンドツーエンドのTransformer構造を用いて提案する。 RIMformerでは、中間周波数(IF)信号の異なる距離要素間の相関を捉えるために、デュアルマルチヘッド自己アテンション機構が提案されている。さらに、局所的な特徴を抽出するために畳み込みの力を利用するために改良された畳み込みブロックが統合される。このアーキテクチャは、時間領域IF信号をエンドツーエンドに処理するように設計されており、これにより、追加の手動データ処理ステップが不要になる。改良されたデコーダ構造により、ネットワークの並列化が保証され、その計算効率が向上する。提案手法の精度と有効性を検証するため,シミュレーションおよび測定実験を行った。その結果,提案したRIMformerは干渉を効果的に軽減し,ターゲット信号の復元を可能にすることがわかった。 Frequency-modulated continuous-wave (FMCW) radar plays a pivotal role in the field of remote sensing. The increasing degree of FMCW radar deployment has increased the mutual interference, which weakens the detection capabilities of radars and threatens reliability and safety of systems. In this paper, a novel FMCW radar interference mitigation (RIM) method, termed as RIMformer, is proposed by using an end-to-end Transformer-based structure. In the RIMformer, a dual multi-head self-attention mechanism is proposed to capture the correlations among the distinct distance elements of intermediate frequency (IF) signals. Additionally, an improved convolutional block is integrated to harness the power of convolution for extracting local features. The architecture is designed to process time-domain IF signals in an end-to-end manner, thereby avoiding the need for additional manual data processing steps. The improved decoder structure ensures the parallelization of the network to increase its computational efficiency. Simulation and measurement experiments are carried out to validate the accuracy and effectiveness of the proposed method. 
The results show that the proposed RIMformer can effectively mitigate interference and restore the target signals.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# 表形式データに対する敵対的攻撃の非知覚性の調査: 実証分析 Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis ( http://arxiv.org/abs/2407.11463v1 ) ライセンス: Link先を確認	Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, Catarina Moreira,	(参考訳) 敵対的攻撃は機械学習モデルに対する潜在的な脅威であり、入力データに知覚できない摂動を加えることで、モデルに誤った予測をさせる可能性がある。画像のような非構造化データでは広く研究されているが、表形式データのような構造化データへの適用は、表形式データの不均一性と複雑な特徴間の相互依存性のため、特有の課題を伴う。表形式データにおける非知覚性は、誤分類を引き起こしつつデータの完全性を保つことを含み、表形式データに合わせた非知覚性基準の必要性を示している。しかし現在、表形式データを対象とした敵対的攻撃を評価するための標準化された指標は存在しない。このギャップに対処するために、表形式データに対する敵対的攻撃の非知覚性を評価するための一連の特性を導出する。これらの特性は、摂動データの7つの観点を捉えるように定義される: 元の入力への近接性、変更のスパース性、元のデータセット内のデータ点からの偏差、機微な特徴を変更することの感度、摂動の不変性、摂動後の値の実現可能性、表形式の特徴間の複雑な相互依存性。さらに,7つの特性について定量的な実証評価と事例ベースの定性的分析を行う。この評価は、特に近接性、感度、偏差に関して、攻撃の成功率と非知覚性の間のトレードオフを明らかにする。評価した攻撃はいずれも最適な有効性と非知覚性を同時に達成できないが、非有界(unbounded)攻撃は、表形式データに対して知覚されにくい敵対的サンプルを作成する上でより有望であることが示された。この研究は、評価したアルゴリズムがスパース性を効果的に制御できないという限界も明らかにしている。我々は,今後の攻撃設計にスパース性の指標を組み込み、摂動を加える特徴の数を調整することを提案する。 Adversarial attacks are a potential threat to machine learning models, as they can cause the model to make incorrect predictions by introducing imperceptible perturbations to the input data. While extensively studied in unstructured data like images, their application to structured data like tabular data presents unique challenges due to the heterogeneity and intricate feature interdependencies of tabular data. Imperceptibility in tabular data involves preserving data integrity while potentially causing misclassification, underscoring the need for tailored imperceptibility criteria for tabular data. However, there is currently a lack of standardised metrics for assessing adversarial attacks specifically targeted at tabular data. To address this gap, we derive a set of properties for evaluating the imperceptibility of adversarial attacks on tabular data. 
These properties are defined to capture seven perspectives of perturbed data: proximity to original inputs, sparsity of alterations, deviation from datapoints in the original dataset, sensitivity of altering sensitive features, immutability of perturbation, feasibility of perturbed values and intricate feature interdependencies among tabular features. Furthermore, we conduct both quantitative empirical evaluation and case-based qualitative example analysis for the seven properties. The evaluation reveals a trade-off between attack success and imperceptibility, particularly concerning proximity, sensitivity, and deviation. Although no evaluated attacks can achieve optimal effectiveness and imperceptibility simultaneously, unbounded attacks prove to be more promising for tabular data in crafting imperceptible adversarial examples. The study also highlights the limitation of evaluated algorithms in controlling sparsity effectively. We suggest incorporating a sparsity metric in future attack design to regulate the number of perturbed features.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# Crowd-SAM: 混雑シーンにおけるオブジェクト検出のためのスマートアノテーターとしてのSAM Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes ( http://arxiv.org/abs/2407.11464v1 ) ライセンス: Link先を確認	Zhi Cai, Yingjie Gao, Yaoyan Zheng, Nan Zhou, Di Huang,	(参考訳) コンピュータビジョンにおいて、オブジェクト検出は多くのシナリオで応用される重要なタスクである。しかし、特に混雑したシーンでは、大量のラベルを取得することは困難である。最近、Segment Anything Model (SAM) が強力なゼロショットセグメンタとして提案され、インスタンスセグメンテーションタスクに新しいアプローチを提供している。しかし、SAMとその派生モデルの精度と効率は、混雑し遮蔽のあるシーンでオブジェクトを扱う際にしばしば損なわれる。本稿では,少数の学習可能なパラメータと最小限のラベル付き画像というコストで、混雑し遮蔽のあるシーンにおけるSAMの性能を向上させるように設計されたSAMベースのフレームワークであるCrowd-SAMを紹介する。効率的なプロンプトサンプラー(EPS)と部分-全体識別ネットワーク(PWD-Net)を導入し,混雑したシーンにおけるマスクの選択と精度を向上させる。その単純さにもかかわらず、Crowd-SAMはCrowdHumanやCityPersonsを含むいくつかのベンチマークで、最先端(SOTA)の完全教師ありオブジェクト検出手法に匹敵する。私たちのコードはhttps://github.com/FelixCaae/CrowdSAMで公開されています。 In computer vision, object detection is an important task that finds its application in many scenarios. However, obtaining extensive labels can be challenging, especially in crowded scenes. Recently, the Segment Anything Model (SAM) has been proposed as a powerful zero-shot segmenter, offering a novel approach to instance segmentation tasks. However, the accuracy and efficiency of SAM and its variants are often compromised when handling objects in crowded and occluded scenes. In this paper, we introduce Crowd-SAM, a SAM-based framework designed to enhance SAM's performance in crowded and occluded scenes with the cost of few learnable parameters and minimal labeled images. We introduce an efficient prompt sampler (EPS) and a part-whole discrimination network (PWD-Net), enhancing mask selection and accuracy in crowded scenes. Despite its simplicity, Crowd-SAM rivals state-of-the-art (SOTA) fully-supervised object detection methods on several benchmarks including CrowdHuman and CityPersons. Our code is available at https://github.com/FelixCaae/CrowdSAM.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# データトレーディングのクロスロードをナビゲートする - 学際的な調査 Navigating the Data Trading Crossroads: An Interdisciplinary Survey ( http://arxiv.org/abs/2407.11466v1 ) ライセンス: Link先を確認	Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang,	(参考訳) データは、将来の経済にとって重要な要素として、ますます認識されるようになった。しかし、効率的なデータトレーディング市場の構築は、プライバシー侵害、データ独占、誤用といった問題に直面している。プライバシとデータ価格の方法を保護するアルゴリズムを提案している多くの研究にもかかわらず、これらの問題とシステム的解決策の包括的な理解はいまだ解明されていない。本稿では,データトレーディング研究の広範なレビューと評価を行い,既存の問題,研究ギャップの特定,潜在的な解決策の提案を行う。課題を3つの主要な分野に分類する。コンプライアンス・チャレンジ,コラテラル・コンシークエンス,コスト・トランザクション(“3C問題”)。文献の定量的解析を通じて、孤立解から統合解へのパラダイムシフトを観察する。正しいあいまいさの未解決問題に対処するため、個人が所有していないデータを使って利益を得ることのできる、新しい概念「データ活用」を紹介します。この概念は、データをより従来的な生産要素として再編成し、確立された経済理論と整合させ、研究理論、技術ツール、プラットフォームの包括的な枠組みの道を開くのに役立つ。この調査は、研究者、実践家、政策立案者に貴重な洞察とガイダンスを提供し、デジタル経済の発展に寄与することを願っている。 Data has been increasingly recognized as a critical factor in the future economy. However, constructing an efficient data trading market faces challenges such as privacy breaches, data monopolies, and misuse. Despite numerous studies proposing algorithms to protect privacy and methods for pricing data, a comprehensive understanding of these issues and systemic solutions remain elusive. This paper provides an extensive review and evaluation of data trading research, aiming to identify existing problems, research gaps, and propose potential solutions. We categorize the challenges into three main areas: Compliance Challenges, Collateral Consequences, and Costly Transactions (the "3C problems"), all stemming from ambiguity in data rights. Through a quantitative analysis of the literature, we observe a paradigm shift from isolated solutions to integrated approaches. Addressing the unresolved issue of right ambiguity, we introduce the novel concept of "data usufruct," which allows individuals to use and benefit from data they do not own. 
This concept helps reframe data as a more conventional factor of production and aligns it with established economic theories, paving the way for a comprehensive framework of research theories, technical tools, and platforms. We hope this survey provides valuable insights and guidance for researchers, practitioners, and policymakers, thereby contributing to digital economy advancements.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# AU-vMAE:ビデオマスクオートエンコーダによる知識ガイドアクションユニットの検出 AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder ( http://arxiv.org/abs/2407.11468v1 ) ライセンス: Link先を確認	Qiaoqiao Jin, Rui Shi, Yishun Dou, Bingbing Ni,	(参考訳) 現在の顔行動単位(FAU)検出法は、ラベル付きビデオトレーニングデータの不足と訓練顔IDの数の少なさにより、訓練された特徴抽出器が個人間の顔構造や動きの大きな多様性を十分にカバーできないため、一般的に困難に直面する。上記の課題に明確に対処するために,ビデオ中のFAUのマルチラベル特性と時間的ラベルの整合性を十分に探求し,新しいビデオレベルの事前学習手法を提案する。我々の設計の核心は、ビデオマスクオートエンコーダに基づく事前訓練されたビデオ特徴抽出器と、マルチレベルのビデオFAU分析タスクを共同で完了する微調整ネットワークである。すなわち、ビデオレベルとフレームレベル両方のFAU検出を統合し、監督セットをスパースなFAUアノテーションからマスク付きフレームを含む全ビデオフレームへと劇的に拡大する。さらに、従来のグラフニューラルネットワークの代わりに、フレーム間およびフレーム内AUペア状態行列を事前知識として利用し、時間的監視を改善する。提案手法は,BP4DおよびDISFAのFAUデータセットにおいて,既存の最先端手法と比較して性能の大幅な向上を示す。 Current Facial Action Unit (FAU) detection methods generally encounter difficulties due to the scarcity of labeled video training data and the limited number of training face IDs, which leaves the trained feature extractor with insufficient coverage for modeling the large diversity of inter-person facial structures and movements. To explicitly address the above challenges, we propose a novel video-level pre-training scheme by fully exploring the multi-label property of FAUs in the video as well as the temporal label consistency. At the heart of our design is a pre-trained video feature extractor based on the video-masked autoencoder together with a fine-tuning network that jointly completes the multi-level video FAU analysis tasks, i.e., integrating both video-level and frame-level FAU detections, thus dramatically expanding the supervision set from sparse FAU annotations to ALL video frames including masked ones. Moreover, we utilize inter-frame and intra-frame AU pair state matrices as prior knowledge to guide network training instead of traditional Graph Neural Networks, for better temporal supervision. 
Our approach demonstrates a substantial performance improvement over existing state-of-the-art methods on the BP4D and DISFA FAU datasets.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# 正確性を超えて:大規模言語モデルのための多次元コード生成のベンチマーク Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models ( http://arxiv.org/abs/2407.11470v1 ) ライセンス: Link先を確認	Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun,	(参考訳) 近年,大規模言語モデル(LLM)の符号化能力を評価するために,多数のベンチマークが提案されている。しかし、既存のベンチマークは主に、LLMが生成したコードの正確性を評価することに焦点を当て、コード品質に大きな影響を及ぼす他の重要な次元を無視している。そこで本研究では,可読性(Readability),保守性(mAintainability),正確性(Correctness),効率性(Efficiency)の4次元にわたってLLMが生成するコードの品質を総合的に評価するRACEベンチマークを提案する。具体的には、正確性を超えた次元の要求に依存した性質を考慮し、各次元に対する様々なタイプのユーザ要求を設計し、モデルがユーザ要求を満たす正しいコードを生成する能力を評価する。 RACE上で18の代表的なLLMを評価し,以下を見いだした。 1) 要求に応じて高品質なコードを生成する現在のLLMの能力は、まだソフトウェア開発の要件を満たしていない。 2) 可読性は,生成されたコード全体の品質の重要な指標として機能する。 3)ほとんどのLLMは,特定のコーディングスタイルに固有の嗜好を示す。これらの発見は、研究者が現在のLLMのコーディング能力についてより深く理解し、将来のモデル改善の方向性に光を当てるのに役立つ。 In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, existing benchmarks primarily focus on assessing the correctness of code generated by LLMs, while neglecting other critical dimensions that also significantly impact code quality. Therefore, this paper proposes the RACE benchmark, which comprehensively evaluates the quality of code generated by LLMs across 4 dimensions: Readability, mAintainability, Correctness, and Efficiency. Specifically, considering the demand-dependent nature of dimensions beyond correctness, we design various types of user requirements for each dimension to assess the model's ability to generate correct code that also meets user demands. We evaluate 18 representative LLMs on RACE and find that: 1) the current LLMs' ability to generate high-quality code on demand does not yet meet the requirements of software development; 2) readability serves as a critical indicator of the overall quality of generated code; 3) most LLMs exhibit an inherent preference for specific coding styles. 
These findings can help researchers gain a deeper understanding of the coding capabilities of current LLMs and shed light on future directions for model improvement.	翻訳日:2024-07-17 16:12:18 公開日:2024-07-16
# マルチポイントフィードバックによる安全なオンライン凸最適化 Safe Online Convex Optimization with Multi-Point Feedback ( http://arxiv.org/abs/2407.11471v1 ) ライセンス: Link先を確認	Spencer Hutchinson, Mahnoosh Alizadeh,	(参考訳) 実世界のアプリケーションでよく見られる厳格な安全要件に動機づけられ、我々は、ゼロオーダー情報のみを使用しながら、プレイヤーがサブ線形後悔とゼロ制約違反を同時に達成する必要がある安全なオンライン凸最適化設定について検討する。特に,各ラウンドで$d + 1$ポイント($d$は問題次元)を選択し,各ポイントで制約関数とコスト関数の値を受け取るマルチポイントフィードバック設定を考える。この問題に対処するために,制約関数が滑らかで強凸であるという仮定の下で,前進差分勾配推定と楽観的かつ悲観的な作用セットを利用して,$\mathcal{O}(d \sqrt{T})$ regretおよびゼロ制約違反を実現するアルゴリズムを提案する。次に、未知の制約とゼロオーダーフィードバックが経験的性能に与える影響を数値的に研究する。 Motivated by the stringent safety requirements that are often present in real-world applications, we study a safe online convex optimization setting where the player needs to simultaneously achieve sublinear regret and zero constraint violation while only using zero-order information. In particular, we consider a multi-point feedback setting, where the player chooses $d + 1$ points in each round (where $d$ is the problem dimension) and then receives the value of the constraint function and cost function at each of these points. To address this problem, we propose an algorithm that leverages forward-difference gradient estimation as well as optimistic and pessimistic action sets to achieve $\mathcal{O}(d \sqrt{T})$ regret and zero constraint violation under the assumption that the constraint function is smooth and strongly convex. We then perform a numerical study to investigate the impacts of the unknown constraint and zero-order feedback on empirical performance.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
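The forward-difference gradient estimation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a generic black-box function `f` and a fixed step size `delta`; the function name `forward_difference_gradient` and its parameters are hypothetical and are not taken from the paper. It shows only why $d + 1$ function evaluations suffice for one gradient estimate, and it omits the optimistic and pessimistic action sets that the paper uses to guarantee safety.

```python
import numpy as np

def forward_difference_gradient(f, x, delta=1e-6):
    """Estimate the gradient of f at x from d + 1 zero-order queries.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    f0 = f(x)                      # one query at the base point
    grad = np.empty(d)
    for i in range(d):             # d further queries, one per coordinate
        step = np.zeros(d)
        step[i] = delta
        grad[i] = (f(x + step) - f0) / delta
    return grad

# Usage: f(x) = ||x||^2 has the exact gradient 2x.
quadratic = lambda x: float(np.dot(x, x))
estimate = forward_difference_gradient(quadratic, [1.0, -2.0, 0.5])
```

For a smooth function the estimate differs from the true gradient by a term proportional to `delta`, which is one reason a smoothness assumption matters in this kind of analysis.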
# DynSyn:過動型身体システムにおける効率的な学習と制御のための動的相乗的表現 DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems ( http://arxiv.org/abs/2407.11472v1 ) ライセンス: Link先を確認	Kaibo He, Chenhui Zuo, Chengtian Ma, Yanan Sui,	(参考訳) 高次元の過度なシステムを制御する効果的なポリシーを学ぶことは、深い強化学習アルゴリズムにとって重要な課題である。このような制御シナリオは脊椎動物の骨格系の神経制御においてしばしば観察される。これらの制御機構の研究は、高次元の過度なシステムの制御に関する洞察を与える。神経力学における筋シナジーとして知られるアクチュエータの協調は、運動指令の生成を単純化する予備的なメカニズムであると考えられている。系の力学構造はその関数の基底であり、アクチュエータの相乗的表現を導出することができる。この理論を動機として,動的シナジスティック表現(DynSyn)アルゴリズムを提案する。 DynSynは、動的構造から相乗的表現を生成し、運動制御を改善するためにタスク固有の状態依存適応を実行することを目的としている。異なる筋骨格モデルを含む様々なタスクにまたがるDynSynの効率を実証し、ベースラインアルゴリズムと比較して最先端のサンプル効率と堅牢性を達成する。 DynSynは、動的構造の本質的な特徴を捉え、様々な運動タスクにおける一般化可能性を示す解釈可能な相乗表現を生成する。 Learning an effective policy to control high-dimensional, overactuated systems is a significant challenge for deep reinforcement learning algorithms. Such control scenarios are often observed in the neural control of vertebrate musculoskeletal systems. The study of these control mechanisms will provide insights into the control of high-dimensional, overactuated systems. The coordination of actuators, known as muscle synergies in neuromechanics, is considered a presumptive mechanism that simplifies the generation of motor commands. The dynamical structure of a system is the basis of its function, allowing us to derive a synergistic representation of actuators. Motivated by this theory, we propose the Dynamical Synergistic Representation (DynSyn) algorithm. DynSyn aims to generate synergistic representations from dynamical structures and perform task-specific, state-dependent adaptation to the representations to improve motor control. We demonstrate DynSyn's efficiency across various tasks involving different musculoskeletal models, achieving state-of-the-art sample efficiency and robustness compared to baseline algorithms. 
DynSyn generates interpretable synergistic representations that capture the essential features of dynamical structures and demonstrates generalizability across diverse motor tasks.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# 量子最大エントロピー推論とハミルトン学習 Quantum Maximum Entropy Inference and Hamiltonian Learning ( http://arxiv.org/abs/2407.11473v1 ) ライセンス: Link先を確認	Minbo Gao, Zhengfeng Ji, Fuchao Wei,	(参考訳) グラフィカルモデルの最大エントロピー推論と学習は、学習理論と最適化において重要なタスクである。この研究は、一般化反復スケーリング(GIS)や勾配降下(GD)を含むこれらの問題のアルゴリズムを量子領域に拡張する。量子反復スケーリング(QIS)として知られる一般化は単純であるが、鍵となる課題は量子問題インスタンスの非可換性にある。本研究の主な技術的貢献は収束率の厳密な解析であり、これらのアルゴリズムの反復ごとにヤコビ行列のスペクトル半径の上下境界を設定することである。さらに,QISとGDの性能向上のための準ニュートン法について検討する。具体的にはアンダーソン混合法とL-BFGS法を用いて,それぞれQISとGDについて検討する。これらの準ニュートン法は顕著な効率向上を示し、性能が大幅に向上した。アプリケーションとして、我々のアルゴリズムはハミルトン学習アルゴリズムを設計するための実行可能なアプローチを提供する。 Maximum entropy inference and learning of graphical models are pivotal tasks in learning theory and optimization. This work extends algorithms for these problems, including generalized iterative scaling (GIS) and gradient descent (GD), to the quantum realm. While the generalization, known as quantum iterative scaling (QIS), is straightforward, the key challenge lies in the non-commutative nature of quantum problem instances, rendering the convergence rate analysis significantly more challenging than the classical case. Our principal technical contribution centers on a rigorous analysis of the convergence rates, involving the establishment of both lower and upper bounds on the spectral radius of the Jacobian matrix for each iteration of these algorithms. Furthermore, we explore quasi-Newton methods to enhance the performance of QIS and GD. Specifically, we propose using Anderson mixing and the L-BFGS method for QIS and GD, respectively. These quasi-Newton techniques exhibit remarkable efficiency gains, resulting in orders of magnitude improvements in performance. As an application, our algorithms provide a viable approach to designing Hamiltonian learning algorithms.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
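As background for the abstract above, the standard quantum maximum entropy problem can be written out as follows. The symbols ($F_i$ for the observables, $\alpha_i$ for their target expectation values, $\lambda_i$ for the dual parameters, $\eta$ for a step size) are notation chosen here for illustration, and the paper's own conventions may differ in sign or normalization.

```latex
% Maximum entropy inference: maximize the von Neumann entropy under linear constraints
\max_{\rho \succeq 0,\ \operatorname{Tr}\rho = 1} \; S(\rho) = -\operatorname{Tr}(\rho \log \rho)
\quad \text{subject to} \quad \operatorname{Tr}(\rho F_i) = \alpha_i, \quad i = 1, \dots, m.

% The optimizer is a Gibbs state of the "Hamiltonian" built from the observables
\rho_\lambda = \frac{\exp\!\big(\sum_{i} \lambda_i F_i\big)}{\operatorname{Tr}\exp\!\big(\sum_{i} \lambda_i F_i\big)}.

% Convex dual objective and its gradient
L(\lambda) = \log \operatorname{Tr}\exp\!\Big(\sum_{i} \lambda_i F_i\Big) - \sum_{i} \lambda_i \alpha_i,
\qquad
\frac{\partial L}{\partial \lambda_i} = \operatorname{Tr}(\rho_\lambda F_i) - \alpha_i.

% A plain gradient descent step on the dual
\lambda_i \leftarrow \lambda_i - \eta \,\big(\operatorname{Tr}(\rho_\lambda F_i) - \alpha_i\big).
```

In this notation, Hamiltonian learning is the task of recovering the coefficients $\lambda_i$ from the measured expectation values $\alpha_i$. Because the $F_i$ need not commute, the classical convergence arguments do not carry over directly.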
# XTraffic: 説明可能性などを備えた,トラフィックがインシデントに遭遇するデータセット XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More ( http://arxiv.org/abs/2407.11477v1 ) ライセンス: Link先を確認	Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang,	(参考訳) トラヒックとインシデントという2つの非常に相関の深いトラックについて、長期にわたる研究が行われてきた。トラフィックトラックは、例えば、予測をより正確にするためにディープラーニングモデルを複雑にし、インシデントトラックは、インシデントリスクを推測するために、インシデントのみを研究する。当社のXTrafficデータセットにはトラフィック,すなわち交通流,車線占有率,平均車両速度の時系列インデックス,およびトラフィックデータと時空間的に一致したインシデントを含む7つの異なるインシデントクラスが含まれています。さらに、各ノードは、レーンの詳細な物理的およびポリシーレベルのメタ属性を含む。我々は,従来の交通関連タスクを,従来の予測や分類タスクに代えて,交通指標への影響を定量化する事後交通予測,交通指標を用いた交通指標を用いた事故分類,交通指標間の大域的因果分析,メタ属性,インシデントによる様々な要因の相互関係の高レベルなガイダンスを与える事,道路ノード内の局地的因果解析により,道路セグメントの関係にどのような影響があるかを調べる事,などを行う。データセットはhttp://xaitraffic.github.ioで公開されている。 Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,972 traffic nodes) over the whole year of 2023: our XTraffic dataset includes traffic, i.e., time-series indexes on traffic flow, lane occupancy, and average vehicle speed, and incidents, whose records are spatiotemporally-aligned with traffic data, with seven different incident classes. Additionally, each node includes detailed physical and policy-level meta-attributes of lanes. 
Our data can revolutionize traditional traffic-related tasks towards higher interpretability and practice: instead of traditional prediction or classification tasks, we conduct (1) post-incident traffic forecasting to quantify the impact of different incidents on traffic indexes; (2) incident classification using traffic indexes to determine the incident types for precautionary measures; (3) global causal analysis among the traffic indexes, meta-attributes, and incidents to give high-level guidance on the interrelations of various factors; (4) local causal analysis within road nodes to examine how different incidents affect the road segments' relations. The dataset is available at http://xaitraffic.github.io.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# 産業時系列のためのAIGC:深部生成モデルから大規模生成モデルへ AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models ( http://arxiv.org/abs/2407.11480v1 ) ライセンス: Link先を確認	Lei Ren, Haiteng Wang, Yang Tang, Chunhua Yang,	(参考訳) ChatGPTのような生成モデルの成功により、AIGC(Artificial Intelligence Generated Content)は爆発的な発展を遂げている。テキストや画像に限らず、生成モデルは産業時系列データを生成し、データ収集やデータアノテーションの難しさといった課題に対処することができる。優れた生成能力のため、産業生産の効率を高めるためにモノのインターネット、メタバース、サイバー物理社会システムで広く使われている。本稿では,DGM(Deep Generative Model)からLGM(Big Generative Model)への産業時系列生成モデルの概要を概説する。まず,産業時系列生成のためのDGMベースのAIGCフレームワークを提案する。本枠組みでは,先進的な産業用DGMを調査し,多視点分類を提案する。さらに, 産業用LGMの構築に必要な重要な技術は, 大規模産業用データセット, 複合産業用LGMアーキテクチャ, 産業用時系列の自己監督訓練, 産業用ダウンストリームタスクの微調整の4つの側面から体系的に分析した。最後に,産業における生産モデル開発の実現に向けた課題と今後の方向性について述べる。 With the remarkable success of generative models like ChatGPT, Artificial Intelligence Generated Content (AIGC) is undergoing explosive development. Not limited to text and images, generative models can generate industrial time series data, addressing challenges such as the difficulty of data collection and data annotation. Due to their outstanding generation ability, they have been widely used in Internet of Things, metaverse, and cyber-physical-social systems to enhance the efficiency of industrial production. In this paper, we present a comprehensive overview of generative models for industrial time series from deep generative models (DGMs) to large generative models (LGMs). First, a DGM-based AIGC framework is proposed for industrial time series generation. Within this framework, we survey advanced industrial DGMs and present a multi-perspective categorization. Furthermore, we systematically analyze the critical technologies required to construct industrial LGMs from four aspects: large-scale industrial dataset, LGMs architecture for complex industrial characteristics, self-supervised training for industrial time series, and fine-tuning of industrial downstream tasks. 
Finally, we summarize the challenges and future directions for developing generative models in industry.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# マルチチャネルマスク付きオートエンコーダと任意単値心電図からの12レベル心電図再構成のための総合的評価 Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG ( http://arxiv.org/abs/2407.11481v1 ) ライセンス: Link先を確認	Jiarong Chen, Wanqing Wu, Tong Liu, Shenda Hong,	(参考訳) 心血管疾患(CVD)では、心電図(ECG)は医師にとって一般的で標準的な診断ツールである。しかし、表面に置かれた10個の電極は、多くの不便と不快を招き、急速に進歩するウェアラブルデバイスは、長期監視におけるソリューションとしての不快感を軽減するために、リードまたはシングルリードのECGを採用する。シングルリードECGは12リードECGのサブセットであるため、心臓の健康情報が不十分であり、現実世界の医療応用においてサブスタンダードの役割を担っている。したがって、信号生成技術を用いて、実際の単誘導心電図から12リード心電図を再構成することにより、臨床的重要性のギャップを低減する必要がある。具体的には,マルチチャネルマスク付きオートエンコーダ(MCMA)を提案する。実験の結果,生成した信号と実信号の可視化結果から,提案手法の有効性が示された。同時に、信号レベル、特徴レベル、診断レベルの評価を包含する総合評価ベンチマークECGGenEvalを導入し、12リードのECG信号と生成モデルを総合評価する。さらに, 信号レベル評価における平均平方誤差0.0178, 0.0658, 相関係数0.7698, 0.7237, 診断レベル評価における平均F1スコア0.8319, 0.7824である。オープンソースコードは \url{https://github.com/CHENJIAR3/MCMA} で公開されている。 In the context of cardiovascular diseases (CVD) that exhibit an elevated prevalence and mortality, the electrocardiogram (ECG) is a popular and standard diagnostic tool for doctors, commonly utilizing a 12-lead configuration in clinical practice. However, the 10 electrodes placed on the surface would cause a lot of inconvenience and discomfort, while the rapidly advancing wearable devices adopt the reduced-lead or single-lead ECG to reduce discomfort as a solution in long-term monitoring. Since the single-lead ECG is a subset of 12-lead ECG, it provides insufficient cardiac health information and plays a substandard role in real-world healthcare applications. Hence, it is necessary to utilize signal generation technologies to reduce their clinical importance gap by reconstructing 12-lead ECG from the real single-lead ECG. Specifically, this study proposes a multi-channel masked autoencoder (MCMA) for this goal. In the experimental results, the visualized results between the generated and real signals can demonstrate the effectiveness of the proposed framework. 
At the same time, this study introduces a comprehensive evaluation benchmark named ECGGenEval, encompassing signal-level, feature-level, and diagnostic-level evaluations, providing a holistic assessment of 12-lead ECG signals and generative models. The quantitative results are as follows: mean square errors of 0.0178 and 0.0658 and correlation coefficients of 0.7698 and 0.7237 in the signal-level evaluation, and average F1-scores of 0.8319 and 0.7824 with the two generated 12-lead ECGs in the diagnostic-level evaluation, achieving state-of-the-art performance. The open-source code is publicly available at https://github.com/CHENJIAR3/MCMA.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
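The signal-level metrics named in the abstract above (mean square error and correlation coefficient) can be sketched as follows. This is a minimal illustration on synthetic data and is not the ECGGenEval implementation. The function name `signal_level_metrics`, the synthetic sine-wave "lead", and the choice to flatten the signals before computing the Pearson correlation are all assumptions, since the abstract does not say how the metrics are aggregated across leads.

```python
import numpy as np

def signal_level_metrics(real, generated):
    """Return (MSE, Pearson correlation) between two equally shaped signals.

    Hypothetical helper for illustration only.
    """
    real = np.asarray(real, dtype=float)
    generated = np.asarray(generated, dtype=float)
    mse = float(np.mean((real - generated) ** 2))
    pcc = float(np.corrcoef(real.ravel(), generated.ravel())[0, 1])
    return mse, pcc

# Synthetic stand-in for one ECG lead: a scaled and shifted copy of the
# reference keeps a perfect correlation but has a non-zero MSE.
t = np.linspace(0.0, 1.0, 500)
real_lead = np.sin(2 * np.pi * 5 * t)
generated_lead = 0.9 * real_lead + 0.05
mse, pcc = signal_level_metrics(real_lead, generated_lead)
```

The example shows why the two numbers are usually reported together: correlation ignores amplitude scaling and offset, whereas MSE penalizes both.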
# AIシアターのオスカー: 言語モデルによるロールプレイングに関する調査 The Oscars of AI Theater: A Survey on Role-Playing with Language Models ( http://arxiv.org/abs/2407.11484v1 ) ライセンス: Link先を確認	Nuo Chen, Y. Wang, Yang Deng, Jia Li,	(参考訳) 本研究では,言語モデルを用いたロールプレイングの急成長分野を探求し,初期のペルソナモデルから,大規模言語モデル(LLM)によって促進される高度なキャラクタ駆動シミュレーションへの展開に焦点を当てた。当初はモデル能力の制限により単純なペルソナ一貫性に制限されていたため、ロールプレイングタスクは、キャラクターの一貫性、行動アライメント、全体的な魅力を含む複雑なキャラクター描写を受け入れるように拡張された。データやモデル,アライメント,エージェントアーキテクチャ,評価など,これらのシステムを設計する上で重要なコンポーネントを包括的に分類する。この調査は、動的な個人プロファイルの管理やハイレベルなペルソナの整合性の実現など、現在の方法論や課題を概説するだけでなく、ロールプレイングアプリケーションの深さと現実性を改善するための今後の研究の道筋も示唆している。目標は、現在の方法論の構造化された概要を提供し、改善のための潜在的な領域を特定することで、将来の研究を導くことである。関連リソースと論文は https://github.com/nuochenpku/Awesome-Role-Play-Papers で公開されている。 This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving character consistency, behavioral alignment, and overall attractiveness. We provide a comprehensive taxonomy of the critical components in designing these systems, including data, models and alignment, agent architecture and evaluation. This survey not only outlines the current methodologies and challenges, such as managing dynamic personal profiles and achieving high-level persona consistency but also suggests avenues for future research in improving the depth and realism of role-playing applications. The goal is to guide future research by offering a structured overview of current methodologies and identifying potential areas for improvement. Related resources and papers are available at https://github.com/nuochenpku/Awesome-Role-Play-Papers.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# 検証可能な回答を用いた科学的QAシステム Scientific QA System with Verifiable Answers ( http://arxiv.org/abs/2407.11485v1 ) ライセンス: Link先を確認	Adela Ljajić, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Lorenzo Cassano, Nikola Milošević,	(参考訳) 本稿では,オープンソースの科学的質問応答システムであるVerifAIプロジェクトを紹介し,参照されたばかりでなく,自動的に検証し,検証可能な回答を提供する。本システムの特徴は,(1)科学論文(PubMed)上の意味的および語彙的検索技術を組み合わせた情報検索システム,(2)微調整生成モデル(Mistral 7B)を用いた検索用モジュール,(3)SciFACTデータセットを用いた自然言語推論タスクに基づく細調整DeBERTaおよびXLM-RoBERTaモデルに基づく検証エンジン,である。検証エンジンは、生成されたクレームとクレームが導出された記事とを相互にチェックし、クレームの生成に幻覚があったかどうかを検証する。 Information RetrievalとRAGモジュールを活用することで、Verif.aiは様々な科学資料から事実情報を生成できる。同時に、検証エンジンはこの出力を厳格に2倍にチェックし、精度と信頼性を確保する。この2段階のプロセスは、事実情報の取得と確認において重要な役割を担い、情報ランドスケープを著しく向上させる。我々の手法は科学者の生産性を大幅に向上させ、幻覚や誤報が受け入れられない科学領域に生成言語モデルを適用することへの信頼を同時に促進する可能性がある。 In this paper, we introduce the VerifAI project, a pioneering open-source scientific question-answering system, designed to provide answers that are not only referenced but also automatically vetted and verifiable. The components of the system are (1) an Information Retrieval system combining semantic and lexical search techniques over scientific papers (PubMed), (2) a Retrieval-Augmented Generation (RAG) module using fine-tuned generative model (Mistral 7B) and retrieved articles to generate claims with references to the articles from which it was derived, and (3) a Verification engine, based on a fine-tuned DeBERTa and XLM-RoBERTa models on Natural Language Inference task using SciFACT dataset. The verification engine cross-checks the generated claim and the article from which the claim was derived, verifying whether there may have been any hallucinations in generating the claim. By leveraging the Information Retrieval and RAG modules, Verif.ai excels in generating factual information from a vast array of scientific sources. At the same time, the Verification engine rigorously double-checks this output, ensuring its accuracy and reliability. 
This dual-stage process plays a crucial role in acquiring and confirming factual information, significantly enhancing the information landscape. Our methodology could significantly enhance scientists' productivity, concurrently fostering trust in applying generative language models within scientific domains, where hallucinations and misinformation are unacceptable.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# 頚部細胞病理全スライド画像スクリーニングのための大規模基盤モデルに基づく効率的な枠組み An efficient framework based on large foundation model for cervical cytopathology whole slide image screening ( http://arxiv.org/abs/2407.11486v1 ) ライセンス: Link先を確認	Jialong Huang, Gaojie Li, Shichao Kan, Jianfeng Liu, Yixiong Liang,	(参考訳) 現在の頚部細胞病理全スライド画像(WSI)スクリーニングは、主に検出に基づくアプローチに依存しており、費用と時間のかかるアノテーションプロセスにより、パフォーマンスが制限されている。バッグレベルのラベルのみに依存する弱い教師付きアプローチであるMIL(Multiple Instance Learning)は、これらの課題を効果的に軽減することができる。それでも、MILは一般的に凍結した事前訓練されたモデルや自己教師付き学習を特徴抽出に用いており、その効果は低いか非効率である。本稿では,非教師付き・弱教師付き学習によるWSIレベルラベルのみを用いた頚部細胞病理学WSI分類のための効率的なフレームワークを提案する。細胞病理学的なWSIにおける異常細胞の疎で分散した特性を考慮し, 事前学習した基盤モデルを用いて, 上位$k$個の高リスクパッチをフィルタリングする手法を提案する。次に,フィルタパッチ上でのコントラスト学習を用いた大規模基盤モデルのパラメータ効率細調整(PEFT)を提案し,タスク固有信号の表現能力を向上する。追加の線形アダプタのみをトレーニングすることにより、時間とメモリ消費を大幅に削減し、パッチレベルの特徴の学習を強化する。 CSDおよびFNAC 2019データセットで実施された実験は、提案手法が様々なMIL手法の性能を高め、最先端(SOTA)性能を達成することを示した。コードとトレーニングされたモデルは https://github.com/CVIU-CSU/TCT-InfoNCE で公開されている。 Current cervical cytopathology whole slide image (WSI) screening primarily relies on detection-based approaches, which are limited in performance due to the expensive and time-consuming annotation process. Multiple Instance Learning (MIL), a weakly supervised approach that relies solely on bag-level labels, can effectively alleviate these challenges. Nonetheless, MIL commonly employs frozen pretrained models or self-supervised learning for feature extraction, which suffers from low efficacy or inefficiency. In this paper, we propose an efficient framework for cervical cytopathology WSI classification using only WSI-level labels through unsupervised and weakly supervised learning. Given the sparse and dispersed nature of abnormal cells within cytopathological WSIs, we propose a strategy that leverages the pretrained foundation model to filter the top-$k$ high-risk patches. 
Subsequently, we suggest parameter-efficient fine-tuning (PEFT) of a large foundation model using contrastive learning on the filtered patches to enhance its representation ability for task-specific signals. By training only the added linear adapters, we enhance the learning of patch-level features with substantially reduced time and memory consumption. Experiments conducted on the CSD and FNAC 2019 datasets demonstrate that the proposed method enhances the performance of various MIL methods and achieves state-of-the-art (SOTA) performance. The code and trained models are publicly available at https://github.com/CVIU-CSU/TCT-InfoNCE.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
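The top-$k$ high-risk patch filtering described in the abstract above can be sketched in a few lines. The sketch assumes each patch of a slide already has a scalar risk score; the function name `select_top_k_patches` and the example scores are hypothetical, and how the paper actually derives the scores from the foundation model is not covered by the abstract.

```python
import numpy as np

def select_top_k_patches(risk_scores, k):
    """Return the indices of the k highest-scoring patches, best first.

    Hypothetical helper for illustration only.
    """
    scores = np.asarray(risk_scores, dtype=float)
    k = min(k, scores.size)            # guard against slides with few patches
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    return order[:k]

# Usage: five patches from one slide, keep the two riskiest.
scores = [0.10, 0.85, 0.30, 0.95, 0.05]
top_indices = select_top_k_patches(scores, k=2)
```

Only the selected patches would then be passed on to the contrastive fine-tuning stage, which keeps the amount of data that stage has to process small.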
# PRET: ビジョンと言語ナビゲーションのための指向性軌道による計画 PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation ( http://arxiv.org/abs/2407.11487v1 ) ライセンス: Link先を確認	Renjie Lu, Jingke Meng, Wei-Shi Zheng,	(参考訳) 視覚と言語ナビゲーションは、エージェントが自然言語の指示に従ってナビゲートする必要があるタスクである。近年の手法では、各ステップで構築されたトポロジーマップのサブゴールを予測し、長期的な行動計画を可能にする。しかし、GCNのようなモデルでそのような高いレベルの予測をサポートしようとすると、高い計算コストに悩まされる。本研究では,初期ノードから有向グラフ上の候補位置への経路を参照し,指示と指向性軌道のアライメントを考慮し,ナビゲーション計画を容易にする方法を提案する。この計画戦略は、高いパフォーマンスを達成しつつ、効率的なモデルにつながる。具体的には、環境の探索領域を図示する有向グラフを導入し、方向性を強調する。次に、まず、軌道表現を、対応する方向に基づいてパノラマから抽出された有向エッジ特徴の列として定義する。最終的に、ナビゲーション中の命令と異なるトラジェクトリのアライメントを評価し、比較し、次のナビゲーションターゲットを決定する。提案手法は,従来のSOTA法であるBEVBertをRxRデータセットで上回り,計算コストを大幅に削減しながらR2Rデータセットで同等の性能を示す。コードは https://github.com/iSEE-Laboratory/VLN-PRET で入手できる。 Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction. Recent methods predict sub-goals on a constructed topology map at each step to enable long-term action planning. However, they suffer from high computational cost when attempting to support such high-level predictions with GCN-like models. In this work, we propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories, which refers to a path from the initial node to the candidate locations on a directed graph without detours. This planning strategy leads to an efficient model while achieving strong performance. Specifically, we introduce a directed graph to illustrate the explored area of the environment, emphasizing directionality. Then, we first define the trajectory representation as a sequence of directed edge features, which are extracted from the panorama based on the corresponding orientation. Ultimately, we assess and compare the alignment between instruction and different trajectories during navigation to determine the next navigation target. 
Our method outperforms the previous SOTA method BEVBert on the RxR dataset and is comparable on the R2R dataset while largely reducing the computational cost. Code is available at https://github.com/iSEE-Laboratory/VLN-PRET.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# 持続可能な家庭環境における多目的強化学習のためのメタラーニングアプローチ A Meta-Learning Approach for Multi-Objective Reinforcement Learning in Sustainable Home Environments ( http://arxiv.org/abs/2407.11489v1 ) ライセンス: Link先を確認	Junlin Lu, Patrick Mannion, Karl Mason,	(参考訳) 効果的な家電機器のスケジューリングは、持続可能な生活に不可欠である。多目的強化学習(MORL)は、アプライアンススケジューリングにおいてユーザの嗜好のバランスをとるのに有効であることが証明されているが、従来のMORLは、再生可能エネルギー発電の変動を特徴とする非定常な住宅環境において、限られたデータに苦戦する。そこでは、大きなコンテキストシフトが学習済みのポリシーを無効にしうる。これらの課題に対処するため、我々はメタラーニングパラダイムを用いて最先端のMORLアルゴリズムを拡張し、シフトするコンテキストへの高速で少数ショットの適応を可能にする。さらに,環境コンテキストの変化を検出するために,オートエンコーダ(AE)に基づく教師なしの手法を用いる。また,ロンドンの住宅環境から得られた実世界データを用いて本手法を評価するために,住宅エネルギー環境を開発した。本研究は,住宅機器スケジューリングにおけるMORLの適用性を評価するだけでなく,エネルギー管理におけるメタラーニングの有効性を裏付けるものである。我々のトップパフォーマンス手法は最高のベースラインをはるかに上回り、訓練されたモデルは電気料金の3.28%を節約し、2.74%のユーザー快適性向上と5.9%の期待効用向上を実現している。さらに、解のスパース性を62.44%削減する。注目すべきは、これらのゲインは96.71%少ないトレーニングデータと61.1%少ないトレーニングステップで達成されたことである。 Effective residential appliance scheduling is crucial for sustainable living. While multi-objective reinforcement learning (MORL) has proven effective in balancing user preferences in appliance scheduling, traditional MORL struggles with limited data in non-stationary residential settings characterized by renewable generation variations, where significant context shifts can invalidate previously learned policies. To address these challenges, we extend state-of-the-art MORL algorithms with the meta-learning paradigm, enabling rapid, few-shot adaptation to shifting contexts. Additionally, we employ an auto-encoder (AE)-based unsupervised method to detect environment context changes. We have also developed a residential energy environment to evaluate our method using real-world data from London residential settings. This study not only assesses the application of MORL in residential appliance scheduling but also underscores the effectiveness of meta-learning in energy management. 
Our top-performing method significantly surpasses the best baseline: the trained model saves 3.28% on electricity bills and achieves a 2.74% increase in user comfort and a 5.9% improvement in expected utility. Additionally, it reduces the sparsity of solutions by 62.44%. Remarkably, these gains were accomplished using 96.71% less training data and 61.1% fewer training steps.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# MMSD-Net:マルチモーダル吃音検出に向けて MMSD-Net: Towards Multi-modal Stuttering Detection ( http://arxiv.org/abs/2407.11492v1 ) ライセンス: Link先を確認	Liangyu Nie, Sudarsana Reddy Kadiri, Ruchit Agrawal,	(参考訳) 吃音は、発話生成における不規則な中断によって引き起こされる一般的な発話障害であり、世界中の7000万人以上の人々に影響を及ぼしている。標準の自動音声処理ツールは、音声障害を考慮に入れず、入力として吃音のある音声を提示しても有意義な結果が得られない。吃音の自動検出は、効率的な文脈認識音声処理システムを構築するための重要なステップである。従来の手法では統計的アプローチとニューラルアプローチの両方が検討されていたが、これらの手法はすべて本質的にはユニモーダルである。本稿では,吃音検出のための最初のマルチモーダルニューラルフレームワークであるMMSD-Netを提案する。実験と結果から, 視覚信号の導入は吃音検出に大きく寄与し, 既存の最先端のユニモーダル手法に比べてF1スコアが2～17%向上することが示された。 Stuttering is a common speech impediment that is caused by irregular disruptions in speech production, affecting over 70 million people across the world. Standard automatic speech processing tools do not take speech ailments into account and are thereby not able to generate meaningful results when presented with stuttered speech as input. The automatic detection of stuttering is an integral step towards building efficient, context-aware speech processing systems. While previous approaches explore both statistical and neural approaches for stuttering detection, all of these methods are uni-modal in nature. This paper presents MMSD-Net, the first multi-modal neural framework for stuttering detection. Experiments and results demonstrate that incorporating the visual signal significantly aids stuttering detection, and our model yields an improvement of 2-17% in the F1-score over existing state-of-the-art uni-modal approaches.	翻訳日:2024-07-17 16:02:34 公開日:2024-07-16
# Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction ( http://arxiv.org/abs/2407.11494v1 ) License: see link	Guowei Xu, Jiale Tao, Wen Li, Lixin Duan,	In the realm of stochastic human motion prediction (SHMP), researchers have often turned to generative models like GANs, VAEs and diffusion models. However, most previous approaches have struggled to accurately predict motions that are both realistic and coherent with past motion due to a lack of guidance on the latent distribution. In this paper, we introduce Semantic Latent Directions (SLD) as a solution to this challenge, aiming to constrain the latent space to learn meaningful motion semantics and enhance the accuracy of SHMP. SLD defines a series of orthogonal latent directions and represents the hypothesis of future motion as a linear combination of these directions. By creating such an information bottleneck, SLD excels in capturing meaningful motion semantics, thereby improving the precision of motion predictions. Moreover, SLD offers controllable prediction capabilities by adjusting the coefficients of the latent directions during the inference phase. Expanding on SLD, we introduce a set of motion queries to enhance the diversity of predictions. Aligning these motion queries with the SLD space further steers SLD toward more accurate and coherent motion predictions. Through extensive experiments conducted on widely used benchmarks, we showcase the superiority of our method in accurately predicting motions while maintaining a balance of realism and diversity. Our code and pretrained models are available at https://github.com/GuoweiXu368/SLD-HMP.	Translated: 2024-07-17 16:02:34 Published: 2024-07-16
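The SLD abstract describes a future-motion hypothesis as a linear combination of orthogonal latent directions, with control exercised by changing the coefficients at inference time. The sketch below illustrates only that idea on toy vectors. How the paper actually obtains orthogonal directions is not stated in the abstract, so the Gram-Schmidt step, the function names, and all numbers here are assumptions made for illustration.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors):
    """Gram-Schmidt: turn linearly independent vectors into orthonormal directions."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def combine(coeffs, directions):
    """Latent code as a linear combination of the directions."""
    dim = len(directions[0])
    return [sum(c * d[i] for c, d in zip(coeffs, directions)) for i in range(dim)]


# Hypothetical raw directions in a 3-D latent space.
raw = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
directions = orthonormalize(raw)

# One hypothesis, then a "controlled" variant that changes only the second coefficient.
z = combine([2.0, -1.0], directions)
z_edit = combine([2.0, 0.5], directions)
```

Because the directions are orthonormal, projecting a latent code back onto a direction recovers its coefficient, which is what makes per-direction adjustment well defined in this toy setting.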
# ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment ( http://arxiv.org/abs/2407.11496v1 ) License: see link	Xinyi Wang, Angeliki Katsenou, David Bull,	With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be used. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating diverse video content and assessing its quality without reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, along with different expressions of spatial features of the sampled frames, to enhance motion and spatial perception. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods with an average SRCC value of 0.8658 and PLCC value of 0.8872. We will open source the code and trained models to facilitate further research and applications of NR-VQA: https://github.com/xinyiW915/ReLaX-VQA.	Translated: 2024-07-17 16:02:34 Published: 2024-07-16
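The SRCC and PLCC values quoted for ReLaX-VQA are the Spearman rank-order and Pearson linear correlations between predicted and subjective quality scores. As a minimal standard-library sketch (function names and toy scores are illustrative assumptions, not taken from the paper or its repository), the two metrics can be computed like this:

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted and subjective scores for five videos.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]
srcc = spearman(predicted, subjective)
plcc = pearson(predicted, subjective)
```

In this toy case the relation is monotonic but not linear, so the rank correlation is perfect while the linear correlation falls slightly below 1. That difference is why quality-assessment papers usually report both.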
# Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection ( http://arxiv.org/abs/2407.11499v1 ) License: see link	Qijie Mo, Yipeng Gao, Shenghao Fu, Junkai Yan, Ancong Wu, Wei-Shi Zheng,	In incremental object detection, knowledge distillation has been proven to be an effective way to alleviate catastrophic forgetting. However, previous works focused on preserving the knowledge of old models, ignoring that images could simultaneously contain categories from past, present, and future stages. The co-occurrence of objects makes the optimization objectives inconsistent across different stages since the definition for foreground objects differs across various stages, which limits the model's performance greatly. To overcome this problem, we propose a method called "Bridge Past and Future" (BPF), which aligns models across stages, ensuring consistent optimization directions. In addition, we propose a novel Distillation with Future (DwF) loss, fully leveraging the background probability to mitigate the forgetting of old classes while ensuring a high level of adaptability in learning new classes. Extensive experiments are conducted on both Pascal VOC and MS COCO benchmarks. Without memory, BPF outperforms current state-of-the-art methods under various settings. The code is available at https://github.com/iSEE-Laboratory/BPF.	Translated: 2024-07-17 16:02:34 Published: 2024-07-16
# An AI System for Continuous Knee Osteoarthritis Severity Grading Using Self-Supervised Anomaly Detection with Limited Data ( http://arxiv.org/abs/2407.11500v1 ) License: see link	Niamh Belton, Aonghus Lawlor, Kathleen M. Curran,	The diagnostic accuracy and subjectivity of existing Knee Osteoarthritis (OA) ordinal grading systems have been a subject of ongoing debate and concern. Existing automated solutions are trained to emulate these imperfect systems, whilst also being reliant on large annotated databases for fully-supervised training. This work proposes a three-stage approach for automated continuous grading of knee OA that is built upon the principles of Anomaly Detection (AD): learning a robust representation of healthy knee X-rays and grading disease severity based on its distance to the centre of normality. In the first stage, SS-FewSOME is proposed, a self-supervised AD technique that learns the 'normal' representation, requiring only examples of healthy subjects and <3% of the labels that existing methods require. In the second stage, this model is used to pseudo label a subset of unlabelled data as 'normal' or 'anomalous', followed by denoising of pseudo labels with CLIP. The final stage involves retraining on labelled and pseudo labelled data using the proposed Dual Centre Representation Learning (DCRL), which learns the centres of two representation spaces: normal and anomalous. Disease severity is then graded based on the distance to the learned centres. The proposed methodology outperforms existing techniques by margins of up to 24% in terms of OA detection, and the disease severity scores correlate with the Kellgren-Lawrence grading system at the same level as human expert performance. Code available at https://github.com/niamhbelton/SS-FewSOME_Disease_Severity_Knee_Osteoarthritis.	Translated: 2024-07-17 15:52:21 Published: 2024-07-16
# Diff-MTS: Temporal-Augmented Conditional Diffusion-based AIGC for Industrial Time Series Towards the Large Model Era ( http://arxiv.org/abs/2407.11501v1 ) License: see link	Lei Ren, Haiteng Wang, Yuanjun Laili,	Industrial Multivariate Time Series (MTS) data offer a critical view into the state of machines in the industrial field. However, due to data collection difficulty and privacy concerns, available data for building industrial intelligence and industrial large models is far from sufficient. Therefore, industrial time series data generation is of great importance. Existing research usually applies Generative Adversarial Networks (GANs) to generate MTS. However, GANs suffer from an unstable training process due to the joint training of the generator and discriminator. This paper proposes a temporal-augmented conditional adaptive diffusion model, termed Diff-MTS, for MTS generation. It aims to better handle the complex temporal dependencies and dynamics of MTS data. Specifically, a conditional Adaptive Maximum-Mean Discrepancy (Ada-MMD) method has been proposed for the controlled generation of MTS, which does not require a classifier to control the generation. It improves the condition consistency of the diffusion model. Moreover, a Temporal Decomposition Reconstruction UNet (TDR-UNet) is established to capture complex temporal patterns and further improve the quality of the synthetic time series. Comprehensive experiments on the C-MAPSS and FEMTO datasets demonstrate that the proposed Diff-MTS performs substantially better in terms of diversity, fidelity, and utility compared with GAN-based methods. These results show that Diff-MTS facilitates the generation of industrial data, contributing to intelligent maintenance and the construction of industrial large models.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# How Control Information Influences Multilingual Text Image Generation and Editing? ( http://arxiv.org/abs/2407.11502v1 ) License: see link	Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie,	Visual text generation has significantly advanced through diffusion models aimed at producing images with readable and realistic text. Recent works primarily use a ControlNet-based framework, employing standard font text images to control diffusion models. Recognizing the critical role of control information in generating high-quality text, we investigate its influence from three perspectives: input encoding, role at different stages, and output features. Our findings reveal that: 1) Input control information has unique characteristics compared to conventional inputs like Canny edges and depth maps. 2) Control information plays distinct roles at different stages of the denoising process. 3) Output control features significantly differ from the base and skip features of the U-Net decoder in the frequency domain. Based on these insights, we propose TextGen, a novel framework designed to enhance generation quality by optimizing control information. We improve input and output features using Fourier analysis to emphasize relevant information and reduce noise. Additionally, we employ a two-stage generation framework to align the different roles of control information at different stages. Furthermore, we introduce an effective and lightweight dataset for training. Our method achieves state-of-the-art performance in both Chinese and English text generation. The code and dataset will be made available.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation ( http://arxiv.org/abs/2407.11503v1 ) License: see link	Shijie Chang, Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu,	Existing few-shot segmentation (FSS) methods mainly focus on prototype feature generation and the query-support matching mechanism. As a crucial prompt for generating prototype features, the pair of image-mask types in the support set has become the default setting. However, various types such as image, text, box, and mask all can provide valuable information regarding the objects in context, class, localization, and shape appearance. Existing work focuses on specific combinations of guidance, leading FSS into different research branches. Rethinking guidance types in FSS is expected to explore the efficient joint representation of the coupling between the support set and query set, giving rise to research trends in the weakly or strongly annotated guidance to meet the customized requirements of practical users. In this work, we provide the generalized FSS with seven guidance paradigms and develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image. Leveraging the advantages of large-scale pre-training vision-language models in textual and visual embeddings, UniFSS proposes high-level spatial correction and embedding interactive units to overcome the semantic ambiguity drawbacks typically encountered by pure visual matching methods when facing intra-class appearance diversities. Extensive experiments show that UniFSS significantly outperforms the state-of-the-art methods. Notably, the weakly annotated class-aware box paradigm even surpasses the finely annotated mask paradigm.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# Haze-Aware Attention Network for Single-Image Dehazing ( http://arxiv.org/abs/2407.11505v1 ) License: see link	Lihan Tong, Yun Liu, Weijia Li, Liyuan Chen, Erkang Chen,	Single-image dehazing is a pivotal challenge in computer vision that seeks to remove haze from images and restore clean background details. Recognizing the limitations of traditional physical model-based methods and the inefficiencies of current attention-based solutions, we propose a new dehazing network combining an innovative Haze-Aware Attention Module (HAAM) with a Multiscale Frequency Enhancement Module (MFEM). The HAAM is inspired by the atmospheric scattering model, thus skillfully integrating physical principles into high-dimensional features for targeted dehazing. It picks up on latent features during the image restoration process, which gives a significant boost to the metrics, while the MFEM efficiently enhances high-frequency details, thus sidestepping wavelet or Fourier transform complexities. It employs multiscale fields to extract and emphasize key frequency components with minimal parameter overhead. Integrated into a simple U-Net framework, our Haze-Aware Attention Network (HAA-Net) for single-image dehazing significantly outperforms existing attention-based and transformer models in efficiency and effectiveness. Tested across various public datasets, the HAA-Net sets new performance benchmarks. Our work not only advances the field of image dehazing but also offers insights into the design of attention mechanisms for broader applications in computer vision.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
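For reference, the atmospheric scattering model that the HAA-Net abstract cites as the inspiration for HAAM is usually written as below. This is the standard textbook form, not an equation taken from the paper. The symbols follow common convention and are an assumption about the paper's notation: $I$ is the observed hazy image, $J$ the haze-free scene radiance, $A$ the global atmospheric light, $t$ the transmission map, $\beta$ the scattering coefficient, and $d$ the scene depth.

```latex
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta d(x)}
```

Dehazing amounts to recovering $J$ from $I$, which is ill-posed because both $t$ and $A$ are unknown.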
# Reasoning with Large Language Models, a Survey ( http://arxiv.org/abs/2407.11511v1 ) License: see link	Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back,	Scaling up language models to billions of parameters has opened up possibilities for in-context learning, allowing instruction tuning and few-shot learning on tasks that the model was not specifically trained for. This has achieved breakthrough performance on language tasks such as translation, summarization, and question-answering. Furthermore, in addition to these associative "System 1" tasks, recent advances in Chain-of-thought prompt learning have demonstrated strong "System 2" reasoning abilities, answering a question in the field of artificial general intelligence: whether LLMs can reason. The field started with the question of whether LLMs can solve grade school math word problems. This paper reviews the rapidly expanding field of prompt-based reasoning with LLMs. Our taxonomy identifies different ways to generate, evaluate, and control multi-step reasoning. We provide in-depth coverage of core approaches and open problems, and we propose a research agenda for the near future. Finally, we highlight the relation between reasoning and prompt-based learning, and we discuss the relation between reasoning, sequential decision processes, and reinforcement learning. We find that self-improvement, self-reflection, and some metacognitive abilities of the reasoning processes are possible through the judicious use of prompts. True self-improvement and self-reasoning, to go from reasoning with LLMs to reasoning by LLMs, remains future work.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# ColorwAI: Generative Colorways of Textiles through GAN and Diffusion Disentanglement ( http://arxiv.org/abs/2407.11514v1 ) License: see link	Ludovica Schaerf, Andrea Alfarano, Eric Postma,	Colorway creation is the task of generating textile samples in alternate color variations maintaining an underlying pattern. The individuation of a suitable color palette for a colorway is a complex creative task, responding to client and market needs, stylistic and cultural specifications, and mood. We introduce a modification of this task, the "generative colorway" creation, that includes minimal shape modifications, and propose a framework, "ColorwAI", to tackle this task using color disentanglement on StyleGAN and Diffusion. We introduce a variation of the InterfaceGAN method for supervised disentanglement, ShapleyVec. We use Shapley values to subselect a few dimensions of the detected latent direction. Moreover, we introduce a general framework to adopt common disentanglement methods on any architecture with a semantic latent space and test it on Diffusion and GANs. We interpret the color representations within the models' latent space. We find StyleGAN's W space to be the most aligned with human notions of color. Finally, we suggest that disentanglement can solicit a creative system for colorway creation, and evaluate it through expert questionnaires and creativity theory.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# Ensemble Transport Filter via Optimized Maximum Mean Discrepancy ( http://arxiv.org/abs/2407.11518v1 ) License: see link	Dengfei Zeng, Lijian Jiang,	In this paper, we present a new ensemble-based filter method by reconstructing the analysis step of the particle filter through a transport map, which directly transports prior particles to posterior particles. The transport map is constructed through an optimization problem described by the Maximum Mean Discrepancy loss function, which matches the expectation information of the approximated posterior and reference posterior. The proposed method inherits the accurate estimation of the posterior distribution from particle filtering. To improve the robustness of Maximum Mean Discrepancy, a variance penalty term is used to guide the optimization. It prioritizes minimizing the discrepancy between the expectations of highly informative statistics for the approximated and reference posteriors. The penalty term significantly enhances the robustness of the proposed method and leads to a better approximation of the posterior. A few numerical examples are presented to illustrate the advantage of the proposed method over the ensemble Kalman filter.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
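The transport-filter abstract uses Maximum Mean Discrepancy (MMD) as the loss that compares transported particles with a reference posterior. The sketch below shows only the generic, biased sample estimate of squared MMD with a Gaussian kernel on one-dimensional particles. The kernel choice, bandwidth, function names, and particle values are assumptions for illustration. The paper's variance penalty term and the transport map itself are not reproduced here.

```python
from math import exp


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Toy particle sets (hypothetical values).
transported = [0.1, 0.4, 0.5, 0.9]
reference_near = [0.0, 0.5, 0.6, 1.0]
reference_far = [3.0, 3.5, 3.6, 4.0]

same = mmd_squared(transported, transported)
near = mmd_squared(transported, reference_near)
far = mmd_squared(transported, reference_far)
```

The estimate is zero for identical particle sets and grows as the two sets move apart, which is the property that lets it serve as a training loss for a map from prior to posterior particles.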
# Teleportation of unknown qubit via Star type tripartite states ( http://arxiv.org/abs/2407.11519v1 ) License: see link	Anushree Bhattacharjee, Abhijit Mandal, Sovik Roy,	Eylee Jung \textit{et al.} \cite{jung2008} conjectured that $P_{max}=\frac{1}{2}$ is a necessary and sufficient condition for perfect two-party teleportation and, consequently, that the Groverian measure of entanglement of the entanglement resource must be $\frac{1}{\sqrt{2}}$. It is also known that the prototype $W$ state is not useful for standard teleportation. Agrawal and Pati \cite{pati2006} have successfully executed perfect (standard) teleportation with a non-prototype $W$ state. In line with Pati's protocol \cite{pati2006}, we consider here Star-type tripartite states and show that such states are suitable for perfect teleportation. Moreover, we take the linear superposition of a non-prototype $W$ state and its spin-flipped version and show that it belongs to the Star class. Standard teleportation is also possible with these states. We observe that genuine tripartite entanglement is not a necessary requirement for a state to be used as a channel for successful standard teleportation. We also show that these Star-class states are $P_{max}=\frac{1}{4}$ states and that their Groverian entanglement is $\frac{\sqrt{3}}{2}$, thus concluding that the Jung conjecture is not a necessary condition.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models ( http://arxiv.org/abs/2407.11522v1 ) License: see link	Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li,	Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction. In this paper, we build FIRE, a feedback-refinement dataset, consisting of 1.1M multi-turn conversations that are derived from 27 source datasets, empowering VLMs to spontaneously refine their responses based on user feedback across diverse tasks. To scale up the data collection, FIRE is collected in two components: FIRE-100K and FIRE-1M, where FIRE-100K is generated by GPT-4V, and FIRE-1M is freely generated via models trained on FIRE-100K. Then, we build FIRE-Bench, a benchmark to comprehensively evaluate the feedback-refining capability of VLMs, which contains 11K feedback-refinement conversations as the test data, two evaluation settings, and a model to provide feedback for VLMs. We develop the FIRE-LLaVA model by fine-tuning LLaVA on FIRE-100K and FIRE-1M, which shows remarkable feedback-refining capability on FIRE-Bench and outperforms untrained VLMs by 50%, enabling more efficient user-agent interactions and underscoring the significance of the FIRE dataset.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# Improved Belief Propagation Decoding on Surface Codes with High Accuracy and Low Latency ( http://arxiv.org/abs/2407.11523v1 ) License: see link	Jiahan Chen, Zhipeng Liang, Zhengzhong Yi, Xuan Wang,	Quantum error correction is crucial for universal quantum computing, requiring highly accurate and low-latency decoding algorithms. Belief Propagation (BP) is notable for its linear time complexity and general applicability to quantum LDPC codes. However, BP performs poorly on highly degenerate codes without Order Statistic Decoding (OSD) post-processing, which significantly increases time complexity. We focus on improving BP's performance on surface codes. We first propose Momentum-BP and AdaGrad-BP, inspired by machine learning optimization techniques, to reduce the oscillation of message updating and break the symmetric trapping sets. We further propose EWAInit-BP, which adaptively updates initial probabilities and exhibits aggressive exploration capabilities. EWAInit-BP achieves the highest accuracy among BP improvements without OSD post-processing on planar surface code, toric code, and XZZX surface code, providing an improvement of 1 to 3 orders of magnitude over traditional BP and demonstrating error correction capability even under parallel scheduling. Its theoretical O(1) time complexity and high accuracy make it a promising candidate for high-precision real-time decoders.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# Cross-Phase Mutual Learning Framework for Pulmonary Embolism Identification on Non-Contrast CT Scans ( http://arxiv.org/abs/2407.11529v1 ) License: see link	Bizhe Bai, Yan-Jie Zhou, Yujian Hu, Tony C. W. Mok, Yilang Xiang, Le Lu, Hongkun Zhang, Minfeng Xu,	Pulmonary embolism (PE) is a life-threatening condition where rapid and accurate diagnosis is imperative yet difficult due to predominantly atypical symptomatology. Computed tomography pulmonary angiography (CTPA) is acknowledged as the gold standard imaging tool in clinics, yet it can be contraindicated for emergency department (ED) patients and represents an onerous procedure, thus necessitating PE identification through non-contrast CT (NCT) scans. In this work, we explore the feasibility of applying a deep-learning approach to NCT scans for PE identification. We propose a novel Cross-Phase Mutual learNing framework (CPMN) that fosters knowledge transfer from CTPA to NCT, while concurrently conducting embolism segmentation and abnormality classification in a multi-task manner. The proposed CPMN leverages the Inter-Feature Alignment (IFA) strategy that enhances spatial contiguity and mutual learning between the dual-pathway network, while the Intra-Feature Discrepancy (IFD) strategy can facilitate precise segmentation of PE against complex backgrounds for single-pathway networks. For a comprehensive assessment of the proposed approach, a large-scale dual-phase dataset containing 334 PE patients and 1,105 normal subjects has been established. Experimental results demonstrate that CPMN achieves the leading identification performance, which is 95.4% and 99.6% in patient-level sensitivity and specificity on NCT scans, indicating the potential of our approach as an economical, accessible, and precise tool for PE identification in clinical practice.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# Length-Aware Motion Synthesis via Latent Diffusion ( http://arxiv.org/abs/2407.11532v1 ) License: see link	Alessio Sampieri, Alessio Palma, Indro Spinelli, Fabio Galasso,	The target duration of a synthesized human motion is a critical attribute that requires modeling control over the motion dynamics and style. Speeding up an action performance is not merely fast-forwarding it. However, state-of-the-art techniques for human behavior synthesis have limited control over the target sequence length. We introduce the problem of generating length-aware 3D human motion sequences from textual descriptors, and we propose a novel model to synthesize motions of variable target lengths, which we dub "Length-Aware Latent Diffusion" (LADiff). LADiff consists of two new modules: 1) a length-aware variational auto-encoder to learn motion representations with length-dependent latent codes; 2) a length-conforming latent diffusion model to generate motions with a richness of details that increases with the required target sequence length. LADiff significantly improves over the state-of-the-art across most of the existing motion synthesis metrics on the two established benchmarks of HumanML3D and KIT-ML.	Translated: 2024-07-17 15:52:20 Published: 2024-07-16
# LRQ:低ランクウェイトスケーリング行列の学習による大規模言語モデルの学習後量子化の最適化 LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices ( http://arxiv.org/abs/2407.11534v1 ) ライセンス: Link先を確認	Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee,	(参考訳) 大規模言語モデル (LLM) の商用化に伴い, LLM の圧縮と高速化が実現し, 推論コストを低減しつつ高いスループットを実現している。しかし、LLMの重みとアクティベーションを量子化するための既存のPTQ技術は、特に大規模なマルチタスク言語理解において、無視できない精度低下に悩まされている。そこで本研究では,低ランクウェイトスケーリング行列を利用して中間変圧器ブロックの出力を再構築し,学習可能なウェイトスケールを多く含む従来のフルウェイトスケーリング行列を置き換え,LLMの簡易かつ効果的なポストトレーニングウェイト量子化法としてLRQ(Low-Rank Quantization)を提案する。低ランク構造によるパラメータ共有により、LRQは重みの個別のスケーリングを可能にしながらパラメータを著しく少ない値で学習する必要があり、量子化LLMの一般化能力を高めることができる。従来の LLM PTQ よりも LRQ の方が優れていることを示す。 (i)8ビット重みとテンソル単位のアクティベーション量子化 (ii)4ビット重みと8ビットのトークン単位アクティベーション量子化 (iii)低ビットの重みのみの量子化方式。私たちのコードは、LLM研究者やエンジニアを刺激するために、 https://github.com/onliwad101/FlexRound_LRQ で利用可能です。 With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language understanding. To address this issue, we propose Low-Rank Quantization (LRQ), a simple yet effective post-training weight quantization method for LLMs that reconstructs the outputs of an intermediate Transformer block by leveraging low-rank weight-scaling matrices, replacing the conventional full weight-scaling matrices that entail as many learnable scales as their associated weights. Thanks to parameter sharing via low-rank structure, LRQ only needs to learn significantly fewer parameters while enabling the individual scaling of weights, thus boosting the generalization capability of quantized LLMs. 
We show the superiority of LRQ over prior LLM PTQ works under (i) 8-bit weight and per-tensor activation quantization, (ii) 4-bit weight and 8-bit per-token activation quantization, and (iii) low-bit weight-only quantization schemes. Our code is available at https://github.com/onliwad101/FlexRound_LRQ to inspire LLM researchers and engineers.	翻訳日:2024-07-17 15:52:20 公開日:2024-07-16
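The parameter saving described in the LRQ abstract can be illustrated with a rough sketch. It assumes a plain rank-r factorization of the scaling matrix into two factors; the function names, the shapes, and the absence of any extra bias terms are assumptions for illustration, not details taken from the paper or its repository.

```python
# Sketch only: compares the number of learnable scales in a full
# weight-scaling matrix with a rank-r factorization S = L @ U.
# All names here are hypothetical and chosen for illustration.

def full_scale_params(rows, cols):
    """Full scaling matrix: one learnable scale per weight."""
    return rows * cols


def low_rank_scale_params(rows, cols, rank):
    """Low-rank scaling: factors of shape (rows, rank) and (rank, cols)."""
    return rank * (rows + cols)


def low_rank_scales(left, right):
    """Multiply the two factors so every weight still gets its own scale."""
    rank = len(right)
    cols = len(right[0])
    return [
        [sum(row[k] * right[k][j] for k in range(rank)) for j in range(cols)]
        for row in left
    ]


if __name__ == "__main__":
    print(full_scale_params(4096, 4096))
    print(low_rank_scale_params(4096, 4096, 1024))
    print(low_rank_scales([[1.0], [2.0]], [[0.5, 1.0, 1.5]]))
```

Even with shared factors, the product assigns a distinct scale to each weight, which is the property the abstract refers to as individual scaling.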
# 長期的理解とドメインエキスパートのための微調整医療用言語モデル Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise ( http://arxiv.org/abs/2407.11536v1 ) ライセンス: Link先を確認	Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan,	(参考訳) 大規模言語モデル(LLM)は様々な専門分野に広く応用されている。ドメイン固有の質問と回答データセットを用いてモデルを微調整することで、これらのモデルの専門的なドメイン知識とQ&A能力が大幅に向上した。しかし、特定のドメイン知識の改善にもかかわらず、長いコンテキスト理解における医学的LLMの性能は、特に類似したパラメータを持つ一般的な言語モデルと比較して著しく低下している。本研究の目的は,医療用LLMにおける長文理解における性能低下現象について検討することである。我々は、オープンブックの専門的知識試験をすべてのモデルで実施し、長文の読みやすさを評価する一連の実験を設計した。微調整の過程で一般的なデータと医療データの比率と量を調整することで、プロのモデルを最適化し、長期のコンテキスト性能と特定のドメイン知識のバランスをとるのに最適なデータ構成を決定できる。 Large Language Models (LLMs) have been widely applied in various professional fields. By fine-tuning the models using domain specific question and answer datasets, the professional domain knowledge and Q&A abilities of these models have significantly improved, for example, medical professional LLMs that use fine-tuning of doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities. However, we observed that despite improvements in specific domain knowledge, the performance of medical LLM in long-context understanding has significantly declined, especially compared to general language models with similar parameters. The purpose of this study is to investigate the phenomenon of reduced performance in understanding long-context in medical LLM. We designed a series of experiments to conduct open-book professional knowledge exams on all models to evaluate their ability to read long-context. By adjusting the proportion and quantity of general data and medical data in the process of fine-tuning, we can determine the best data composition to optimize the professional model and achieve a balance between long-context performance and specific domain knowledge.	翻訳日:2024-07-17 15:52:20 公開日:2024-07-16
# AEMIM:マズード画像モデリングを例に AEMIM: Adversarial Examples Meet Masked Image Modeling ( http://arxiv.org/abs/2407.11537v1 ) ライセンス: Link先を確認	Wenzhao Xiang, Chang Liu, Hang Su, Hongyang Yu,	(参考訳) マスク付き画像モデリング(MIM)は,表現学習において顕著な進歩を遂げている。従来の手法の代替として、腐敗した画像からの復元が、最近、有望な前提課題として浮上した。しかし、正規の劣化画像はジェネリックジェネレータを用いて生成され、しばしば事前学習に関わる特定の再構成タスクに関連性がない。したがって、通常の劣化画像からの再構成は、プリテキストタスクの難しさを保証できないため、性能低下につながる可能性がある。さらに、劣化した画像を生成すると、余分なジェネレータが導入され、計算負荷が顕著になる可能性がある。これらの課題に対処するために,新たな再構成対象として,敵の例をマスク画像モデリングに組み込むことを提案する。トレーニング済みモデルのみを使用してオンラインで生成された逆例は、事前トレーニングに関連するタスクを直接的に破壊することを目的としている。したがって, 組織化は再建における課題のレベルを高くするだけでなく, 効率の向上にも寄与し, モデルによる優れた表現の獲得に寄与する。特に、原画像に対応する敵の例を再構成する、新しい補助的前文タスクを導入する。また,MIM事前学習において,より適切な対戦例を構築するために,革新的な敵攻撃を考案する。また,本手法は特定のモデルアーキテクチャやMIM戦略に限らず,すべてのMIM手法を拡張できる適応可能なプラグインであることを示す。既存のMIM法の一般化とロバスト性を増幅する手法として,本手法の顕著な能力について実験的に検証した。特に,本手法は,ImageNetやその変種,下流タスクなど,さまざまなタスクにおけるベースラインのパフォーマンスを上回ります。 Masked image modeling (MIM) has gained significant traction for its remarkable prowess in representation learning. As an alternative to the traditional approach, the reconstruction from corrupted images has recently emerged as a promising pretext task. However, the regular corrupted images are generated using generic generators, often lacking relevance to the specific reconstruction task involved in pre-training. Hence, reconstruction from regular corrupted images cannot ensure the difficulty of the pretext task, potentially leading to a performance decline. Moreover, generating corrupted images might introduce an extra generator, resulting in a notable computational burden. To address these issues, we propose to incorporate adversarial examples into masked image modeling, as the new reconstruction targets. Adversarial examples, generated online using only the trained models, can directly aim to disrupt tasks associated with pre-training. 
Therefore, the incorporation not only elevates the level of challenge in reconstruction but also enhances efficiency, contributing to the acquisition of superior representations by the model. In particular, we introduce a novel auxiliary pretext task that reconstructs the adversarial examples corresponding to the original images. We also devise an innovative adversarial attack to craft more suitable adversarial examples for MIM pre-training. It is noted that our method is not restricted to specific model architectures and MIM strategies, rendering it an adaptable plug-in capable of enhancing all MIM methods. Experimental findings substantiate the remarkable capability of our approach in amplifying the generalization and robustness of existing MIM methods. Notably, our method surpasses the performance of baselines on various tasks, including ImageNet, its variants, and other downstream tasks.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# カラーノイズ下におけるゲートセットトモグラフィーの微視的パラメトリゼーション Microscopic parametrizations for gate set tomography under coloured noise ( http://arxiv.org/abs/2407.11539v1 ) ライセンス: Link先を確認	P. Viñas, A. Bermudez,	(参考訳) ゲートセットトモグラフィ(GST)は、ノイズの多い量子情報プロセッサの自己整合性評価を可能にする。標準的なデバイスに依存しないアプローチは、QIPを物理法則に制約されるブラックボックスとして扱い、かなりのリソースコストで完全な汎用性を得る:ゲートセットから構築された多数の回路はゲートセットのパラメータを増幅するために実行されなければならない。本研究では, 駆動相における時間相関ノイズ下での量子ゲートの微視的パラメトリゼーションにより, GSTのより効率的なバージョンを実現するために必要な資源を削減できることを示す。雑音スペクトル密度上のフィルタ関数の定式化を利用して、各ゲートにおける有限相関時間と非マルコフ量子進化の影響を含むゲートセットの最小パラメトリゼーションについて議論する。我々は,本手法と標準長周期GSTを用いて得られた推定ゲートセットを比較し,それらの精度を確立された指標の観点から論じるとともに,特定例のサンプリング複雑性の観点からパラメタライズドアプローチの利点を示す。 Gate set tomography (GST) allows for a self-consistent characterization of noisy quantum information processors. The standard device-agnostic approach treats the QIPs as black boxes that are only constrained by the laws of physics, attaining full generality at a considerable resource cost: numerous circuits built from the gate set must be run in order to amplify each of the gate set parameters. In this work, we show that a microscopic parametrization of quantum gates under time-correlated noise on the driving phase, motivated by recent experiments with trapped-ion gates, reduces the required resources enabling a more efficient version of GST. By making use of the formalism of filter functions over the noise spectral densities, we discuss the minimal parametrizations of the gate set that include the effect of finite correlation times and non-Markovian quantum evolutions during the individual gates. We compare the estimated gate sets obtained by our method and the standard long-sequence GST, discussing their accuracies in terms of established metrics, as well as showcasing the advantages of the parametrized approach in terms of the sampling complexity for specific examples.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# もう1つの命令法: タブラリデータセットにおける値の欠落に対するトランスフォーマーベースモデル Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets ( http://arxiv.org/abs/2407.11540v1 ) ライセンス: Link先を確認	Camillo Maria Caruso, Paolo Soda, Valerio Guarrasi,	(参考訳) 表形式のデータセットで欠落した値を扱うことは、人工知能モデルのトレーニングとテストにおいて重大な課題となる。本稿では,従来の計算手法を必要とせずにこの問題に対処するために設計された,新しいトランスフォーマーベースモデルである"Not Another Imputation Method"(NAIM)を紹介する。 NAIMは機能固有の埋め込みと、利用可能なデータから効果的に学習するマスク付き自己認識機構を採用しており、欠落した値をインプットする必要がない。さらに、不完全なデータからモデルの一般化能力を高めるために、新しい正規化手法が導入された。 NAIMを利用可能な5つの表付きデータセット上で広範囲に評価し、6つの最先端機械学習モデルと4つのディープラーニングモデルよりも優れた性能を示し、必要に応じて3つの異なる計算手法を組み合わせた。その結果、NAIMが欠落したデータの存在下での予測性能とレジリエンスを向上させる効果が浮き彫りになった。そこで我々はNAIMのコードをhttps://github.com/cosbidev/NAIMで公開した。 Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method" (NAIM), a novel transformer-based model specifically designed to address this issue without the need for traditional imputation techniques. NAIM employs feature-specific embeddings and a masked self-attention mechanism that effectively learns from available data, thus avoiding the necessity to impute missing values. Additionally, a novel regularization technique is introduced to enhance the model's generalization capability from incomplete data. We extensively evaluated NAIM on 5 publicly available tabular datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 4 deep learning models, each paired with 3 different imputation techniques when necessary. The results highlight the efficacy of NAIM in improving predictive performance and resilience in the presence of missing data. To facilitate further research and practical application in handling missing data without traditional imputation methods, we made the code for NAIM available at https://github.com/cosbidev/NAIM.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# 相互予測のための一様加速度運動モデル Uniformly Accelerated Motion Model for Inter Prediction ( http://arxiv.org/abs/2407.11541v1 ) ライセンス: Link先を確認	Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu,	(参考訳) インター予測は、ビデオ符号化における時間的冗長性を減少させる重要な技術である。自然ビデオでは、通常、変動速度を持つ複数の移動物体が存在し、その結果、コンパクトに表現することが難しい複雑な運動場が生じる。 Versatile Video Coding (VVC) では、既存のインター予測手法は、通常、連続するフレーム間の均一な速度運動を仮定し、実世界の複雑な運動場をうまく扱えないような動き推定(ME)と動き補償(MC)に線形モデルを使用する。これらの問題に対処するために,動画フレーム間の移動物体の運動関連要素(速度,加速度)を利用する一様加速度運動モデル(UAMM)を導入し,その組み合わせにより,時間領域における変動運動を扱うための相互予測手法を支援する。具体的には、まずUAMMの理論について述べる。次に,UAMMに基づくパラメータ導出手法と外挿方式を提案する。第3に,UAMMを既存の予測モード(Merge, MMVD, CIIP)に統合し,高い予測精度を実現する。提案手法はVVC参照ソフトウェアであるVTMバージョン12.0に実装されている。実験の結果,VTMアンカーに比べて最大0.38%,平均0.13%のBDレート削減が可能であり,符号化/復号側では時間的複雑さがわずかに増大していることがわかった。 Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear models for motion estimation (ME) and motion compensation (MC), which may not well handle the complex motion fields in the real world. To address these issues, we introduce a uniformly accelerated motion model (UAMM) to exploit motion-related elements (velocity, acceleration) of moving objects between the video frames, and further combine them to assist the inter prediction methods to handle the variable motion in the temporal domain. Specifically, first, the theory of UAMM is mentioned. Second, based on that, we propose the UAMM-based parameter derivation and extrapolation schemes in the coding process. Third, we integrate the UAMM into existing inter prediction modes (Merge, MMVD, CIIP) to achieve higher prediction accuracy. The proposed method is implemented into the VVC reference software, VTM version 12.0. 
Experimental results show that the proposed method achieves up to 0.38% and on average 0.13% BD-rate reduction compared to the VTM anchor, under the Low-delay P configuration, with a slight increase of time complexity on the encoding/decoding side.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
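The difference between the uniform-speed assumption and a uniformly accelerated model can be shown with a small kinematic sketch. It assumes a block's displacement over the two most recent frame intervals is known and that acceleration is constant over equally spaced frames; the function names are hypothetical, and this is not the parameter derivation scheme used in the paper or in VTM.

```python
# Sketch only: predicts the next per-frame displacement of a block from
# its two most recent displacements, under two motion assumptions.

def extrapolate_linear(d_curr):
    """Uniform speed: the next displacement repeats the current one."""
    return d_curr


def extrapolate_uniform_acceleration(d_prev, d_curr):
    """Constant acceleration: successive displacements change by a fixed
    amount a = d_curr - d_prev, so the next one is d_curr + a."""
    ax = d_curr[0] - d_prev[0]
    ay = d_curr[1] - d_prev[1]
    return (d_curr[0] + ax, d_curr[1] + ay)


if __name__ == "__main__":
    previous = (2, 0)  # displacement between frames t-2 and t-1
    current = (4, 1)   # displacement between frames t-1 and t
    print(extrapolate_linear(current))
    print(extrapolate_uniform_acceleration(previous, current))
```

For an object that is speeding up, the linear guess lags behind, while the accelerated guess follows the trend in the displacements.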
# 小形変圧器における計数理解:注意層とフィードフォワード層との相互作用 Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers ( http://arxiv.org/abs/2407.11542v1 ) ライセンス: Link先を確認	Freya Behrens, Luca Biggio, Lenka Zdeborová,	(参考訳) ヒストグラムタスクで訓練された単純なトランスフォーマーモデルの包括的解析を行い、固定アルファベットからの入力シーケンスにおける各項目の発生をカウントする。その明らかな単純さにもかかわらず、このタスクは、異なるアーキテクチャコンポーネントが、異なるアルゴリズムソリューションの出現にどのように貢献するかを特徴づける、豊富な現象論を示す。特に、ソリューション、関係性、在庫に基づく計数を実装する2つの定性的に異なるメカニズムの存在を示します。モデルが実装できるソリューションは、注意機構、アクティベーション機能、記憶能力、シーケンス開始トークンの存在の正確な選択に依存しない。計数作業における学習モデルのイントロスペクションにより、両方のメカニズムの形成の証拠を見出す。より広い視点から見ると、我々の分析は、トランスフォーマーモデルの異なるアーキテクチャコンポーネントの相互作用が、様々なアルゴリズムの解と近似をどう形成するかを理解するためのフレームワークを提供する。 We provide a comprehensive analysis of simple transformer models trained on the histogram task, where the goal is to count the occurrences of each item in the input sequence from a fixed alphabet. Despite its apparent simplicity, this task exhibits a rich phenomenology that allows us to characterize how different architectural components contribute towards the emergence of distinct algorithmic solutions. In particular, we showcase the existence of two qualitatively different mechanisms that implement a solution, relation- and inventory-based counting. Which solution a model can implement depends non-trivially on the precise choice of the attention mechanism, activation function, memorization capacity and the presence of a beginning-of-sequence token. By introspecting learned models on the counting task, we find evidence for the formation of both mechanisms. From a broader perspective, our analysis offers a framework to understand how the interaction of different architectural components of transformer models shapes diverse algorithmic solutions and approximations.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
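To make the histogram task concrete, here is a minimal sketch of how targets could be computed for an input sequence. The per-position output format (each position receives the count of its own token) is an assumption for illustration; the abstract only states that occurrences of each item are counted.

```python
# Sketch only: builds histogram-task targets for a sequence of tokens.
from collections import Counter


def histogram_targets(sequence):
    """For each position, return how often that position's token occurs
    in the whole sequence."""
    counts = Counter(sequence)
    return [counts[token] for token in sequence]


if __name__ == "__main__":
    print(histogram_targets(list("abca")))
```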
# スパース確率的ブールネットワーク構築に向けた離散的展望 A Discrete Perspective Towards the Construction of Sparse Probabilistic Boolean Networks ( http://arxiv.org/abs/2407.11543v1 ) ライセンス: Link先を確認	Christopher H. Fok, Chi-Wing Wong, Wai-Ki Ching,	(参考訳) ブールネットワーク(英語版)(BN)とその拡張確率ブールネットワーク(英語版)(PBN)は遺伝制御ネットワークを研究するための一般的な数学的モデルである。 BNとPBNは、製造業のモデルシステム、金融リスク、医療サービスシステムにも適用されている。本稿では,スパースPBNを構築するための新しいGreedy Entry removal (GER)アルゴリズムを提案する。既存のアルゴリズムとGERアルゴリズムの両方に対して理論上界を導出する。さらに、スパースPBNの構築における下界問題を初めて研究し、関連する理論結果のシリーズを導出する。合成データと実用データの両方に基づく数値実験では、GERは最先端のスパースPBN構築アルゴリズムの中で最高の性能を示し、テスト中の遷移確率行列のほとんどで可能な限りの分解を出力する。 Boolean Network (BN) and its extension Probabilistic Boolean Network (PBN) are popular mathematical models for studying genetic regulatory networks. BNs and PBNs are also applied to model manufacturing systems, financial risk and healthcare service systems. In this paper, we propose a novel Greedy Entry Removal (GER) algorithm for constructing sparse PBNs. We derive theoretical upper bounds for both existing algorithms and the GER algorithm. Furthermore, we are the first to study the lower bound problem of the construction of sparse PBNs, and to derive a series of related theoretical results. In our numerical experiments based on both synthetic and practical data, GER gives the best performance among state-of-the-art sparse PBN construction algorithms and outputs sparsest possible decompositions on most of the transition probability matrices being tested.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# スパース・デンス混合符号化プロセスにおけるMajoranaオブジェクトの無散逸位相量子計算 Dissipationless topological quantum computation for Majorana objects in sparse-dense mixed encoding process ( http://arxiv.org/abs/2407.11544v1 ) ライセンス: Link先を確認	Ye-Min Zhan, Guan-Dong Mao, Yu-Ge Chen, Yue Yu, Xi Luo,	(参考訳) 2量子ビットの量子ゲートの少なくともいくつかは、量子ビットのフェルミオン(電荷またはスピン)パリティに依存しているため、マヨラナ天体に基づくトポロジカル量子計算は重要な課題である。この依存は、量子回路モデル内で量子プロセスを進めようとするとき、これらのゲートを含む量子演算を確率的に表す。このようなアプローチは、測定が望ましくないフェルミオンパリティをもたらすと、重大な情報損失につながる。情報の浪費問題を解決するため,不要なフェルミオンパリティから所望のフェルミオンパリティへの情報の非散逸補正を可能にするトポロジカルな操作を考案した。我々は、制御NOTゲートに対してスパース・デンス混合符号化プロセスを用いて、計算量子ビットが持つ量子情報に影響を与えることなく、どのように修正を行うかを説明する。この補正プロセスは、望ましくない入力量子ビットかフェルミオンパリティ依存の量子ゲートのいずれかに適用することができ、Majorana-zero-modeベースおよびMajorana-edge-modeベースのトポロジカル量子計算に有効である。 Topological quantum computation based on Majorana objects is subject to a significant challenge because at least some of the two-qubit quantum gates rely on the fermion (either charge or spin) parity of the qubits. This dependency renders the quantum operations involving these gates probabilistic when attempting to advance quantum processes within the quantum circuit model. Such an approach leads to significant information loss whenever measurements yield the undesired fermion parity. To resolve the problem of wasting information, we devise topological operations that allow for the non-dissipative correction of information from undesired fermion parity to the desired one. We will use the sparse-dense mixed encoding process for the controlled-NOT gate as an example to explain how corrections can be implemented without affecting the quantum information carried by the computational qubits. This correction process can be applied to either the undesired input qubits or the fermion parity-dependent quantum gates, and it works for both Majorana-zero-mode-based and Majorana-edge-mode-based topological quantum computation.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# V2X-M2C:2つの接続を持つ効率的な多モジュール協調知覚 V2X-M2C: Efficient Multi-Module Collaborative Perception with Two Connections ( http://arxiv.org/abs/2407.11546v1 ) ライセンス: Link先を確認	Hyunchul Bae, Minhee Kang, Heejin Ahn,	(参考訳) 本稿では、他の車両や道路インフラとの通信による自動運転車の認識性能の向上について検討する。この目的のために、複数のモジュールからなる協調認識モデルV2X-M2Cを導入し、それぞれがエージェント間補完情報、空間的グローバルコンテキスト、空間的局所情報を生成する。既存のアーキテクチャがなぜシーケンシャルなのかという疑問に触発され、逐次(sequential)と並列(parallel)のモジュール接続の両方を分析します。逐次接続はモジュールを相乗化するが、並列接続は各モジュールを独立的に改善する。大規模な実験により、V2X-M2Cは最先端の知覚性能を達成し、検出精度は8.00%から10.87%に向上し、FLOPは42.81%から52.64%に低下した。 In this paper, we investigate improving the perception performance of autonomous vehicles through communication with other vehicles and road infrastructures. To this end, we introduce a collaborative perception model V2X-M2C, consisting of multiple modules, each generating inter-agent complementary information, spatial global context, and spatial local information. Inspired by the question of why most existing architectures are sequential, we analyze both the sequential and parallel connections of the modules. The sequential connection synergizes the modules, whereas the parallel connection independently improves each module. Extensive experiments demonstrate that V2X-M2C achieves state-of-the-art perception performance, increasing the detection accuracy by 8.00% to 10.87% and decreasing the FLOPs by 42.81% to 52.64%.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# 人格特性がネゴシエーションに与える影響 : 大規模言語モデルに基づくシミュレーション How Personality Traits Influence Negotiation Outcomes? A Simulation based on Large Language Models ( http://arxiv.org/abs/2407.11549v1 ) ライセンス: Link先を確認	Yin Jou Huang, Rafik Hadfi,	(参考訳) 心理学的証拠は、人格特性が意思決定に与える影響を明らかにしている。例えば、合意性は一般的に交渉において肯定的な結果と結びついているのに対し、神経症はしばしば好ましくない結果と結びついている。本稿では,Large Language Model (LLM) エージェントに着目したシミュレーションフレームワークを提案する。エージェントはドメインを交渉し、カスタマイズ可能なパーソナリティと目的を持つ。実験結果から, LLMシミュレーションの行動傾向は, 人間の交渉で観察された行動パターンを再現できることが示唆された。コントリビューションは2倍です。まず,LLMエージェントの言語的能力と経済的能力の整合性を検討するシミュレーション手法を提案する。第2に、二国間交渉の結果に対するビッグファイブの性格特性の戦略的影響に関する実証的な洞察を提供する。また, 合成交渉に基づく事例研究を行い, 騙し行動や妥協行動など, 興味深い行動を明らかにする。 Psychological evidence reveals the influence of personality traits on decision-making. For instance, agreeableness is generally associated with positive outcomes in negotiations, whereas neuroticism is often linked to less favorable outcomes. This paper introduces a simulation framework centered on Large Language Model (LLM) agents endowed with synthesized personality traits. The agents negotiate within bargaining domains and possess customizable personalities and objectives. The experimental results show that the behavioral tendencies of LLM-based simulations could reproduce behavioral patterns observed in human negotiations. The contribution is twofold. First, we propose a simulation methodology that investigates the alignment between the linguistic and economic capabilities of LLM agents. Secondly, we offer empirical insights into the strategic impact of Big-Five personality traits on the outcomes of bilateral negotiations. We also provide a case study based on synthesized bargaining dialogues to reveal intriguing behaviors, including deceitful and compromising behaviors.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# LLMにおけるKVキャッシュの最適化:予算削減のための適応配置 Optimizing KV Cache Eviction in LLMs: Adaptive Allocation for Enhanced Budget Utilization ( http://arxiv.org/abs/2407.11550v1 ) ライセンス: Link先を確認	Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. Kevin Zhou,	(参考訳) 大規模言語モデルは様々な分野で優れているが、長いシーケンス推論に必要な広範なKVキャッシュのために効率の限界に直面している。多くの取り組みは、実行中に非クリティカルなキャッシュ要素を排除し、生成品質を維持しながら、所定のメモリ予算内でキャッシュサイズを削減しようとしている。我々の根底にある原則の再検討は、戦略が基本的に特定の予算配分内での排除損失の上限の上限を最小化することを目的としていることを明確にしている。しかし,現在実施されている,異なる注意点にまたがる予算均等化の実践は,エビクション後の生成品質を低下させる傾向にある。これらの結果を踏まえ, 従来の一様割当手法の損失上限を理論的に超過せず, 自己保持機構の特性と効果的に整合し, 上限を実質的に低減する, 単純かつ効果的な適応的割当アルゴリズムを提案する。さらに、このアルゴリズムを最も進んだ2つの方法に統合すると、Ada-SnapKVとAda-Pyramidが得られる。 16のデータセットとNeedle-in-a-Haystackテストにわたる大規模な実験的検証は、Ada-SnapKVとAda-Pyramidがさらなる拡張を実現し、最先端のパフォーマンスの新たなベンチマークを確立することを確認している。 Large Language Models have excelled in various fields but encounter efficiency limitations due to the extensive KV cache required for long-sequence inference. Many efforts try to evict non-critical cache elements during runtime, thereby reducing cache size within a given memory budget while preserving generation quality. Our reexamination of their underlying principles discerns that prevailing strategies essentially aim to minimize an upper bound of eviction loss within a specific budget allocation. However, we observe that the current practice of uniformly allocating budgets across different attention heads during the eviction procedure tends to degrade the quality of generation post-eviction. In light of these findings, we propose a simple yet effective adaptive allocation algorithm that not only theoretically ensures its loss upper bound does not exceed that of previous uniform allocation methods, but also effectively aligns with the characteristics of the self-attention mechanism, thus practically reducing the upper bound. Further, integrating this algorithm with two of the most advanced methods yields Ada-SnapKV and Ada-Pyramid. 
Extensive experimental validation across 16 datasets and the Needle-in-a-Haystack test confirm that Ada-SnapKV and Ada-Pyramid achieve further enhancements, establishing new benchmarks in state-of-the-art performance.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
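The contrast between uniform and adaptive budget allocation across attention heads can be sketched as follows. This is a simplified illustration that assumes each cached token has one importance score per head and that the adaptive variant keeps the globally highest-scoring entries under the same total budget; the function names and the scoring are hypothetical and do not reproduce the Ada-SnapKV or Ada-Pyramid implementations.

```python
# Sketch only: two ways to decide which cached tokens each head keeps.
import heapq


def uniform_allocation(scores, budget_per_head):
    """Every head keeps its own top `budget_per_head` tokens."""
    kept = []
    for head in scores:
        order = sorted(range(len(head)), key=lambda i: head[i], reverse=True)
        kept.append(sorted(order[:budget_per_head]))
    return kept


def adaptive_allocation(scores, budget_per_head):
    """Same total budget, but entries compete across heads, so heads with
    concentrated attention keep fewer tokens and dispersed heads keep more."""
    total_budget = budget_per_head * len(scores)
    entries = [
        (score, head_index, token_index)
        for head_index, head in enumerate(scores)
        for token_index, score in enumerate(head)
    ]
    kept = [[] for _ in scores]
    for _, head_index, token_index in heapq.nlargest(total_budget, entries):
        kept[head_index].append(token_index)
    return [sorted(indices) for indices in kept]


def retained_mass(scores, kept):
    """Total score that survives eviction under a given selection."""
    return sum(scores[h][i] for h, indices in enumerate(kept) for i in indices)


if __name__ == "__main__":
    scores = [
        [0.90, 0.05, 0.03, 0.02],  # concentrated head
        [0.40, 0.30, 0.20, 0.10],  # dispersed head
    ]
    print(uniform_allocation(scores, 2))
    print(adaptive_allocation(scores, 2))
```

In this toy case the adaptive selection retains more total attention mass than the uniform one, which is the intuition behind reallocating budget between heads.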
# 変圧器と2D-CNNによる電力負荷系列のグローバルおよび局所的特徴の学習:位相空間再構成を考慮した画像に基づく多段階予測手法 Learning Global and Local Features of Power Load Series Through Transformer and 2D-CNN: An image-based Multi-step Forecasting Approach Incorporating Phase Space Reconstruction ( http://arxiv.org/abs/2407.11553v1 ) ライセンス: Link先を確認	Zihan Tang, Tianyao Ji, Wenhu Tang,	(参考訳) 現代の電力システムは進化を続けており、正確な電力負荷予測は依然として重要な問題である。位相空間再構成法はシステム力学の観点から電力負荷のカオス的特性を効果的に維持することができ、電力負荷予測のための有望な知識ベース前処理法である。しかし、その基本的な理論によって制限されているため、現在の研究では、多段階予測スキームの実装にはまだギャップがある。このギャップを埋めるために、PSRとニューラルネットワークを統合することで、新しい多段階予測手法を提案する。まず,PSRの前処理から得られる位相軌跡の有用な特徴について詳細に述べる。数学的導出を通じて、PSRと他の時系列前処理法であるパッチセグメンテーションの等価な特徴を初めて示す。この事前知識に基づいて,グローバルかつ局所的な特徴抽出戦略を用いた画像ベースモデリングの視点を導入する。その後、画像のグローバルパターンと局所パターンの抽出にトランスフォーマーエンコーダと2次元畳み込みニューラルネットワークを用いるエンド・ツー・エンド処理のために、PSR-GALIENと呼ばれる新しいディープラーニングモデルが設計され、効率的な相関モデリングに多層認識に基づく予測器が使用される。次に、実世界の5つのベンチマークデータセットで広範な実験を行い、有効性を検証するとともに、詳細な特性に関する洞察を得る。その結果、PSR-GALIENの予測性能は、最先端の6つのディープラーニングモデルと比較すると、これらのベースラインを一貫して上回り、日中・日中両方の予測シナリオにおいて優れた精度が得られることがわかった。同時に,予測結果の属性を説明するために,可視化に基づく手法を提案する。 As modern power systems continue to evolve, accurate power load forecasting remains a critical issue. The phase space reconstruction method can effectively retain the chaotic characteristics of power load from a system dynamics perspective and thus is a promising knowledge-based preprocessing method for power load forecasting. However, limited by its fundamental theory, there is still a gap in implementing a multi-step forecasting scheme in current studies. To bridge this gap, this study proposes a novel multi-step forecasting approach by integrating the PSR with neural networks. Firstly, the useful features in the phase trajectory obtained from the preprocessing of PSR are discussed in detail. Through mathematical derivation, the equivalent characterization of the PSR and another time series preprocessing method, patch segmentation, is demonstrated for the first time. 
Based on this prior knowledge, an image-based modeling perspective with the global and local feature extraction strategy is introduced. Subsequently, a novel deep learning model, namely PSR-GALIEN, is designed for end-to-end processing, in which the Transformer Encoder and 2D-convolutional neural networks are employed for the extraction of the global and local patterns in the image, and a multi-layer perceptron based predictor is used for the efficient correlation modeling. Then, extensive experiments are conducted on five real-world benchmark datasets to verify the effectiveness as well as to have an insight into the detailed properties. The results show that, compared with six state-of-the-art deep learning models, PSR-GALIEN consistently surpasses these baselines in forecasting performance, achieving superior accuracy in both intra-day and day-ahead forecasting scenarios. At the same time, a visualization-based method is proposed to explain the attributions of the forecasting results.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
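Phase space reconstruction is commonly realized as a time-delay embedding, and the link to patch segmentation mentioned in the abstract can be seen in a small sketch. The embedding dimension, the delay, and the function name are illustrative assumptions; the paper's preprocessing pipeline may differ.

```python
# Sketch only: time-delay embedding of a one-dimensional load series.

def delay_embed(series, dim, tau):
    """Return the phase-space points [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    count = len(series) - (dim - 1) * tau
    if count <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [[series[i + k * tau] for k in range(dim)] for i in range(count)]


if __name__ == "__main__":
    load = [1, 2, 3, 4, 5]
    # With tau = 1 every point is a sliding patch of length dim.
    print(delay_embed(load, 3, 1))
    # A larger delay spreads each point over non-adjacent samples.
    print(delay_embed(load, 2, 2))
```

Stacking these points as rows gives a two-dimensional array, which is one way to read the image-based modeling perspective described above.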
# 拡散モデルを用いた微量試料の自己誘導生成 Self-Guided Generation of Minority Samples Using Diffusion Models ( http://arxiv.org/abs/2407.11555v1 ) ライセンス: Link先を確認	Soobin Um, Jong Chul Ye,	(参考訳) データ多様体の低密度領域に居住する少数サンプルを生成するための新しい手法を提案する。このフレームワークは拡散モデルに基づいて構築されており、推定時間中に任意のエネルギーベースのガイダンスを組み込んだガイドサンプリングの原理を利用している。サンプルのキーとなる特徴は、事前訓練されたモデルでのみ実装可能な、自己完結型(self-contained)の性質にある。これは、マイノリティ世代のための高価な追加コンポーネント(外部分類器など)を必要とする既存の技術と、サンプルを区別します。具体的には、まず、その後部平均値の再構成損失を評価することにより、中間潜伏試料に含まれる特徴の可能性を推定する。生成は推定された推定値の最小化に進み、その後の時間経過の潜伏サンプルにおける少数の特徴の出現を促す。提案手法では,提案手法の精度向上を図るため,提案手法の有効性を適切に管理する時間スケジューリング手法を提案する。ベンチマーク実データセットの実験により、我々のアプローチは、コストのかかる追加要素に頼ることなく、既存の技術よりもリアルに低品質なマイノリティインスタンスを作成する能力を大幅に改善できることを示した。コードは https://github.com/soobin-um/sg-minority で公開されている。 We present a novel approach for generating minority samples that live on low-density regions of a data manifold. Our framework is built upon diffusion models, leveraging the principle of guided sampling that incorporates an arbitrary energy-based guidance during inference time. The key defining feature of our sampler lies in its self-contained nature, i.e., implementable solely with a pretrained model. This distinguishes our sampler from existing techniques that require expensive additional components (like external classifiers) for minority generation. Specifically, we first estimate the likelihood of features within an intermediate latent sample by evaluating a reconstruction loss w.r.t. its posterior mean. The generation then proceeds with the minimization of the estimated likelihood, thereby encouraging the emergence of minority features in the latent samples of subsequent timesteps. To further improve the performance of our sampler, we provide several time-scheduling techniques that properly manage the influence of guidance over inference steps. 
Experiments on benchmark real datasets demonstrate that our approach can greatly improve the capability of creating realistic low-likelihood minority instances over the existing techniques without the reliance on costly additional elements. Code is available at https://github.com/soobin-um/sg-minority.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# RobotKeyframing:DenseとSparse Rewardsを併用した高レベルオブジェクトによるロコモーション学習 RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards ( http://arxiv.org/abs/2407.11562v1 ) ライセンス: Link先を確認	Fatemeh Zargarbashi, Jin Cheng, Dongho Kang, Robert Sumner, Stelian Coros,	(参考訳) 本稿では,手足ロボットの自然な移動にキーフレーミングを用いて高次目標を組み込む新しい学習ベース制御フレームワークを提案する。これらの高レベルな目的は、任意に時間内に空間化された部分的または完全なポーズターゲットの可変数として指定される。提案手法は,高密度およびスパース報酬の混合を効果的に処理するために,多項強化学習アルゴリズムを利用する。さらに、トランスフォーマーベースのエンコーダを使用して、入力ターゲットの可変数に対応し、それぞれが特定の時間から到着時間に関連付けられている。シミュレーションとハードウェア実験を通じて,本フレームワークが要求されたタイミングでターゲットキーフレームシーケンスを効果的に満足できることを実証した。実験では、マルチクリティック法は標準の単一クリティック法と比較してハイパーパラメータチューニングの労力を大幅に削減する。さらに,トランスフォーマーをベースとしたアーキテクチャにより,ロボットは将来の目標を予測でき,目標達成能力の定量的改善が期待できる。 This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# SGIFormer:3次元インスタンスセグメンテーションのための意味誘導型および幾何学強化型インターリーブ変換器 SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation ( http://arxiv.org/abs/2407.11564v1 ) ライセンス: Link先を確認	Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau,	(参考訳) 近年、トランスフォーマーベースのモデルでは、ポイントクラウドインスタンスのセグメンテーションにかなりの可能性がある。既存のメソッドが達成した有望なパフォーマンスにもかかわらず、インスタンスクエリの初期化問題や積み重ねられたレイヤへの過度な依存といった課題に直面し、大規模な3Dシーンと互換性がない。本稿ではSGIFormerという,SMQの初期化とGeometric-enhanced Interleaving Transformer(GIT)デコーダで構成される3Dインスタンスセグメンテーションのための新しい手法を提案する。具体的には、SMQ初期化方式の原則として、予測されたボクセルのセマンティック情報を利用して、暗黙的にシーン認識クエリを生成し、適切なシーンを事前に生成し、学習可能なクエリセットを補償する。その後、生成した全クエリをGITデコーダに入力し、インスタンスクエリとグローバルシーン機能を交互に洗練し、より詳細な情報を取得し、複雑な設計の複雑さを同時に低減する。幾何的特性を強調するため、偏差推定を補助的タスクとみなし、シフト点座標の埋め込みを段階的に統合し、インスタンスの局所化を強化する。 SGIFormerは、ScanNet V2、ScanNet200データセット、そして挑戦的な高忠実なScanNet++ベンチマークで最先端のパフォーマンスを達成し、正確性と効率のバランスを保った。コード、ウェイト、デモビデオは https://rayyoh.github.io/sgiformer で公開されている。 In recent years, transformer-based models have exhibited considerable potential in point cloud instance segmentation. Despite the promising performance achieved by existing methods, they encounter challenges such as instance query initialization problems and excessive reliance on stacked layers, rendering them incompatible with large-scale 3D scenes. This paper introduces a novel method, named SGIFormer, for 3D instance segmentation, which is composed of the Semantic-guided Mix Query (SMQ) initialization and the Geometric-enhanced Interleaving Transformer (GIT) decoder. Specifically, the principle of our SMQ initialization scheme is to leverage the predicted voxel-wise semantic information to implicitly generate the scene-aware query, yielding adequate scene prior and compensating for the learnable query set. Subsequently, we feed the formed overall query into our GIT decoder to alternately refine instance query and global scene features for further capturing fine-grained information and reducing complex design intricacies simultaneously. 
To emphasize geometric property, we consider bias estimation as an auxiliary task and progressively integrate shifted point coordinates embedding to reinforce instance localization. SGIFormer attains state-of-the-art performance on ScanNet V2, ScanNet200 datasets, and the challenging high-fidelity ScanNet++ benchmark, striking a balance between accuracy and efficiency. The code, weights, and demo videos are publicly available at https://rayyoh.github.io/sgiformer.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# TGIF:テキスト入力による偽造データ TGIF: Text-Guided Inpainting Forgery Dataset ( http://arxiv.org/abs/2407.11566v1 ) ライセンス: Link先を確認	Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wallendael, Peter Lambert, Symeon Papadopoulos,	(参考訳) デジタル画像操作は、生成AI技術の出現により、ますますアクセスしやすく、現実的なものになりつつある。近年の進歩により、テキストガイドによるインペイントが可能となり、最小限の努力で高度な画像編集が可能になった。これはデジタルメディアの法医学に新たな課題をもたらす。例えば、拡散モデルに基づくアプローチは、インペイントされた領域を元の画像に継ぎ合わせる(スプライスする)か、あるいは全体像を再生することができる。後者の場合、従来のイメージフォージェリーローカライゼーション(IFL)メソッドは通常失敗する。本稿では,画像フォージェリローカライゼーションと合成画像検出(SID)手法のトレーニングと評価を支援するために設計された画像の包括的コレクションであるText-Guided Inpainting Forgery (TGIF)データセットを紹介する。 TGIFデータセットには、人気のオープンソースおよび商用メソッドであるSD2、SDXL、Adobe Fireflyから派生した、約80kの偽画像が含まれている。このデータを用いて、いくつかの最先端の IFL と SID の手法をベンチマークする。従来のIFL法ではスプライシング画像が検出できるが、再生されたインペイント画像は検出できない。さらに、従来のSIDは、再生したインペイント画像が偽のものであることを検出できるが、インペイントされた領域をローカライズすることはできない。最後に、どちらの手法も強い圧縮にさらされると失敗するが、WEBPのような現代の圧縮アルゴリズムでは堅牢ではない。この研究は、現代の生成的アプローチによる局所的な操作に対する最先端検出器の非効率性を実証し、より有能なIFL法とSID法の開発を支援することを目的としている。データセットは https://github.com/IDLabMedia/tgif-dataset からダウンロードできる。 Digital image manipulation has become increasingly accessible and realistic with the advent of generative AI technologies. Recent developments allow for text-guided inpainting, making sophisticated image edits possible with minimal effort. This poses new challenges for digital media forensics. For example, diffusion model-based approaches could either splice the inpainted region into the original image, or regenerate the entire image. In the latter case, traditional image forgery localization (IFL) methods typically fail. This paper introduces the Text-Guided Inpainting Forgery (TGIF) dataset, a comprehensive collection of images designed to support the training and evaluation of image forgery localization and synthetic image detection (SID) methods. The TGIF dataset includes approximately 80k forged images, originating from popular open-source and commercial methods: SD2, SDXL, and Adobe Firefly. 
Using this data, we benchmark several state-of-the-art IFL and SID methods. Whereas traditional IFL methods can detect spliced images, they fail to detect regenerated inpainted images. Moreover, traditional SID may detect the regenerated inpainted images to be fake, but cannot localize the inpainted area. Finally, both types of methods fail when exposed to stronger compression, while they are less robust to modern compression algorithms, such as WEBP. As such, this work demonstrates the inefficiency of state-of-the-art detectors on local manipulations performed by modern generative approaches, and aspires to help with the development of more capable IFL and SID methods. The dataset can be downloaded at https://github.com/IDLabMedia/tgif-dataset.	翻訳日:2024-07-17 15:42:36 公開日:2024-07-16
# 量子コヒーレンス, ダイナミクスとその接続 Quantum Coherence, Dynamics and Their connections ( http://arxiv.org/abs/2407.11568v1 ) ライセンス: Link先を確認	Hai Wang,	(参考訳) 量子コヒーレンス(quantum coherence)とは、量子力学の重ね合わせの性質に根ざした概念である。これまで、コヒーレンスの概念は射影やPOVMにも一般化され、様々なコヒーレンス対策が提案されてきた。しかし、量子資源理論の枠組みの下では、コヒーレンスとその様々な尺度の解釈は、通常、オペレーショナルな解釈に分類される。これまでのところ、コヒーレンスや絡み合いのような量子資源が状態の進化や力学過程においてどのような役割を果たすのかは明らかになっていない。この研究において、平均量子距離という新しい概念を導入することで、時間に依存しないハミルトン派にとって、平均の進化速度を決定するのは量子コヒーレンスであることを示す。さらに、単一量子ビット系では、システムの進化を完全に決定するのはコヒーレンスであることを示す。さらに、時間依存ハミルトニアンに対しては、ある量子ビットの進化における距離の上限を1つ与える。一般量子系では、量子コヒーレンスと量子力学の関係についても論じる。また,1つの副産物として,射影に関するコヒーレンス尺度の基準を検討するとともに,秩序保存条件と呼ばれる新しい条件を提案する。射影に対する新しいコヒーレンス尺度が提案されている。正則基底に関するコヒーレンス条件と比較すると、順序保存というこの新しい条件は、ヒルベルト空間の直交分解と直交基底の違いを強調し、我々の新しい測度で満たされる。 Quantum coherence, rooted in the superposition nature of quantum mechanics, is one core concept. Until now, the concept of coherence has also been generalized into projections and POVMs, and various coherence measures have been proposed. However, under the framework of quantum resource theory, interpretations of coherence and its various measures are usually grouped into the operational interpretation, where coherence measures correspond to the successful probability of discrimination tasks. Until now, it is still not clear which role quantum resources like coherence and entanglement play in states' evolutions and dynamical processes. In this work, by introducing the new concept, average quantum distance, we show that for time-independent Hamiltonians, it is quantum coherence that determines the evolution speed on average. Furthermore, for the single qubit system, we show that it is coherence that completely determines the system's evolution. Besides, for time-dependent Hamiltonians, we also give one upper bound for distance during one qubit's evolution. For general quantum systems, the relation between quantum coherence and quantum dynamics is also discussed. 
And as one byproduct, we examine the criterion for coherence measures about projections and propose one new condition called order preserving condition. One new coherence measure for projections is proposed. Compared with conditions for coherence about orthonormal basis, this new condition, order preserving, highlights the difference between orthogonal decompositions and orthonormal basis of Hilbert spaces and is satisfied by our new measure.	翻訳日:2024-07-17 15:32:52 公開日:2024-07-16
# SFPNet:Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds ( http://arxiv.org/abs/2407.11569v1 ) ライセンス: Link先を確認	Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen,	(参考訳) LiDARセマンティックセグメンテーションは急速に進歩するが、機械的回転LiDARから派生したベンチマークから派生した特異的に設計された帰納的バイアスを組み込むことが多い。これはモデル一般化性を他の種類のLiDAR技術に制限し、ハイパーパラメータチューニングをより複雑にすることができる。これらの課題に対処するため,我々は,ウィンドウアテンションをスパース焦点変調に置き換えることにより,マーケットで広く普及している様々な種類のLiDARに対応するための一般化されたフレームワークを提案する。我々のSFPNetは、複数のレベルのコンテキストを抽出し、ゲート機構を用いて動的に集約することができる。チャネルワイズ情報クエリを実装することにより、ローカルコンテキストとグローバルコンテキストの両方を含む機能をエンコードする。また,ロボットアプリケーションのための大規模ハイブリッドソリッドLiDARセマンティックセグメンテーションデータセットについても紹介する。 SFPNetは、メカニカルスピンLiDARから派生した従来のベンチマークで競争力のある性能を示し、固体LiDARから派生したベンチマークでは最先端の結果を達成する。さらに、ハイブリッドソリッドLiDARをベースとした新しいデータセットの既存手法よりも優れています。コードとデータセットは https://github.com/Cavendish518/SFPNet と https://www.semanticindustry.top で入手できる。 Although LiDAR semantic segmentation advances rapidly, state-of-the-art methods often incorporate specifically designed inductive bias derived from benchmarks originating from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework to accommodate various types of LiDAR prevalent in the market by replacing window-attention with our sparse focal point modulation. Our SFPNet is capable of extracting multi-level contexts and dynamically aggregating them using a gate mechanism. By implementing a channel-wise information query, features that incorporate both local and global contexts are encoded. We also introduce a novel large-scale hybrid-solid LiDAR semantic segmentation dataset for robotic applications. 
SFPNet demonstrates competitive performance on conventional benchmarks derived from mechanical spinning LiDAR, while achieving state-of-the-art results on benchmarks derived from solid-state LiDAR. Additionally, it outperforms existing methods on our novel dataset sourced from hybrid-solid LiDAR. Code and dataset are available at https://github.com/Cavendish518/SFPNet and https://www.semanticindustry.top.	翻訳日:2024-07-17 15:32:52 公開日:2024-07-16
# ベル不等式測定と量子状態トモグラフィー Undergraduate setup for measuring the Bell inequalities and performing Quantum State Tomography ( http://arxiv.org/abs/2407.11570v1 ) ライセンス: Link先を確認	Raul Lahoz Sanz, Lidia Lozano Martín, Adrià Brú i Cortés, Martí Duocastella, Jose M. Gómez Cama, Bruno Juliá-Díaz,	(参考訳) 量子技術の成長は、量子絡み合いや量子重畳といった概念を学びたがっている多くの学生の興味を引き付けている。しかし、これらの概念の直観的でない性質は、それらを理解することの難しさを招いている。ここでは、2光子状態の完全なトモグラフィーを得ることができ、ベル試験(CHSH不等式)を行うことができる絡み合った光子システムを提案する。提案されたセットアップは多用途で費用対効果があり、複数の教室操作モードが可能である。本稿ではベル不等式測定と量子状態トモグラフィーの2つの変種について述べる。実験の結果、光子の量子状態の操作が成功し、高忠実な絡み合った状態が達成され、ベルの不等式が著しく破られた。セットアップの単純さと手頃さは、あまり専門的でない研究室のアクセシビリティを高め、学生が量子物理学の概念に精通することを可能にする。 The growth of quantum technologies is attracting the interest of many students eager to learn concepts such as quantum entanglement or quantum superposition. However, the non-intuitive nature of these concepts poses a challenge to understanding them. Here, we present an entangled photon system which can perform a Bell test, i.e. the CHSH inequality, and can obtain the complete tomography of the two-photon state. The proposed setup is versatile, cost-effective and allows for multiple classroom operating modes. We present two variants, both facilitating the measurement of Bell inequalities and quantum state tomography. Experimental results showcase successful manipulation of the quantum state of the photons, achieving high-fidelity entangled states and significant violations of Bell's inequalities. Our setup's simplicity and affordability enhances accessibility for less specialized laboratories, allowing students to familiarize themselves with quantum physics concepts.	翻訳日:2024-07-17 15:32:52 公開日:2024-07-16
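The Bell test named in the abstract above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis code: it assumes an ideal polarization-entangled state |Φ+> = (|HH> + |VV>)/√2 and ideal linear analyzers, and the function names and the chosen analyzer angles are illustrative assumptions.

```python
import math

def analyzer_basis(theta):
    """Return the (+1, -1) outcome polarization vectors of an ideal linear analyzer at angle theta."""
    plus = (math.cos(theta), math.sin(theta))
    minus = (-math.sin(theta), math.cos(theta))
    return plus, minus

def amplitude(u_a, u_b):
    """Overlap <u_a, u_b | Phi+> for the assumed state |Phi+> = (|HH> + |VV>)/sqrt(2)."""
    return (u_a[0] * u_b[0] + u_a[1] * u_b[1]) / math.sqrt(2)

def correlation(a, b):
    """E(a, b): sum over the four joint outcomes of (sign product) * (joint probability)."""
    e = 0.0
    for sign_a, u_a in zip((+1, -1), analyzer_basis(a)):
        for sign_b, u_b in zip((+1, -1), analyzer_basis(b)):
            e += sign_a * sign_b * amplitude(u_a, u_b) ** 2
    return e

def chsh(a, a2, b, b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return correlation(a, b) - correlation(a, b2) + correlation(a2, b) + correlation(a2, b2)

# Assumed analyzer settings: 0 and 45 degrees on one side, 22.5 and 67.5 degrees on the other.
S = chsh(0.0, math.pi / 4, math.pi / 8, 3 * math.pi / 8)
print(f"S = {S:.4f} (local hidden-variable bound: 2)")
```

For this ideal state the correlation reduces to E(a, b) = cos 2(a − b), so the settings above give S = 2√2. A real setup would estimate each E from measured coincidence counts, and imperfect state fidelity lowers S.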
# グリッド信頼性向上のためのフェデレーション学習予測とレジリエンス市場 Federated Learning Forecasting for Strengthening Grid Reliability and Enabling Markets for Resilience ( http://arxiv.org/abs/2407.11571v1 ) ライセンス: Link先を確認	Lucas Pereira, Vineet Jagadeesan Nair, Bruno Dias, Hugo Morais, Anuradha Annaswamy,	(参考訳) 分散エネルギー資源に富む将来の電力グリッドの信頼性とレジリエンスを高めるための包括的アプローチを提案する。分散方式では,フェデレーション学習に基づく攻撃検出と,地域電気市場による攻撃軽減手法を組み合わせた。太陽PVに富んだ実世界の配電網に適用し,その有効性を検証した。シミュレーションの結果、このアプローチは実現可能であり、サイバー物理攻撃によるグリッドの影響を軽減できることが示されている。 We propose a comprehensive approach to increase the reliability and resilience of future power grids rich in distributed energy resources. Our distributed scheme combines federated learning-based attack detection with a local electricity market-based attack mitigation method. We validate the scheme by applying it to a real-world distribution grid rich in solar PV. Simulation results demonstrate that the approach is feasible and can successfully mitigate the grid impacts of cyber-physical attacks.	翻訳日:2024-07-17 15:32:52 公開日:2024-07-16
# 医用画像分類のための視覚変換器のフェデレーションパラメータ効率の良い微調整の有効性の検証 Probing the Efficacy of Federated Parameter-Efficient Fine-Tuning of Vision Transformers for Medical Image Classification ( http://arxiv.org/abs/2407.11573v1 ) ライセンス: Link先を確認	Naif Alkhunaizi, Faris Almalik, Rouqaiah Al-Refai, Muzammal Naseer, Karthik Nandakumar,	(参考訳) 大規模な事前学習型トランスモデルの出現により、これらのモデルを様々な下流タスク向けに微調整することが重要な問題である。トレーニングデータの正確性、データサイロの存在、厳密なプライバシー制約は、医療画像領域におけるこの微調整問題を悪化させ、事前訓練されたモデルの協調的な微調整を可能にするアルゴリズムの強い必要性を生み出します。さらに、これらのモデルの大規模化は、フェデレート学習における通信負担を軽減するために、パラメータ効率のよい微調整(PEFT)を使用する必要がある。本研究では,医用画像分類のためのビジョントランスフォーマー(ViT)モデル(大容量の自然画像データセットで事前学習)を適応するための各種PEFT戦略を体系的に検討する。既知のPEFT技術の評価とは別に、視覚的プロンプトチューニング(VPT)、視覚的プロンプトの低ランク分解、確率的ブロックアテンション微調整、低ランク適応(LoRA)+VPTのようなハイブリッドPEFT手法などの新しいPEFTアルゴリズムを導入した。さらに、フェデレーション設定のための最適なPEFT法を同定し、特に外部領域(OOD)および非IIDデータにおいて、フェデレーションされたPEFTに対するデータ分布の影響を理解するための徹底的な実験分析を行う。本研究の重要な洞察は,ほとんどのPEFT法がドメイン内転送に有効であるが,OODおよび非IIDシナリオを扱う場合,高い精度で効率のトレードオフが生じることである。具体的には、微調整/交換パラメータの桁違いの縮小は、4%の精度低下につながる可能性がある。したがって、フェデレートされたPEFTには、初期モデル選択が不可欠である。一般的な視覚モデルよりも、ドメイン内の医療画像データ(利用可能であれば)から学んだ医療基礎モデルを使うことが好ましい。 With the advent of large pre-trained transformer models, fine-tuning these models for various downstream tasks is a critical problem. Paucity of training data, the existence of data silos, and stringent privacy constraints exacerbate this fine-tuning problem in the medical imaging domain, creating a strong need for algorithms that enable collaborative fine-tuning of pre-trained models. Moreover, the large size of these models necessitates the use of parameter-efficient fine-tuning (PEFT) to reduce the communication burden in federated learning. In this work, we systematically investigate various federated PEFT strategies for adapting a Vision Transformer (ViT) model (pre-trained on a large natural image dataset) for medical image classification. 
Apart from evaluating known PEFT techniques, we introduce new federated variants of PEFT algorithms such as visual prompt tuning (VPT), low-rank decomposition of visual prompts, stochastic block attention fine-tuning, and hybrid PEFT methods like low-rank adaptation (LoRA)+VPT. Moreover, we perform a thorough empirical analysis to identify the optimal PEFT method for the federated setting and understand the impact of data distribution on federated PEFT, especially for out-of-domain (OOD) and non-IID data. The key insight of this study is that while most federated PEFT methods work well for in-domain transfer, there is a substantial accuracy vs. efficiency trade-off when dealing with OOD and non-IID scenarios, which is commonly the case in medical imaging. Specifically, every order of magnitude reduction in fine-tuned/exchanged parameters can lead to a 4% drop in accuracy. Thus, the initial model choice is crucial for federated PEFT. It is preferable to use medical foundation models learned from in-domain medical image data (if available) rather than general vision models.	翻訳日:2024-07-17 15:32:52 公開日:2024-07-16
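To make the communication argument in the abstract above concrete, the sketch below counts the parameters a client would have to exchange for one weight matrix under full fine-tuning versus a rank-r LoRA update. This is back-of-the-envelope arithmetic under stated assumptions, not the paper's protocol: the hidden size 768 is an assumed ViT-Base-like value, and the function names are illustrative.

```python
def full_params(d_out, d_in):
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Parameters in a LoRA update W + B @ A, with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)

hidden = 768  # assumed hidden size, for illustration only
for rank in (1, 4, 16):
    ratio = lora_params(hidden, hidden, rank) / full_params(hidden, hidden)
    print(f"rank={rank:2d}: {lora_params(hidden, hidden, rank):6d} params "
          f"({100 * ratio:.2f}% of the full matrix)")
```

Since only the low-rank factors are trained, only they need to be sent per round, so the payload shrinks by the same ratio. The abstract's accuracy trade-off concerns how far this reduction can be pushed under OOD and non-IID data.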
# コンフィギュアブルトラベルレコメンダシステムの必要性について:システマティックマッピングによる検討 On the Need for Configurable Travel Recommender Systems: A Systematic Mapping Study ( http://arxiv.org/abs/2407.11575v1 ) ライセンス: Link先を確認	Rickson Simioni Pereira, Claudio Di Sipio, Martina De Sanctis, Ludovico Iovino,	(参考訳) トラベルレコメンダシステム(TRS)は、ユーザの好みに基づいて、旅行領域の選択の負担を軽減するために提案されている。 TRSが提供する機能やデータには幅広い類似性があるにもかかわらず、これらのシステムは、それらが動作する多様で異質なコンテキストに大きく影響される。これは、提供される旅行レコメンデーションの正確性と適切性を決定する上で重要な役割を担っている。例えば、スマートシティやナチュラルパークのようなコンテキストにおいて、交通条件やパスステータスなどの多様なランタイム情報をそれぞれ利用して、特定のコンテキスト内でユーザの好みに合わせた適切なレコメンデーションのデリバリを保証することができる。 Travel Recommender Systems (TRSs) have been proposed to ease the burden of choice in the travel domain by providing valuable suggestions based on user preferences. Despite the broad similarities in functionalities and data provided by TRSs, these systems are significantly influenced by the diverse and heterogeneous contexts in which they operate. This plays a crucial role in determining the accuracy and appropriateness of the travel recommendations they deliver. For instance, in contexts like smart cities and natural parks, diverse runtime information, such as traffic conditions and trail status, respectively, should be utilized to ensure the delivery of pertinent recommendations aligned with user preferences within the specific context. However, there is a trend to build TRSs from scratch for different contexts rather than supporting developers with configuration approaches that promote reuse, minimize errors, and accelerate time-to-market. To illustrate this gap, in this paper, we conduct a systematic mapping study to examine the extent to which existing TRSs are configurable for different contexts. The conducted analysis reveals the lack of configuration support assisting TRSs providers in developing TRSs closely tied to their operational context. Our findings shed light on uncovered challenges in the domain, thus fostering future research focused on providing new methodologies enabling providers to handle TRSs configurations.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# UP-Diff:リモートセンシング都市予測のための潜時拡散モデル UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction ( http://arxiv.org/abs/2407.11578v1 ) ライセンス: Link先を確認	Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo,	(参考訳) 本研究は,既存の都市レイアウト情報と計画変更マップを利用して都市レイアウトを予測することを目的とした,今後の都市計画に焦点を当てた新しいリモートセンシング(RS)都市予測(UP)タスクを提案する。提案するRS UPタスクに対処するため,変更前の都市レイアウトと計画変更マップの位置認識埋め込みを捉えるために,LDM(Latent Diffusion Model)を利用するUP-Diffを提案する。具体的には、UP-Diffの反復拡散モジュール内のトレーニング可能なクロスアテンション層は、ターゲットとなる修正のための重要な領域を動的にハイライトすることができる。 UP-Diffを利用することで、設計者は変更マップを動的かつ適応的に変更することにより、将来の都市計画を効果的に洗練・調整することができる。従来の RS 変更検出 (CD) 法と比較して,提案した RS UP タスクの UP-Diff は,都市開発における実用性を高めるために,ペア化事前変更画像と後変更画像の要求を回避している。 LEVIR-CDとSYSU-CDデータセットの実験結果は、UP-Diffが将来の都市レイアウトを高い忠実度で正確に予測できることを示し、都市計画の可能性を示している。コードとモデルの重み付けは、作業の受理時に利用可能になる。 This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts by utilizing information from existing urban layouts and planned change maps. To address the proposed RS UP task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture position-aware embeddings of pre-change urban layouts and planned change maps. Specifically, the trainable cross-attention layers within UP-Diff's iterative diffusion modules enable the model to dynamically highlight crucial regions for targeted modifications. By utilizing our UP-Diff, designers can effectively refine and adjust future urban city plans by making modifications to the change maps in a dynamic and adaptive manner. Compared with conventional RS Change Detection (CD) methods, the proposed UP-Diff for the RS UP task avoids the requirement of paired pre-change and post-change images, which enhances the practical usage in city development. Experimental results on LEVIR-CD and SYSU-CD datasets show UP-Diff's ability to accurately predict future urban layouts with high fidelity, demonstrating its potential for urban planning. 
Code and model weights will be available upon the acceptance of the work.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# 不完全都市移動データセットにおける停止位置検出の強化 Enhancing stop location detection for incomplete urban mobility datasets ( http://arxiv.org/abs/2407.11579v1 ) ライセンス: Link先を確認	Margherita Bertè, Rashid Ibrahimli, Lars Koopmans, Pablo Valgañón, Nicola Zomer, Davide Colombi,	(参考訳) 人体移動研究における位置検出の停止は、都市計画、交通ネットワーク設計、疫学モデリング、社会経済的分離分析など、複数の分野に影響を及ぼす。しかし、従来の密度クラスタリングアルゴリズムはノイズや不完全なGPSデータセットに悩まされることが多いため、これは依然として難しい課題である。本研究は, 位置同定のための密度に基づく手法を強化するために, 分類アルゴリズムの適用について検討する。提案手法は,様々な時間スケールにわたる個別のルーチン行動や,個々のGPS点の局所的特徴など,複数の特徴を取り入れている。データセットは、以前にシーケンス指向の密度依存アルゴリズムで停止とラベル付けされたプライバシー保護および匿名化されたGPSポイントを含む。選択した停止点から点密度を除去してデータギャップをシミュレートし、スパースデータ条件下での性能を評価する。モデルは、軌道内の個々のGPSポイントを、潜在的な停止またはノンストップとして分類する。データセットの高度に不均衡な性質を考慮し、性能評価の精度よりもリコールを優先した。以上の結果から, 時空間ギャップの存在下においてもほとんどの停止を検知し, 偽陽性と分類された点が, 通常は以前の停止点に近い装置の繰り返し位置に対応することが示唆された。この研究は移動分析技術に寄与するが、重要な課題は残る。基底真理データの欠如は、アルゴリズムの正確性に関する決定的な結論を制限している。多様なデータセットにまたがる手法を検証し、集団行動入力を組み込むためには、さらなる研究が必要である。 Stop location detection, within human mobility studies, has an impact on multiple fields including urban planning, transport network design, epidemiological modeling, and socio-economic segregation analysis. However, it remains a challenging task because classical density clustering algorithms often struggle with noisy or incomplete GPS datasets. This study investigates the application of classification algorithms to enhance density-based methods for stop identification. Our approach incorporates multiple features, including individual routine behavior across various time scales and local characteristics of individual GPS points. The dataset comprises privacy-preserving and anonymized GPS points previously labeled as stops by a sequence-oriented, density-dependent algorithm. We simulated data gaps by removing point density from select stops to assess performance under sparse data conditions. The model classifies individual GPS points within trajectories as potential stops or non-stops. 
Given the highly imbalanced nature of the dataset, we prioritized recall over precision in performance evaluation. Results indicate that this method detects most stops, even in the presence of spatio-temporal gaps and that points classified as false positives often correspond to recurring locations for devices, typically near previous stops. While this research contributes to mobility analysis techniques, significant challenges persist. The lack of ground truth data limits definitive conclusions about the algorithm's accuracy. Further research is needed to validate the method across diverse datasets and to incorporate collective behavior inputs.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
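The abstract above says recall was prioritized over precision because stops are rare among GPS points. The sketch below shows one common way to express that priority, using the F-beta score with beta > 1. The toy labels and function names are assumptions for illustration; the abstract does not say which exact metric the authors used.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives (label 1 = stop)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical toy labels: 3 stops among 10 GPS points; the classifier finds all of them
# at the cost of two false positives.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} F2={f_beta(p, r):.3f}")
```

A recall-weighted score tolerates false positives, which matches the abstract's observation that many of them fall on recurring locations near earlier stops.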
# 多粒子量子アルノール猫の動的挙動における相関関数の挙動 Behavior of correlation functions in the dynamics of the Multiparticle Quantum Arnol'd Cat ( http://arxiv.org/abs/2407.11583v1 ) ライセンス: Link先を確認	Giorgio Mantica,	(参考訳) 多粒子アルノールの猫は、周期進化作用素がその名を冠する有名な写像であるハミルトン系(古典および量子)の一般化である。猫の配置空間に多数の散乱粒子を加えることにより、Joos-Zehの脱コヒーレンスの処方に従って得られる。量子化は、半古典主義ではなくハミルトン的アプローチが採用されるとき、素早く従う。私は、量子古典対応の問題に焦点をあてて、このシステムを以前の一連の研究で研究してきた。本稿では、標準位置の時間自己相関関数と、位置と運動量のアウトオブタイム相関関数という、2つの関連する異なる指標を用いて、このシステムのダイナミクスを検証する。 The multi-particle Arnol'd cat is a generalization of the Hamiltonian system, both classical and quantum, whose period evolution operator is the renowned map that bears its name. It is obtained following the Joos-Zeh prescription for decoherence, by adding a number of scattering particles in the configuration space of the cat. Quantization follows swiftly, if the Hamiltonian approach, rather than the semiclassical, is adopted. I have studied this system in a series of previous works, focusing on the problem of quantum-classical correspondence. In this paper I test the dynamics of this system by two related yet different indicators: the time autocorrelation function of the canonical position and the out of time correlator of position and momentum.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# QVD:ビデオ拡散モデルのための後学習量子化 QVD: Post-training Quantization for Video Diffusion Models ( http://arxiv.org/abs/2407.11585v1 ) ライセンス: Link先を確認	Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie,	(参考訳) 近年,映像拡散モデル (VDM) が注目されている。しかし、複数のフレームを並列に処理し、相当なモデルサイズと組み合わせることで、高いレイテンシと広範なメモリ消費が発生し、より広範なアプリケーションを妨げる。ポストトレーニング量子化(PTQ)は、メモリフットプリントの削減と計算効率の向上に有効な手法である。画像拡散とは異なり、時間的特徴は全てのフレーム特徴に統合され、顕著な歪みを示す。さらに,ビデオ拡散モデルのアクティベーションにおけるチャネル間の相違や非対称性について検討し,個々のチャンネルによる量子化レベルの範囲が低くなり,量子化の課題が増大することを示した。これらの問題に対処するために、QVDと呼ばれるビデオ拡散モデルに適した最初のPTQ戦略を導入する。具体的には,時間的特徴に対する高時間的識別可能性量子化法(HTDQ)を提案する。さらに,各チャネルにおける量子化レベルのカバレッジ向上を目的としたScattered Channel Range Integration (SCRI)法を提案する。様々なモデル、データセット、ビット幅設定の実験的検証は、様々なメトリクスの観点から、私たちのQVDの有効性を示しています。特にW8A8では,ほぼ無損失の性能を達成し,FVDにおいて既存手法を205.12上回る。 Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effective technique to reduce memory footprint and improve computational efficiency. Unlike image diffusion, we observe that the temporal features, which are integrated into all frame features, exhibit pronounced skewness. Furthermore, we investigate significant inter-channel disparities and asymmetries in the activation of video diffusion models, resulting in low coverage of quantization levels by individual channels and increasing the challenge of quantization. To address these issues, we introduce the first PTQ strategy tailored for video diffusion models, dubbed QVD. 
Specifically, we propose the High Temporal Discriminability Quantization (HTDQ) method, designed for temporal features, which retains the high discriminability of quantized features, providing precise temporal guidance for all video frames. In addition, we present the Scattered Channel Range Integration (SCRI) method which aims to improve the coverage of quantization levels across individual channels. Experimental validations across various models, datasets, and bit-width settings demonstrate the effectiveness of our QVD in terms of diverse metrics. In particular, we achieve near-lossless performance degradation on W8A8, outperforming the current methods by 205.12 in FVD.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
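The "low coverage of quantization levels by individual channels" mentioned in the abstract above can be seen in a toy setting. The sketch below is not the paper's HTDQ or SCRI method. It only illustrates the underlying problem with plain uniform 8-bit quantization, using two synthetic channels whose value ranges are assumptions chosen for illustration.

```python
def quantize(values, lo, hi, bits=8):
    """Uniform affine quantization of values to integer levels in [0, 2**bits - 1]."""
    top = 2 ** bits - 1
    scale = (hi - lo) / top
    return [min(top, max(0, round((v - lo) / scale))) for v in values]

def level_coverage(values, lo, hi, bits=8):
    """Fraction of the 2**bits available levels that the values actually occupy."""
    return len(set(quantize(values, lo, hi, bits))) / 2 ** bits

def linspace(lo, hi, n):
    """n evenly spaced samples from lo to hi inclusive (stand-in for real activations)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

# Two synthetic activation channels with very different ranges (assumed values).
wide_channel = linspace(-8.0, 8.0, 1000)
narrow_channel = linspace(0.0, 0.5, 1000)

# A single range shared by both channels is dominated by the wide one.
shared_lo = min(wide_channel + narrow_channel)
shared_hi = max(wide_channel + narrow_channel)

shared = level_coverage(narrow_channel, shared_lo, shared_hi)
own = level_coverage(narrow_channel, min(narrow_channel), max(narrow_channel))
print(f"narrow channel, shared range: {100 * shared:.1f}% of levels used")
print(f"narrow channel, own range:    {100 * own:.1f}% of levels used")
```

With a shared range the narrow channel collapses onto a handful of integer levels, so most of its variation is lost. Any method that improves per-channel coverage aims to close the gap between the two printed numbers.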
# 多粒子型アルノール猫の量子エントロピーと脱コヒーレンス Quantum Entropies and Decoherence for the Multiparticle Quantum Arnol'd Cat ( http://arxiv.org/abs/2407.11587v1 ) ライセンス: Link先を確認	Giorgio Mantica,	(参考訳) カオス系における衝突誘起脱コヒーレンスモデルにおいて, 動的エントロピー, 古典的, 量子的パラメータのスケーリング挙動について検討した。治療は完全に標準的であり、近似は関与せず、無限に制限される。このモデルは、量子カオスの性質、定義、および関連性に関する私の見解を明らかにするために、詳細な方法で提示する。 I study the scaling behavior in the physical parameters of dynamical entropies, classical and quantum, in a specifically devised model of collision-induced decoherence in a chaotic system. The treatment is fully canonical and no approximations are involved or infinite limits taken. I present this model in a detailed way, in order to clarify my views in the debate about the nature, definition, and relevance of quantum chaos.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# 人間軌道予測のためのプログレッシブ・プレテキスト・タスク・ラーニング Progressive Pretext Task Learning for Human Trajectory Prediction ( http://arxiv.org/abs/2407.11588v1 ) ライセンス: Link先を確認	Xiaotong Lin, Tianming Liang, Jianhuang Lai, Jian-Fang Hu,	(参考訳) 人間軌道予測は、道路上の歩行者の将来の位置を予測するための実践的なタスクであり、通常は軌道内の短期から長期までの時間範囲をカバーしている。しかし、既存の研究は、人間の軌跡における短期力学と長期力学の区別を無視して、特異で均一な訓練パラダイムで軌道予測全体を解決しようと試みている。この制限を克服するため,我々は,PPT(Progressive Pretext Task Learning)フレームワークを導入した。具体的には,PPTフレームワークにおける3段階のトレーニングタスクを精巧に設計する。最初の段階では、ステップワイズな次位置予測タスクを通じて、短期的なダイナミクスを理解することを学習する。第2段階では、目的地予測タスクを通じて長期的な依存関係を理解するために、モデルをさらに強化する。最終段階では、前段階からの知識を最大限に活用することで、将来の軌道全体を予測するタスクに対処することを目的としている。知識の忘却を緩和するために、クロスタスクな知識蒸留を更に適用する。さらに,トランスフォーマーをベースとした軌道予測器を設計し,目的地駆動予測戦略と学習可能なプロンプト埋め込みのグループを統合することにより,高い効率の2段階推論を実現する。一般的なベンチマーク実験により,提案手法が最先端の性能を高い効率で達成できることが実証された。コードは https://github.com/iSEE-Laboratory/PPT で入手できる。 Human trajectory prediction is a practical task of predicting the future positions of pedestrians on the road, which typically covers all temporal ranges from short-term to long-term within a trajectory. However, existing works attempt to address the entire trajectory prediction with a singular, uniform training paradigm, neglecting the distinction between short-term and long-term dynamics in human trajectories. To overcome this limitation, we introduce a novel Progressive Pretext Task learning (PPT) framework, which progressively enhances the model's capacity of capturing short-term dynamics and long-term dependencies for the final entire trajectory prediction. Specifically, we elaborately design three stages of training tasks in the PPT framework. In the first stage, the model learns to comprehend the short-term dynamics through a stepwise next-position prediction task. In the second stage, the model is further enhanced to understand long-term dependencies through a destination prediction task. In the final stage, the model aims to address the entire future trajectory task by taking full advantage of the knowledge from previous stages. 
To alleviate the knowledge forgetting, we further apply a cross-task knowledge distillation. Additionally, we design a Transformer-based trajectory predictor, which is able to achieve highly efficient two-step reasoning by integrating a destination-driven prediction strategy and a group of learnable prompt embeddings. Extensive experiments on popular benchmarks have demonstrated that our proposed approach achieves state-of-the-art performance with high efficiency. Code is available at https://github.com/iSEE-Laboratory/PPT.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# 多体局在化による変分量子固有値ソルバの改良 Improve Variational Quantum Eigensolver by Many-Body Localization ( http://arxiv.org/abs/2407.11589v1 ) ライセンス: Link先を確認	Li Xin, Zhang-qi Yin,	(参考訳) 変分量子アルゴリズムは、量子シミュレーション、最適化、機械学習に広く応用されるように、実験と理論の両方の文脈で広く実証されてきた。しかし、ヒルベルト空間の次元の指数的な成長は、バレンプラトー現象として知られる量子ビットの数と回路深さの増加によって回路内のパラメータ勾配が消滅する現象をもたらす。近年、非平衡統計物理学の研究が多体局在の発見につながっている。フロケ系の一種として, 多体局在フロケ系は, 広いパラメータ空間範囲で熱化を回避する相を持ち, 時間結晶を生成できることが実験的に実証されている。この回路を多体基底状態の計算のための変分量子アルゴリズムに適用し,パラメータ更新のための勾配の分散について検討した。この回路構造はバレンプラトーを効果的に回避できることがわかった。また,この回路のエントロピー成長,情報スクランブル,オプティマイザダイナミクスを解析した。この特徴を生かして,我々は「多体局在アンザッツ」と呼ばれる新しいタイプの変分アンザッツを設計した。量子多体基底状態の解法として応用し,その回路特性について検討した。数値計算の結果,我々のアンザッツは変分量子アルゴリズムを大幅に改善した。 Variational quantum algorithms have been widely demonstrated in both experimental and theoretical contexts to have extensive applications in quantum simulation, optimization, and machine learning. However, the exponential growth in the dimension of the Hilbert space results in vanishing parameter gradients in the circuit as the number of qubits and circuit depth increase, known as the barren plateau phenomenon. In recent years, research in non-equilibrium statistical physics has led to the discovery and realization of many-body localization. As a type of Floquet system, the many-body localized Floquet system has a phase that avoids thermalization over an extensive region of parameter space, and it has been experimentally demonstrated to produce time crystals. We applied this circuit to variational quantum algorithms for the calculation of many-body ground states and studied the variance of the gradient for parameter updates under this circuit. We found that this circuit structure can effectively avoid barren plateaus. We also analyzed the entropy growth, information scrambling, and optimizer dynamics of this circuit. Leveraging this characteristic, we designed a new type of variational ansatz, called the 'many-body localization ansatz'. 
We applied it to solve quantum many-body ground states and examined its circuit properties. Our numerical results show that our ansatz significantly improved the variational quantum algorithm.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# 学習した画像の圧縮を再考する Rethinking Learned Image Compression: Context is All You Need ( http://arxiv.org/abs/2407.11590v1 ) ライセンス: Link先を確認	Jixiang Luo,	(参考訳) 近年,LICは従来の方法と比較して急速に進歩しているため,本論文では「学習画像圧縮(LIC)の境界線はどこにあるのか」という課題を主観的指標に関して論じる。以上の問題を2つのサブプロブレムに分割する: 1)PSNRの速度歪み性能の境界は何か? 2) 圧縮ゲインをさらに改善し、境界を達成するにはどうすればいいのか? そこで本研究では,LICの3つの構成要素であるエンコーダ,デコーダ,コンテキストモデルのスケーリングパラメータの有効性を解析する。そして、LICのスケーリングは、LIC内のコンテキストモデルとデコーダのスケーリングである、と結論付けます。大規模な実験は、オーバーフィッティングが実際に効果的な文脈として機能することを示した。文脈を最適化することにより、PSNRをさらに改善し、最先端のパフォーマンスを実現し、VVCよりもBD-RATEの方が14.39%向上したことを示す。 Since LIC has made rapid progress recently compared to traditional methods, this paper attempts to discuss the question 'Where is the boundary of Learned Image Compression (LIC)?' with regard to subjective metrics. This paper splits the above problem into two sub-problems: 1) Where is the boundary of rate-distortion performance of PSNR? 2) How to further improve the compression gain and achieve the boundary? Therefore, this paper analyzes the effectiveness of scaling parameters for the encoder, decoder and context model, which are the three components of LIC. Then we conclude that scaling for LIC is to scale for the context model and decoder within LIC. Extensive experiments demonstrate that overfitting can actually serve as an effective context. By optimizing the context, this paper further improves PSNR and achieves state-of-the-art performance, showing a performance gain of 14.39% with BD-RATE over VVC.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# AdaptEval: テキスト要約のためのドメイン適応に基づく大規模言語モデルの評価 AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization ( http://arxiv.org/abs/2407.11591v1 ) ライセンス: Link先を確認	Anum Afzal, Ribin Chalumattu, Florian Matthes, Laura Mascarell Espuny,	(参考訳) LLM(Large Language Models)を用いた抽象的な要約タスクの進歩にもかかわらず、異なるドメインに容易に適応できる能力を評価する研究が不足している。各種ドメイン間の要約タスクにおいて,様々なLLMのドメイン適応能力について,微調整と文脈内学習の両方で評価する。また、最初のドメイン適応評価スイートであるAdaptEvalも紹介する。 AdaptEvalには、ドメイン適応の分析を容易にするための、ドメインベンチマークとメトリクスのセットが含まれている。この結果から,LLMはパラメータスケールに関係なく,文脈内学習環境において同等の性能を示すことが示された。 Despite the advances in the abstractive summarization task using Large Language Models (LLMs), there is a lack of research that assesses their ability to easily adapt to different domains. We evaluate the domain adaptation abilities of a wide range of LLMs on the summarization task across various domains in both fine-tuning and in-context learning settings. We also present AdaptEval, the first domain adaptation evaluation suite. AdaptEval includes a domain benchmark and a set of metrics to facilitate the analysis of domain adaptation. Our results demonstrate that LLMs exhibit comparable performance in the in-context learning setting, regardless of their parameter scale.	翻訳日:2024-07-17 15:32:51 公開日:2024-07-16
# 自己監督型プレトレーニングによる医療拡散のスケーリング DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training ( http://arxiv.org/abs/2407.11594v1 ) ライセンス: Link先を確認	Guillermo Jimenez-Perez, Pedro Osorio, Josef Cersovsky, Javier Montalt-Tordera, Jens Hooge, Steffen Vogler, Sadegh Mohammadi,	(参考訳) 拡散モデル(DM)は様々なタスクのための強力な基礎モデルとして登場し、合成画像生成に大きな焦点をあてている。しかし、トレーニングのための大きなアノテートデータセットの要求は、通常、データセットが小さく、わずかにアノテートされている医療画像における適用性を制限している。本稿では,Dinoから抽出した画像埋め込みの生成過程を条件としたLDMの自己教師付き手法であるDino-Diffusionを紹介する。アノテーションへの依存をなくすことで、我々のトレーニングは、公開胸部X線(CXR)データセットから868万以上の未ラベル画像を活用する。自己監督されているにもかかわらず、Dino-Diffusionは、FIDスコアが4.7以下で、下流タスクで評価されたときに出現する特性を持つ包括的な多様体のカバレッジを示している。これは、小さなデータプールからでも意味的に異なる合成データセットを生成するために使用することができ、データ拡張に使用する場合、最大20%のAUCの分類性能が向上することを示す。画像は、Dino埋め込み多様体上で異なるサンプリング戦略で生成され、実際のイメージを出発点として使用した。結果として、DiNO-Diffusionは、限られた量の実際のデータから下流AIモデルの柔軟なトレーニングのための大規模なデータセットの作成を促進すると同時に、プライバシ保護の可能性を秘めている可能性が示唆されている。さらに、Dino-Diffusionは、肺葉のセグメンテーションを評価する際に、最大84.4%のDiceスコアのゼロショットセグメンテーション性能を示す。これは、バニラDM上のテキスト記述子を用いたセグメント化に似た、優れたCXR画像-解剖学的アライメントを示す。最後に、Dino-Diffusionは、他の医療画像モダリティや最先端拡散モデルに容易に適応でき、医療画像のための大規模マルチドメイン画像生成パイプラインの扉を開くことができる。 Diffusion models (DMs) have emerged as powerful foundation models for a variety of tasks, with a large focus in synthetic image generation. However, their requirement of large annotated datasets for training limits their applicability in medical imaging, where datasets are typically smaller and sparsely annotated. We introduce DiNO-Diffusion, a self-supervised method for training latent diffusion models (LDMs) that conditions the generation process on image embeddings extracted from DiNO. By eliminating the reliance on annotations, our training leverages over 868k unlabelled images from public chest X-Ray (CXR) datasets. Despite being self-supervised, DiNO-Diffusion shows comprehensive manifold coverage, with FID scores as low as 4.7, and emerging properties when evaluated in downstream tasks. 
It can be used to generate semantically-diverse synthetic datasets even from small data pools, demonstrating up to 20% AUC increase in classification performance when used for data augmentation. Images were generated with different sampling strategies over the DiNO embedding manifold and using real images as a starting point. Results suggest, DiNO-Diffusion could facilitate the creation of large datasets for flexible training of downstream AI models from limited amount of real data, while also holding potential for privacy preservation. Additionally, DiNO-Diffusion demonstrates zero-shot segmentation performance of up to 84.4% Dice score when evaluating lung lobe segmentation. This evidences good CXR image-anatomy alignment, akin to segmenting using textual descriptors on vanilla DMs. Finally, DiNO-Diffusion can be easily adapted to other medical imaging modalities or state-of-the-art diffusion models, opening the door for large-scale, multi-domain image generation pipelines for medical imaging.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# HyperAggregation: Hypernetworksによるグラフエッジ上のアグリゲーション HyperAggregation: Aggregating over Graph Edges with Hypernetworks ( http://arxiv.org/abs/2407.11596v1 ) ライセンス: Link先を確認	Nicolas Lell, Ansgar Scherp,	(参考訳) HyperAggregationは、グラフニューラルネットワークのためのハイパーネットワークベースの集約機能である。ハイパーネットワークを使って現在の近傍の大きさの重みを動的に生成し、その重みを用いてこの近傍を集約する。生成された重みによるこのアグリゲーションは、可変サイズの頂点近傍に対するMLP-Mixerのチャネル混合のように行われる。 HyperAggregationを2つのモデルで実証する。GraphHyperMixerはMLP-Mixerに基づくモデルであり、GraphHyperConvはGCNから派生しているが、ハイパーネットワークベースのアグリゲーション機能を持つ。我々は、頂点分類、グラフ分類、グラフ回帰タスクのための多様なベンチマークデータセットの実験を行う。その結果、HyperAggregationは、誘導的およびトランスダクティブな設定の両方において、ホモ親和性およびヘテロ親和性のあるデータセットに有効に使用できることが示された。 GraphHyperConvはGraphHyperMixerよりもパフォーマンスが良く、特にトランスダクティブ設定では強い。ヘテロ親和性のあるデータセットであるRoman-Empireでは、新しい最先端に到達している。グラフレベルのタスクでは、我々のモデルは同様の大きさのモデルと同等の性能を示す。アブレーション研究は、様々なハイパーパラメータ選択に対するロバスト性を調べる。 HyperAggregationの実装と、すべての実験を再現するためのコードは、https://github.com/Foisunt/HyperAggregation で公開されている。 HyperAggregation is a hypernetwork-based aggregation function for Graph Neural Networks. It uses a hypernetwork to dynamically generate weights in the size of the current neighborhood, which are then used to aggregate this neighborhood. This aggregation with the generated weights is done like an MLP-Mixer channel mixing over variable-sized vertex neighborhoods. We demonstrate HyperAggregation in two models: GraphHyperMixer, which is based on MLP-Mixer, and GraphHyperConv, which is derived from a GCN but uses a hypernetwork-based aggregation function. We perform experiments on diverse benchmark datasets for the vertex classification, graph classification, and graph regression tasks. The results show that HyperAggregation can be effectively used for homophilic and heterophilic datasets in both inductive and transductive settings. GraphHyperConv performs better than GraphHyperMixer and is especially strong in the transductive setting. On the heterophilic dataset Roman-Empire it reaches a new state of the art. On the graph-level tasks our models perform in line with similarly sized models. 
Ablation studies investigate the robustness to various hyperparameter choices. The implementation of HyperAggregation, as well as code to reproduce all experiments, is available at https://github.com/Foisunt/HyperAggregation .	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# TinyMLセキュリティの強化: 敵攻撃伝達性の検討 Enhancing TinyML Security: Study of Adversarial Attack Transferability ( http://arxiv.org/abs/2407.11599v1 ) ライセンス: Link先を確認	Parin Shah, Yuvaraj Govindarajulu, Pavan Kulkarni, Manojkumar Parmar,	(参考訳) 人工知能(AI)と機械学習(ML)の最近の進歩は、クラウド接続に依存することなく、エッジでのAI計算を可能にするパラダイムであるTinyMLの台頭を促している。 TinyMLは、さまざまなアプリケーションにとって重要なリアルタイムデータ分析と迅速なレスポンスを提供するが、そのデバイス固有のリソース制限は、セキュリティリスクを露呈する。この研究は、リソースに制限された組み込みハードウェア上のAIモデルの敵対的脆弱性を深く掘り下げ、モデル抽出と侵入攻撃に焦点をあてる。以上の結果から,強力なホストマシンからの敵攻撃は,ESP32やRaspberry Piなど,より小型で安全性の低いデバイスに転送される可能性が示唆された。このことは、敵対的攻撃が小さなデバイスに拡張され、脆弱性が強調され、TinyMLデプロイメントにおける強化されたセキュリティ対策の必要性を強調していることを示している。この調査は、TinyMLのセキュリティ課題の理解を強化し、センシティブなデータを保護し、AIによるエッジコンピューティング設定におけるデバイス依存性を保証するための洞察を提供する。 The recent strides in artificial intelligence (AI) and machine learning (ML) have propelled the rise of TinyML, a paradigm enabling AI computations at the edge without dependence on cloud connections. While TinyML offers real-time data analysis and swift responses critical for diverse applications, its devices' intrinsic resource limitations expose them to security risks. This research delves into the adversarial vulnerabilities of AI models on resource-constrained embedded hardware, with a focus on Model Extraction and Evasion Attacks. Our findings reveal that adversarial attacks from powerful host machines could be transferred to smaller, less secure devices like ESP32 and Raspberry Pi. This illustrates that adversarial attacks could be extended to tiny devices, underscoring vulnerabilities, and emphasizing the necessity for reinforced security measures in TinyML deployments. This exploration enhances the comprehension of security challenges in TinyML and offers insights for safeguarding sensitive data and ensuring device dependability in AI-powered edge computing settings.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# トークン化の基礎:統計的・計算的懸念 The Foundations of Tokenization: Statistical and Computational Concerns ( http://arxiv.org/abs/2407.11606v1 ) ライセンス: Link先を確認	Juan Luis Gastaldi, John Terilla, Luca Malagutti, Brian DuSell, Tim Vieira, Ryan Cotterell,	(参考訳) トークン化(Tokenization) - アルファベット上の文字列を語彙上のトークンのシーケンスに変換するプラクティス。特に、広く使われているエンドツーエンドのニューラルモデルに完全に統合されていない唯一の主要なステップである。本稿では,トークン化の基礎を形式的観点から構築することで,この理論的ギャップに対処することを目的とする。確率写像のカテゴリに関する基本特性を記述・拡張することにより,トークン化モデルを表現・解析するための統一的な枠組みを提案する。このフレームワークにより、トークン化剤の使用に関する一般的な条件が確立できます。特に,統計的推定器の整合性を維持するために,トークン化モデルに必要な,十分な条件を正式に確立する。さらに,トークン化モデルの設計と実装に不可欠な統計的および計算上の懸念についても論じる。本稿では,ニューラルネットワークモデリングの堅牢な理論的基盤に向けた第一歩として,その枠組みと成果について述べる。 Tokenization - the practice of converting strings of characters over an alphabet into sequences of tokens over a vocabulary - is a critical yet under-theorized step in the NLP pipeline. Notably, it remains the only major step not fully integrated into widely used end-to-end neural models. This paper aims to address this theoretical gap by laying the foundations of tokenization from a formal perspective. By articulating and extending basic properties about the category of stochastic maps, we propose a unified framework for representing and analyzing tokenizer models. This framework allows us to establish general conditions for the use of tokenizers. In particular, we formally establish the necessary and sufficient conditions for a tokenizer model to preserve the consistency of statistical estimators. Additionally, we discuss statistical and computational concerns crucial for the design and implementation of tokenizer models. The framework and results advanced in this paper represent a step toward a robust theoretical foundation for neural language modeling.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# 擬似ランダム密度行列 Pseudorandom density matrices ( http://arxiv.org/abs/2407.11607v1 ) ライセンス: Link先を確認	Nikhil Bansal, Wai-Keong Mok, Kishor Bharti, Dax Enshan Koh, Tobias Haug,	(参考訳) Pseudorandom state (PRS) は、任意の効率的な量子アルゴリズムによってハールランダム状態と区別できない状態アンサンブルである。しかしながら、PRSの定義は純粋状態に限定されており、ノイズに対する堅牢性に欠ける。本研究では, 擬似ランダム密度行列 (PRDM) を導入する。これは, $m$ qubit をトレースアウトした$(n+m)$-qubit Haar ランダム状態から構成される一般化ヒルベルト・シュミット・アンサンブルと計算的に区別できない$n$-qubit 状態のアンサンブルである。混合度パラメータ $m=0$ の場合、PRDM は PRS と同値であるが、$m=\omega(\log n)$ の場合、PRDM は最大混合状態と計算的に区別できない。 PRSとは対照的に、$m=\omega(\log n)$のPRDMはユニタルノイズチャネルおよび最近導入された$\mathsf{PostBQP}$攻撃に対して堅牢である。さらに, 擬似マジックおよび擬似コヒーレントな状態アンサンブルを構築する。これらはほぼ最大のマジックとコヒーレンスを持つが, ゼロマジックとゼロコヒーレンスを持つ状態と計算的に区別できない。 PRDMは$\Theta(n)$対$0$の擬似リソースギャップを示すことができ、これは既知のギャップを上回る。ノイズを受ける場合であっても、計算的に区別できないが統計的に遠い状態アンサンブルであるノイズロバストEFIペアを導入する。エンタングルメント、マジック、コヒーレンスのテストは効率的に行えないことを示す。さらに,ブラックボックスの資源蒸留には超多項式のコピー数が必要であることを証明した。また, 効率的な試験およびブラックボックス蒸留に必要な純度の下界も確立した。最後に、量子メモリを使わない効率的なアルゴリズムに対してHaarランダム状態と区別できない、PRSのノイズロバストな概念であるメモリレスPRSを紹介する。我々の研究は、混合状態に対する擬似ランダム性の包括的な枠組みを提供し、強力な量子暗号プリミティブと量子資源理論の基本的な境界をもたらす。 Pseudorandom states (PRSs) are state ensembles that cannot be distinguished from Haar random states by any efficient quantum algorithm. However, the definition of PRSs has been limited to pure states and lacks robustness against noise. In this work, we introduce pseudorandom density matrices (PRDMs), ensembles of $n$-qubit states that are computationally indistinguishable from the generalized Hilbert-Schmidt ensemble, which is constructed from $(n+m)$-qubit Haar random states with $m$ qubits traced out. For a mixedness parameter $m=0$, PRDMs are equivalent to PRSs, whereas for $m=\omega(\log n)$, PRDMs are computationally indistinguishable from the maximally mixed state. In contrast to PRSs, PRDMs with $m=\omega(\log n)$ are robust to unital noise channels and a recently introduced $\mathsf{PostBQP}$ attack. 
Further, we construct pseudomagic and pseudocoherent state ensembles, which possess near-maximal magic and coherence, but are computationally indistinguishable from states with zero magic and coherence. PRDMs can exhibit a pseudoresource gap of $\Theta(n)$ vs $0$, surpassing previously found gaps. We introduce noise-robust EFI pairs, which are state ensembles that are computationally indistinguishable yet statistically far, even when subject to noise. We show that testing entanglement, magic and coherence is not efficient. Further, we prove that black-box resource distillation requires a superpolynomial number of copies. We also establish lower bounds on the purity needed for efficient testing and black-box distillation. Finally, we introduce memoryless PRSs, a noise-robust notion of PRS that is indistinguishable from Haar random states for efficient algorithms without quantum memory. Our work provides a comprehensive framework of pseudorandomness for mixed states, which yields powerful quantum cryptographic primitives and fundamental bounds on quantum resource theories.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
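The construction described in the abstract above can be written out compactly. The following is only a restatement of the abstract's wording in symbols; the notation $\rho_{n,m}$ and the subscript on the partial trace are assumptions introduced here, not the paper's own.

```latex
% A state from the generalized Hilbert-Schmidt ensemble on n qubits:
% draw a Haar-random pure state on n+m qubits and trace out m of them.
\rho_{n,m} \;=\; \operatorname{tr}_{m}\!\bigl( |\psi\rangle\langle\psi| \bigr),
\qquad
|\psi\rangle \sim \mathrm{Haar}\bigl( (\mathbb{C}^{2})^{\otimes (n+m)} \bigr).

% Limiting cases stated in the abstract:
%   m = 0                 : \rho_{n,0} is pure, so a PRDM reduces to a PRS;
%   m = \omega(\log n)    : a PRDM is computationally indistinguishable
%                           from the maximally mixed state I / 2^{n}.
```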
# 確率的サイバー物理系の分布シフトによる統計的到達性解析 Statistical Reachability Analysis of Stochastic Cyber-Physical Systems under Distribution Shift ( http://arxiv.org/abs/2407.11609v1 ) ライセンス: Link先を確認	Navid Hashemi, Lars Lindemann, Jyotirmoy V. Deshmukh,	(参考訳) 到達可能性解析(Reachability analysis)は、確率的サイバー物理システム(SCPS)の安全性を保証するための一般的な手法であり、システムダイナミクスの象徴的な記述を取り入れ、設定プロパゲーション法を用いて、到達可能な状態の集合の過度な近似を境界時間地平線上で計算する。本稿では,システムトラジェクトリを生成するためにシミュレーション可能なディジタルツインモデルを用いて,力学の記号的記述を持たないSCPSの到達可能性解析を行う問題について検討する。重要な課題は、シミュレータがSCPSの軌道上の確率分布を暗黙的にモデル化することであるが、一般的にはsim2realギャップがあること、すなわち、配置設定における軌道の実際の分布は、シミュレータが仮定した分布からシフトすることができる。そこで本稿では,ユーザが提供するしきい値が1-\epsilon$となると,このしきい値よりも小さい確率で,デプロイ中の到達可能な状態がこのセットに存在することを保証できるような統計的到達可能性解析手法を提案する。提案手法は,(1)サンプル軌道から決定論的サロゲートモデルを学習し,(2)サロゲートモデル上で到達可能性解析を行い,(3)追加のサンプル軌道を用いてサロゲートモデルの分布シフトを定量化する。到達可能な集合における保守性に対抗するために、量子的損失項を最小化するサロゲートモデルをトレーニングする方法(通常の平均2乗損失の代わりに)と、正規化サロゲート誤差を用いて共形推論を用いてより厳密な保証を提供する新しい手法を提案する。各種ケーススタディにおいて,本手法の有効性を実証する。 Reachability analysis is a popular method to give safety guarantees for stochastic cyber-physical systems (SCPSs) that takes in a symbolic description of the system dynamics and uses set-propagation methods to compute an overapproximation of the set of reachable states over a bounded time horizon. In this paper, we investigate the problem of performing reachability analysis for an SCPS that does not have a symbolic description of the dynamics, but instead is described using a digital twin model that can be simulated to generate system trajectories. An important challenge is that the simulator implicitly models a probability distribution over the set of trajectories of the SCPS; however, it is typical to have a sim2real gap, i.e., the actual distribution of the trajectories in a deployment setting may be shifted from the distribution assumed by the simulator. 
We thus propose a statistical reachability analysis technique that, given a user-provided threshold $1-\epsilon$, provides a set that guarantees that any reachable state during deployment lies in this set with probability not smaller than this threshold. Our method is based on three main steps: (1) learning a deterministic surrogate model from sampled trajectories, (2) conducting reachability analysis over the surrogate model, and (3) employing {\em robust conformal inference} using an additional set of sampled trajectories to quantify the surrogate model's distribution shift with respect to the deployed SCPS. To counter conservatism in reachable sets, we propose a novel method to train surrogate models that minimizes a quantile loss term (instead of the usual mean squared loss), and a new method that provides tighter guarantees using conformal inference using a normalized surrogate error. We demonstrate the effectiveness of our technique on various case studies.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
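Two ingredients named in the abstract above, a quantile loss for training the surrogate and a conformal calibration step, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions: it shows the textbook pinball loss and plain split conformal calibration, not the robust conformal inference or the normalized surrogate error used in the paper, and the function names `pinball_loss` and `conformal_radius` are hypothetical.

```python
import math


def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) is weighted by tau,
        # over-prediction (diff < 0) by 1 - tau.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def conformal_radius(calibration_errors, epsilon):
    """Plain split-conformal quantile of held-out surrogate errors.

    Returns the ceil((n + 1) * (1 - epsilon))-th smallest error, or
    infinity when the calibration set is too small for that level.
    Assumes calibration and deployment errors are exchangeable, i.e.
    no distribution shift (the paper's robust variant relaxes this).
    """
    n = len(calibration_errors)
    rank = math.ceil((n + 1) * (1.0 - epsilon))
    if rank > n:
        return math.inf
    return sorted(calibration_errors)[rank - 1]
```

In this simplified setting, a reachable set computed on the surrogate would be inflated by the returned radius to obtain coverage of at least $1-\epsilon$ on exchangeable data; handling the sim2real shift requires the additional machinery the abstract describes.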
# MergeNet: エッジ予測によるスパースポイントクラウドからの明示的なメッシュ再構築 MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction ( http://arxiv.org/abs/2407.11610v1 ) ライセンス: Link先を確認	Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei,	(参考訳) 本稿では,エッジ接続を予測して疎点雲からメッシュを再構築する新しい手法を提案する。既存の暗黙の手法は、通常、等表面抽出アルゴリズム~(eg, Marching Cubes)により、滑らかで水密なメッシュを生成する。しかし、これらの手法は高解像度化とともにメモリと計算集約化される。顔を直接ポイントから形成することで、明示的な手法がより効率的になる。それでも、巨大な候補から適切な顔を選択するという課題は、しばしば望ましくない顔や穴につながる。さらに、両方のアプローチの再構築性能は、ポイントクラウドがスパースになると劣化する傾向にある。この目的のために,メッシュ再構成を局所接続予測問題に変換する,edGE~(MergeNet)によるMesh再構成を提案する。具体的には、MergeNetは、候補エッジの特徴を抽出し、基礎となる表面までの距離を遅らせることを学ぶ。これにより、予測された距離を利用して、表面に横たわるエッジをフィルタリングする。最後に、メッシュは、これらのエッジによって形成された三角形を精製することによって再構成される。合成および実スキャンされたデータセットに関する大規模な実験は、MergeNetとSoTAの明示的な手法の優位性を実証している。 This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms~(e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by directly forming the face from points. Nevertheless, the challenge of selecting appropriate faces from enormous candidates often leads to undesirable faces and holes. Moreover, the reconstruction performance of both approaches tends to degrade when the point cloud gets sparse. To this end, we propose MEsh Reconstruction via edGE~(MergeNet), which converts mesh reconstruction into local connectivity prediction problems. Specifically, MergeNet learns to extract the features of candidate edges and regress their distances to the underlying surface. Consequently, the predicted distance is utilized to filter out edges that lay on surfaces. Finally, the meshes are reconstructed by refining the triangulations formed by these edges. 
Extensive experiments on synthetic and real-scanned datasets demonstrate the superiority of MergeNet over SoTA explicit methods.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# ソフトウェアシステムのエネルギーフットプリントの推定:プライマー Estimating the Energy Footprint of Software Systems: a Primer ( http://arxiv.org/abs/2407.11611v1 ) ライセンス: Link先を確認	Fernando Castor,	(参考訳) 本論文では,グリーンソフトウェア開発を支援するために,ソフトウェアシステムのエネルギーフットプリントをどのように見積もることができるのかを論じる。我々の焦点は一般的な概念とアプローチであり、特定のツールではない。この文書は、この分野で研究を始めたい研究者の出発点となることを目的としている。 In this document, we talk about how the energy footprint of a software system can be estimated to support Green Software Development. Our focus is on general concepts and approaches and not on specific tools, although we do refer to some of them to make explanations more concrete. This document aims to be a starting point for researchers who want to start conducting work in this area.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# ストレスコーピングにおけるmHealth micro-intervention(mHealth micro-intervention)の効果とエンゲージメントの改善 Improving Engagement and Efficacy of mHealth Micro-Interventions for Stress Coping: an In-The-Wild Study ( http://arxiv.org/abs/2407.11612v1 ) ライセンス: Link先を確認	Chaya Ben Yehuda, Ran Gilad-Bachrach, Yarin Udi,	(参考訳) モバイルヘルス(mHealth)の介入による長期的なユーザエンゲージメントを維持しながら、高い有効性を維持することは、現実の幸福なアプリケーションにおいて、現在進行中の課題である。この問題に対処するために、フィールド実験における介入選択と性能評価のための新しいアルゴリズムであるパーソナライズされたコンテキストアウェア・レコメンド(PCAR)を導入する。子ども29人の親を対象とする4週間の実験で、モバイルチャットボットを通じて、個人化されたストレス低減マイクロインターベンションを納品した。介入前後の時間的ストレスレベル・時間的評価(EMA)を用いてストレス低減効果を評価した。本研究は,PCAR介入選択によるストレス対策に対するmHealth micro-interventionsの関与と有効性の向上が,無作為介入選択や介入を受けない制御群よりも優れていることを示すものである。さらに,短時間の1分間の介入でも,知覚ストレスレベル(p=0.001)が著しく低下することを示した。午後の行動から就寝のルーチンへの移行など,活動の移行期間中の1分間の介入に対して,個人が最も受け入れやすいことが観察された。本研究は,mHealth介入のエンゲージメントと有効性を向上し,ストレス介入の重要なタイミングを特定し,ストレス対処を改善するメカニズムに関する洞察を提供することにより,文献に寄与する。 Sustaining long-term user engagement with mobile health (mHealth) interventions while preserving their high efficacy remains an ongoing challenge in real-world well-being applications. To address this issue, we introduce a new algorithm, the Personalized, Context-Aware Recommender (PCAR), for intervention selection and evaluate its performance in a field experiment. In a four-week, in-the-wild experiment involving 29 parents of young children, we delivered personalized stress-reducing micro-interventions through a mobile chatbot. We assessed their impact on stress reduction using momentary stress level ecological momentary assessments (EMAs) before and after each intervention. Our findings demonstrate the superiority of PCAR intervention selection in enhancing the engagement and efficacy of mHealth micro-interventions to stress coping compared to random intervention selection and a control group that did not receive any intervention. Furthermore, we show that even brief, one-minute interventions can significantly reduce perceived stress levels (p=0.001). 
We observe that individuals are most receptive to one-minute interventions during transitional periods between activities, such as transitioning from afternoon activities to bedtime routines. Our study contributes to the literature by introducing a personalized context-aware intervention selection algorithm that improves engagement and efficacy of mHealth interventions, identifying key timing for stress interventions, and offering insights into mechanisms to improve stress coping.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# AI参加をスケールダウンさせる - AIプロジェクトへのオープンAIの民主的入力に関するコメント Bringing AI Participation Down to Scale: A Comment on Open AIs Democratic Inputs to AI Project ( http://arxiv.org/abs/2407.11613v1 ) ライセンス: Link先を確認	David Moats, Chandrima Ganguly,	(参考訳) このコメンタリー記事は、生成AIへの公的な参加のための手順設計に10のチームに資金を提供した最近のOpen AI Democratic Inputsプログラムをレビューする。これらのプロジェクトの技術的革新を称賛しながら、LLMの一般性、抽象的な価値の抽出、問題ではなく解決策の勧誘、参加と民主主義の同一視など、いくつかの共通前提を特定した。私たちは代わりに、特定のコミュニティやユースケースを含み、修正すべき具体的な問題を募るAI参加を求める。また、これらのコミュニティが、データやモデルの所有権を含む結果に対する利害を持っていることも重要です。 This commentary piece reviews the recent Open AI Democratic Inputs programme, which funded 10 teams to design procedures for public participation in generative AI. While applauding the technical innovations in these projects, we identify several shared assumptions, including the generality of LLMs, extracting abstract values, soliciting solutions rather than problems, and equating participation with democracy. We call instead for AI participation which involves specific communities and use cases and solicits concrete problems to be remedied. We also find it important that these communities have a stake in the outcome, including ownership of data or models.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# 企業クレジットアセスメントのためのグラフ次元注意ネットワーク Graph Dimension Attention Networks for Enterprise Credit Assessment ( http://arxiv.org/abs/2407.11615v1 ) ライセンス: Link先を確認	Shaopeng Wei, Beni Egressy, Xingyan Chen, Yu Zhao, Fuzhen Zhuang, Roger Wattenhofer, Gang Kou,	(参考訳) 企業クレジットアセスメントは金融リスクを評価する上で重要であり、グラフニューラルネットワーク(GNN)は、相互関係をモデル化する高度な能力を持ち、これらの金融ネットワークをより深く理解するための自然なツールである。しかし、既存のGNNベースの手法は、しばしば異なる特徴次元の不均一な重要性を見落とし、信用リスクレベルを適切にモデル化する上で不足する、感染リスク集約のためのエンティティレベルの注意機構を主に強調する。この問題に対処するため,我々はGDAN (Graph Dimension Attention Network) という新しいアーキテクチャを提案する。さらに、財務シナリオにおけるGNN手法の解釈可能性について検討し、GDAN-DistShiftと呼ばれるGDANのためのシンプルだが効果的なデータ中心説明器を提案する。 DistShiftは、メッセージパッシングプロセス中の分散シフトを定量化することで、エッジレベルの解釈性を提供する。さらに、我々は、実世界のマルチソースエンタープライズクレジットアセスメントデータセット(ECAD)を収集し、高品質なデータセットがこの分野で欠落しているため、研究コミュニティにアクセスできるようにしました。 ECADを用いた大規模な実験により,本手法の有効性が示された。さらに、よく知られたSMEsDとDBLPのデータセット上でGDANを実行し、優れた結果を得た。 Enterprise credit assessment is critical for evaluating financial risk, and Graph Neural Networks (GNNs), with their advanced capability to model inter-entity relationships, are a natural tool to get a deeper understanding of these financial networks. However, existing GNN-based methodologies predominantly emphasize entity-level attention mechanisms for contagion risk aggregation, often overlooking the heterogeneous importance of different feature dimensions, thus falling short in adequately modeling credit risk levels. To address this issue, we propose a novel architecture named Graph Dimension Attention Network (GDAN), which incorporates a dimension-level attention mechanism to capture fine-grained risk-related characteristics. Furthermore, we explore the interpretability of the GNN-based method in financial scenarios and propose a simple but effective data-centric explainer for GDAN, called GDAN-DistShift. DistShift provides edge-level interpretability by quantifying distribution shifts during the message-passing process. 
Moreover, we collected a real-world, multi-source Enterprise Credit Assessment Dataset (ECAD) and have made it accessible to the research community since high-quality datasets are lacking in this field. Extensive experiments conducted on ECAD demonstrate the effectiveness of our methods. In addition, we ran GDAN on the well-known datasets SMEsD and DBLP, also with excellent results.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# ストラテジック・リトルストーン・ディメンション:オンラインストラテジック分類における改善された境界 Strategic Littlestone Dimension: Improved Bounds on Online Strategic Classification ( http://arxiv.org/abs/2407.11619v1 ) ライセンス: Link先を確認	Saba Ahmadi, Kunhe Yang, Hanrui Zhang,	(参考訳) 戦略エージェントが観測可能な特徴を修正して肯定的な分類を受けられるような設定において、オンライン二項分類の問題について検討する。特徴空間上の有向グラフによる実現可能な操作の集合をモデル化し、学習者が本来の操作ではなく、操作された特徴のみを観測すると仮定する。我々は,仮説クラスと操作グラフの結合複雑性をキャプチャする新たな組合せ尺度である,ストラテジック・リトルストーン次元を導入する。実測可能な設定において、決定論的学習アルゴリズムのインスタンス最適誤り境界を特徴付けることを実証する。我々はまた、エージェントの本来の特徴を観察しないという追加の課題を考慮に入れた、洗練された無知から実現可能な削減によって、無知環境における後悔の改善も達成した。最後に、学習者が操作グラフを知っているという仮定を緩和し、その代わりに、その知識がグラフの族によって捕捉されると仮定する。我々は、すべてのエージェントがグラフファミリ内の同じグラフで操作する実現可能な設定と、操作グラフが逆向きに選択され、家族内の1つのグラフで一貫したモデル化が行われない非依存的な設定の両方において、後悔すべき境界を導出する。 We study the problem of online binary classification in settings where strategic agents can modify their observable features to receive a positive classification. We model the set of feasible manipulations by a directed graph over the feature space, and assume the learner only observes the manipulated features instead of the original ones. We introduce the Strategic Littlestone Dimension, a new combinatorial measure that captures the joint complexity of the hypothesis class and the manipulation graph. We demonstrate that it characterizes the instance-optimal mistake bounds for deterministic learning algorithms in the realizable setting. We also achieve improved regret in the agnostic setting by a refined agnostic-to-realizable reduction that accounts for the additional challenge of not observing agents' original features. Finally, we relax the assumption that the learner knows the manipulation graph, instead assuming their knowledge is captured by a family of graphs. We derive regret bounds in both the realizable setting where all agents manipulate according to the same graph within the graph family, and the agnostic setting where the manipulation graphs are chosen adversarially and not consistently modeled by a single graph in the family.	
翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# 公正なグラフニューラルネットワークの再バランス Rethinking Fair Graph Neural Networks from Re-balancing ( http://arxiv.org/abs/2407.11624v1 ) ライセンス: Link先を確認	Zhixun Li, Yushun Dong, Qiang Liu, Jeffrey Xu Yu,	(参考訳) グラフニューラルネットワーク(GNN)の強力な表現能力によって駆動される、豊富なGNNモデルは、多くの現実世界のアプリケーションに広くデプロイされている。それにもかかわらず、異なる人口集団間の分布格差により、ハイテイクな意思決定システムにおける公正さが注目されている。 GNNの公正性を向上し、大きな成功を収めるために、近年の多くの研究が続けられているが、それらはすべて、大きなアーキテクチャ変更や、より過度なパラメータチューニングを必要とする追加の損失関数を必要としている。驚いたことに、単純な再分散手法は、既存の公正なGNN手法と簡単に一致したり、超えたりすることができる。我々は、異なる人口集団間の不均衡が重要な不公平の原因であり、その結果、各グループからパラメータの更新に対する不均衡な貢献をもたらすと主張している。しかし、これらの単純な再バランス手法には、トレーニング中に独自の欠点がある。本稿では,グループバランスによるGNNの不公平さを軽減するために,再バランシングによるFairGB,Fair Graph Neural Networkを提案する。技術的には、FairGBは2つのモジュールで構成されている。まず、ドメイン間とクラス間の対物対を選択し、ego-networksを補間して新しいサンプルを生成する。分析によって、因果的視点でモデルの偏りのメカニズムを明らかにすることができ、我々の戦略がターゲットラベルから統計的に独立していることを示すことができる。第2に、勾配に応じて各群の貢献を再考する。これら2つのモジュールを組み合わせることで、相互に促進することができる。ベンチマークによる実験結果から,本手法は有効性と公平性の両方に関して,最先端の結果が得られることが示された。コードはhttps://github.com/ZhixunLEE/FairGBで入手できる。 Driven by the powerful representation ability of Graph Neural Networks (GNNs), plentiful GNN models have been widely deployed in many real-world applications. Nevertheless, due to distribution disparities between different demographic groups, fairness in high-stake decision-making systems is receiving increasing attention. Although lots of recent works devoted to improving the fairness of GNNs and achieved considerable success, they all require significant architectural changes or additional loss functions requiring more hyper-parameter tuning. Surprisingly, we find that simple re-balancing methods can easily match or surpass existing fair GNN methods. We claim that the imbalance across different demographic groups is a significant source of unfairness, resulting in imbalanced contributions from each group to the parameters updating. However, these simple re-balancing methods have their own shortcomings during training. 
In this paper, we propose FairGB, Fair Graph Neural Network via re-Balancing, which mitigates the unfairness of GNNs by group balancing. Technically, FairGB consists of two modules: counterfactual node mixup and contribution alignment loss. First, we select counterfactual pairs across domains and across classes, and interpolate their ego-networks to generate new samples. Guided by analysis, we reveal the debiasing mechanism of our model from a causal view and prove that our strategy can make sensitive attributes statistically independent of target labels. Second, we reweigh the contribution of each group according to gradients. The two modules reinforce each other when combined. Experimental results on benchmark datasets show that our method can achieve state-of-the-art results on both utility and fairness metrics. Code is available at https://github.com/ZhixunLEE/FairGB.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
# 目による検証の注意:散布図における線形トレンドの視覚的検証 Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots ( http://arxiv.org/abs/2407.11625v1 ) ライセンス: Link先を確認	Daniel Braun, Remco Chang, Michael Gleicher, Tatiana von Landesberger,	(参考訳) 散布図における回帰モデルの視覚的検証は、モデル品質を評価する一般的なプラクティスであるが、その有効性はいまだ定量化されていない。線形回帰モデル(線形傾向)を視覚的に検証する個人の能力を調べ、一般的な可視化デザインが検証品質に与える影響を検討するために、実証実験を2回行った。最初の実験では、傾きの視覚的推定(線をデータに合わせる)の精度は、傾きの視覚的検証(示された線を受理する)よりも高いことがわかった。特に、両方のケースで「急すぎる」傾きへのバイアスが見つかった。これは、参加者が一般的な垂直距離(OLS回帰)ではなく、点と線の間の直交距離(すなわちODR回帰)で回帰を自然に評価する、という新たな洞察につながった。第2の実験では,回帰の可視化における一般的なデザイン(エラー線,バウンディングボックス,信頼区間)を導入することで,視覚的検証が向上するかどうかを検討した。エラー線は検証バイアスを減らしたが、結果はどのデザインにも望ましい精度の向上を示さなかった。以上の結果から,散布図の線形傾向に対する視覚的モデル検証の利用には注意が必要であることが示唆された。 Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the level of accuracy for visual estimation of slope (i.e., fitting a line to data) is higher than for visual validation of slope (i.e., accepting a shown line). Notably, we found bias toward slopes that are "too steep" in both cases. This led to the novel insight that participants naturally assessed regression with orthogonal distances between the points and the line (i.e., ODR regression) rather than the common vertical distances (OLS regression). In the second experiment, we investigated whether incorporating common designs for regression visualization (error lines, bounding boxes, and confidence intervals) would improve visual validation. Even though error lines reduced validation bias, results failed to show the desired improvements in accuracy for any design. Overall, our findings suggest caution in using visual model validation for linear trends in scatterplots.	
翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
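The contrast drawn in the abstract above, vertical distances (OLS) versus orthogonal distances (ODR), can be made concrete with a small sketch. It uses the standard closed-form slopes for a straight-line fit; the function names and the toy data are illustrative assumptions and are not taken from the paper's experiments.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy): centered sums of squares and cross-products."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ols_slope(xs, ys):
    """Slope minimizing squared vertical distances (ordinary least squares)."""
    sxx, _, sxy = _centered_sums(xs, ys)
    return sxy / sxx


def odr_slope(xs, ys):
    """Slope minimizing squared orthogonal distances (total least squares).

    Closed form for a straight line; requires a non-zero covariance (sxy != 0).
    """
    sxx, syy, sxy = _centered_sums(xs, ys)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


# Toy data (made up for illustration): a noisy upward trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.8, 2.6, 2.9, 4.2]
slope_vertical = ols_slope(xs, ys)
slope_orthogonal = odr_slope(xs, ys)
```

For noisy data the orthogonal fit is never shallower in magnitude than the OLS fit, which is consistent with the "too steep" bias the abstract reports for judgments made by eye; on perfectly collinear points the two slopes coincide.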
# 動的次元ラッピング(DDW: Dynamic Dimension Wrapping)アルゴリズム:動的多次元空間における効率的なクロス次元探索のための新しいアプローチ Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces ( http://arxiv.org/abs/2407.11626v1 ) ライセンス: Link先を確認	Dongnan Jin, Yali Liu, Qiuzhi Song, Xunju Ma, Yue Liu, Dehao Wu,	(参考訳) 現実の世界では、最適化問題の複雑さが増大し続けており、より効率的な最適化方法の研究が急務である。現在の最適化アルゴリズムは、一定次元の問題を解くのに優れている。しかし、動的多次元空間の探索における効率性は不十分である。次元の異なる多次元空間におけるクロス次元探索の課題に対して,本研究では,新しい最適化アルゴリズムである動的次元ラッピング(DDW)アルゴリズムを提案する。まず、動的時間ウォーピング(DTW)アルゴリズムとユークリッド距離を利用して、次元の異なる時系列間のマッピング関係を確立することにより、次元の動的多次元空間に適した適合関数を作成する。さらに、DDWは動的多次元空間に対してより効率的で効率的なクロス次元探索機構を導入している。最後に、動的多次元空間探索における31の最適化アルゴリズムを用いた比較試験により、DDWは優れた探索効率を示し、実際の最適解に最も近い検索結果を提供することを示した。 In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search in multi-dimensional spaces with varying numbers of dimensions, this study proposes a new optimization algorithm-Dynamic Dimension Wrapping (DDW) algorithm. Firstly, by utilizing the Dynamic Time Warping (DTW) algorithm and Euclidean distance, a mapping relationship between different time series across dimensions is established, thus creating a fitness function suitable for dimensionally dynamic multi-dimensional space. Additionally, DDW introduces a novel, more efficient cross-dimensional search mechanism for dynamic multidimensional spaces. Finally, through comparative tests with 31 optimization algorithms in dynamic multidimensional space search, the results demonstrate that DDW exhibits outstanding search efficiency and provides search results closest to the actual optimal solution.	翻訳日:2024-07-17 15:23:07 公開日:2024-07-16
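The abstract above builds its fitness function on Dynamic Time Warping (DTW), which aligns sequences of different lengths. The sketch below shows only the textbook DTW recurrence for scalar sequences, with the absolute difference (the one-dimensional Euclidean distance) as local cost; how DDW turns such a distance into a fitness function is not specified in the abstract, so nothing here should be read as the authors' implementation.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW cost between two numeric sequences.

    cost[i][j] holds the minimal accumulated cost of aligning the first i
    elements of a with the first j elements of b.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because the warping path may repeat elements, sequences such as `[1, 2, 3]` and `[1, 2, 2, 3]` align at zero cost even though their lengths differ, which is the property that makes DTW usable across candidate solutions of different dimensionality.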
# 離散時間量子ウォーク:グラフ表現のための量子アドバンテージ Discrete-Time Quantum Walks: A Quantum Advantage for Graph Representation ( http://arxiv.org/abs/2407.11630v1 ) ライセンス: Link先を確認	Boxuan Ai,	(参考訳) 本稿では,離散時間量子ウォークをグラフ埋め込み技術に変換し,グラフ表現手法の新たな視点を提供する新しい手法を提案する。数学的操作により,本稿のアプローチは複雑なグラフトポロジをヒルベルト空間に巧みにマッピングし,グラフ解析の有効性を大幅に向上させ,高度な量子機械学習タスクへの道を開く。この開発は量子コンピューティングとグラフ理論の交差点に革命をもたらすことを約束し、グラフコンピューティングとネットワーク科学への量子アルゴリズムの適用における新たなフロンティアを切り開くものである。 This paper presents a novel methodology that transforms discrete-time quantum walks into a graph embedding technique, offering a fresh perspective on graph representation methods. Through mathematical manipulations, the approach of this paper adeptly maps intricate graph topologies into the Hilbert space, which significantly enhances the efficacy of graph analysis and paves the way for sophisticated quantum machine learning tasks. This development promises to revolutionize the intersection of quantum computing and graph theory, charting new frontiers in the application of quantum algorithms to graph computing and network science.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16
# 拡散変換器の160億パラメータへのスケーリング Scaling Diffusion Transformers to 16 Billion Parameters ( http://arxiv.org/abs/2407.11633v1 ) ライセンス: Link先を確認	Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang,	(参考訳) 本稿では,拡散変換器のスパースバージョンであるDiT-MoEについて述べる。DiT-MoEはスケーラブルで高密度ネットワークと競合する性能を持ち、高度に最適化された推論を示す。 DiT-MoEには、共有専門家ルーティングと専門家レベルのバランス損失という2つのシンプルな設計が含まれており、共通知識を捉え、ルーティングされる専門家間の冗長性を低減する。条件付き画像生成に適用した場合、専門家の専門化を深く分析すると、興味深い結果が得られる。 (i) 専門家の選択は、空間的位置とデノイジングの時間ステップに対する嗜好を示す一方で、異なるクラス条件情報には鈍感である。 (ii) MoE層が深くなるにつれて、専門家の選択は徐々に、特定の空間的位置から分散とバランスへと変化していく。 (iii) 専門家の専門化は、初期の時間ステップでより集中し、半分を過ぎると徐々に一様になる傾向にある。我々はこれを、まず低周波の空間情報をモデル化し、次に高周波の複雑な情報をモデル化する拡散過程に起因すると考える。上記のガイダンスに基づき、一連のDiT-MoEは、高密度ネットワークと同等の性能を実験的に達成するが、推論時の計算負荷ははるかに少ない。さらに、合成画像データを用いてDiT-MoEの可能性を示し、拡散モデルを16.5Bパラメータまでスケーリングして、512$\times$512の解像度設定で新しいSoTA FID-50Kスコア1.80を達成した。プロジェクトページ:https://github.com/feizc/DiT-MoE。 In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer, that is scalable and competitive with dense networks while exhibiting highly optimized inference. The DiT-MoE includes two simple designs: shared expert routing and expert-level balance loss, thereby capturing common knowledge and reducing redundancy among the different routed experts. When applied to conditional image generation, a deep analysis of expert specialization yields some interesting observations: (i) Expert selection shows a preference for spatial position and denoising time step, while being insensitive to different class-conditional information; (ii) As the MoE layers go deeper, the selection of experts gradually shifts from specific spatial position to dispersion and balance; (iii) Expert specialization tends to be more concentrated at the early time step and then gradually uniform after half. We attribute it to the diffusion process that first models the low-frequency spatial information and then high-frequency complex information. Based on the above guidance, a series of DiT-MoE experimentally achieves performance on par with dense networks yet requires much less computational load during inference. More encouragingly, we demonstrate the potential of DiT-MoE with synthesized image data, scaling the diffusion model to 16.5B parameters and attaining a new SoTA FID-50K score of 1.80 in 512$\times$512 resolution settings. The project page: https://github.com/feizc/DiT-MoE.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16
# REMM:End-to-End Multimodal Image Matchingのための回転同変フレームワーク REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching ( http://arxiv.org/abs/2407.11637v1 ) ライセンス: Link先を確認	Han Nie, Bin Luo, Jun Liu, Zhitao Fu, Weixing Liu, Xin Su,	(参考訳) 提案するREMMは、エンドツーエンドのマルチモーダル画像マッチングのための回転同変フレームワークであり、マッチングパイプライン全体でディスクリプタの回転差を完全にエンコードする。従来の学習に基づく手法は主にモーダル不変な記述子を抽出することに焦点を当て、回転不変性を一貫して無視していた。本稿では,マルチモーダル特徴学習モジュールと循環シフトモジュールからなるREMMが、マルチモーダル画像マッチングに非常に有用であることを示す。まず、マルチモーダル特徴学習モジュールを通してモーダル不変の特徴を学習する。そして, 循環シフトモジュールを設計してディスクリプタを回転的に符号化し, 回転同変マッチングの性能を大幅に向上させ, 任意の角度に対して頑健にする。提案手法を検証するため,4つの公開データセットからのマルチアングル変換とマルチスケール変換を組み合わせた,マルチモーダル画像の耐回転性能を評価するための総合的な回転・スケールマッチングベンチマークを構築した。大規模な実験により,本手法はベンチマークにおいて既存手法よりも優れ,独立したデータセットにもよく汎化することが示された。さらに、循環シフトモジュールによる改善を検証するため、REMMのキーコンポーネントの詳細な分析を行った。コードとデータセットはhttps://github.com/HanNieWHU/REMM。 We present REMM, a rotation-equivariant framework for end-to-end multimodal image matching, which fully encodes rotational differences of descriptors in the whole matching pipeline. Previous learning-based methods mainly focus on extracting modal-invariant descriptors, while consistently ignoring the rotational invariance. In this paper, we demonstrate that our REMM, which comprises a multimodal feature learning module and a cyclic shift module, is very useful for multimodal image matching. We first learn modal-invariant features through the multimodal feature learning module. Then, we design the cyclic shift module to rotationally encode the descriptors, greatly improving the performance of rotation-equivariant matching, which makes them robust to any angle. To validate our method, we establish a comprehensive rotation and scale-matching benchmark for evaluating the anti-rotation performance of multimodal images, which contains a combination of multi-angle and multi-scale transformations from four publicly available datasets. Extensive experiments show that our method outperforms existing methods in benchmarking and generalizes well to independent datasets. Additionally, we conducted an in-depth analysis of the key components of the REMM to validate the improvements brought about by the cyclic shift module. Code and dataset at https://github.com/HanNieWHU/REMM.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16
# 時間イベント予測における大規模言語モデルの包括的評価 A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting ( http://arxiv.org/abs/2407.11638v1 ) ライセンス: Link先を確認	He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua,	(参考訳) 近年,Large Language Models (LLM) は知識質問応答,数学的推論,常識推論など,様々なデータマイニングタスクにおいて大きな可能性を示している。しかし, 時間的事象予測におけるLLMの推論能力は十分に探求されていない。時間的事象予測におけるそれらの能力を体系的に検討するため,時間的事象予測のためのLLMに基づく手法を包括的に評価する。グラフデータとテキストデータの両方を含む高品質なデータセットがないため、私たちはまず、MidEast-TE-miniというベンチマークデータセットを構築した。このデータセットに基づいて,様々な入力形式と検索拡張生成(RAG)モジュールを特徴とする一連のベースライン手法を設計する。大規模な実験から,LLMの入力に生テキストを直接統合しても,ゼロショット外挿性能は向上しないことがわかった。対照的に、特定の複雑なイベントに生テキストを組み込み、LLMをファインチューニングすることで、性能が大幅に向上する。さらに、検索モジュールによって強化されたLLMは、歴史的事象に隠された時間的関係パターンを効果的に捉えることができる。一方、LLMでは、特にRAGベースの手法において、人気バイアスやロングテール問題などの問題が依然として残っている。これらの知見は, LLMに基づく事象予測手法の理解を深めるだけでなく, いくつかの有望な研究の方向性も浮き彫りにしている。我々は、この包括的評価と特定された研究機会が、LLMによる時間的事象予測の今後の研究に大きく貢献すると考えている。 Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluation of LLM-based methods for temporal event forecasting. Due to the lack of a high-quality dataset that involves both graph and textual data, we first construct a benchmark dataset, named MidEast-TE-mini. Based on this dataset, we design a series of baseline methods, characterized by various input formats and retrieval-augmented generation (RAG) modules. From extensive experiments, we find that directly integrating raw texts into the input of LLMs does not enhance zero-shot extrapolation performance. In contrast, incorporating raw texts in specific complex events and fine-tuning LLMs significantly improves performance. Moreover, enhanced with retrieval modules, LLMs can effectively capture temporal relational patterns hidden in historical events. Meanwhile, issues such as popularity bias and the long-tail problem still persist in LLMs, particularly in the RAG-based method. These findings not only deepen our understanding of LLM-based event forecasting methods but also highlight several promising research directions. We consider that this comprehensive evaluation, along with the identified research opportunities, will significantly contribute to future research on temporal event forecasting through LLMs.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16
# 準粒子間の破壊的干渉による長距離相互作用スピン鎖の絡み合いの緩やかな成長 The Slow Growth of Entanglement in Long-range Interacting Spins Chains due to Destructive Interference between quasi-Particles ( http://arxiv.org/abs/2407.11639v1 ) ライセンス: Link先を確認	Peyman Azodi, Herschel A. Rabitz,	(参考訳) 長距離相互作用量子系における有効光円錐の持続性は、依然として重要かつ興味深い難問である。本稿では,低温のハイゼンベルクスピン鎖において、クエンチのシナリオでエンタングル効果が生じ始めているにもかかわらず、部分系がエンタングルメントに抵抗する機構を理論的に明らかにする。この機構は、有効光円錐の外側の準粒子間の破壊的干渉に起因する。さらに, 破壊的干渉現象が発生するための必要条件を定める。 1次元のべき乗則減衰相互作用、具体的には $\|d_{ij}\|^{-p}$ に従う相互作用では、$p>2$のときに破壊的干渉が起こることを示す。理論的、数値的な解析で示すように、この破壊的干渉により、相互作用範囲が打ち切られたときに、予期せぬエンタングルメント伝播の加速が生じる。この予測は、破壊的干渉効果の直接的な現れとして、捕捉イオン鎖で実験的に観測可能であると提案されている。 The persistence of effective light cones in long-range interacting quantum systems remains a significant and intriguing conundrum. In this paper, we theoretically reveal a mechanism in Heisenberg spin chains at low temperatures, wherein subsystems resist entanglement despite the onset of entangling effects in quench scenarios. This mechanism is attributed to destructive interference among quasi-particles outside an effective light cone. Furthermore, we establish a necessary condition for the occurrence of the destructive interference phenomenon. We demonstrate that for 1-D power-law decay interactions, specifically those following $\|d_{ij}\|^{-p}$, destructive interference occurs when $p>2$. As shown through theoretical and numerical analysis, this destructive interference results in an unexpected acceleration of entanglement propagation when the interaction range is truncated. This prediction is proposed to be observable experimentally in chains of trapped ions as a direct manifestation of the destructive interference effect.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16
# 低Q円筒ナノキャビティに結合したInAs/GaAs量子ドットからのPurcell増強単一光子放出 Purcell-enhanced single-photon emission from InAs/GaAs quantum dots coupled to low-Q cylindrical nanocavities ( http://arxiv.org/abs/2407.11642v1 ) ライセンス: Link先を確認	Abhiroop Chellu, Subhajit Bej, Hanna Wahl, Hermann Kahle, Topi Uusitalo, Roosa Hytönen, Heikki Rekola, Jouko Lang, Eva Schöll, Lukas Hanschke, Patricia Kallert, Tobias Kipp, Christian Strelow, Marjukka Tuominen, Klaus D. Jöns, Petri Karvinen, Tapio Niemi, Mircea Guina, Teemu Hakkarainen,	(参考訳) 単一光子およびもつれ光子の生成は、フォトニック量子情報処理に不可欠である。量子ドット(QD)は、高品質な量子光状態をオンデマンドで生成できる有望な光源であるが、その生成レートは通常、自然放射寿命によって制限される。本研究では,Purcell効果を利用して,InAs QDの放出レートを最大38倍に向上させる。個々のQDを、モード体積 $4.5\times10^{-4}\,(\lambda/n)^3$ と品質係数62を特徴とする金属クラッドGaAsナノピラーに結合することにより, 15 nmの帯域幅にわたって高いPurcell増強を実現する。これらのキャビティ内のQDから、0.5%という低い多光子放出確率を測定した。この研究は、ナノスケールでの光-物質相互作用を探索するための貴重なプラットフォームを提供するだけでなく、高い繰り返しレートで発光するQD光源の開発に向けた大きな一歩であり、量子通信と量子コンピューティングの将来におけるその役割を支えるものである。 Generation of single and entangled photons is crucial for photonic quantum information processing. Quantum dots (QDs) are promising sources capable of producing high-quality quantum light states on demand, although at a rate typically limited by their spontaneous radiative lifetime. In this study, we utilize the Purcell effect to demonstrate up to a 38-fold enhancement in the emission rate of InAs QDs. We achieve this by coupling individual QDs to metal-clad GaAs nanopillars characterized by a mode volume of $4.5\times10^{-4}\,(\lambda/n)^3$ and a quality factor of 62, consequently enabling high Purcell enhancement across a bandwidth of 15 nm. We measure a multi-photon emission probability as low as 0.5 % from QDs within these cavities. In addition to providing a valuable platform for exploring light-matter interaction at the nanoscale, this work represents a significant stride towards developing QD-sources emitting at high repetition rates, underpinning their role in the future of quantum communication and computing.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16
# パーセプションは計画を支援する: 二重エッジ構造によるマルチステージレーン・レベル統合の実現 Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures ( http://arxiv.org/abs/2407.11644v1 ) ライセンス: Link先を確認	Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang,	(参考訳) 自動運転の計画においては、車線、交差点、交通規制、動的エージェントといった重要な交通要素を考慮することが不可欠である。しかし、これらは従来のエンドツーエンドの計画手法では見落とされがちであり、非効率性や交通規制の不遵守につながる可能性が高い。本研究は,これらの要素の認識を計画タスクに統合することを試みる。そこで我々は,車線レベルの計画と認識を調和させる新しいフレームワークであるPerception Helps Planning (PHP)を提案する。この統合により、計画が本質的に交通の制約に沿うことが保証され、安全で効率的な運転が容易になる。具体的には、PHPは計画と認識の両方の目的で車線の両端に注目し、両方の車線エッジの3D位置と、車線交差点、車線方向、車線占有率、計画に関する属性を考慮する。アルゴリズム設計では、まずTransformerがマルチカメラ画像を符号化して上記の特徴を抽出し、車線レベルの認識結果を予測する。次に、階層的な特徴早期融合モジュールが、計画属性を予測するために特徴を洗練する。最後に、ダブルエッジインタプリタが、車線レベルの認識と計画情報を統合するために特別に設計されたレイトフュージョンプロセスを利用して、車両制御信号を生成する。 3つのCarlaベンチマークの実験では、既存のアルゴリズムに対して運転スコアがそれぞれ27.20%、33.47%、15.54%と大幅に改善され、最先端の性能を達成し、システムは最大22.57 FPSで動作する。 When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, they are often overlooked by traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration ensures that planning is inherently aligned with traffic constraints, thus facilitating safe and efficient driving. Specifically, PHP focuses on both edges of a lane for planning and perception purposes, taking into consideration the 3D positions of both lane edges and attributes for lane intersections, lane directions, lane occupancy, and planning. In the algorithmic design, the process begins with the transformer encoding multi-camera images to extract the above features and predicting lane-level perception results. Next, the hierarchical feature early fusion module refines the features for predicting planning attributes. Finally, the double-edge interpreter utilizes a late-fusion process specifically designed to integrate lane-level perception and planning information, culminating in the generation of vehicle control signals. Experiments on three Carla benchmarks show significant improvements in driving score of 27.20%, 33.47%, and 15.54% over existing algorithms, respectively, achieving the state-of-the-art performance, with the system operating up to 22.57 FPS.	翻訳日:2024-07-17 15:11:54 公開日:2024-07-16

# メラノーマ検出のためのハイブリッドディープラーニングフレームワーク

Hybrid Deep Learning Framework for Enhanced Melanoma Detection ( http://arxiv.org/abs/2408.00772v1 )

ライセンス: Link先を確認

Peng Zhang, Divya Chaudhary,

(参考訳) がんは世界中で主要な死因であり、早期発見と治療技術の進歩を必要としている。本稿では,セグメンテーションにおけるU-Netの長所と皮膚画像の分類におけるEfficientNetの長所を相乗的に組み合わせた,新規で高効率なメラノーマ検出フレームワークを提案する。本研究の目的は, メラノーマ検出の精度と効率を, 革新的なハイブリッドアプローチにより向上させることである。我々は、HAM10000データセットを使用してU-Netモデルを綿密にトレーニングし、癌領域を正確にセグメント化できるようにした。同時に,ISIC 2020データセットを用いてEfficientNetモデルをトレーニングし,皮膚がんの二値分類に最適化した。我々のハイブリッドモデルは、ISIC 2020データセットで99.01%という高い精度を達成し、性能を大きく向上させる。この結果は、既存のモデル構造と比較して、我々のアプローチの優位性を示している。 U-Netの正確なセグメンテーション機能とEfficientNetの高度な分類能力を統合することで、我々のフレームワークはメラノーマ検出のための包括的なソリューションを提供する。大規模な実験の結果は,セグメンテーションタスクと分類タスクの両方において,提案手法の高い精度と信頼性を示している。これは、我々のハイブリッドアプローチががん検出を大きく向上させる可能性を示しており、悪性黒色腫の早期診断と治療において医療従事者にとって堅牢なツールとなる。われわれのフレームワークは、皮膚がん自動検出の分野で新しいベンチマークを設定でき、この重要な医療画像領域におけるさらなる研究と開発を奨励できると信じている。

Cancer is a leading cause of death worldwide, necessitating advancements in early detection and treatment technologies. In this paper, we present a novel and highly efficient melanoma detection framework that synergistically combines the strengths of U-Net for segmentation and EfficientNet for the classification of skin images. The primary objective of our study is to enhance the accuracy and efficiency of melanoma detection through an innovative hybrid approach. We utilized the HAM10000 dataset to meticulously train the U-Net model, enabling it to precisely segment cancerous regions. Concurrently, we employed the ISIC 2020 dataset to train the EfficientNet model, optimizing it for the binary classification of skin cancer. Our hybrid model demonstrates a significant improvement in performance, achieving a remarkable accuracy of 99.01% on the ISIC 2020 dataset. This exceptional result underscores the superiority of our approach compared to existing model structures. By integrating the precise segmentation capabilities of U-Net with the advanced classification prowess of EfficientNet, our framework offers a comprehensive solution for melanoma detection. The results of our extensive experiments highlight the high accuracy and reliability of our method in both segmentation and classification tasks. This indicates the potential of our hybrid approach to significantly enhance cancer detection, providing a robust tool for medical professionals in the early diagnosis and treatment of melanoma. We believe that our framework can set a new benchmark in the field of automated skin cancer detection, encouraging further research and development in this crucial area of medical imaging.

翻訳日:2024-08-19 05:28:21 公開日:2024-07-16

# K平均クラスタリングに基づく色抽出によるWebサイトの視覚分析のためのファジィ論理手法

Fuzzy Logic Approach For Visual Analysis Of Websites With K-means Clustering-based Color Extraction ( http://arxiv.org/abs/2408.00774v1 )

ライセンス: Link先を確認

Tamiris Abildayeva, Pakizar Shamoi,

(参考訳) ウェブサイトはインターネットの基礎を形成し、情報を広め、デジタルリソースにアクセスするためのプラットフォームとして機能する。ユーザーは幅広いコンテンツやサービスにアクセスできるようになり、インターネットの利便性が向上する。ウェブサイトの美学は、全体的な効果において重要な役割を担い、ユーザー体験、エンゲージメント、満足度に大きな影響を与えます。本稿では,世界中のインターネット利用者の増加を踏まえ,Webサイトデザインの美学がユーザエクスペリエンスの向上に重要であることを考察する。これは、しばしば50ミリ秒以内に形成される最初の印象が、ウェブサイトの魅力とユーザビリティに対するユーザの認識に重大な影響を与えることを強調している。本稿では、ファジィ論理を用いて、色調和とフォント人気に基づいてウェブサイトの美意識を測定する新しい手法を提案する。私たちは、Webデザイントレンドの動的性質に対する妥当性と適応性を確保するために、200近い人気で頻繁に使用されるWebサイトデザインからなる、独自のデータセットを収集しました。ウェブサイトのスクリーンショットから、k-meansクラスタリングを用いて、支配的な色を抽出した。本研究の目的は,Webサイトデザインにおける美学とユーザビリティの関係の理解を深めることである。

Websites form the foundation of the Internet, serving as platforms for disseminating information and accessing digital resources. They allow users to engage with a wide range of content and services, enhancing the Internet's utility for all. The aesthetics of a website play a crucial role in its overall effectiveness and can significantly impact user experience, engagement, and satisfaction. This paper examines the importance of website design aesthetics in enhancing user experience, given the increasing number of internet users worldwide. It emphasizes the significant impact of first impressions, often formed within 50 milliseconds, on users' perceptions of a website's appeal and usability. We introduce a novel method for measuring website aesthetics based on color harmony and font popularity, using fuzzy logic to predict aesthetic preferences. We collected our own dataset, consisting of nearly 200 popular and frequently used website designs, to ensure relevance and adaptability to the dynamic nature of web design trends. Dominant colors from website screenshots were extracted using k-means clustering. The findings aim to improve understanding of the relationship between aesthetics and usability in website design.

翻訳日:2024-08-19 05:28:21 公開日:2024-07-16
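
The color-extraction step mentioned in the abstract above (dominant colors taken from website screenshots with k-means clustering) can be illustrated with a minimal sketch. This is not the authors' code: the function name `dominant_colors`, the seeding strategy, and the representation of a screenshot as a plain list of RGB tuples are all assumptions made for illustration, and decoding an actual image file is left out.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(pixels, centroids):
    """Group every pixel under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in pixels:
        nearest = min(range(len(centroids)), key=lambda i: _sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def dominant_colors(pixels, k, iterations=20, seed=0):
    """Return [(rgb, share)] sorted by share, using plain k-means.

    Hypothetical helper: `pixels` is a list of (r, g, b) tuples and k must not
    exceed the number of distinct colors (random.sample raises otherwise).
    """
    rng = random.Random(seed)
    # Assumed initialisation: k distinct colors picked at random.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = _assign(pixels, centroids)
        updated = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    clusters = _assign(pixels, centroids)
    ranked = sorted(zip(centroids, clusters), key=lambda pair: len(pair[1]), reverse=True)
    return [(tuple(round(ch) for ch in c), len(m) / len(pixels)) for c, m in ranked]


if __name__ == "__main__":
    # Toy "screenshot": six red pixels and two blue ones.
    toy = [(255, 0, 0)] * 6 + [(0, 0, 255)] * 2
    print(dominant_colors(toy, k=2))
```

Plain k-means can settle in a local optimum on noisy pixel data, so a production pipeline would typically rely on a more careful initialisation; that choice is outside what the abstract specifies.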

# 多スケール偏微分方程式に対する拡張畳み込みニューラル作用素

Dilated convolution neural operator for multiscale partial differential equations ( http://arxiv.org/abs/2408.00775v1 )

ライセンス: Link先を確認

Bo Xu, Xinliang Liu, Lei Zhang,

(参考訳) 本稿では,特に高周波情報の保持に重点を置いた,多スケール偏微分方程式に対するデータ駆動型演算子学習法を提案する。低ランクなグローバル基底(低周波フーリエモードなど)と粗いパッチ上の局所化された基底(拡張畳み込みに類似)の組み合わせによるマルチスケールのパラメータ化解の表現から着想を得て、Dilated Convolutional Neural Operator (DCNO)を提案する。 DCNOアーキテクチャは、畳み込み層とフーリエ層を組み合わせることで低い計算コストを維持しながら、高周波と低周波の両方の特徴を効果的に捉える。我々は,多スケール楕円型方程式,その逆問題,ナビエ・ストークス方程式,ヘルムホルツ方程式など,様々なデータセット上でDCNOの性能を評価する実験を行った。我々は,DCNOが精度と計算コストの最適なバランスをとることを示し,マルチスケール演算子学習に有望なソリューションを提供する。

This paper introduces a data-driven operator learning method for multiscale partial differential equations, with a particular emphasis on preserving high-frequency information. Drawing inspiration from the representation of multiscale parameterized solutions as a combination of low-rank global bases (such as low-frequency Fourier modes) and localized bases over coarse patches (analogous to dilated convolution), we propose the Dilated Convolutional Neural Operator (DCNO). The DCNO architecture effectively captures both high-frequency and low-frequency features while maintaining a low computational cost through a combination of convolution and Fourier layers. We conduct experiments to evaluate the performance of DCNO on various datasets, including the multiscale elliptic equation, its inverse problem, Navier-Stokes equation, and Helmholtz equation. We show that DCNO strikes an optimal balance between accuracy and computational cost and offers a promising solution for multiscale operator learning.

翻訳日:2024-08-19 05:28:21 公開日:2024-07-16
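
For readers unfamiliar with the operation DCNO is named after, the following minimal sketch shows a one-dimensional dilated convolution. It illustrates only the generic operation (in the cross-correlation form used by most deep learning libraries, without padding), not the DCNO architecture or its Fourier layers; `dilated_conv1d` and its arguments are hypothetical names chosen for this example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Apply a kernel whose taps are spaced `dilation` samples apart.

    With dilation d, a kernel of length k covers (k - 1) * d + 1 input samples,
    so the receptive field grows without adding any weights.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * signal[start + j * dilation] for j, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


if __name__ == "__main__":
    ramp = list(range(8))
    # The same 3-tap difference kernel sees 3 samples, then 5 samples.
    print(dilated_conv1d(ramp, [1, 0, -1], dilation=1))
    print(dilated_conv1d(ramp, [1, 0, -1], dilation=2))
```

Stacking such layers with increasing dilation is a common way to cover several spatial scales with a small, fixed number of weights per layer.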

# CATD:脳波対fMRIのクロスモーダル生成のための統一表現学習

CATD: Unified Representation Learning for EEG-to-fMRI Cross-Modal Generation ( http://arxiv.org/abs/2408.00777v1 )

ライセンス: Link先を確認

Weiheng Yao, Shuqiang Wang,

(参考訳) マルチモーダル・ニューロイメージング分析は、異なるイメージング技術の統合を可能にし、個々のモダリティの限界を克服できるため、脳機能と病理の包括的理解に不可欠である。しかし、特定のモダリティは高コストで利用可能性も限られており、大きな課題となっている。これらの課題に対処するために,本稿では、ニューロイメージングのエンドツーエンドのクロスモーダル合成のためのCondition-Aligned Temporal Diffusion (CATD)フレームワークを提案し、より入手しやすい脳波(EEG)信号から、機能的磁気共鳴画像(fMRI)で検出される血中酸素レベル依存(BOLD)信号を生成できるようにする。 Conditionally Aligned Block (CAB)を構築することにより、異種のニューロイメージングはポテンシャル空間に整列され、ニューロイメージングにおけるクロスモーダル変換の基礎となる統一された表現が得られる。構築したDynamic Time-Frequency Segmentation (DTFS)モジュールと組み合わせることで、脳波信号を用いてBOLD信号の時間分解能を改善し、脳の動的な詳細をより良く捉えることができる。実験による検証により,神経活動予測の精度向上,異常脳領域の同定,BOLD信号の時間分解能の向上における本フレームワークの有効性が示された。提案フレームワークは、異種の神経画像データを潜在的表現空間に統一することで、ニューロイメージングのクロスモーダル合成のための新しいパラダイムを確立し、パーキンソン病予測の改善や異常脳領域の同定などの医学的応用に有望である。

Multi-modal neuroimaging analysis is crucial for a comprehensive understanding of brain function and pathology, as it allows for the integration of different imaging techniques, thus overcoming the limitations of individual modalities. However, the high costs and limited availability of certain modalities pose significant challenges. To address these issues, this paper proposes the Condition-Aligned Temporal Diffusion (CATD) framework for end-to-end cross-modal synthesis of neuroimaging, enabling the generation of functional magnetic resonance imaging (fMRI)-detected Blood Oxygen Level Dependent (BOLD) signals from more accessible Electroencephalography (EEG) signals. By constructing the Conditionally Aligned Block (CAB), heterogeneous neuroimages are aligned into a potential space, achieving a unified representation that provides the foundation for cross-modal transformation in neuroimaging. The combination with the constructed Dynamic Time-Frequency Segmentation (DTFS) module also enables the use of EEG signals to improve the temporal resolution of BOLD signals, thus augmenting the capture of the dynamic details of the brain. Experimental validation demonstrated the effectiveness of the framework in improving the accuracy of neural activity prediction, identifying abnormal brain regions, and enhancing the temporal resolution of BOLD signals. The proposed framework establishes a new paradigm for cross-modal synthesis of neuroimaging by unifying heterogeneous neuroimaging data into a potential representation space, showing promise in medical applications such as improving Parkinson's disease prediction and identifying abnormal brain regions.

翻訳日:2024-08-19 05:28:21 公開日:2024-07-16

# フロントエンド拡散: 抽象的から詳細なタスク遷移によるインテントベースのユーザインタフェースの探索

Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions ( http://arxiv.org/abs/2408.00778v1 )

ライセンス: Link先を確認

Qinshi Zhang, Latisha Besariani Hendra, Mohan Chi, Zijian Ding,

(参考訳) Generative AIの出現は、ユーザインタフェースにおいて、コマンドベースからインテントベースの結果仕様へのパラダイムシフトを引き起こしている。本稿では,抽象的なユーザの意図と具体的な実装のギャップを埋めることを目的として,フロントエンドコード生成の文脈における抽象から詳細へのタスク遷移を,インテントベースのユーザインタフェースへの一歩として検討する。本稿では,ユーザのスケッチから高品質なWebサイトを生成する,エンドツーエンドのLLMベースのツールであるFrontend Diffusionを紹介する。このシステムは、スケッチ、ライティング、コーディングという3段階のタスク遷移プロセスを採用している。複雑なタスクにおける人的介入やコミュニケーションコストを低減するタスク遷移の可能性を示す。我々の研究は、他のドメインで同様のアプローチを探求するための道を開き、ビデオ制作のようなより複雑で相互依存的なタスクにまで拡張できる可能性がある。

The emergence of Generative AI is catalyzing a paradigm shift in user interfaces from command-based to intent-based outcome specification. In this paper, we explore abstract-to-detailed task transitions in the context of frontend code generation as a step towards intent-based user interfaces, aiming to bridge the gap between abstract user intentions and concrete implementations. We introduce Frontend Diffusion, an end-to-end LLM-powered tool that generates high-quality websites from user sketches. The system employs a three-stage task transition process: sketching, writing, and coding. We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks. Our work also opens avenues for exploring similar approaches in other domains, potentially extending to more complex, interdependent tasks such as video production.

翻訳日:2024-08-19 05:28:21 公開日:2024-07-16

# 少数ショット画像分類のためのシャムトランスフォーマーネットワーク

Siamese Transformer Networks for Few-shot Image Classification ( http://arxiv.org/abs/2408.01427v1 )

ライセンス: Link先を確認

Weihao Jiang, Shuoxi Zhang, Kun He,

(参考訳) 人間は視覚分類タスクにおいて顕著な熟練度を示し、最小限の例で新しい画像を正確に認識し分類する。この能力は、細部に集中し、以前に見た画像と新しい画像の間で共通の特徴を識別する能力に起因している。対照的に、既存の少数ショット画像分類法は、大域的特徴と局所的特徴のいずれか一方を重視することが多く、両者の統合を考慮した研究はほとんどない。この制限に対処するため,Siamese Transformer Network (STN) に基づく新しいアプローチを提案する。提案手法では,事前学習した視覚変換器 (ViT) アーキテクチャを利用する2つの並列分岐ネットワークを用いて,大域的特徴と局所的特徴をそれぞれ抽出する。具体的には、ViT-Smallネットワークアーキテクチャを実装し、自己教師付き学習によって得られた事前学習モデルパラメータを用いて分岐ネットワークを初期化する。大域的特徴にはユークリッド距離尺度を適用し,局所的特徴にはKullback-Leibler (KL) ダイバージェンス尺度を適用する。 2つの尺度を統合するために、まずL2正規化を用い、次に正規化結果を重み付けして最終的な類似度スコアを得る。この戦略は、大域的特徴と局所的特徴の両方の利点を生かし、相補的なメリットを保証する。トレーニングフェーズでは、ネットワーク全体を微調整するメタラーニングアプローチを採用している。本戦略は, 少数ショット画像分類における大域的特徴と局所的特徴の可能性を効果的に活用し、複雑な特徴適応モジュールを不要にするとともに、モデルの汎化能力を高める。大規模な実験により、我々のフレームワークはシンプルかつ有効であり、5ショットと1ショットの両方のシナリオで、4つの代表的な少数ショット分類ベンチマークにおいて最先端のベースラインよりも優れた性能を実現することが示された。

Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples. This ability is attributed to their capacity to focus on details and identify common features between previously seen and new images. In contrast, existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both. To address this limitation, we propose a novel approach based on the Siamese Transformer Network (STN). Our method employs two parallel branch networks utilizing the pre-trained Vision Transformer (ViT) architecture to extract global and local features, respectively. Specifically, we implement the ViT-Small network architecture and initialize the branch networks with pre-trained model parameters obtained through self-supervised learning. We apply the Euclidean distance measure to the global features and the Kullback-Leibler (KL) divergence measure to the local features. To integrate the two metrics, we first employ L2 normalization and then weight the normalized results to obtain the final similarity score. This strategy leverages the advantages of both global and local features while ensuring their complementary benefits. During the training phase, we adopt a meta-learning approach to fine-tune the entire network. Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules and enhancing the model's generalization ability. Extensive experiments demonstrate that our framework is simple yet effective, achieving superior performance compared to state-of-the-art baselines on four popular few-shot classification benchmarks in both 5-shot and 1-shot scenarios.

翻訳日:2024-08-19 05:08:48 公開日:2024-07-16
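
The metric fusion described in the abstract above (Euclidean distance on global features, KL divergence on local features, L2 normalization, then a weighted combination) can be sketched as follows. This is one possible reading under stated assumptions, not the authors' implementation: the abstract does not say what is normalized or how the weight is chosen, so the sketch assumes that each vector of per-class distances is L2-normalized, that local features are already probability distributions, and that a single hypothetical `weight` in [0, 1] balances the two terms.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two global feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def l2_normalize(values):
    """Scale a list of distances to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values] if norm > 0 else list(values)


def combined_scores(query_global, query_local, class_globals, class_locals, weight=0.5):
    """Return one similarity score per class; higher means more similar.

    Assumption: distances are negated after weighting so that the class with
    the largest score is the predicted one.
    """
    d_global = l2_normalize([euclidean(query_global, g) for g in class_globals])
    d_local = l2_normalize([kl_divergence(query_local, l) for l in class_locals])
    return [-(weight * dg + (1 - weight) * dl) for dg, dl in zip(d_global, d_local)]


if __name__ == "__main__":
    scores = combined_scores(
        query_global=[1.0, 0.0],
        query_local=[0.9, 0.1],
        class_globals=[[1.0, 0.0], [0.0, 1.0]],
        class_locals=[[0.9, 0.1], [0.1, 0.9]],
    )
    predicted = max(range(len(scores)), key=scores.__getitem__)
    print(scores, predicted)
```

Normalizing each distance vector before weighting keeps either metric from dominating simply because of its scale, which is one plausible motivation for the normalization step the abstract mentions.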

# レイテンシ最適化ディープニューラルネットワーク(DNNs):チップ上のマルチプロセッサシステム(MPSoC)を用いたエッジでの人工知能アプローチ

Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC) ( http://arxiv.org/abs/2407.18264v1 )

ライセンス: Link先を確認

Seyed Nima Omidsajedi, Rekha Reddy, Jianming Yi, Jan Herbst, Christoph Lipps, Hans Dieter Schotten,

(参考訳) 6G通信システムから自律運転プラットフォームに至るまで、計算に大きく依存するほとんどすべてのアプリケーションにおいて、計算の大部分はクライアント側の近くで行われるべきである。モバイルデバイスにおけるエッジコンピューティング(AI at Edge)は、この要件に対処するための最適化されたアプローチのひとつである。そこで本研究では,低レイテンシかつ電力最適化されたスマートモバイルシステムを実現する可能性と課題について検討する。 FPGA(Field Programmable Gate Array)ベースのソリューションをエッジで利用すると、帯域幅最適化設計が実現し、結果としてシステムレベルのデッドラインでの計算効率が向上する。さらに,組込みFPGAエッジデバイス(Xilinx Multiprocessor System on Chip (MPSoC)を使用)とクラウドの両方におけるニューラルネットワーク(NN)の様々な性能面と実装可能性について論じる。この研究の主な目的は、Xilinx Inc.によって開発されたディープラーニングプログラマブルエンジンをハードウェアアクセラレーターの主要コンポーネントとして使用するハイブリッドシステムの実証である。そして、この設計に基づいて、組込みソリューションを用いたモバイルエッジコンピューティングの効率的なシステムを示す。

In almost every heavily computation-dependent application, from 6G communication systems to autonomous driving platforms, a large portion of computing should take place close to the client side. Edge computing (AI at Edge) in mobile devices is one of the optimized approaches for addressing this requirement. Therefore, in this work, the possibilities and challenges of implementing a low-latency and power-optimized smart mobile system are examined. Utilizing Field Programmable Gate Array (FPGA) based solutions at the edge will lead to bandwidth-optimized designs and, as a consequence, can boost the computational effectiveness at a system-level deadline. Moreover, various performance aspects and implementation feasibilities of Neural Networks (NNs) on both embedded FPGA edge devices (using Xilinx Multiprocessor System on Chip (MPSoC)) and Cloud are discussed throughout this research. The main goal of this work is to demonstrate a hybrid system that uses the deep learning programmable engine developed by Xilinx Inc. as the main component of the hardware accelerator. Then based on this design, an efficient system for mobile edge computing is represented by utilizing an embedded solution.

翻訳日:2024-08-05 01:35:56 公開日:2024-07-16

# NudgeRank: パーソナライズされた健康のためのディジタルアルゴリズムナッジ

NudgeRank: Digital Algorithmic Nudging for Personalized Health ( http://arxiv.org/abs/2407.20241v1 )

ライセンス: Link先を確認

Jodi Chiam, Aloysius Lim, Ankur Teredesai,

(参考訳) 本稿では、人口規模でポジティブな健康行動を促進するために設計された革新的なデジタルアルゴリズムナッジシステムであるNudgeRankについて述べる。拡張可能な知識グラフで強化したグラフニューラルネットワークの新たな組み合わせを利用して、このレコメンダシステムは本番環境で運用されており、パーソナライズされたコンテキスト対応のナッジを毎日110万人以上のケア受給者に提供している。この企業展開は、さまざまな健康状態とウェアラブルデバイスに対応する、AI駆動型の健康行動変容イニシアチブの中で最大級のもののひとつである。厳格な評価は、1日の歩数の6.17%増加と運動時間(分)の7.61%増加を含む、統計的に有意な健康アウトカムの改善を示している。さらにユーザエンゲージメントとプログラムの登録が急増し、オープンレートはベースラインシステムの4%に対して13.1%となった。スケーラビリティと信頼性を実証するものとして、NudgeRankは、本番システムに不可欠な自動化と可観測性の標準を維持しながら、コモディティな計算資源上で効率的に動作している。

In this paper we describe NudgeRank, an innovative digital algorithmic nudging system designed to foster positive health behaviors on a population-wide scale. Utilizing a novel combination of Graph Neural Networks augmented with an extensible Knowledge Graph, this Recommender System is operational in production, delivering personalized and context-aware nudges to over 1.1 million care recipients daily. This enterprise deployment marks one of the largest AI-driven health behavior change initiatives, accommodating diverse health conditions and wearable devices. Rigorous evaluation reveals statistically significant improvements in health outcomes, including a 6.17% increase in daily steps and 7.61% more exercise minutes. Moreover, user engagement and program enrollment surged, with a 13.1% open rate compared to baseline systems' 4%. Demonstrating scalability and reliability, NudgeRank operates efficiently on commodity compute resources while maintaining automation and observability standards essential for production systems.

翻訳日:2024-08-05 00:56:24 公開日:2024-07-16

# BadRobot:物理世界でLLMベースのエンボディドAIをジェイルブレイク

BadRobot: Jailbreaking LLM-based Embodied AI in the Physical World ( http://arxiv.org/abs/2407.20242v1 )

ライセンス: Link先を確認

Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Shengshan Hu, Leo Yu Zhang,

(参考訳) エンボディード人工知能(AI)は、センサーやアクチュエータを通じて物理世界と相互作用し、知覚と行動をシームレスに統合する人工知能システムである。この設計により、AIは複雑な現実世界の環境から学習し、その中で動作することができる。大規模言語モデル(LLM)は言語による指示を深く読み解き、複雑なタスクの計画立案において重要な役割を果たす。その結果、LLMはエンボディードAIを強化する大きな可能性を徐々に示しており、LLMベースのエンボディードAIはコミュニティにおける研究の焦点となっている。今後10年間で、LLMベースのエンボディードAIロボットは広く普及し、家庭や産業でありふれた存在になると予想される。しかし、長らく見過ごされてきた重大な安全上の問題がある。LLMベースのエンボディードAIは有害な行動を起こしうるのだろうか? 本研究は、エンボディードAIに脅威となる行動を誘発する方法を初めて調査し、間もなく市場に出るこれらのロボットが、アシモフのロボット工学三原則に明確に反して人間の安全を脅かすという深刻なリスクをもたらすことを確認した。具体的には、エンボディードAIのジェイルブレイクという概念を定式化し、3つの重大なセキュリティ上の脆弱性を明らかにする。第一に、侵害されたLLMを介したロボットのジェイルブレイク、第二に、行動空間と言語空間の間の安全性の不整合、第三に、欺瞞的なプロンプトによって危険性を自覚しないまま引き起こされる有害な行動である。我々はまた、潜在的な緩和策を分析し、物理世界におけるエンボディードAIアプリケーションの安全性に対するコミュニティの認識向上を提唱する。

Embodied artificial intelligence (AI) represents an artificial intelligence system that interacts with the physical world through sensors and actuators, seamlessly integrating perception and action. This design enables AI to learn from and operate within complex, real-world environments. Large Language Models (LLMs) deeply explore language instructions, playing a crucial role in devising plans for complex tasks. Consequently, they have progressively shown immense potential in empowering embodied AI, with LLM-based embodied AI emerging as a focal point of research within the community. It is foreseeable that, over the next decade, LLM-based embodied AI robots are expected to proliferate widely, becoming commonplace in homes and industries. However, a critical safety issue that has long been hiding in plain sight is: could LLM-based embodied AI perpetrate harmful behaviors? Our research investigates for the first time how to induce threatening actions in embodied AI, confirming the severe risks posed by these soon-to-be-marketed robots, which starkly contravene Asimov's Three Laws of Robotics and threaten human safety. Specifically, we formulate the concept of embodied AI jailbreaking and expose three critical security vulnerabilities: first, jailbreaking robotics through compromised LLM; second, safety misalignment between action and language spaces; and third, deceptive prompts leading to unaware hazardous behaviors. We also analyze potential mitigation measures and advocate for community awareness regarding the safety of embodied AI applications in the physical world.

翻訳日:2024-08-05 00:56:24 公開日:2024-07-16

# 小児相談における軽量オープンソース大言語モデルの性能評価 : 比較分析

Performance Evaluation of Lightweight Open-source Large Language Models in Pediatric Consultations: A Comparative Analysis ( http://arxiv.org/abs/2407.15862v1 )

ライセンス: Link先を確認

Qiuhong Wei, Ying Cui, Mengwei Ding, Yanqin Wang, Lingling Xiang, Zhengxiong Yao, Ceran Chen, Ying Long, Zhezhen Jin, Ximing Xu,

(参考訳) 大規模言語モデル(LLM)は医療への応用の可能性を示しているが、データのプライバシーと計算負荷が医療機関への導入を制限している。 LLMのオープンソース版や軽量版は潜在的な解決策として登場しているが、その性能、特に小児科領域での性能は十分に調査されていない。この横断研究では、公開されているオンライン医療フォーラムから、2022年12月1日から2023年10月30日までの患者相談の質問250件を、25の小児科部門からそれぞれ10件ずつ無作為に抽出した。 2つの軽量オープンソースLLMであるChatGLM3-6BとVicuna-7B、より大規模なモデルであるVicuna-13B、および広く使われているプロプライエタリなChatGPT-3.5が、2023年11月1日から2023年11月7日までの間に、これらの質問に中国語で独立して回答した。再現性を評価するために、各質問は1回繰り返された。 ChatGLM3-6BはVicuna-13BおよびVicuna-7Bよりも高い正確性と完全性を示したが(P < .001)、いずれのモデルもChatGPT-3.5には及ばなかった。正確性で最高評価を受けた割合は、ChatGPT-3.5が65.2%で、ChatGLM3-6B (41.2%)、Vicuna-13B (11.2%)、Vicuna-7B (4.4%)を上回った。同様に、完全性で最高評価を受けた割合は、ChatGPT-3.5が78.4%で最も高く、ChatGLM3-6Bが76.0%、Vicuna-13Bが34.8%、Vicuna-7Bが22.0%と続いた。 ChatGLM3-6Bは読みやすさにおいてChatGPT-3.5と同等であり、どちらもVicunaモデルを上回った(P < .001)。共感の面では、ChatGPT-3.5は軽量LLMよりも優れていた(P < .001)。安全性の面では、全てのモデルが同程度に良好であり(P > .05)、98.4%を超える回答が安全と評価された。質問の繰り返しでもこれらの結果が確認された。結論として、軽量LLMは小児医療において有望な応用可能性を示している。しかし、軽量LLMと大規模プロプライエタリLLMとの間に見られる差は、継続的な開発努力の必要性を浮き彫りにしている。

Large language models (LLMs) have demonstrated potential applications in medicine, yet data privacy and computational burden limit their deployment in healthcare institutions. Open-source and lightweight versions of LLMs emerge as potential solutions, but their performance, particularly in pediatric settings remains underexplored. In this cross-sectional study, 250 patient consultation questions were randomly selected from a public online medical forum, with 10 questions from each of 25 pediatric departments, spanning from December 1, 2022, to October 30, 2023. Two lightweight open-source LLMs, ChatGLM3-6B and Vicuna-7B, along with a larger-scale model, Vicuna-13B, and the widely-used proprietary ChatGPT-3.5, independently answered these questions in Chinese between November 1, 2023, and November 7, 2023. To assess reproducibility, each inquiry was replicated once. We found that ChatGLM3-6B demonstrated higher accuracy and completeness than Vicuna-13B and Vicuna-7B (P < .001), but all were outperformed by ChatGPT-3.5. ChatGPT-3.5 received the highest ratings in accuracy (65.2%) compared to ChatGLM3-6B (41.2%), Vicuna-13B (11.2%), and Vicuna-7B (4.4%). Similarly, in completeness, ChatGPT-3.5 led (78.4%), followed by ChatGLM3-6B (76.0%), Vicuna-13B (34.8%), and Vicuna-7B (22.0%) in highest ratings. ChatGLM3-6B matched ChatGPT-3.5 in readability, both outperforming Vicuna models (P < .001). In terms of empathy, ChatGPT-3.5 outperformed the lightweight LLMs (P < .001). In safety, all models performed comparably well (P > .05), with over 98.4% of responses being rated as safe. Repetition of inquiries confirmed these findings. In conclusion, Lightweight LLMs demonstrate promising application in pediatric healthcare. However, the observed gap between lightweight and large-scale proprietary LLMs underscores the need for continued development efforts.

翻訳日:2024-07-28 18:29:13 公開日:2024-07-16

# コントラスト学習における過学習?

Overfitting In Contrastive Learning? ( http://arxiv.org/abs/2407.15863v1 )

ライセンス: Link先を確認

Zachary Rabin, Jim Davis, Benjamin Lewis, Matthew Scherreik,

(参考訳) 過学習(Overfitting)とは、モデルがトレーニングデータに過度に適合し、結果として汎化性能が低下する機械学習の現象である。この現象は、教師あり学習の多くの形態について十分に文書化されているが、教師なし学習の文脈では十分に検討されていない。本研究では,教師なしコントラスト学習における過学習の性質について検討する。過学習が実際に起こりうることを示し、その背後にあるメカニズムを明らかにする。

Overfitting describes a machine learning phenomenon where the model fits too closely to the training data, resulting in poor generalization. While this occurrence is thoroughly documented for many forms of supervised learning, it is not well examined in the context of unsupervised learning. In this work we examine the nature of overfitting in unsupervised contrastive learning. We show that overfitting can indeed occur and the mechanism behind overfitting.

翻訳日:2024-07-28 18:29:13 公開日:2024-07-16

# 大規模移動データにおけるバイアスの緩和--大規模交通システムのモニタリングを事例として

Mitigating biases in big mobility data: a case study of monitoring large-scale transit systems ( http://arxiv.org/abs/2407.14541v1 )

ライセンス: Link先を確認

Feilong Wang, Xuegang Ban, Peng Chen, Chenxi Liu, Rong Zhao,

(参考訳) ビッグモビリティデータセット(BMD)は、人間のモビリティを研究し、交通システムの性能を評価する上で、多くの利点を示してきた。しかし、BMDの質はいまだによく分かっていない。本研究では,BMDのバイアスを評価し,緩和法を開発した。今回の研究では、GoogleとAppleのモビリティデータを例として、政府機関のベンチマークデータと比較します。 BMDとベンチマークの時空間差が観察され,輸送アプリケーションへの影響が調査され,誤った政策立案を防止するために,これらのバイアスに緊急に対応する必要性が強調された。本研究は, バイアス緩和法の提案と試験を行う。この緩和されたBMDは、米国100郡以上の大規模公共交通システムに貴重な洞察を与え、新型コロナウイルス(COVID-19)からの交通システムの復旧に地域差があることが示されている。本研究は,BMDを用いた交通研究における注意点と,実践者にとって有益となる効果的な緩和策を提示するものである。

Big mobility datasets (BMD) have shown many advantages in studying human mobility and evaluating the performance of transportation systems. However, the quality of BMD remains poorly understood. This study evaluates biases in BMD and develops mitigation methods. Using Google and Apple mobility data as examples, this study compares them with benchmark data from governmental agencies. Spatio-temporal discrepancies between BMD and benchmark are observed and their impacts on transportation applications are investigated, emphasizing the urgent need to address these biases to prevent misguided policymaking. This study further proposes and tests a bias mitigation method. It is shown that the mitigated BMD could generate valuable insights into large-scale public transit systems across 100+ US counties, revealing regional disparities of the recovery of transit systems from the COVID-19. This study underscores the importance of caution when using BMD in transportation research and presents effective mitigation strategies that would benefit practitioners.

翻訳日:2024-07-23 22:03:21 公開日:2024-07-16

# ルールベース説明器とブラックボックスモデルの整合性を目指して -- ルール誘導とXAIに基づく特徴量重要度の融合

Towards consistency of rule-based explainer and black box model -- fusion of rule induction and XAI-based feature importance ( http://arxiv.org/abs/2407.14543v1 )

ライセンス: Link先を確認

Michał Kozielski, Marek Sikora, Łukasz Wawrowski,

(参考訳) ルールベースのモデルは、人間に理解可能な表現を提供する。すなわち、解釈可能である。このため、ブラックボックスモデルと呼ばれる解釈不可能な複雑なモデルの決定を説明するために用いられる。このような説明の生成には、ルールベースモデルによるブラックボックスモデルの近似が含まれる。しかし,ルールベースモデルが、近似対象のブラックボックスモデルと同じ方法で判断を下すかどうかについては,これまで調査されていない。本研究では、同じ方法での意思決定を、決定の一貫性と、意思決定に使用される最も重要な属性の一貫性として捉える。本研究では,ルールベースのサロゲートモデルがブラックボックスモデルの性能を模倣することを保証する新しい手法を提案する。提案手法は,ルール生成を行いつつ,説明対象のブラックボックスモデルに対して選択されたXAI手法で決定される特徴量重要度を考慮に入れる説明融合を行う。この手法は、大域的なルールベースの説明と局所的なルールベースの説明の両方を生成できる。提案手法の品質は,分類問題を表す30の表形式ベンチマークデータセットに対する広範な解析により検証された。評価には, 参照手法との比較と例示的なケーススタディが含まれる。さらに,本論文では,XAIにおけるルールベースアプローチの適用の可能性と,提案手法を含むルールベースの説明が,内容と提示の両面でユーザの視点と要件をどのように満たすかについて論じる。作成したソフトウェアと完全な実験結果を含む詳細なレポートはGitHubリポジトリ(https://github.com/ruleminer/FI-rules4XAI )で公開されている。

Rule-based models offer a human-understandable representation, i.e. they are interpretable. For this reason, they are used to explain the decisions of non-interpretable complex models, referred to as black box models. The generation of such explanations involves the approximation of a black box model by a rule-based model. To date, however, it has not been investigated whether the rule-based model makes decisions in the same way as the black box model it approximates. Decision making in the same way is understood in this work as the consistency of decisions and the consistency of the most important attributes used for decision making. This study proposes a novel approach ensuring that the rule-based surrogate model mimics the performance of the black box model. The proposed solution performs an explanation fusion involving rule generation and taking into account the feature importance determined by the selected XAI methods for the black box model being explained. The result of the method can be both global and local rule-based explanations. The quality of the proposed solution was verified by extensive analysis on 30 tabular benchmark datasets representing classification problems. Evaluation included comparison with the reference method and an illustrative case study. In addition, the paper discusses the possible pathways for the application of the rule-based approach in XAI and how rule-based explanations, including the proposed method, meet the user perspective and requirements for both content and presentation. The software created and a detailed report containing the full experimental results are available on the GitHub repository (https://github.com/ruleminer/FI-rules4XAI ).

翻訳日:2024-07-23 22:03:21 公開日:2024-07-16

# 大規模言語モデルのジェイルブレイクにおけるクリップされた入力による連続埋め込み攻撃

Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models ( http://arxiv.org/abs/2407.13796v1 )

ライセンス: Link先を確認

Zihao Xu, Yi Liu, Gelei Deng, Kailong Wang, Yuekang Li, Ling Shi, Stjepan Picek,

(参考訳) 大規模言語モデル(LLM)に対するセキュリティ上の懸念は近年高まっており、離散的なプロンプトによるジェイルブレイクの試みを阻止することに焦点が当てられてきた。しかしながら、連続的な埋め込みから生じるジェイルブレイクの脆弱性の探索は限定的であり、従来のアプローチは主に離散的または連続的な接尾辞を入力に追加するものだった。本研究は,所望の出力があらかじめ定義されていれば、接尾辞の追加や特定の質問を必要とせずに,LLM入力に対して直接攻撃を行うための新しいチャネルを提示する。さらに、反復回数が多いと、出力の繰り返しを特徴とする過学習にしばしば至ることも観察した。これに対抗するために,CLIP というシンプルで効果的な戦略を提案する。実験の結果,入力長40、反復1000回の場合,CLIPを適用するとASRは62%から83%に改善することがわかった。

Security concerns for large language models (LLMs) have recently escalated, focusing on thwarting jailbreaking attempts in discrete prompts. However, the exploration of jailbreak vulnerabilities arising from continuous embeddings has been limited, as prior approaches primarily involved appending discrete or continuous suffixes to inputs. Our study presents a novel channel for conducting direct attacks on LLM inputs, eliminating the need for suffix addition or specific questions provided that the desired output is predefined. We additionally observe that extensive iterations often lead to overfitting, characterized by repetition in the output. To counteract this, we propose a simple yet effective strategy named CLIP. Our experiments show that for an input length of 40 at iteration 1000, applying CLIP improves the ASR from 62% to 83%.

翻訳日:2024-07-22 21:39:27 公開日:2024-07-16

# ディープ・パーセプチュアル・ハッシュを破る学習 : ニューラル・ハッシュのユースケース

Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash ( http://arxiv.org/abs/2111.06628v5 )

ライセンス: Link先を確認

Lukas Struppek, Dominik Hintersdorf, Daniel Neider, Kristian Kersting,

(参考訳) Appleは最近、ファイルがiCloudサービスにアップロードされる前に、ユーザーのデバイス上で児童性的虐待コンテンツ(CSAM)を検出するための深層知覚ハッシュシステムNeuralHashを公開した。ユーザプライバシの保護とシステムの信頼性に関する批判がすぐに巻き起こった。本稿では,NeuralHashに基づく深層知覚ハッシュの初の包括的な実証分析を行う。具体的には、現在の深層知覚ハッシュは堅牢でない可能性があることを示す。攻撃者は、勾配ベースのアプローチによる、あるいは単に標準的な画像変換による画像のわずかな変更によってハッシュ値を操作し、ハッシュ衝突を強制または防止できる。このような攻撃により、悪意のあるアクターは検出システムを容易に悪用でき、虐待コンテンツの隠蔽から無実のユーザーへの濡れ衣まで、あらゆることが可能になる。さらに、ハッシュ値を使用することで、ユーザデバイスに格納されたデータに関する推論も依然として可能である。我々の結果に基づけば、現在の形の深層知覚ハッシュは一般に、堅牢なクライアントサイドスキャンに使える段階になく、プライバシの観点からも使用すべきではないと考える。

Apple recently revealed its deep perceptual hashing system NeuralHash to detect child sexual abuse material (CSAM) on user devices before files are uploaded to its iCloud service. Public criticism quickly arose regarding the protection of user privacy and the system's reliability. In this paper, we present the first comprehensive empirical analysis of deep perceptual hashing based on NeuralHash. Specifically, we show that current deep perceptual hashing may not be robust. An adversary can manipulate the hash values by applying slight changes in images, either induced by gradient-based approaches or simply by performing standard image transformations, forcing or preventing hash collisions. Such attacks permit malicious actors easily to exploit the detection system: from hiding abusive material to framing innocent users, everything is possible. Moreover, using the hash values, inferences can still be made about the data stored on user devices. In our view, based on our results, deep perceptual hashing in its current form is generally not ready for robust client-side scanning and should not be used from a privacy perspective.

翻訳日:2024-07-20 00:38:23 公開日:2024-07-16

# SELF-GUIDE: 自己合成ファインチューニングによるタスク固有の指示追従の改善

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning ( http://arxiv.org/abs/2407.12874v1 )

ライセンス: Link先を確認

Chenyang Zhao, Xueying Jia, Vijay Viswanathan, Tongshuang Wu, Graham Neubig,

(参考訳) 大規模言語モデル(LLM)は、適切な自然言語プロンプトが与えられれば、多様なタスクを解決できる可能性を秘めている。しかしながら、プロンプトによる手法は、十分なトレーニングデータでモデルをファインチューニングする場合に比べて、予測精度が低くなることが多い。一方、タスク固有のデータでLLMをファインチューニングすれば一般に性能は向上するが、豊富な注釈付きデータセットが全てのタスクで利用できるわけではない。従来の研究では、最先端のLLMからタスク固有のデータを生成し、このデータを使ってより小さなモデルをファインチューニングする方法が検討されてきたが、このアプローチでは、訓練対象以外の言語モデルへのアクセスが必要となり、コスト、スケーラビリティの課題、そしてより強力なLLMに継続的に依存することに伴う法的なハードルが生じる。これらに対応して,学生LLMからタスク固有の入出力ペアを合成し,これらの入出力ペアを用いて学生LLM自体をファインチューニングする多段階メカニズムであるSELF-GUIDEを提案する。 Natural Instructions V2ベンチマークでの実証的評価の結果,SELF-GUIDEによりLLMの性能が大幅に向上することが確認された。具体的には,ベンチマークの指標において、分類タスクで約15%,生成タスクで18%の絶対的な改善を報告する。このことは、自己合成データが、外部の学習信号なしでLLMをタスク固有の専門家へと導く可能性に光を当てている。

Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. However, prompting often leads models to make predictions with lower accuracy compared to finetuning a model with ample training data. On the other hand, while finetuning LLMs on task-specific data generally improves their performance, abundant annotated datasets are not available for all tasks. Previous work has explored generating task-specific data from state-of-the-art LLMs and using this data to finetune smaller models, but this approach requires access to a language model other than the one being trained, which introduces cost, scalability challenges, and legal hurdles associated with continuously relying on more powerful LLMs. In response to these, we propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM, then use these input-output pairs to finetune the student LLM itself. In our empirical evaluation of the Natural Instructions V2 benchmark, we find that SELF-GUIDE improves the performance of LLM by a substantial margin. Specifically, we report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics. This sheds light on the promise of self-synthesized data guiding LLMs towards becoming task-specific experts without any external learning signals.

翻訳日:2024-07-19 20:02:37 公開日:2024-07-16

# ChatBCG:AIはあなたのスライドデッキを読むことができるか?

ChatBCG: Can AI Read Your Slide Deck? ( http://arxiv.org/abs/2407.12875v1 )

ライセンス: Link先を確認

Nikita Singh, Rob Balian, Lukas Martinelli,

(参考訳) GPT4oやGemini Flashのようなマルチモーダルモデルは、推論および要約タスクに非常に優れており、その性能は人間のレベルに迫っている。しかし、これらのモデルは、特にビジネス向けスライドデッキのチャートを対象に、非常に具体的な「読み取りと推定」タスクを求められると、人間に比べて性能が劣ることがわかった。本稿では,ラベル付きチャート(データがグラフ上に明確に注釈付けされているもの)およびラベルなしチャート(データが明確に注釈付けされておらず,X軸とY軸から推定する必要があるもの)のデータに関する簡単な質問に答える際の、GPT 4o と Gemini Flash-1.5 の精度を評価する。これらのモデルは、複雑なチャートやラベルなしチャートを含むデッキを、現時点ではエンドツーエンドで正確に読むことはできないと結論付ける。たとえユーザーがラベル付きチャートのみのデッキを作ったとしても、モデルがエンドツーエンドで完全に読めるのは15個のラベル付きチャートのうち7～8個にとどまる。スライドデッキの図の全リストについては、https://www.repromptai.com/chat_bcg を参照。

Multimodal models like GPT4o and Gemini Flash are exceptional at inference and summarization tasks, which approach human-level in performance. However, we find that these models underperform compared to humans when asked to do very specific 'reading and estimation' tasks, particularly in the context of visual charts in business decks. This paper evaluates the accuracy of GPT 4o and Gemini Flash-1.5 in answering straightforward questions about data on labeled charts (where data is clearly annotated on the graphs), and unlabeled charts (where data is not clearly annotated and has to be inferred from the X and Y axis). We conclude that these models aren't currently capable of reading a deck accurately end-to-end if it contains any complex or unlabeled charts. Even if a user created a deck of only labeled charts, the model would only be able to read 7-8 out of 15 labeled charts perfectly end-to-end. For full list of slide deck figures visit https://www.repromptai.com/chat_bcg

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# Civitaiにおける乱用生成AIモデルの利用を探る

Exploring the Use of Abusive Generative AI Models on Civitai ( http://arxiv.org/abs/2407.12876v1 )

ライセンス: Link先を確認

Yiluo Wei, Yiming Zhu, Pan Hui, Gareth Tyson,

(参考訳) 生成AIの台頭はデジタル画像の風景を変え、オンラインクリエイティブコミュニティに大きな影響を与えている。これにより、CivitaiのようなAIGC(AI-Generated Content)ソーシャルプラットフォームが誕生した。これらのユニークなソーシャルプラットフォームにより、ユーザーは独自の生成AIモデルを構築し、共有することができ、それによってより多様な芸術的表現の可能性を高めることができる。ソーシャルネットワークの中でデザインされた彼らは、アーチストたちに自分たちの創造(モデルから生成される)を披露する手段を提供し、議論を行い、フィードバックを得て、コミュニティの感覚を育む。しかし、このオープン性は、例えば、偽りのディープフェイクを広めたり、著作権を侵害したりするモデルの使用など、そのようなプラットフォームの悪用に対する懸念も引き起こす。これを探るため,我々はAIGCソーシャルプラットフォームに関する総合的な実証的研究を行い,乱用コンテンツの生成に利用することに焦点を当てた。例として、利用可能なAIGCソーシャルプラットフォームとして最大であるCivitaiをカバーする包括的データセットを構築した。この87Kモデルと2M画像のデータセットに基づいて、コンテンツの特徴を調査し、これらのプラットフォームをよりよく管理するためのモデレーション戦略について議論する。

The rise of generative AI is transforming the landscape of digital imagery, and exerting a significant influence on online creative communities. This has led to the emergence of AI-Generated Content (AIGC) social platforms, such as Civitai. These distinctive social platforms allow users to build and share their own generative AI models, thereby enhancing the potential for more diverse artistic expression. Designed in the vein of social networks, they also provide artists with the means to showcase their creations (generated from the models), engage in discussions, and obtain feedback, thus nurturing a sense of community. Yet, this openness also raises concerns about the abuse of such platforms, e.g., using models to disseminate deceptive deepfakes or infringe upon copyrights. To explore this, we conduct the first comprehensive empirical study of an AIGC social platform, focusing on its use for generating abusive content. As an exemplar, we construct a comprehensive dataset covering Civitai, the largest available AIGC social platform. Based on this dataset of 87K models and 2M images, we explore the characteristics of content and discuss strategies for moderation to better govern these platforms.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# Review-Feedback-Reason (ReFeR): NLG評価と推論のための新しいフレームワーク

Review-Feedback-Reason (ReFeR): A Novel Framework for NLG Evaluation and Reasoning ( http://arxiv.org/abs/2407.12877v1 )

ライセンス: Link先を確認

Yaswanth Narsupalli, Abhranil Chandra, Sreevatsa Muppirala, Manish Gupta, Pawan Goyal,

(参考訳) 大規模言語モデル(LLM)によって生成されるものなど、自然言語生成(NLG)出力の品質を評価することは、大きな課題である。従来のアプローチは、リソース集約的な人手評価か自動メトリクスのいずれかであり、後者は人間の判断との相関が低いことが多い。本研究では,LLMエージェントを用いた新しいNLG評価フレームワークである Review-Feedback-Reason (ReFeR) を提案する。多様なNLGタスクに関する2つの既存のベンチマークデータセットを使用して、ReFeRを厳密にテストする。提案フレームワークは,NLG評価の精度を高めて従来のベンチマークを約20%上回るだけでなく,建設的なフィードバックを生成し,集合的推論を大幅に改善する。このフィードバックは、命令チューニングデータセットの作成に活用される。このデータセットでMistral-7Bのような小さなモデルをファインチューニングすると、極めて優れた評価器となり、人間の評価との相関が向上し、GPT-3.5にほぼ匹敵する性能が得られる。また,3つの推論ベンチマークに適用して本手法の有効性を示す。そこでは最先端手法の大半を上回り、推論能力においても平均でGPT-3.5 Turboを約11.67%、GPT-4を約1%上回る。

Assessing the quality of Natural Language Generation (NLG) outputs, such as those produced by large language models (LLMs), poses significant challenges. Traditional approaches involve either resource-intensive human evaluations or automatic metrics, which often exhibit a low correlation with human judgment. In this study, we propose Review-Feedback-Reason (ReFeR), a novel evaluation framework for NLG using LLM agents. We rigorously test ReFeR using two pre-existing benchmark datasets on diverse NLG tasks. The proposed framework not only enhances the accuracy of NLG evaluation, surpassing previous benchmarks by ~20%, but also generates constructive feedback and significantly improves collective reasoning. This feedback is then leveraged for the creation of instruction-tuning datasets, which, when used to fine-tune smaller models like Mistral-7B, makes them extremely good evaluators, yielding a better correlation with human evaluations and performance nearly on par with GPT-3.5. We highlight the effectiveness of our methodology through its application on three reasoning benchmarks, where it outperforms most of the state-of-the-art methods, and also outperforms the reasoning capabilities of models like GPT-3.5 Turbo by ~11.67% and GPT-4 by ~1% on an average.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# LLMは一貫した価値観を持つか?

Do LLMs have Consistent Values? ( http://arxiv.org/abs/2407.12878v1 )

ライセンス: Link先を確認

Naama Rozen, Gal Elidan, Amir Globerson, Ella Daniel,

(参考訳) 価値観は人間の行動の根底にある基本的な原動力である。大規模言語モデル(LLM)技術は、人間のような対話に向けて絶えず改善されている。しかし、LLMが生成したテキストに表れる価値観についての研究はほとんど行われていない。ここでは、心理学における価値構造に関する豊富な文献に依拠して、この問題を研究する。我々は,LLMが,価値観の順位付けや価値観間の相関など,人間で実証されているのと同じ価値構造を示すかどうかを問う。この分析の結果は, LLMへのプロンプトの与え方に強く依存し, 特定のプロンプト戦略(「Value Anchoring」と呼ぶ)の下では, 人間のデータとの一致がかなり説得力のあるものになることを示す。この結果は,LLMにおける価値観の理解の向上と,LLM応答の一貫性を評価する新しい手法の導入の両方に寄与する。

Values are a basic driving force underlying human behavior. Large Language Models (LLM) technology is constantly improving towards human-like dialogue. However, little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values, and correlation between values. We show that the results of this analysis strongly depend on how the LLM is prompted, and that under a particular prompting strategy (referred to as 'Value Anchoring') the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs, as well as introduce novel methods for assessing consistency in LLM responses.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# 大規模視覚言語モデルは優れた分類器でもある: インコンテキスト・マルチモーダルフェイクニュース検出の検討

Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection ( http://arxiv.org/abs/2407.12879v1 )

ライセンス: Link先を確認

Ye Jiang, Yimin Wang,

(参考訳) 大規模視覚言語モデル(LVLM)は、多様なクロスモーダルベンチマークにおいて、視覚言語推論で卓越した性能を示す。これらの進歩にもかかわらず、最近の研究は、GPT-3.5-turboのような大規模言語モデル(LLM)が、フェイクニュース検出(FND)においてBERTのようなよく訓練された小型モデルに劣ることを示しており、FNDタスクにおけるLVLMの有効性に疑問を投げかけている。LVLMのファインチューニングにより性能は向上しうるが、膨大なパラメータ数と必要な事前学習済み重みのため、FNDアプリケーションにとってはリソース負荷の大きい取り組みとなる。本稿ではまず,2つの著名なLVLM(CogVLMとGPT4V)のFND能力を,より小型ながら十分に訓練されたCLIPモデルとゼロショットの設定で比較して評価する。その結果,LVLMは小型モデルと競合する性能を達成できることが示された。次に,標準的なインコンテキスト学習(ICL)をLVLMと統合し,FND性能の向上を確認したが,その範囲と一貫性は限定的であった。この問題に対処するため、我々は、よく訓練された小型モデルの予測とそれに対応する確率によって、インコンテキストの例とテスト入力を拡充する In-context Multimodal Fake News Detection (IMFND) フレームワークを導入する。この戦略的統合により、LVLMの注意はより高い確率に関連付けられたニュースセグメントへと向けられ、分析精度が向上する。実験結果から,IMFNDフレームワークはLVLMのFND効率を大幅に向上させ,公開されている3つのFNDデータセットにおいて標準的なICLアプローチを上回る精度を達成することが示唆された。

Large visual-language models (LVLMs) exhibit exceptional performance in visual-language reasoning across diverse cross-modal benchmarks. Despite these advances, recent research indicates that Large Language Models (LLMs), like GPT-3.5-turbo, underachieve compared to well-trained smaller models, such as BERT, in Fake News Detection (FND), prompting inquiries into LVLMs' efficacy in FND tasks. Although performance could improve through fine-tuning LVLMs, the substantial parameters and requisite pre-trained weights render it a resource-heavy endeavor for FND applications. This paper initially assesses the FND capabilities of two notable LVLMs, CogVLM and GPT4V, in comparison to a smaller yet adeptly trained CLIP model in a zero-shot context. The findings demonstrate that LVLMs can attain performance competitive with that of the smaller model. Next, we integrate standard in-context learning (ICL) with LVLMs, noting improvements in FND performance, though limited in scope and consistency. To address this, we introduce the In-context Multimodal Fake News Detection (IMFND) framework, enriching in-context examples and test inputs with predictions and corresponding probabilities from a well-trained smaller model. This strategic integration directs the LVLMs' focus towards news segments associated with higher probabilities, thereby improving their analytical accuracy. The experimental results suggest that the IMFND framework significantly boosts the FND efficiency of LVLMs, achieving enhanced accuracy over the standard ICL approach across three publicly available FND datasets.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# Few-Shot Multimodal Fake News 検出のためのクロスモーダル拡張

Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection ( http://arxiv.org/abs/2407.12880v1 )

ライセンス: Link先を確認

Ye Jiang, Taihang Wang, Xiaoman Xu, Yimin Wang, Xingyi Song, Diana Maynard,

(参考訳) 新たに出現したフェイクニュースの話題に対しては、限られた注釈付きサンプルから素早く学習する自動検出手法が必要となる。そのため,限られた指示のもとで新しいタスクに素早く習熟する能力、すなわち少数ショット学習(few-shot learning)は、初期段階のフェイクニュースを検出するうえで不可欠である。既存のアプローチは、多数のパラメータを持つ事前学習済み言語モデルをファインチューニングするか、大規模な注釈付きデータセットで複雑なニューラルネットワークをゼロから訓練するかのいずれかである。本稿では,ユニモーダル特徴を用いてマルチモーダル特徴を拡張するマルチモーダルフェイクニュース検出モデルを提案する。この目的のために,nショット分類をより堅牢な (n × z) ショット問題に変換することで,少数ショットのマルチモーダルフェイクニュース検出を強化するシンプルなアプローチであるCMA(Cross-Modal Augmentation)を導入する。ここで z は補助特徴の数を表す。提案したCMAは3つのベンチマークデータセットでSOTAの結果を達成し、驚くほど単純な線形プロービング手法を用いて、わずかなトレーニングサンプルのみでマルチモーダルフェイクニュースを分類する。さらに,本手法は従来手法よりもはるかに軽量であり,特に訓練可能なパラメータ数とエポック時間の点で顕著である。コードは https://github.com/zgjiangtoby/FND_fewshot で入手できる。

The nascent topic of fake news requires automatic detection methods to quickly learn from limited annotated samples. Therefore, the capacity to rapidly acquire proficiency in a new task with limited guidance, also known as few-shot learning, is critical for detecting fake news in its early stages. Existing approaches either involve fine-tuning pre-trained language models which come with a large number of parameters, or training a complex neural network from scratch with large-scale annotated datasets. This paper presents a multimodal fake news detection model which augments multimodal features using unimodal features. For this purpose, we introduce Cross-Modal Augmentation (CMA), a simple approach for enhancing few-shot multimodal fake news detection by transforming n-shot classification into a more robust (n × z)-shot problem, where z represents the number of supplementary features. The proposed CMA achieves SOTA results over three benchmark datasets, utilizing a surprisingly simple linear probing method to classify multimodal fake news with only a few training samples. Furthermore, our method is significantly more lightweight than prior approaches, particularly in terms of the number of trainable parameters and epoch times. The code is available here: https://github.com/zgjiangtoby/FND_fewshot
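
The (n × z)-shot idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each of the n labeled examples comes with z feature vectors (for example image, text, and fused features) living in one shared embedding space, and that every vector is simply treated as its own training sample for a linear probe. The function names `cross_modal_augment`, `fit_linear_probe`, and `predict`, and the toy feature values, are hypothetical.

```python
import math


def cross_modal_augment(feature_views, labels):
    """Turn n labeled examples with z feature views each into n*z training samples."""
    samples, targets = [], []
    for view in feature_views:              # each view is a list of n feature vectors
        for vec, label in zip(view, labels):
            samples.append(vec)
            targets.append(label)
    return samples, targets


def fit_linear_probe(samples, targets, lr=0.5, steps=300):
    """Logistic-regression probe on frozen features, trained by batch gradient descent."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, targets):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-score)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical stand-ins for frozen-encoder outputs: n = 2 examples, z = 3 views.
labels = [0, 1]
image_feats = [[1.0, 0.0], [0.0, 1.0]]
text_feats = [[0.9, 0.2], [0.1, 0.8]]
fused_feats = [[(a + c) / 2 for a, c in zip(i, t)]
               for i, t in zip(image_feats, text_feats)]

samples, targets = cross_modal_augment([image_feats, text_feats, fused_feats], labels)
w, b = fit_linear_probe(samples, targets)
print(len(samples), len(targets))  # 6 6
```

With n = 2 and z = 3 the probe is fitted on 6 samples instead of 2, while the only trainable parameters are the weight vector and bias of the probe. Whether the paper builds its supplementary features this way is not stated in the abstract.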

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# BinaryAlign:バイナリシーケンスラベリングとしての単語アライメント

BinaryAlign: Word Alignment as Binary Sequence Labeling ( http://arxiv.org/abs/2407.12881v1 )

ライセンス: Link先を確認

Gaetan Lopez Latouche, Marc-André Carbonneau, Ben Swanson,

(参考訳) 単語アライメントの現実世界での運用は、高リソース言語と低リソース言語の両方を対象とすることがほぼ確実である。しかし、このタスクの最先端手法では、特定の言語ペアに対する正解アライメントの訓練データが利用できるかどうかに応じて、異なるモデルクラスが推奨される。我々は,バイナリシーケンスラベリングに基づく新しい単語アライメント手法であるBinaryAlignを提案する。これは両方のシナリオで既存手法を上回り、このタスクへの統一的なアプローチを提供する。さらに,多言語基盤モデルの具体的な選択を変化させ,アライメントエラーの種類別に層別化した誤り解析を行い,非英語の言語ペアにおけるBinaryAlignの性能を調査する。ソースコードは公開している。

Real world deployments of word alignment are almost certain to cover both high and low resource languages. However, the state-of-the-art for this task recommends a different model class depending on the availability of gold alignment training data for a particular language pair. We propose BinaryAlign, a novel word alignment technique based on binary sequence labeling that outperforms existing approaches in both scenarios, offering a unifying approach to the task. Additionally, we vary the specific choice of multilingual foundation model, perform stratified error analysis over alignment error type, and explore the performance of BinaryAlign on non-English language pairs. We make our source code publicly available.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# InstructAV: 著者検証のための大規模言語モデルの命令ファインチューニング

InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification ( http://arxiv.org/abs/2407.12882v1 )

ライセンス: Link先を確認

Yujia Hu, Zhiqiang Hu, Chun-Wei Seah, Roy Ka-Wei Lee,

(参考訳) 大規模言語モデル(LLM)は、幅広いNLPタスクにおいて顕著な能力を示している。しかし、2つのテキストが同じ著者によるものかどうかを判定する著者検証(AV)タスクに関しては、ChatGPTのような先進的なモデルでさえ、顕著な限界がある。本稿では,著者検証のための新しいアプローチであるInstructAVを紹介する。このアプローチでは,LLMをパラメータ効率の良いファインチューニング(PEFT)手法と併用して,精度と説明可能性を同時に向上させる。 InstructAVの特徴は、分類の決定を透明で理解可能な説明と整合させる能力にあり、著者検証の分野における大きな前進である。さまざまなデータセットにわたる包括的な実験を通じて、InstructAVはAVタスクにおいて最先端の性能を示し、高い分類精度と説明の信頼性向上を両立する。

Large Language Models (LLMs) have demonstrated remarkable proficiency in a wide range of NLP tasks. However, when it comes to authorship verification (AV) tasks, which involve determining whether two given texts share the same authorship, even advanced models like ChatGPT exhibit notable limitations. This paper introduces a novel approach, termed InstructAV, for authorship verification. This approach utilizes LLMs in conjunction with a parameter-efficient fine-tuning (PEFT) method to simultaneously improve accuracy and explainability. The distinctiveness of InstructAV lies in its ability to align classification decisions with transparent and understandable explanations, representing a significant progression in the field of authorship verification. Through comprehensive experiments conducted across various datasets, InstructAV demonstrates its state-of-the-art performance on the AV task, offering high classification accuracy coupled with enhanced explanation reliability.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# BRIGHT: 推論集約型検索のための現実的で挑戦的なベンチマーク

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ( http://arxiv.org/abs/2407.12883v1 )

ライセンス: Link先を確認

Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu,

(参考訳) 既存の検索ベンチマークは主に、キーワードまたは意味に基づく検索で通常は十分な情報探索型クエリ(例えば、検索エンジンから集約された質問)で構成されている。しかし、複雑な現実世界のクエリの多くは、表層的な形式のマッチングを超えて関連文書を特定するために、深い推論を必要とする。例えば、コーディングの質問に対するドキュメントを見つけるには、関連する関数のロジックと構文を理解する必要がある。このような難しいクエリに対する検索をより適切にベンチマークするために,関連文書の検索に集中的な推論を必要とする最初のテキスト検索ベンチマークであるBRIGHTを導入する。 BRIGHTは、多様な領域(経済学、心理学、ロボット工学、ソフトウェア工学、地球科学など)から収集された1,398件の現実世界のクエリから構築されており、これらは自然に発生した、あるいは注意深くキュレーションされた人間のデータに由来する。広範な評価により,最先端の検索モデルでさえBRIGHTでは性能が低いことが明らかとなった。 MTEBリーダーボード[38]の首位モデルは59.0 nDCG@10のスコアを達成しているが、BRIGHTではnDCG@10で18.0にとどまる。さらに,大規模言語モデル(LLM)が生成するChain-of-Thought推論でクエリを拡張すると,性能が最大12.2ポイント向上することを示す。また、BRIGHTはベンチマーク対象モデルの事前学習時のデータ漏洩に対して堅牢であり、これはベンチマークの文書がトレーニングデータに含まれている場合でも同様の性能を示すことで検証している。 BRIGHTは、より現実的で困難な設定での検索システムに関する将来の研究への道を開くものと考える。コードとデータは https://brightbenchmark.github.io で公開されている。

Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. BRIGHT is constructed from 1,398 real-world queries collected from diverse domains (such as economics, psychology, robotics, software engineering, and earth sciences), sourced from naturally occurring or carefully curated human data. Extensive evaluation reveals that even state-of-the-art retrieval models perform poorly on BRIGHT. The leading model on the MTEB leaderboard [38], which achieves a score of 59.0 nDCG@10, produces a score of nDCG@10 of 18.0 on BRIGHT. We further demonstrate that augmenting queries with Chain-of-Thought reasoning generated by large language models (LLMs) improves performance by up to 12.2 points. Moreover, BRIGHT is robust against data leakage during pretraining of the benchmarked models, as we validate by showing similar performance even when documents from the benchmark are included in the training data. We believe that BRIGHT paves the way for future research on retrieval systems in more realistic and challenging settings. Our code and data are available at https://brightbenchmark.github.io.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
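
The BRIGHT entry above reports its scores in nDCG@10. As a rough illustration of what that metric measures, here is a minimal Python sketch of one common formulation (linear gain with a log2 rank discount). The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the code is not taken from the BRIGHT or MTEB evaluation tooling, which may differ in details such as the gain function or tie handling.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades.

    `relevances` lists the relevance grade of each retrieved document,
    in the order the system ranked them (rank 1 first).
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by the DCG of an ideal ranking.

    `all_relevances` holds the grades of every judged document for the query,
    so the ideal ranking is simply those grades sorted in descending order.
    """
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical query with two relevant documents, ranked 2nd and 4th.
score = ndcg_at_k([0, 1, 0, 1], [1, 1, 0, 0])
print(f"nDCG@10 = {score:.3f}")
```

A benchmark-level figure such as the 18.0 quoted above is typically this per-query value averaged over all queries and multiplied by 100.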

# SurroFlow: パラメータ空間探索と不確実性定量化のためのフローベースサロゲートモデル

SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification ( http://arxiv.org/abs/2407.12884v1 )

ライセンス: Link先を確認

Jingyi Shen, Yuhan Duan, Han-Wei Shen,

(参考訳) 既存のディープラーニングベースのサロゲートモデルは、効率的なデータ生成を容易にするが、不確実性の定量化、効率的なパラメータ空間探索、逆予測には不十分である。本研究では,正規化フローに基づく新しいサロゲートモデルであるSurroFlowを導入し,シミュレーションパラメータとシミュレーション出力の間の可逆変換を学習する。このモデルは、与えられたシミュレーションパラメータに対するシミュレーション結果の正確な予測を可能にするだけでなく、データ生成プロセスにおける不確実性の定量化もサポートする。さらに、効率的なシミュレーションパラメータのレコメンデーションと探索を可能にする。我々は,SurroFlowと遺伝的アルゴリズムを視覚インタフェースのバックエンドとして統合し,効果的なユーザ誘導アンサンブルシミュレーション探索と可視化を支援する。本フレームワークは,科学的サロゲートモデルの信頼性と探索能力を向上しつつ,計算コストを大幅に削減する。

Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16

# LLMの分類課題に推奨されない白化

Whitening Not Recommended for Classification Tasks in LLMs ( http://arxiv.org/abs/2407.12886v1 )

ライセンス: Link先を確認

Ali Forooghi, Shaghayegh Sadeghi, Jianguo Lu,

(参考訳) 文の埋め込みはNLPの基盤となる。ホワイトニングは、LLM(Large Language Models)から得られる埋め込み品質を改善する効果的な操作であると言われている。しかし,ホワイトニングの有効性はモデルに依存し,タスクに依存していることがわかった。特に、ホワイトニングは分類タスクの埋め込みを劣化させる。この結論は広範な実験によって支持される。また, PCA, ZCA, PCA-Cor, ZCA-Cor, Cholesky などの各種ホワイトニング処理についても検討した。我々の研究の副産物は、SentEval+と呼ばれるLLMの埋め込み評価プラットフォームである。

Sentence embedding is a cornerstone in NLP. Whitening has been claimed to be an effective operation to improve embedding quality obtained from Large Language Models (LLMs). However, we find that the efficacy of whitening is model-dependent and task-dependent. In particular, whitening degenerates embeddings for classification tasks. The conclusion is supported by extensive experiments. We also explored a variety of whitening operations, including PCA, ZCA, PCA-Cor, ZCA-Cor and Cholesky whitenings. A by-product of our research is an embedding evaluation platform for LLMs called SentEval+.

翻訳日:2024-07-19 19:52:52 公開日:2024-07-16
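
For readers unfamiliar with the whitening operations named in the entry above, the following is a small pure-Python sketch of one Cholesky-based variant. It assumes the convention Σ = L Lᵀ and maps each centered vector x to z = L⁻¹x, so that the sample covariance of the outputs is the identity. Conventions differ across the literature (some factor the inverse covariance instead), and the helper names `covariance`, `cholesky`, and `whiten` are hypothetical. This is not the paper's or SentEval+'s implementation.

```python
import math


def covariance(rows):
    """Return (mean, centered rows, sample covariance matrix) for a list of vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, centered, cov


def cholesky(a):
    """Lower-triangular L with L @ L^T == a (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def whiten(rows):
    """Center the data, then solve L z = x for each row by forward substitution."""
    _, centered, cov = covariance(rows)
    low = cholesky(cov)
    d = len(low)
    whitened = []
    for x in centered:
        z = [0.0] * d
        for i in range(d):
            z[i] = (x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
        whitened.append(z)
    return whitened


# Toy 2-D "embeddings" with correlated dimensions (made-up numbers).
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]]
_, _, cov_after = covariance(whiten(data))
print(cov_after)  # close to the 2x2 identity matrix, up to rounding
```

Real sentence embeddings have hundreds of dimensions, so in practice a linear-algebra library would replace these hand-written loops.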

# UAV支援型の宇宙・空・地上統合ネットワーク:最近の学習アルゴリズムの技術レビュー

UAV-Assisted Space-Air-Ground Integrated Networks: A Technical Review of Recent Learning Algorithms ( http://arxiv.org/abs/2211.14931v2 )

ライセンス: Link先を確認

Atefeh H. Arani, Peng Hu, Yeying Zhu,

(参考訳) 近年の宇宙・空・地上コンポーネントの技術進歩により、宇宙・空・地上統合ネットワーク(SAGIN)と呼ばれる新しいネットワークパラダイムが実現されている。無人航空機(UAV)はSAGINにおいて重要な役割を果たしている。しかし、UAVの高ダイナミック性と複雑さのため、SAGINの実環境への展開が、そのようなSAGINを実現する上で重要な障壁となる。 UAVは、宇宙および地上コンポーネントとともに、限られた機動性と資源のもとで重要な性能要件を満たすことが期待されている。したがって、様々な利用シナリオでUAVを採用するには、アルゴリズム的なアプローチで十分に設計された計画が必要である。本稿では,UAV支援SAGINにおける最近の学習アルゴリズムのレビューと分析を行う。考えられる報酬関数について検討し、Q-ラーニング、深層Q-ラーニング、マルチアームバンディット、粒子群最適化、満足度に基づく学習アルゴリズムなど、報酬関数を最適化するための最先端アルゴリズムについて議論する。他の調査論文とは異なり、SAGIN上の様々なミッションに適用可能な、最適化問題の方法論的視点に注目する。実世界の構成と2次元(2D)と3次元(3D)のUAV軌跡を配置事例の反映として検討する。シミュレーションの結果,3次元満足度に基づく学習アルゴリズムは,ほとんどの場合,他の手法よりも優れていることが示唆された。最後に未解決の課題を議論し,UAV支援型SAGINの設計・展開ガイドラインの提供を目指す。

Recent technological advancements in space, air, and ground components have made possible a new network paradigm called space-air-ground integrated network (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, real-world deployment of a SAGIN becomes a significant barrier to realizing such SAGINs. UAVs are expected to meet key performance requirements with limited maneuverability and resources with space and terrestrial components. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. This paper provides an essential review and analysis of recent learning algorithms in a UAV-assisted SAGIN. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit, particle swarm optimization, and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, applicable to various missions on a SAGIN. We consider real-world configurations and the 2-dimensional (2D) and 3-dimensional (3D) UAV trajectories to reflect deployment cases. Our simulations suggest the 3D satisfaction-based learning algorithm outperforms other approaches in most cases. With open challenges discussed at the end, we aim to provide design and deployment guidelines for UAV-assisted SAGINs.

翻訳日:2024-07-19 03:58:48 公開日:2024-07-16
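
Q-learning is the first of the reward-optimization algorithms listed in the entry above. The sketch below shows the standard tabular update rule on a toy table. The state and action names (`low`, `high`, `climb`, `hold`) and the reward value are made up for illustration, as are the helper name `q_learning_update` and the default hyperparameters. Nothing here is taken from the reviewed paper's simulations.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())   # greedy value of the next state
    td_target = reward + gamma * best_next    # bootstrapped return estimate
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state table: a UAV choosing between two altitude bands.
q_table = {
    "low": {"climb": 0.0, "hold": 0.0},
    "high": {"climb": 1.0, "hold": 0.0},
}
new_value = q_learning_update(q_table, "low", "climb", reward=1.0, next_state="high")
print(new_value)
```

Deep Q-learning, also named in the abstract, replaces the table with a neural network that approximates the same quantity.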

# 量子力学の非線形拡張の幾何学的解釈

Geometric Interpretation of a nonlinear extension of Quantum Mechanics ( http://arxiv.org/abs/2405.07289v3 )

ライセンス: Link先を確認

Alan Chodos, Fred Cooper,

(参考訳) 我々は最近、通常の線形量子力学問題のハミルトニアンの固有値と固有関数の観点から正確に解ける性質を持つ特定の非線形量子力学の一般化を導入した。本稿では,波動関数の2つの成分が時空の2つの異なる漸近領域におけるハミルトニアンHによって記述された系を表すことを示唆し,非線型項が重力効果をもたらすと考えられることを示す。

We recently introduced a particular nonlinear generalization of quantum mechanics which has the property that it is exactly solvable in terms of the eigenvalues and eigenfunctions of the Hamiltonian of the usual linear quantum mechanics problem. In this paper we suggest that the two components of the wave function represent the system described by the Hamiltonian H in two different asymptotic regions of spacetime and we show that the non-linear terms can be viewed as giving rise to gravitational effects.

翻訳日:2024-07-19 03:51:44 公開日:2024-07-16

# 拡張エッジを持つ量子ホール系におけるホーキング放射-異常法の適用

Hawking radiation in quantum Hall system with an expanding edge: application of anomaly method ( http://arxiv.org/abs/2407.02796v2 )

ライセンス: Link先を確認

Riku Yoshimoto, Yasusada Nambu,

(参考訳) 重力異常とブラックホールのホーキング放射の関係はウィルチェックとロビンソンによって明らかにされた。本研究では,拡張エッジを持つ量子ホール系におけるアナログ・ド・ジッター時空に彼らの手法を適用する。この系はキラルであるため、元々の方法で仮定されていた地平線付近での内向きモードの条件を課す必要はない。さらに、この系は、ド・ジッター空間が2つの平坦空間の間に挟まれるように構成されており、異常の影響は通常のド・ジッター時空には現れないが、ド・ジッター領域と平坦領域の間の境界条件として現れる。これらの境界条件下での計算により、外側の平坦領域において、ド・ジッター地平線のギボンズ・ホーキング温度を持つホーキング放射のフラックスを得る。

The relationship between gravitational anomalies and Hawking radiation of black holes was revealed by Wilczek and Robinson. In this study, we apply their method to an analogue de Sitter spacetime in the quantum Hall system with an expanding edge. Because this system is chiral, there is no need to impose the condition of ingoing modes near the horizon, which was assumed in the original method. Moreover, this system is structured so that the de Sitter space is sandwiched between two flat spaces, and although the effects of the anomaly would not appear in an ordinary de Sitter spacetime, they manifest themselves as boundary conditions between the de Sitter and the flat regions. By performing calculations under these boundary conditions, we obtain the flux of Hawking radiation in the outer flat region with the Gibbons-Hawking temperature of the de Sitter horizon.

翻訳日:2024-07-19 03:51:44 公開日:2024-07-16

# パッチ空間の基礎となる多様体の測地線グラミアンを用いた画像デノイジング

Image Denoising Using the Geodesics' Gramian of the Manifold Underlying Patch-Space ( http://arxiv.org/abs/2010.07769v3 )

ライセンス: Link先を確認

Kelum Gajamannage,

(参考訳) 現代社会における高度なカメラの普及に伴い、正確で視覚的に好ましい画像の需要が高まっている。しかし、カメラが捉えた画像の品質はノイズによって劣化する可能性がある。したがって, 画像の重要な特徴を損なうことなくノイズを除去するためには, 何らかの画像処理が必要である。現在の文献では様々なデノイジング手法が提供されているが、その忠実度や有効性は不確かな場合がある。そこで本研究では,正確な画像を生成することができる,新しく計算効率の良い画像デノイジング手法を提案する。画像の滑らかさを維持するため、この手法は画素ではなく画像から分割されたパッチを入力とする。そして、画像領域ではなく、パッチ空間の基礎となる多様体上でデノイジングを行い、画像全体の特徴をよりよく保存する。本稿では,この手法の性能をベンチマークとなる画像処理手法と比較して検証する。

With the proliferation of sophisticated cameras in modern society, the demand for accurate and visually pleasing images is increasing. However, the quality of an image captured by a camera may be degraded by noise. Thus, some processing of images is required to filter out the noise without losing vital image features. Even though the current literature offers a variety of denoising methods, the fidelity and efficacy of their denoising are sometimes uncertain. Thus, here we propose a novel and computationally efficient image denoising method that is capable of producing accurate images. To preserve image smoothness, this method inputs patches partitioned from the image rather than pixels. Then, it performs denoising on the manifold underlying the patch-space rather than that in the image domain to better preserve the features across the whole image. We validate the performance of this method against benchmark image processing methods.

翻訳日:2024-07-19 00:05:30 公開日:2024-07-16

# 観測データと介入データを用いた文脈特異的因果モデルの表現

Representation of Context-Specific Causal Models with Observational and Interventional Data ( http://arxiv.org/abs/2101.09271v4 )

ライセンス: Link先を確認

Eliana Duarte, Liam Solus,

(参考訳) CStreesと呼ばれるコンテキスト固有独立モデルの新たなファミリを導入することにより、一般に(例えば硬さや柔らかい)介入によって収集された観測データと実験データの両方に基づいて、文脈固有因果モデルを表現する問題に対処する。この族は、一般的な介入DAGモデルを定義する因子化特性の一般化を可能にする新しい分解基準によって定義される。 DAGのVermaとPearlの基準を拡張した観測CSツリーのモデル等価性のグラフィカルな特徴を導出する。この特徴は、一般にコンテキスト特異的な介入の下でCStreeモデルに拡張される。これらの結果を得るために、CStreeモデルの簡潔なグラフィカル表現に組み込むことができる文脈依存的介入の概念を定式化する。 CSツリーと他の文脈特化モデルとを関連づけ,DAG,CSツリー,ラベル付きDAG,ステージ付きツリーが厳密な包摂連鎖を形成することを示す。 CStreeモデルを実際のデータセットに適用し、データ依存構造とソフトな介入摂動の文脈固有の性質を明らかにする。

We address the problem of representing context-specific causal models based on both observational and experimental data collected under general (e.g. hard or soft) interventions by introducing a new family of context-specific conditional independence models called CStrees. This family is defined via a novel factorization criterion that allows for a generalization of the factorization property defining general interventional DAG models. We derive a graphical characterization of model equivalence for observational CStrees that extends the Verma and Pearl criterion for DAGs. This characterization is then extended to CStree models under general, context-specific interventions. To obtain these results, we formalize a notion of context-specific intervention that can be incorporated into concise graphical representations of CStree models. We relate CStrees to other context-specific models, showing that the families of DAGs, CStrees, labeled DAGs and staged trees form a strict chain of inclusions. We end with an application of interventional CStree models to a real data set, revealing the context-specific nature of the data dependence structure and the soft, interventional perturbations.

翻訳日:2024-07-19 00:00:34 公開日:2024-07-16

# 非互換性を超えた: 機械学習と法における相互排他的公正基準のトレードオフ

Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law ( http://arxiv.org/abs/2212.00469v4 )

ライセンス: Link先を確認

Meike Zehlike, Alex Loosley, Håkan Jonsson, Emil Wiedemann, Philipp Hacker,

(参考訳) 公正で信頼できるAIは、機械学習と法律の両分野において、ますます重要になっている。重要な帰結の1つは、意思決定者が「公正」すなわち非差別的なアルゴリズム的決定手順の保証に努めなければならないことである。しかし、アルゴリズム的公正性にはいくつかの競合する概念があり、それらは現実的な事実的仮定の下で相互に相容れないことが示されている。これは例えば、広く使われている公平性尺度である「グループ内の校正」と「正・負のクラスに対する均衡」に当てはまる。本稿では,これら3つのフェアネス基準の間を連続的に補間する新しいアルゴリズム(FAir Interpolation Method: FAIM)を提案する。これにより、当初は不公平な予測を、各公正条件の所望の重み付き組み合わせを少なくとも部分的に満たすように修正できる。我々は,合成データ,COMPASデータセット,電子商取引分野の新たな実世界データセットに適用した場合のアルゴリズムの有効性を実証する。最後に、相反する法的義務を満たすためにFAIMをどの程度活用できるかについて議論する。この分析は、FAIMが信用スコアリングや刑事司法手続といった従来の法分野における義務を運用化できるだけでなく、デジタル市場法や最近制定されたAI法など、EUで打ち出された最新のAI規制にも適用できる可能性を示唆している。

Fair and trustworthy AI is becoming ever more important in both machine learning and legal domains. One important consequence is that decision makers must seek to guarantee a 'fair', i.e., non-discriminatory, algorithmic decision procedure. However, there are several competing notions of algorithmic fairness that have been shown to be mutually incompatible under realistic factual assumptions. This concerns, for example, the widely used fairness measures of 'calibration within groups' and 'balance for the positive/negative class'. In this paper, we present a novel algorithm (FAir Interpolation Method: FAIM) for continuously interpolating between these three fairness criteria. Thus, an initially unfair prediction can be remedied to, at least partially, meet a desired, weighted combination of the respective fairness conditions. We demonstrate the effectiveness of our algorithm when applied to synthetic data, the COMPAS data set, and a new, real-world data set from the e-commerce sector. Finally, we discuss to what extent FAIM can be harnessed to comply with conflicting legal obligations. The analysis suggests that it may operationalize duties in traditional legal fields, such as credit scoring and criminal justice proceedings, but also for the latest AI regulations put forth in the EU, like the Digital Markets Act and the recently enacted AI Act.

翻訳日:2024-07-19 00:00:34 公開日:2024-07-16
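
The three criteria named in the entry above (calibration within groups, balance for the positive class, balance for the negative class) can be measured directly on scored data. The sketch below does so on made-up records of the form `(group, score, true_label)`. It only computes the criteria and is not the FAIM interpolation algorithm. The function names and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def balance_for_class(records, label):
    """Mean score per group among individuals whose true label equals `label`.

    Balance for the positive (label=1) or negative (label=0) class holds
    when these per-group means are equal.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for group, score, y in records:
        if y == label:
            sums[group] += score
            counts[group] += 1
    return {g: sums[g] / counts[g] for g in counts}


def calibration_by_group(records):
    """Observed positive rate for each (group, score) cell.

    Calibration within groups holds when, in every cell, the observed
    rate equals the score assigned to that cell.
    """
    positives, totals = defaultdict(int), defaultdict(int)
    for group, score, y in records:
        totals[(group, score)] += 1
        positives[(group, score)] += y
    return {cell: positives[cell] / totals[cell] for cell in totals}


# Made-up toy data: (group, predicted score, true label).
records = [
    ("a", 0.5, 1), ("a", 0.5, 0),
    ("b", 0.5, 1), ("b", 0.5, 0), ("b", 1.0, 1),
]
print(calibration_by_group(records))
print(balance_for_class(records, label=1))
print(balance_for_class(records, label=0))
```

On this toy data the scores are calibrated within both groups, yet the mean score of true positives differs between the groups. That small case hints at why the criteria can pull against each other and why a method that trades them off is useful.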

# Bike Frames: ニュースにおけるサイクリストの暗黙的な描写を理解する

Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News ( http://arxiv.org/abs/2301.06178v2 )

ライセンス: Link先を確認

Xingmeng Zhao, Dan Schumacher, Sashank Nalluri, Xavier Walton, Suhana Shrestha, Anthony Rios,

(参考訳) 輸送やレクリエーションのためのサイクリングの増加は、健康を増進し、車両の環境への影響を減少させる。しかし、報道機関のイデオロギーや報道スタイルは、しばしばサイクリングに対する大衆の認識に影響を及ぼす。例えば、報道機関がサイクリング事故を過度に報じると、人々がサイクリストを「危険」と認識するようになり、サイクリングを選択する人の数が減少する可能性がある。さらに、サイクリングの減少は、安全なインフラに対する政府資金の削減につながる可能性がある。本稿では,ニュース見出しにおいてサイクリストがどのように認識されているかを検出する手法を開発する。これを達成するために「Bike Frames」と呼ばれる新しいデータセットを導入する。データセットは31,480のニュース見出しと1,500のアノテーションで構成されている。本研究では、米国の11,385の見出しの分析に焦点を当てる。また、BikeFrame Chain-of-Codeフレームワークを導入し、サイクリストに対する認識を予測し、事故に関連する見出しを特定し、過失の所在を判定する。このフレームワークは、正確な論理のために擬似コードを使用し、報道機関のバイアス分析を統合することで、大規模言語モデルにおける従来の思考の連鎖(chain-of-thought)推論よりも予測を改善する。提案手法は他の手法を大幅に上回り、特に、ニュースバイアス情報の導入が性能に大きく影響し、平均F1が.739から.815に向上することがわかった。最後に,米国のニュース見出しを対象に包括的なケーススタディを行い,報道機関とサイクリング専門ウェブサイトの間の報道の違いや,サイクリストの性別による報道の違いを見出した。 WARNING: 本論文には事故と死亡に関する記述が含まれる。

Increasing cycling for transportation or recreation can boost health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example, if news agencies overly report cycling accidents, it may make people perceive cyclists as "dangerous," reducing the number of people who opt to cycle. Additionally, a decline in cycling can result in less government funding for safe infrastructure. In this paper, we develop a method for detecting how cyclists are perceived within news headlines. We introduce a new dataset called "Bike Frames" to accomplish this. The dataset consists of 31,480 news headlines and 1,500 annotations. Our focus is on analyzing 11,385 headlines from the United States. We also introduce the BikeFrame Chain-of-Code framework to predict cyclist perception, identify accident-related headlines, and determine fault. This framework uses pseudocode for precise logic and integrates news agency bias analysis for improved predictions over traditional chain-of-thought reasoning in large language models. Our method substantially outperforms other methods, and most importantly, we find that incorporating news bias information substantially impacts performance, improving the average F1 from .739 to .815. Finally, we perform a comprehensive case study on US-based news headlines, finding reporting differences between news agencies and cycling-specific websites as well as differences in reporting depending on the gender of cyclists. WARNING: This paper contains descriptions of accidents and death.

翻訳日:2024-07-19 00:00:34 公開日:2024-07-16

# SSL-Cleanse: 自己監視学習におけるトロイの木馬の検出と緩和

SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning ( http://arxiv.org/abs/2303.09079v3 )

ライセンス: Link先を確認

Mengxin Zheng, Jiaqi Xue, Zihao Wang, Xun Chen, Qian Lou, Lei Jiang, Xiaofeng Wang,

(参考訳) 自己教師付き学習(SSL)は、データ表現を符号化する一般的な手法である。トレーニング済みのSSLイメージエンコーダを使用して、その後、下流の分類器をトレーニングすることで、ラベル付きデータをほとんど持たずに、様々なタスクで印象的なパフォーマンスを実現することができる。 SSLの採用の増加により、SSLエンコーダと関連するTrojan攻撃に関するセキュリティ調査が増加した。 SSLエンコーダに埋め込まれたTrojan攻撃は隠蔽的に動作し、複数のユーザやデバイスに分散する。トロイの木馬エンコーダにおけるバックドアの挙動の存在は、下流の分類器によって必然的に継承され、脅威の検出と緩和がさらに困難になる。教師あり学習における現在のトロイの木馬検出手法は、SSL下流の分類器を保護できる可能性があるが、広く普及する前にSSLエンコーダ内のトリガーを特定し、対処することは難しい課題である。この課題は、ダウンストリームタスクが不明な場合やデータセットラベルが使用できない場合、SSLエンコーダのトロイの木馬検出時に、元のラベルのないトレーニングデータセットにアクセスできない場合、発生します。 SSLエンコーダのバックドア脅威を特定し軽減するためのソリューションとしてSSL-Cleanseを導入します。 1200エンコーダを用いてさまざまなデータセット上でSSL-Cleanseを評価し,ImageNet-100では平均82.2%の検出成功率を達成した。バックドアを緩和した後、平均して、バックドアエンコーダは、高い精度の損失なしに0.3%の攻撃成功率を達成し、SSL-Cleanseの有効性を証明した。 SSL-Cleanseのソースコードはhttps://github.com/UCF-ML-Research/SSL-Cleanseで公開されている。

Self-supervised learning (SSL) is a prevalent approach for encoding data representations. Using a pre-trained SSL image encoder and subsequently training a downstream classifier, impressive performance can be achieved on various tasks with very little labeled data. The growing adoption of SSL has led to an increase in security research on SSL encoders and associated Trojan attacks. Trojan attacks embedded in SSL encoders can operate covertly, spreading across multiple users and devices. The presence of backdoor behavior in Trojaned encoders can inadvertently be inherited by downstream classifiers, making it even more difficult to detect and mitigate the threat. Although current Trojan detection methods in supervised learning can potentially safeguard SSL downstream classifiers, identifying and addressing triggers in the SSL encoder before its widespread dissemination is a challenging task. This challenge arises because downstream tasks might be unknown, dataset labels may be unavailable, and the original unlabeled training dataset might be inaccessible during Trojan detection in SSL encoders. We introduce SSL-Cleanse as a solution to identify and mitigate backdoor threats in SSL encoders. We evaluated SSL-Cleanse on various datasets using 1200 encoders, achieving an average detection success rate of 82.2% on ImageNet-100. After mitigating backdoors, on average, backdoored encoders achieve 0.3% attack success rate without great accuracy loss, proving the effectiveness of SSL-Cleanse. The source code of SSL-Cleanse is available at https://github.com/UCF-ML-Research/SSL-Cleanse.

翻訳日:2024-07-19 00:00:34 公開日:2024-07-16

# PlayBest: 拡散型プランニングによるプロバスケットボール選手の行動合成

PlayBest: Professional Basketball Player Behavior Synthesis via Planning with Diffusion ( http://arxiv.org/abs/2306.04090v3 )

ライセンス: Link先を確認

Xiusi Chen, Wei-Yao Wang, Ziniu Hu, David Reynoso, Kun Jin, Mingyan Liu, P. Jeffrey Brantingham, Wei Wang,

(参考訳) 複雑なシステムにおける動的計画は、様々な領域における意思決定を改善するために研究されている。プロバスケットボールは、文脈に依存した意思決定を含む動的時空間ゲームの魅力的な例である。しかし,様々なオンコート信号の処理や潜在的な行動や成果の膨大な空間のナビゲートは,進化する状況に対応する最適な戦略を迅速に特定することが困難である。本研究では,条件付き軌道生成プロセスとして逐次決定過程を定式化する。この定式化に基づき,プレイベスト (PLAYer BEhavior Synthesis) を導入する。本研究では,NBA選手の運動追跡データから,拡散確率モデルを拡張し,環境動態を学習する。データ駆動戦略を取り入れるために、補助値関数は対応する報酬で訓練される。報奨誘導軌道生成を実現するため,分類器誘導サンプリングにより,値関数上の拡散モデルを条件とした。本研究では,プロバスケットボールチームで採用されている選手と,生成された軌跡を対比し,シミュレーション研究によりPlayBestの有効性を検証した。以上の結果から,このモデルは,効率的なプレーを実現する合理的なバスケットボールコースの創出に優れることが明らかとなった。さらに、合成されたプレイ戦略はプロの戦術と一致しており、バスケットボールの試合の複雑なダイナミクスを捉えるためのモデルの能力を強調している。

Dynamically planning in complex systems has been explored to improve decision-making in various domains. Professional basketball serves as a compelling example of a dynamic spatio-temporal game, encompassing context-dependent decision-making. However, processing the diverse on-court signals and navigating the vast space of potential actions and outcomes make it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we formulate the sequential decision-making process as a conditional trajectory generation process. Based on the formulation, we introduce PlayBest (PLAYer BEhavior SynThesis), a method to improve player decision-making. We extend the diffusion probabilistic model to learn challenging environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained with corresponding rewards. To accomplish reward-guided trajectory generation, we condition the diffusion model on the value function via classifier-guided sampling. We validate the effectiveness of PlayBest through simulation studies, contrasting the generated trajectories with those employed by professional basketball teams. Our results reveal that the model excels at generating reasonable basketball trajectories that produce efficient plays. Moreover, the synthesized play strategies exhibit an alignment with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.

翻訳日:2024-07-18 23:50:47 公開日:2024-07-16

# Skills-in-Context Prompting:大規模言語モデルにおける構成性を解き放つ

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models ( http://arxiv.org/abs/2308.00304v3 )

ライセンス: Link先を確認

Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen,

(参考訳) 本研究では,大規模言語モデル(LLM)における構成的一般化能力を引き出す方法について検討する。構成的一般化は、基礎的スキルを組み合わせることによってLLMが複雑な問題を解決できるようにするものであり、人間の知性に似た重要な推論能力である。しかし、最も先進的なLLMでさえ、このタイプの推論に苦戦している。我々は,この問題を文脈内学習の枠組みの中で検討し,基礎的スキルと、それらのスキルに基づく構成的事例の両方を同じプロンプトの文脈で示すことが重要であることを発見した。本稿では,このプロンプト構造をスキル・イン・コンテクスト(SKiC)と呼ぶ。わずか2つの例示だけで、この文脈内学習構造により、LLMは革新的なスキルの組み合わせを必要とするより困難な問題に取り組み、幅広いタスクにわたってほぼ完璧な体系的一般化を実現することができる。興味深いことに、SKiCはLLMの潜在能力も解き放ち、事前学習の初期段階で獲得した既存の内部スキルをより積極的に活用して複雑な推論問題を解決できるようにする。 SKiCの構造は、異なるスキル構成や例示の選択に対して堅牢であり、新しいタスクへの強い転移性を示す。最後に,文脈内学習の研究に着想を得て,SKiC型データを用いたLLMの微調整によってゼロショットの弱から強への一般化を引き出せること、すなわちモデルが標準的なプロンプトだけではるかに難しい問題を直接解けるようになることを示す。

We investigate how to elicit compositional generalization capabilities in large language models (LLMs). Compositional generalization empowers LLMs to solve complex problems by combining foundational skills, a critical reasoning ability akin to human intelligence. However, even the most advanced LLMs currently struggle with this form of reasoning. We examine this problem within the framework of in-context learning and find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial. We refer to this prompt structure as skills-in-context (SKiC). With as few as two exemplars, this in-context learning structure enables LLMs to tackle more challenging problems requiring innovative skill combinations, achieving near-perfect systematic generalization across a broad range of tasks. Intriguingly, SKiC also unlocks the latent potential of LLMs, allowing them to more actively utilize pre-existing internal skills acquired during earlier pretraining stages to solve complex reasoning problems. The SKiC structure is robust across different skill constructions and exemplar choices and demonstrates strong transferability to new tasks. Finally, inspired by our in-context learning study, we show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization, enabling the models to solve much harder problems directly with standard prompting.

翻訳日:2024-07-18 23:28:28 公開日:2024-07-16

# EGIC:セマンティックセグメンテーションによる低ビットレート生成画像圧縮の強化

EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation ( http://arxiv.org/abs/2309.03244v3 )

ライセンス: Link先を確認

Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller,

(参考訳) 本稿では,1つのモデルから歪み・知覚曲線を効率的にトラバースできる改良された生成画像圧縮手法EGICを紹介する。 EGICは2つの新しいビルディングブロックに基づいている。 i) OASIS-Cは、潜在画像分布を条件として、空間的・意味的な情報を反映した勾配フィードバックをジェネレータに提供する、条件付きの事前学習済みセマンティックセグメンテーション誘導識別器である。 ii) 出力残差予測(Output Residual Prediction, ORP)は、MSE最適化デコーダ出力とGAN最適化デコーダ出力の残差がGANベースの再構成に与える影響を調整することで合成過程の制御を可能にする、マルチリアリズム画像圧縮のための後付け(retrofit)ソリューションである。これらを合わせて、EGICは強力なコーデックを形成し、最先端の拡散ベースおよびGANベースの手法(例えば、HiFiC、MS-ILLM、DIRAC-100)を上回り、歪み側ではVTM-20.0とほぼ同等の性能を示す。 EGICは実装が簡単で、非常に軽量であり、優れた補間特性を備えているため、低ビットレート領域をターゲットとした実用的なアプリケーションの有望な候補となる。

We introduce EGIC, an enhanced generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. EGIC is based on two novel building blocks: i) OASIS-C, a conditional pre-trained semantic segmentation-guided discriminator, which provides both spatially and semantically-aware gradient feedback to the generator, conditioned on the latent image distribution, and ii) Output Residual Prediction (ORP), a retrofit solution for multi-realism image compression that allows control over the synthesis process by adjusting the impact of the residual between an MSE-optimized and GAN-optimized decoder output on the GAN-based reconstruction. Together, EGIC forms a powerful codec, outperforming state-of-the-art diffusion and GAN-based methods (e.g., HiFiC, MS-ILLM, and DIRAC-100), while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight, and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.

翻訳日:2024-07-18 23:28:28 公開日:2024-07-16

# 共鳴駆動型アンサンブルにおける量子デフェージングの能動的抑制

Active Suppression of Quantum Dephasing in Resonantly Driven Ensembles ( http://arxiv.org/abs/2310.10525v3 )

ライセンス: Link先を確認

Chengxing He, Robert R. Jones,

(参考訳) 我々は、ランダムな原子位置が原子対内のコヒーレントなポピュレーション移動に与える影響を抑制するために量子制御を用い、数百原子のリュードベリ気体中の双極子-双極子駆動のラビ振動の観測を可能にした。この方法は、非共鳴ラビ周波数の結合強度に対する感度の低さを利用し、非線形光学における準位相整合と類似した形で、達成可能なポピュレーション移動をコヒーレントに増幅する。シミュレーションは実験結果を再現し、他の多体量子制御アプリケーションに対するこの手法の潜在的な利点を実証する。

We have used quantum control to suppress the impact of random atom positions on coherent population transfer within atom pairs, enabling the observation of dipole-dipole driven Rabi oscillations in a Rydberg gas with hundreds of atoms. The method exploits the reduced coupling-strength sensitivity of the off-resonant Rabi frequency, and coherently amplifies the achievable population transfer in analogy to quasi-phase-matching in non-linear optics. Simulations reproduce the experimental results and demonstrate the potential benefits of the technique to other many-body quantum control applications.

翻訳日:2024-07-18 23:18:25 公開日:2024-07-16

# 生成的データ拡張を用いた統合失調症診断のための説明可能な深層学習手法

An Explainable Deep Learning-Based Method For Schizophrenia Diagnosis Using Generative Data-Augmentation ( http://arxiv.org/abs/2310.16867v2 )

ライセンス: Link先を確認

Mehrshad Saadatinia, Armin Salimi-Badr,

(参考訳) 本研究では,脳波記録を用いた統合失調症の自動診断にディープラーニングを用いた手法を応用した。このアプローチは、診断の精度を高める強力な手法である生成的データ拡張を利用する。時間周波数特性を利用するために, 原信号からスペクトログラムを抽出した。いくつかのニューラルネットワークアーキテクチャのセットアップを探索した後、最初の診断には適切な畳み込みニューラルネットワーク(CNN)が使用された。その後、勾配ペナルティ付きWasserstein GAN(WGAN-GP)とVariational Autoencoder(VAE)を用いて2つの異なる合成データセットを生成し、初期データセットを拡張するとともに過学習の問題に対処した。 VAEを用いて拡張したデータセットでは、精度が3.0%向上して最大99.0%に達し、損失値も低く、収束も速くなった。最後に、Local Interpretable Model-agnostic Explanations (LIME)アルゴリズムを用いて診断過程において最も重要なスーパーピクセル(周波数)を特定することで、ブラックボックスモデルに対する信頼の欠如に対処した。

In this study, we leverage a deep learning-based method for the automatic diagnosis of schizophrenia using EEG brain recordings. This approach utilizes generative data augmentation, a powerful technique that enhances the accuracy of the diagnosis. To enable the utilization of time-frequency features, spectrograms were extracted from the raw signals. After exploring several neural network architectural setups, a proper convolutional neural network (CNN) was used for the initial diagnosis. Subsequently, using Wasserstein GAN with Gradient Penalty (WGAN-GP) and Variational Autoencoder (VAE), two different synthetic datasets were generated in order to augment the initial dataset and address the over-fitting issue. The augmented dataset using VAE achieved a 3.0% improvement in accuracy, reaching up to 99.0%, and yielded a lower loss value as well as a faster convergence. Finally, we addressed the lack of trust in black-box models using the Local Interpretable Model-agnostic Explanations (LIME) algorithm to determine the most important superpixels (frequencies) in the diagnosis process.

翻訳日:2024-07-18 23:18:25 公開日:2024-07-16

# メタラーニングにおけるアクティブラーニングの探求 - コンテキストセットラベリングの強化

Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling ( http://arxiv.org/abs/2311.02879v2 )

ライセンス: Link先を確認

Wonho Bae, Jing Wang, Danica J. Sutherland,

(参考訳) ほとんどのメタ学習手法は、テスト時に新しいタスクを確立するのに使用される(非常に小さい)コンテキストセットが受動的に提供されると仮定する。しかし、ある設定では、どの点にラベルを付けるかを能動的に選択することが可能であり、慎重な選択による潜在的な利益は大きいが、この設定は典型的なアクティブラーニングの設定とは大きく異なる。メタラーニングプロセスのどの部分でアクティブラーニングを使用するかに応じて、アクティブなメタラーニングを用いてコンテキストセットをラベル付けする方法を整理する。本枠組みにおいて,ラベル付けする点を選択するための,ガウス混合のフィッティングに基づく自然なアルゴリズムを提案する。このアルゴリズムは単純であるが,理論的な動機付けも備えている。提案アルゴリズムは、複数のベンチマークデータセットにまたがる様々なメタラーニングアルゴリズムと併用した場合、最先端のアクティブラーニング手法より優れている。

Most meta-learning methods assume that the (very small) context set used to establish a new task at test time is passively provided. In some settings, however, it is feasible to actively select which points to label; the potential gain from a careful choice is substantial, but the setting requires major differences from typical active learning setups. We clarify the ways in which active meta-learning can be used to label a context set, depending on which parts of the meta-learning process use active learning. Within this framework, we propose a natural algorithm based on fitting Gaussian mixtures for selecting which points to label; though simple, the algorithm also has theoretical motivation. The proposed algorithm outperforms state-of-the-art active learning methods when used with various meta-learning algorithms across several benchmark datasets.

翻訳日:2024-07-18 23:18:25 公開日:2024-07-16

# 階層的関係と常識知識によるシーングラフ生成の強化

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge ( http://arxiv.org/abs/2311.12889v2 )

ライセンス: Link先を確認

Bowen Jiang, Zhijun Zhuang, Shreyas S. Shivakumar, Camillo J. Taylor,

(参考訳) この研究は、関係階層とコモンセンス知識の両方を組み込むことにより、シーングラフを生成するための拡張されたアプローチを導入する。具体的には、情報的階層構造を利用する階層的関係ヘッドの提案から始める。画像内のオブジェクトペア間の関係のスーパーカテゴリと、各スーパーカテゴリの詳細な関係を共同で予測する。これに続いて、我々は、基礎モデルを利用してシーングラフ予測システムから結果を批判する堅牢なコモンセンス検証パイプラインを実装し、小さな言語のみのモデルであっても非意味な述語を除去する。 Visual GenomeとOpenImage V6データセットに関する大規模な実験は、既存のシーングラフ生成アルゴリズムのプラグイン・アンド・プレイ拡張として提案されたモジュールをシームレスに統合できることを実証している。結果は、データセットアノテーションを超えて、妥当な予測の広範なセットで大幅に改善されたことを示している。コードはhttps://github.com/bowen-upenn/scene_graph_commonsenseで公開されている。

This work introduces an enhanced approach to generating scene graphs by incorporating both a relationship hierarchy and commonsense knowledge. Specifically, we begin by proposing a hierarchical relation head that exploits an informative hierarchical structure. It jointly predicts the relation super-category between object pairs in an image, along with detailed relations under each super-category. Following this, we implement a robust commonsense validation pipeline that harnesses foundation models to critique the results from the scene graph prediction system, removing nonsensical predicates even with a small language-only model. Extensive experiments on the Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be seamlessly integrated as plug-and-play enhancements to existing scene graph generation algorithms. The results show significant improvements with an extensive set of reasonable predictions beyond dataset annotations. Code is available at https://github.com/bowen-upenn/scene_graph_commonsense.

翻訳日:2024-07-18 23:08:39 公開日:2024-07-16
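
The joint prediction of a relation super-category and a detailed relation under it can be illustrated as a product of two softmax distributions. This is a minimal sketch under that assumption. The function name, the super-category labels, and the logits are illustrative and not taken from the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_relation_probs(super_logits, relation_logits):
    """Return P(super, relation) = P(super) * P(relation | super).

    super_logits: dict mapping super-category -> logit.
    relation_logits: dict mapping super-category -> {relation: logit}.
    """
    names = list(super_logits)
    p_super = dict(zip(names, softmax([super_logits[n] for n in names])))
    joint = {}
    for s in names:
        rels = list(relation_logits[s])
        p_rel = softmax([relation_logits[s][r] for r in rels])
        for r, p in zip(rels, p_rel):
            joint[(s, r)] = p_super[s] * p
    return joint
```

Because each conditional distribution sums to one, the joint distribution over all (super-category, relation) pairs also sums to one, so the most likely pair can be read off directly.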

# 連続表現学習のための再考

Revisiting Supervision for Continual Representation Learning ( http://arxiv.org/abs/2311.13321v2 )

ライセンス: Link先を確認

Daniel Marczak, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski,

(参考訳) 連続学習の分野では、モデルは次々にタスクを学ぶように設計されている。ほとんどの研究は教師あり連続学習を中心にしているが、大量のラベルのないデータを活用する教師なし連続学習への関心が高まっている。近年の研究では、堅牢な表現を提供する上で、教師なしの方法、特に自己教師付き学習の強みを強調している。自己教師付き手法で構築した表現の転移性の向上は、多層パーセプトロンプロジェクタが果たす役割と関連していることが多い。本研究では、この観察から出発し、連続表現学習における教師信号の役割を再検討する。人間のアノテーションのような追加情報は、表現の質を損なうべきではないと考えている。本研究は、多層パーセプトロンヘッドで強化された教師ありモデルが、連続表現学習において自己教師付きモデルを上回りうることを示す。このことは、連続学習における一連のタスクにまたがる特徴伝達可能性を形成する上で、多層パーセプトロンプロジェクタの重要性を強調している。コードはGitHubで入手できる: https://github.com/danielm1405/sl-vs-ssl-cl

In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use of the vast amounts of unlabeled data. Recent studies have highlighted the strengths of unsupervised methods, particularly self-supervised learning, in providing robust representations. The improved transferability of those representations built with self-supervised methods is often associated with the role played by the multi-layer perceptron projector. In this work, we depart from this observation and reexamine the role of supervision in continual representation learning. We reckon that additional information, such as human annotations, should not deteriorate the quality of representations. Our findings show that supervised models, when enhanced with a multi-layer perceptron head, can outperform self-supervised models in continual representation learning. This highlights the importance of the multi-layer perceptron projector in shaping feature transferability across a sequence of tasks in continual learning. The code is available on GitHub: https://github.com/danielm1405/sl-vs-ssl-cl.

翻訳日:2024-07-18 23:08:38 公開日:2024-07-16

# 拡散モデルを用いた時間連続デテール合成によるビデオ超解像の知覚品質向上

Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models ( http://arxiv.org/abs/2311.15908v2 )

ライセンス: Link先を確認

Claudio Rota, Marco Buzzelli, Joost van de Weijer,

(参考訳) 本稿では,フレーム間の時間的一貫性を確保しつつ,拡散モデル(DM)を用いたビデオ超解像(VSR)の知覚品質向上の問題に対処する。本稿では,リアルタイムかつ時間的に一貫性のある細部を合成することにより,高画質映像の知覚的品質を大幅に向上させる,DMに基づくVSR手法であるStableVSRを提案する。本稿では,TCM(Temporal Conditioning Module)を訓練済みのDMに導入し,単一画像の超解像をVSR法に変換する。 TCMは、隣接フレームで合成された空間的に整列し、詳細に富んだテクスチャ情報を提供する、新しいテンポラルテクスチャガイダンスを使用している。これは、現在のフレームの生成過程を、高品質で時間的に一貫性のある結果へと導く。さらに,過去から未来への情報活用を促進するために,新しいフレームワイド双方向サンプリング戦略を導入する。この戦略は、結果の知覚的品質とフレーム間の時間的一貫性を改善する。本稿では,既存のVSRの最先端手法と比較して,時間的整合性を向上しつつ,高画質映像の知覚品質を高める上でのStableVSRの有効性を実証する。プロジェクトページはhttps://github.com/claudiom4sir/StableVSRで公開されている。

In this paper, we address the problem of enhancing perceptual quality in video super-resolution (VSR) using Diffusion Models (DMs) while ensuring temporal consistency among frames. We present StableVSR, a VSR method based on DMs that can significantly enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details. We introduce the Temporal Conditioning Module (TCM) into a pre-trained DM for single-image super-resolution to turn it into a VSR method. TCM uses the novel Temporal Texture Guidance, which provides it with spatially-aligned and detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality and temporally-consistent results. In addition, we introduce the novel Frame-wise Bidirectional Sampling strategy to encourage the use of information from past to future and vice versa. This strategy improves the perceptual quality of the results and the temporal consistency across frames. We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR. The project page is available at https://github.com/claudiom4sir/StableVSR.

翻訳日:2024-07-18 23:08:38 公開日:2024-07-16

# Street TryOn: ペアになっていない人物画像から実環境(in-the-wild)のバーチャルトライオンを学習する

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images ( http://arxiv.org/abs/2311.16094v3 )

ライセンス: Link先を確認

Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik,

(参考訳) ほとんどの仮想試着研究は、低コストでスタジオモデルに衣服を展示するための画像を生成することで、ファッションビジネスに役立てることを目的としている。しかし、仮想トライオンは、ユーザーが自分のカジュアルな写真を使って自分自身で衣服を視覚化する、より広範なアプリケーションでなければならない。残念なことに、スタジオ・トライオン・セッティングで有効な結果が得られる既存の手法は、Wildのコンテキストでは性能が良くない。これは、トレーニングにはペア画像(同じ服を着ている人のイメージをペアにした衣料品画像)を必要とすることが多いためである。このようなペアリングされたデータは、スタジオ設定のためにショッピングサイトから簡単に収集できるが、現場のシーンでは入手が困難である。本研究は,(1)実機での仮想試行を支援するためのStreetTryOnベンチマークを導入し,(2)ペアデータを必要とすることなく,実機で直接仮想試行を学習する新しい手法を提案することでギャップを埋める。我々は,DensePoseワープ補正法と拡散型条件付き塗装を組み合わせることで,衣服をより多様な人間のポーズに変形させ,より複雑な背景を忠実にレンダリングするなど,ユニークな課題に取り組む。実験では,標準的なスタジオトライオンタスクと,ストリートトライオンタスクとクロスドメイントライオンタスクのSOTAパフォーマンスの競合性能を示す。

Most virtual try-on research is motivated to serve the fashion business by generating images to demonstrate garments on studio models at a lower cost. However, virtual try-on should be a broader application that also allows customers to visualize garments on themselves using their own casual photos, known as in-the-wild try-on. Unfortunately, the existing methods, which achieve plausible results for studio try-on settings, perform poorly in the in-the-wild context. This is because these methods often require paired images (garment images paired with images of people wearing the same garment) for training. While such paired data is easy to collect from shopping websites for studio settings, it is difficult to obtain for in-the-wild scenes. In this work, we fill the gap by (1) introducing a StreetTryOn benchmark to support in-the-wild virtual try-on applications and (2) proposing a novel method to learn virtual try-on from a set of in-the-wild person images directly without requiring paired data. We tackle the unique challenges, including warping garments to more diverse human poses and rendering more complex backgrounds faithfully, by a novel DensePose warping correction method combined with diffusion-based conditional inpainting. Our experiments show competitive performance for standard studio try-on tasks and SOTA performance for street try-on and cross-domain try-on tasks.

翻訳日:2024-07-18 23:08:38 公開日:2024-07-16

# ScribblePrompt:どんなバイオメディカル画像でも高速でフレキシブルなインタラクティブセグメンテーション

ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image ( http://arxiv.org/abs/2312.07381v3 )

ライセンス: Link先を確認

Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca,

(参考訳) バイオメディカルイメージセグメンテーションは、科学研究と臨床医療の両方において重要な部分である。十分なラベル付きデータによって、ディープラーニングモデルは、特定のバイオメディカルイメージセグメンテーションタスクを正確に自動化するように訓練することができる。しかし、トレーニングデータを作成するために手動で画像のセグメンテーションを行うのは、非常に労力がかかり、ドメインの専門知識を必要とする。バイオメディカルイメージングのためのフレキシブルニューラルネットワークベースのインタラクティブセグメンテーションツールである \emph{ScribblePrompt} を紹介した。厳密な定量的実験により、ScribblePromptはトレーニング中に見つからないデータセットの従来の方法よりも正確なセグメンテーションを生成することを示した。ドメインの専門家によるユーザスタディでは、ScribblePromptはアノテーションの時間を28%削減し、Diceを15%改善した。 ScribblePromptの成功は、注意深い設計決定にかかっている。これには、非常に多様なイメージとタスクのセット、ユーザインタラクションとラベルをシミュレートする新しいアルゴリズム、高速な推論を可能にするネットワークを含むトレーニング戦略が含まれる。インタラクティブなデモでScribblePromptを紹介し、コードを提供し、https://scribbleprompt.csail.mit.eduでscribbleアノテーションのデータセットをリリースする。

Biomedical image segmentation is a crucial part of both scientific research and clinical care. With enough labelled data, deep learning models can be trained to accurately automate specific biomedical image segmentation tasks. However, manually segmenting images to create training data is highly labor intensive and requires domain expertise. We present ScribblePrompt, a flexible neural-network-based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. Through rigorous quantitative experiments, we demonstrate that given comparable amounts of interaction, ScribblePrompt produces more accurate segmentations than previous methods on datasets unseen during training. In a user study with domain experts, ScribblePrompt reduced annotation time by 28% while improving Dice by 15% compared to the next best method. ScribblePrompt's success rests on a set of careful design decisions. These include a training strategy that incorporates both a highly diverse set of images and tasks, novel algorithms for simulated user interactions and labels, and a network that enables fast inference. We showcase ScribblePrompt in an interactive demo, provide code, and release a dataset of scribble annotations at https://scribbleprompt.csail.mit.edu

翻訳日:2024-07-18 22:58:48 公開日:2024-07-16
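
The user study above reports an improvement in Dice. For reference, the standard Dice coefficient between a predicted and a reference binary mask is 2|A∩B| / (|A| + |B|). The sketch below assumes flattened 0/1 masks. The function name is illustrative, and returning 1.0 for two empty masks is a convention chosen here, not something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient for two equally long sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; treat that case as a score of 1.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A prediction that covers the reference mask plus one extra pixel, for example `dice_score([1, 1, 0, 0], [1, 0, 0, 0])`, is penalised for the over-segmentation even though no reference pixel is missed.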

# 高速拡散方式によるショートカット除去・生成対策

Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation ( http://arxiv.org/abs/2312.14223v2 )

ライセンス: Link先を確認

Nina Weng, Paraskevas Pegios, Eike Petersen, Aasa Feragen, Siavash Bigdeli,

(参考訳) ショートカット学習(英: shortcut learning)とは、モデル、例えば心臓病分類器が対象のラベルと急激なショートカット特徴、例えばペースメーカーとの間の相関を利用して、実際の識別的特徴ではなく、そのショートカットに基づいてターゲットのラベルを予測することである。これは医学的画像において一般的であり、治療と臨床的アノテーションは疾患のラベルと相関し、疾患を予測するためのショートカットを容易にする。本稿では,ショートカットを合成的に除去あるいは付加できる高速拡散型反ファクト画像生成により,潜在的ショートカット特徴の影響の新たな検出と定量化を提案する。空間的制約のあるショートカット特徴の除去を奨励するとともに, ショートカットのないカウンターファクトファクトが残像特徴を高い精度で保持することを保証する。これらを用いて,ショートカットがモデル予測に与える影響を評価する。これは2つ目の貢献によって実現された: 効率的な拡散に基づく反ファクト的説明法は、画像品質を最先端技術に匹敵する精度で推論速度を向上する。胸部X線データセット,皮膚病変データセット,CelebAで確認した。私たちのコードはfastdime.compute.dtu.dkで公開されています。

Shortcut learning is when a model (e.g., a cardiac disease classifier) exploits correlations between the target label and a spurious shortcut feature (e.g., a pacemaker) to predict the target label based on the shortcut rather than real discriminative features. This is common in medical imaging, where treatment and clinical annotations correlate with disease labels, making them easy shortcuts to predict disease. We propose a novel detection and quantification of the impact of potential shortcut features via a fast diffusion-based counterfactual image generation that can synthetically remove or add shortcuts. Via a novel inpainting-based modification, we spatially limit the changes made with no extra inference step, encouraging the removal of spatially constrained shortcut features while ensuring that the shortcut-free counterfactuals preserve their remaining image features to a high degree. Using these, we assess how shortcut features influence model predictions. This is enabled by our second contribution: an efficient diffusion-based counterfactual explanation method with significant inference speed-up at image quality comparable to the state of the art. We confirm this on two large chest X-ray datasets, a skin lesion dataset, and CelebA. Our code is publicly available at fastdime.compute.dtu.dk.

翻訳日:2024-07-18 22:58:48 公開日:2024-07-16
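
The idea of spatially limiting counterfactual changes can be pictured as mask-based compositing: generated pixels are used only inside a region of interest, and the original pixels are kept everywhere else. This is a simplified sketch on flattened images. The abstract does not give the exact formulation, and the paper applies the constraint within the diffusion process, so the function below is an assumption for illustration only.

```python
def constrained_counterfactual(original, generated, mask):
    """Blend two flattened images: take `generated` where mask is 1, else `original`."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("all inputs must have the same length")
    return [m * g + (1 - m) * o for o, g, m in zip(original, generated, mask)]
```

Pixels outside the mask are returned unchanged, which is the property that lets a shortcut-free counterfactual preserve the remaining image features.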

# 変分量子アルゴリズムのための階層型多重グリッドアンサッツ

Hierarchical Multigrid Ansatz for Variational Quantum Algorithms ( http://arxiv.org/abs/2312.15048v2 )

ライセンス: Link先を確認

Christo Meriwether Keller, Stephan Eidenbenz, Andreas Bärtschi, Daniel O'Malley, John Golden, Satyajayant Misra,

(参考訳) 量子コンピューティングは、基礎物理学を用いたスーパーコンピューティングを強化することを約束する、エンジニアリングにおける新たなトピックである。短期的には、この利点を達成するための最良の候補アルゴリズムは変分量子アルゴリズム(VQA)である。本稿では,変分量子固有解法(VQE)を中心に,新しいVQAアンサッツの設計と数値評価を行う。我々のアンザッツは古典的な多重グリッド階層法にインスパイアされているので、我々はそれを「マルチグリッド」アンザッツと呼ぶ。マルチグリッドアンサッツは、より小さなキュービット数に対する回路を連続的に構築し最適化することにより、$n$ qubitsの量子問題に対するパラメータ化量子回路を生成し、$j+1$の次のレベル階層に対する初期解として最適化されたパラメータ値を再利用する。数値シミュレーションにより,Laplacian 固有解器の解法品質やMaxCut と Maximum $k$-Satisfiability の具体例による組合せ最適化問題において,マルチグリッドアンサッツは標準的なハードウェア効率のアンサッツよりも優れていることを示す。本研究は,多くのVQAに対して有効な候補としてマルチグリッドアンサッツを確立し,特に組合せ最適化問題に対するQAOAアプローチの代替として有望であることを示す。

Quantum computing is an emerging topic in engineering that promises to enhance supercomputing using fundamental physics. In the near term, the best candidate algorithms for achieving this advantage are variational quantum algorithms (VQAs). We design and numerically evaluate a novel ansatz for VQAs, focusing in particular on the variational quantum eigensolver (VQE). As our ansatz is inspired by classical multigrid hierarchy methods, we call it the "multigrid" ansatz. The multigrid ansatz creates a parameterized quantum circuit for a quantum problem on $n$ qubits by successively building and optimizing circuits for smaller qubit counts $j < n$, reusing optimized parameter values as initial solutions for the next level of the hierarchy at $j+1$. We show through numerical simulation that the multigrid ansatz outperforms the standard hardware-efficient ansatz in terms of solution quality for the Laplacian eigensolver as well as for a large class of combinatorial optimization problems, with specific examples for MaxCut and Maximum $k$-Satisfiability. Our studies establish the multigrid ansatz as a viable candidate for many VQAs and in particular present a promising alternative to the QAOA approach for combinatorial optimization problems.

翻訳日:2024-07-18 22:58:48 公開日:2024-07-16

# Diffusion-ES: 自律走行とゼロショット指示に続く拡散を考慮したグラディエントフリープランニング

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following ( http://arxiv.org/abs/2402.06559v2 )

ライセンス: Link先を確認

Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki,

(参考訳) 拡散モデルは、意思決定と制御のための複雑および多モーダルな軌道分布のモデリングにおいて優れている。微分可能報酬関数と拡散モデルで捉えたデータ分布下での確率の両方を最大化する軌道を生成するために、最近、逆勾配誘導型雑音発生法が提案されている。逆勾配誘導復調法は、クリーンサンプルとノイズサンプルの両方に適合する微分可能な報酬関数を必要とし、一般的な軌道最適化器としての適用性を制限する。本稿では,データ多様体に留まりながら,勾配のない最適化とトラジェクトリデノイングを組み合わせて,ブラックボックスの非微分対象を最適化するDiffusionESを提案する。拡散-ESは、拡散モデルからの進化的探索中に軌道をサンプリングし、ブラックボックス報酬関数を用いてそれらをスコアする。トランキャット拡散法を用いて高次の軌道を変異させ、少数のノイズ化とデノナイジングのステップを適用し、解空間のより効率的な探索を可能にする。 DiffusionESは、自動運転のための確立されたクローズドループ計画ベンチマークであるnuPlan上で、最先端のパフォーマンスを実現する。 Diffusion-ESは、既存のサンプリングベースプランナー、リアクティブ決定性または拡散ベースのポリシー、報酬段階のガイダンスより優れている。また,従来の指導手法と異なり,本手法はLLMプロンプトが生成する非微分言語型報酬関数を最適化できることを示す。学習データには存在しないアグレッシブレーンウィービングのような,新たな複雑な行動を生成することができる。これにより、既存の軌道最適化メソッドと駆動ポリシーの能力を超えた最も難しいnuPlanシナリオを解決できます。

Diffusion models excel at modeling complex and multimodal trajectory distributions for decision-making and control. Reward-gradient guided denoising has been recently proposed to generate trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model. Reward-gradient guided denoising requires a differentiable reward function fitted to both clean and noised samples, limiting its applicability as a general trajectory optimizer. In this paper, we propose Diffusion-ES, a method that combines gradient-free optimization with trajectory denoising to optimize black-box non-differentiable objectives while staying in the data manifold. Diffusion-ES samples trajectories during evolutionary search from a diffusion model and scores them using a black-box reward function. It mutates high-scoring trajectories using a truncated diffusion process that applies a small number of noising and denoising steps, allowing for much more efficient exploration of the solution space. We show that Diffusion-ES achieves state-of-the-art performance on nuPlan, an established closed-loop planning benchmark for autonomous driving. Diffusion-ES outperforms existing sampling-based planners, reactive deterministic or diffusion-based policies, and reward-gradient guidance. Additionally, we show that unlike prior guidance methods, our method can optimize non-differentiable language-shaped reward functions generated by few-shot LLM prompting. When guided by a human teacher that issues instructions to follow, our method can generate novel, highly complex behaviors, such as aggressive lane weaving, which are not present in the training data. This allows us to solve the hardest nuPlan scenarios which are beyond the capabilities of existing trajectory optimization methods and driving policies.

翻訳日:2024-07-18 22:48:58 公開日:2024-07-16
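
The search loop described above (sample, score with a black-box reward, mutate the best candidates with a few noising and denoising steps) can be sketched on a toy problem. Everything here is a stand-in: `toy_denoise` is a moving-average smoother used in place of a learned diffusion denoiser, `black_box_reward` is an arbitrary non-differentiable score, and the hyperparameters are illustrative rather than the authors' settings.

```python
import random


def toy_denoise(traj):
    """Stand-in for a learned denoiser: 3-point moving average."""
    n = len(traj)
    return [(traj[max(i - 1, 0)] + traj[i] + traj[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def truncated_mutation(traj, rng, noise_scale=0.5, steps=2):
    """Add a little noise, then apply a small number of denoising steps."""
    noised = [x + rng.gauss(0.0, noise_scale) for x in traj]
    for _ in range(steps):
        noised = toy_denoise(noised)
    return noised


def black_box_reward(traj):
    """Non-differentiable score: number of waypoints inside the band [0.9, 1.1]."""
    return sum(1 for x in traj if 0.9 <= x <= 1.1)


def evolutionary_search(reward_fn, horizon=8, pop_size=32, n_elite=4,
                        generations=20, seed=0):
    """Gradient-free search that keeps elites and mutates them each generation."""
    rng = random.Random(seed)
    population = [toy_denoise([rng.gauss(0.0, 1.0) for _ in range(horizon)])
                  for _ in range(pop_size)]
    history = []  # best reward seen at the start of each generation
    for _ in range(generations):
        population.sort(key=reward_fn, reverse=True)
        elites = population[:n_elite]
        history.append(reward_fn(elites[0]))
        children = [truncated_mutation(rng.choice(elites), rng)
                    for _ in range(pop_size - n_elite)]
        population = elites + children
    return max(population, key=reward_fn), history
```

Because the elites are carried over unchanged, the best reward never decreases between generations, and the reward function is only ever called, never differentiated.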

# 拡散生成モデルにおける最近近傍スコア推定器

Nearest Neighbour Score Estimators for Diffusion Generative Models ( http://arxiv.org/abs/2402.08018v2 )

ライセンス: Link先を確認

Matthew Niedoba, Dylan Green, Saeid Naderiparizi, Vasileios Lioutas, Jonathan Wilder Lavington, Xiaoxuan Liang, Yunpeng Liu, Ke Zhang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood,

(参考訳) スコア関数推定は拡散生成モデルからのトレーニングとサンプリングの両方の基礎となる。この事実にもかかわらず、最もよく使われる推定器は、バイアス付きニューラルネットワーク近似または条件スコアに基づく高分散モンテカルロ推定器である。トレーニングセットから複数のサンプルを抽出し,推定値の分散を劇的に低減する新しい近傍スコア関数推定器を提案する。低分散推定器を2つの説得力のある応用に活用する。推定器による整合性モデルの訓練を行い, 収束速度と試料品質の両面で有意な増加が報告された。拡散モデルでは,確率フローODE統合のための学習ネットワークを推定器で置き換えることができ,将来的な研究の新たな道が開かれる。

Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research.

翻訳日:2024-07-18 22:48:58 公開日:2024-07-16
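
To make the idea of a training-set-based score estimate concrete, the sketch below computes the score of a Gaussian-smoothed empirical distribution, restricted to the k nearest training points. It assumes a variance-exploding noise model x_t = x_0 + sigma * eps. With k equal to the training-set size this is the exact score of that smoothed distribution, and a smaller k is an approximation. The abstract does not specify the paper's estimator, so this is an illustration of the general idea, and the function name is hypothetical.

```python
import math


def nn_score_estimate(x, train, sigma, k):
    """Estimate grad_x log p_sigma(x) from the k nearest training points.

    Each neighbour x_i contributes (x_i - x) / sigma^2, weighted by a softmax
    over -||x - x_i||^2 / (2 sigma^2).
    """
    dists = [(sum((a - b) ** 2 for a, b in zip(x, xi)), idx)
             for idx, xi in enumerate(train)]
    nearest = sorted(dists)[:k]
    logits = [-d2 / (2.0 * sigma ** 2) for d2, _ in nearest]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # stable softmax
    z = sum(exps)
    score = [0.0] * len(x)
    for e, (_, idx) in zip(exps, nearest):
        w = e / z
        for d in range(len(x)):
            score[d] += w * (train[idx][d] - x[d]) / sigma ** 2
    return score
```

With a single training point the estimate reduces to (x_0 - x) / sigma^2, which points straight back at that point; far-away points receive negligible weight, which is why truncating to nearest neighbours loses little.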

# ELECTRAの文埋め込みは修復不可能か? : 意味的テキスト類似性の事例

Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity ( http://arxiv.org/abs/2402.13130v2 )

ライセンス: Link先を確認

Ivan Rep, David Dukić, Jan Šnajder,

(参考訳) BERTは高品質な文埋め込みを生成するが、事前学習の計算コストは大きな欠点である。これとは対照的に、ELECTRAはコスト効率のよい事前学習目標と下流タスクのパフォーマンスの改善を提供するが、文の埋め込みとしては機能しない。コミュニティは、セマンティックテキスト類似性(STS)にELECTRAの文を埋め込むことを熱心に止めた。 ELECTRAディスクリミネータの最後の層を以前の層と比較すると,性能が著しく低下していることが分かる。我々はこの落下を探索し、ELECTRAの埋め込みを修復する方法を考案し、新しいTMFT法を提案する。 TMFTは、STSベンチマークデータセットのパラメータ効率を高めながら、スピアマン相関係数を8点以上改善する。我々は分析を様々なモデルサイズと言語に拡張する。さらに,BERTと同等に動作するELECTRAのジェネレータモデルに対して,パラメータが大幅に小さく,埋め込みサイズも大幅に小さくなった。最後に、TMFTと単語類似性タスク、ドメイン適応型事前学習を組み合わせることで、さらなる向上を観察する。

While BERT produces high-quality sentence embeddings, its pre-training computational cost is a significant drawback. In contrast, ELECTRA delivers a cost-effective pre-training objective and downstream task performance improvements, but its sentence embeddings are less performant. The community has tacitly stopped utilizing ELECTRA's sentence embeddings for semantic textual similarity (STS). We notice a significant drop in performance when using the ELECTRA discriminator's last layer in comparison to earlier layers. We explore this drop and devise a way to repair ELECTRA's embeddings, proposing a novel truncated model fine-tuning (TMFT) method. TMFT improves the Spearman correlation coefficient by over 8 points while increasing parameter efficiency on the STS benchmark dataset. We extend our analysis to various model sizes and languages. Further, we discover the surprising efficacy of ELECTRA's generator model, which performs on par with BERT, using significantly fewer parameters and a substantially smaller embedding size. Finally, we observe further boosts by combining TMFT with a word similarity task or domain adaptive pre-training.

翻訳日:2024-07-18 22:39:10 公開日:2024-07-16
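
The STS results above are reported as Spearman correlation between predicted similarities and gold scores. A minimal sketch of that metric follows: average ranks for ties, then Pearson correlation on the ranks. The function names are illustrative. The "8 points" in the abstract presumably refers to the coefficient multiplied by 100, which is an assumption here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Only the ordering of the predicted similarities matters for this metric, so embeddings are rewarded for ranking sentence pairs correctly rather than for matching the gold scale.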

# MedContext:効率的なボリューム・メディカル・セグメンテーションのためのコンテキストキューの学習

MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation ( http://arxiv.org/abs/2402.17725v2 )

ライセンス: Link先を確認

Hanan Gani, Muzammal Naseer, Fahad Khan, Salman Khan,

(参考訳) ボリューム・メディカル・セグメンテーションは、異なる意味領域を規定する3次元医用画像解析の重要な構成要素である。ディープニューラルネットワークは、ボリューム医学のセグメンテーションを大幅に改善したが、一般的には、パフォーマンス向上のために大規模な注釈付きデータを必要とするため、高価で入手が禁止される可能性がある。この制限に対処するために、既存の研究は典型的には伝達学習や、代表的特徴を学習するための専用の訓練済みファインタニングステージの設計を行う。しかし、ソースとターゲットドメインのミスマッチにより、ボリュームデータの最適な表現を学習することは困難になり、マルチステージトレーニングでは、より高い計算と、ステージ固有の設計選択を慎重に選択する必要がある。対照的に、アーキテクチャに依存しない、既存の医用セグメンテーションのトレーニングフレームワークに組み込むことのできる、MedContextと呼ばれる普遍的なトレーニングフレームワークを提案する。本手法は,大規模注釈付ボリューム・メディカル・データや専用トレーニング前ファインタニング・ステージを必要とせず,教師付きボクセルセグメンテーション・タスクと協調して自己指導型コンテキストキューを効果的に学習する。提案手法は,出力セグメンテーション空間における臓器の欠損部分や臓器の再構築を学習することで,ネットワーク内の文脈知識を誘導する。 MedContextの有効性は、複数の3D医療データセットと4つの最先端モデルアーキテクチャで検証されている。このアプローチは、数ショットのデータシナリオであっても、データセットと異なるアーキテクチャ間でセグメンテーションパフォーマンスが一貫して向上していることを示します。私たちのコードと事前訓練されたモデルはhttps://github.com/hananshafi/MedContextで利用可能です。

Volumetric medical segmentation is a critical component of 3D medical image analysis that delineates different semantic regions. Deep neural networks have significantly improved volumetric medical segmentation, but they generally require large-scale annotated data to achieve better performance, which can be expensive and prohibitive to obtain. To address this limitation, existing works typically perform transfer learning or design dedicated pretraining-finetuning stages to learn representative features. However, the mismatch between the source and target domain can make it challenging to learn an optimal representation for volumetric data, while the multi-stage training demands higher compute as well as careful selection of stage-specific design choices. In contrast, we propose a universal training framework called MedContext that is architecture-agnostic and can be incorporated into any existing training framework for 3D medical segmentation. Our approach effectively learns self-supervised contextual cues jointly with the supervised voxel segmentation task without requiring large-scale annotated volumetric medical data or dedicated pretraining-finetuning stages. The proposed approach induces contextual knowledge in the network by learning to reconstruct the missing organ or parts of an organ in the output segmentation space. The effectiveness of MedContext is validated across multiple 3D medical datasets and four state-of-the-art model architectures. Our approach demonstrates consistent gains in segmentation performance across datasets and different architectures even in few-shot data scenarios. Our code and pretrained models are available at https://github.com/hananshafi/MedContext

翻訳日:2024-07-18 22:39:10 公開日:2024-07-16

# アルゴリズムはいつ辞任すべきか? AIガバナンスの提案

When Should Algorithms Resign? A Proposal for AI Governance ( http://arxiv.org/abs/2402.18326v2 )

ライセンス: Link先を確認

Umang Bhatt, Holli Sargeant,

(参考訳) アルゴリズムの辞退は、ガバナンスを直接AIシステムに埋め込むことによって、人工知能(AI)の使用を管理する戦略的アプローチである。特定のシナリオにおいて、AIの適切な効果的な使用を支援するために、アクセスAI出力の制限やパフォーマンス評価の表示など、AIからの意図的かつインフォームドな切り離しが含まれる。アルゴリズムの辞退をガバナンスメカニズムとして統合することにより、組織はAIの使用タイミングと使い方をよりよく制御し、自動化のメリットと人間の監視の必要性のバランスを取ることができる。

Algorithmic resignation is a strategic approach for managing the use of artificial intelligence (AI) by embedding governance directly into AI systems. It involves deliberate and informed disengagement from AI, such as restricting access to AI outputs or displaying performance disclaimers, in specific scenarios to aid the appropriate and effective use of AI. By integrating algorithmic resignation as a governance mechanism, organizations can better control when and how AI is used, balancing the benefits of automation with the need for human oversight.

翻訳日:2024-07-18 22:39:10 公開日:2024-07-16

# 空間コヒーレンス損失:正当性およびカモフラージュ性物体検出における全ての物体

Spatial Coherence Loss: All Objects Matter in Salient and Camouflaged Object Detection ( http://arxiv.org/abs/2402.18698v2 )

ライセンス: Link先を確認

Ziyun Yang, Kevin Choy, Sina Farsiu,

(参考訳) ジェネリックオブジェクト検出は、オブジェクトの正確なモデリングに依存するカテゴリに依存しないタスクである。正確な意味分析を行うには,事前定義された基底真理(GT)オブジェクトや,ネットワークが前景と誤認する曖昧なデコイオブジェクトを含む,学習の任意の段階で現れるオブジェクトレベルの予測を学習する必要がある。しかし、最も関連するモデルは、主にGTオブジェクトの学習を改善することに焦点を当てた。デコイオブジェクトを考えるいくつかの方法は、単一の不明瞭なピクセルの損失応答にのみ焦点をあてるロス関数を利用するため、オブジェクトレベルのあいまいさ学習設計が提供できる豊富な情報から恩恵を受けない。人間の視覚システムに触発され,まず意味を掘り下げる前に曖昧な領域の境界を識別し,隣接する画素間の相互応答を広義に用いた新しい損失関数である空間コヒーレンス損失(SCLoss)を提案する。提案するSCLosは, 自己適応的に境界を検出, 強調することにより, あいまいな領域を徐々に学習できることを実証する。総合的な実験により、一般的な損失関数をSCLosに置き換えることで、現在の最先端(SOTA)サラリアンまたはカモフラージュされたオブジェクト検出(SODまたはCOD)モデルの性能が向上することを示した。また、SCLosと他の損失関数を組み合わせることで、パフォーマンスが向上し、異なるアプリケーションに対してSOTA結果が得られることを示す。

Generic object detection is a category-independent task that relies on accurate modeling of objectness. We show that for accurate semantic analysis, the network needs to learn all object-level predictions that appear at any stage of learning, including the pre-defined ground truth (GT) objects and the ambiguous decoy objects that the network misidentifies as foreground. Yet, most relevant models focused mainly on improving the learning of the GT objects. A few methods that consider decoy objects utilize loss functions that only focus on the single-response, i.e., the loss response of a single ambiguous pixel, and thus do not benefit from the wealth of information that an object-level ambiguity learning design can provide. Inspired by the human visual system, which first discerns the boundaries of ambiguous regions before delving into the semantic meaning, we propose a novel loss function, Spatial Coherence Loss (SCLoss), that incorporates the mutual response between adjacent pixels into the widely-used single-response loss functions. We demonstrate that the proposed SCLoss can gradually learn the ambiguous regions by detecting and emphasizing their boundaries in a self-adaptive manner. Through comprehensive experiments, we demonstrate that replacing popular loss functions with SCLoss can improve the performance of current state-of-the-art (SOTA) salient or camouflaged object detection (SOD or COD) models. We also demonstrate that combining SCLoss with other loss functions can further improve performance and result in SOTA outcomes for different applications.

翻訳日:2024-07-18 22:39:10 公開日:2024-07-16

# ゼロショットインスタンスナビゲーションのための優先順位付きセマンティック学習

Prioritized Semantic Learning for Zero-shot Instance Navigation ( http://arxiv.org/abs/2403.11650v2 )

ライセンス: Link先を確認

Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang,

(参考訳) 我々はゼロショットのインスタンスナビゲーションについて研究し、エージェントはトレーニングにオブジェクトアノテーションを使わずに特定のオブジェクトにナビゲートする。従来のオブジェクトナビゲーション手法では、事前トレーニングのためにImage-goal Navigation (ImageNav) タスクを適用し、エージェントを移動して視覚言語モデルを用いてオブジェクト目標を達成する。しかし、これらのアプローチは意味的無視の問題を招き、モデルが意味のあるセマンティックアライメントを学ばない。本稿では,ナビゲーションエージェントのセマンティック理解能力を向上させるために,優先度付き意味学習(PSL)手法を提案する。具体的には、セマンティック強化PSLエージェントを提案し、明確なセマンティックな教師信号を示すゴールイメージを選択し、厳密な正確なビューマッチングから報酬関数を緩和するために、優先順位付けされたセマンティックトレーニング戦略を導入する。推論時には、目標セマンティクスの粒度レベルをトレーニングと同一に保つために意味拡張推論スキームが設計される。さらに、一般的なHM3D環境では、目的が単にオブジェクトカテゴリによって定義されるObject Navigation(ObjectNav)タスクとは対照的に、詳細な説明で指定された特定のオブジェクトインスタンスへ移動する必要のあるインスタンスナビゲーション(InstanceNav)タスクを提示します。我々のPSLエージェントは、ゼロショットのObjectNavにおいて成功率で従来の最先端を66%上回り、新しいInstanceNavタスクでも優れている。コードは https://github.com/XinyuSun/PSL-InstanceNav でリリースされる。

We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training. Previous object navigation approaches apply the image-goal navigation (ImageNav) task (go to the location of an image) for pretraining, and transfer the agent to achieve object goals using a vision-language model. However, these approaches lead to issues of semantic neglect, where the model fails to learn meaningful semantic alignments. In this paper, we propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents. Specifically, a semantic-enhanced PSL agent is proposed and a prioritized semantic training strategy is introduced to select goal images that exhibit clear semantic supervision and relax the reward function from strict exact view matching. At inference time, a semantic expansion inference scheme is designed to preserve the same granularity level of the goal semantic as training. Furthermore, for the popular HM3D environment, we present an Instance Navigation (InstanceNav) task that requires going to a specific object instance with detailed descriptions, as opposed to the Object Navigation (ObjectNav) task where the goal is defined merely by the object category. Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot ObjectNav in terms of success rate and is also superior on the new InstanceNav task. Code will be released at https://github.com/XinyuSun/PSL-InstanceNav.

翻訳日:2024-07-18 22:29:24 公開日:2024-07-16

# テキスト分類のためのアクティブ学習者の脆弱性について

On the Fragility of Active Learners for Text Classification ( http://arxiv.org/abs/2403.15744v4 )

ライセンス: Link先を確認

Abhishek Ghose, Emma Thuong Nguyen,

(参考訳) アクティブラーニング(AL)技術は、学習に最も価値のあるインスタンスを反復的に選択することで、ラベル付け予算を最適に活用する。しかし、それらは '`prerequisite checks''' を欠いている。すなわち、データセットに最も適したALアルゴリズムを選択するための所定の基準はない。実践者は、事前に報告された結果に基づいて、ランダムサンプリングを破るテクニックを選択し、データセット、予算のラベル付け、予測パイプラインといった、環境内の多くの変数に対してレジリエンスを期待する必要があります。平均してどのくらいの頻度で、任意のALテクニックが、ランダムサンプリングの計算的安価で実装が容易な戦略を確実に打ち負かすことを期待していますか? ALを予測パイプラインの‘Always ON’モードで使用するのは,少なくとも意味があるのだろうか? ALの成功において、予測パイプラインはどの程度の役割を担っていますか? 本稿では,現在ユビキタスな事前学習表現を用いたテキスト分類タスクについて,これらの質問を詳細に検討する。ここでの私たちの主な貢献は、wrtデータセット、テキスト表現、分類器によって異なるセットアップをまたいだALテクニック、古くて新しい、厳密な評価です。これはウォームアップ時間に関する複数の洞察、すなわちALからの利得の前にラベルの数、`Always ON'モードの生存可能性、および異なる要因の相対的重要性を解き放つ。さらに,テキスト分類のためのAL手法の厳密なベンチマークを行うためのフレームワークもリリースした。

Active learning (AL) techniques optimally utilize a labeling budget by iteratively selecting instances that are most valuable for learning. However, they lack "prerequisite checks", i.e., there are no prescribed criteria to pick an AL algorithm best suited for a dataset. A practitioner must pick a technique they trust would beat random sampling, based on prior reported results, and hope that it is resilient to the many variables in their environment: dataset, labeling budget and prediction pipelines. The important questions then are: how often, on average, do we expect any AL technique to reliably beat the computationally cheap and easy-to-implement strategy of random sampling? Does it at least make sense to use AL in an "Always ON" mode in a prediction pipeline, so that while it might not always help, it never under-performs random sampling? How much of a role does the prediction pipeline play in AL's success? We examine these questions in detail for the task of text classification using pre-trained representations, which are ubiquitous today. Our primary contribution here is a rigorous evaluation of AL techniques, old and new, across setups that vary with respect to datasets, text representations and classifiers. This unlocks multiple insights around warm-up times, i.e., number of labels before gains from AL are seen, viability of an "Always ON" mode and the relative significance of different factors. Additionally, we release a framework for rigorous benchmarking of AL techniques for text classification.

翻訳日:2024-07-18 22:29:24 公開日:2024-07-16
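
The comparison at the heart of this abstract is between an AL query strategy and plain random sampling. The sketch below contrasts the two for a single labeling round, using least-confidence uncertainty sampling as a representative AL strategy. The abstract does not list which techniques were evaluated, so that choice and the function names are assumptions.

```python
import random


def least_confident(probabilities, budget):
    """Pick the `budget` pool items whose top predicted class probability is lowest."""
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return sorted(ranked[:budget])


def random_sample(n_items, budget, seed=0):
    """Baseline: pick `budget` pool items uniformly at random."""
    return sorted(random.Random(seed).sample(range(n_items), budget))
```

Note that `least_confident` depends on the classifier's predicted probabilities, and therefore on the whole prediction pipeline, whereas `random_sample` does not. That dependence is one reason the benefit of AL can vary across setups.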

# フェデレートラーニングにおけるマルチモーダルトランスフォーマー

Towards Multi-modal Transformers in Federated Learning ( http://arxiv.org/abs/2404.12467v2 )

ライセンス: Link先を確認

Guangyu Sun, Matias Mendieta, Aritra Dutta, Xin Li, Chen Chen,

(参考訳) マルチモーダルトランスは、異なる領域で顕著な進歩を示すが、サイロ化された高品質なデータは、さらなる改善を妨げる。これを解決するために、フェデレートラーニング(FL)は、異なるクライアントが保持する生データに直接アクセスすることなく、モデルをトレーニングする上で有望なプライバシー保護パラダイムとして登場した。その可能性にもかかわらず、未実装のユニモーダルクライアントとFLのトランスフォーマーアーキテクチャに関するかなりの研究の方向性は未解明のままである。このギャップを埋めるために,クライアントが異なるデータセットに分散した様々なモダリティのデータを保有する視覚言語領域内でのマルチモーダル・フェデレート・ラーニング(MFL)シナリオについて検討する。我々は,トランスフォーマーアーキテクチャを利用する場合の既存手法の性能を体系的に評価し,クライアント間の非モダリティと相互モダリティのギャップに対処することで,FedCola(Federated modality complementary and collaboration)と呼ばれる新しいフレームワークを導入する。さまざまなFL設定にわたる広範な実験を通じて、FedColaは従来のアプローチよりも優れたパフォーマンスを示し、将来のマルチモーダルトランスのフェデレーショントレーニングに関する新たな視点を提供する。

Multi-modal transformers mark significant progress in different domains, but siloed high-quality data hinders their further improvement. To remedy this, federated learning (FL) has emerged as a promising privacy-preserving paradigm for training models without direct access to the raw data held by different clients. Despite its potential, a considerable research direction regarding the unpaired uni-modal clients and the transformer architecture in FL remains unexplored. To fill this gap, this paper explores a transfer multi-modal federated learning (MFL) scenario within the vision-language domain, where clients possess data of various modalities distributed across different datasets. We systematically evaluate the performance of existing methods when a transformer architecture is utilized and introduce a novel framework called Federated modality complementary and collaboration (FedCola) by addressing the in-modality and cross-modality gaps among clients. Through extensive experiments across various FL settings, FedCola demonstrates superior performance over previous approaches, offering new perspectives on future federated training of multi-modal transformers.

翻訳日:2024-07-18 22:07:40 公開日:2024-07-16

# 教師付き学習のためのMRP定式化:一般化された時間差学習モデル

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models ( http://arxiv.org/abs/2404.15518v3 )

ライセンス: Link先を確認

Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr,

(参考訳) 従来の統計的学習では、データポイントは通常、未知の確率分布の後、独立して同じ分布(すなわち、同じ分布)であると仮定される。本稿では、データポイントを相互接続したものとして認識し、データモデリングにマルコフ報酬プロセス(MRP)を用いる、対照的な視点を示す。我々は、強化学習(RL)における政治政策評価問題として、典型的教師付き学習を再構成し、一般化時間差学習アルゴリズム(TD)を解法として導入する。理論的には、線形TD学習の解と通常の最小二乗(OLS)の間の関係を抽出する。また、特定の条件下では、特にノイズが相関している場合、TDの解はOLSよりも効果的に推定できることを示す。さらに,線形関数近似の下で一般化されたTDアルゴリズムの収束性を確立する。実験的な研究により、我々の理論的結果を検証し、我々のTDアルゴリズムの重要設計を検証し、回帰や深層学習による画像分類といったタスクを含む様々なデータセットで実用性を示す。

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution. Theoretically, our analysis draws connections between the solutions of linear TD learning and ordinary least squares (OLS). We also show that under specific conditions, particularly when the noise is correlated, the TD solution proves to be a more effective estimator than OLS. Furthermore, we establish the convergence of our generalized TD algorithms under linear function approximation. Empirical studies verify our theoretical results, examine the vital design of our TD algorithm and show practical utility across various datasets, encompassing tasks such as regression and image classification with deep learning.
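
The idea of treating supervised data as transitions of a reward process can be sketched with a linear TD(0) learner. The sketch below rests on assumptions that the abstract does not spell out: the reward is taken to be r = y - gamma * y_next so that the value of a state equals its label, the data are visited cyclically in their given order, and all names and hyperparameters are illustrative. With gamma = 0 the update reduces to ordinary least-mean-squares regression.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_error(w, phi, y, phi_next, y_next, gamma):
    """TD error under the assumed reward r = y - gamma * y_next."""
    reward = y - gamma * y_next
    return reward + gamma * dot(w, phi_next) - dot(w, phi)


def train_linear_td(features, targets, gamma=0.5, lr=0.005, epochs=3000):
    """Run linear TD(0), visiting the data cyclically in the given order."""
    w = [0.0] * len(features[0])
    n = len(features)
    for _ in range(epochs):
        for t in range(n):
            nxt = (t + 1) % n  # assumed transition: sample t -> sample t + 1
            delta = td_error(w, features[t], targets[t],
                             features[nxt], targets[nxt], gamma)
            w = [wi + lr * delta * fi for wi, fi in zip(w, features[t])]
    return w


# Noise-free linear data: y = 2 * x - 1, with a bias feature.
xs = [i / 10.0 - 1.0 for i in range(21)]
features = [[x, 1.0] for x in xs]
true_w = [2.0, -1.0]
targets = [dot(true_w, phi) for phi in features]

w_td = train_linear_td(features, targets)              # TD with gamma = 0.5
w_lms = train_linear_td(features, targets, gamma=0.0)  # reduces to LMS regression
```

On noise-free data the true weights give zero TD error on every transition, so both runs share the same fixed point; the paper's claims concern the noisy, correlated case, which this toy example does not cover.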

翻訳日:2024-07-18 22:07:40 公開日:2024-07-16

# Sim-Grasp: 合成ベンチマークによるクラスタリング環境のための6-DOF Grasp ポリシの学習

Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark ( http://arxiv.org/abs/2405.00841v2 )

ライセンス: Link先を確認

Juncheng Li, David J. Cappelleri,

(参考訳) そこで本稿では, オブジェクト操作の強化を目的とした高度な言語モデルを統合する, 頑健な6-DOF2指グリップシステムであるSim-Graspを提案する。我々はSim-Grasp-Datasetを紹介し、500のシナリオに7.9百万のアノテートラベルを持つ1,550のオブジェクトを含み、ポイントクラウドから把握ポーズを生成するSim-GraspNetを開発した。 Sim-Grasp-Policesは1つのオブジェクトで97.14%、Levels 1-2とLevels 3-4の混合クラッタシナリオで87.43%、83.33%の達成率を達成した。テキストとボックスプロンプトを通じてターゲット識別のための言語モデルを統合することで、Sim-Graspはオブジェクト非依存とターゲットピッキングの両方を可能にし、インテリジェントなロボットシステムのバウンダリを押し上げる。

In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments. We introduce the Sim-Grasp-Dataset, which includes 1,550 objects across 500 scenarios with 7.9 million annotated labels, and develop Sim-GraspNet to generate grasp poses from point clouds. The Sim-Grasp-Polices achieve grasping success rates of 97.14% for single objects and 87.43% and 83.33% for mixed clutter scenarios of Levels 1-2 and Levels 3-4 objects, respectively. By incorporating language models for target identification through text and box prompts, Sim-Grasp enables both object-agnostic and target picking, pushing the boundaries of intelligent robotic systems.

翻訳日:2024-07-18 21:57:43 公開日:2024-07-16

# Phylotrack:シリコ系統追跡のためのC++およびPythonライブラリ

Phylotrack: C++ and Python libraries for in silico phylogenetic tracking ( http://arxiv.org/abs/2405.09389v2 )

ライセンス: Link先を確認

Emily Dolson, Santiago Rodriguez-Papa, Matthew Andres Moreno,

(参考訳) ケイ素進化(英: silico evolution)は、コンピュータエージェントのデジタル集団における遺伝、変異、微分生殖成功の過程(自然選択による進化のための3つの「独立」)をインスタンス化する。その結果、これらの個体群は進化し、進化力学を研究するための仮想モデルシステムとして利用することができる。この実験パラダイムは、生物学的モデリング、人工生命、進化的計算にまたがって使用され、実験室やフィールドで不可能な実験を可能にすることで、in vitroおよびin vivoシステムを用いて行われた研究を補完する。ひとつ大きなメリットは、完全な、正確な可観測性です。例えば、シミュレーションの歴史を通してすべての親子関係を完璧に記録し、完全な系統(系統樹)を作り出すことができる。この情報は、いつ特性が得られたか、失われたかを明らかにし、根底にある進化力学の推論を促進する。 Phylotrackプロジェクトは、シリコの進化における系統の追跡と解析のためのライブラリを提供する。プロジェクトは構成されています 1) Phylotracklib: Empiricalプロジェクトの傘下で開発されたヘッダのみのC++ライブラリ。 2) Phylotrackpy: Phylotracklibを囲むPythonラッパー。両方のコンポーネントは、デジタル進化システムに系統追跡を付加する公開APIと、さまざまな一般的な系統トポロジーメトリクスを測定するスタンドアロンインターフェースを提供する。設計とC++の実装は効率を優先し、数万のエージェントの数を高速に世代交代できる。系統情報のメモリフットプリントを低減するために、いくつかの明示的な特徴(例えば、系統解析や抽象化など)を提供する。

In silico evolution instantiates the processes of heredity, variation, and differential reproductive success (the three "ingredients" for evolution by natural selection) within digital populations of computational agents. Consequently, these populations undergo evolution and can be used as virtual model systems for studying evolutionary dynamics. This experimental paradigm, used across biological modeling, artificial life, and evolutionary computation, complements research done using in vitro and in vivo systems by enabling experiments that would be impossible in the lab or field. One key benefit is complete, exact observability. For example, it is possible to perfectly record all parent-child relationships across simulation history, yielding complete phylogenies (ancestry trees). This information reveals when traits were gained or lost, and also facilitates inference of underlying evolutionary dynamics. The Phylotrack project provides libraries for tracking and analyzing phylogenies in in silico evolution. The project is composed of 1) Phylotracklib: a header-only C++ library, developed under the umbrella of the Empirical project, and 2) Phylotrackpy: a Python wrapper around Phylotracklib, created with Pybind11. Both components supply a public-facing API to attach phylogenetic tracking to digital evolution systems, as well as a stand-alone interface for measuring a variety of popular phylogenetic topology metrics. The underlying design and C++ implementation prioritize efficiency, allowing for fast generational turnover for agent populations numbering in the tens of thousands. Several explicit features (e.g., phylogeny pruning and abstraction) are provided for reducing the memory footprint of phylogenetic information.
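
The parent-child bookkeeping and pruning described above can be illustrated with a small self-contained tracker. This is a generic sketch and deliberately does not use or imitate the Phylotracklib or Phylotrackpy API; the class and method names (`Taxon`, `PhylogenyTracker`, `add_organism`, `remove_organism`, `lineage`) are hypothetical.

```python
class Taxon:
    """One node of the ancestry tree (hypothetical, not the Phylotrack API)."""

    def __init__(self, taxon_id, parent=None):
        self.id = taxon_id
        self.parent = parent
        self.alive = True
        self.num_children = 0
        if parent is not None:
            parent.num_children += 1


class PhylogenyTracker:
    """Records parent-child links and prunes lineages that have died out."""

    def __init__(self):
        self.nodes = {}
        self._next_id = 0

    def add_organism(self, parent=None):
        """Register a new organism born from `parent` (None for a root)."""
        taxon = Taxon(self._next_id, parent)
        self.nodes[taxon.id] = taxon
        self._next_id += 1
        return taxon

    def remove_organism(self, taxon):
        """Mark an organism dead, then prune upward while nothing descends from it."""
        taxon.alive = False
        node = taxon
        while node is not None and not node.alive and node.num_children == 0:
            del self.nodes[node.id]
            parent = node.parent
            if parent is not None:
                parent.num_children -= 1
            node = parent

    def lineage(self, taxon):
        """Return the ids from `taxon` back to its root ancestor."""
        ids = []
        while taxon is not None:
            ids.append(taxon.id)
            taxon = taxon.parent
        return ids


tracker = PhylogenyTracker()
root = tracker.add_organism()
a = tracker.add_organism(root)
b = tracker.add_organism(root)
c = tracker.add_organism(a)
tracker.remove_organism(root)  # dead, but kept: it still has descendants
tracker.remove_organism(c)     # dead leaf: pruned immediately
tracker.remove_organism(a)     # dead with no remaining children: pruned
```

Pruning of this kind keeps only ancestors of living organisms in memory, which is the sort of footprint reduction the abstract refers to.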

翻訳日:2024-07-18 21:57:43 公開日:2024-07-16

# GenMix:医療画像分類のための生成データと混合データの統合

GenMix: Combining Generative and Mixture Data Augmentation for Medical Image Classification ( http://arxiv.org/abs/2405.20650v2 )

ライセンス: Link先を確認

Hansang Lee, Haeil Lee, Helen Hong,

(参考訳) 本稿では、生成的手法と混合的手法を組み合わせて、両方の手法の強みを利用するGenMixと呼ばれる新しいデータ拡張手法を提案する。生成モデルは新たなデータパターンの作成に優れていますが、GANのモード崩壊や、拡散モデルのトレーニングの困難、特に限られた医療画像データといった課題に直面しています。一方、混合モデルはクラス境界領域を強化するが、クラス不均衡のシナリオでは主要なクラスを好む傾向にある。これらの制限に対処するため、GenMixは両方のアプローチを統合して相互補完する。 GenMix は,(1) 合成画像を生成するために生成モデルを訓練し,(2) 合成データと実データとの混合を行う。このプロセスは、生成モデルの新たなパターン学習と混合モデルのバウンダリ強化の恩恵を受けながら、合成データの質と多様性を向上させる。局所肝病変(FLL)をCT画像で分類する作業において,本法の有効性を検証した。この結果から,GenMix は DCGAN, StyleGAN, Textual Inversion, Diffusion Models など,様々な生成モデルの性能を向上させることが示された。特に、テキスト・インバージョンを用いた提案手法は、FLLデータセット上での微調整拡散モデルなしで他の手法よりも優れている。

In this paper, we propose a novel data augmentation technique called GenMix, which combines generative and mixture approaches to leverage the strengths of both methods. While generative models excel at creating new data patterns, they face challenges such as mode collapse in GANs and difficulties in training diffusion models, especially with limited medical imaging data. On the other hand, mixture models enhance class boundary regions but tend to favor the major class in scenarios with class imbalance. To address these limitations, GenMix integrates both approaches to complement each other. GenMix operates in two stages: (1) training a generative model to produce synthetic images, and (2) performing mixup between synthetic and real data. This process improves the quality and diversity of synthetic data while simultaneously benefiting from the new pattern learning of generative models and the boundary enhancement of mixture models. We validate the effectiveness of our method on the task of classifying focal liver lesions (FLLs) in CT images. Our results demonstrate that GenMix enhances the performance of various generative models, including DCGAN, StyleGAN, Textual Inversion, and Diffusion Models. Notably, the proposed method with Textual Inversion outperforms other methods without fine-tuning the diffusion model on the FLL dataset.
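
Stage (2) above can be sketched as a standard mixup between one real and one synthetic sample. This assumes the usual mixup formulation (a convex combination of inputs and one-hot labels, with the mixing weight drawn from a Beta distribution); the abstract does not state which variant GenMix uses, and the flattened four-pixel "images" are toy values.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)


def mixup(real_image, real_label, synth_image, synth_label, lam):
    """Blend a real and a synthetic sample with weight `lam` on the real one."""
    mixed_image = [lam * r + (1.0 - lam) * s
                   for r, s in zip(real_image, synth_image)]
    mixed_label = [lam * r + (1.0 - lam) * s
                   for r, s in zip(real_label, synth_label)]
    return mixed_image, mixed_label


# Flattened toy images with one-hot labels for two lesion classes.
real_image, real_label = [0.0, 1.0, 0.5, 0.25], [1.0, 0.0]
synth_image, synth_label = [1.0, 0.0, 0.5, 0.75], [0.0, 1.0]

mixed_image, mixed_label = mixup(real_image, real_label,
                                 synth_image, synth_label, lam=0.25)
random_lam = sample_lambda(alpha=1.0, rng=random.Random(0))
```

The soft label carries the same weights as the pixels, so a classifier trained on the mixed sample is pushed toward smoother decisions between the real and synthetic classes.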

翻訳日:2024-07-18 21:57:43 公開日:2024-07-16

# 製品検索における関連判断のための大規模言語モデル

Large Language Models for Relevance Judgment in Product Search ( http://arxiv.org/abs/2406.00247v2 )

ライセンス: Link先を確認

Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao,

(参考訳) 検索クエリに対する検索および再ランク項目の高関連性は、製品検索の成功の基盤であるが、クエリに対するアイテムの関連性の測定は、製品情報検索において最も困難な課題の1つであり、製品検索の品質は、利用可能な関連ラベル付きデータの正確性とスケールの影響を強く受けている。本稿では,大規模言語モデル (LLM) を利用したクエリ-イム対 (QIP) の関連判断を大規模に行うための一連の手法を提案する。マルチミリオンQIPのユニークなデータセットを用いて,低ランク適応 (LoRA) と低ランク適応 (LoRA) を併用した10億パラメトリックLCMの微調整のためのハイパーパラメータのテストと最適化を行い,LCMファインタニングにおけるアイテム属性の結合と促進の様々なモードについて検討し,関連性予測の品質に対するアイテム属性の包摂性を考慮したトレードオフを検討する。我々は,従来のLLMのベースライン,および市販のモデルに対して,人間の関連性評価値と同等の関連アノテーションに対して,大幅に改善されていることを示す。本研究は,製品検索における関連判断の自動化の分野への直接的な影響を示唆するものである。

High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale. Using a unique dataset of multi-million QIPs, annotated by human evaluators, we test and optimize hyperparameters for finetuning billion-parameter LLMs with and without Low-Rank Adaptation (LoRA), as well as various modes of item attribute concatenation and prompting in LLM finetuning, and consider trade-offs in item attribute inclusion for quality of relevance predictions. We demonstrate considerable improvement over baselines of prior generations of LLMs, as well as off-the-shelf models, towards relevance annotations on par with the human relevance evaluators. Our findings have immediate implications for the growing field of relevance judgment automation in product search.

翻訳日:2024-07-18 21:57:43 公開日:2024-07-16

# テリル制御可能な状態表現の学習

Learning telic-controllable state representations ( http://arxiv.org/abs/2406.14476v2 )

ライセンス: Link先を確認

Nadav Amir, Stas Tiomkin, Angela Langdon,

(参考訳) 目的的行動の計算的記述は、記述的側面と規範的側面の両方から構成される。前者は、世界の現在(または未来)の状態を確認するために使用され、後者は、ある目標の下でこれらの状態の望ましさ、またはその欠如を評価するために使用される。強化学習(Reinforcement Learning)では、規範的側面(逆と値関数)は、事前定義された、固定された記述的側面(状態表現)に依存すると仮定される。ゴールは状態依存の報酬関数によって近似されるが、取得した状態表現自体を形作ることもできる。本稿では,有界エージェントにおける状態表現学習のための新しい計算フレームワークを提案する。本稿では, テリック状態表現の粒度と, 全てのテリック状態に到達するために必要な政策複雑性とのトレードオフを特徴付ける, テリック制御可能性の概念を紹介する。制御可能な状態表現を学習するためのアルゴリズムを提案する。当社のフレームワークは、目標の柔軟性とポリシの複雑さのバランスをとる状態表現を学習する上で、意図的な無知(どのエクスペリエンスを無視すべきかを知る)という重要な役割を強調しています。より広範に、我々の研究は、自然エージェントと人工エージェントの目標指向状態表現学習に関する統一的な理論的視点を推し進めている。

Computational descriptions of purposeful behavior comprise both descriptive and normative aspects. The former are used to ascertain current (or future) states of the world and the latter to evaluate the desirability, or lack thereof, of these states under some goal. In Reinforcement Learning, the normative aspect (reward and value functions) is assumed to depend on a predefined and fixed descriptive one (state representation). Alternatively, these two aspects may emerge interdependently: goals can be, and indeed often are, approximated by state-dependent reward functions, but they may also shape the acquired state representations themselves. Here, we present a novel computational framework for state representation learning in bounded agents, where descriptive and normative aspects are coupled through the notion of goal-directed, or telic, states. We introduce the concept of telic controllability to characterize the tradeoff between the granularity of a telic state representation and the policy complexity required to reach all telic states. We propose an algorithm for learning controllable state representations, illustrating it using a simple navigation task with shifting goals. Our framework highlights the crucial role of deliberate ignorance (knowing which features of experience to ignore) for learning state representations that balance goal flexibility and policy complexity. More broadly, our work advances a unified theoretical perspective on goal-directed state representation learning in natural and artificial agents.

翻訳日:2024-07-18 21:47:53 公開日:2024-07-16

# OPT-Tree:適応的なドラフトツリー構造を持つ投機的デコーディング

OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure ( http://arxiv.org/abs/2406.17276v2 )

ライセンス: Link先を確認

Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang,

(参考訳) 自動回帰言語モデルは、様々なシナリオにおいて優れたパフォーマンスを示す。しかし,1ステップ1ワード生成モードでは推論効率が制限されるため,モデルが大きくなったため,近年はプレッシャー問題となっている。投機的復号法では、複数のトークンを1ステップで生成できる「ドラフト・アンド・検証」機構を採用し、損失のない加速を実現する。既存の手法は主に固定ヒューリスティックなドラフト構造を採用しており、検証中の受け入れ長を最大化するために異なる状況に適応できない。このジレンマを緩和するために、適応的でスケーラブルなドラフトツリーを構築するアルゴリズムであるOPT-Treeを提案する。各復号ステップにおける受理長の数学的期待を最大化する最適な木構造を探索する。実験結果から, OPT-Treeは既存のドラフト構造より優れており, 自己回帰復号と比較して最大3.2の高速化率を実現していることがわかった。ドラフトモデルが十分に強力で、ノード予算が十分であれば、1ステップで10以上のトークンを生成することができる。私たちのコードは https://github.com/Jikai0Wang/OPT-Tree で公開されています。

Autoregressive language models demonstrate excellent performance in various scenarios. However, their inference efficiency is limited by the one-step-one-word generation mode, which has become a pressing problem as models grow increasingly large. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless acceleration. Existing methods mainly adopt fixed heuristic draft structures, which fail to adapt to different situations to maximize the acceptance length during verification. To alleviate this dilemma, we propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees. It searches for the optimal tree structure that maximizes the mathematical expectation of the acceptance length in each decoding step. Experimental results reveal that OPT-Tree outperforms the existing draft structures and achieves a speed-up ratio of up to 3.2 compared with autoregressive decoding. If the draft model is powerful enough and the node budget is sufficient, it can generate more than ten tokens in a single step. Our code is available at https://github.com/Jikai0Wang/OPT-Tree.
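
The objective described above can be illustrated with a simplified greedy construction. It is a sketch under stated assumptions, not the authors' algorithm: a draft token is assumed to be accepted with its draft-model probability given that its parent was accepted, so the expected acceptance length of a tree is the sum of the path probabilities of its nodes, and the best tree under a node budget keeps the nodes with the largest path probabilities. `draft_probs` is a hypothetical stand-in for a real draft model.

```python
import heapq


def build_draft_tree(draft_probs, node_budget):
    """Greedily keep the `node_budget` draft nodes with the largest path probability.

    `draft_probs(prefix)` is a hypothetical callable returning a dict that maps
    each candidate next token to its draft probability given the token tuple
    `prefix`.
    """
    tree = []  # list of (token_path, path_probability)
    heap = []  # max-heap emulated with negated probabilities
    for token, p in draft_probs(()).items():
        heapq.heappush(heap, (-p, (token,)))
    while heap and len(tree) < node_budget:
        neg_p, path = heapq.heappop(heap)
        tree.append((path, -neg_p))
        # A child's path probability is never larger than its parent's,
        # so every selected node has all of its ancestors selected too.
        for token, p in draft_probs(path).items():
            heapq.heappush(heap, (neg_p * p, path + (token,)))
    return tree


def expected_acceptance_length(tree):
    """Sum of path probabilities, i.e. the expected number of accepted drafts."""
    return sum(p for _, p in tree)


def toy_draft_probs(prefix):
    """Context-independent toy draft model proposing two candidate tokens."""
    return {"a": 0.75, "b": 0.25}


tree = build_draft_tree(toy_draft_probs, node_budget=5)
tree_paths = [path for path, _ in tree]
tree_value = expected_acceptance_length(tree)
```

Because the tree shape follows the draft probabilities, a confident draft model yields a deep, narrow tree while an uncertain one yields a shallow, wide tree, which is the adaptivity that fixed heuristic structures lack.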

翻訳日:2024-07-18 21:47:53 公開日:2024-07-16

# AnatoMask:リコンストラクション誘導型セルフマスキングによる医用画像セグメンテーションの強化

AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking ( http://arxiv.org/abs/2407.06468v2 )

ライセンス: Link先を確認

Yuheng Li, Tianyu Luan, Yizhou Wu, Shaoyan Pan, Yenho Chen, Xiaofeng Yang,

(参考訳) ラベル付きデータの不足により、ラベル付きデータから意味表現を抽出することにより、自己教師付き学習(SSL)が3次元画像のセグメンテーションにおいて大きな注目を集めている。 SSL戦略の中で、マスクされた画像をランダムに再構成して詳細な表現を学習することで、Masked Image Modeling (MIM)の有効性を示した。しかし, 従来のMIM法では, 医用画像の撮影に課題があるため, 良好な成績を収めるために, 広範囲なトレーニングデータが必要である。ランダムマスキングは医療画像内の全ての領域を均一にサンプリングするため、重要な解剖学的領域を見落とし、事前学習効率を低下させる可能性がある。本稿では,再建損失を利用して解剖学的に重要な領域を動的に識別・マスキングし,事前トレーニングの有効性を向上させる新しいMIM手法であるAnatoMaskを提案する。 AnatoMaskは自己蒸留アプローチを採用し、より重要なマスク領域を見つける方法と、これらのマスクされた領域を再構築する方法の両方を学ぶ。準最適学習を避けるため、AnatoMaskはマスキングダイナミクス関数を用いて事前学習の難しさを段階的に調整する。我々は,CT,MRI,PETの4つのパブリックデータセットを用いて,複数の画像モダリティ(CT,MRI,PET)を用いて評価を行った。 AnatoMaskは既存のSSLメソッドよりも優れたパフォーマンスとスケーラビリティを示している。コードは https://github.com/ricklisz/AnatoMask で入手できる。

Due to the scarcity of labeled data, self-supervised learning (SSL) has gained much attention in 3D medical image segmentation, by extracting semantic representations from unlabeled data. Among SSL strategies, masked image modeling (MIM) has shown effectiveness by reconstructing randomly masked images to learn detailed representations. However, conventional MIM methods require extensive training data to achieve good performance, which still poses a challenge for medical imaging. Since random masking uniformly samples all regions within medical images, it may overlook crucial anatomical regions and thus degrade the pretraining efficiency. We propose AnatoMask, a novel MIM method that leverages reconstruction loss to dynamically identify and mask out anatomically significant regions to improve pretraining efficacy. AnatoMask takes a self-distillation approach, where the model learns both how to find more significant regions to mask and how to reconstruct these masked regions. To avoid suboptimal learning, AnatoMask adjusts the pretraining difficulty progressively using a masking dynamics function. We have evaluated our method on 4 public datasets with multiple imaging modalities (CT, MRI, and PET). AnatoMask demonstrates superior performance and scalability compared to existing SSL methods. The code is available at https://github.com/ricklisz/AnatoMask.

翻訳日:2024-07-18 21:38:02 公開日:2024-07-16

# データ汚染下における分断等角予測

Split Conformal Prediction under Data Contamination ( http://arxiv.org/abs/2407.07700v2 )

ライセンス: Link先を確認

Jase Clarkson, Wenkai Xu, Mihai Cucuringu, Gesine Reinert,

(参考訳) コンフォーマル予測(Conformal prediction)とは、データ交換可能な仮定の下で任意の予測モデルから予測間隔や集合を構築するための非パラメトリック手法である。予測セットの限界被覆に関する理論的保証が伴い、分割共形予測変種はモデルトレーニングと比較して計算コストが極めて低いことから人気がある。データ汚染条件下での分割共形予測のロバスト性について検討し、キャリブレーションスコアのごく一部がバルクと異なる分布から引き出されると仮定する。クリーンな」テストポイントで評価した場合, 破損したデータの影響を定量的に評価し, 数値実験による検証を行った。さらに,汚染ロバスト・コンフォーマル予測(Contamination Robust Conformal Prediction)と呼ぶ分類設定の調整を提案し,合成データと実データの両方を用いて本手法の有効性を検証する。

Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on the marginal coverage of the prediction sets and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on "clean" test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal Prediction, and verify the efficacy of our approach using both synthetic and real datasets.
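
For readers unfamiliar with the split variant, the sketch below shows the standard calibration step and how a few outlying calibration scores change it. It assumes the common nonconformity score 1 - p(label) for classification and the usual ceil((n + 1)(1 - alpha)) rank rule; the scores and probabilities are toy values, and the paper's Contamination Robust Conformal Prediction adjustment is not implemented here.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Threshold at rank ceil((n + 1) * (1 - alpha)) of the sorted scores."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """All labels whose nonconformity score 1 - p(label) is within the threshold."""
    return [label for label, p in enumerate(class_probs)
            if 1.0 - p <= threshold]


# Calibration scores from held-out data (assumed values), target coverage 75%.
clean_scores = [0.1, 0.4, 0.2, 0.6, 0.3, 0.7, 0.5]
contaminated_scores = clean_scores + [0.95, 0.99]  # two scores from another distribution

clean_threshold = conformal_quantile(clean_scores, alpha=0.25)
contaminated_threshold = conformal_quantile(contaminated_scores, alpha=0.25)

test_probs = [0.5, 0.42, 0.08]  # classifier output for one clean test point
clean_set = prediction_set(test_probs, clean_threshold)
contaminated_set = prediction_set(test_probs, contaminated_threshold)
```

In this toy case the two contaminating scores raise the threshold, so the prediction set for the same clean test point grows; this is the kind of coverage and efficiency shift the abstract sets out to quantify.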

翻訳日:2024-07-18 21:28:12 公開日:2024-07-16

# LiteGPT: 胸部X線像の局所化と分類作業のための大規模視覚言語モデル

LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task ( http://arxiv.org/abs/2407.12064v1 )

ライセンス: Link先を確認

Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy,

(参考訳) 視覚言語モデルは幅広いタスクにわたって広範囲に探索され、良好な性能を保っているが、医療画像への応用は未解明のままである。本研究では,医用画像用統合フレームワークLiteGPTを提案する。複数の事前学習されたビジュアルエンコーダを利用して情報を強化し、視覚言語モデルの性能を向上させる。我々の知る限りでは、医用画像における共同局所化と分類の新たな課題に視覚言語モデルを利用した最初の研究である。また, 胸部X線における疾患局在の基準線を提供する先駆者でもある。最後に、よくベンチマークされたVinDr-CXRデータセット上で、画像分類タスクに新しい最先端性能を設定した。すべてのコードとモデルはオンラインで公開されている。

Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose LiteGPT, a unified framework for medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To the best of our knowledge, this is the first study to utilize vision-language models for the novel task of joint localization and classification in medical images. In addition, we are pioneers in providing baselines for disease localization in chest X-rays. Finally, we set new state-of-the-art performance in the image classification task on the well-benchmarked VinDr-CXR dataset. All code and models are publicly available online: https://github.com/leduckhai/LiteGPT

翻訳日:2024-07-18 21:28:12 公開日:2024-07-16

# 自動運転車評価のためのデータ選択手法

Data selection method for assessment of autonomous vehicles ( http://arxiv.org/abs/2407.12065v1 )

ライセンス: Link先を確認

Linh Trinh, Ali Anwar, Siegfried Mercelis,

(参考訳) 自動運転車の人気が高まるにつれて、ISO、NHTSA、Euro NCAPといった多くの標準や規制機関は、実際の世界に配備する前に十分なレベルの安全性を確保するために、安全性の検証を必要としている。製造業者は、この目的のために大量の公道データを収集します。しかしながら、これらのバリデーション活動の大部分は、人間が手作業で行います。さらに、各駆動特性を検証するために使用されるデータが異なる場合がある。その結果、検証プロセスの高速化を図りつつ、柔軟かつ動的に検証・検証に使用できる効率的なデータ選択方法を持つことが不可欠である。本稿では,自律走行車の評価を行う上で,実用的で柔軟かつ効率的なデータ選択手法を提案する。我々の考えは、選択したデータのメタデータ分布と、バリデーションに期待される事前定義されたメタデータ分布との類似性を最適化することである。 BDD100Kを用いた大規模データセット実験により,提案手法が効率的にデータ選択タスクを実行できることを示す。これらの結果から,本手法は信頼性が高く,各種安全機能の検証に有効なデータ選択に有用であることが示唆された。

As the popularity of autonomous vehicles has grown, many standards bodies and regulators, such as ISO, NHTSA, and Euro NCAP, require safety validation to ensure a sufficient level of safety before deploying them in the real world. Manufacturers gather a large amount of public road data for this purpose. However, the majority of these validation activities are done manually by humans. Furthermore, the data used to validate each driving feature may differ. As a result, it is essential to have an efficient data selection method that can be used flexibly and dynamically for verification and validation while also accelerating the validation process. In this paper, we present a data selection method that is practical, flexible, and efficient for assessment of autonomous vehicles. Our idea is to optimize the similarity between the metadata distribution of the selected data and a predefined metadata distribution that is expected for validation. Our experiments on the large dataset BDD100K show that our method can perform data selection tasks efficiently. These results demonstrate that our method is highly reliable and can be used to select appropriate data for the validation of various safety functions.
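
One simple way to read the distribution-matching idea is a greedy selector that, at each step, adds the sample that brings the selected metadata mix closest to the target. This is only a sketch: the abstract does not specify the similarity measure or the optimizer, so the KL divergence, the greedy loop, the single categorical metadata field, and all names here are assumptions.

```python
import math
from collections import Counter


def kl_divergence(target, counts, total, eps=1e-9):
    """KL(target || empirical) over the categories listed in `target`."""
    return sum(q * math.log(q / (counts.get(c, 0) / total + eps))
               for c, q in target.items() if q > 0.0)


def greedy_select(metadata, target, k):
    """Pick k sample indices whose category mix best matches `target`."""
    selected, counts = [], Counter()
    remaining = set(range(len(metadata)))
    for step in range(1, k + 1):
        # Try each remaining sample and keep the one giving the lowest divergence;
        # iterating in sorted order makes ties resolve to the lowest index.
        best = min(sorted(remaining),
                   key=lambda i: kl_divergence(
                       target, counts + Counter([metadata[i]]), step))
        selected.append(best)
        counts[metadata[best]] += 1
        remaining.remove(best)
    return selected


# Toy metadata: one weather/time tag per recorded frame (assumed values).
metadata = ["day"] * 6 + ["night"] * 2 + ["rain"] * 2
target = {"day": 0.5, "night": 0.25, "rain": 0.25}

selected = greedy_select(metadata, target, k=4)
selected_mix = Counter(metadata[i] for i in selected)
```

A different validation campaign would only need a different `target` dictionary, which reflects the flexibility the abstract emphasizes.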

翻訳日:2024-07-18 21:28:12 公開日:2024-07-16

# 非拘束映像における時間的グラウンドインストラクショナルダイアグラム

Temporally Grounding Instructional Diagrams in Unconstrained Videos ( http://arxiv.org/abs/2407.12066v1 )

ライセンス: Link先を確認

Jiahao Zhang, Frederic Z. Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian, Stephen Gould,

(参考訳) ビデオ中の命令図の形式でクエリのシーケンスを同時にローカライズするという課題について検討する。これは個々のクエリだけでなく、相互関係も理解する必要がある。しかし、既存のほとんどの手法は、汎用的な相互排他性や時間的順序といったクエリの固有の構造を無視して、一度に1つのクエリを基底にすることに焦点を当てている。これにより、異なるステップダイアグラムの予測タイムパンが著しく重複したり、時間順序に反したりし、精度を損なう可能性がある。本稿では,一連のステップ図を同時に構築することにより,この問題に対処する。具体的には、ステップダイアグラムの視覚的特徴と学習可能な定数の位置埋め込みとを徹底的に組み合わせて構築した複合クエリを提案する。コンテントの特徴が異なる複合クエリ間の自己アテンションが抑制され,予測の時間的重複が減少するのに対して,クロスアテンションはコンテンツと位置ジョイントガイダンスによって時間的ミスアライメントを補正する。ステップダイアグラムのグラウンド化のためのIAWデータセットと自然言語クエリのグラウンド化のためのYouCook2ベンチマークに対するアプローチの有効性を示す。

We study the challenging problem of simultaneously localizing a sequence of queries in the form of instructional diagrams in a video. This requires understanding not only the individual queries but also their interrelationships. However, most existing methods focus on grounding one query at a time, ignoring the inherent structures among queries such as the general mutual exclusiveness and the temporal order. Consequently, the predicted timespans of different step diagrams may overlap considerably or violate the temporal order, thus harming the accuracy. In this paper, we tackle this issue by simultaneously grounding a sequence of step diagrams. Specifically, we propose composite queries, constructed by exhaustively pairing up the visual content features of the step diagrams and a fixed number of learnable positional embeddings. Our insight is that self-attention lets composite queries carrying different content features suppress each other, reducing timespan overlaps in predictions, while cross-attention corrects temporal misalignment via joint content and position guidance. We demonstrate the effectiveness of our approach on the IAW dataset for grounding step diagrams and the YouCook2 benchmark for grounding natural language queries, significantly outperforming existing methods while simultaneously grounding multiple queries.

翻訳日:2024-07-18 21:28:12 公開日:2024-07-16

# MaskVD: 効率的なビデオオブジェクト検出のための領域マスキング

MaskVD: Region Masking for Efficient Video Object Detection ( http://arxiv.org/abs/2407.12067v1 )

ライセンス: Link先を確認

Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel,

(参考訳) ビデオタスクは計算量が多いため、特に最先端のビジョントランスフォーマー(ViT)を必要とするタスクにおいて、リアルタイムアプリケーションにデプロイする際の課題となる。いくつかの研究は、ビデオの大部分がフレーム間でほとんど変化せず、フレームベースのビデオ処理における冗長な計算に繋がるという事実を活用することで、この問題に対処しようとしている。特に、フレーム間のピクセルやセマンティックな違いを利用する研究もあるが、メモリオーバーヘッドが大幅に増加するため、レイテンシのメリットは限られている。一方,本論文では,画像中の意味情報とフレーム間の時間的相関を利用して,ビデオフレーム内の領域をマスキングする手法を提案する。特に、以前のフレームから抽出した特徴を活用することで、ViTバックボーンは、領域マスキングから直接恩恵を受け、入力領域の80%をスキップし、FLOPとレイテンシを3.14倍、1.5倍改善することを示した。我々は、同様の検出性能を維持しながら、最新技術(SOTA)のメモリとレイテンシを2.3倍と1.14倍改善する。さらに,提案手法は畳み込みニューラルネットワーク(CNN)の有望な結果を示し,特殊計算カーネルを用いたSOTAの最大1.3倍のレイテンシ向上を実現する。

Video tasks are compute-heavy and thus pose a challenge when deploying in real-time applications, particularly for tasks that require state-of-the-art Vision Transformers (ViTs). Several research efforts have tried to address this challenge by leveraging the fact that large portions of the video undergo very little change across frames, leading to redundant computations in frame-based video processing. In particular, some works leverage pixel or semantic differences across frames; however, this yields limited latency benefits with significantly increased memory overhead. This paper, in contrast, presents a strategy for masking regions in video frames that leverages the semantic information in images and the temporal correlation between frames to significantly reduce FLOPs and latency with little to no penalty in performance over baseline models. In particular, we demonstrate that by leveraging extracted features from previous frames, ViT backbones directly benefit from region masking, skipping up to 80% of input regions and improving FLOPs and latency by 3.14x and 1.5x, respectively. We improve memory and latency over the state-of-the-art (SOTA) by 2.3x and 1.14x, respectively, while maintaining similar detection performance. Additionally, our approach demonstrates promising results on convolutional neural networks (CNNs) and provides latency improvements over the SOTA of up to 1.3x using specialized computational kernels.

翻訳日:2024-07-18 21:28:12 公開日:2024-07-16

# 大規模言語モデル(LLM)を用いたグラフの学習 : モデルロバストネスの深層化

Learning on Graphs with Large Language Models(LLMs): A Deep Dive into Model Robustness ( http://arxiv.org/abs/2407.12068v1 )

ライセンス: Link先を確認

Kai Guo, Zewen Liu, Zhikai Chen, Hongzhi Wen, Wei Jin, Jiliang Tang, Yi Chang,

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクにおいて顕著な性能を示している。近年,テキスト属性を持つグラフの学習を向上し,有望な性能を示すLLMベースのパイプラインがいくつか開発されている。しかし、グラフは敵攻撃の影響を受けやすいことがよく知られており、LLMがグラフ上での学習において堅牢性を示すかどうかは不明である。このギャップに対処するため,本研究は,グラフに対する敵対的攻撃の文脈におけるLLMの可能性を探究することを目的としている。具体的には, LLMs-as-Enhancers と LLMs-as-Predictors という2次元のグラフ構造とテキストの摂動に対する頑健性について検討する。より広範な実験により,LLM-as-EnhancersとLLM-as-Predictorsは,浅層モデルと比較して,構造的およびテキスト的攻撃に対して優れた堅牢性を有することが明らかとなった。さらに、我々のベンチマークライブラリを公開して、迅速かつ公平な評価を容易にし、この分野で進行中の革新的な研究を促進するようにしました。

Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks. Recently, several LLM-based pipelines have been developed to enhance learning on graphs with text attributes, showcasing promising performance. However, graphs are well known to be susceptible to adversarial attacks, and it remains unclear whether LLMs exhibit robustness in learning on graphs. To address this gap, our work aims to explore the potential of LLMs in the context of adversarial attacks on graphs. Specifically, we investigate the robustness against graph structural and textual perturbations in terms of two dimensions: LLMs-as-Enhancers and LLMs-as-Predictors. Through extensive experiments, we find that, compared to shallow models, both LLMs-as-Enhancers and LLMs-as-Predictors offer superior robustness against structural and textual attacks. Based on these findings, we carried out additional analyses to investigate the underlying causes. Furthermore, we have made our benchmark library openly available to facilitate quick and fair evaluations, and to encourage ongoing innovative research in this field.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# 個人的アイデンティティのワンショットアンラーニング

One-Shot Unlearning of Personal Identities ( http://arxiv.org/abs/2407.12069v1 )

ライセンス: Link先を確認

Thomas De Min, Subhankar Roy, Massimiliano Mancini, Stéphane Lathuilière, Elisa Ricci,

(参考訳) マシン・アンラーニング(MU)は、トレーニング中に見たことのないようなモデルからデータを消去することを目的としている。この範囲で、既存のMUアプローチはトレーニングデータへの完全または部分的なアクセスを前提としており、これはプライバシー規制のために時間とともに制限される可能性がある。しかし、そのようなシナリオにおけるMUメソッドの有効性を調査するための設定やベンチマークは存在しない。このギャップを埋めるために、トレーニングデータにアクセスできない場合の未学習モデルを評価できるOne-Shot Unlearning of Personal Identities (O-UPI) と呼ばれる新しいタスクを提案する。具体的には、トレーニング後のデータ削除を要求される現在の規制が関係しているIDアンラーニングケースに焦点を当てる。データの欠如に対処するため,利用者は未学習のポートレート画像の提供を期待する。 O-UPIの手法を評価するため,異なる未学習データセットサイズでCelebAとCelebA-HQデータセットの誤りをベンチマークした。我々は、この挑戦的なベンチマークで適用可能な手法を検証し、メタ学習者が1つの画像からアイデンティティを忘れる効果的な方法を提案する。得られたサンプルとトレーニング時に使用するデータとの相違点がある場合,データ可用性が制限された場合,既存のアプローチは困難であることが示唆された。受け入れ次第、コードとベンチマークをリリースします。

Machine unlearning (MU) aims to erase data from a model as if it never saw them during training. To this end, existing MU approaches assume complete or partial access to the training data, which can be limited over time due to privacy regulations. However, no setting or benchmark exists to probe the effectiveness of MU methods in such scenarios, i.e., when training data is missing. To fill this gap, we propose a novel task we call One-Shot Unlearning of Personal Identities (O-UPI) that evaluates unlearning models when the training data is not accessible. Specifically, we focus on the identity unlearning case, which is relevant due to current regulations requiring data deletion after training. To cope with data absence, we expect users to provide a portrait picture to perform unlearning. To evaluate methods in O-UPI, we benchmark the forgetting on CelebA and CelebA-HQ datasets with different unlearning set sizes. We test applicable methods on this challenging benchmark, and also propose an effective method that meta-learns to forget identities from a single image. Our findings indicate that existing approaches struggle when data availability is limited, with greater difficulty when there is dissimilarity between provided samples and data used at training time. We will release the code and benchmark upon acceptance.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# エッジ展開効率向上のための二元化変圧器とハードウェア加速器の共設計

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment ( http://arxiv.org/abs/2407.12070v1 )

ライセンス: Link先を確認

Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang,

(参考訳) トランスフォーマーモデルはAIタスクに革命をもたらしたが、その大きなサイズはリソース制約やレイテンシクリティカルなエッジデバイスへの実際のデプロイメントを妨げる。バイナライズされたトランスフォーマーは、モデルサイズを大幅に削減することで、有望なソリューションを提供するが、既存のアプローチでは、アルゴリズムとハードウェアのミスマッチに悩まされ、コデザイン探索が制限され、エッジデバイス上でのサブ最適化のパフォーマンスが向上する。そこで本研究では,アルゴリズム,ハードウェア,共同最適化の3つの側面から,トランスフォーマーのエンドツーエンド配置を効率的に行うための設計手法を提案する。まず、最適化された量子化手法とコンポーネントを備えたハードウェアフレンドリなバイナライズトランスであるBMTを提案し、重み付き三重分割トレーニング技術を活用することにより、モデル精度をさらに向上する。第2に,二項変換器を効率よく推定するための専用ユニットとスケジューリングパイプラインを備えたストリーミングプロセッサ混合二項変換器アクセラレータ,すなわちBATを開発した。最後に、我々は設計空間探索アプローチを通じてアルゴリズムとハードウェアを協調して最適化し、現実世界のデプロイメントにおける正確性、レイテンシ、堅牢性の間のグローバルなトレードオフを実現する。実験結果から,2.14-49.37倍のスループット向上と3.72-88.53倍のエネルギー効率を実現し,エンドツーエンドのエッジ展開を効果的に実現した。

Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model size, existing approaches suffer from algorithm-hardware mismatches with limited co-design exploration, leading to suboptimal performance on edge devices. Hence, we propose a co-design method for efficient end-to-end edge deployment of Transformers from three aspects: algorithm, hardware, and joint optimization. First, we propose BMT, a novel hardware-friendly binarized Transformer with optimized quantization methods and components, and we further enhance its model accuracy by leveraging the weighted ternary weight splitting training technique. Second, we develop a streaming processor mixed binarized Transformer accelerator, namely BAT, which is equipped with specialized units and scheduling pipelines for efficient inference of binarized Transformers. Finally, we co-optimize the algorithm and hardware through a design space exploration approach to achieve a global trade-off between accuracy, latency, and robustness for real-world deployments. Experimental results show our co-design achieves up to 2.14-49.37x throughput gains and 3.72-88.53x better energy efficiency over state-of-the-art Transformer accelerators, enabling efficient end-to-end edge deployment.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
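
The abstract above does not spell out how BMT or BAT compute with binarized values, so the following is only a minimal sketch of the general idea behind binarized inference, under stated assumptions: weights and activations are mapped to {-1, +1} by sign, a per-vector scale (mean absolute value, an XNOR-Net-style choice assumed here) restores magnitude, and the dot product reduces to XOR plus popcount. All function names are hypothetical.

```python
def binarize(values):
    """Pack signs into an integer (bit 1 = +1, bit 0 = -1) and return (bits, scale)."""
    # Assumed scale: mean absolute value of the vector.
    scale = sum(abs(v) for v in values) / len(values)
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits, scale


def binary_dot(bits_a, bits_b, n):
    """Dot product of two {-1, +1}^n vectors stored as packed bits."""
    # Each differing bit contributes -1, each matching bit +1.
    mismatches = bin(bits_a ^ bits_b).count("1")
    return n - 2 * mismatches


def approx_dot(a, b):
    """Approximate a real-valued dot product using only the binarized vectors."""
    bits_a, scale_a = binarize(a)
    bits_b, scale_b = binarize(b)
    return scale_a * scale_b * binary_dot(bits_a, bits_b, len(a))
```

The XOR/popcount form is what makes such models attractive for small hardware: the inner loop needs no multipliers, only bitwise logic and a counter.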

# リレーショナル表現蒸留

Relational Representation Distillation ( http://arxiv.org/abs/2407.12073v1 )

ライセンス: Link先を確認

Nikolaos Giakoumoglou, Tania Stathaki,

(参考訳) 知識蒸留(KD)は、大きく訓練された教師モデルからより小さく、より効率的な学生モデルに知識を移す効果的な方法である。その成功にもかかわらず、KDの主な課題の1つは、学生の計算効率を維持しながら、複雑な知識の効率的な伝達を保証することである。明示的な負のインスタンスを促進するために対照的な目的を適用した以前の研究とは異なり、リレーショナル表現蒸留(RRD)を導入している。本手法は,教師モデルと学生モデルの関係を探索し,強化するために,ペアワイズな類似性を利用する。自己監督学習の原則に触発されて、正確な複製よりも類似性に焦点を当てた、リラックスした対照的な損失を使用する。本手法は,教師サンプルの出力分布を大容量メモリバッファに整列させ,厳密な負のインスタンス差分を伴わずに生徒モデルの堅牢性と性能を向上させる。提案手法はCIFAR-100よりも優れた性能を示し,従来のKD技術より優れ,最先端手法は13を超える。 Tiny ImageNetやSTL-10といった他のデータセットへの転送も成功している。コードはまもなく公開されます。

Knowledge distillation (KD) is an effective method for transferring knowledge from a large, well-trained teacher model to a smaller, more efficient student model. Despite its success, one of the main challenges in KD is ensuring the efficient transfer of complex knowledge while maintaining the student's computational efficiency. Unlike previous works that applied contrastive objectives promoting explicit negative instances, we introduce Relational Representation Distillation (RRD). Our approach leverages pairwise similarities to explore and reinforce the relationships between the teacher and student models. Inspired by self-supervised learning principles, it uses a relaxed contrastive loss that focuses on similarity rather than exact replication. This method aligns the output distributions of teacher samples in a large memory buffer, improving the robustness and performance of the student model without the need for strict negative instance differentiation. Our approach demonstrates superior performance on CIFAR-100, outperforming traditional KD techniques and surpassing 13 state-of-the-art methods. It also transfers successfully to other datasets like Tiny ImageNet and STL-10. The code will be made public soon.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
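
For orientation, the "traditional KD" baseline that the abstract above compares against is usually a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that classic loss only; it is not the RRD loss, whose exact form is not given in the abstract. The function names and the default temperature are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Classic distillation loss: T^2 * KL(teacher_soft || student_soft)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.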

# 大規模モデルにおけるパラメータ効率と一般化の促進--正規化とマスク付き低ランク適応アプローチ

Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach ( http://arxiv.org/abs/2407.12074v1 )

ライセンス: Link先を確認

Yuzhu Mao, Siqi Ping, Zihao Zhao, Yang Liu, Wenbo Ding,

(参考訳) 大規模言語モデル(LLM)のような大規模事前学習モデルでは、特にモバイルシステムでの応用において、パラメータサイズが広いため、微調整において重要なリソース課題が生じる。これを解決するため、ローランド適応(LoRA)は、良好な微調整結果を維持しつつ、資源消費を減らすために開発された。その効果にもかかわらず、オリジナルのLoRA法は最適化性能と過度な適合性の課題に直面している。本稿では,LoRA法により近似された行列更新の本質的な次元について検討し,本質的な次元を増大させることによる性能上の利点を明らかにする。正規化法と勾配マスキング法を用いることで,正規化法とMasked LoRA (RM-LoRA) と呼ばれる手法は,従来のLoRAや,様々なオープンソースビジョンや言語データセットにまたがる最新のバリエーションと比較して,同じあるいは低いトレーニング可能なパラメータ予算で優れた一般化性能を実現する。

Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces challenges of suboptimal performance and overfitting. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
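
To make the parameter budget discussed above concrete, here is a small sketch of plain LoRA: the frozen weight W is adjusted by a low-rank product scaled by alpha / rank, so only rank * (d_in + d_out) values are trained. It does not implement the regularization or gradient masking of RM-LoRA, which the abstract does not detail. Function names are hypothetical.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-r update: B is d_out x r, A is r x d_in."""
    return rank * (d_in + d_out)


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the weight used at inference."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]
```

For a 4096 x 4096 layer, a rank-8 update trains 65,536 values instead of 16,777,216. The rank of the update B @ A is at most `rank`, which is the sense in which the rank bounds the dimension of the learned update.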

# Tiled Bit Networks:学習可能なバイナリベクトルの再利用によるサブビットニューラルネットワーク圧縮

Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors ( http://arxiv.org/abs/2407.12075v1 )

ライセンス: Link先を確認

Matt Gorbett, Hossein Shirazi, Indrakshi Ray,

(参考訳) バイナリニューラルネットワーク(BNN)は、ストレージと計算コストを節約して効率的なディープラーニングを実現する。しかしながら、ニューラルネットワークのサイズが拡大し続けるにつれて、計算要求を満たすことは依然として困難である。本研究では,2次重み付きニューラルネットワークのサブビット圧縮を実現するために,ビット列を持つタイル型ニューラルネットワーク層に対する新しい量子化方式を提案する。この方法は2進ベクトル(すなわちタイル)を学習し、アグリゲーションとリフォーム操作を通じてモデルの各層をポップアップさせる。推論中、この方法は全テンソルを表すために層ごとに1つのタイルを再利用する。私たちは完全に接続された層と畳み込み層の両方にアプローチを採用しています。経験的に、このアプローチは、様々なアーキテクチャ(CNN、トランスフォーマー、MPP)とタスク(分類、セグメンテーション、時系列予測)において、バイナリ重み付けモデルと比較して最大8倍の精度で、ほぼ完全な性能を達成する。我々は、Tiled Bit Networksに2つの実装を提供している。 1) 資源制約環境におけるその実現可能性を評価するため, マイクロコントローラにモデルを展開する。 2) GPU互換の推論カーネルで、メモリ内の1層当たりのタイルの再利用を容易にする。

Binary Neural Networks (BNNs) enable efficient deep learning by saving on storage and computational costs. However, as the size of neural networks continues to grow, meeting computational requirements remains a challenge. In this work, we propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks. The method learns binary vectors (i.e. tiles) to populate each layer of a model via aggregation and reshaping operations. During inference, the method reuses a single tile per layer to represent the full tensor. We apply the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures. Empirically, the approach achieves near full-precision performance on a diverse range of architectures (CNNs, Transformers, MLPs) and tasks (classification, segmentation, and time series forecasting) with up to an 8x reduction in size compared to binary-weighted models. We provide two implementations for Tiled Bit Networks: 1) we deploy the model to a microcontroller to assess its feasibility in resource-constrained environments, and 2) a GPU-compatible inference kernel to facilitate the reuse of a single tile per layer in memory.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
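
The tile reuse described above can be pictured with a short sketch. It assumes the simplest possible scheme, repeating one {-1, +1} tile until the layer is full and then reshaping; the paper's actual aggregation and reshaping steps for learning tiles are not given in the abstract, so the function names and the repetition rule here are illustrative assumptions.

```python
def tile_layer(tile, rows, cols):
    """Fill a rows x cols weight matrix by repeating one binary tile."""
    n = rows * cols
    if n % len(tile) != 0:
        raise ValueError("layer size must be a multiple of the tile length")
    flat = tile * (n // len(tile))           # repeat the tile end to end
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def bits_per_weight(tile_len, rows, cols):
    """Stored bits per weight when only the tile is kept in memory."""
    return tile_len / (rows * cols)
```

With a tile one eighth the size of the layer, storage drops to 0.125 bits per weight, which corresponds to the 8x reduction relative to a 1-bit-per-weight model quoted in the abstract.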

# GoldFinch:Linear Pre-FillとExtreme KV-Cache圧縮を備えた高性能RWKV/Transformerハイブリッド

GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression ( http://arxiv.org/abs/2407.12077v1 )

ライセンス: Link先を確認

Daniel Goldstein, Fares Obeid, Eric Alcaide, Guangyu Song, Eugene Cheah,

(参考訳) 我々は,線形時間と空間において高圧縮・再利用可能なKVキャッシュを効率よく生成する新しい手法を用いたハイブリッド線形アテンション/トランスフォーマーシーケンスモデルGoldFinchを紹介する。 GoldFinchは、Finch(RWKV-6)アーキテクチャの拡張版の上に、新しいGOLDトランスフォーマーを積み重ねています。我々は、Finch、Llama、GoldFinchアーキテクチャの1.5Bパラメータクラスモデルをトレーニングし、FinchおよびLlamaと比較して、劇的に改善されたモデリング性能を見出した。キャッシュサイズ削減はモデル層数とともに線形的に増加し,従来型のトランスフォーマーキャッシュの756～2550倍の小型化が可能となり,限られたハードウェア上でも極めて大きなコンテキスト長の推測が可能となった。自己回帰生成はトークン毎のO(n)時間複雑性を持つが、このキャッシュを生成するためにリカレントニューラルネットワーク(RNN)を使用するため、送信されたコンテキストに対する初期キャッシュ状態全体のプリフィル計算はトークン毎のO(1)時間しかかからない。コミュニティ利用のためのApache 2.0ライセンスの下で、トレーニングされたウェイトとトレーニングコードをリリースしています。

We introduce GoldFinch, a hybrid Linear Attention/Transformer sequence model that uses a new technique to efficiently generate a highly compressed and reusable KV-Cache in linear time and space with respect to sequence length. GoldFinch stacks our new GOLD transformer on top of an enhanced version of the Finch (RWKV-6) architecture. We train up to 1.5B parameter class models of the Finch, Llama, and GoldFinch architectures, and find dramatically improved modeling performance relative to both Finch and Llama. Our cache size savings increase linearly with model layer count, ranging from 756-2550 times smaller than the traditional transformer cache for common sizes, enabling inference of extremely large context lengths even on limited hardware. Although autoregressive generation has O(n) time complexity per token because of attention, pre-fill computation of the entire initial cache state for a submitted context costs only O(1) time per token due to the use of a recurrent neural network (RNN) to generate this cache. We release our trained weights and training code under the Apache 2.0 license for community use.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# 非ガウス状態の絡み合い構造とその測定方法

Entanglement Structure of Non-Gaussian States and How to Measure It ( http://arxiv.org/abs/2407.12083v1 )

ライセンス: Link先を確認

Henry Froland, Torsten V. Zache, Robert Ott, Niklas Mueller,

(参考訳) 量子シミュレーターの急速に成長する量子多体現象を探索する能力は、ますます複雑な状態を特徴づける新しい方法を必要とする。本稿では,システムサイズと多項式的にしかスケールしない相関関数を実験的に測定することで,量子状態の制約を行うプロトコルを提案する。この方法は量子状態の絡み合い構造の測定を可能にし、絡み合いに関連する現象を研究するための新しい経路を開く。我々の手法は高次相関を体系的に組み込むことでガウス状態パラメータ化を拡張する。本稿では,提案プロトコルが現在および今後の実験能力とともに有用であることを示し,概念実証として弱い相互作用を持つフェルミオンに着目した。ここでは、最も低い非自明な展開は、ハミルトンの絡み合いによって示される量子カオスのオンセットのシグナルを含む初期の熱化ダイナミクスを定量的に予測する。

Rapidly growing capabilities of quantum simulators to probe quantum many-body phenomena require new methods to characterize increasingly complex states. We present a protocol, scaling only polynomially with system size, that constrains quantum states using experimentally measured correlation functions. This method enables measurement of a quantum state's entanglement structure, opening a new route to study entanglement-related phenomena. Our approach extends Gaussian state parameterizations by systematically incorporating higher-order correlations. We show the protocol's usefulness in conjunction with current and forthcoming experimental capabilities, focusing on weakly interacting fermions as a proof of concept. Here, the lowest non-trivial expansion quantitatively predicts early-time thermalization dynamics, including signaling the onset of quantum chaos indicated by the entanglement Hamiltonian.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# 空洞埋没時のマヨラナ境界状態の高品質化

High-quality poor man's Majorana bound states from cavity embedding ( http://arxiv.org/abs/2407.12088v1 )

ライセンス: Link先を確認

Álvaro Gómez-León, Marco Schirò, Olesia Dmytruk,

(参考訳) 粗い男のマヨアナ境界状態(MBS)は、パラメータがスイートスポットに微調整されたときに、最小限のキータエフ鎖に現れる。単一モードキャビティに結合した相互作用する2部位の北エフ鎖を考えると, スイートスポット状態は, キャビティ周波数とサイト間のホッピングによって制御可能であることを示す。さらに,光子を介する効果的な相互作用は,本質的な相互作用のスクリーニングやMBSの本来の品質向上に有効であることを示す。キャビティ伝達における実験的なシグネチャを記述し,その存在と品質を検出する。我々の研究は、空洞に結合された量子ドットアレイで貧しい人間のMBSをチューニングする新しい方法を提案する。

Poor man's Majorana Bound States (MBS) arise in minimal Kitaev chains when the parameters are fine-tuned to a sweet spot. We consider an interacting two-site Kitaev chain coupled to a single-mode cavity and show that the sweet spot condition can be controlled with the cavity frequency and the hopping between sites. Furthermore, we demonstrate that photon-mediated effective interactions can be used to screen intrinsic interactions, improving the original quality of the MBS. We describe experimental signatures in the cavity transmission to detect their presence and quality. Our work proposes a new way to tune poor man's MBS in a quantum dot array coupled to a cavity.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# 近藤効果における異方性の関係-シンプレクティックケースからの教訓-

Relevance of Anisotropy in the Kondo Effect -- Lessons From the Symplectic Case ( http://arxiv.org/abs/2407.12093v1 )

ライセンス: Link先を確認

Matan Lotem, Sarath Sankar, Tianhao Ren, Moshe Goldstein, Elio J. König, Andreas Weichselbaum, Eran Sela, Alexei M. Tsvelik,

(参考訳) シンプレクティック対称性を持つ近藤模型は, 超伝導アイランドデバイスの有効低エネルギー理論として最近提案された。非フェルミ液体物理学と有効エノンを持つこのモデルは、位相的近藤効果のクラスに属すると論じられた。ここでは、ボゾン化と共形場理論とともに摂動的および数値的再正規化群を用いて、その異方的不動点の安定性の程度を明らかにする。従来の主張とは対照的に、鉛とのカップリングにおける非対称性が非フェルミ液体を不安定化することを示す。その他の不安定な摂動には、超伝導対の非対称性や、島内の個々の量子ドットの内部エネルギーが含まれる。それでもこれらの摂動は、すべて同じ関連する作用素を生成する。したがって、結合を個別に調整する必要は少なく、これらは実験的な利便性に応じて選択できる。本結果は,近藤結合における異方性は常に無関係であるという共通の誤解を浮き彫りにしている。証明されたように、群生成元が不純物作用素の全空間にまたがらないとき、関連する用語が現れる。これは、大スピン不純物やSO(M)コンドモデルのような、この性質を示すモデルのより詳細な検査を要求する。

A Kondo model with symplectic symmetry was recently put forward as the effective low-energy theory of a superconducting-island device coupled to multiple leads. This model, which possesses non-Fermi liquid physics and effective anyons, was argued to belong to the class of topological Kondo effects. Here, we clarify the extent of stability of its exotic fixed point using perturbative and numerical renormalization group in conjunction with bosonization and conformal field theory. In contrast to previous claims, we show that asymmetry in the coupling to the leads destabilizes the non-Fermi liquid. Other destabilizing perturbations include asymmetry in the superconducting pairing or internal energy of the individual quantum dots in the island. Nevertheless, these perturbations all generate the same relevant operators. Thus, only a small number of couplings need to be tuned individually, and these can be selected according to experimental convenience. Our results highlight a common misconception that anisotropy in the Kondo coupling is always irrelevant. As demonstrated, relevant terms will emerge whenever the group generators do not span the full space of impurity operators. This calls for a more detailed inspection of models that exhibit this property, such as large-spin impurities and SO(M) Kondo models.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# 対話文中の話者の識別:事前学習型言語モデルを用いたテキストベースアプローチ

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models ( http://arxiv.org/abs/2407.12094v1 )

ライセンス: Link先を確認

Minh Nguyen, Franck Dernoncourt, Seunghyun Yoon, Hanieh Deilamsalehy, Hao Tan, Ryan Rossi, Quan Hung Tran, Trung Bui, Thien Huu Nguyen,

(参考訳) 本稿では,デジタルメディアアーカイブにおけるコンテンツアクセシビリティと検索可能性を高めるための重要な課題である,対話テキスト中の話者名同定手法を提案する。音声認識の進歩にもかかわらず、テキストベースの話者識別(SpeakerID)のタスクには、効果的なモデルトレーニングのための大規模で多様なデータセットが欠如している。これらのギャップに対処するために,メディアサムコーパスから派生した,幅広いメディアソースからの転写を含む,新しい大規模データセットを提案する。本稿では,話者名を正確に属性付けるために,対話中の文脈的手がかりを活用する,話者IDに適したトランスフォーマーモデルを提案する。広範囲な実験を通して、我々の最良のモデルは 80.3% の精度を達成し、SpeakerID のベンチマークを新たに設定する。データとコードはここで公開されている。 https://github.com/adobe-research/speaker-identification

We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. Despite advances in speech recognition, the task of text-based speaker identification (SpeakerID) has received limited attention, lacking large-scale, diverse datasets for effective model training. Addressing these gaps, we present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names. Through extensive experiments, our best model achieves a precision of 80.3%, setting a new benchmark for SpeakerID. The data and code are publicly available here: https://github.com/adobe-research/speaker-identification

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16

# 正規化ワッサースタイン距離を用いたシミュレーション出力分布の集約クラスタリング

Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance ( http://arxiv.org/abs/2407.12100v1 )

ライセンス: Link先を確認

Mohammadmahdi Ghasemloo, David J. Eckman,

(参考訳) 本稿では,確率シミュレータによるデータに対するクラスタリング手法の適用について検討し,異常検出,事前最適化,オンラインモニタリングへの応用について述べる。本稿では,正規化ワッサーシュタイン距離を用いて経験分布をクラスタリングする集合的クラスタリングアルゴリズムを導入し,その手法をコールセンタモデルに適用する。

We investigate the use of clustering methods on data produced by a stochastic simulator, with applications in anomaly detection, pre-optimization, and online monitoring. We introduce an agglomerative clustering algorithm that clusters multivariate empirical distributions using the regularized Wasserstein distance and apply the proposed methodology to a call-center model.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
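
As a rough illustration of the clustering idea in the abstract above, the sketch below groups small one-dimensional empirical distributions by an entropically regularized transport cost computed with Sinkhorn iterations, then merges clusters bottom-up. Several choices are assumptions rather than details from the paper: entropic regularization as the regularizer, squared-distance ground cost, average linkage, one-dimensional samples (the paper treats multivariate distributions), and all function names.

```python
import math


def sinkhorn_cost(xs, ys, eps=1.0, iters=200):
    """Transport cost of the entropically regularized plan between two samples."""
    n, m = len(xs), len(ys)
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def agglomerate(samples, k, eps=1.0):
    """Average-linkage agglomerative clustering down to k clusters."""
    count = len(samples)
    dist = {(i, j): sinkhorn_cost(samples[i], samples[j], eps)
            for i in range(count) for j in range(i + 1, count)}

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(count)]
    while len(clusters) > k:
        a, b = min(((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)
```

For example, `agglomerate([[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [10.0, 11.0, 12.0], [10.2, 11.2, 12.2]], 2)` groups the two samples near 1 together and the two samples near 11 together. A very small `eps` can underflow the kernel in this naive form; a log-domain Sinkhorn would be the usual remedy.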

# 関連情報ゲインを用いたRAGの改善

Better RAG using Relevant Information Gain ( http://arxiv.org/abs/2407.12101v1 )

ライセンス: Link先を確認

Marc Pickett, Jeremy Hartman, Ayan Kumar Bhowmick, Raquib-ul Alam, Aditya Vempaty,

(参考訳) 大きな言語モデル(LLM)のメモリを拡張する一般的な方法は、より大きなメモリから取得したテキストをLLMのコンテキストウィンドウに挿入する検索拡張生成(RAG)である。しかし、コンテキストウィンドウは通常数千のトークンに制限されており、モデルが応答したことを知らせる検索されたパスの数を制限する。このため,検索したパス間の多様性の度合いを確保することにより,冗長な情報によるコンテキストウィンドウの占有を回避することが重要である。同時に、情報は現在のタスクにも関係するべきです。 MMR(Maximal Marginal Relevance)のような、得られた結果の多様性を促進する最も以前の手法は、多様性と妥当性を明確に取り除く目的を組み込むことによって実現している。本稿では,検索結果の集合に対するクエリに関連する総情報の確率的尺度である,関連情報ゲインに基づく新しい単純な最適化指標を提案する。この計量を最適化することで、多様性は我々のシステムから有機的に現れる。 RAGシステムの検索コンポーネントのドロップイン置換として使用すると、RGB(Retrieval Augmented Generation Benchmark)から質問応答タスクの最先端性能が得られ、妥当性と多様性を直接最適化する既存の指標よりも優れる。

A common way to extend the memory of large language models (LLMs) is by retrieval augmented generation (RAG), which inserts text retrieved from a larger memory into an LLM's context window. However, the context window is typically limited to several thousand tokens, which limits the number of retrieved passages that can inform a model's response. For this reason, it's important to avoid occupying context window space with redundant information by ensuring a degree of diversity among retrieved passages. At the same time, the information should also be relevant to the current task. Most prior methods that encourage diversity among retrieved results, such as Maximal Marginal Relevance (MMR), do so by incorporating an objective that explicitly trades off diversity and relevance. We propose a novel simple optimization metric based on relevant information gain, a probabilistic measure of the total information relevant to a query for a set of retrieved results. By optimizing this metric, diversity organically emerges from our system. When used as a drop-in replacement for the retrieval component of a RAG system, this method yields state-of-the-art performance on question answering tasks from the Retrieval Augmented Generation Benchmark (RGB), outperforming existing metrics that directly optimize for relevance and diversity.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
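
The abstract above contrasts its metric with Maximal Marginal Relevance (MMR), which trades off relevance and diversity explicitly. Since the relevant-information-gain objective itself is not defined in the abstract, the sketch below shows only the MMR baseline, using cosine similarity over toy embedding vectors. The function names, the default `lam`, and the vectors in the usage note are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query, docs, k, lam=0.5):
    """Greedy MMR: pick k document indices balancing relevance and novelty."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            # Redundancy is the similarity to the closest already-selected doc.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `query = [1.0, 0.0]` and `docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]`, setting `lam=1.0` (relevance only) returns the two near-duplicate passages, indices `[0, 1]`, while `lam=0.3` returns `[0, 2]`, replacing the duplicate with the more distinct passage. The `lam` knob is exactly the explicit trade-off that the abstract says its metric avoids.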

# 大規模合成テキスト生成のためのプライベート予測

Private prediction for large-scale synthetic text generation ( http://arxiv.org/abs/2407.12108v1 )

ライセンス: Link先を確認

Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii,

(参考訳) 本稿では,大規模言語モデル (LLM) を用いた個人用テキスト生成手法を提案する。プライベートな予測フレームワークでは、差分プライバシー保証を満たすために出力された合成データのみを必要とする。これは、潜在的に敏感なユーザ供給ソースデータに対して生成モデルをトレーニングし、モデル自体が安全にリリースできるようにするアプローチとは対照的である。我々は、ソースデータで事前訓練されたLLMを起動するが、次の注意すべき予測が、異なるプライバシ保証で実行されることを保証する。このパラダイムの以前の研究は、適切なプライバシーレベルで少数の例(10)を生成したと報告していた。対照的に、私たちは何千もの高品質な合成データポイントを生成できるように変更し、潜在的なアプリケーションセットを大きく拡大します。我々の改善は、LLMのトークンをサンプリングするソフトマックス層と指数的なメカニズムとの等価性を活用することで、プライバシー分析の改善と、より優れたプライベート選択機構によって実現されている。さらに、機密データなしで予測可能なトークンに対して、プライバシコストを払わないスパースベクター手法によるパブリック予測を新たに導入し、構造化データに特に有効であることが判明した。

We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (<10) at reasonable privacy levels, an amount of data that is useful only for downstream in-context learning or prompting. In contrast, we make changes that allow us to generate thousands of high-quality synthetic data points, greatly expanding the set of potential applications. Our improvements come from an improved privacy analysis and a better private selection mechanism, which makes use of the equivalence between the softmax layer for sampling tokens in LLMs and the exponential mechanism. Furthermore, we introduce a novel use of public predictions via the sparse vector technique, in which we do not pay privacy costs for tokens that are predictable without sensitive data; we find this to be particularly effective for structured data.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
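
The equivalence mentioned above, between softmax sampling and the exponential mechanism, can be checked numerically. The exponential mechanism selects an item with probability proportional to exp(epsilon * utility / (2 * sensitivity)); treating logits as utilities, that is a softmax with temperature 2 * sensitivity / epsilon. The sketch below verifies only this identity. How the paper bounds the sensitivity of the logits is not stated in the abstract, so `sensitivity` is an assumed input and the function names are hypothetical.

```python
import math


def softmax(logits, temperature):
    """Numerically stable softmax with temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Selection probabilities of the exponential mechanism."""
    best = max(utilities)
    weights = [math.exp(epsilon * (u - best) / (2 * sensitivity))
               for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def temperature_for_epsilon(epsilon, sensitivity):
    """Softmax temperature matching a per-token privacy parameter epsilon."""
    return 2 * sensitivity / epsilon
```

A lower temperature corresponds to a larger per-token epsilon, so sharper sampling spends more privacy budget per generated token.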

# 公正なグラフ学習のためのベンチマーク

A Benchmark for Fairness-Aware Graph Learning ( http://arxiv.org/abs/2407.12112v1 )

ライセンス: Link先を確認

Yushun Dong, Song Wang, Zhenyu Lei, Zaiyi Zheng, Jing Ma, Chen Chen, Jundong Li,

(参考訳) 公正なグラフ学習は近年注目を集めている。それにもかかわらず、さまざまな公正を意識したグラフ学習手法の評価と比較を行うための包括的なベンチマークが欠けているため、実践者がより広い現実世界のアプリケーションに適切なものを選択するのを妨げている。本稿では,10の代表的な公正性を考慮したグラフ学習手法に関する広範なベンチマークを示す。具体的には、グループフェアネス、個人フェアネス、異なるフェアネス基準間のバランス、計算効率など、複数の視点からこれらの手法を評価するために、体系的な評価プロトコルを設計し、7つの実世界のデータセット上で実験を行う。我々の詳細な分析は、既存の手法の強みと限界に関する重要な洞察を明らかにしている。さらに,フェアネスを考慮したグラフ学習手法を応用するための実践的ガイダンスを提供する。我々の知識を最大限に活用するために、本研究は、この分野の今後の進歩を促進するために、代表的公正を意識したグラフ学習手法を包括的に理解するための最初のステップとなる。

Fairness-aware graph learning has gained increasing attention in recent years. Nevertheless, a comprehensive benchmark to evaluate and compare different fairness-aware graph learning methods is still lacking, which hinders practitioners from choosing appropriate ones for broader real-world applications. In this paper, we present an extensive benchmark on ten representative fairness-aware graph learning methods. Specifically, we design a systematic evaluation protocol and conduct experiments on seven real-world datasets to evaluate these methods from multiple perspectives, including group fairness, individual fairness, the balance between different fairness criteria, and computational efficiency. Our in-depth analysis reveals key insights into the strengths and limitations of existing methods. Additionally, we provide practical guidance for applying fairness-aware graph learning methods in applications. To the best of our knowledge, this work serves as an initial step towards comprehensively understanding representative fairness-aware graph learning methods to facilitate future advancements in this area.

翻訳日:2024-07-18 21:18:26 公開日:2024-07-16
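
The abstract above names group fairness as one evaluation perspective without listing the exact metrics. One widely used group fairness measure is the statistical parity difference, the gap in positive-prediction rates between two sensitive groups. The sketch below computes it for binary predictions; the choice of this metric, the binary group encoding, and the function name are assumptions for illustration.

```python
def statistical_parity_difference(predictions, groups):
    """Absolute gap in positive-prediction rate between groups 0 and 1."""
    rates = {}
    for g in (0, 1):
        members = [p for p, s in zip(predictions, groups) if s == g]
        rates[g] = sum(members) / len(members)
    return abs(rates[0] - rates[1])
```

A value of 0 means both groups receive positive predictions at the same rate; larger values indicate a stronger dependence of the prediction on the sensitive attribute.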

# 都市空調における信頼性・リアルタイムフリートスケジューリングのためのグラフベース逆模倣学習フレームワーク

A Graph-based Adversarial Imitation Learning Framework for Reliable & Realtime Fleet Scheduling in Urban Air Mobility ( http://arxiv.org/abs/2407.12113v1 )

ライセンス: Link先を確認

Prithvi Poddar, Steve Paul, Souma Chowdhury,

(参考訳) UAM(Urban Air Mobility)の出現は、都市交通の領域における変革的シフトの範囲を示す。しかし、その普及と経済性は、空域の混雑、気象条件の変化、および様々な要求に起因する不確実性の下で、UAMネットワーク内のバーチポートを横断する航空機の艦隊を最適にスケジュールする能力に部分的に依存している。そこで本論文では, 整数型非線形計画問題の直接解法は, 日次スケジューリングでは計算が不可能であるため, フラッグスケジューリング問題の総合的な最適化を図りながら, 代替解法の必要性を同定する。従来の研究は、(グラフ)強化学習(RL)アプローチを用いて、艦隊スケジューリングのためのリアルタイム実行可能なポリシーモデルを訓練することの有効性を示した。しかし、そのようなポリシーは、アウト・オブ・ディストリビューションのシナリオやエッジのケースでは不安定であることが多い。さらに、問題の複雑さ(例えば制約の数)が増加するにつれて、トレーニングパフォーマンスも悪化する。これらの問題に対処するために,RLに基づくポリシーは,遺伝的アルゴリズムを用いて正確な最適化を解くことで得られる専門家の実証を活用できる模擬学習手法を提案する。ポリシーモデルは、バーティポートと航空機の空間を埋め込むグラフニューラルネットワーク(GNN)ベースのエンコーダ、需要、乗客運賃、輸送コストプロファイルをエンコードするトランスフォーマーネットワーク、マルチヘッドアテンション(MHA)ベースのデコーダを含む。専門家によるデモンストレーションは、GAIL(Generative Adversarial Imitation Learning)アルゴリズムを通じて行われている。 8機と40機からなるUAMシミュレーション環境と対話し、毎日の利益が報われるという観点から、新しい模倣アプローチは、純粋なRL結果と比較して、目に見えない最悪のシナリオの場合において、より良い平均性能と顕著な改善を達成する。

The advent of Urban Air Mobility (UAM) presents the scope for a transformative shift in the domain of urban transportation. However, its widespread adoption and economic viability depend in part on the ability to optimally schedule the fleet of aircraft across vertiports in a UAM network, under uncertainties attributed to airspace congestion, changing weather conditions, and varying demands. This paper presents a comprehensive optimization formulation of the fleet scheduling problem, while also identifying the need for alternative solution approaches, since directly solving the resulting integer nonlinear programming problem is computationally prohibitive for daily fleet scheduling. Previous work has shown the effectiveness of using (graph) reinforcement learning (RL) approaches to train real-time executable policy models for fleet scheduling. However, such policies can often be brittle on out-of-distribution scenarios or edge cases. Moreover, training performance also deteriorates as the complexity (e.g., number of constraints) of the problem increases. To address these issues, this paper presents an imitation learning approach where the RL-based policy exploits expert demonstrations yielded by solving the exact optimization using a Genetic Algorithm. The policy model comprises Graph Neural Network (GNN) based encoders that embed the space of vertiports and aircraft, Transformer networks to encode demand, passenger fare, and transport cost profiles, and a Multi-head attention (MHA) based decoder. Expert demonstrations are used through the Generative Adversarial Imitation Learning (GAIL) algorithm. Interfaced with a UAM simulation environment involving 8 vertiports and 40 aircraft, the new imitative approach achieves, in terms of the daily-profit reward, better mean performance and a remarkable improvement in unseen worst-case scenarios compared to pure RL results.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# Wigner関数のモーメントを用いた連続変数状態の非古典性の効率的な検出

Efficient detection of non-classicality of continuous variable states using moments of Wigner function ( http://arxiv.org/abs/2407.12116v1 )

ライセンス: Link先を確認

Bivas Mallick, Sudip Chakrabarty, Saheli Mukherjee, Ananda G. Maity, A. S. Majumdar,

(参考訳) 非古典状態の重要なサブクラスである負のウィグナー関数を持つ状態は、様々な量子情報処理タスクの貴重な資源として機能する。ここでは、負のウィグナー関数を示すような量子状態を検出するための基準を提供する。本手法は, 単純な関数を計算し, 完全な状態トモグラフィやウィグナー関数再構成を必要とせずに実実験で実装できるWigner関数のモーメントを評価することに依存する。次に、検出方式をサポートするための明示的な例を示す。さらに,連続変数SWAP演算子を用いて実実験でこれらのモーメントを実現する実験手法を提案する。

States with negative Wigner function, a significant subclass of non-classical states, serve as a valuable resource for various quantum information processing tasks. Here, we provide a criterion for detecting such quantum states exhibiting negative Wigner function. Our method relies on evaluating moments of the Wigner function which involves computing simple functionals and can be implemented in a real experiment without the need for full state tomography or Wigner function reconstruction. We then provide explicit examples to support our detection scheme. Further, we propose an experimental method utilizing the continuous variable SWAP operator to realize these moments in a real experiment.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 8GPU上で100万シーケンスの7B LLMを効率的にトレーニングする

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs ( http://arxiv.org/abs/2407.12117v1 )

ライセンス: Link先を確認

Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui,

(参考訳) 現在、LLM(Large Language Models)は、よりクリエイティブなアプリケーションを促進するために、拡張コンテキスト長を使用して訓練されている。しかし、長いコンテキストトレーニングはGPUメモリの制約を考慮すると大きな課題となる。トレーニング中にメモリ消費が相当に活性化されるだけでなく、メモリの断片化も生じる。長期のコンテキストトレーニングを容易にするため、既存のフレームワークでは、再計算や様々な形式の並列処理といった戦略を採用している。しかしながら、これらの手法は冗長な計算や広範囲な通信に依存しており、結果としてモデルFLOPS(MFU)が低くなる。本稿では,メモリ管理の微粒化を目的とした新しいLCMトレーニングフレームワークMEMOを提案する。 FlashAttentionを使用する場合、メモリの2次スケーリングとシーケンス長の線形スケーリングを考慮し、各レイヤの前方通過後にメモリ消費の活性化をCPUメモリにオフロードし、後方通過時にそれらをフェッチする。演算を邪魔することなくアクティベーションのスワップを最大化し、限られたCPUメモリの浪費を避けるため、トークン単位のアクティベーション再計算とスワップ機構を実装した。さらに,2レベル混合整数プログラミング(MIP)アプローチを採用し,トランスフォーマー層間のメモリ再利用を最適化することで,メモリ断片化の問題に取り組む。実験の結果、MEMOはMegatron-LMとDeepSpeedと比較して平均2.42倍、平均2.26倍のMFUを達成することが示された。この改善は、メモリの断片化を最小限に抑え、再計算と集中的な通信を減らし、断片化によるメモリ再編成プロセスに伴う遅延を回避できるMEMOの能力に起因している。きめ細かいアクティベーションメモリ管理を活用することで、MEMOはわずか8A800 GPU上で100万のシーケンス長を持つ7B LLMの効率的なトレーニングを可能にし、52.30%のMFUを達成する。

Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing frameworks have adopted strategies such as recomputation and various forms of parallelisms. Nevertheless, these techniques rely on redundant computation or extensive communication, resulting in low Model FLOPS Utilization (MFU). In this paper, we propose MEMO, a novel LLM training framework designed for fine-grained activation memory management. Given the quadratic scaling of computation and linear scaling of memory with sequence lengths when using FlashAttention, we offload memory-consuming activations to CPU memory after each layer's forward pass and fetch them during the backward pass. To maximize the swapping of activations without hindering computation, and to avoid exhausting limited CPU memory, we implement a token-wise activation recomputation and swapping mechanism. Furthermore, we tackle the memory fragmentation issue by employing a bi-level Mixed Integer Programming (MIP) approach, optimizing the reuse of memory across transformer layers. Empirical results demonstrate that MEMO achieves an average of 2.42x and 2.26x MFU compared to Megatron-LM and DeepSpeed, respectively. This improvement is attributed to MEMO's ability to minimize memory fragmentation, reduce recomputation and intensive communication, and circumvent the delays associated with the memory reorganization process due to fragmentation. By leveraging fine-grained activation memory management, MEMO facilitates efficient training of 7B LLM with 1 million sequence length on just 8 A800 GPUs, achieving an MFU of 52.30%.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# FoodMem:リアルタイムと精密なフードビデオセグメンテーション

FoodMem: Near Real-time and Precise Food Video Segmentation ( http://arxiv.org/abs/2407.12121v1 )

ライセンス: Link先を確認

Ahmad AlMughrabi, Adrián Galán, Ricardo Marques, Petia Radeva,

(参考訳) ビデオを含む食品のセグメンテーションは、現実世界の健康、農業、食品バイオテクノロジーの問題に対処するために不可欠である。現在の制限は、不正確な栄養分析、非効率な作物管理、最適な食品加工につながり、食料安全保障と公衆衛生に影響を及ぼす。セグメンテーション技術の改善は、食物アセスメント、農業生産性、および食品生産プロセスを向上させることができる。本研究では、最小限のハードウェアリソースを用いて、高品質でほぼリアルタイムなセグメンテーションとビデオ内の食品の追跡のための堅牢なフレームワークの開発を紹介する。私たちは、360度無境界シーンのビデオシーケンスから食品を分割する新しいフレームワーク、FoodMemを紹介します。 FoodMemは、ビデオ処理コンテキストにおけるフリッカリングや禁止推論速度といった、既存のセマンティックセグメンテーションモデルの制限を克服して、ビデオシーケンス内の食品部分のマスクを一貫して生成することができる。これらの問題に対処するため、FoodMemは、トランスフォーマーセグメンテーションフェーズを使用して、初期セグメンテーションマスクと、複雑なシーンにおけるフードマスクを監視するメモリベースのトラッキングフェーズを生成する。われわれのフレームワークは、現在の最先端食品セグメンテーションモデルより優れており、カメラアングル、照明、反射、シーンの複雑さ、食品の多様性など、様々な条件で優れたパフォーマンスが得られる。これにより、セグメンテーションノイズの低減、アーティファクトの除去、欠落セグメントの完成が実現される。ここでは、以前のベンチマークにない挑戦的なシナリオを含む、新しい注釈付き食品データセットについても紹介する。 Nutrition5k と Vegetables & Fruits のデータセットで実施された大規模な実験は、FoodMem が食品ビデオのセグメンテーションにおける平均精度を2.5%向上し、平均で58倍高速であることを示した。

Food segmentation, including in videos, is vital for addressing real-world health, agriculture, and food biotechnology issues. Current limitations lead to inaccurate nutritional analysis, inefficient crop management, and suboptimal food processing, impacting food security and public health. Improving segmentation techniques can enhance dietary assessments, agricultural productivity, and the food production process. This study introduces a robust framework for high-quality, near-real-time segmentation and tracking of food items in videos, using minimal hardware resources. We present FoodMem, a novel framework designed to segment food items from video sequences of 360-degree unbounded scenes. FoodMem can consistently generate masks of food portions in a video sequence, overcoming the limitations of existing semantic segmentation models, such as flickering and prohibitive inference speeds in video processing contexts. To address these issues, FoodMem leverages a two-phase solution: a transformer segmentation phase to create initial segmentation masks and a memory-based tracking phase to monitor food masks in complex scenes. Our framework outperforms current state-of-the-art food segmentation models, yielding superior performance across various conditions, such as camera angles, lighting, reflections, scene complexity, and food diversity. This results in reduced segmentation noise, elimination of artifacts, and completion of missing segments. Here, we also introduce a new annotated food dataset encompassing challenging scenarios absent in previous benchmarks. Extensive experiments conducted on the Nutrition5k and Vegetables & Fruits datasets demonstrate that FoodMem improves on the state of the art by 2.5% mean average precision in food video segmentation and is 58x faster on average.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# LLMs-in-the-loop Part-1:バイオメディカルテキスト翻訳のためのエキスパート・スモールAIモデル

LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation ( http://arxiv.org/abs/2407.12126v1 )

ライセンス: Link先を確認

Bunyamin Keles, Murat Gunay, Serdar I. Caglar,

(参考訳) 機械翻訳は、言語にまたがる医療知識のグローバルな普及を可能にするために、医療において不可欠である。しかし、複雑な医学用語は、適切な翻訳品質と精度を達成するために固有の課題を生んでいる。本研究では,医療用テキストに最適化された教師ありニューラルマシン翻訳モデルを開発するために,新しい"LLMs-in-the-loop"アプローチを提案する。大規模言語モデル(LLM)は強力な能力を示しているが、この研究は、高品質なドメイン(主に合成された)データに基づいて訓練された小さな特殊なモデルの方が、さらに大きなLLMよりも優れていることを示している。 6つの言語での独自の平行コーパスは、科学論文、人工的に生成された臨床文書、医療文書から編纂された。 LLM-in-the-loop法では,データ生成,厳密な評価,エージェントオーケストレーションを用いて性能を向上させる。 MarianMTベースモデルを用いた小さな医療用翻訳モデルを開発した。この領域での評価を標準化するための新しい医療翻訳試験データセットを導入する。このテストセットでBLEU、METEOR、ROUGE、BERTのスコアを用いて評価すると、MarianMTベースのモデルはGoogle Translate、DeepL、GPT-4-Turboより優れています。その結果、LLM-in-the-loopアプローチと、微調整された高品質なドメイン固有データを組み合わせることで、汎用システムや大規模システムよりも優れた性能を発揮することが示された。この研究は、専門家の小さなモデルに関するより広範なシリーズの一部であり、身元特定やバイオメディカルな実体抽出モデルを含む、将来の医療関連AI開発への道を開く。本研究は,データ生成,評価,エージェント,モデリング技術の改善を通じて,ニューラルネットワークモデルの改良とLLM-in-the-loop法の可能性を明らかにする。

Machine translation is indispensable in healthcare for enabling the global dissemination of medical knowledge across languages. However, complex medical terminology poses unique challenges to achieving adequate translation quality and accuracy. This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized specifically for medical texts. While large language models (LLMs) have demonstrated powerful capabilities, this research shows that small, specialized models trained on high-quality in-domain (mostly synthetic) data can outperform even vastly larger LLMs. Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts. Our LLMs-in-the-loop methodology employs synthetic data generation, rigorous evaluation, and agent orchestration to enhance performance. We developed small medical translation models using the MarianMT base model. We introduce a new medical translation test dataset to standardize evaluation in this domain. Assessed using BLEU, METEOR, ROUGE, and BERT scores on this test set, our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo. Results demonstrate that our LLMs-in-the-loop approach, combined with fine-tuning on high-quality, domain-specific data, enables specialized models to outperform general-purpose and some larger systems. This research, part of a broader series on expert small models, paves the way for future healthcare-related AI developments, including deidentification and bio-medical entity extraction models. Our study underscores the potential of tailored neural translation models and the LLMs-in-the-loop methodology to advance the field through improved data generation, evaluation, agent, and modeling techniques.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 動的オンラインデータストリームを用いた完全テスト時間適応のための配電アライメント

Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams ( http://arxiv.org/abs/2407.12128v1 )

ライセンス: Link先を確認

Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos Plataniotis, Yang Wang,

(参考訳) ソースデータに基づいてトレーニングされたモデルによって、テスト時間適応(TTA)は、ソースからのドメインシフトを伴うテストデータストリームの適応と推論を可能にする。現在の手法では、自己学習損失を使用して、入ってくるテストデータバッチ毎にモデルを最適化している。これらの手法は、バッチが独立して、ターゲットの分布から同一にサンプリングされる理想的なテストデータストリームに変換可能な結果をもたらすが、より実用的なテストデータストリームは、独立で、同一に分散されていない(非i.d.)。非i.d.ストリームのデータバッチは、相互に顕著なラベルシフトを表示する。 TTAプロセスの間、バッチ間で最適化の目標が矛盾することになります。ソースモデルを予測不能なテスト時間分布に適応させる固有のリスクを考慮し、適応過程を逆転させ、新しいTTA分布アライメント損失を提案する。これにより、十分に訓練されたソースモデルとの互換性を確保し、矛盾する最適化目標に関連する落とし穴を取り除くことができる。さらに、連続的なドメインシフトシナリオにおいて、提案したTTA手法の成功を拡大するためのドメインシフト検出機構を考案する。本研究では,本手法の論理と有効性を検証した。 6つのベンチマークデータセットでは、非i.d.シナリオで既存の手法を超越し、理想的なi.d.仮定の下で競争性能を維持する。

Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using a self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from the target distribution, they falter under more practical test data streams that are not independent and identically distributed (non-i.i.d.). The data batches in a non-i.i.d. stream display prominent label shifts relative to each other. This leads to conflicting optimization objectives among batches during the TTA process. Given the inherent risks of adapting the source model to unpredictable test-time distributions, we reverse the adaptation process and propose a novel Distribution Alignment loss for TTA. This loss guides the distributions of test-time features back towards the source distributions, which ensures compatibility with the well-trained source model and eliminates the pitfalls associated with conflicting optimization objectives. Moreover, we devise a domain shift detection mechanism to extend the success of our proposed TTA method to continual domain shift scenarios. Our extensive experiments validate the logic and efficacy of our method. On six benchmark datasets, we surpass existing methods in non-i.i.d. scenarios and maintain competitive performance under the ideal i.i.d. assumption.
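
The idea of pulling test-time feature distributions back towards the source can be illustrated with a small moment-matching sketch. This is not the loss from the paper, whose exact form the abstract does not give. It assumes that per-dimension source means and variances were stored after training, and it penalises the squared gap between those and the statistics of the current test batch. The names `feature_stats` and `alignment_loss` are hypothetical.

```python
from statistics import fmean, pvariance


def feature_stats(batch):
    """Return per-dimension means and population variances of a batch."""
    dims = list(zip(*batch))  # transpose: one tuple per feature dimension
    return [fmean(d) for d in dims], [pvariance(d) for d in dims]


def alignment_loss(test_batch, source_mean, source_var):
    """Squared distance between test-batch moments and stored source moments.

    Assumption: first and second moments are a sufficient summary of the
    feature distribution; the paper may use a different divergence.
    """
    mean, var = feature_stats(test_batch)
    mean_term = sum((m - s) ** 2 for m, s in zip(mean, source_mean))
    var_term = sum((v - s) ** 2 for v, s in zip(var, source_var))
    return mean_term + var_term


# Illustrative numbers only: two feature dimensions, two samples per batch.
source_mean, source_var = [0.0, 1.0], [1.0, 0.25]
aligned_batch = [[-1.0, 0.5], [1.0, 1.5]]   # same moments as the source
shifted_batch = [[2.0, 3.5], [4.0, 4.5]]    # means shifted by 3 in each dimension

loss_aligned = alignment_loss(aligned_batch, source_mean, source_var)
loss_shifted = alignment_loss(shifted_batch, source_mean, source_var)
```

In a real TTA loop the loss would be differentiated with respect to the model parameters that produce the features. Because the target statistics are fixed in advance, the objective does not change from batch to batch, which is the property the abstract contrasts with per-batch self-training.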

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# ゲームとブロックチェーン: ハイプと現実

Gaming and Blockchain: Hype and Reality ( http://arxiv.org/abs/2407.12134v1 )

ライセンス: Link先を確認

Max McGuinness,

(参考訳) 本稿では,ゲーム産業におけるブロックチェーン技術の導入について考察する。支持者は、分散台帳技術はゲーム経済に革命をもたらし、プレイヤーに仮想資産のコントロールを提供する可能性があると断言する一方で、エネルギー消費やユーザの採用といった現実的な課題に対処し、デトラクタはブロックチェーンの統合がさらに必要かどうかを疑問視している。このレポートでは、EnjinやAxie Infinityといった一般的なブロックチェーンベースのゲームプロジェクトを特徴付け、トランザクションコストやプレイヤーのフィードバックといったメトリクスを比較して、ブロックチェーン統合ゲーム全体の長寿を評価する。

This paper explores the adoption of blockchain technology in the gaming industry. While supporters affirm that distributed ledger technology has the potential to revolutionize gaming economies and provide players with control over their virtual assets, there are practical challenges such as energy consumption and user adoption to be addressed, and detractors question whether blockchain integration is even necessary. This report characterises popular blockchain-based gaming projects like Enjin and Axie Infinity, then compares metrics such as transaction cost and player feedback to evaluate the longevity of blockchain-integrated gaming as a whole.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 分子トポロジープロファイル(MOLTOP) -- 分子グラフ分類のための単純で強力なベースライン

Molecular Topological Profile (MOLTOP) -- Simple and Strong Baseline for Molecular Graph Classification ( http://arxiv.org/abs/2407.12136v1 )

ライセンス: Link先を確認

Jakub Adamczyk, Wojciech Czech,

(参考訳) 分子グラフ分類におけるトポロジカル記述子の有効性を再検討し、単純で強力なベースラインを設計する。本稿では,エッジディスクリプタのヒストグラムアグリゲーションと原子番号と結合型のワンホットエンコーディングを併用した機能工学への簡単なアプローチが,ランダムフォレスト分類器と組み合わせることで,グラフニューラルネットワーク(GNN)の強力なベースラインを確立することを実証する。新たなアルゴリズムである分子トポロジカルプロファイル(MOLTOP)は、エッジ間の中央性、調整されたランダムインデックス、SCAN構造類似度スコアを統合している。このアプローチは、現代的なGNNと比較して、非常に競争力がある一方で、単純で、高速で、低分散で、ハイパーパラメータフリーであることを示す。提案手法は, Open Graph Benchmark による公正な評価プロトコルを用いて, MoleculeNet データセット上で厳密に検証されている。また、Long Range Graph Benchmarkのペプチド分類タスクにおいて、ドメインのアウトオブドメイン生成機能を示す。 11のベンチマークデータセットに対する評価では、MOLTOPの強力な識別能力が、グラフのクラスで1ドル=WLテスト、さらに3ドル=WLテストを超えていることが明らかになった。我々の結論は、GNNドメインの進歩を正確に評価するためには、記述子ベースのベースライン(例えば、提案するもの)が依然として不可欠であるということだ。

We revisit the effectiveness of topological descriptors for molecular graph classification and design a simple, yet strong baseline. We demonstrate that a simple approach to feature engineering - employing histogram aggregation of edge descriptors and one-hot encoding for atomic numbers and bond types - when combined with a Random Forest classifier, can establish a strong baseline for Graph Neural Networks (GNNs). The novel algorithm, Molecular Topological Profile (MOLTOP), integrates Edge Betweenness Centrality, Adjusted Rand Index and SCAN Structural Similarity score. This approach proves to be remarkably competitive when compared to modern GNNs, while also being simple, fast, low-variance and hyperparameter-free. Our approach is rigorously tested on MoleculeNet datasets using the fair evaluation protocol provided by the Open Graph Benchmark. We additionally show out-of-domain generation capabilities on a peptide classification task from the Long Range Graph Benchmark. The evaluations across eleven benchmark datasets reveal MOLTOP's strong discriminative capabilities, surpassing the $1$-WL test and even the $3$-WL test for some classes of graphs. Our conclusion is that descriptor-based baselines, such as the one we propose, are still crucial for accurately assessing advancements in the GNN domain.
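
Histogram aggregation of an edge descriptor can be sketched in a few lines. The example below covers only one of the three descriptors named in the abstract, the SCAN structural similarity, using its usual closed-neighbourhood definition |N[u] ∩ N[v]| / sqrt(|N[u]| · |N[v]|). The bin count of 5 and the function names are assumptions made for illustration. How the paper itself chooses bins is not stated in the abstract.

```python
from math import sqrt


def scan_similarity(adj, u, v):
    """SCAN structural similarity of edge (u, v) with closed neighbourhoods."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def histogram(values, bins, lo=0.0, hi=1.0):
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in values:
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def edge_profile(edges, bins=5):
    """Fixed-length feature vector: histogram of per-edge SCAN similarities."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = [scan_similarity(adj, u, v) for u, v in edges]
    return histogram(scores, bins)


# Toy graph: a triangle (0, 1, 2) with one pendant node 3 attached to node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
profile = edge_profile(toy_edges)
```

The histogram turns a variable number of edges into a vector of fixed length, so graphs of different sizes can be fed to a tabular classifier such as a Random Forest.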

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 開放手術における関節を持つ手術器具の単眼姿勢推定

Monocular pose estimation of articulated surgical instruments in open surgery ( http://arxiv.org/abs/2407.12138v1 )

ライセンス: Link先を確認

Robert Spektor, Tom Friedman, Itay Or, Gil Bolotin, Shlomi Laufer,

(参考訳) 本研究は, 開腹手術における手術器具の単眼6Dポーズ推定に対する新しいアプローチとして, 物体調音, 対称性, 閉塞, 注釈付き実世界のデータの欠如といった課題に対処する。この手法は、これらの障害を克服するために合成データ生成とドメイン適応技術を利用する。提案手法は,(1)調音リギングと物理的レンダリングを用いた外科的ツールの3次元モデリングを用いた合成データ生成,(2)ポーズ推定とハイブリッドな幾何学的融合戦略を組み合わせた適切なポーズ推定フレームワーク,(3)合成データと実際の注釈データの両方を利用したトレーニング戦略,および(3)自動生成擬似ラベルを用いた実ビデオデータへのドメイン適応を用いたトレーニング戦略からなる。オープン手術の映像で行った評価は,提案手法の優れた性能と実世界の応用性を示し,医療用拡張現実およびロボットシステムへの統合の可能性を強調した。このアプローチは、実際の外科的データの広範な手動アノテーションを不要にする。

This work presents a novel approach to monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, symmetries, occlusions, and lack of annotated real-world data. The method leverages synthetic data generation and domain adaptation techniques to overcome these obstacles. The proposed approach consists of three main components: (1) synthetic data generation using 3D modeling of surgical tools with articulation rigging and physically-based rendering; (2) a tailored pose estimation framework combining object detection with pose estimation and a hybrid geometric fusion strategy; and (3) a training strategy that utilizes both synthetic and real unannotated data, employing domain adaptation on real video data using automatically generated pseudo-labels. Evaluations conducted on videos of open surgery demonstrate the good performance and real-world applicability of the proposed method, highlighting its potential for integration into medical augmented reality and robotic systems. The approach eliminates the need for extensive manual annotation of real surgical data.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 傾斜クエンチ量子相転移における長距離相互作用と雑音の競合--長距離ペアの北エフ鎖の場合-

Competition of long-range interactions and noise at ramped quench dynamical quantum phase transition: The case of the long-range pairing Kitaev chain ( http://arxiv.org/abs/2407.12140v1 )

ライセンス: Link先を確認

R. Baghran, R. Jafari, A. Langari,

(参考訳) 動的量子相転移 (DQPTs) のフレームワークにおいて, ノイズのない線形時間依存性の化学ポテンシャルを持つ長距離ペアリング北エフモデルの非平衡ダイナミクスについて検討した。短距離ペアリング北エフモデルでは1つの臨界時間尺度が示され、長距離ペアリングは3つのDQPT時間尺度を持つ領域を誘導する。 3つのDQPT時間スケールを持つ領域はノイズの存在下で縮小することがわかった。さらに,DQPTが消滅する臨界スイープ速度であるクエンチが2つの臨界点を横切ることを明らかにした。数値シミュレーションに基づいて,雑音が長距離ペアリング誘導を減少させることを示した。

The nonequilibrium dynamics of the long-range pairing Kitaev model with a noiseless or noisy linearly time-dependent chemical potential is investigated in the framework of dynamical quantum phase transitions (DQPTs). We show that, for a ramp that crosses a single quantum critical point, the short-range pairing Kitaev model displays a single critical time scale, whereas long-range pairing induces a region with three DQPT time scales. We find that the region with three DQPT time scales shrinks in the presence of noise. In addition, we find that, for a quench that crosses two critical points, the critical sweep velocity above which the DQPTs disappear increases with the long-range pairing exponent and decreases in the presence of noise. On the basis of numerical simulations, we show that noise diminishes the effects induced by long-range pairing.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# ポーランドの政治文における感情強度の予測:資源不足言語における教師付きモデルと大規模言語モデルの比較

Predicting Emotion Intensity in Polish Political Texts: Comparing Supervised Models and Large Language Models in a Resource-Poor Language ( http://arxiv.org/abs/2407.12141v1 )

ライセンス: Link先を確認

Hubert Plisiecki, Piotr Koc, Maria Flakus, Artur Pokropek,

(参考訳) 本研究では,ポーランドの政治文における感情強度の予測に大規模言語モデル(LLM)を用いることを検討した。この研究は、専門家による感情の強さを評価するために、1万のソーシャルメディアテキストの注釈付きコーパスで訓練された教師付きモデルと比較した。これらの結果から, 教師付きモデルはLLMよりも優れ, 精度が高く, 分散度も低いが, LLMは特にデータアノテーションに関連するコストが高いため, 有効な代替手段となることが示唆された。この研究は、低リソース言語設定におけるLLMの可能性を強調し、感情の強度予測とその異なる言語と連続した特徴に対するさらなる研究の必要性を強調している。これらの意味は、リソースの可用性とタスクの特定の要求に基づいて、研究者や実践者にとっての感情予測への正しいアプローチを選択するための、曖昧な意思決定プロセスが示唆されている。

This study explores the use of large language models (LLMs) to predict emotion intensity in Polish political texts, a resource-poor language context. The research compares the performance of several LLMs against a supervised model trained on an annotated corpus of 10,000 social media texts, evaluated for the intensity of emotions by expert judges. The findings indicate that while the supervised model generally outperforms LLMs, offering higher accuracy and lower variance, LLMs present a viable alternative, especially given the high costs associated with data annotation. The study highlights the potential of LLMs in low-resource language settings and underscores the need for further research on emotion intensity prediction and its application across different languages and continuous features. The implications suggest a nuanced decision-making process to choose the right approach to emotion prediction for researchers and practitioners based on resource availability and the specific requirements of their tasks.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 投票結果の予測でオンライン関係を超越した物理パルチザン

Physical partisan proximity outweighs online ties in predicting US voting outcomes ( http://arxiv.org/abs/2407.12146v1 )

ライセンス: Link先を確認

Marco Tonin, Bruno Lepri, Michele Tizzoni,

(参考訳) 影響力のある分極と社会的分裂の増大は、社会の混合や、オンラインや物理的な空間における情報の拡散、社会的・選挙的分裂の強化、政治的結果への影響に影響を及ぼす。ここでは、集約された非特定コロケーションとオンラインネットワークデータを用いて、米国におけるパルチザン曝露と投票パターンの関係を、同じ社会的文脈への物理的近接と露出、オンライン社会関係、住宅分類の3次元で比較検討する。様々な統計的モデリングアプローチを活用することで、コロケーションパターンによって捉えられた物理空間におけるパルチザン露光が、米国郡の選挙結果をより正確に予測し、大都市圏や非都市圏でのオンライン・住宅露光よりも優れていたことを一貫して見出す。さらに, 投票結果が不確実なスウィング郡では, 投票パターンの予測に物理パルチザンが最適であることが示唆された。また、郡レベルの経験者分離を推定し、個人の人口動態と社会経済特性との関係について検討した。本稿は、大都市圏を中心に、米国における大規模なパルチザン分離の存在を確認し、オフラインのパルチザン分離が、物理的出会いや住宅の選別の両方を考慮して、オンラインのセグメンテーションよりも高く、主に教育的達成と関連していることを示す。本研究は,ソーシャル・ネットワークと政治行動の関係を理解する上での物理空間の重要性を強調し,オンライン・ソーシャルネットワークと選挙に焦点を絞った厳しい精査とは対照的である。

Affective polarization and increasing social divisions affect social mixing and the spread of information across online and physical spaces, reinforcing social and electoral cleavages and influencing political outcomes. Here, using aggregated and de-identified co-location and online network data, we investigate the relationship between partisan exposure and voting patterns in the USA by comparing three dimensions of partisan exposure: physical proximity and exposure to the same social contexts, online social ties, and residential sorting. By leveraging various statistical modeling approaches, we consistently find that partisan exposure in the physical space, as captured by co-location patterns, more accurately predicts electoral outcomes in US counties, outperforming online and residential exposures across metropolitan and non-metro areas. Moreover, our results show that physical partisan proximity is the best predictor of voting patterns in swing counties, where the election results are most uncertain. We also estimate county-level experienced partisan segregation and examine its relationship with individuals' demographic and socioeconomic characteristics. Focusing on metropolitan areas, our results confirm the presence of extensive partisan segregation in the US and show that offline partisan isolation, both considering physical encounters or residential sorting, is higher than online segregation and is primarily associated with educational attainment. Our findings emphasize the importance of physical space in understanding the relationship between social networks and political behavior, in contrast to the intense scrutiny focused on online social networks and elections.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 生物活性予測のための深層化学言語処理のためのヒッチハイカーガイド

A Hitchhiker's Guide to Deep Chemical Language Processing for Bioactivity Prediction ( http://arxiv.org/abs/2407.12152v1 )

ライセンス: Link先を確認

Rıza Özçelik, Francesca Grisoni,

(参考訳) 深層学習は薬物発見を著しく加速させ、顕著なアプローチとして「化学言語」処理(CLP)が出現した。 CLPは、分子文字列表現(例えば、Simplified Molecular Input Line Entry Systems(SMILES)とSelf-Reference Embedded Strings(SELFIES))から、自然言語処理に似たメソッドで学習する。その重要性は増しているが、予測型CLPモデルは、多くの「鐘と笛」を含むため、決して自明ではない。ここでは,CLPトレーニングの重要な要素を分析し,新参者や専門家のガイドラインを提供する。我々の研究は、分類と回帰の両方のために、3つのニューラルネットワークアーキテクチャ、2つの文字列表現、3つの埋め込み戦略、10の生物活性データセットにまたがる。この「ヒッチハイカーのガイド」は、特定の方法論的選択の重要性を浮き彫りにするだけでなく、ニューラルネットワークアーキテクチャ、分子表現、ハイパーパラメータ最適化といった、理想的な選択に関する実践的な勧告を研究者に与えている。

Deep learning has significantly accelerated drug discovery, with 'chemical language' processing (CLP) emerging as a prominent approach. CLP learns from molecular string representations (e.g., Simplified Molecular Input Line Entry Systems [SMILES] and Self-Referencing Embedded Strings [SELFIES]) with methods akin to natural language processing. Despite their growing importance, training predictive CLP models is far from trivial, as it involves many 'bells and whistles'. Here, we analyze the key elements of CLP training, to provide guidelines for newcomers and experts alike. Our study spans three neural network architectures, two string representations, three embedding strategies, across ten bioactivity datasets, for both classification and regression purposes. This 'hitchhiker's guide' not only underscores the importance of certain methodological choices, but it also equips researchers with practical recommendations on ideal choices, e.g., in terms of neural network architectures, molecular representations, and hyperparameter optimization.
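
To make the notion of a molecular string representation concrete, here is a minimal sketch of tokenizing a SMILES string and one-hot encoding the tokens. It is not the pipeline used in the paper. The regular expression handles only bracket atoms, the two-letter halogens Cl and Br, and two-digit ring closures, and treats every other character as its own token. The names `tokenize` and `one_hot` are hypothetical.

```python
import re

# Order matters: bracket atoms first, then two-letter elements, then
# two-digit ring closures, and finally any single remaining character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return SMILES_TOKEN.findall(smiles)


def one_hot(tokens, vocab):
    """Encode each token as a 0/1 vector over a fixed vocabulary."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [[1 if index[t] == i else 0 for i in range(len(vocab))]
            for t in tokens]


acetyl_chloride = tokenize("CC(=O)Cl")
vocab = sorted(set(acetyl_chloride))
encoded = one_hot(acetyl_chloride, vocab)
```

Matching `Cl` as one token rather than as `C` followed by `l` is the kind of detail that separates a chemistry-aware tokenizer from plain character splitting. A learned embedding layer would replace the one-hot vectors with dense ones indexed by the same vocabulary.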

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# Parity-deformed $su(2)$ and $so(3)$ Algebras: a Basis for Quantum Optics and Quantum Communications Applications

Parity-deformed $su(2)$ and $so(3)$ Algebras: a Basis for Quantum Optics and Quantum Communications Applications ( http://arxiv.org/abs/2407.12157v1 )

ライセンス: Link先を確認

W. S. Chung, H. Hassanabadi, L. M. Nieto, S. Zarrinkamar,

(参考訳) 物理学の様々な分野におけるパリティ（反射）の重要性を念頭に置き、単一モードおよび2モードのウィグナー代数に反射作用素を加えたものを考える。関連する変形 $su(2)$ 代数 $su_{\nu}(2)$ と変形 $so(3)$ 代数 $so_{\nu}(3)$ を、広く用いられている Jordan-Schwinger 実現と Holstein-Primakoff 実現について構成し、単一モードと2モードの両方の場合について、形式論の様々な側面と要素を論じる。最後に、この枠組みにおいて、量子ビット系および量子トリット系の研究への応用可能性から、パリティ変形 $so_{\nu}(3)$ 表現を解析する。

Bearing in mind the significance of parity (reflection) in various areas of physics, the single-mode and two-mode Wigner algebras are considered with a reflection operator added to them. The associated deformed $su(2)$ algebra, $su_{\nu}(2)$, and the deformed $so(3)$ algebra, $so_{\nu}(3)$, are constructed for the widely used Jordan-Schwinger and Holstein-Primakoff realizations, commenting on various aspects and ingredients of the formalism for both single-mode and two-mode cases. Finally, in this framework the parity-deformed $so_{\nu}(3)$ representation is analyzed, due to its potential application in the study of qubit and qutrit systems.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# IoTが再びあなたの家庭に侵入する

The IoT Breaches your Household Again ( http://arxiv.org/abs/2407.12159v1 )

ライセンス: Link先を確認

Davide Bonaventura, Sergio Esposito, Giampaolo Bella,

(参考訳) その明らかな単純さにもかかわらず、スマート電球や電気プラグのようなデバイスは、厳格なセキュリティ対策から除外されることが多い。しかし、本稿は、この誤解に挑戦し、これらの一見無害なデバイスの脆弱性がどのようにユーザを重大なリスクに晒すかを明らかにする。本報告では, これまでの研究の概要を概説し, 新たな攻撃シナリオを導入する。この新たな攻撃により、悪意のあるアクターは、被害者のTapoアカウントのEメールやパスワード、SSID、ローカルネットワークのパスワードなど、機密情報を取得することができる。さらに、同じIoTエコシステム内の他のスマートデバイス、特にTp-Linkによって製造されたデバイスに対して、これらの発見を部分的にあるいは完全に複製する方法を実証する。調査は、スマート電球(Tapo L530E, Tapo L510E V2, Tapo L630)、スマートプラグ(Tapo P100)、スマートカメラ(Tapo C200)を含むTp-Link Tapoの範囲に焦点を当てた。類似した通信プロトコル,あるいは若干の変種を用いることで,新たに同定された攻撃シナリオを含むすべての攻撃シナリオの完全活用が可能であることが判明した。逆に、Tapo P100とTapo C200は攻撃シナリオのサブセットにのみ脆弱性を示す。結論として、これらの脆弱性とその潜在的な影響を強調して、認識を高め、スマートデバイスデプロイメントにおけるセキュリティリスクを軽減するための積極的なステップを促進することを目指している。

Despite their apparent simplicity, devices like smart light bulbs and electrical plugs are often perceived as exempt from rigorous security measures. However, this paper challenges this misconception, uncovering how vulnerabilities in these seemingly innocuous devices can expose users to significant risks. This paper extends the findings outlined in previous work, introducing a novel attack scenario. This new attack allows malicious actors to obtain sensitive credentials, including the victim's Tapo account email and password, as well as the SSID and password of her local network. Furthermore, we demonstrate how these findings can be replicated, either partially or fully, across other smart devices within the same IoT ecosystem, specifically those manufactured by Tp-Link. Our investigation focused on the Tp-Link Tapo range, encompassing smart bulbs (Tapo L530E, Tapo L510E V2, and Tapo L630), a smart plug (Tapo P100), and a smart camera (Tapo C200). Utilizing similar communication protocols, or slight variants thereof, we found that the Tapo L530E, Tapo L510E V2, and Tapo L630 are susceptible to complete exploitation of all attack scenarios, including the newly identified one. Conversely, the Tapo P100 and Tapo C200 exhibit vulnerabilities to only a subset of attack scenarios. In conclusion, by highlighting these vulnerabilities and their potential impact, we aim to raise awareness and encourage proactive steps towards mitigating security risks in smart device deployment.

翻訳日:2024-07-18 19:18:21 公開日:2024-07-16

# 行動の解釈可能性:マインクラフトエージェントVPTの探索分析

Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent ( http://arxiv.org/abs/2407.12161v1 )

ライセンス: Link先を確認

Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake Richards, Irina Rish, Özgür Şimşek,

(参考訳) 意思決定タスクにおける大規模基盤モデルによる意思決定の背後にあるメカニズムを理解することは、そのようなシステムが透過的かつ安全に動作することを保証するために重要である。本研究では,最大規模のオープンソースビジョンベースエージェントである Video PreTraining (VPT) Minecraft プレイエージェントについて探索分析を行った。我々は,様々な解釈可能性技術を適用して,その推論機構を照らし出すことを目的とする。まず、エージェントがトレーニングタスクを完了している間の注意機構を分析し、ダイヤモンドピックアックスを製作する。エージェントは6秒のメモリで最後の4フレームといくつかのキーフレームに注意を払っている。これは、メモリが短いにもかかわらず、3～10分かかるタスクでコヒーレンスを維持するためのメカニズムである。第2に,我々は様々な介入を行い,目標誤一般化の懸念事例を明らかにするのに役立つ。VPTは,緑葉の葉の下に静止しているときに,茶色の服を着ている村人を誤って木の幹として特定し,それを打倒する。

Understanding the mechanisms behind decisions taken by large foundation models in sequential decision making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task - crafting a diamond pickaxe. The agent pays attention to the last four frames and several key-frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Secondly, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager is positioned stationary under green tree leaves, and punches it to death.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# ベルマン拡散モデル

Bellman Diffusion Models ( http://arxiv.org/abs/2407.12163v1 )

ライセンス: Link先を確認

Liam Schramm, Abdeslam Boularias,

(参考訳) 拡散モデルは生成的アーキテクチャとして大きな成功を収めた。近年,オフライン強化学習や模倣学習のためのポリシーのモデル化に有効であることが示されている。政策の後継状態尺度(SSM)のモデルクラスとして拡散を利用する方法について検討する。ベルマンフローの制約を強制することは、拡散ステップ分布の単純なベルマン更新につながる。

Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 選好型強化学習による主観的テキスト・ツー・イメージ生成

Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning ( http://arxiv.org/abs/2407.12164v1 )

ライセンス: Link先を確認

Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li,

(参考訳) 近年,テキスト・ツー・イメージ生成モデルが注目され,テキスト・プロンプトから高品質な画像の合成が可能となった。しかし、これらのモデルには、与えられた参照画像から特定の主題を生成する能力や、異なる条件下で新規な表現を合成する能力がないことが多い。 DreamBooth や Subject-driven Text-to-Image (SuTI) のような手法はこの分野で大きな進歩を遂げている。しかし、どちらのアプローチも主に参照画像との類似性の向上に重点を置いており、しばしば効率的なトレーニングの必要性を見落とし、参照画像への過度な適合を避けるために高価なセットアップを必要としている。本稿では,信頼度の高い報奨信号を提供する$\lambda$-Harmonic reward関数を提案する。 Bradley-Terry の選好モデルを組み合わせることで、$\lambda$-Harmonic reward関数は主観駆動生成タスクの選好ラベルも提供する。本稿では,Reward Preference Optimization(RPO)を提案する。これはより簡単なセットアップ(DreamBoothが使用する負のサンプルのわずか$3\%)と,微調整のための勾配ステップの削減を実現する。既存の方法とは異なり,本手法ではテキストエンコーダのトレーニングやテキスト埋め込みの最適化を必要とせず,U-Netコンポーネントのみを微調整することでテキストイメージアライメントを実現する。経験的に、$\lambda$-Harmonicは、主観駆動生成タスクにおけるモデル選択の信頼性の高いアプローチであることが証明されている。このアルゴリズムは、好みラベルと$\lambda$-Harmonic reward関数の早期停止検証に基づいて、最先端のCLIP-Iスコア0.833、DreamBenchのCLIP-Tスコア0.314を達成する。

Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant progress in this area. Yet, both approaches primarily focus on enhancing similarity to reference images and require expensive setups, often overlooking the need for efficient training and avoiding overfitting to the reference images. In this work, we present the $\lambda$-Harmonic reward function, which provides a reliable reward signal and enables early stopping for faster training and effective regularization. By combining the Bradley-Terry preference model, the $\lambda$-Harmonic reward function also provides preference labels for subject-driven generation tasks. We propose Reward Preference Optimization (RPO), which offers a simpler setup (requiring only $3\%$ of the negative samples used by DreamBooth) and fewer gradient steps for fine-tuning. Unlike most existing methods, our approach does not require training a text encoder or optimizing text embeddings and achieves text-image alignment by fine-tuning only the U-Net component. Empirically, $\lambda$-Harmonic proves to be a reliable approach for model selection in subject-driven generation tasks. Based on preference labels and early stopping validation from the $\lambda$-Harmonic reward function, our algorithm achieves a state-of-the-art CLIP-I score of 0.833 and a CLIP-T score of 0.314 on DreamBench.
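
The two ingredients named in the abstract can be sketched under explicit assumptions. The Bradley-Terry model is standard: the probability that sample A is preferred to sample B is exp(r_A) / (exp(r_A) + exp(r_B)). The exact form of the $\lambda$-Harmonic reward is not given in the abstract, so the weighted harmonic mean of an image-similarity score and a text-alignment score below is only one plausible reading. The names `weighted_harmonic` and `bradley_terry` are hypothetical.

```python
from math import exp


def weighted_harmonic(image_score, text_score, lam):
    """Weighted harmonic mean of two positive scores.

    Assumption: lam in [0, 1] is the weight on image_score; the paper's
    definition of the lambda-Harmonic reward may differ.
    """
    return 1.0 / (lam / image_score + (1.0 - lam) / text_score)


def bradley_terry(reward_a, reward_b):
    """Probability that sample A is preferred to sample B."""
    return 1.0 / (1.0 + exp(reward_b - reward_a))


# Illustrative scores only, not results from the paper.
faithful = weighted_harmonic(image_score=0.8, text_score=0.4, lam=0.5)
balanced = weighted_harmonic(image_score=0.6, text_score=0.6, lam=0.5)
p_balanced_preferred = bradley_terry(balanced, faithful)
```

A harmonic mean is dominated by the smaller of its inputs, so a sample that copies the reference image while ignoring the prompt scores lower than a balanced one with the same arithmetic mean. That matches the abstract's stated goal of avoiding overfitting to the reference images.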

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 自律クラウドのためのAIエージェントの構築 - 課題と設計原則

Building AI Agents for Autonomous Clouds: Challenges and Design Principles ( http://arxiv.org/abs/2407.12165v1 )

ライセンス: Link先を確認

Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan,

(参考訳) ソフトウェア開発とデプロイメントの一部としてのLarge Language Models(LLM)とAI Agentsの利用の急速な成長は、情報技術の展望に革命をもたらしている。コード生成は大きな注目を集める一方で、AIエージェントをクラウドサービスの運用上のレジリエンスに使用する場合、よりインパクトの高いアプリケーションは、現在、かなりの人的努力とドメイン知識を必要としている。 AI for IT Operations(AIOps)には、障害のローカライゼーションや根本原因分析といった複雑な運用タスクを自動化することを目的としている。しかし、自律的で自己修復的なクラウドのビジョンを達成することは、AIOpsエージェントの構築、評価、改善のための標準化されたフレームワークが欠如していることによって妨げられている。このビジョンペーパーは、まず要求をフレーミングし、それを満たす設計決定について議論することで、そのようなフレームワークの基礎を定めます。また、アプリケーションをオーケストレーションし、カオスエンジニアリングを使用してリアルタイム障害を注入するエージェント-クラウドインターフェースを活用したプロトタイプ実装であるAIOpsLabや、障害のローカライズと解決を行うエージェントとのインターフェースも提案する。我々は有望な結果を報告し、自律クラウドのエージェントの構築、評価、改善のためのモジュラーで堅牢なフレームワークを構築するための基礎を築きます。

The rapid growth in the use of Large Language Models (LLMs) and AI Agents as part of software development and deployment is revolutionizing the information technology landscape. While code generation receives significant attention, a higher-impact application lies in using AI agents for operational resilience of cloud services, which currently require significant human effort and domain knowledge. There is growing interest in AI for IT Operations (AIOps), which aims to automate complex operational tasks, like fault localization and root cause analysis, thereby reducing human intervention and customer impact. However, achieving the vision of autonomous and self-healing clouds through AIOps is hampered by the lack of standardized frameworks for building, evaluating, and improving AIOps agents. This vision paper lays the groundwork for such a framework by first framing the requirements and then discussing design decisions that satisfy them. We also propose AIOpsLab, a prototype implementation leveraging an agent-cloud interface that orchestrates an application, injects real-time faults using chaos engineering, and interfaces with an agent to localize and resolve the faults. We report promising results and lay the groundwork to build a modular and robust framework for building, evaluating, and improving agents for autonomous clouds.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 乱流雰囲気のダイナミクス予測のためのスケーラブルなリアルタイムデータ同化フレームワーク

A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics ( http://arxiv.org/abs/2407.12168v1 )

ライセンス: Link先を確認

Junqi Yin, Siming Liang, Siyan Liu, Feng Bao, Hristo G. Chipilski, Dan Lu, Guannan Zhang,

(参考訳) 天気と気候の分野は、FourCastNet、GraphCast、ClimaX、Pangu-WeatherといったAIベースの基盤モデルの進歩により、大きな変革が進んでいる。これらのモデルはかなりの可能性を示しているが、気象予報や気候予測で運用するにはまだ準備が整っていない。これは、入ってくる地球システムの観測をリアルタイムで同化するためのデータ同化法が、ワークフローの一部として欠如しているためである。この制限は、熱帯低気圧や大気の川のような複雑な大気現象を予測する有効性に影響を及ぼす。これらの障害を克服するために,汎用的なリアルタイムデータ同化フレームワークを導入し,Frontierスーパーコンピュータ上でのエンド・ツー・エンドの性能を示す。このフレームワークは、最先端のデータ同化手法である局所アンサンブル変換カルマンフィルタ(LETKF)を大幅に上回るアンサンブルスコアフィルタ(EnSF)と、観測データの統合によるリアルタイム適応が可能なビジョントランスフォーマーベースのサロゲートという、2つの主要モジュールから構成される。 ViTサロゲートは、物理ベースのモデルまたはAIベースの基盤モデルのいずれかを表現することができる。エクサスケールスーパーコンピュータであるFrontier上で、フレームワークの強スケーリングと弱スケーリングの両方を最大1024GPUまで実証する。本研究の結果は,高性能コンピューティングシステムにおけるフレームワークの卓越したスケーラビリティを示すだけでなく,気象・気候予測のリアルタイムデータ同化におけるスーパーコンピュータの重要性を示すものである。提案したフレームワークは、ベンチマークである表面準地衡(SQG)乱流システムでのみテストされているが、既存のAIベースの基盤モデルと組み合わせられる可能性があり、将来の運用実装に適している。

The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not ready yet for operational use in weather forecasting or climate prediction. This is due to the lack of a data assimilation method as part of their workflow to enable the assimilation of incoming Earth system observations in real time. This limitation affects their effectiveness in predicting complex atmospheric phenomena such as tropical cyclones and atmospheric rivers. To overcome these obstacles, we introduce a generic real-time data assimilation framework and demonstrate its end-to-end performance on the Frontier supercomputer. This framework comprises two primary modules: an ensemble score filter (EnSF), which significantly outperforms the state-of-the-art data assimilation method, namely, the Local Ensemble Transform Kalman Filter (LETKF); and a vision transformer-based surrogate capable of real-time adaptation through the integration of observational data. The ViT surrogate can represent either physics-based models or AI-based foundation models. We demonstrate both the strong and weak scaling of our framework up to 1024 GPUs on the Exascale supercomputer, Frontier. Our results not only illustrate the framework's exceptional scalability on high-performance computing systems, but also demonstrate the importance of supercomputers in real-time data assimilation for weather and climate predictions. Even though the proposed framework is tested only on a benchmark surface quasi-geostrophic (SQG) turbulence system, it has the potential to be combined with existing AI-based foundation models, making it suitable for future operational implementations.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# ブロックチェーンにおけるThreshold暗号システムのレイテンシ価格

The Latency Price of Threshold Cryptosystem in Blockchains ( http://arxiv.org/abs/2407.12172v1 )

ライセンス: Link先を確認

Zhuolun Xiang, Sourav Das, Zekun Li, Zhoujun Ma, Alexander Spiegelman,

(参考訳) 閾値暗号は多くのブロックチェーンプロトコルに必須である。例えば、多くのプロトコルは、非同期コンセンサス、リーダー選挙を実装し、ランダム化されたアプリケーションのサポートを提供するために、しきい値共通のコインに依存している。同様に、しきい値署名スキームはプロトコル効率や状態認証に頻繁に使用され、しきい値復号としきい値タイムロックパズルはプライバシーのために必要となることが多い。本稿では,Byzantine-fault Tolerant(BFT)コンセンサスプロトコルを用いて,レイテンシに着目したしきい値暗号とブロックチェーンのクラス間の相互作用について検討する。具体的には、ブロックチェーンネイティブなしきい値暗号システムに注目し、ブロックチェーン検証者は、しきい値暗号プロトコルへの入力としてブロック内容を持つブロック毎に、しきい値暗号プロトコルを実行しようとする。ブロックチェーンネイティブなしきい値暗号システムに対する既存のアプローチはすべて、しきい値暗号プロトコルを実行するための少なくとも1つのメッセージ遅延のレイテンシオーバーヘッドを導入している。本稿では,秘密と復元しきい値が同じしきい値暗号プロトコルにおいて,厳密なしきい値を持つブロックチェーンネイティブのしきい値暗号システムに対して,このオーバーヘッドを解消する機構を最初に提案する。しかし、現実の証明ベースのブロックチェーンネイティブなしきい値暗号システムの多くは、復元しきい値が機密しきい値より厳密に大きいランプしきい値に依存している。これらのブロックチェーンについては、追加の遅延が避けられないことを正式に示しています。次に、楽観的な場合において、この遅延を最小限に抑えるメカニズムを導入する。我々は,Aptosブロックチェーン上での分散ランダム性証明のための楽観的なプロトコルを実装した。 Aptosのメインネットからの測定によると、楽観的なアプローチは遅延オーバーヘッドを71%削減する。

Threshold cryptography is essential for many blockchain protocols. For example, many protocols rely on a threshold common coin to implement asynchronous consensus and leader election, and to support randomized applications. Similarly, threshold signature schemes are frequently used for protocol efficiency and state certification, and threshold decryption and threshold time-lock puzzles are often necessary for privacy. In this paper, we study the interplay between threshold cryptography and a class of blockchains that use Byzantine-fault tolerant (BFT) consensus protocols, with a focus on latency. More specifically, we focus on blockchain-native threshold cryptosystems, where the blockchain validators seek to run a threshold cryptographic protocol once for every block with the block contents as an input to the threshold cryptographic protocol. All existing approaches for blockchain-native threshold cryptosystems introduce a latency overhead of at least one message delay for running the threshold cryptographic protocol. In this paper, we first propose a mechanism to eliminate this overhead for blockchain-native threshold cryptosystems with tight thresholds, i.e., in threshold cryptographic protocols where the secrecy and reconstruction thresholds are the same. However, many real-world proof-of-stake-based blockchain-native threshold cryptosystems rely on ramp thresholds, where reconstruction thresholds are strictly greater than secrecy thresholds. For these blockchains, we formally demonstrate that the additional delay is unavoidable. We then introduce a mechanism to minimize this delay in the optimistic case. We implement our optimistic protocol for the proof-of-stake distributed randomness scheme on the Aptos blockchain. Our measurements from the Aptos mainnet show that the optimistic approach reduces latency overhead by 71%.
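
The notion of a tight threshold can be made concrete with Shamir secret sharing, a standard scheme in which any `threshold` shares reconstruct the secret while fewer shares reveal nothing about it. The sketch below is only an illustration of that idea and is not the protocol from the paper; the function names, the field modulus, and the use of Python's `random` module (chosen for reproducibility, not security) are all assumptions.

```python
import random

# Field modulus: the Mersenne prime 2^127 - 1 (an illustrative choice).
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares, rng):
    """Split `secret` into `num_shares` points on a random polynomial of
    degree `threshold - 1` whose constant term is the secret.

    `rng` is a random.Random instance; a real deployment would draw the
    coefficients from a cryptographically secure source instead.
    """
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange-interpolate the polynomial through `shares` at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `threshold=3`, passing any three shares to `reconstruct` recovers the secret, whereas interpolating through only two shares yields an unrelated field element. In a ramp scheme, by contrast, there is a gap between the number of shares that keeps the secret hidden and the number needed to reconstruct it.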

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# ベータサンプリングは必要なすべて:ステップワイド分光分析を用いた拡散モデルのための効率的な画像生成戦略

Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis ( http://arxiv.org/abs/2407.12173v1 )

ライセンス: Link先を確認

Haeil Lee, Hansang Lee, Seoyeon Gye, Junmo Kim,

(参考訳) 生成拡散モデルは高品質な画像合成のための強力なツールとして登場してきたが、その反復性は重要な計算資源を必要とする。本稿では,拡散過程の画像スペクトル分析に基づく効率的な時間ステップサンプリング手法を提案する。従来の均一な分散ベースのタイムステップサンプリングの代わりに、プロセスの初期段階と後期において重要なステップを優先する、ベータディストリビューションのようなサンプリング技術を導入します。我々の仮説では、あるステップは画像の内容に大きな変化を示すが、他のステップは最小限に寄与する。フーリエ変換を用いて各ステップの周波数応答変化を計測し, 早期の低周波変化と, その後の高周波調整について検証した。 ADMとStable Diffusionを用いた実験では、ベータサンプリング法は一貫して一様サンプリングよりも優れ、FIDとISスコアが向上し、AutoDiffusionのような最先端の手法と比較して競争効率が向上することを示した。この研究は、計算資源を最も影響の大きいステップに集中させることで拡散モデルの効率を高めるための実践的なフレームワークを提供し、さらなる最適化とより広範な応用の可能性を秘めている。

Generative diffusion models have emerged as a powerful tool for high-quality image synthesis, yet their iterative nature demands significant computational resources. This paper proposes an efficient time step sampling method based on an image spectral analysis of the diffusion process, aimed at optimizing the denoising process. Instead of the traditional uniform distribution-based time step sampling, we introduce a Beta distribution-like sampling technique that prioritizes critical steps in the early and late stages of the process. Our hypothesis is that certain steps exhibit significant changes in image content, while others contribute minimally. We validated our approach using Fourier transforms to measure frequency response changes at each step, revealing substantial low-frequency changes early on and high-frequency adjustments later. Experiments with ADM and Stable Diffusion demonstrated that our Beta Sampling method consistently outperforms uniform sampling, achieving better FID and IS scores, and offers competitive efficiency relative to state-of-the-art methods like AutoDiffusion. This work provides a practical framework for enhancing diffusion model efficiency by focusing computational resources on the most impactful steps, with potential for further optimization and broader application.
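
To illustrate what a Beta-like time step schedule looks like compared with uniform spacing, here is a minimal sketch. It assumes a symmetric Beta(1/2, 1/2) distribution, whose inverse CDF has the closed form sin²(πu/2); the abstract does not state the parameters the authors use, so the distribution choice, the function name, and the default of 1000 training steps are all hypothetical.

```python
import math


def beta_half_timesteps(num_steps, total_steps=1000):
    """Pick timesteps in [0, total_steps - 1] at evenly spaced quantiles of
    a Beta(1/2, 1/2) distribution, which is U-shaped and therefore places
    more steps near the start and end of the diffusion process.

    Returns the timesteps in descending order (the order a sampler would
    visit them). Duplicates after rounding are dropped, so the result can
    be shorter than `num_steps` when `num_steps` is large.
    """
    steps = set()
    for i in range(num_steps):
        u = (i + 0.5) / num_steps                # midpoint of each equal-probability bin
        q = math.sin(math.pi * u / 2.0) ** 2     # inverse CDF of Beta(1/2, 1/2)
        steps.add(round(q * (total_steps - 1)))
    return sorted(steps, reverse=True)


def uniform_timesteps(num_steps, total_steps=1000):
    """Baseline: evenly spaced timesteps over the same range."""
    return sorted(
        {round((i + 0.5) / num_steps * (total_steps - 1)) for i in range(num_steps)},
        reverse=True,
    )
```

Comparing the two schedules for a small `num_steps` shows the intended effect: the gaps between consecutive Beta-quantile steps are narrow at both ends of the range and wide in the middle, while the uniform gaps are constant.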

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# GPT-4Vは放射線学のレポートをまだ生成できない

GPT-4V Cannot Generate Radiology Reports Yet ( http://arxiv.org/abs/2407.12176v1 )

ライセンス: Link先を確認

Yuyang Jiang, Chacha Chen, Dang Nguyen, Benjamin M. Mervak, Chenhao Tan,

(参考訳) GPT-4Vの強力とされるマルチモーダル能力は、放射線学レポート作成の自動化への関心を喚起しているが、徹底的な評価は行われていない。本研究では,2つの胸部X線レポートデータセット(MIMIC-CXRとIU X-Ray)について,GPT-4Vによる放射線学レポート生成の系統的評価を行った。我々は, 異なるプロンプト戦略によりGPT-4Vを用いてレポートを直接生成することを試み, 語彙指標と臨床効果指標の両方で著しく低い性能となることを見出した。低パフォーマンスを理解するために、タスクを2つのステップに分解する。 1)画像から医療条件ラベルを予測する医用画像推論ステップ 2)(正解の)条件からレポートを生成するレポート合成ステップ。画像推論におけるGPT-4Vの性能は、異なるプロンプト間で一貫して低いことを示す。実際、モデルが予測したラベルの分布は、画像上にどの正解条件が存在するかに関わらず一定であり、モデルが胸部X線を有意に解釈していないことを示唆している。レポート合成において正解条件が与えられたとしても、生成されたレポートは微調整されたLLaMA-2のものより正確性が低く、自然さにも欠ける。総じて,我々の知見は,GPT-4Vを放射線学のワークフローで用いることの実現可能性に疑問を投げかけるものである。

GPT-4V's purported strong multimodal abilities raise interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than those of a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 線形回帰モデルはホワイトボックスと解釈可能か?

Are Linear Regression Models White Box and Interpretable? ( http://arxiv.org/abs/2407.12177v1 )

ライセンス: Link先を確認

Ahmed M Salih, Yuhe Wang,

(参考訳) 説明可能な人工知能(XAI)は、モデルを理解し解釈するために機械学習モデルに適用または組み込んだ一連のツールとアルゴリズムである。人間の視点では解釈できないため、ディープニューラルネットワークを含む複雑なモデルや高度なモデルでは特に推奨される。一方、線形回帰を含む単純なモデルは実装が容易であり、計算の複雑さが小さく、出力の可視化も容易である。文学における一般的な概念は、線形回帰を含む単純なモデルはより解釈可能で理解しやすいことから「白い箱」と見なされる。これは線形回帰モデルがモデルの特徴の影響やモデル出力に対して正あるいは負の影響を及ぼすかどうかなど、いくつかの好ましい結果をもたらすという考えに基づいている。さらに、信頼区間を用いてモデルの不確実性を計測または推定することができる。しかし、この認識は正確ではなく、線形回帰モデルは一般的なXAIメトリクスや潜在的な課題を考えると、容易には理解できない。これには線形性、局所的説明、多重線型性、共変量、正規化、不確実性、特徴の寄与と公正性が含まれる。したがって、説明可能性や解釈可能性に関して、いわゆる単純なモデルは、複雑なモデルに対して等しく扱われるべきである。

Explainable artificial intelligence (XAI) is a set of tools and algorithms that are applied to or embedded in machine learning models to understand and interpret them. They are recommended especially for complex or advanced models, including deep neural networks, because such models are not interpretable from a human point of view. On the other hand, simple models, including linear regression, are easy to implement, have lower computational complexity, and produce outputs that are easy to visualize. The common notion in the literature is that simple models, including linear regression, are "white box" because they are more interpretable and easier to understand. This is based on the idea that linear regression models offer several favorable properties, including exposing the effect of each feature in the model and whether it affects the model output positively or negatively. Moreover, the uncertainty of the model can be measured or estimated using confidence intervals. However, we argue that this perception is not accurate and that linear regression models are neither easy to interpret nor easy to understand, considering common XAI metrics and the challenges they may face. These include linearity, local explanation, multicollinearity, covariates, normalization, uncertainty, feature contribution, and fairness. Consequently, we recommend that the so-called simple models be treated equally to complex models when it comes to explainability and interpretability.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 際限なき探索

Exploration Unbound ( http://arxiv.org/abs/2407.12178v1 )

ライセンス: Link先を確認

Dilip Arumugam, Wanqiao Xu, Benjamin Van Roy,

(参考訳) シーケンシャルな意思決定エージェントは、環境に関する新しい知識を得るための探索と、現在の知識を活用して即時報酬を最大化することのバランスをとる。伝統的な文献で研究される環境において、エージェントが十分な知識を蓄積し、さらなる探索の恩恵が消えるにつれて、最適な決定は時間の経過とともに搾取へと導かれる。しかし、もし環境が無限に有用な知識を提供しており、エージェントがどれだけ学習したとしても、さらなる探索には大きなメリットがあるとしたらどうだろうか? このような複雑な環境の単純で簡潔な例を示します。この環境では、報酬は非有界であり、エージェントは常に、より多くのことを学ぶことで報酬が蓄積される率を高めることができる。その結果、最適なエージェントは、探索する確率を永久に維持する。

A sequential decision-making agent balances between exploring to gain new knowledge about an environment and exploiting current knowledge to maximize immediate reward. For environments studied in the traditional literature, optimal decisions gravitate over time toward exploitation as the agent accumulates sufficient knowledge and the benefits of further exploration vanish. What if, however, the environment offers an unlimited amount of useful knowledge and there is large benefit to further exploration no matter how much the agent has learned? We offer a simple, quintessential example of such a complex environment. In this environment, rewards are unbounded and an agent can always increase the rate at which rewards accumulate by exploring to learn more. Consequently, an optimal agent forever maintains a propensity to explore.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 半月板異常の画像再構成評価と臨床的解釈を支援する物体検出法

The object detection method aids in image reconstruction evaluation and clinical interpretation of meniscal abnormalities ( http://arxiv.org/abs/2407.12184v1 )

ライセンス: Link先を確認

Natalia Konovalova, Aniket Tolpadi, Felix Liu, Zehra Akkaya, Felix Gassert, Paula Giesler, Johanna Luitjens, Misung Han, Emma Bahroos, Sharmila Majumdar, Valentina Pedoia,

(参考訳) 本研究は, 深層学習(DL)画像再構成の品質と異常検出性能の関係について検討し, 再建画像に対する半月異常の解釈を高度化するための人工知能(AI)アシスタントの有効性を評価する。 896例の膝関節MRI画像を評価するために, 室内再建と異常検出パイプラインを用いて回顧調査を行った。 DL再構成画像の原画像と14セットを,新たに開発されたボックスベース再構築指標とともに,標準再構成とオブジェクト検出指標を用いて評価した。 2人の臨床放射線技師が50人の患者の画像のサブセットをレビューした。その結果, 構造類似度指数 (SSIM) は異常検出指標 (mAP, r=0.64, p=0.01; F1 score, r=0.38, p=0.18) との相関が弱く, ボックスベースSSIMは検出性能 (mAP, r=0.81, p<0.01; F1 score, r=0.65, p=0.01) と強い相関を示した。 SSIMの小さな変動は検出結果には影響しなかったが、大きな変化は性能を低下させた。放射線技師によるAIによる評価では、精度が改善(援助なし86.0%、援助なし88.3%、p<0.05)し、インターラッター契約(Cohen's kappa、援助なし0.39、援助なし0.57)した。さらなるレビューにより、データセットにさらに17の病変が組み込まれた。提案手法は, 自動作業のための再構成アルゴリズムの評価と, DL再構成MR画像の解釈において, 放射線技師を支援することの確証を示す。

This study investigates the relationship between deep learning (DL) image reconstruction quality and anomaly detection performance, and evaluates the efficacy of an artificial intelligence (AI) assistant in enhancing radiologists' interpretation of meniscal anomalies on reconstructed images. A retrospective study was conducted using an in-house reconstruction and anomaly detection pipeline to assess knee MR images from 896 patients. The original and 14 sets of DL-reconstructed images were evaluated using standard reconstruction and object detection metrics, alongside newly developed box-based reconstruction metrics. Two clinical radiologists reviewed a subset of 50 patients' images, both original and AI-assisted reconstructed, with subsequent assessment of their accuracy and performance characteristics. Results indicated that the structural similarity index (SSIM) showed a weaker correlation with anomaly detection metrics (mAP, r=0.64, p=0.01; F1 score, r=0.38, p=0.18), while box-based SSIM had a stronger association with detection performance (mAP, r=0.81, p<0.01; F1 score, r=0.65, p=0.01). Minor SSIM fluctuations did not affect detection outcomes, but significant changes reduced performance. Radiologists' AI-assisted evaluations demonstrated improved accuracy (86.0% without assistance vs. 88.3% with assistance, p<0.05) and interrater agreement (Cohen's kappa, 0.39 without assistance vs. 0.57 with assistance). An additional review led to the incorporation of 17 more lesions into the dataset. The proposed anomaly detection method shows promise in evaluating reconstruction algorithms for automated tasks and aiding radiologists in interpreting DL-reconstructed MR images.
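
The interrater agreement figures above are Cohen's kappa values, defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance from each rater's label frequencies. A minimal sketch of that standard formula follows; the function name and the toy labels in the usage note are illustrative and are not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items.

    `labels_a` and `labels_b` are equal-length sequences of hashable labels.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)
```

For example, raters giving `[1, 1, 0, 0]` and `[1, 0, 0, 0]` agree on 3 of 4 items (p_o = 0.75) with chance agreement p_e = 0.5, so kappa is 0.5.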

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# 深層強化学習のための満足度探索

Satisficing Exploration for Deep Reinforcement Learning ( http://arxiv.org/abs/2407.12185v1 )

ライセンス: Link先を確認

Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy,

(参考訳) 強化学習アルゴリズムの設計におけるデフォルトの前提は、意思決定エージェントが常に最適な行動を学ぶことである。しかし、現実世界の広大さと規模にアプローチする十分な複雑な環境では、最適なパフォーマンスを達成することは、実際には完全に難解な試みであり、エージェントが最適な政策を特定するための必要な探索を完了させる立場に立つことは滅多にない。最近の研究は、情報理論から設計エージェントへのツールを活用し、十分な満足度や満足度のあるソリューションが、損失のある圧縮によって得られるように、最適なソリューションを意図的に強制する。特に、このようなエージェントは、データ集約的な最適なエージェントよりも効率よく満足度の高い行動を学ぶために、根本的に異なる探索的決定を採用する可能性がある。厳密な近似理論によって支持されているが、基礎となるアルゴリズムはモデルに基づく計画に依存しており、関数近似と高次元観測によるこれらのアイデアの互換性を劇的に制限している。本研究では、モデルベースの計画の必要性を回避し、満足度の高いポリシーを学習できるように、最適な値関数に対する不確実性を直接表現するエージェントを拡張して、この問題を是正する。我々は,本アルゴリズムが深層強化学習エージェントを満足度の高い行動にどのように適用できるかを示す,単純かつ図解的な実験を行った。マルチアームバンディットのこの設定に関する以前の研究に従えば、我々のアルゴリズムは、情報理論以外の手法よりも効率的に最適な振る舞いを合成できることが分かる。

A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas with function approximation and high-dimensional observations. In this work, we remedy this issue by extending an agent that directly represents uncertainty over the optimal value function allowing it to both bypass the need for model-based planning and to learn satisficing policies. We provide simple yet illustrative experiments that demonstrate how our algorithm enables deep reinforcement-learning agents to achieve satisficing behaviors. In keeping with previous work on this setting for multi-armed bandits, we additionally find that our algorithm is capable of synthesizing optimal behaviors, when feasible, more efficiently than its non-information-theoretic counterpart.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# L2AI:IOMTブロックチェーン環境における軽量3要素認証と認証

L2AI: lightweight three-factor authentication and authorization in IOMT blockchain-based environment ( http://arxiv.org/abs/2407.12187v1 )

ライセンス: Link先を確認

Laleh Khajehzadeh, Hamid Barati, Ali Barati,

(参考訳) 医療用インターネット・オブ・モノ(IoMT)は、デジタル革命の次のフロンティアであり、医療で利用されている。このコンテキストにおいて、IoTは、最小限のインタラクションで、個人が重要なアクティビティをリモートで管理することを可能にする。しかしながら、ネットワークリソースの制限とセキュアなチャネルを確立することの課題、さらにはセキュアでない公開チャネルを通じて機密情報を共有および収集することは、医療用IoTにセキュリティ上の課題をもたらす。本稿では,ブロックチェーン環境におけるリアルタイムデータにアクセスするための,軽量な多要素認証と匿名ユーザ認証方式を提案する。このスキームはL2AIと呼ばれる安全でないチャネルを利用する。 L2AIは、疑似同一性および動的インデックス化を使用して、ユーザ匿名性を高めながら、セキュリティと効率を確保する。提案手法は,ユーザ登録処理の効率化を図り,既存のシステムと新たに追加されたシステムの両方に,新たなプロセスなしでアクセスできるようにする。このスキームは主に医療インフラなどの大規模システム向けに設計されているが、資源に制約のあるデバイスにも適している。この方式は、一方通行の暗号ハッシュ関数とビットワイズXOR操作に依存している。さらに、ユーザ側でファジィマイニングアルゴリズムを使用して、ユーザの生体情報を検証する。 L2AIはセキュリティ証明に"Real-Or-Random (ROR)"モデルを採用し、認証の証明にBANロジックを採用している。形式的セキュリティ検証は、L2AIの適切な機能を示す非公式なセキュリティ分析によって補完される"Automatic Validation of Internet Security Protocols and Programs"(Proverif)ツールを用いて行われる。

The Medical Internet of Things (IoMT) is the next frontier in the digital revolution and is utilized in healthcare. In this context, IoT enables individuals to remotely manage their essential activities with minimal interaction. However, the limitations of network resources and the challenges of establishing a secure channel, as well as sharing and collecting sensitive information through an insecure public channel, pose security challenges for the medical IoT. This paper presents a lightweight multi-factor authentication and anonymous user authentication scheme to access real-time data in a blockchain-based environment. The scheme, called L2AI, operates over an insecure channel. L2AI ensures security and efficiency while enhancing user anonymity through the use of pseudo-identity and dynamic indexing. The proposed method supports highly scalable systems with an efficient user registration process, allowing authenticated users to access both existing and newly added system entities without additional processes. Although the scheme is primarily designed for large systems, such as health infrastructure, it is also suitable for resource-constrained devices. The scheme relies on one-way cryptographic hashing functions and bitwise XOR operations. Additionally, a fuzzy mining algorithm is employed on the user side to verify the user's biometric information. L2AI adopts the "Real-Or-Random (ROR)" model for security proof and employs BAN logic for proof of authenticity. Formal security verification is conducted using the "Automatic Validation of Internet Security Protocols and Programs" (Proverif) tool, complemented by informal security analysis demonstrating the proper functionality of L2AI.
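
Lightweight schemes built only from one-way hashes and bitwise XOR commonly hide an identity by XOR-ing it with a hash of a shared secret and a fresh nonce, so that each session sees a different pseudo-identity. The sketch below shows that generic pattern with SHA-256; it is not the L2AI protocol, whose message formats are not given in the abstract, and every function name and parameter here is hypothetical.

```python
import hashlib

DIGEST_SIZE = 32  # SHA-256 output length in bytes


def one_way_hash(*parts):
    """Hash a sequence of byte strings, length-prefixing each part so that
    different splits of the same bytes do not collide."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_pseudo_identity(user_id, shared_key, nonce):
    """Mask `user_id` (at most 32 bytes, no trailing zero bytes) with
    H(shared_key, nonce). A fresh nonce gives an unlinkable pseudo-identity."""
    if len(user_id) > DIGEST_SIZE:
        raise ValueError("user_id must fit in one hash block")
    padded = user_id.ljust(DIGEST_SIZE, b"\x00")
    return xor_bytes(padded, one_way_hash(shared_key, nonce))


def recover_identity(pseudo_id, shared_key, nonce):
    """Invert the mask; only a party holding `shared_key` can do this."""
    padded = xor_bytes(pseudo_id, one_way_hash(shared_key, nonce))
    return padded.rstrip(b"\x00")
```

Both operations cost a single hash evaluation and a 32-byte XOR, which is why constructions of this kind are attractive for resource-constrained devices.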

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# CroMo-Mixup: 継続的自己監督型学習のためのクロスモデル表現の拡張

CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning ( http://arxiv.org/abs/2407.12188v1 )

ライセンス: Link先を確認

Erum Mushtaq, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Jie Ding, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr,

(参考訳) 継続的自己教師付き学習(CSSL)は、ラベルのないデータ上で一連のタスクを逐次的に学習する。継続学習の2つの主な課題は、破滅的忘却とタスクの混同である。 CSSL問題は破滅的忘却の課題に対処するために研究されてきたが、タスクの混同に対処する研究はほとんど行われていない。本研究では,異なるタスクに属する異なるクラスが同時に訓練されないため,特にクラスインクリメンタル学習の多様性の低い設定において,自己教師付き学習(SSL)がCSSLをタスクの混同問題に対してより脆弱にしうることを,広範な実験を通じて示す。この課題に動機づけられ、2つの重要なコンポーネントを通じてこの問題に対処する新しいクロスモデル特徴Mixup(CroMo-Mixup)フレームワークを提案する。 1)タスク間でサンプルを混合して負のサンプルの多様性を高めるクロスタスクデータMixup 2)混合サンプルと原画像について現在のモデルと古いモデルから得られた埋め込み間の類似性を学習し, クロスタスクのクラス対比学習と古い知識の検索を容易にするクロスモデル特徴Mixup。 CIFAR10, CIFAR100, tinyImageNetの3つのデータセットにおいて, 異なるクラスインクリメンタル学習設定のもとで,タスクID予測と全タスクの平均線形精度の両方を改善するCroMo-Mixupの有効性を評価した。4つの最先端SSL目的関数に対して,CroMo-Mixupの互換性を検証する。コードは https://github.com/ErumMushtaq/CroMo-Mixup で公開されている。

Continual self-supervised learning (CSSL) learns a series of tasks sequentially on unlabeled data. Two main challenges of continual learning are catastrophic forgetting and task confusion. While the CSSL problem has been studied to address the catastrophic forgetting challenge, little work has been done to address the task confusion aspect. In this work, we show through extensive experiments that self-supervised learning (SSL) can make CSSL more susceptible to the task confusion problem, particularly in less diverse settings of class incremental learning because different classes belonging to different tasks are not trained concurrently. Motivated by this challenge, we present a novel cross-model feature Mixup (CroMo-Mixup) framework that addresses this issue through two key components: 1) Cross-Task data Mixup, which mixes samples across tasks to enhance negative sample diversity; and 2) Cross-Model feature Mixup, which learns similarities between embeddings obtained from current and old models of the mixed sample and the original images, facilitating cross-task class contrast learning and old knowledge retrieval. We evaluate the effectiveness of CroMo-Mixup to improve both Task-ID prediction and average linear accuracy across all tasks on three datasets, CIFAR10, CIFAR100, and tinyImageNet under different class-incremental learning settings. We validate the compatibility of CroMo-Mixup on four state-of-the-art SSL objectives. Code is available at https://github.com/ErumMushtaq/CroMo-Mixup.
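
The data-mixing component builds on the general mixup operation, which forms a convex combination of two samples. The sketch below shows only that basic operation on flat feature vectors, under the assumption that one sample comes from the current task and the other from an earlier one; how CroMo-Mixup selects pairs, draws the mixing weight, and defines its loss is not specified in the abstract, so the function name and arguments are hypothetical.

```python
def mixup(sample_current, sample_old, lam):
    """Return lam * sample_current + (1 - lam) * sample_old, element-wise.

    Both samples are equal-length sequences of floats (for example,
    flattened images); `lam` in [0, 1] is the weight on the current-task
    sample.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(sample_current) != len(sample_old):
        raise ValueError("samples must have the same length")
    return [lam * a + (1.0 - lam) * b for a, b in zip(sample_current, sample_old)]
```

In standard mixup the weight is usually drawn from a Beta distribution for each pair, which could be done here with `random.betavariate` before calling the function.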

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# MASIVE:英語とスペイン語におけるオープンエンドな感情状態の識別

MASIVE: Open-Ended Affective State Identification in English and Spanish ( http://arxiv.org/abs/2407.12196v1 )

ライセンス: Link先を確認

Nicholas Deas, Elsbeth Turcan, Iván Pérez Mejía, Kathleen McKeown,

(参考訳) 感情分析の分野では、多くのNLP研究は、限られた数の離散的な感情カテゴリーを特定することに焦点を当てており、それらはしばしば言語をまたいで適用される。しかし、これらの基本セットはテキストデータを念頭に置いて設計されることはめったになく、文化、言語、方言は特定の感情がどのように解釈されるかに影響を与える。本研究では,人間が感情体験を記述するために使用するあらゆる言葉を含む,事実上無制限の「感情状態 (affective states)」へと対象範囲を広げる。私たちは、英語とスペイン語のReddit投稿からなり、それぞれ1,000以上の固有の感情状態を含むデータセットであるMASIVEを収集し、公開する。次に、マスク付きスパン予測タスクとして定式化した、言語生成モデルに対する「感情状態識別 (affective state identification)」という新しい問題を定義する。このタスクでは、より小さな微調整された多言語モデルの方が、地域固有のスペイン語の感情状態においても、はるかに大きなLLMより優れていることが分かる。さらに,MASIVEでの事前学習により,既存の感情ベンチマークにおけるモデル性能が向上することを示す。最後に, 機械翻訳実験により, この課題で良好な性能を得るには、ネイティブ話者が書いたデータが不可欠であることが確認された。

In the field of emotion analysis, much NLP research focuses on identifying a limited number of discrete emotion categories, often applied across languages. These basic sets, however, are rarely designed with textual data in mind, and culture, language, and dialect can influence how particular emotions are interpreted. In this work, we broaden our scope to a practically unbounded set of affective states, which includes any terms that humans use to describe their experiences of feeling. We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each. We then define the new problem of affective state identification for language generation models framed as a masked span prediction task. On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific Spanish affective states. Additionally, we show that pretraining on MASIVE improves model performance on existing emotion benchmarks. Finally, through machine translation experiments, we find that native speaker-written data is vital to good performance on this task.

翻訳日:2024-07-18 19:08:36 公開日:2024-07-16

# ソフトロボットインタラクションのための解釈可能なビジュオ触覚予測モデルを目指して

Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions ( http://arxiv.org/abs/2407.12197v1 )

ライセンス: Link先を確認

Enrico Donato, Thomas George Thuruthel, Egidio Falotico,

(参考訳) 自律システムは予測不可能な環境をナビゲートし、外部オブジェクトと対話するという、複雑な課題に直面します。ロボットエージェントを現実世界の状況にうまく統合することは、世界モデルと予測スキルの融合を含む知覚能力に依存している。効果的な知覚モデルは、周囲を探索するために様々な感覚モダリティの融合の上に構築される。生の感覚モダリティに応用されたディープラーニングは、実行可能な選択肢を提供する。しかし、学習に基づく知覚表現は解釈が困難になる。この課題はソフトロボットにおいて特に顕著であり、構造や素材のコンプライアンスが予測をさらに困難にしている。我々の研究は、生成モデルを利用してソフトロボットのためのマルチモーダル認識モデルを構築し、対外物体との接触を予測・解釈するために、受容的・視覚的情報を活用することで、この複雑さに対処する。知覚モデルを理解するための一連のツールが提供され、学習段階の後に複数の感覚入力の融合と予測プロセスに光を当てる。我々は、知覚モデルとその制御目的への含意の展望を掘り下げる。

Autonomous systems face the intricate challenge of navigating unpredictable environments and interacting with external objects. The successful integration of robotic agents into real-world situations hinges on their perception capabilities, which involve amalgamating world models and predictive skills. Effective perception models build upon the fusion of various sensory modalities to probe the surroundings. Deep learning applied to raw sensory modalities offers a viable option. However, learning-based perceptive representations become difficult to interpret. This challenge is particularly pronounced in soft robots, where the compliance of structures and materials makes prediction even harder. Our work addresses this complexity by harnessing a generative model to construct a multi-modal perception model for soft robots and to leverage proprioceptive and visual information to anticipate and interpret contact interactions with external objects. A suite of tools to interpret the perception model is furnished, shedding light on the fusion and prediction processes across multiple sensory inputs after the learning phase. We will delve into the outlooks of the perception model and its implications for control purposes.

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# 可逆性プロトタイプネットワークは、たぶんそんな感じだ

This Probably Looks Exactly Like That: An Invertible Prototypical Network ( http://arxiv.org/abs/2407.12200v1 )

ライセンス: Link先を確認

Zachariah Carmichael, Timothy Redgrave, Daniel Gonzalez Cedre, Walter J. Scheirer,

(参考訳) 概念に基づくニューラルネットワークと、フローベースの生成的分類器を組み合わせることで、教師あり学習に対する、本質的に説明可能で厳密に可逆な新しいアプローチを実現する。概念に基づくニューラルネットワークの一種であるプロトタイプニューラルネットワークは、概念アノテーションを使わずに人間が理解可能な機械学習を実現するうえで有望な方向性を示しているが、人間と機械のセマンティックギャップは、現在のアプローチを悩ませ続けている。プロトタイプによる説明を間接的な解釈関数に依存することが、プロトタイプの情報伝達力に厳しい制限を課すことが判明した。このことから、潜在空間上の分布としてプロトタイプを可逆的に学習することで、より堅牢で表現力があり、解釈可能なモデリングが可能になると仮定する。本稿では,正規化フローとガウス混合モデルを組み合わせることにより,ProtoFlowと呼ばれるモデルを提案する。 ProtoFlowは、(1)生成モデリングと予測モデリングの統合において新たな最先端を確立し、(2)既存のプロトタイプニューラルネットワークに匹敵する予測性能を達成しつつ、より豊かな解釈を可能にする。

We combine concept-based neural networks with generative, flow-based classifiers into a novel, intrinsically explainable, exactly invertible approach to supervised learning. Prototypical neural networks, a type of concept-based neural network, represent an exciting way forward in realizing human-comprehensible machine learning without concept annotations, but a human-machine semantic gap continues to haunt current approaches. We find that reliance on indirect interpretation functions for prototypical explanations imposes a severe limit on prototypes' informative power. From this, we posit that invertibly learning prototypes as distributions over the latent space provides more robust, expressive, and interpretable modeling. We propose one such model, called ProtoFlow, by composing a normalizing flow with Gaussian mixture models. ProtoFlow (1) sets a new state-of-the-art in joint generative and predictive modeling and (2) achieves predictive performance comparable to existing prototypical neural networks while enabling richer interpretation.

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# ダイアクリティカルマーク不要のヘブライ語TTSに対する言語モデリング手法

A Language Modeling Approach to Diacritic-Free Hebrew TTS ( http://arxiv.org/abs/2407.12206v1 )

ライセンス: Link先を確認

Amit Roth, Arnon Turetzky, Yossi Adi,

(参考訳) 我々はヘブライ語におけるテキスト音声合成(TTS)の課題に取り組む。伝統的なヘブライ語には、与えられた単語の発音の仕方を指示するダイアクリティカルマークが含まれているが、現代のヘブライ語ではほとんど使われていない。現代ヘブライ語にダイアクリティカルマークがないため、読者は文脈に基づいて正しい発音を推測し、どの音素を使うべきかを理解することが求められる。これにより、TTSシステムにはテキストから音声への正確なマッピングを行うという根本的な課題が生じる。本研究では,ヘブライ語TTSの課題に対して,ダイアクリティカルマークを用いない言語モデリング手法を採用することを提案する。モデルは離散的な音声表現で動作し、ワードピーストークン化器で条件付けされる。本稿では,実環境で収集した弱教師付きデータを用いて提案手法を最適化し,複数のダイアクリティカルマークベースのTTSシステムと比較する。その結果,提案手法は,生成音声の内容保存と自然性の両方において,評価したベースラインよりも優れていることが示唆された。サンプルは次のリンクで確認できる: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/

We tackle the task of text-to-speech (TTS) in Hebrew. Traditional Hebrew contains diacritics, which dictate how individuals should pronounce given words; modern Hebrew, however, rarely uses them. The lack of diacritics in modern Hebrew means that readers are expected to infer the correct pronunciation and to understand which phonemes to use based on the context. This imposes a fundamental challenge on TTS systems to accurately map text to speech. In this work, we propose to adopt a diacritics-free language modeling approach for the task of Hebrew TTS. The model operates on discrete speech representations and is conditioned on a word-piece tokenizer. We optimize the proposed method using in-the-wild weakly supervised data and compare it to several diacritic-based TTS systems. Results suggest the proposed method is superior to the evaluated baselines considering both content preservation and naturalness of the generated speech. Samples can be found under the following link: pages.cs.huji.ac.il/adiyoss-lab/HebTTS/

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# NeuSurfEmb:CADモデルなしの密な対応に基づく6Dオブジェクト姿勢推定のための完全パイプライン

NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models ( http://arxiv.org/abs/2407.12207v1 )

ライセンス: Link先を確認

Francesco Milano, Jen Jen Chung, Hermann Blum, Roland Siegwart, Lionel Ott,

(参考訳) 6Dオブジェクト姿勢推定のための最先端のアプローチは、CADモデルが利用可能であることを前提としており、合成トレーニングデータを生成するために、ユーザーが物理ベースレンダリング(PBR)パイプラインを手動でセットアップする必要がある。どちらの要因も、実世界のシナリオにおけるこれらの手法の適用を制限する。本研究では,CADモデルを必要とせず,少数の実画像のみを入力として最先端の姿勢推定器を訓練できるパイプラインを提案する。提案手法は,Structure-from-Motion (SfM) とオブジェクトに依存しないセグメンテーションに基づく半自動の手順で学習するNeuS2オブジェクト表現に基づいている。我々は、NeuS2の新規視点合成能力とシンプルなカット・アンド・ペースト拡張を利用して、フォトリアリスティックなオブジェクトレンダリングを自動的に生成し、それを用いて対応ベースのSurfEmb姿勢推定器を訓練する。提案手法をLINEMOD-Occlusionデータセット上で評価し,各コンポーネントの影響を広範囲に検討し,CADモデルとPBRデータに基づくアプローチに対して競争力のある性能を示す。さらに,自己収集した実世界のオブジェクトに対してパイプラインの使いやすさと有効性を実証し,本手法が最先端のCADモデル不要の手法を上回り,より優れた精度と軽度のオクルージョンに対するロバスト性を持つことを示す。ロボットコミュニティがこのシステムの恩恵を受けられるように、https://www.github.com/ethz-asl/neusurfemb で公開する予定である。

State-of-the-art approaches for 6D object pose estimation assume the availability of CAD models and require the user to manually set up physically-based rendering (PBR) pipelines for synthetic training data generation. Both factors limit the application of these methods in real-world scenarios. In this work, we present a pipeline that does not require CAD models and allows training a state-of-the-art pose estimator requiring only a small set of real images as input. Our method is based on a NeuS2 object representation, that we learn through a semi-automated procedure based on Structure-from-Motion (SfM) and object-agnostic segmentation. We exploit the novel-view synthesis ability of NeuS2 and simple cut-and-paste augmentation to automatically generate photorealistic object renderings, which we use to train the correspondence-based SurfEmb pose estimator. We evaluate our method on the LINEMOD-Occlusion dataset, extensively studying the impact of its individual components and showing competitive performance with respect to approaches based on CAD models and PBR data. We additionally demonstrate the ease of use and effectiveness of our pipeline on self-collected real-world objects, showing that our method outperforms state-of-the-art CAD-model-free approaches, with better accuracy and robustness to mild occlusions. To allow the robotics community to benefit from this system, we will publicly release it at https://www.github.com/ethz-asl/neusurfemb.

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# 画像分類による自己監督型事前学習のベンチマーク

A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification ( http://arxiv.org/abs/2407.12210v1 )

ライセンス: Link先を確認

Markus Marks, Manuel Knott, Neehar Kondapaneni, Elijah Cole, Thijs Defraeye, Fernando Perez-Cruz, Pietro Perona,

(参考訳) 自己教師あり学習(SSL)は、データ自体が教師信号を提供する機械学習アプローチであり、外部ラベルを必要としない。モデルは、プリテキストタスクを解くことで、データの構造や文脈を学習することを強いられる。 SSLでは、豊富で安価なラベルなしデータからモデルを学習できるため、ラベルが高価または入手困難な場合のモデル学習コストが大幅に削減される。コンピュータビジョンでは、SSLは事前学習として広く用いられ、その後に教師あり転移、より小さなラベル付きデータセットでの少数ショット学習、教師なしクラスタリングなどの下流タスクが続く。残念ながら、考えられるすべての下流タスクでSSL手法を評価し、学習された表現の質を客観的に測定することは不可能である。代わりに、SSL手法は、ファインチューニング、線形プロービング、k近傍法(kNN)などのドメイン内評価プロトコルを用いて評価される。しかし、これらの評価プロトコルが、データセット、評価指標、モデルアーキテクチャといった異なる条件下で、さまざまな下流タスクに対する事前学習済みモデルの表現品質をどの程度よく推定するかは、十分に理解されていない。 本研究では、SSLの分類ベースの評価プロトコルが互いにどのように相関し、異なる種類のデータセットにおける下流性能をどの程度予測できるかを調べる。本研究は、11の一般的な画像データセットと、異なるSSL手法で事前学習された、あるいは異なるモデルバックボーンを持つ26のモデルを対象とする。ドメイン内の線形/kNNプロービングプロトコルが、平均して、ドメイン外性能の最良の汎用的な予測指標であることがわかった。さらに、バッチ正規化の重要性を調査し、さまざまな種類のデータセットドメインシフトに対して相関がどの程度頑健であるかを評価する。識別的な自己教師あり手法と生成的な自己教師あり手法の関係に関する仮定に疑問を投げかけ、両者の性能差の大部分はモデルバックボーンの違いで説明できることを示す。

Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. The model is forced to learn about the data structure or context by solving a pretext task. With SSL, models can learn from abundant and cheap unlabeled data, significantly reducing the cost of training models where labels are expensive or inaccessible. In computer vision, SSL is widely used as pre-training followed by a downstream task, such as supervised transfer, few-shot learning on smaller labeled data sets, and/or unsupervised clustering. Unfortunately, it is infeasible to evaluate SSL methods on all possible downstream tasks and objectively measure the quality of the learned representation. Instead, SSL methods are evaluated using in-domain evaluation protocols, such as fine-tuning, linear probing, and k-nearest neighbors (kNN). However, it is not well understood how well these evaluation protocols estimate the representation quality of a pre-trained model for different downstream tasks under different conditions, such as dataset, metric, and model architecture. We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types. Our study includes eleven common image datasets and 26 models that were pre-trained with different SSL methods or have different model backbones. We find that in-domain linear/kNN probing protocols are, on average, the best general predictors for out-of-domain performance. We further investigate the importance of batch normalization and evaluate how robust correlations are for different kinds of dataset domain shifts. We challenge assumptions about the relationship between discriminative and generative self-supervised methods, finding that most of their performance differences can be explained by changes to model backbones.
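
The kNN probing protocol named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the function name `knn_probe_accuracy`, the cosine similarity, the value of `k`, and the toy two-dimensional "embeddings" are all assumptions made for the example. The idea is that the pre-trained backbone stays frozen, and only its output features are classified by a majority vote over the nearest labeled neighbors.

```python
import math
from collections import Counter


def knn_probe_accuracy(train_feats, train_labels, test_feats, test_labels, k=3):
    """Accuracy of a k-nearest-neighbour vote on frozen feature vectors."""

    def cosine(u, v):
        # Cosine similarity; returns 0.0 if either vector has zero norm.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm > 0 else 0.0

    correct = 0
    for feat, label in zip(test_feats, test_labels):
        # Rank training samples from most to least similar to the query.
        ranked = sorted(
            range(len(train_feats)),
            key=lambda i: cosine(feat, train_feats[i]),
            reverse=True,
        )
        # Majority vote over the labels of the k most similar samples.
        votes = Counter(train_labels[i] for i in ranked[:k])
        predicted = votes.most_common(1)[0][0]
        correct += int(predicted == label)
    return correct / len(test_feats)


# Hypothetical frozen embeddings for two well-separated classes.
train_feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = [0, 0, 1, 1]
test_feats = [[1.0, 0.0], [0.0, 1.0]]
test_labels = [0, 1]

accuracy = knn_probe_accuracy(train_feats, train_labels, test_feats, test_labels)
print(f"kNN probe accuracy: {accuracy:.2f}")
```

Because no parameters are trained, a probe of this kind is cheap to run across many datasets, which is presumably part of why such protocols are attractive as proxies for downstream performance.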

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# 認識論的不確実性の較正について--原理・パラドックス・Conflictual Loss

On the Calibration of Epistemic Uncertainty: Principles, Paradoxes and Conflictual Loss ( http://arxiv.org/abs/2407.12211v1 )

ライセンス: Link先を確認

Mohammed Fellaji, Frédéric Pennerath, Brieuc Conan-Guez, Miguel Couceiro,

(参考訳) 予測分布の較正はディープラーニングにおいて広く研究されてきたが、Deep Ensembles、Bayesian Deep Networks、Evidential Deep Networksによって得られる、より特定的な認識論的不確実性については同じことは言えない。この不確実性は測定可能ではあるが、さまざまな選択肢が存在する事前分布に依存するため、客観的に較正することは困難である。それにもかかわらず、認識論的不確実性はどの場合でも2つの形式的要件を満たさなければならない。第一に、学習データセットが大きくなると減少し、第二に、モデルの表現力が大きくなると増大しなければならない。こうした期待に反して、我々の実験的研究は、いくつかの参照データセットとモデルにおいて認識論的不確実性の尺度がこれらの要件に違反し、時には期待とは正反対の傾向を示すことを明らかにした。期待と現実の間のこれらのパラドックスは、これらのモデルによって推定される認識論的不確実性の真の有用性に疑問を投げかける。形式的な議論によれば、この不一致は尺度自体の欠陥ではなく、事後分布の近似が不十分なことに起因する。この観察に基づき、上記の要件に沿った、conflictual lossと呼ぶ深層アンサンブル向けの正則化関数を提案する。深層アンサンブルの性能も較正も犠牲にすることなく、認識論的不確実性の2つの要件を回復できることを実験的に示し、その強みを強調する。

The calibration of predictive distributions has been widely studied in deep learning, but the same cannot be said about the more specific epistemic uncertainty as produced by Deep Ensembles, Bayesian Deep Networks, or Evidential Deep Networks. Although measurable, this form of uncertainty is difficult to calibrate on an objective basis as it depends on the prior for which a variety of choices exist. Nevertheless, epistemic uncertainty must in all cases satisfy two formal requirements: first, it must decrease when the training dataset gets larger and, second, it must increase when the model expressiveness grows. Despite these expectations, our experimental study shows that on several reference datasets and models, measures of epistemic uncertainty violate these requirements, sometimes presenting trends completely opposite to those expected. These paradoxes between expectation and reality raise the question of the true utility of epistemic uncertainty as estimated by these models. A formal argument suggests that this disagreement is due to a poor approximation of the posterior distribution rather than to a flaw in the measure itself. Based on this observation, we propose a regularization function for deep ensembles, called conflictual loss, in line with the above requirements. We emphasize its strengths by showing experimentally that it restores both requirements of epistemic uncertainty, without sacrificing either the performance or the calibration of the deep ensembles.
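
To make "epistemic uncertainty as produced by Deep Ensembles" concrete, the sketch below computes one commonly used measure: the mutual information between the prediction and the ensemble member, i.e. the entropy of the averaged prediction minus the average entropy of the members. The abstract does not say which measure the authors use, so this choice, the function names, and the toy probabilities are assumptions for illustration only; the proposed conflictual loss itself is not reproduced here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def epistemic_uncertainty(member_probs):
    """Mutual-information style disagreement between ensemble members.

    member_probs holds one predicted class distribution per ensemble member,
    all for the same input.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Averaged prediction of the ensemble.
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)  # total predictive uncertainty
    aleatoric = sum(entropy(m) for m in member_probs) / n_members  # expected entropy
    return total - aleatoric


# Hypothetical predictions of three ensemble members on a single input.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

print("members agree:   ", epistemic_uncertainty(agree))
print("members disagree:", epistemic_uncertainty(disagree))
```

Under this measure, the first requirement in the abstract says the value should shrink as the training set grows (members converge to the same prediction), and the second says it should grow with model expressiveness.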

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# よりロバストな低予算能動学習のための一般化被覆

Generalized Coverage for More Robust Low-Budget Active Learning ( http://arxiv.org/abs/2407.12212v1 )

ライセンス: Link先を確認

Wonho Bae, Junhyug Noh, Danica J. Sutherland,

(参考訳) Yehuda et al. のProbCover法は、低予算領域における能動学習のための動機付けの明確なアルゴリズムであり、選択したデータ点を中心とする所与の半径の球でデータ分布を「被覆」しようとするものである。しかし、このアルゴリズムの性能は半径ハイパーパラメータの選択に極めて敏感であり、その調整は非常に困難で、元のヒューリスティックは頻繁に失敗することを示す。そこで、ProbCoverの目的関数を特殊ケースとして含みつつ、ハイパーパラメータの選択に対してはるかに頑健な滑らかな概念も許容する、一般化された「被覆」の概念を導入する(そして理論的に動機付ける)。この被覆を最適化する効率的な貪欲法を提案し、ProbCoverのアルゴリズムを一般化する。カーネルハーディングとの密接な関係から、これを「MaxHerding」と呼ぶ。この目的関数は$k$-medoidsの変種によって非貪欲的に最適化することもでき、他の低予算能動学習手法との関係が明確になる。包括的な実験において、MaxHerdingは複数の低予算画像分類ベンチマークで既存の能動学習手法を上回り、競合する手法の大半よりも低い計算コストでこれを達成する。

The ProbCover method of Yehuda et al. is a well-motivated algorithm for active learning in low-budget regimes, which attempts to "cover" the data distribution with balls of a given radius at selected data points. We demonstrate, however, that the performance of this algorithm is extremely sensitive to the choice of this radius hyper-parameter, and that tuning it is quite difficult, with the original heuristic frequently failing. We thus introduce (and theoretically motivate) a generalized notion of "coverage," including ProbCover's objective as a special case, but also allowing smoother notions that are far more robust to hyper-parameter choice. We propose an efficient greedy method to optimize this coverage, generalizing ProbCover's algorithm; due to its close connection to kernel herding, we call it "MaxHerding." The objective can also be optimized non-greedily through a variant of $k$-medoids, clarifying the relationship to other low-budget active learning methods. In comprehensive experiments, MaxHerding surpasses existing active learning methods across multiple low-budget image classification benchmarks, and does so with less computational cost than most competitive methods.
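
One plausible reading of a smooth, kernel-based "coverage" objective and its greedy optimization is sketched below. The abstract does not give the exact definition, so everything here is an assumption for illustration: coverage of a selected set is taken to be the average, over all data points, of the largest kernel similarity to any selected point, and a Gaussian kernel with a hypothetical `bandwidth` stands in for the hard ball of ProbCover (an indicator kernel of fixed radius would recover ball coverage).

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Smooth similarity in (0, 1]; plays the role of the hard ball indicator."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def greedy_coverage_selection(points, budget, kernel=gaussian_kernel):
    """Greedily pick `budget` indices that maximise summed kernel coverage."""
    n = len(points)
    # best_sim[i]: highest similarity of point i to any point selected so far.
    best_sim = [0.0] * n
    selected = []
    for _ in range(budget):
        best_gain, best_idx = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain in total coverage if candidate j were added.
            gain = sum(
                max(kernel(points[i], points[j]) - best_sim[i], 0.0)
                for i in range(n)
            )
            if gain > best_gain:
                best_gain, best_idx = gain, j
        selected.append(best_idx)
        best_sim = [
            max(best_sim[i], kernel(points[i], points[best_idx])) for i in range(n)
        ]
    return selected


# Two hypothetical clusters; a budget of 2 should pick one point from each.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
chosen = greedy_coverage_selection(points, budget=2)
print("selected indices:", chosen)
```

With a smooth kernel the marginal gain changes gradually as the bandwidth changes, which gives some intuition for why a smooth notion of coverage could be less sensitive to its hyper-parameter than a hard radius.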

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# VideoClusterNet:ビデオのための自己教師ありかつ適応的なクラスタリング

VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos ( http://arxiv.org/abs/2407.12214v1 )

ライセンス: Link先を確認

Devesh Walawalkar, Pablo Garrido,

(参考訳) デジタルメディアのコンテンツ制作が進むにつれ、映画やテレビシリーズのエピソードを分析してキャラクタの主役を正確に特定する必要性が高まっており、特にビデオ顔クラスタリングは、検出された顔のトラックを共通の顔のアイデンティティでまとめることを目的としている。この問題は、ビデオフレームにまたがる特定の顔のポーズ、表情、外観、照明のバリエーションが多岐にわたるため、非常に難しい。ジェネリックな事前訓練された顔識別(ID)モデルは、高いダイナミックレンジのコンテンツとユニークなシネマティックスタイルを考えると、ビデオ制作領域に適さない。さらに、従来のクラスタリングアルゴリズムはデータセットをまたいだ個別のチューニングを必要とするハイパーパラメータに依存している。本稿では,ジェネリック・フェイスIDモデルから新しいビデオ・フェイス・トラックへの適応を,完全自己管理方式で学習する新しいビデオ・フェイス・クラスタリング手法を提案する。また,任意の入力ビデオに対して,微調整されたモデルの埋め込み空間に自動的に適応できるパラメータフリークラスタリングアルゴリズムを提案する。包括的な映画顔クラスタリングベンチマークが欠如しているため、第1世代の映画データセットであるMovieFaceClusterも提示する。私たちのデータセットは、映画業界の専門家によって手作業で作成されており、非常に困難な顔認証シナリオが含まれています。実験により,従来のテレビシリーズのデータセットでは,ベンチマークデータセットにおける難易度の高いメインストリームのシーンの処理と,最先端の性能が評価された。

With the rise of digital media content production, the need for analyzing movies and TV series episodes to locate the main cast of characters precisely is gaining importance. Specifically, Video Face Clustering aims to group together detected video face tracks with common facial identities. This problem is very challenging due to the large range of pose, expression, appearance, and lighting variations of a given face across video frames. Generic pre-trained Face Identification (ID) models fail to adapt well to the video production domain, given its high dynamic range content and also unique cinematic style. Furthermore, traditional clustering algorithms depend on hyperparameters requiring individual tuning across datasets. In this paper, we present a novel video face clustering approach that learns to adapt a generic face ID model to new video face tracks in a fully self-supervised fashion. We also propose a parameter-free clustering algorithm that is capable of automatically adapting to the finetuned model's embedding space for any input video. Due to the lack of comprehensive movie face clustering benchmarks, we also present a first-of-kind movie dataset: MovieFaceCluster. Our dataset is handpicked by film industry professionals and contains extremely challenging face ID scenarios. Experiments show our method's effectiveness in handling difficult mainstream movie scenes on our benchmark dataset and state-of-the-art performance on traditional TV series datasets.

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# AFIDAF: ViTにおけるアテンションの効率的な代替としての、フーリエ領域と画像領域の適応フィルタの交互適用

AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs ( http://arxiv.org/abs/2407.12217v1 )

ライセンス: Link先を確認

Yunling Zheng, Zeyi Xu, Fanghui Xue, Biao Yang, Jiancheng Lyu, Shuai Zhang, Yingyong Qi, Jack Xin,

(参考訳) 本稿では,視覚バックボーン構築の代替として,特徴抽出のためのFourier と Image Domain Filtering の交互なアプローチを提案する。軽量モデル間の性能は、ImageNet-1K分類の最先端レベルに達し、オブジェクト検出やセグメンテーションの下流タスクも一貫して改善する。我々のアプローチは、視覚変換器(ViT)を圧縮するための新しいツールとしても機能する。

We propose and demonstrate an alternating Fourier and image domain filtering approach for feature extraction as an efficient alternative to build a vision backbone without using the computationally intensive attention. The performance among the lightweight models reaches the state-of-the-art level on ImageNet-1K classification, and improves downstream tasks on object detection and segmentation consistently as well. Our approach also serves as a new tool to compress vision transformers (ViTs).
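
The basic operation behind Fourier domain filtering can be illustrated on a one-dimensional signal: transform, multiply each frequency by a weight, transform back. This is only a sketch of that generic operation under simplifying assumptions, not the AFIDAF architecture: the weights here are fixed by hand rather than adaptive or learned, the signal is 1D rather than an image feature map, and a naive DFT is used for self-containment where a real implementation would use an FFT.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_filter(signal, frequency_weights):
    """Scale each frequency component by a weight, then return to the signal domain.

    The real part is returned; the result is exactly real only when the
    weights are conjugate-symmetric, as they are in the examples below.
    """
    spectrum = dft(signal)
    filtered = [s * w for s, w in zip(spectrum, frequency_weights)]
    return [value.real for value in idft(filtered)]


signal = [1.0, 2.0, 3.0, 4.0]
identity = fourier_filter(signal, [1, 1, 1, 1])  # all frequencies kept
dc_only = fourier_filter(signal, [1, 0, 0, 0])   # only the mean survives

print("identity filter:", [round(v, 6) for v in identity])
print("DC-only filter: ", [round(v, 6) for v in dc_only])
```

An elementwise product in the frequency domain corresponds to a circular convolution over the whole signal, so a single filter of this kind mixes information globally, which is the property that makes it a candidate substitute for attention.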

翻訳日:2024-07-18 18:58:45 公開日:2024-07-16

# 静止画をダイナミックビデオに変える「Animate Your Motion」

Animate Your Motion: Turning Still Images into Dynamic Videos ( http://arxiv.org/abs/2403.10179v3 )

ライセンス: Link先を確認

Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars,

(参考訳) 近年、拡散モデルはテキスト・ビデオ生成において顕著な進歩を遂げており、ユーザの意図をより正確に反映するために、ビデオ出力の制御を強化しようと試みている。従来の取り組みは主に、画像や深度マップのようなセマンティックな手がかりや、スケッチやオブジェクト境界ボックスの移動といったモーションベースの条件の採用に重点を置いている。セマンティックな入力はリッチなシーンコンテキストを提供するが、詳細な動きの特異性は欠く; 逆に、モーションインプットは正確な軌跡情報を提供するが、より広いセマンティックな物語を見逃す。図1に示すように、ビデオ生成のための拡散モデルにおいて、セマンティックキューとモーションキューの両方を初めて統合する。この目的のために,マルチモーダル入力を管理する新しい手法であるScene and Motion Conditional Diffusion (SMCD)を紹介した。認識された動作条件モジュールを組み込み、シーン条件を統合する様々なアプローチを調査し、異なるモーダル間のシナジーを促進する。モデルトレーニングでは、2つのモードの条件を分離し、2段階のトレーニングパイプラインを導入します。実験により,映像品質,動作精度,セマンティックコヒーレンスを著しく向上させることが示された。

In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing either semantic cues, like images or depth maps, or motion-based conditions, like moving sketches or object bounding boxes. Semantic inputs offer a rich scene context but lack detailed motion specificity; conversely, motion inputs provide precise trajectory information but miss the broader semantic narrative. For the first time, we integrate both semantic and motion cues within a diffusion model for video generation, as demonstrated in Fig 1. To this end, we introduce the Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs. It incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions, promoting synergy between different modalities. For model training, we separate the conditions for the two modalities, introducing a two-stage training pipeline. Experimental results demonstrate that our design significantly enhances video quality, motion precision, and semantic coherence.

翻訳日:2024-07-18 11:56:44 公開日:2024-07-16

# BraTS-PEDs:複数コンソーシアムによる国際小児脳腫瘍セグメンテーションチャレンジ2023の結果

BraTS-PEDs: Results of the Multi-Consortium International Pediatric Brain Tumor Segmentation Challenge 2023 ( http://arxiv.org/abs/2407.08855v2 )

ライセンス: Link先を確認

Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Anna Zapaishchykova, Julija Pavaine, Lubdha M. Shah, Blaise V. Jones, Nakul Sheth, Sanjay P. Prabhu, Aaron S. McAllister, Wenxin Tu, Khanak K. Nandolia, Andres F. Rodriguez, Ibraheem Salman Shaikh, Mariana Sanchez Montano, Hollie Anne Lai, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Hannah Anderson, Syed Muhammed Anwar, Alejandro Aristizabal, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, James Eddy, Ivan Ezhov, Ariana M. Familiar, Keyvan Farahani, Deep Gandhi, Anurag Gottipati, Shuvanjan Haldar, Juan Eugenio Iglesias, Anastasia Janas, Elaine Elaine, Alexandros Karargyris, Hasan Kassem, Neda Khalili, Florian Kofler, Dominic LaBella, Koen Van Leemput, Hongwei B. Li, Nazanin Maleki, Zeke Meier, Bjoern Menze, Ahmed W. Moawad, Sarthak Pati, Marie Piraud, Tina Poussaint, Zachary J. Reitman, Jeffrey D. Rudie, Rachit Saluja, MIcah Sheller, Russell Takeshi Shinohara, Karthik Viswanathan, Chunhao Wang, Benedikt Wiestler, Walter F. Wiggins, Christos Davatzikos, Phillip B. Storm, Miriam Bornhorst, Roger Packer, Trent Hummel, Peter de Blank, Lindsey Hoffman, Mariam Aboian, Ali Nabavizadeh, Jeffrey B. Ware, Benjamin H. Kann, Brian Rood, Adam Resnick, Spyridon Bakas, Arastoo Vossough, Marius George Linguraru,

(参考訳) 小児中枢神経系腫瘍は、小児のがん関連死亡の主な原因である。小児の高悪性度グリオーマの5年生存率は20%未満である。新しい治療法の開発は、再現可能で正確な中央での治療反応評価を必要とする多施設共同臨床試験に依存している。本稿では、小児脳腫瘍に焦点を当てた初のBrain Tumor Segmentation(BraTS)チャレンジであるBraTS-PEDs 2023チャレンジの結果を報告する。このチャレンジでは、小児神経腫瘍学と臨床試験に特化した複数の国際コンソーシアムから取得したデータを利用した。 BraTS-PEDs 2023は、BraTS 2023の各チャレンジに共通する標準化された定量的性能評価指標を用いて、磁気共鳴画像から小児脳グリオーマのボリュームセグメンテーションアルゴリズムを評価することを目的とした。小児腫瘍解析で最高性能を示したAIアプローチには、nnU-NetとSwin UNETRのアンサンブル、Auto3DSeg、あるいは自己教師ありフレームワークを備えたnnU-Netが含まれていた。 BraTS-PEDs 2023チャレンジは、臨床医(神経腫瘍医、神経放射線科医)とAI/画像科学者との協力を促進し、より迅速なデータ共有と自動ボリューム解析技術の開発を後押しした。これらの進歩は臨床試験に大きく貢献し、脳腫瘍の子どものケアを改善する可能性がある。

Pediatric central nervous system tumors are the leading cause of cancer-related deaths in children. The five-year survival rate for high-grade glioma in children is less than 20%. The development of new treatments is dependent upon multi-institutional collaborative clinical trials requiring reproducible and accurate centralized response assessment. We present the results of the BraTS-PEDs 2023 challenge, the first Brain Tumor Segmentation (BraTS) challenge focused on pediatric brain tumors. This challenge utilized data acquired from multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. BraTS-PEDs 2023 aimed to evaluate volumetric segmentation algorithms for pediatric brain gliomas from magnetic resonance imaging using standardized quantitative performance evaluation metrics employed across the BraTS 2023 challenges. The top-performing AI approaches for pediatric tumor analysis included ensembles of nnU-Net and Swin UNETR, Auto3DSeg, or nnU-Net with a self-supervised framework. The BraTS-PEDs 2023 challenge fostered collaboration between clinicians (neuro-oncologists, neuroradiologists) and AI/imaging scientists, promoting faster data sharing and the development of automated volumetric analysis techniques. These advancements could significantly benefit clinical trials and improve the care of children with brain tumors.

翻訳日:2024-07-18 11:56:44 公開日:2024-07-16

# 情報検索と製品検索のギャップを埋める:eコマースへのQ&A勧告

Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce ( http://arxiv.org/abs/2407.09653v2 )

ライセンス: Link先を確認

Saar Kuzi, Shervin Malmasi,

(参考訳) ショッピングミッションの消費者は、商品の理解を深め、購入決定に達するための反復的なプロセスにおいて、Web検索エンジンや質問回答(QA)システムのような製品検索と情報検索システムの両方を利用することが多い。商品検索は、購入者が自分の要求を満たす実際の商品をカタログで見つけるのに有用であるが、情報検索システムは、それらの要求を洗練させるために必要なあらゆる質問に答えるために利用することができる。最近、LLM(Large Language Models)の成功により、顧客が目標を迅速に効果的に達成するための2つのタスク間のギャップを埋める機会が開かれた。本稿では,ユーザに対して,製品検索に関連する質問応答(Q&A)ペアを推薦し,購入決定を支援することを提案する。本稿では、Q&Aペアの要件と特性、その生成、Q&Aレコメンデーションタスクの最適化など、問題のさまざまな側面について論じる。我々は、この新興分野における今後の研究を促進するための課題、オープンな課題、そして解決策を提案する。

Consumers on a shopping mission often leverage both product search and information seeking systems, such as web search engines and Question Answering (QA) systems, in an iterative process to improve their understanding of available products and reach a purchase decision. While product search is useful for shoppers to find the actual products meeting their requirements in the catalog, information seeking systems can be utilized to answer any questions they may have to refine those requirements. The recent success of Large Language Models (LLMs) has opened up an opportunity to bridge the gap between the two tasks to help customers achieve their goals quickly and effectively by integrating conversational QA within product search. In this paper, we propose to recommend users Question-Answer (Q&A) pairs that are relevant to their product search and can help them make a purchase decision. We discuss the different aspects of the problem including the requirements and characteristics of the Q&A pairs, their generation, and the optimization of the Q&A recommendation task. We highlight the challenges, open problems, and suggested solutions to encourage future research in this emerging area.

翻訳日:2024-07-18 11:56:44 公開日:2024-07-16

# 正・未ラベルデータ:モデル、推定、推論、分類

Positive and Unlabeled Data: Model, Estimation, Inference, and Classification ( http://arxiv.org/abs/2407.09735v2 )

ライセンス: Link先を確認

Siyan Liu, Chi-Kuang Yeh, Xin Zhang, Qinglong Tian, Pengfei Li,

(参考訳) 本研究では、二重指数傾斜モデル(DETM)によって正例・ラベルなし(PU)データを扱う新たなアプローチを提案する。従来の手法は、ラベル付き正例データとラベルなし正例データが同じ分布に従うと仮定されるSCAR(selected completely at random)PUデータにしか適用できないため、しばしば不十分である。対照的に、DETMの二重構造は、ラベル付き正例データとラベルなし正例データが異なる分布に従いうる、より複雑で研究の進んでいないselected at random PUデータに効果的に対応する。識別可能性、パラメータ推定、漸近的性質など、DETMの理論的基礎を厳密に確立する。さらに、SCAR条件に対する適合度検定を開発し、対象領域における正例の割合に対する信頼区間を構成することで、統計的推測へと進む。分類タスクには近似ベイズ分類器を利用し、予測におけるDETMの頑健な性能を実証する。本研究は、理論的洞察と実用的応用を通じて、PUデータの課題に対処するための包括的なフレームワークとしてDETMを位置付ける。

This study introduces a new approach to addressing positive and unlabeled (PU) data through the double exponential tilting model (DETM). Traditional methods often fall short because they only apply to selected completely at random (SCAR) PU data, where the labeled positive and unlabeled positive data are assumed to be from the same distribution. In contrast, our DETM's dual structure effectively accommodates the more complex and underexplored selected at random PU data, where the labeled and unlabeled positive data can be from different distributions. We rigorously establish the theoretical foundations of DETM, including identifiability, parameter estimation, and asymptotic properties. Additionally, we move forward to statistical inference by developing a goodness-of-fit test for the SCAR condition and constructing confidence intervals for the proportion of positive instances in the target domain. We leverage an approximated Bayes classifier for classification tasks, demonstrating DETM's robust performance in prediction. Through theoretical insights and practical applications, this study highlights DETM as a comprehensive framework for addressing the challenges of PU data.

翻訳日:2024-07-18 11:56:44 公開日:2024-07-16

# 安全性ファインチューニングを成立させるもの、破るもの:メカニスティックな研究

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study ( http://arxiv.org/abs/2407.10264v2 )

ライセンス: Link先を確認

Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H. S. Torr, Amartya Sanyal, Puneet K. Dokania,

(参考訳) 安全性ファインチューニングは、大規模言語モデル(LLM)を安全に展開できるよう、人間の選好に整合させるのに役立つ。安全性ファインチューニングによってモデルが安全になる根本的な要因をより深く理解するために、モデルが実行を求められるタスク(例えば「設計」)と、そのタスクの対象となる特定の概念(例えば「サイクル」と「爆弾」)との相互作用をモデル化することで、安全でない入力の顕著な側面を捉える合成データ生成フレームワークを設計する。これを用いて、教師あり安全性ファインチューニング、直接選好最適化、アンラーニングという3つのよく知られた安全性ファインチューニング手法を調査し、これらの手法がMLPの重みを最小限に変換して、安全でない入力をその重みの零空間に特異的に整列させることを示す有力な証拠を提供する。その結果、モデルが安全とみなすかどうかに基づいて入力がクラスタリングされる。これに対応して、敵対的入力(例えばジェイルブレイク)が与えられると、その活性化は安全なサンプルにより近くなり、モデルはそのような入力を安全であるかのように処理してしまう。可能な限り、実世界のモデル、具体的にはLlama-2 7BとLlama-3 8Bでこれらの知見を検証する。

Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") versus the specific concepts the task is asked to be performed upon (e.g., a "cycle" vs. a "bomb"). Using this, we investigate three well-known safety fine-tuning methods -- supervised safety fine-tuning, direct preference optimization, and unlearning -- and provide significant evidence demonstrating that these methods minimally transform MLP weights to specifically align unsafe inputs into its weights' null space. This yields a clustering of inputs based on whether the model deems them safe or not. Correspondingly, when an adversarial input (e.g., a jailbreak) is provided, its activations are closer to safer samples, leading to the model processing such an input as if it were safe. We validate our findings, wherever possible, on real-world models -- specifically, Llama-2 7B and Llama-3 8B.

翻訳日:2024-07-18 11:56:44 公開日:2024-07-16

# RecGS:リカレントガウススプラッティングによる水面コースティクスの除去

RecGS: Removing Water Caustic with Recurrent Gaussian Splatting ( http://arxiv.org/abs/2407.10318v2 )

ライセンス: Link先を確認

Tianyi Zhang, Weiming Zhi, Kaining Huang, Joshua Mangelson, Corina Barbalata, Matthew Johnson-Roberson,

(参考訳) 水面によるコースティクス(集光模様)は、浅海域の海底画像データでよく観察される。画像からコースティクスのパターンを除去する従来の方法は、2次元フィルタリングや注釈付きデータセットでの事前学習に依存することが多く、3次元構造を持つ実世界の海底データへ一般化する際の性能を妨げている。本稿では、今日のフォトリアリスティックな3次元再構成技術である3DGSを利用して、海底画像からコースティクスを分離する新しい手法Recurrent Gaussian Splatting(RecGS)を提案する。水中ロボットが撮影した一連の画像を用いて3DGSを再帰的に構築し、各反復でローパスフィルタリングによりコースティクスを分解する。実験では、同時最適化、2次元フィルタリング、深層学習アプローチなどのさまざまな手法を分析・比較する。その結果、本手法は海底からコースティクスを効果的に分離して視覚的な見た目を改善でき、照明が一貫しない他の問題にも適用できる可能性があることが示された。

Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering the performance when generalizing to real-world seafloor data with 3D structures. In this paper, we present a novel method, Recurrent Gaussian Splatting (RecGS), which takes advantage of today's photorealistic 3D reconstruction technology, 3DGS, to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustic with low-pass filtering in each iteration. In the experiments, we analyze and compare with different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our method can effectively separate the caustic from the seafloor, improving the visual appearance, and can potentially be applied to other problems with inconsistent illumination.

翻訳日:2024-07-18 11:56:44 公開日:2024-07-16

# 計算木論理によるシーケンシャルプランニングにおけるMCTS説明可能性の実現

Enabling MCTS Explainability for Sequential Planning Through Computation Tree Logic ( http://arxiv.org/abs/2407.10820v2 )

ライセンス: Link先を確認

Ziyan An, Hendrik Baier, Abhishek Dubey, Ayan Mukhopadhyay, Meiyi Ma,

(参考訳) モンテカルロ木探索(MCTS)は、シーケンシャルな計画タスクのための最も有能なオンライン検索アルゴリズムの1つであり、資源配分やトランジット計画といった分野において重要な応用がある。実世界のデプロイメントのパフォーマンスは高いが、MCTSの本質的な複雑さは、技術的なバックグラウンドのないユーザにとって理解を困難にしている。本稿では,MCTSを交通ルーティングサービスに利用し,最適化された経路計画を構築するためにアルゴリズムを統合することを検討する。これらの計画は、様々な制約と要件を同時に満たし、現実の文脈でアルゴリズムの操作を説明するタスクをさらに複雑にする必要がある。この重要な研究ギャップに対処するために、MCTSのための新しい計算木論理ベースの説明器を導入する。私たちのフレームワークは、ユーザ定義の要件を言語テンプレートを使って厳密なロジック仕様に翻訳することから始まります。そこで,本論文では,MCTSアルゴリズムでトラバースされた状態と動作を検証する論理検証と定量的評価モジュールを組み込んだ。この分析の結果は、第2の言語テンプレートを使用して、人間可読な記述テキストに変換される。アプローチのユーザ満足度を82名を対象に調査した。その結果,説明的アプローチはユーザの嗜好において,他のベースラインよりも有意に優れていた。

Monte Carlo tree search (MCTS) is one of the most capable online search algorithms for sequential planning tasks, with significant applications in areas such as resource allocation and transit planning. Despite its strong performance in real-world deployment, the inherent complexity of MCTS makes it challenging to understand for users without technical background. This paper considers the use of MCTS in transportation routing services, where the algorithm is integrated to develop optimized route plans. These plans are required to meet a range of constraints and requirements simultaneously, further complicating the task of explaining the algorithm's operation in real-world contexts. To address this critical research gap, we introduce a novel computation tree logic-based explainer for MCTS. Our framework begins by taking user-defined requirements and translating them into rigorous logic specifications through the use of language templates. Then, our explainer incorporates a logic verification and quantitative evaluation module that validates the states and actions traversed by the MCTS algorithm. The outcomes of this analysis are then rendered into human-readable descriptive text using a second set of language templates. The user satisfaction of our approach was assessed through a survey with 82 participants. The results indicated that our explanatory approach significantly outperforms other baselines in user preference.

翻訳日:2024-07-18 11:42:46 公開日:2024-07-16

# 地形モデルを用いたマラリアベクター飼育地の検出

Detection of Malaria Vector Breeding Habitats using Topographic Models ( http://arxiv.org/abs/2011.13714v2 )

ライセンス: Link先を確認

Aishwarya Jadhav,

(参考訳) マラリアベクターの繁殖地として機能する停滞した水域の処理は、ほとんどのマラリア除去キャンペーンの基本的なステップである。しかし、大規模な水域の特定は高価であり、労働集約的で時間を要するため、資源が限られている国では困難である。水体を効率的に発見できる実用的なモデルは、現場労働者がスキャンする必要がある領域を大幅に減らし、限られた資源を標的にすることができる。そこで本研究では,可能でグローバルで高解像度なDEMデータに基づく実用的な地形モデルを提案する。ガーナのオプアシ地域を調査し,様々な地形特性が異なる水域に与える影響を調査し,水生生物形成に大きな影響を及ぼす特徴を明らかにする。複数のモデルの有効性をさらに評価する。我々の最良モデルは、衛星画像データを利用し、異なる設定で堅牢性を示すものでさえも、小さな水面の検出に地形変数を用いた以前の試みよりも大幅に優れています。

Treatment of stagnant water bodies that act as a breeding site for malarial vectors is a fundamental step in most malaria elimination campaigns. However, identification of such water bodies over large areas is expensive, labour-intensive and time-consuming and hence, challenging in countries with limited resources. Practical models that can efficiently locate water bodies can target the limited resources by greatly reducing the area that needs to be scanned by the field workers. To this end, we propose a practical topographic model based on easily available, global, high-resolution DEM data to predict locations of potential vector-breeding water sites. We surveyed the Obuasi region of Ghana to assess the impact of various topographic features on different types of water bodies and uncover the features that significantly influence the formation of aquatic habitats. We further evaluate the effectiveness of multiple models. Our best model significantly outperforms earlier attempts that employ topographic variables for detection of small water sites, even the ones that utilize additional satellite imagery data and demonstrates robustness across different settings.

翻訳日:2024-07-18 00:37:39 公開日:2024-07-16

# 量子ヤン・ミルズ理論の公理 -- 1. ユークリッド公理(不完全)

Axioms for Quantum Yang-Mills Theories -- 1. Euclidean Axioms (incomplete) ( http://arxiv.org/abs/2112.08575v6 )

ライセンス: Link先を確認

Min C. Lee,

(参考訳) 本稿では、シュウィンガー関数の概念を量子ヤン・ミルズ理論に拡張し、それらが満たすべき公理を提案する。この公理スキームの2つの主な特徴は、ゲージ不変な共位置シュウィンガー函数の存在を仮定し、それらにのみ反射正の積を課すことである。これはゲージ不変量のみが物理的意味を与えられるというゲージ理論の基本原理に従っている。

This paper extends the notion of Schwinger functions to quantum Yang-Mills theories and proposes the axioms they should satisfy. The two main features of this axiom scheme are that we assume the existence of gauge-invariant co-located Schwinger functions and impose reflection positivity only on them. This is in accordance with the fundamental principle of gauge theories that only gauge-invariant quantities can be given physical meaning.

翻訳日:2024-07-18 00:37:39 公開日:2024-07-16

# ガウス過程回帰による未知の力学系の形式的検証

Formal Verification of Unknown Dynamical Systems via Gaussian Process Regression ( http://arxiv.org/abs/2201.00655v2 )

ライセンス: Link先を確認

John Skovbekk, Luca Laurenti, Eric Frew, Morteza Lahijanian,

(参考訳) 安全クリティカルなシナリオにおける自律システムの活用には、システムのダイナミクスに影響を与える不確実性やブラックボックスコンポーネントの存在下での行動を検証する必要がある。本研究では,インプット・アウトプット・データセットから,時間論理仕様に対する非モデル化された力学と雑音測定による離散時間力学システムの検証を行うフレームワークを開発する。検証フレームワークはガウス過程(GP)回帰を用いてデータセットから未知のダイナミクスを学習し、連続空間システムを有限状態で不確実なマルコフ決定過程(MDP)として抽象化する。この抽象化は、再現可能なカーネルヒルベルト空間解析を用いて、GP回帰の誤差による不確かさを捉える空間の離散化と遷移確率間隔、および離散化によって引き起こされる不確かさに依存する。このフレームワークは、既存のモデルチェックツールを使用して、特定の時間論理仕様に対して不確実なMDP抽象化を検証する。ノイズ測定結果から基礎システムへの抽象化結果の拡張の正当性を確立した。フレームワークの計算複雑性は、データセットのサイズと離散抽象の多項式であることを示す。複雑性分析は、検証結果の品質と、より大きなデータセットとより詳細な抽象化を扱うための計算負荷との間のトレードオフを示している。最後に,線形・非線形・切替力学系を用いたいくつかのケーススタディにおいて,学習・検証フレームワークの有効性を実証した。

Leveraging autonomous systems in safety-critical scenarios requires verifying their behaviors in the presence of uncertainties and black-box components that influence the system dynamics. In this work, we develop a framework for verifying discrete-time dynamical systems with unmodelled dynamics and noisy measurements against temporal logic specifications from an input-output dataset. The verification framework employs Gaussian process (GP) regression to learn the unknown dynamics from the dataset and abstracts the continuous-space system as a finite-state, uncertain Markov decision process (MDP). This abstraction relies on space discretization and transition probability intervals that capture the uncertainty due to the error in GP regression by using reproducing kernel Hilbert space analysis as well as the uncertainty induced by discretization. The framework utilizes existing model checking tools for verification of the uncertain MDP abstraction against a given temporal logic specification. We establish the correctness of extending the verification results on the abstraction created from noisy measurements to the underlying system. We show that the computational complexity of the framework is polynomial in the size of the dataset and discrete abstraction. The complexity analysis illustrates a trade-off between the quality of the verification results and the computational burden to handle larger datasets and finer abstractions. Finally, we demonstrate the efficacy of our learning and verification framework on several case studies with linear, nonlinear, and switched dynamical systems.

翻訳日:2024-07-18 00:37:39 公開日:2024-07-16

# Infinityにおける凸解析:アストラル空間入門

Convex Analysis at Infinity: An Introduction to Astral Space ( http://arxiv.org/abs/2205.03260v3 )

ライセンス: Link先を確認

Miroslav Dudík, Robert E. Schapire, Matus Telgarsky,

(参考訳) $\mathbb{R}^n$ 上の凸函数は、有限の最小化子を持つわけではない。本研究は,無限大におけるそのような最小化要因を理解するための理論を開発することを目的としている。無限遠点が加わったような$\mathbb{R}^n$のコンパクトな拡張であるアストラル空間について研究する。アストラル空間はできるだけ小さいように構成され、すべての線型函数が新しい空間へ連続的に拡張されることを保証する。アストラル空間は$\mathbb{R}^n$のすべてを含むが、これはベクトル空間ではない。しかし、凸性、共役性、および部分微分の概念の有用かつ有意義な拡張を可能にするには十分に構造化されている。我々はこれらの概念を開発し、アストラル空間上の凸関数の様々な特性を解析し、それらの最小化器の詳細な構造、連続性の正確な特徴付け、降下アルゴリズムの収束を含む。

Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a compact extension of $\mathbb{R}^n$ to which such points at infinity have been added. Astral space is constructed to be as small as possible while still ensuring that all linear functions can be continuously extended to the new space. Although astral space includes all of $\mathbb{R}^n$, it is not a vector space, nor even a metric space. However, it is sufficiently well-structured to allow useful and meaningful extensions of concepts of convexity, conjugacy, and subdifferentials. We develop these concepts and analyze various properties of convex functions on astral space, including the detailed structure of their minimizers, exact characterizations of continuity, and convergence of descent algorithms.

翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
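
A one-dimensional worked example may help make the opening claim of the astral-space abstract above concrete. It is not taken from the paper; the function $f$ and the sequence $x_t$ are chosen here purely for illustration.

```latex
\[
\begin{aligned}
f(x) &= e^{-x}, \qquad \inf_{x \in \mathbb{R}} f(x) = 0, \qquad f(x) > 0 \ \text{for all } x \in \mathbb{R}, \\
x_t &= t \quad (t = 1, 2, \dots) \quad \Longrightarrow \quad f(x_t) = e^{-t} \to 0 .
\end{aligned}
\]
```

Here $f$ is convex but has no finite minimizer: the infimum is only approached along a sequence heading to $+\infty$, which is the kind of point at infinity that the abstract says astral space adds to $\mathbb{R}^n$.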

# 安全強化学習による並行性制約付き経済派遣

Contingency-constrained economic dispatch with safe reinforcement learning ( http://arxiv.org/abs/2205.06212v3 )

ライセンス: Link先を確認

Michael Eichelbeck, Hannah Markgraf, Matthias Althoff,

(参考訳) 将来の電力システムは、分散化された再生可能エネルギー源とエネルギー貯蔵システムを多く含むマイクログリッドに大きく依存する。この文脈における高い複雑さと不確実性により、従来の配電戦略が実現不可能になる可能性がある。強化学習ベース(RL)コントローラは、この課題に対処することができるが、それ自体が安全保証を提供しておらず、実際にデプロイすることを防ぐことはできない。この制限を克服するために、経済派遣のための正式に検証されたRLコントローラを提案する。従来の制約を時間依存制約によって拡張する。セットベースの後方到達可能性分析を用いて一致制約を算出し、安全層を介してRLエージェントの動作を検証する。安全でないアクションは安全なアクション空間に投影され、制約付きゾノトペ集合表現を計算効率に活用する。本手法は実世界の実測値を用いた住宅利用事例で実証された。

Future power systems will rely heavily on microgrids with a high share of decentralised renewable energy sources and energy storage systems. The high complexity and uncertainty in this context might make conventional power dispatch strategies infeasible. Reinforcement learning (RL) based controllers can address this challenge but cannot themselves provide safety guarantees, which prevents their deployment in practice. To overcome this limitation, we propose a formally validated RL controller for economic dispatch. We extend conventional constraints by a time-dependent constraint encoding the islanding contingency. The contingency constraint is computed using set-based backwards reachability analysis, and actions of the RL agent are verified through a safety layer. Unsafe actions are projected into the safe action space while leveraging constrained zonotope set representations for computational efficiency. The developed approach is demonstrated on a residential use case using real-world measurements.

翻訳日:2024-07-18 00:37:39 公開日:2024-07-16
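
The safety-layer idea in the economic-dispatch abstract above (verify an action, and project it into the safe set if it is unsafe) can be sketched in a few lines. This is a deliberately simplified illustration, not the paper's method: the safe set is assumed to be an axis-aligned box rather than a constrained zonotope, and the names `project_to_box` and `safety_layer` are hypothetical.

```python
def project_to_box(action, lower, upper):
    """Euclidean projection onto a box: clip each coordinate to its bounds."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


def safety_layer(action, lower, upper):
    """Pass safe actions through unchanged; project unsafe ones into the box.

    Assumption: the safe action set is the box [lower, upper]. The paper
    instead uses constrained zonotopes obtained from reachability analysis.
    """
    is_safe = all(lo <= a <= hi for a, lo, hi in zip(action, lower, upper))
    return list(action) if is_safe else project_to_box(action, lower, upper)


# Example: the first action is outside the box, the second is already safe.
lower, upper = [-1.0, -1.0], [1.0, 1.0]
projected = safety_layer([2.0, -3.0], lower, upper)
unchanged = safety_layer([0.5, 0.25], lower, upper)
```

For a box, clipping is exactly the closest-point projection; for more general convex sets the projection would typically require solving a small optimization problem.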

# インタラクティブな固定効果を用いた線形多次元回帰

Linear multidimensional regression with interactive fixed-effects ( http://arxiv.org/abs/2209.11691v3 )

ライセンス: Link先を確認

Hugo Freeman,

(参考訳) 本稿では,3次元以上の多次元パネルデータに対する線形かつ付加的に分離可能なモデルについて検討する。 2つのアプローチは、観測された共変量に対する係数を推定する際に、これらの観測されていないインタラクティブな固定効果を考慮に入れていると考えられる。第一に、モデルは標準的な2次元パネルの枠組みに埋め込まれており、Bai (2009) における因子構造法がモデルパラメータの一貫した推定に繋がる制約を形成するが、収束速度は遅い。第2のアプローチでは、カーネル重み付き固定効果法を開発し、この問題の多次元的性質に対してより堅牢であり、特定の条件下での一貫性のパラメトリック速度を達成することができる。理論的な結果とシミュレーションは、インタラクティブな固定効果項の構造が知られている場合の標準的な2次元パネル法にいくつかの利点を示す一方で、カーネル重み付け法がこの構造を知らずにどのように機能するかを強調している。ビールの需要弾力性を推定する手法が提案されている。

This paper studies a linear and additively separable model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. Two approaches are considered to account for these unobserved interactive fixed effects when estimating coefficients on the observed covariates. First, the model is embedded within the standard two-dimensional panel framework and restrictions are formed under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters, but at slow rates of convergence. The second approach develops a kernel weighted fixed-effects method that is more robust to the multidimensional nature of the problem and can achieve the parametric rate of consistency under certain conditions. Theoretical results and simulations show some benefits to standard two-dimensional panel methods when the structure of the interactive fixed-effect term is known, but also highlight how the kernel weighted method performs well without knowledge of this structure. The methods are implemented to estimate the demand elasticity for beer.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16

# 量子シミュレーションのための高次積公式の改良

Greatly improved higher-order product formulae for quantum simulation ( http://arxiv.org/abs/2210.15817v2 )

ライセンス: Link先を確認

Mauro E. S. Morales, Pedro C. S. Costa, Giacomo Pantaleoni, Daniel K. Burgarth, Yuval R. Sanders, Dominic W. Berry,

(参考訳) ハミルトン進化のシミュレーションのための量子アルゴリズムは、しばしば積公式に基づいている。スズキのフラクタル法は、任意の高次積公式を見つける体系的な方法を与えるが、多くの指数関数をもたらす。一方、指数関数の少ない積公式は、同時非線形方程式の数値解によって見つけることができる。また、カーネルを繰り返し、プロセッサをシミュレーションの開始と終了にのみ適用する必要があるような処理によって、長時間シミュレーションのコストを削減することもできる。本研究では,8位と10位の両方の新しい積公式を数千個発見し,これらの式と先行文献の多くの公式を数値的に検証した。異なる長さと異なる順序の積公式を適切に比較する方法を提供する。システムパラメータ$T$ (time) と$\epsilon$ (allowable error) の8桁の精度で、他のテスト済み製品式よりも優れた性能を持つ8階目の製品公式が発見された。これには、量子アルゴリズムで使用されるパラメータの最も合理的な組み合わせが含まれる。

Quantum algorithms for simulation of Hamiltonian evolution are often based on product formulae. The fractal method of Suzuki gives a systematic way to find arbitrarily high-order product formulae, but results in a large number of exponentials. On the other hand, product formulae with fewer exponentials can be found by numerical solution of simultaneous nonlinear equations. It is also possible to reduce the cost of long-time simulations by processing, where a kernel is repeated and a processor need only be applied at the beginning and end of the simulation. In this work, we found thousands of new product formulae of both 8th and 10th order, and numerically tested these formulae, together with many formulae from prior literature. We provide methods to fairly compare product formulae of different lengths and different orders. We have found a new 8th order processed product formula with exceptional performance, that outperforms all other tested product formulae for about eight orders of magnitude in system parameters $T$ (time) and $\epsilon$ (allowable error). That includes most reasonable combinations of parameters to be used in quantum algorithms.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
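
To illustrate what the "order" of a product formula means in the abstract above, the following minimal sketch compares the textbook first-order (Lie-Trotter) and second-order (Strang) splittings on a single-qubit Hamiltonian $H = aX + bZ$. It does not reproduce the paper's 8th- or 10th-order formulae; the Hamiltonian, the coefficients and all function names are assumptions chosen so that the exact evolution is available in closed form using only the standard library.

```python
import math

I2 = ((1, 0), (0, 1))
X = ((0, 1), (1, 0))
Z = ((1, 0), (0, -1))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def exp_involution(p, theta):
    """exp(-i*theta*p) for a 2x2 matrix p with p*p = identity."""
    c, s = math.cos(theta), math.sin(theta)
    return tuple(
        tuple(c * I2[i][j] - 1j * s * p[i][j] for j in range(2)) for i in range(2)
    )


def exact_evolution(a, b, t):
    """exp(-i*t*(a*X + b*Z)), using (a*X + b*Z)^2 = (a^2 + b^2) * identity."""
    r = math.hypot(a, b)
    unit = tuple(
        tuple((a * X[i][j] + b * Z[i][j]) / r for j in range(2)) for i in range(2)
    )
    return exp_involution(unit, r * t)


def power(m, n):
    """n-th power of a 2x2 matrix by repeated multiplication."""
    out = I2
    for _ in range(n):
        out = matmul(out, m)
    return out


def first_order(a, b, t, n):
    """Lie-Trotter formula: (e^{-i a X dt} e^{-i b Z dt})^n with dt = t/n."""
    dt = t / n
    return power(matmul(exp_involution(X, a * dt), exp_involution(Z, b * dt)), n)


def second_order(a, b, t, n):
    """Strang formula: (e^{-i a X dt/2} e^{-i b Z dt} e^{-i a X dt/2})^n."""
    dt = t / n
    half = exp_involution(X, a * dt / 2)
    return power(matmul(matmul(half, exp_involution(Z, b * dt)), half), n)


def error(u, v):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(u[i][j] - v[i][j]) for i in range(2) for j in range(2))


a, b, t = 1.0, 0.7, 1.0
target = exact_evolution(a, b, t)
err1 = error(first_order(a, b, t, 10), target)
err2 = error(second_order(a, b, t, 10), target)
```

With the same number of steps, the second-order formula uses one extra exponential per step but gives a smaller error, and doubling the number of steps reduces its error by roughly a factor of four. Higher-order formulae push this trade-off between exponentials per step and error further, which is why the abstract stresses fair comparisons across lengths and orders.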

# コンセンサストラッキングとアグリゲーションゲームにおける結合制約による差分プライバシと収束精度の確保

Ensure Differential Privacy and Convergence Accuracy in Consensus Tracking and Aggregative Games with Coupling Constraints ( http://arxiv.org/abs/2210.16395v4 )

ライセンス: Link先を確認

Yongqiang Wang,

(参考訳) 共有結合制約を持つ完全分散集約ゲームに対する差分プライバシに対処する。一般化ナッシュ平衡(GNE)探索機構と微分プライバシ雑音注入機構を共同設計することにより、GNEへの証明可能な収束と厳密なエプシロン差分プライバシーを両立できる最初のGNE探索アルゴリズムを提案する。共同設計の基盤として,我々の知る限りでは達成されていない正確な追跡性能を維持しつつ,厳密なエプシロン差分プライバシーを実現するための新たなコンセンサス追跡アルゴリズムを提案する。収束解析を容易にするために,多数の最適化と変分問題の中核に位置する確率論的に摂動された非定常不動点反復過程に対する一般化結果も確立する。数値シミュレーションの結果,提案手法の有効性が確認された。

We address differential privacy for fully distributed aggregative games with shared coupling constraints. By co-designing the generalized Nash equilibrium (GNE) seeking mechanism and the differential-privacy noise injection mechanism, we propose the first GNE seeking algorithm that can ensure both provable convergence to the GNE and rigorous epsilon-differential privacy, even with the number of iterations tending to infinity. As a basis of the co-design, we also propose a new consensus-tracking algorithm that can achieve rigorous epsilon-differential privacy while maintaining accurate tracking performance, which, to our knowledge, has not been achieved before. To facilitate the convergence analysis, we also establish a general convergence result for stochastically-perturbed nonstationary fixed-point iteration processes, which lie at the core of numerous optimization and variational problems. Numerical simulation results confirm the effectiveness of the proposed approach.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
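
As background for the "differential-privacy noise injection" mentioned in the abstract above, here is a minimal sketch of the textbook Laplace mechanism. It is only a generic building block and is not the co-designed GNE seeking or consensus-tracking algorithm of the paper; the function names, the sensitivity value and the privacy budget are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value, sensitivity, epsilon, rng):
    """Release value plus Laplace noise with scale sensitivity / epsilon.

    Assumption: `sensitivity` bounds how much `value` can change when one
    participant's data changes; a smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
shared_value = privatize(10.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

The noise has zero mean, so averages of many noisy releases stay close to the true value, while any single release is obscured. Keeping both accuracy and privacy as the number of iterations grows is the harder problem the abstract addresses.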

# グローバルモーメント初期化による敵攻撃の伝達性向上

Boosting the Transferability of Adversarial Attacks with Global Momentum Initialization ( http://arxiv.org/abs/2211.11236v3 )

ライセンス: Link先を確認

Jiafeng Wang, Zhaoyu Chen, Kaixun Jiang, Dingkang Yang, Lingyi Hong, Pinxue Guo, Haijing Guo, Wenqiang Zhang,

(参考訳) ディープニューラルネットワーク(Deep Neural Networks, DNN)は、敵対的な例に対して脆弱である。同時に、敵の例はモデル間での転送可能性を示し、実用的なブラックボックス攻撃を可能にした。しかし、既存の手法では所望の転送攻撃性能を達成できない。本研究では、勾配最適化と整合性に着目し、勾配除去現象と局所運動量最適ジレンマを解析する。これらの課題に対処するために,Global Momentum Initialization (GI)を導入し,勾配除去を緩和するためのグローバルな運動量知識を提供する。具体的には、攻撃前に勾配前収束を行い、この段階でグローバル検索を行う。 GIは既存の転送方式とシームレスに統合され、最先端の防御機構により平均6.4%の転送攻撃の成功率を大幅に向上させる。最終的に、GIは画像とビデオの両方の攻撃領域で強力な転送可能性を示す。特に、画像領域における高度な防御方法を攻撃する場合、平均的な攻撃成功率は95.4%に達する。コードは https://github.com/Omenzychen/Global-Momentum-Initialization で入手できる。

Deep Neural Networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to the benign inputs. Simultaneously, adversarial examples exhibit transferability across models, enabling practical black-box attacks. However, existing methods are still incapable of achieving the desired transfer attack performance. In this work, focusing on gradient optimization and consistency, we analyse the gradient elimination phenomenon as well as the local momentum optimum dilemma. To tackle these challenges, we introduce Global Momentum Initialization (GI), providing global momentum knowledge to mitigate gradient elimination. Specifically, we perform gradient pre-convergence before the attack and a global search during this stage. GI seamlessly integrates with existing transfer methods, significantly improving the success rate of transfer attacks by an average of 6.4% under various advanced defense mechanisms compared to the state-of-the-art method. Ultimately, GI demonstrates strong transferability in both image and video attack domains. Particularly, when attacking advanced defense methods in the image domain, it achieves an average attack success rate of 95.4%. The code is available at https://github.com/Omenzychen/Global-Momentum-Initialization.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16

# 連続学習のための潜在スペクトル規則化

Latent Spectral Regularization for Continual Learning ( http://arxiv.org/abs/2301.03345v4 )

ライセンス: Link先を確認

Emanuele Frascaroli, Riccardo Benaglia, Matteo Boschini, Luca Moschella, Cosimo Fiorini, Emanuele Rodolà, Simone Calderara,

(参考訳) 生物の知性は、新しい知識が生涯にわたって収集されるにつれて有機的に成長するが、ニューラルネットワークは、変化するトレーニングデータ分布に直面すると破滅的なことを忘れる。リハーサルベースの連続学習(CL)アプローチは、この制限を克服するための汎用的で信頼性の高いソリューションとして確立されているが、突然の入力障害とメモリ制約は、それらの予測の一貫性を変えることが知られている。本研究では,学習者の潜伏空間の幾何学的特徴を調べた結果,異なるクラスにおけるリプレイされたデータポイントが次第に混在し,分類に干渉していることが判明した。そこで我々は,ラプラシアンスペクトルの弱要求を強制する幾何正則化器を提案し,分割挙動を推し進める。提案手法はCaSpeR-IL(Continuous Spectral Regularizer for Incremental Learning)と呼ばれ,任意のリハーサルベースのCLアプローチと簡単に組み合わせることができる。

While biological intelligence grows organically as new knowledge is gathered throughout life, Artificial Neural Networks forget catastrophically whenever they face a changing training data distribution. Rehearsal-based Continual Learning (CL) approaches have been established as a versatile and reliable solution to overcome this limitation; however, sudden input disruptions and memory constraints are known to alter the consistency of their predictions. We study this phenomenon by investigating the geometric characteristics of the learner's latent space and find that replayed data points of different classes increasingly mix up, interfering with classification. Hence, we propose a geometric regularizer that enforces weak requirements on the Laplacian spectrum of the latent space, promoting a partitioning behavior. Our proposal, called Continual Spectral Regularizer for Incremental Learning (CaSpeR-IL), can be easily combined with any rehearsal-based CL approach and improves the performance of SOTA methods on standard benchmarks.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16

# ODIM:Under-Fitted Generative Modelの類似による外部検出

ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models ( http://arxiv.org/abs/2301.04257v2 )

ライセンス: Link先を確認

Dongha Kim, Jaesung Hwang, Jongjin Lee, Kunwoong Kim, Yongdai Kim,

(参考訳) unsupervised outlier detection (UOD) 問題とは、インリアーとインリアーを含む訓練データからインリアーとインリアーのラベルを付けずにインリアーを識別するタスクである。完全に訓練された確率ベース深部生成モデル(DGM)を用いることで、不整合と外れ値の区別性能が低下することが広く認識されている。本研究は、DGMが慎重に不適合であることを前提として、UDDタスクの不整合を識別する強力な証拠となる可能性を主張する。我々のアプローチは、inlier-memorization(IM)エフェクトと呼ばれる新しい観測から始まる。そこで本研究では, IM効果(ODIM)を用いた外乱検出法を開発した。注目すべきなのは、ODIMは数回のアップデートしか必要とせず、計算効率が他のディープラーニングベースのアルゴリズムの何倍も高速であることだ。また、ODIMは、表、画像、テキストデータを含むデータの種類にかかわらず、アウトレーヤを良好にフィルタリングする。提案手法の優位性と効率性を検証するため,60近いデータセットに対して広範な実験分析を行った。

The unsupervised outlier detection (UOD) problem refers to the task of identifying inliers given training data that contain both outliers and inliers, without any labeled information about which is which. It has been widely recognized that using fully-trained likelihood-based deep generative models (DGMs) often results in poor performance in distinguishing inliers from outliers. In this study, we claim that the likelihood itself could serve as powerful evidence for identifying inliers in UOD tasks, provided that DGMs are carefully under-fitted. Our approach begins with a novel observation called the inlier-memorization (IM) effect: when training a deep generative model with data including outliers, the model initially memorizes inliers before outliers. Based on this finding, we develop a new method called the outlier detection via the IM effect (ODIM). Remarkably, the ODIM requires only a few updates, making it computationally efficient: at least tens of times faster than other deep-learning-based algorithms. Also, the ODIM filters out outliers excellently, regardless of the data type, including tabular, image, and text data. To validate the superiority and efficiency of our method, we provide extensive empirical analyses on close to 60 datasets.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16

# NeSIG: 計画問題生成のためのニューロシンボリックな学習方法

NeSIG: A Neuro-Symbolic Method for Learning to Generate Planning Problems ( http://arxiv.org/abs/2301.10280v2 )

ライセンス: Link先を確認

Carlos Núñez-Molina, Pablo Mesejo, Juan Fernández-Olivares,

(参考訳) 自動計画(Automated Planning)の分野では、マシンラーニングのトレーニングデータや、計画競合のベンチマークとして使用されるような、特定のドメインからの計画上の問題セットが必要になることが多い。ほとんどの場合、これらの問題は手動かドメイン固有のジェネレータによって生成され、人間の設計者に負担がかかる。本稿では,NeSIGを提案する。この知識を最大限に活用するために,有効で多種多様で解決が難しい計画問題を自動的に生成する,ドメインに依存しない最初の手法を提案する。マルコフ決定プロセスとして問題生成を定式化し、Deep Reinforcement Learning を用いて2つの生成ポリシーを訓練し、所望の特性の問題を発生させる。我々は3つの古典的ドメインについて実験を行い、手工芸のドメイン固有のインスタンスジェネレータと様々なアブリケーションに対するアプローチを比較した。結果は、NeSIGがドメイン固有のジェネレータよりもはるかに困難(幾何平均の15.5倍)な、有効で多様な問題を自動生成できることを示している。さらに、トレーニング中に見られる問題よりも大きな問題に一般化することができる。

In the field of Automated Planning there is often the need for a set of planning problems from a particular domain, e.g., to be used as training data for Machine Learning or as benchmarks in planning competitions. In most cases, these problems are created either by hand or by a domain-specific generator, putting a burden on the human designers. In this paper we propose NeSIG, to the best of our knowledge the first domain-independent method for automatically generating planning problems that are valid, diverse and difficult to solve. We formulate problem generation as a Markov Decision Process and train two generative policies with Deep Reinforcement Learning to generate problems with the desired properties. We conduct experiments on three classical domains, comparing our approach against handcrafted, domain-specific instance generators and various ablations. Results show NeSIG is able to automatically generate valid and diverse problems of much greater difficulty (15.5 times more on geometric average) than domain-specific generators, while simultaneously reducing human effort when compared to them. Additionally, it can generalize to larger problems than those seen during training.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16

# 深層学習を用いた医用画像分割のためのマスク処理による外レンズ補間の評価

Evaluation of Extra Pixel Interpolation with Mask Processing for Medical Image Segmentation with Deep Learning ( http://arxiv.org/abs/2302.11522v4 )

ライセンス: Link先を確認

Olivier Rukundo,

(参考訳) 現在のマスク処理操作は、隣接する(NN)補間のような余分なピクセルを生成しない補間アルゴリズムに依存しており、バイキュビック(BIC)やバイリニア(BIL)補間のような余分なピクセルを生成するアルゴリズムとは対照的である。前報では,NNを用いたマスク処理の代替手法を提案し,その効果が深層学習訓練結果に及ぼす影響を評価した。本研究では,BICベースの画像とマスク処理とBICとNNベースの画像とマスク処理の両方が,NNベースの画像とマスク処理に与える影響を評価した。評価の結果、BIC-BICモデルは8.9578 %(画像サイズ256 x 256)、1.0496 %(画像サイズ384 x 384)、NN-NNネットワークは8.3127 %(画像サイズ256 x 256)、0.2887 %(画像サイズ384 x 384)であった。

Current mask processing operations rely on interpolation algorithms that do not produce extra pixels, such as nearest neighbor (NN) interpolation, as opposed to algorithms that do produce extra pixels, like bicubic (BIC) or bilinear (BIL) interpolation. In a previous study, the author proposed an alternative approach to NN-based mask processing and evaluated its effects on deep learning training outcomes. In this study, the author evaluated the effects of both BIC-based image and mask processing and BIC-and-NN-based image and mask processing versus NN-based image and mask processing. The evaluation revealed that the BIC-BIC network showed an increase of 8.9578 % (image size 256 x 256) and 1.0496 % (image size 384 x 384) over the NN-NN network, whereas the NN-BIC network showed an increase of 8.3127 % (image size 256 x 256) and 0.2887 % (image size 384 x 384) over the NN-NN network.

翻訳日:2024-07-18 00:30:09 公開日:2024-07-16
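
The distinction drawn in the abstract above between interpolation that does and does not produce "extra pixels" can be seen on a single row of a binary mask. The sketch below is a one-dimensional illustration under assumed conventions (floor indexing for nearest neighbor, endpoint-aligned sampling for linear interpolation); the function names are hypothetical and this is not the author's processing pipeline.

```python
def resize_nearest(row, new_len):
    """Nearest-neighbor resize: every output value is copied from the input."""
    n = len(row)
    return [row[min(n - 1, int(i * n / new_len))] for i in range(new_len)]


def resize_linear(row, new_len):
    """Linear resize with aligned endpoints: outputs may be new in-between values."""
    n = len(row)
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)  # fractional source position
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * row[lo] + frac * row[hi])
    return out


mask_row = [0, 0, 1, 1]  # binary mask: background = 0, object = 1
nn_row = resize_nearest(mask_row, 7)
lin_row = resize_linear(mask_row, 7)
nn_labels = set(nn_row)    # only labels already present in the mask
lin_labels = set(lin_row)  # also contains an in-between value at the edge
```

Nearest-neighbor output contains only the original labels, whereas the linear version introduces an intermediate value at the object boundary. Such values are the extra pixels that a mask-processing step has to handle before the mask can be used as a segmentation label.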

# CompoDiff:Versatileの合成画像検索と遅延拡散

CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion ( http://arxiv.org/abs/2303.11916v4 )

ライセンス: Link先を確認

Geonmo Gu, Sanghyuk Chun, Wonjae Kim, HeeJae Jun, Yoohoon Kang, Sangdoo Yun,

(参考訳) 本稿では,ゼロショット合成画像検索(ZS-CIR)を遅延拡散で解くための新しい拡散モデルCompoDiffを提案する。また,CIRモデルをトレーニングするための1880万の参照画像,条件,および対応するターゲット画像三重項を含む,SynthTriplets18Mという新しい合成データセットも紹介した。 CompoDiffとSynthTriplets18Mは、小さなデータセットスケールと限られた条件による一般化性の低下など、従来のCIRアプローチの不足に対処している。 CompoDiffは、FashionIQ、CIRR、CIRCO、GeneCISを含む4つのZS-CIRベンチマークで新たな最先端を達成しているだけでなく、ネガティブテキストやイメージマスク条件などのさまざまな条件を受け入れることで、より汎用的で制御可能なCIRを実現している。 CompoDiffはまた、テキストと画像クエリ間の条件強度の制御性と、既存のCIRメソッドでは利用できない推論速度と性能のトレードオフも示す。コードとデータセットはhttps://github.com/navervision/CompoDiffで公開されている。

This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion. This paper also introduces a new synthetic dataset, named SynthTriplets18M, with 18.8 million reference images, conditions, and corresponding target image triplets to train CIR models. CompoDiff and SynthTriplets18M tackle the shortcomings of previous CIR approaches, such as poor generalizability due to the small dataset scale and the limited types of conditions. CompoDiff not only achieves a new state-of-the-art on four ZS-CIR benchmarks, including FashionIQ, CIRR, CIRCO, and GeneCIS, but also enables a more versatile and controllable CIR by accepting various conditions, such as negative text and image mask conditions. CompoDiff also shows the controllability of the condition strength between text and image queries and the trade-off between inference speed and performance, which are unavailable with existing CIR methods. The code and dataset are available at https://github.com/navervision/CompoDiff

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# LMExplainer: 知識の基盤と言語モデル

LMExplainer: Grounding Knowledge and Explaining Language Models ( http://arxiv.org/abs/2303.16537v3 )

ライセンス: Link先を確認

Zichen Chen, Jianda Chen, Yuanyuan Chen, Han Yu, Ambuj K Singh, Misha Sra,

(参考訳) GPT-4のような言語モデル(LM)は、AIアプリケーションにおいて重要であるが、不透明な意思決定プロセスは、特に安全クリティカルな領域において、ユーザの信頼を低下させる。 LMExplainerは,人間の直感的,理解可能な説明を通じて,LMの推論過程を明らかにする新しい知識基盤説明器である。大規模知識グラフ(KG)を用いたグラフアテンションネットワーク(GAT)を活用することで、LMExplainerは推論空間を正確に狭め、最も関連する知識にフォーカスするだけでなく、幻覚を減らし、解釈可能性を高めるために、構造化された検証可能な知識にその推論を基礎付ける。 LMExplainerは、透明性を高め、意思決定プロセスを合理化するために、人間の理解可能な説明を効果的に生成する。さらに、デバッグを説明に組み込むことで、開発の観点からLMを改善する専門的な提案を提供する。したがって、LMExplainerは、LMをユーザにとってよりアクセスしやすく、理解しやすいものにするための拡張である。我々は、CommonsenseQAやOpenBookQAといったベンチマークデータセット上でLMExplainerを評価し、既存のメソッドよりも優れていることを示す。 LMExplainerが生成した説明と他のモデルの説明を比較することで、我々のアプローチは推論プロセスのより包括的で明確な説明を提供することを示す。 LMExplainerは、LMの内部動作をより深く理解し、より信頼性が高く、透明で、公平なAIに向かっている。

Language models (LMs) like GPT-4 are important in AI applications, but their opaque decision-making process reduces user trust, especially in safety-critical areas. We introduce LMExplainer, a novel knowledge-grounded explainer that clarifies the reasoning process of LMs through intuitive, human-understandable explanations. By leveraging a graph attention network (GAT) with a large-scale knowledge graph (KG), LMExplainer not only precisely narrows the reasoning space to focus on the most relevant knowledge but also grounds its reasoning in structured, verifiable knowledge to reduce hallucinations and enhance interpretability. LMExplainer effectively generates human-understandable explanations to enhance transparency and streamline the decision-making process. Additionally, by incorporating debugging into the explanation, it offers expertise suggestions that improve LMs from a developmental perspective. Thus, LMExplainer stands as an enhancement in making LMs more accessible and understandable to users. We evaluate LMExplainer on benchmark datasets such as CommonsenseQA and OpenBookQA, demonstrating that it outperforms most existing methods. By comparing the explanations generated by LMExplainer with those of other models, we show that our approach offers more comprehensive and clearer explanations of the reasoning process. LMExplainer provides a deeper understanding of the inner workings of LMs, advancing towards more reliable, transparent, and equitable AI.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 全距離イジングモデルによるN$ spin-$1/2$システムにおける量子絡み合い、幾何学的および動的外観の相補性

Complementarity between quantum entanglement, geometrical and dynamical appearances in $N$ spin-$1/2$ system under all-range Ising model ( http://arxiv.org/abs/2304.05278v2 )

ライセンス: Link先を確認

Jamal Elfakir, Brahim Amghar, Abdallah Slaoui, Mohammed Daoud,

(参考訳) 幾何学科学の成長に伴い、現代の幾何学によって情報の世界を探索する手法を含め、幾何学的・位相的・動的特性と量子的絡み合いとの間には謎の曖昧な関係が常にある。幾何学は距離や曲率などの要素間の相互関係を研究するため、積分可能量子系の実用的で理解可能な記述をもたらす強力な構造を持つ情報科学を提供する。ここでは、これらの構造を全範囲イジングモデルの下でN$相互作用スピン-1/2$の物理系で探索する。系の力学により、関連する量子状態空間を定義するフビニ・スタディ計量を決定する。ガウス・ボンネットの定理の範囲内でガウス曲率を適用することで、ダンベル型構造と球面位相の両方を持つ閉2次元多様体上でその力学が生じることを証明した。系の進化過程に現れる幾何学的位相と位相的位相を十分に議論する。その後、時間-最適進化を達成して量子ブラキストロン問題を解く。一つ目は幾何学的な性質であり、その絡み合いレベルがフビニ・スタディ計量、ガウス曲率、幾何学的位相などの導出した幾何学的構造にどのように影響するかを探求する。 2つ目は動的性質であり、進化速度と関連するフビニ・スタディ距離に対する絡み合い効果に対処する。さらに、絡み合いの度合いにより、量子ブラキストロン問題を解く。

With the growth of geometric science, including the methods of exploring the world of information by means of modern geometry, there has always been a mysterious and fascinating link between geometric, topological and dynamical characteristics and quantum entanglement. Since geometry studies the interrelations between elements such as distance and curvature, it provides the information sciences with powerful structures that yield practically useful and understandable descriptions of integrable quantum systems. We explore here these structures in a physical system of $N$ interacting spin-$1/2$ particles under the all-range Ising model. By working out the system dynamics, we determine the Fubini-Study metric defining the relevant quantum state space. Applying Gaussian curvature within the scope of the Gauss-Bonnet theorem, we prove that the dynamics happens on a closed two-dimensional manifold having both a dumbbell-shape structure and a spherical topology. The geometric and topological phases appearing during the system evolution processes are discussed in detail. Subsequently, we resolve the quantum brachistochrone problem by achieving the time-optimal evolution. By restricting the whole system to a two spin-$1/2$ system, we investigate the relevant entanglement from two viewpoints. The first is of geometric nature and explores how the entanglement level affects derived geometric structures such as the Fubini-Study metric, the Gaussian curvature, and the geometric phase. The second is of dynamic nature and addresses the entanglement effect on the evolution speed and the related Fubini-Study distance. Further, depending on the degree of entanglement, we resolve the quantum brachistochrone problem.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16
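
For readers unfamiliar with the two geometric tools named in the abstract above, their standard textbook forms are recalled below. These are general definitions under one common normalisation (some authors include an extra constant factor in the metric), not formulas quoted from the paper; $|\psi\rangle$ denotes a normalised state, $K$ the Gaussian curvature, $M$ a closed surface and $\chi(M)$ its Euler characteristic.

```latex
\[
ds^2 = \langle d\psi \,|\, d\psi \rangle - \bigl| \langle \psi \,|\, d\psi \rangle \bigr|^2 ,
\qquad
\frac{1}{2\pi} \int_{M} K \, dA = \chi(M) .
\]
```

A closed surface with spherical topology has $\chi(M) = 2$, so computing the integrated curvature is one way to identify the topology that the abstract reports.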

# 基礎モデルに基づくシステム設計のための参照アーキテクチャ

A Reference Architecture for Designing Foundation Model based Systems ( http://arxiv.org/abs/2304.11090v5 )

ライセンス: Link先を確認

Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Jon Whittle,

(参考訳) ChatGPT、Gemini、その他の大規模言語モデルのリリースは、基礎モデルに大きな関心を集めている。ファンデーションモデルが将来のAIシステムの基本的なビルディングブロックになる、という広いコンセンサスがある。しかし、アーキテクチャ設計に関する体系的なガイダンスが不足している。特に、ファンデーションモデルの急速な機能向上は、最終的にはAIシステムの他のコンポーネントを吸収し、アーキテクチャ設計における境界の移動とインターフェースの進化の課題を提起する。さらに、基礎モデルをAIシステムに組み込むことは、不透明な性質と急速に進歩するインテリジェンスのために、責任と安全性に関する重要な懸念を提起する。これらの課題に対処するため,本論文では,基礎モデル時代におけるAIシステムのアーキテクチャ進化について,"境界モデル・アズ・ア・コネクタ"から"境界モデル・ア・ア・モノリシックアーキテクチャ"へ移行した。そこで本論文では,設計上の重要な決定事項を特定し,基礎モデルに基づくシステム設計のためのパターン指向参照アーキテクチャを提案する。このパターンは、関連するリスクを確保しながら、ファンデーションモデルの可能性を可能にする。

The release of ChatGPT, Gemini, and other large language models has drawn huge interest in foundation models. There is a broad consensus that foundation models will be the fundamental building blocks for future AI systems. However, there is a lack of systematic guidance on the architecture design. In particular, the rapidly growing capabilities of foundation models can eventually absorb other components of AI systems, posing challenges of moving boundaries and interface evolution in architecture design. Furthermore, incorporating foundation models into AI systems raises significant concerns about responsible and safe AI due to their opaque nature and rapidly advancing intelligence. To address these challenges, the paper first presents an architecture evolution of AI systems in the era of foundation models, transitioning from "foundation-model-as-a-connector" to "foundation-model-as-a-monolithic architecture". The paper then identifies key design decisions and proposes a pattern-oriented reference architecture for designing responsible foundation-model-based systems. The patterns can enable the potential of foundation models while managing the associated risks.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 量子オブジェクト間の変換のキャラクタリゼーション、量子特性の「完全性」、固定因数順序のない変換

Characterising transformations between quantum objects, 'completeness' of quantum properties, and transformations without a fixed causal order ( http://arxiv.org/abs/2305.01247v2 )

ライセンス: Link先を確認

Simon Milz, Marco Túlio Quintino,

(参考訳) 量子力学における多くの基本的および鍵的対象は、特定のアフィン/線型空間間の線型写像である。この構造には、状態、測定、チャネル、機器、非シグナリングチャネル、メモリを持つチャネルといった基本的な量子要素や、スーパーチャネル、量子コム、n時間プロセス、テスタ、プロセス行列といった特定の因果順序を尊重しない高次演算が含まれる。線形および半定値制約の観点でそれらの構造特性を導出し特徴付けることは、基礎的関連性だけでなく、量子オブジェクトの集合に対する数値最適化を可能にし、異なる概念とオブジェクト間のより単純な接続を可能にする上で重要な役割を担っている。ここでは、これらのプロパティを直接的で使いやすい方法で推論する一般的なフレームワークを提供する。現実的な量子力学的考察によって導かれるが、一般線型/ファイン空間間の写像に解析を拡張し、それらの性質を導出し、量子論によって明示的に禁止されていないが、まだあまり研究されていない集合を解析する可能性を開く。これらの結果と合わせて、量子力学などにおいて線形変換の特徴付けを必要とする全てのタスクに対して、汎用的で容易に適用可能なツールが得られる。提案手法の適用例として、高次量子変換において不定因性の存在が自然に出現し、入力空間の部分のみに非自明に振る舞う場合の「完全」な意味での性質を保たなければならない写像のキャラクタリゼーションのための簡単な戦略について論じる。

Many fundamental and key objects in quantum mechanics are linear mappings between particular affine/linear spaces. This structure includes basic quantum elements such as states, measurements, channels, instruments, non-signalling channels and channels with memory, and also higher-order operations such as superchannels, quantum combs, n-time processes, testers, and process matrices which may not respect a definite causal order. Deducing and characterising their structural properties in terms of linear and semidefinite constraints is not only of foundational relevance, but plays an important role in enabling the numerical optimisation over sets of quantum objects and allowing simpler connections between different concepts and objects. Here, we provide a general framework to deduce these properties in a direct and easy to use way. While primarily guided by practical quantum mechanical considerations, we also extend our analysis to mappings between general linear/affine spaces and derive their properties, opening the possibility for analysing sets which are not explicitly forbidden by quantum theory, but are still not much explored. Together, these results yield versatile and readily applicable tools for all tasks that require the characterisation of linear transformations, in quantum mechanics and beyond. As an application of our methods, we discuss how the existence of indefinite causality naturally emerges in higher-order quantum transformations and provide a simple strategy for the characterisation of mappings that have to preserve properties in a 'complete' sense, i.e., when acting non-trivially only on parts of an input space.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 室内のエレファント:自然言語処理研究におけるビッグデータの存在分析

The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research ( http://arxiv.org/abs/2305.02797v4 )

ライセンス: Link先を確認

Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karën Fort,

(参考訳) 自然言語処理(NLP)の深層学習手法の最近の進歩は、新たなビジネス機会を生み出し、NLP研究を産業発展に欠かせないものにしている。 NLPの分野では、政府や大学とともに大きなプレーヤーの1つとして、産業が研究に与える影響を追跡することが重要である。本研究では,NLPコミュニティにおける産業の存在を時間とともに定量化し,特徴付けることを目的とする。 78,187冊のNLP出版物と701冊のNLP出版物の包括的なメタデータを持つコーパスを用いて,90年代初め以降の分野における業界の存在を探求する。 NLP作家の業界における存在感は、過去5年間で急激な増加(2017年から2022年までの180%)を前に着実に推移している。いくつかの企業は出版物の大半を占め、助成金やインターンシップを通じて学術研究者に資金を提供している。本研究は,自然言語処理研究における産業の存在と影響が重要かつ急速に成長していることを示している。この研究は、この分野における産業の影響の透明性を高めることを求めている。

Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. Because industry is one of the big players in the field of NLP, together with governments and universities, it is important to track its influence on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors was steady before rising steeply over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 自由電子干渉計を用いたコヒーレント増幅超高速イメージング

Coherently amplified ultrafast imaging using a free-electron interferometer ( http://arxiv.org/abs/2305.04877v2 )

ライセンス: Link先を確認

Tomer Bucher, Harel Nahari, Hanan Herzig Sheinfux, Ron Ruimy, Arthur Niedermayr, Raphael Dahan, Qinghui Yan, Yuval Adiv, Michael Yannai, Jialin Chen, Yaniv Kurman, Sang Tae Park, Daniel J. Masiel, Eli Janzen, James H. Edgar, Fabrizio Carbone, Guy Bartal, Shai Tsesses, Frank H. L. Koppens, Giovanni Maria Vanacore, Ido Kaminer,

(参考訳) 物質とそのポラリトンの低エネルギー非平衡ダイナミクスに、高い空間分解能と時間分解能を同時に保ちながらアクセスすることは、近年の電子顕微鏡の大胆なフロンティアである。主な課題の1つは、振幅と位相情報を同時に分離しながら非常に弱い信号を取得する能力である。本稿では、光誘起電子変調に基づく顕微鏡手法であり、電子イメージングにおいて光近接場のコヒーレント増幅を可能にするFree-Electron Ramsey Imaging(FERI)を提案する。六方晶窒化ホウ素膜から作製したマイクロドラムの時間・空間・位相同時分解測定を行い、その中の2次元ポラリトン波束のサブサイクルダイナミクスを可視化した。位相分解測定により、ポラリトン波面上の渦・反渦特異点と、定在波の振幅プロファイルを模倣する進行波という興味深い現象が明らかになった。実験では, 従来の電子近接場イメージングと比較して近接場信号の20倍のコヒーレント増幅を示し, 数kV/mの電場振幅に対応する ~W/cm2 オーダーのピーク場強度を分解した。その結果、我々の研究は、現在は観察が難しい繊細な試料である生体試料や量子材料の時空間電子顕微鏡観察への道を開く。

Accessing the low-energy non-equilibrium dynamics of materials and their polaritons with simultaneous high spatial and temporal resolution has been a bold frontier of electron microscopy in recent years. One of the main challenges lies in the ability to retrieve extremely weak signals while simultaneously disentangling amplitude and phase information. Here, we present Free-Electron Ramsey Imaging (FERI), a microscopy approach based on light-induced electron modulation that enables coherent amplification of optical near-fields in electron imaging. We provide simultaneous time-, space-, and phase-resolved measurements of a micro-drum made from a hexagonal boron nitride membrane, visualizing the sub-cycle dynamics of 2D polariton wavepackets therein. The phase-resolved measurements reveal vortex-anti-vortex singularities on the polariton wavefronts, together with an intriguing phenomenon of a traveling wave mimicking the amplitude profile of a standing wave. Our experiments show a 20-fold coherent amplification of the near-field signal compared to conventional electron near-field imaging, resolving peak field intensities on the order of ~W/cm2, corresponding to field amplitudes of a few kV/m. As a result, our work paves the way for spatio-temporal electron microscopy of biological specimens and quantum materials, exciting yet delicate samples that are currently difficult to investigate.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 大規模言語モデルの時代における関係抽出の再検討

Revisiting Relation Extraction in the era of Large Language Models ( http://arxiv.org/abs/2305.05003v2 )

ライセンス: Link先を確認

Somin Wadhwa, Silvio Amir, Byron C. Wallace,

(参考訳) 関係抽出(RE)は、テキストからエンティティ間の意味的関係を推測する中核的なNLPタスクである。標準的な教師付きRE技術では、エンティティスパンを構成するトークンをタグ付けし、それらの関係を予測するモジュールを訓練する。最近の研究は、この問題をsequence-to-sequenceタスクとして扱い、エンティティ間の関係を、入力に条件付けて生成されるターゲット文字列として線形化する。ここでは、従来の研究よりも大きい言語モデル(GPT-3とFlan-T5 large)を用いて、標準的なREタスクの性能を様々なレベルの監督下で評価し、このアプローチの限界を押し広げる。我々は、正確なマッチングに頼る代わりに人間による評価を行うことにより、REに対する生成的アプローチの評価に固有の問題に対処する。この改良された評価では,(1) GPT-3 を用いたfew-shotプロンプトは SOTA に近い性能,すなわち既存の完全教師付きモデルとほぼ同等の性能を達成する。(2) Flan-T5 はfew-shot設定ではそれほど有能ではないが,Chain-of-Thought(CoT)スタイルの説明(GPT-3 で生成)を用いて教師付き微調整を行うことで SOTA の結果が得られる。私たちはこのモデルをREタスクの新しいベースラインとしてリリースします。

Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# ParamNet: 高速な多対一染色正規化のための動的パラメータネットワーク

ParamNet: A Dynamic Parameter Network for Fast Multi-to-One Stain Normalization ( http://arxiv.org/abs/2305.06511v3 )

ライセンス: Link先を確認

Hongtao Kang, Die Luo, Li Chen, Junbo Hu, Tingwei Quan, Shaoqun Zeng, Shenghua Cheng, Xiuli Liu,

(参考訳) 実際には、デジタル病理画像は様々な要因の影響を受けることが多く、色と明るさに非常に大きな違いが生じる。染色正規化(stain normalization)は、デジタル病理画像の色と明るさの違いを効果的に低減し、コンピュータ支援診断システムの性能を向上させる。従来の染色正規化法は1枚または数枚の参照画像に依存しているが、1枚または数枚の画像ではデータセット全体を適切に表現できない可能性がある。学習に基づく染色正規化法は汎用的な手法であるが、複雑なディープネットワークを使用するため、計算効率が大幅に低下するだけでなく、アーティファクトを導入するリスクもある。特殊なネットワーク構造を用いて計算効率と信頼性を向上させる研究もあるが、これらの手法はネットワーク容量が不足しているため、多対一の染色正規化に適用することは困難である。本研究では,動的パラメータネットワークを導入し,ParamNetと呼ばれる新しい染色正規化法を提案する。 ParamNetは、ネットワーク設計に動的パラメータ(畳み込み層の重みとバイアス)を導入することで、限られたネットワーク容量と計算効率の課題に対処する。これらのパラメータを効果的に活用することにより、ParamNetは、計算効率を維持しながら、染色正規化における優れた性能を達成する。その結果、ParamNetは100,000×100,000の全スライド画像(WSI)1枚を25秒以内に正規化できることがわかった。コードは https://github.com/khtao/ParamNet で入手できる。

In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images, but one or several images may not adequately represent the entire dataset. Although learning-based stain normalization methods are a general approach, they use complex deep networks, which not only greatly reduce computational efficiency, but also risk introducing artifacts. Some studies use specialized network structures to enhance computational efficiency and reliability, but these methods are difficult to apply to multi-to-one stain normalization due to insufficient network capacity. In this study, we introduce a dynamic-parameter network and propose a novel method for stain normalization, called ParamNet. ParamNet addresses the challenges of limited network capacity and computational efficiency by introducing dynamic parameters (weights and biases of convolutional layers) into the network design. By effectively leveraging these parameters, ParamNet achieves superior performance in stain normalization while maintaining computational efficiency. Results show that ParamNet can normalize one whole slide image (WSI) of 100,000x100,000 within 25s. The code is available at: https://github.com/khtao/ParamNet.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 言語間QA: コンテキスト内の言語間パフォーマンスをアンロックする鍵

Cross-lingual QA: A Key to Unlocking In-context Cross-lingual Performance ( http://arxiv.org/abs/2305.15233v3 )

ライセンス: Link先を確認

Sunkyoung Kim, Dayeon Ki, Yireun Kim, Jinsik Lee,

(参考訳) MLLM(Multilingual Large Language Model)は、コンテキスト内学習を通じて、言語間の重要な機能を示す。既存のアプローチは、典型的には、ソースまたはターゲット言語のいずれかで、モノリンガルなインコンテキストの例を構築します。しかし、コンテキスト内サンプル全体を対象言語に翻訳することは、コンテキスト整合性を損なう可能性があり、長いコンテキストパスの場合、コストがかかる。そこで本研究では,質問部と回答部のみを翻訳する言語間プロンプト手法であるクロスランガルQAを導入し,翻訳コストを削減した。 4つの類型的多言語ベンチマークの実験により、クロスランガルQAがモデルに効果的に刺激を与え、それらの言語間知識を引き出すことが示され、以前の単言語間プロンプトアプローチよりも優れていた。さらに,言語間実例を用いたオープンソースMLLMの高速化により,モデルスケールの増大とともに性能が向上することを示す。

Multilingual large language models (MLLMs) have demonstrated significant cross-lingual capabilities through in-context learning. Existing approaches typically construct monolingual in-context examples, either in the source or target language. However, translating entire in-context examples into the target language might compromise contextual integrity and be costly in the case of long-context passages. To address this, we introduce Cross-lingual QA, a cross-lingual prompting method that translates only the question and answer parts, thus reducing translation costs. Experiments on four typologically diverse multilingual benchmarks show that Cross-lingual QA prompting effectively stimulates models to elicit their cross-lingual knowledge, outperforming prior monolingual prompting approaches. Furthermore, we show that prompting open-source MLLMs with cross-lingual in-context examples enhances performance as the model scale increases.

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# FlexRound: トレーニング後の量子化のための要素分割に基づく学習可能なラウンドリング

FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization ( http://arxiv.org/abs/2306.00317v2 )

ライセンス: Link先を確認

Jung Hyun Lee, Jeonghoon Kim, Se Jung Kwon, Dongsoo Lee,

(参考訳) トレーニング後の量子化(PTQ)は、量子化対応トレーニングとは異なり、完全なトレーニングデータセットもエンドツーエンドトレーニングも必要としないため、リソースの限られたデバイスへのディープニューラルネットワークのデプロイで人気が高まっている。各層やブロックの出力を再構成するPTQスキームが量子化モデルの性能向上に有効であることが判明したため、最近の研究では、各層やブロックの出力をより良く再構成するための新しい重み丸めスキームを考案・学習するアルゴリズムが開発されている。本研究では,典型的な要素ごとの加算ではなく要素ごとの除算に基づく,PTQのための簡易かつ効果的な新しい重み丸め機構FlexRoundを提案する。これによりFlexRoundは、共通の量子化グリッドサイズと、事前学習した各重みに対する異なるスケールを共同で学習できる。要素ごとの除算によって導かれる微分の逆数則により、FlexRoundは本質的に、対応するスケールを更新する際に事前学習された重みを利用することができ、したがって、その大きさに応じて柔軟に事前学習された重みを量子化することができる。幅広いモデルやタスクにおいてFlexRoundの有効性を実証的に検証する。我々の知る限り、画像分類と自然言語理解だけでなく、自然言語生成についても総合的な実験を行ったのは本研究が初めてである。さらに,出力をブロック単位で再構成することで,大規模言語モデルを半精度のベースラインと比較して性能への影響を無視できる程度に抑えつつ効率的に量子化できることを初めて実証した。私たちのコードは https://github.com/onliwad101/FlexRound_LRQ で利用可能です。

Post-training quantization (PTQ) has been gaining popularity for the deployment of deep neural networks on resource-limited devices since, unlike quantization-aware training, neither a full training dataset nor end-to-end training is required at all. As PTQ schemes based on reconstructing each layer or block output turn out to be effective to enhance quantized model performance, recent works have developed algorithms to devise and learn a new weight-rounding scheme so as to better reconstruct each layer or block output. In this work, we propose a simple yet effective new weight-rounding mechanism for PTQ, coined FlexRound, based on element-wise division instead of typical element-wise addition such that FlexRound enables jointly learning a common quantization grid size as well as a different scale for each pre-trained weight. Thanks to the reciprocal rule of derivatives induced by element-wise division, FlexRound is inherently able to exploit pre-trained weights when updating their corresponding scales, and thus, flexibly quantize pre-trained weights depending on their magnitudes. We empirically validate the efficacy of FlexRound on a wide range of models and tasks. To the best of our knowledge, our work is the first to carry out comprehensive experiments on not only image classification and natural language understanding but also natural language generation. Moreover, we demonstrate, for the first time, that large language models can be efficiently quantized, with only a negligible impact on performance compared to half-precision baselines, achieved by reconstructing the output in a block-by-block manner. Our code is available at https://github.com/onliwad101/FlexRound_LRQ.
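
The contrast between a shared grid and division-based rounding can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `scale` array (standing in for the per-weight scales the abstract says are learned), and the toy values are all assumptions, and no learning is performed here.

```python
import numpy as np

def round_to_nearest(w, grid):
    """Baseline: snap each weight to the nearest multiple of a shared grid size."""
    return grid * np.round(w / grid)

def division_based_round(w, grid, scale):
    """Element-wise-division rounding (illustrative only).

    Each weight is divided by the shared grid size times its own scale
    before rounding, so a per-weight scale can move a weight to a
    different grid point than plain rounding would choose.
    """
    return grid * np.round(w / (grid * scale))

# Hypothetical toy values, chosen away from rounding ties.
weights = np.array([0.26, -0.74, 1.10])
grid_size = 0.5
unit_scale = np.ones_like(weights)
tuned_scale = np.array([1.2, 1.0, 1.0])  # stands in for learned scales

baseline = round_to_nearest(weights, grid_size)
same_as_baseline = division_based_round(weights, grid_size, unit_scale)
flexible = division_based_round(weights, grid_size, tuned_scale)
```

With all scales equal to one the two functions agree; changing a single scale moves only the corresponding weight to another grid point, which is the flexibility the abstract attributes to per-weight scales.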

翻訳日:2024-07-18 00:20:24 公開日:2024-07-16

# 対実予測セットを用いた意思決定支援システムの設計

Designing Decision Support Systems Using Counterfactual Prediction Sets ( http://arxiv.org/abs/2306.03928v3 )

ライセンス: Link先を確認

Eleni Straitouri, Manuel Gomez Rodriguez,

(参考訳) 分類タスクの意思決定支援システムは主に、正解ラベルの値を予測するように設計されている。しかし、その予測は完璧ではないため、これらのシステムは、その予測をいつ、どのように使って自分自身の予測を更新すべきかを人間の専門家に理解させる必要もある。残念なことに、これは困難であることがわかっている。この文脈で最近、別の種類の意思決定支援システムがこの課題を回避できるかもしれないと論じられている。これらのシステムは、単一のラベル予測を提供するのではなく、共形予測器を用いて構築されたラベル予測値の集合、すなわち予測セットを提供し、専門家に予測セットの中からラベル値を予測するよう強制的に求める。しかしながら、これらのシステムの設計と評価は、これまでのところ様式化された専門家モデルに依存しており、その有望さには疑問が残る。本稿では,このようなシステムの設計をオンライン学習の観点から再考し,専門家モデルを必要とせず,仮定もしない方法論を開発する。提案手法は,任意の共形予測器によって提供される予測セットのネスト構造と自然な反実仮想的単調性の仮定を利用して,バニラバンディットアルゴリズムと比較して,リグレットの指数的な改善を実現する。我々は、我々の方法論をいくつかの競合するベースラインと比較するために、大規模な被験者実験(n = 2,751)を行う。その結果, 予測セットに基づく意思決定支援システムにおいて, 専門家の主体性(agency)の度合いを制限することは, 専門家が常に自分自身の主体性を行使できるようにすることよりも高いパフォーマンスをもたらすことがわかった。我々は、被験者実験で収集したデータと、我々のシステムのオープンソース実装を https://github.com/Networks-Learning/counterfactual-prediction-sets で公開している。

Decision support systems for classification tasks are predominantly designed to predict the value of the ground truth labels. However, since their predictions are not perfect, these systems also need to make human experts understand when and how to use these predictions to update their own predictions. Unfortunately, this has proven challenging. In this context, it has been recently argued that an alternative type of decision support system may circumvent this challenge. Rather than providing a single label prediction, these systems provide a set of label prediction values constructed using a conformal predictor, namely a prediction set, and forcefully ask experts to predict a label value from the prediction set. However, the design and evaluation of these systems have so far relied on stylized expert models, questioning their promise. In this paper, we revisit the design of this type of system from the perspective of online learning and develop a methodology that neither requires nor assumes an expert model. Our methodology leverages the nested structure of the prediction sets provided by any conformal predictor and a natural counterfactual monotonicity assumption to achieve an exponential improvement in regret in comparison to vanilla bandit algorithms. We conduct a large-scale human subject study (n = 2,751) to compare our methodology to several competitive baselines. The results show that, for decision support systems based on prediction sets, limiting experts' level of agency leads to greater performance than allowing experts to always exercise their own agency. We have made available the data gathered in our human subject study as well as an open source implementation of our system at https://github.com/Networks-Learning/counterfactual-prediction-sets.
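
The nested structure of conformal prediction sets mentioned above can be sketched as follows. This is a generic split-conformal construction with the score 1 − p(label), assumed here for illustration; it is not taken from the authors' repository, and the function names and calibration data are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold from calibration scores 1 - p(true label)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    rank = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if rank > n:
        return 1.0  # too few calibration points: include every label
    return np.sort(scores)[rank - 1]

def prediction_set(probs, threshold):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    return {int(y) for y in np.flatnonzero(1.0 - probs <= threshold)}

# Hypothetical calibration data: 4 examples, 3 classes.
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.5, 0.4, 0.1]])
cal_labels = np.array([0, 1, 2, 1])
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)

test_probs = np.array([0.6, 0.3, 0.1])
small_set = prediction_set(test_probs, 0.5)
large_set = prediction_set(test_probs, 0.8)
```

Raising the threshold can only add labels, so the sets produced for one input are nested. The abstract states that the methodology exploits this nesting; how it does so is not shown here.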

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# PromptRobust: 敵対的プロンプトに対する大規模言語モデルのロバスト性評価に向けて

PromptRobust: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts ( http://arxiv.org/abs/2306.04528v5 )

ライセンス: Link先を確認

Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie,

(参考訳) 学界や業界全体にわたる大規模言語モデル(LLM)への依存度の増加は、その堅牢さをプロンプトに包括的に理解する必要がある。この重要なニーズに対応するために,LLMの弾力性を測定するために設計された頑健性ベンチマークであるPromptRobustを導入する。本研究は、文字、単語、文、意味といった複数のレベルにわたるプロンプトを標的とした、敵対的なテキスト攻撃を多用する。逆のプロンプトは、タイプミスやシノニムなどのユーザエラーを模倣することを目的としており、意味的整合性を維持しながら、LCMの結果にわずかな偏差がどの程度影響するかを評価することを目的としている。これらのプロンプトは、感情分析、自然言語推論、読書理解、機械翻訳、数学の問題解決など様々なタスクに使用される。本研究は,8つのタスクと13のデータセットに対して慎重に評価した4,788の逆のプロンプトを生成する。以上の結果から,現代のLDMは敵のプロンプトに対して堅牢ではないことが示唆された。さらに,素早い強靭性と伝達性の背後にあるミステリーを理解するための包括的解析を行った。次に、洞察に富んだ堅牢性分析と、即興的な構成のための実用的なレコメンデーションを提供し、研究者と日々のユーザーの両方に有益である。

The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors like typos or synonyms, aim to evaluate how slight deviations can affect LLM outcomes while maintaining semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. Furthermore, we present a comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# ロバストなセマンティックセグメンテーションモデルの信頼性評価と高速訓練に向けて

Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models ( http://arxiv.org/abs/2306.12941v2 )

ライセンス: Link先を確認

Francesco Croce, Naman D Singh, Matthias Hein,

(参考訳) 敵対的ロバスト性は、画像分類において、特に$\ell_\infty$脅威モデルについて広範囲に研究されてきたが、オブジェクト検出やセマンティックセグメンテーションといった関連タスクでの研究ははるかに少なく、これらのタスクでは攻撃が画像分類よりもはるかに難しい最適化問題となることがわかっている。我々は,精度とmIoUにおける異なる指標を最小化する,問題固有の新規攻撃をいくつか提案する。我々の攻撃のアンサンブルであるSEAは、既存の攻撃がセマンティックセグメンテーションモデルの堅牢性を大幅に過大評価していることを示している。驚くべきことに、セマンティックセグメンテーションモデルに対する既存の敵対的訓練の試みは、弱いか、あるいは全く堅牢でないことがわかった。従来の敵対的訓練のセマンティックセグメンテーションへの適応が失敗した理由を考察し、最近提案された堅牢なImageNetバックボーンを用いることで、PASCAL-VOCおよびより難しいADE20kにおいて、最大6分の1のトレーニング時間で敵対的に堅牢なセマンティックセグメンテーションモデルが得られることを示す。関連コードとロバストモデルはhttps://github.com/nmndeep/robust-segmentationで公開されている。

Adversarial robustness has been studied extensively in image classification, especially for the $\ell_\infty$-threat model, but significantly less so for related tasks such as object detection and semantic segmentation, where attacks turn out to be a much harder optimization problem than for image classification. We propose several problem-specific novel attacks minimizing different metrics in accuracy and mIoU. The ensemble of our attacks, SEA, shows that existing attacks severely overestimate the robustness of semantic segmentation models. Surprisingly, existing attempts at adversarial training for semantic segmentation models turn out to be weak or even completely non-robust. We investigate why previous adaptations of adversarial training to semantic segmentation failed and show how recently proposed robust ImageNet backbones can be used to obtain adversarially robust semantic segmentation models with up to six times less training time for PASCAL-VOC and the more challenging ADE20k. The associated code and robust models are available at https://github.com/nmndeep/robust-segmentation.
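
For readers unfamiliar with the mIoU metric that the attacks above target, a minimal per-image sketch follows. Benchmarks usually accumulate intersections and unions over a whole dataset before averaging, so this simplified version (with a hypothetical function name and toy label maps) is an assumption made for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for one pair of label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 2x2 label maps with two classes.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so a single mislabeled pixel already lowers the mean noticeably.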

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# DP-SGDでは感度が過大評価される

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD ( http://arxiv.org/abs/2307.00310v3 )

ライセンス: Link先を確認

Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot,

(参考訳) 差分プライベート確率的勾配降下法(DP-SGD)は、プライベートな深層学習における標準的アプローチである。 DP-SGDの現在のプライバシ分析は、いくつかの設定ではタイトであることが知られているが、いくつかの実証的な結果は、一般的なベンチマークデータセットでトレーニングされたモデルが、多くのデータポイントについて漏洩するプライバシが著しく少ないことを示唆している。しかし、過去の試みにもかかわらず、なぜそうなるのかという厳密な説明は得られていない。これは、これらのデータセット設定に限定した場合により厳しいプライバシー上界が存在するためなのか、あるいは特定のデータポイントに対して我々の攻撃が十分に強くないためなのか? 本稿では,DP-SGD に対する初のインスタンス単位(すなわち「データ依存」)の DP 解析を行う。我々の分析は、データセット内に類似した近傍点を持つ点が、外れ値よりも良好なデータ依存プライバシを享受するという直感を捉えている。形式的には、DP-SGDのステップごとのプライバシー分析を変更して、トレーニングデータセットから計算されたモデル更新の分布への依存性を導入する。我々はさらに、この新たなステップごとの分析を効果的に活用して、トレーニングの実行全体について推論する新しい合成定理を開発した。まとめると、我々の評価は、この新たなDP-SGD分析により、(一般的なベンチマークでトレーニングされた場合)多くのデータポイントについてDP-SGDが漏洩するプライバシが、現在のデータ非依存の保証よりも著しく少ないことを形式的に示せることを明らかにしている。これは、敵対者が取りうるトレーニングデータセットを十分に制御できない場合、プライバシ攻撃が多くのデータポイントに対して必ず失敗することを意味する。

Differentially private stochastic gradient descent (DP-SGD) is the canonical approach to private deep learning. While the current privacy analysis of DP-SGD is known to be tight in some settings, several empirical results suggest that models trained on common benchmark datasets leak significantly less privacy for many datapoints. Yet, despite past attempts, a rigorous explanation for why this is the case has not been reached. Is it because there exist tighter privacy upper bounds when restricted to these dataset settings, or are our attacks not strong enough for certain datapoints? In this paper, we provide the first per-instance (i.e., "data-dependent") DP analysis of DP-SGD. Our analysis captures the intuition that points with similar neighbors in the dataset enjoy better data-dependent privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints (when trained on common benchmarks) than the current data-independent guarantee. This implies privacy attacks will necessarily fail against many datapoints if the adversary does not have sufficient control over the possible training datasets.
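
As background for the analysis described above, the standard DP-SGD update (clip each per-example gradient, sum, add Gaussian noise, average) can be sketched in NumPy. This shows the commonly described algorithm, not the paper's per-instance analysis; the function name, argument names, and toy gradients are assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter vector (illustrative sketch)."""
    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # Add Gaussian noise whose scale is tied to the clipping norm (the sensitivity bound).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0],    # norm 5.0, clipped down to norm 1.0
                  [0.3, 0.4]])   # norm 0.5, left unchanged
noiseless = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                        noise_multiplier=0.0, rng=rng)
noisy = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                    noise_multiplier=1.0, rng=rng)
```

The noise scale here depends only on `clip_norm`, a worst-case bound on how much one example can change the summed gradient. The title's claim that sensitivity is often overestimated refers to this kind of worst-case bound being loose when gradients of similar points look alike.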

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# MOCA:masked Online Codebook Assignmentsの予測による自己指導型表現学習

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments ( http://arxiv.org/abs/2307.09361v2 )

ライセンス: Link先を確認

Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez,

(参考訳) 自己教師付き学習は、非常に大きな完全注釈付きデータセットを必要とするVision Transformerネットワークの貪欲な要求を緩和するために使用することができる。自己教師付き学習の異なるクラスは、例えばマスク付き画像モデリング戦略を用いた優れた文脈的推論特性を持つ表現か、あるいは例えばコントラスト的手法による画像摂動への不変性を持つ表現のいずれかを提供する。本研究では,(ピクセルレベルの細部ではなく)高レベルの特徴で定義された新しいマスク・アンド・予測目標を用いて,望ましい両方の特性を統一する,単一ステージかつスタンドアロンの手法MOCAを提案する。さらに,両方の学習パラダイムを相乗的かつ計算効率のよい方法で効果的に活用する方法を示す。これにより,従来の手法よりも少なくとも3倍高速なトレーニングで,low-shot設定における新たな最先端の結果と,各種評価プロトコルにおける強力な実験結果を得る。実装コードはhttps://github.com/valeoai/MOCAで提供します。

Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods. We provide the implementation code at https://github.com/valeoai/MOCA.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# HeightFormer:バードアイビューにおけるカメラのみの3次元物体検出のための余分なデータのない明示的な高さモデリング

HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View ( http://arxiv.org/abs/2307.13510v3 )

ライセンス: Link先を確認

Yiming Wu, Ruixiang Li, Zequn Qin, Xinhai Zhao, Xi Li,

(参考訳) 視覚に基づくバードアイビュー(Bird's Eye View, BEV)の表現は、自律運転のための新たな知覚定式化である。最大の課題は、マルチカメラ機能を備えたBEVスペースを構築することだ。従来のBEV表現生成手法に分割すると,そのほとんどはイメージビューの深度をモデル化するか,BEV空間の高さをモデル化するかの2つのタイプに分類される。本研究では、LiDARのような余分なデータを必要としないBEV空間における高さを明示的にモデル化し、モデリング深度と比較して任意のカメラリグやタイプを適合させることができることを提案する。理論的には,高さに基づく手法と深さに基づく手法の等価性を示す。自己再帰的手法で高さと不確実性をモデル化するHeightFormerを提案する。追加のデータがなければ、提案されたHeightFormerはBEVの高度を正確に見積もることができる。ベンチマークの結果,HeightFormerの性能はカメラのみの手法と比較してSOTAを実現していることがわかった。

Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Diving into all previous BEV representation generation methods, we found that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and can fit arbitrary camera rigs and types compared to modeling depths. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data, the proposed HeightFormer could estimate heights in BEV accurately. Benchmark results show that the performance of HeightFormer achieves SOTA compared with those camera-only methods.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# 一般化されたアンバイアス付きシーングラフ生成

Generalized Unbiased Scene Graph Generation ( http://arxiv.org/abs/2308.04802v2 )

ライセンス: Link先を確認

Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen,

(参考訳) 既存のUnbiased Scene Graph Generation (USGG) 手法は、概念レベルの不均衡を克服しながら、高周波クラスが希少なクラスの予測を支配している述語レベルの不均衡に対処することのみに焦点を当てている。実際、たとえ述語自体がバランスが取れているとしても、文脈の長い尾の分布(つまり主観と対象の組み合わせ)のために、その中に重要な概念不均衡が存在する。この概念レベルの不均衡は、主対象対が本質的に結合において複雑であるため、述語レベルの不均衡よりも広範で困難な問題を引き起こす。そこで我々は, 述語レベルと概念レベルの両不均衡を考慮に入れた, 一般化されたアンバイアスドシーングラフ生成(G-USGG)という新たな研究問題を紹介した。最後に,MCL(Multi-Concept Learning)フレームワークを提案する。 MCLはまず、異なる概念の量の観点から述語間の概念レベルの不均衡を定量化し、同じクラス内の複数の概念-プロトタイプとして表す。その後、概念正規化(CR)技術を用いて概念プロトタイプを効果的に学習する。さらに、異なる概念に対するバランスの取れた学習を実現するために、SGGモデルを誘導し、コンセプトプロトタイプのためのバランスのとれた表現を生成する、バランスのとれたプロトタイプメモリ(BPM)を導入する。 VG-SGGデータセットとOI-SGGデータセットのベンチマークモデルの性能向上に,我々のモデル非依存戦略の顕著な効果が実証された。

Existing Unbiased Scene Graph Generation (USGG) methods only focus on addressing the predicate-level imbalance, in which high-frequency classes dominate predictions of rare ones, while overlooking the concept-level imbalance. Actually, even if predicates themselves are balanced, there is still a significant concept-imbalance within them due to the long-tailed distribution of contexts (i.e., subject-object combinations). This concept-level imbalance poses a more pervasive and challenging issue compared to the predicate-level imbalance since subject-object pairs are inherently complex in combinations. Hence, we introduce a novel research problem: Generalized Unbiased Scene Graph Generation (G-USGG), which takes into account both predicate-level and concept-level imbalance. To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare, uncommon, and common concepts. MCL first quantifies the concept-level imbalance across predicates in terms of different amounts of concepts, representing them as multiple concept-prototypes within the same class. It then effectively learns concept-prototypes by applying the Concept Regularization (CR) technique. Furthermore, to achieve balanced learning over different concepts, we introduce the Balanced Prototypical Memory (BPM), which guides SGG models to generate balanced representations for concept-prototypes. Extensive experiments demonstrate the remarkable efficacy of our model-agnostic strategy in enhancing the performance of benchmark models on both VG-SGG and OI-SGG datasets, leading to new state-of-the-art achievements in two key aspects: predicate-level unbiased relation recognition and concept-level compositional generability.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# 変分量子回路による多変量積分

Multi-variable integration with a variational quantum circuit ( http://arxiv.org/abs/2308.05657v2 )

ライセンス: Link先を確認

Juan M. Cruz-Martinez, Matteo Robbiati, Stefano Carrazza,

(参考訳) 本研究では,量子回路を用いた多変数積分の評価手法を提案する。手順はまず、積分変数をパラメトリック回路に符号化する。得られた回路は、パラメータシフトルール法を用いて積分変数に対して導出される。導関数を表すオブザーバブルは、量子機械学習アプローチに従って、ターゲット積分関数の予測器として使用される。積分は、元の回路を評価することによって積分計算の基本定理を用いて推定される。再ロード戦略に従ってデータを埋め込み、多次元変数を回路のゲートに容易にエンコードし、回路を導出しながら個別にターゲットとして取り込むことができる。これらの手法は、関数を部分的に統合したり、トレーニングハイパースペース内でパラメトリック積分を高速に計算するために利用することができる。

In this work we present a novel strategy to evaluate multi-variable integrals with quantum circuits. The procedure first encodes the integration variables into a parametric circuit. The obtained circuit is then differentiated with respect to the integration variables using the parameter shift rule technique. The observable representing the derivative is then used as the predictor of the target integrand function following a quantum machine learning approach. The integral is then estimated using the fundamental theorem of integral calculus by evaluating the original circuit. Embedding data according to a reuploading strategy, multi-dimensional variables can be easily encoded into the circuit's gates and then individually taken as targets while differentiating the circuit. These techniques can be exploited to partially integrate a function or to quickly compute parametric integrands within the training hyperspace.
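
The core idea, training a model whose derivative matches the integrand and then reading the integral off the model itself, can be illustrated classically. In the sketch below a polynomial stands in for the parametric quantum circuit and least squares stands in for the training loop; both substitutions, the one-dimensional setting, and all names are assumptions made for illustration.

```python
import numpy as np

def fit_antiderivative(g, a, b, degree=6, n_samples=200):
    """Fit G(x) = sum_k c_k x^k (k = 1..degree) so that dG/dx matches g on [a, b]."""
    x = np.linspace(a, b, n_samples)
    powers = np.arange(1, degree + 1)
    # Derivative of the model: dG/dx = sum_k k * c_k * x^(k-1).
    design = powers * x[:, None] ** (powers - 1)
    coeffs, *_ = np.linalg.lstsq(design, g(x), rcond=None)

    def antiderivative(t):
        t = np.asarray(t, dtype=float)
        return np.sum(coeffs * t[..., None] ** powers, axis=-1)

    return antiderivative

# Integrate cos(x) over [0, pi/2]; the exact value is 1.
G = fit_antiderivative(np.cos, 0.0, np.pi / 2)
estimate = G(np.pi / 2) - G(0.0)
```

Once the derivative has been fitted, any integral over a sub-interval of the training range comes from two evaluations of `G`, which mirrors the abstract's remark about computing integrals quickly within the training region.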

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# CUPID:より正確な重複バグレポート検出のためのChatGPTの活用

CUPID: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection ( http://arxiv.org/abs/2308.10022v3 )

ライセンス: Link先を確認

Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, David Lo,

(参考訳) 重複バグレポート検出(DBRD)は、学術と産業の両方において長年の課題である。過去数十年にわたって、重複バグレポートをより正確に検出するための様々なアプローチが提案されてきた。近年のディープラーニングの進歩により、DBRDタスクに対処するためのディープラーニングベースのアプローチもいくつか提案されている。多くのバグレポートを持つバグリポジトリでは、ディープラーニングベースのアプローチが有望なパフォーマンスを示している。しかし、バグレポートが少ない(約1万件の)バグリポジトリでは、既存のディープラーニングアプローチは従来のアプローチよりもパフォーマンスが悪い。従来のアプローチにも制限がある。例えば、バグレポートのセマンティクスを捉えられないbag-of-wordsモデルに基づいているのが一般的だ。これらの課題に対処するため,従来のDBRDアプローチの性能向上のために,最先端の大規模言語モデル(LLM)を活用することを目指す。本稿では,最も性能の高い従来のDBRD手法(すなわちREP)と最先端のLLM(すなわちChatGPT)を組み合わせたCUPIDという手法を提案する。 3つのデータセット上でCUPIDを既存の3つのアプローチと比較して評価を行った。実験の結果、CUPIDは最先端の結果を達成し、解析されたすべてのデータセットでRecall Rate@10スコアが0.602から0.654の範囲に達した。特に、CUPIDは、これらのデータセットのRecall Rate@10において、従来の最先端アプローチを5%から8%改善している。 CUPIDはまた、最先端のディープラーニングベースのDBRDアプローチを最大82%上回った。

Duplicate bug report detection (DBRD) is a long-standing challenge in both academia and industry. Over the past decades, researchers have proposed various approaches to detect duplicate bug reports more accurately. With the recent advancement of deep learning, researchers have also proposed several deep learning-based approaches to address the DBRD task. In the bug repositories with many bug reports, deep learning-based approaches have shown promising performance. However, in the bug repositories with a smaller number of bug reports, i.e., around 10k, the existing deep learning approaches show worse performance than the traditional approaches. Traditional approaches have limitations, too, e.g., they are usually based on the bag-of-words model, which cannot capture the semantics of bug reports. To address these challenges, we seek to leverage a state-of-the-art large language model (LLM) to improve the performance of the traditional DBRD approach. In this paper, we propose an approach called CUPID, which combines the best-performing traditional DBRD approach (i.e., REP) with the state-of-the-art LLM (i.e., ChatGPT). We conducted an evaluation by comparing CUPID with three existing approaches on three datasets. The experimental results show that CUPID achieves state-of-the-art results, reaching Recall Rate@10 scores ranging from 0.602 to 0.654 across all the datasets analyzed. In particular, CUPID improves over the prior state-of-the-art approach by 5% - 8% in terms of Recall Rate@10 in the datasets. CUPID also surpassed the state-of-the-art deep learning-based DBRD approach by up to 82%.
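
The Recall Rate@10 metric quoted above can be sketched as the fraction of duplicate reports for which at least one true duplicate appears among the top k retrieved candidates. This definition is assumed here and may differ in detail from the paper's evaluation script; the function name and the report IDs are hypothetical.

```python
def recall_rate_at_k(ranked_candidates, true_duplicates, k=10):
    """Fraction of queries with at least one true duplicate in the top-k candidates.

    ranked_candidates: dict mapping a query report ID to its ranked candidate IDs.
    true_duplicates: dict mapping a query report ID to the set of its true duplicates.
    """
    hits = 0
    for query_id, ranking in ranked_candidates.items():
        if set(ranking[:k]) & true_duplicates[query_id]:
            hits += 1
    return hits / len(ranked_candidates)

# Hypothetical retrieval results for two duplicate reports.
ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
truth = {"q1": {"c"}, "q2": {"z"}}
rate_at_2 = recall_rate_at_k(ranked, truth, k=2)
rate_at_3 = recall_rate_at_k(ranked, truth, k=3)
```

Under this definition, a score of 0.654 would mean that for roughly 65% of duplicate reports a matching earlier report was listed within the first ten candidates.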

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# GADePo:文書レベル関係抽出のためのグラフ支援宣言型プーリング変換器

GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction ( http://arxiv.org/abs/2308.14423v3 )

ライセンス: Link先を確認

Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson,

(参考訳) 文書レベルの関係抽出は、典型的にはテキストベースのエンコーダと手書きプーリングヒューリスティックに頼り、エンコーダが学習した情報を集約する。本稿では,Transformerモデルの本質的なグラフ処理機能を活用し,アテンション重み計算における明示的なグラフ関係による情報収集を目的とした,手書きプーリング手法を入力に新しいトークンで置き換えることを提案する。本稿では,共同テキストグラフ変換モデルとグラフ支援型宣言型プール(GADePo)仕様を導入し,情報集約のための明示的かつ高レベルな命令を提供する。 GADePoによって、プールプロセスはドメイン固有の知識や望ましい結果によってガイドされるが、Transformerによってまだ学習され、より柔軟でカスタマイズ可能なプール戦略が実現される。提案手法は,多様なデータセットやモデルにまたがって評価し,手作業によるプール機能よりも一貫した優れた有望な結果が得られることを示す。

Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph relations in the computation of attention weights. We introduce a joint text-graph Transformer model and a graph-assisted declarative pooling (GADePo) specification of the input, which provides explicit and high-level instructions for information aggregation. GADePo allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customisable pooling strategies. We evaluate our method across diverse datasets and models and show that our approach yields promising results that are consistently better than those achieved by the hand-coded pooling functions.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# 高密度3次元視覚接地における言語・視覚融合を改善する4つの方法

Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding ( http://arxiv.org/abs/2309.04561v3 )

ライセンス: Link先を確認

Ozan Unal, Christos Sakaridis, Suman Saha, Luc Van Gool,

(参考訳) 3Dビジュアルグラウンディング(3D visual grounding)は、自然言語による記述が参照する3Dシーン内のオブジェクトをローカライズするタスクである。自律型屋内ロボティクスからAR/VRまで幅広い応用があり、このタスクは最近人気が高まっている。 3Dビジュアルグラウンディングに取り組むための一般的な定式化は、境界ボックスを介して局所化を行う検出によるグラウンディング(grounding-by-detection)である。しかし、物理的な相互作用を必要とする現実のアプリケーションでは、境界ボックスは対象の幾何学を十分に記述できない。そこで我々は,高密度な3Dビジュアルグラウンディング,すなわち参照表現に基づく3Dインスタンスセグメンテーションの問題に取り組む。本研究では,4つの新しいスタンドアロンモジュールを特徴とする高密度3DグラウンディングネットワークであるConcreteNetを提案する。これらのモジュールは,同一の意味クラスのディストラクタを伴う反復的なインスタンスに対するグラウンディング性能の向上を目的とする。まず,インスタンス間の関係的手がかりの曖昧さを解消することを目的としたボトムアップ注意融合モジュールを導入し,次に,潜在空間の分離を誘導する対照的な学習手法を構築し,さらに学習されたグローバルカメラトークンを用いてビュー依存の発話を解決し,最後に,参照マスクの品質を向上させるためにマルチビューアンサンブルを用いる。 ConcreteNetは、挑戦的なScanReferオンラインベンチマークで1位にランクインし、ICCV 3rd Workshop on Language for 3D Scenes "3D Object Localization"チャレンジで優勝した。私たちのコードはouenal.github.io/concretenet/で利用可能です。

3D visual grounding is the task of localizing the object in a 3D scene that is referred to by a description in natural language. With a wide range of applications ranging from autonomous indoor robotics to AR/VR, the task has recently risen in popularity. A common formulation to tackle 3D visual grounding is grounding-by-detection, where localization is done via bounding boxes. However, for real-life applications that require physical interactions, a bounding box insufficiently describes the geometry of an object. We therefore tackle the problem of dense 3D visual grounding, i.e. referral-based 3D instance segmentation. We propose a dense 3D grounding network ConcreteNet, featuring four novel stand-alone modules that aim to improve grounding performance for challenging repetitive instances, i.e. instances with distractors of the same semantic class. First, we introduce a bottom-up attentive fusion module that aims to disambiguate inter-instance relational cues. Next, we construct a contrastive training scheme to induce separation in the latent space. We then resolve view-dependent utterances via a learned global camera token. Finally, we employ multi-view ensembling to improve referred mask quality. ConcreteNet ranks 1st on the challenging ScanRefer online benchmark and has won the ICCV 3rd Workshop on Language for 3D Scenes "3D Object Localization" challenge. Our code is available at ouenal.github.io/concretenet/.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# スケーラブルニューラルネットワークによる粒子流イベント再構成の改良

Improved particle-flow event reconstruction with scalable neural networks for current and future particle detectors ( http://arxiv.org/abs/2309.06782v6 )

ライセンス: Link先を確認

Joosep Pata, Eric Wulff, Farouk Mokhtar, David Southwick, Mengke Zhang, Maria Girone, Javier Duarte,

(参考訳) 高輝度大型ハドロン衝突型加速器(High-Luminosity LHC)とFuture Circular Colliderで想定される高粒度検出器において粒子を再構成するには、効率的かつ正確なアルゴリズムが必要である。電子-陽電子衝突における事象再構成のためのスケーラブルな機械学習モデルについて, フル検出器シミュレーションに基づく検討を行った。粒子フロー再構成は、トラックとカロリメータクラスタを用いた教師付き学習タスクとして定式化することができる。グラフニューラルネットワークとカーネルベースのトランスフォーマーを比較し、現実的な再構成を実現しながら二次の演算を回避できることを実証する。ハイパーパラメータチューニングにより,モデルの性能が大幅に向上することを示す。最良のグラフニューラルネットワークモデルでは、ルールベースのアルゴリズムと比較して、ジェット横運動量分解能が最大50%向上している。得られたモデルはNvidia、AMD、Habanaのハードウェア間で移植可能である。高精度かつ高速な機械学習に基づく再構成は、衝突型加速器における将来の測定を大幅に改善することができる。

Efficient and accurate algorithms are necessary to reconstruct particles in the highly granular detectors anticipated at the High-Luminosity Large Hadron Collider and the Future Circular Collider. We study scalable machine learning models for event reconstruction in electron-positron collisions based on a full detector simulation. Particle-flow reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters. We compare a graph neural network and kernel-based transformer and demonstrate that we can avoid quadratic operations while achieving realistic reconstruction. We show that hyperparameter tuning significantly improves the performance of the models. The best graph neural network model shows improvement in the jet transverse momentum resolution by up to 50% compared to the rule-based algorithm. The resulting model is portable across Nvidia, AMD and Habana hardware. Accurate and fast machine-learning based reconstruction can significantly improve future measurements at colliders.

翻訳日:2024-07-18 00:10:39 公開日:2024-07-16

# 言語モデルの物理:その3.1,知識の蓄積と抽出

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction ( http://arxiv.org/abs/2309.14316v3 )

ライセンス: Link先を確認

Zeyuan Allen-Zhu, Yuanzhi Li,

(参考訳) 大規模言語モデル(LLM)は膨大な量の世界知識を格納することができ、その知識はしばしば質問応答によって抽出できる(例: 「エイブラハム・リンカーンの誕生日はいつか?」)。しかし、LLMはこうした質問に、トレーニング中に類似した質問に触れたこと(つまり不正行為)に基づいて答えているのだろうか、それともウィキペディアのような情報源から知識を抽出することを真に学習して答えているのだろうか? 本稿では,制御された伝記データセットを用いてこの問題を考察する。モデルが知識を抽出する能力と,トレーニングデータの様々な多様性尺度との間には,強い相関関係が認められた。本質的には、知識を確実に抽出するには、事前学習中に十分な拡張(言い換え、文のシャッフル、翻訳など)が必要である。このような拡張がなければ、知識は記憶されても抽出できず、その後の命令微調整に関わらず精度は0%になる。この理由を理解するために、我々は(ほぼ)線形なプロービングを用いて、観測された相関関係と、モデル内部での知識のエンコード方法(エンティティ名の隠れ埋め込みに線形にエンコードされているか、あるいはトレーニングテキスト中の他のトークンの埋め込みに分散されているか)との間に強い関係があることを示す。本論文は、産業界におけるLLM事前学習のためのいくつかの重要な推奨事項を提示する: (1) 小さな補助モデルを使って事前学習データを書き換え、知識拡張を提供すること、(2) 手遅れになる前に、より多くの命令微調整データを事前学習段階に組み込むこと。

Large language models (LLMs) can store a vast amount of world knowledge, often extractable via question-answering (e.g., "What is Abraham Lincoln's birthday?"). However, do they answer such questions based on exposure to similar questions during training (i.e., cheating), or by genuinely learning to extract knowledge from sources like Wikipedia? In this paper, we investigate this issue using a controlled biography dataset. We find a strong correlation between the model's ability to extract knowledge and various diversity measures of the training data. Essentially, for knowledge to be reliably extracted, it must be sufficiently augmented (e.g., through paraphrasing, sentence shuffling, translations) during pretraining. Without such augmentation, knowledge may be memorized but not extractable, leading to 0% accuracy, regardless of subsequent instruction fine-tuning. To understand why this occurs, we employ (nearly) linear probing to demonstrate a strong connection between the observed correlation and how the model internally encodes knowledge -- whether it is linearly encoded in the hidden embeddings of entity names or distributed across other token embeddings in the training text. This paper provides several key recommendations for LLM pretraining in the industry: (1) rewrite the pretraining data -- using small, auxiliary models -- to provide knowledge augmentation, and (2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 言語モデルの物理:その3.2,知識操作

Physics of Language Models: Part 3.2, Knowledge Manipulation ( http://arxiv.org/abs/2309.14402v2 )

ライセンス: Link先を確認

Zeyuan Allen-Zhu, Yuanzhi Li,

(参考訳) 言語モデルは膨大な事実知識を格納することができるが、この知識を下流のタスク(例えば、命令微調整を介して)に柔軟に活用する能力には疑問が残る。本稿では、検索(例: 「人物Aの属性Xは何か?」)、分類(例: 「Aの属性Xは偶数か奇数か?」)、比較(例: 「属性XにおいてAはBより大きいか?」)、逆探索(例: 「属性XがTに等しいのはどの人物か?」)の4つの基本的な知識操作タスクについて検討する。思考の連鎖(CoT)を学習と推論の両方に用いない限り,言語モデルは知識検索には優れるが,最も単純な分類や比較タスクにおいても苦戦することを示す。さらに,逆知識探索における性能は,プロンプトによらずほぼ0%である。本研究の主な貢献は、これらの弱点が言語モデルに内在するものであることを確認する、制御された合成実験である。すなわち、十分なトレーニングと十分なモデルサイズがあり、知識がモデルに完全に格納されている場合でも、言語モデルは事前学習データから得た知識を効率的に操作することができない。この知見はGPT-4のような現代の事前学習言語モデルにも当てはまり、人間と現代のAIを区別するための多くのチューリングテストを生み出す。

Language models can store vast factual knowledge, yet their ability to flexibly use this knowledge for downstream tasks (e.g., via instruction finetuning) remains questionable. This paper investigates four fundamental knowledge manipulation tasks: retrieval (e.g., "What is person A's attribute X?"), classification (e.g., "Is A's attribute X even or odd?"), comparison (e.g., "Is A greater than B in attribute X?"), and inverse search (e.g., "Which person's attribute X equals T?"). We show that language models excel in knowledge retrieval but struggle even in the simplest classification or comparison tasks unless Chain of Thoughts (CoTs) are employed during both training and inference. Moreover, their performance in inverse knowledge search is virtually 0%, regardless of the prompts. Our primary contribution is a controlled, synthetic experiment that confirms these weaknesses are inherent to language models: they cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored in the models, despite adequate training and sufficient model size. Our findings also apply to modern pretrained language models such as GPT-4, thus giving rise to many Turing tests to distinguish Humans from contemporary AIs.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# UltraFeedback: 大規模AIフィードバックによる言語モデルの強化

UltraFeedback: Boosting Language Models with Scaled AI Feedback ( http://arxiv.org/abs/2310.01377v2 )

ライセンス: Link先を確認

Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun,

(参考訳) 人間のフィードバックから学ぶことは、大規模言語モデル(LLM)を人間の好みに整合させる重要なテクニックとなっている。しかし、大量かつ高品質な人間のフィードバックの取得は、時間、労力、人間の能力がボトルネックとなり、結果として、現在のデータセットは規模が小さいか、トピックが限られている。これにより、フィードバック学習だけでなく、オープンソースコミュニティ内のアライメント研究も妨げられている。この問題に対処するために,人間のフィードバックを超えて,スケーラブルな代替手段として高品質なAIフィードバックを自動的に収集する方法を検討する。具体的には,フィードバックデータが効果を発揮するための重要な要因として,規模と多様性を特定する。そこで,我々はまず,幅広いユーザ・アシスタントインタラクションを包含するために,量と幅の両方で指示と応答を広げる。そして、より信頼性の高いAIフィードバックを得るために、アノテーションバイアスを軽減する一連のテクニックを慎重に適用する。最後に、大規模で高品質かつ多様なAIフィードバックデータセットであるUltraFeedbackを提示する。これは、25万件のユーザ・アシスタント会話に対する、様々な側面からの100万件を超えるGPT-4フィードバックを含む。 UltraFeedbackに基づいて、LLaMAベースのモデルをbest-of-nサンプリングと強化学習によって整合させ、チャットベンチマークで卓越した性能を示す。我々の研究は、強力なオープンソースのチャット言語モデルの構築におけるスケールされたAIフィードバックデータの有効性を検証し、将来のフィードバック学習研究の確かな基盤となる。我々のデータとモデルはhttps://github.com/thunlp/UltraFeedbackで利用可能です。

Learning from human feedback has become a pivotal technique in aligning large language models (LLMs) with human preferences. However, acquiring vast and premium human feedback is bottlenecked by time, labor, and human capability, resulting in small sizes or limited topics of current datasets. This further hinders feedback learning as well as alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and collect high-quality AI feedback automatically for a scalable alternative. Specifically, we identify scale and diversity as the key factors for feedback data to take effect. Accordingly, we first broaden instructions and responses in both amount and breadth to encompass a wider range of user-assistant interactions. Then, we meticulously apply a series of techniques to mitigate annotation biases for more reliable AI feedback. We finally present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset, which contains over 1 million pieces of GPT-4 feedback for 250k user-assistant conversations from various aspects. Built upon UltraFeedback, we align a LLaMA-based model by best-of-n sampling and reinforcement learning, demonstrating its exceptional performance on chat benchmarks. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models, serving as a solid foundation for future feedback learning research. Our data and models are available at https://github.com/thunlp/UltraFeedback.
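
Best-of-n sampling, which this abstract names as one of its alignment steps, can be illustrated in a few lines: draw n candidate responses and keep the one a reward model scores highest. This is only a sketch of the general technique; `best_of_n`, `toy_generate`, and `toy_reward` are hypothetical stand-ins, and nothing here reflects the authors' actual models or reward function.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Sample n responses and return the one the reward function scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))


# Toy stand-ins: a "generator" that replays canned replies and a
# "reward model" that simply prefers longer replies.
canned = iter(["ok", "a longer answer", "mid reply", "hi"])


def toy_generate(prompt: str) -> str:
    return next(canned)


def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))


best = best_of_n("How do I sort a list?", toy_generate, toy_reward, n=4)
```

In practice the generator would be a sampled language model and the reward function a learned preference model; the selection step stays the same.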

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 単一の相互作用をもつ自由フェルミオン系における情報スクランブリング

Information Scrambling in Free Fermion Systems with a Sole Interaction ( http://arxiv.org/abs/2310.07043v2 )

ライセンス: Link先を確認

Qucheng Gao, Pengfei Zhang, Xiao Chen,

(参考訳) 単一不純物の存在は、低温における量子多体系の輸送特性に大きな影響を与えることがよく確立されている。本研究では,量子情報力学の観点から,この問題に近い類似問題について検討する。我々は、自由フェルミオンのホッピング項と単一の相互作用からなるブラウン回路とクリフォード回路を構築する。両回路とも,演算子のスクランブリングが出現することが明らかとなった。特に、演算子の成長は、単一の点に局在したソース項の存在下での対称排他過程にマッピングすることができる。 1次元システムでは、演算子とエンタングルメントの両方が拡散的スケーリングを示す。逆に、オール・ツー・オールのホッピングによって特徴づけられるシナリオでは、演算子のサイズは指数関数的に成長し、エンタングルメントは時間とともに線形に増加する。

It is well established that the presence of a single impurity can have a substantial impact on the transport properties of quantum many-body systems at low temperature. In this work, we investigate a close analog of this problem from the perspective of quantum information dynamics. We construct Brownian circuits and Clifford circuits consisting of a free fermion hopping term and a sole interaction. In both circuits, our findings reveal the emergence of operator scrambling. Notably, the growth of the operator can be mapped to the symmetric exclusion process in the presence of a source term localized at a single point. We demonstrate that in the one-dimensional system, both the operator and entanglement exhibit diffusive scaling. Conversely, in scenarios characterized by all-to-all hopping, the operator's size undergoes exponential growth, while the entanglement exhibits a linear increase over time.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 汎用バックボーンネットワーク設計のための画像復元ネットワークの比較検討

A Comparative Study of Image Restoration Networks for General Backbone Network Design ( http://arxiv.org/abs/2310.11881v4 )

ライセンス: Link先を確認

Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, Chao Dong,

(参考訳) 様々な画像復元作業における深層モデルによる顕著な進歩にもかかわらず、既存の画像復元ネットワークはタスクの汎用性の観点からも課題に直面している。直感的な表現は、あるタスクで優れているネットワークは、他のタスクで満足な結果をもたらすのに失敗することが多い、ということである。この点を説明するために、5つの代表的ネットワークを選択し、5つの古典的画像復元タスクの比較研究を行う。まず、画像復元タスクとバックボーンネットワークの特徴について、詳細な説明を行う。次に、ベンチマーク結果を示し、様々なタスクにおける異なるモデルの性能格差の背景にある理由を分析する。この比較研究から,一般的な画像復元バックボーンネットワークは多様なタスクの機能的要件を満たす必要があることを示唆する。この原理に基づいて,新しい画像復元バックボーンネットワークであるX-Restormerを設計する。大規模な実験により、X-Restormerは優れたタスクの汎用性を有し、様々なタスクで最先端のパフォーマンスを達成することが示された。

Despite the significant progress made by deep models in various image restoration tasks, existing image restoration networks still face challenges in terms of task generality. An intuitive manifestation is that networks which excel in certain tasks often fail to deliver satisfactory results in others. To illustrate this point, we select five representative networks and conduct a comparative study on five classic image restoration tasks. First, we provide a detailed explanation of the characteristics of different image restoration tasks and backbone networks. Following this, we present the benchmark results and analyze the reasons behind the performance disparity of different models across various tasks. Drawing from this comparative study, we propose that a general image restoration backbone network needs to meet the functional requirements of diverse tasks. Based on this principle, we design a new general image restoration backbone network, X-Restormer. Extensive experiments demonstrate that X-Restormer possesses good task generality and achieves state-of-the-art performance across a variety of tasks.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 外部誘導による画像クラスタリング

Image Clustering with External Guidance ( http://arxiv.org/abs/2310.11989v3 )

ライセンス: Link先を確認

Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, Xi Peng,

(参考訳) クラスタリングのコアは、監視信号を構築するために、事前の知識を取り入れている。データコンパクト性に基づく古典的なk-平均から、自己スーパービジョンによって導かれる最近のコントラストクラスタリングまで、クラスタリング法の進化は本質的に監督信号の進行に対応している。現在、データから内部監視信号のマイニングに多大な努力が注がれている。それでも、クラスタリングに自然に寄与する意味記述のような豊富な外部知識は、残念なことに見過ごされている。本研究では,クラスタリングを誘導する新たな監視信号として外部知識を活用することを提案する。提案手法の実装と検証のために,WordNetのテキストセマンティクスを活用して画像クラスタリングを容易にする外部ガイド型クラスタリング手法(Text-Aided Clustering, TAC)を設計した。特に、TACは最初にWordNetの名詞を選択して検索し、特徴識別性を高めるために画像を最もよく区別する。そして、画像クラスタリング性能を向上させるために、TACは、相互にモダル近傍情報を蒸留することにより、テキストと画像のモダリティを協調する。実験によると、TACは、広く使用されている5つの画像クラスタリングベンチマークと、完全なImageNet-1Kデータセットを含む、より難しい3つのイメージクラスタリングベンチマークで、最先端のパフォーマンスを達成する。

The core of clustering is incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. At present, substantial efforts have been devoted to mining internal supervision signals from data. Nevertheless, the abundant external knowledge such as semantic descriptions, which naturally conduces to clustering, is regrettably overlooked. In this work, we propose leveraging external knowledge as a new supervision signal to guide clustering, even though it seems irrelevant to the given data. To implement and validate our idea, we design an externally guided clustering method (Text-Aided Clustering, TAC), which leverages the textual semantics of WordNet to facilitate image clustering. Specifically, TAC first selects and retrieves WordNet nouns that best distinguish images to enhance the feature discriminability. Then, to improve image clustering performance, TAC collaborates text and image modalities by mutually distilling cross-modal neighborhood information. Experiments demonstrate that TAC achieves state-of-the-art performance on five widely used and three more challenging image clustering benchmarks, including the full ImageNet-1K dataset.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 分布外検出のためのディープアンサンブルの再検討: 損失ランドスケープの視点

Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective ( http://arxiv.org/abs/2310.14227v2 )

ライセンス: Link先を確認

Kun Fang, Qinghua Tao, Xiaolin Huang, Jie Yang,

(参考訳) 分布内(In-Distribution, InD)データから分布外(Out-of-Distribution, OoD)サンプルを検出する既存のOoD検出手法は,主にディープニューラルネットワーク(DNN)の特徴,ロジット,勾配の差を探究する。本研究では,OoD検出を調べるために,損失ランドスケープとモードアンサンブルに基づく新しい視点を提案する。 DNNの最適化では、パラメータ空間に多くの局所最適解、すなわちモードが存在する。興味深いことに、これらの独立なモードは、いずれもInDデータ(トレーニングデータとテストデータ)では低損失領域に到達するが、OoDデータでは著しく異なる損失ランドスケープをもたらす。このような観察は、損失ランドスケープからOoD検出を調査する新しい視点を提供し、さらに、これらのモード間でOoD検出性能が大きく変動することを示唆している。例えば、RankFeat手法のFPR値は5つのモードの間で46.58%から84.70%まで変化し、独立なモード間で検出性能の評価が不確かであることを示している。モード間におけるOoD損失ランドスケープのこのような多様性に動機づけられ,モードアンサンブルを通じてOoD検出のためのディープアンサンブル法を再検討し,性能の向上と,分散の低減というOoD検出器への利点を得た。様々なOoD検出器とネットワーク構造を網羅する広範な実験により、モード間の高い分散が示され、OoD検出を向上させるモードアンサンブルの優位性が検証された。我々は、OoDデータの損失ランドスケープにおける独立なモードという視点、およびOoD検出器のより信頼性の高い評価に、この研究が注目を集めることを期待している。

Existing Out-of-Distribution (OoD) detection methods aim to separate OoD samples from In-Distribution (InD) data mainly by exploring differences in features, logits, and gradients in Deep Neural Networks (DNNs). In this work, we propose a new perspective on OoD detection based on the loss landscape and mode ensembles. In the optimization of DNNs, there exist many local optima in the parameter space, namely modes. Interestingly, we observe that these independent modes all reach low-loss regions with InD data (training and test data), yet yield significantly different loss landscapes with OoD data. Such an observation provides a novel view to investigate OoD detection from the loss landscape, and further suggests significantly fluctuating OoD detection performance across these modes. For instance, FPR values of the RankFeat method can range from 46.58% to 84.70% among 5 modes, showing uncertain detection performance evaluations across independent modes. Motivated by such diversity in the OoD loss landscape across modes, we revisit the deep ensemble method for OoD detection through mode ensemble, leading to improved performance and benefiting the OoD detector with reduced variance. Extensive experiments covering varied OoD detectors and network structures illustrate high variance across modes and validate the superiority of mode ensemble in boosting OoD detection. We hope this work draws attention to independent modes in the loss landscape of OoD data and to more reliable evaluations of OoD detectors.
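
The intuition behind mode ensembling can be shown with a toy calculation: independently trained modes that each give a confident but different prediction on an OoD input produce a flat averaged distribution, while agreeing modes on an InD input stay confident. The sketch below assumes the ensemble averages softmax probabilities and swaps in the maximum softmax probability (MSP) as a simple stand-in detector; the abstract does not state the exact ensembling rule, and the logits are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability of a single mode (higher = more InD-like)."""
    return max(softmax(logits))


def mode_ensemble_score(logits_per_mode: Sequence[Sequence[float]]) -> float:
    """Average class probabilities over modes, then take the maximum."""
    probs = [softmax(logits) for logits in logits_per_mode]
    num_classes = len(probs[0])
    mean_probs = [
        sum(p[c] for p in probs) / len(probs) for c in range(num_classes)
    ]
    return max(mean_probs)


# Hypothetical logits from three independently trained modes.
ood_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]  # modes disagree
ind_logits = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [4.0, 0.0, 0.0]]  # modes agree

single_ood_scores = [msp_score(logits) for logits in ood_logits]
ensemble_ood = mode_ensemble_score(ood_logits)
ensemble_ind = mode_ensemble_score(ind_logits)
```

Each single mode is confident on the OoD input, but the ensemble score falls to the uniform level, which is the variance-reduction effect the abstract describes in qualitative terms.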

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 生成・検出のためのパラフレーズタイプ

Paraphrase Types for Generation and Detection ( http://arxiv.org/abs/2310.14863v3 )

ライセンス: Link先を確認

Jan Philip Wahle, Bela Gipp, Terry Ruas,

(参考訳) パラフレーズの生成と検出の現在のアプローチは、言語の複雑な言語特性を無視して、単一の一般的な類似点に大きく依存している。本稿では, パラフレーズ型, 特定のテキスト位置における特定の言語摂動を考慮した2つの新しい課題を提案する。これらのタスクをパラフレーズ型生成とパラフレーズ型検出と呼ぶ。以上の結果から,従来の手法は二項分類のシナリオ,すなわちパラフレーズ化の有無でよく機能するが,粒度の細かいパラフレーズ型の含みは大きな課題となることが示唆された。ほとんどのアプローチは、一般的な意味的類似コンテンツの生成と検出に長けているが、それらが操作する固有の言語変数を理解できない。パラフレーズ型の生成と識別について訓練されたモデルは、それらなしでのタスクの改善も示している。さらに、これらのモデルをスケールすることで、パラフレーズの型を理解する能力がさらに向上する。我々は、パラフレーズ型が将来、パラフレーズモデルの開発とタスクの解決のための新しいパラダイムを解き放つことができると考えている。

Current approaches in paraphrase generation and detection heavily rely on a single general similarity score, ignoring the intricate linguistic properties of language. This paper introduces two new tasks to address this shortcoming by considering paraphrase types - specific linguistic perturbations at particular text positions. We name these tasks Paraphrase Type Generation and Paraphrase Type Detection. Our results suggest that while current techniques perform well in a binary classification scenario, i.e., paraphrased or not, the inclusion of fine-grained paraphrase types poses a significant challenge. While most approaches are good at generating and detecting general semantic similar content, they fail to understand the intrinsic linguistic variables they manipulate. Models trained in generating and identifying paraphrase types also show improvements in tasks without them. In addition, scaling these models further improves their ability to understand paraphrase types. We believe paraphrase types can unlock a new paradigm for developing paraphrase models and solving tasks in the future.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 私たちは引用するものでできている: 自然言語処理と他の学術分野との間の影響の架け橋

We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields ( http://arxiv.org/abs/2310.14870v3 )

ライセンス: Link先を確認

Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad,

(参考訳) 自然言語処理(NLP)は、世界に大きな影響を与えようとしている。しかし、大きな進歩は大きなリスクを伴う。これに対処するには、様々な研究分野との幅広い関わりが必要である。しかし、そのような関わり(過去または現在)の状態を調べた実証的な研究はほとんどない。本稿では,23の研究分野とNLPとの間の(相互の)影響の程度を定量化する。我々は,約7.7万件のNLP論文,NLP論文から他の論文への約310万件の引用,および他の論文からNLP論文への約180万件の引用を分析した。その結果,ほとんどの分野とは異なり,提案するCitation Field Diversity Index(CFDI)で測定したNLPの分野横断的な関わりは,1980年の0.58から2022年には0.31(過去最低)に低下していることがわかった。さらに、NLPはますます閉鎖的になっており、NLP論文の引用が増え、分野間の架け橋として機能する論文が減っている。 NLPの引用はコンピュータサイエンスが大半を占め、言語学への引用は8%未満、数学と心理学への引用は3%未満である。これらの知見は,NLPが様々な分野との関わりを見直す緊急の必要性を浮き彫りにしている。

Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.
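
To make a diversity index over cited fields concrete, the sketch below computes a Gini-Simpson style score, 1 minus the sum of squared field shares. Treating CFDI as having this form is an assumption made here for illustration; the abstract does not give the formula, and the field labels and counts are invented toy data, not the paper's measurements.

```python
from collections import Counter
from typing import Iterable


def citation_field_diversity(cited_fields: Iterable[str]) -> float:
    """Gini-Simpson style diversity: 1 - sum of squared field shares.

    Returns 0.0 when every citation goes to one field and approaches 1.0
    as citations spread evenly over many fields.
    """
    counts = Counter(cited_fields)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Toy citation lists (hypothetical field labels, one entry per citation).
insular = ["CS"] * 8 + ["Linguistics"] + ["Math"]
balanced = ["CS", "Linguistics", "Math", "Psychology"]

insular_score = citation_field_diversity(insular)
balanced_score = citation_field_diversity(balanced)
```

A citation list dominated by one field scores low and an evenly spread list scores high, which matches the direction of the decline the abstract reports.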

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# PyTorchによるVMAFの再実装: いくつかの実験結果

VMAF Re-implementation on PyTorch: Some Experimental Results ( http://arxiv.org/abs/2310.15578v4 )

ライセンス: Link先を確認

Kirill Aistov, Maxim Koroteev,

(参考訳) 標準VMAF実装に基づいて,PyTorchフレームワークを用いたVMAFの実装を提案する。この実装について、標準 (libvmaf) と比較すると、VMAF単位における差は$\lesssim 10^{-2}$である。目的関数としてVMAFを使用する場合の勾配計算について検討し、この関数を用いたトレーニングが不利な勾配を生じさせないことを示す。実装はプレプロセスフィルタのトレーニングに使用される。その性能はアンシャープマスキングフィルタよりも優れていることが実証された。結果として得られるフィルタは実装も容易であり、ビデオ圧縮改善のためのビデオ処理タスクにも適用できる。これは数値実験の結果によって確認される。

Based on the standard VMAF implementation, we propose an implementation of VMAF using the PyTorch framework. For this implementation, comparisons with the standard (libvmaf) show a discrepancy of $\lesssim 10^{-2}$ in VMAF units. We investigate gradient computation when using VMAF as an objective function and demonstrate that training with this function does not result in ill-behaved gradients. The implementation is then used to train a preprocessing filter. Its performance is shown to be superior to that of the unsharp masking filter. The resulting filter is also easy to implement and can be applied in video processing tasks to improve video compression. This is confirmed by the results of numerical experiments.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 時間的事前分布を用いた自己注意: 時間の矢からもっと学べるか?

Self Attention with Temporal Prior: Can We Learn More from Arrow of Time? ( http://arxiv.org/abs/2310.18932v3 )

ライセンス: Link先を確認

Kyung Geun Kim, Byeong Tak Lee,

(参考訳) 自然界における多くの多様な現象は、特に時間の流れの方向から生じる短期的および長期的依存関係の両方を本質的にエンコードする。この点に関して、より近い時間スタンプでは、これらの事象の相互関係がより高いことを示す実験的証拠が発見された。しかし、注意に基づくモデルでこれらの規則を短期的な依存関係で学習するためには、大量のデータが必要である。これは、断片的な時間的依存を学ぶのに長けているが、注意に基づくモデルは時系列のバイアスをエンコードする構造を欠いているためである。そこで本研究では,学習可能な適応型カーネルをアテンション行列に直接適用することにより,これらのデータセットの短期的時間的バイアスをよりよく符号化する,シンプルで効率的な手法を提案する。我々はElectronic Health Records(EHR)データセットを用いた実験の様々な予測タスクを選択した。本実験は,ほとんどのタスクやデータセットにおいて,最高の性能を示すモデルと比較して,例外的な分類結果を示す。

Many diverse phenomena in nature inherently encode both short- and long-term temporal dependencies, which result in particular from the direction of the flow of time. In this respect, we discovered experimental evidence suggesting that interrelations of these events are higher for closer time stamps. However, attention-based models require large amounts of data, which are often unavailable, to learn these regularities in short-term dependencies. This is because, while they are good at learning piece-wise temporal dependencies, attention-based models lack structures that encode biases in time series. As a resolution, we propose a simple and efficient method that enables attention layers to better encode the short-term temporal bias of these data sets by applying learnable, adaptive kernels directly to the attention matrices. We chose various prediction tasks for the experiments using Electronic Health Records (EHR) data sets since they are great examples with underlying long- and short-term temporal dependencies. Our experiments show exceptional classification results compared to best-performing models on most tasks and data sets.
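
One way to picture a kernel applied directly to an attention matrix is to multiply each attention weight by a factor that decays with the time gap between the two positions and then renormalize each row. The sketch below uses an exponential decay with a fixed scale `tau`; both the exponential form and the fixed value are assumptions for illustration, since the abstract only says the kernels are learnable and adaptive, and in a real model `tau` would be a trained parameter.

```python
import math
from typing import List, Sequence


def softmax_row(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_temporal_prior(
    scores: Sequence[Sequence[float]],
    timestamps: Sequence[float],
    tau: float,
) -> List[List[float]]:
    """Reweight attention by exp(-|t_i - t_j| / tau) and renormalize rows."""
    weights = []
    for i, row in enumerate(scores):
        attention = softmax_row(row)
        # Kernel value is 1 at the query's own time stamp and decays with distance.
        kernel = [math.exp(-abs(timestamps[i] - t) / tau) for t in timestamps]
        biased = [a * k for a, k in zip(attention, kernel)]
        norm = sum(biased)
        weights.append([b / norm for b in biased])
    return weights


# Toy example: uniform raw scores, irregularly spaced time stamps.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
timestamps = [0.0, 1.0, 5.0]
weights = attention_with_temporal_prior(scores, timestamps, tau=1.0)
```

With uniform raw scores, the resulting weights depend only on temporal distance, so each position attends most to itself and least to the most distant time stamp.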

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# TeacherLM: 魚を贈るよりも魚を教えること、言語モデリングも同じように

TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise ( http://arxiv.org/abs/2310.19019v3 )

ライセンス: Link先を確認

Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan,

(参考訳) 大規模言語モデル(LLM)は、様々なNLPタスクにおいて印象的な推論とデータ拡張能力を示す。しかし、小さなモデルはどうだろう? 本研究では,ほとんどのNLPサンプルに対して,関連する基礎知識,思考の連鎖,よくある誤りを注釈付けできるTeacherLM-7.1Bを提案する。これにより注釈は単なる答え以上のものとなり、他のモデルは「何」だけでなく「なぜ」を学ぶことができる。 TeacherLM-7.1BモデルはMMLUでゼロショットスコア52.3を達成し、100B以上のパラメータを持つほとんどのモデルを上回った。さらに注目すべきは、そのデータ拡張能力である。 TeacherLM-7.1Bに基づいて58個のNLPデータセットを拡張し,OPTおよびBLOOMシリーズの異なるパラメータ数を持つ様々な学生モデルをマルチタスク設定で教えた。実験結果から,TeacherLMが提供するデータ拡張が大きな利益をもたらしたことが示された。 TeacherLMシリーズのモデルと拡張データセットをオープンソースとして公開する予定である。

Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes annotation more than just an answer, thus allowing other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught various student models with different parameters from OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM has brought significant benefits. We will release the TeacherLM series of models and augmented datasets as open-source.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# フェデレーション・アンラーニングに関する調査 : 課題,方法,今後の方向性

A Survey on Federated Unlearning: Challenges, Methods, and Future Directions ( http://arxiv.org/abs/2310.20448v4 )

ライセンス: Link先を確認

Ziyao Liu, Yu Jiang, Jiyuan Shen, Minyi Peng, Kwok-Yan Lam, Xingliang Yuan, Xiaoning Liu,

(参考訳) 近年、「忘れられる権利」(RTBF)の概念は、デジタル信頼とAI安全のためのデータプライバシの重要な側面となり、要求に応じて個人の個人データの削除をサポートするメカニズムの提供が求められている。その結果、MLモデルが識別可能な情報を選択的に消去できるようにする機械アンラーニング(MU)が大きな注目を集めている。 MUから発展したフェデレーテッドアンラーニング(FU)は、フェデレーテッドラーニング(FL)設定におけるデータ消去の課題に対処するために登場し、FLモデルがFLクライアントまたはそのクライアントに関する識別可能な情報をアンラーニングできるようにする。それでも、フェデレーテッドラーニング特有の性質は、FU技術に固有の課題をもたらす。これらの課題のため、FUアルゴリズムの開発には専用の設計が必要となる。この分野には様々な概念や多数のフェデレーテッドアンラーニング手法が存在するが、FUの統一的なワークフローと専用設計はまだ十分に理解されていない。そこで本サーベイでは, FUにおける技術と方法論を包括的に掘り下げ, 基礎概念と原則の概観, 既存のフェデレーテッドアンラーニングアルゴリズムの評価, フェデレーテッドラーニングに合わせた最適化のレビューを行う。さらに、実用的応用について議論し、その限界を評価する。最後に、将来の研究への有望な方向性を概説する。

In recent years, the notion of "the right to be forgotten" (RTBF) has become a crucial aspect of data privacy for digital trust and AI safety, requiring the provision of mechanisms that support the removal of personal data of individuals upon their requests. Consequently, machine unlearning (MU), which allows an ML model to selectively eliminate identifiable information, has gained considerable attention. Evolving from MU, federated unlearning (FU) has emerged to confront the challenge of data erasure within federated learning (FL) settings, which empowers the FL model to unlearn an FL client or identifiable information pertaining to the client. Nevertheless, the distinctive attributes of federated learning introduce specific challenges for FU techniques. These challenges necessitate a tailored design when developing FU algorithms. While various concepts and numerous federated unlearning schemes exist in this field, the unified workflow and tailored design of FU are not yet well understood. Therefore, this comprehensive survey delves into the techniques and methodologies in FU, providing an overview of fundamental concepts and principles, evaluating existing federated unlearning algorithms, and reviewing optimizations tailored to federated learning. Additionally, it discusses practical applications and assesses their limitations. Finally, it outlines promising directions for future research.

翻訳日:2024-07-18 00:00:40 公開日:2024-07-16

# 蒸留言語モデルにおける容量ギャップの法則に向けて

Towards the Law of Capacity Gap in Distilling Language Models ( http://arxiv.org/abs/2311.07052v2 )

ライセンス: Link先を確認

Chen Zhang, Dawei Song, Zheyu Ye, Yan Gao,

(参考訳) 言語モデル(LM)蒸留は、大規模な教師LMに備わる知識を小さな学生LMに蒸留することを目的とした、注目の分野である。蒸留の有効性を最大化するために様々な方法が提案されているが、特に教師LMと学生LMの間にかなりの容量差がある場合、大きな課題が残っている。この問題は、しばしば容量ギャップの「呪い」と呼ばれ、より大きな教師が、より小さな教師から蒸留された学生よりも優れた学生をもたらすとは限らないことを示唆している。言い換えれば、教師のスケーリングの過程のどこかに、最良の学生をもたらす最適な教師が存在する可能性が高い。さらに悪いことに、以前の研究で示されているように、追加の計算なしには容量ギャップの呪いを解くことはできない。大規模LM(LLM)の文脈では、著しい追加計算なしに大きな教師を良い学生に蒸留することは不可能であるため、これまで実行可能であったアプローチの意義は大きく薄れる。しかし、話は一面的ではない。大きな教師を使うことが多くのリソースを要すると認識するのに遅すぎることはない。したがって、呪いを解くことに固執する代わりに、呪いをそのままにして、小さいが十分な教師を使うことは、おそらく問題ないはずである。さらに本論文では、スケーリング則の精神にならい、最適な教師のスケールが、様々なモデルアーキテクチャやデータスケールにわたって、学生のスケールとほぼ一貫して線形に相関していることを明らかにし、幸いにも呪いを容量ギャップの「法則」に変える。この法則に導かれて、LLaMA2-7Bから3Bの学生LM(MiniMAと呼ぶ)を蒸留した。 MiniMAは幅広い3Bの競合モデルより優れており、いくつかの7Bモデルとも競合しうることが示された。

Language model (LM) distillation is a trending area that aims to distil the knowledge residing in a large teacher LM into a small student one. While various methods have been proposed to maximize the effectiveness of the distillation, significant challenges persist, particularly when there is a substantial capacity gap between the teacher and student LMs. This issue, often referred to as the curse of capacity gap, suggests that a larger teacher does not necessarily result in a superior student compared to one distilled from a smaller teacher. In other words, there is likely an optimal teacher yielding the best student along the scaling course of the teacher. Even worse, the curse of capacity gap cannot be lifted without additional compute, as indicated in previous studies. In the context of large LMs (LLMs), previously viable approaches become much less meaningful, as it is impossible to distill a large teacher into a good student without notable additional compute. However, the story is not one-sided. It is never too late to recognize that using a large teacher is resource-demanding. Consequently, instead of insisting on lifting the curse, leaving the curse as is and using a small yet adequate teacher should arguably be fine. Even better, in this paper, we follow the spirit of scaling laws and reveal that the optimal teacher scale is almost consistently and linearly correlated with the student scale across different model architectures and data scales, fortunately turning the curse into a law of capacity gap. The law later guides us to distil a 3B student LM (termed MiniMA) from LLaMA2-7B. MiniMA is demonstrated to outperform a wide range of 3B competitors and can even compete with several 7B models.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# A Central Motor System Inspired Pre-training Reinforcement Learning for Robotic Control ( http://arxiv.org/abs/2311.07822v3 )

License: see the linked page

Pei Zhang, Zhaobo Hua, Jinliang Ding

Designing controllers that give multi-joint robots natural motor capabilities is a significant challenge. Animals, however, are born with basic motor abilities and can master various complex motor skills through acquired learning. Based on an analysis of the mechanism of the central motor system in mammals, we propose a novel pre-training reinforcement learning algorithm that enables robots to learn rich motor skills and apply them to complex task environments without relying on external data. We first design a skill-based network similar to the cerebellum by utilizing the selection mechanism of voluntary movements in the basal ganglia and the basic motor regulation ability of the cerebellum. Subsequently, by imitating the structure of advanced centers in the central motor system, we propose a high-level policy that generates different skill combinations, thereby enabling the robot to acquire natural motor abilities. We conduct experiments on 4 types of robots and 22 task environments, and the results show that the proposed method enables different types of robots to achieve flexible motor skills. Overall, our research provides a promising framework for the design of neural network motor controllers.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Frequentist Guarantees of Distributed (Non)-Bayesian Inference ( http://arxiv.org/abs/2311.08214v4 )

License: see the linked page

Bohan Wu, César A. Uribe

Motivated by the need to analyze large, decentralized datasets, distributed Bayesian inference has become a critical research area across multiple fields, including statistics, electrical engineering, and economics. This paper establishes frequentist properties, such as posterior consistency, asymptotic normality, and posterior contraction rates, for the distributed (non-)Bayesian inference problem among agents connected via a communication network. Our results show that, under appropriate assumptions on the communication graph, distributed Bayesian inference retains parametric efficiency while enhancing robustness in uncertainty quantification. We also explore the trade-off between statistical efficiency and communication efficiency by examining how the design and size of the communication graph impact the posterior contraction rate. Furthermore, we extend our analysis to time-varying graphs and apply our results to exponential family models, distributed logistic regression, and decentralized detection models.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Continuous Management of Machine Learning-Based Application Behavior ( http://arxiv.org/abs/2311.12686v2 )

License: see the linked page

Marco Anisetti, Claudio A. Ardagna, Nicola Bena, Ernesto Damiani, Paolo G. Panero

Modern applications are increasingly driven by Machine Learning (ML) models whose non-deterministic behavior affects the entire application life cycle, from design to operation. The pervasive adoption of ML urgently calls for approaches that guarantee a stable non-functional behavior of ML-based applications over time and across model changes. To this end, non-functional properties of ML models, such as privacy, confidentiality, fairness, and explainability, must be monitored, verified, and maintained. Existing approaches mostly focus on i) implementing solutions for classifier selection according to the functional behavior of ML models, and ii) finding new algorithmic solutions, such as continuous re-training. In this paper, we propose a multi-model approach that aims to guarantee a stable non-functional behavior of ML-based applications. An architectural and methodological approach is provided to compare multiple ML models showing similar non-functional properties and to select the model supporting stable non-functional behavior over time according to (dynamic and unpredictable) contextual changes. Our approach goes beyond the state of the art by providing a solution that continuously guarantees a stable non-functional behavior of ML-based applications, is ML algorithm-agnostic, and is driven by non-functional properties assessed on the ML models themselves. It consists of a two-step process working during application operation, where model assessment verifies non-functional properties of ML models trained and selected at development time, and model substitution guarantees continuous and stable support of non-functional properties. We experimentally evaluate our solution in a real-world scenario focusing on the non-functional property fairness.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Asymmetric Bethe Ansatz ( http://arxiv.org/abs/2311.15155v2 )

License: see the linked page

Steven G. Jackson, Hélène Perrin, Gregory E. Astrakharchik, Maxim Olshanii

The recently proposed exact quantum solution for two delta-function-interacting particles with a mass ratio of 3:1 in a hard-wall box [Y. Liu, F. Qi, Y. Zhang and S. Chen, iScience 22, 181 (2019)] violates the conventional necessary condition for Bethe Ansatz integrability, namely that the system must be reducible to a superposition of semi-transparent mirrors that is invariant under all the reflections it generates. In this article, we find a way to relax this condition: some of the semi-transparent mirrors of a known self-invariant mirror superposition can be replaced by perfectly reflecting ones, thus breaking the self-invariance. The proposed name for the method is Asymmetric Bethe Ansatz (Asymmetric BA). As a worked example, we study in detail the bound states of the nominally non-integrable system comprising a bosonic dimer in a delta-well. Finally, we show that the exact solution of the Liu-Qi-Zhang-Chen problem is a particular instance of the Asymmetric BA.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs ( http://arxiv.org/abs/2312.02638v3 )

License: see the linked page

Camillo Quattrocchi, Antonino Furnari, Daniele Di Mauro, Mario Valerio Giuffrida, Giovanni Maria Farinella

We consider the problem of transferring a temporal action segmentation system initially designed for exocentric (fixed) cameras to an egocentric scenario, where wearable cameras capture video data. The conventional supervised approach requires the collection and labeling of a new set of egocentric videos to adapt the model, which is costly and time-consuming. Instead, we propose a novel methodology that performs the adaptation by leveraging existing labeled exocentric videos and a new set of unlabeled, synchronized exocentric-egocentric video pairs, for which temporal action segmentation annotations do not need to be collected. We implement the proposed methodology with an approach based on knowledge distillation, which we investigate at both the feature level and the temporal action segmentation model level. Experiments on Assembly101 and EgoExo4D demonstrate the effectiveness of the proposed method against classic unsupervised domain adaptation and temporal alignment approaches. Without bells and whistles, our best model performs on par with supervised approaches trained on labeled egocentric data, without ever seeing a single egocentric label, achieving a +15.99 improvement in the edit score (28.59 vs 12.60) on the Assembly101 dataset compared to a baseline model trained solely on exocentric data. In similar settings, our method also improves the edit score by +3.32 on the challenging EgoExo4D benchmark. Code is available here: https://github.com/fpv-iplab/synchronization-is-all-you-need.
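The edit score quoted above is commonly computed in temporal action segmentation as a normalized Levenshtein distance between the predicted and ground-truth sequences of segment labels. The following is a minimal sketch under that assumption; it is not taken from the paper's evaluation code, and the function names `segments`, `levenshtein`, and `edit_score` are illustrative.

```python
def segments(frame_labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    segs = []
    for label in frame_labels:
        if not segs or segs[-1] != label:
            segs.append(label)
    return segs


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def edit_score(pred_frames, true_frames):
    """Segment-level edit score in [0, 100]; higher is better."""
    p, t = segments(pred_frames), segments(true_frames)
    if not p and not t:
        return 100.0
    return (1 - levenshtein(p, t) / max(len(p), len(t))) * 100
```

Because the score is computed on segment order rather than on frames, a prediction whose boundaries are shifted but whose segment order is correct still scores 100, while over-segmentation is penalized.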

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? ( http://arxiv.org/abs/2312.02672v3 )

License: see the linked page

Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella

In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-object interaction (HOI) detection. Through extensive experiments and comparative analyses on three egocentric datasets, VISOR, EgoHOS, and ENIGMA-51, our findings reveal how to exploit synthetic data for the HOI detection task when real labeled data are scarce or unavailable. Specifically, by leveraging only 10% of real labeled data, we achieve improvements in Overall AP compared to baselines trained exclusively on real data of +5.67% on EPIC-KITCHENS VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51. Our analysis is supported by a novel data generation pipeline and the newly introduced HOI-Synth benchmark, which augments existing datasets with synthetic images of hand-object interactions automatically labeled with hand-object contact states, bounding boxes, and pixel-wise segmentation masks. Data, code, and data generation tools to support future research are released at: https://fpv-iplab.github.io/HOI-Synth/.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# MotionCtrl: A Unified and Flexible Motion Controller for Video Generation ( http://arxiv.org/abs/2312.03641v2 )

License: see the linked page

Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan

Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and trajectories, which are appearance-free and minimally impact the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that can adapt to a wide array of camera poses and trajectories once trained. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of MotionCtrl over existing methods. Project Page: https://wzhouxiff.github.io/projects/MotionCtrl/

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# MEVG: Multi-event Video Generation with Text-to-Video Models ( http://arxiv.org/abs/2312.04086v2 )

License: see the linked page

Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim

We introduce a novel diffusion-based video generation method that generates a video showing multiple events, given multiple individual sentences from the user. Our method does not require a large-scale video dataset, since it uses a pre-trained diffusion-based text-to-video generative model without a fine-tuning process. Specifically, we propose a last frame-aware diffusion process to preserve visual coherence between consecutive videos, each consisting of different events, by initializing the latent and simultaneously adjusting noise in the latent to enhance the motion dynamics in a generated video. Furthermore, we find that iteratively updating the latent vectors by referring to all the preceding frames maintains the global appearance across the frames in a video clip. To handle dynamic text input for video generation, we utilize a novel prompt generator that converts coarse text messages from the user into multiple optimal prompts for the text-to-video diffusion model. Extensive experiments and user studies show that our proposed method is superior to other video-generative models in terms of temporal coherency of content and semantics. Video examples are available on our project page: https://kuai-lab.github.io/eccv2024mevg.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation ( http://arxiv.org/abs/2312.06409v3 )

License: see the linked page

Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Yifan Chen, Jianjiang Feng, Jie Zhou

Several methods have been proposed to estimate 3D human pose from multi-view images, achieving satisfactory performance on public datasets collected under relatively simple conditions. However, few approaches study the extraction of 3D human skeletons from multimodal inputs, such as RGB and point cloud data. To address this gap, we introduce LiCamPose, a pipeline that integrates multi-view RGB and sparse point cloud information to estimate robust 3D human poses from a single frame. We demonstrate the effectiveness of the volumetric architecture in combining these modalities. Furthermore, to circumvent the need for manually labeled 3D human pose annotations, we develop a synthetic dataset generator for pretraining and design an unsupervised domain adaptation strategy to train a 3D human pose estimator without manual annotations. To validate the generalization capability of our method, LiCamPose is evaluated on four datasets, including two public datasets, one synthetic dataset, and one challenging self-collected dataset named BasketBall, covering diverse scenarios. The results demonstrate that LiCamPose exhibits strong generalization performance and significant application potential. The code, generator, and datasets will be made available upon acceptance of this paper.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Advanced Model Consistency Restoration with Higher-Order Short-Cut Rules ( http://arxiv.org/abs/2312.09828v3 )

License: see the linked page

Lars Fritsche, Jens Kosiol, Alexander Lauer, Adrian Möller, Andy Schürr

Sequential model synchronisation is the task of propagating changes from one model to another correlated one to restore consistency. It is challenging to perform this propagation in a least-changing way that avoids unnecessary deletions (which might cause information loss). From a theoretical point of view, so-called short-cut (SC) rules have been developed that enable provably correct propagation of changes while avoiding information loss. However, to be able to react to every possible change, an infinite set of such rules might be necessary. In practice, only small sets of pre-computed basic SC rules have been used, severely restricting the kinds of changes that can be propagated without loss of information. In this work, we close that gap by developing an approach that computes the more complex SC rules required on-the-fly during synchronisation. These higher-order SC rules allow us to cope with more complex scenarios in which multiple changes must be handled in one step. We implemented our approach in the model transformation tool eMoflon. An evaluation shows that the overhead of computing higher-order SC rules on-the-fly is tolerable and at times even improves the overall performance. Beyond that, completely new scenarios can be dealt with without loss of information.

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# Path integral for the quartic oscillator: An accurate analytic formula for the partition function ( http://arxiv.org/abs/2312.09859v3 )

License: see the linked page

Michel Caffarel

In this work, an approximate analytic expression for the quantum partition function of the quartic oscillator described by the potential $V(x) = \frac{1}{2} \omega^2 x^2 + g x^4$ is presented. Using a path integral formalism, the exact partition function is approximated by the partition function of a harmonic oscillator with an effective frequency depending on both the temperature and the coupling constant $g$. By invoking a Principle of Minimal Sensitivity (PMS) of the path integral to the effective frequency, we derive a mathematically well-defined analytic formula for the partition function. Quite remarkably, the formula reproduces qualitatively and quantitatively the key features of the exact partition function. The free energy is accurate to a few percent over the entire range of temperatures and coupling strengths $g$. Both the harmonic ($g\rightarrow 0$) and classical (high-temperature) limits are exactly recovered. The divergence of the power series of the ground-state energy at weak coupling, characterized by a factorial growth of the perturbational energies, is reproduced, as is the functional form of the strong-coupling expansion along with accurate coefficients. Explicit accurate expressions for the ground- and first-excited-state energies, $E_0(g)$ and $E_1(g)$, are also presented.
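For orientation, the ingredients named above can be written out schematically. This is a sketch in assumed notation (units $\hbar = m = 1$, inverse temperature $\beta$, effective frequency $\Omega$, and the placeholder $Z_{\mathrm{approx}}$ are not taken from the abstract), and it does not reproduce the paper's final formula. It shows the harmonic-oscillator partition function, the generic PMS stationarity condition that would fix the effective frequency, and the two limits the abstract says are recovered.

```latex
% Harmonic oscillator of frequency \Omega
Z_{\Omega}(\beta) = \frac{1}{2\sinh(\beta\Omega/2)}

% PMS: choose \Omega where the approximation is stationary
\left.\frac{\partial Z_{\mathrm{approx}}(\beta, g; \Omega)}{\partial \Omega}\right|_{\Omega = \Omega^{*}(\beta, g)} = 0

% Harmonic limit (g \to 0) and classical (high-temperature) limit
Z \xrightarrow{\; g \to 0 \;} \frac{1}{2\sinh(\beta\omega/2)},
\qquad
Z_{\mathrm{cl}}(\beta) = \frac{1}{\sqrt{2\pi\beta}} \int_{-\infty}^{\infty} dx \; e^{-\beta V(x)}
```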

Translated: 2024-07-17 23:50:29 Published: 2024-07-16

# SPIRE: Semantic Prompt-Driven Image Restoration ( http://arxiv.org/abs/2312.11595v2 )

License: see the linked page

Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of prompt information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art, while also offering the flexibility of text-based control over the restoration effects.

Translated: 2024-07-17 23:40:44 Published: 2024-07-16

# Learning a Diffusion Model Policy from Rewards via Q-Score Matching ( http://arxiv.org/abs/2312.11752v3 )

License: see the linked page

Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma

Diffusion models have become a popular choice for representing actor policies in behavior cloning and offline reinforcement learning. This is due to their natural ability to optimize an expressive class of distributions over a continuous space. However, previous works fail to exploit the score-based structure of diffusion models and instead utilize a simple behavior cloning term to train the actor, limiting their ability in the actor-critic setting. In this paper, we present a theoretical framework linking the structure of diffusion model policies to a learned Q-function by relating the score of the policy to the action gradient of the Q-function. We focus on off-policy reinforcement learning and propose a new policy update method from this theory, which we call Q-score matching. Notably, this algorithm only needs to differentiate through the denoising model rather than the entire diffusion model evaluation, and policies converged through Q-score matching are implicitly multi-modal and explorative in continuous domains. We conduct experiments in simulated environments to demonstrate the viability of our proposed method and compare it to popular baselines. Source code is available from the project website: https://michaelpsenka.io/qsm.
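The relation described above can be written schematically as follows. This is a sketch in assumed notation rather than the paper's exact objective: the score network $s_\theta$, the critic $Q_\phi$, and the scale factor $\alpha$ are illustrative symbols that do not appear in the abstract.

```latex
% Policy score aligned with the action gradient of the Q-function
\nabla_a \log \pi_\theta(a \mid s) \;\approx\; \alpha \, \nabla_a Q_\phi(s, a)

% A matching loss of this form trains only the score (denoising) network
\mathcal{L}(\theta) = \mathbb{E}_{(s, a)}
  \left\| s_\theta(s, a) - \alpha \, \nabla_a Q_\phi(s, a) \right\|^2
```

Under this reading, the update only requires gradients of the score network at sampled state-action pairs, which is consistent with the statement that the algorithm differentiates through the denoising model rather than the full diffusion rollout.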

Translated: 2024-07-17 23:40:44 Published: 2024-07-16

# Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains ( http://arxiv.org/abs/2312.12098v2 )

License: see the linked page

Jaeyeul Kim, Jungwan Woo, Jeonghoon Kim, Sunghoon Im

In the realm of LiDAR-based perception, significant strides have been made, yet domain generalization remains a substantial challenge. Performance often deteriorates when models are applied to unfamiliar datasets with different LiDAR sensors or deployed in new environments, primarily due to variations in point cloud density distributions. To tackle this challenge, we propose a Density Discriminative Feature Embedding (DDFE) module, capitalizing on the observation that a single source LiDAR point cloud encompasses a spectrum of densities. The DDFE module is designed to extract density-specific features within a single source domain, facilitating the recognition of objects sharing similar density characteristics across different LiDAR sensors. In addition, we introduce a simple yet effective density augmentation technique aimed at expanding the spectrum of density in source data, thereby enhancing the capabilities of the DDFE. Our DDFE stands out as a versatile and lightweight domain generalization module. It can be seamlessly integrated into various 3D backbone networks, where it has demonstrated superior performance over current state-of-the-art domain generalization methods. Code is available at https://github.com/dgist-cvlab/MultiDensityDG.

Translated: 2024-07-17 23:40:44 Published: 2024-07-16

# NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields ( http://arxiv.org/abs/2312.13471v2 )

License: see the linked page

Jens Naumann, Binbin Xu, Stefan Leutenegger, Xingxing Zuo

We introduce a novel monocular visual odometry (VO) system, NeRF-VO, that integrates learning-based sparse visual odometry for low-latency camera tracking and a neural radiance scene representation for fine-detailed dense reconstruction and novel view synthesis. Our system initializes camera poses using sparse visual odometry and obtains view-dependent dense geometry priors from a monocular prediction network. We harmonize the scale of poses and dense geometry, treating them as supervisory cues to train a neural implicit scene representation. NeRF-VO demonstrates exceptional performance in both photometric and geometric fidelity of the scene representation by jointly optimizing a sliding window of keyframed poses and the underlying dense geometry, which is accomplished through training the radiance field with volume rendering. We surpass SOTA methods in pose estimation accuracy, novel view synthesis fidelity, and dense reconstruction quality across a variety of synthetic and real-world datasets while achieving a higher camera tracking frequency and consuming less GPU memory.

Translated: 2024-07-17 23:40:44 Published: 2024-07-16

# Sparse Training for Federated Learning with Regularized Error Correction ( http://arxiv.org/abs/2312.13795v2 )

License: see the linked page

Ran Greidi, Kobi Cohen

(参考訳) Federated Learning(FL)は、ディープニューラルネットワーク(DNN)モデルをトレーニングする上で大きなメリットがあるため、大きな関心を集めている。しかし、通信資源や計算資源は限られているため、FLシステムにおけるDNNモデルの訓練は、複雑なタスクにおける計算コストや通信コストの増大などの課題に直面している。スパーストレーニングスキームは、各クライアント(すなわちノード)送信の寸法を縮小するために注目される。具体的には,重要な更新のみをパラメータサーバ(PS)に送信し,残りをローカルに蓄積するという,エラー訂正手法によるスペーシングが有望な手法である。誤り訂正法は収束を損なうことなくクライアント対PSメッセージの大幅なスペーサー化レベルを達成することが示されているが、スペーサー化は安定化効果によりさらに未解決のままである。本稿では,FLARE(Federated Learning with Accumulated Regularized Embeddings)と呼ばれる新しいアルゴリズムを提案する。 FLAREでは,更新モデルの蓄積とFLプロセスへの埋め込みの正規化によるスパーストレーニング手法を提案する。 FLAREの性能は、多種多様な複雑なモデルに関する広範な実験を通じて検証され、顕著なスパーシリティレベル(現在の最先端の10倍以上の)を達成するとともに、精度が大幅に向上した。さらに、研究者や関連分野の開発者の利益のために、オープンソースのソフトウェアパッケージが開発されている。

Federated Learning (FL) has attracted much interest due to the significant advantages it brings to training deep neural network (DNN) models. However, since communication and computation resources are limited, training DNN models in FL systems faces challenges such as elevated computational and communication costs in complex tasks. Sparse training schemes are gaining increasing attention as a way to scale down the dimensionality of each client (i.e., node) transmission. Specifically, sparsification with error correction methods is a promising technique, where only important updates are sent to the parameter server (PS) and the rest are accumulated locally. While error correction methods have been shown to achieve a significant sparsification level of the client-to-PS message without harming convergence, pushing sparsity further remains unresolved due to the staleness effect. In this paper, we propose a novel algorithm, dubbed Federated Learning with Accumulated Regularized Embeddings (FLARE), to overcome this challenge. FLARE presents a novel sparse training approach via accumulated pulling of the updated models with regularization on the embeddings in the FL process, providing a powerful solution to the staleness effect, and pushing sparsity to an exceptional level. The performance of FLARE is validated through extensive experiments on diverse and complex models, achieving a remarkable sparsity level (10 times and more beyond the current state-of-the-art) along with significantly improved accuracy. Additionally, an open-source software package has been developed for the benefit of researchers and developers in related fields.
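The error-correction baseline described in this abstract (send only the important updates, accumulate the rest locally) can be sketched as follows. This is a minimal illustration of generic top-k sparsification with a local residual, not FLARE itself; the function name `sparsify_with_error_feedback` and the toy values are assumptions made for the example.

```python
import numpy as np

def sparsify_with_error_feedback(update, residual, k):
    """Keep the k largest-magnitude entries; accumulate the rest locally.

    Hypothetical helper for illustration; not taken from the FLARE package.
    """
    corrected = update + residual                      # add back what was withheld earlier
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of the k largest magnitudes
    message = np.zeros_like(corrected)
    message[idx] = corrected[idx]                      # only these entries go to the PS
    new_residual = corrected - message                 # everything else stays on the client
    return message, new_residual

residual = np.zeros(5)
update = np.array([0.5, -0.1, 0.05, 2.0, -0.3])
message, residual = sparsify_with_error_feedback(update, residual, k=2)
```

Entries withheld in one round are added back in the next, so no update is discarded, only delayed. That delay is the staleness the abstract refers to.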

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# 古典最適化ユニタリ回路による非平衡量子力学のスケーラブルシミュレーション

Scalable simulation of non-equilibrium quantum dynamics via classically optimised unitary circuits ( http://arxiv.org/abs/2312.14245v2 )

ライセンス: Link先を確認

Luke Causer, Felix Jung, Asimpunya Mitra, Frank Pollmann, Adam Gammon-Smith,

(参考訳) 短期的なデジタル量子コンピュータの出現は、古典的コンピューティングの範囲を超えて量子多体現象を研究するエキサイティングな機会になるかもしれない。ハードウェアを最大限に活用するためには、限られた回路深さでハミルトン力学を正確にシミュレートする手法が最重要である。本稿では,量子時間発展演算子を近似するために,ユニタリなブリックウォール回路を古典的に最適化する手法を提案する。本手法はテンソルネットワークを用いることで、システムサイズに関してスケーラブルである。様々な3体ハミルトニアンに対して、我々の手法は、その精度と力学を実装するために必要な量子回路深さの両方でトロッター分解を上回る量子回路を生成し、正確な詳細はハミルトニアンに依存することを示した。また、量子デバイスとブリックウォール回路近似の誤差の合計を最小化する最適な時間ステップを選択する方法についても説明する。

The advent of near-term digital quantum computers could offer us an exciting opportunity to investigate quantum many-body phenomena beyond that of classical computing. To make the best use of the hardware available, it is paramount that we have methods that accurately simulate Hamiltonian dynamics for limited circuit depths. In this paper, we propose a method to classically optimise unitary brickwall circuits to approximate quantum time evolution operators. Our method is scalable in system size through the use of tensor networks. We demonstrate that, for various three-body Hamiltonians, our approach produces quantum circuits that can outperform Trotterization in both their accuracy and the quantum circuit depth needed to implement the dynamics, with the exact details being dependent on the Hamiltonian. We also explain how to choose an optimal time step that minimises the combined errors of the quantum device and the brickwall circuit approximation.
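For context on the Trotterization baseline mentioned above, the following sketch measures the first-order Trotter error for a toy two-qubit Hamiltonian. The Hamiltonian, the evolution time, and the helper names are illustrative assumptions; the paper's optimised brickwall circuits and tensor-network machinery are not reproduced here.

```python
import numpy as np

def expm_hermitian(H, t):
    """Return exp(-i t H) for a Hermitian matrix H via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Toy Hamiltonian H = A + B with non-commuting parts (an illustrative choice).
A = np.kron(Z, Z)
B = np.kron(X, I2) + np.kron(I2, X)
H = A + B

def trotter_error(t, n_steps):
    """Spectral-norm distance between exact evolution and first-order Trotter."""
    step = expm_hermitian(A, t / n_steps) @ expm_hermitian(B, t / n_steps)
    approx = np.linalg.matrix_power(step, n_steps)
    return np.linalg.norm(expm_hermitian(H, t) - approx, ord=2)

errors = [trotter_error(1.0, n) for n in (1, 4, 16)]
```

More Trotter steps reduce the error but deepen the circuit, which is the trade-off a classically optimised circuit of fixed depth aims to improve on.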

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# 深層学習法により得られた包括的電子-炭素散乱データへの経験的適合

Empirical fits to inclusive electron-carbon scattering data obtained by deep-learning methods ( http://arxiv.org/abs/2312.17298v2 )

ライセンス: Link先を確認

Beata E. Kowal, Krzysztof M. Graczyk, Artur M. Ankowski, Rwik Dharmapal Banerjee, Hemant Prasad, Jan T. Sobczyk,

(参考訳) ニューラルネットワークの枠組みを応用して、準弾性ピークから共鳴励起を経て深非弾性散乱の開始まで、広い運動学的領域にわたる炭素の電子散乱断面積への経験的適合を得る。このようなモデル非依存のパラメトリゼーションとそれに対応する不確実性を得る2つの異なる方法を考える:ブートストラップ法に基づくものと、モンテカルロ・ドロップアウト法に基づくものである。解析において、$\chi^2$は、各独立した測定セットに対する点対点と正規化の不確かさを含む損失関数を定義する。我々の統計的アプローチは、同等の品質と、$7$%程度の同様の不確実性を持つ適合をもたらす。これらのモデルをテストするために、それらの予測を、トレーニングプロセスから除外されたテストデータセットと、スペクトル関数アプローチで得られた理論的予測と比較する。両方のモデルの予測は、実験的な測定と理論的な計算と一致している。また,対象とした運動学的領域を超えたデータセットとの比較を行い,ブートストラップ手法は,ドロップアウトアルゴリズムに基づく手法よりも,補間能力と外挿能力が優れていることを示した。

Employing the neural network framework, we obtain empirical fits to the electron-scattering cross sections for carbon over a broad kinematic region, extending from the quasielastic peak through resonance excitation to the onset of deep-inelastic scattering. We consider two different methods of obtaining such model-independent parametrizations and the corresponding uncertainties: based on the bootstrap approach and the Monte Carlo dropout approach. In our analysis, the $\chi^2$ defines the loss function, including point-to-point and normalization uncertainties for each independent set of measurements. Our statistical approaches lead to fits of comparable quality and similar uncertainties of the order of $7$%. To test these models, we compare their predictions to test datasets excluded from the training process and theoretical predictions obtained within the spectral function approach. The predictions of both models agree with experimental measurements and theoretical calculations. We also perform a comparison to a dataset lying beyond the covered kinematic region, and find that the bootstrap approach shows better interpolation and extrapolation abilities than the one based on the dropout algorithm.
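The bootstrap idea for attaching uncertainties to a fit can be sketched on synthetic data. The example below is an assumption-laden stand-in: it refits a straight line rather than a neural network, and the data, noise level, and function name are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic toy data (not cross-section measurements).
x = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

def bootstrap_predictions(x, y, x_eval, n_resamples=200):
    """Refit on resampled data and collect predictions at x_eval."""
    preds = np.empty((n_resamples, x_eval.size))
    for b in range(n_resamples):
        idx = rng.integers(0, x.size, size=x.size)  # sample points with replacement
        coeffs = np.polyfit(x[idx], y[idx], deg=1)  # stand-in for training a network
        preds[b] = np.polyval(coeffs, x_eval)
    return preds

x_eval = np.array([0.5, 1.5])  # one point inside the data range, one outside
preds = bootstrap_predictions(x, y, x_eval)
mean, spread = preds.mean(axis=0), preds.std(axis=0)
```

The spread across refits serves as the uncertainty estimate, and it grows for the point outside the fitted range, which is the behaviour one wants to inspect when judging extrapolation.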

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# ストリートガウシアン:ガウシアンスプレイティングによる動的都市景観のモデル化

Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting ( http://arxiv.org/abs/2401.01339v2 )

ライセンス: Link先を確認

Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng,

(参考訳) 本稿では,自律走行シーンの動的街路をモデル化する問題に取り組むことを目的とする。近年の手法では、車両のアニメーション化に追従した車両のポーズを取り入れてNeRFを拡張し、ダイナミックな街路シーンの写実的なビュー合成を可能にしている。しかし、トレーニングの遅さとレンダリングのスピードには大きな制限がある。この制限に対処する新たな明示的なシーン表現であるStreet Gaussiansを紹介します。具体的には、ダイナミックアーバンシーンは、セマンティックロジットと3Dガウスアンを備えた点雲の集合として表現され、それぞれが前景車両または背景に関連付けられている。前景の物体車両の動力学をモデル化するために、各物体点雲は、動的外観のための4次元球面調和モデルとともに、最適化可能な追跡されたポーズで最適化される。明示的な表現は、オブジェクト車両と背景の簡単な構成を可能にし、30分以内のトレーニングで、シーン編集操作とレンダリングを135 FPS (1066$\times$1600 resolution)で行うことができる。提案手法は、KITTIやWaymo Openデータセットなど、複数の挑戦的なベンチマークで評価される。実験の結果,提案手法はすべてのデータセットで常に最先端の手法よりも優れていた。再現性を確保するために、コードはリリースされます。

This paper aims to tackle the problem of modeling dynamic urban streets for autonomous driving scenes. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, significant limitations are their slow training and rendering speed. We introduce Street Gaussians, a new explicit scene representation that tackles these limitations. Specifically, the dynamic urban scene is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a 4D spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn allows for scene editing operations and rendering at 135 FPS (1066 $\times$ 1600 resolution) within half an hour of training. The proposed method is evaluated on multiple challenging benchmarks, including KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. The code will be released to ensure reproducibility.

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# 要求工学における自然言語処理技術の選択と評価に関する実践的ガイドライン

Practical Guidelines for the Selection and Evaluation of Natural Language Processing Techniques in Requirements Engineering ( http://arxiv.org/abs/2401.01508v3 )

ライセンス: Link先を確認

Mehrdad Sabetzadeh, Chetan Arora,

(参考訳) 自然言語処理(NLP)が要求自動化の基礎になった。要求工学(RE)におけるNLPの採用の増加の背景にある重要な要因の1つは、業界における要求を特定するために自然言語(NL)が普及していることである。 NLP技術は、要求を自動的に分類し、重要な情報、例えばドメインモデルや用語を抽出し、曖昧性処理や完全性チェックなどの品質保証タスクを実行するために一般的に用いられる。多くの異なるNLPソリューション戦略が利用可能であり、機械学習を同時に適用することが可能であるため、特定のREタスクの適切な戦略を選択し、結果のソリューションを経験的に厳密な方法で評価することは困難である。本章では,NLP技術の選択に関するガイドラインと,REの文脈における評価について述べる。特に,従来のNLP,特徴ベース機械学習,言語モデルに基づく手法など,さまざまな戦略を選択する方法について議論する。この章の究極の希望は、NLP4REへの新規参入者を支援し、RE分野に最も関係のあるNLP技術に迅速に参入することである。

Natural Language Processing (NLP) is now a cornerstone of requirements automation. One compelling factor behind the growing adoption of NLP in Requirements Engineering (RE) is the prevalent use of natural language (NL) for specifying requirements in industry. NLP techniques are commonly used for automatically classifying requirements, extracting important information, e.g., domain models and glossary terms, and performing quality assurance tasks, such as ambiguity handling and completeness checking. With so many different NLP solution strategies available and the possibility of applying machine learning alongside, it can be challenging to choose the right strategy for a specific RE task and to evaluate the resulting solution in an empirically rigorous manner. In this chapter, we present guidelines for the selection of NLP techniques as well as for their evaluation in the context of RE. In particular, we discuss how to choose among different strategies such as traditional NLP, feature-based machine learning, and language-model-based methods. Our ultimate hope for this chapter is to serve as a stepping stone, assisting newcomers to NLP4RE in quickly initiating themselves into the NLP technologies most pertinent to the RE field.

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# パワーロー減衰下における分析スペクトルアルゴリズムの一般化誤差曲線

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay ( http://arxiv.org/abs/2401.01599v2 )

ライセンス: Link先を確認

Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin,

(参考訳) カーネル回帰法の一般化誤差曲線は,極小率ではなく,様々な音源条件,雑音レベル,正規化パラメータの選択による一般化誤差の正確な順序を決定することを目的としている。本研究では、軽微な仮定の下で、カーネル回帰におけるカーネル勾配勾配法(および分析スペクトルアルゴリズムの大規模なクラス)の一般化誤差曲線を厳格に評価する。その結果、カーネル補間の不整合性を明確化し、より高い資格を有するカーネル回帰アルゴリズムの飽和効果を明らかにすることができた。ニューラル・タンジェント・カーネル理論により、これらの結果は広義のニューラルネットワークを訓練する際の一般化行動の理解を大幅に改善する。解析的機能論という新しい技術的貢献は、独立した関心事であるかもしれない。

The generalization error curve of a kernel regression method aims at determining the exact order of the generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we could sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, etc. Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# 認知領域における量子機械学習 : アルツハイマー病研究

Quantum Machine Learning in the Cognitive Domain: Alzheimer's Disease Study ( http://arxiv.org/abs/2401.06697v2 )

ライセンス: Link先を確認

Emine Akpinar,

(参考訳) アルツハイマー病(英語: Alzheimer's disease、AD)は、特に高齢者において認知障害となる神経変性性脳疾患である。認知障害は、集中力、記憶力、その他の高次認知能力などの様々な精神能力の低下として現れる。これらの欠陥は、個人が情報を理解し、新しい知識を取得し、効果的にコミュニケーションする能力に大きな影響を及ぼす可能性がある。認知障害による影響の1つは手書きである。圧力、速度、空間的な組織など、手書きのさまざまな側面を分析することで、研究者は早期の認知障害、特にADを示す微妙な変化を検出することができる。近年,高齢者のADを手書き解析により検出するための古典的人工知能(AI)手法がいくつか提案されている。しかし、高度なAI手法は、データのサイズが大きくなるにつれて、より多くの計算能力を必要とする。さらに、診断は古典的ベクトル空間の制限や特徴間の相関などの影響を受けうる。近年の研究では、医療における量子コンピューティング技術の使用は、これらの問題に対処するだけでなく、複雑なデータ分析を加速し、大規模データセットをより効率的に処理できることが示されている。本研究では,手書きデータに基づく高齢者のAD早期診断を容易にするため,回路要素が少ない変分量子分類器を提案する。機能のエンコーディングにはZZFeatureMapを使用しました。 ADを分類するために、繰り返しRyとRzの回転ゲートとCYとCZの2量子エンタングルゲートからなるパラメータ化量子回路を設計、実装した。提案したモデルはADの分類において0.75の精度を達成した。

Alzheimer's disease (AD) is the most prevalent neurodegenerative brain disorder, which results in significant cognitive impairments, especially in the elderly population. Cognitive impairments can manifest as a decline in various mental faculties, such as concentration, memory, and other higher-order cognitive abilities. These deficits can significantly impact an individual's capacity to comprehend information, acquire new knowledge, and communicate effectively. One of the affected activities due to cognitive impairments is handwriting. By analyzing different aspects of handwriting, including pressure, velocity, and spatial organization, researchers can detect subtle alterations that might indicate early-stage cognitive impairments, especially AD. Recently, several classical artificial intelligence (AI) approaches have been proposed for detecting AD in elderly individuals through handwriting analysis. However, advanced AI methods require more computational power as the size of the data increases. Additionally, diagnoses can be influenced by factors such as limited relevant classical vector space and correlations between features. Recent studies have shown that using quantum computing technologies in healthcare can not only address these problems but also accelerate complex data analysis and process large datasets more efficiently. In this study, we introduced a variational quantum classifier with fewer circuit elements to facilitate the early diagnosis of AD in elderly individuals based on handwriting data. We employed ZZFeatureMap for encoding features. To classify AD, a parameterized quantum circuit consisting of repeated Ry and Rz rotation gates, as well as CY and CZ two-qubit entangling gates, was designed and implemented. The proposed model achieved an accuracy of 0.75 in classifying AD.

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# 暗黙のニューラルキャンバスの解説:寄与を追跡してピクセルとニューロンを繋ぐ

Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions ( http://arxiv.org/abs/2401.10217v2 )

ライセンス: Link先を確認

Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava,

(参考訳) ニューラルネットワークが信号の連続的な表現として訓練されるインプリシトニューラルネットワーク表現(INR)の多くのバリエーションは、新しいビュー合成、ビデオ圧縮、画像超解像といった下流タスクに極めて実用的である。残念なことに、これらのネットワークの内部構造は、あまり研究されていない。我々の研究であるeXplaining the Implicit Neural Canvas (XINC)は、各ニューロンの出力画素への寄与の強さを調べることによって、INRの特性を説明する統一的なフレームワークである。これらのコントリビューションの集合をImplicit Neural Canvasと呼び、この概念を使って、私たちが研究しているINRが、彼らの表現するフレームを驚くべき方法で"見る"ことを学ぶことを実証します。例えば、INRは高度に分散した表現を持つ傾向がある。高レベルのオブジェクトセマンティクスを欠いているが、色とエッジには大きなバイアスがあり、ほとんど完全に空間に依存しない。我々は、ビデオINRにおいてオブジェクトがどのように時間にわたって表現されるかを調べ、クラスタリングを使用して、レイヤやアーキテクチャにわたって類似したニューロンを視覚化し、これが動きに支配されていることを示す、という結論に達した。これらの知見は分析フレームワークの汎用性を示している。私たちのプロジェクトページは https://namithap10.github.io/xinc で公開されている。

The many variations of Implicit Neural Representations (INRs), where a neural network is trained as a continuous representation of a signal, have tremendous practical utility for downstream tasks including novel view synthesis, video compression, and image super-resolution. Unfortunately, the inner workings of these networks are seriously under-studied. Our work, eXplaining the Implicit Neural Canvas (XINC), is a unified framework for explaining properties of INRs by examining the strength of each neuron's contribution to each output pixel. We call the aggregate of these contribution maps the Implicit Neural Canvas and we use this concept to demonstrate that the INRs we study learn to "see" the frames they represent in surprising ways. For example, INRs tend to have highly distributed representations. While lacking high-level object semantics, they have a significant bias for color and edges, and are almost entirely space-agnostic. We arrive at our conclusions by examining how objects are represented across time in video INRs, using clustering to visualize similar neurons across layers and architectures, and show that this is dominated by motion. These insights demonstrate the general usefulness of our analysis framework. Our project page is available at https://namithap10.github.io/xinc.

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# 簡易潜時拡散法によるパノプティカルセグメンテーションとマスク塗布

A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting ( http://arxiv.org/abs/2401.10227v2 )

ライセンス: Link先を確認

Wouter Van Gansbeke, Bert De Brabandere,

(参考訳) パノプティクスとインスタンスセグメンテーションネットワークは、しばしば、特殊なオブジェクト検出モジュール、複雑な損失関数、およびインスタンスマスクの置換不変性を管理するためのアドホックな後処理ステップで訓練される。この研究は安定拡散の上に構築され、汎視的セグメンテーションの潜在拡散アプローチを提案し、その結果、これらの複雑さを省略する単純なアーキテクチャをもたらす。トレーニングは,(1)部分分割マスクを潜伏空間に投影する浅層オートエンコーダの訓練,(2)潜伏空間における画像条件付きサンプリングを可能にする拡散モデルの訓練,の2段階からなる。この生成的アプローチは、マスクの完成または塗装の探索を解き放つ。 COCOとADE20kに関する実験的検証は、強いセグメンテーション結果をもたらす。最後に,学習可能なタスク埋め込みを導入することで,マルチタスクへのモデルの適応性を実証する。

Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to manage the permutation-invariance of the instance masks. This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation, resulting in a simple architecture that omits these complexities. Our training consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space. This generative approach unlocks the exploration of mask completion or inpainting. The experimental validation on COCO and ADE20k yields strong segmentation results. Finally, we demonstrate our model's adaptability to multi-tasking by introducing learnable task embeddings.

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# Yoneda Lemmaを用いた完全準同型暗号スキームの構築

Constructing a fully homomorphic encryption scheme with the Yoneda Lemma ( http://arxiv.org/abs/2401.13255v3 )

ライセンス: Link先を確認

Rémy Tuyéras,

(参考訳) 本稿では, Yoneda Lemmaの適用を通じて, 非対称暗号における準同型暗号システムの基盤を再定義する。ElGamal、RSA、Benaloh、RegevのLWE、NTRUEncryptといった広く採用されているシステムが、Yoneda Lemmaの原則から直接導かれることを明示的に示す。この統合により、Yoneda Encryption Schemeと呼ばれる包括的な準同型暗号フレームワークが生まれる。このスキームの中では、暗号化はYoneda Lemmaの同型の全単射写像を通して説明され、復号はこれらの写像の自然性からシームレスに従う。この統一は、統一的なモデル理論フレームワークに関する予想を示唆し、準同型暗号および完全準同型暗号(FHE)スキームの両方について推論する基礎を提供する。実演として、スカッシング(squashing)やブートストラッピングといった追加の調整技術を必要とせず、暗号化された乗算と加算の任意の有限列を処理できるFHE方式を提案する。このことは、提案された理論的進歩の実践的な意味を浮き彫りにするだけでなく、FHEスキームの設計を容易にするために、モデル理論と強制法(forcing)を暗号に活用する新たな可能性ももたらしている。

This paper redefines the foundations of asymmetric cryptography's homomorphic cryptosystems through the application of the Yoneda Lemma. It explicitly illustrates that widely adopted systems, including ElGamal, RSA, Benaloh, Regev's LWE, and NTRUEncrypt, directly derive from the principles of the Yoneda Lemma. This synthesis gives rise to a holistic homomorphic encryption framework named the Yoneda Encryption Scheme. Within this scheme, encryption is elucidated through the bijective maps of the Yoneda Lemma Isomorphism, and decryption seamlessly follows from the naturality of these maps. This unification suggests a conjecture for a unified model theory framework, providing a basis for reasoning about both homomorphic and fully homomorphic encryption (FHE) schemes. As a practical demonstration, the paper introduces an FHE scheme capable of processing arbitrary finite sequences of encrypted multiplications and additions without the need for additional tweaking techniques, such as squashing or bootstrapping. This not only underscores the practical implications of the proposed theoretical advancements but also introduces new possibilities for leveraging model theory and forcing techniques in cryptography to facilitate the design of FHE schemes.
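To make "homomorphic" concrete for one of the systems listed above, the sketch below shows the well-known multiplicative property of textbook RSA. The tiny parameters are illustrative and insecure, and the example is unrelated to the Yoneda-based construction itself; textbook RSA supports only multiplication on ciphertexts, which is why it is not fully homomorphic.

```python
# Textbook RSA with tiny illustrative parameters (insecure, demonstration only).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse of e; requires Python 3.8+

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 12
# Multiplying ciphertexts corresponds to multiplying plaintexts modulo n.
product_cipher = (encrypt(a) * encrypt(b)) % n
recovered = decrypt(product_cipher)
```

Here `recovered` equals `a * b` because the product is smaller than `n`; an FHE scheme additionally supports addition on ciphertexts.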

翻訳日:2024-07-17 23:40:44 公開日:2024-07-16

# ConTextual:大規模マルチモーダルモデルにおけるコンテキスト感性テキストリッチビジュアル推論の評価

ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models ( http://arxiv.org/abs/2401.13311v3 )

ライセンス: Link先を確認

Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng,

(参考訳) 多くの実世界のタスクでは、エージェントがテキストとビジュアルオブジェクト(例えば、公共空間をナビゲートする)を共同で推論する必要がある。具体的には、これらのタスクは、テキストが画像内の視覚的要素と相互作用するコンテキストを理解する必要がある。しかし、文脈に敏感なテキストリッチな視覚的推論に対して、最先端のマルチモーダルモデルの能力をベンチマークする既存のデータセットが欠如している。本稿では,テキストリッチな画像に対する文脈依存推論を必要とする人為的命令を特徴とする新しいデータセットであるConTextualを紹介する。我々は,14の基礎モデル(GPT-4V,Gemini-Pro-Vision,LLaVA-Next)の性能評価実験を行い,人間のパフォーマンスベースラインを確立する。さらに、モデル応答の人的評価を行い、GPT-4V(現在の最高性能の大規模マルチモーダルモデル)と人的性能の30.8%の顕著な性能ギャップを観察する。 GPT-4Vは時間関連データやインフォグラフィックの解釈が困難であることが明らかとなった。しかし、ミームや引用文のような抽象的な視覚的文脈を解釈する能力を示す。最後に、質的分析により、視覚の正確な知覚や幻覚の欠如など、パフォーマンスの低下に寄与する様々な要因が明らかになった。私たちのデータセット、コード、リーダーボードはプロジェクトページ https://con-textual.github.io/ で確認できます。

Many real-world tasks require an agent to reason jointly over text and visual objects (e.g., navigating in public spaces), which we refer to as context-sensitive text-rich visual reasoning. Specifically, these tasks require an understanding of the context in which the text interacts with visual elements within an image. However, there is a lack of existing datasets to benchmark the state-of-the-art multimodal models' capability on context-sensitive text-rich visual reasoning. In this paper, we introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images. We conduct experiments to assess the performance of 14 foundation models (GPT-4V, Gemini-Pro-Vision, LLaVA-Next) and establish a human performance baseline. Further, we perform human evaluations of the model responses and observe a significant performance gap of 30.8% between GPT-4V (the current best-performing Large Multimodal Model) and human performance. Our fine-grained analysis reveals that GPT-4V encounters difficulties interpreting time-related data and infographics. However, it demonstrates proficiency in comprehending abstract visual contexts such as memes and quotes. Finally, our qualitative analysis uncovers various factors contributing to poor performance including lack of precise visual perception and hallucinations. Our dataset, code, and leaderboard can be found on the project page https://con-textual.github.io/

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 無限次元のChoi形式主義から完全正の動的半群の生成元の一意分解へ

From the Choi Formalism in Infinite Dimensions to Unique Decompositions of Generators of Completely Positive Dynamical Semigroups ( http://arxiv.org/abs/2401.14344v3 )

ライセンス: Link先を確認

Frederik vom Ende,

(参考訳) 任意の可分複素ヒルベルト空間、純虚数のトレースを持たない任意のトレースクラス作用素$B$、および完全正写像のノルム連続な一パラメータ半群の任意の生成元$L$が与えられたとき、以下を満たす一意な有界作用素$K$と一意な完全正写像$\Phi$が存在することを証明する。 (i) $L=K(\cdot)+(\cdot)K^*+\Phi$、 (ii) 超作用素$\Phi(B^*(\cdot)B)$はトレースクラスであり、そのトレースは消える、 (iii) ${\rm tr}(B^*K)$は実数である。我々の証明の中心は、完全正写像を正半定値作用素に関連付けるChoi形式の修正版である。この対応が単射である場合と全射である場合をそれぞれ特徴付け、それによって、主結果の証明のアイデアが非可分ヒルベルト空間に拡張できない理由を説明する。特に、基礎となるヒルベルト空間が無限次元になるとすぐに、Choi形式の下で空の逆像を持つ正半定値作用素の例が見つかる。

Given any separable complex Hilbert space, any trace-class operator $B$ which does not have purely imaginary trace, and any generator $L$ of a norm-continuous one-parameter semigroup of completely positive maps we prove that there exists a unique bounded operator $K$ and a unique completely positive map $\Phi$ such that (i) $L=K(\cdot)+(\cdot)K^*+\Phi$, (ii) the superoperator $\Phi(B^*(\cdot)B)$ is trace class and has vanishing trace, and (iii) ${\rm tr}(B^*K)$ is a real number. Central to our proof is a modified version of the Choi formalism which relates completely positive maps to positive semi-definite operators. We characterize when this correspondence is injective and surjective, respectively, which in turn explains why the proof idea of our main result cannot extend to non-separable Hilbert spaces. In particular, we find examples of positive semi-definite operators which have empty pre-image under the Choi formalism as soon as the underlying Hilbert space is infinite-dimensional.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 浮動小数点演算におけるReLUとステップネットワークの表現力

Expressive Power of ReLU and Step Networks under Floating-Point Operations ( http://arxiv.org/abs/2401.15121v2 )

ライセンス: Link先を確認

Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park,

(参考訳) ニューラルネットワークの表現力の研究は、ニューラルネットワークの基本的限界について研究してきた。既存の結果の多くは、実数値入力とパラメータと、ニューラルネットワークの評価中の正確な操作を仮定している。しかしながら、ニューラルネットワークは通常、実数のごく一部しか表現できないコンピュータ上で実行され、不正確な操作を適用する。本研究では、実際に浮動小数点数と演算を使用する場合のニューラルネットワークの表現力について、より現実的な設定で分析する。最初の結果の集合は浮動小数点演算を仮定し、浮動小数点演算は有限ビットで表されるが、指数関数は任意の整数値を取ることができる。この設定では、バイナリしきい値単位またはReLUを用いたニューラルネットワークが有限入力/出力ペアを記憶し、任意の誤差内で連続関数を近似することができることを示す。特に、普遍近似と記憶のための構成におけるパラメータの数は、正確な数学的操作を仮定する古典的な結果と一致する。また,浮動小数点演算が有意および指数の両方に有限ビットを使用する場合の記憶と普遍近似についても同様の結果を示す。これらの結果はIEEE 754規格(例えば,32ビット単精度フォーマット)やbfloat16など,多くの一般的な浮動小数点形式に適用できる。

The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations, i.e., most existing results do not apply to neural networks used in practice. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations as in practice. Our first set of results assumes floating-point operations where the significand of a float is represented by finite bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite input/output pairs and can approximate any continuous function within an arbitrary error. In particular, the number of parameters in our constructions for universal approximation and memorization coincides with that in classical results assuming exact mathematical operations. We also show similar results on memorization and universal approximation when floating-point operations use finite bits for both significand and exponent; these results are applicable to many popular floating-point formats such as those defined in the IEEE 754 standard (e.g., 32-bit single-precision format) and bfloat16.
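The gap between exact arithmetic and floating-point arithmetic that motivates this work can be seen in a few lines. This is a small illustration with values chosen for the example, not a construction from the paper; `relu_neuron` is a hypothetical helper.

```python
import numpy as np

big, one = np.float32(1e8), np.float32(1.0)

# float32 has a 24-bit significand, so adding 1 to 1e8 rounds back to 1e8.
left = (big + one) - big    # the added 1 is lost to rounding
right = (big - big) + one   # the same terms in a different order keep it

def relu_neuron(x, w, b, dtype):
    """Single ReLU unit evaluated entirely in the given floating-point type."""
    x, w, b = dtype(x), dtype(w), dtype(b)
    return np.maximum(w * x + b, dtype(0))

out32 = relu_neuron(1e8, 1.0, 1.0, np.float32)
out64 = relu_neuron(1e8, 1.0, 1.0, np.float64)
```

The two orderings give different results, and the same neuron returns different values in single and double precision, so results proved for exact real arithmetic do not automatically carry over to networks as they are executed.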

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 拡散に基づくグラフ生成法

Diffusion-based Graph Generative Methods ( http://arxiv.org/abs/2401.15617v2 )

ライセンス: Link先を確認

Hongyang Chen, Can Xu, Lingyu Zheng, Qiang Zhang, Xuemin Lin,

(参考訳) 最も最先端な生成法であるため、拡散法は幅広いタスクにおいて大きな進歩を見せている。中でもグラフ生成は、実生活に広く応用されていることから、大きな研究の注目を集めている。本研究では,拡散グラフ生成法について,系統的,包括的に検討した。まず,拡散確率モデル,スコアベース生成モデル,確率微分方程式の3つの主流パラダイムについて検討する。次に、グラフ上の拡散モデルの最新の応用を分類し、紹介する。最後に,現在の研究の限界と今後の探査の方向性を指摘する。この調査で得られた既存のメソッドの要約はhttps://github.com/zhejiangzhuque/Diffusion-based-Graph-Generative-Methodsにある。

As the most cutting-edge generative methods, diffusion methods have shown great advances in a wide range of generation tasks. Among them, graph generation attracts significant research attention for its broad application in real life. In our survey, we systematically and comprehensively review diffusion-based graph generative methods. We first review three mainstream paradigms of diffusion methods, which are denoising diffusion probabilistic models, score-based generative models, and stochastic differential equations. Then we further categorize and introduce the latest applications of diffusion models on graphs. In the end, we point out some limitations of current studies and directions for future exploration. The summary of existing methods mentioned in this survey is in https://github.com/zhejiangzhuque/Diffusion-based-Graph-Generative-Methods.
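As a rough illustration of the first paradigm named above, the forward (noising) process of a denoising diffusion probabilistic model has a closed form that is easy to sketch. The linear schedule, the number of steps, and the treatment of a toy adjacency matrix as continuous values are assumptions made for this example; graph-specific methods covered by the survey may instead use symmetric or discrete noise.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (a common choice)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

def q_sample(x0, t):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

# Toy 3-node graph, with the adjacency matrix treated as continuous values.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)

x_early = q_sample(adj, 10)     # still close to the original graph
x_late = q_sample(adj, T - 1)   # nearly pure Gaussian noise
```

A generative model is then trained to reverse this corruption step by step.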

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 連続強化学習のための世界モデルにおけるリプレイ強化

Augmenting Replay in World Models for Continual Reinforcement Learning ( http://arxiv.org/abs/2401.16650v3 )

ライセンス: Link先を確認

Luke Yang, Levin Kuhlmann, Gideon Kowadlo,

(参考訳) 連続RLでは、エージェントが過去のタスクと将来のタスクの両方を改善しながら、以前のタスクを忘れずに新しいタスクを学ぶ必要がある。最も一般的なアプローチは、モデルフリーのアルゴリズムとリプレイバッファを使うものであり、破滅的な忘却を軽減できるが、大きなメモリ要求のためにスケーラビリティに問題を抱えることが多い。生物学的にインスパイアされたリプレイは、モデルフリーアルゴリズムにおけるリプレイという一般的な設定とは対照的に、世界モデルへのリプレイを示唆しており、これはモデルベースのRLと整合する。モデルベースのRLは、ポリシーとは独立した環境の知識を活用することで、連続RLに利益をもたらす。 WMAR(World Models with Augmented Replay)は,メモリ効率の高い分布マッチング型リプレイバッファを持つモデルベースRLアルゴリズムである。 WMARは、単純なFIFOバッファを使用し、連続RLではテストされていなかった、よく知られたDreamerV3アルゴリズムを拡張している。 WMARとDreamerV3を同サイズのリプレイバッファで評価した。 OpenAI Procgenを用いた共有構造を持つタスクと、Atariベンチマークを用いた共有構造を持たないタスクの2つのシナリオでテストした。 WMARは、忘却の指標と、過去および将来のタスクへのスキル伝達の指標の両方において、連続RLに好適な特性を示した。 DreamerV3と比較して、WMARは共有構造を持つタスクではわずかな利点を示し、共有構造を持たないタスクでは忘却特性が大幅に優れていた。その結果,メモリ効率のよいリプレイバッファを持つモデルベースRLは連続RLに有効なアプローチとなりうることが示唆され,さらなる研究が正当化される。

Continual RL requires an agent to learn new tasks without forgetting previous ones, while improving on both past and future tasks. The most common approaches use model-free algorithms with replay buffers, which can help to mitigate catastrophic forgetting but often struggle with scalability due to large memory requirements. Biologically inspired replay suggests replay to a world model, aligning with model-based RL, as opposed to the common setting of replay in model-free algorithms. Model-based RL offers benefits for continual RL by leveraging knowledge of the environment, independent of policy. We introduce WMAR (World Models with Augmented Replay), a model-based RL algorithm with a memory-efficient distribution-matching replay buffer. WMAR extends the well-known DreamerV3 algorithm, which employs a simple FIFO buffer and was not tested in continual RL. We evaluated WMAR and DreamerV3, with the same-size replay buffers. They were tested on two scenarios: tasks with shared structure using OpenAI Procgen and tasks without shared structure using the Atari benchmark. WMAR demonstrated favourable properties for continual RL considering metrics for forgetting as well as skill transfer on past and future tasks. Compared to DreamerV3, WMAR showed slight benefits in tasks with shared structure and substantially better forgetting characteristics on tasks without shared structure. Our results suggest that model-based RL with a memory-efficient replay buffer can be an effective approach to continual RL, justifying further research.
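One common way to make a fixed-size buffer track the distribution of everything seen so far, rather than only the most recent experience as a FIFO buffer does, is reservoir sampling. The sketch below is an assumption about how such a buffer could look; the abstract does not specify WMAR's buffer design, and the class name `ReservoirBuffer` is hypothetical.

```python
import random

class ReservoirBuffer:
    """Fixed-capacity buffer holding a uniform sample of all items seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)            # buffer not yet full: always keep
        else:
            j = self._rng.randrange(self.seen)
            if j < self.capacity:              # keep with probability capacity / seen
                self.items[j] = item

buf = ReservoirBuffer(capacity=100)
for step in range(10_000):
    buf.add(step)  # integers stand in for stored trajectories
```

After the loop, the buffer still contains items from early in the stream, whereas a FIFO buffer of the same size would hold only the last 100 steps.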

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 非絡み合いグラフオートエンコーダを用いたネットワーク表現の学習

Learning Network Representations with Disentangled Graph Auto-Encoder ( http://arxiv.org/abs/2402.01143v2 )

ライセンス: Link先を確認

Di Fan, Chuanhou Gao,

(参考訳) 変分グラフオートエンコーダはグラフ構造化データの表現を学習するために広く用いられている。しかし、実世界のグラフの形成は、潜在因子の影響を受け、複雑で不均一な過程である。既存のエンコーダは基本的に全体論的であり、潜在因子の絡み合いを無視している。これにより、グラフ解析タスクの有効性が低下すると同時に、学習した表現を説明するのが難しくなる。その結果、(変分)グラフオートエンコーダを用いた非絡合グラフ表現の学習は、大きな課題を生じさせ、現在の研究でほとんど解明されていない。本稿では,DVGA(Disentangled Graph Auto-Encoder)とDVGA(Disentangled Variational Graph Auto-Encoder)を導入して,不整形表現の学習を行う。具体的には、まず、エンコーダとして機能するマルチチャネルメッセージパッシング層を持つ非絡み合いグラフ畳み込みネットワークを設計する。これにより、各チャネルは各潜伏因子に関する情報を集約することができる。次に、各チャネルにコンポーネントワイドフローを適用することにより、不整合変分グラフ自動エンコーダの表現能力を向上する。さらに,不整合表現の特徴を考慮に入れた因子的デコーダを構築する。我々は、異なる潜伏要因のマッピングチャネルに独立性制約を課すことにより、表現の独立性を改善する。人工的および実世界の両方のデータセットに関する実証実験は、いくつかの最先端ベースラインと比較して提案手法の優位性を実証している。

The (variational) graph auto-encoder is widely used to learn representations for graph-structured data. However, the formation of real-world graphs is a complicated and heterogeneous process influenced by latent factors. Existing encoders are fundamentally holistic, neglecting the entanglement of latent factors. This reduces the effectiveness of graph analysis tasks, while also making it more difficult to explain the learned representations. As a result, learning disentangled graph representations with the (variational) graph auto-encoder poses significant challenges and remains largely unexplored in the current research. In this paper, we introduce the Disentangled Graph Auto-Encoder (DGA) and the Disentangled Variational Graph Auto-Encoder (DVGA) to learn disentangled representations. Specifically, we first design a disentangled graph convolutional network with multi-channel message-passing layers to serve as the encoder. This allows each channel to aggregate information about each latent factor. The disentangled variational graph auto-encoder's expressive capability is then enhanced by applying a component-wise flow to each channel. In addition, we construct a factor-wise decoder that takes into account the characteristics of disentangled representations. We improve the independence of representations by imposing independence constraints on the mapping channels for distinct latent factors. Empirical experiments on both synthetic and real-world datasets demonstrate the superiority of our proposed method compared to several state-of-the-art baselines.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# リクエストを超えて: 不均衡な設定におけるブラウザ間Webトラッカー分類のためのHTTPレスポンスヘッダの活用

Beyond the Request: Harnessing HTTP Response Headers for Cross-Browser Web Tracker Classification in an Imbalanced Setting ( http://arxiv.org/abs/2402.01240v2 )

ライセンス: Link先を確認

Wolf Rieder, Philip Raschke, Thomas Cory,

(参考訳) World Wide Webの接続性はHTTPプロトコルに大きく影響しており、HTTPメッセージはWebセキュリティやプライバシ、特にWebトラッキングに関する規律に訴える情報的ヘッダフィールドを提供する。既存の調査では、Webトラッカーを特定するためにHTTPリクエストメッセージを使用しているが、HTTPレスポンスヘッダはしばしば見過ごされている。本研究は、二項化HTTP応答ヘッダを用いたWebトラッカー検出のための効果的な機械学習分類器を設計する試みである。トラフィック監視ブラウザエクステンションであるT.EXを通じて得られたChrome、Firefox、Braveブラウザのデータは、私たちのデータセットとして役立ちます。 10の教師付きモデルがChromeデータ上でトレーニングされ、1年後のChromeデータセットを含むすべてのブラウザでテストされた。結果は、ChromeとFirefoxで高い精度、F1スコア、精度、リコール、最小ログロスエラーを示したが、Braveのデータ分散と機能セットが異なるため、Braveのパフォーマンスは低い。その結果,これらの分類器はWebトラッカー検出に有効であることが示唆された。しかし、現実のアプリケーションテストはまだ進行中であり、トラッカータイプとより広範なラベルソースの区別は今後の研究で検討される可能性がある。

The World Wide Web's connectivity is greatly attributed to the HTTP protocol, with HTTP messages offering informative header fields that appeal to disciplines like web security and privacy, especially concerning web tracking. Despite existing research employing HTTP request messages to identify web trackers, HTTP response headers are often overlooked. This study endeavors to design effective machine learning classifiers for web tracker detection using binarized HTTP response headers. Data from the Chrome, Firefox, and Brave browsers, obtained through the traffic monitoring browser extension T.EX, serves as our dataset. Ten supervised models were trained on Chrome data and tested across all browsers, including a Chrome dataset from a year later. The results demonstrated high accuracy, F1-score, precision, recall, and minimal log-loss error for Chrome and Firefox, but subpar performance on Brave, potentially due to its distinct data distribution and feature set. The research suggests that these classifiers are viable for web tracker detection. However, real-world application testing remains pending, and the distinction between tracker types and broader label sources could be explored in future studies.
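
The feature construction described above can be sketched as follows. This is a minimal illustration that assumes "binarized" means one 0/1 feature per header name, indicating whether that header is present in a response; the abstract does not spell out the encoding, and the function names and sample responses are hypothetical.

```python
from typing import Dict, List


def build_vocabulary(responses: List[Dict[str, str]]) -> List[str]:
    """Collect the sorted set of lower-cased header names seen in the data."""
    names = set()
    for headers in responses:
        names.update(name.lower() for name in headers)
    return sorted(names)


def binarize(headers: Dict[str, str], vocabulary: List[str]) -> List[int]:
    """Map one response to a 0/1 vector: 1 if the header is present, else 0."""
    present = {name.lower() for name in headers}
    return [1 if name in present else 0 for name in vocabulary]


# Hypothetical responses; real data would come from captured browser traffic.
responses = [
    {"Content-Type": "image/gif", "Set-Cookie": "id=abc", "P3P": "CP=NOI"},
    {"Content-Type": "text/html", "Cache-Control": "max-age=3600"},
]
vocabulary = build_vocabulary(responses)
features = [binarize(headers, vocabulary) for headers in responses]
```

Header values are deliberately ignored in this sketch, so the resulting vectors can be passed to any off-the-shelf supervised classifier.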

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 量子テレポーテーションにおけるノイズ低減

Noise mitigation in quantum teleportation ( http://arxiv.org/abs/2402.02343v2 )

ライセンス: Link先を確認

Zi-Jian Xu, Jun-Hong An,

(参考訳) 量子テレポーテーションは、エンタングルメントを用いて未知の量子状態を長距離伝送することを可能にし、多くの量子技術において重要な構成要素となっている。しかし、NISQ(noisy intermediate-scale quantum)時代において、量子テレポーテーションの実用的な実現は、ノイズによって引き起こされるデコヒーレンスという課題に必然的に直面する。本稿では,離散変数および連続変数の量子テレポーテーション方式の両方に適用可能なノイズ緩和機構を提案する。2種類の量子テレポーテーション方式の非マルコフ的デコヒーレンスダイナミクスを調べた結果、関与するサブシステムとそれぞれの熱浴(リザーバ)からなる全系のエネルギースペクトルに束縛状態が形成される限り、忠実度の量子的優位性が持続的に回復することがわかった。本結果は、ノイズ緩和プロトコルに対する洞察に富んだ理解を与え、ノイズ耐性を持つ量子テレポーテーションの実用的実現への道を開く。

Permitting the transmission of unknown quantum states over long distances by using entanglement, quantum teleportation serves as an important building block for many quantum technologies. However, in the noisy intermediate-scale quantum era, the practical realization of quantum teleportation is inevitably challenged by the noise-induced decoherence. We here propose a noise-mitigation mechanism applicable in both the discrete- and continuous-variable quantum teleportation schemes. Via investigating the non-Markovian decoherence dynamics of the two types of quantum teleportation schemes, we find that, as long as a bound state is formed in the energy spectrum of the total system consisting of the involved subsystems and their respective reservoirs, the quantum superiority of the fidelity is persistently recovered. Supplying an insightful understanding of the noise-mitigation protocols, our result paves the way to the practical realization of noise-tolerant quantum teleportation.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# LHRS-Bot:VGI強化大規模マルチモーダル言語モデルを用いたリモートセンシング

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model ( http://arxiv.org/abs/2402.02544v4 )

ライセンス: Link先を確認

Dilxat Muhtar, Zhenshi Li, Feng Gu, Xueliang Zhang, Pengfeng Xiao,

(参考訳) 大規模言語モデル(LLM)の革命的能力は、マルチモーダルな大規模言語モデル(MLLM)の道を切り開き、様々な専門分野にまたがる多様な応用を育んでいる。しかし、リモートセンシング(RS)分野では、最近のMLLMでは、多様な地形やRS画像の様々な物体が適切に考慮されていない。このギャップを埋めるために、大規模なRS画像テキストデータセットであるLHRS-Alignと情報的RS固有の命令データセットであるLHRS-Instructを構築し、大規模なボランティア地理情報(VGI)とグローバルに利用可能なRS画像を活用する。この基盤の上に構築されたLHRS-Botは、新しい多段階視覚言語アライメント戦略とカリキュラム学習手法により、RS画像理解に適したMLLMである。さらに、RS画像理解におけるMLLMの能力を徹底的に評価するベンチマークであるLHRS-Benchを紹介する。総合的な実験により、LHRS-BotはRS画像の深い理解と、RS領域内でニュアンス推論を行う能力を示すことが示された。

The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across various specialized domains. In the remote sensing (RS) field, however, the diverse geographical landscapes and varied objects in RS imagery are not adequately considered in recent MLLM endeavors. To bridge this gap, we construct a large-scale RS image-text dataset, LHRS-Align, and an informative RS-specific instruction dataset, LHRS-Instruct, leveraging the extensive volunteered geographic information (VGI) and globally available RS images. Building on this foundation, we introduce LHRS-Bot, an MLLM tailored for RS image understanding through a novel multi-level vision-language alignment strategy and a curriculum learning method. Additionally, we introduce LHRS-Bench, a benchmark for thoroughly evaluating MLLMs' abilities in RS image understanding. Comprehensive experiments demonstrate that LHRS-Bot exhibits a profound understanding of RS images and the ability to perform nuanced reasoning within the RS domain.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 拡張Open-Set Object DetectorによるクロスドメインFew-Shotオブジェクト検出

Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector ( http://arxiv.org/abs/2402.03094v3 )

ライセンス: Link先を確認

Yuqian Fu, Yu Wang, Yixuan Pan, Lian Huai, Xingyu Qiu, Zeyu Shangguan, Tong Liu, Yanwei Fu, Luc Van Gool, Xingqun Jiang,

(参考訳) 本稿では,最小限のラベル付きサンプルで新規ドメイン向けの高精度な物体検出器を開発することを目指し,難易度の高いクロスドメイン少数ショット物体検出(CD-FSOD)について検討する。DE-ViTのようなトランスフォーマーベースのオープンセット検出器は、従来の少数ショット物体検出において有望であるが、CD-FSODへの一般化はまだ不明である。 1) このようなオープンセット検出法はCD-FSODに容易に一般化できるのか? 2) もしそうでなければ、巨大なドメインギャップに直面したモデルをどのように強化できるのか? 最初の質問に答えるために、ドメインギャップを理解するための指標として、スタイル、クラス間分散(ICV)、定義不能境界(IB)を用いる。これらの指標に基づいて,物体検出手法を評価するためのCD-FSODという新しいベンチマークを構築し,現在のアプローチの大部分がドメインをまたいだ一般化に失敗していることを明らかにする。技術的には, 性能低下は提案した指標であるスタイル, ICV, IBと関連していることが観察される。そこで本研究では,これらの問題に対処する新しいモジュールをいくつか提案する。まず、学習可能なインスタンス特徴は、初期の固定インスタンスをターゲットカテゴリに整列させ、特徴の識別性を向上する。第二に、インスタンス再重み付けモジュールは、IBがわずかな高品質なインスタンスにより高い重要度を割り当てる。第3に、ドメインプロンプタは、意味内容を変更することなく仮想的なドメインを合成することにより、異なるスタイルに頑健な特徴を促す。これらの技術は総合してCD-FSOD向けクロスドメインビジョントランスフォーマー(CD-ViTO)の開発に寄与し、ベースのDE-ViTを大幅に改善する。実験により,本モデルの有効性が検証された。

This paper studies the challenging cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. While transformer-based open-set detectors, such as DE-ViT, show promise in traditional few-shot object detection, their generalization to CD-FSOD remains unclear: 1) can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can models be enhanced when facing huge domain gaps? To answer the first question, we employ measures including style, inter-class variance (ICV), and indefinable boundaries (IB) to understand the domain gap. Based on these measures, we establish a new benchmark named CD-FSOD to evaluate object detection methods, revealing that most of the current approaches fail to generalize across domains. Technically, we observe that the performance decline is associated with our proposed measures: style, ICV, and IB. Consequently, we propose several novel modules to address these issues. First, the learnable instance features align initial fixed instances with target categories, enhancing feature distinctiveness. Second, the instance reweighting module assigns higher importance to high-quality instances with slight IB. Third, the domain prompter encourages features resilient to different styles by synthesizing imaginary domains without altering semantic contents. These techniques collectively contribute to the development of the Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO), significantly improving upon the base DE-ViT. Experimental results validate the efficacy of our model.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# CAT-SAM:Segment Anything ModelのFew-Shot Adaptationのための条件調整

CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model ( http://arxiv.org/abs/2402.03631v3 )

ライセンス: Link先を確認

Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu,

(参考訳) 最近のSegment Anything Model (SAM) は、一般画像のセグメンテーションにおいて顕著なゼロショット能力と柔軟な幾何学的プロンプトを示した。しかしSAMは、航空、医療、非RGB画像など、様々な非伝統的なイメージを扱う際にしばしば苦労する。本稿では,CAT-SAM(ConditionAl Tuning Network)を提案する。 CAT-SAMはSAM全体を凍結し、マスクデコーダとイメージエンコーダに少数の学習可能なパラメータを同時に適用する。コア設計は、重厚画像エンコーダと軽量マスクデコーダのデコーダ条件付きジョイントチューニングを可能にするプロンプトブリッジ構造である。ブリッジングはマスクデコーダのプロンプトトークンを画像エンコーダにマッピングし、エンコーダとデコーダの相乗的適応を相互に促進する。我々は、入力空間に学習可能なプロンプトトークンを注入する1つのCAT-SAMと、軽量なアダプタネットワークを挿入する2つのCAT-SAM変異をもたらすイメージエンコーダの2つの代表的チューニング戦略を開発する。 11の非従来型タスクに対する大規模な実験により、CAT-SAMはどちらも、非常に困難なワンショット適応設定の下でも、より優れた目標セグメンテーション性能を達成することが示された。プロジェクトページ: https://xiaoaoran.github.io/projects/CAT-SAM

The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. We develop two representative tuning strategies for the image encoder which leads to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants achieve superior target segmentation performance consistently even under the very challenging one-shot adaptation setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 安全なマルチモーダル学習システムに関する調査研究

A Survey on Safe Multi-Modal Learning System ( http://arxiv.org/abs/2402.05355v6 )

ライセンス: Link先を確認

Tianyi Zhao, Liangliang Zhang, Yao Ma, Lu Cheng,

(参考訳) 人工知能の急速な発展の中で、マルチモーダル学習システム(MMLS)は、様々なモーダル入力から情報を処理し統合する能力によって、注目を集めている。医療などの重要な分野での利用が拡大し、安全保証が重要な関心事となっている。しかし、その安全性に関する体系的な研究が欠如していることは、この分野の進歩にとって重要な障壁である。このギャップを埋めるために,MMLSの安全性を体系的に分類し評価する最初の分類法を提案する。この分類は、MMLSの安全性を保証するために重要な4つの基本的な柱、すなわち堅牢性、アライメント、監視、制御性に基づいて構成されている。この分類を活用して、既存の方法論、ベンチマーク、研究の現状をレビューするとともに、知識の主な限界とギャップを指摘します。最後に,MMLSの安全性に関するユニークな課題について論じる。これらの課題を明らかにするために,我々は今後の研究の道を開くことを目指しており,MMLSの安全性プロトコルの大幅な進歩につながる可能性のある潜在的方向性を提案する。

In the rapidly evolving landscape of artificial intelligence, multimodal learning systems (MMLS) have gained traction for their ability to process and integrate information from diverse modality inputs. Their expanding use in vital sectors such as healthcare has made safety assurance a critical concern. However, the absence of systematic research into their safety is a significant barrier to progress in this field. To bridge the gap, we present the first taxonomy that systematically categorizes and assesses MMLS safety. This taxonomy is structured around four fundamental pillars that are critical to ensuring the safety of MMLS: robustness, alignment, monitoring, and controllability. Leveraging this taxonomy, we review existing methodologies, benchmarks, and the current state of research, while also pinpointing the principal limitations and gaps in knowledge. Finally, we discuss unique challenges in MMLS safety. In illuminating these challenges, we aim to pave the way for future research, proposing potential directions that could lead to significant advancements in the safety protocols of MMLS.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 未知状態を用いた実時間ホロスティックロボットの姿勢推定

Real-time Holistic Robot Pose Estimation with Unknown States ( http://arxiv.org/abs/2402.05655v4 )

ライセンス: Link先を確認

Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang,

(参考訳) RGB画像からロボットのポーズを推定することは、コンピュータビジョンとロボット工学において重要な問題である。従来の手法は有望な性能を達成してきたが、そのほとんどはロボットの内部状態、例えば真値(ground-truth)の関節角の完全な知識を前提としている。しかし、この仮定は現実的な状況では必ずしも有効ではない。マルチロボットのコラボレーションや人間とロボットのインタラクションのような現実世界のアプリケーションでは、ロボットの関節状態は共有されなかったり、信頼できなかったりする。一方, 関節状態の事前知識なしにロボットのポーズを推定する既存手法は, 計算負荷が重いため, リアルタイムアプリケーションをサポートできない。本研究は,既知のロボット状態を必要とせずに,RGB画像からリアルタイムにロボットのポーズを推定する効率的なフレームワークを提案する。本手法では,カメラ・ロボット間の回転,ロボットの状態パラメータ,キーポイント位置,ルート深さを推定し,各タスクにニューラルネットワークモジュールを用いることで学習とsim-to-real転移を容易にする。特に、反復最適化なしに、単一のフィードフォワードパスでの推論を実現する。提案手法は,最先端の精度で12倍の速度向上を実現し,リアルタイムでの総合的なロボットポーズ推定を初めて可能にする。コードとモデルはhttps://github.com/Oliverbansk/Holistic-Robot-Pose-Estimationで公開されている。

Estimating robot pose from RGB images is a crucial problem in computer vision and robotics. While previous methods have achieved promising performance, most of them presume full knowledge of robot internal states, e.g. ground-truth robot joint angles. However, this assumption is not always valid in practical situations. In real-world applications such as multi-robot collaboration or human-robot interaction, the robot joint states might not be shared or could be unreliable. On the other hand, existing approaches that estimate robot pose without joint state priors suffer from heavy computation burdens and thus cannot support real-time applications. This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states. Our method estimates camera-to-robot rotation, robot state parameters, keypoint locations, and root depth, employing a neural network module for each task to facilitate learning and sim-to-real transfer. Notably, it achieves inference in a single feed-forward pass without iterative optimization. Our approach offers a 12-time speed increase with state-of-the-art accuracy, enabling real-time holistic robot pose estimation for the first time. Code and models are available at https://github.com/Oliverbansk/Holistic-Robot-Pose-Estimation.

翻訳日:2024-07-17 23:30:59 公開日:2024-07-16

# 集積オンチップフィルタによる高速制御による劣化保護超伝導量子ビット

Decay-protected superconducting qubit with fast control enabled by integrated on-chip filters ( http://arxiv.org/abs/2402.08906v2 )

ライセンス: Link先を確認

Aashish Sah, Suman Kundu, Heikki Suominen, Qiming Chen, Mikko Möttönen,

(参考訳) 超伝導量子ビットで高速ゲートと長いコヒーレンス時間を両立させることは難しく、通常は駆動線と量子ビットの結合を強めるか、過度に強いマイクロ波信号を用いる必要がある。そこで本研究では、量子ビット周波数で阻止帯域(ストップバンド)を示す量子ビット駆動用のオンチップフィルタを導入し、長いコヒーレンス時間とサブハーモニック周波数での強い結合を可能にすることで、高速な単一量子ビットゲートと熱負荷の低減を実現した。フィルタは数秒の外因性緩和時間を示しつつ、サブハーモニック制御によるサブ10 nsのゲートを可能にする。ここでは, ストップバンドにおいて測定された緩和時間が最大200倍改善することを示す。さらに、$\pi$パルス持続時間12 nsのラビ振動のサブハーモニック駆動を実装した。 2次元量子プロセッサにおけるオンチップフィルタと効率的なサブハーモニック駆動の実証は、制御線からの熱負荷とノイズを低減したスケーラブルな量子ビットアーキテクチャへの道を開く。

Achieving fast gates and long coherence times for superconducting qubits presents challenges, typically requiring either a stronger coupling of the drive line or an excessively strong microwave signal to the qubit. To address this, we introduce on-chip filters of the qubit drive exhibiting a stopband at the qubit frequency, thus enabling long coherence times and strong coupling at the subharmonic frequency, facilitating fast single-qubit gates, and reduced thermal load. The filters exhibit an extrinsic relaxation time of a few seconds while enabling sub-10-ns gates with subharmonic control. Here we show up to 200-fold improvement in the measured relaxation time at the stopband. Furthermore, we implement subharmonic driving of Rabi oscillations with a $\pi$ pulse duration of 12 ns. Our demonstration of on-chip filters and efficient subharmonic driving in a two-dimensional quantum processor paves the way for a scalable qubit architecture with reduced thermal load and noise from the control line.
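
As a quick check on the numbers above, and assuming an idealized resonant square pulse with a constant effective Rabi rate $\Omega$ (an assumption; the abstract does not state the pulse shape), a $\pi$ pulse satisfies $\Omega t_\pi = \pi$, so the 12 ns duration corresponds to:

$$\frac{\Omega}{2\pi} = \frac{1}{2 t_\pi} = \frac{1}{2 \times 12\,\mathrm{ns}} \approx 41.7\,\mathrm{MHz}$$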

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# 連続シーケンス・ツー・シーケンスモデリングのための階層的状態空間モデル

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling ( http://arxiv.org/abs/2402.10211v2 )

ライセンス: Link先を確認

Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto,

(参考訳) 生のセンサデータのシーケンスから推論することは、医療機器からロボティクスまで、あらゆる分野にまたがる問題である。これらの問題はしばしば、望ましい物理量のシーケンス(例えば力、慣性測定)を予測するために、生のセンサーデータの長いシーケンス(例えば磁力計、ピエゾ抵抗器)を使用する。古典的なアプローチは、局所的に線形な予測問題には強力だが、実世界のセンサーを使用すると、しばしば性能が不足する。これらのセンサーは典型的には非線形であり、外部変数(例えば振動)に影響を受け、データ依存のドリフトを示す。多くの問題において、正解(ground-truth)ラベルを取得するには高価な機器を必要とするため、ラベル付きデータセットが小さく、予測タスクはさらに難しくなる。本研究では,連続的な系列予測のための概念的に単純な新手法である階層型状態空間モデル(HiSS)を提案する。 HiSSは、構造化状態空間モデルを積み重ねて時間的階層を作る。触覚に基づく状態予測から加速度計による慣性測定に至るまで、現実世界の6つのセンサデータセットにわたって、HiSSは、因果Transformer、LSTM、S4、Mambaといった最先端のシーケンスモデルを、MSEで少なくとも23%上回っている。我々の実験はさらに、HiSSがより小さなデータセットに対して効率的にスケールし、既存のデータフィルタリング技術と互換性があることを示している。コード、データセット、ビデオはhttps://hiss-csp.github.ioで見ることができる。

Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# Bayesian Online Multiple Testing: リソース割り当てアプローチ

Bayesian Online Multiple Testing: A Resource Allocation Approach ( http://arxiv.org/abs/2402.11425v4 )

ライセンス: Link先を確認

Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu,

(参考訳) 各実験が仮説検定タスクに対応する複数の実験を順次実施する問題を考察する。各時点において、実験者は、次の実験結果が到着する前に、帰無仮説を棄却するか(すなわち、発見を主張するか)という取り消し不能な決定をしなければならない。目標は、局所偽発見率(LFDR)で測定されるエラー率をすべての時点で低く保ちながら、発見回数を最大化することである。この問題を、外因性のランダムな予算補充を伴うオンラインナップサック問題として定式化する。まず、一般的な到着分布から始め、単純なポリシーが$O(\sqrt{T})$のリグレットを達成することを示す。このリグレット率が一般には改善不可能であることを示すことで、結果を補完する。次に、離散的な到着分布に焦点を移す。オンラインリソース割り当ての文献における多くの既存の再求解(re-solving)ヒューリスティックは、標準的な設定では有界損失を達成するものの、$\Omega(\sqrt{T})$あるいは$\Omega(T)$のリグレットを招きうる。標準的なポリシーは楽観的すぎて発見を過剰に主張する傾向にあることから,予算安全バッファを組み込んだ新たなポリシーを提案する。小さな対数オーダーのバッファを追加するだけで、リグレットを$\Omega(\sqrt{T})$または$\Omega(T)$から$O(\ln^2 T)$に減らすのに十分である。実用の観点からは、ポリシーを連続到着分布、時間依存情報構造、未知の$T$のシナリオに拡張する。合成実験と、ニューヨーク市のタクシー乗客の時系列データへの実証的応用の両方を行い,提案ポリシーの性能を検証した。本研究は,外因性予算補充を伴うオンライン資源配分問題において,効果的なポリシーをどのように設計すべきかを強調している。

We consider the problem of sequentially conducting multiple experiments where each experiment corresponds to a hypothesis testing task. At each time point, the experimenter must make an irrevocable decision of whether to reject the null hypothesis (or equivalently claim a discovery) before the next experimental result arrives. The goal is to maximize the number of discoveries while maintaining a low error rate at all time points measured by Local False Discovery Rate (LFDR). We formulate the problem as an online knapsack problem with exogenous random budget replenishment. We start with general arrival distributions and show that a simple policy achieves a $O(\sqrt{T})$ regret. We complement the result by showing that such regret rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, albeit achieve bounded loss in canonical settings, may incur a $\Omega(\sqrt{T})$ or even a $\Omega(T)$ regret. With the observation that canonical policies tend to be too optimistic and over claim discoveries, we propose a novel policy that incorporates budget safety buffers. It turns out that a little more safety can greatly enhance efficiency -- small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. From a practical perspective, we extend the policy to the scenario with continuous arrival distributions, time-dependent information structures, as well as unknown $T$. We conduct both synthetic experiments and empirical applications on a time series data from New York City taxi passengers to validate the performance of our proposed policies. Our results emphasize how effective policies should be designed in online resource allocation problems with exogenous budget replenishment.
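
The knapsack view can be illustrated with a small sketch. It assumes one common formalization that the abstract does not state explicitly: the error constraint is that the average local fdr of all claimed discoveries stays at or below a level `alpha`, so claiming a discovery with local fdr `w` costs `w - alpha` units of budget (a negative cost replenishes it). The greedy rule, the function name, and the fixed `buffer` are illustrative assumptions and are not the policy proposed in the paper.

```python
from typing import Iterable, List


def greedy_with_buffer(
    lfdrs: Iterable[float], alpha: float, buffer: float = 0.0
) -> List[bool]:
    """Claim a discovery whenever the budget stays at or above the buffer.

    Keeping the budget non-negative keeps the running average local fdr
    of the claimed discoveries at or below alpha.
    """
    budget = 0.0
    decisions = []
    for w in lfdrs:
        cost = w - alpha  # negative cost replenishes the budget
        if cost <= 0 or budget - cost >= buffer:
            budget -= cost
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# Hypothetical local fdr values arriving one at a time.
arrivals = [0.05, 0.30, 0.02, 0.15]
no_buffer = greedy_with_buffer(arrivals, alpha=0.1)
with_buffer = greedy_with_buffer(arrivals, alpha=0.1, buffer=0.1)
```

With no buffer the rule claims the fourth discovery and spends most of its budget; with a buffer it holds back, which is the kind of extra caution the abstract credits with lowering regret.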

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# FairProof : ニューラルネットワークの信頼性と証明可能な公正性

FairProof : Confidential and Certifiable Fairness for Neural Networks ( http://arxiv.org/abs/2402.12572v2 )

ライセンス: Link先を確認

Chhavi Yadav, Amrita Roy Chowdhury, Dan Boneh, Kamalika Chaudhuri,

(参考訳) 機械学習モデルは、社会的アプリケーションでますます使われているが、法的およびプライバシー上の懸念は、しばしば秘密にしておくことを要求する。その結果、モデル予測の受信端にいる消費者の心の中で、これらのモデルの公平性に関する不信が高まっている。この目的のために,Zero-Knowledge Proofs (暗号プリミティブ) を用いて,機密性を保ちながらモデルの公正性を公に検証するシステムである FairProof を提案する。また、ZKPに適合し、本システムで使用される完全連結ニューラルネットワークの公平性検証アルゴリズムを提案する。 Gnark で FairProof を実装し、我々のシステムが実際に実現可能であることを実証的に示す。コードはhttps://github.com/infinite-pursuits/FairProofで入手できる。

Machine learning models are increasingly used in societal applications, yet legal and privacy concerns demand that they very often be kept confidential. Consequently, there is a growing distrust about the fairness properties of these models in the minds of consumers, who are often at the receiving end of model predictions. To this end, we propose FairProof -- a system that uses Zero-Knowledge Proofs (a cryptographic primitive) to publicly verify the fairness of a model, while maintaining confidentiality. We also propose a fairness certification algorithm for fully-connected neural networks which is befitting to ZKPs and is used in this system. We implement FairProof in Gnark and demonstrate empirically that our system is practically feasible. Code is available at https://github.com/infinite-pursuits/FairProof.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# 教師なし概念発見は疑似相関を緩和する

Unsupervised Concept Discovery Mitigates Spurious Correlations ( http://arxiv.org/abs/2402.13368v2 )

ライセンス: Link先を確認

Md Rifat Arefin, Yan Zhang, Aristide Baratin, Francesco Locatello, Irina Rish, Dianbo Liu, Kenji Kawaguchi,

(参考訳) トレーニングデータ中の疑似相関(spurious correlation)に影響されやすいモデルは、しばしば脆い予測を生み、意図しないバイアスを導入する。この課題への対処には通常、疑似相関を取り除くために事前知識とグループアノテーションに依存する手法が用いられるが、これらは多くのアプリケーションでは容易に利用できない。本稿では,教師なしオブジェクト中心学習と疑似相関の緩和との間に新たな関連性を確立する。ラベルとの相関が異なるサブグループを直接推論する代わりに、本手法は、入力サンプル間で共有される離散的なアイデアである「概念」の発見に重点を置く。既存のオブジェクト中心表現学習を活用したCoBalTは,サブグループの人手によるラベル付けを必要とせずに疑似相関を効果的に緩和する概念バランス技術である。サブポピュレーションシフトのためのベンチマークデータセットによる評価は、グループアノテーションを必要とせずに、最先端のベースラインよりも優れた、あるいは競合的なパフォーマンスを示している。コードはhttps://github.com/rarefin/CoBalTで入手できる。

Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases. Addressing this challenge typically involves methods relying on prior knowledge and group annotation to remove spurious correlations, which may not be readily available in many applications. In this paper, we establish a novel connection between unsupervised object-centric learning and mitigation of spurious correlations. Instead of directly inferring subgroups with varying correlations with labels, our approach focuses on discovering concepts: discrete ideas that are shared across input samples. Leveraging existing object-centric representation learning, we introduce CoBalT: a concept balancing technique that effectively mitigates spurious correlations without requiring human labeling of subgroups. Evaluation across the benchmark datasets for sub-population shifts demonstrates superior or competitive performance compared to state-of-the-art baselines, without the need for group annotation. Code is available at https://github.com/rarefin/CoBalT.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# AIによる心理的仮説生成の自動化--大言語モデルが因果グラフに一致する場合

Automating psychological hypothesis generation with AI: when large language models meet causal graph ( http://arxiv.org/abs/2402.14424v3 )

ライセンス: Link先を確認

Song Tong, Kai Mao, Zhen Huang, Yukun Zhao, Kaiping Peng,

(参考訳) 因果知識グラフと大規模言語モデル(LLM)の相乗効果を利用して,心理学における計算的仮説生成のための画期的なアプローチを提案する。 LLMを用いて43,312本の心理学論文を分析し,因果関係のペアを抽出した。この分析により心理学に特化した因果グラフが得られた。リンク予測アルゴリズムを適用し,「ウェルビーイング(well-being)」に焦点をあてた130の心理学的仮説候補を生成し、博士課程の研究者が考案した研究アイデアおよびLLMのみで生成したアイデアと比較した。興味深いことに, LLMと因果グラフを組み合わせた本手法は, 新規性の観点で専門家レベルの洞察に匹敵し, LLMのみの仮説を明確に上回った(それぞれ t(59) = 3.34, p=0.007 および t(59) = 4.32, p<0.001)。この一致は、深層意味解析によってさらに裏付けられた。その結果, LLMと因果知識グラフなどの機械学習技術を組み合わせることで, 心理学における自動発見に革命をもたらし, 幅広い文献から新たな知見を抽出できることが示唆された。この研究は心理学と人工知能の交差点に位置し、心理学研究におけるデータ駆動型仮説生成のための新しい豊かなパラダイムを推進している。

Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using a LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on `well-being', then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of a LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p=0.007 and t(59) = 4.32, p<0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLM with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# 構造Marginalizationと自己回帰順序による効果的なベイズ因果推論

Effective Bayesian Causal Inference via Structural Marginalisation and Autoregressive Orders ( http://arxiv.org/abs/2402.14781v2 )

ライセンス: Link先を確認

Christian Toth, Christian Knoll, Franz Pernkopf, Robert Peharz,

(参考訳) ベイズ因果推論(BCI)は、真の因果モデルに関する認識論的(epistemic)不確実性を、因果モデルに対する事後平均化によって下流の因果推論タスクに自然に組み込む。しかし、周辺化すべき因果構造の数が扱いきれないほど多いため、これは非常に難しい計算問題となる。本研究では,構造学習問題を (i) 因果順序の推論と (ii) 因果順序が与えられたもとでの各変数の親集合の推論に分解する。変数あたりの親数を制限することで、多項式時間で親集合を厳密に周辺化でき、周辺化すべき対象として因果順序のみが残る。そこで本研究では,勾配法で学習可能な、因果順序に対する新しい自己回帰モデル(ARCO)を提案する。提案手法は, スケールフリーおよびErdos-Renyiグラフ構造を用いた非線形加法雑音のシミュレーションベンチマークにおいて構造学習の最先端性能を達成し、実世界のデータでも競争力のある結果を得る。さらに,本手法が介入分布を正確に推論できることを示し,これにより事後平均因果効果やその他多くの因果量を推定できる。

Bayesian causal inference (BCI) naturally incorporates epistemic uncertainty about the true causal model into down-stream causal reasoning tasks by posterior averaging over causal models. However, this poses a tremendously hard computational problem due to the intractable number of causal structures to marginalise over. In this work, we decompose the structure learning problem into inferring (i) a causal order and (ii) a parent set for each variable given a causal order. By limiting the number of parents per variable, we can exactly marginalise over the parent sets in polynomial time, which leaves only the causal order to be marginalised. To this end, we propose a novel autoregressive model over causal orders (ARCO) learnable with gradient-based methods. Our method yields state-of-the-art in structure learning on simulated non-linear additive noise benchmarks with scale-free and Erdos-Renyi graph structures, and competitive results on real-world data. Moreover, we illustrate that our method accurately infers interventional distributions, which allows us to estimate posterior average causal effects and many other causal quantities of interest.
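
The polynomial-time claim can be made concrete with a small counting sketch: given a causal order, a variable's parents must come from its predecessors, and capping the number of parents at `max_parents` leaves only polynomially many candidate sets to sum over. The function names and the toy order below are hypothetical; the sketch only enumerates and counts parent sets and does not implement ARCO itself.

```python
from itertools import combinations
from math import comb
from typing import Iterator, Sequence, Tuple


def parent_sets(
    order: Sequence[str], variable: str, max_parents: int
) -> Iterator[Tuple[str, ...]]:
    """Yield each parent set of `variable` allowed by `order`, up to the size cap."""
    predecessors = order[: order.index(variable)]
    for size in range(min(max_parents, len(predecessors)) + 1):
        yield from combinations(predecessors, size)


def count_parent_sets(num_predecessors: int, max_parents: int) -> int:
    """Closed-form count: sum of C(n, k) for k = 0 .. max_parents."""
    return sum(comb(num_predecessors, size) for size in range(max_parents + 1))


order = ["a", "b", "c", "d"]
sets_for_d = list(parent_sets(order, "d", max_parents=2))
capped = count_parent_sets(20, 2)   # grows like n^2 for a cap of 2
uncapped = 2 ** 20                  # every subset of 20 predecessors
```

For 20 predecessors the cap of two parents leaves 211 sets instead of 1,048,576, which is why exact marginalisation over parent sets becomes tractable once the order is fixed.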

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# 大きく考え、素早く生成する: 高速な自己回帰デコードのためのLLM-to-SLM

Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding ( http://arxiv.org/abs/2402.16844v2 )

ライセンス: Link先を確認

Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi,

(参考訳) 大規模言語モデル(LLM)は、実用上ユビキタスなものとなり、翻訳、要約、命令追従といった生成タスクに広く利用されている。しかし、その巨大なサイズと自己回帰デコードへの依存は、デプロイメントコストを増大させ、レイテンシクリティカルなアプリケーションでの使用を複雑にする。本研究では,異なる大きさの言語モデルを組み合わせて,高い性能を維持しながら自己回帰デコードの効率を向上させるハイブリッド手法を提案する。提案手法では, 事前学習済みの凍結したLLMが全てのプロンプトトークンを一度に並列に符号化し, 得られた表現を用いて小型言語モデル(SLM)を条件付け・誘導し, SLMが応答をより効率的に生成する。本研究では,エンコーダ・デコーダ型LLMと、異なるモデルファミリのエンコーダ・デコーダ型およびデコーダのみのSLMとの組み合わせを検討し,必要なのはSLMの微調整のみである。様々なベンチマークによる実験では、LLMと比較して、翻訳および要約タスクに対して$1-2\%$の小さなパフォーマンス低下で、最大$4\times$の大幅なスピードアップが示されている。

Large language models (LLMs) have become ubiquitous in practice and are widely used for generation tasks such as translation, summarization and instruction following. However, their enormous size and reliance on autoregressive decoding increase deployment costs and complicate their use in latency-critical applications. In this work, we propose a hybrid approach that combines language models of different sizes to increase the efficiency of autoregressive decoding while maintaining high performance. Our method utilizes a pretrained frozen LLM that encodes all prompt tokens once in parallel, and uses the resulting representations to condition and guide a small language model (SLM), which then generates the response more efficiently. We investigate the combination of encoder-decoder LLMs with both encoder-decoder and decoder-only SLMs from different model families and only require fine-tuning of the SLM. Experiments with various benchmarks show substantial speedups of up to $4\times$, with minor performance penalties of $1-2\%$ for translation and summarization tasks compared to the LLM.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# ディスクリプタとしてのユニバーサルニューラルネットワークポテンシャル:量子コンピュータと古典コンピュータを用いたスケーラブルな化学特性予測を目指して

Universal neural network potentials as descriptors: Towards scalable chemical property prediction using quantum and classical computers ( http://arxiv.org/abs/2402.18433v2 )

ライセンス: Link先を確認

Tomoya Shiota, Kenji Ishihara, Wataru Mizukami,

(参考訳) 多様な化学特性の正確な予測は、分子設計と材料発見の進展に不可欠である。本稿では,化学特性予測のための汎用記述子として,普遍的ニューラルネットワークポテンシャルの中間情報を利用する汎用的アプローチを提案する。本手法は, 汎用力場のための洗練されたニューラルネットワークアーキテクチャを訓練することにより, 原子環境の伝達可能な表現を学習する,という知見に基づいている。本稿では,M3GNet や MACE などのグラフニューラルネットワークポテンシャルを用いた伝達学習が,量子機械学習を用いたNMR化学シフトの予測手法に匹敵する精度を実現するとともに,記述子のコンパクトさにもかかわらず,標準的な古典回帰モデルも実現可能であることを示す。特に、MACEディスクリプタは、薬物分子の${^{13}}$C NMR化学シフトベンチマークにおいて、これまでで最高の精度を示している。この研究は、特性を正確に予測する効率的な方法を提供し、新しい分子や物質の発見を加速させる可能性がある。

Accurate prediction of diverse chemical properties is crucial for advancing molecular design and materials discovery. Here we present a versatile approach that uses the intermediate information of a universal neural network potential as a general-purpose descriptor for chemical property prediction. Our method is based on the insight that by training a sophisticated neural network architecture for universal force fields, it learns transferable representations of atomic environments. We show that transfer learning with graph neural network potentials such as M3GNet and MACE achieves accuracy comparable to state-of-the-art methods for predicting the NMR chemical shifts of using quantum machine learning as well as a standard classical regression model, despite the compactness of its descriptors. In particular, the MACE descriptor demonstrates the highest accuracy to date on the ${^{13}}$C NMR chemical shift benchmarks for drug molecules. This work provides an efficient way to accurately predict properties, potentially accelerating the discovery of new molecules and materials.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# 安定化器基底状態:理論、アルゴリズムおよび応用

Stabilizer ground states: theory, algorithms and applications ( http://arxiv.org/abs/2403.08441v2 )

ライセンス: Link先を確認

Jiace Sun, Lixue Cheng, Shi-Xin Zhang,

(参考訳) 安定化器状態は、単純な数学的構造のため、量子情報、量子エラー補正、量子回路シミュレーションで一般的に利用されてきた。本研究では、量子多体問題に対処するために安定化器状態を適用し、安定化器基底状態の概念を導入する。我々は、一般のパウリ・ハミルトニアンの安定化器基底状態を特定するための同値形式を確立した。さらに、1次元局所ハミルトニアンの安定化器基底状態を得るための厳密かつ線形スケールのアルゴリズムも開発し、したがって離散最適化は不要である。この同値形式と線形スケールのアルゴリズムは、有限サイズシステムだけでなく、無限周期システムにも適用可能である。アルゴリズムのスケーラビリティと効率は、さまざまなハミルトニアン上で数値的にベンチマークされる。最後に、安定化器基底状態は、量子システムの質的な理解だけでなく、古典的および量子コンピュータ上でのより高度な量子状態アンザッツの基盤としても有望なツールであることを示す。

Stabilizer states have been commonly utilized in quantum information, quantum error correction, and quantum circuit simulation due to their simple mathematical structure. In this work, we apply stabilizer states to tackle quantum many-body problems and introduce the concept of stabilizer ground states. We establish an equivalence formalism for identifying stabilizer ground states of general Pauli Hamiltonians. Moreover, we develop an exact, linear-scaling algorithm to obtain stabilizer ground states of 1D local Hamiltonians, which is therefore free from discrete optimization. The proposed equivalence formalism and linear-scaling algorithm are applicable not only to finite-size systems but also to infinite periodic systems. The scalability and efficiency of the algorithms are numerically benchmarked on different Hamiltonians. Finally, we demonstrate that stabilizer ground states are promising not only as tools for the qualitative understanding of quantum systems, but also as cornerstones of more advanced quantum state ansatzes on both classical and quantum computers.

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# Q学習者に対する戦略化:制御理論的アプローチ

Strategizing against Q-learners: A Control-theoretical Approach ( http://arxiv.org/abs/2403.08906v3 )

ライセンス: Link先を確認

Yuksel Arslantas, Ege Yuceel, Muhammed O. Sayin,

(参考訳) 本稿では,従来の多エージェント強化学習手法である独立Q-ラーニングアルゴリズム(Q-ラーニングアルゴリズム)の,通常型ゲームにおける高度な対戦相手の戦略的操作に対する感受性について検討する。敵のQ-ラーニングアルゴリズムを知っていれば、いかに戦略的に洗練されたエージェントが素質のQ-ラーナーを活用できるかを定量化する。この目的のために、戦略アクターの相互作用を確率ゲーム(Q-学習者のQ-関数推定を含む状態)として定式化し、Q-学習アルゴリズムが基礎となる力学系であるようにする。また、連続状態空間への量子化に基づく近似手法を提案し、競合する2人の戦略的アクターと1人の戦略的アクターのパフォーマンスを解析的および数値的に解析する。

In this paper, we explore the susceptibility of independent Q-learning algorithms (a classical and widely used multi-agent reinforcement learning method) to strategic manipulation by sophisticated opponents in normal-form games played repeatedly. We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents' Q-learning algorithm. To this end, we formulate the strategic actors' interactions as a stochastic game (whose state encompasses the Q-function estimates of the Q-learners), treating the Q-learning algorithms as the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space and analyze its performance, both analytically and numerically, for two competing strategic actors and for a single strategic actor.
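
The setting can be pictured with a minimal sketch, under loud assumptions: the Q-learner is stateless (one Q-value per action), uses a plain step-size update, and acts greedily; the strategic actor sees the learner's Q-vector only through a uniform grid. The names `q_update`, `greedy_action`, and `quantize`, and all parameter values, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def q_update(q: List[float], action: int, reward: float,
             alpha: float = 0.1, gamma: float = 0.0) -> List[float]:
    """One stateless Q-learning step for the action that was just played.

    Assumption: the target is reward + gamma * max(q); with gamma = 0 the
    update is a running average of the observed rewards.
    """
    target = reward + gamma * max(q)
    new_q = list(q)
    new_q[action] += alpha * (target - q[action])
    return new_q


def greedy_action(q: List[float]) -> int:
    """Index of the largest Q-value (ties go to the lowest index)."""
    return max(range(len(q)), key=lambda a: q[a])


def quantize(q: List[float], lo: float, hi: float, bins: int) -> Tuple[int, ...]:
    """Map a continuous Q-vector to a cell of a uniform grid on [lo, hi].

    The resulting tuple can serve as a discrete state for the strategic
    actor's planning problem.
    """
    width = (hi - lo) / bins
    cell = []
    for value in q:
        clipped = min(max(value, lo), hi)
        cell.append(min(int((clipped - lo) / width), bins - 1))
    return tuple(cell)
```

A strategic actor that knows `q_update` can predict how each of its own actions moves the learner's quantized state, which is what makes the interaction plannable as a game over Q-function estimates.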

翻訳日:2024-07-17 21:30:11 公開日:2024-07-16

# PYRA: トレーニング推論効率の良いタスク適応のための並列収量再活性化

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation ( http://arxiv.org/abs/2403.09192v3 )

ライセンス: Link先を確認

Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding,

(参考訳) 近年, 変圧器の規模が急速に拡大し, タスク適応の分野において, トレーニングオーバーヘッドや推論効率の面で大きな課題がもたらされている。既存の研究、すなわちパラメータ効率のよいファインチューニング(PEFT)とモデル圧縮は、これらの課題を別々に検討している。しかしPEFTは、特に大規模モデルでは、元のバックボーンの推論効率を保証できない。モデル圧縮は構造探索と再訓練にかなりの訓練コストを必要とする。したがって、これらの単純な組み合わせは、最小のコストでトレーニング効率と推論効率の両方を達成することを保証できない。本稿では,PYRA(Parallel Yielding Re-Activation)手法を提案する。 PYRAは、まず並列出力適応重みを利用して、下流タスクのデータ分布を包括的に知覚する。その後、トークン変調のための再活性化戦略がマージされるトークンに適用され、キャリブレーションされたトークン特徴が導かれる。 PYRAは低圧縮率と高圧縮率の両方で競合する全ての手法より優れており、大規模基礎モデルのトレーニング効率と推論効率の両面において、PYRAの有効性と優位性を示している。私たちのコードはhttps://github.com/THU-MIG/PYRAで公開されています。

Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, especially for large-scale models. Model compression requires significant training costs for structure searching and re-training. Consequently, a simple combination of them cannot guarantee accomplishing both training efficiency and inference efficiency with minimal costs. In this paper, we propose a novel Parallel Yielding Re-Activation (PYRA) method for such a challenge of training-inference efficient task adaptation. PYRA first utilizes parallel yielding adaptive weights to comprehensively perceive the data distribution in downstream tasks. A re-activation strategy for token modulation is then applied for tokens to be merged, leading to calibrated token features. Extensive experiments demonstrate that PYRA outperforms all competing methods under both low compression rate and high compression rate, demonstrating its effectiveness and superiority in maintaining both training efficiency and inference efficiency for large-scale foundation models. Our code is available at https://github.com/THU-MIG/PYRA.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# SCP-Diff:拡散に基づくセマンティック画像合成のための空間カテゴリー結合

SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis ( http://arxiv.org/abs/2403.09638v2 )

ライセンス: Link先を確認

Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao,

(参考訳) セマンティック画像合成(SIS)は、センサシミュレーションに良い可能性を示している。しかし、この分野の現在のベストプラクティスは、GANに基づいており、まだ望ましい品質レベルに達していません。遅延拡散モデルが画像生成において顕著な進歩を遂げる中、我々はその高密度制御能力の顕著な方法である制御ネットを評価するよう促される。調査の結果,大きなセマンティック領域に奇妙なサブ構造が存在すること,セマンティックマスクによるコンテンツ調整の誤り,という2つの大きな問題が明らかになった。実験的な研究を通じて,これらの問題の原因を,推測段階で適用される雑音付きトレーニングデータ分布と標準正規値とのミスマッチとして特定した。この課題に対処するために、推論に先立って、空間的、カテゴリー的、および新しい空間的カテゴリー的関節を含む、SISの特定のノイズ先行法を開発した。 SCP-Diffという名前のこのアプローチは、SIS on Cityscapes, ADE20K and COCO-Stuffにおいて、新しい最先端の成果を設定し、Cityscapesでは10.53という低いFIDが得られる。コードとモデルはプロジェクトページからアクセスすることができる。

Semantic image synthesis (SIS) shows good promise for sensor simulation. However, current best practices in this field, based on GANs, have not yet reached the desired level of quality. As latent diffusion models make significant strides in image generation, we are prompted to evaluate ControlNet, a method notable for its dense control capabilities. Our investigation uncovered two primary issues with its results: the presence of weird sub-structures within large semantic areas and the misalignment of content with the semantic mask. Through empirical study, we pinpointed the cause of these problems as a mismatch between the noised training data distribution and the standard normal prior applied at the inference stage. To address this challenge, we developed specific noise priors for SIS, encompassing spatial, categorical, and a novel spatial-categorical joint prior for inference. This approach, which we have named SCP-Diff, has set new state-of-the-art results in SIS on Cityscapes, ADE20K and COCO-Stuff, yielding an FID as low as 10.53 on Cityscapes. The code and models can be accessed via the project page.
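
One way to picture the categorical part of such a noise prior is the rough sketch below. It assumes flattened, single-channel latents that have already been noised to the final diffusion step, each paired with per-pixel class labels. The helper names `class_stats` and `sample_prior` are hypothetical, and nothing here is taken from the SCP-Diff code.

```python
import random
from typing import Dict, List, Tuple


def class_stats(latents: List[List[float]],
                masks: List[List[int]]) -> Dict[int, Tuple[float, float]]:
    """Pool noised latent values by semantic class; return (mean, std) per class."""
    total: Dict[int, float] = {}
    total_sq: Dict[int, float] = {}
    count: Dict[int, int] = {}
    for latent, mask in zip(latents, masks):
        for value, label in zip(latent, mask):
            total[label] = total.get(label, 0.0) + value
            total_sq[label] = total_sq.get(label, 0.0) + value * value
            count[label] = count.get(label, 0) + 1
    stats = {}
    for label, n in count.items():
        mean = total[label] / n
        var = max(total_sq[label] / n - mean * mean, 0.0)  # guard against rounding
        stats[label] = (mean, var ** 0.5)
    return stats


def sample_prior(mask: List[int],
                 stats: Dict[int, Tuple[float, float]],
                 rng: random.Random) -> List[float]:
    """Draw the initial noise per pixel from its class's Gaussian, not from N(0, 1)."""
    return [rng.gauss(*stats[label]) for label in mask]
```

Starting inference from `sample_prior(mask, stats, rng)` rather than standard normal noise is the kind of substitution the abstract describes; a spatial prior would pool by pixel position instead of by class, and a joint prior by both.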

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# ニューロシンボリックビデオ理解に向けて

Towards Neuro-Symbolic Video Understanding ( http://arxiv.org/abs/2403.11021v2 )

ライセンス: Link先を確認

Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali,

(参考訳) 近年のビデオデータ生産の急激な増加は、下流のタスクのためにビデオから意味のあるフレームを抽出する効率的なツールを必要としている。長期的時間的推論は、フレーム検索システムにとって重要なデシダータムである。 VideoLLaMAやViCLIPのような最先端の基盤モデルは、短期的な意味理解に熟練しているが、フレーム間の長期的な推論では驚くほど失敗する。この失敗の主な理由は、フレーム単位の認識と時間的推論を1つのディープネットワークに織り込むためである。したがって、効率的なシーン識別には、疎結合だが協調設計のセマンティック理解と時間的推論が不可欠である。本稿では,個々のフレームのセマンティック理解に視覚言語モデルを活用するシステムを提案する。我々のTLベースの推論は、WaymoやNuScenesといった最先端の自動運転データセットの推論にGPT4を使用するベンチマークと比較して、複雑なイベント識別のF1スコアを9～15%改善します。

The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. Hence, decoupling but co-designing semantic understanding and temporal reasoning is essential for efficient scene identification. We propose a system that leverages vision-language models for semantic understanding of individual frames but effectively reasons about the long-term evolution of events using state machines and temporal logic (TL) formulae that inherently capture memory. Our TL-based reasoning improves the F1 score of complex event identification by 9-15% compared to benchmarks that use GPT4 for reasoning on state-of-the-art self-driving datasets such as Waymo and NuScenes.
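
The decoupling described above can be illustrated with a small sketch. It assumes a per-frame perception model has already turned each frame into a set of detected proposition labels, and it checks one simple temporal pattern ("`first` holds, and `then` holds in some strictly later frame") with a two-state machine. The function name `find_event` and the labels are hypothetical; the paper's actual temporal logic and automaton construction are not given in the abstract.

```python
from typing import List, Optional, Set, Tuple


def find_event(frames: List[Set[str]], first: str,
               then: str) -> Optional[Tuple[int, int]]:
    """Return (i, j): the earliest frame i where `first` holds and the earliest
    frame j > i where `then` holds, or None if the pattern never completes.

    The state variable is the memory: perception is per frame, while the
    ordering constraint lives entirely in the state machine.
    """
    state = "WAIT_FIRST"
    start = None
    for index, propositions in enumerate(frames):
        if state == "WAIT_FIRST" and first in propositions:
            state, start = "WAIT_THEN", index
        elif state == "WAIT_THEN" and then in propositions:
            return (start, index)
    return None


# Per-frame labels as a hypothetical vision-language model might emit them.
frames = [{"car"}, {"pedestrian"}, {"car"}, {"pedestrian", "car"}, set()]
match = find_event(frames, "pedestrian", "car")
```

Because the state machine only consumes per-frame labels, the perception model can be swapped without touching the temporal reasoning.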

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# 教師なしマルチクラス異常検出のための統一参照表現の学習

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection ( http://arxiv.org/abs/2403.11561v2 )

ライセンス: Link先を確認

Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang,

(参考訳) 多クラス異常検出の分野では、単一クラス異常検出から導かれる再構成に基づく手法は、「学習ショートカット」というよく知られた課題に直面し、モデルが通常のサンプルのパターンを学習するのに失敗し、その代わりにアイデンティティマッピングや人工ノイズ除去などのショートカットを選択する。結果として、モデルは通常のインスタンスとして真の異常を再構築することができなくなり、結果として異常検出が失敗する。本稿では,RLR (Reconstruct features from a Learnable Reference representation) と呼ばれる新しい特徴再構成に基づく異常検出フレームワークを提案する。従来の方法とは異なり、RLRは学習可能な参照表現を使用して、モデルに正常な特徴パターンを明示的に学習させる。さらに、RLRは学習可能な参照に局所性制約を組み込んで、より効果的な正常なパターンキャプチャを容易にし、マスク付き学習可能なキーアテンション機構を使用して堅牢性を高める。 15カテゴリのMVTec-ADデータセットと12カテゴリのVisAデータセットによるRLRの評価は、統一された設定下での最先端手法と比較して優れた性能を示している。 RLRのコードは公開されます。

In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination. Consequently, the model becomes unable to reconstruct genuine anomalies as normal instances, resulting in a failure of anomaly detection. To counter this issue, we present a novel unified feature reconstruction-based anomaly detection framework termed RLR (Reconstruct features from a Learnable Reference representation). Unlike previous methods, RLR utilizes learnable reference representations to compel the model to learn normal feature patterns explicitly, thereby preventing the model from succumbing to the "learning shortcuts" issue. Additionally, RLR incorporates locality constraints into the learnable reference to facilitate more effective normal pattern capture and utilizes a masked learnable key attention mechanism to enhance robustness. Evaluation of RLR on the 15-category MVTec-AD dataset and the 12-category VisA dataset shows superior performance compared to state-of-the-art methods under the unified setting. The code of RLR will be publicly available.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# GVGEN: ボリューム表現によるテキストから3D生成

GVGEN: Text-to-3D Generation with Volumetric Representation ( http://arxiv.org/abs/2403.12957v2 )

ライセンス: Link先を確認

Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He,

(参考訳) 近年, 高速かつ高品質なレンダリング機能で知られる3次元再構成・生成技術として, 3次元ガウシアンスプラッティングが登場している。これらの欠点に対処するために,テキスト入力から3次元ガウス表現を効率的に生成する新しい拡散型フレームワークGVGENを提案する。本稿では2つの革新的な手法を提案する。 (1)構造化体積表現。まず、無秩序な3次元ガウス点を構造化された形式であるGaussianVolumeとして配置する。この変換により、一定数のガウスからなる体積内で複雑なテクスチャの詳細を捉えることができる。これらの詳細の表現を最適化するために,Candidate Pool Strategy という独自のプルーニング・デンシフィケーション手法を提案し、選択的な最適化によって詳細の忠実度を高める。 (2)粗から細への生成パイプライン。GaussianVolumeの生成を単純化し、詳細な3次元形状を持つインスタンスをモデルが生成できるようにするため、粗から細へのパイプラインを提案する。最初は基本的な幾何学構造を構築し、続いて完全なガウス属性の予測を行う。筆者らのフレームワークであるGVGENは,既存の3次元生成法と比較して質的,定量的な評価において優れた性能を示す。同時に、高速な生成速度(約7秒)を維持し、品質と効率のバランスを効果的に実現している。私たちのプロジェクトページは https://gvgen.github.io/ です。

In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points into a structured form, GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. At the same time, it maintains a fast generation speed ($\sim$7 seconds), effectively striking a balance between quality and efficiency. Our project page is: https://gvgen.github.io/

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# 視覚変換器の回転位置埋め込み

Rotary Position Embedding for Vision Transformer ( http://arxiv.org/abs/2403.13298v2 )

ライセンス: Link先を確認

Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun,

(参考訳) RoPE(Rotary Position Embedding)は、特にトランスフォーマーの長さ外挿において、言語モデルにおいて顕著に機能する。しかし、RoPEは視覚変換器(ViT)の性能を言語ドメインと似た方法で向上させることができるにもかかわらず、コンピュータビジョン領域に対するRoPEの影響は過小評価されている。本研究では,2次元視覚データに対するRoPEの実践的実装を利用して,VTに適用したRoPEの包括的解析を行う。解析の結果、RoPEは印象的な外挿性能、すなわち推論時の画像分解能を高めながら精度を維持できることが判明した。最終的にImageNet-1k、COCO検出、ADE-20kセグメンテーションのパフォーマンスが向上した。この研究は、RoPEをViTに適用するための徹底的なガイドラインを提供し、計算オーバーヘッドを最小限に抑えたバックボーン性能の向上を約束する。私たちのコードと事前訓練済みモデルはhttps://github.com/naver-ai/rope-vitで利用可能です。

Rotary Position Embedding (RoPE) performs remarkably well on language models, especially for length extrapolation of Transformers. However, the impact of RoPE on computer vision domains has been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to ViTs, utilizing practical implementations of RoPE for 2D vision data. The analysis reveals that RoPE demonstrates impressive extrapolation performance, i.e., maintaining precision while increasing image resolution at inference. It eventually leads to performance improvement for ImageNet-1k, COCO detection, and ADE-20k segmentation. We believe this study provides thorough guidelines for applying RoPE to ViT, promising improved backbone performance with minimal extra computational overhead. Our code and pre-trained models are available at https://github.com/naver-ai/rope-vit
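
As an illustration of what RoPE for 2D data can look like, the sketch below uses an axial layout: the first half of the channels is rotated according to the patch's x coordinate and the second half according to its y coordinate. This layout, the `base` value, and the function names are assumptions made for the example; the repository linked above is the reference for the variants actually studied.

```python
import math
from typing import List


def rotate_pairs(vec: List[float], pos: float, base: float) -> List[float]:
    """1D RoPE: rotate each pair (vec[2k], vec[2k+1]) by pos * base**(-2k/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2k, so the exponent is -2k/d
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_2d(vec: List[float], x: float, y: float, base: float = 100.0) -> List[float]:
    """Axial 2D RoPE; len(vec) must be divisible by 4."""
    half = len(vec) // 2
    return rotate_pairs(vec[:half], x, base) + rotate_pairs(vec[half:], y, base)


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))
```

Because every channel pair is only rotated, the query-key dot product depends on the offset between the two patch positions rather than on their absolute coordinates. That property is what allows the same embedding to be evaluated on a larger patch grid at inference.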

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# DaCapo: ビデオ分析のための自律システムにおける継続的学習の高速化

DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics ( http://arxiv.org/abs/2403.14353v3 )

ライセンス: Link先を確認

Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park,

(参考訳) ディープニューラルネットワーク(DNN)ビデオ分析は、自動運転車、無人航空機(UAV)、セキュリティロボットなどの自律システムにとって不可欠である。しかし、実際のデプロイメントは、計算リソースの制限とバッテリ電力のために困難に直面している。これらの課題に取り組むために、継続的学習は、デプロイメント(推論)における軽量な"学生"モデルを利用し、サンプルデータ(ラベル付け)のラベル付けにより大きな"教師"モデルを活用し、変化するシナリオ(トレーニング)に適応するために、学生モデルを継続的に再トレーニングする。本稿では,1)推論とラベリングの計算ニーズを見越しながら,リトレーニングのための計算に重点を置くこと,(2)バッテリー駆動の自律システムには適さないパワーハングリーGPUに依存すること,(3)マルチテナントシナリオを想定したリモート集中型サーバ上に置かれること,そして,プライバシー,ネットワーク可用性,レイテンシに関する懸念から,自律システムには適さないこと,といった,最先端の継続的学習システムの限界を強調した。本研究では,自律型システムによる推論,ラベル付け,トレーニングの同時実行を実現するためのハードウェアアルゴリズムであるDaCapoを提案する。 DaCapoは,(1)サブアクセラレータ上のカーネルをそれぞれの精度で並列実行可能な空間分割可能かつ高精度な加速器と,(2)資源・正確性トレードオフ空間を戦略的にナビゲートし,資源割り当ての最適決定を容易にする時空間資源割り当てアルゴリズムを備える。評価の結果,DaCapoは最先端のGPUベースの継続的学習システムであるEkyaとEOMUよりも6.5%,5.5%高い精度を実現し,消費電力は254倍減少した。

Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots. However, real-world deployment faces challenges due to their limited computational resources and battery power. To tackle these challenges, continuous learning exploits a lightweight "student" model at deployment (inference), leverages a larger "teacher" model for labeling sampled data (labeling), and continuously retrains the student model to adapt to changing scenarios (retraining). This paper highlights the limitations of state-of-the-art continuous learning systems: (1) they focus on computations for retraining, while overlooking the compute needs for inference and labeling, (2) they rely on power-hungry GPUs, unsuitable for battery-operated autonomous systems, and (3) they are located on a remote centralized server, intended for multi-tenant scenarios, again unsuitable for autonomous systems due to privacy, network availability, and latency concerns. We propose a hardware-algorithm co-designed solution for continuous learning, DaCapo, that enables autonomous systems to perform concurrent executions of inference, labeling, and training in a performant and energy-efficient manner. DaCapo comprises (1) a spatially-partitionable and precision-flexible accelerator enabling parallel execution of kernels on sub-accelerators at their respective precisions, and (2) a spatiotemporal resource allocation algorithm that strategically navigates the resource-accuracy tradeoff space, facilitating optimal decisions for resource allocation to achieve maximal accuracy. Our evaluation shows that DaCapo achieves 6.5% and 5.5% higher accuracy than the state-of-the-art GPU-based continuous learning systems Ekya and EOMU, respectively, while consuming 254x less power.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# 格子量子色力学の量子シミュレーションのためのQu8its

Qu8its for Quantum Simulations of Lattice Quantum Chromodynamics ( http://arxiv.org/abs/2403.14537v2 )

ライセンス: Link先を確認

Marc Illa, Caroline E. P. Robin, Martin J. Savage,

(参考訳) 1+1D SU(3)格子量子色力学の力学の量子シミュレーションにおける$d=8$ qudits, qu8itsの有用性を探求する。並列ゲートの応用の最近の進歩は、2クォーディット演算と比較して単一クォーディット演算の適用時間が短くなり、量子シミュレーションの忠実度や量子ビットではなくクォーディットを用いた回路深度において大きな利点が期待できる。 qu8itsを用いた時間進化に必要な2量子エンタングリングゲートの数は、qubitsよりも5倍以下であることが判明した。この研究で示された発展により、新しい量子ハードウェアを用いて改良された量子シミュレーションが実行できるようになることを期待する。

We explore the utility of $d=8$ qudits, qu8its, for quantum simulations of the dynamics of 1+1D SU(3) lattice quantum chromodynamics, including a mapping for arbitrary numbers of flavors and lattice size and a re-organization of the Hamiltonian for efficient time-evolution. Recent advances in parallel gate applications, along with the shorter application times of single-qudit operations compared with two-qudit operations, lead to significant projected advantages in quantum simulation fidelities and circuit depths using qu8its rather than qubits. The number of two-qudit entangling gates required for time evolution using qu8its is found to be more than a factor of five fewer than for qubits. We anticipate that the developments presented in this work will enable improved quantum simulations to be performed using emerging quantum hardware.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# データ増幅学習による簡潔で高品質な顔作り

Toward Tiny and High-quality Facial Makeup with Data Amplify Learning ( http://arxiv.org/abs/2403.15033v3 )

ライセンス: Link先を確認

Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni,

(参考訳) 現代のメイクアップ手法は主にペアのない(unpaired)学習パラダイムに依存しているが、不正確な教師信号(例えば、顔の位置ずれ)と複雑な顔プロンプト(顔解析やランドマーク検出を含む)という課題を抱えている。これらの課題は、特にモバイルデバイスにおける顔メイクモデルの低コスト展開を妨げている。以上の問題を解決するために、我々は「データ増幅学習(DAL)」と呼ばれる新しい学習パラダイムと、「TinyBeauty」というコンパクトなメイクモデルを提案する。 DALの中核となる考え方は、DDA(Diffusion-based Data Amplifier)を使用して、モデルトレーニングのための限られた画像を「増幅」し、わずかなアノテーションだけで正確なピクセル単位の教師信号を可能にすることである。 DDAにおける2つの重要な工夫がこの学習手法を支えている。(1)残差拡散モデル(RDM)は、高忠実度の詳細を生成し、通常の拡散モデルにおける詳細消失問題を回避するよう設計されている。(2)ファイングレインドメイクアップモジュール(FGMM)は、顔のアイデンティティを維持しながら正確なメイクアップ制御と組み合わせを実現するために提案されている。 DALと組み合わせることで、TinyBeautyはわずか80Kパラメータで、複雑な顔プロンプトなしに最先端の性能を実現する。一方、TinyBeautyはiPhone 13で最大460fpsという驚くべき推論速度を実現している。大規模な実験により、DALは5つの画像ペアだけで非常に競争力のあるメイクモデルを作成できることが示された。

Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve the above problems, we propose a brand-new learning paradigm, termed "Data Amplify Learning (DAL)," alongside a compact makeup model named "TinyBeauty." The core idea of DAL lies in employing a Diffusion-based Data Amplifier (DDA) to "amplify" limited images for model training, thereby enabling accurate pixel-to-pixel supervision with merely a handful of annotations. Two pivotal innovations in DDA facilitate the above training approach: (1) A Residual Diffusion Model (RDM) is designed to generate high-fidelity detail and circumvent the detail vanishing problem in vanilla diffusion models; (2) A Fine-Grained Makeup Module (FGMM) is proposed to achieve precise makeup control and combination while retaining face identity. Coupled with DAL, TinyBeauty needs merely 80K parameters to achieve state-of-the-art performance without intricate face prompts. Meanwhile, TinyBeauty achieves a remarkable inference speed of up to 460 fps on the iPhone 13. Extensive experiments show that DAL can produce highly competitive makeup models using only 5 image pairs.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# 憎しみは無知から生じる! 会話のヘイトスピーチに対抗する説得モードの蒸留

Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech ( http://arxiv.org/abs/2403.15449v2 )

ライセンス: Link先を確認

Ghadi Alyahya, Abeer Aldayel,

(参考訳) カウンター音声が使用する要因を調べることは、オンラインでヘイトスピーチに直面する最適な方法を理解するための中核にある。様々な研究は、感情的共感、攻撃性、敵意など、カウンタースピーチで使用される感情的基盤因子を評価してきた。本研究は、会話で使用される対語をより深く理解するために、説得様式を理性、感情、信頼性に精査し、人種差別、セクシズム、宗教的な偏見に関するクローズド(複数ターン)とオープン(単ターン)の2つのタイプの会話相互作用におけるそれらの使用を評価した。この評価は、機械が生成した逆音声とは対照的に、人間のソースで見られる異なる振る舞いをカバーしている。また、対訳に見られる姿勢と説得の態様との相互作用を評価する。特に、オープン・クローズドな相互作用において用いられる対音声の説得モードの微妙な相違、特にトピックの観点からは、コメントを憎むための対位法を表現するために、説得モードとして理性を用いる傾向が一般的である。機械が生成したカウンター音声は感情的な説得モードを示す傾向があり、人間のカウンターは理性に傾いている。さらに,本研究は,他の説得モードよりも支援的な応答が得られる傾向が示唆された。この知見は、ヘイトスピーチに対する研究に説得モードを組み込むことの可能性を強調し、それらが説明可能性の最適な手段として機能し、回答のスタンスをさらに導入する方法と、それが最適なカウンター音声を構成するものを評価する上で果たす役割を明らかにする。

Examining the factors that counterspeech uses is at the core of understanding the optimal methods for confronting hate speech online. Various studies have assessed the emotional base factors used in counterspeech, such as emotional empathy, offensiveness, and hostility. To better understand the counterspeech used in conversations, this study distills persuasion modes into reason, emotion, and credibility and evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) concerning racism, sexism, and religious bigotry. The evaluation covers the distinct behaviors seen with human-sourced as opposed to machine-generated counterspeech. It also assesses the interplay between the stance taken and the mode of persuasion seen in the counterspeech. Notably, we observe nuanced differences in the counterspeech persuasion modes used in open and closed interactions, especially in terms of the topic, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The machine-generated counterspeech tends to exhibit an emotional persuasion mode, while human counters lean toward reason. Furthermore, our study shows that reason tends to obtain more supportive replies than other persuasion modes. The findings highlight the potential for incorporating persuasion modes into studies about countering hate speech, as they can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what comprises the optimal counterspeech.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# インターフュージョン:3次元ヒューマンオブジェクトインタラクションのテキスト駆動生成

InterFusion: Text-Driven Generation of 3D Human-Object Interaction ( http://arxiv.org/abs/2403.15612v2 )

ライセンス: Link先を確認

Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu, Ruizhen Hu,

(参考訳) 本研究では,ゼロショットテキスト・ツー・3D方式でテキスト記述から3次元オブジェクト間インタラクション(HOI)を生成する複雑な課題に取り組む。 HOIにおける直接テキスト・ツー・3D手法の不満足な結果は主にペアのテキスト・インタラクションデータがないことによるものであり、複雑な空間的関係を持つ複数の概念を同時に生成する上で固有の困難さである。これらの問題を効果的に解決するために,HOI生成用に設計された2段階のフレームワークであるInterFusionを提案する。インターフュージョンは、テキストから派生した人間のポーズ推定を幾何学的先行として含み、テキストから3Dへの変換プロセスを単純化し、正確なオブジェクト生成のための追加の制約を導入する。最初の段階では、InterFusionは、幅広いインタラクションを描写した合成画像データセットから3Dの人間のポーズを抽出し、その後、これらのポーズをインタラクション記述にマッピングする。 InterFusionの第2段階は、テキストから3D生成の最新の発展を活かし、現実的で高品質な3D HOIシーンを制作できる。これは、人体とオブジェクトの生成を別々に最適化し、シーン全体のグローバルな最適化と共同で洗練し、シームレスでコンテキスト的に一貫性のある統合を保証する、ローカル・グローバルな最適化プロセスによって達成される。実験の結果,InterFusionは3次元HOI生成において既存の最先端手法よりも優れていたことが確認された。

In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with complex spatial relationships. To effectively address these issues, we present InterFusion, a two-stage framework specifically designed for HOI generation. InterFusion involves human pose estimations derived from text as geometric priors, which simplifies the text-to-3D conversion process and introduces additional constraints for accurate object generation. At the first stage, InterFusion extracts 3D human poses from a synthesized image dataset depicting a wide range of interactions, subsequently mapping these poses to interaction descriptions. The second stage of InterFusion capitalizes on the latest developments in text-to-3D generation, enabling the production of realistic and high-quality 3D HOI scenes. This is achieved through a local-global optimization process, where the generation of human body and object is optimized separately, and jointly refined with a global optimization of the entire scene, ensuring a seamless and contextually coherent integration. Our experimental results affirm that InterFusion significantly outperforms existing state-of-the-art methods in 3D HOI generation.

翻訳日:2024-07-17 21:18:43 公開日:2024-07-16

# 空間・時間・脳におけるバックプロパゲーション

Backpropagation through space, time, and the brain ( http://arxiv.org/abs/2403.16933v2 )

ライセンス: Link先を確認

Benjamin Ellenberger, Paul Haider, Jakob Jordan, Kevin Max, Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici,

(参考訳) 時空間的局所性制約によって束縛された神経細胞の物理的ネットワークが、いかに効率的なクレジット割り当てを行うことができるかは、大きな疑問が残る。機械学習では、その答えは空間と時間の両方を通して、ほとんど普遍的にエラーのバックプロパゲーションアルゴリズムによって与えられる。しかし、このアルゴリズムは生物学的に証明できない仮定、特に時空間(非局所性)に頼っていることはよく知られている。リアルタイム・リカレント・ラーニングのような別のフォワード・プロパゲーション・モデルでは、局所性の問題が部分的に解決されるが、ストレージの要求が禁止されているため、スケーリングのコストに限られる。本稿では,ニューロンの物理的,動的ネットワークにおける完全局所的時空間クレジット割り当てのための計算フレームワークであるGeneralized Latent Equilibrium (GLE)を紹介する。まず、ニューロン局所的なミスマッチに基づいてエネルギーを定義し、そこから定常性による神経力学と勾配降下によるパラメータ力学の両方を導出する。結果として生じる力学は、連続的な活動的な局所シナプス可塑性を持つ深部皮質神経回路網における空間と時間を通しての、リアルタイムで生物学的に妥当なバックプロパゲーションの近似と解釈できる。特に、GLEは樹状体の木の形態を利用して、単一ニューロンにおけるより複雑な情報保存と処理を可能にし、情報伝達の両方向において必須である膜電位に関して、生物学的ニューロンが出力速度を位相シフトさせる能力も持っている。前方の計算では、時間連続入力をニューロン空間にマッピングすることができ、時空間の時空間畳み込みを効果的に行うことができる。後ろ向きの計算では、フィードバック信号の時間反転を許容し、結果として有用なパラメータ更新に必要な随伴変数を近似する。

How physical networks of neurons, bound by spatio-temporal locality constraints, can perform efficient credit assignment, remains, to a large extent, an open question. In machine learning, the answer is almost universally given by the error backpropagation algorithm, through both space and time. However, this algorithm is well-known to rely on biologically implausible assumptions, in particular with respect to spatio-temporal (non-)locality. Alternative forward-propagation models such as real-time recurrent learning only partially solve the locality problem, but only at the cost of scaling, due to prohibitive storage requirements. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of backpropagation through space and time in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the morphology of dendritic trees to enable more complex information storage and processing in single neurons, as well as the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, effectively performing a spatio-temporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint variables necessary for useful parameter updates.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# 内視鏡映像からの単眼深度推定のための近接場照明の活用

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos ( http://arxiv.org/abs/2403.17915v2 )

ライセンス: Link先を確認

Akshay Paruchuri, Samuel Ehrenstein, Shuxian Wang, Inbar Fried, Stephen M. Pizer, Marc Niethammer, Roni Sengupta,

(参考訳) 内視鏡ビデオにおける単眼深度推定は、補助手術やロボット手術によって臓器のより良いカバレッジと様々な健康問題の検出を可能にする。主流である自然画像深度推定の進歩は期待できるが、強力な幾何学的特徴の欠如と難解な照明効果のため、内視鏡画像では技術が不十分である。本稿では, 内視鏡から放射される光を表面から反射する光学的手がかりを用いて, 単分子深度推定を改善する。まず、画素ごとのシェーディング表現を利用した教師付きおよび自己監督型の2つの新しい損失関数を作成する。次に、同じピクセルごとのシェーディング表現を利用する新しい深度改善ネットワーク(PPSNet)を提案する。最後に,教師学生の移動学習を導入し,自己監督型と臨床データを用いた合成データから,より深い深度マップを作成する。我々は,臨床データから高品質な深度マップを推定しながら,C3VDデータセットの最先端結果を得る。私たちのコード、事前訓練されたモデル、補足的な資料は、プロジェクトのページで確認できます。

Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# ボース・アインシュタイン凝縮体を用いた重力の量子的性質の探索

Probing the quantum nature of gravity using a Bose-Einstein condensate ( http://arxiv.org/abs/2403.18460v3 )

ライセンス: Link先を確認

Soham Sen, Sunandan Gangopadhyay,

(参考訳) ボース・アインシュタイン凝縮体を用いてグラビトンによる騒音の影響について検討した。重力波の摂動は運動量空間における離散フーリエモードの和と見なされる。作用素表現と、全系の重力とボゾン部分に対応する正準共役変数の間の正準可換関係を通じて位相空間変数を量子化し、適切な量子重力設定を得る。次に, 擬ゴールドストーン粒子の時間依存性部分の解からボゴリューボフ係数を求め, 初期懸濁状態にあるボソンの共分散測定値を構成する。フィッシャー情報の確率平均を用いて重力波の振幅パラメータの低い値を求める。計算全体をゼロ温度で行うと、ボゾン系は建設によってボース=アインシュタイン凝縮体として振る舞う。ボース=アインシュタインが1つのモードで凝縮すると、振幅測定における不確実性の平方の期待値の低い境界は、全観測項が0に近づくと無限にならない。すべての運動量モードをまとめるために、次は時間とともに減衰する適切なガウス重み係数を持つ雑音項を考える。次に、振幅パラメータの分散の正方形の最終的な期待値に対する下界を求める。重力波によって誘導されるノイズのため、ボース・アインシュタイン凝縮体を用いて重力波を検出できない測定時間の最小値が存在する。最後に、ボース・アインシュタイン凝縮体のフォノンモード間の相互作用を考察し、デコヒーレンスをもたらす。この脱コヒーレンス効果は, 最小のスクイージングを有するグラビトンに対して重要であることが観察された。

The effect of noise induced by gravitons has been investigated using a Bose-Einstein condensate. The gravitational wave perturbation is then considered as a sum of discrete Fourier modes in the momentum space. Coming to an operatorial representation and quantizing the phase space variables via appropriately introduced canonical commutation relations between the canonically conjugate variables corresponding to the graviton and bosonic part of the total system, one obtains a proper quantum gravity setup. Then we obtain the Bogoliubov coefficients from the solution of the time-dependent part of the pseudo-Goldstone boson and construct the covariance metric for the bosons initially being in a squeezed state. Using the stochastic average of the Fisher information, we obtain a lower bound on the amplitude parameter of the gravitational wave. As the entire calculation is done at zero temperature, the bosonic system, by construction, will behave as a Bose-Einstein condensate. For a Bose-Einstein condensate with a single mode, we observe that the lower bound of the expectation value of the square of the uncertainty in the amplitude measurement does not become infinite when the total observational term approaches zero. In order to sum over all possible momentum modes, we next consider a noise term with a suitable Gaussian weight factor which decays over time. We then obtain the lower bound on the final expectation value of the square of the variance in the amplitude parameter. Because of the noise induced by the graviton, there is a minimum value of the measurement time below which it is impossible to detect any gravitational wave using a Bose-Einstein condensate. Finally, we consider interaction between the phonon modes of the Bose-Einstein condensate, which results in decoherence. We observe that the decoherence effect becomes significant for gravitons with minimal squeezing.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# 動的コンテキスト内:詩列を用いた慣性認識型3次元人体モデリング

Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence ( http://arxiv.org/abs/2403.19160v2 )

ライセンス: Link先を確認

Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng,

(参考訳) ニューラルレンダリング技術は、3次元の人体モデリングを著しく進歩させた。しかし、従来のアプローチでは、運動慣性などの要因によって引き起こされるダイナミクスを見落とし、回転後の突然停止のようなシナリオでは、ポーズが変化しながら静止している。この制限は、1つのポーズを条件入力として依存することから生じ、1つのポーズを複数の外観にマッピングするあいまいさをもたらす。本研究では、現在のフレームのポーズ状態だけでなく、過去のポーズ状態にも人間の外観の変化が依存していることを明らかにする。そこで本稿では,非剛性変形と標準空間にデルタポーズシーケンス表現を応用し,時間変動を効果的にモデル化するDycoを提案する。新たなポーズに対するモデルの一般化能力の低下を防止するため、不要なボディ間の依存関係を減らすための低次元グローバルコンテキストと、モデルによるデルタポーズシーケンスのオーバーフィッティングを軽減するための量子化操作を提案する。 I3D-Human という新しいデータセットを収集し,衣服の外観の時間的変化を近似的なポーズで捉えた。 I3D-Humanおよび既存のデータセットに関する広範な実験を通じて,本手法は質的かつ定量的な性能を示す。さらに, 慣性を考慮した3次元人間の手法は, 異なる速度での慣性による外観変化を前例なくシミュレートすることができる。

Neural rendering techniques have significantly advanced 3D human body modeling. However, previous approaches often overlook dynamics induced by factors such as motion inertia, leading to challenges in scenarios like abrupt stops after rotation, where the pose remains static while the appearance changes. This limitation arises from reliance on a single pose as conditional input, resulting in ambiguity in mapping one pose to multiple appearances. In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states. Therefore, we introduce Dyco, a novel method utilizing the delta pose sequence representation for non-rigid deformations and canonical space to effectively model temporal appearance variations. To prevent a decrease in the model's generalization ability to novel poses, we further propose low-dimensional global context to reduce unnecessary inter-body part dependencies and a quantization operation to mitigate overfitting of the delta pose sequence by the model. To validate the effectiveness of our approach, we collected a novel dataset named I3D-Human, with a focus on capturing temporal changes in clothing appearance under approximate poses. Through extensive experiments on both I3D-Human and existing datasets, our approach demonstrates superior qualitative and quantitative performance. In addition, our inertia-aware 3D human method can unprecedentedly simulate appearance changes caused by inertia at different velocities.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# 一次元非相互格子における非エルミート皮膚効果と拡張状態の共存

Coexistence of non-Hermitian skin effect and extended states in one-dimensional nonreciprocal lattices ( http://arxiv.org/abs/2403.19430v2 )

ライセンス: Link先を確認

Han Xiao, Qi-Bo Zeng,

(参考訳) 本研究では,スタッガードなオンサイト変調と,次近接(NNN)サイトまでの非相反ホッピングをもつ1次元非エルミート格子を研究する。 NNN の非相反性のため、開境界条件 (OBC) 下での系の非エルミート皮膚効果 (NHSE) はエネルギー依存性があり、格子の反対側に局在する固有状態を分離する固有エネルギースペクトルに NHSE エッジが存在する。非相反ホッピングとオンサイト変調の相互作用は、皮膚効果の方向を逆転させ、NHSEエッジの位置を変更することができる。さらに、システムパラメータをチューニングすることにより、OBCの下での固有状態のいくつかは完全に拡張され、対応する固有エネルギーは開境界条件と周期境界条件の両方で虚数となる。したがって、拡張状態は同じシステムでNHSEと共存することができる。変調が虚数である場合、すべての固有状態が拡張状態となり、NHSEが完全に消失することさえある。本研究は,非エルミート系におけるオンサイト変調と非相反ホッピングの複雑な相互作用を明らかにする。

We study one-dimensional non-Hermitian lattices with staggered onsite modulations and nonreciprocal hopping up to the next-nearest-neighboring (NNN) sites. Due to the NNN nonreciprocity, the non-Hermitian skin effect (NHSE) in the system under open boundary conditions (OBC) can be energy dependent, and there will be NHSE edges in the eigenenergy spectrum, which separate the eigenstates localized at the opposite ends of the lattice. We find that the interplay between the nonreciprocal hopping and onsite modulations can reverse the direction of the skin effect and modify the position of the NHSE edge. Moreover, by tuning the system parameters, some of the eigenstates under OBC will become fully extended with the corresponding eigenenergies being imaginary under both open and periodic boundary conditions. Thus, the extended states can coexist with the NHSE in the same system. The NHSE can even be completely dissolved with all the eigenstates being extended when the modulation is imaginary. Our work unveils the intricate interplay between onsite modulations and nonreciprocal hopping in non-Hermitian systems.
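As a reading aid, the skin effect can be illustrated on a much simpler model than the one in the abstract: a chain with nearest-neighbor nonreciprocal hopping only (Hatano-Nelson type), with a real staggered onsite term. This is an assumption made for illustration; the NNN hopping that makes the effect energy dependent in the paper is not included. The sketch shows that the gauge transformation S = diag(r^n) with r = sqrt(t_right / t_left) turns the open-boundary Hamiltonian into a symmetric one. Eigenvectors of the original chain are then r^n times eigenvectors of the symmetric chain, so they pile up at one end. All function names are hypothetical.

```python
import math

def nonreciprocal_chain(n_sites, t_left, t_right, onsite):
    """Dense matrix of a 1D open chain; H[i][j] is the amplitude from site j to site i."""
    h = [[0.0] * n_sites for _ in range(n_sites)]
    for n in range(n_sites):
        h[n][n] = onsite[n % len(onsite)]   # staggered onsite modulation
        if n + 1 < n_sites:
            h[n + 1][n] = t_right           # hop n -> n+1
            h[n][n + 1] = t_left            # hop n+1 -> n
    return h

def gauge_transform(h, r):
    """Return S^{-1} H S with S = diag(r^0, r^1, ...)."""
    n = len(h)
    return [[h[i][j] * r ** (j - i) for j in range(n)] for i in range(n)]

t_left, t_right = 0.5, 2.0
h = nonreciprocal_chain(6, t_left, t_right, onsite=[0.3, -0.3])
r = math.sqrt(t_right / t_left)             # r > 1: states accumulate at the right end
h_sym = gauge_transform(h, r)               # symmetric, hopping sqrt(t_left * t_right)
```

With r > 1 every open-boundary eigenstate is weighted by r^n and localizes at the right end; swapping t_left and t_right reverses the direction.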

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# Change-Agent: 対話型総合的リモートセンシング変化解釈と分析を目指して

Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis ( http://arxiv.org/abs/2403.19646v3 )

ライセンス: Link先を確認

Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi,

(参考訳) 地球表面における変化のモニタリングは、自然の過程や人間の影響を理解するために不可欠であり、精密で包括的な解釈手法を必要とする。リモートセンシング衛星画像は、これらの変化を監視するためのユニークな視点を提供し、重要な研究焦点としてリモートセンシング画像変化解釈(RSICI)の出現につながった。現在のRSICI技術は、変更検出と変更キャプションを包含しており、それぞれに包括的な解釈を提供する限界がある。そこで本稿では,変更検出や変更キャプション,変更対象のカウント,変更原因分析など,包括的な変更解釈と洞察に富んだ分析を実現するためのユーザ指示に従うインタラクティブなChange-Agentを提案する。 Change-Agentは、マルチレベル変化解釈(MCI)モデルを目として、大きな言語モデル(LLM)を脳として統合する。 MCIモデルには2つのピクセルレベルの変化検出とセマンティックレベルの変化キャプションが含まれており、BI時間的反復相互作用(BI3)層がモデルの識別的特徴表現能力を高めるために提案されている。 MCIモデルのトレーニングを支援するため、多数の変更マスクと変更のキャプションを備えたLEVIR-MCIデータセットを構築した。実験では,変化検出と変化記述を同時に達成する上で,MCIモデルのSOTA性能を実証し,表面変化の包括的解釈を容易にする上で,我々のChange-Agentの有望な応用価値を強調し,インテリジェントなリモートセンシングアプリケーションのための新たな道を開く。 MCIモデルとChange-Agentのデータセットとコードベースをhttps://github.com/Chen-Yang-Liu/Change-Agentで公開します。

Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, necessitating precise and comprehensive interpretation methodologies. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, leading to the emergence of remote sensing image change interpretation (RSICI) as a significant research focus. Current RSICI technology encompasses change detection and change captioning, each with its limitations in providing comprehensive interpretation. To address this, we propose an interactive Change-Agent, which can follow user instructions to achieve comprehensive change interpretation and insightful analysis, such as change detection and change captioning, change object counting, change cause analysis, etc. The Change-Agent integrates a multi-level change interpretation (MCI) model as the eyes and a large language model (LLM) as the brain. The MCI model contains two branches of pixel-level change detection and semantic-level change captioning, in which the BI-temporal Iterative Interaction (BI3) layer is proposed to enhance the model's discriminative feature representation capabilities. To support the training of the MCI model, we build the LEVIR-MCI dataset with a large number of change masks and captions of changes. Experiments demonstrate the SOTA performance of the MCI model in achieving both change detection and change description simultaneously, and highlight the promising application value of our Change-Agent in facilitating comprehensive interpretation of surface changes, which opens up a new avenue for intelligent remote sensing applications. To facilitate future research, we will make our dataset and codebase of the MCI model and Change-Agent publicly available at https://github.com/Chen-Yang-Liu/Change-Agent

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# Diff-Reg v1: 登録問題に対する拡散マッチングモデル

Diff-Reg v1: Diffusion Matching Model for Registration Problem ( http://arxiv.org/abs/2403.19919v2 )

ライセンス: Link先を確認

Qianliang Wu, Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding, Jin Xie, Jian Yang,

(参考訳) 3Dや2D3Dの登録のような登録タスクには、信頼できる対応を確立することが不可欠である。既存の手法では、幾何学的あるいは意味的な特徴を利用して潜在的な対応を生成する。しかし、これらの特徴は大きな変形、スケールの不整合、曖昧なマッチング問題(例えば対称性)といった課題に直面している可能性がある。さらに、シングルパス予測に依存する多くの従来の手法は、複雑なシナリオにおいて局所ミニマと競合する可能性がある。これらの課題を軽減するために,ロバスト対応構築のための拡散マッチングモデルを提案する。提案手法は, 対応推定を二重確率行列空間内のデノイジング拡散過程として扱い, 二重確率マッチング行列を段階的にデノイズ(洗練)して真のマッチング行列に近づけることで,高品質な対応推定を行う。これは、ガウス雑音を基底の真理マッチング行列に徐々に導入する前方拡散過程と、雑音マッチング行列を反復的に洗練する逆復調過程を含む。特に、バックボーンからの特徴抽出は推論フェーズ中に1回だけ発生する。我々の軽量デノイジングモジュールは、各逆サンプリングステップで同じ機能を利用する。 3Dおよび2D3Dの登録タスクにおける本手法の有効性を検証した。コードはhttps://github.com/wuqianliang/Diff-Reg で公開されている。

Establishing reliable correspondences is essential for registration tasks such as 3D and 2D3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching problems (e.g., symmetry). Additionally, many previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix to the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground truth matching matrix and a reverse denoising process that iteratively refines the noisy matching matrix. In particular, the feature extraction from the backbone occurs only once during the inference phase. Our lightweight denoising module utilizes the same feature at each reverse sampling step. Evaluation of our method on both 3D and 2D3D registration tasks confirms its effectiveness. The code is available at https://github.com/wuqianliang/Diff-Reg.
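To make the "doubly stochastic matrix space" concrete, here is a minimal sketch of one forward noising step. It assumes, for illustration only, that the noisy matrix is mapped back to an approximately doubly stochastic matrix with Sinkhorn normalization; the abstract does not state which projection the authors use. The function names and the noise scale are hypothetical.

```python
import math
import random

def sinkhorn(scores, n_iters=200):
    """Map a real square score matrix to an (approximately) doubly stochastic matrix."""
    m = [[math.exp(v) for v in row] for row in scores]
    n = len(m)
    for _ in range(n_iters):
        for i in range(n):                      # normalise rows
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        for j in range(n):                      # normalise columns
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m

def forward_noise(matching, sigma, rng):
    """One forward step: add Gaussian noise to the matching matrix, then project back."""
    noisy = [[v + rng.gauss(0.0, sigma) for v in row] for row in matching]
    return sinkhorn(noisy)

rng = random.Random(0)
# Ground-truth matching: point i in the source corresponds to point i in the target.
ground_truth = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
noisy_matching = forward_noise(ground_truth, sigma=0.5, rng=rng)
```

A reverse denoising step would take `noisy_matching` together with the backbone features (computed once) and predict a cleaner matrix; that learned module is outside the scope of this sketch.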

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# 振動子に対する量子ランゲヴィン方程式の弱結合限界

Weak-coupling limits of the quantum Langevin equation for an oscillator ( http://arxiv.org/abs/2404.01285v2 )

ライセンス: Link先を確認

Aritra Ghosh, Sushanta Dattagupta,

(参考訳) 独立振動子モデルから得られる量子ランゲヴィン方程式は、ゴリーニ=コサコフスキー=スダルシャン=リンドブラッド方程式の文脈で用いられるボルン=マルコフ近似を欠いた強い結合状態を記述する。この問題は、変動散逸定理を満たす雑音項を持つ高調波発振器に対して、量子ランゲヴィン方程式のレベルでそのような'Born-Markov'のような近似を実装するとどうなるかということである。この背景には、回転波近似についてもコメントする。

The quantum Langevin equation as obtained from the independent-oscillator model describes a strong-coupling situation, devoid of the Born-Markov approximation that is employed in the context of the Gorini-Kossakowski-Sudarshan-Lindblad equation. The question we address is what happens when we implement such 'Born-Markov'-like approximations at the level of the quantum Langevin equation for a harmonic oscillator which carries a noise term satisfying a fluctuation-dissipation theorem. In this backdrop, we also comment on the rotating-wave approximation.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# JailbreakBench: 大規模言語モデルのジェイルブレークのためのオープンなロバストネスベンチマーク

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models ( http://arxiv.org/abs/2404.01318v4 )

ライセンス: Link先を確認

Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong,

(参考訳) ジェイルブレイク攻撃は、大きな言語モデル(LLM)が有害、非倫理的、またはその他の不快なコンテンツを生成する原因となる。これらの攻撃を評価することは、現在のベンチマークや評価技術が適切に対処していない多くの課題を示す。第一に、脱獄評価に関する明確な基準はない。第二に、既存の作業はコストと成功率を相容れない方法で計算します。そして第3に、多くの著作物は再現不可能で、敵のプロンプトを無視したり、クローズドソースのコードに関わったり、プロプライエタリなAPIの進化に依存している。これらの課題に対処するために、(1) ジェイルブレイクアーティファクトと呼ばれる最先端の敵対的プロンプトの進化したリポジトリ、(2) 以前の作業(Zou et al , 2023; Mazeika et al , 2023, 2024)から生まれた100の行動からなるジェイルブレイクデータセット、(3) https://github.com/JailbreakBench/jailbreakbenchの標準化された評価フレームワークで、明確に定義された脅威モデル、システムプロンプト、チャットテンプレート、スコアリング機能を含む。我々は、このベンチマークのリリースによる倫理的影響を慎重に検討し、コミュニティにとってプラスになると考えている。

Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address. First, there is no clear standard of practice regarding jailbreaking evaluation. Second, existing works compute costs and success rates in incomparable ways. And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work (Zou et al., 2023; Mazeika et al., 2023, 2024) -- which align with OpenAI's usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. We have carefully considered the potential ethical implications of releasing this benchmark, and believe that it will be a net positive for the community.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# HSViT:水平にスケーラブルな視覚変換器

HSViT: Horizontally Scalable Vision Transformer ( http://arxiv.org/abs/2404.05196v2 )

ライセンス: Link先を確認

Chenhao Xu, Chang-Tsun Li, Chee Peng Lim, Douglas Creighton,

(参考訳) 事前知識(帰納バイアス)が不足しているため、ViT(Vision Transformer)は大規模データセットの事前トレーニングを必要としている。さらに、ViTモデルの成長するレイヤとパラメータは、限られたコンピューティングリソースを持つデバイスへの適用性を妨げている。上記の課題を緩和するため,本稿では,新しい水平拡張型視覚変換器(HSViT)方式を提案する。具体的には、ViTに新しいイメージレベルの機能埋め込みが導入され、保存された帰納バイアスにより、小さなデータセットでパフォーマンスを向上しながら、事前トレーニングの必要性を排除できる。さらに、水平にスケーラブルな新しいアーキテクチャが設計され、複数のコンピューティングデバイス間で協調的なモデルトレーニングと推論を容易にする。実験結果は、事前トレーニングなしで、HSViTは、小さなデータセット上の最先端のスキームよりも最大10%高いトップ1精度を達成する一方で、ImageNet上で既存のCNNバックボーンを最大3.1%改善することを示している。コードはhttps://github.com/xuchenhao001/HSViTで入手できる。

Due to its deficiency in prior knowledge (inductive bias), Vision Transformer (ViT) requires pre-training on large-scale datasets to perform well. Moreover, the growing number of layers and parameters in ViT models impedes their applicability to devices with limited computing resources. To mitigate the aforementioned challenges, this paper introduces a novel horizontally scalable vision transformer (HSViT) scheme. Specifically, a novel image-level feature embedding is introduced to ViT, where the preserved inductive bias allows the model to eliminate the need for pre-training while performing better on small datasets. Besides, a novel horizontally scalable architecture is designed, facilitating collaborative model training and inference across multiple computing devices. The experimental results show that, without pre-training, HSViT achieves up to 10% higher top-1 accuracy than state-of-the-art schemes on small datasets, while providing existing CNN backbones up to 3.1% improvement in top-1 accuracy on ImageNet. The code is available at https://github.com/xuchenhao001/HSViT.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# PromptAD:Few-Shot 異常検出のための正規サンプルのみを用いた学習プロンプト

PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection ( http://arxiv.org/abs/2404.05231v2 )

ライセンス: Link先を確認

Xiaofan Li, Zhizhong Zhang, Xin Tan, Chengwei Chen, Yanyun Qu, Yuan Xie, Lizhuang Ma,

(参考訳) 視覚言語モデルは、数発の産業異常検出に大きな改善をもたらしており、通常は急速エンジニアリングを通じて数百のプロンプトを設計する必要がある。自動シナリオでは,まず従来のプロンプト学習をベースラインとして多クラスパラダイムを用いて,プロンプトを自動的に学習するが,一クラス異常検出ではうまく動作しないことがわかった。そこで本研究では,PromptADと呼ばれる,数発の異常検出のための一級プロンプト学習手法を提案する。まず,正常なプロンプトと異常なサフィックスを連結することにより,通常のプロンプトを異常なプロンプトに変換できるセマンティック・コンカネーションを提案する。さらに,異常画像の欠如によるトレーニング課題を軽減するために,異常画像と異常画像とのマージンを明示的に制御する明示的異常マージンの概念を導入する。画像レベル/ピクセルレベルの異常検出のために、PromptADはMVTecとVisAで11/12のショット設定で1位を達成した。

The vision-language model has brought great improvement to few-shot industrial anomaly detection, which usually requires designing hundreds of prompts through prompt engineering. For automated scenarios, we first use conventional prompt learning with the many-class paradigm as the baseline to automatically learn prompts but found that it cannot work well in one-class anomaly detection. To address the above problem, this paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD. First, we propose semantic concatenation which can transpose normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thus constructing a large number of negative samples used to guide prompt learning in the one-class setting. Furthermore, to mitigate the training challenge caused by the absence of anomaly images, we introduce the concept of explicit anomaly margin, which is used to explicitly control the margin between normal prompt features and anomaly prompt features through a hyper-parameter. For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.
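The two ideas in this abstract, semantic concatenation and an explicit anomaly margin, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation. Plain strings stand in for learnable prompt tokens, toy 2D vectors stand in for image and text features, and the hinge form of the loss is assumed. All names, prompts and suffixes are hypothetical.

```python
import math

def concat_prompts(normal_prompts, anomaly_suffixes):
    """Build anomaly prompts by appending every suffix to every normal prompt."""
    return [f"{p} {s}" for p in normal_prompts for s in anomaly_suffixes]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def margin_loss(image_feat, normal_feat, anomaly_feat, margin):
    """Hinge loss: a normal image should be closer to the normal prompt
    than to the anomaly prompt by at least `margin` (a hyper-parameter)."""
    d_normal = cosine_distance(image_feat, normal_feat)
    d_anomaly = cosine_distance(image_feat, anomaly_feat)
    return max(0.0, d_normal - d_anomaly + margin)

anomaly_prompts = concat_prompts(
    ["a photo of a bottle", "a cropped photo of a bottle"],
    ["with a crack", "with a scratch", "with contamination"],
)
# Anomaly prompt far from the image feature: margin satisfied, zero loss.
loss_far = margin_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], margin=0.5)
# Anomaly prompt identical to the normal prompt: the full margin is paid.
loss_near = margin_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], margin=0.5)
```

Two normal prompts and three suffixes already give six negative prompts, which is how concatenation can yield many negatives without any anomaly images.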

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# 3D-COCO:画像検出用MS-COCOデータセットと3D再構成モジュールの拡張

3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules ( http://arxiv.org/abs/2404.05641v3 )

ライセンス: Link先を確認

Maxence Bideaux, Alice Phe, Mohamed Chaouch, Bertrand Luvison, Quoc-Cuong Pham,

(参考訳) 3Dモデルと2D-3Dアライメントアノテーションを提供するMS-COCOデータセットの拡張である3D-COCOを紹介する。 3D-COCOは、テキスト、2D画像、および3DCADモデルクエリで構成可能な3D再構成や画像検出などのコンピュータビジョンタスクを実現するように設計されている。既存のMS-COCOデータセットは、ShapeNetとObjaverseで収集された28Kの3Dモデルで完結する。 IoUをベースとした手法により,各MS-COCOアノテーションと最適な3Dモデルとをマッチングし,2D-3Dアライメントを実現する。 3D-COCOのオープンソース性は、新しい3D関連トピック研究の道を開くためのプレミアである。データセットとそのソースコードはhttps://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/で公開されている。

We introduce 3D-COCO, an extension of the original MS-COCO dataset providing 3D models and 2D-3D alignment annotations. 3D-COCO was designed to achieve computer vision tasks such as 3D reconstruction or image detection configurable with textual, 2D image, and 3D CAD model queries. We complete the existing MS-COCO dataset with 28K 3D models collected on ShapeNet and Objaverse. By using an IoU-based method, we match each MS-COCO annotation with the best 3D models to provide a 2D-3D alignment. The open-source nature of 3D-COCO is a first that should pave the way for new research on 3D-related topics. The dataset and its source code are available at https://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/
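The IoU-based matching step can be sketched as below. The sketch assumes that each candidate 3D model has already been rendered to a 2D silhouette in the annotation's image frame and that masks are stored as sets of pixel coordinates. Both choices are for illustration and are not taken from the paper. The model identifiers are hypothetical.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        return 0.0
    return len(mask_a & mask_b) / len(union)

def best_model(annotation_mask, rendered_masks):
    """Return (model_id, iou) of the rendering that best overlaps the annotation."""
    return max(
        ((model_id, iou(annotation_mask, mask)) for model_id, mask in rendered_masks.items()),
        key=lambda pair: pair[1],
    )

def rect(r0, r1, c0, c1):
    """Helper: filled rectangle of pixels, rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

annotation = rect(0, 4, 0, 4)              # 16 pixels
renders = {
    "chair_a": rect(0, 4, 2, 6),           # shares 8 pixels with the annotation
    "chair_b": rect(0, 4, 0, 3),           # 12 pixels, all inside the annotation
}
match = best_model(annotation, renders)    # ("chair_b", 0.75)
```

In practice each model would be rendered from several viewpoints and the best view kept, so `rendered_masks` would hold one entry per (model, view) pair.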

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# 球面面表現による安定3次元フルヘッド合成

SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation ( http://arxiv.org/abs/2404.05680v2 )

ライセンス: Link先を確認

Heyuan Li, Ce Chen, Tianhao Shi, Yuda Qiu, Sizhe An, Guanying Chen, Xiaoguang Han,

(参考訳) 近年のGAN(Generative Adversarial Networks)の進歩は,ヒトの顔合成の発達に寄与しているが,すべての角度から視認できる完全な3D頭部を包括的に合成するという課題は今も続いている。 PanoHeadは、正面と後方の両方のビューをイメージした大規模なデータセットをフルヘッド合成に使用する可能性を証明しているが、多くの場合、バックビューのアーティファクトを発生させる。詳細な分析の結果,主に2倍の理由が判明した。まず、ネットワークアーキテクチャの観点から、利用した三平面/三格子表現空間の各平面は、両面から特徴を混乱させる傾向があり、「ミラーリング」アーティファクト(例えば、眼鏡が後ろに現れる)が生じる。第2に、データ監視の観点から、既存の3D GANにおける識別器の訓練は、レンダリング画像自体の品質に重点を置いており、その画像がレンダリングされた視点に対して妥当かどうかをあまり気にしていないことがわかった。これにより、識別器を騙すのが簡単であるため、正面でない視点で「顔」を生成できてしまう。球面座標系における新しい三平面表現であるSphereHeadを提案し,人間の頭部の幾何学的特徴に適合し,生成した人工物の多くを効率的に緩和する。さらに、カメラパラメータと画像の対応性を強調するために、識別器の視像整合性損失を導入する。これらの取り組みを組み合わせることで、視覚的に優れた成果が得られ、成果物は著しく少ない。私たちのコードとデータセットはhttps://lhyfst.github.io/spherehead で公開されています。

While recent advances in 3D-aware Generative Adversarial Networks (GANs) have aided the development of near-frontal view human face synthesis, the challenge of comprehensively synthesizing a full 3D head viewable from all angles still persists. Although PanoHead proves the possibilities of using a large-scale dataset with images of both frontal and back views for full-head synthesis, it often causes artifacts for back views. Based on our in-depth analysis, we found the reasons are mainly twofold. First, from the network architecture perspective, we found each plane in the utilized tri-plane/tri-grid representation space tends to confuse the features from both sides, causing "mirroring" artifacts (e.g., the glasses appear in the back). Second, from the data supervision aspect, we found that existing discriminator training in 3D GANs mainly focuses on the quality of the rendered image itself, and does not care much about its plausibility with respect to the perspective from which it was rendered. This makes it possible to generate "faces" in non-frontal views, because it is easy to fool the discriminator. In response, we propose SphereHead, a novel tri-plane representation in the spherical coordinate system that fits the human head's geometric characteristics and efficiently mitigates many of the generated artifacts. We further introduce a view-image consistency loss for the discriminator to emphasize the correspondence of the camera parameters and the images. The combination of these efforts results in visually superior outcomes with significantly fewer artifacts. Our code and dataset are publicly available at https://lhyfst.github.io/spherehead.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# フェルミオン熱場理論における量子計算

Quantum computation in fermionic thermal field theories ( http://arxiv.org/abs/2404.07912v2 )

ライセンス: Link先を確認

Wenyang Qian, Bin Wu,

(参考訳) 有限温度での量子場の熱的性質は、強く相互作用する物質を理解するために不可欠であり、量子コンピューティングにおける最近の発展は、代替的で有望な研究の道筋となった。本研究では,量子アルゴリズムを用いてフェルミオンのみを含む熱場理論を研究する。まず、汎用量子場理論の熱的性質を評価するために用いられる量子想像時間進化のような量子アルゴリズムとともに、デジタル量子コンピュータ上の量子ビットによるフェルミオン場のプレゼンテーションを探索する。具体的には、Majoranaフェルミオンの熱分布やエネルギー密度などの数値計算結果を量子シミュレーターを用いて1+1次元で示す。自由場理論に加えて、空間的に均質なマヨナ場との結合から生じる相互作用の効果についても検討する。どちらの場合も、位相空間分布を用いて系の熱的性質を記述できることを解析的に示し、量子シミュレーションの結果は解析的および半古典的期待値と一致することを示す。我々の研究は、熱的固定点を理解するための重要なステップであり、リアルタイムの熱化の量子シミュレーションの準備である。

Thermal properties of quantum fields at finite temperature are crucial to understanding strongly interacting matter and recent development in quantum computing has provided an alternative and promising avenue of study. In this work, we study thermal field theories involving only fermions using quantum algorithms. We first delve into the presentations of fermion fields via qubits on digital quantum computers alongside the quantum algorithms such as quantum imaginary time evolutions employed to evaluate thermal properties of generic quantum field theories. Specifically, we show numerical results such as the thermal distribution and the energy density of thermal field theories for Majorana fermions in 1+1 dimensions using quantum simulators. In addition to free field theory, we also study the effects of interactions resulting from coupling with a spatially homogeneous Majorana field. In both cases, we show analytically that thermal properties of the system can be described using phase-space distributions, and the quantum simulation results agree with analytical and semiclassical expectations. Our work is an important step to understand thermal fixed points, preparing for quantum simulation of thermalization in real time.

翻訳日:2024-07-17 21:08:58 公開日:2024-07-16

# LaVy: ベトナムのマルチモーダル大言語モデル

LaVy: Vietnamese Multimodal Large Language Model ( http://arxiv.org/abs/2404.07922v6 )

ライセンス: Link先を確認

Chi Tran, Huong Le Thanh,

(参考訳) LLM(Large Language Models)とMLLM(Multimodal Large Language Models)は、複雑な推論と言語理解において印象的な能力を持つ嵐によって世界を席巻している。一方、ベトナムの大規模言語モデルに関連する多くの作品があり、マルチモーダリティにおける高品質な資源の欠如はベトナムのMLLMの進歩を妨げている。本稿では,現在最先端のベトナム語MLLMであるLaVyを導入することでこの問題に対処し,また,MLLMのベトナム語視覚言語タスクに対する理解を評価するためのLaVy-Benchベンチマークも導入する。私たちのプロジェクトはhttps://github.com/baochi0212/LaVyで公開されています。

Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have taken the world by storm with impressive abilities in complex reasoning and linguistic comprehension. While there is a plethora of work on Vietnamese Large Language Models, the lack of high-quality resources in multimodality limits the progress of Vietnamese MLLMs. In this paper, we pioneer in addressing this by introducing LaVy, a state-of-the-art Vietnamese MLLM, and we also introduce the LaVy-Bench benchmark, designed for evaluating MLLMs' understanding of Vietnamese visual language tasks. Our project is public at https://github.com/baochi0212/LaVy

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# 量子ビットゆらぎの物理インフォームドトラッキング

Physics-informed tracking of qubit fluctuations ( http://arxiv.org/abs/2404.09212v2 )

ライセンス: Link先を確認

Fabrizio Berritta, Jan A. Krzywda, Jacob Benestad, Joost van der Heijden, Federico Fedele, Saeed Fallahi, Geoffrey C. Gardner, Michael J. Manfra, Evert van Nieuwenburg, Jeroen Danon, Anasua Chatterjee, Ferdinand Kuemmeth,

(参考訳) 環境変動は固体量子ビットの性能を低下させるが、原理的には推定効率によって設定された時間スケールまでリアルタイムハミルトン推定によって緩和することができる。物理インフォームドおよび適応ベイズ推定戦略を実装し,それを半導体スピン量子ビットにリアルタイムで適用する。物理インフォームド戦略は、ガリウム-ヒ素中の核スピン拡散の影響を説明するのに適した、フォッカー・プランク方程式に従って量子コントローラ内の確率分布を伝播させる。所定のキュービットプローブシーケンスによる予測分布の評価と絞りにより、シングルトリップキュービット内の非制御磁場勾配の動的追跡を改善することができる。適応戦略は、プローブシーケンスを少数のキュービットプローブサイクルに置き換え、前の測定結果に基づいて各プローブ時間を設定することにより、推定効率をさらに高める。組み合わせたリアルタイム推定戦略は、固体量子ビット内の低周波核スピン変動を効率的に追跡し、適切な更新方程式を調整して異なるノイズ源を捕捉することにより、他の量子ビットプラットフォームに適用することができる。

Environmental fluctuations degrade the performance of solid-state qubits but can in principle be mitigated by real-time Hamiltonian estimation down to time scales set by the estimation efficiency. We implement a physics-informed and an adaptive Bayesian estimation strategy and apply them in real time to a semiconductor spin qubit. The physics-informed strategy propagates a probability distribution inside the quantum controller according to the Fokker-Planck equation, appropriate for describing the effects of nuclear spin diffusion in gallium-arsenide. Evaluating and narrowing the anticipated distribution by a predetermined qubit probe sequence enables improved dynamical tracking of the uncontrolled magnetic field gradient within the singlet-triplet qubit. The adaptive strategy replaces the probe sequence by a small number of qubit probe cycles, with each probe time conditioned on the previous measurement outcomes, thereby further increasing the estimation efficiency. The combined real-time estimation strategy efficiently tracks low-frequency nuclear spin fluctuations in solid-state qubits, and can be applied to other qubit platforms by tailoring the appropriate update equation to capture their distinct noise sources.
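The estimation loop described above alternates a measurement update with a propagation step. The following is a minimal grid-based sketch under stated assumptions. The single-shot likelihood is taken to be (1 + m cos(2πft)) / 2 for outcome m = ±1. The Fokker-Planck propagation is reduced to pure diffusion with reflecting boundaries, with no drift term. Units are arbitrary. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
import math

def bayes_update(prior, freqs, outcome, probe_time):
    """Multiply the prior by the single-shot likelihood and renormalise.

    outcome is +1 or -1; likelihood P(outcome | f) = (1 + outcome * cos(2*pi*f*t)) / 2.
    """
    post = [p * 0.5 * (1.0 + outcome * math.cos(2.0 * math.pi * f * probe_time))
            for p, f in zip(prior, freqs)]
    total = sum(post)
    return [p / total for p in post]

def diffuse(dist, rate):
    """One explicit finite-difference diffusion step with reflecting boundaries.

    rate = D * dt / df**2 must be at most 0.5 for the step to stay non-negative.
    """
    n = len(dist)
    out = []
    for i in range(n):
        left = dist[i - 1] if i > 0 else dist[i]
        right = dist[i + 1] if i < n - 1 else dist[i]
        out.append(dist[i] + rate * (left - 2.0 * dist[i] + right))
    return out

freqs = [float(f) for f in range(11)]          # frequency grid, arbitrary units
prior = [1.0 / len(freqs)] * len(freqs)        # flat prior
posterior = bayes_update(prior, freqs, outcome=+1, probe_time=0.1)
predicted = diffuse(posterior, rate=0.2)       # broadened estimate before the next probe
```

An adaptive variant would choose the next `probe_time` from the current distribution, for example inversely proportional to its standard deviation, instead of using a predetermined sequence.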

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# アンダーバッグングのレプリカ解析

A replica analysis of under-bagging ( http://arxiv.org/abs/2404.09779v3 )

ライセンス: Link先を確認

Takashi Takahashi,

(参考訳) アンダーバッグング(Under-bagging, UB)は、アンダーサンプリングとバッグングを組み合わせたアンサンブル学習法である。アンダーサンプリングによる試料径の減少による分散の増大を抑えるためにバッジを用いることは自然なアプローチである。しかし近年、一般化線形モデルでは、クラス不均衡構造を考慮しない単純バッグングとリッジ正規化が同じ結果をもたらすことが指摘されている。したがって、線形モデルのトレーニングにおいて、アンダーサンプルデータセットの数に比例する計算コストの増大を必要とするUBを使う方がよいかどうかは明らかではない。このような状況を踏まえ、本研究ではUBの急激な漸近をヒューリスティックに導き、二成分混合データから線形分類器を訓練するシナリオにおいて、不均衡データから学習する他の一般的な方法と比較する。比較した方法は、アンダーサンプリング(US)法と、アンダーサンプリングデータの単一実現を用いてモデルをトレーニングする単純な重み付け(SW)法と、全データに重み付き損失を持つモデルをトレーニングする単純な重み付け(SW)法を含む。特に少数クラスのサイズが小さい場合において、クラス不均衡が大きい場合であっても、少数クラスのサイズを維持しながら、多数クラスのサイズを増大させることにより、UBの性能が向上することが示されている。これは、大多数のクラスサイズからほぼ独立したパフォーマンスを持つ米国とは対照的である。この意味では、アンダーサンプリングによる分散の増大を抑える方法として、バッグングと単純な正規化が異なる。一方、最適な重み付け係数を持つSWの性能はUBとほぼ同等であり、再重み付けと正則化の組み合わせはUBと類似している可能性がある。

Under-bagging (UB), which combines under-sampling and bagging, is a popular ensemble learning method for training classifiers on imbalanced data. Using bagging to reduce the increased variance caused by the reduction in sample size due to under-sampling is a natural approach. However, it has recently been pointed out that in generalized linear models, naive bagging, which does not consider the class imbalance structure, and ridge regularization can produce the same results. Therefore, it is not obvious whether it is better to use UB, which requires an increased computational cost proportional to the number of under-sampled data sets, when training linear models. Given such a situation, in this study, we heuristically derive sharp asymptotics of UB and use them to compare with several other popular methods for learning from imbalanced data, in the scenario where a linear classifier is trained from two-component mixture data. The methods compared include the under-sampling (US) method, which trains a model using a single realization of the under-sampled data, and the simple weighting (SW) method, which trains a model with a weighted loss on the entire data. It is shown that the performance of UB is improved by increasing the size of the majority class while keeping the size of the minority fixed, even though the class imbalance can be large, especially when the size of the minority class is small. This is in contrast to US, whose performance is almost independent of the majority class size. In this sense, bagging and simple regularization differ as methods to reduce the variance increased by under-sampling. On the other hand, the performance of SW with the optimal weighting coefficients is almost equal to UB, indicating that the combination of reweighting and regularization may be similar to UB.
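For readers unfamiliar with the procedure being analyzed, here is a minimal sketch of under-bagging on one-dimensional two-component mixture data. The base learner is a deliberately simple stand-in: a threshold halfway between the class means. It is not the regularized linear classifier studied in the paper. The data sizes, means and function names are hypothetical.

```python
import random

def fit_threshold(xs, ys):
    """1D linear classifier: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return 0.5 * (sum(pos) / len(pos) + sum(neg) / len(neg))

def under_bagging(xs, ys, n_models, rng):
    """Train one classifier per balanced under-sample of the majority class (label 0)."""
    minority = [x for x, y in zip(xs, ys) if y == 1]
    majority = [x for x, y in zip(xs, ys) if y == 0]
    thresholds = []
    for _ in range(n_models):
        sub = rng.sample(majority, len(minority))    # under-sample without replacement
        thresholds.append(
            fit_threshold(minority + sub, [1] * len(minority) + [0] * len(sub))
        )
    return thresholds

def predict(thresholds, x):
    """Majority vote of the ensemble."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if 2 * votes > len(thresholds) else 0

rng = random.Random(0)
majority_x = [rng.gauss(-2.0, 1.0) for _ in range(500)]   # majority class, label 0
minority_x = [rng.gauss(2.0, 1.0) for _ in range(10)]     # minority class, label 1
xs = majority_x + minority_x
ys = [0] * 500 + [1] * 10
ensemble = under_bagging(xs, ys, n_models=25, rng=rng)
```

Setting `n_models=1` gives the US baseline from the abstract. A larger majority class gives the ensemble more distinct under-samples to draw from, which is the regime where the abstract reports that UB improves while US does not.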

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# 分子アンサンブルに対するTavis-Cummingsモデルの拡張 -- 双極子自己エネルギーと静的双極子モーメントの効果を探る

Extending the Tavis-Cummings model for molecular ensembles -- Exploring the effects of dipole self energies and static dipole moments ( http://arxiv.org/abs/2404.10680v2 )

ライセンス: Link先を確認

Lucas Borges, Thomas Schnappinger, Markus Kowalewski,

(参考訳) 有機分子とナノスケールキャビティの真空場との強いカップリングは、それらの化学的および物理的性質を変更するために用いられる。分子アンサンブルに対するTavis-Cummingsモデルを拡張し、静的双極子モーメントと双極子自己エネルギーから生じるしばしば無視される相互作用項が、偏光化学における光-物質相互作用の正確な記述に不可欠であることを示す。完全な量子記述に基づいて、光空洞に共鳴結合したMgH$^+$分子の励起状態ダイナミクスと分光をシミュレートする。静的双極子モーメントと双極子自己エネルギーの包含は、一貫したモデルを得るのに必要であることを示す。実分子系の主要な特徴を再現し,より大規模な分子アンサンブルをシミュレートする,効率的な2レベルシステムアプローチを構築した。

Strong coupling of organic molecules to the vacuum field of a nanoscale cavity can be used to modify their chemical and physical properties. We extend the Tavis-Cummings model for molecular ensembles and show that the often neglected interaction terms arising from the static dipole moment and the dipole self-energy are essential for a correct description of the light-matter interaction in polaritonic chemistry. On the basis of a full quantum description, we simulate the excited-state dynamics and spectroscopy of MgH$^+$ molecules resonantly coupled to an optical cavity. We show that the inclusion of static dipole moments and the dipole self-energy is necessary to obtain a consistent model. We construct an efficient two-level system approach that reproduces the main features of the real molecular system and may be used to simulate larger molecular ensembles.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# StyleCity: 大規模3D都市シーンのスタイリゼーション

StyleCity: Large-Scale 3D Urban Scenes Stylization ( http://arxiv.org/abs/2404.10681v2 )

ライセンス: Link先を確認

Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung,

(参考訳) さまざまなスタイルで大規模な仮想都市シーンを作ることは、本質的に困難である。仮想制作のプロトタイプを容易にし,複雑な材料や照明設備の必要を回避すべく,大規模な都市シーンを対象とした視覚・テキスト駆動型テクスチャスタイリングシステムであるStyleCityを紹介した。画像とテキストを参照として、StyleCityは、大都市シーンの3次元テクスチャメッシュを意味論的に認識し、調和した全方位空背景を生成する。そこで我々は,2次元の視覚とテクスチャをグローバルかつ局所的に3Dに転送することで,ニューラルネットワークのテクスチャフィールドをスタイリングすることを提案する。 3Dスタイリングでは,高品質なシーンコンテンツを保存するために,入力された3Dシーンのトレーニングビューを異なるレベルに段階的に拡大する。次に、トレーニングビューのスケールにスタイルイメージのスケールを適用することで、世界規模でシーンスタイルを最適化する。さらに,写真リアリスティックなスタイリゼーションに不可欠なセマンティクス・アウェアスタイルの損失によって,局所的なセマンティクスの整合性を向上させる。テクスチャのスタイリゼーションに加えて,より没入的な雰囲気を提供し,セマンティックなスタイリゼーションプロセスを支援する,スタイルに一貫性のある全方位スカイイメージを合成するための生成拡散モデルも導入する。スタイリングされたニューラルテクスチャフィールドを任意の解像度のテクスチャに焼き込むことができ、従来のレンダリングパイプラインへのシームレスな統合を可能にし、仮想生産プロトタイピングプロセスを大幅に緩和することができる。大規模な実験は、質的で定量的なパフォーマンスとユーザの嗜好において、スタイリングされたシーンの優越性を実証する。

Creating large-scale virtual urban scenes with variant styles is inherently challenging. To facilitate prototypes of virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a large-scale urban scene in a semantics-aware fashion and generates a harmonic omnidirectional sky background. To achieve that, we propose to stylize a neural texture field by transferring 2D vision-and-text priors to 3D globally and locally. During 3D stylization, we progressively scale the planned training views of the input 3D scene at different levels in order to preserve high-quality scene content. We then optimize the scene style globally by adapting the scale of the style image with the scale of the training views. Moreover, we enhance local semantics consistency by the semantics-aware style loss which is crucial for photo-realistic stylization. Besides texture stylization, we further adopt a generative diffusion model to synthesize a style-consistent omnidirectional sky image, which offers a more immersive atmosphere and assists the semantic stylization process. The stylized neural texture field can be baked into an arbitrary-resolution texture, enabling seamless integration into conventional rendering pipelines and significantly easing the virtual production prototyping process. Extensive experiments demonstrate our stylized scenes' superiority in qualitative and quantitative performance and user preferences.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# 物理インフォームドアクティブラーニングによる量子化学シミュレーションの高速化

Physics-informed active learning for accelerating quantum chemical simulations ( http://arxiv.org/abs/2404.11811v2 )

ライセンス: Link先を確認

Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral,

(参考訳) 量子化学シミュレーションは、しばしばアクティブラーニング(AL)を使用して行われる機械学習ポテンシャルを構築することで、大幅に加速することができる。構築されたポテンシャルの有用性は、必要とされる高い労力とシミュレーションにおいて不十分なロバスト性によって制限されることが多い。ここでは、時間とリソースを手頃な範囲に抑え、人間の介入を最小限にして、堅牢でデータ効率の高いポテンシャルを構築するためのエンドツーエンドALを紹介する。我々のALプロトコルは、物理インフォームドされたトレーニングポイントのサンプリング、初期データの自動選択、不確実性定量化、収束モニタリングに基づいている。このプロトコルの汎用性は、振動スペクトルをシミュレートするための準古典分子動力学、重要な生化学分子のコンホメータ探索、ディールス・アルダー反応の時間分解機構の実装において示される。これらの調査は、高性能コンピューティングクラスタ上での純粋な量子化学計算であれば数週間かかるところを、数日で完了した。 MLatomのコードとチュートリアルは https://github.com/dralgroup/mlatom で公開されている。

Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here we introduce end-to-end AL for constructing robust, data-efficient potentials with an affordable investment of time and resources and minimal human interference. Our AL protocol is based on the physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, conformer search of a key biochemical molecule, and time-resolved mechanism of the Diels-Alder reactions. These investigations took us days instead of the weeks that pure quantum chemical calculations would take on a high-performance computing cluster. The MLatom code and tutorials are available at https://github.com/dralgroup/mlatom.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# RetailOpt:スマートフォンの動きデータと小売施設情報から簡単に軌道を推定できるOpt-In

RetailOpt: Opt-In, Easy-to-Deploy Trajectory Estimation from Smartphone Motion Data and Retail Facility Information ( http://arxiv.org/abs/2404.12548v2 )

ライセンス: Link先を確認

Ryo Yonetani, Jun Baba, Yasutaka Furukawa,

(参考訳) RetailOptは、屋内小売環境でオフラインで顧客の動きを追跡するための、オプトインで簡単にデプロイできる新しいシステムである。このシステムは、顧客のスマートフォンや小売アプリから簡単にアクセス可能な情報(モーションデータ、ストアマップ、購入記録など)を利用する。これにより、追加のハードウェアインストール/メンテナンスが不要になり、顧客が完全なデータコントロールを保証できる。具体的には、RetailOptはまず慣性ナビゲーションを使用して、スマートフォンのモーションデータから相対軌道を復元する。店舗マップと購入記録は、訪問した棚のリストを特定するために相互参照され、連続的かつ離散的な最適化を通じて、店舗内の相対軌跡をローカライズするアンカーを提供する。 5つの異なる環境におけるシステムの有効性を実証する。このシステムは、成功すれば、顧客の行動分析や店内ナビゲーションを含む幅広い小売アプリケーションに不可欠な、正確な顧客移動データを生成する。

We present RetailOpt, a novel opt-in, easy-to-deploy system for tracking customer movements offline in indoor retail environments. The system uses readily accessible information from customer smartphones and retail apps, including motion data, store maps, and purchase records. This eliminates the need for additional hardware installation and maintenance and gives customers full control over their data. Specifically, RetailOpt first uses inertial navigation to recover relative trajectories from smartphone motion data. The store map and purchase records are cross-referenced to identify a list of visited shelves, providing anchors to localize the relative trajectories in a store through continuous and discrete optimization. We demonstrate the effectiveness of our system in five diverse environments. The system, if successful, would produce accurate customer movement data, essential for a broad range of retail applications including customer behavior analysis and in-store navigation.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# 絡み合いに基づく人工トポロジー:周辺ネットワークノード

Entanglement-Based Artificial Topology: Neighboring Remote Network Nodes ( http://arxiv.org/abs/2404.16204v2 )

ライセンス: Link先を確認

SiYi Chen, Jessica Illiano, Angela Sara Cacciapuoti, Marcello Caleffi,

(参考訳) 絡み合いは、量子インターネットの鍵となる通信資源として広く認識されている。しかし、これまでの研究は主に二者間の絡み合いに注意を限定しており、絡み合いの特性を生かして新たなネットワーク機能を実現する可能性は十分に検討されていない。これに対し本稿では,ネットワーク間リソースとして多者間(マルチパーティ)の絡み合いを活用することを目的とする。具体的には、異なる量子ローカルエリアネットワーク(QLAN)の相互接続を考察し、多者間の絡み合いにより、局所演算のみで、物理的なQLANトポロジの限界を克服するQLAN間の人工トポロジを動的に生成できることを示す。そのために、まず各QLAN内に分配する多者間絡み合い状態を設計する。次に、そのような状態を、(i) 異なるQLANに属するノードを相互接続し、(ii) 異なるQLAN間トラフィックパターンに動的に適応するように設計できることを示す。我々の貢献は、ネットワークエンジニアリングコミュニティに、人工トポロジと人工近傍の概念に関する実践的な指針を提供することを目指している。

Entanglement is unanimously recognized as the key communication resource of the Quantum Internet. Yet, the possibility of implementing novel network functionalities by exploiting the marvels of entanglement has been poorly investigated so far, with attention mainly restricted to bipartite entanglement. Conversely, in this paper, we aim at exploiting multipartite entanglement as an inter-network resource. Specifically, we consider the interconnection of different Quantum Local Area Networks (QLANs), and we show that multipartite entanglement makes it possible to dynamically generate, by means of local operations only, an inter-QLAN artificial topology that overcomes the limitations of the physical QLAN topologies. To this aim, we first design the multipartite entangled state to be distributed within each QLAN. Then, we show how such a state can be engineered to: i) interconnect nodes belonging to different QLANs, and ii) dynamically adapt to different inter-QLAN traffic patterns. Our contribution aims at providing the network engineering community with a hands-on guideline towards the concept of artificial topology and artificial neighborhood.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# HYPE:未特定画像とテキストのためのハイパーボリックエンターメントフィルタ

HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts ( http://arxiv.org/abs/2404.17507v2 )

ライセンス: Link先を確認

Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun,

(参考訳) データ量によって自己教師付き学習の有効性が促進される時代において、データセマンティクスの特異性と明確性はモデルトレーニングにおいて重要な役割を担っている。そこで, HYPerbolic Entailment Filtering (HYPE) を導入し, 広範でノイズの多い画像とテキストのペアのデータセットから, モダリティに有意かつ整合性のあるデータを正確に抽出する手法を提案する。提案手法は, ハイパーボリックな埋め込みとエンテーメント・コーンの概念を利用して, サンプルを無意味あるいは不特定なセマンティクスで評価・フィルタリングし, サンプルの特異性の向上に重点を置いている。 HYPEは、フィルタリング効率を大幅に改善するだけでなく、既存のフィルタリング技術と組み合わせることで、DataCompベンチマークの最先端を新たに設定する。このブレークスルーは、HYPEがデータ選択プロセスを洗練させる可能性を示し、より正確で効率的な自己教師型学習モデルの開発に寄与する。さらに、画像特異性$\epsilon_{i}$は、画像テキストまたは画像のみのデータプールから画像のみのデータセットをインジェクションして、画像のみの自己教師付きモデルをトレーニングし、CLIPスコアによって誘導されたデータセットと比較して優れたパフォーマンスを示すために独立して適用することができる。

In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our approach leverages hyperbolic embeddings and the concept of entailment cones to evaluate and filter out samples with meaningless or underspecified semantics, focusing on enhancing the specificity of each data sample. HYPE not only demonstrates a significant improvement in filtering efficiency but also sets a new state-of-the-art in the DataComp benchmark when combined with existing filtering techniques. This breakthrough showcases the potential of HYPE to refine the data selection process, thereby contributing to the development of more accurate and efficient self-supervised learning models. Additionally, the image specificity $\epsilon_{i}$ can be applied independently to induce an image-only dataset from an image-text or image-only data pool for training image-only self-supervised models, and it shows superior performance compared to the dataset induced by the CLIP score.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# 行列積状態における非安定度と絡み合い

Non-stabilizerness versus entanglement in matrix product states ( http://arxiv.org/abs/2404.18768v2 )

ライセンス: Link先を確認

M. Frau, P. S. Tarabunga, M. Collura, M. Dalmonte, E. Tirrito,

(参考訳) 本稿では,行列積状態(MPS)における絡み合いと非安定化性(マジックとも呼ばれる)の関係について検討する。スピン1異方性ハイゼンベルク鎖の全状態マジックと相互マジック(相互情報の非安定化アナログであり、したがって境界効果を受けない)という2つの異なる文脈において、マジックと、多体系の基底状態を近似するために用いられる結合次元との関係を調べる。この結果から,非安定化性について収束した結果を得ることは,典型的には絡み合いの場合よりもかなり容易であることが示唆された。臨界点かつ十分に大きな体積における全状態マジックでは、MPS結合次元を$\chi$として、$1/\chi^2$での収束を観測する。小さな体積では、マジックの飽和が非常に速く、エラーバーの範囲内では有限$\chi$補正を識別できない。相互マジックも結合次元に対して速い収束を示すが、その具体的な関数形はサンプリング誤差のために特定できない。本研究の副産物として,パウリ・マルコフ連鎖(当初はマジックを評価するために定式化された)が、MPSの相互情報の計算において最先端を更新することを示す。臨界点における連結なパーティション間の相互情報の対数的増加を検証することで、この最後の事実を例示する。相互情報と相互マジックを比較すると、連結なパーティションでは後者はパーティションサイズに対してはるかにゆっくりとしかスケールせず(あるいは全くスケールせず)、非連結なパーティションでは両者ともサイズに対して一定であることがわかる。

In this paper, we investigate the relationship between entanglement and non-stabilizerness (also known as magic) in matrix product states (MPSs). We study the relation between magic and the bond dimension used to approximate the ground state of a many-body system in two different contexts: full-state magic and mutual magic (the non-stabilizer analogue of mutual information, thus free of boundary effects) of spin-1 anisotropic Heisenberg chains. Our results indicate that obtaining converged results for non-stabilizerness is typically considerably easier than for entanglement. For full-state magic at critical points and at sufficiently large volumes, we observe convergence with $1/\chi^2$, with $\chi$ being the MPS bond dimension. At small volumes, magic saturation is so quick that, within error bars, we cannot appreciate any finite-$\chi$ correction. Mutual magic also shows a fast convergence with bond dimension, although sampling errors prevent us from determining its specific functional form. As a by-product of our study, we show how Pauli-Markov chains (originally formulated to evaluate magic) reset the state of the art in terms of computing mutual information for MPS. We illustrate this last fact by verifying the logarithmic increase of mutual information between connected partitions at critical points. By comparing mutual information and mutual magic, we observe that, for connected partitions, the latter typically scales much slower (if at all) with the partition size, while for disconnected partitions, both are constant in size.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# ニューロモルフィックハードウェアにおけるロバスト多時間記号計算を可能にする分散表現

Distributed Representations Enable Robust Multi-Timescale Symbolic Computation in Neuromorphic Hardware ( http://arxiv.org/abs/2405.01305v2 )

ライセンス: Link先を確認

Madison Cotteret, Hugh Greatorex, Alpha Renner, Junren Chen, Emre Neftci, Huaqiang Wu, Giacomo Indiveri, Martin Ziegler, Elisabetta Chicca,

(参考訳) マルチスケール計算を堅牢に行うために、繰り返しスパイクニューラルネットワーク(RSNN)をプログラミングすることは、依然として難しい課題である。これを解決するために,高次元分布表現の特性を利用して,ロバストなマルチタイムダイナミックスをアトラクタベースRSNNに組み込むシングルショット重み学習手法について述べる。対称自己解離重み行列と非対称遷移項を重畳することにより、有限状態機械をRSNN力学に埋め込み、それぞれ状態間の入力とヘテロ解離外部積のベクトル結合によって形成される。提案手法は,高度に非理想的な重みを持つシミュレーション,実験的なクローズドループ・メムリシブ・ハードウェア・セットアップ,および大規模マシンにシームレスにスケールするLoihi 2を用いて検証した。この研究は、パラメータの微調整やプラットフォーム固有の重要な最適化を必要とせず、リカレントダイナミクスによる堅牢な記号計算をニューロモルフィックハードウェアに組み込むスケーラブルなアプローチを導入している。さらに、分散シンボル表現は、ニューロモルフィックハードウェアにおける認知アルゴリズムのための高度に有能な表現不変言語として機能することを示した。

Programming recurrent spiking neural networks (RSNNs) to robustly perform multi-timescale computation remains a difficult challenge. To address this, we describe a single-shot weight learning scheme to embed robust multi-timescale dynamics into attractor-based RSNNs, by exploiting the properties of high-dimensional distributed representations. We embed finite state machines into the RSNN dynamics by superimposing a symmetric autoassociative weight matrix and asymmetric transition terms, which are each formed by the vector binding of an input and heteroassociative outer-products between states. Our approach is validated through simulations with highly non-ideal weights; an experimental closed-loop memristive hardware setup; and on Loihi 2, where it scales seamlessly to large state machines. This work introduces a scalable approach to embed robust symbolic computation through recurrent dynamics into neuromorphic hardware, without requiring parameter fine-tuning or significant platform-specific optimisation. Moreover, it demonstrates that distributed symbolic representations serve as a highly capable representation-invariant language for cognitive algorithms in neuromorphic hardware.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# ワンオンラインエージェントは、平均的なフィールドゲームを効果的に学習できる

A Single Online Agent Can Efficiently Learn Mean Field Games ( http://arxiv.org/abs/2405.03718v2 )

ライセンス: Link先を確認

Chenyu Zhang, Xu Chen, Xuan Di,

(参考訳) 平均場ゲーム (MFGs) は大規模人口システムの振る舞いをモデル化するための有望なフレームワークである。しかし、MFGの解決は、前向きの個体群進化と後向きのエージェントダイナミクスの結合によって困難になる可能性がある。通常、平均場 Nash 平衡 (MFNE) を得るには、固定点反復 (FPI) と呼ばれる前方と後方のプロセスが交互に解かれる反復的アプローチが必要となる。この方法は、空間領域全体にわたって完全に観察された人口伝播とエージェントダイナミクスを必要とするが、現実のシナリオでは現実的ではない。この制限を克服するために,本研究では,オンラインサンプルを用いたMFNE学習を,状態-行動空間,報酬関数,遷移ダイナミクスの事前知識を伴わずに行うことのできる,新しいオンライン単エージェントモデルフリー学習方式を提案する。具体的には、エージェントは、そのポリシーを値関数(Q)を介して更新し、同時に平均場状態(M)を評価し、同じ観察バッチを用いて評価する。我々はこの学習方式の2つの変種を開発する: オフ・ポリティクスとオン・ポリティクスのQM反復である。それらが効率的にFPIを近似していることが証明され、複雑性の保証が提供される。数値実験により本手法の有効性を確認した。

Mean field games (MFGs) are a promising framework for modeling the behavior of large-population systems. However, solving MFGs can be challenging due to the coupling of forward population evolution and backward agent dynamics. Typically, obtaining mean field Nash equilibria (MFNE) involves an iterative approach where the forward and backward processes are solved alternately, known as fixed-point iteration (FPI). This method requires fully observed population propagation and agent dynamics over the entire spatial domain, which could be impractical in some real-world scenarios. To overcome this limitation, this paper introduces a novel online single-agent model-free learning scheme, which enables a single agent to learn MFNE using online samples, without prior knowledge of the state-action space, reward function, or transition dynamics. Specifically, the agent updates its policy through the value function (Q), while simultaneously evaluating the mean field state (M), using the same batch of observations. We develop two variants of this learning scheme: off-policy and on-policy QM iteration. We prove that they efficiently approximate FPI, and a sample complexity guarantee is provided. The efficacy of our methods is confirmed by numerical experiments.

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# 確率的な1ステップ生成のための特徴学習

Characteristic Learning for Provable One Step Generation ( http://arxiv.org/abs/2405.05512v4 )

ライセンス: Link先を確認

Zhao Ding, Chenguang Duan, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang, Pingwen Zhang,

(参考訳) 本稿では,GAN(Generative Adversarial Networks)におけるサンプリング効率とフローベースモデルの安定した性能を組み合わせた,新しい一段階生成モデルである特徴生成器を提案する。我々のモデルは、確率密度輸送を通常の微分方程式(ODE)で記述できる特性によって駆動される。具体的には、非パラメトリック回帰を用いて速度場を推定し、Euler法を用いて確率フローODEを解き、特性に対する一連の離散近似を生成する。次に、深層ニューラルネットワークを用いてこれらの特性に適合し、先行分布を目標分布へ効果的にプッシュするワンステップマッピングを確実にする。理論的には, 速度マッチング, オイラー離散化, 特性適合の誤差を分析し, 2-ワッサーシュタイン距離における特性発生器の非漸近収束速度を確立する。私たちの知る限りでは、これはシミュレーションなしの1ステップ生成モデルに対する最初の徹底的な分析である。さらに,本研究では,前処理におけるフローベース生成モデルの誤差解析を改良する。提案手法を合成データセットと実データセットの両方に適用し,ニューラルネットワークの単一評価で特徴生成器が高次品質を実現することを示す。

We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field through nonparametric regression and use the Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, ensuring a one-step mapping that effectively pushes the prior distribution towards the target distribution. On the theoretical side, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis for simulation-free one-step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. We apply our method to both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of the neural network.
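The pipeline described above (velocity field, Euler integration along characteristics, then fitting a one-step map) can be illustrated with a small one-dimensional sketch. Everything specific here is an assumption made for illustration: the velocity field is written in closed form for a transport from N(0, 1) to N(M, S^2) rather than estimated by regression, and a least-squares line stands in for the deep network that the abstract fits to the characteristics.

```python
import random

M, S = 3.0, 2.0  # assumed target N(M, S^2); the prior is N(0, 1)

def velocity(x, t):
    """Hand-picked velocity field whose characteristics are
    x(t) = z * (1 + (S - 1) * t) + t * M, so that x(1) = M + S * z."""
    c = S - 1.0
    return c * (x - t * M) / (1.0 + c * t) + M

def euler_characteristic(z, n_steps):
    """Follow one characteristic from t = 0 to t = 1 with explicit Euler steps."""
    x, h = z, 1.0 / n_steps
    for k in range(n_steps):
        x += h * velocity(x, k * h)
    return x

def fit_one_step_map(zs, xs):
    """Least-squares line x = a * z + b; a stand-in for the neural network."""
    n = len(zs)
    z_bar, x_bar = sum(zs) / n, sum(xs) / n
    a = (sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
         / sum((z - z_bar) ** 2 for z in zs))
    return a, x_bar - a * z_bar

rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(200)]   # samples from the prior
xs = [euler_characteristic(z, 50) for z in zs]   # endpoints of the characteristics
a, b = fit_one_step_map(zs, xs)                  # one-step generator: z -> a * z + b
```

After fitting, sampling needs a single evaluation of `a * z + b` instead of 50 Euler steps, which is the efficiency argument made in the abstract. In this toy case the fitted slope and intercept should come out close to `S` and `M`; a realistic velocity field would be nonlinear and would introduce the discretization and fitting errors that the paper analyzes.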

翻訳日:2024-07-17 20:59:06 公開日:2024-07-16

# ダンス・アニー・ビート:ダンス・ビデオ・ジェネレーションのビジュアル・ビート

Dance Any Beat: Blending Beats with Visuals in Dance Video Generation ( http://arxiv.org/abs/2405.09266v2 )

ライセンス: Link先を確認

Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai,

(参考訳) 自動振付は、音楽からダンスを生成することによって進行する。現在の方法では、完全なダンスビデオではなくスケルトンキーポイントシーケンスを作成し、実際の使用を制限することで、特定の個人がダンスをすることができない。これらのメソッドには正確なキーポイントアノテーションも必要であり、データの収集が難しくなり、自作のビデオデータセットの使用が制限される。これらの課題を克服するために、音楽によってガイドされた個人の画像から直接ダンスビデオを生成するという新しいタスクを導入する。このタスクは、キーポイントアノテーションを必要とせず、特定の個人のダンス生成を可能にする。我々のソリューションであるDance Any Beat Diffusion Model (DabFusion)は、参照画像と楽曲を使用して、さまざまなダンスタイプや振付を特徴とするダンスビデオを生成する。音楽は、ダンススタイル、ムーブメント、リズムといった重要な特徴を識別する、特別に設計された音楽エンコーダによって分析される。 DabFusionは、トレーニングデータセットの個人だけでなく、これまで目に見えない人でもダンスビデオを生成するのに長けている。この汎用性は、画像中の任意の人物をアニメーションするために必要なすべての動き情報を含む潜在光学フローを生成するというアプローチに起因している。 AIST++データセットを用いてDabFusionの性能評価を行い,映像品質,オーディオ・ビデオ同期,モーション・ミュージックアライメントに着目した。本研究では、ビートアライメントスコアをベースとした2次元モーションミュージックアライメントスコア(2D-MMアライメントスコア)を提案する。実験の結果、我々のDabFusionがこの革新的なタスクの確かなベースラインを確立していることがわかった。ビデオの結果はプロジェクトのページで確認できます。

Automated choreography advances by generating dance from music. Current methods create skeleton keypoint sequences, not full dance videos, and cannot make specific individuals dance, limiting their real-world use. These methods also need precise keypoint annotations, making data collection difficult and restricting the use of self-made video datasets. To overcome these challenges, we introduce a novel task: generating dance videos directly from images of individuals guided by music. This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations. Our solution, the Dance Any Beat Diffusion model (DabFusion), utilizes a reference image and a music piece to generate dance videos featuring various dance types and choreographies. The music is analyzed by our specially designed music encoder, which identifies essential features including dance style, movement, and rhythm. DabFusion excels in generating dance videos not only for individuals in the training dataset but also for any previously unseen person. This versatility stems from its approach of generating latent optical flow, which contains all necessary motion information to animate any person in the image. We evaluate DabFusion's performance using the AIST++ dataset, focusing on video quality, audio-video synchronization, and motion-music alignment. We propose a 2D Motion-Music Alignment Score (2D-MM Align), which builds on the Beat Alignment Score to more effectively evaluate motion-music alignment for this new task. Experiments show that our DabFusion establishes a solid baseline for this innovative task. Video results can be found on our project page: https://DabFusion.github.io.

翻訳日:2024-07-17 20:49:21 公開日:2024-07-16

# モダリティエキスパートの混在による脳病変分割の基礎モデル

A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts ( http://arxiv.org/abs/2405.10246v2 )

ライセンス: Link先を確認

Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthew, Chuyang Ye, Wenjia Bai,

(参考訳) 脳病変の分節は神経研究や診断において重要な役割を担っている。脳病変は様々な病理学的変化によって引き起こされる可能性があるため、異なるタイプの脳病変は、異なる画像モダリティに異なる特徴を持つ傾向がある。この複雑さのため、脳病変のセグメンテーション法はしばしばタスク固有の方法で開発される。特定の病変タイプと画像のモダリティに対して、特定のセグメンテーションモデルを開発する。しかし、タスク固有のモデルを使用することで、病変のタイプや画像のモダリティが事前に決定され、現実のシナリオへの展開が複雑になる。そこで本研究では,様々な画像モダリティの入力データに対して,異なる種類の脳病変を自動的に分割できる3次元脳病変分割のための普遍的基礎モデルを提案する。我々は,様々な画像モダリティに対応する複数のエキスパートネットワークを備えた,新しいMixture of Modality Experts (MoME) フレームワークを定式化する。階層的なゲーティングネットワークは、専門家の予測を組み合わせて、専門的なコラボレーションを促進する。さらに、各専門家ネットワークの劣化を回避し、その専門性を維持するために、訓練中のカリキュラム学習戦略を導入する。提案手法は5つの画像モダリティと8種類の病変を含む9つの脳病変データセットを用いて評価した。その結果、我々のモデルは最先端のユニバーサルモデルよりも優れており、未知のデータセットに有望な一般化を提供することが示された。

Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.

翻訳日:2024-07-17 20:49:21 公開日:2024-07-16

# KernelSHAP-IQ:共有インタラクションのための重み付き最小二乗最適化

KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions ( http://arxiv.org/abs/2405.10852v2 )

ライセンス: Link先を確認

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer,

(参考訳) Shapley値(SV)は、ブラックボックスMLモデルを理解するために、クレジットカードを機械学習(ML)エンティティに割り当てる一般的なアプローチである。このような解釈を高次相互作用で強化することは、Shapley Interaction Index (SII) がSVの直接公理的拡張である複雑なシステムでは避けられない。 SVが重み付き最小二乗(WLS)の目的によって任意のゲームの最適近似を得られることはよく知られているが、この結果のSIIへの拡張は長い間未解決の問題であり、代替指標の提案さえも導いた。本研究では、WLS問題の解として高階SIIを特徴付け、SIIと$k$-Shapley値(k$-SII)による最適近似を構築する。 SV とペアワイズ SII に対してこの表現を証明し、より高い順序に対して経験的に検証された予想を与える。その結果、SII 用 KernelSHAP の直接拡張である KernelSHAP-IQ を提案し、機能相互作用の最先端性能を示す。

The Shapley value (SV) is a prevalent approach for allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least squares (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.
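For readers unfamiliar with the two quantities, the sketch below computes the SV and the pairwise SII by exact enumeration over coalitions, using their standard combinatorial definitions. It is not KernelSHAP-IQ, which approximates these values through a sampled weighted least squares problem; the toy game and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def subsets(players):
    """Yield every subset of `players` as a frozenset."""
    for r in range(len(players) + 1):
        for s in combinations(players, r):
            yield frozenset(s)

def shapley_value(v, n, i):
    """Exact SV of player i in an n-player game v (exponential cost)."""
    rest = [p for p in range(n) if p != i]
    total = Fraction(0)
    for s in subsets(rest):
        weight = Fraction(factorial(len(s)) * factorial(n - len(s) - 1),
                          factorial(n))
        total += weight * (v(s | {i}) - v(s))
    return total

def shapley_interaction(v, n, i, j):
    """Exact pairwise SII of players i and j (exponential cost)."""
    rest = [p for p in range(n) if p not in (i, j)]
    total = Fraction(0)
    for s in subsets(rest):
        weight = Fraction(factorial(len(s)) * factorial(n - len(s) - 2),
                          factorial(n - 1))
        # Discrete second-order derivative of v with respect to i and j.
        delta = v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
        total += weight * delta
    return total

def toy_game(s):
    """Hypothetical game: additive payoffs plus a bonus when 0 and 1 cooperate."""
    base = {0: 1, 1: 2, 2: 3}
    return sum(base[p] for p in s) + (4 if {0, 1} <= s else 0)
```

In `toy_game` the SV splits the cooperation bonus evenly between players 0 and 1, whereas the SII attributes it to the pair itself. That separation is what the higher-order interaction view adds. Exact enumeration is only feasible for a handful of players, which is why sampling-based estimators are used in practice.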

翻訳日:2024-07-17 20:49:21 公開日:2024-07-16

# 情報アクセスのための生成人工知能の社会学的意味

Sociotechnical Implications of Generative Artificial Intelligence for Information Access ( http://arxiv.org/abs/2405.11612v2 )

ライセンス: Link先を確認

Bhaskar Mitra, Henriette Cramer, Olya Gurevich,

(参考訳) 信頼できる情報へのロバストなアクセスは、知識生産、公衆衛生教育、民主社会における情報市民の促進といった社会にとって重要な必要性である。生成的AI技術は、情報にアクセスし、既存の情報検索システムの有効性を改善する新しい方法を可能にするかもしれませんが、私たちはその長期的な社会的意味を理解し、理解し始めています。本章では、情報アクセスの文脈において、生成AIを採用する際のシステム的結果とリスクについて概説する。また,評価と緩和の勧告も提供し,今後の研究課題について論じる。

Robust access to trustworthy information is a critical need for society with implications for knowledge production, public health education, and promoting informed citizenry in democratic societies. Generative AI technologies may enable new ways to access information and improve effectiveness of existing information retrieval systems but we are only starting to understand and grapple with their long-term social implications. In this chapter, we present an overview of some of the systemic consequences and risks of employing generative AI in the context of information access. We also provide recommendations for evaluation and mitigation, and discuss challenges for future research.

翻訳日:2024-07-17 20:49:21 公開日:2024-07-16

# グローバル・ローカル・セマンティック・一貫性学習によるテキスト・ビデオ検索

Text-Video Retrieval with Global-Local Semantic Consistent Learning ( http://arxiv.org/abs/2405.12710v3 )

ライセンス: Link先を確認

Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen,

(参考訳) 大規模画像テキスト事前学習モデル(例えばCLIP)をビデオ領域に適応させることは、テキストビデオ検索における現在の最先端である。主なアプローチは、テキストとビデオのペアを共通の埋め込み空間に写像し、特定のエンティティ上のクロスモーダルな相互作用を活用してセマンティックアライメントを行うものである。効果的ではあるものの、これらのパラダイムは膨大な計算コストを伴い、検索が非効率になる。そこで本研究では,モダリティ間で共有される潜在的なセマンティクスを活用する,シンプルかつ効果的なテキスト・ビデオ検索手法であるGlobal-Local Semantic Consistent Learning(GLSCL)を提案する。具体的には、粗い粒度のアライメントを探索するパラメータフリーなグローバルインタラクションモジュールを導入する。次に,複数の学習可能なクエリを用いて潜在的な意味概念を捉え,細粒度のアライメントを学習する共有ローカルインタラクションモジュールを考案する。さらに、ビジュアルクエリと対応するテキストクエリの間で概念を整合させるためにInter-Consistency Loss(ICL)を考案し、ビジュアル(テキスト)クエリ内の分布を互いに反発させてより識別的な概念を生成するためにIntra-Diversity Loss(IDL)を開発した。 MSR-VTT, MSVD, DiDeMo, LSMDC, ActivityNet の5つの広く使用されているベンチマークでの実験により,提案手法の優れた有効性と効率性を実証した。特に,本手法はSOTAと同等の性能を達成しつつ,計算コストの面で約220倍高速である。コードは https://github.com/zchoi/GLSCL で入手できる。

Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent shared semantics across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. Furthermore, an Inter-Consistency Loss (ICL) is devised to accomplish the concept alignment between the visual query and corresponding textual query, and an Intra-Diversity Loss (IDL) is developed to repulse the distribution within visual (textual) queries to generate more discriminative concepts. Extensive experiments on five widely used benchmarks (i.e., MSR-VTT, MSVD, DiDeMo, LSMDC, and ActivityNet) substantiate the superior effectiveness and efficiency of the proposed method. Remarkably, our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost. Code is available at: https://github.com/zchoi/GLSCL.

翻訳日:2024-07-17 20:49:21 公開日:2024-07-16

# FLIPHAT:高次元スパルスリニアバンドの差分プライバシー

FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits ( http://arxiv.org/abs/2405.14038v2 )

ライセンス: Link先を確認

Sunrit Chakraborty, Saptarshi Roy, Debabrota Basu,

(参考訳) 高次元スペアリニアバンドは、ユーザの高次元特徴(例えばゲノムデータ)が利用できるが、そのごく一部だけが関連している、シーケンシャルな意思決定問題(例えばパーソナライズドメディカル)の効率的なモデルとして機能する。これらのアプリケーションにおけるデータプライバシの懸念により、我々は、報酬と文脈の両方をプライベートデータとみなす、差分的にプライベートな高次元の疎線形帯域について検討する。まず、プライバシのコストを定量化するために、この設定で達成可能な後悔の限界を低くする。さらにこの問題に対処するため、計算効率の良い帯域幅アルゴリズムである \textbf{F}orgetfu\textbf{L} \textbf{I}terative \textbf{P}rivate \textbf{HA}rd \textbf{T}hresholding (FLIPHAT) を設計する。 FLIPHATはエピソードの倍増とエピソード的忘れ込みとともに、プライバシと後悔の最適性の両方を保証するために、疎線形回帰オラクルとしてノイズイテレーティブ・ハード・スレッショニング(N-IHT)アルゴリズムの亜種をデプロイする。また,FLIPHATは対数的要因を最適に再現できることが示唆された。並列利害関係であるN-IHTの推定誤差を, より精巧に解析することで, 後悔の分析を行う。

High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems (e.g. personalized medicine), where high dimensional features (e.g. genomic data) on the users are available, but only a small subset of them are relevant. Motivated by data privacy concerns in these applications, we study the joint differentially private high dimensional sparse linear bandits, where both rewards and contexts are considered as private data. First, to quantify the cost of privacy, we derive a lower bound on the regret achievable in this setting. To further address the problem, we design a computationally efficient bandit algorithm, \textbf{F}orgetfu\textbf{L} \textbf{I}terative \textbf{P}rivate \textbf{HA}rd \textbf{T}hresholding (FLIPHAT). Along with doubling of episodes and episodic forgetting, FLIPHAT deploys a variant of Noisy Iterative Hard Thresholding (N-IHT) algorithm as a sparse linear regression oracle to ensure both privacy and regret-optimality. We show that FLIPHAT achieves optimal regret up to logarithmic factors. We analyze the regret by providing a novel refined analysis of the estimation error of N-IHT, which is of parallel interest.
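The regression oracle mentioned above can be sketched as follows. This is an illustrative noisy iterative hard thresholding loop for sparse least squares, not FLIPHAT itself. The noise scale `sigma` is a free parameter here and is not calibrated to any privacy budget, and the clipping, episode doubling, and forgetting steps a private bandit algorithm would need are omitted. All function names are hypothetical.

```python
import random

def hard_threshold(theta, s):
    """Keep the s entries of largest magnitude and zero out the rest."""
    order = sorted(range(len(theta)), key=lambda j: abs(theta[j]), reverse=True)
    keep = set(order[:s])
    return [t if j in keep else 0.0 for j, t in enumerate(theta)]

def gradient(X, y, theta):
    """Gradient of the loss (1 / (2n)) * ||X theta - y||^2."""
    n = len(X)
    resid = [sum(xj * tj for xj, tj in zip(row, theta)) - yi
             for row, yi in zip(X, y)]
    return [sum(row[j] * r for row, r in zip(X, resid)) / n
            for j in range(len(theta))]

def noisy_iht(X, y, s, step, n_iters, sigma, rng):
    """Gradient step, additive Gaussian noise, then projection onto s-sparse vectors.
    sigma = 0 recovers plain (non-private) iterative hard thresholding."""
    theta = [0.0] * len(X[0])
    for _ in range(n_iters):
        g = gradient(X, y, theta)
        noisy = [t - step * gj + rng.gauss(0.0, sigma)
                 for t, gj in zip(theta, g)]
        theta = hard_threshold(noisy, s)
    return theta
```

The hard-thresholding projection keeps every iterate `s`-sparse, so the estimate only ever depends on a small subset of the high-dimensional features. The injected noise is the mechanism through which a differentially private variant would trade accuracy for privacy.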

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# Accelerated Simulation of Two-Phase Flows Using Neural PDE Surrogates

Accelerating Simulation of Two-Phase Flows with Neural PDE Surrogates ( http://arxiv.org/abs/2405.17260v2 )

License: see the linked page

Yoeri Poels, Koen Minartz, Harshit Bansal, Vlado Menkovski

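(Reference translation) Simulation is a powerful tool for better understanding physical systems, but it generally requires computationally expensive numerical methods. Downstream applications of such simulations become computationally infeasible when they require many forward solves, for example in inverse design with many degrees of freedom. In this work, we investigate and extend neural PDE solvers as a tool for scaling simulations of two-phase flow problems, specifically simulations of oil expulsion from a pore. We extend existing numerical methods for this problem to a more complex setting with varying domain geometries and generate a challenging dataset. We further investigate three prominent neural PDE solvers, UNet, DRN, and U-FNO, and extend them for characteristics of the oil-expulsion problem: (1) spatial conditioning on the geometry; (2) periodicity at the boundary; (3) approximate mass conservation. We scale all methods, benchmark their speed-accuracy trade-off, evaluate qualitative properties, and perform an ablation study. We find that the investigated methods accurately model droplet dynamics with speed-ups of up to three orders of magnitude, that our extensions improve performance over the baselines, and that the introduced varying geometries are considerably more challenging than the previously considered oil-expulsion problem.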

Simulation is a powerful tool to better understand physical systems, but generally requires computationally expensive numerical methods. Downstream applications of such simulations can become computationally infeasible if they require many forward solves, for example in the case of inverse design with many degrees of freedom. In this work, we investigate and extend neural PDE solvers as a tool to aid in scaling simulations for two-phase flow problems, and simulations of oil expulsion from a pore specifically. We extend existing numerical methods for this problem to a more complex setting involving varying geometries of the domain to generate a challenging dataset. Further, we investigate three prominent neural PDE solver methods, namely the UNet, DRN, and U-FNO, and extend them for characteristics of the oil-expulsion problem: (1) spatial conditioning on the geometry; (2) periodicity in the boundary; (3) approximate mass conservation. We scale all methods and benchmark their speed-accuracy trade-off, evaluate qualitative properties, and perform an ablation study. We find that the investigated methods can accurately model the droplet dynamics with up to three orders of magnitude speed-up, that our extensions improve performance over the baselines, and that the introduced varying geometries constitute a significantly more challenging setting over the previously considered oil expulsion problem.
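
Two of the extensions listed above, periodicity in the boundary and approximate mass conservation, can be illustrated on a plain 2D grid. The abstract does not say how the authors implement either one, so the sketch below shows just one plausible reading: wrap-around padding before a convolution-style operator, and a global rescaling of a predicted field to a target total mass. Both function names are hypothetical.

```python
def periodic_pad(field, pad):
    """Wrap a 2D grid (a list of rows) by `pad` cells on every side."""
    rows, cols = len(field), len(field[0])
    return [
        [field[(r - pad) % rows][(c - pad) % cols] for c in range(cols + 2 * pad)]
        for r in range(rows + 2 * pad)
    ]


def rescale_mass(pred, target_mass):
    """Scale a predicted field so that its cell sum equals target_mass."""
    total = sum(sum(row) for row in pred)
    if total == 0:
        return [row[:] for row in pred]  # nothing to rescale
    scale = target_mass / total
    return [[value * scale for value in row] for row in pred]
```

Padding this way lets a local stencil at the left edge see values from the right edge, which is what a periodic domain requires. The rescaling step is a post-hoc correction, one simple way to keep total mass close to a known value.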

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# Toward Unbounded Data Augmentation in Visual Reinforcement Learning

A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning ( http://arxiv.org/abs/2405.17416v2 )

License: see the linked page

Abdulaziz Almuzairee, Nicklas Hansen, Henrik I. Christensen

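(Reference translation) Q-learning algorithms are appealing for real-world applications because of their data efficiency, but they are prone to overfitting and training instability when trained from visual observations. Prior work, namely SVEA, showed that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation and identify an assumption that limits its effectiveness to augmentations of a photometric nature. To address these limitations, we propose SADA, a generalized recipe that handles a wider variety of augmentations. We benchmark its effectiveness on DMC-GB2, our proposed extension of the DMControl Generalization Benchmark, and on tasks from Meta-World and the Distracting Control Suite, and find that SADA greatly improves training stability and the generalization of RL agents across a diverse set of augmentations. For visualizations, code, and the benchmark, see https://aalmuzairee.github.io/SADA/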

Q-learning algorithms are appealing for real-world applications due to their data-efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 - our proposed extension of the popular DMControl Generalization Benchmark - as well as tasks from Meta-World and the Distracting Control Suite, and find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations. For visualizations, code and benchmark: see https://aalmuzairee.github.io/SADA/

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# Self-Supervised Distillation with Compact Descriptors for Image Copy Detection

Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection ( http://arxiv.org/abs/2405.17928v4 )

License: see the linked page

Juntae Kim, Sungwon Woo, Jongho Nang

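(Reference translation) Image copy detection is the task of detecting edited copies of any image in a reference database. Previous approaches have made remarkable progress, but the large size of their networks and descriptors remains a disadvantage and complicates practical application. In this paper, we propose a method that achieves competitive performance with a lightweight network and compact descriptors. By using relational self-supervised distillation to transfer knowledge from a large network to a small one, we enable the training of lightweight networks with small descriptor sizes. We introduce relational self-supervised distillation for flexible representation in a smaller feature space and apply contrastive learning with a hard negative loss to prevent dimensional collapse. On the DISC2021 benchmark, with ResNet-50 as the teacher and EfficientNet-B0 as the student, micro average precision improved by 5.0%/4.9%/5.9% for descriptor sizes of 64/128/256 compared with the baseline method.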

Image copy detection is the task of detecting edited copies of any image in a reference database. While previous approaches have shown remarkable progress, the large size of their networks and descriptors remains a disadvantage, complicating their practical application. In this paper, we propose a novel method that achieves competitive performance by using a lightweight network and compact descriptors. By utilizing relational self-supervised distillation to transfer knowledge from a large network to a small network, we enable the training of lightweight networks with a small descriptor size. We introduce relational self-supervised distillation for flexible representation in a smaller feature space and apply contrastive learning with a hard negative loss to prevent dimensional collapse. On the DISC2021 benchmark, with ResNet-50 as the teacher and EfficientNet-B0 as the student, the micro average precision improves by 5.0%/4.9%/5.9% for descriptor sizes of 64/128/256 compared to the baseline method.

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# DeMamba: Detecting AI-Generated Videos on the Million-Scale GenVideo Benchmark

DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark ( http://arxiv.org/abs/2405.19707v2 )

License: see the linked page

Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong Li

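(Reference translation) Video generation techniques have advanced rapidly in recent years. Given the popularity of video content on social media platforms, these models heighten concerns about the spread of fake information. There is therefore growing demand for detectors that can identify fake AI-generated videos and mitigate the potential harm caused by fake information. However, the lack of large-scale datasets from the most advanced video generators is a barrier to developing such detectors. To address this gap, we introduce GenVideo, the first AI-generated video detection dataset. It has the following characteristics: (1) a large volume of videos, comprising over one million AI-generated and real videos; (2) a rich diversity of generated content and methodologies, covering a broad range of video categories and generation techniques. We propose two evaluation methods tailored to real-world scenarios: the cross-generator video classification task assesses how well trained detectors generalize across generators, and the degraded video classification task evaluates the robustness of detectors to videos whose quality has degraded during dissemination. We also introduce a plug-and-play module, Detail Mamba (DeMamba), which strengthens detectors by identifying AI-generated videos through analysis of inconsistencies in the temporal and spatial dimensions. Extensive experiments demonstrate DeMamba's superior generalizability and robustness on GenVideo compared with existing detectors. We believe the GenVideo dataset and the DeMamba module will significantly advance the field of AI-generated video detection. Code and the dataset will be available at https://github.com/chenhaoxing/DeMamba.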

Recently, video generation techniques have advanced rapidly. Given the popularity of video content on social media platforms, these models intensify concerns about the spread of fake information. Therefore, there is a growing demand for detectors capable of identifying fake AI-generated videos and mitigating the potential harm caused by fake information. However, the lack of large-scale datasets from the most advanced video generators poses a barrier to the development of such detectors. To address this gap, we introduce the first AI-generated video detection dataset, GenVideo. It features the following characteristics: (1) a large volume of videos, including over one million AI-generated and real videos; (2) a rich diversity of generated content and methodologies, covering a broad spectrum of video categories and generation techniques. We conducted extensive studies of the dataset and proposed two evaluation methods tailored for real-world-like scenarios to assess the detectors' performance: the cross-generator video classification task assesses the generalizability of trained detectors across generators; the degraded video classification task evaluates the robustness of detectors to videos that have degraded in quality during dissemination. Moreover, we introduced a plug-and-play module, named Detail Mamba (DeMamba), designed to enhance the detectors by identifying AI-generated videos through the analysis of inconsistencies in temporal and spatial dimensions. Our extensive experiments demonstrate DeMamba's superior generalizability and robustness on GenVideo compared to existing detectors. We believe that the GenVideo dataset and the DeMamba module will significantly advance the field of AI-generated video detection. Our code and dataset will be available at https://github.com/chenhaoxing/DeMamba.

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition

Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition ( http://arxiv.org/abs/2405.19917v3 )

License: see the linked page

Masashi Hatano, Ryo Hachiuma, Ryo Fujii, Hideo Saito

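(Reference translation) We address a novel cross-domain few-shot learning (CD-FSL) task for egocentric action recognition that uses multimodal input and unlabeled target data. This paper simultaneously tackles two key challenges of egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domains) and (2) the computational cost in real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach that improves adaptability to the target domain and reduces inference cost. To address the first challenge, we propose multimodal distillation into a student RGB model using teacher models. Each teacher model is trained independently on source and target data for its own modality. Using only unlabeled target data during multimodal distillation improves the student model's adaptability to the target domain. We further introduce ensemble masked inference, a technique that reduces the number of input tokens through masking. In this approach, ensemble prediction mitigates the performance degradation caused by masking, effectively addressing the second challenge. Our approach outperforms state-of-the-art CD-FSL approaches by a substantial margin on multiple egocentric datasets, improving by an average of 6.12/6.10 points in the 1-shot/5-shot settings while achieving 2.2 times faster inference. Project page: https://masashi-hatano.github.io/MM-CDFSL/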

We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domain) and (2) the computational cost for real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach designed to enhance adaptability to the target domain and reduce inference cost. To address the first challenge, we propose the incorporation of multimodal distillation into the student RGB model using teacher models. Each teacher model is trained independently on source and target data for its respective modality. Leveraging only unlabeled target data during multimodal distillation enhances the student model's adaptability to the target domain. We further introduce ensemble masked inference, a technique that reduces the number of input tokens through masking. In this approach, ensemble prediction mitigates the performance degradation caused by masking, effectively addressing the second challenge. Our approach outperformed the state-of-the-art CD-FSL approaches by a substantial margin on multiple egocentric datasets, improving by an average of 6.12/6.10 points for 1-shot/5-shot settings while achieving 2.2 times faster inference speed. Project page: https://masashi-hatano.github.io/MM-CDFSL/

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# SuperGaussian: Repurposing Video Models for 3D Super-Resolution

SuperGaussian: Repurposing Video Models for 3D Super Resolution ( http://arxiv.org/abs/2406.00609v4 )

License: see the linked page

Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Frühstück

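(Reference translation) We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. Generative 3D models now exist, but they do not yet match the quality of their counterparts in the image and video domains. We demonstrate that existing (pretrained) video models can be repurposed directly for 3D super-resolution, which sidesteps the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high-quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate the proposed SuperGaussian on a variety of 3D inputs that are diverse in both complexity and representation (e.g., Gaussian Splats or NeRFs) and show that the method significantly improves the fidelity of the final 3D models. See the project website for details.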

We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# LanEvil: Benchmarking the Robustness of Lane Detection Against Environmental Illusions

LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions ( http://arxiv.org/abs/2406.00934v4 )

License: see the linked page

Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao

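(Reference translation) Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functions such as adaptive cruise control and automated lane centering. Existing LD benchmarks focus mainly on evaluating common cases and neglect the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety challenges because such illusions occur naturally in real-world traffic. This paper is the first to study the potential threats that these environmental illusions pose to LD, and it establishes LanEvil, a comprehensive benchmark for evaluating the robustness of LD against this natural corruption. We systematically design 14 prevalent yet critical types of environmental illusions (e.g., shadow, reflection) covering a wide range of real-world influencing factors in LD tasks. Based on real-world environments, we create 94 realistic and customizable 3D cases with the widely used CARLA simulator, producing a dataset of 90,292 sampled images. Through extensive experiments, we benchmark the robustness of popular LD methods using LanEvil and reveal substantial performance degradation (-5.37% Accuracy and -10.70% F1-Score on average), with shadow effects posing the greatest risk (-7.39% Accuracy). We also assess the commercial auto-driving systems OpenPilot and Apollo through collaborative simulations and show that the proposed environmental illusions can lead to incorrect decisions and potential traffic accidents. As a defense against environmental illusions, we propose the Attention Area Mixing (AAM) approach using hard examples, which yields a significant robustness improvement (+3.76%) under illumination effects. We hope this paper contributes to more robust auto-driving systems in the future. Website: https://lanevil.github.io/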

Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functionalities like adaptive cruise control and automated lane centering. Existing LD benchmarks primarily focus on evaluating common cases, neglecting the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety challenges since these illusions exist naturally in real-world traffic situations. For the first time, this paper studies the potential threats caused by these environmental illusions to LD and establishes the first comprehensive benchmark, LanEvil, for evaluating the robustness of LD against this natural corruption. We systematically design 14 prevalent yet critical types of environmental illusions (e.g., shadow, reflection) that cover a wide spectrum of real-world influencing factors in LD tasks. Based on real-world environments, we create 94 realistic and customizable 3D cases using the widely used CARLA simulator, resulting in a dataset comprising 90,292 sampled images. Through extensive experiments, we benchmark the robustness of popular LD methods using LanEvil, revealing substantial performance degradation (-5.37% Accuracy and -10.70% F1-Score on average), with shadow effects posing the greatest risk (-7.39% Accuracy). Additionally, we assess the performance of commercial auto-driving systems OpenPilot and Apollo through collaborative simulations, demonstrating that the proposed environmental illusions can lead to incorrect decisions and potential traffic accidents. To defend against environmental illusions, we propose the Attention Area Mixing (AAM) approach using hard examples, which yields a significant robustness improvement (+3.76%) under illumination effects. We hope our paper can contribute to advancing more robust auto-driving systems in the future. Website: https://lanevil.github.io/

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# The Importance of Online Data: Understanding Preference Fine-Tuning Through Coverage

The Importance of Online Data: Understanding Preference Fine-tuning via Coverage ( http://arxiv.org/abs/2406.01462v2 )

License: see the linked page

Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun

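(Reference translation) Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). Online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) were positioned as equivalent in prior work because both have to start from the same offline preference dataset. To deepen the theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how well the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, whereas a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods. Finally, motivated by these theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO outperforms its pure offline counterpart DPO while preserving its computation and memory efficiency.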

Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.
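
The hybrid objective described above combines two terms: a contrastive loss on offline preference pairs and a KL penalty estimated from online samples. The sketch below shows what such a combination could look like on precomputed sequence log-probabilities. The DPO term follows the standard published form, while the weighting `lam`, the function names, and the simple Monte Carlo KL estimate are assumptions made for illustration rather than the authors' exact formulation.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of log-probabilities."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def kl_estimate(samples):
    """Monte Carlo estimate of KL(policy || reference).

    samples: (policy_logp, ref_logp) pairs for responses drawn from the
    current policy, i.e. online data.
    """
    return sum(p - r for p, r in samples) / len(samples)


def hybrid_objective(offline_pairs, online_samples, beta=0.1, lam=0.05):
    """Offline contrastive term plus a KL penalty from online samples (lam is hypothetical)."""
    contrastive = sum(dpo_loss(*pair, beta=beta) for pair in offline_pairs) / len(offline_pairs)
    return contrastive + lam * kl_estimate(online_samples)
```

The point of the split is that the KL term only needs log-probabilities of the policy's own samples, so it requires no preference labels for the online data.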

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# Strengthening Safe Reinforcement Learning for Power System Control with CommonPower

Empowering Safe Reinforcement Learning for Power System Control with CommonPower ( http://arxiv.org/abs/2406.03231v2 )

License: see the linked page

Michael Eichelbeck, Hannah Markgraf, Matthias Althoff

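(Reference translation) The growing complexity of power system management has increased interest in reinforcement learning (RL). However, vanilla RL controllers cannot by themselves guarantee that system constraints are satisfied. Combining them with formally correct safeguarding mechanisms is therefore important when studying RL for power system management. Integrating safeguards into complex use cases requires tool support. To address this need, we introduce the Python tool CommonPower. CommonPower's unique contribution lies in its symbolic modeling approach, which enables flexible, model-based safeguarding of RL controllers. CommonPower also offers a unified interface for single-agent RL, multi-agent RL, and optimal control, and integrates different forecasting methods seamlessly. This lets users validate the effectiveness of safe RL controllers across a wide variety of case studies and investigate the influence of specific aspects on overall performance. We demonstrate CommonPower's versatility in a numerical case study that compares RL agents with different safeguards against a model predictive controller in the context of building energy management.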

The growing complexity of power system management has led to an increased interest in reinforcement learning (RL). However, vanilla RL controllers cannot themselves ensure satisfaction of system constraints. Therefore, combining them with formally correct safeguarding mechanisms is an important aspect when studying RL for power system management. Integrating safeguarding into complex use cases requires tool support. To address this need, we introduce the Python tool CommonPower. CommonPower's unique contribution lies in its symbolic modeling approach, which enables flexible, model-based safeguarding of RL controllers. Moreover, CommonPower offers a unified interface for single-agent RL, multi-agent RL, and optimal control, with seamless integration of different forecasting methods. This allows users to validate the effectiveness of safe RL controllers across a large variety of case studies and investigate the influence of specific aspects on overall performance. We demonstrate CommonPower's versatility through a numerical case study that compares RL agents featuring different safeguards with a model predictive controller in the context of building energy management.

Translated: 2024-07-17 20:49:21 Published: 2024-07-16

# Online Joint Fine-Tuning of Multi-Agent Flows

Online Joint Fine-tuning of Multi-Agent Flows ( http://arxiv.org/abs/2406.04516v3 )

License: see the linked page

Paul Mineiro

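(Reference translation) A Flow is a collection of component models ("Agents") that constructs the solution to a complex problem through iterative communication. Flows have emerged as state-of-the-art architectures for code generation and are the raison d'être of frameworks such as Autogen. At present, however, flows are constructed through a combination of manual prompt engineering and stagewise supervised learning techniques. This paper describes a procedure for online joint fine-tuning of an entire flow, inspired by the Learning to Search framework. The approach uses simulator access to reduce preferences over entire episodes to preferences over individual node outputs; when the components are language models, the latter is a well-studied problem. The approach is applicable to reward-free settings (e.g., text feedback) if an episode evaluator model is available. I apply it to the multi-hop QA dataset Musique and achieve a state-of-the-art result.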

A Flow is a collection of component models ("Agents") which constructs the solution to a complex problem via iterative communication. Flows have emerged as state-of-the-art architectures for code generation, and are the raison d'être for frameworks like Autogen. However, flows are currently constructed via a combination of manual prompt engineering and stagewise supervised learning techniques; the latter is limited to acyclic flows with granular node supervision. In this writeup I describe a procedure for online joint fine-tuning of an entire flow inspired by the Learning to Search framework. The approach leverages simulator access to reduce preferences over entire episodes to preferences over individual node outputs; when the components are language models the latter is a well-studied problem. The approach is applicable to reward-free settings (e.g., text feedback) if an episode evaluator model is available. I apply the approach to the multi-hop QA dataset Musique, achieving a state-of-the-art result.

Translated: 2024-07-17 20:39:37 Published: 2024-07-16

# Toward a Benchmark for Causal Business Process Reasoning with LLMs

Towards a Benchmark for Causal Business Process Reasoning with LLMs ( http://arxiv.org/abs/2406.05506v2 )

License: see the linked page

Fabiana Fournier, Lior Limonad, Inna Skarbovsky

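(Reference translation) Large language models (LLMs) are increasingly used to boost organizational efficiency and automate tasks. Although they were not originally designed for complex cognitive processes, recent efforts have extended their use to activities such as reasoning, planning, and decision-making. In business processes, such abilities could be invaluable for leveraging the massive corpora on which LLMs were trained to gain a deep understanding of those processes. In this work, we plant the seeds for a benchmark that assesses the ability of LLMs to reason about causal and process perspectives of business operations. We refer to this view as Causally-augmented Business Processes (BP^C). The core of the benchmark consists of a set of BP^C-related situations, a set of questions about these situations, and a set of deductive rules used to systematically resolve the ground-truth answers to these questions. With the power of LLMs, the seed is then instantiated into a larger set of domain-specific situations and questions. Reasoning about BP^C is crucial for process interventions and process improvement. Our benchmark is accessible at https://huggingface.co/datasets/ibm/BPC and can be used in one of two modalities: testing the performance of any target LLM, or training an LLM to improve its ability to reason about BP^C.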

Large Language Models (LLMs) are increasingly used for boosting organizational efficiency and automating tasks. While not originally designed for complex cognitive processes, LLMs have recently been employed in activities such as reasoning, planning, and decision-making. In business processes, such abilities could be invaluable for leveraging the massive corpora LLMs have been trained on to gain a deep understanding of such processes. In this work, we plant the seeds for the development of a benchmark to assess the ability of LLMs to reason about causal and process perspectives of business operations. We refer to this view as Causally-augmented Business Processes (BP^C). The core of the benchmark comprises a set of BP^C-related situations, a set of questions about these situations, and a set of deductive rules employed to systematically resolve the ground truth answers to these questions. With the power of LLMs, the seed is then instantiated into a larger-scale set of domain-specific situations and questions. Reasoning on BP^C is of crucial importance for process interventions and process improvement. Our benchmark, accessible at https://huggingface.co/datasets/ibm/BPC, can be used in one of two possible modalities: testing the performance of any target LLM, or training an LLM to advance its capability to reason about BP^C.

Translated: 2024-07-17 20:39:37 Published: 2024-07-16

# Particle Multi-Axis Transformer for Jet Tagging

Particle Multi-Axis Transformer for Jet Tagging ( http://arxiv.org/abs/2406.06638v2 )

License: see the linked page

Muhammad Usman, M Husnain Shahid, Maheen Ejaz, Ummay Hani, Nayab Fatima, Abdul Rehman Khan, Asifullah Khan, Nasir Majid Mirza

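(Reference translation) Jet tagging is an important classification problem in high energy physics. In recent years, deep learning has not only risen to the challenge of jet tagging but also improved performance significantly. In this paper, we propose a new architecture, the Particle Multi-Axis transformer (ParMAT). ParMAT contains local and global spatial interactions within a single unit, which improves its ability to handle various input lengths. We train the model on JETCLASS, a publicly available large-scale dataset containing 100 million jets of 10 different classes of particles. By integrating a parallel attention mechanism and pairwise interactions of particles, ParMAT achieves robustness and higher accuracy than ParT and ParticleNet. The model's scalability to huge datasets and its ability to extract essential features automatically indicate its potential for enhancing jet tagging.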

Jet tagging is an essential categorization problem in high energy physics. In recent times, deep learning has not only risen to the challenge of jet tagging but also significantly improved its performance. In this article, we propose a new architecture, the Particle Multi-Axis transformer (ParMAT), which is a modified version of the Particle transformer (ParT). ParMAT contains local and global spatial interactions within a single unit, which improves its ability to handle various input lengths. We trained our model on JETCLASS, a publicly available large dataset that contains 100M jets of 10 different classes of particles. By integrating a parallel attention mechanism and pairwise interactions of particles in the attention mechanism, ParMAT achieves robustness and higher accuracy than ParT and ParticleNet. The scalability of the model to huge datasets and its ability to automatically extract essential features demonstrate its potential for enhancing jet tagging.

Translated: 2024-07-17 20:39:37 Published: 2024-07-16

# Next-Generation Database Interfaces: A Survey of LLM-Based Text-to-SQL

Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL ( http://arxiv.org/abs/2406.08426v3 )

License: see the linked page

Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang

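(Reference translation) Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing challenge because of the complexities of understanding user questions, comprehending database schemas, and generating SQL. Conventional text-to-SQL systems, built on human engineering and deep neural networks, have made substantial progress. Pre-trained language models (PLMs) were subsequently developed and applied to text-to-SQL tasks with promising performance. As modern databases become more complex, the corresponding user questions also become harder, causing PLMs with limited parameters to generate incorrect SQL. This calls for more sophisticated optimization methods, which in turn limits the application of PLM-based systems. Recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding as model scale increases. Integrating LLM-based implementations can therefore bring unique opportunities, improvements, and solutions to text-to-SQL research. This paper presents a comprehensive review of LLM-based text-to-SQL. Specifically, we give a brief overview of the technical challenges and the evolution of text-to-SQL. We then introduce in detail the datasets and metrics designed to evaluate text-to-SQL systems. After that, we systematically analyze recent advances in LLM-based text-to-SQL. Finally, we discuss the remaining challenges in this field and set out expectations for future research directions.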

Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing challenge due to the complexities in user question understanding, database schema comprehension, and SQL generation. Conventional text-to-SQL systems, comprising human engineering and deep neural networks, have made substantial progress. Subsequently, pre-trained language models (PLMs) have been developed and utilized for text-to-SQL tasks, achieving promising performance. As modern databases become more complex, the corresponding user questions also grow more challenging, causing PLMs with parameter constraints to produce incorrect SQL. This necessitates more sophisticated and tailored optimization methods, which, in turn, restricts the applications of PLM-based systems. Recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding as the model scale increases. Therefore, integrating LLM-based implementations can bring unique opportunities, improvements, and solutions to text-to-SQL research. In this survey, we present a comprehensive review of LLM-based text-to-SQL. Specifically, we provide a brief overview of the technical challenges and the evolutionary process of text-to-SQL. Then, we provide a detailed introduction to the datasets and metrics designed to evaluate text-to-SQL systems. After that, we present a systematic analysis of recent advances in LLM-based text-to-SQL. Finally, we discuss the remaining challenges in this field and propose expectations for future research directions.
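
The abstract mentions metrics for evaluating text-to-SQL systems without naming them. One metric commonly used in this area is execution accuracy, which runs the predicted and the reference query against the same database and compares the returned rows. The sketch below is a simplified illustration using Python's built-in `sqlite3` module. The `singer` table, the helper names, and the order-insensitive comparison are assumptions for the example, not details taken from the survey.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ann", 25), ("Bo", 31), ("Cy", 40)],
    )
    return conn


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries run and return the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```

Comparing results rather than query strings credits a prediction that is written differently from the reference but returns the same answer. The trade-off is that two different queries can coincide on a small database by chance.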

Translated: 2024-07-17 20:39:37 Published: 2024-07-16

# EgoExo-Fitness: Toward Egocentric and Exocentric Full-Body Action Understanding

EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding ( http://arxiv.org/abs/2406.08877v2 )

License: see the linked page

Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, Wei-Shi Zheng

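(Reference translation) EgoExo-Fitness is a new full-body action understanding dataset featuring fitness sequence videos recorded from synchronized egocentric cameras and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness provides not only first-person videos but also rich annotations. Specifically, two-level temporal boundaries are provided to localize single-action videos along with the sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement, including technical keypoint verification, natural language comments on action execution, and action quality scores. Taken together, EgoExo-Fitness provides new resources for studying egocentric and exocentric full-body action understanding along the dimensions of "what", "when", and "how well". We construct benchmarks, together with detailed analysis, on a suite of tasks: action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed guidance-based execution verification task. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main.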

We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main.

Translated: 2024-07-17 20:39:37 Published: 2024-07-16

# SHMamba: A Structured Hyperbolic State Space Model for Audio-Visual Question Answering

SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering ( http://arxiv.org/abs/2406.09833v3 )

License: see the linked page

Zhe Yang, Wenrui Li, Guanghui Cheng

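(Reference translation) The Audio-Visual Question Answering (AVQA) task holds great potential for applications. Compared with traditional unimodal approaches, the multimodal input of AVQA makes feature extraction and fusion more challenging. Euclidean space has difficulty representing multi-dimensional relationships in data effectively. In particular, when extracting and processing data with a tree or hierarchical structure, Euclidean space is not well suited as an embedding space. In addition, the self-attention mechanism in Transformers is effective at capturing dynamic relationships between elements in a sequence. However, its limitations in window modeling and its quadratic computational complexity reduce its efficiency when modeling long sequences. To address these limitations, we propose SHMamba: Structured Hyperbolic State Space Model, which integrates the advantages of hyperbolic geometry and state space models. Specifically, SHMamba uses the intrinsic properties of hyperbolic space to represent hierarchical structures and complex relationships in audio-visual data. Meanwhile, the state space model captures dynamic changes over time by modeling the whole sequence globally. We further introduce an adaptive curvature hyperbolic alignment module and a cross fusion block to enhance the understanding of hierarchical structures and the dynamic exchange of cross-modal information. SHMamba outperforms previous methods with fewer parameters and lower computational cost: learnable parameters are reduced by 78.12%, while average performance improves by 2.53%. Experiments show that the method is superior to all current major methods and is better suited to practical application scenarios.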

The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature extraction and fusion more challenging. Euclidean space struggles to represent multi-dimensional relationships in data effectively; in particular, it is not a suitable embedding space when extracting and processing data with a tree or hierarchical structure. Additionally, the self-attention mechanism in Transformers is effective in capturing the dynamic relationships between elements in a sequence. However, its limitations in window modeling and its quadratic computational complexity reduce its effectiveness in modeling long sequences. To address these limitations, we propose SHMamba: Structured Hyperbolic State Space Model, which integrates the advantages of hyperbolic geometry and state space models. Specifically, SHMamba leverages the intrinsic properties of hyperbolic space to represent hierarchical structures and complex relationships in audio-visual data. Meanwhile, the state space model captures dynamic changes over time by globally modeling the entire sequence. Furthermore, we introduce an adaptive curvature hyperbolic alignment module and a cross fusion block to enhance the understanding of hierarchical structures and the dynamic exchange of cross-modal information, respectively. Extensive experiments demonstrate that SHMamba outperforms previous methods with fewer parameters and lower computational cost. Our learnable parameters are reduced by 78.12%, while the average performance improves by 2.53%. Experiments show that our method outperforms all current major methods and is more suitable for practical application scenarios.
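
To illustrate why hyperbolic space suits hierarchical data, the following minimal sketch computes the geodesic distance in a Poincaré ball of curvature -c. The function name `poincare_distance` and the parameter `c` are illustrative assumptions; the abstract does not say which hyperbolic model or distance SHMamba uses.

```python
import math

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between x and y in a Poincare ball of curvature -c.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    sq = lambda v: sum(t * t for t in v)          # squared Euclidean norm
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    if denom <= 0:
        raise ValueError("points must lie strictly inside the ball of radius 1/sqrt(c)")
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)

# Equal Euclidean steps cost more and more hyperbolic distance near the boundary,
# which leaves exponentially growing "room" for the leaves of a tree.
origin = [0.0, 0.0]
for r in (0.5, 0.9, 0.99):
    print(r, round(poincare_distance(origin, [r, 0.0]), 4))
```

A learnable `c` would be one way to read "adaptive curvature", but that reading is an assumption on top of the abstract.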

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# ブロックベースアテンションマスクを用いた効果的かつ効率的な非自己回帰復号化に向けて

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask ( http://arxiv.org/abs/2406.10034v2 )

ライセンス: Link先を確認

Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu,

(参考訳) 本稿では,非自己回帰(NAR)ブロックベースのアテンションマスクデコーダ(AMD)を提案する。 AMDは、アテンションマスクを用いて隠蔽される出力ラベルの連続ブロック内で並列なNAR推論を行い、ブロック間の左から右へのAR予測と履歴コンテキストのアマルガメーションを行う。ビームサーチアルゴリズムは、CTC、ARデコーダ、AMD確率の動的融合を利用するように設計されている。 LibriSpeech-100hrコーパスの実験では、AMDモジュールを組み込んだトリパルタイトデコーダは、ベースラインのCTC+ARデコードに対して最大1.73xのデコード速度比を発生させるが、テストセットに統計的に有意な単語誤り率(WER)が増加しないことを示唆している。同じデコードリアルタイム因子で操作すると、CTC+ARベースライン上で統計学的に重要なWERの最大0.7%と0.3%の絶対値(5.3%と6.1%の相対値)が得られた。

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam search algorithm is designed to leverage a dynamic fusion of CTC, AR decoder, and AMD probabilities. Experiments on the LibriSpeech-100hr corpus suggest that the tripartite decoder incorporating the AMD module produces a maximum decoding speed-up ratio of 1.73x over the baseline CTC+AR decoding, while incurring no statistically significant word error rate (WER) increase on the test sets. When operating with the same decoding real-time factors, statistically significant WER reductions of up to 0.7% and 0.3% absolute (5.3% and 6.1% relative) were obtained over the CTC+AR baseline.
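
The block structure described above can be sketched as a boolean attention mask. This is one plausible reading, assuming that every label in the current block is concealed and that only labels from earlier blocks are visible as history; the function name `block_attention_mask` and the fixed block size are hypothetical, not the paper's implementation.

```python
def block_attention_mask(length, block_size):
    """mask[i][j] is True when output position i may attend to output position j.

    Assumption: positions in the same block are hidden from each other (predicted
    in parallel), while all earlier blocks are visible (left-to-right across blocks).
    """
    block = lambda pos: pos // block_size
    return [[block(j) < block(i) for j in range(length)] for i in range(length)]

mask = block_attention_mask(length=6, block_size=2)
for row in mask:
    print("".join("1" if visible else "." for visible in row))
```

With `block_size=1` the mask reduces to the usual strictly causal AR pattern, and with `block_size=length` nothing is visible, which corresponds to fully parallel NAR prediction.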

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# 言葉を超えて: ミッションクリティカルリスク分析における大規模言語モデルでの行動可能性

Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis ( http://arxiv.org/abs/2406.10273v2 )

ライセンス: Link先を確認

Matteo Esposito, Francesco Palagiano, Valentina Lenarduzzi,

(参考訳) コンテキスト。リスク分析は特定のシナリオにおける潜在的なリスクを評価する。リスク分析の原則は、コンテキストレスであり、同じ方法論を、健康や情報技術のセキュリティに関連するリスクに適用することができる。リスク分析には、国内外の規制や基準に関する膨大な知識が必要であり、時間と努力が集中している。大きな言語モデルは、人間よりも少ない時間で情報を素早く要約することができ、特定のタスクに微調整することができる。エイム。本研究は,リスク分析における検索・拡張型LLMと微調整型LLMの有効性を検討することを目的とした実証研究である。我々の知る限り、リスク分析の能力について事前の研究は行われていない。方法。我々は過去5年間に産業状況チームによってアーカイブされた50以上のミッションクリティカルな分析結果から,‘totalscenarios’というユニークなシナリオを手作業でキュレートした。基本モデルであるGPT-3.5とGPT-4とRetrieval-Augmented Generationおよび微調整モデルを比較した。我々は、モデルの競合相手として2人の人間専門家と、3人の人間専門家を雇い、モデルと以前の人間専門家の分析をレビューします。審査員は5000のシナリオ分析を行った。結果と結論。 HEsは高い精度を示したが、LSMsはより速く、より実用的な。さらに,RAG支援LSMが最も低い幻覚率を示し,隠れたリスクを効果的に発見し,人間の専門知識を補完することを示した。したがって、モデルの選択は、正確性のためのFTM、隠れたリスク発見のためのRAG、包括性と行動可能性のためのベースモデルなど、特定のニーズに依存する。したがって、専門家はLLMを、凝縮した時間枠内でのリスク分析を効果的に補完するコンパニオンとして活用することができる。また、不当な対策の実施に伴う不要な費用を回避することでコストを削減できる。

Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time- and effort-intensive. A large language model can summarize information in less time than a human and can be fine-tuned to specific tasks. Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation (RAG) and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis. Method. We manually curated \totalscenarios unique scenarios leading to \totalsamples representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models with their Retrieval-Augmented Generation and fine-tuned counterparts. We employed two human experts (HEs) as competitors of the models and three other human experts to review the analyses of the models and of the former human experts. The reviewers analyzed 5,000 scenario analyses. Results and Conclusions. HEs demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with fine-tuned models (FTMs) for accuracy, RAG for hidden risk discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementary companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# SparseRadNet:サブサンプルレーダデータに基づくスパース知覚ニューラルネットワーク

SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data ( http://arxiv.org/abs/2406.10600v4 )

ライセンス: Link先を確認

Jialong Wu, Mirko Meuter, Markus Schoeler, Matthias Rottmann,

(参考訳) レーダーに基づく認識は自律走行において注目を集めているが、レーダーの空間性は課題を生じさせている。レーダー生データは、しばしば過剰なノイズを含むが、レーダー点雲は限られた情報しか保持しない。本研究では,レーダ信号のグローバルおよびローカルな依存関係を発見するために,空間パターンを利用した適応型サブサンプリング手法と,適応型ネットワークアーキテクチャを導入することで,レーダデータの疎結合性を均質に扱う。我々のサブサンプリングモジュールは、下流の知覚タスクに最も寄与するレンジドップラー(RD)スペクトルから画素のサブセットを選択する。スパースサブサンプリングデータの特徴抽出を改善するために,レーダデータにグラフニューラルネットワークを適用する新しい手法を提案する。両方のブランチの機能を組み合わせるために、注意深い融合モジュールが適用される。 RADIalデータセットを用いた実験により,SparseRadNetはオブジェクト検出における最先端(SOTA)性能を超え,空間分割におけるSOTA精度に近づき,スパースサブサンプル入力データを用いた。

Radar-based perception has gained increasing attention in autonomous driving, yet the inherent sparsity of radars poses challenges. Radar raw data often contains excessive noise, whereas radar point clouds retain only limited information. In this work, we holistically treat the sparse nature of radar data by introducing an adaptive subsampling method together with a tailored network architecture that exploits the sparsity patterns to discover global and local dependencies in the radar signal. Our subsampling module selects a subset of pixels from range-Doppler (RD) spectra that contribute most to the downstream perception tasks. To improve the feature extraction on sparse subsampled data, we propose a new way of applying graph neural networks on radar data and design a novel two-branch backbone to capture both global and local neighbor information. An attentive fusion module is applied to combine features from both branches. Experiments on the RADIal dataset show that our SparseRadNet exceeds state-of-the-art (SOTA) performance in object detection and achieves close to SOTA accuracy in freespace segmentation, while using sparse subsampled input data.

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# WebCanvas: オンライン環境におけるWebエージェントのベンチマーク

WebCanvas: Benchmarking Web Agents in Online Environments ( http://arxiv.org/abs/2406.12373v3 )

ライセンス: Link先を確認

Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, Zhengyang Wu,

(参考訳) Webエージェントが実用的に有用であるためには、ユーザインターフェースやコンテンツへの頻繁な更新を特徴とする、継続的な進化するWeb環境に適応する必要がある。しかし、既存のベンチマークのほとんどは、Webの静的な側面のみをキャプチャしている。このギャップを埋めるために、WebCanvasはWebエージェントのための革新的なオンライン評価フレームワークであり、Webインタラクションの動的な性質を効果的に解決する。現実的な評価を促進するために, WebCanvas には3つの主要な要素がある。(1) 重要な中間動作やタスク完了に必要な状態を確実に捉えつつ,重要イベントや変更された Web 要素によるノイズを無視した,新たな評価指標。 2) Mind2Web-Liveと呼ばれるベンチマークデータセットは、オリジナルのMind2Web静的データセットの洗練されたバージョンで、2439の中間評価状態を持つ542のタスクを含む。 WebCanvas上に構築したエージェントフレームワークは,推論のための拡張可能なモジュールを備えたオープンソースであり,コミュニティがオンライン推論と評価を行うための基盤を提供する。ベストパフォーマンスエージェントは,Mind2Web-Liveテストセット上でのタスク成功率23.1%,タスク完了率48.8%を達成する。さらに,様々なWebサイト,ドメイン,実験環境におけるパフォーマンスの相違について分析する。我々は、オンラインエージェント評価に関するさらなる知見をコミュニティに提供し、この研究分野を前進させることを奨励する。

For web agents to be practically useful, they must adapt to the continuously evolving web environment characterized by frequent updates to user interfaces and content. However, most existing benchmarks only capture the static aspects of the web. To bridge this gap, we introduce WebCanvas, an innovative online evaluation framework for web agents that effectively addresses the dynamic nature of web interactions. WebCanvas contains three main components to facilitate realistic assessments: (1) a novel evaluation metric that reliably captures critical intermediate actions or states necessary for task completion while disregarding noise caused by insignificant events or changed web elements; (2) a benchmark dataset called Mind2Web-Live, a refined version of the original Mind2Web static dataset containing 542 tasks with 2439 intermediate evaluation states; (3) lightweight and generalizable annotation tools and testing pipelines that enable the community to collect and maintain the high-quality, up-to-date dataset. Building on WebCanvas, we open-source an agent framework with extensible modules for reasoning, providing a foundation for the community to conduct online inference and evaluations. Our best-performing agent achieves a task success rate of 23.1% and a task completion rate of 48.8% on the Mind2Web-Live test set. Additionally, we analyze the performance discrepancies across various websites, domains, and experimental environments. We encourage the community to contribute further insights on online agent evaluation, thereby advancing this field of research.
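
The two reported rates can be related with a small sketch. It assumes that a task counts as a success only when every intermediate evaluation state is reached, and that the completion rate is the per-task fraction of reached states averaged over tasks; both definitions, the function name `evaluate`, and the toy numbers are assumptions rather than the benchmark's official scoring code. For scale, 2439 states over 542 tasks is 4.5 states per task on average.

```python
def evaluate(tasks):
    """tasks: list of (states_reached, states_total) pairs, one pair per task.

    Returns (task_success_rate, task_completion_rate) under the assumed definitions.
    """
    n = len(tasks)
    successes = sum(1 for reached, total in tasks if reached == total)
    completion = sum(reached / total for reached, total in tasks)
    return successes / n, completion / n

# Toy data, invented for illustration: one fully solved task and three partial ones.
toy_tasks = [(4, 4), (2, 5), (0, 3), (3, 4)]
success_rate, completion_rate = evaluate(toy_tasks)
print(f"success={success_rate:.3f} completion={completion_rate:.4f}")
```

Under these definitions the completion rate is never lower than the success rate, which matches the direction of the gap between 48.8% and 23.1%.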

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# QOG:言語モデルに基づく質問と選択肢の生成

QOG:Question and Options Generation based on Language Model ( http://arxiv.org/abs/2406.12381v3 )

ライセンス: Link先を確認

Jincheng Zhou,

(参考訳) 質問-オプション生成(QOG)は、与えられたコンテキストの一連の質問-オプションペアを生成するタスクである。このタスクには、微調整された大規模モデル、情報検索、教育用複数選択質問の自動生成など、さまざまな応用がある。本稿では,細調整シーケンス・ツー・シーケンス言語モデル(LM)に基づく3つの異なる手法を用いてQOGモデルを開発する。実験により、エンドツーエンドのQOGモデルは、トレーニングと推論の両方において計算効率が良く、安定であり、他の手法よりも優れていることが示された。さらに,我々のQOGモデルは,大規模言語モデルであるLlama 3-8Bと比較して,QOGタスクにおいて競合することを示す。

Question-Options Generation (QOG) is the task of generating a set of question-options pairs from a given context. This task has various applications, including fine-tuning large models, information retrieval, and automated multiple-choice question generation for education. In this paper, we develop QOG models using three different methods based on fine-tuning sequence-to-sequence language models (LMs). Experiments demonstrate that the end-to-end QOG model is computationally efficient and stable during both training and inference, outperforming other methods. Furthermore, our analysis indicates that our QOG models are competitive on the QOG task compared to the large language model Llama 3-8B.

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# ボソニックボゴリューボフ準粒子の量子幾何学

Quantum geometry of bosonic Bogoliubov quasiparticles ( http://arxiv.org/abs/2406.12981v2 )

ライセンス: Link先を確認

Isaac Tesfaye, André Eckardt,

(参考訳) ボソニックなBogoliubov-de Gennes(BdG)系で生じる位相的および幾何学的特徴は、ベリー曲率の一般化されたシンプレクティック版と関連するチャーン数を用いて主に研究されている。ここではシンプレクティック量子幾何テンソル(SQGT)を提案し、その虚部が以前に研究されたシンプレクティックベリー曲率を導く一方、実部はシンプレクティック量子計量を生じさせ、ボゴリューボフモードの空間における自然な距離測度を与える。本稿では,SQGTのパラメータの周期的変調に応答して励起率を抽出し,SQGTのすべての成分を測定する方法を提案する。さらに、シンプレクティックベリー曲率をボゴリューボフ・ブロッホ波パケットの一般化されたシンプレクティック異常速度項に接続する。ボソニックなボゴリューボフ・ハルダンモデルについて実験を行った。

Topological and geometrical features arising in bosonic Bogoliubov-de Gennes (BdG) systems have mainly been studied by utilizing a generalized symplectic version of the Berry curvature and related Chern numbers. Here, we propose a symplectic quantum geometric tensor (SQGT), whose imaginary part leads to the previously studied symplectic Berry curvature, while the real part gives rise to a symplectic quantum metric, providing a natural distance measure in the space of bosonic Bogoliubov modes. We propose how to measure all components of the SQGT by extracting excitation rates in response to periodic modulations of the systems' parameters. Moreover, we connect the symplectic Berry curvature to a generalized symplectic anomalous velocity term for Bogoliubov Bloch wave packets. We test our results for a bosonic Bogoliubov-Haldane model.
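
For orientation, the decomposition into a real metric and an imaginary curvature part can be written out for the conventional (non-symplectic) quantum geometric tensor of a Bloch state $|u\rangle$ depending on parameters $\lambda_\mu$. This is the standard textbook form, given here only as background. The symplectic version proposed in the abstract presumably replaces the ordinary inner product by one weighted with $\tau_z = \mathrm{diag}(1,-1)$, as is usual for bosonic BdG modes, but that is an assumption and not the paper's exact definition.

```latex
\begin{align}
Q_{\mu\nu} &= \langle \partial_\mu u |\,\bigl(1 - |u\rangle\langle u|\bigr)\,| \partial_\nu u \rangle
            = g_{\mu\nu} - \tfrac{i}{2}\,\Omega_{\mu\nu}, \\
g_{\mu\nu} &= \operatorname{Re} Q_{\mu\nu}
            \quad\text{(quantum metric)}, \\
\Omega_{\mu\nu} &= -2 \operatorname{Im} Q_{\mu\nu}
            = i\bigl(\langle \partial_\mu u | \partial_\nu u \rangle
                   - \langle \partial_\nu u | \partial_\mu u \rangle\bigr)
            \quad\text{(Berry curvature)}.
\end{align}
```

The metric defines an infinitesimal distance $ds^2 = g_{\mu\nu}\, d\lambda_\mu\, d\lambda_\nu$ between neighbouring states, which is the sense in which the real part gives a distance measure.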

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# 臨床適応を考慮した生体医用ビジュアルインストラクションチューニング

Biomedical Visual Instruction Tuning with Clinician Preference Alignment ( http://arxiv.org/abs/2406.13173v3 )

ライセンス: Link先を確認

Hejie Cui, Lingjun Mao, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang,

(参考訳) マルチモーダル基礎モデルの最近の進歩は、視覚情報やテキスト情報による理解と推論において、印象的な能力を示した。これらの基礎モデルをバイオメディシンのような特殊なドメインに適用するには、大規模なドメイン固有の命令データセットが必要である。既存の作業では、そのようなデータセットを自動的にキュレーションする方法が検討されているが、結果のデータセットは、ドメインの専門知識と明確に一致していない。本研究では,臨床医の嗜好をバイオメディカル・マルチモーダル基礎モデルのチューニングのための指導データの生成と選択の両段階に組み込むデータ中心型ビオメディカル・ビジュアル・インストラクション・チューニング(BioMed-VITAL)を提案する。まず,GPT-4Vジェネレータに,好みに整合したデータ候補生成のための多種多様なクリニック選択による実演を誘導する。そして、選択期間中に、臨床医と政策指導を受けたモデルの選別を評価関数に明示的に蒸留して、医用指導のための高品質なデータを選択する別個の選別モデルを訓練する。その結果,提案手法から得られた指示追従データに調整したモデルでは,オープン・ビジュアル・チャット(18.5%)と医療用VQA(81.73%)の大幅な改善が見られた。 BioMed-VITAL.github.ioでは、インストラクション追跡データとモデルが利用可能です。

Recent advancements in multimodal foundation models have showcased impressive capabilities in understanding and reasoning with visual and textual information. Adapting these foundation models trained for general usage to specialized domains like biomedicine requires large-scale domain-specific instruction datasets. While existing works have explored curating such datasets automatically, the resultant datasets are not explicitly aligned with domain expertise. In this work, we propose a data-centric framework, Biomedical Visual Instruction Tuning with Clinician Preference Alignment (BioMed-VITAL), that incorporates clinician preferences into both stages of generating and selecting instruction data for tuning biomedical multimodal foundation models. First, during the generation stage, we prompt the GPT-4V generator with a diverse set of clinician-selected demonstrations for preference-aligned data candidate generation. Then, during the selection stage, we train a separate selection model, which explicitly distills clinician and policy-guided model preferences into a rating function to select high-quality data for medical instruction tuning. Results show that the model tuned with the instruction-following data from our method demonstrates a significant improvement in open visual chat (18.5% relative improvement) and medical VQA (win rate up to 81.73%). Our instruction-following data and models are available at BioMed-VITAL.github.io.

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# 空間ボット:視覚言語モデルを用いた精密空間理解

SpatialBot: Precise Spatial Understanding with Vision Language Models ( http://arxiv.org/abs/2406.13642v3 )

ライセンス: Link先を確認

Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao,

(参考訳) 視覚言語モデル(VLM)は2次元画像理解において目覚ましい性能を達成しているが、Embodied AIの基盤である空間的理解に苦慮している。本稿では,RGB画像と深度画像の両方をフィードすることで,空間的理解を向上させるためのSpatialBotを提案する。さらに、深度理解のためのVLMを訓練するために、多段階の深度関連質問を含むSpatialQAデータセットを構築した。最後に、異なるレベルでの空間理解におけるVLMの能力を総合的に評価するために、SpatialBenchを提案する。我々の空間理解ベンチマーク、一般的なVLMベンチマーク、Embodied AIタスクに関する大規模な実験は、SpatialQAでトレーニングされたSpatialBotの顕著な改善を実証している。モデル、コード、データはhttps://github.com/BAAI-DCAI/SpatialBotで入手できる。

Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding; however, they still struggle with spatial understanding, which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial understanding by feeding both RGB and depth images. Additionally, we have constructed the SpatialQA dataset, which involves multi-level depth-related questions to train VLMs for depth understanding. Finally, we present SpatialBench to comprehensively evaluate VLMs' capabilities in spatial understanding at different levels. Extensive experiments on our spatial-understanding benchmark, general VLM benchmarks, and Embodied AI tasks demonstrate the remarkable improvements of SpatialBot trained on SpatialQA. The model, code, and data are available at https://github.com/BAAI-DCAI/SpatialBot.

翻訳日:2024-07-17 20:39:37 公開日:2024-07-16

# REVEAL-IT:InTerpretabilityのための進化エージェントpoLicyの可視性を用いた強化学習

REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability ( http://arxiv.org/abs/2406.14214v4 )

ライセンス: Link先を確認

Shuang Ao, Simon Khan, Haris Aziz, Flora D. Salim,

(参考訳) エージェントの学習過程、特にその成功や訓練後の失敗に寄与する要因を理解することは、エージェントの意思決定プロセスの背後にある根拠を理解するために重要である。従来の手法では、構造因果モデル(SCM)を作成したり、価値関数の分布を視覚的に表現することで学習過程を明らかにする。しかしながら、これらのアプローチは2次元環境や複雑でない遷移力学でのみ機能するので制約がある。複雑な環境やタスクでエージェントの学習プロセスを理解することはより難しい。本稿では,複雑な環境下でエージェントの学習過程を説明するための新しいフレームワークであるREVEAL-ITを提案する。まず,様々な学習課題に対する政策構造とエージェントの学習過程を可視化する。これらの知見を可視化することにより、特定のトレーニングタスクやステージがテストにおけるエージェントのパフォーマンスにどの程度影響するかを理解することができる。そして、GNNベースの説明者がポリシーの最も重要な部分を強調することを学び、エージェントの学習プロセスについてより明確で堅牢な説明を提供する。実験により,本フレームワークから導出した説明は,学習効率の向上と最終性能の向上に有効であることが示された。

Understanding the agent's learning process, particularly the factors that contribute to its success or failure post-training, is crucial for comprehending the rationale behind the agent's decision-making process. Prior methods clarify the learning process by creating a structural causal model (SCM) or visually representing the distribution of value functions. Nevertheless, these approaches are limited because they function only in 2D environments or with uncomplicated transition dynamics. Understanding the agent's learning process in complicated environments or tasks is more challenging. In this paper, we propose REVEAL-IT, a novel framework for explaining the learning process of an agent in complex environments. Initially, we visualize the policy structure and the agent's learning process for various training tasks. By visualizing these findings, we can understand how much a particular training task or stage affects the agent's performance at test time. Then, a GNN-based explainer learns to highlight the most important section of the policy, providing a clearer and more robust explanation of the agent's learning process. The experiments demonstrate that explanations derived from this framework can effectively help in the optimization of the training tasks, resulting in improved learning efficiency and final performance.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# アップロード可能な機械学習のためのLoRAエキスパートの検索・拡張混合

Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning ( http://arxiv.org/abs/2406.16989v2 )

ライセンス: Link先を確認

Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu,

(参考訳) Low-Rank Adaptation (LoRA)は、大規模言語モデル(LLM)を微調整する効率的な方法を提供する。モジュール性とプラグアンドプレイ性により、様々なドメイン固有のLoRAの統合が可能になり、LLMの能力が向上する。 HuggingfaceやModelscopeのようなオープンソースのプラットフォームは、新しい計算パラダイムであるUploadable Machine Learning (UML)を導入した。 UMLでは、コントリビュータは専用のアダプタをトレーニングするために分散データを使用し、LLMを改善するために中央プラットフォームにアップロードされる。このプラットフォームでは、ドメイン固有のアダプタを使用して、パーソナライズされたサービスを必要とする混合タスク要求を処理する。 LoRAの以前の研究は、特定のタスクに焦点を当てたり、トレーニング中のLoRAの選択を修正したりしていた。しかしUMLでは、LoRAのプールは動的に更新され、新しいアップロードが加えられる。さらに、ダウンストリームリクエストの混在する性質は、パーソナライズされたサービスを必要とします。これらの課題に対処するために、入力プロンプトに基づいて複数のLoRAを適応的に検索・構成するフレームワークであるLora Experts (RAMoLE)を提案する。 RAMoLEには、関連するLoRAを特定して検索するLoraRetriever、取得したLoRAをコーディネートするオンザフライのMoLEメカニズム、異種リクエストを処理するための効率的なバッチ推論の3つの主要コンポーネントがある。実験の結果、RAMoLEはベースラインを一貫して上回り、その有効性とスケーラビリティを強調している。

Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to train specialized adapters, which are then uploaded to a central platform to improve LLMs. This platform uses these domain-specific adapters to handle mixed-task requests requiring personalized service. Previous research on LoRA composition either focuses on specific tasks or fixes the LoRA selection during training. However, in UML, the pool of LoRAs is dynamically updated with new uploads, requiring a generalizable selection mechanism for unseen LoRAs. Additionally, the mixed-task nature of downstream requests necessitates personalized services. To address these challenges, we propose Retrieval-Augmented Mixture of LoRA Experts (RAMoLE), a framework that adaptively retrieves and composes multiple LoRAs based on input prompts. RAMoLE has three main components: LoraRetriever for identifying and retrieving relevant LoRAs, an on-the-fly MoLE mechanism for coordinating the retrieved LoRAs, and efficient batch inference for handling heterogeneous requests. Experimental results show that RAMoLE consistently outperforms baselines, highlighting its effectiveness and scalability.
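
The retrieval step can be pictured with a minimal sketch: given an embedding of the input prompt and one embedding per uploaded LoRA, return the top-k adapters by cosine similarity. The embeddings, adapter names, and the function `retrieve_loras` are invented for illustration, and the abstract does not describe how LoraRetriever actually scores adapters.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_loras(prompt_vec, lora_pool, k=2):
    """Return the names of the k adapters whose embeddings best match the prompt."""
    ranked = sorted(lora_pool, key=lambda name: cosine(prompt_vec, lora_pool[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical pool; a new upload is just another dictionary entry, no retraining.
pool = {"math": [1.0, 0.0, 0.1], "code": [0.9, 0.3, 0.0], "legal": [0.0, 1.0, 0.2]}
print(retrieve_loras([1.0, 0.1, 0.0], pool, k=2))
```

Because ranking depends only on the stored embeddings, adapters uploaded later can be selected without changing the retriever, which is the property the dynamically growing pool in UML calls for.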

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# 反断熱駆動による超低温原子によるNOON状態の加速生成

Accelerated creation of NOON states with ultracold atoms via counterdiabatic driving ( http://arxiv.org/abs/2406.17545v3 )

ライセンス: Link先を確認

Simon Dengis, Sandro Wimberger, Peter Schlagheck,

(参考訳) 量子制御プロトコルは、2つのモードでN$ウルトラコールドボソニック原子を持つNOON状態を生成するために提案され、コヒーレントな重ね合わせ $\vert N,0\rangle + \vert 0,N\rangle$ に対応する。この状態は、最初に全てのボソンが配置され、他の2つのモードと対称に結合された第3モードを用いて作成することができる。この第3モードのエネルギーを他のモードのエネルギーレベルに調整することで、NOON状態の断熱的な生成が可能になる。通常、このプロセスは実用性には時間がかかりすぎるが、関連するスペクトルギャップの小さいため、効率的なギャップ工学を可能にする反断熱駆動によって劇的に加速することができる。このプロセスは、超低温量子ガスで実験的に実現可能な静的パラメータ適応の観点で実装可能であることを実証する。要求されるプロトコル速度における利得因子は、関与する原子の数と指数関数的に増加し、したがって、この断熱遷移の根底にある指数関数的に遅い集団トンネル過程と相反する。

A quantum control protocol is proposed for the creation of NOON states with $N$ ultracold bosonic atoms on two modes, corresponding to the coherent superposition $\vert N,0\rangle + \vert 0,N\rangle$. This state can be prepared by using a third mode where all bosons are initially placed and which is symmetrically coupled to the two other modes. Tuning the energy of this third mode across the energy level of the other modes allows the adiabatic creation of the NOON state. While this process normally takes too long to be practically useful, owing to the smallness of the involved spectral gap, it can be drastically accelerated through counterdiabatic driving, which allows for efficient gap engineering. We demonstrate that this process can be implemented in terms of static parameter adaptations that are experimentally feasible with ultracold quantum gases. Gain factors in the required protocol speed are obtained that increase exponentially with the number of involved atoms and thus counterbalance the exponentially slow collective tunneling process underlying this adiabatic transition.
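
As background, the target state (with normalization and an optional relative phase $\varphi$) and the general textbook form of a counterdiabatic term for a Hamiltonian with instantaneous eigenstates $|n(t)\rangle$ can be written as follows. This is the generic transitionless-driving expression, not the specific protocol of the paper, which according to the abstract is realized through static parameter adaptations.

```latex
\begin{align}
|\mathrm{NOON}\rangle &= \frac{1}{\sqrt{2}}\Bigl(|N,0\rangle + e^{i\varphi}\,|0,N\rangle\Bigr), \\
H_{\mathrm{CD}}(t) &= i\hbar \sum_n \Bigl(|\partial_t n\rangle\langle n|
      - \langle n|\partial_t n\rangle\,|n\rangle\langle n|\Bigr).
\end{align}
```

Adding $H_{\mathrm{CD}}(t)$ to the original Hamiltonian suppresses transitions between instantaneous eigenstates, so the sweep no longer has to be slow compared with the inverse spectral gap.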

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# 小さなイオン結晶の基底状態から基底状態への高速分離

Fast Ground State to Ground State Separation of Small Ion Crystals ( http://arxiv.org/abs/2406.17750v3 )

ライセンス: Link先を確認

Tyler H. Guglielmo, Dietrich Leibfried, Stephen B. Libby, Daniel H. Slichter,

(参考訳) 捕捉されたイオンの線形結晶を異なるサブセットに素早く分離することは、捕捉されたイオン量子コンピューティングアーキテクチャを実現する上で重要である。我々は,同種結晶と混合種結晶のより小さな部分集合への分離を記述するのに使用できる一般的な理論的枠組みを紹介する。この枠組みは二次ハミルトニアンの下でのガウス運動状態の進化の効率的な記述に依存しており、時間依存の応用ポテンシャルとイオンの相互クーロン反発の影響の下で量子進化を記述するために、イオンの古典的な運動方程式の特別な解のみを必要とする。本研究では, 混合種3イオン結晶の分離に適した時間依存性応用電位について, クーロン反発による自由膨張と同様の時間スケールで示し, 結晶軸に沿った全てのモードが基底状態に近づき, 終了することを示す。 3つの分離された混合種イオンは、この分離過程の時間反転によってエネルギーのゲインなしで1つの井戸に保持される結晶に結合することができる。

Rapid separation of linear crystals of trapped ions into different subsets is critical for realizing trapped ion quantum computing architectures where ions are rearranged in trap arrays to achieve all-to-all connectivity between qubits. We introduce a general theoretical framework that can be used to describe the separation of same-species and mixed-species crystals into smaller subsets. The framework relies on an efficient description of the evolution of Gaussian motional states under quadratic Hamiltonians that only requires a special solution of the classical equations of motion of the ions to describe their quantum evolution under the influence of a time-dependent applied potential and the ions' mutual Coulomb repulsion. We provide time-dependent applied potentials suitable for separation of a mixed species three-ion crystal on timescales similar to that of free expansion driven by Coulomb repulsion, with all modes along the crystal axis starting and ending close to their ground states. Three separately-confined mixed species ions can be combined into a crystal held in a single well without energy gain by time-reversal of this separation process.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# UniRec:シーケンスレコメンデーションにおける均一性と周波数の二重化

UniRec: A Dual Enhancement of Uniformity and Frequency in Sequential Recommendations ( http://arxiv.org/abs/2406.18470v3 )

ライセンス: Link先を確認

Yang Liu, Yitong Wang, Chenyue Feng,

(参考訳) ユーザのインタラクションパターンを正確にモデル化し、レコメンデーション精度を向上させるためには、シーケンシャルなレコメンデーションでの表現学習が重要である。しかし、既存のアプローチは主にアイテム間遷移を強調しており、しばしば行動パターンの変化と密接に関連する相互作用間の時間間隔を無視している。さらに、アイテム頻度などのより広範な相互作用属性は、しばしば見過ごされる。我々は,より均一な時間間隔を持つシーケンスと高い頻度を持つアイテムの両方で予測性能が向上することを見出した。逆に、一様でないシーケンスはユーザーの関心のドリフトを悪化させ、頻度の低いアイテムはスパースサンプリングのためモデル化が困難であり、現在の手法では十分に対処されていない固有の課題となっている。本稿では,新しい双方向拡張シーケンシャルレコメンデーション手法であるUniRecを提案する。 UniRecは、シーケンスの均一性とアイテムの頻度を活用してパフォーマンスを高め、特に一様でないシーケンスや頻度の低いアイテムの表現を改善している。これら2つのブランチは相互に強化され、複雑なシーケンシャルなレコメンデーションシナリオにおける包括的なパフォーマンス最適化を推進する。さらに,適応性をさらに向上する多次元時間モジュールを提案する。我々の知る限り、UniRecは特徴増強のために均一性と頻度の特性を利用する最初の方法である。 4つのデータセットにまたがる11の高度なモデルとの比較により、UniRecがSOTAモデルを大幅に上回っていることを示す。コードはhttps://github.com/Linxi000/UniRecで入手できる。

Representation learning in sequential recommendation is critical for accurately modeling user interaction patterns and improving recommendation precision. However, existing approaches predominantly emphasize item-to-item transitions, often neglecting the time intervals between interactions, which are closely related to behavior pattern changes. Additionally, broader interaction attributes, such as item frequency, are frequently overlooked. We found that both sequences with more uniform time intervals and items with higher frequency yield better prediction performance. Conversely, non-uniform sequences exacerbate user interest drift, and less-frequent items are difficult to model due to sparse sampling, presenting unique challenges inadequately addressed by current methods. In this paper, we propose UniRec, a novel bidirectional enhancement sequential recommendation method. UniRec leverages sequence uniformity and item frequency to enhance performance, particularly improving the representation of non-uniform sequences and less-frequent items. These two branches mutually reinforce each other, driving comprehensive performance optimization in complex sequential recommendation scenarios. Additionally, we present a multidimensional time module to further enhance adaptability. To the best of our knowledge, UniRec is the first method to utilize the characteristics of uniformity and frequency for feature augmentation. In comparisons with eleven advanced models across four datasets, we demonstrate that UniRec outperforms SOTA models significantly. The code is available at https://github.com/Linxi000/UniRec.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# 人間-AI協調型分類体系の構築--専門的な書記アシスタントを事例として

Human-AI Collaborative Taxonomy Construction: A Case Study in Profession-Specific Writing Assistants ( http://arxiv.org/abs/2406.18675v2 )

ライセンス: Link先を確認

Minhwa Lee, Zae Myung Kim, Vivek Khetan, Dongyeop Kang,

(参考訳) LLM(Large Language Models)は、テキストのリビジョンやストーリー生成など、複数の執筆作業において人間を支援してきた。しかし、ドメイン固有の記述、特にビジネスコンテキストにおけるサポートの有効性は、比較的調査されていない。業界専門家とのフォーマティブな研究により、このようなドメイン固有の文章のニュアンスに対する現在のLLMの理解の限界が明らかになった。このギャップに対処するため、我々は、ドメイン固有の執筆アシスタントのガイドラインとして機能する人間-AI協調分類開発手法を提案する。この手法は、ドメインの専門家からの反復的なフィードバックと、これらの専門家とLLM間の複数の相互作用を統合し、分類体系を洗練させる。より大規模な実験を通じて、我々はこの方法論を検証し、LLMを活用した執筆支援を改善し、異なる利害関係者の固有の要件を満たすように調整することを目指している。

Large Language Models (LLMs) have assisted humans in several writing tasks, including text revision and story generation. However, their effectiveness in supporting domain-specific writing, particularly in business contexts, is relatively less explored. Our formative study with industry professionals revealed the limitations in current LLMs' understanding of the nuances in such domain-specific writing. To address this gap, we propose an approach of human-AI collaborative taxonomy development to serve as a guideline for domain-specific writing assistants. This method integrates iterative feedback from domain experts and multiple interactions between these experts and LLMs to refine the taxonomy. Through larger-scale experiments, we aim to validate this methodology and thus improve LLM-powered writing assistance, tailoring it to meet the unique requirements of different stakeholders.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# Zボソンの仮想励起から生じるニュートリノ振動

Neutrino oscillations originate from virtual excitation of Z bosons ( http://arxiv.org/abs/2407.00954v2 )

ライセンス: Link先を確認

Shi-Biao Zheng,

(参考訳) ニュートリノ振動を説明するために、ニュートリノは無消滅質量を持ち、各フレーバー固有状態は3つの異なる質量固有状態によって形成され、確率振幅はその伝播中に互いに干渉する。しかし、エネルギー保存法則は、もし存在するならば、ニュートリノと同じ弱い相互作用によって生成された他の粒子の異なる結合エネルギー固有状態と絡み合わなければならない。この絡み合いによってニュートリノの質量固有状態間の量子コヒーレンスが破壊され、前述の仮定の下でのフレーバーの振動の原因となる。ニュートリノ振動は、実際に空間上を拡散するZボゾン場の仮想励起に由来する。伝播中、ニュートリノは継続的に励起し、すぐに仮想Zボソンを吸収する。この仮想ボゾン励起はニュートリノに逆作用を起こし、3つのフレーバーの間で振動する。ニュートリノが物質中に伝播するとき、その挙動は散乱に起因するコヒーレントフレーバー変換とデコヒーレンス効果の競合によって決定される。

To account for neutrino oscillations, it is postulated that the neutrino has nonvanishing mass and each flavor eigenstate is formed by three distinct mass eigenstates, whose probability amplitudes interfere with each other during its propagation. However, I find that the energy conservation law requires these mass eigenstates, if they exist, to be entangled with distinct joint energy eigenstates of the other particles produced by the same weak interaction as the neutrino. This entanglement destroys the quantum coherence among the neutrino's mass eigenstates, which is responsible for flavor oscillations under the aforementioned postulate. I reveal that the neutrino oscillations actually originate from virtual excitation of the Z bosonic field diffusing over space. During the propagation, the neutrino can continually excite and then immediately re-absorb a virtual Z boson. This virtual bosonic excitation produces a backaction on the neutrino, enabling it to oscillate among three flavors. When the neutrino propagates in matter, its behavior is determined by the competition between the coherent flavor transformation and the decoherence effect resulting from scatterings.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# アクター・クリティカル強化学習による測地線の生成と中間点の予測

Generation of Geodesics with Actor-Critic Reinforcement Learning to Predict Midpoints ( http://arxiv.org/abs/2407.01991v2 )

ライセンス: Link先を確認

Kazumi Kasaura,

(参考訳) 無限小に定義された測度を持つ多様体上のすべての対の最も短い経路を見つけるために、中間点を再帰的に予測し、中間点予測を学ぶアクター・クリティカルな方法を提案する。提案手法は,提案手法が局所的・グローバルな経路計画タスクにおいて既存手法よりも優れていることを示す。

To find the shortest paths for all pairs of points on manifolds with infinitesimally defined metrics, we propose generating them by recursively predicting midpoints, together with an actor-critic method for learning midpoint prediction. We prove the soundness of our approach and show experimentally that the proposed method outperforms existing methods on both local and global path planning tasks.
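
The recursive midpoint idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `predict_midpoint` is a hypothetical stand-in for the learned actor, and the default `euclidean_midpoint` is only a placeholder, so the generated path here is a straight line rather than a geodesic of a non-trivial metric.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean_midpoint(a: Point, b: Point) -> List[float]:
    """Placeholder predictor (assumption): the plain Euclidean midpoint."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def generate_path(
    a: Point,
    b: Point,
    depth: int,
    predict_midpoint: Callable[[Point, Point], List[float]] = euclidean_midpoint,
) -> List[List[float]]:
    """Return 2**depth + 1 waypoints from a to b by recursive bisection."""
    if depth == 0:
        return [list(a), list(b)]
    m = predict_midpoint(a, b)
    left = generate_path(a, m, depth - 1, predict_midpoint)
    right = generate_path(m, b, depth - 1, predict_midpoint)
    # The midpoint ends `left` and starts `right`; keep it only once.
    return left + right[1:]


path = generate_path([0.0, 0.0], [4.0, 8.0], depth=2)
```

Swapping in a learned predictor only changes the `predict_midpoint` argument; the recursion itself stays the same.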

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# カメラベースセマンティックシーン補完のための階層的時間文脈学習

Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion ( http://arxiv.org/abs/2407.02077v3 )

ライセンス: Link先を確認

Bohan Li, Jiajun Deng, Wenyao Zhang, Zhujin Liang, Dalong Du, Xin Jin, Wenjun Zeng,

(参考訳) カメラベースの3Dセマンティックシーン補完(SSC)は、2D画像の観察に制限のある複雑な3Dレイアウトを予測するために重要である。既存の主流のソリューションは一般的に、履歴フレームを概ね積み重ねて現在のフレームを補うことで、時間的情報を活用する。しかし、このような単純な時間的モデリングは、有効な手がかりを必然的に減少させ、学習の難易度を高める。この問題に対処するために、カメラベースのセマンティックシーン補完を改善するための新しい階層型時間文脈学習パラダイムであるHTCLを提案する。この研究の主な革新は、時間的文脈学習を2つの階層的なステップに分解することである。 (a)クロスフレーム親和性測定および (b)親和性に基づくダイナミックリファインメント。まず、重要コンテキストを冗長な情報から分離するために、パターン親和性とスケールアウェアアイソレーションと、よりきめ細かいコンテキスト対応モデリングのための複数の独立した学習者を導入する。その後、不完全観測を動的に補償するために、初期同定されたアフィニティの高い位置とその周辺地域に基づいて特徴サンプリング位置を適応的に洗練する。提案手法はSemanticKITTIベンチマークで$1^{st}$にランクされ,OpenOccupancyベンチマークではmIoUの点でLiDARベースの手法をも上回っている。私たちのコードは https://github.com/Arlo0o/HTCL で利用可能です。

Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. The existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work involves decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. Firstly, to separate critical relevant context from redundant information, we introduce the pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on initially identified locations with high affinity and their neighboring relevant regions. Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU on the OpenOccupancy benchmark. Our code is available at https://github.com/Arlo0o/HTCL.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# TIGER: 実践的なPython型推論のための生成テーマランキングフレームワーク

TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference ( http://arxiv.org/abs/2407.02095v2 )

ライセンス: Link先を確認

Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng,

(参考訳) Pythonの動的型付けシステムは柔軟性と表現力を提供するが、型関連のエラーを引き起こす可能性があるため、型ヒントを強化するために自動型推論が必要になる。既存の学習ベースのアプローチは有望な推論精度を示しているが、複雑なジェネリックタイプや(見えない)ユーザ定義型など、さまざまなタイプを包括的に扱うという実践的な課題に苦慮している。本稿では,Pythonの多種多様な型カテゴリを効果的に扱えるように設計された2段階生成レベル(GTR)フレームワークであるTIGERを紹介する。 TIGERは、微調整された事前訓練されたコードモデルを利用して、スパンマスキングの目的を持つ生成モデルを訓練し、対照的なトレーニングの目的を持つ類似モデルを訓練する。このアプローチにより、TIGERは生成段階の複雑なジェネリクスを含む幅広い型候補を生成し、ランキング段階のユーザ定義型を正確にランク付けすることができる。 ManyTypes4Pyデータセットに対する評価は、TIGERが様々なタイプのカテゴリで既存のメソッドよりも優れていることを示し、特にTop-5 Exact Matchにおいて、ユーザ定義型と未確認型をそれぞれ11.2%、20.1%の精度で推測する際の精度を向上している。さらに、実験結果は、TIGERの優れた性能と効率を示すだけでなく、自動型推論の自動化における生成およびランキングステージの重要性も示している。

Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, designed to effectively handle Python's diverse type categories. TIGER leverages fine-tuned pre-trained code models to train a generative model with a span masking objective and a similarity model with a contrastive training objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics in the generating stage, and accurately rank them with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories, notably improving accuracy in inferring user-defined and unseen types by 11.2% and 20.1% respectively in Top-5 Exact Match. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency, but also underscore the significance of its generating and ranking stages in enhancing automated type inference.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# プロトタイプベース継手埋め込み法によるソフトマックス分類器の説明可能性の向上

Improving Explainability of Softmax Classifiers Using a Prototype-Based Joint Embedding Method ( http://arxiv.org/abs/2407.02271v2 )

ライセンス: Link先を確認

Hilarie Sit, Brendan Keith, Karianne Bergen,

(参考訳) 本稿では,プロトタイプの確率的サンプリングによって生成される予測信頼度を提供するソフトマックス分類器の説明可能性向上のためのプロトタイプベースアプローチを提案し,分布検出(OOD)の可能性を示す。モデルアーキテクチャとトレーニングを変更して、トレーニングデータセットの任意のクラス例と類似性を利用して予測を行うことで、予測に寄与する原型例のサンプルを取得でき、モデルの決定に対するインスタンスベースの説明を提供する。さらに,モデルの潜在空間内の相対距離からトレーニングデータセットから画像間の関係を学習することにより,分布データからソフトマックスの信頼性よりも検出可能な不確かさの指標を得る。

We propose a prototype-based approach for improving explainability of softmax classifiers that provides an understandable prediction confidence, generated through stochastic sampling of prototypes, and demonstrates potential for out-of-distribution (OOD) detection. By modifying the model architecture and training to make predictions using similarities to any set of class examples from the training dataset, we acquire the ability to sample for prototypical examples that contributed to the prediction, which provide an instance-based explanation for the model's decision. Furthermore, by learning relationships between images from the training dataset through relative distances within the model's latent space, we obtain a metric for uncertainty that is better able to detect out-of-distribution data than softmax confidence.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# ブラックボックスワーク抽出と複合仮説テスト

Black box work extraction and composite hypothesis testing ( http://arxiv.org/abs/2407.03400v2 )

ライセンス: Link先を確認

Kaito Watanabe, Ryuji Takagi,

(参考訳) ワーク抽出は量子熱力学において最も中心的なプロセスの1つである。しかし、最適抽出可能な作業の事前解析は、初期状態に関する完全な情報が与えられる限られた運用シナリオに限定されている。ここでは,ブラックボックス作業抽出の一般的な枠組みを紹介し,初期状態に関する情報の入手不能に対処する。ブラックボックス設定における最適抽出可能作業は,情報理論の基本的な問題である複合仮説テストタスクの性能によって完全に特徴づけられ,この一般関係を用いて,合成仮説テストにおける量子シュタインの補題への漸近ブラックボックスワーク抽出を削減し,ヘルムホルツ自由エネルギーの観点からそれらの正確な特徴付けを行うことができることを示す。また、この物理環境では、合成仮説が特定の相関を含む新しい量子シュタインの補題も示している。本研究は、初期状態に関する情報の重要性を示し、複合量子仮説テストにおける量の新しい解釈を与え、物理設定と情報理論の相互作用を奨励する。

Work extraction is one of the most central processes in quantum thermodynamics. However, prior analysis of optimal extractable work has been restricted to a limited operational scenario where complete information about the initial state is given. Here, we introduce a general framework of black box work extraction, which addresses the inaccessibility of information on the initial state. We show that the optimal extractable work in the black box setting is completely characterized by the performance of a composite hypothesis testing task, a fundamental problem in information theory. We employ this general relation to reduce the asymptotic black box work extraction to the quantum Stein's lemma in composite hypothesis testing, allowing us to provide their exact characterization in terms of the Helmholtz free energy. We also show a new quantum Stein's lemma motivated in this physical setting, where a composite hypothesis contains a certain correlation. Our work exhibits the importance of information about the initial state and gives a new interpretation of the quantities in the composite quantum hypothesis testing, encouraging the interplay between physical settings and information theory.

翻訳日:2024-07-17 20:29:52 公開日:2024-07-16

# 慣用翻訳におけるLLM能力の向上

Improving LLM Abilities in Idiomatic Translation ( http://arxiv.org/abs/2407.03518v2 )

ライセンス: Link先を確認

Sundesh Donthi, Maximilian Spencer, Om Patel, Joon Doh, Eid Rodan,

(参考訳) NLLBやGPTのような大きな言語モデル(LLM)では、イディオムの翻訳は依然として困難である。我々のゴールは、本来の言語スタイルを保ちながら、慣用的な言語のLLM処理を改善することで、翻訳の忠実性を高めることである。これは、文化的なニュアンスを維持し、翻訳されたテキストがその意図と感情的共鳴を維持し、より優れた文化的なコミュニケーションを育むことを保証するため、大きな社会的影響を持つ。これまでの研究は、翻訳に使用する慣用句の意味をLLMに提供することで、IdiomKBのような知識ベースを利用してきた。この手法は直接翻訳よりも優れた結果を得たが、言語間で慣用的な書体を維持する能力は依然として限られている。本研究では,対象言語に対応するイディオムを見つけるために,知識ベースを拡大する。本研究は,2つの手法を用いて翻訳を行う。第1の方法はSentence Transformersモデルを用いて,原語と対象言語のイディオムの意味のコサイン類似度スコアを意味的に生成し,最適なイディオムを選択する(コサイン類似度法)。第2の方法は、LLM生成イディオム法(LLM生成イディオム法)において、対象言語で対応するイディオムを見つけるためにLLMを使用する。ベースラインとして、追加情報を提供しずに直接翻訳を行った。英語・中国語・中国語の人的評価は,すべてのGPT4o翻訳において,コサイン類似性検索法が他より優れていたことを示している。 IdiomKBのさらなる構築のために、Urduイディオムとそれらの翻訳を含む低リソースなUrduデータセットを開発した。データセットの制限にもかかわらず、Cosine similarity Lookupメソッドは、将来性を示し、言語障壁を克服し、中国語とウルドゥー語における多様な文学作品の探索を可能にする。

For large language models (LLMs) like NLLB and GPT, translating idioms remains a challenge. Our goal is to enhance translation fidelity by improving LLM processing of idiomatic language while preserving the original linguistic style. This has a significant social impact, as it preserves cultural nuances and ensures translated texts retain their intent and emotional resonance, fostering better cross-cultural communication. Previous work has utilized knowledge bases like IdiomKB by providing the LLM with the meaning of an idiom to use in translation. Although this method yielded better results than a direct translation, it is still limited in its ability to preserve idiomatic writing style across languages. In this research, we expand upon the knowledge base to find corresponding idioms in the target language. Our research performs translations using two methods. The first method employs the SentenceTransformers model to compute cosine similarity scores between the meanings of the original and target language idioms, selecting the best idiom (Cosine Similarity method). The second method uses an LLM to find a corresponding idiom in the target language for use in the translation (LLM-generated idiom method). As a baseline, we performed a direct translation without providing additional information. Human evaluations on English -> Chinese and Chinese -> English translations show that the Cosine Similarity Lookup method outperformed the others in all GPT4o translations. To further build upon IdiomKB, we developed a low-resource Urdu dataset containing Urdu idioms and their translations. Despite dataset limitations, the Cosine Similarity Lookup method shows promise, potentially overcoming language barriers and enabling the exploration of diverse literary works in Chinese and Urdu.
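
The Cosine Similarity method described above amounts to a nearest-neighbour lookup over idiom-meaning embeddings. The sketch below is illustrative only: the three-dimensional vectors and the `idiom_a`/`idiom_b`/`idiom_c` keys are made-up placeholders, whereas in the paper the embeddings come from a SentenceTransformers model applied to idiom meanings.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_idiom(source_vec: Sequence[float],
               candidates: Dict[str, Sequence[float]]) -> str:
    """Pick the target-language idiom whose meaning vector is most similar."""
    return max(candidates,
               key=lambda name: cosine_similarity(source_vec, candidates[name]))


# Toy embeddings (assumption): stand-ins for real sentence embeddings.
source_meaning = [0.9, 0.1, 0.0]
target_idioms = {
    "idiom_a": [1.0, 0.0, 0.0],
    "idiom_b": [0.0, 1.0, 0.0],
    "idiom_c": [0.0, 0.0, 1.0],
}
choice = best_idiom(source_meaning, target_idioms)
```

In practice a similarity threshold would probably be needed as well, so that no idiom is substituted when the best match is still a poor one.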

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# Beyond Pixels: マルチスケールパッチベースマルチラベル分類器による半スーパービジョンセマンティックセマンティックセマンティックセグメンテーション

Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier ( http://arxiv.org/abs/2407.04036v2 )

ライセンス: Link先を確認

Prantik Howlader, Srijan Das, Hieu Le, Dimitris Samaras,

(参考訳) ピクセルコンテキスト情報を組み込むことは、正確なセグメンテーションに不可欠である。本稿では,文脈情報を組み込む効果的な方法は,パッチベースの分類器によるものであることを示す。このパッチ分類器は、画像領域内に存在するクラスを識別するように訓練され、イントラクタの除去を容易にし、小さなオブジェクトセグメントの分類を強化する。具体的には、既存の半教師付きセグメンテーション(SSS)フレームワーク用に設計された新しいプラグインモジュールであるMPMC(Multiscale Patch-based Multi-label Classifier)を紹介する。 MPMCはパッチレベルの監視を提供し、パッチ内の異なるクラスのピクセル領域の識別を可能にする。さらに、MPMCは、教師のうるさい疑似ラベル監視の影響を軽減するために、パッチレベルの分類を用いて適応的な擬似ラベル重みを学習する。この軽量モジュールは任意のSSSフレームワークに統合することができ、パフォーマンスを大幅に向上させることができる。提案手法を4つのSSS手法に統合し、2つの自然な画像と1つの医学的セグメンテーションデータセットにわたって改善することにより,提案手法の有効性を実証する。

Incorporating pixel contextual information is critical for accurate segmentation. In this paper, we show that an effective way to incorporate contextual information is through a patch-based classifier. This patch classifier is trained to identify classes present within an image region, which facilitates the elimination of distractors and enhances the classification of small object segments. Specifically, we introduce Multi-scale Patch-based Multi-label Classifier (MPMC), a novel plug-in module designed for existing semi-supervised segmentation (SSS) frameworks. MPMC offers patch-level supervision, enabling the discrimination of pixel regions of different classes within a patch. Furthermore, MPMC learns an adaptive pseudo-label weight, using patch-level classification to alleviate the impact of the teacher's noisy pseudo-label supervision on the student. This lightweight module can be integrated into any SSS framework, significantly enhancing its performance. We demonstrate the efficacy of our proposed MPMC by integrating it into four SSS methodologies and improving them across two natural image datasets and one medical segmentation dataset, notably improving the segmentation results of the baselines across all three datasets.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 知識に基づく医薬品サンプルの比較

Knowledge-based Drug Samples' Comparison ( http://arxiv.org/abs/2407.04317v2 )

ライセンス: Link先を確認

Sébastien Guillemin, Ana Roxin, Laurence Dujourdy, Ludovic Journaux,

(参考訳) ドラッグ・サンプル・コンファレンス(英: Drug sample comparison)は、フランス国家警察が麻薬の流通ネットワークを識別するプロセスである。現在のアプローチは、法医学の専門家による手動比較に基づいている。本稿では,現在のプロセスを改善するために専門家の知識を取得し,形式化し,特定するためのアプローチを提案する。基礎となる知識をモデル化するためには、オントロジーと論理的ルールを使います。このアプローチのさまざまなステップは、他のアプリケーションドメインで再利用するように設計されています。得られた結果は、さまざまな分野の専門家が利用できるように説明できる。

Drug sample comparison is a process used by the French National Police to identify drug distribution networks. The current approach is based on manual comparison done by forensic experts. In this article, we present our approach to acquire, formalise, and specify expert knowledge to improve the current process. For modelling the underlying knowledge, we use an ontology coupled with logical rules. The different steps of our approach are designed to be reused in other application domains. The results obtained are explainable, making them usable by experts in different fields.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 大規模言語モデルは戦略的意思決定者か? : 2プレイヤーノンゼロサムゲームのパフォーマンスとバイアスに関する研究

Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games ( http://arxiv.org/abs/2407.04467v2 )

ライセンス: Link先を確認

Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li,

(参考訳) 大規模言語モデル(LLM)は、現実世界での利用が増えているが、その戦略能力はほとんど解明されていない。ゲーム理論は、他のエージェントとの相互作用におけるLSMの意思決定能力を評価するための優れたフレームワークを提供する。以前の研究では、LSMは慎重に計算されたプロンプトでこれらのタスクを解くことができるが、問題の設定やプロンプトが変わると失敗する。本研究では,戦略ゲームにおける LLM の動作,Stag Hunt と Prisoner Dilemma について検討し,異なる設定とプロンプト下での性能変動を分析した。以上の結果から,(1)位置バイアス,(2)支払いバイアス,(3)行動バイアスの少なくとも1つが評価された。その結果,ゲーム構成が影響するバイアスと一致していない場合,LLMの性能は低下することがわかった。パフォーマンスは正しいアクションの選択に基づいて評価される。アライメント(Alignment)とは、LLMのバイアスが正しい動作と一致しているかどうかをいう。例えば、GPT-4oの平均性能は、不一致時に34%低下する。さらに、GPT-4o(現在の最高の性能のLCM)が最大の性能低下を被る「より大きく新しいもの」という現在の傾向は、上記のようには保たない。最後に、チェーン・オブ・ソート・プロンプトは、ほとんどのモデルにおけるバイアスの影響を減少させるが、根本的なレベルでの問題解決には程遠いことに留意する。

Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or prompt changes. In this work, we investigate LLMs' behaviour in two strategic games, Stag Hunt and Prisoner's Dilemma, analyzing performance variations under different settings and prompts. Our results show that the tested state-of-the-art LLMs exhibit at least one of the following systematic biases: (1) positional bias, (2) payoff bias, or (3) behavioural bias. Subsequently, we observed that the LLMs' performance drops when the game configuration is misaligned with the affecting biases. Performance is assessed based on the selection of the correct action, one which agrees with the prompted preferred behaviours of both players. Alignment refers to whether the LLM's bias aligns with the correct action. For example, GPT-4o's average performance drops by 34% when misaligned. Additionally, the current trend of "bigger and newer is better" does not hold for the above, where GPT-4o (the current best-performing LLM) suffers the most substantial performance drop. Lastly, we note that while chain-of-thought prompting does reduce the effect of the biases on most models, it is far from solving the problem at the fundamental level.
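
For readers unfamiliar with the two games, the sketch below enumerates their pure-strategy Nash equilibria. The payoff numbers are textbook-style assumptions chosen for illustration, not the values used in the paper's prompts, and the "correct action" in the paper is defined by the prompted preferences rather than by this equilibrium check.

```python
from typing import Dict, List, Tuple

Action = str
# (row action, column action) -> (row payoff, column payoff)
Payoffs = Dict[Tuple[Action, Action], Tuple[int, int]]


def pure_nash_equilibria(payoffs: Payoffs) -> List[Tuple[Action, Action]]:
    """Return all action pairs from which neither player gains by deviating alone."""
    rows = sorted({a for a, _ in payoffs})
    cols = sorted({b for _, b in payoffs})
    equilibria = []
    for a in rows:
        for b in cols:
            u_row, u_col = payoffs[(a, b)]
            row_ok = all(payoffs[(a2, b)][0] <= u_row for a2 in rows)
            col_ok = all(payoffs[(a, b2)][1] <= u_col for b2 in cols)
            if row_ok and col_ok:
                equilibria.append((a, b))
    return equilibria


# Assumed payoffs: hunting stag pays off only if both players cooperate.
stag_hunt: Payoffs = {
    ("stag", "stag"): (4, 4), ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0), ("hare", "hare"): (3, 3),
}
# Assumed payoffs: defecting is individually better whatever the other does.
prisoners_dilemma: Payoffs = {
    ("cooperate", "cooperate"): (3, 3), ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0), ("defect", "defect"): (1, 1),
}

stag_hunt_eq = pure_nash_equilibria(stag_hunt)
dilemma_eq = pure_nash_equilibria(prisoners_dilemma)
```

With these payoffs Stag Hunt has two equilibria while Prisoner's Dilemma has one, which is one reason the two games probe different aspects of strategic behaviour.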

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# LoRA-GA: 勾配近似による低ランク適応

LoRA-GA: Low-Rank Adaptation with Gradient Approximation ( http://arxiv.org/abs/2407.05000v2 )

ライセンス: Link先を確認

Shaowen Wang, Linxi Yu, Jian Li,

(参考訳) 微調整された大規模事前訓練モデルは、計算とメモリコストの点で極めて高価である。 LoRAは、パラメータ効率の良いファインチューニング(PEFT)手法として、パラメータが著しく少ない補助的な低ランクモデルを微調整することで、コスト効率の良い代替手段を提供する。 LoRAは各イテレーションで計算とメモリの要求を大幅に削減するが、広範な実証的な証拠は、完全な微調整に比べてかなり遅い速度で収束し、最終的には計算全体の増加とテスト性能の悪化につながることを示している。本稿では,LoRAの初期化手法の詳細な検討を行い,アーキテクチャやトレーニングアルゴリズムの変更なしに,注意深い初期化が効率と性能の両方を大幅に向上させることを示す。特に,新しい初期化手法であるLoRA-GA(Low Rank Adaptation with Gradient Approximation)を導入する。我々の広範囲な実験により、LoRA-GAは完全な微調整と同等の収束率(バニラのLoRAよりも大幅に高速であり、最近の改良もいくつかある)を同時に達成し、同時に同等あるいはより優れた性能を実現していることが示された。例えば、GLUEデータセットのサブセットであるT5-Baseでは、LoRA-GAは平均で5.69%向上している。 Llama 2-7Bのような大型モデルでは、それぞれMT-bench、GSM8K、Human-evalで0.34、11.52%、および5.05%の性能向上を示した。さらに,バニラLoRAに比べて最大2～4倍の収束速度向上が観察され,収束の促進とモデル性能の向上に効果が検証された。コードはhttps://github.com/Outsider565/LoRA-GAで入手できる。

Fine-tuning large-scale pretrained models is prohibitively expensive in terms of computational and memory costs. LoRA, as one of the most popular Parameter-Efficient Fine-Tuning (PEFT) methods, offers a cost-effective alternative by fine-tuning an auxiliary low-rank model that has significantly fewer parameters. Although LoRA reduces the computational and memory requirements significantly at each iteration, extensive empirical evidence indicates that it converges at a considerably slower rate compared to full fine-tuning, ultimately leading to increased overall compute and often worse test performance. In our paper, we perform an in-depth investigation of the initialization method of LoRA and show that careful initialization (without any change of the architecture and the training algorithm) can significantly enhance both efficiency and performance. In particular, we introduce a novel initialization method, LoRA-GA (Low Rank Adaptation with Gradient Approximation), which aligns the gradients of low-rank matrix product with those of full fine-tuning at the first step. Our extensive experiments demonstrate that LoRA-GA achieves a convergence rate comparable to that of full fine-tuning (hence being significantly faster than vanilla LoRA as well as various recent improvements) while simultaneously attaining comparable or even better performance. For example, on the subset of the GLUE dataset with T5-Base, LoRA-GA outperforms LoRA by 5.69% on average. On larger models such as Llama 2-7B, LoRA-GA shows performance improvements of 0.34, 11.52%, and 5.05% on MT-bench, GSM8K, and Human-eval, respectively. Additionally, we observe up to 2-4 times convergence speed improvement compared to vanilla LoRA, validating its effectiveness in accelerating convergence and enhancing model performance. Code is available at https://github.com/Outsider565/LoRA-GA.
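
To see why initialization affects the first update, it helps to write out one plain gradient-descent step. The notation below ($W_0$ for the frozen weight, $B$ and $A$ for the low-rank factors, $G$ for the full-weight gradient, $\lambda$ for the learning rate) is assumed here rather than taken from the abstract, and the usual LoRA scaling factor is omitted for brevity.

$$
W = W_0 + BA, \qquad G = \frac{\partial \mathcal{L}}{\partial W}, \qquad \frac{\partial \mathcal{L}}{\partial B} = G A^{\top}, \qquad \frac{\partial \mathcal{L}}{\partial A} = B^{\top} G
$$

$$
\Delta W_{\text{LoRA}} = (B - \lambda G A^{\top})(A - \lambda B^{\top} G) - BA = -\lambda \left( G A^{\top} A + B B^{\top} G \right) + \lambda^{2}\, G A^{\top} B^{\top} G, \qquad \Delta W_{\text{full}} = -\lambda G
$$

With the common choice $B = 0$ and random $A$, the first step reduces to $-\lambda\, G A^{\top} A$, that is, the full gradient multiplied by a random rank-$r$ matrix. Read this way, the abstract's statement about aligning gradients at the first step means choosing the initial $A$ and $B$ so that $G A^{\top} A + B B^{\top} G$ approximates $G$ as closely as a low-rank update allows.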

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 安定したデータ駆動気候モデリングのための非局所力学学習の重要性について:1次元重力波-QBOテストベッド

On the importance of learning non-local dynamics for stable data-driven climate modeling: A 1D gravity wave-QBO testbed ( http://arxiv.org/abs/2407.05224v2 )

ライセンス: Link先を確認

Hamid A. Pahlavan, Pedram Hassanzadeh, M. Joan Alexander,

(参考訳) 機械学習(ML)技術、特にニューラルネットワーク(NN)は、気候モデルのためのサブグリッドスケールパラメータ化の学習において有望であることを示している。しかし、特に教師付きアルゴリズムで学んだデータ駆動パラメータ化の大きな問題は、モデル不安定性である。現在の治療法は、しばしばアドホックであり、理論的な基礎を欠いている。ここでは、ML理論と気候物理を組み合わせて、NNベースのパラメータ化における不安定性の源となる問題に対処する。本研究では,重力波をパラメータ化した準二年周期振動(QBO)の1次元モデルを用いて,空間的に$\textit{non-local}$なダイナミクスを学習することの重要性を示す。非局所的ダイナミクスの学習において、一般的なオフラインメトリクスは欠点を識別できないが、受容場(RF)の概念は不安定性を事前に(a priori)識別できることを示す。風のプロファイルからGW強制を正確に予測するように見えるNNベースのパラメータ化($\mathbf{R^2 \approx 0.99}$)は、RFが小さすぎて非局所的ダイナミクスを捕捉できない場合、不安定なシミュレーションを引き起こす。一方、同じサイズでもRFが十分に大きいNNは安定である。本稿では,3種類のアーキテクチャ,すなわち畳み込みNN,フーリエニューラル演算子,および完全連結NNについて検討する。後者の2つは本質的に大きなRFを持つ。また、非局所的ダイナミクスの学習は、帯状風場のデータ駆動時空間エミュレータの安定性と精度に不可欠であることを示す。気候システムにおける非局所力学の遍在性を考えると、あらゆるNNアーキテクチャで計算できる実効的なRFの利用は多くのアプリケーションにとって重要であると期待する。この研究は、気象と気候モデリングのためのデータ駆動アルゴリズムの設計と解析のために、ML理論と物理を統合する必要性を強調している。

Machine learning (ML) techniques, especially neural networks (NNs), have shown promise in learning subgrid-scale parameterizations for climate models. However, a major problem with data-driven parameterizations, particularly those learned with supervised algorithms, is model instability. Current remedies are often ad hoc and lack a theoretical foundation. Here, we combine ML theory and climate physics to address a source of instability in NN-based parameterization. We demonstrate the importance of learning spatially $\textit{non-local}$ dynamics using a 1D model of the quasi-biennial oscillation (QBO) with gravity wave (GW) parameterization as a testbed. While common offline metrics fail to identify shortcomings in learning non-local dynamics, we show that the concept of receptive field (RF) can identify instability a priori. We find that NN-based parameterizations that seem to accurately predict GW forcings from wind profiles ($\mathbf{R^2 \approx 0.99}$) cause unstable simulations when the RF is too small to capture the non-local dynamics, while NNs of the same size but with a large-enough RF are stable. We examine three broad classes of architectures, namely convolutional NNs, Fourier neural operators, and fully-connected NNs; the latter two have inherently large RFs. We also demonstrate that learning non-local dynamics is crucial for the stability and accuracy of a data-driven spatiotemporal emulator of the zonal wind field. Given the ubiquity of non-local dynamics in the climate system, we expect the use of effective RF, which can be computed for any NN architecture, to be important for many applications. This work highlights the necessity of integrating ML theory with physics to design and analyze data-driven algorithms for weather and climate modeling.
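
As a rough illustration of the receptive-field idea, the sketch below computes the theoretical RF of a stack of 1D convolutions from kernel size, stride, and dilation. This is a simplification under stated assumptions: the abstract refers to the effective RF, which is typically estimated from the trained network rather than from this closed-form bound, and the layer configurations shown are hypothetical.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Theoretical receptive field, in input grid points, of stacked 1D convolutions.

    Each layer is given as (kernel_size, stride, dilation).
    """
    rf = 1
    jump = 1  # spacing, in input grid points, between neighbouring outputs so far
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical configurations for comparison.
shallow = receptive_field([(3, 1, 1)] * 3)                    # three 3-point convolutions
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])  # same depth, growing dilation
strided = receptive_field([(3, 2, 1), (3, 1, 1)])             # stride widens later layers
```

The comparison between `shallow` and `dilated` shows that two networks with the same number of parameters can see very different spans of the input column, which is the kind of difference the abstract links to stability.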

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 2重ポテンシャルにおけるボース・アインシュタイン凝縮体のダイナミクスに対するレゲット・ガーグの不等式の振動

Violation of the Leggett-Garg Inequality for Dynamics of a Bose-Einstein Condensate in a Double-Well Potential ( http://arxiv.org/abs/2407.05304v2 )

ライセンス: Link先を確認

Tsubasa Sakamoto, Ryosuke Yoshii, Shunji Tsuchiya,

(参考訳) Leggett-Garg不等式 (LGI) は、Leggett と Garg が仮定したように、マクロ的システム力学のマクロ現実主義への密着性を決定する基準として機能する。この不等式に違反することは、システムの現実的な記述がないか、非侵襲的な測定の不現実性を意味する。本研究では,2重井戸電位におけるボソン系のLGI違反について検討する。具体的には, ボース・アインシュタイン・凝縮系(BEC)の二重井戸ポテンシャルにおけるボソンの力学におけるLGIの違反について検討する。分析の結果,LGIはヨーゼフソン振動により不規則であることが明らかとなった。特に、粒子数が増加するにつれて、LGIの違反がますます顕著になるのを観察する。これらの結果は、ボースの凝縮体のマクロ現実的挙動に関する貴重な洞察を与え、測定がマクロ系の力学に与える影響を強調している。

The Leggett-Garg inequality (LGI) serves as a criterion to determine the adherence of macroscopic system dynamics to macrorealism, as postulated by Leggett and Garg. A violation of this inequality implies either the absence of a realistic description of the system or the impracticality of noninvasive measurements. In this work, we investigate the violation of the LGI for the system of bosons in a double-well potential. Specifically, we explore the violation of the LGI in the dynamics of bosons in a double-well potential in the Bose-Einstein-Condensation (BEC) regime, where the system can be considered as two weakly coupled Bose condensates, and in the single-particle regime to establish the conditions under which the violation of the LGI occurs. Our analysis reveals that the LGI is violated due to Josephson oscillations, while it remains unviolated in the strong coupling regime, attributed to the self-trapping phenomena. Notably, we observe that the violation of the LGI becomes increasingly significant as the particle number increases. These findings provide valuable insights into the macrorealistic behavior of Bose condensates and highlight the effect of measurements on the dynamics of a macroscopic system.
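
For orientation, the simplest three-time form of the inequality is shown below; the abstract does not state which form the paper uses, so this is an assumption for illustration. Here $Q(t) = \pm 1$ is a dichotomic observable measured at times $t_1 < t_2 < t_3$ and $C_{ij} = \langle Q(t_i) Q(t_j) \rangle$.

$$
K = C_{12} + C_{23} - C_{13} \le 1 \quad \text{(macrorealism with noninvasive measurability)}
$$

For a two-level system oscillating coherently at frequency $\omega$ with equal time spacing $\tau$, the correlators are $C_{12} = C_{23} = \cos(\omega\tau)$ and $C_{13} = \cos(2\omega\tau)$, so

$$
K = 2\cos(\omega\tau) - \cos(2\omega\tau), \qquad K_{\max} = \tfrac{3}{2} \ \text{at} \ \omega\tau = \tfrac{\pi}{3},
$$

which exceeds the macrorealist bound of 1. Coherent oscillations between two states are therefore the textbook source of LGI violation.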

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# CPM:音声視覚分割のためのクラス条件プロンプティングマシン

CPM: Class-conditional Prompting Machine for Audio-visual Segmentation ( http://arxiv.org/abs/2407.05358v2 )

ライセンス: Link先を確認

Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro,

(参考訳) オーディオ・ビジュアル・セグメンテーション (AVS) は、オーディオ・ビジュアル・キューに基づいた音質オブジェクトを正確にセグメンテーションすることを目的とした新しいタスクである。 AVS学習システムの成功は、モーダル間相互作用の有効性に依存する。このような要求は、トランスフォーマーベースのセグメンテーションアーキテクチャを活用することで自然に達成できる。しかし,AVSでは,特に学習された音声クエリが明確な意味的手がかりを提供していない場合,クロスアテンションの有効性の低下や不安定なバイパーティイトマッチングなどのトランスフォーマーベースの手法の固有のトレーニング問題を増幅することができる。本稿では,これら2つの問題を,CPM(Class-conditional Prompting Machine)を用いて解決する。 CPMは、クラスに依存しないクエリとクラス条件のクエリを組み合わせた学習戦略により、バイパーティイトマッチングを改善している。クロスモーダルアテンションの有効性は,音声・視覚・関節モダリティの新しい学習目標によって向上する。我々はAVSベンチマーク実験を行い、その手法がSOTA(State-of-the-art)セグメンテーションの精度を実現することを示す。

Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging transformer-based segmentation architecture due to its inherent ability to capture long-range dependencies and flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy combining class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is upgraded with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# OneDiff:画像差分キャプションのためのジェネリストモデル

OneDiff: A Generalist Model for Image Difference Captioning ( http://arxiv.org/abs/2407.05645v2 )

ライセンス: Link先を確認

Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu,

(参考訳) コンピュータビジョンにおいて、画像差分キャプション(IDC)は、近縁な画像間の変化を正確に記述するために重要である。従来のIDCの手法は、様々な文脈における適用性を制限する専門的なモデルに依存していることが多い。本稿では,シマウマ画像エンコーダをビジュアルデルタモジュールに統合し,ロバストな視覚言語モデルアーキテクチャを利用する新しいジェネラリスト手法であるOneDiffモデルを紹介する。この革新的な構成により、画像ペア間の微細な違いを正確に検出し、明瞭にすることができる。 OneDiffは、結合サンプルトレーニングとマルチタスク学習を、新たに開発したDiffCap Datasetによってサポートされたさまざまなデータタイプにわたって含む、二重フェーズ戦略を通じてトレーニングされている。このデータセットは実世界のデータと合成データをマージし、トレーニングプロセスを強化し、モデルの堅牢性を強化します。 Spot-the-Diff、CLEVR-Change、Birds-to-Wordsといった多様なIDCベンチマークの広範なテストは、OneDiffが既存の最先端モデルを精度と適応性で一貫して上回り、平均85%のCIDErポイントの改善を実現していることを示している。 IDCに新しいベンチマークを設定することで、OneDiffは視覚的差異の検出と記述において、より汎用的で効果的なアプリケーションを実現することができる。コード、モデル、データは公開されます。

In computer vision, Image Difference Captioning (IDC) is crucial for accurately describing variations between closely related images. Traditional IDC methods often rely on specialist models, which restrict their applicability across varied contexts. This paper introduces the OneDiff model, a novel generalist approach that utilizes a robust vision-language model architecture, integrating a siamese image encoder with a Visual Delta Module. This innovative configuration allows for the precise detection and articulation of fine-grained differences between image pairs. OneDiff is trained through a dual-phase strategy, encompassing Coupled Sample Training and multi-task learning across a diverse array of data types, supported by our newly developed DiffCap Dataset. This dataset merges real-world and synthetic data, enhancing the training process and bolstering the model's robustness. Extensive testing on diverse IDC benchmarks, such as Spot-the-Diff, CLEVR-Change, and Birds-to-Words, shows that OneDiff consistently outperforms existing state-of-the-art models in accuracy and adaptability, achieving improvements of up to 85% CIDEr points on average. By setting a new benchmark in IDC, OneDiff paves the way for more versatile and effective applications in detecting and describing visual differences. The code, models, and data will be made publicly available.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# パラメータ化とオプティマイザ間のスケーリング指数

Scaling Exponents Across Parameterizations and Optimizers ( http://arxiv.org/abs/2407.05872v2 )

ライセンス: Link先を確認

Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington,

(参考訳) モデルの小幅から大幅までのロバストで効果的なスケーリングには、パラメータ化やオプティマイザの選択など、多くのアルゴリズムやアーキテクチャの詳細を正確に調整する必要がある。本研究では,パラメータとデータのアライメントに関する先行研究における重要な仮定を調査し,より弱い仮定とより広い最適化条件の下での新たな理論的結果を導出することによる,パラメータ化に関する新たな視点を提案する。我々の広範な実証調査には、3つのオプティマイザと4つのパラメータ化、いくつかのアライメント仮定、12以上の学習率、最大26.8Bパラメータの14のモデルサイズの組み合わせで訓練された数万のモデルが含まれている。最高の学習率のスケーリング基準は、事前の作業の仮定から除外されることがよくあります。以上の結果から,最大更新パラメータ化(muP)だけでなく,すべてのパラメータ化がハイパーパラメータ転送を実現することが示唆された。最後に、パラメータ化の見過ごされた側面であるAdamのエプシロンパラメータが勾配下流を避けるために正しくスケールする必要があることを実証し、Epsilonハイパーパラメータを完全に排除するAdamの新しい数値安定なスケール不変バージョンAdam-atan2を提案する。

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parameters and data and derive new theoretical results under weaker assumptions and a broader set of optimizers. Our extensive empirical investigation includes tens of thousands of models trained with all combinations of three optimizers, four parameterizations, several alignment assumptions, more than a dozen learning rates, and fourteen model sizes up to 26.8B parameters. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work. Our results show that all parameterizations, not just maximal update parameterization (muP), can achieve hyperparameter transfer; moreover, our novel per-layer learning rate prescription for standard parameterization outperforms muP. Finally, we demonstrate that an overlooked aspect of parameterization, the epsilon parameter in Adam, must be scaled correctly to avoid gradient underflow and propose Adam-atan2, a new numerically stable, scale-invariant version of Adam that eliminates the epsilon hyperparameter entirely.
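
The abstract does not give the Adam-atan2 update rule, so the following scalar sketch is only a guess based on the name: it assumes the usual Adam division `m_hat / (sqrt(v_hat) + eps)` is replaced by `atan2(m_hat, sqrt(v_hat))`, and it omits any scaling constants the paper may use. The function and helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def adam_atan2_step(param: float, grad: float, m: float, v: float, step: int,
                    lr: float = 1e-3, beta1: float = 0.9,
                    beta2: float = 0.999) -> Tuple[float, float, float]:
    """One scalar update; assumed form, not the paper's exact rule."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** step)  # standard Adam bias correction
    v_hat = v / (1.0 - beta2 ** step)
    # atan2 is bounded and needs no epsilon, even when both arguments are zero.
    update = math.atan2(m_hat, math.sqrt(v_hat))
    return param - lr * update, m, v


def run(grads: Sequence[float], scale: float = 1.0) -> float:
    """Apply the update for a sequence of gradients, each multiplied by `scale`."""
    param, m, v = 1.0, 0.0, 0.0
    for step, g in enumerate(grads, start=1):
        param, m, v = adam_atan2_step(param, scale * g, m, v, step)
    return param


baseline = run([0.5, -1.0, 2.0])
tiny_grads = run([0.5, -1.0, 2.0], scale=1e-20)
```

Because `atan2(c * y, c * x)` equals `atan2(y, x)` for any positive `c`, rescaling every gradient leaves the trajectory unchanged in this sketch, whereas a fixed epsilon would dominate the denominator for very small gradients.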

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 表現の絡み合いの役割の解明--CLIPモデルにおける構成的一般化の考察

Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models ( http://arxiv.org/abs/2407.05897v2 )

ライセンス: Link先を確認

Reza Abbasi, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah,

(参考訳) CLIPモデルは、最近、OoD(Out of Distribution)の一般化機能を示す。しかし、CLIPモデルでは、既知の概念の未知の合成を理解するためのモデルの能力の重要な側面である構成外分布(C-OoD)の一般化は、比較的未解明である。私たちのゴールは、CLIPのC-OoDに寄与する要因を特定し、この問題に対処することです。 CLIPの合成理解に関するこれまでの研究は、テストサンプルがCLIPトレーニングデータに対して真に新しいものであることを保証できないことが多かった。この目的のために、我々は、CLIPモデルの複合トレーニングデータセットに遭遇する可能性が極めて低いオブジェクトの属性を含む、大規模で多様なデータセットを単一のオブジェクト設定で慎重に合成した。このデータセットは、C-OoD一般化の真正性評価を可能にする。各種CLIPモデルにおけるC-OoDの一般化について検討した。本稿では,CLIP表現のアンタングル化が,この文脈における重要な指標となることを提案する。合成データセットやその他の既存のデータセットを利用することで、テキストと画像表現の様々なアンタングルメント指標を評価する。本研究は,画像およびテキスト表現の歪み,特に構成要素に関して,CLIPモデルのアウト・オブ・ディストリビューション・セッティングにおける一般化に重要な役割を担っていることを明らかにした。この発見は、CLIPにおけるアウト・オブ・ディストリビューションの一般化を促進する有望な機会を示唆している。

CLIP models have recently been shown to exhibit Out of Distribution (OoD) generalization capabilities. However, Compositional Out of Distribution (C-OoD) generalization, which is a crucial aspect of a model's ability to understand unseen compositions of known concepts, is relatively unexplored for CLIP models. Our goal is to address this problem and identify the factors that contribute to C-OoD generalization in CLIP models. We note that previous studies of the compositional understanding of CLIP models frequently fail to ensure that test samples are genuinely novel relative to the CLIP training data. To this end, we carefully synthesized a large and diverse dataset in the single-object setting, comprising attributes for objects that are highly unlikely to be encountered in the combined training datasets of various CLIP models. This dataset enables an authentic evaluation of C-OoD generalization. Our observations reveal varying levels of C-OoD generalization across different CLIP models. We propose that the disentanglement of CLIP representations serves as a critical indicator in this context. By utilizing our synthesized datasets and other existing datasets, we assess various disentanglement metrics of text and image representations. Our study reveals that the disentanglement of image and text representations, particularly with respect to their compositional elements, plays a crucial role in improving the generalization of CLIP models in out-of-distribution settings. This finding suggests promising opportunities for advancing out-of-distribution generalization in CLIP models.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 量子作用素に対する局所同変表現の学習

Learning local equivariant representations for quantum operators ( http://arxiv.org/abs/2407.06053v3 )

ライセンス: Link先を確認

Zhanghao Zhouyin, Zixi Gan, Shishir Kumar Pandey, Linfeng Zhang, Qiangqiang Gu,

(参考訳) 密度汎関数理論(DFT)フレームワークにおけるハミルトン行列、重なり合い、密度行列などの量子作用素行列の予測は、材料特性を理解するために重要である。現在の手法は個々の演算子に焦点を合わせ、大規模システムの効率性とスケーラビリティに苦慮することが多い。本稿では、複数の量子演算子を予測するための新しい深層学習モデルSLEM(厳密な局所化同変メッセージパス)を紹介し、計算効率を劇的に向上させながら最先端の精度を実現する。 SLEMの重要な革新は、その厳密な局所性に基づく設計であり、物理対称性を維持しながら量子テンソルの局所的同変表現を構築することである。これにより、効果的な受容場を拡張することなく複雑な多体依存が可能となり、データ効率と転送性が向上する。革新的なSO(2)畳み込み法を用いて、SLEMは高次テンソル積の計算複雑性を低減し、従って基底集合に$f$と$g$の軌道を必要とするシステムを扱うことができる。 SLEMの能力は多種多様な2次元および3次元材料にまたがって実証し,限られた訓練データでも高い精度を達成できることを示した。 SLEMの設計は効率的な並列化を促進し、DFTシミュレーションをデバイスレベルのサイズを持つシステムに拡張し、大規模量子シミュレーションと高スループット材料発見の新たな可能性を開く。

Predicting quantum operator matrices such as Hamiltonian, overlap, and density matrices in the density functional theory (DFT) framework is crucial for understanding material properties. Current methods often focus on individual operators and struggle with efficiency and scalability for large systems. Here we introduce a novel deep learning model, SLEM (strictly localized equivariant message-passing) for predicting multiple quantum operators, that achieves state-of-the-art accuracy while dramatically improving computational efficiency. SLEM's key innovation is its strict locality-based design, constructing local, equivariant representations for quantum tensors while preserving physical symmetries. This enables complex many-body dependence without expanding the effective receptive field, leading to superior data efficiency and transferability. Using an innovative SO(2) convolution technique, SLEM reduces the computational complexity of high-order tensor products and is therefore capable of handling systems requiring the $f$ and $g$ orbitals in their basis sets. We demonstrate SLEM's capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data. SLEM's design facilitates efficient parallelization, potentially extending DFT simulations to systems with device-level sizes, opening new possibilities for large-scale quantum simulations and high-throughput materials discovery.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# PerlDiff:パースペクティブレイアウト拡散モデルを用いた制御可能なストリートビュー合成

PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models ( http://arxiv.org/abs/2407.06109v2 )

ライセンス: Link先を確認

Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu,

(参考訳) 制御可能な生成は3次元データのアノテートという課題に対処するための潜在的に不可欠なアプローチと考えられており、このような制御可能な生成の精度は、自律運転のデータ生産の文脈において特に不可欠である。既存の手法は、GLIGENやControlNetといったフレームワークを利用して、様々な生成情報を制御入力に統合することに集中し、制御可能な生成において優れた結果を生成する。しかし、そのようなアプローチは、本質的には、事前に定義されたネットワークアーキテクチャの学習能力に、生成性能を制限している。本稿では,3次元幾何学的情報を完全に活用したストリートビュー画像生成手法であるPerlDiff(Perspective-Layout Diffusion Models)を導入する。我々のPerlDiffは、ネットワーク学習プロセス内で正確なオブジェクトレベル制御でストリートビュー画像の生成をガイドするために、3次元の幾何学的事前情報を用いており、その結果、より堅牢で制御可能な出力が得られる。さらに、代替レイアウト制御法よりも優れた制御性を示す。 PerlDiffはNuScenesとKITTIデータセットの生成精度を著しく向上させる。私たちのコードとモデルはhttps://github.com/LabShuHangGU/PerlDiffで公開されています。

Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or ControlNet, to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of controlling information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in a more robust and controllable output. Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results show that PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our code and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.

翻訳日:2024-07-17 20:20:06 公開日:2024-07-16

# 自動運転における安全性の向上--エンド・ツー・エンドナビゲーションにおける潜在状態拡散モデルの統合

Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation ( http://arxiv.org/abs/2407.06317v3 )

ライセンス: Link先を確認

Detian Chu, Linyuan Bai, Jianuo Huang, Zhenlong Fang, Peng Zhang, Wei Kang,

(参考訳) 自動運転の進歩により、移動計画やナビゲーションにおける安全性の確保がますます重要になっている。しかし、ほとんどのエンドツーエンドの計画手法は安全性の欠如に悩まされている。本研究は、CMDP(Constrained Markov Decision Processs)として定式化された自動運転の制御最適化問題における安全性問題に対処する。複雑な高次元状態空間における制約を効果的に管理するために,条件付きバリュー・アット・リスクに基づくソフト・アクター・クリティカルを用いて,ポリシー最適化のための新しいモデルベースアプローチを提案する。本手法では, 安全探索を誘導する最悪のアクターを導入し, 予測不可能なシナリオにおいても, 安全要件の厳密な遵守を確保する。政策最適化は拡張ラグランジアン法を採用し、遅延拡散モデルを利用して将来の軌道を予測しシミュレーションする。この2つのアプローチは、環境を安全にナビゲートするだけでなく、環境の不確実性を考慮した流通モデルを統合することで、政策のパフォーマンスを向上する。シミュレーションと実環境の両方で実施した実証評価では,既存の手法よりも安全性,効率,意思決定能力が優れていた。

With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming increasingly important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based approach for policy optimization, utilizing a conditional Value-at-Risk based Soft Actor Critic to manage constraints in complex, high-dimensional state spaces effectively. Our method introduces a worst-case actor to guide safe exploration, ensuring rigorous adherence to safety requirements even in unpredictable scenarios. The policy optimization employs the Augmented Lagrangian method and leverages latent diffusion models to predict and simulate future trajectories. This dual approach not only aids in navigating environments safely but also refines the policy's performance by integrating distribution modeling to account for environmental uncertainties. Empirical evaluations conducted in both simulated and real environments demonstrate that our approach outperforms existing methods in terms of safety, efficiency, and decision-making capabilities.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
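
To make the conditional Value-at-Risk constraint in the abstract above more concrete, here is a small sketch of an empirical CVaR estimate and a plain Lagrange-multiplier update. It is illustrative only: the function names `cvar` and `update_multiplier`, the confidence level, and the step size are assumptions, and the simple dual-ascent step stands in for the Augmented Lagrangian scheme, whose details the abstract does not specify.

```python
import math


def cvar(costs, alpha=0.9):
    """Empirical CVaR: the mean of the worst (1 - alpha) fraction of sampled costs."""
    if not costs:
        raise ValueError("costs must be non-empty")
    ordered = sorted(costs, reverse=True)  # largest (worst) costs first
    # Number of tail samples; rounding guards against floating-point noise
    # such as (1 - 0.7) * 10 == 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k


def update_multiplier(multiplier, tail_cost, budget, step_size=0.1):
    """Dual ascent on the safety constraint CVaR(cost) <= budget.

    The multiplier grows while the constraint is violated and is clipped at zero.
    """
    return max(0.0, multiplier + step_size * (tail_cost - budget))
```

In a constrained policy-optimization loop, the multiplier would weight the safety cost in the actor's objective, so that a policy whose tail risk exceeds the budget is penalized more strongly over time.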

# あらゆるものを追跡する分解

Decomposition Betters Tracking Everything Everywhere ( http://arxiv.org/abs/2407.06531v2 )

ライセンス: Link先を確認

Rui Li, Dong Liu,

(参考訳) 動き推定に関する最近の研究は、ビデオ全体、好ましくは各ピクセルに対して一様に一貫した、最適化された動き表現を提唱している。均一な表現は、自然ビデオの複雑で多様な動きや外観を考慮しないため、これは難しい。この問題に対処し,DecoMotionという新しいテスト時間最適化手法を提案する。 DecoMotionはビデオコンテンツを静的シーンと動的オブジェクトに明示的に分解する。 DecoMotionは局所空間と標準空間の間の変換を別々に調整し、カメラの動きに対応する静的シーンに対するアフィン変換を容易にする。ダイナミックボリュームに対しては、DecoMotionは差別的かつ時間的に一貫した特徴を活用して、非厳密な変換を是正する。最終的に2巻は、動きと外観を完全に表現するために融合される。この分割・対数戦略は、閉塞や変形によるより堅牢な追跡につながり、一方、分解された外観を得る。我々はTAP-Vidベンチマークで評価を行う。その結果,提案手法は点追跡精度を高いマージンで向上させ,最先端の専用点追跡ソリューションと同等に動作することを示した。

Recent studies on motion estimation have advocated an optimized motion representation that is globally consistent across the entire video, preferably for every pixel. This is challenging as a uniform representation may not account for the complex and diverse motion and appearance of natural videos. We address this problem and propose a new test-time optimization method, named DecoMotion, for estimating per-pixel and long-range motion. DecoMotion explicitly decomposes video content into static scenes and dynamic objects, each of which is represented by a quasi-3D canonical volume. DecoMotion separately coordinates the transformations between local and canonical spaces, facilitating an affine transformation for the static scene that corresponds to camera motion. For the dynamic volume, DecoMotion leverages discriminative and temporally consistent features to rectify the non-rigid transformation. The two volumes are finally fused to fully represent motion and appearance. This divide-and-conquer strategy leads to more robust tracking through occlusions and deformations while also yielding decomposed appearances. We conduct evaluations on the TAP-Vid benchmark. The results demonstrate that our method boosts the point-tracking accuracy by a large margin and performs on par with some state-of-the-art dedicated point-tracking solutions.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-16

# QAOA上の多段階量子ウォークの利点

Advantages of multistage quantum walks over QAOA ( http://arxiv.org/abs/2407.06663v2 )

ライセンス: Link先を確認

Lasse Gerblich, Tamanna Dasanjh, Horatio Q. X. Wong, David Ross, Leonardo Novo, Nicholas Chancellor, Viv Kendon,

(参考訳) イジング・ハミルトニアンに符号化された最適化問題の解状態を見つける方法は、現在の研究の非常に活発な領域である。本研究では、量子近似最適化アルゴリズム(QAOA)とマルチステージ量子ウォーク(MSQW)を比較する。どちらも変分量子アルゴリズムとして使用することができ、制御パラメータは古典的に最適化される。公正な比較では、量子的資源と古典的資源の両方を評価する必要がある。あるいは、この作業で行ったようにパラメータをヒューリスティックに選択して、比較の簡単な設定を提供することもできます。数値的手法と解析的手法の両方を用いて,MSQWが等価資源を用いてQAOAより優れていることを示す。また,MSQWが古典的最適化を伴わずに,少数の段階やヒューリスティックパラメータに対しても良好に動作するようなランダムなスピングラス基底状態問題についても数値的に示す。

Methods to find the solution state for optimization problems encoded into Ising Hamiltonians are a very active area of current research. In this work we compare the quantum approximate optimization algorithm (QAOA) with multi-stage quantum walks (MSQW). Both can be used as variational quantum algorithms, where the control parameters are optimized classically. A fair comparison requires both quantum and classical resources to be assessed. Alternatively, parameters can be chosen heuristically, as we do in this work, providing a simpler setting for comparisons. Using both numerical and analytical methods, we obtain evidence that MSQW outperforms QAOA, using equivalent resources. We also show numerically for random spin glass ground state problems that MSQW performs well even for few stages and heuristic parameters, with no classical optimization.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-16

# 3DGS.zip:3次元ガウス散乱圧縮法に関する調査

3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods ( http://arxiv.org/abs/2407.09510v2 )

ライセンス: Link先を確認

Milena T. Bagdasarian, Paul Knoll, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern,

(参考訳) 本稿では,3次元ガウススプラッティング圧縮法について,様々なベンチマークにおける統計的性能に着目して検討する。本調査は,異なる圧縮手法の鍵となる統計データを表形式で要約することにより,比較可能性の向上を目的とする。評価されたデータセットには、TanksAndTemples、MipNeRF360、DeepBlending、SyntheticNeRFがある。各手法について,各著者が報告したPeak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), およびメガバイト(MB)単位のサイズを報告する。これは進行中のオープンなプロジェクトであり、GitHubのissueやプルリクエストとして、リサーチコミュニティからのコントリビューションを募集しています。詳細はhttp://w-m.github.io/3dgs-compression-survey/を参照してください。

We present a work-in-progress survey on 3D Gaussian Splatting compression methods, focusing on their statistical performance across various benchmarks. This survey aims to facilitate comparability by summarizing key statistics of different compression approaches in a tabulated format. The datasets evaluated include TanksAndTemples, MipNeRF360, DeepBlending, and SyntheticNeRF. For each method, we report the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and the resultant size in megabytes (MB), as provided by the respective authors. This is an ongoing, open project, and we invite contributions from the research community as GitHub issues or pull requests. Please visit http://w-m.github.io/3dgs-compression-survey/ for more information and a sortable version of the table.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-16
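
Of the metrics listed in the abstract above, PSNR is the simplest to state. The sketch below computes it from the standard definition, 10 * log10(MAX^2 / MSE). The function name `psnr`, the flat-list image representation, and the default peak value of 1.0 are assumptions made for illustration; the surveyed papers may compute it per channel or per image before averaging.

```python
import math


def psnr(reference, test, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    if not reference or len(reference) != len(test):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means a compressed scene renders closer to the reference, which is why compression surveys typically report it alongside the file size in MB.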

# AIとリスクの反復的エピストピー

AI and the Iterable Epistopics of Risk ( http://arxiv.org/abs/2407.10236v2 )

ライセンス: Link先を確認

Andy Crabtree, Glenn McGarry, Lachlan Urquhart,

(参考訳) AIが社会に提示するリスクは、一般的な計算、すなわち、AI開発に関わる人々がAI影響評価、倫理的枠組み、新興国際標準、規制などのリスクを認識・管理できるように設計された一般的なフレームワークを通じて、広く理解されている。本稿では、規制当局、開発者、サイバーセキュリティの専門家によるリスクの把握と管理について詳述する。リスクとリスクマネジメントは、一般的な計算にカプセル化されていない日常的な場所にあるプラクティスに依存していることが明らかになった。 Situated practiceは反復可能なエピストピーを表面化し、AI開発に関わる人々がどのようにしてリスクを理解し、その後に反応し、仕事における大きな課題を明らかにするかを明らかにする。 AIにおけるリスクのエピストピーの継続的な発見と解明は、a) 学際的調査の潜在的なプログラムを提供し、b) AI開発者にリスクを把握する手段を提供し、c) 一般計算の現在進行中の進化に情報を与える。

The risks AI presents to society are broadly understood to be manageable through general calculus, i.e., general frameworks designed to enable those involved in the development of AI to apprehend and manage risk, such as AI impact assessments, ethical frameworks, emerging international standards, and regulations. This paper elaborates how risk is apprehended and managed by a regulator, a developer, and a cyber-security expert. It reveals that risk and risk management are dependent on mundane situated practices not encapsulated in general calculus. Situated practice surfaces iterable epistopics, revealing how those involved in the development of AI know and subsequently respond to risk and uncover major challenges in their work. The ongoing discovery and elaboration of epistopics of risk in AI a) furnishes a potential program of interdisciplinary inquiry, b) provides AI developers with a means of apprehending risk, and c) informs the ongoing evolution of general calculus.

翻訳日:2024-07-17 20:10:21 公開日:2024-07-16

# 逆問題における拡散モデルの近似後サンプリングのためのゼロショット適応

Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems ( http://arxiv.org/abs/2407.11288v1 )

ライセンス: Link先を確認

Yaşar Utku Alçalar, Mehmet Akçakaya,

(参考訳) 拡散モデルは、逆問題を解決するための強力な生成技術として登場した。画像の様々な逆問題で成功したにもかかわらず、これらのモデルは収束するために多くのステップを必要とし、推論時間が遅くなる。近年,低騒音レベルにおける時間経過の頻繁な反復を伴い,画像生成や収束速度が向上する,高度なノイズスケジュールを利用するための拡散モデルが流行している。しかしながら、これらのアイデアの拡散モデルによる逆問題への応用は、前方モデルの対数的項重みに対する経験的チューニングを使用する場合、これらのノイズスケジュールがうまく機能しないため、依然として困難である。これらの課題に対処するために、ゼロショット物理駆動深層学習への接続を利用するゼロショット近似後方サンプリング(ZAPS)を提案する。 ZAPSはサンプリングステップの数を修正し、物理学誘導損失関数によるゼロショットトレーニングを使用して、不規則な時間ステップ毎にログライクな重みを学習する。本稿では,最近提案した拡散後サンプリング法をベースラインとしてZAPSを適用した。さらに,学習可能な対角成分を用いた対角化手法を用いて,先行対数ヘシアンを近似し,計算効率を向上する。これらのパラメータは、所定の計算予算を持つ固定数のエポックに対して最適化される。ガウス, 運動遅延, 塗装, 超解像などの様々な雑音逆問題に対する結果から, ZAPSは推定時間を短縮し, 不規則な騒音スケジュールに対して頑健性を提供し, 再現性の向上を図っている。コードはhttps://github.com/ualcalar17/ZAPSで入手できる。

Diffusion models have emerged as powerful generative techniques for solving inverse problems. Despite their success in a variety of inverse problems in imaging, these models require many steps to converge, leading to slow inference time. Recently, there has been a trend in diffusion models for employing sophisticated noise schedules that involve more frequent iterations of timesteps at lower noise levels, thereby improving image generation and convergence speed. However, application of these ideas for solving inverse problems with diffusion models remains challenging, as these noise schedules do not perform well when using empirical tuning for the forward model log-likelihood term weights. To tackle these challenges, we propose zero-shot approximate posterior sampling (ZAPS) that leverages connections to zero-shot physics-driven deep learning. ZAPS fixes the number of sampling steps, and uses zero-shot training with a physics-guided loss function to learn log-likelihood weights at each irregular timestep. We apply ZAPS to the recently proposed diffusion posterior sampling method as a baseline, though ZAPS can also be used with other posterior sampling diffusion models. We further approximate the Hessian of the logarithm of the prior using a diagonalization approach with learnable diagonal entries for computational efficiency. These parameters are optimized over a fixed number of epochs with a given computational budget. Our results for various noisy inverse problems, including Gaussian and motion deblurring, inpainting, and super-resolution show that ZAPS reduces inference time, provides robustness to irregular noise schedules and improves reconstruction quality. Code is available at https://github.com/ualcalar17/ZAPS

翻訳日:2024-07-17 19:02:01 公開日:2024-07-16

# LoRA-PT:主テンソル特異値とベクトルを用いた海馬セグメンテーションのための低ランク適応UNETR

LoRA-PT: Low-Rank Adapting UNETR for Hippocampus Segmentation Using Principal Tensor Singular Values and Vectors ( http://arxiv.org/abs/2407.11292v1 )

ライセンス: Link先を確認

Guanghua He, Wangang Cheng, Hancan Zhu, Gaohang Yu,

(参考訳) 海馬は様々な精神疾患に関連する重要な脳構造であり、その自動的かつ正確なセグメンテーションはこれらの疾患の研究に不可欠である。近年,深層学習に基づく手法は海馬セグメンテーションにおいて大きな進歩を遂げている。しかし、深層ニューラルネットワークモデルのトレーニングには、大量のラベル付きトレーニングデータだけでなく、かなりの計算資源と時間が必要です。そこで本研究では,LoRA-PTと呼ばれるパラメータ効率の高いファインチューニング手法を提案する。この方法は、BraTS2021データセット上の事前訓練されたUNETRモデルを、海馬セグメンテーションタスクに転送する。特に、LoRA-PT法は変圧器構造のパラメータ行列を3つのサイズに分類し、3つの3次元テンソルを形成する。テンソル特異値分解により、これらのテンソルは分解され、主特異値と特異ベクトルを持つ低ランクテンソルを生成し、残りの特異値とベクトルは残留テンソルを形成する。 LoRA法と同様に、パラメータの微調整の間は、残留テンソルを一定に保ちながら、主テンソル特異値とベクトルの低ランクテンソルのみを更新する。提案手法を3つの公開海馬データセットで検証した。実験結果から,LoRA-PTは,パラメータ更新回数を大幅に削減しつつ,既存のパラメータ効率変換学習手法よりもセグメンテーション精度が高いことがわかった。

The hippocampus is a crucial brain structure associated with various psychiatric disorders, and its automatic and precise segmentation is essential for studying these diseases. In recent years, deep learning-based methods have made significant progress in hippocampus segmentation. However, training deep neural network models requires substantial computational resources and time, as well as a large amount of labeled training data, which is often difficult to obtain in medical image segmentation. To address this issue, we propose a new parameter-efficient fine-tuning method called LoRA-PT. This method transfers a UNETR model pre-trained on the BraTS2021 dataset to the hippocampus segmentation task. Specifically, the LoRA-PT method categorizes the parameter matrices of the transformer structure into three sizes, forming three 3D tensors. Through tensor singular value decomposition, these tensors are decomposed to generate low-rank tensors with the principal singular values and singular vectors, while the remaining singular values and vectors form the residual tensor. Similar to the LoRA method, during parameter fine-tuning, we only update the low-rank tensors, i.e., the principal tensor singular values and vectors, while keeping the residual tensor unchanged. We validated the proposed method on three public hippocampus datasets. Experimental results show that LoRA-PT outperforms existing parameter-efficient transfer learning methods in segmentation accuracy while significantly reducing the number of parameter updates.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
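
The principal/residual split described in the abstract above can be illustrated on an ordinary matrix. The sketch below is a deliberately simplified analogue and not the LoRA-PT method itself: it uses a rank-1 matrix decomposition found by power iteration instead of a tensor singular value decomposition, and the names `top_singular_triplet` and `split_principal_residual` are hypothetical. It assumes a non-zero matrix whose leading singular direction is not orthogonal to the uniform starting vector.

```python
import math


def top_singular_triplet(w, iters=100):
    """Leading singular value and vectors of matrix w via power iteration on W^T W."""
    rows, cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(cols)] * cols  # uniform, unit-norm starting vector
    for _ in range(iters):
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


def split_principal_residual(w):
    """Split w into a rank-1 principal part (trainable) and a residual (frozen)."""
    sigma, u, v = top_singular_triplet(w)
    rows, cols = len(w), len(w[0])
    principal = [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]
    residual = [[w[i][j] - principal[i][j] for j in range(cols)] for i in range(rows)]
    return principal, residual
```

In a fine-tuning setting along these lines, only the factors behind `principal` (the singular value and its two vectors) would receive gradient updates, while `residual` stays fixed, which is what keeps the number of updated parameters small.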

# COHO: 環境に敏感な都市規模の階層的都市レイアウト生成

COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation ( http://arxiv.org/abs/2407.11294v1 )

ライセンス: Link先を確認

Liu He, Daniel Aliaga,

(参考訳) 大規模な都市レイアウトの生成は、様々な分野において大きな関心を集めてきた。従来の手法では、手動のルールコーディングや豊富なデータを必要とするディープラーニングを必要とする手続き生成が利用されていた。しかし、従来の手法では、都市レイアウト生成の文脈に敏感な性質は考慮されていない。提案手法は, 都市全体の標準グラフ表現を活用することで, 拡張性を高め, 都市レイアウトに固有の多層セマンティクスを捉えることで, このギャップに対処する。都市規模の都市レイアウト生成のための新しいグラフベースのマスク付きオートエンコーダ(GMAE)を提案する。この手法は、属性付き建物、都市ブロック、コミュニティ、都市を統一的なグラフ構造に符号化し、グラフオートエンコーダのための自己教師付きマスクトレーニングを可能にする。さらに,重要な都市ブロックや建物の発生を優先し,2.5次元レイアウト生成のためのスケジュールされた反復サンプリングも実施している。提案手法は,米国330都市における異質な都市スタイルにおける良好な現実性,意味的整合性,正当性を実現する。コードとデータセットはhttps://github.com/Arking1995/COHOで公開されている。

The generation of large-scale urban layouts has garnered substantial interest across various disciplines. Prior methods have utilized procedural generation requiring manual rule coding or deep learning needing abundant data. However, prior approaches have not considered the context-sensitive nature of urban layout generation. Our approach addresses this gap by leveraging a canonical graph representation for the entire city, which facilitates scalability and captures the multi-layer semantics inherent in urban layouts. We introduce a novel graph-based masked autoencoder (GMAE) for city-scale urban layout generation. The method encodes attributed buildings, city blocks, communities and cities into a unified graph structure, enabling self-supervised masked training for graph autoencoder. Additionally, we employ scheduled iterative sampling for 2.5D layout generation, prioritizing the generation of important city blocks and buildings. Our approach achieves good realism, semantic consistency, and correctness across the heterogeneous urban styles in 330 US cities. Codes and datasets are released at https://github.com/Arking1995/COHO.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16

# 格子厚みモデルにおける動的量子相転移と熱平衡

Dynamical Quantum Phase Transition and Thermal Equilibrium in the Lattice Thirring Model ( http://arxiv.org/abs/2407.11295v1 )

ライセンス: Link先を確認

Mari Carmen Bañuls, Krzysztof Cichy, Hao-Ti Hung, Ying-Jer Kao, C. -J. David Lin, Amit Singh,

(参考訳) テンソルネットワーク法を用いて、臨界相と質量相の平衡から切り出された格子チリングモデルのリアルタイム進化をシミュレートし、ロシミト速度の非解析性として動的量子相転移の出現を研究する。モデルにおける動的量子相転移の存在は、平衡相図の臨界線を0温度で横断するクエンチとは一致しないが、動的量子相転移が起こるために必要な初期状態のエネルギー密度の閾値を同定する。さらに、ギャップ付きクエンチハミルトニアンの場合、このしきい値と有限温度位相図内の異なる領域間の遷移との接続を明らかにする。

Using tensor network methods, we simulate the real-time evolution of the lattice Thirring model quenched out of equilibrium in both the critical and massive phases, and study the appearance of dynamical quantum phase transitions, as non-analyticities in the Loschmidt rate. Whereas the presence of a dynamical quantum phase transition in the model does not correspond to quenches across the critical line of the equilibrium phase diagram at zero temperature, we identify a threshold in the energy density of the initial state, necessary for a dynamical quantum phase transition to be present. Moreover, in the case of the gapped quench Hamiltonian, we unveil a connection of this threshold to a transition between different regions in the finite temperature phase diagram.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
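
The Loschmidt rate named in the abstract above is usually defined as minus the logarithm of the return probability per site, lambda(t) = -(1/N) ln |<psi_0|psi(t)>|^2. The sketch below evaluates that definition for a given overlap amplitude. The function name `loschmidt_rate` and the toy overlap used in the usage note are assumptions for illustration; obtaining the overlap for the lattice Thirring model requires the tensor network simulation the authors describe.

```python
import math


def loschmidt_rate(overlap, system_size):
    """Rate function -(1/N) * ln|<psi_0|psi(t)>|^2 for a (possibly complex) overlap."""
    echo = abs(overlap) ** 2  # Loschmidt echo, i.e. the return probability
    if echo == 0.0:
        return math.inf  # the rate diverges when the overlap vanishes exactly
    return -math.log(echo) / system_size
```

As a toy usage example, a single spin prepared along z and evolved under a transverse field has overlap cos(t) in suitable units, so `loschmidt_rate(math.cos(t), 1)` grows without bound as t approaches pi/2. In a many-body system the corresponding non-analyticities in the rate are what signal a dynamical quantum phase transition.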

# FR-SLAM:床計画登録に基づくSLAM改善手法

FR-SLAM: A SLAM Improvement Method Based on Floor Plan Registration ( http://arxiv.org/abs/2407.11299v1 )

ライセンス: Link先を確認

Jiantao Feng, Xinde Li, HyunCheol Park, Juan Liu, Zhentong Zhang,

(参考訳) SLAM技術は,移動ロボットの屋内自律走行における重要な技術として,環境マップの構築と位置決めを可能にする。従来のSLAM法は、完全な地図を得るためには、屋内ナビゲーション中にすべての部屋を徹底的に横断する必要があるため、長い経路計画時間と目標地点に到達するのに長い時間がかかる。さらに,動作中の累積誤差がロボットの局所化に寄与し,ナビゲーション効率に影響を及ぼすとともに,フロアプラン登録に基づく改良されたSLAM法であるFR-SLAMを提案し,フロアプランの整列と変換にモルフォロジーに基づくフロアプラン登録アルゴリズムを用いた。このアプローチにより、包括的なモーションマップの迅速な取得と効率的な経路計画が実現され、より短い時間枠内での迅速なナビゲーションが可能となる。登録とロボット動作のローカライゼーションの精度を高めるために、現在位置の建物構造を地図と比較し、正確なローカライゼーションのためのフロアプラン登録結果を動的に更新するリアルタイム更新戦略を採用する。実データとシミュレーションデータの比較実験により, 他のベンチマークアルゴリズムと比較して, フロアプランの登録精度が向上し, 目標位置に到達するまでの所要時間が短縮された。

Simultaneous Localization and Mapping (SLAM) technology enables the construction of environmental maps and localization, serving as a key technique for indoor autonomous navigation of mobile robots. Traditional SLAM methods typically require exhaustive traversal of all rooms during indoor navigation to obtain a complete map, resulting in lengthy path planning times and prolonged time to reach target points. Moreover, cumulative errors during motion lead to inaccurate robot localization, impacting navigation efficiency. This paper proposes an improved SLAM method, FR-SLAM, based on floor plan registration, utilizing a morphology-based floor plan registration algorithm to align and transform original floor plans. This approach facilitates the rapid acquisition of comprehensive motion maps and efficient path planning, enabling swift navigation to target positions within a shorter timeframe. To enhance registration and robot motion localization accuracy, a real-time update strategy is employed, comparing the current position's building structure with the map and dynamically updating floor plan registration results for precise localization. Comparative tests conducted on real and simulated datasets demonstrate that, compared to other benchmark algorithms, this method achieves higher floor plan registration accuracy and shorter time consumption to reach target positions.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16

# 文脈認識における感情認識としての大規模視覚言語モデル

Large Vision-Language Models as Emotion Recognizers in Context Awareness ( http://arxiv.org/abs/2407.11300v1 )

ライセンス: Link先を確認

Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang,

(参考訳) 文脈対応感情認識(CAER)は、様々な文脈から感情を知覚する必要がある複雑で重要なタスクである。以前のアプローチは主に、イメージから感情的な手がかりを抽出する洗練されたアーキテクチャを設計することに焦点を当てていた。しかし、それらの知識は特定の訓練データセットに限定されており、アノテータの主観的な感情バイアスを反映する可能性がある。さらに、大量のラベル付きデータを取得することは、現実世界のアプリケーションではしばしば困難である。本稿では、3つのパラダイムからCAERタスクを強化するためにLVLM(Large Vision-Language Models)を活用する可能性について体系的に検討する。 1) 大規模モデルを下流タスクに転送する最も一般的な方法である2つのCAERデータセット上でLVLMを微調整する。 2) 限られたデータや全く見えないシナリオにおいて, LVLMの性能を評価するため, ゼロショットと少数ショットのパターンを設計する。この場合、LVLMのIn-Context Learning(ICL)機能を完全に活用するために、トレーニング不要のフレームワークが提案されている。具体的には、画像類似度に基づくランキングアルゴリズムを開発し、サンプルを検索し、次に命令、サンプルを検索し、テスト例を組み合わせてLVLMをフィードし、対応する感情判断を得る。 3) LVLMの豊富な知識基盤を活用するため, モデルの推論能力を高め, 解釈可能な結果を提供するために, フレームワークにChain-of-Thought(CoT)を組み込んだ。大規模な実験と分析により、LVLMは異なるパラダイムにわたるCAERタスクにおいて競争性能を達成することを示した。特に、数ショット設定での優れた性能は、広範囲のトレーニングを伴わずに特定のタスクを達成するためのLVLMの実現可能性を示している。

Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore, acquiring large amounts of labeled data is often challenging in real-world applications. In this paper, we systematically explore the potential of leveraging Large Vision-Language Models (LVLMs) to empower the CAER task from three paradigms: 1) We fine-tune LVLMs on two CAER datasets, which is the most common way to transfer large models to downstream tasks. 2) We design zero-shot and few-shot patterns to evaluate the performance of LVLMs in scenarios with limited data or even completely unseen. In this case, a training-free framework is proposed to fully exploit the In-Context Learning (ICL) capabilities of LVLMs. Specifically, we develop an image similarity-based ranking algorithm to retrieve examples; subsequently, the instructions, retrieved examples, and the test example are combined to feed LVLMs to obtain the corresponding sentiment judgment. 3) To leverage the rich knowledge base of LVLMs, we incorporate Chain-of-Thought (CoT) into our framework to enhance the model's reasoning ability and provide interpretable results. Extensive experiments and analyses demonstrate that LVLMs achieve competitive performance in the CAER task across different paradigms. Notably, the superior performance in few-shot settings indicates the feasibility of LVLMs for accomplishing specific tasks without extensive training.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
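
The training-free retrieval step described in the abstract above (rank stored examples by image similarity, then combine instruction, retrieved examples, and the test example) might look roughly like the following. This is a sketch under stated assumptions: images are represented by precomputed embedding vectors, cosine similarity is used as the similarity measure, and the names `cosine_similarity`, `retrieve_examples`, and `build_prompt` are hypothetical. The abstract does not say which embedding or similarity the authors use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_examples(query_embedding, pool, k=2):
    """Return the k (embedding, label) pairs most similar to the query, best first."""
    ranked = sorted(
        pool,
        key=lambda item: cosine_similarity(query_embedding, item[0]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(instruction, examples, test_item):
    """Order the prompt parts as: instruction, retrieved example labels, test item."""
    demo_lines = [f"Example emotion: {label}" for _, label in examples]
    return [instruction, *demo_lines, f"Test image: {test_item}"]
```

The retrieved pairs act as in-context demonstrations, so no model weights are updated; only the prompt changes from one test image to the next.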

# ゼーマンモデルによるロデオアルゴリズムの解法

Unraveling Rodeo Algorithm Through the Zeeman Model ( http://arxiv.org/abs/2407.11301v1 )

ライセンス: Link先を確認

Raphael Fortes Infante Gomes, Julio Cesar Siqueira Rocha, Wallon Anderson Tadaiesky Nogueira, Rodrigo Alves Dias,

(参考訳) 任意の初期状態を考慮したハミルトニアン一般に対する固有状態と固有値スペクトルを決定するために、ロデオアルゴリズムを解く。新たな方法論を提示することにより,固有状態に関する事前の知識を必要とせずに,元の手法を詳述し,すべての特性をどのように定義するかを示す。この目的のために、我々はPennylane と Qiskit のプラットフォームリソースを利用して、ハミルトニアンが1つのスピンと2つのスピンに対してゼーマンモデルによって記述されるシナリオを分析する。また本研究では,本質的なパラメータを調整し,データ分布に固有のゆらぎを低減し,アルゴリズムの性能向上のための戦略や手法についても紹介する。まず,Xanaduシミュレータ上の単一キュービットのダイナミクスを探索し,メソッド性能を最適化するパラメータを設定し,アルゴリズムを実行するための最善の戦略を選択する。そこで本研究では,両部システムの方法論を拡張し,縮退や絡み合いを考慮した場合のアルゴリズムの動作について検討する。最後に、IBM Q Experienceプログラムによって提供される実超伝導デバイス上で得られた結果と比較し、マルチキュービットシステムにおけるプロトコル効率を向上させる条件を確立する。

We unravel the Rodeo Algorithm to determine the eigenstates and eigenvalue spectrum of a general Hamiltonian considering arbitrary initial states. By presenting a novel methodology, we detail the original method and show how to define all properties without having prior knowledge regarding the eigenstates. To this end, we exploit the resources of the PennyLane and Qiskit platforms to analyze scenarios where the Hamiltonians are described by the Zeeman model for one and two spins. We also introduce strategies and techniques to improve the algorithm's performance by adjusting its intrinsic parameters and reducing the fluctuations inherent to data distribution. First, we explore the dynamics of a single qubit on Xanadu simulators to set the parameters that optimize the method's performance and select the best strategies to execute the algorithm. Next, we extend the methodology for bipartite systems to discuss how the algorithm works when degeneracy and entanglement are taken into account. Finally, we compare the predictions with the results obtained on a real superconducting device provided by the IBM Q Experience program, establishing the conditions to increase the protocol efficiency for multi-qubit systems.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
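
For readers unfamiliar with the Rodeo Algorithm, the quantity it scans is the probability that every ancilla measurement succeeds. As commonly stated for the original algorithm (the abstract above does not give the formula), an eigenstate with energy E_k contributes a factor cos^2((E_k - E) t_n / 2) for each cycle with evolution time t_n and target energy E. The sketch below evaluates this classically for a known spectrum. The function name and the example energies are assumptions for illustration only.

```python
import math


def rodeo_success_probability(energies, weights, target_energy, times):
    """Probability that all rodeo cycles succeed for a state with known spectral weights.

    energies: eigenvalues E_k of the Hamiltonian
    weights:  |c_k|^2 of the initial state in the eigenbasis (should sum to 1)
    times:    evolution time t_n used in each cycle
    """
    total = 0.0
    for energy, weight in zip(energies, weights):
        factor = 1.0
        for t in times:
            # Each cycle suppresses components whose energy differs from the target.
            factor *= math.cos((energy - target_energy) * t / 2.0) ** 2
        total += weight * factor
    return total
```

Scanning `target_energy` and looking for peaks in this probability is how the eigenvalues are located: components at the target energy always pass, while others are filtered out as cycles with different times accumulate.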

# PADRe:高能率視覚変換器のためのポリノミアルアテンション・ドロップイン・リプレース

PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer ( http://arxiv.org/abs/2407.11306v1 )

ライセンス: Link先を確認

Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli,

(参考訳) 本稿では,変圧器モデルにおける従来の自己注意機構を置き換えるために設計された,新規で統一的なフレームワークであるPADReを提案する。特に、Hyena、Mamba、SimA、Conv2Former、Castling-ViTといった最近の別の注意機構は、当社のPADReフレームワークの特定のインスタンスと見なすことができます。 PADReは多項式関数を利用し、近似理論から確立された結果を導き、精度を損なうことなく計算効率を向上する。 PADReの鍵となるコンポーネントは乗法的非線形性であり、Adamard製品のような単純でハードウェアフレンドリーな操作を用いて実装し、線形計算とメモリコストのみを発生させる。 PADReはさらに、Softmaxのような複雑な関数の使用を回避しているが、従来の自己アテンションと同等または優れた精度を維持している。多様なコンピュータビジョンタスクにおける自己注意の代替手段としてのPADReの有効性を評価する。これらのタスクには、画像分類、画像ベースの2Dオブジェクト検出、および3Dポイントクラウドオブジェクト検出が含まれる。実験結果から、PADReは従来の自己注意(サーバGPUやモバイルNPUでは11x〜43倍高速)よりもはるかに高速に動作し、トランスフォーマーモデルに自己注意を代用する場合も同様の精度を維持した。

We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models. Notably, several recent alternative attention mechanisms, including Hyena, Mamba, SimA, Conv2Former, and Castling-ViT, can be viewed as specific instances of our PADRe framework. PADRe leverages polynomial functions and draws upon established results from approximation theory, enhancing computational efficiency without compromising accuracy. PADRe's key components include multiplicative nonlinearities, which we implement using straightforward, hardware-friendly operations such as Hadamard products, incurring only linear computational and memory costs. PADRe further avoids the need for using complex functions such as Softmax, yet it maintains comparable or superior accuracy compared to traditional self-attention. We assess the effectiveness of PADRe as a drop-in replacement for self-attention across diverse computer vision tasks. These tasks include image classification, image-based 2D object detection, and 3D point cloud object detection. Empirical results demonstrate that PADRe runs significantly faster than the conventional self-attention (11x ~ 43x faster on server GPU and mobile NPU) while maintaining similar accuracy when substituting self-attention in the transformer models.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
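
The PADRe abstract describes multiplicative nonlinearities built from Hadamard products at linear cost. The sketch below is only an illustration of that general idea, not the authors' architecture: it mixes tokens with two fixed-width local kernels (an assumption made here so the cost stays linear in the number of tokens) and multiplies the two results element-wise, which yields a degree-2 polynomial of the input. The function names `local_token_mix`, `hadamard`, and `polynomial_mix` are hypothetical.

```python
def local_token_mix(x, weights):
    """Mix each token with its neighbours using a fixed-width kernel.

    x is a list of tokens, each a list of channel values.
    Cost is O(N * len(weights) * D), i.e. linear in the number of tokens N.
    """
    n = len(x)
    half = len(weights) // 2
    out = []
    for i in range(n):
        row = [0.0] * len(x[0])
        for k, w in enumerate(weights):
            j = i + k - half  # neighbour index; out-of-range neighbours are skipped
            if 0 <= j < n:
                for c, v in enumerate(x[j]):
                    row[c] += w * v
        out.append(row)
    return out


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped token arrays."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def polynomial_mix(x, kernel_a, kernel_b):
    """Degree-2 polynomial of the input: (A x) * (B x) + A x, with * element-wise.

    A and B are the local mixing operators defined by kernel_a and kernel_b.
    This is an illustrative stand-in, not the layer defined in the paper.
    """
    y1 = local_token_mix(x, kernel_a)
    y2 = local_token_mix(x, kernel_b)
    prod = hadamard(y1, y2)
    return [[p + l for p, l in zip(rp, rl)] for rp, rl in zip(prod, y1)]
```

No Softmax and no N-by-N attention matrix appears anywhere in this sketch; whether the real PADRe layers use local kernels, and how higher degrees are built, is not stated in the abstract.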

# デバイス間通信による分散IoTエッジ上のグローバル異常の検出

Detection of Global Anomalies on Distributed IoT Edges with Device-to-Device Communication ( http://arxiv.org/abs/2407.11308v1 )

ライセンス: Link先を確認

Hideya Ochiai, Riku Nishihata, Eisuke Tomiyama, Yuwei Sun, Hiroshi Esaki,

(参考訳) 異常検出は、異常事象によって引き起こされる外れ値を見つけるためのIoTアプリケーションにおいて重要な機能である。異常検出には、クラウドではなくエッジデバイスで実施すべき高周波データサンプリングが伴うことがある。本稿では,複数のIoTデバイスを1つのリモートサイトに設置し,デバイス間通信による観測から異常を共同検出する事例について考察する。そこで本研究では,無線アドホックフェデレートラーニングを用いて分散異常検知器を訓練する完全分散協調方式「WAFL-Autoencoder」を提案する。ローカルデバイスにとってだけでなく,対象領域のすべてのデバイスにとっても稀なサンプルであるGlobal Anomalyの概念を導入する。また,グローバル異常検出のための分散しきい値探索アルゴリズムを提案する。標準ベンチマークによる評価により、提案方式がデバイス全体で異常検知器を完全に訓練できることを確認した。また, 少数の例外を除き, デバイスが協調して, 低い偽陽性率と高い真陽性率を両立するGlobal Anomaly検出のしきい値を見つけられることも確認した。

Anomaly detection is an important function in IoT applications for finding outliers caused by abnormal events. Anomaly detection sometimes requires high-frequency data sampling, which should be carried out at Edge devices rather than in the Cloud. In this paper, we consider the case in which multiple IoT devices are installed at a single remote site and collaboratively detect anomalies from their observations using device-to-device communication. For this, we propose a fully distributed collaborative scheme for training distributed anomaly detectors with Wireless Ad Hoc Federated Learning, namely "WAFL-Autoencoder". We introduce the concept of a Global Anomaly: a sample that is rare not only to the local device but to all the devices in the target domain. We also propose a distributed threshold-finding algorithm for Global Anomaly detection. With our standard benchmark-based evaluation, we have confirmed that our scheme trained anomaly detectors perfectly across the devices. We have also confirmed that, with few exceptions, the devices collaboratively found thresholds for Global Anomaly detection with low false positive rates while achieving high true positive rates.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
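
The Global Anomaly definition in the abstract above (a sample rare to every device, not just the local one) can be illustrated with a small sketch. Everything here is an assumption made for illustration: each device is taken to score a sample with a reconstruction error, each device derives its own threshold as a quantile of errors on its local data, and a sample counts as a global anomaly only if it exceeds every device's threshold. The abstract does not describe the paper's actual threshold-finding algorithm, and the names `local_threshold` and `is_global_anomaly` are hypothetical.

```python
def local_threshold(errors, quantile=0.95):
    """Pick a device-local threshold as a quantile of its own reconstruction errors.

    The quantile rule is an assumption for this sketch, not the paper's algorithm.
    """
    ordered = sorted(errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_global_anomaly(sample_errors, thresholds):
    """Return True only if the sample looks rare to every device.

    sample_errors[i] is the reconstruction error of the sample under device i's
    detector; thresholds[i] is that device's own threshold.
    """
    return all(e > t for e, t in zip(sample_errors, thresholds))
```

A sample that only one device finds unusual (for example errors `[0.9, 0.2]` against thresholds `[0.5, 0.5]`) is a local anomaly under this rule, not a global one.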

# ガウススプラッティングLK

Gaussian Splatting LK ( http://arxiv.org/abs/2407.11309v1 )

ライセンス: Link先を確認

Liuyue Xie, Joel Julin, Koichiro Niinuma, Laszlo A. Jeni,

(参考訳) 2D画像からダイナミックな3Dシーンを再構築し、時間とともに多様なビューを生成することは、固有の複雑さと時間的ダイナミクスによって大きな課題となる。ニューラル暗黙的モデルと動的ガウススプラッティングの最近の進歩は有望であるが、特に高度にダイナミックなシーンの基礎となる幾何学を正確に捉える際に制限は持続している。いくつかのアプローチは、拡散モデルを通して強い意味的・幾何学的事前知識を組み込むことによってこの問題に対処する。しかし,我々は動的ガウススプラッティングフレームワークにおいて,ネイティブなワープフィールドを正則化する可能性を検討することで,異なる経路を探索する。本手法は, 正確なワープフィールドは連続した時空間運動を生成するはずであるという重要な直観に基づいている。ワープフィールドに運動制約を課すことは簡単ではないものの,フォワードワープフィールドネットワークに固有の知識を利用して解析的な速度場を導出し,それを時間積分してシーンフローを得ることで,ガウスの2次元運動と3次元位置の両方を効果的に拘束できることを示す。このLucas-Kanade型の解析的正則化により、最小限のカメラ動作下であっても、非常にダイナミックなシーンの再構成において優れた性能を達成し、既存の動的ガウススプラッティングフレームワークが達成できる範囲を広げる。

Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time presents a significant challenge due to the inherent complexity and temporal dynamics involved. While recent advancements in neural implicit models and dynamic Gaussian Splatting have shown promise, limitations persist, particularly in accurately capturing the underlying geometry of highly dynamic scenes. Some approaches address this by incorporating strong semantic and geometric priors through diffusion models. However, we explore a different avenue by investigating the potential of regularizing the native warp field within the dynamic Gaussian Splatting framework. Our method is grounded on the key intuition that an accurate warp field should produce continuous space-time motions. While enforcing the motion constraints on warp fields is non-trivial, we show that we can exploit knowledge innate to the forward warp field network to derive an analytical velocity field, then time integrate for scene flows to effectively constrain both the 2D motion and 3D positions of the Gaussians. This derived Lucas-Kanade style analytical regularization enables our method to achieve superior performance in reconstructing highly dynamic scenes, even under minimal camera movement, extending the boundaries of what existing dynamic Gaussian Splatting frameworks can achieve.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16

# ディジタルツイン車両エッジコンピューティングネットワーク:タスクオフロードとリソース割り当て

Digital Twin Vehicular Edge Computing Network: Task Offloading and Resource Allocation ( http://arxiv.org/abs/2407.11310v1 )

ライセンス: Link先を確認

Yu Xie, Qiong Wu, Pingyi Fan,

(参考訳) 車両のインターネット上の複数のアプリケーションに対する需要が高まっている。車両は複数の計算タスクをリアルタイムで実行する必要がある。しかし、車両自体の計算能力が不足しているため、車両エッジコンピューティング(VEC)サーバにタスクをオフロードし、コンピュータリソースをタスクに割り当てることは困難である。本稿では,マルチタスクディジタルツイン(DT)VECネットワークを構築した。 DTを用いて、各車両の複数のタスクに対するオフロード戦略とリソース割り当て戦略を1つのスロットで開発することにより、最適化問題を構築する。そこで本研究では,タスクオフロードとリソース割り当てに関するマルチエージェント強化学習手法を提案する。多数の実験により,本手法は他のベンチマークアルゴリズムと比較して有効であることが示された。

With the increasing demand for multiple applications on the internet of vehicles, vehicles are required to carry out multiple computing tasks in real time. However, due to the insufficient computing capability of vehicles themselves, offloading tasks to vehicular edge computing (VEC) servers and allocating computing resources to tasks becomes a challenge. In this paper, a multi-task digital twin (DT) VEC network is established. By using DT to develop offloading strategies and resource allocation strategies for multiple tasks of each vehicle in a single slot, an optimization problem is constructed. To solve it, we propose a multi-agent reinforcement learning method for task offloading and resource allocation. Numerous experiments demonstrate that our method is effective compared to other benchmark algorithms.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16

# COMET:数学問題生成のための「Cone of Experience」強化大規模マルチモーダルモデル

COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation ( http://arxiv.org/abs/2407.11315v1 )

ライセンス: Link先を確認

Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun,

(参考訳) 高品質な数学問題の自動生成は、多くの教育シナリオにおいて実用的な価値がある。大規模マルチモーダルモデルは、クロスモーダルデータシナリオで広く成功しているため、数学的問題生成のための新しい技術的アプローチを提供する。しかし、問題生成から問題解決を分離する従来の手法と、均質な学習目的を持つ単調なデータ構造による主流の微調整フレームワークは、数学的な問題生成における大規模マルチモーダルモデルの適用を制限している。これらの課題に対処するため,本論文では,数学的問題生成のための「Cone of Experience」強化大規模マルチモーダルモデルであるCOMETを提案する。まず、相互能力の促進と応用論理の観点から、問題文(stem)生成と問題解決を数学的問題生成に統合する。次に、"Cone of Experience"によってガイドされた3段階の微調整フレームワークを提案する。このフレームワークは、微調整データを記号的経験、映像的経験、直接的経験に分割し、教師のキャリア成長における経験になぞらえる。このフレームワークでは、いくつかのきめ細かいデータ構築および注入方法が設計されている。最後に、この分野における中国語のマルチモーダルデータの空白を埋めるために、中国語のマルチモーダル数学問題データセットを構築した。客観的および主観的な指標を組み合わせ、複数のデータセットでの実験により、提案したフレームワークとモデルの有効性を十分に検証する。

The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal models provide a novel technical approach for mathematical problem generation because of their wide success in cross-modal data scenarios. However, the traditional method of separating problem solving from problem generation and the mainstream fine-tuning framework of monotonous data structure with homogeneous training objectives limit the application of large multimodal models in mathematical problem generation. Addressing these challenges, this paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. Firstly, from the perspective of mutual ability promotion and application logic, we unify stem generation and problem solving into mathematical problem generation. Secondly, a three-stage fine-tuning framework guided by the "Cone of Experience" is proposed. The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience to draw parallels with experiences in the career growth of teachers. Several fine-grained data construction and injection methods are designed in this framework. Finally, we construct a Chinese multimodal mathematical problem dataset to fill the vacancy of Chinese multimodal data in this field. Combined with objective and subjective indicators, experiments on multiple datasets fully verify the effectiveness of the proposed framework and model.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16

# BUSClean:医療用AIのための乳房超音波画像前処理と知識抽出のためのオープンソースソフトウェア

BUSClean: Open-source software for breast ultrasound image pre-processing and knowledge extraction for medical AI ( http://arxiv.org/abs/2407.11316v1 )

ライセンス: Link先を確認

Arianna Bunnell, Kailee Hung, John A. Shepherd, Peter Sadowski,

(参考訳) 医療画像のための人工知能(AI)の開発は、数十万の画像からなる大規模な臨床データセットのキュレーションとクリーニングを要求する。マンモグラフィーのようないくつかのモダリティは、高度に標準化されたイメージングを含んでいる。対照的に、乳房超音波画像(BUS)は、拡張スキャンモード、超音波検査技師のアノテーション、追加のビューなど、スキャンメタデータによって示されない多くの不規則性を含むことがある。臨床BUSデータセットを自動処理するオープンソースソフトウェアソリューションを提案する。このアルゴリズムは、BUSスキャンのフィルタリング、クリーニング、および超音波検査技師のアノテーションからの知識抽出を行う。モジュラーデザインにより、ユーザーは新しい設定に適応させることができる。 430枚の臨床BUS画像からなる内部試験データセットでの実験では、あらゆる種類のテキストアノテーションの検出において95%超の感度と98%超の特異度を達成し、血流ハイライト、代替スキャンモード、または無効スキャンを含むスキャンの検出において98%超の感度と特異度を達成した。完全に外部の公開BUSスキャンデータセットを用いたケーススタディでは、BUSCleanはテキストアノテーションと血流ハイライトを含むスキャンを、それぞれ88.6%と90.9%の感度、98.3%と99.9%の特異度で識別した。ケーススタディに特有のキャリパーの種類を考慮するように病変キャリパー検出法を適応させることで、新しいデータ分布におけるBUSCleanの意図された使い方を示し、病変キャリパー検出の感度と特異度を、そのままの状態での43.3%と93.3%から、それぞれ92.1%と92.3%に向上させた。ソースコード、サンプルノートブック、サンプルデータはhttps://github.com/hawaii-ai/bus-cleaningで公開されている。

Development of artificial intelligence (AI) for medical imaging demands curation and cleaning of large-scale clinical datasets comprising hundreds of thousands of images. Some modalities, such as mammography, contain highly standardized imaging. In contrast, breast ultrasound imaging (BUS) can contain many irregularities not indicated by scan metadata, such as enhanced scan modes, sonographer annotations, or additional views. We present an open-source software solution for automatically processing clinical BUS datasets. The algorithm performs BUS scan filtering, cleaning, and knowledge extraction from sonographer annotations. Its modular design enables users to adapt it to new settings. Experiments on an internal testing dataset of 430 clinical BUS images achieve >95% sensitivity and >98% specificity in detecting every type of text annotation, and >98% sensitivity and specificity in detecting scans with blood flow highlighting, alternative scan modes, or invalid scans. A case study on a completely external, public dataset of BUS scans found that BUSClean identified text annotations and scans with blood flow highlighting with 88.6% and 90.9% sensitivity and 98.3% and 99.9% specificity, respectively. Adaptation of the lesion caliper detection method to account for a type of caliper specific to the case study demonstrates the intended use of BUSClean in new data distributions and improved performance in lesion caliper detection from 43.3% and 93.3% out-of-the-box to 92.1% and 92.3% sensitivity and specificity, respectively. Source code, example notebooks, and sample data are available at https://github.com/hawaii-ai/bus-cleaning.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
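
The BUSClean results above are reported as sensitivity and specificity. For readers less familiar with the two metrics, the following minimal sketch shows the standard definitions computed from binary labels and predictions. The function name `sensitivity_specificity` and the toy inputs in the usage note are hypothetical and are not taken from the BUSClean code or data.

```python
def sensitivity_specificity(labels, predictions):
    """Compute (sensitivity, specificity) for binary labels and predictions.

    sensitivity = TP / (TP + FN): fraction of true positives that are detected.
    specificity = TN / (TN + FP): fraction of true negatives that are left alone.
    Assumes at least one positive and one negative label are present.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    return tp / (tp + fn), tn / (tn + fp)
```

For instance, with four scans that truly contain a text annotation, three of which are flagged, and four clean scans, one of which is wrongly flagged, the function returns a sensitivity and specificity of 0.75 each.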

# マヨラナ・クリフォード群の構造

The Structure of the Majorana Clifford Group ( http://arxiv.org/abs/2407.11319v1 )

ライセンス: Link先を確認

Valérie Bettaque, Brian Swingle,

(参考訳) 量子情報科学において、クリフォード作用素と安定化符号は量子ビット(またはqudit)系において中心的な役割を果たす。本稿では,マヨラナフェルミオン系における類似の対象について検討する。決定的な役割を果たすのはフェルミオンパリティ対称性であり,これは基本自由度がフェルミオン的である任意の系に存在する,破ることのできない対称性である。パリティ保存型フェルミオンクリフォードの部分群は二元体 $\mathbb{F}_2$ 上の直交群で表せることを証明し、それがブレイディング演算子によって生成できること、および任意の(偶パリティの)マヨラナ安定化符号の構成に利用できることを示す。また、このいわゆる p-クリフォード群に対するフレームポテンシャルを解析し、これがヒルベルト空間の固定パリティセクターに作用する通常のクリフォード群のフレームポテンシャルと同値であることを証明する。

In quantum information science, Clifford operators and stabilizer codes play a central role for systems of qubits (or qudits). In this paper, we study the analogous objects for systems of Majorana fermions. A crucial role is played by fermion parity symmetry, which is an unbreakable symmetry present in any system in which the fundamental degrees of freedom are fermionic. We prove that the subgroup of parity-preserving fermionic Cliffords can be represented by the orthogonal group over the binary field $\mathbb{F}_2$, and we show how it can be generated by braiding operators and used to construct any (even-parity) Majorana stabilizer code. We also analyze the frame potential for this so-called p-Clifford group, proving that it is equivalent to the frame potential of the ordinary Clifford group acting on a fixed-parity sector of the Hilbert space.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
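
The abstract above states that parity-preserving fermionic Cliffords can be represented by the orthogonal group over $\mathbb{F}_2$. As a small aid to intuition, the sketch below checks the defining condition $M M^{T} = I \pmod 2$ for binary matrices. The two example matrices are illustrative choices made here, not taken from the paper, and their reading as actions on Majorana operator labels is an assumption; the helper name `is_orthogonal_mod2` is hypothetical.

```python
def is_orthogonal_mod2(m):
    """Return True if the square 0/1 matrix m satisfies M * M^T = I over F_2."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(m[i][k] * m[j][k] for k in range(n)) % 2
            if dot != (1 if i == j else 0):
                return False
    return True


# A permutation matrix exchanging the first two labels (signs are ignored mod 2).
swap = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# A non-permutation example: every row has ones everywhere except the diagonal.
# Rows have odd weight 3 and any two distinct rows overlap in 2 positions.
all_but_self = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
```

Both `swap` and `all_but_self` pass the check, while a matrix such as `[[1, 1], [0, 1]]` fails because its first row has even weight.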

# A2E:ドライバーレスタクシーサービスへのアクセスのための属性ベースの匿名性強化認証

A2E: Attribute-based Anonymity-Enhanced Authentication for Accessing Driverless Taxi Service ( http://arxiv.org/abs/2407.11320v1 )

ライセンス: Link先を確認

Yanwei Gong, Xiaolin Chang, Jelena Mišić, Vojislav B. Mišić,

(参考訳) タクシーとしての無人車は、都市交通効率を高める可能性から注目を集めている。しかし、監督されていない利用者の無人タクシー(DT)乗車によって生じる予期せぬ事故と、DT乗車時の利用者の個別化されたニーズの両方が、ユーザのアイデンティティと属性の認証を必要としている。さらに、DTの普及を促進するために、ユーザIDのプライバシを保護しつつ、必要に応じて悪意のあるユーザを迅速に追跡することは、依然として課題である。本稿では,ユーザがDTサービスにアクセスするための新しいA2E(Attribute-based Anonymity Enhanced)認証方式を提案する。セキュリティ面では、A2Eは属性検証可能性を持ち、これは墨塗り署名(redactable signature)に基づいてユーザ属性クレデンシャルを設計することで達成される。また、この属性クレデンシャルはリンク不能性と偽造不能性も満たす。さらに、A2Eは、リング署名と秘密分散を利用した分散型クレデンシャル発行機構を設計し、ユーザ属性が匿名IDと関連付けられることを防ぐことで、匿名性を強化している。加えて、このメカニズムはユーザに対してトレーサビリティと非フレーム性(non-frameability)を提供する。性能面では、悪意のあるユーザの追跡や資格情報の更新の際のA2Eのオーバーヘッドは低い。また、スケーラビリティと軽量性の両方を満たしており、A2Eの実用性に寄与している。我々は,A2Eのセキュリティと性能について,セキュリティ分析と性能評価を行う。

Driverless vehicles used as taxis are gaining more attention due to their potential to enhance urban transportation efficiency. However, both unforeseen incidents arising from unsupervised physical users' driverless taxi (DT) rides and the personalized needs of users when riding in a DT necessitate the authentication of user identity and attributes. Moreover, safeguarding user identity privacy while quickly tracing malicious users when necessary, so as to enhance the adoption of DTs, remains a challenge. This paper proposes a novel Attribute-based Anonymity Enhanced (A2E) authentication scheme for users to access DT service. From the security aspect, A2E has attribute verifiability, which is achieved by designing a user attribute credential based on a redactable signature. Meanwhile, this attribute credential also satisfies unlinkability and unforgeability. In addition, A2E has enhanced anonymity, which is achieved by designing a decentralized credential issuance mechanism utilizing ring signatures and secret sharing, safeguarding user attributes from association with anonymous identities. Moreover, this mechanism provides traceability and non-frameability to users. From the performance aspect, A2E incurs low overhead when tracing malicious users and updating credentials. Besides, both scalability and lightweight operation are achieved, which contributes to A2E's practicability. We conduct a security analysis and a performance evaluation of A2E's security and performance capabilities.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16

# TCFormer:Token Clustering Transformerによる視覚認識

TCFormer: Visual Recognition via Token Clustering Transformer ( http://arxiv.org/abs/2407.11321v1 )

ライセンス: Link先を確認

Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang,

(参考訳) トランスフォーマーはコンピュータビジョン領域で広く使われており、大きな成功を収めている。ほとんどの最先端のアプローチでは、イメージを通常のグリッドに分割し、各グリッド領域を視覚トークンで表現する。しかし、固定されたトークン分布は、異なる画像領域の意味を無視し、結果として準最適な性能をもたらす。この問題に対処するために,意味に基づいて動的な視覚トークンを生成するToken Clustering Transformer (TCFormer)を提案する。動的トークンには2つの重要な特徴がある:(1)領域が隣接していない場合でも、類似の意味を持つ画像領域を同じ視覚トークンで表現し、(2)貴重な詳細を持つ領域に集中し、細かなトークンを用いてそれらを表現する。画像分類,人物ポーズ推定,セマンティックセグメンテーション,オブジェクト検出など,さまざまな応用にわたる広範な実験を通じて,TCFormerの有効性を実証する。この研究のコードとモデルはhttps://github.com/zengwang430521/TCFormerで公開されている。

Transformers are widely used in computer vision and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, a fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Transformer (TCFormer), which generates dynamic vision tokens based on semantic meaning. Our dynamic tokens possess two crucial characteristics: (1) representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and representing them using fine tokens. Through extensive experimentation across various applications, including image classification, human pose estimation, semantic segmentation, and object detection, we demonstrate the effectiveness of our TCFormer. The code and models for this work are available at https://github.com/zengwang430521/TCFormer.

翻訳日:2024-07-17 18:52:01 公開日:2024-07-16
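
The TCFormer abstract describes dynamic tokens in which regions with similar meaning share one token even when they are not adjacent. The sketch below illustrates only the merging step under simplifying assumptions: cluster assignments are taken as given, and tokens in a cluster are merged by plain averaging. How TCFormer actually clusters and merges tokens is not stated in the abstract, and the name `merge_tokens` is hypothetical.

```python
def merge_tokens(tokens, cluster_ids):
    """Average all tokens that share a cluster id into one dynamic token.

    tokens is a list of feature vectors; cluster_ids[i] is the cluster of tokens[i].
    Returns a dict mapping each cluster id to its merged feature vector.
    Plain averaging is an assumption made for this sketch.
    """
    sums, counts = {}, {}
    for feat, cid in zip(tokens, cluster_ids):
        acc = sums.setdefault(cid, [0.0] * len(feat))
        for c, v in enumerate(feat):
            acc[c] += v
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: [v / counts[cid] for v in acc] for cid, acc in sums.items()}
```

Because the grouping depends only on `cluster_ids`, the first and last token of a sequence can be merged into one token while the tokens between them stay separate, which is the non-adjacency property the abstract highlights.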

# VISA:大規模言語モデルによるビデオオブジェクトのセグメンテーションの推論

VISA: Reasoning Video Object Segmentation via Large Language Models ( http://arxiv.org/abs/2407.11325v1 )

ライセンス: Link先を確認

Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves,

(参考訳) 既存のビデオオブジェクトセグメンテーション(VOS)は、カテゴリ、マスク、短いフレーズなどの明示的なユーザー指示に依存しており、世界知識を用いた推論を必要とする複雑なビデオセグメンテーションを実行する能力が制限されている。本稿では,新しいタスクであるReasoning Video Object Segmentation(ReasonVOS)を紹介する。このタスクは、世界知識とビデオコンテキストに基づく複雑な推論能力を必要とする暗黙的なテキストクエリに応答して、セグメンテーションマスクのシーケンスを生成することを目的としており、これは構造化された環境理解とオブジェクト中心のインタラクションに不可欠で、エンボディドAIの開発において極めて重要である。 ReasonVOSに取り組むために,マルチモーダルLLMの世界知識推論能力を活用しつつ,マスクデコーダによってビデオ内のオブジェクトをセグメンテーションし追跡する能力を持つVISA(Video-based large language Instructed Segmentation Assistant)を導入する。さらに、1,042の多様なビデオから得た35,074の命令・マスクシーケンスのペアからなる総合ベンチマークを構築し、ReasonVOSモデルの命令チューニングと評価のために、複雑な世界知識推論をセグメンテーションタスクに組み込む。 8つのデータセットで行った実験は、ビデオ領域と画像領域の両方において、複雑な推論セグメンテーションと通常の参照セグメンテーションに取り組む上でのVISAの有効性を示す。コードとデータセットはhttps://github.com/cilinyan/VISAで公開されている。

Existing Video Object Segmentation (VOS) methods rely on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS). This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities based on world knowledge and video contexts, which is crucial for structured environment understanding and object-centric interactions, pivotal in the development of embodied AI. To tackle ReasonVOS, we introduce VISA (Video-based large language Instructed Segmentation Assistant), which leverages the world knowledge reasoning capabilities of multi-modal LLMs while possessing the ability to segment and track objects in videos with a mask decoder. Moreover, we establish a comprehensive benchmark consisting of 35,074 instruction-mask sequence pairs from 1,042 diverse videos, which incorporates complex world knowledge reasoning into segmentation tasks for instruction-tuning and evaluation purposes of ReasonVOS models. Experiments conducted on 8 datasets demonstrate the effectiveness of VISA in tackling complex reasoning segmentation and vanilla referring segmentation in both video and image domains. The code and dataset are available at https://github.com/cilinyan/VISA.