Fugu-MT: arXiv Paper Translations

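This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.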

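The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).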

These are papers with a publication date of 20240902.

Title	Authors	Abstract	Publication date / Translation date
# MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs ( http://arxiv.org/abs/2409.12926v1 ) License: see link	Zhixiang Cheng, Hongxin Xiang, Pengsen Ma, Li Zeng, Xin Jin, Xixi Yang, Jianxin Lin, Yang Deng, Bosheng Song, Xinxin Feng, Changhui Deng, Xiangxiang Zeng,	Activity cliffs, which refer to pairs of molecules that are structurally similar but show significant differences in their potency, can lead to model representation collapse and make it difficult for a model to distinguish them. Our research indicates that as molecular similarity increases, graph-based methods struggle to capture these nuances, whereas image-based approaches effectively retain the distinctions. Thus, we developed MaskMol, a knowledge-guided molecular image self-supervised learning framework. MaskMol accurately learns the representation of molecular images by considering multiple levels of molecular knowledge, such as atoms, bonds, and substructures. By utilizing pixel masking tasks, MaskMol extracts fine-grained information from molecular images, overcoming the limitations of existing deep learning models in identifying subtle structural changes. Experimental results demonstrate MaskMol's high accuracy and transferability in activity cliff estimation and compound potency prediction across 20 different macromolecular targets, outperforming 25 state-of-the-art deep learning and machine learning approaches. Visualization analyses reveal MaskMol's high biological interpretability in identifying activity cliff-relevant molecular substructures. Notably, through MaskMol, we identified candidate EP4 inhibitors that could be used to treat tumors. This study not only raises awareness about activity cliffs but also introduces a novel method for molecular image representation learning and virtual screening, advancing drug discovery and providing new insights into structure-activity relationships (SAR).	Translated: 2024-11-07 12:48:01 Published: 2024-09-02
# An Efficient General-Purpose Optical Accelerator for Neural Networks ( http://arxiv.org/abs/2409.12966v1 ) License: see link	Sijie Fei, Amro Eldebiky, Grace Li Zhang, Bing Li, Ulf Schlichtmann,	General-purpose optical accelerators (GOAs) have emerged as a promising platform to accelerate deep neural networks (DNNs) due to their low latency and energy consumption. Such an accelerator is usually composed of a given number of interleaving Mach-Zehnder interferometers (MZIs). This interleaving architecture, however, has a low efficiency when accelerating neural networks of various sizes due to the mismatch between weight matrices and the GOA architecture. In this work, a hybrid GOA architecture is proposed to enhance the mapping efficiency of neural networks onto the GOA. In this architecture, independent MZI modules are connected with microring resonators (MRRs), so that they can be combined to process large neural networks efficiently. Each of these modules implements a unitary matrix with inputs adjusted by tunable coefficients. The parameters of the proposed architecture are searched using a genetic algorithm. To enhance the accuracy of neural networks, selected weight matrices are expanded to multiple unitary matrices by applying singular value decomposition (SVD). The kernels in neural networks are also adjusted to use up the on-chip computational resources. Experimental results show that with a given number of MZIs, the mapping efficiency of neural networks on the proposed architecture can be enhanced by 21.87%, 21.20%, 24.69%, and 25.52% for VGG16 and Resnet18 on datasets Cifar10 and Cifar100, respectively. The energy consumption and computation latency can also be reduced by over 67% and 21%, respectively.	Translated: 2024-11-07 12:36:59 Published: 2024-09-02
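As a reading aid for the SVD step mentioned in the abstract above, the standard decomposition can be written as follows. This is the textbook identity only; the symbols $W$, $U$, $\Sigma$ and $V$ are our notation, and the abstract does not say how the diagonal factor is realized on the chip.

$$W = U \,\Sigma\, V^{\dagger}, \qquad U^{\dagger}U = I, \quad V^{\dagger}V = I, \quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r), \; \sigma_i \ge 0$$

Here $U$ and $V$ are unitary, which is the kind of matrix an MZI module is described as implementing, while $\Sigma$ only rescales each channel.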
# How Consistent Are Humans When Grading Programming Assignments? ( http://arxiv.org/abs/2409.12967v1 ) License: see link	Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi,	Providing consistent summative assessment to students is important, as the grades they are awarded affect their progression through university and future career prospects. While small cohorts are typically assessed by a single assessor, such as the class leader, larger cohorts are often assessed by multiple assessors, which increases the risk of inconsistent grading. To investigate the consistency of human grading of programming assignments, we asked 28 participants to each grade 40 CS1 introductory Java assignments, providing grades and feedback for correctness, code elegance, readability and documentation; the 40 assignments were split into two batches of 20. In the second batch of 20, we duplicated one assignment from the first to analyse the internal consistency of individual assessors. We measured the inter-rater reliability of the groups using Krippendorff's $\alpha$ (an $\alpha > 0.667$ is recommended to make tentative conclusions based on the rating). Our groups were inconsistent, with an average $\alpha = 0.2$ when grading correctness and an average $\alpha < 0.1$ for code elegance, readability and documentation. To measure the individual consistency of graders, we measured the distance between the grades they awarded for the duplicated assignment in batch one and batch two. Only one of the 22 participants who did not notice that the assignment was a duplicate awarded the same grade for correctness, code elegance, readability and documentation. The average grade difference was 1.79 for correctness and less than 1.6 for code elegance, readability and documentation. Our results show that human graders in our study cannot agree on the grade to give a piece of student work and are often individually inconsistent, suggesting that the idea of a "gold standard" of human grading might be flawed and highlighting that a shared rubric alone is not enough to ensure consistency.	Translated: 2024-11-07 12:36:59 Published: 2024-09-02
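For readers unfamiliar with the reliability measure named in the abstract above, the following is a minimal sketch of Krippendorff's $\alpha$ in its general form, $\alpha = 1 - D_o / D_e$ (observed over expected disagreement). It is not the authors' analysis code: the function names, the input layout (one list of grades per assignment) and the choice of distance function are assumptions made for illustration, and the abstract does not state which distance metric the study used.

```python
from itertools import permutations


def nominal_distance(a, b):
    """Distance for unordered categories: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def interval_distance(a, b):
    """Squared difference, the usual choice for interval-scaled grades."""
    return float((a - b) ** 2)


def krippendorff_alpha(units, distance=interval_distance):
    """Krippendorff's alpha for a list of units (hypothetical input layout).

    Each unit is the list of grades that different graders gave to the same
    assignment. Units with fewer than two grades cannot be paired and are
    skipped, which is how missing ratings are handled.
    """
    units = [list(u) for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable values")

    # Observed disagreement: ordered pairs within each unit,
    # each unit weighted by 1 / (m_u - 1).
    observed = sum(
        sum(distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: ordered pairs across all pairable values.
    expected = sum(
        distance(a, b) for a, b in permutations(values, 2)
    ) / (n * (n - 1))

    if expected == 0:
        raise ValueError("alpha is undefined when all values are identical")
    return 1.0 - observed / expected
```

With this sketch, graders who always agree give $\alpha = 1$, while systematic disagreement can push $\alpha$ below zero; the abstract's threshold of $0.667$ would be compared against the returned value.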
# MITHOS: Interactive Mixed Reality Training to Support Professional Socio-Emotional Interactions at Schools ( http://arxiv.org/abs/2409.12968v1 ) License: see link	Lara Chehayeb, Chirag Bhuvaneshwara, Manuel Anglet, Bernhard Hilpert, Ann-Kristin Meyer, Dimitra Tsovaltzi, Patrick Gebhard, Antje Biermann, Sinah Auchtor, Nils Lauinger, Julia Knopf, Andreas Kaiser, Fabian Kersting, Gregor Mehlmann, Florian Lingenfelser, Elisabeth André,	Teachers in challenging conflict situations often experience shame and self-blame, which relate to the feeling of incompetence but may externalise as anger. Sensing mixed signals fails the contingency rule for developing affect regulation and may result in confusion for students about their own emotions and hinder their emotion regulation. Therefore, being able to constructively regulate emotions not only benefits individual experience of emotions but also fosters effective interpersonal emotion regulation and influences how a situation is managed. MITHOS is a system aimed at training teachers' conflict resolution skills through realistic situative learning opportunities during classroom conflicts. In four stages, MITHOS supports teachers' socio-emotional self-awareness, perspective-taking and positive regard. It provides: a) a safe virtual environment to train free social interaction and receive natural social feedback from reciprocal student-agent reactions, b) spatial situational perspective taking through an avatar, c) individual virtual reflection guidance on emotional experiences through co-regulation processes, and d) expert feedback on professional behavioural strategies. This chapter presents the four stages and their implementation in a semi-automatic Wizard-of-Oz (WoZ) system. The WoZ system affords collecting data that are used for developing the fully automated hybrid (machine learning and model-based) system and for validating the underlying psychological and conflict resolution models. We present results validating the approach in terms of scenario realism, as well as a systematic testing of the effects of external avatar similarity on antecedents of self-awareness with behavior similarity. The chapter contributes to a common methodology of conducting interdisciplinary research for human-centered and generalisable XR and presents a system designed to support it.	Translated: 2024-11-07 12:36:59 Published: 2024-09-02
# Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models ( http://arxiv.org/abs/2409.12969v1 ) License: see link	Gracjan Góral, Alicja Ziarko, Michal Nauman, Maciej Wołczyk,	Visual perspective-taking (VPT), the ability to understand the viewpoint of another person, enables individuals to anticipate the actions of other people. For instance, a driver can avoid accidents by assessing what pedestrians see. Humans typically develop this skill in early childhood, but it remains unclear whether the recently emerging Vision Language Models (VLMs) possess such capability. Furthermore, as these models are increasingly deployed in the real world, understanding how they perform nuanced tasks like VPT becomes essential. In this paper, we introduce two manually curated datasets, Isle-Bricks and Isle-Dots, for testing VPT skills, and we use them to evaluate 12 commonly used VLMs. Across all models, we observe a significant performance drop when perspective-taking is required. Additionally, we find performance in object detection tasks is poorly correlated with performance on VPT tasks, suggesting that the existing benchmarks might not be sufficient to understand this problem. The code and the dataset will be available at https://sites.google.com/view/perspective-taking	Translated: 2024-11-07 12:36:59 Published: 2024-09-02
# TRACE: Transformer-based user Representations from Attributed Clickstream Event sequences ( http://arxiv.org/abs/2409.12972v1 ) License: see link	William Black, Alexander Manlove, Jack Pennington, Andrea Marchini, Ercument Ilhan, Vilda Markeviciute,	For users navigating travel e-commerce websites, the process of researching products and making a purchase often results in intricate browsing patterns that span numerous sessions over an extended period of time. The resulting clickstream data chronicle these user journeys and present valuable opportunities to derive insights that can significantly enhance personalized recommendations. We introduce TRACE, a novel transformer-based approach tailored to generate rich user embeddings from live multi-session clickstreams for real-time recommendation applications. Prior works largely focus on single-session product sequences, whereas TRACE leverages site-wide page view sequences spanning multiple user sessions to model long-term engagement. Employing a multi-task learning framework, TRACE captures comprehensive user preferences and intents distilled into low-dimensional representations. We demonstrate TRACE's superior performance over vanilla transformer and LLM-style architectures through extensive experiments on a large-scale travel e-commerce dataset of real user journeys, where the challenges of long page-histories and sparse targets are particularly prevalent. Visualizations of the learned embeddings reveal meaningful clusters corresponding to latent user states and behaviors, highlighting TRACE's potential to enhance recommendation systems by capturing nuanced user interactions and preferences.	Translated: 2024-11-07 12:36:59 Published: 2024-09-02
# Declarative Integration and Management of Large Language Models through Finite Automata: Application to Automation, Communication, and Ethics ( http://arxiv.org/abs/2409.13693v1 ) License: see link	Thierry Petit, Arnault Pachot, Claire Conan-Vrinat, Alexandre Dubarry,	This article introduces an innovative architecture designed to declaratively combine Large Language Models (LLMs) with shared histories and triggers to identify the most appropriate LLM for a given task. Our approach is general and declarative, relying on the construction of finite automata coupled with an event management system. The developed tool is crafted to facilitate the efficient and complex integration of LLMs with minimal programming effort, especially, but not only, for integrating methods of positive psychology into AI. The flexibility of our technique is demonstrated through applied examples in automation, communication, and ethics.	Translated: 2024-11-07 05:57:35 Published: 2024-09-02
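The general idea in the abstract above (a finite automaton plus event triggers choosing which LLM answers, with one shared history) can be sketched roughly as below. This is an illustration under our own assumptions, not the authors' tool: the class `LLMAutomaton`, the state and event names, and the three handler functions are all hypothetical, and the handlers are plain stubs that call no real model or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: receives the shared history and the
# new message, returns a reply. No real model is invoked anywhere here.
Handler = Callable[[List[str], str], str]


def general_model(history: List[str], message: str) -> str:
    return f"[general] {message}"


def code_model(history: List[str], message: str) -> str:
    return f"[code] {message}"


def supportive_model(history: List[str], message: str) -> str:
    return f"[supportive] {message}"


@dataclass
class LLMAutomaton:
    handlers: Dict[str, Handler]                       # state -> LLM stub
    transitions: Dict[Tuple[str, str], str]            # (state, event) -> state
    triggers: List[Tuple[str, Callable[[str], bool]]]  # (event, predicate)
    state: str
    history: List[str] = field(default_factory=list)   # shared by all handlers

    def step(self, message: str) -> str:
        # Fire the first trigger whose predicate matches the message; if the
        # current state has no transition for that event, stay where we are.
        for event, predicate in self.triggers:
            if predicate(message):
                self.state = self.transitions.get((self.state, event), self.state)
                break
        reply = self.handlers[self.state](self.history, message)
        self.history += [message, reply]
        return reply


def build_demo_automaton() -> LLMAutomaton:
    """Declarative description of a toy three-state automaton."""
    return LLMAutomaton(
        handlers={
            "chat": general_model,
            "coding": code_model,
            "support": supportive_model,
        },
        transitions={
            ("chat", "code_request"): "coding",
            ("chat", "distress"): "support",
            ("coding", "distress"): "support",
            ("coding", "done"): "chat",
            ("support", "done"): "chat",
        },
        triggers=[
            ("code_request", lambda m: "python" in m.lower()),
            ("distress", lambda m: "stressed" in m.lower()),
            ("done", lambda m: "thanks" in m.lower()),
        ],
        state="chat",
    )
```

The routing logic lives entirely in the `transitions` and `triggers` tables, so changing which model handles which situation means editing data rather than control flow, which is one plausible reading of "declarative" here.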
# Confronting Project Conflicts into Success: a Complex Systems Design Approach to Resolving Stalemates ( http://arxiv.org/abs/2409.10549v1 ) License: see link	L. G. Teuber, A. R. M. Wolfert,	In today's complex project development, stakeholders are often involved too late. In many cases there is also a one-sided technical focus that considers only the system's behaviour and does not integrate the individual stakeholder preferences. This locks stakeholders into a 'technical' conflict instead of being able to emerge from it 'socially'. Moreover, stakeholders are often involved a posteriori in a multi-faceted development process that is non-transparent, leading to stalemates or even artefacts that nobody ever wants. There is thus a need for a purely associative and a priori design-supported approach that integrates both the system's reality and the stakeholders' interests within a joint agreement and technical framework. The state-of-the-art Preferendus, the computer-aided design engine embedded within the proven Open Design Systems (Odesys) methodology, is a neutral tool in confronting complexity into success. The Preferendus is deployed to co-creatively generate a best-fit-for-common-purpose solution given a number of wind-farm-related degrees of freedom, project constraints and stakeholder objective functions. Since the Preferendus design potential for a stalemate depends strongly on stakeholder interest, importance and trust, this paper introduces a structured stakeholder judgement approach to transparently arrive at individual stakeholder weights using a choice-based conjoint analysis (CBCA) method. This method also allows for obtaining an initial estimate for the individual stakeholder preference functions. By modelling disputable exogenous factors as endogenous design parameters, it is also shown for which factors the stalemate problem is indeed both technically and socially (un)solvable, while interests and reality are conjoined.	Translated: 2024-09-22 21:22:31 Published: 2024-09-02
# Agentic Society: Merging skeleton from real world and texture from Large Language Model ( http://arxiv.org/abs/2409.10550v1 ) License: see link	Yuqi Bai, Kun Sun, Huishi Yin,	Recent advancements in large language models (LLMs) and agent technologies offer promising solutions to the simulation of social science experiments, but the availability of the real-world population data that many of them require still poses a major challenge. This paper explores a novel framework that leverages census data and LLMs to generate virtual populations, significantly reducing resource requirements and bypassing privacy compliance issues associated with real-world data, while preserving statistical truthfulness. Drawing on real-world census data, our approach first generates a persona that reflects demographic characteristics of the population. We then employ LLMs to enrich these personas with intricate details, using techniques akin to those in image generative models but applied to textual data. Additionally, we propose a framework for evaluating the feasibility of our method with respect to the capability of LLMs based on personality trait tests, specifically the Big Five model, which also enhances the depth and realism of the generated personas. Through preliminary experiments and analysis, we demonstrate that our method produces personas with the variability essential for simulating diverse human behaviors in social science experiments. However, the evaluation results show that only a weak sign of statistical truthfulness can be produced, due to the limited capability of current LLMs. Insights from our study also highlight the tension within LLMs between aligning with human values and reflecting real-world complexities. Thorough and rigorous testing calls for further research. Our code is released at https://github.com/baiyuqi/agentic-society.git	Translated: 2024-09-22 21:22:31 Published: 2024-09-02
# AI Literacy for All: Adjustable Interdisciplinary Socio-technical Curriculum ( http://arxiv.org/abs/2409.10552v1 ) License: see link	Sri Yash Tadimalla, Mary Lou Maher,	This paper presents a curriculum, "AI Literacy for All," to promote an interdisciplinary understanding of AI, its socio-technical implications, and its practical applications for all levels of education. With the rapid evolution of artificial intelligence (AI), there is a need for AI literacy that goes beyond the traditional AI education curriculum. AI literacy has been conceptualized in various ways, including public literacy, competency building for designers, conceptual understanding of AI concepts, and domain-specific upskilling. Most of these conceptualizations were established before the public release of Generative AI (Gen-AI) tools like ChatGPT. AI education has focused on the principles and applications of AI through a technical lens that emphasizes the mastery of AI principles, the mathematical foundations underlying these technologies, and the programming and mathematical skills necessary to implement AI solutions. In AI Literacy for All, we emphasize a balanced curriculum that includes technical and non-technical learning outcomes to enable a conceptual understanding and critical evaluation of AI technologies in an interdisciplinary socio-technical context. The paper presents four pillars of AI literacy: understanding the scope and technical dimensions of AI, learning how to interact with Gen-AI in an informed and responsible way, the socio-technical issues of ethical and responsible AI, and the social and future implications of AI. While it is important to include all learning outcomes for AI education in a Computer Science major, the learning outcomes can be adjusted for other learning contexts, including non-CS majors, high school summer camps, the adult workforce, and the public. This paper advocates for a shift in AI literacy education to offer a more interdisciplinary socio-technical approach as a pathway to broaden participation in AI.	Translated: 2024-09-22 21:22:31 Published: 2024-09-02
# "Flipped" University: LLM-Assisted Lifelong Learning Environment ( http://arxiv.org/abs/2409.10553v1 ) License: see link	Kirill Krinkin, Tatiana Berlenko,	The rapid development of artificial intelligence technologies, particularly Large Language Models (LLMs), has revolutionized the landscape of lifelong learning. This paper introduces a conceptual framework for a self-constructed lifelong learning environment supported by LLMs. It highlights the inadequacies of traditional education systems in keeping pace with the rapid deactualization of knowledge and skills. The proposed framework emphasizes the transformation from institutionalized education to personalized, self-driven learning. It leverages the natural language capabilities of LLMs to provide dynamic and adaptive learning experiences, facilitating the creation of personal intellectual agents that assist in knowledge acquisition. The framework integrates principles of lifelong learning, including the necessity of building personal world models, the dual modes of learning (training and exploration), and the creation of reusable learning artifacts. Additionally, it underscores the importance of curiosity-driven learning and reflective practices in maintaining an effective learning trajectory. The paper envisions the evolution of educational institutions into "flipped" universities, focusing on supporting global knowledge consistency rather than merely structuring and transmitting knowledge.	Translated: 2024-09-22 21:22:31 Published: 2024-09-02
# An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving ( http://arxiv.org/abs/2409.10554v1 ) License: see link	Shawan Mohammed, Alp Argun, Nicolas Bonnotte, Gerd Ascheid,	Our research investigates the challenges Deep Reinforcement Learning (DRL) faces in complex, Partially Observable Markov Decision Processes (POMDPs) such as autonomous driving (AD), and proposes a solution for vision-based navigation in these environments. Partial observability reduces RL performance significantly, and this can be mitigated by augmenting sensor information and data fusion to reflect a more Markovian environment. However, this necessitates an increasingly complex perception module, whose training via RL is complicated due to inherent limitations. As the neural network architecture becomes more complex, the reward function's effectiveness as an error signal diminishes, since the only source of supervision is the reward, which is often noisy, sparse, and delayed. Task-irrelevant elements in images, such as the sky or certain objects, pose additional complexities. Our research adopts an offline-trained encoder to leverage large video datasets through self-supervised learning to learn generalizable representations. Then, we train a head network on top of these representations through DRL to learn to control an ego vehicle in the CARLA AD simulator. This study presents a broad investigation of the impact of different learning schemes for offline training of encoders on the performance of DRL agents in challenging AD tasks. Furthermore, we show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in the CARLA simulator, in a zero-shot learning fashion. Finally, we explore the impact of various architectural decisions for the RL networks to utilize the transferred representations efficiently. Therefore, in this work, we introduce and validate an optimal way of obtaining suitable representations of the environment and transferring them to RL networks.	Translated: 2024-09-22 21:22:31 Published: 2024-09-02
# Democratizing MLLMs in Healthcare: TinyLLaVA-Med for Efficient Healthcare Diagnostics in Resource-Constrained Settings ( http://arxiv.org/abs/2409.12184v1 ) License: see link	Aya El Mir, Lukelo Thadei Luoga, Boyuan Chen, Muhammad Abdullah Hanif, Muhammad Shafique,	Deploying Multi-Modal Large Language Models (MLLMs) in healthcare is hindered by their high computational demands and significant memory requirements, which are particularly challenging for resource-constrained devices like the Nvidia Jetson Xavier. This problem is particularly evident in remote medical settings where advanced diagnostics are needed but resources are limited. In this paper, we introduce an optimization method for the general-purpose MLLM, TinyLLaVA, which we have adapted and renamed TinyLLaVA-Med. This adaptation involves instruction-tuning and fine-tuning TinyLLaVA on a medical dataset by drawing inspiration from the LLaVA-Med training pipeline. Our approach successfully minimizes computational complexity and power consumption, with TinyLLaVA-Med operating at 18.9W and using 11.9GB of memory, while achieving accuracies of 64.54% on VQA-RAD and 70.70% on SLAKE for closed-ended questions. Therefore, TinyLLaVA-Med achieves deployment viability in hardware-constrained environments with low computational resources, maintaining essential functionalities and delivering accuracies close to state-of-the-art models.	Translated: 2024-09-22 21:12:27 Published: 2024-09-02
# 非線形力学系のスパース同定によるグラフ構造データからの支配方程式の発見 Discovering Governing equations from Graph-Structured Data by Sparse Identification of Nonlinear Dynamical Systems ( http://arxiv.org/abs/2409.04463v1 ) ライセンス: Link先を確認	Mohammad Amin Basiri, Sina Khanmohammadi,	(参考訳) 機械学習(ML)と疎性促進技術の組み合わせは、データから支配方程式を直接抽出することを可能にし、科学と工学の様々な分野における計算モデリングに革命をもたらしている。発見された力学モデルは、気候科学、神経科学、生態学、金融、疫学などの課題に対処するために用いることができる。しかし、力学系を発見するための既存のスパース同定法のほとんどは、サブシステム間の相互作用を考慮せずにシステム全体を一つのものとして扱う。結果として、そのようなモデルは創発的なシステムの振る舞いの小さな変化を捉えることができない。そこで我々は,グラフ構造データからの非線形力学系のスパース同定(SINDyG)と呼ぶ新しい手法を開発した。この手法はネットワーク構造をスパース回帰に組み込み,基礎となるネットワーク力学を説明するモデルパラメータを同定する。 SINDyGは、精度とモデルの単純さを改善しながら、ネットワーク力学の支配方程式を発見する。 The combination of machine learning (ML) and sparsity-promoting techniques is enabling direct extraction of governing equations from data, revolutionizing computational modeling in diverse fields of science and engineering. The discovered dynamical models could be used to address challenges in climate science, neuroscience, ecology, finance, epidemiology, and beyond. However, most existing sparse identification methods for discovering dynamical systems treat the whole system as one without considering the interactions between subsystems. As a result, such models are not able to capture small changes in the emergent system behavior. To address this issue, we developed a new method called Sparse Identification of Nonlinear Dynamical Systems from Graph-structured data (SINDyG), which incorporates the network structure into sparse regression to identify model parameters that explain the underlying network dynamics. SINDyG discovers the governing equations of network dynamics while offering improvements in accuracy and model simplicity.	翻訳日:2024-09-15 05:31:27 公開日:2024-09-02
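The sparse-regression step that the abstract above builds on can be illustrated with a small sketch. This is plain sequentially thresholded least squares as used in standard SINDy, not the SINDyG method itself: it ignores network structure entirely, and the candidate library, the threshold value, and all function names are assumptions chosen for illustration.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(theta, y, active):
    """Fit y on the active library columns via the normal equations."""
    gram = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    rhs = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in active]
    return solve_linear(gram, rhs)


def stlsq(theta, y, threshold=0.1, iterations=10):
    """Alternate least-squares fits with hard thresholding of small terms."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(iterations):
        fitted = least_squares(theta, y, active)
        coef = [0.0] * n_terms
        for idx, value in zip(active, fitted):
            if abs(value) >= threshold:
                coef[idx] = value
        kept = [i for i in active if coef[i] != 0.0]
        if kept == active or not kept:
            break
        active = kept
    return coef


# Toy data: x(t) = exp(-2 t), so the true dynamics are dx/dt = -2 x.
dt = 0.01
xs = [math.exp(-2.0 * k * dt) for k in range(201)]
theta = [[1.0, x, x * x] for x in xs[1:-1]]          # library: 1, x, x^2
dxdt = [(xs[k + 1] - xs[k - 1]) / (2 * dt) for k in range(1, 200)]  # central differences
coef = stlsq(theta, dxdt, threshold=0.1)
print(coef)
```

On this toy system only the coefficient of the linear term should survive the thresholding; a graph-aware variant would additionally use the network structure when deciding which cross-node terms to keep.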
# 病理診断のための脳波言語モデル EEG-Language Modeling for Pathology Detection ( http://arxiv.org/abs/2409.07480v1 ) ライセンス: Link先を確認	Sam Gijsen, Kerstin Ritter,	(参考訳) マルチモーダル言語モデリングは、大規模言語モデルの進歩を活用して、有能なマルチモーダルモデルを事前訓練する最近のブレークスルーを構成する。事前学習中の自然言語の統合は、特にコンピュータビジョンにおいて、学習された表現を大幅に改善することが示されている。しかし、機能的脳データ領域における多モーダル言語モデリングの有効性、特に病理診断の進歩は未解明のままである。本研究は,臨床報告と15,000件の脳波で訓練した脳波言語モデルの先駆けとなるものである。我々は,この新たな領域にマルチモーダルアライメントを行う手法を拡張し,脳波言語モデルのトレーニングに有用なレポート中のテキスト情報について検討する。以上の結果から,患者の臨床経過,脳波の描写,医師の解釈など,さまざまな報告セグメントに曝露されることから,モデルがより豊かな表現を学習できることが示唆された。より狭い臨床テキスト情報に曝露されたモデルと比較して,こうしたモデルは臨床報告に基づく脳波の検索(およびその逆)をはるかに高い精度で行う。しかし、これは対照学習アプローチを使用する場合にのみ観察される。特にアノテーションの少ない状況では、ゼロショット分類と線形プローブの両方で示されるように、脳波言語モデルの表現は、脳波のみのモデルと比較して、病理診断を大幅に改善することができる。これらの結果は,脳活動データと臨床テキストの統合の可能性を強調し,脳波言語モデルが臨床応用の大きな進展を示すことを示唆している。 Multimodal language modeling constitutes a recent breakthrough which leverages advances in large language models to pretrain capable multimodal models. The integration of natural language during pretraining has been shown to significantly improve learned representations, particularly in computer vision. However, the efficacy of multimodal language modeling in the realm of functional brain data, specifically for advancing pathology detection, remains unexplored. This study pioneers EEG-language models trained on clinical reports and 15000 EEGs. We extend methods for multimodal alignment to this novel domain and investigate which textual information in reports is useful for training EEG-language models. Our results indicate that models learn richer representations from being exposed to a variety of report segments, including the patient's clinical history, description of the EEG, and the physician's interpretation. Compared to models exposed to narrower clinical text information, we find such models to retrieve EEGs based on clinical reports (and vice versa) with substantially higher accuracy. Yet, this is only observed when using a contrastive learning approach. 
Particularly in regimes with few annotations, we observe that representations of EEG-language models can significantly improve pathology detection compared to those of EEG-only models, as demonstrated by both zero-shot classification and linear probes. In sum, these results highlight the potential of integrating brain activity data with clinical text, suggesting that EEG-language models represent significant progress for clinical applications.	翻訳日:2024-09-15 05:01:16 公開日:2024-09-02
# 限られた結果データを用いた治療効果の効率的な評価におけるサロゲートの役割について On the role of surrogates in the efficient estimation of treatment effects with limited outcome data ( http://arxiv.org/abs/2003.12408v4 ) ライセンス: Link先を確認	Nathan Kallus, Xiaojie Mao,	(参考訳) 多くの実験的、観察的な研究において、関心の結果を観察することはしばしば困難またはコストがかかり、平均治療効果(ATE)を推定する有効なサンプルサイズが減少する。一次利害関係にない結果のみを代理する単位にデータを組み込むことは、ATE推定の精度を高めることができる。我々は、厳格な代理条件を課すことを控え、サロゲートを目標とする結果の完全な代替として許容する。代わりに、未確立の処理の割り当てや欠如、それに対応する重複条件以外の仮定を伴わずに、サロゲート結果の豊富な観察によって、対象とする結果の可利用かつ限定的な観察を補う。ポテンシャルゲインを定量化するために、圧倒的な単位数と同等数の単位が欠落した場合に、ATE推定と代理無しの効率境界の差を導出する。我々は,これらの効率向上を実現するために,ロバストなATE推定と推論手法を開発した。職種訓練の長期学習効果を実証的に実証した。 In many experimental and observational studies, the outcome of interest is often difficult or expensive to observe, reducing effective sample sizes for estimating average treatment effects (ATEs) even when identifiable. We study how incorporating data on units for which only surrogate outcomes not of primary interest are observed can increase the precision of ATE estimation. We refrain from imposing stringent surrogacy conditions, which permit surrogates as perfect replacements for the target outcome. Instead, we supplement the available, albeit limited, observations of the target outcome with abundant observations of surrogate outcomes, without any assumptions beyond unconfounded treatment assignment and missingness and corresponding overlap conditions. To quantify the potential gains, we derive the difference in efficiency bounds on ATE estimation with and without surrogates, both when an overwhelming or comparable number of units have missing outcomes. We develop robust ATE estimation and inference methods that realize these efficiency gains. We empirically demonstrate the gains by studying long-term-earning effects of job training.	翻訳日:2024-09-07 07:35:31 公開日:2024-09-02
# Bitcoin時代のポンプ・アンド・ダンプ:暗号通貨市場操作のリアルタイム検出 Pump and Dumps in the Bitcoin Era: Real Time Detection of Cryptocurrency Market Manipulations ( http://arxiv.org/abs/2005.06610v2 ) ライセンス: Link先を確認	Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa,	(参考訳) ここ数年、暗号通貨はますます人気を博している。専門家でない人々でさえ、これらの証券に投資し始め、今日では暗号通貨取引所は月に1000億ドル以上の取引を処理している。しかし、多くの暗号通貨は流動性が低く、市場操作のスキームの標的になりやすい。本稿では,インターネット上のコミュニティによって組織されたポンプ・アンド・ダンプ方式の詳細な分析を行う。これらのコミュニティがどのように組織化され、どのように詐欺を行うかを観察する。次に,ポンプ・アンド・ダンプのグループに関する2つのケーススタディを報告する。最後に,現在の最先端技術を上回る,この不正をリアルタイムに検出する手法を導入し,ポンプ・アンド・ダンプが進行中のときに投資家が市場に近づかないよう支援する。 In recent years, cryptocurrencies have become increasingly popular. Even people who are not experts have started to invest in these securities and nowadays cryptocurrency exchanges process transactions for over 100 billion US dollars per month. However, many cryptocurrencies have low liquidity and therefore they are highly prone to market manipulation schemes. In this paper, we perform an in-depth analysis of pump and dump schemes organized by communities over the Internet. We observe how these communities are organized and how they carry out the fraud. Then, we report on two case studies related to pump and dump groups. Lastly, we introduce an approach to detect the fraud in real time that outperforms the current state of the art, so as to help investors stay out of the market when a pump and dump scheme is in action.	翻訳日:2024-09-07 07:35:31 公開日:2024-09-02
# フロッケ力学の量子カオス測度 Quantum chaos measures for Floquet dynamics ( http://arxiv.org/abs/2007.07283v3 ) ライセンス: Link先を確認	Amin A. Nizami,	(参考訳) キックローターのような周期的に蹴られたフロケットシステムは、カオスのパラダイム的で実証的な単純なモデルである。非可積分量子力学には、ロシミットエコー(英語版)、自己相関関数(英語版)、OTOC(英語版)などのカオス的挙動の存在(または遷移)の診断尺度がいくつか存在する。我々はこれらの測度を、駆動量子系のユニタリフロケット作用素の固有系の観点から解析的に計算する。これらの式を用いて、トーラス上の量子キックローターの時間的変動を、積分可能かつカオス的ケースに対して決定する。キックローターのより単純な可積分変種に対しては、その力学の表現論的導出を与える。 Periodically kicked Floquet systems such as the kicked rotor are a paradigmatic and illustrative simple model of chaos. For non-integrable quantum dynamics there are several diagnostic measures of the presence of (or the transition to) chaotic behaviour including the Loschmidt echo, autocorrelation function and OTOC. We analytically compute these measures in terms of the eigensystem of the unitary Floquet operator of driven quantum systems. We use these expressions to determine the time variation of the measures for the quantum kicked rotor on the torus, for the integrable as well as the chaotic case. For a simpler integrable variant of the kicked rotor, we also give a representation theoretic derivation of its dynamics.	翻訳日:2024-09-07 07:35:31 公開日:2024-09-02
# 画像のカラー化: 調査とデータセット Image Colorization: A Survey and Dataset ( http://arxiv.org/abs/2008.10774v4 ) ライセンス: Link先を確認	Saeed Anwar, Muhammad Tahir, Chongyi Li, Ajmal Mian, Fahad Shahbaz Khan, Abdul Wahab Muzaffar,	(参考訳) 画像のカラー化は、グレースケールの画像やビデオフレームのRGB色を推定し、美的および知覚的品質を改善する。過去10年間で、画像のカラー化のためのディープラーニング技術は大幅に進歩し、これらの技術の体系的な調査とベンチマークが必要である。本稿では、最近の最先端のディープラーニングベースの画像カラー化技術に関する総合的な調査を行い、それらの基本的なブロックアーキテクチャ、入力、オプティマイザ、損失関数、トレーニングプロトコル、トレーニングデータなどについて述べる。既存のカラー化テクニックを7つのクラスに分類し、ベンチマークデータセットや評価指標など、パフォーマンスを左右する重要な要因について論じる。既存のデータセットの制限を強調し、カラー化に特化した新しいデータセットを導入する。我々は既存のデータセットと提案したデータセットの両方を用いて、既存の画像のカラー化手法を広範囲に実験的に評価する。最後に,既存の手法の限界について議論し,この急速に進化する深層画像カラー化という課題に対して,可能な解決策と今後の研究方向性を推奨する。データセットと評価のためのコードは https://github.com/saeed-anwar/ColorSurvey で公開されている。 Image colorization estimates RGB colors for grayscale images or video frames to improve their aesthetic and perceptual quality. Over the last decade, deep learning techniques for image colorization have significantly progressed, necessitating a systematic survey and benchmarking of these techniques. This article presents a comprehensive survey of recent state-of-the-art deep learning-based image colorization techniques, describing their fundamental block architectures, inputs, optimizers, loss functions, training protocols, training data, etc. It categorizes the existing colorization techniques into seven classes and discusses important factors governing their performance, such as benchmark datasets and evaluation metrics. We highlight the limitations of existing datasets and introduce a new dataset specific to colorization. We perform an extensive experimental evaluation of existing image colorization methods using both existing datasets and our proposed one. Finally, we discuss the limitations of existing methods and recommend possible solutions and future research directions for this rapidly evolving topic of deep image colorization. 
The dataset and codes for evaluation are publicly available at https://github.com/saeed-anwar/ColorSurvey.	翻訳日:2024-09-07 07:30:16 公開日:2024-09-02
# Beta-CoRM:$n$-gramプロファイル分析のためのベイズ的アプローチ Beta-CoRM: A Bayesian Approach for $n$-gram Profiles Analysis ( http://arxiv.org/abs/2011.11558v3 ) ライセンス: Link先を確認	José A. Perusquía, Jim E. Griffin, Cristiano Villa,	(参考訳) $n$-gramプロファイルは、クラスタリングや分類のために、潜在的に異なる長さの長いシーケンスを分析するのに成功し、広く利用されている。主に、この目的のために機械学習アルゴリズムが使われているが、予測性能にもかかわらず、これらの手法は隠れた構造を発見したり、データの完全な確率的表現を提供することはできない。バイナリ属性として使われる$n$-gramプロファイルのために設計されたベイズ生成モデルの新しいクラスが、この問題に対処するために設計されている。提案したモデリングの柔軟性により、生成モデルにおける特徴選択への簡単なアプローチを考えることができる。さらに,合成および実データシナリオに適用した高速な推論手順のためにスライスサンプリングアルゴリズムを導出し,特徴選択が分類精度を向上させることを示す。 $n$-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Mainly, machine learning algorithms have been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. A novel class of Bayesian generative models designed for $n$-gram profiles used as binary attributes have been designed to address this. The flexibility of the proposed modelling allows to consider a straightforward approach to feature selection in the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure, which is applied to synthetic and real data scenarios and shows that feature selection can improve classification accuracy.	翻訳日:2024-09-07 07:30:16 公開日:2024-09-02
# Insta-YOLO: リアルタイムインスタンスセグメンテーション INSTA-YOLO: Real-Time Instance Segmentation ( http://arxiv.org/abs/2102.06777v3 ) ライセンス: Link先を確認	Eslam Mohamed, Abdelrahman Shaker, Ahmad El-Sallab, Mayada Hadhoud,	(参考訳) インスタンスセグメンテーションは、近年、様々なコンピュータビジョンアプリケーションで大きな注目を集めている。これは、同じクラスに属している場合でも、シーンの異なるオブジェクトに異なるIDを提供することを目的としている。これは様々なシナリオ、特にオクルージョンにおいて有用である。インスタンスセグメンテーションは通常、2段階のパイプラインとして実行される。まずオブジェクトを検出し、次に検出されたボックス領域内でセマンティックセグメンテーションを行う。このプロセスは、特にセグメンテーション部分において、コストのかかるアップサンプリングを伴う。さらに、LiDARポイントクラウドや空中オブジェクト検出のようないくつかのアプリケーションでは、向き付きボックスを予測する必要があることが多く、2段階のパイプラインに余分な複雑さをもたらす。本稿では,リアルタイムインスタンスセグメンテーションのための一段階のエンドツーエンドディープラーニングモデルであるInsta-YOLOを提案する。提案モデルはYOLOワンショットオブジェクト検出器にインスパイアされ,ローカライゼーションヘッドのボックス回帰損失を多項式回帰に置き換えている。この修正により、セグメンテーション用のアップサンプリングデコーダを完全に省略し、多項式出力係数からインスタンスセグメンテーションの輪郭を生成することができる。加えて、このアーキテクチャは向きを持つオブジェクトに自然に適合する。本モデルは,Carvana,Cityscapes,Airbusの3つのデータセットで評価する。その結果,本モデルはmAPで競争力のある精度を達成し,GTX-1080 GPU上で速度が2倍に大きく向上した。 Instance segmentation has recently gained huge attention in various computer vision applications. It aims at providing different IDs to different objects in the scene, even if they belong to the same class. This is useful in various scenarios, especially in occlusions. Instance segmentation is usually performed as a two-stage pipeline. First, an object is detected; then semantic segmentation is performed within the detected box area. This process involves costly up-sampling, especially for the segmentation part. Moreover, for some applications, such as LiDAR point clouds and aerial object detection, it is often required to predict oriented boxes, which adds extra complexity to the two-stage pipeline. In this paper, we propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation. The proposed model is inspired by the YOLO one-shot object detector, with the box regression loss replaced by polynomial regression in the localization head. 
This modification enables us to skip the segmentation up-sampling decoder altogether and produce the instance segmentation contour from the polynomial output coefficients. In addition, this architecture is a natural fit for oriented objects. We evaluate our model on three datasets, namely, Carvana, Cityscapes and Airbus. The results show our model achieves competitive accuracy in terms of mAP with a significant 2x improvement in speed on a GTX-1080 GPU.	翻訳日:2024-09-07 07:30:16 公開日:2024-09-02
# GAN-HA:新しい異種二重識別器ネットワークと近赤外・可視画像融合のための新しい注意基盤融合戦略を備えた生成逆数ネットワーク GAN-HA: A generative adversarial network with a novel heterogeneous dual-discriminator network and a new attention-based fusion strategy for infrared and visible image fusion ( http://arxiv.org/abs/2404.15992v3 ) ライセンス: Link先を確認	Guosheng Lu, Zile Fang, Jiaju Tian, Haowen Huang, Yuelong Xu, Zhuolin Han, Yaoming Kang, Can Feng, Zhigang Zhao,	(参考訳) 赤外線・可視画像融合(IVIF)は、可視画像からテクスチャの詳細を統合しつつ、赤外線画像からの熱放射情報を保存することを目的としている。熱放射情報は主として画像強度で表現されるが、テクスチャの詳細は画像勾配で表現されるのが一般的である。しかし、既存の二重識別器生成敵ネットワーク(GAN)は、赤外線と可視画像情報の異なる学習ニーズを完全に考慮していない2つの構造的に同一の識別器に依存していることが多い。そこで本研究では,異種二重識別器ネットワークと注意型融合戦略(GAN-HA)を備えた新しいGANを提案する。具体的には、赤外画像と可視画像の本質的な違いを認識し、熱放射情報とテクスチャの詳細を同時に捉える新しい異種二重識別ネットワークを提案する。このネットワーク内の2つの判別器は構造的に異なり、赤外画像のための有能な判別器と、可視画像のための詳細な判別器を含む。彼らはそれぞれ、リッチな画像強度情報と画像勾配情報を学ぶことができる。さらに、異なるソース画像からの学習情報を適切に強調するために、ジェネレータ内に新しい注目ベースの融合戦略を設計し、融合結果の情報表現能力を向上させる。このようにして、GAN-HAによって生成された融合画像は、熱標的の塩分濃度とテクスチャの鋭さの両方をより効果的に維持することができる。様々な公開データセットに対する大規模な実験は、他の最先端(SOTA)アルゴリズムよりもGAN-HAの方が優れていることを示し、実用的な応用の可能性を示している。 Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images. Thermal radiation information is mainly expressed through image intensities, while texture details are typically expressed through image gradients. However, existing dual-discriminator generative adversarial networks (GANs) often rely on two structurally identical discriminators for learning, which do not fully account for the distinct learning needs of infrared and visible image information. To this end, this paper proposes a novel GAN with a heterogeneous dual-discriminator network and an attention-based fusion strategy (GAN-HA). 
Specifically, recognizing the intrinsic differences between infrared and visible images, we propose, for the first time, a novel heterogeneous dual-discriminator network to simultaneously capture thermal radiation information and texture details. The two discriminators in this network are structurally different, including a salient discriminator for infrared images and a detailed discriminator for visible images. They are able to learn rich image intensity information and image gradient information, respectively. In addition, a new attention-based fusion strategy is designed in the generator to appropriately emphasize the learned information from different source images, thereby improving the information representation ability of the fusion result. In this way, the fused images generated by GAN-HA can more effectively maintain both the salience of thermal targets and the sharpness of textures. Extensive experiments on various public datasets demonstrate the superiority of GAN-HA over other state-of-the-art (SOTA) algorithms while showcasing its higher potential for practical applications.	翻訳日:2024-09-07 03:22:33 公開日:2024-09-02
# 暗黒物質とダークエネルギーの代替としての宇宙スケールの量子効果 Quantum Effects on Cosmic Scales as an Alternative to Dark Matter and Dark Energy ( http://arxiv.org/abs/2409.02954v1 ) ライセンス: Link先を確認	Da-Ming Chen, Lin Wang,	(参考訳) スピンねじれ理論 (英: spin-torsion theory) は、アインシュタインの一般相対性理論 (GR) に微小粒子のスピンを組み込むことで拡張する重力に対するゲージ理論である。本研究では、スピンねじれ理論をさらに発展させ、自由落下するマクロ粒子を含む球対称および静的重力系について検討する。我々は、マクロな物質の量子スピンが宇宙スケールで注目されるようになると仮定する。さらに、ディラックスピノルとディラック方程式は、粒子とその関連する過程のすべての重要な物理的特性を適切に捉えていると仮定する。このアプローチの重要な側面は、ディラック方程式の定数質量をスケール関数で置換することであり、量子効果と重力系のスケールとの接続を確立することができる。このメカニズムは、マクロな物質の量子効果がスケール依存であり、微小粒子では観測されない現象である局所的に減少することを保証している。任意の物質密度分布について、我々の理論は質量式内の量子ポテンシャルエネルギー(QPE)という追加の量子項を予測する。 QPEは時間拡張と距離収縮を誘導し、重力井戸を模倣する。宇宙論に適用すると、QPEはアインシュタインが静的宇宙論モデルで重力のバランスをとるために導入した宇宙定数に匹敵するものとして機能する。 QPEはまた、ハッブル赤方偏移の起源(伝統的には宇宙の膨張に由来する)のもっともらしい説明も提供している。予測光度距離-赤方偏移関係は、SNe Iaの宇宙試料から得られたSNe Iaデータと非常によく一致している。銀河の文脈では、QPEはダークマターに相当するものとして機能する。予測された円速度はSPARC (Spitzer Photometry and Accurate Rotation Curves database) の回転曲線データとよく一致している。 The spin-torsion theory is a gauge theory approach to gravity that expands upon Einstein's general relativity (GR) by incorporating the spin of microparticles. In this study, we further develop the spin-torsion theory to examine spherically symmetric and static gravitational systems that involve free-falling macroscopic particles. We posit that the quantum spin of macroscopic matter becomes noteworthy at cosmic scales. We further assume that the Dirac spinor and Dirac equation adequately capture all essential physical characteristics of the particles and their associated processes. A crucial aspect of our approach involves substituting the constant mass in the Dirac equation with a scale function, allowing us to establish a connection between quantum effects and the scale of gravitational systems. This mechanism ensures that the quantum effect of macroscopic matter is scale-dependent and diminishes locally, a phenomenon not observed in microparticles. 
For any given matter density distribution, our theory predicts an additional quantum term, the quantum potential energy (QPE), within the mass expression. The QPE induces time dilation and distance contraction, and thus mimics a gravitational well. When applied to cosmology, the QPE serves as a counterpart to the cosmological constant introduced by Einstein to balance gravity in his static cosmological model. The QPE also offers a plausible explanation for the origin of Hubble redshift (traditionally attributed to the universe's expansion). The predicted luminosity distance--redshift relation aligns remarkably well with SNe Ia data from the cosmological sample of SNe Ia. In the context of galaxies, the QPE functions as the equivalent of dark matter. The predicted circular velocities align well with rotation curve data from the SPARC (Spitzer Photometry and Accurate Rotation Curves database) sample.	翻訳日:2024-09-07 01:16:35 公開日:2024-09-02
# ジェネレーティブAIによるコード生成ツールがソフトウェアエンジニアの雇用に与える影響:リクルーターの経験、知覚、戦略 The Impact of Generative AI-Powered Code Generation Tools on Software Engineer Hiring: Recruiters' Experiences, Perceptions, and Strategies ( http://arxiv.org/abs/2409.00875v1 ) ライセンス: Link先を確認	Alyssia Chen, Timothy Huo, Yunhee Nam, Dan Port, Anthony Peruma,	(参考訳) ChatGPTやGitHub CopilotといったGenerative AI(GenAI)ツールの急速な進歩は、コード生成タスクを自動化することで、ソフトウェアエンジニアリングを変革している。これらのツールは開発者の生産性を向上させる一方で、ソフトウェアエンジニアリング候補の真の能力と潜在能力を評価する際に、組織や専門家を雇う上での課題も提示している。業界と学界の両方でこれらのツールに関する研究は存在するが、これらのツールが採用プロセスにどのように影響するかについては、研究の欠如がある。そこで本研究では,GenAIを利用したコード生成ツールに対する採用者の経験と認識,および候補評価の課題と戦略について検討する。業界の専門家32人を対象に行った調査では、ほとんどの参加者はそのようなツールに精通しているが、ほとんどの組織は、これらのツールの使用・知識を考慮に入れた候補評価手法を調整していない。面接中、候補者がこれらのツールの使用を許可すべきかどうかについては意見が分かれており、多くの参加者は、これらのツールを使用する上で、効果的に自分のスキルを発揮できる候補者を評価する。さらに、ほとんどの参加者は、GenAIを利用したコード生成ツールをコンピュータサイエンスカリキュラムに組み込むことが重要であると考えており、それを行う上で重要なリスクとメリットについて言及している。 The rapid advancements in Generative AI (GenAI) tools, such as ChatGPT and GitHub Copilot, are transforming software engineering by automating code generation tasks. While these tools improve developer productivity, they also present challenges for organizations and hiring professionals in evaluating software engineering candidates' true abilities and potential. Although there is existing research on these tools in both industry and academia, there is a lack of research on how these tools specifically affect the hiring process. Therefore, this study aims to explore recruiters' experiences and perceptions regarding GenAI-powered code generation tools, as well as their challenges and strategies for evaluating candidates. Findings from our survey of 32 industry professionals indicate that although most participants are familiar with such tools, the majority of organizations have not adjusted their candidate evaluation methods to account for candidates' use/knowledge of these tools. 
There are mixed opinions on whether candidates should be allowed to use these tools during interviews, with many participants valuing candidates who can effectively demonstrate their skills in using these tools. Additionally, most participants believe that it is important to incorporate GenAI-powered code generation tools into computer science curricula and mention the key risks and benefits of doing so.	翻訳日:2024-09-06 08:40:50 公開日:2024-09-02
# 付加製造におけるディジタルツイン: 系統的レビュー Digital Twins in Additive Manufacturing: A Systematic Review ( http://arxiv.org/abs/2409.00877v1 ) ライセンス: Link先を確認	Md Manjurul Ahsan, Benjamin Bevans, Chris Billings, Alexander Riensche, Yingtao Liu, Shivakumar Raman, Zahed Siddique,	(参考訳) Digital Twins (DT) は、AMマシンの物理的コンポーネントの仮想レプリカを作成する能力によって、リアルタイム生産監視に役立っているため、付加製造 (Additive Manufacturing, AM) で人気が高まっている。機械学習(ML)、拡張現実(AR)、シミュレーションベースのモデルといった高度な技術は、製造プロセスにおいてインテリジェントで適応可能なDTを開発する上で重要な役割を果たす。しかし、スケーラビリティ、高品質なデータの統合、DT開発におけるリアルタイムアプリケーションに必要な計算能力について疑問が残る。 AMにおけるDTの現在の状態を理解することは、これらの課題に対処し、AMプロセスを進める上でそのポテンシャルを完全に活用するために不可欠である。この機会を考慮して、本研究は以下の4つの研究課題に対処することで、AMにおけるDTの総合的な概要を提供することを目的としている。 (1) AMで使われるDTの主な種類とその具体的な応用は何か? (2) 最近のDTの開発と実装にはどのようなものがあるか? (3) プロセス改善とハイブリッド製造にDTはどのように使われているか? (4) DTはIndustry 4.0技術とどのように統合されているか? 現在の応用と技術について議論することで、AMやDTの研究者や実践者に対して、より深い理解と今後の研究の方向性を提供することを目指している。 Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. However, questions remain regarding scalability, the integration of high-quality data, and the computational power required for real-time applications in developing DTs. Understanding the current state of DTs in AM is essential to address these challenges and fully utilize their potential in advancing AM processes. Considering this opportunity, this work aims to provide a comprehensive overview of DTs in AM by addressing the following four research questions: (1) What are the key types of DTs used in AM and their specific applications? (2) What are the recent developments and implementations of DTs? (3) How are DTs employed in process improvement and hybrid manufacturing? (4) How are DTs integrated with Industry 4.0 technologies? 
By discussing current applications and techniques, we aim to offer a better understanding and potential future research directions for researchers and practitioners in AM and DTs.	翻訳日:2024-09-06 08:40:50 公開日:2024-09-02
# ガウス的非ステアブルチャネルとガウス的ステアリングの計算可能な定量化 Gaussian unsteerable channels and computable quantifications of Gaussian steering ( http://arxiv.org/abs/2409.00878v1 ) ライセンス: Link先を確認	Taotao Yan, Jie Guo, Jinchuan Hou, Xiaofei Qi, Kan He,	(参考訳) 連続変数系に対するガウス的ステアリングに関する現在の量子資源理論は欠陥があり不完全である。その主な欠点は、ガウス的非ステアブル状態をガウス的非ステアブル状態へ変換するガウスチャネルの構造に対する理解が不十分であることに起因し、自由操作の選択が限定される結果となっている。本稿では,そのような$(m+n)$モードのガウスチャネルの構造を深く探求し,ガウス的非ステアブルチャネルのクラスと最大ガウス的非ステアブルチャネルのクラスを導入する。これらはいずれも自由操作として選択でき,アリスのガウス測定による$A$から$B$へのガウス的ステアリングの資源理論を完成させる。また、$A$から$B$への$(m+n)$モードのガウス的ステアリングに対する2つの定量化 $\mathcal{J}_{j}$ $(j=1,2)$ も提案する。ガウス状態の共分散行列にのみ依存し最適化手続きを必要としないため、$\mathcal{J}_{j}$の値の計算は単純で効率的である。 $\mathcal{J}_{j}$ は真のガウス的ステアリング測度ではないが、あるガウス的非ステアブルチャネルの下で非増加であるといった良い性質を持っている。さらに、${\mathcal J}_2$ とUhlmann忠実度に基づくガウス的ステアリング測度 $\mathcal N_3$ を比較すると、あるクラスの$(1+1)$モードのガウス純粋状態において ${\mathcal J}_2$ が $\mathcal N_3$ の上界であることが分かる。例として、マルコフ環境における特別なクラスの$(1+1)$モードのガウス状態に対するガウス的ステアリングの挙動を$\mathcal J_2$を用いて議論し、量子ステアリングが急速に減衰するという興味深い現象を明らかにする。 The current quantum resource theory for Gaussian steering for continuous-variable systems is flawed and incomplete. Its primary shortcoming stems from an inadequate comprehension of the architecture of Gaussian channels transforming Gaussian unsteerable states into Gaussian unsteerable states, resulting in a restricted selection of free operations. In the present paper, we explore in depth the structure of such $(m+n)$-mode Gaussian channels, and introduce the class of the Gaussian unsteerable channels and the class of maximal Gaussian unsteerable channels, both of which may be chosen as the free operations, which completes the resource theory for Gaussian steering from $A$ to $B$ by Alice's Gaussian measurements. We also propose two quantifications $\mathcal{J}_{j}$ $(j=1,2)$ of $(m+n)$-mode Gaussian steering from $A$ to $B$. The computation of the value of $\mathcal{J}_{j}$ is straightforward and efficient, as it solely relies on the covariance matrices of Gaussian states, eliminating the need for any optimization procedures. 
Though the $\mathcal{J}_{j}$ are not genuine Gaussian steering measures, they have some nice properties, such as being non-increasing under certain Gaussian unsteerable channels. Additionally, we compare ${\mathcal J}_2$ with the Gaussian steering measure $\mathcal N_3$, which is based on the Uhlmann fidelity, revealing that ${\mathcal J}_2$ is an upper bound of $\mathcal N_3$ for a certain class of $(1+1)$-mode Gaussian pure states. As an illustration, we apply $\mathcal J_2$ to discuss the behaviour of Gaussian steering for a special class of $(1+1)$-mode Gaussian states in Markovian environments, which uncovers the intriguing phenomenon of rapid decay in quantum steering.	翻訳日:2024-09-06 08:40:50 公開日:2024-09-02
# パラメータ数を超えて:Soft Mixture of Expertsにおける暗黙のバイアス Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts ( http://arxiv.org/abs/2409.00879v1 ) ライセンス: Link先を確認	Youngseog Chung, Dhruv Malik, Jeff Schneider, Yuanzhi Li, Aarti Singh,	(参考訳) スパース・ミクスチャー・オブ・エキスパート(MoE)モデルに関する伝統的な見解は、計算コストの高い単一の大規模専門家を訓練する代わりに、多数の小規模専門家を訓練できるというものである。小さい専門家の総パラメータ数が単一の大専門家のそれと等しければ、計算上の扱いやすさを得て専門家の専門化を促進しながら、大専門家の表現力を保てると期待される。最近導入されたSoft MoEは、Sparse MoEの離散ルーティング機構をトークンを滑らかに混合する微分可能なゲーティング関数に置き換えている。このスムーズなゲーティング関数はSparse MoEに関連する様々なトレーニング不安定性を緩和するが、Soft MoEの表現力や専門家の専門化の可能性に影響を及ぼす暗黙のバイアスを誘発するかどうかは明らかでない。我々は,任意に強力な単一の専門家を持つSoft MoEが,単純な凸関数を表現できないことを証明する。このことは、Soft MoEの成功は、多数の小さな専門家が単一の大専門家の表現力を集合的に模倣するという伝統的な視点では説明できないこと、そして(総パラメータ数が固定されていても)優れた表現力を達成するには複数の専門家が実際に必要であることを示している。この方向の検討を続け,Soft MoEにおける専門家の専門化の概念を導入し,総パラメータ数を固定したまま専門家の数を変えながら,以下の(計算上は困難な)課題を考察する。任意の入力が与えられたとき、この入力のラベルを予測することに特化した専門家のサブセットをどのように発見できるか? 小さな専門家が多数ある場合、アーキテクチャは、特化した専門家のサブセットを効率的に近似できるような形で暗黙的に偏っていることを経験的に示す。提案手法は容易に実装でき、推論時の計算量を削減できる可能性がある。 The traditional viewpoint on Sparse Mixture of Experts (MoE) models is that instead of training a single large expert, which is computationally expensive, we can train many small experts. The hope is that if the total parameter count of the small experts equals that of the singular large expert, then we retain the representation power of the large expert while gaining computational tractability and promoting expert specialization. The recently introduced Soft MoE replaces the Sparse MoE's discrete routing mechanism with a differentiable gating function that smoothly mixes tokens. While this smooth gating function successfully mitigates the various training instabilities associated with Sparse MoE, it is unclear whether it induces implicit biases that affect Soft MoE's representation power or potential for expert specialization. We prove that Soft MoE with a single arbitrarily powerful expert cannot represent simple convex functions. 
This justifies that Soft MoE's success cannot be explained by the traditional viewpoint of many small experts collectively mimicking the representation power of a single large expert, and that multiple experts are actually necessary to achieve good representation power (even for a fixed total parameter count). Continuing along this line of investigation, we introduce a notion of expert specialization for Soft MoE, and while varying the number of experts yet fixing the total parameter count, we consider the following (computationally intractable) task. Given any input, how can we discover the expert subset that is specialized to predict this input's label? We empirically show that when there are many small experts, the architecture is implicitly biased in a fashion that allows us to efficiently approximate the specialized expert subset. Our method can be easily implemented to potentially reduce computation during inference.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
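The "differentiable gating function that smoothly mixes tokens" mentioned in the abstract above can be made concrete with a small sketch. It follows the commonly described Soft MoE formulation (dispatch weights normalized over tokens, combine weights normalized over slots), which the abstract itself does not spell out; the slot parameters `phi`, the toy experts, and all function names are assumptions for illustration, not the authors' code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def soft_moe(tokens, phi, experts, slots_per_expert=1):
    """tokens: (m x d); phi: (d x s) slot parameters; experts: list of callables."""
    logits = matmul(tokens, phi)  # (m x s) token-slot affinities
    # Dispatch weights: softmax over tokens for each slot (columns sum to 1).
    dispatch = transpose([softmax(col) for col in transpose(logits)])
    # Combine weights: softmax over slots for each token (rows sum to 1).
    combine = [softmax(row) for row in logits]
    # Every slot input is a convex combination of all tokens (no hard routing).
    slot_inputs = matmul(transpose(dispatch), tokens)  # (s x d)
    slot_outputs = [
        experts[j // slots_per_expert](x) for j, x in enumerate(slot_inputs)
    ]
    # Every token output is a convex combination of all slot outputs.
    outputs = matmul(combine, slot_outputs)  # (m x d)
    return outputs, dispatch, combine


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phi = [[0.5, -0.5], [-1.0, 1.0]]
experts = [lambda x: [2.0 * v for v in x], lambda x: [v + 1.0 for v in x]]
outputs, dispatch, combine = soft_moe(tokens, phi, experts)
print(outputs)
```

Because both weight matrices are dense softmaxes, every expert sees a blend of every token, which is the property the abstract contrasts with the discrete routing of Sparse MoE.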
# 組込み型展開のためのVAEを用いたアウト・オブ・ディストリビューション検出器の圧縮 Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment ( http://arxiv.org/abs/2409.00880v1 ) ライセンス: Link先を確認	Aditya Bansal, Michael Yuhas, Arvind Easwaran,	(参考訳) アウト・オブ・ディストリビューション(OOD)検出器は、機械学習モデルのトレーニングディストリビューションの外でサンプルを識別することで、組み込みサイバー物理システムの安全モニターとして機能し、潜在的に安全でないアクションを防ぐことができる。しかし、OOD検出器はディープニューラルネットワークを使ってしばしば実装されるため、メモリと電力の制約のある組み込みシステムのリアルタイムな期限を満たすことは困難である。我々は,OOD検出を潜在空間で行う変分オートエンコーダ(VAE)に基づくOOD検出器のクラスを検討し,量子化,プルーニング,知識蒸留を適用した。これらの手法は他の深層モデルに対しても検討されてきたが、潜在空間のOOD検出に対する組み合わせた効果は検討されていない。これらの技術はVAEのテスト損失を増加させるが、これはOOD検出性能の比例的な低下には対応せず、我々はこの点を活用して、組込みCPUやGPU上でリアルタイムに推論できる軽量なOOD検出器を開発する。本稿では,3つの圧縮技術を組み合わせて,OOD検出器のAUROCを維持しながら,メモリと実行時間を大幅に削減する設計手法を提案する。この手法をJetson Nano上で2つの既存のOOD検出器を用いて実証し、AUROCをベースラインの5%以内に保ちながら、GPUとCPUの推論時間をそれぞれ20%と28%削減する。 Out-of-distribution (OOD) detectors can act as safety monitors in embedded cyber-physical systems by identifying samples outside a machine learning model's training distribution to prevent potentially unsafe actions. However, OOD detectors are often implemented using deep neural networks, which makes it difficult to meet real-time deadlines on embedded systems with memory and power constraints. We consider the class of variational autoencoder (VAE) based OOD detectors where OOD detection is performed in latent space, and apply quantization, pruning, and knowledge distillation. These techniques have been explored for other deep models, but no work has considered their combined effect on latent space OOD detection. While these techniques increase the VAE's test loss, this does not correspond to a proportional decrease in OOD detection performance and we leverage this to develop lean OOD detectors capable of real-time inference on embedded CPUs and GPUs. We propose a design methodology that combines all three compression techniques and yields a significant decrease in memory and execution time while maintaining AUROC for a given OOD detector. 
We demonstrate this methodology with two existing OOD detectors on a Jetson Nano and reduce GPU and CPU inference time by 20% and 28% respectively while keeping AUROC within 5% of the baseline.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# SAFE:ソフトウェア脆弱性検出のための意味的・統語的関係の活用における大規模言語モデルの改善 SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection ( http://arxiv.org/abs/2409.00882v1 ) ライセンス: Link先を確認	Van Nguyen, Surya Nepal, Tingmin Wu, Xingliang Yuan, Carsten Rudolph,	(参考訳) ソフトウェア脆弱性(SV)は、安全クリティカルなセキュリティシステムにとって、一般的かつ重要な懸念事項として浮上している。これにより、ソフトウェア脆弱性検出(SVD)のための機械学習やディープラーニングなど、AIベースの手法の利用が大幅に進歩した。 AIベースの手法はSVDで有望なパフォーマンスを示しているが、実際の、複雑で多様なソースコードデータセットに対する効果は、実際には限られている。そこで本研究では,SVDのソースコードデータから意味的・統語的関係を学習し,活用する大規模言語モデルの能力を高める新しいフレームワークを提案する。その結果,ソフトウェア脆弱性検出(SVD)問題に効果的に対処するため,ソースコードデータから基本知識の取得を可能とし,意味的・統語的関連性(セマンティック・アソシエーション)を十分に活用することが可能になる。実世界の3つの挑戦的データセット(ReVeal、D2A、Devign)に対する厳密で広範な実験結果は、我々のアプローチが最先端のベースラインと最先端のベースラインよりも優れていることを示している。要約すると、当社のSAFEアプローチは、F1測定で4.79%から9.15%、リコールで16.93%から21.70%のハイパフォーマンスを実現しています。 Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems. This has spurred significant advancements in utilizing AI-based methods, including machine learning and deep learning, for software vulnerability detection (SVD). While AI-based methods have shown promising performance in SVD, their effectiveness on real-world, complex, and diverse source code datasets remains limited in practice. To tackle this challenge, in this paper, we propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for SVD. As a result, our approach can enable the acquisition of fundamental knowledge from source code data while adeptly utilizing crucial relationships, i.e., semantic and syntactic associations, to effectively address the software vulnerability detection (SVD) problem. The rigorous and extensive experimental results on three real-world challenging datasets (i.e., ReVeal, D2A, and Devign) demonstrate the superiority of our approach over the effective and state-of-the-art baselines. 
In summary, averaged across all datasets used, our SAFE approach improves over the baselines by 4.79% to 9.15% in F1-measure and by 16.93% to 21.70% in Recall.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 海馬分節とアルツハイマー病診断のためのハイブリッドパラメーター高能率微調整法 A Novel Hybrid Parameter-Efficient Fine-Tuning Approach for Hippocampus Segmentation and Alzheimer's Disease Diagnosis ( http://arxiv.org/abs/2409.00884v1 ) ライセンス: Link先を確認	Wangang Cheng, Guanghua He, Keli Hu, Mingyu Fang, Liang Dong, Zhong Li, Hancan Zhu,	(参考訳) 深層学習法は医用画像のセグメンテーションを著しく進歩させたが、その成功は手動で注釈付けされた大量のデータに基づいており、正確なラベル付けには専門的な専門知識が必要である。さらに、これらの手法は、特に3次元の医療画像処理において、かなりの計算資源を必要とすることが多い。したがって、注釈付きデータや計算資源を限定した医用画像分割のための深層学習技術の適用は、依然として重要な課題である。本稿では,ハイブリッド並列およびシリアルアーキテクチャを用いたHyPSと呼ばれる,パラメータ効率の高いファインチューニング手法を提案する。 HyPSはモデルパラメータの最小限のサブセットを更新し、トレーニング済みモデルの本来の知識構造を維持しながら、下流タスクに関連する特定の特徴を学習する能力を向上する。医用画像分割のための最先端SwinUNETRモデルに適用する。当初、このモデルはBraTs2021データセットで事前トレーニングされ、その後HyPS法が3つの異なる海馬データセットに転送される。さらに, セグメンテーションの結果をもとに, ADNIデータセットから海馬の体積を算出し, それらをメタデータと組み合わせて病型分類を行った。アルツハイマー病(AD)と認知正常(CN)の個人、および早期軽度認知障害(EMCI)と後期軽度認知障害(LMCI)の区別において、HyPSはそれぞれ83.78%と64.29%の分類精度を達成した。以上の結果から,HyPS法はトレーニング済みモデルを用いた海馬セグメンテーションを効果的に促進するだけでなく,アルツハイマー病の検出を支援する可能性も示唆された。私たちのコードは公開されています。 Deep learning methods have significantly advanced medical image segmentation, yet their success hinges on large volumes of manually annotated data, which require specialized expertise for accurate labeling. Additionally, these methods often demand substantial computational resources, particularly for three-dimensional medical imaging tasks. Consequently, applying deep learning techniques for medical image segmentation with limited annotated data and computational resources remains a critical challenge. In this paper, we propose a novel parameter-efficient fine-tuning strategy, termed HyPS, which employs a hybrid parallel and serial architecture. HyPS updates a minimal subset of model parameters, thereby retaining the pre-trained model's original knowledge structure while enhancing its ability to learn specific features relevant to downstream tasks. We apply this strategy to the state-of-the-art SwinUNETR model for medical image segmentation. 
Initially, the model is pre-trained on the BraTs2021 dataset, after which the HyPS method is employed to transfer it to three distinct hippocampus datasets. Extensive experiments demonstrate that HyPS outperforms baseline methods, especially in scenarios with limited training samples. Furthermore, based on the segmentation results, we calculated the hippocampal volumes of subjects from the ADNI dataset and combined these with metadata to classify disease types. In distinguishing Alzheimer's disease (AD) from cognitively normal (CN) individuals, as well as early mild cognitive impairment (EMCI) from late mild cognitive impairment (LMCI), HyPS achieved classification accuracies of 83.78% and 64.29%, respectively. These findings indicate that the HyPS method not only facilitates effective hippocampal segmentation using pre-trained models but also holds potential for aiding Alzheimer's disease detection. Our code is publicly available.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# ユーザプロファイルを考慮した事前学習モデルとパラメータ効率の良いファインチューニングによるユーザ特異的対話生成 User-Specific Dialogue Generation with User Profile-Aware Pre-Training Model and Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2409.00887v1 ) ライセンス: Link先を確認	Atsushi Otsuka, Kazuya Matsuo, Ryo Ishii, Narichika Nomoto, Hiroaki Sugiyama,	(参考訳) 本稿では,ユーザ固有のダイアログについて述べる。パーソナライズされた対話を人格記述で定義した仮想ユーザ対話に焦点をあてた以前の研究とは対照的に、ユーザ固有の対話は、人格に基づく対話以外の実際のユーザ対話を再現することを目的としている。対象ユーザの対話履歴を用いた微調整は,ユーザ固有のモデルの効率的な学習方法である。しかし、少量のデータのために過度に適合し、破壊する傾向がある。そこで本研究では,パラメータ効率の良い微調整と,ユーザプロファイルを含む事前学習された対話モデルを組み合わせることで,ユーザ固有モデルの学習手法を提案する。パラメータ効率の良い微調整は、モデル全体に少数のパラメータを追加するため、少量のトレーニングデータでも効率的にトレーニングすることができ、モデル破壊に対して堅牢である。さらに、自動推論されたユーザプロファイルに対する簡単なプロンプトを追加して学習した事前学習モデルは、微調整中のトレーニングデータが少ない場合でも、ユーザのプロファイルに関する知識を増強した音声を生成することができる。実験では,ユーザの個人情報を含むプロンプトを用いて,提案モデルと大言語モデル発話生成を比較した。実ユーザの発話を再現する実験により,提案モデルでは,小さいモデルであっても,比較手法よりも再現性の高い発話を生成できることが判明した。 This paper addresses user-specific dialogs. In contrast to previous research on personalized dialogue focused on achieving virtual user dialogue as defined by persona descriptions, user-specific dialogue aims to reproduce real-user dialogue beyond persona-based dialogue. Fine-tuning using the target user's dialogue history is an efficient learning method for a user-specific model. However, it is prone to overfitting and model destruction due to the small amount of data. Therefore, we propose a learning method for user-specific models by combining parameter-efficient fine-tuning with a pre-trained dialogue model that includes user profiles. Parameter-efficient fine-tuning adds a small number of parameters to the entire model, so even small amounts of training data can be trained efficiently and are robust to model destruction. In addition, the pre-trained model, which is learned by adding simple prompts for automatically inferred user profiles, can generate speech with enhanced knowledge of the user's profile, even when there is little training data during fine-tuning. 
In experiments, we compared the proposed model with large-language-model utterance generation using prompts containing users' personal information. Experiments reproducing real users' utterances revealed that the proposed model can generate utterances with higher reproducibility than the compared methods, even with a small model.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 配列モデルにおける過度パラメータ化による適応性の向上 Improving Adaptivity via Over-Parameterization in Sequence Models ( http://arxiv.org/abs/2409.00894v1 ) ライセンス: Link先を確認	Yicheng Li, Qian Lin,	(参考訳) カーネルの固有関数がカーネル回帰において重要な役割を果たすことはよく知られている。いくつかの例を通して、同じ固有関数の集合であっても、これらの関数の順序が回帰結果に大きな影響を及ぼすことを示した。カーネルを対角化することでモデルを単純化し、列モデルの領域に過度にパラメータ化された勾配降下を導入し、固定された固有関数集合の様々な順序の影響を捉える。この方法は様々な固有関数順序の影響を探索するために設計されている。理論的には、過パラメータ化勾配流は信号の基盤構造に適応し、バニラ勾配流法を著しく上回ることを示す。さらに,より深いパラメータ化により,モデルの一般化能力がさらに向上することを示す。これらの結果は、オーバーパラメータ化のメリットに関する新たな視点を提供するだけでなく、カーネル体制を超えたニューラルネットワークの適応性と一般化の可能性に関する洞察を提供する。 It is well known that eigenfunctions of a kernel play a crucial role in kernel regression. Through several examples, we demonstrate that even with the same set of eigenfunctions, the order of these functions significantly impacts regression outcomes. Simplifying the model by diagonalizing the kernel, we introduce an over-parameterized gradient descent in the setting of the sequence model to capture the effects of various orders of a fixed set of eigenfunctions. This method is designed to explore the impact of varying eigenfunction orders. Our theoretical results show that the over-parameterized gradient flow can adapt to the underlying structure of the signal and significantly outperform the vanilla gradient flow method. Moreover, we also demonstrate that deeper over-parameterization can further enhance the generalization capability of the model. These results not only provide a new perspective on the benefits of over-parameterization but also offer insights into the adaptivity and generalization potential of neural networks beyond the kernel regime.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# シャロウフェイクとディープフェイクの局所化のためのノイズとエッジ抽出に基づく二分岐法 A Noise and Edge extraction-based dual-branch method for Shallowfake and Deepfake Localization ( http://arxiv.org/abs/2409.00896v1 ) ライセンス: Link先を確認	Deepak Dagar, Dinesh Kumar Vishwakarma,	(参考訳) マルチメディアの信頼性は、高度な画像操作ローカライゼーション(IML)技術によってますます評価され、その結果、IMLフィールドが出現している。有効な操作モデルは、操作された部分と正当な部分の間の非意味的な差分の特徴を抽出し、アーティファクトを利用する必要がある。これは2つの領域間の直接比較を必要とする。現在のモデルでは、手作りの特徴に基づく機能アプローチ、畳み込みニューラルネットワーク(CNN)、あるいは両方を組み合わせたハイブリッドアプローチが採用されている。ハンドクラフト機能アプローチは事前にタンパリングを前提としており、それによって様々なタンパ処理の処理効率が制限されるが、CNNはアーティファクトに対処するには不十分なセマンティック情報をキャプチャする。これらの制約に対処するため,従来のCNN機能と手動で設計した特徴雑音を統合するデュアルブランチモデルを開発した。このモデルはデュアルブランチ戦略を採用しており、一方のブランチはノイズ特性を統合し、もう一方のブランチは階層的なConvNextモジュールを使用してRGB機能を統合する。さらに、エッジ監視損失を利用して境界操作情報を取得し、エッジの正確な位置決めを行う。さらに、この機能拡張モジュールを使用して属性の表示を最適化し、洗練する。 shallowfakesデータセット (CASIA, COVERAGE, COLUMBIA, NIST16) とディープフェイクデータセット Faceforensics++ (FF++) は、他のベースラインモデルと比較して特徴と優れたパフォーマンスを抽出する優れた能力を示すために、徹底的なテストを実施した。 AUCの得点は99%だった。このモデルは比較において優れており、既存の最先端モデル(SoTA)よりも容易に優れている。 The trustworthiness of multimedia is being increasingly evaluated by advanced Image Manipulation Localization (IML) techniques, resulting in the emergence of the IML field. An effective manipulation model necessitates the extraction of non-semantic differential features between manipulated and legitimate sections to utilize artifacts. This requires direct comparisons between the two regions. Current models employ either approaches based on handcrafted features, convolutional neural networks (CNNs), or a hybrid approach that combines both. Handcrafted feature approaches presuppose tampering in advance, hence restricting their effectiveness in handling various tampering procedures, but CNNs capture semantic information, which is insufficient for addressing manipulation artifacts. In order to address these constraints, we have developed a dual-branch model that integrates manually designed feature noise with conventional CNN features. 
This model employs a dual-branch strategy, where one branch integrates noise characteristics and the other branch integrates RGB features using the hierarchical ConvNext Module. In addition, the model utilizes edge supervision loss to acquire boundary manipulation information, resulting in accurate localization at the edges. Furthermore, this architecture utilizes a feature augmentation module to optimize and refine the presentation of attributes. The model was tested thoroughly on shallowfake datasets (CASIA, COVERAGE, COLUMBIA, NIST16) and the deepfake dataset Faceforensics++ (FF++), where it showed strong feature extraction and superior performance compared to other baseline models. The AUC score reached 99%, and the model outperforms the existing state-of-the-art (SoTA) models.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 空に侵入する:地球観測星団におけるデータ遅延とオーバーフロー攻撃 Infiltrating the Sky: Data Delay and Overflow Attacks in Earth Observation Constellations ( http://arxiv.org/abs/2409.00897v1 ) ライセンス: Link先を確認	Xiaojian Wang, Ruozhou Yu, Dejun Yang, Guoliang Xue,	(参考訳) 低地球軌道(LEO)地球観測(EO)衛星は、地球を観測する方法を変えました。移動カメラのように、EO衛星は異なるミッションと優先順位の星座に形成され、処理のために地上に送信する必要がある膨大なデータを捕捉する。しかし、EO衛星はダウンリンク通信能力が非常に限られており、送信帯域、地上局の数と位置、高速衛星移動による小さな送信窓によって制限されている。資源利用を最適化するために、EOコンステレーションは、通信効率の最大化のために、通信スペクトルと地上局を共有することが期待されている。本稿では,EOコンステレーションにおける資源競争による新たな攻撃面について検討し,地球観測データの遅延や低下を正統なEOサービスを用いて検討する。具体的には、攻撃者は高優先度要求を注入して、一時的に低優先度データ送信ウィンドウをプリエンプトすることができる。さらに、予測可能な衛星力学を利用することで、攻撃者は低優先度の衛星から重要なデータを知的にターゲットし、配信を遅らせるか、データを不可逆的に落とすかのどちらかを示す。我々は、データ遅延攻撃とデータオーバーフロー攻撃の2つの攻撃を定式化し、攻撃者が攻撃戦略を考案するのを支援するアルゴリズムを設計し、典型的なシナリオにおけるその実現可能性や最適性を分析する。次に、実世界の衛星画像と軌道データを用いてトレース駆動シミュレーションを行い、現実的な衛星通信環境下でこれらの攻撃を発射する確率を評価する。これらの攻撃に対する防御の可能性についても論じる。 Low Earth Orbit (LEO) Earth Observation (EO) satellites have changed the way we monitor Earth. Acting like moving cameras, EO satellites are formed in constellations with different missions and priorities, and capture vast data that needs to be transmitted to the ground for processing. However, EO satellites have very limited downlink communication capability, limited by transmission bandwidth, number and location of ground stations, and small transmission windows due to high velocity satellite movement. To optimize resource utilization, EO constellations are expected to share communication spectrum and ground stations for maximum communication efficiency. In this paper, we investigate a new attack surface exposed by resource competition in EO constellations, targeting the delay or drop of Earth monitoring data using legitimate EO services. Specifically, an attacker can inject high-priority requests to temporarily preempt low-priority data transmission windows. 
Furthermore, we show that by utilizing predictable satellite dynamics, an attacker can intelligently target critical data from low-priority satellites, either delaying its delivery or irreversibly dropping the data. We formulate two attacks, the data delay attack and the data overflow attack, design algorithms to assist attackers in devising attack strategies, and analyze their feasibility or optimality in typical scenarios. We then conduct trace-driven simulations using real-world satellite images and orbit data to evaluate the success probability of launching these attacks under realistic satellite communication settings. We also discuss possible defenses against these attacks.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 深部ReLUニューラルネットワークを用いたソボレフとベソフ関数の最適近似について On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks ( http://arxiv.org/abs/2409.00901v1 ) ライセンス: Link先を確認	Yunfei Yang,	(参考訳) 本稿では, ソボレフ空間 $\mathcal{W}^{s,q}([0,1]^d)$ および Besov 空間 $\mathcal{B}^s_{q,r}([0,1]^d)$ の関数が, 誤差を $L^p([0,1]^d)$ ノルムで測定した場合に, 幅 $W$, 深さ $L$ の深層ReLUニューラルネットワークによってどの程度効率的に近似できるかという問題を検討する。この問題はいくつかの最近の研究によって研究され、$p=q=\infty$ のときに対数因子を除いて近似率 $\mathcal{O}((WL)^{-2s/d})$ が、ソボレフ埋め込み条件 $1/q -1/p<s/d$ が成り立つときに固定幅のネットワークに対して近似率 $\mathcal{O}(L^{-2s/d})$ が得られている。これらの結果を一般化し、$\mathcal{O}((WL)^{-2s/d})$ が実際にソボレフ埋め込み条件の下で成り立つことを示す。この近似率は対数因子を除いて最適であることが知られている。我々の証明の鍵となるツールは、幅と深さの異なる深層ReLUニューラルネットワークを用いたスパースベクトルの新しい符号化であり、これは独立した関心の対象となりうる。 This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $\mathcal{O}((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $\mathcal{O}(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q -1/p<s/d$ holds. We generalize these results by showing that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# MV-Match:植物栄養失調のドメイン適応同定のためのマルチビューマッチング MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies ( http://arxiv.org/abs/2409.00903v1 ) ライセンス: Link先を確認	Jinhui Yi, Yanan Luo, Marion Deichmann, Gabriel Schaaf, Juergen Gall,	(参考訳) 栄養不足の早期、非侵襲的、オンサイト検出は、栄養不足による作物の大きな損失を防ぐためのタイムリーな行動を可能にするために重要である。ラベル付きデータを取得するのは非常に高価ですが、作物の複数のビューから画像を集めるのは簡単です。実用的な応用に関連があるにもかかわらず、ラベル付けされたソースドメインとラベル付けされていないターゲットドメインに対して複数のビューが利用できる教師なしのドメイン適応は、未調査の研究領域である。そこで本研究では,ソース領域とターゲット領域における複数のカメラビューを活用して,教師なし領域適応を実現する手法を提案する。 2つの栄養失調データセットに対する提案手法の評価を行った。提案手法は、他の教師なし領域適応法と比較して、両データセットの最先端結果を実現する。データセットとソースコードは https://github.com/jh-yi/MV-Match で入手できる。 An early, non-invasive, and on-site detection of nutrient deficiencies is critical to enable timely actions to prevent major losses of crops caused by lack of nutrients. While acquiring labeled data is very expensive, collecting images from multiple views of a crop is straightforward. Despite its relevance for practical applications, unsupervised domain adaptation where multiple views are available for the labeled source domain as well as the unlabeled target domain is an unexplored research area. In this work, we thus propose an approach that leverages multiple camera views in the source and target domain for unsupervised domain adaptation. We evaluate the proposed approach on two nutrient deficiency datasets. The proposed method achieves state-of-the-art results on both datasets compared to other unsupervised domain adaptation methods. The dataset and source code are available at https://github.com/jh-yi/MV-Match.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 不完全な車両軌道予測のためのマルチスケールテンポラル核融合変圧器 Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction ( http://arxiv.org/abs/2409.00904v1 ) ライセンス: Link先を確認	Zhanwen Liu, Chao Li, Yang Wang, Nan Yang, Xing Fan, Jiaqi Ma, Xiangmo Zhao,	(参考訳) 運動予測は自律走行システムにおいて重要な役割を担い、周囲の車両の予測に基づいて、より正確な局所経路計画と運転決定を自動運転車が達成できるようにする。しかし、既存の手法では、実際の交通シナリオにおける軌道予測性能を必然的に低下させるオブジェクトの閉塞や知覚障害などによる潜在的な欠落値を無視している。この制限に対処するために,Multi-scale Temporal Fusion Transformer (MTFT) という,Multi-scale Attention Head (MAH) とContinuity Representation-guided Multi-scale Fusion (CRMF) モジュールからなる,不完全な車両軌道予測のための新しいエンドツーエンドフレームワークを提案する。具体的には、マルチヘッドアテンション機構を利用して、異なる時間的粒度から軌道のマルチスケールの運動表現を並列にキャプチャし、不足値の予測に対する悪影響を軽減する。さらに、マルチスケールの動作表現をCRMFモジュールに入力して、多スケールの融合を行い、車両の頑健な時間的特徴を得る。融合過程において、車両の運動の連続性表現は、最初に時間ステップを通して抽出され、融合を誘導し、結果として生じる時間的特徴が詳細な情報と車両の運動の全体的傾向の両方を包含し、車両の運動傾向と一致する将来の軌道の正確な復号を容易にする。道路交通シナリオと都市交通シナリオから得られた4つのデータセットについて,提案モデルの評価を行った。実験結果から, 不完全な車両軌道予測タスクにおいて, 高Dデータセット上での総合的な性能改善は39%以上であった。 Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local-path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect the potential missing values caused by object occlusion, perception failures, etc., which inevitably degrades the trajectory prediction performance in real traffic scenarios. To address this limitation, we propose a novel end-to-end framework for incomplete vehicle trajectory prediction, named Multi-scale Temporal Fusion Transformer (MTFT), which consists of the Multi-scale Attention Head (MAH) and the Continuity Representation-guided Multi-scale Fusion (CRMF) module. Specifically, the MAH leverages the multi-head attention mechanism to parallelly capture multi-scale motion representation of trajectory from different temporal granularities, thus mitigating the adverse effect of missing values on prediction. 
Furthermore, the multi-scale motion representation is input into the CRMF module for multi-scale fusion to obtain the robust temporal feature of the vehicle. During the fusion process, the continuity representation of vehicle motion is first extracted across time steps to guide the fusion, ensuring that the resulting temporal feature incorporates both detailed information and the overall trend of vehicle motion, which facilitates the accurate decoding of future trajectory that is consistent with the vehicle's motion trend. We evaluate the proposed model on four datasets derived from highway and urban traffic scenarios. The experimental results demonstrate its superior performance in the incomplete vehicle trajectory prediction task compared with state-of-the-art models, e.g., a comprehensive performance improvement of more than 39% on the HighD dataset.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# ViRED: エンジニアリング図面における視覚的関係の予測 ViRED: Prediction of Visual Relations in Engineering Drawings ( http://arxiv.org/abs/2409.00909v1 ) ライセンス: Link先を確認	Chao Gu, Ke Lin, Yiyang Luo, Jiahui Hou, Xiang-Yang Li,	(参考訳) エンジニアリング図面を正確に理解するためには,図面内の画像とその記述表との対応性を確立することが不可欠である。既存の文書理解手法は主にテキストを主なモダリティとして重視するが、実際の画像情報を含む文書には適さない。視覚的関係検出の分野では、タスクの構造は本質的に、描画中のすべてのエンティティペア間の関係を評価する能力を制限する。この問題に対処するため、電気工学図面における表と回路の関係を識別する視覚に基づく関係検出モデルViREDを提案する。我々のモデルは、主にビジョンエンコーダ、オブジェクトエンコーダ、リレーショナルデコーダの3つの部分から構成される。 ViREDをPyTorchで実装し、その性能を評価する。 ViREDの有効性を検証するために,我々は一連の実験を行った。実験結果から,本手法は工学的描画データセットにおいて,関係予測のタスクにおいて96%の精度を達成し,既存の手法よりも大幅に改善したことを示す。結果は、単一のエンジニアリング図面に多数のオブジェクトがある場合でも、ViREDは高速に推論できることを示している。 To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model mainly consists of three parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED using PyTorch to evaluate its performance. To validate the efficacy of ViRED, we conduct a series of experiments. The experimental results indicate that, within the engineering drawing dataset, our approach attained an accuracy of 96% in the task of relation prediction, marking a substantial improvement over existing methodologies. The results also show that ViRED can perform inference quickly even when there are numerous objects in a single engineering drawing.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 磁気三層膜に内在する非線形層間交換結合の駆動 Driving noncollinear interlayer exchange coupling intrinsically in magnetic trilayers ( http://arxiv.org/abs/2409.00911v1 ) ライセンス: Link先を確認	Guan-Wei Peng, Hung-Chin Wang, Yu-Jie Zhong, Chao-Cheng Kaun, Ching-Hao Chang,	(参考訳) 非磁性スペーサを金属三層として挟む強磁性側層は、スピントロニクスデバイスを実現するための重要なプラットフォームとなっている。最近の実験では、導電スペーサの幅や性質を操作することにより、側層間の非線形磁気アライメントが誘導されることが示されている。理論解析の結果,スペーサ幅の変化は層間交換結合(IEC)に大きく影響し,非線形アライメントをもたらすことが明らかとなった。解析および第1原理法により、Agスペーサの特定の幅において、側層の磁気モーメントが垂直であることを示す。このアライメントはAg量子井戸状態によって媒介され、3層にわたってスピンスパイラルを示す。以上の結果から,非直線IECは磁気デバイスを制御する自由度を提供し,輸送能力を向上させたスピントロニクス技術を後押しすることが明らかとなった。 Ferromagnetic side layers sandwiching a nonmagnetic spacer as a metallic trilayer have become a pivotal platform for achieving spintronic devices. Recent experiments demonstrate that manipulating the width or the nature of the conducting spacer induces noncollinear magnetic alignment between the side layers. Our theoretical analysis reveals that altering the width of the spacer significantly affects the interlayer exchange coupling (IEC), resulting in noncollinear alignment. Through analytic and first-principles methods, our study on the Fe/Ag/Fe trilayer shows that at a specific width of the Ag spacer, the magnetic moments of the side layers tend to be perpendicular. This alignment is mediated by Ag quantum well states, exhibiting spin spirals across the trilayer. Our results reveal that the noncollinear IEC offers a degree of freedom to control magnetic devices and boost spintronic technology with improved transport capabilities.	翻訳日:2024-09-06 08:30:49 公開日:2024-09-02
# 外観に基づく視線推定改善のための複数データセットのマージ Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation ( http://arxiv.org/abs/2409.00912v1 ) ライセンス: Link先を確認	Liang Wu, Bertram E. Shi,	(参考訳) 外観に基づく視線推定器のトレーニングとテストのために、複数のデータセットが作成されている。直感的には、より多くのデータがより良いパフォーマンスをもたらすはずです。しかし、1つのエスティマターをトレーニングするためにデータセットを組み合わせることで、視線推定性能が向上することは滅多にない。ひとつは、視線サンプルを得るための実験プロトコルの違いであり、その結果、頭部ポーズの分布、視線角度、照明などの違いが生じる可能性がある。もう一つの理由は、視線角(ラベルミスマッチ)を定義する方法の矛盾である。本稿では、複数のデータセット、推定器アーキテクチャの変更、および視線適応モジュールの導入による視線推定性能の向上のための2つのイノベーションを提案する。ほとんどの最先端推定器は、2つの目と顔全体の画像から抽出された情報と平行に融合するか、最初に目からの情報と顔を組み合わせる。提案手法では,2段階トランスフォーマーを用いたGaze-Feature Fusion (TTGF) 法を用いて,両眼と顔の情報を別々にマージし,両眼にマージする。頭部ポーズの変化が左右の眼像に異なる影響を与えるため,頭部ポーズの変動が改善すると考えられる。提案手法は,各データセットにGaze Adaption Moduleを適用して,単一の共有推定器から推定した推定値を補正することにより,アノテーションの不一致を処理する。これにより、ラベル付けの違いに関わらず、データセット間で情報を結合することができます。我々の経験から、これらのイノベーションは、個人と集団の両方(10%から20%)でSOTAの視線推定性能を改善することが示されています。私たちのコードは https://github.com/HKUST-NISL/GazeSetMerge から入手可能です。 Multiple datasets have been created for training and testing appearance-based gaze estimators. Intuitively, more data should lead to better performance. However, combining datasets to train a single estimator rarely improves gaze estimation performance. One reason may be differences in the experimental protocols used to obtain the gaze samples, resulting in differences in the distributions of head poses, gaze angles, illumination, etc. Another reason may be the inconsistency between methods used to define gaze angles (label mismatch). We propose two innovations to improve the performance of gaze estimation by leveraging multiple datasets, a change in the estimator architecture and the introduction of a gaze adaptation module. Most state-of-the-art estimators merge information extracted from images of the two eyes and the entire face either in parallel or combine information from the eyes first then with the face. 
Our proposed Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately and then merge across the two eyes. We argue that this improves head pose invariance since changes in head pose affect left and right eye images in different ways. Our proposed Gaze Adaptation Module (GAM) method handles annotation inconsistency by applying a Gaze Adaptation Module for each dataset to correct gaze estimates from a single shared estimator. This enables us to combine information across datasets despite differences in labeling. Our experiments show that these innovations improve gaze estimation performance over the SOTA both individually and collectively (by 10% to 20%). Our code is available at https://github.com/HKUST-NISL/GazeSetMerge.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# ネステロフ加速勾配法の一般化連続時間モデル Generalized Continuous-Time Models for Nesterov's Accelerated Gradient Methods ( http://arxiv.org/abs/2409.00913v1 ) ライセンス: Link先を確認	Chanwoong Park, Youngchae Cho, Insoon Yang,	(参考訳) 近年の研究では、ネステロフの加速勾配法を連続時間モデルで理解することへの関心が高まっている。しかし、既存のほとんどの研究はネステロフの方法の特定のクラスに焦点を当てており、これは深い理解と統一された視点の達成を妨げる。この欠点に対処するため、我々は、Nesterovの手法の幅広い範囲をカバーする一般化された連続時間モデルを提示した。主な貢献は以下の通りである。まず、一般化されたモデルの収束率を特定し、それらから派生した任意の特定の連続時間モデルに対する収束率を決定する必要をなくす。第2に,既存の6つの連続時間モデルが一般化されたモデルの特別な場合であることを示し,これらのモデルを分析し,理解するための統一ツールとして,我々のフレームワークを位置づけた。第三に、一般化されたモデルに基づくネステロフの手法の再起動方式を設計し、目的関数値の単調な減少を確実にすることを示す。モデルの広範な適用性のため、このスキームは元の再起動スキームと比較して、より広範なNesterovの手法のクラスに使用することができる。第4に、一般化されたモデルと連続時間における勾配流の関連を明らかにすることにより、一般化されたモデルの加速収束速度が勾配流の時間再パラメータ化に起因することを示す。理論的解析と結果を支援するための数値実験結果を提供する。 Recent research has indicated a substantial rise in interest in understanding Nesterov's accelerated gradient methods via their continuous-time models. However, most existing studies focus on specific classes of Nesterov's methods, which hinders the attainment of an in-depth understanding and a unified perspective. To address this deficit, we present generalized continuous-time models that cover a broad range of Nesterov's methods, including those previously studied under existing continuous-time frameworks. Our key contributions are as follows. First, we identify the convergence rates of the generalized models, eliminating the need to determine the convergence rate for any specific continuous-time model derived from them. Second, we show that six existing continuous-time models are special cases of our generalized models, thereby positioning our framework as a unifying tool for analyzing and understanding these models. Third, we design a restart scheme for Nesterov's methods based on our generalized models and show that it ensures a monotonic decrease in objective function values. 
Owing to the broad applicability of our models, this scheme can be applied to a broader class of Nesterov's methods than the original restart scheme. Fourth, we uncover a connection between our generalized models and gradient flow in continuous time, showing that the accelerated convergence rates of our generalized models can be attributed to a time reparametrization in gradient flow. Numerical experiment results are provided to support our theoretical analyses and results.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 有限熱貯留層下での量子熱機関の内部サイクルにおけるカルノー限界を超えて Beyond the Carnot Limit in the Internal Cycles of a Quantum Heat Engine under Finite Heat Reservoirs ( http://arxiv.org/abs/2409.00914v1 ) ライセンス: Link先を確認	L. -L. Yan, M. -R. Yun, M. Li, S. -L. Su, K. -F. Cui, Gang Chen, M. Feng,	(参考訳) 本研究では, 2つの有限熱貯留層に結合した微視的熱エンジンの量子カルノーサイクルを解析的に検討する。その内部サイクルは, 余剰な量子資源(例えばコヒーレンス, スクイーズ特性など)を消費することなく, 通常のカルノー限界よりも高い効率を持ちうる。エンジンは時間依存で動作し、内部サイクルと外部サイクルの両方が完全なカルノーサイクルを協調的に達成し、エンジンの効率は貯留層の熱容量と作動物質に依存する。最大効率と最大出力の分析結果から, 微視的エンジンの高性能化の背景となるメカニズムを明らかにするとともに, 有限サイズの熱貯留層が果たす重要な役割を明らかにした。提案手法は, あらゆる微視的熱力学系に対して有効であり, 現在の実験室条件下で完全に実現可能である。 We investigate, in an analytical fashion, quantum Carnot cycles of a microscopic heat engine coupled to two finite heat reservoirs, whose internal cycles could attain higher efficiency than the standard Carnot limit without consuming extra quantum resources, e.g., coherence or squeezing properties. The engine runs time-dependently, involving both the internal and external cycles to collaboratively accomplish a complete Carnot cycle, and the efficiency of the engine depends on the reservoirs' heat capacities and the working substance. Our analytical results of the maximum efficiency and the maximum power output clarify the mechanism behind the high performance of the microscopic engines, displaying the key roles played by the finite-sized heat reservoirs. Our proposal is generally valid for any microscopic thermodynamic system and fully feasible under current laboratory conditions.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 大次元における内積核回帰のピンスカー境界について On the Pinsker bound of inner product kernel regression in large dimensions ( http://arxiv.org/abs/2409.00915v1 ) ライセンス: Link先を確認	Weihao Lu, Jialin Ding, Haobo Zhang, Qian Lin,	(参考訳) 特に球面$\mathbb{S}^{d}$上の内積核に関する最近の研究に基づいて、そのような設定における内積核回帰に対するピンスカー境界について検討する。具体的には、サンプルサイズ$n$ が $\alpha d^{\gamma}(1+o_{d}(1))$ によって与えられるシナリオに対処する。我々は、この設定でカーネル回帰の正確なミニマックスリスクを決定し、ミニマックス率だけでなく、過剰リスクに関連するピンスカー定数と呼ばれる正確な定数も特定した。 Building on recent studies of large-dimensional kernel regression, particularly those involving inner product kernels on the sphere $\mathbb{S}^{d}$, we investigate the Pinsker bound for inner product kernel regression in such settings. Specifically, we address the scenario where the sample size $n$ is given by $\alpha d^{\gamma}(1+o_{d}(1))$ for some $\alpha, \gamma>0$. We have determined the exact minimax risk for kernel regression in this setting, not only identifying the minimax rate but also the exact constant, known as the Pinsker constant, associated with the excess risk.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# MMT-BERT:Multitrack Music TransformerとMusicBERTを用いたコード認識シンボリック音楽生成 MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT ( http://arxiv.org/abs/2409.00919v1 ) ライセンス: Link先を確認	Jinlong Zhu, Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama,	(参考訳) シンボリック・マルチトラック音楽生成に特化して設計された, 新しいシンボリック音楽表現と敵対的生成ネットワーク(GAN)フレームワークを提案する。シンボリック音楽生成の主なテーマは、音楽データの前処理とディープラーニング・フレームワークの実装である。シンボリック音楽生成に特化した現在の技術は、一般的に2つの重要な課題に直面する: 訓練データに和音(コード)と音階に関する情報が欠けていること、そしてシンボリック音楽表現のユニークな形式に適合した特別設計のモデル・アーキテクチャが必要なことである。本稿では,MusicLang コード解析モデルを用いた新しいシンボリック音楽表現を導入することで,上記の問題を解決する。また,その表現に適応したMMT-BERTアーキテクチャを提案する。頑健なマルチトラック音楽生成器を構築するため,事前学習したMusicBERTモデルを微調整して判別器として用い,relativistic standard lossを取り入れた。このアプローチは,MusicBERT内に符号化されたシンボリック音楽の深い理解に支えられ,本手法が生成する音楽の協和性と人間らしさを高める。実験により,最先端の手法に厳密に追随する本アプローチの有効性が示された。 We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation. The main theme of symbolic music generation primarily encompasses the preprocessing of music data and the implementation of a deep learning framework. Current techniques dedicated to symbolic music generation generally encounter two significant challenges: training data's lack of information about chords and scales and the requirement of a specially designed model architecture adapted to the unique format of symbolic music representation. In this paper, we solve the above problems by introducing a new symbolic music representation with the MusicLang chord analysis model. We propose our MMT-BERT architecture adapted to the representation. To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator, and incorporate relativistic standard loss. This approach, supported by the in-depth understanding of symbolic music encoded within MusicBERT, fortifies the consonance and humanity of music generated by our method. 
Experimental results demonstrate the effectiveness of our approach which strictly follows the state-of-the-art methods.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# ToolACE: LLM関数呼び出しのポイントを獲得する ToolACE: Winning the Points of LLM Function Calling ( http://arxiv.org/abs/2409.00920v1 ) ライセンス: Link先を確認	Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen,	(参考訳) 関数呼び出しは大規模言語モデルの応用範囲を大幅に拡張するものであり、この能力を引き出すには高品質で多様なトレーニングデータが不可欠である。しかし、実際の関数呼び出しデータは収集と注釈付けが難しい一方で、既存のパイプラインで生成された合成データは、カバレッジと正確性に欠ける傾向にある。本稿では,正確で複雑かつ多様なツール学習データを生成するための自動エージェントパイプラインであるToolACEを提案する。 ToolACEは、新しい自己進化合成プロセスを活用して、26,507の多様なAPIからなる包括的なAPIプールをキュレートする。対話は、複数のエージェント間の相互作用を通じてさらに生成され、形式化された思考プロセスによってガイドされる。データ精度を確保するため、ルールベースとモデルベースのチェックを組み合わせた二層検証システムを実装した。我々は、合成データでトレーニングされたモデルが、わずか8Bパラメータであっても、Berkeley Function-Calling Leaderboardで最先端のパフォーマンスを達成し、最新のGPT-4モデルに匹敵することを実証した。我々のモデルとデータのサブセットは https://huggingface.co/Team-ACE で公開されている。 Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. 
We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 型付きホールを用いた大規模言語モデルの統計的文脈化 Statically Contextualizing Large Language Models with Typed Holes ( http://arxiv.org/abs/2409.00921v1 ) ライセンス: Link先を確認	Andrew Blinn, Xiang Li, June Hyung Kim, Cyrus Omar,	(参考訳) 大規模言語モデル(LLM)は、プログラム合成のランドスケープを形変えた。しかし、現代のLLMベースのコード補完システムは、特にトレーニングデータやカーソルに近い定義で作業する場合に、適切なコンテキストが欠如しているため、壊れたコードを幻覚させることが多い。本稿では,言語サーバが公開している言語型とバインディング構造との密接な統合が,この文脈化問題にトークン効率のよい方法で対処できることを実証する。要するに、AIにもIDEが必要だ、と私たちは主張するのです! 特に,LLMコード生成をHazelのライブプログラムスケッチ環境に統合する。 Hazel Language Serverは、エラーがあっても、穴のタイプと型付けのコンテキストを特定し、有意義なプログラムスケッチが常に利用可能であることを保証します。これにより、コードベース全体のコンテキスト情報をカーソルにレキシカルにローカルでなくても、必ずしも同じファイルにローカルでなくても、開発者の目標にセマンティックにローカルになる可能性がある。 LLMによって合成された補完は、言語サーバーとのさらなる対話を通じて反復的に洗練される。これらの手法を評価するために,MVU (Model-view-update) WebアプリケーションのデータセットであるMVUBenchを紹介する。これらのアプリケーションは、アプリケーション固有のデータ構造に依存しているため、課題として機能する。型定義によるコンテキスト化は,特に影響が大きいことが分かりました。 Hazelのコンテキストでアイデアを導入し、MVUBenchをTypeScriptに移植して、これらのメソッドを高レベルの言語に適用可能であることを検証しました。最後に、言語サーバが実装できる言語サーバプロトコル(LSP)の保守的な拡張であるChatLSPの概要を述べる。 Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate context, particularly when working with definitions not in the training data nor near the cursor. This paper demonstrates that tight integration with the type and binding structure of a language, as exposed by its language server, can address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server identifies the type and typing context of the hole being filled, even in the presence of errors, ensuring that a meaningful program sketch is always available. 
This allows prompting with codebase-wide contextual information not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications. These applications serve as challenge problems due to their reliance on application-specific data structures. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# ProphetFuzz: 大規模言語モデルによるドキュメントのみを用いたハイリスクオプションの組み合わせの完全自動予測とファジング ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model ( http://arxiv.org/abs/2409.00922v1 ) ライセンス: Link先を確認	Dawei Wang, Geng Zhou, Li Chen, Dan Li, Yukai Miao,	(参考訳) オプションの組み合わせに関連する脆弱性は、探索空間が膨大であるため、ソフトウェアのセキュリティテストにおいて重大な課題となる。従来の研究は、主に突然変異やフィルタリング技術によってこの問題に対処していたが、全てのオプションの組み合わせが同等に脆弱性を含みうるものとして非効率に扱っていたため、脆弱でないターゲットに多くの時間が費やされ、結果としてテスト効率が低下していた。本稿では,慎重に設計したプロンプトエンジニアリングを用いて大規模言語モデル(LLM)を駆動し,ハイリスクなオプションの組み合わせ(すなわち脆弱性を含む可能性が高いもの)を予測し,人間の介入なしにファジングを自動的に実施する。我々はProphetFuzzというツールを開発し、関連する3つの研究から収集された52のプログラムからなるデータセット上で評価した。実験全体では10.44 CPU年を消費した。 ProphetFuzzは、プログラムあたり平均わずか8.69ドルのコストで、1748のハイリスクオプションの組み合わせを予測した。 72時間のファジングの後、ProphetFuzzは予測されたハイリスクオプションの組み合わせの12.30\%に関連する364のユニークな脆弱性を発見した。これは同じ時間内に最先端手法が発見した数より32.85\%多い。さらに、ProphetFuzzを使用してこれらのプログラムの最新バージョンに継続的なファジングを行い、140の脆弱性を発見した。そのうち93件が開発者によって確認され、21件にCVE番号が付与された。 Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, thus wasting considerable time on non-vulnerable targets and resulting in low testing efficiency. In this paper, we utilize carefully designed prompt engineering to drive the large language model (LLM) to predict high-risk option combinations (i.e., more likely to contain vulnerabilities) and perform fuzz testing automatically without human intervention. We developed a tool called ProphetFuzz and evaluated it on a dataset comprising 52 programs collected from three related studies. The entire experiment consumed 10.44 CPU years. ProphetFuzz successfully predicted 1748 high-risk option combinations at an average cost of only \$8.69 per program. 
Results show that after 72 hours of fuzzing, ProphetFuzz discovered 364 unique vulnerabilities associated with 12.30\% of the predicted high-risk option combinations, which was 32.85\% higher than that found by state-of-the-art in the same timeframe. Additionally, using ProphetFuzz, we conducted persistent fuzzing on the latest versions of these programs, uncovering 140 vulnerabilities, with 93 confirmed by developers and 21 awarded CVE numbers.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 地下駐車場の稼働予測アルゴリズムの開発 Development of Occupancy Prediction Algorithm for Underground Parking Lots ( http://arxiv.org/abs/2409.00923v1 ) ライセンス: Link先を確認	Shijie Wang,	(参考訳) 本研究の中心となる目的は,地下などの悪環境下での自律運転が直面する認識課題に対処することである。当初,本論文は地下のガレージでデータ収集を開始している。 CARLAシミュレーション環境内にシミュレーションされた地下ガレージモデルを構築し、このシミュレーション環境でセマンティックキティフォーマットの接地真実データを収集する。その後、トランスフォーマーベースのOccupancy Networkモデルを統合し、このシナリオ内での占有グリッド予測タスクを完了させる。包括的なBEV認識フレームワークは、薄暗い、挑戦的な自律運転環境において、ニューラルネットワークモデルの精度を高めるように設計されている。最後に,提案手法の地下シナリオにおける知覚性能の精度を検証する実験を行った。提案手法は自作の地下ガレージデータセットであるSUSTech-COE-ParkingLotでテストし,良好な結果を得た。 The core objective of this study is to address the perception challenges faced by autonomous driving in adverse environments like basements. Initially, this paper commences with data collection in an underground garage. A simulated underground garage model is established within the CARLA simulation environment, and SemanticKITTI format occupancy ground truth data is collected in this simulated setting. Subsequently, the study integrates a Transformer-based Occupancy Network model to complete the occupancy grid prediction task within this scenario. A comprehensive BEV perception framework is designed to enhance the accuracy of neural network models in dimly lit, challenging autonomous driving environments. Finally, experiments validate the accuracy of the proposed solution's perception performance in basement scenarios. The proposed solution is tested on our self-constructed underground garage dataset, SUSTech-COE-ParkingLot, yielding satisfactory results.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# MedSAM-U:信頼性の高いMedSAMのための不確かさ誘導オートマルチプロンプト適応 MedSAM-U: Uncertainty-Guided Auto Multi-Prompt Adaptation for Reliable MedSAM ( http://arxiv.org/abs/2409.00924v1 ) ライセンス: Link先を確認	Nan Zhou, Ke Zou, Kai Ren, Mengting Luo, Linchao He, Meng Wang, Yidi Chen, Yi Zhang, Hu Chen, Huazhu Fu,	(参考訳) 医用セグメンテーションモデル (MedSAM) は, 医用画像のセグメンテーションにおいて顕著な性能を示し, この分野に大きな注目を集めている。しかし、異なるプロンプトタイプや場所に対する感度が問題となる。本稿では,MedSAMの精度を高める信頼性の高いプロンプトの開発に焦点を当て,これらの課題に対処する。我々はMedSAM-Uを導入する。MedSAM-Uは、より信頼性が高く正確な医用画像セグメンテーションのために、マルチプロンプト入力を自動的に洗練するための不確実性誘導フレームワークである。具体的には、まずMedSAMと統合されたMPA-MedSAMをトレーニングし、多様なMedSAM入力に適応させる。次に、不確実性誘導型マルチプロンプトを用いて、プロンプトと初期セグメンテーション結果に関する不確実性を効果的に推定する。特に、新しい不確実性誘導プロンプト適応手法が自動的に適用され、信頼性の高いプロンプトとその対応するセグメンテーション結果が導出される。複数のモードからのデータセットを用いてMedSAM-Uを検証し、普遍的な画像分割モデルを訓練する。 MedSAMと比較して、5つの異なるモーダルデータセットの実験結果から、提案したMedSAM-Uは、不確実性誘導されたプロンプトで平均1.7\%から20.5\%の性能向上を達成することが示された。 The Medical Segment Anything Model (MedSAM) has shown remarkable performance in medical image segmentation, drawing significant attention in the field. However, its sensitivity to varying prompt types and locations poses challenges. This paper addresses these challenges by focusing on the development of reliable prompts that enhance MedSAM's accuracy. We introduce MedSAM-U, an uncertainty-guided framework designed to automatically refine multi-prompt inputs for more reliable and precise medical image segmentation. Specifically, we first train a Multi-Prompt Adapter integrated with MedSAM, creating MPA-MedSAM, to adapt to diverse multi-prompt inputs. We then employ uncertainty-guided multi-prompt to effectively estimate the uncertainties associated with the prompts and their initial segmentation results. In particular, a novel uncertainty-guided prompts adaptation technique is then applied automatically to derive reliable prompts and their corresponding segmentation outcomes. We validate MedSAM-U using datasets from multiple modalities to train a universal image segmentation model. 
Compared to MedSAM, experimental results on five distinct modal datasets demonstrate that the proposed MedSAM-U achieves an average performance improvement of 1.7\% to 20.5\% across uncertainty-guided prompts.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 授業場面における学生行動に向けて:新しいデータセットとベースライン Towards Student Actions in Classroom Scenes: New Dataset and Baseline ( http://arxiv.org/abs/2409.00926v1 ) ライセンス: Link先を確認	Zhuolin Tan, Chenqiang Gao, Anyong Qin, Ruixin Chen, Tiecheng Song, Feng Yang, Deyu Meng,	(参考訳) 学生行動の分析は、教育研究において重要かつ困難な課題である。既存の取り組みは、教室の微妙なアクションダイナミクスを捉えるためのアクセス可能なデータセットが欠如していることによって妨げられている。本稿では,複雑な教室シーンを対象とした新しいマルチラベル学生行動ビデオ(SAV)データセットを提案する。データセットは、758の異なる教室から得た、4,324の慎重にトリミングされたビデオクリップで構成され、それぞれに教室で生徒が示す15種類のアクションがラベル付けされている。既存の行動データセットと比較して、我々のデータセットは、さまざまな実際の教室シナリオ、高品質のビデオデータ、そして微妙な動きの違い、密集した物体との関わり、大きなスケールの違い、様々な撮影角度、視覚的遮蔽といったユニークな課題を提供することで際立っている。データセットの複雑さが増大することで、アクション検出のベンチマークに新たな機会と課題がもたらされる。また,小型で高密度な対象領域における重要な局所的細部への注意を高めるビジュアルトランスフォーマーを、新しいベースライン手法として提案する。提案手法は, SAVとAVAでそれぞれ67.9\%と27.4\%の平均適合率(mAP)を達成し, 優れた性能を示した。この論文は、データセットを提供するだけでなく、教育方法論や学習成果を変革しうるAI駆動型教育ツールのさらなる研究も求めている。コードとデータセットは https://github.com/Ritatanz/SAV で公開される予定である。 Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets to capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label student action video (SAV) dataset for complex classroom scenes. The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each labeled with 15 different actions displayed by students in classrooms. Compared to existing behavioral datasets, our dataset stands out by providing a wide range of real classroom scenarios, high-quality video data, and unique challenges, including subtle movement differences, dense object engagement, significant scale differences, varied shooting angles, and visual occlusion. The increased complexity of the dataset brings new opportunities and challenges for benchmarking action detection. Innovatively, we also propose a new baseline method, a visual transformer for enhancing attention to key local details in small and dense object regions. 
Our method achieves excellent performance with mean Average Precision (mAP) of 67.9\% and 27.4\% on SAV and AVA, respectively. This paper not only provides the dataset but also calls for further research into AI-driven educational tools that may transform teaching methodologies and learning outcomes. The code and dataset will be released at https://github.com/Ritatanz/SAV.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 自己判断: 適応自己評価による選択的指導 Self-Judge: Selective Instruction Following with Alignment Self-Evaluation ( http://arxiv.org/abs/2409.00935v1 ) ライセンス: Link先を確認	Hai Ye, Hwee Tou Ng,	(参考訳) 事前訓練された大規模言語モデル(LLM)は、命令チューニングを通じて人間の指示に従うように調整することができる。しかし、テストタイムデータの分散が変化しているため、チャットアシスタントとして振る舞う際に、現実的なエラーやコンテンツに不一致を生じさせる可能性のある命令を常に正確に実行するわけではない。そこで本研究では,次の命令に対するLCMの信頼性を高めるために,期待する応答品質が低ければ命令の実行を減らし,選択的な命令に従うことを提案する。我々は、モデル応答の数値的品質スコアを予測できる判断モデルを訓練する。データ不足に対処するために、人間に注釈付けされた品質スコアを必要とせずに、判断モデルを開発するための新しい自己学習フレームワークであるSelf-Jを導入する。提案手法はモデル固有の自己評価機能を利用して,ラベル付き命令チューニングデータから応答品質に関する情報を抽出する。応答サンプルとゴールド参照のセマンティックな類似性を評価することにより、自己評価と再検討を容易にするために、ゴールド参照応答が組み込まれている。トレーニング期間中に,基準自由推定の能力を高めるために,正則化手法として自己蒸留を実装した。一般的な指示追従タスクにおけるアライメント評価を検証するため,Hugging Faceから大規模高品質な命令を収集し,モデルトレーニングと評価を行った。提案手法は, GPT-4およびGPT-3.5-turboから抽出した教師モデルよりも, GPT-4との相関性が高いことを示す。我々の分析は、ドメイン間のモデルの強い一般化を示している。さらに、審査モデルは、例えば、WizardLM-13B-V1.2を89.17から92.48に引き上げ、AlpacaEvalのバージョンv1とv2の12.03から15.90にそれぞれ、ベストオブ32サンプリングを使用して、報奨モデルとして機能する。 Pre-trained large language models (LLMs) can be tailored to adhere to human instructions through instruction tuning. However, due to shifts in the distribution of test-time data, they may not always execute instructions accurately, potentially generating factual errors or misaligned content when acting as chat assistants. To enhance the reliability of LLMs in following instructions, we propose the study of selective instruction following, whereby the system declines to execute instructions if the anticipated response quality is low. We train judge models that can predict numerical quality scores for model responses. To address data scarcity, we introduce Self-J, a novel self-training framework for developing judge models without needing human-annotated quality scores. Our method leverages the model's inherent self-evaluation capability to extract information about response quality from labeled instruction-tuning data. 
It incorporates a gold reference answer to facilitate self-evaluation and recalibrates by assessing the semantic similarity between the response sample and the gold reference. During the training phase, we implement self-distillation as a regularization technique to enhance the capability of reference-free estimation. To validate alignment evaluation on general instruction-following tasks, we collect large-scale high-quality instructions from Hugging Face for model training and evaluation. Extensive experiments on five open-source models show that our method correlates much more with GPT-4 than strong baselines, e.g., supervised models distilled from GPT-4 and GPT-3.5-turbo. Our analysis shows our model's strong generalization across domains. Additionally, our judge models serve as good reward models, e.g., boosting WizardLM-13B-V1.2 from 89.17 to 92.48 and from 12.03 to 15.90 in version v1 and v2 of AlpacaEval respectively using best-of-32 sampling with our judge models.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 感性トピックの自動検出のための大規模言語モデル Large Language Models for Automatic Detection of Sensitive Topics ( http://arxiv.org/abs/2409.00940v1 ) ライセンス: Link先を確認	Ruoyu Wen, Stephanie Elena Crowe, Kunal Gupta, Xinyue Li, Mark Billinghurst, Simon Hoermann, Dwain Allan, Alaeddin Nassani, Thammathip Piumsomboon,	(参考訳) 安全なオンラインコミュニティを維持するためには、コンテンツモデレーションにおいて、敏感な情報検出が不可欠である。従来の手作業で補助することで、人間のモデレーターが圧倒的で面倒な作業から解放され、潜在的なリスクをもたらす可能性のあるフラグ付きコンテンツのみに集中できるようになる。急速に進歩する大規模言語モデル(LLM)は、自然言語を理解し処理する能力で知られており、このプロセスをサポートする潜在的なソリューションを提供する。本研究は,2つのオンラインデータセット内のメンタルヘルス領域における機密メッセージを検出するための5つのLLMの機能について検討し,精度,精度,リコール,F1スコア,一貫性の観点からその性能を評価する。以上の結果から, LLM はモデレーションワークフローに, 簡便かつ高精度な検出ツールとして組み込まれる可能性が示唆された。最高のパフォーマンスモデルであるGPT-4oは平均精度99.5\%、F1スコア0.99を達成した。我々は、モデレーションワークフローでLLMを使うことの利点と潜在的な課題について論じ、将来の研究は、この技術を利用する際の倫理的考慮事項に対処すべきだと提案する。 Sensitive information detection is crucial in content moderation to maintain safe online communities. Assisting in this traditionally manual process could relieve human moderators from overwhelming and tedious tasks, allowing them to focus solely on flagged content that may pose potential risks. Rapidly advancing large language models (LLMs) are known for their capability to understand and process natural language and so present a potential solution to support this process. This study explores the capabilities of five LLMs for detecting sensitive messages in the mental well-being domain within two online datasets and assesses their performance in terms of accuracy, precision, recall, F1 scores, and consistency. Our findings indicate that LLMs have the potential to be integrated into the moderation workflow as a convenient and precise detection tool. The best-performing model, GPT-4o, achieved an average accuracy of 99.5\% and an F1-score of 0.99. We discuss the advantages and potential challenges of using LLMs in the moderation workflow and suggest that future research should address the ethical considerations of utilising this technology.	
翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# VQ-Flow:階層ベクトル量子化によるマルチクラス異常検出のための正規化フローのモデリング VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization ( http://arxiv.org/abs/2409.00942v1 ) ライセンス: Link先を確認	Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen,	(参考訳) 複雑なデータ分布をモデル化する能力で知られる確率モデルの一種である正規化フローは、教師なし異常検出において顕著な効果を示してきた。本稿では,クラスラベルが与えられないまま複数クラスの正常データが混在するマルチクラス異常検出における、正規化フローの可能性について検討する。ベクトル量子化(VQ)の統合により,多クラス正常データの異なる概念を教師なしで識別する能力をフローモデルに与え,VQ-Flowと呼ばれる新しいフローベース統一手法を実現する。具体的には,VQ-Flowは階層的ベクトル量子化を用いて,概念識別のための概念プロトタイプコードブック (Conceptual Prototype Codebook, CPC) と,それに付随して概念固有の正常パターンを捉える概念固有パターンコードブック (Concept-Specific Pattern Codebook, CSPC) という2つの関連するコードブックを推定する。 VQ-Flowのフローモデルは、CSPCで捉えられた概念固有のパターンで条件付けられており、異なる概念に関連する特定の正常パターンをモデル化することができる。さらに、CPCは、VQ-Flowによる概念認識分布モデリングを可能にし、概念プロトタイプ上で再パラメータ化された混合ガウス分布を通して、複雑な多クラス正常分布を忠実に模倣する。ベクトル量子化の導入により、提案したVQ-Flowは、統一的なトレーニングスキーム内での多クラス異常検出において最先端を更新し、MVTec ADでDet./Loc. AUROC 99.5%/98.3%を達成した。コードベースは https://github.com/cool-xuan/vqflow で公開されている。 Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of vector quantization (VQ), we empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow. Specifically, our VQ-Flow leverages hierarchical vector quantization to estimate two relative codebooks: a Conceptual Prototype Codebook (CPC) for concept distinction and its concomitant Concept-Specific Pattern Codebook (CSPC) to capture concept-specific normal patterns. 
The flow models in VQ-Flow are conditioned on the concept-specific patterns captured in CSPC, capable of modeling specific normal patterns associated with different concepts. Moreover, CPC further enables our VQ-Flow for concept-aware distribution modeling, faithfully mimicking the intricate multi-class normal distribution through a mixed Gaussian distribution reparametrized on the conceptual prototypes. Through the introduction of vector quantization, the proposed VQ-Flow advances the state-of-the-art in multi-class anomaly detection within a unified training scheme, yielding the Det./Loc. AUROC of 99.5%/98.3% on MVTec AD. The codebase is publicly available at https://github.com/cool-xuan/vqflow.	翻訳日:2024-09-06 08:21:03 公開日:2024-09-02
# 大規模言語モデルを用いた合成音声会話生成のためのフレームワーク A Framework for Synthetic Audio Conversations Generation using Large Language Models ( http://arxiv.org/abs/2409.00946v1 ) ライセンス: Link先を確認	Kaung Myat Kyaw, Jonathan Hoyin Chan,	(参考訳) 本稿では,複数のペルソナ設定を持つ大規模言語モデル(LLM)を用いて合成会話音声を生成するためのフレームワークであるConversaSynthを紹介する。このフレームワークはまず、さまざまなトピックにわたる多様で一貫性のあるテキストベースの対話を生成し、その後、TTS(text-to-speech)システムを使用して音声に変換する。実験の結果、ConversaSynthは高品質な合成音声データセットを効果的に生成し、音声タグ付け、音声分類、マルチスピーカ音声認識のためのモデルの訓練と評価を大幅に向上させうることがわかった。その結果、ConversaSynthが生成した合成データセットには、かなりの多様性とリアリズムがあり、堅牢で適応可能なオーディオベースのAIシステムの開発に適していることが示唆された。 In this paper, we introduce ConversaSynth, a framework designed to generate synthetic conversation audio using large language models (LLMs) with multiple persona settings. The framework first creates diverse and coherent text-based dialogues across various topics, which are then converted into audio using text-to-speech (TTS) systems. Our experiments demonstrate that ConversaSynth effectively generates high-quality synthetic audio datasets, which can significantly enhance the training and evaluation of models for audio tagging, audio classification, and multi-speaker speech recognition. The results indicate that the synthetic datasets generated by ConversaSynth exhibit substantial diversity and realism, making them suitable for developing robust, adaptable audio-based AI systems.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# XNet v2: 制限が少なく、結果が良く、より普遍性が高い XNet v2: Fewer Limitations, Better Results and Greater Universality ( http://arxiv.org/abs/2409.00947v1 ) ライセンス: Link先を確認	Yanfeng Zhou, Lingrui Li, Zichen Wang, Guole Liu, Ziwen Liu, Ge Yang,	(参考訳) XNetは、完全教師ありおよび半教師ありのバイオメディカルセグメンテーションのための、ウェーブレットベースのX字型統一アーキテクチャを導入している。しかし、これまでのところXNetは、画像に高周波(HF)情報がない場合の性能低下、生画像の活用不足、融合の不十分さといった制限に直面している。これらの問題に対処するため、低周波・高周波補完モデルであるXNet v2を提案する。 XNet v2は、ウェーブレットベースの画像レベルの相補的融合を行い、融合結果と生画像を3つの異なるサブネットワークに入力して一貫性損失を構築する。さらに,低周波(LF)情報とHF情報の伝達を強化する特徴レベルの融合モジュールを導入する。 XNet v2は、半教師ありセグメンテーションで最先端の性能を達成しつつ、完全教師あり学習でも競争力のある結果を維持する。さらに重要なのは、XNetが失敗するシナリオにおいて、XNet v2が優れていることだ。 XNetと比較して、XNet v2はより少ない制限、より良い結果、より大きな普遍性を示す。 3つの2Dデータセットと2つの3Dデータセットに関する大規模な実験は、XNet v2の有効性を示している。コードは https://github.com/Yanfeng-Zhou/XNetv2 で入手できる。 XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. So far, however, XNet still faces limitations, including performance degradation when images lack high-frequency (HF) information, underutilization of raw images and insufficient fusion. To address these issues, we propose XNet v2, a low- and high-frequency complementary model. XNet v2 performs wavelet-based image-level complementary fusion, feeding the fusion results along with raw images into three different sub-networks to construct a consistency loss. Furthermore, we introduce a feature-level fusion module to enhance the transfer of low-frequency (LF) information and HF information. XNet v2 achieves state-of-the-art performance in semi-supervised segmentation while maintaining competitive results in fully-supervised learning. More importantly, XNet v2 excels in scenarios where XNet fails. Compared to XNet, XNet v2 exhibits fewer limitations, better results and greater universality. Extensive experiments on three 2D and two 3D datasets demonstrate the effectiveness of XNet v2. Code is available at https://github.com/Yanfeng-Zhou/XNetv2.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# 汎用ロボット学習のための意味制御可能な拡張 Semantically Controllable Augmentations for Generalizable Robot Learning ( http://arxiv.org/abs/2409.00951v1 ) ライセンス: Link先を確認	Zoey Chen, Zhao Mandi, Homanga Bharadhwaj, Mohit Sharma, Shuran Song, Abhishek Gupta, Vikash Kumar,	(参考訳) ロボット操作の現実に見えないシナリオへの一般化には、トレーニング中にさまざまなデータセットを公開する必要がある。しかし、運用コストが高いため、大規模な実世界のデータセットの収集は困難である。これらの課題にもかかわらず、ロボット学習が一般化するには、ロボットの直接的な経験を超えて、データや事前のソースを活用することが不可欠である。本研究では,大量のWebスクラッドデータに対して事前学習された画像テキスト生成モデルが,そのようなデータソースとして機能することを示す。これらの生成モデルは、ロボットの直接体験を超えた幅広い現実のシナリオを含み、ロボットエージェントが現実世界の一般化を余分なコストで支援する追加の世界に露出する新しい合成体験を合成することができる。特に,本手法では,事前学習した生成モデルをデータ拡張の有効なツールとして活用する。本稿では,実世界の一般化を可能にする豊富なバリエーションを誘導しながら,意味制御可能な拡張とロボットデータセットの高速乗算のための生成的拡張フレームワークを提案する。ロボットデータの多種多様な拡張に基づいて、シミュレーションとキッチンやテーブルトップのような目に見えない現実環境の両方において、スケーラブルなロボット操作ポリシーがいかに訓練され、デプロイされるかを示す。実世界の多様なロボットアプリケーションにおける画像テキスト生成モデルの有効性を実証することにより、我々の生成拡張フレームワークは、人間の余分なコストでロボット学習の一般化を促進するためのスケーラブルで効率的な経路を提供する。 Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training. However, collecting large real-world datasets is intractable due to high operational costs. For robot learning to generalize despite these challenges, it is essential to leverage sources of data or priors beyond the robot's direct experience. In this work, we posit that image-text generative models, which are pre-trained on large corpora of web-scraped data, can serve as such a data source. These generative models encompass a broad range of real-world scenarios beyond a robot's direct experience and can synthesize novel synthetic experiences that expose robotic agents to additional world priors aiding real-world generalization at no extra cost. In particular, our approach leverages pre-trained generative models as an effective tool for data augmentation. 
We propose a generative augmentation framework for semantically controllable augmentations and rapidly multiplying robot datasets while inducing rich variations that enable real-world generalization. Based on diverse augmentations of robot data, we show how scalable robot manipulation policies can be trained and deployed both in simulation and in unseen real-world environments such as kitchens and table-tops. By demonstrating the effectiveness of image-text generative models in diverse real-world robotic applications, our generative augmentation framework provides a scalable and efficient path for boosting generalization in robot learning at no extra human cost.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# 多体断熱通路:不安定、カオス、量子古典対応 Many-body adiabatic passage: Instability, chaos, and quantum classical correspondence ( http://arxiv.org/abs/2409.00952v1 ) ライセンス: Link先を確認	Anant Vijay Varma, Amichay Vardi, Doron Cohen,	(参考訳) 相互作用するボソン系の断熱通路は、相互作用と粒子間の絡み合いによって大きく影響を受ける。我々は,低次元カオス(3サイト連鎖)および高次元カオス(3サイト以上)を示すBose-Hubbard鎖におけるSTIRAP様のスキームを考える。転送プロトコルによって生成されるダイナミクスは、平均場古典的処理、トランケート・ウィグナー半古典的処理、および全多体量子シミュレーションにおいて現れる古典的および量子的カオス指紋を示す。 Adiabatic passage in systems of interacting bosons is substantially affected by interactions and inter-particle entanglement. We consider STIRAP-like schemes in Bose-Hubbard chains that exhibit low-dimensional chaos (a 3 site chain), and high-dimensional chaos (more than 3 sites). The dynamics that is generated by a transfer protocol exhibits striking classical and quantum chaos fingerprints that are manifest in the mean-field classical treatment, in the truncated-Wigner semiclassical treatment, and in the full many-body quantum simulations.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# PNVC:実践的なINRベースのビデオ圧縮を目指して PNVC: Towards Practical INR-based Video Compression ( http://arxiv.org/abs/2409.00953v1 ) ライセンス: Link先を確認	Ge Gao, Ho Man Kwan, Fan Zhang, David Bull,	(参考訳) ニューラルビデオ圧縮は、最近、レート品質のパフォーマンスの観点から、従来のビデオコーデックと競合する大きな可能性を示している。しかしながら、これらの学習ビデオコーデックは、デコード複雑性(オートエンコーダベースの方法)や/またはシステム遅延(暗黙のニューラル表現(INR)ベースのモデル)に関連する様々な問題と関連付けられており、現在、それらが実用的なアプリケーションにデプロイされることを防いでいる。本稿では,実用的なニューラルビデオコーデックをターゲットとして,自動エンコーダと過度に適合したソリューションを革新的に組み合わせた,新しいINRベースのコーディングフレームワークであるPNVCを提案する。我々のアプローチは、新しい構造的再パラメータ化に基づくアーキテクチャ、階層的品質制御、変調に基づくエントロピーモデリング、スケールアウェアな位置埋め込みなど、いくつかの設計革新の恩恵を受けている。低遅延(LD)とランダムアクセス(RA)の両方をサポートしているため、PNVCは既存のINRベースのコーデックよりも優れており、HEVC HM 18.0(LD)に対して35%以上のBDレートの保存を実現している。これは、INRベースのビデオコーディングにとって重要な一歩であり、実践的なデプロイメントに向かっている。ソースコードは公開評価のために利用できる。 Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. 
This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# 物理インフォームドニューラルネットワークを用いたディジタル画像相関法 Physics-Informed Neural Network Based Digital Image Correlation Method ( http://arxiv.org/abs/2409.00956v1 ) ライセンス: Link先を確認	Boda Li, Shichao Zhou, Qinwei Ma, Shaopeng Ma,	(参考訳) ディジタル画像相関(DIC)は、従来、変位場を決定するためにサブセットマッチングに頼っていた、フルフィールドの変形測定のための実験力学における鍵となる技術である。しかし、不均一な変形シナリオでは、形状関数やサブセットサイズのような最適なパラメータを選択することは困難である。最近のディープラーニングベースのDICアプローチは、教師付きと教師なしの両方で、ニューラルネットワークを使用してスペックル画像を変形場にマッピングし、手動チューニングなしで正確な測定を提供する。しかし,これらの手法ではスペックル画像の特徴を抽出するために複雑なネットワークアーキテクチャを必要とするため,解の精度は保証されない。本稿では、物理インフォームドニューラルネットワーク(PINN)に基づく新しいDIC法であるPINN-DICを紹介する。従来のアプローチとは異なり、PINN-DICは、座標領域を入力として、変位場を出力する単純な完全に接続されたニューラルネットワークを使用する。 DIC制御方程式を損失関数に統合することにより、PINN-DICは、反復最適化により参照および変形スペックル画像から直接変位場を抽出する。シミュレーションおよび実実験による評価は、PINN-DICが非一様分野における深層学習に基づくDICの精度を維持しつつ、3つの異なる利点を提供していることを示している。 1) 座標から変位場を直接フィッティングすることによる、より単純なネットワークでの精度の向上。 2) 最小限のパラメータ調整による不規則境界変位場の効果的な取扱い。 3) 包括的DIC結果解析のための、他のニューラルネットワークに基づく機械的解析手法との容易な統合。 Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. Recent deep learning-based DIC approaches, both supervised and unsupervised, use neural networks to map speckle images to deformation fields, offering precise measurements without manual tuning. However, these methods require complex network architectures to extract speckle image features, which does not guarantee solution accuracy. This paper introduces PINN-DIC, a novel DIC method based on Physics-Informed Neural Networks (PINNs). Unlike traditional approaches, PINN-DIC uses a simple fully connected neural network that takes the coordinate domain as input and outputs the displacement field. By integrating the DIC governing equation into the loss function, PINN-DIC directly extracts the displacement field from reference and deformed speckle images through iterative optimization. 
Evaluations on simulated and real experiments demonstrate that PINN-DIC maintains the accuracy of deep learning-based DIC in non-uniform fields while offering three distinct advantages: 1) enhanced precision with a simpler network by directly fitting the displacement field from coordinates, 2) effective handling of irregular boundary displacement fields with minimal parameter adjustments, and 3) easy integration with other neural network-based mechanical analysis methods for comprehensive DIC result analysis.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# 同時音声間翻訳で最先端の性能を得るには何が必要か? What does it take to get state of the art in simultaneous speech-to-speech translation? ( http://arxiv.org/abs/2409.00965v1 ) ライセンス: Link先を確認	Vincent Wilmet, Johnson Du,	(参考訳) 本稿では, 同時音声間翻訳(speech-to-speech)モデルの性能に見られる遅延特性を詳細に解析し, 特に幻覚に起因する遅延スパイクに着目する。様々な入力パラメータや条件を体系的に実験することにより、遅延スパイクを最小限に抑え、全体的な性能を改善する方法を提案する。この結果から,注意深い入力管理と戦略的パラメータ調整を組み合わせることで,音声間翻訳モデルの遅延挙動を大幅に改善できることが示唆された。 This paper presents an in-depth analysis of the latency characteristics observed in the performance of a simultaneous speech-to-speech model, particularly focusing on hallucination-induced latency spikes. By systematically experimenting with various input parameters and conditions, we propose methods to minimize latency spikes and improve overall performance. The findings suggest that a combination of careful input management and strategic parameter adjustments can significantly enhance the latency behavior of speech-to-speech models.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# 低次多項式による相関確率ブロックモデル検出のための計算遷移 A computational transition for detecting correlated stochastic block models by low-degree polynomials ( http://arxiv.org/abs/2409.00966v1 ) ライセンス: Link先を確認	Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li,	(参考訳) 一対のランダムグラフにおける相関性の検出は、近年広く研究されている基本的な統計的および計算上の問題である。この研究では、相関(スパース)確率ブロックモデル $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$を共通の親確率ブロックモデル $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric community, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, subsampling probability $s$とみなす。このモデルを同一辺密度$\mathcal{G}(n,\tfrac{\lambda s}{n})$の独立したErd\H{o}s-R\'enyiグラフと区別する検出問題に対して、隣接行列のエントリの \emph{low-degree polynomials} に基づくテストに焦点を合わせ、容易かつ難しい規則を分離するしきい値を決定する。より正確には、このテストのクラスがこれらの2つのモデルを区別できることは、$s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is the Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum thresholdである場合に限る。低次硬さの証明は、低次硬さ計算の条件変種に基づいている。 Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communities, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, and subsampling probability $s$. For the detection problem of distinguishing this model from a pair of independent Erd\H{o}s-R\'enyi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes. 
More precisely, we show that this class of tests can distinguish these two models if and only if $s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is the Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum threshold. Our proof of low-degree hardness is based on a conditional variant of the low-degree likelihood calculation.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
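The threshold quoted in the abstract above can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names and the sample values of λ, ε, and s are assumptions chosen for illustration, and only the condition s > min{√α, 1/(λε²)} with α ≈ 0.338 is taken from the abstract.

```python
import math

OTTER_ALPHA = 0.338  # approximate value of Otter's constant, as quoted in the abstract


def low_degree_threshold(lam: float, eps: float) -> float:
    """Return min(sqrt(alpha), 1 / (lam * eps**2)), the threshold on s from the abstract."""
    kesten_stigum = 1.0 / (lam * eps ** 2)
    return min(math.sqrt(OTTER_ALPHA), kesten_stigum)


def low_degree_tests_succeed(s: float, lam: float, eps: float) -> bool:
    """True when the subsampling probability s lies strictly above the threshold."""
    return s > low_degree_threshold(lam, eps)


# Illustrative parameters (assumptions, not values from the paper).
# Here the Kesten-Stigum term 1 / (10 * 0.25) is smaller than sqrt(alpha).
print(low_degree_threshold(lam=10.0, eps=0.5))
print(low_degree_tests_succeed(0.5, 10.0, 0.5))
print(low_degree_tests_succeed(0.3, 10.0, 0.5))
```

Which of the two terms is active depends on λε²: for small λε² the Kesten-Stigum term exceeds √α, so the Otter-constant term sets the threshold.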
# グラフニューラルネットワークを用いた深層強化学習による統合プロセス計画とスケジューリング問題の解法 Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning ( http://arxiv.org/abs/2409.00968v1 ) ライセンス: Link先を確認	Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge,	(参考訳) 統合プロセス計画とスケジューリング(IPPS)問題は、プロセスルート計画とショップスケジューリングを組み合わせることで、生産の効率化と資源利用の最大化を実現している。混合整数線形計画法(MILP)とヒューリスティックアルゴリズムを用いる従来の手法では、IPPSを解く際の解の質と速度のバランスが良くない。本稿では,新しいエンドツーエンドのDeep Reinforcement Learning(DRL)手法を提案する。我々は、IPPS問題をマルコフ決定プロセス(MDP)としてモデル化し、不均一グラフニューラルネットワーク(GNN)を用いて、操作、機械、ジョブ間の複雑な関係を捉える。スケジューリング戦略の最適化にはPPO(Proximal Policy Optimization)を用いる。実験の結果,提案手法は従来手法と比較して,大規模IPPSインスタンスのソリューション効率と品質を著しく向上させ,現代のインテリジェント製造システムにおいて優れたスケジューリング戦略を提供することが示された。 The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms can not well balance solution quality and speed when solving IPPS. In this paper, we propose a novel end-to-end Deep Reinforcement Learning (DRL) method. We model the IPPS problem as a Markov Decision Process (MDP) and employ a Heterogeneous Graph Neural Network (GNN) to capture the complex relationships among operations, machines, and jobs. To optimize the scheduling strategy, we use Proximal Policy Optimization (PPO). Experimental results show that, compared to traditional methods, our approach significantly improves solution efficiency and quality in large-scale IPPS instances, providing superior scheduling strategies for modern intelligent manufacturing systems.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# 解釈可能な畳み込みSyncNet Interpretable Convolutional SyncNet ( http://arxiv.org/abs/2409.00971v1 ) ライセンス: Link先を確認	Sungjoon Park, Jaesub Yun, Donggeon Lee, Minsik Park,	(参考訳) 実環境のビデオはさまざまな理由で同期がずれることがあるため、同期されたビデオを必要とするタスクでは、同期ネット(sync-net)を用いてビデオを再同期する。これまでのSOTA(State-of-the-art)同期ネットは、InfoNCE損失を使用するか、トランスフォーマーアーキテクチャに依存するか、あるいはその両方である。残念なことに、前者はモデルの出力を解釈しにくくし、後者は大きな画像の扱いに向かないため、同期ネットの有用性が制限されている。本研究では、バイナリクロスエントロピー(BCE)損失とInfoNCE損失に着想を得たバランスドBCE損失(BBCE)を用いて、畳み込み同期ネットを訓練する。 InfoNCEの損失とは対照的に、BBCEの損失は複雑なサンプリングスキームを必要としない。我々のモデルはより大きな画像を扱うことができ、その出力には確率論的解釈を与えることができる。確率論的解釈により、オフセット時の確率やオフスクリーン比などのメトリクスを定義し、音声視覚(AV)音声データセットの同期品質を評価することができる。さらに、我々のモデルは、LRS2データセットで96.5%、LRS3データセットで93.8%のSOTA精度を達成している。 Because videos in the wild can be out of sync for various reasons, a sync-net is used to bring the video back into sync for tasks that require synchronized videos. Previous state-of-the-art (SOTA) sync-nets use InfoNCE loss, rely on the transformer architecture, or both. Unfortunately, the former makes the model's output difficult to interpret, and the latter is unfriendly with large images, thus limiting the usefulness of sync-nets. In this work, we train a convolutional sync-net using the balanced BCE loss (BBCE), a loss inspired by the binary cross entropy (BCE) and the InfoNCE losses. In contrast to the InfoNCE loss, the BBCE loss does not require complicated sampling schemes. Our model can better handle larger images, and its output can be given a probabilistic interpretation. The probabilistic interpretation allows us to define metrics such as probability at offset and offscreen ratio to evaluate the sync quality of audio-visual (AV) speech datasets. Furthermore, our model achieves SOTA accuracy of $96.5\%$ on the LRS2 dataset and $93.8\%$ on the LRS3 dataset.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# IVGF:Fusion-Guided Infrared and Visible General Framework IVGF: The Fusion-Guided Infrared and Visible General Framework ( http://arxiv.org/abs/2409.00973v1 ) ライセンス: Link先を確認	Fangcen Liu, Chenqiang Gao, Fang Chen, Pengcheng Li, Junjie Guo, Deyu Meng,	(参考訳) セマンティックセグメンテーション(セグメンテーション)やオブジェクト検出(オブジェクト検出)といった、赤外線および可視光二重モードタスクは、相補的な情報を融合することにより、極端な場面でも堅牢な性能を達成することができる。現在のほとんどのメソッドは、複数のタスクにまたがる一般化に制限があるタスク固有のフレームワークを設計している。本稿では、多くの高レベル視覚タスクに容易に拡張可能な、融合誘導型赤外線可視光一般フレームワークIVGFを提案する。まず、一般表現を抽出するために、SOTA赤外線および可視基盤モデルを採用する。そして,高次視覚タスクにおけるこれらの汎用表現のセマンティクス情報を強化するために,特徴マップとトークンのための特徴拡張モジュールとトークン拡張モジュールをそれぞれ設計する。さらに,2つのモードの相補的な情報を探究し,効果的に融合するための注意誘導核融合モジュールを提案する。さらに,データ拡張を行うために,カットアウト/ミックス拡張戦略を採用することで,モデルが2つのモダリティ間の地域相補性をマイニングする能力をさらに向上する。広範囲な実験により、IVGFはセマンティックセグメンテーションやオブジェクト検出タスクにおいて、最先端のデュアルモダリティ手法よりも優れていることが示された。詳細なアブレーション研究は各モジュールの有効性を実証し、別の実験では、二重モードセマンティックセマンティックセグメンテーションタスクにおいて提案手法の欠落防止能力について検討している。 Infrared and visible dual-modality tasks such as semantic segmentation and object detection can achieve robust performance even in extreme scenes by fusing complementary information. Most current methods design task-specific frameworks, which are limited in generalization across multiple tasks. In this paper, we propose a fusion-guided infrared and visible general framework, IVGF, which can be easily extended to many high-level vision tasks. Firstly, we adopt the SOTA infrared and visible foundation models to extract the general representations. Then, to enrich the semantics information of these general representations for high-level vision tasks, we design the feature enhancement module and token enhancement module for feature maps and tokens, respectively. Besides, the attention-guided fusion module is proposed for effectively fusing by exploring the complementary information of two modalities. 
Moreover, we also adopt the cutout&mix augmentation strategy to conduct the data augmentation, which further improves the ability of the model to mine the regional complementary between the two modalities. Extensive experiments show that the IVGF outperforms state-of-the-art dual-modality methods in the semantic segmentation and object detection tasks. The detailed ablation studies demonstrate the effectiveness of each module, and another experiment explores the anti-missing modality ability of the proposed method in the dual-modality semantic segmentation task.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# フェデレーションラーニングにおけるプライバシの強化:現実世界の医療アプリケーションのためのセキュアなアグリゲーション Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications ( http://arxiv.org/abs/2409.00974v1 ) ライセンス: Link先を確認	Riccardo Taiello, Sergen Cansiz, Marc Vesin, Francesco Cremonesi, Lucia Innocenti, Melek Önen, Marco Lorenzi,	(参考訳) 現実のシナリオ、特にヘルスケアにフェデレートドラーニング(FL)をデプロイすることは、コミュニケーションとセキュリティに課題をもたらす。特に、フェデレーションアグリゲーション手順に関して、研究者は、クライアントが送信するモデルのパラメータに対するプライバシー保証を提供するセキュアアグリゲーション(SA)スキームの研究に注力してきた。しかしながら、現在利用可能なFLフレームワークでのSAの実用性は、計算と通信のボトルネックのために制限されている。このギャップを埋めるために、オープンソースのFed-BioMedフレームワークにおけるSAの実装について検討する。我々は、一連の医療データ分析問題に対する広範なベンチマークを提供することにより、2つのSAプロトコル、Joye-Libert (JL) と Low Overhead Masking (LOM) を実装し、比較する。 4つのデータセットの理論的および実験的評価により、SAプロトコルはタスク精度を維持しながら、効果的にプライバシを保護することが示されている。トレーニング中の計算オーバーヘッドは、大規模モデルの場合、CPU上で1%未満、GPU上で50%未満であり、保護フェーズは10秒未満である。 Fed-BioMedにSAを組み込んでも、非SAシナリオと比較したタスク精度への影響は2%以下である。全体として、本研究では、現実世界の医療アプリケーションにおけるSAの実現可能性を示し、センシティブなアプリケーションにおけるプライバシ保護技術の採用に対するギャップを減らすことに寄与している。 Deploying federated learning (FL) in real-world scenarios, particularly in healthcare, poses challenges in communication and security. In particular, with respect to the federated aggregation procedure, researchers have been focusing on the study of secure aggregation (SA) schemes to provide privacy guarantees over the model's parameters transmitted by the clients. Nevertheless, the practical availability of SA in currently available FL frameworks is currently limited, due to computational and communication bottlenecks. To fill this gap, this study explores the implementation of SA within the open-source Fed-BioMed framework. We implement and compare two SA protocols, Joye-Libert (JL) and Low Overhead Masking (LOM), by providing extensive benchmarks in a panel of healthcare data analysis problems. Our theoretical and experimental evaluations on four datasets demonstrate that SA protocols effectively protect privacy while maintaining task accuracy. 
Computational overhead during training is less than 1% on a CPU and less than 50% on a GPU for large models, with protection phases taking less than 10 seconds. Incorporating SA into Fed-BioMed impacts task accuracy by no more than 2% compared to non-SA scenarios. Overall this study demonstrates the feasibility of SA in real-world healthcare applications and contributes in reducing the gap towards the adoption of privacy-preserving technologies in sensitive applications.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
# ランダム化ガウス過程の上層信頼境界のレグレト解析 Regret Analysis for Randomized Gaussian Process Upper Confidence Bound ( http://arxiv.org/abs/2409.00979v1 ) ライセンス: Link先を確認	Shion Takeno, Yu Inatsu, Masayuki Karasuyama,	(参考訳) ガウス過程上信頼境界 (GP-UCB) はベイズ最適化 (BO) の理論的に確立されたアルゴリズムであり、目的関数 $f$ は GP に従うと仮定する。 GP-UCBの特筆すべき欠点は、反復とともに$\beta$が増加するという理論的な信頼パラメータが大きすぎることである。この欠点を軽減するために, 指数関数分布から生じる信頼度パラメータを用いて, 改良された乱数化GP-UCB (IRGP-UCB) と呼ばれるGP-UCBのランダム化変種を解析した。予測された後悔と条件付き後悔を分析し、予測と確率をそれぞれ$f$とノイズとBOアルゴリズムのランダム性で分析する。両方の後悔解析において、IRGP-UCBは入力領域が有限であれば信頼パラメータを増大させることなく、サブ線形後悔上限を達成する。最後に,合成およびベンチマーク関数と実世界のエミュレータを用いた数値実験を行った。 Gaussian process upper confidence bound (GP-UCB) is a theoretically established algorithm for Bayesian optimization (BO), where we assume the objective function $f$ follows GP. One notable drawback of GP-UCB is that the theoretical confidence parameter $\beta$ increased along with the iterations is too large. To alleviate this drawback, this paper analyzes the randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB), which uses the confidence parameter generated from the shifted exponential distribution. We analyze the expected regret and conditional expected regret, where the expectation and the probability are taken respectively with $f$ and noises and with the randomness of the BO algorithm. In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite. Finally, we show numerical experiments using synthetic and benchmark functions and real-world emulators.	翻訳日:2024-09-06 08:08:59 公開日:2024-09-02
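The idea of drawing the confidence parameter at random, as described in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: the shift and rate values, the function names, and the toy posterior numbers are hypothetical and are not the settings analyzed in the paper. The acquisition form μ(x) + √β·σ(x) is the standard GP-UCB rule on a finite candidate set.

```python
import math
import random


def sample_confidence_parameter(shift: float, rate: float, rng: random.Random) -> float:
    """Draw beta = shift + Exp(rate), i.e. a shifted exponential random variable."""
    return shift + rng.expovariate(rate)


def ucb_scores(means, stds, beta):
    """GP-UCB acquisition values mu(x) + sqrt(beta) * sigma(x) for each candidate."""
    return [m + math.sqrt(beta) * s for m, s in zip(means, stds)]


def select_next_point(means, stds, shift, rate, rng):
    """Pick the candidate index maximizing the UCB with a freshly sampled beta."""
    beta = sample_confidence_parameter(shift, rate, rng)
    scores = ucb_scores(means, stds, beta)
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, beta


rng = random.Random(0)
posterior_means = [0.2, 0.5, 0.4]  # hypothetical GP posterior means on a finite domain
posterior_stds = [0.9, 0.1, 0.3]   # hypothetical GP posterior standard deviations
index, beta = select_next_point(posterior_means, posterior_stds, shift=1.0, rate=0.5, rng=rng)
print(index, beta)
```

Because β is resampled at every iteration rather than grown on a schedule, its typical size does not increase with the iteration count, which is the property the abstract highlights.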
# 平衡外トンネル速度に対する熱力学およびエネルギー的制約 Thermodynamic and energetic constraints on out-of-equilibrium tunneling rates ( http://arxiv.org/abs/2409.00981v1 ) ライセンス: Link先を確認	Ludovico Tesser, Matteo Acciai, Christian Spånslätt, Inès Safi, Janine Splettstoesser,	(参考訳) 2つのサブシステム間のトンネル結合が遷移を引き起こす異なる温度で保持される二部量子系について検討する。 2つのサブシステム間の温度バイアスに依存した非平衡トンネル速度には2つの独立した制約があるが、どちらも結合量子系が小さい場合に特に制限的であることが分かる。これらの境界は、散逸した熱と、それぞれ温度バイアスを確立するのに必要な吸収エネルギーに関連付けられているため、熱力学的およびエネルギー的制約の形を取る。導出された制約は、トンネル機構への制限を除いて、相互作用や非線形エネルギースペクトルを含む任意のサブシステムハミルトニアンの量子系に適用される。これらの結果は、分子接合から結合キャビティまで、多くの実験的なシステムに関係しており、例えば、非平衡トンネル電流とそのノイズを測定することで試験することができる。 We study bipartite quantum systems kept at different temperatures where a tunnel coupling between the two subsystems induces transitions. We find two independent constraints on the temperature-bias-dependent, out-of-equilibrium tunneling rates between the two subsystems, which both turn out to be particularly restrictive when the coupled quantum systems are small. These bounds take the form of a thermodynamic and of an energetic constraint, as they are associated with the dissipated heat and with the absorbed energy required to establish and deplete the temperature bias, respectively. The derived constraints apply to a large class of experimentally accessible quantum systems: except for the restriction to the tunneling regime, they hold for arbitrary subsystem Hamiltonians, including interactions or non-linear energy spectra. These results hold for a large class of experimentally relevant systems, ranging from molecular junctions to coupled cavities, and can be tested by, for instance, measuring the out-of-equilibrium tunneling current and its noise.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# GCCRR:耳装着型IMUに基づく短いシーケンスの歩行周期分割法 GCCRR: A Short Sequence Gait Cycle Segmentation Method Based on Ear-Worn IMU ( http://arxiv.org/abs/2409.00983v1 ) ライセンス: Link先を確認	Zhenye Xu, Yao Guo,	(参考訳) 本稿では、運動機能障害患者の在宅モニタリングとリハビリテーションのための実践的かつ非侵襲的なアプローチとして、耳装着型IMUの短いシーケンスを用いた歩行周期セグメンテーションという重要な課題を扱う。以前の研究では下肢に装着したIMUに焦点が当てられていたが、耳装着型IMUは、最小限の侵入で歩行動態を捉えられるという独自の利点がある。短いシーケンスを用いた歩行周期セグメンテーションの課題に対処するために、我々は、きめ細かな歩行位相セグメンテーションのために設計された新しい2段階アプローチである、歩行特性曲線回帰・復元法(GCCRR)を導入する。第1段階は、セグメンテーションタスクを、周期情報を組み込んだ1次元の特徴系列である歩行特性曲線(GCC)の回帰タスクに変換する。第2段階は、ピーク検出技術を用いて歩行周期を復元する。提案手法では、Bi-LSTMに基づく深層学習アルゴリズムを回帰に用いて、短い歩行シーケンスに対して信頼性の高いセグメンテーションを実現する。 HamlynGaitデータセットでの評価では、GCCRRは80%以上の精度を達成し、Timestamp Errorは1サンプリング間隔未満である。有望な結果にもかかわらず、より広範なセンサーシステムを使用する手法には性能が及ばず、より大規模で多様なデータセットの必要性が浮き彫りになっている。今後の研究は、モーションキャプチャシステムによるデータ拡張とアルゴリズムの一般化性の改善に焦点を当てる予定である。 This paper addresses the critical task of gait cycle segmentation using short sequences from ear-worn IMUs, a practical and non-invasive approach for home-based monitoring and rehabilitation of patients with impaired motor function. While previous studies have focused on IMUs positioned on the lower limbs, ear-worn IMUs offer a unique advantage in capturing gait dynamics with minimal intrusion. To address the challenges of gait cycle segmentation using short sequences, we introduce the Gait Characteristic Curve Regression and Restoration (GCCRR) method, a novel two-stage approach designed for fine-grained gait phase segmentation. The first stage transforms the segmentation task into a regression task on the Gait Characteristic Curve (GCC), which is a one-dimensional feature sequence incorporating periodic information. The second stage restores the gait cycle using peak detection techniques. Our method employs Bi-LSTM-based deep learning algorithms for regression to ensure reliable segmentation for short gait sequences. Evaluation on the HamlynGait dataset demonstrates that GCCRR achieves over 80% Accuracy, with a Timestamp Error below one sampling interval. 
Despite its promising results, the performance lags behind methods using more extensive sensor systems, highlighting the need for larger, more diverse datasets. Future work will focus on data augmentation using motion capture systems and improving algorithmic generalizability.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
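The second stage described in the abstract above, recovering gait cycles from a periodic one-dimensional curve by peak detection, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the cosine stands in for a regressed Gait Characteristic Curve, the helper names are hypothetical, and treating consecutive peaks as cycle boundaries is one plausible reading rather than the authors' stated procedure.

```python
import math


def find_peaks(curve, min_height=0.0):
    """Return indices of strict local maxima of `curve` that exceed `min_height`."""
    peaks = []
    for i in range(1, len(curve) - 1):
        if curve[i] > curve[i - 1] and curve[i] > curve[i + 1] and curve[i] > min_height:
            peaks.append(i)
    return peaks


def segment_cycles(peaks):
    """Treat each pair of consecutive peaks as the boundaries of one cycle."""
    return list(zip(peaks[:-1], peaks[1:]))


# Synthetic stand-in for a regressed curve: a cosine with a period of 50 samples.
period = 50
curve = [math.cos(2 * math.pi * t / period) for t in range(175)]
peaks = find_peaks(curve, min_height=0.5)
print(peaks)
print(segment_cycles(peaks))
```

On real, noisy regression output a height threshold alone is usually not enough; a minimum distance between peaks or light smoothing would likely be needed as well.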
# Co-Learning:会話型自然言語インタフェースを用いたマルチエージェント強化協調フレームワークのためのコード学習 Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces ( http://arxiv.org/abs/2409.00985v1 ) ライセンス: Link先を確認	Jiapeng Yu, Yuqian Wu, Yajing Zhan, Wenhao Guo, Zhou Xu, Raymond Lee,	(参考訳) 大規模言語モデル(LLM)に基づくオンライン質問応答(Q&A)システムは、娯楽用途から専門的な用途へと徐々に広がってきた。本稿では,初心者が自力でコードエラーを修正できるよう支援する,コード修正のための環境強化学習(E-RL)を備えたマルチエージェントフレームワーク,Code Learning (Co-Learning) コミュニティを提案する。 702件のエラーコードを含むオリジナルのデータセットで複数のLLMの性能を評価し、それをE-RLの報酬または罰則の基準として用いる。入力されたエラーコードを現在のエージェントで分析し、適切なLLMベースのエージェントを選択することで、最適な誤り訂正精度を達成し、修正時間を短縮する。実験の結果,E-RLを用いない手法と比較して,Precisionスコアが3%,時間コストが15%改善した。私たちのソースコードは、https://github.com/yuqian2003/Co_Learning で公開されています。 Online question-and-answer (Q&A) systems based on the Large Language Model (LLM) have progressively diverged from recreational to professional use. This paper proposes a Multi-Agent framework with environmentally reinforcement learning (E-RL) for code correction called Code Learning (Co-Learning) community, assisting beginners to correct code errors independently. It evaluates the performance of multiple LLMs from an original dataset with 702 error codes and uses it as a reward or punishment criterion for E-RL; analyzes input error codes by the current agent; and selects the appropriate LLM-based agent to achieve optimal error correction accuracy and reduce correction time. Experimental results showed a 3% improvement in Precision score and a 15% improvement in time cost compared with the method without E-RL. Our source code is available at: https://github.com/yuqian2003/Co_Learning	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# パーソナライズされた唇読み:視覚と言語によるユニークな唇の動きに適応する Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language ( http://arxiv.org/abs/2409.00986v1 ) ライセンス: Link先を確認	Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro,	(参考訳) 唇読解は、唇の動きを分析して音声言語を予測することを目的としている。唇読解技術の進歩にもかかわらず、唇の外観などの視覚情報の変化に敏感なため、モデルが未知の話者に適用されると性能が低下する。この課題に対処するために、話者適応型唇読解技術は、視覚的モダリティにおいてターゲット話者に唇読解モデルを効果的に適応させることに集中して進歩してきた。対象話者の語彙選択などの言語情報への適応の有効性については,これまでの研究では検討されていない。さらに、話者適応のための既存のデータセットは語彙サイズとポーズのバリエーションが限られており、実際のシナリオにおける従来の話者適応手法の検証が制限されている。これらの課題に対処するため,事前学習モデルを視覚レベルと言語レベルの両方で対象話者に適応させる,新しい話者適応型唇読解法を提案する。具体的には、プロンプトチューニングとLoRAアプローチを統合し、事前学習済みの唇読解モデルに適用することで、モデルをターゲット話者に効果的に適応させる。さらに,実世界のシナリオでの有効性を検証するために,VoxCeleb2とLRS3から派生した新たなデータセットであるVoxLRS-SAを導入する。約100Kの単語の語彙を含み、多様なポーズのバリエーションを提供し、実環境の文レベルの唇読解における適応法の検証を初めて可能にする。種々の実験を通して,既存の話者適応法は実環境の文レベルでも性能を向上させることを示した。さらに,提案する適応手法は,対象話者に適用した場合,従来手法と比較してより大きな改善を達成することを示す。 Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information such as lip appearances. To address this challenge, speaker adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality. The effectiveness of adapting language information, such as vocabulary choice, of the target speaker has not been explored in the previous works. Moreover, existing datasets for speaker adaptation have limited vocabulary size and pose variations, limiting the validation of previous speaker-adaptive methods in real-world scenarios. To address these issues, we propose a novel speaker-adaptive lip reading method that adapts a pre-trained model to target speakers at both vision and language levels. 
Specifically, we integrate prompt tuning and the LoRA approach, applying them to a pre-trained lip reading model to effectively adapt the model to target speakers. In addition, to validate its effectiveness in real-world scenarios, we introduce a new dataset, VoxLRS-SA, derived from VoxCeleb2 and LRS3. It contains a vocabulary of approximately 100K words, offers diverse pose variations, and enables the validation of adaptation methods in wild, sentence-level lip reading for the first time. Through various experiments, we demonstrate that the existing speaker-adaptive method also improves performance in the wild at the sentence level. Moreover, with the proposed adaptation method, we show that the proposed method achieves larger improvements when applied to the target speaker, compared to the previous works.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# 交互最適化によるブラインド画像デブロアリングのための自己監督型マルチスケールネットワーク Self-Supervised Multi-Scale Network for Blind Image Deblurring via Alternating Optimization ( http://arxiv.org/abs/2409.00988v1 ) ライセンス: Link先を確認	Lening Guo, Jing Yu, Ning Zhang, Chuangbai Xiao,	(参考訳) ブラインドイメージデブロワーリング(Blind image deblurring)は、ぼやけたカーネルが未知のときに、未ブルーのイメージを推定する、挑戦的な低レベルのビジョンタスクである。本稿では,画像とぼやけたカーネルを交互に推定する,自己監督型マルチスケールブラインド画像デブロアリング手法を提案する。画像推定ステップでは、複数の入力と複数の出力を持つマルチスケールジェネレータネットワークを構築し、様々なスケールで遅延画像を協調的に推定し、ぼやけた画像のみから構築した画像ピラミッドによって監督する。このジェネレータは、ネットワーク上にアーキテクチャ上の制約を配置し、画像先行の数学的表現を必要としない。ぼやけたカーネル推定ステップでは、画像推定のために提案したマルチスケールジェネレータへの柔軟な適応のために、各スケールのぼやけたカーネルを2次正規化最小二乗モデルへの直接解で独立に推定する。提案手法は,複数スケールにわたる協調的推定により,計算集約的な粗大な伝播や,従来の数式最適化法で用いられる画像の劣化を回避できる。合成および現実的なデータセットの定量的および定性的な実験結果から,本手法の優れた性能,特に大規模および実世界のぼかしの処理性能を示す。 Blind image deblurring is a challenging low-level vision task that involves estimating the unblurred image when the blur kernel is unknown. In this paper, we present a self-supervised multi-scale blind image deblurring method to jointly estimate the latent image and the blur kernel via alternating optimization. In the image estimation step, we construct a multi-scale generator network with multiple inputs and multiple outputs to collaboratively estimate latent images at various scales, supervised by an image pyramid constructed from only the blurred image. This generator places architectural constraints on the network and avoids the need for mathematical expression of image priors. In the blur kernel estimation step, the blur kernel at each scale is independently estimated with a direct solution to a quadratic regularized least-squares model for its flexible adaptation to the proposed multi-scale generator for image estimation. Thanks to the collaborative estimation across multiple scales, our method avoids the computationally intensive coarse-to-fine propagation and additional image deblurring processes used in traditional mathematical optimization-based methods. 
Quantitative and qualitative experimental results on synthetic and realistic datasets demonstrate the superior performance of our method, especially for handling large and real-world blurs.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# ブラインド顔修復のための3次元優先誘導拡散法 3D Priors-Guided Diffusion for Blind Face Restoration ( http://arxiv.org/abs/2409.00991v1 ) ライセンス: Link先を確認	Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren,	(参考訳) 劣化した顔画像から鮮明な顔画像を復元するためのブラインド顔復元作業。 GAN(Generative Adversarial Networks)を先駆者として採用した最近のアプローチは、この分野において顕著な成功を収めている。しかし、これらの手法は、特に複雑な劣化シナリオにおいて、現実主義と忠実さのバランスを達成する上で困難に直面する。拡散モデルの例外的リアリズム生成能力を継承し,自己認識の忠実さに制約されるために,3次元顔の先行を構造と同一性制約としてデノナイズド拡散プロセスに埋め込むことにより,新しい拡散基盤フレームワークを提案する。具体的には、より正確な3D先行表現を得るために、予め訓練された復元ネットワークで処理された初期復元顔画像を用いて、3D形態モデル(3DMM)により3D顔画像を再構成する。ノイズ推定プロセスにマッピングされる3次元顔画像の構造情報と同一性情報の両方を利用するために、カスタマイズされたマルチレベル特徴抽出手法を用いる。識別情報のノイズ推定への融合を強化するため,時間認識融合ブロック(TAFB)を提案する。本モジュールは,初期構造改善とテクスチャ詳細強化を伴う拡散モデルにおけるデノナイジング過程の動的性質を考慮した,より効率的で適応的な重みの融合を提供する。 Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods encounter challenges in achieving a balance between realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional realism generative ability of the diffusion model and also constrained by the identity-aware fidelity, we propose a novel diffusion-based framework by embedding the 3D facial priors as structure and identity constraints into a denoising diffusion process. Specifically, in order to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) using an initial restored face image that has been processed by a pretrained restoration network. A customized multi-level feature extraction method is employed to exploit both structural and identity information of 3D facial images, which are then mapped into the noise estimation process. In order to enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). 
This module offers a more efficient and adaptive fusion of weights for denoising, considering the dynamic nature of the denoising process in the diffusion model, which involves initial structure refinement followed by texture detail enhancement. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# 剛性に基づく損失関数を持つ物理インフォームドDeepONetによる構造応答予測 Physics-informed DeepONet with stiffness-based loss functions for structural response prediction ( http://arxiv.org/abs/2409.00994v1 ) ライセンス: Link先を確認	Bilal Ahmed, Yuqing Qiu, Diab W. Abueidda, Waleed El-Sekelly, Borja Garcia de Soto, Tarek Abdoun, Mostafa E. Mobasher,	(参考訳) 有限要素モデリングは、構造解析のための確立されたツールであるが、複雑な構造をモデル化するには、広範囲な前処理、重要な分析努力、かなりの時間を要することが多い。本研究では,DeepOnetを用いた構造的静的応答のリアルタイム予測手法を導入することで,この課題に対処する。このアプローチは、様々な負荷クラスとマグニチュードの下でレスポンスを正確に予測する柔軟性を提供します。トレーニングされたDeepONetは、1秒以内にドメイン全体のソリューションを生成することができる。この機能は、FEモデリングにおける各新しいケースで通常必要とされる広範囲なリモデリングと分析の必要性を効果的に排除する。提案手法を実橋の簡易な2次元ビーム構造と包括的3次元モデルという2つの構造に適用する。 DeepONetで複数の変数を予測するには、分割ブランチ/トランクと複数のDeepONetsを1つのDeepONetに統合する2つの戦略を利用する。データ駆動トレーニングに加えて、新しい物理インフォームドトレーニングアプローチを導入する。この方法は構造剛性行列を活用し、基本的な平衡とエネルギー保存の原理を強制し、2つの新しい物理学インフォームド損失関数(エネルギー保存とシュア補数を用いた静的平衡)をもたらす。損失関数の様々な組み合わせを用いて、トレーニング時間を大幅に短縮し、5%未満の誤差率を達成する。本研究では,ハイブリッド損失関数によって強化されたDeepONetが,各メッシュ点における変位と回転を,トレーニング時間を短縮して正確に,効率的に予測できることを示す。 Finite element modeling is a well-established tool for structural analysis, yet modeling complex structures often requires extensive pre-processing, significant analysis effort, and considerable time. This study addresses this challenge by introducing an innovative method for real-time prediction of structural static responses using DeepOnet which relies on a novel approach to physics-informed networks driven by structural balance laws. This approach offers the flexibility to accurately predict responses under various load classes and magnitudes. The trained DeepONet can generate solutions for the entire domain, within a fraction of a second. This capability effectively eliminates the need for extensive remodeling and analysis typically required for each new case in FE modeling. We apply the proposed method to two structures: a simple 2D beam structure and a comprehensive 3D model of a real bridge. 
To predict multiple variables with DeepONet, we utilize two strategies: a split branch/trunk and multiple DeepONets combined into a single DeepONet. In addition to data-driven training, we introduce a novel physics-informed training approach. This method leverages structural stiffness matrices to enforce fundamental equilibrium and energy conservation principles, resulting in two novel physics-informed loss functions: energy conservation and static equilibrium using the Schur complement. We use various combinations of loss functions to achieve an error rate of less than 5% with significantly reduced training time. This study shows that DeepONet, enhanced with hybrid loss functions, can accurately and efficiently predict displacements and rotations at each mesh point, with reduced training time.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
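The stiffness-based quantities named in the abstract above (static equilibrium via the Schur complement, and energy conservation) can be made concrete on a two-degree-of-freedom spring chain. This is a minimal sketch under stated assumptions: the spring model, the function names, and the way the residuals are formed are illustrative and do not reproduce the loss functions defined in the paper.

```python
def condensed_solve(k1, k2, f1, f2):
    """Solve a spring chain (wall - k1 - node 1 - k2 - node 2) by static condensation.

    The stiffness matrix is K = [[k1 + k2, -k2], [-k2, k2]]; degree of freedom 2
    is eliminated with the Schur complement and recovered by back-substitution.
    """
    k_aa, k_ab, k_ba, k_bb = k1 + k2, -k2, -k2, k2
    schur = k_aa - k_ab * k_ba / k_bb  # condensed stiffness acting on DOF 1
    f_eff = f1 - k_ab * f2 / k_bb      # condensed load acting on DOF 1
    u1 = f_eff / schur
    u2 = (f2 - k_ba * u1) / k_bb       # back-substitution for the eliminated DOF
    return u1, u2


def equilibrium_residual(k1, k2, f1, f2, u1, u2):
    """Squared norm of K u - f, the kind of quantity an equilibrium loss could penalize."""
    r1 = (k1 + k2) * u1 - k2 * u2 - f1
    r2 = -k2 * u1 + k2 * u2 - f2
    return r1 * r1 + r2 * r2


def strain_energy_gap(k1, k2, f1, f2, u1, u2):
    """Internal strain energy 0.5 u^T K u minus external work 0.5 f^T u."""
    internal = 0.5 * ((k1 + k2) * u1 * u1 - 2 * k2 * u1 * u2 + k2 * u2 * u2)
    external = 0.5 * (f1 * u1 + f2 * u2)
    return internal - external


# Illustrative stiffnesses and loads (assumptions).
u1, u2 = condensed_solve(k1=2.0, k2=4.0, f1=0.0, f2=8.0)
print(u1, u2)
print(equilibrium_residual(2.0, 4.0, 0.0, 8.0, u1, u2))
print(strain_energy_gap(2.0, 4.0, 0.0, 8.0, u1, u2))
```

For an exact linear-elastic solution both the residual and the energy gap vanish, so a network prediction that leaves either of them nonzero is violating equilibrium; that is the sense in which such terms can serve as physics-informed penalties.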
# DataSculpt:多目的分割によるLLMポストトレーニングのためのデータランドスケープの構築 DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning ( http://arxiv.org/abs/2409.00997v1 ) ライセンス: Link先を確認	Keer Lu, Zheng Liang, Xiaonan Nie, Da Pan, Shusen Zhang, Keshi Zhao, Weipeng Chen, Zenan Zhou, Guosheng Dong, Wentao Zhang, Bin Cui,	(参考訳) 長文コンテキストモデリングの有効性は、様々なアプリケーションにおいて大規模言語モデル(LLM)にとって重要である。その可能性にもかかわらず、長いコンテキストの処理におけるLLMの有効性は常に期待を満たすわけではなく、トレーニングにおける長いシーケンスの効率的な管理には重大な課題がある。この難しさは、異なるデータソースにまたがる固有の長さバイアスに起因する、長いシーケンスに適した包括的で多様なトレーニングデータセットの不足と、拡張されたコンテキストでのトレーニングのための大規模データ管理に伴う運用上の複雑さによって、さらに増している。本研究では,拡張コンテキストトレーニングのためのデータアーキテクチャを戦略的に強化するデータ構築フレームワークであるDataSculptを紹介する。我々の徹底的な評価は、DataSculptが長文コンテキストトレーニングの性能を顕著に向上させることを示しており、検索拡張で18.09%、要約で21.23%、読解で21.27%、コード補完で3.81%の向上を達成し、同時にモデルの全体的な習熟度も4.88%改善して維持している。 The effectiveness of long-context modeling is important for Large Language Models (LLMs) in various applications. Despite their potential, LLMs' efficacy in processing long context does not consistently meet expectations, posing significant challenges for efficient management of prolonged sequences in training. This difficulty is compounded by the scarcity of comprehensive and diverse training datasets suitable for long sequences, which stems from inherent length biases across different data sources, and the logistical complexities associated with massive data management for training in extended contexts. In this work, we introduce DataSculpt, a data construction framework designed to strategically augment the data architecture for extended-context training. Our thorough evaluations demonstrate DataSculpt's remarkable capacity to boost long-context training performance, achieving improvements including an 18.09% increase in retrieval augmentation, 21.23% in summarization, 21.27% in reading comprehension, and a 3.81% rise in code completion, all while preserving the models' overall proficiency with a 4.88% improvement.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# 画像分類のための量子エクストリーム学習マシンの活用 Harnessing Quantum Extreme Learning Machines for image classification ( http://arxiv.org/abs/2409.00998v1 ) ライセンス: Link先を確認	A. De Lorenzis, M. P. Casado, M. P. Estarellas, N. Lo Gullo, T. Lux, F. Plastina, A. Riera, J. Settino,	(参考訳) 量子機械学習への関心は、古典的な手法では取り組むのが難しい問題に対する効率的なソリューションを開発できる可能性から、ますます高まっている。本研究は,画像分類タスクにおける量子機械学習技術の利用に焦点を当てる。我々は,量子リザバー基盤が提供する豊富な特徴写像を利用して,量子エクストリーム学習マシンを活用する。我々は、データセットの準備から画像の最終分類まで、量子エクストリーム学習マシンのプロセスの各段階を体系的に分析する。特に、主成分分析による符号化とオートエンコーダの使用による影響、および量子リザバーにおける異なるハミルトニアンの使用によるモデルのダイナミクスについて検討する。その結果,量子リザバーの導入は分類器の精度を体系的に向上させることがわかった。さらに、異なるエンコーディングは大きく異なる性能をもたらしうるが、接続度の異なるハミルトニアンは、相互作用がある限り同じ識別率を示す。 Interest in quantum machine learning is increasingly growing due to the possibility of developing efficient solutions to problems that are difficult to tackle with classical methods. In this context, the research work presented here focuses on the use of quantum machine learning techniques for image classification tasks. We exploit a quantum extreme learning machine by taking advantage of its rich feature map provided by the quantum reservoir substrate. We systematically analyse different phases of the quantum extreme learning machine process, from the dataset preparation to the image final classification. In particular, we investigate the impact of encoding through a Principal Component Analysis and the use of Auto-Encoders, as well as the dynamics of the model through the use of different Hamiltonians for the quantum reservoir. Our results show that the introduction of a quantum reservoir systematically improves the accuracy of the classifier. Additionally, while different encodings can lead to significantly different performances, Hamiltonians with varying degrees of connectivity exhibit the same discrimination rate, provided they are interacting.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# 雑音のある確率的誤差キャンセルと一般化された物理的実装可能性 Noisy Probabilistic Error Cancellation and Generalized Physical Implementability ( http://arxiv.org/abs/2409.01000v1 ) ライセンス: Link先を確認	Tian-Ren Jin, Kai Xu, Yu-Ran Zhang, Heng Fan,	(参考訳) 量子デコヒーレントノイズは実用的な量子プロセッサの性能に大きな影響を与えている。量子誤り緩和法の一つである確率的誤差キャンセル法は、物理的チャネルではないノイズ逆演算を準確率的にシミュレートし、ノイズをキャンセルする。物理的実装可能性(英: physical implementability)は、準確率分解によって非物理的な量子演算を物理チャネルでシミュレートするための最小のコストである。しかし、実際には、このキャンセル自体もノイズの影響を受けうるうえ、実装可能なチャネルは物理的チャネルのすべてではないため、物理的実装可能性は確率的誤差キャンセル法の実情を完全に表現するのに十分ではない。したがって、自由量子資源の任意の凸集合に物理的実装可能性を一般化し、その性質を議論する。ノイズの多いパウリ基底でエラーチャネルを最適にキャンセルする方法を実証する。さらに、この一般化に関連するいくつかの性質についても論じる。我々は、その特性と構造が包括的に調査され、量子情報処理の分野でより多くの応用が生まれることを期待する。 Quantum decoherent noises have significantly influenced the performance of practical quantum processors. The probabilistic error cancellation method for quantum error mitigation quasiprobabilistically simulates the noise inverse operations, which are not physical channels, to cancel the noises. Physical implementability is the minimal cost to simulate a non-physical quantum operation with physical channels by the quasiprobabilistic decomposition. However, in practice, this cancellation may also be influenced by noises, and the implementable channels are not all of the physical channels, so the physical implementability is not sufficient to completely depict the practical situation of the probabilistic error cancellation method. Therefore, we generalize the physical implementability to an arbitrary convex set of free quantum resources and discuss several of its properties. We demonstrate the way to optimally cancel the error channel with the noisy Pauli basis. In addition, we also discuss several properties relevant to this generalization. We expect that its properties and structures will be investigated comprehensively, and it will have more applications in the field of quantum information processing.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# ChatGPTを超えて - ソフトウェア品質保証タスクを多様なLLMと検証技術で強化する Beyond ChatGPT: Enhancing Software Quality Assurance Tasks with Diverse LLMs and Validation Techniques ( http://arxiv.org/abs/2409.01001v1 ) ライセンス: Link先を確認	Ratnadira Widyasari, David Lo, Lizi Liao,	(参考訳) LLM(Large Language Models)の進歩に伴い、ソフトウェア品質保証(Software Quality Assurance, SQA)への応用が増加している。しかし、これらの応用の現在の焦点は、主にChatGPTに当てられている。この重要な領域では、様々なLLMの性能の理解にはまだギャップがある。本稿では,2つのSQAタスク(障害局所化と脆弱性検出)にまたがる複数のLLMの能力に関する包括的調査を行うことにより,このギャップに対処することを目的とする。 GPT-3.5, GPT-4o, および他の4つの公開LLM(LLaMA-3-70B, LLaMA-3-8B, Gemma-7B, Mixtral-8x7B)を用いて比較検討を行い, これらのタスクにおける有効性を評価した。その結果,両タスクにおいて複数のLLMがGPT-3.5より優れていることが示された。さらに、低性能のLLMでさえ独自の正しい予測を提供しており、異なるLLMの結果を組み合わせて全体的な性能を高められる可能性が示唆された。 LLMの結果を組み合わせる投票機構を実装することで,両タスクにおいてGPT-3.5よりも10%以上の改善を実現した。さらに、検証プロンプトを用いて一方のLLMの回答を他方のLLMの回答に対して検証することにより、LLMの回答を洗練するクロスバリデーション手法を導入した。このアプローチにより、GPT-3.5と比較して障害局所化で16%、脆弱性検出で12%、最高性能のLLMと比較して4%の性能向上が得られた。また, LLMの結果に説明文を含めることが, クロスバリデーション手法の有効性に影響を与えることも分析から示された。 With the advancement of Large Language Models (LLMs), their application in Software Quality Assurance (SQA) has increased. However, the current focus of these applications is predominantly on ChatGPT. There remains a gap in understanding the performance of various LLMs in this critical domain. This paper aims to address this gap by conducting a comprehensive investigation into the capabilities of several LLMs across two SQA tasks: fault localization and vulnerability detection. We conducted comparative studies using GPT-3.5, GPT-4o, and four other publicly available LLMs (LLaMA-3-70B, LLaMA-3-8B, Gemma-7B, and Mixtral-8x7B), to evaluate their effectiveness in these tasks. Our findings reveal that several LLMs can outperform GPT-3.5 in both tasks. Additionally, even the lower-performing LLMs provided unique correct predictions, suggesting the potential of combining different LLMs' results to enhance overall performance. 
By implementing a voting mechanism to combine the LLMs' results, we achieved more than a 10% improvement over GPT-3.5 in both tasks. Furthermore, we introduced a cross-validation approach to refine the LLM answer by validating one LLM answer against another using a validation prompt. This approach led to performance improvements of 16% in fault localization and 12% in vulnerability detection compared to GPT-3.5, with a 4% improvement compared to the best-performing LLMs. Our analysis also indicates that the inclusion of explanations in the LLMs' results affects the effectiveness of the cross-validation technique.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
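The voting idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how votes are counted or how ties are broken, so the plain majority rule, the first-seen tie-break, and the model names below are all assumptions made only for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label that appears first.

    `predictions` is a list with one label per model (hypothetical input format).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the inputs in order so the tie-break is deterministic.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical per-model answers for a single vulnerability-detection query.
answers = {"model_a": "vulnerable", "model_b": "safe", "model_c": "vulnerable"}
print(majority_vote(list(answers.values())))
```

A weighted vote (for example, weighting each model by its validation accuracy) would be a natural variant, but nothing in the abstract indicates which scheme the authors used.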
# Free-DyGS:動的手術ビデオのためのガウススプレイティングに基づくカメラ不要シーン再構成 Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos ( http://arxiv.org/abs/2409.01003v1 ) ライセンス: Link先を確認	Qian Li, Shuojue Yang, Daiyun Shen, Yueming Jin,	(参考訳) 内視鏡的ビデオの再構成は,高忠実度可視化と外科手術の効率化に不可欠である。重要にもかかわらず、既存の3D再構成手法は、精度の厳しい要求、不正確なカメラ位置決め、複雑なダイナミックシーン、迅速な再構築の必要性など、いくつかの課題に直面している。これらの課題に対処するために,3Dガウススプラッティング技術を活用し,ダイナミックな手術ビデオに適したカメラレスシーン再構築フレームワークであるFree-DyGSを提案する。提案手法は,フレーム単位の再構築戦略を採用し,シーン初期化,共同学習,シーン拡張,レトロスペクティブ学習という4つの段階に分けられる。本稿では,RGBDフレームから各画素のガウス属性を逐次生成するために,Scene Initialization と Expansion フェーズ内に一般化可能なガウスパラメータ化モジュールを導入する。共同学習フェーズは、革新的なフレキシブルな変形モジュールによって促進されるシーン変形とカメラポーズを同時に推定する。シーン拡大段階では、カメラが動くにつれてガウス点が徐々に大きくなる。振り返り学習フェーズは、先行フレームの再評価を通じてシーン変形の精度を高めることを目的としている。提案されたFree-DyGSの有効性は、StereoMISとHamlynデータセットという2つのデータセットの実験を通じて実証されている。実験結果は、Free-DyGSが従来のベースラインモデルを超え、レンダリング忠実度と計算効率の両方を上回っていることを示している。 Reconstructing endoscopic videos is crucial for high-fidelity visualization and the efficiency of surgical operations. Despite the importance, existing 3D reconstruction methods encounter several challenges, including stringent demands for accuracy, imprecise camera positioning, intricate dynamic scenes, and the necessity for rapid reconstruction. Addressing these issues, this paper presents the first camera-pose-free scene reconstruction framework, Free-DyGS, tailored for dynamic surgical videos, leveraging 3D Gaussian splatting technology. Our approach employs a frame-by-frame reconstruction strategy and is delineated into four distinct phases: Scene Initialization, Joint Learning, Scene Expansion, and Retrospective Learning. We introduce a Generalizable Gaussians Parameterization module within the Scene Initialization and Expansion phases to proficiently generate Gaussian attributes for each pixel from the RGBD frames. 
The Joint Learning phase is crafted to concurrently deduce scene deformation and camera pose, facilitated by an innovative flexible deformation module. In the scene expansion stage, the Gaussian points gradually grow as the camera moves. The Retrospective Learning phase is dedicated to enhancing the precision of scene deformation through the reassessment of prior frames. The efficacy of the proposed Free-DyGS is substantiated through experiments on two datasets: the StereoMIS and Hamlyn datasets. The experimental outcomes underscore that Free-DyGS surpasses conventional baseline models in both rendering fidelity and computational efficiency.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# 大規模言語モデルの知恵を解き放つ:人工知能への道のり Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence ( http://arxiv.org/abs/2409.01007v1 ) ライセンス: Link先を確認	Edward Y. Chang,	(参考訳) この小冊子"Unlocking the Wisdom of Large Language Models"は包括的作品"The Path to Artificial General Intelligence"の紹介となる。一連の9つのアフォリスムを通じて、敵のLLM対話を通じてAIの未来を探究するための重要な洞察と原則を抽出する。本稿では,人工知能(AGI)の実現に向けた潜在的経路として,このアプローチを提案する。この冊子には本書の巻名、抄録、序文が含まれており、その全文で最初の2章を提示している。 This booklet, "Unlocking the Wisdom of Large Language Models," serves as an introduction to the comprehensive work "The Path to Artificial General Intelligence." Through a series of nine aphorisms, we distill key insights and principles that underpin the larger exploration of AI's future through adversarial LLM dialogue. We propose this approach as a potential path to realizing artificial general intelligence (AGI). This booklet also includes the titles, abstracts, and introductions of the chapters in the main book, and presents the first two chapters in their entirety.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# 木を$\ell_1$-双曲距離に適合させる Fitting trees to $\ell_1$-hyperbolic distances ( http://arxiv.org/abs/2409.01010v1 ) ライセンス: Link先を確認	Joon-Hyeok Yim, Anna C. Gilbert,	(参考訳) 距離を表現または適合させる木を構築することは、系統解析、距離埋め込み、近似アルゴリズム、幾何グラフニューラルネット、階層データの解析において重要な要素である。しかし、これまでのアルゴリズム的な研究の多くは、一般的な距離空間(すなわち、事前制約のないもの)に焦点を当てていた。双曲幾何学と幾何群論の数学的解析からいくつかのアイデアを取り入れ、木適合問題を、双曲性(ウルトラメトリック性)ベクトルと木(ウルトラメトリック)埋め込みの誤差の関係を見出す問題として研究する。すなわち、すべての点三重項上の双曲性(ウルトラメトリック性)値のベクトルを定義し、このベクトルの$\ell_p$ノルムと、距離に最もよく適合する木の歪みの$\ell_q$ノルムを比較する。この定式化により、双曲性ベクトルの正規化された$\ell_1$ノルムを用いて平均双曲性(ウルトラメトリック性)を定義することができる。さらに、グロモフの古典的な木適合結果は、$p = q = \infty$ の結果と解釈できる。出力埋め込みの$\ell_1$誤差が双曲性ベクトルの$\ell_1$ノルム(すなわち$p = q = 1$)で解析的に有界となるアルゴリズム HCCRootedTreeFit を提案し、この結果がタイトであることを示す。さらに、このアルゴリズムはグロモフの結果や関連するアルゴリズムと比較して、理論的および経験的性能が著しく異なる。最後に、HCCRootedTreeFitと関連する木適合アルゴリズムを用いて、階層型データ解析と幾何グラフニューラルネットワークの標準とされるデータセットは、合成された真に木のようなデータセットとは根本的に異なる木適合性を持ち、これらの標準データセットのより洗練された分析が求められていることを示す。 Building trees to represent or to fit distances is a critical component of phylogenetic analysis, metric embeddings, approximation algorithms, geometric graph neural nets, and the analysis of hierarchical data. Much of the previous algorithmic work, however, has focused on generic metric spaces (i.e., those with no a priori constraints). Leveraging several ideas from the mathematical analysis of hyperbolic geometry and geometric group theory, we study the tree fitting problem as finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding. That is, we define a vector of hyperbolicity (ultrametric) values over all triples of points and compare the $\ell_p$ norms of this vector with the $\ell_q$ norm of the distortion of the best tree fit to the distances. This formulation allows us to define the average hyperbolicity (ultrametricity) in terms of a normalized $\ell_1$ norm of the hyperbolicity vector. Furthermore, we can interpret the classical tree fitting result of Gromov as a $p = q = \infty$ result. 
We present an algorithm HCCRootedTreeFit such that the $\ell_1$ error of the output embedding is analytically bounded in terms of the $\ell_1$ norm of the hyperbolicity vector (i.e., $p = q = 1$) and that this result is tight. Furthermore, this algorithm has significantly different theoretical and empirical performance as compared to Gromov's result and related algorithms. Finally, we show using HCCRootedTreeFit and related tree fitting algorithms, that supposedly standard data sets for hierarchical data analysis and geometric graph neural networks have radically different tree fits than those of synthetic, truly tree-like data sets, suggesting that a much more refined analysis of these standard data sets is called for.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
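To make the "hyperbolicity vector over all triples" in the abstract above concrete, here is a small sketch based on the standard Gromov-product form of the four-point condition with a fixed base point. The clipping at zero, the use of ordered triples, the choice of base point, and the normalization by the number of triples are assumptions for illustration; the paper's exact definitions may differ, and this is not the HCCRootedTreeFit algorithm itself.

```python
from itertools import permutations


def gromov_product(d, x, y, w):
    """Gromov product (x|y)_w for a distance matrix d given as a list of lists."""
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])


def hyperbolicity_vector(d, w=0):
    """One non-negative violation value per ordered triple of distinct points.

    A value of 0 means the triple satisfies the tree-like inequality
    (x|y)_w >= min((x|z)_w, (y|z)_w) exactly.
    """
    values = []
    for x, y, z in permutations(range(len(d)), 3):
        slack = min(gromov_product(d, x, z, w), gromov_product(d, y, z, w))
        slack -= gromov_product(d, x, y, w)
        values.append(max(0.0, slack))
    return values


def average_hyperbolicity(d, w=0):
    """Normalized l1 norm of the vector above (requires at least three points)."""
    values = hyperbolicity_vector(d, w)
    return sum(values) / len(values)


# A path (a tree metric) versus a 4-cycle (not tree-like).
path = [[abs(i - j) for j in range(4)] for i in range(4)]
cycle = [[min(abs(i - j), 4 - abs(i - j)) for j in range(4)] for i in range(4)]
print(average_hyperbolicity(path), max(hyperbolicity_vector(cycle)))
```

The maximum of the vector corresponds to the familiar worst-case (Gromov) hyperbolicity relative to the base point, while the average is the kind of $\ell_1$-type quantity the abstract bounds the tree-fit error with.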
# 楚簡文字のためのマルチモーダル・マルチ粒度トークナイザ Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts ( http://arxiv.org/abs/2409.01011v1 ) ライセンス: Link先を確認	Yingfa Chen, Chenlong Hu, Cong Feng, Chenyang Song, Shi Yu, Xu Han, Zhiyuan Liu, Maosong Sun,	(参考訳) 本研究では,古代中国の春秋戦国時代(紀元前771-256年)に用いられた楚簡(Chu bamboo slip, CBS)文字に着目し,古代中国文字の分析に特化して設計されたマルチモーダル・マルチ粒度トークナイザについて述べる。一つの文字が複数のサブ文字の組み合わせでありうる古代中国文字の複雑な階層構造を考慮し、このトークナイザはまず文字検出によって文字境界を特定し、次に文字レベルとサブ文字レベルの両方で文字認識を行う。さらに,学術コミュニティを支援するために,100K以上の注釈付き文字画像スキャンを備えた,CBSの初の大規模データセットも構築した。我々のデータセット上に構築された品詞タグ付けタスクでは、このトークナイザを使うことで、主流のサブワードトークナイザと比較してF1スコアが相対的に5.5%向上する。我々の研究は、この特定の文字のさらなる調査に役立つだけでなく、他の形態の古代中国文字の研究を前進させる可能性も持っている。 This study presents a multi-modal multi-granularity tokenizer specifically designed for analyzing ancient Chinese scripts, focusing on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China. Considering the complex hierarchical structure of ancient Chinese scripts, where a single character may be a combination of multiple sub-characters, our tokenizer first adopts character detection to locate character boundaries, and then conducts character recognition at both the character and sub-character levels. Moreover, to support the academic community, we have also assembled the first large-scale dataset of CBSs with over 100K annotated character image scans. On the part-of-speech tagging task built on our dataset, using our tokenizer gives a 5.5% relative improvement in F1-score compared to mainstream sub-word tokenizers. Our work not only aids in further investigations of the specific script but also has the potential to advance research on other forms of ancient Chinese scripts.	翻訳日:2024-09-06 07:59:10 公開日:2024-09-02
# リコメンデーションのための多様性向上型コラボレーションメトリックラーニング Improved Diversity-Promoting Collaborative Metric Learning for Recommendation ( http://arxiv.org/abs/2409.01012v1 ) ライセンス: Link先を確認	Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang,	(参考訳) コラボレーティブ・メトリック・ラーニング(CML)は、最近、レコメンデーション・システム(RS)において一般的な方法として現れ、メトリック・ラーニングとコラボレーティブ・フィルタリングのギャップを埋めている。 RSの慣例に従い、既存の手法はモデル設計において単一のユーザ表現を利用する。本稿では,ユーザが複数のカテゴリの関心を持つ,困難なシナリオに焦点を当てる。この設定の下では、単一のユーザ表現は、特にアイテムカテゴリの分布が不均衡な場合に、嗜好バイアスを引き起こす可能性がある。この問題に対処するため,本稿では,一般に無視されがちなユーザの少数派の関心を考慮することを目指して,Diversity-Promoting Collaborative Metric Learning (DPCML) と呼ばれる新しい手法を提案する。 DPCMLの背景にある重要な考え方は、システム内の各ユーザに対して複数の表現の集合を導入し、アイテムに対するユーザの嗜好を、その埋め込み集合の中で最小のアイテム-ユーザ距離を取ることで集約することである。具体的には、2つの効果的な割り当て戦略をインスタンス化し、各ユーザに対して適切な数のベクトルを探索する。一方、マルチベクタ表現戦略により良く対応するために、Diversity Control Regularization Scheme (DCRS) を開発した。理論的には、DPCMLは従来のCMLよりも小さな一般化誤差を誘導できることを示す。さらに, CMLに基づくアプローチでは, ペアワイズ目的関数によって引き起こされる重い計算負担を軽減するために, 通常は負例サンプリング(negative sampling)を必要とする。本稿では,One-Way Partial AUC(OPAUC)の観点から広く採用されているハード・アウェア・サンプリングの基本的な限界を明らかにし,CMLに基づくパラダイムに対する効果的なサンプリング代替案を開発する。最後に、さまざまなベンチマークデータセットに関する包括的な実験は、DPCMLの有効性を物語っている。コードは https://github.com/statusrank/LibCML で入手できる。 Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), with the hope of considering the commonly ignored minority interest of the user. 
The key idea behind DPCML is to introduce a set of multiple representations for each user in the system where users' preference toward an item is aggregated by taking the minimum item-user distance among their embedding set. Specifically, we instantiate two effective assignment strategies to explore a proper quantity of vectors for each user. Meanwhile, a Diversity Control Regularization Scheme (DCRS) is developed to accommodate the multi-vector representation strategy better. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we notice that CML-based approaches usually require negative sampling to reduce the heavy computational burden caused by the pairwise objective therein. In this paper, we reveal the fundamental limitation of the widely adopted hard-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML. Code is available at https://github.com/statusrank/LibCML.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
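The aggregation rule described above (a user's preference for an item is the minimum distance between the item and any of the user's vectors) can be sketched as follows. Euclidean distance, the toy 2-D embeddings, and the function names are assumptions for illustration only; this is not the LibCML implementation and it omits training, the assignment strategies, and DCRS.

```python
import math


def user_item_distance(user_vectors, item_vector):
    """Smallest Euclidean distance between the item and any of the user's vectors."""
    return min(math.dist(u, item_vector) for u in user_vectors)


def rank_items(user_vectors, items):
    """Item names sorted from most to least preferred (smaller distance first)."""
    return sorted(items, key=lambda name: user_item_distance(user_vectors, items[name]))


# A hypothetical user with two interest vectors, far apart in the embedding space.
user = [(0.0, 0.0), (10.0, 10.0)]
items = {"a": (0.0, 1.0), "b": (9.0, 10.0), "c": (5.0, 5.0)}
print(rank_items(user, items))
```

With a single averaged user vector at (5, 5), item "c" would rank first even though it matches neither interest; taking the minimum over the set lets items near either interest rank highly, which is the minority-interest effect the abstract describes.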
# SeCo-INR: 医用画像超解像のための意味的条件付き暗黙的ニューラル表現 SeCo-INR: Semantically Conditioned Implicit Neural Representations for Improved Medical Image Super-Resolution ( http://arxiv.org/abs/2409.01013v1 ) ライセンス: Link先を確認	Mevan Ekanayake, Zhifeng Chen, Gary Egan, Mehrtash Harandi, Zhaolin Chen,	(参考訳) Implicit Neural Representations (INR)は、大規模なトレーニングデータセットを必要とせずに、信号の連続的な表現を学習する能力のために、最近ディープラーニングの分野を進歩させた。医用画像の超解像のためにINR法が研究されているが, 医用画像における局所的な事前情報への適応性は広く研究されていない。医用画像には、INRの精度と堅牢性を高めるために貴重な局所的事前情報を提供しうる、豊富な解剖学的区分が含まれている。本研究では,医用画像から得られる局所的な事前情報を用いてINRを条件付けし,高精度なモデルフィッティングと補間機能によって超解像を実現する,Semantically Conditioned INR (SeCo-INR) と呼ばれる新しいフレームワークを提案する。本フレームワークは、医用画像のセマンティックセグメンテーション特徴の連続表現を学習し、それを用いて画像の各セマンティック領域に対して最適なINRを導出する。我々は,いくつかの医用画像モダリティを用いてフレームワークを試験し,最先端の手法と比較して高い定量スコアとよりリアルな超解像出力を得た。 Implicit Neural Representations (INRs) have recently advanced the field of deep learning due to their ability to learn continuous representations of signals without the need for large training datasets. Although INR methods have been studied for medical image super-resolution, their adaptability to localized priors in medical images has not been extensively explored. Medical images contain rich anatomical divisions that could provide valuable local prior information to enhance the accuracy and robustness of INRs. In this work, we propose a novel framework, referred to as the Semantically Conditioned INR (SeCo-INR), that conditions an INR using local priors from a medical image, enabling accurate model fitting and interpolation capabilities to achieve super-resolution. Our framework learns a continuous representation of the semantic segmentation features of a medical image and utilizes it to derive the optimal INR for each semantic region of the image. We tested our framework using several medical imaging modalities and achieved higher quantitative scores and more realistic super-resolution outputs compared to state-of-the-art methods.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# 鳥の視点からストリートビューへ:潜在拡散モデルを用いた多様で条件に整合した画像の作成 From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model ( http://arxiv.org/abs/2409.01014v1 ) ライセンス: Link先を確認	Xiaojie Xu, Tianshuo Xu, Fulong Ma, Yingcong Chen,	(参考訳) 本研究では,Bird's-Eye View(BEV)生成を探索し,BEVマップを対応する多視点ストリートイメージに変換する。 BEVは、マルチセンサー融合を支援する統一空間表現として価値があり、様々な自律運転アプリケーションにおいて重要な役割を担っている。 BEVマップから正確なストリートビュー画像を作成することは、複雑な交通シナリオを描写し、運転アルゴリズムを強化するために不可欠である。同時に、拡散に基づく条件付き画像生成モデルは、多種多様で高品質かつ条件に整合した結果を生成でき、顕著な成果を示している。それでも、これらのモデルのトレーニングには、かなりのデータと計算資源が必要である。したがって、Stable Diffusionのような先進的なモデルを特定の条件付き生成タスク向けに微調整する方法の探求が、有望な道として現れる。本稿では,BEVレイアウトから画像を生成するための実用的なフレームワークを提案する。提案手法は,ニューラルビュー変換とストリート画像生成の2つの主要コンポーネントから構成される。ニューラルビュー変換フェーズは、BEVとパースペクティブビューの形状対応を学習することにより、BEVマップを整列されたマルチビューのセマンティックセグメンテーションマップに変換する。その後、ストリート画像生成フェーズでは、これらのセグメンテーションを、微調整された潜在拡散モデルを導く条件として利用する。この微調整プロセスにより、ビューとスタイルの一貫性が保証される。本モデルでは,交通状況下での大規模な事前学習拡散モデルの生成能力を活用し,多種多様かつ条件に整合したストリートビュー画像を生成する。 We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving applications. Creating accurate street-view images from BEV maps is essential for portraying complex traffic scenarios and enhancing driving algorithms. Concurrently, diffusion-based conditional image generation models have demonstrated remarkable outcomes, adept at producing diverse, high-quality, and condition-aligned results. Nonetheless, the training of these models demands substantial data and computational resources. Hence, exploring methods to fine-tune these advanced models, like Stable Diffusion, for specific conditional generation tasks emerges as a promising avenue. In this paper, we introduce a practical framework for generating images from a BEV layout. Our approach comprises two main components: the Neural View Transformation and the Street Image Generation. 
The Neural View Transformation phase converts the BEV map into aligned multi-view semantic segmentation maps by learning the shape correspondence between the BEV and perspective views. Subsequently, the Street Image Generation phase utilizes these segmentations as a condition to guide a fine-tuned latent diffusion model. This finetuning process ensures both view and style consistency. Our model leverages the generative capacity of large pretrained diffusion models within traffic contexts, effectively yielding diverse and condition-coherent street view images.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# Fed-MUnet:脳腫瘍セグメンテーションのためのマルチモーダル連合Unet Fed-MUnet: Multi-modal Federated Unet for Brain Tumor Segmentation ( http://arxiv.org/abs/2409.01020v1 ) ライセンス: Link先を確認	Ruojun Zhou, Lisha Qu, Lei Zhang, Ziming Li, Hongwei Yu, Bing Luo,	(参考訳) 深層学習に基づく手法は、単一モーダルおよびマルチモーダルの磁気共鳴イメージング(MRI)画像を用いた脳腫瘍のセグメンテーションに広く用いられている。最近の研究の多くは、診療所間のデータ共有という本質的な課題のために、集中的なトレーニングに重点を置いている。プライバシーの懸念を軽減するために、研究者は脳腫瘍のセグメンテーションタスクにフェデレートラーニング(FL)メソッドを導入した。しかし、現在のところそうした手法は単一モーダルMRIに焦点を当てており、マルチモーダルMRIについては限定的な研究しかなされていない。課題には、マルチモーダルMRIを用いたFLベースの手法の複雑な構造、大規模なパラメータ、過剰適合の問題などが含まれる。以上の課題に対処するため,我々は,FLトレーニングに適した脳腫瘍セグメンテーションのための新しいマルチモーダルFLフレームワーク(Fed-MUnet)を提案する。我々は、公開されているBraTS2022データセットを用いて、我々のアプローチを評価した。実験により,本フレームワークは分散学習とプライバシ保護というFLの特性を実現することを示す。増強腫瘍, 腫瘍コア, 腫瘍全体について、5つの主要指標の平均値はそれぞれ87.5%, 90.6%, 92.2%であり, プライバシーを保護しつつSOTA法よりも高い値を示した。パラメータ数、浮動小数点演算量(FLOP)、推論の観点では、Fed-MUnetは最先端のセグメンテーションバックボーンと比較してパレート最適であり、より高い性能を実現しつつプライバシー問題に対処する。私たちのコードは https://github.com/Arnold-Jun/Fed-MUnet でオープンソース化されています。 Deep learning-based techniques have been widely utilized for brain tumor segmentation using both single and multi-modal Magnetic Resonance Imaging (MRI) images. Most current studies focus on centralized training due to the intrinsic challenge of data sharing across clinics. To mitigate privacy concerns, researchers have introduced Federated Learning (FL) methods to brain tumor segmentation tasks. However, currently such methods are focusing on single modal MRI, with limited study on multi-modal MRI. The challenges include complex structure, large-scale parameters, and overfitting issues of the FL based methods using multi-modal MRI. To address the above challenges, we propose a novel multi-modal FL framework for brain tumor segmentation (Fed-MUnet) that is suitable for FL training. We evaluate our approach with the BraTS2022 datasets, which are publicly available. The experimental results demonstrate that our framework achieves FL nature of distributed learning and privacy preserving. 
For the enhancing tumor, tumor core and whole tumor, the means of five major metrics were 87.5%, 90.6% and 92.2%, respectively, which were higher than SOTA methods while preserving privacy. In terms of parameter count, quantity of floating-point operations (FLOPs) and inference, Fed-MUnet is Pareto optimal compared with the state-of-the-art segmentation backbone while achieving higher performance and addressing the privacy issue. Our codes are open-sourced at https://github.com/Arnold-Jun/Fed-MUnet.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# SINET:水中画像強調のためのスパース性駆動型解釈可能ニューラルネットワーク SINET: Sparsity-driven Interpretable Neural Network for Underwater Image Enhancement ( http://arxiv.org/abs/2409.01022v1 ) ライセンス: Link先を確認	Gargi Panda, Soumitra Kundu, Saumik Bhattacharya, Aurobinda Routray,	(参考訳) 水中画像の品質向上は海洋研究と技術の発展に不可欠である。この研究は、水中画像強調(UIE)タスクのためのスパース性駆動型解釈可能ニューラルネットワーク(SINET)を導入する。純粋な深層学習法とは異なり、我々のネットワークアーキテクチャは、新しいチャネル固有の畳み込みスパース符号化(CCSC)モデルに基づいており、基礎となる画像強調プロセスの良好な解釈性を保証する。 SINETの鍵となる特徴は、3つのスパース特徴推定ブロック(SFEB)を用いて3つの色チャネルから顕著な特徴を推定することである。 SFEBのアーキテクチャは、$\ell_1$ 正則化畳み込みスパース符号化 (CSC) 問題を解くための反復アルゴリズムをアンロールすることによって設計されている。我々の実験によると、SINETは$3873$倍低い計算複雑性で、最先端手法のPSNR値を$1.05$dB上回っている。 Improving the quality of underwater images is essential for advancing marine research and technology. This work introduces a sparsity-driven interpretable neural network (SINET) for the underwater image enhancement (UIE) task. Unlike pure deep learning methods, our network architecture is based on a novel channel-specific convolutional sparse coding (CCSC) model, ensuring good interpretability of the underlying image enhancement process. The key feature of SINET is that it estimates the salient features from the three color channels using three sparse feature estimation blocks (SFEBs). The architecture of SFEB is designed by unrolling an iterative algorithm for solving the $\ell_1$ regularized convolutional sparse coding (CSC) problem. Our experiments show that SINET surpasses state-of-the-art PSNR value by $1.05$ dB with $3873$ times lower computational complexity.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
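As background for "unrolling an iterative algorithm" for an $\ell_1$-regularized sparse coding problem, the sketch below runs a fixed number of ISTA (iterative soft-thresholding) steps, where each step plays the role of one network layer. The abstract does not name the algorithm that SINET unrolls, so ISTA is an assumption here; the code also uses a small dense dictionary `D` instead of convolutions and fixed (not learned) step size and threshold.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_step(z, x, D, lam, step):
    """One ISTA update for min_z 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    m, n = len(D), len(D[0])
    residual = [sum(D[i][j] * z[j] for j in range(n)) - x[i] for i in range(m)]
    grad = [sum(D[i][j] * residual[i] for i in range(m)) for j in range(n)]
    return [soft_threshold(z[j] - step * grad[j], step * lam) for j in range(n)]


def unrolled_ista(x, D, lam, step, num_layers):
    """Apply a fixed number of ISTA steps, starting from the zero code.

    In a learned (unrolled) network, `step`, `lam`, and `D` would typically
    become trainable per-layer parameters; here they are constants.
    """
    z = [0.0] * len(D[0])
    for _ in range(num_layers):
        z = ista_step(z, x, D, lam, step)
    return z


identity = [[1.0, 0.0], [0.0, 1.0]]
print(unrolled_ista([3.0, 0.5], identity, lam=1.0, step=1.0, num_layers=3))
```

For convergence the step size should not exceed the reciprocal of the largest eigenvalue of DᵀD; with the identity dictionary used above that bound is 1.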
# データ分割におけるランダム性による予測精度の変動と間隔推定による公正評価 Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation ( http://arxiv.org/abs/2409.01025v1 ) ライセンス: Link先を確認	Isao Goto,	(参考訳) 本稿では,機械学習アルゴリズムを用いて予測モデルを構築する際の「簡単な問題」に答えようとする。様々な疾患の診断および予測モデルは、大規模なコホート研究と機械学習アルゴリズムのデータを用いて提案されているが、その一般化性には課題がある。この課題のいくつかの原因が指摘されており、ランダムなデータセットの分割がその1つと考えられている。本研究では,AutoML(Automatic Machine Learning framework)とオープン糖尿病データを用いて,「初期状態」に依存した33,600の糖尿病診断モデルを構築し,その予測精度を評価した。その結果,予測精度は初期状態依存分布であった。この分布は正規分布に従うことができるため,予測モデルの精度を正確に比較するために,統計的間隔推定を用いて予測精度の予測間隔を推定する。 This paper attempts to answer a "simple question" in building predictive models using machine learning algorithms. Although diagnostic and predictive models for various diseases have been proposed using data from large cohort studies and machine learning algorithms, challenges remain in their generalizability. Several causes for this challenge have been pointed out, and partitioning of the dataset with randomness is considered to be one of them. In this study, we constructed 33,600 diabetes diagnosis models with "initial state" dependent randomness using autoML (automatic machine learning framework) and open diabetes data, and evaluated their prediction accuracy. The results showed that the prediction accuracy had an initial state-dependent distribution. Since this distribution could follow a normal distribution, we estimated the expected interval of prediction accuracy using statistical interval estimation in order to fairly compare the accuracy of the prediction models.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
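The interval-estimation step in the abstract above can be illustrated with a short sketch. It assumes, as the abstract suggests, that accuracies obtained from different random splits are approximately normally distributed, and it reports the range expected to contain a given share of single-run accuracies. Whether the paper uses this interval or a confidence interval for the mean is not stated in the abstract, and the accuracy values below are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def accuracy_interval(accuracies, coverage=0.95):
    """Normal-approximation range covering `coverage` of single-run accuracies."""
    mu = mean(accuracies)
    sigma = stdev(accuracies)  # sample standard deviation; needs >= 2 values
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * sigma, mu + z * sigma


# Hypothetical accuracies of the same pipeline trained with different random splits.
runs = [0.70, 0.72, 0.74, 0.76, 0.78]
low, high = accuracy_interval(runs)
print(f"{low:.3f} to {high:.3f}")
```

Under this reading, two models would be judged different only when one model's accuracy falls outside the other's interval, rather than by comparing two single-split numbers.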
# 顔偽造検出のための偽造手がかり発見の学習 Learning to Discover Forgery Cues for Face Forgery Detection ( http://arxiv.org/abs/2409.01030v1 ) ライセンス: Link先を確認	Jiahe Tian, Peng Chen, Cai Yu, Xiaomeng Fu, Xi Wang, Jiao Dai, Jizhong Han,	(参考訳) 偽造の手がかり(フォージェリーキュー)のピクセルレベルのアノテーションである操作マップの位置特定は、顔偽造検出において解釈可能な検出結果を提供するのに不可欠である。関連する学習目的は、検出器の分類性能を改善するための補助的なタスクとしても広く採用されているが、監督信号となる操作マップを得るために、ペアになった本物の顔と偽造された顔を比較する必要がある。この要件は、ペアのない顔への適用性を制限し、現実のシナリオとも矛盾する。さらに、使用される比較手法は、圧縮やアップサンプリングによって導入されたノイズを含む、変化したすべてのピクセルに注釈を付ける。このようなマップを監督として使用すると、利用可能な手がかりの学習が妨げられ、モデルが過剰適合しやすくなる。これらの問題に対処するために,ペアのない顔から偽造の手がかりを特定する,Forgery Cue Discovery (FoCus) と呼ばれる弱教師付きモデルを導入する。注意マップ内で偽造領域を特定すると主張する一部の検出器とは異なり、FoCusは、部分的かつ不正確な偽造の手がかりしか捕捉できないというそれらの欠点を回避するように設計されている。具体的には、分類中に偽造の手がかりを特定するための分類注意領域提案モジュールと、よりリッチな手がかりの学習を容易にするための補完学習モジュールを提案する。生成された操作マップは、顔偽造検出器を強化するためのより良い監督として機能しうる。提案したFoCusの操作マップの可視化は,既存手法と比較して高い解釈性とロバスト性を示す。 5つのデータセットと4つのマルチタスクモデルに対する実験は、FoCusがデータセット内およびデータセット間の両方の評価で有効であることを示す。 Locating manipulation maps, i.e., pixel-level annotation of forgery cues, is crucial for providing interpretable detection results in face forgery detection. Related learning objects have also been widely adopted as auxiliary tasks to improve the classification performance of detectors whereas they require comparisons between paired real and forged faces to obtain manipulation maps as supervision. This requirement restricts their applicability to unpaired faces and contradicts real-world scenarios. Moreover, the used comparison methods annotate all changed pixels, including noise introduced by compression and upsampling. Using such maps as supervision hinders the learning of exploitable cues and makes models prone to overfitting. To address these issues, we introduce a weakly supervised model in this paper, named Forgery Cue Discovery (FoCus), to locate forgery cues in unpaired faces. Unlike some detectors that claim to locate forged regions in attention maps, FoCus is designed to sidestep their shortcomings of capturing partial and inaccurate forgery cues. 
Specifically, we propose a classification attentive regions proposal module to locate forgery cues during classification and a complementary learning module to facilitate the learning of richer cues. The produced manipulation maps can serve as better supervision to enhance face forgery detectors. Visualization of the manipulation maps of the proposed FoCus exhibits superior interpretability and robustness compared to existing methods. Experiments on five datasets and four multi-task models demonstrate the effectiveness of FoCus in both in-dataset and cross-dataset evaluations.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# パラメータ効率の良い微調整におけるタスク特化方向のパワーの解放 Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning ( http://arxiv.org/abs/2409.01035v1 ) ライセンス: Link先を確認	Chongjie Si, Zhiyi Shi, Shifan Zhang, Xiaokang Yang, Hanspeter Pfister, Wei Shen,	(参考訳) 大規模な言語モデルは、下流のタスクで素晴らしいパフォーマンスを示すが、全てのパラメータを完全に微調整する際には、かなりのリソース消費を必要とする。これを軽減するために、LoRAのようなパラメータ効率の良い微調整(PEFT)戦略が開発されている。本稿では,PEFTにおいて大規模モデルを事前学習状態からタスク固有の強化へと移行させる上で重要となる、タスク固有方向の概念を掘り下げる。本稿では,これらの方向を明確に定義し,その特性と実用上の課題を探求する枠組みを提案する。そこで我々は,微調整の過程でタスク固有方向の影響を最大化し,目標タスクに対するモデル性能を向上させることを目的とした,新しいアプローチであるLoRA-Dashを導入する。広範な実験によりLoRA-Dashの有効性が示され、詳細な分析によりLoRA-Dashの基礎となるメカニズムがさらに明らかにされた。コードは https://github.com/Chongjie-Si/Subspace-Tuning で公開されている。 Large language models demonstrate impressive performance on downstream tasks, yet require extensive resource consumption when fully fine-tuning all parameters. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions--critical for transitioning large models from pre-trained states to task-specific enhancements in PEFT. We propose a framework to clearly define these directions and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of task-specific directions during the fine-tuning process, thereby enhancing model performance on targeted tasks. Extensive experiments have conclusively demonstrated the effectiveness of LoRA-Dash, and in-depth analyses further reveal the underlying mechanisms of LoRA-Dash. The code is available at https://github.com/Chongjie-Si/Subspace-Tuning.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# NYK-MS:カートゥーン・キャプション・データセットの多モードメタファーとサーカスム理解ベンチマーク NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset ( http://arxiv.org/abs/2409.01037v1 ) ライセンス: Link先を確認	Ke Chang, Hao Li, Junzhao Zhang, Yunfang Wu,	(参考訳) メタファーとサルカズムは人々のコミュニケーション、特にインターネットや10代の若者に人気があるミームにおいて一般的な比喩表現である。我々はNYK-MS(NewYorKer for Metaphor and Sarcasm)という新しいベンチマークを作成し、比喩理解タスクの1,583のサンプルと皮肉理解タスクの1,578のサンプルを含む。これらのタスクにはメタファ/サルカズムが含まれているか、どの単語やオブジェクトがメタファ/サルカズムを含んでいるか、何を風刺しているか、なぜそれがメタファ/サルカズムを含んでいるのか、そして7つのタスクのすべてが少なくとも3つのアノテーションによって十分に注釈付けされている。一貫性と品質を向上させるために、いくつかのラウンドでデータセットに注釈を付け、GUIとGPT-4Vを使って効率を上げる。ベンチマークに基づいて、多くの実験を行います。ゼロショット実験では,Large Language Models (LLM) とLarge Multi-modal Models (LMM) が分類タスクをうまく行うことができず,スケールが大きくなるにつれて,他の5つのタスクのパフォーマンスが向上することを示した。従来のプレトレインモデルを用いた実験では,拡張法とアライメント法により,ベンチマークが以前のデータセットと整合性を証明し,両モードの双方を理解するためにモデルが必要であることを示す。 Metaphor and sarcasm are common figurative expressions in people's communication, especially on the Internet or the memes popular among teenagers. We create a new benchmark named NYK-MS (NewYorKer for Metaphor and Sarcasm), which contains 1,583 samples for metaphor understanding tasks and 1,578 samples for sarcasm understanding tasks. These tasks include whether it contains metaphor/sarcasm, which word or object contains metaphor/sarcasm, what does it satirize and why does it contains metaphor/sarcasm, all of the 7 tasks are well-annotated by at least 3 annotators. We annotate the dataset for several rounds to improve the consistency and quality, and use GUI and GPT-4V to raise our efficiency. Based on the benchmark, we conduct plenty of experiments. In the zero-shot experiments, we show that Large Language Models (LLM) and Large Multi-modal Models (LMM) can't do classification task well, and as the scale increases, the performance on other 5 tasks improves. 
In the experiments on traditional pre-train models, we show the enhancement with augment and alignment methods, which prove our benchmark is consistent with previous dataset and requires the model to understand both of the two modalities.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# 街路地図を用いた降雨時のロバスト車両位置推定と追跡 Robust Vehicle Localization and Tracking in Rain using Street Maps ( http://arxiv.org/abs/2409.01038v1 ) ライセンス: Link先を確認	Yu Xiang Tan, Malika Meghjani,	(参考訳) GPSによる車両のローカライゼーションと追跡は、トンネル区間や密集した都市部でよく経験される不安定な位置情報に悩まされている。また、視覚オドメトリー(VO)と視覚慣性オドメトリー(VIO)は、視覚入力の閉塞やぼやけを引き起こす悪天候条件の影響を受けやすい。本稿では,道路網に基づく地図情報を用いた車両位置推定手法を提案し,特に降雨時やトンネル走行のような悪条件のシナリオにおいて,ドリフトするオドメトリ推定値と間欠的なGPS計測を補正する。具体的には、本手法は、間欠的なGPS、ドリフトするIMUおよびVOの推定値を2次元マップ情報と統合し、ロバストな車両のローカライゼーションと追跡を行う柔軟な融合アルゴリズムである。われわれのアプローチをMap-Fusionと呼んでいる。本提案手法は,晴天・降雨条件にまたがる異なる国々の地理的に多様な4つのデータセットに対して評価する。これらのデータセットには、トンネルやアンダーパスにおける視覚的に困難な区間も含まれている。マップ情報の統合により、Map-Fusionアルゴリズムは、すべてのデータセットにおいて最先端のVOおよびVIOアプローチの誤差を低減する。また,提案したアルゴリズムを,実環境において,ハードウェア制約のある移動ロボット上でリアルタイムに検証する。 Map-Fusionは、150mのルートにおいて、晴天で2.46m、雨天で6.05mの誤差を達成した。 GPS-based vehicle localization and tracking suffers from unstable positional information commonly experienced in tunnel segments and in dense urban areas. Also, both Visual Odometry (VO) and Visual Inertial Odometry (VIO) are susceptible to adverse weather conditions that cause occlusions or blur on the visual input. In this paper, we propose a novel approach for vehicle localization that uses street network based map information to correct drifting odometry estimates and intermittent GPS measurements, especially in adversarial scenarios such as driving in rain and tunnels. Specifically, our approach is a flexible fusion algorithm that integrates intermittent GPS, drifting IMU and VO estimates together with 2D map information for robust vehicle localization and tracking. We refer to our approach as Map-Fusion. We robustly evaluate our proposed approach on four geographically diverse datasets from different countries ranging across clear and rain weather conditions. These datasets also include challenging visual segments in tunnels and underpasses.
We show that with the integration of the map information, our Map-Fusion algorithm reduces the error of the state-of-the-art VO and VIO approaches across all datasets. We also validate our proposed algorithm in a real-world environment and in real-time on a hardware constrained mobile robot. Map-Fusion achieved 2.46m error in clear weather and 6.05m error in rain weather for a 150m route.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# 修正Q-ラーニングアルゴリズムを用いた多目的タスク学習の高速化 Accelerated Multi-objective Task Learning using Modified Q-learning Algorithm ( http://arxiv.org/abs/2409.01046v1 ) ライセンス: Link先を確認	Varun Prakash Rajamohan, Senthil Kumar Jagatheesaperumal,	(参考訳) ロボットは産業において広範囲の応用を見出す。近年,家庭シナリオにおいてもロボットの影響が急速に拡大している。 Q-learningアルゴリズムは、目標を達成するための報酬を最大化することを目的としている。本稿では,スケールド距離メトリックを用いたQ-learning(Q-SD)と呼ばれる,Q-learningアルゴリズムの修正版を提案する。このアルゴリズムはタスク学習を強化し、タスク完了をより意味のあるものにする。ロボットマニピュレータ(エージェント)は、テーブルクリーニングのタスクにQ-SDアルゴリズムを適用する。 Q-SDを用いて、エージェントは、マニピュレータの移動距離を最小化しながらタスクを達成するために必要なステップのシーケンスを取得する。テーブルを異なる次元のグリッドに分割する。1つ目は3×3、2つ目は4×4のグリッドである。 Q-SDアルゴリズムを用いて、これらの2つの環境で得られた最大成功率は、それぞれ86%と59%であった。さらに,従来のQ-learningアルゴリズムと比較して,これら2つの環境においてエージェントが移動した平均距離の減少は,それぞれ8.61%,6.7%であった。 Robots find extensive applications in industry. In recent years, the influence of robots has also increased rapidly in domestic scenarios. The Q-learning algorithm aims to maximise the reward for reaching the goal. This paper proposes a modified version of the Q-learning algorithm, known as Q-learning with scaled distance metric (Q-SD). This algorithm enhances task learning and makes task completion more meaningful. A robotic manipulator (agent) applies the Q-SD algorithm to the task of table cleaning. Using Q-SD, the agent acquires the sequence of steps necessary to accomplish the task while minimising the manipulator's movement distance. We partition the table into grids of different dimensions. The first has a grid count of 3 × 3, and the second has a grid count of 4 × 4. Using the Q-SD algorithm, the maximum success obtained in these two environments was 86% and 59%, respectively. Moreover, compared to the conventional Q-learning algorithm, the drop in average distance moved by the agent in these two environments using the Q-SD algorithm was 8.61% and 6.7%, respectively.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
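The abstract above does not give the Q-SD update rule, so the following is only a minimal sketch of one way a distance term could enter tabular Q-learning. The reward shaping in `scaled_distance_reward`, the `scale` parameter, and the state and action names are all assumptions made for illustration; only `q_update` is the standard Q-learning update.

```python
import math


def scaled_distance_reward(base_reward, src, dst, scale=0.1):
    """Hypothetical shaping (not from the paper): subtract a scaled
    Euclidean distance between two grid cells from the base reward."""
    distance = math.dist(src, dst)
    return base_reward - scale * distance


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update.

    `q` maps state -> {action: value}. Unknown next states count as 0.
    """
    next_values = q.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Usage: moving from cell (0, 0) to cell (3, 4) costs 0.1 * 5 of the reward.
q_table = {"s0": {"move": 0.0}, "s1": {"move": 1.0}}
reward = scaled_distance_reward(1.0, (0, 0), (3, 4))
q_update(q_table, "s0", "move", reward, "s1")
```

Under this assumed shaping, longer manipulator moves earn less reward, which is one plausible way to favour shorter cleaning sequences.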
# INTENTAS -- 微小重力のためのエンタングルメント強化原子センサー INTENTAS -- An entanglement-enhanced atomic sensor for microgravity ( http://arxiv.org/abs/2409.01051v1 ) ライセンス: Link先を確認	O. Anton, I. Bröckel, D. Derr, A. Fieguth, M. Franzke, M. Gärtner, E. Giese, J. S. Haase, J. Hamann, A. Heidt, S. Kanthak, C. Klempt, J. Kruse, M. Krutzik, S. Kubitza, C. Lotz, K. Müller, J. Pahl, E. M. Rasel, M. Schiemangk, W. P. Schleich, S. Schwertfeger, A. Wicht, L. Wörner,	(参考訳) INTENTASプロジェクトは、微小重力環境下で絡み合ったボース=アインシュタイン凝縮体(BEC)を利用した原子センサーを開発することを目的としている。この重要な成果は、絡み合いの強い感性と長い尋問時間の両方から恩恵を受ける測定能力を向上させるために必要である。このプロジェクトは、ハノーファーのアインシュタイン・エレベータの実験プラットフォームに特有のサイズ、重量、電力管理(SWaP)に関する重要な課題に対処している。この設計により、絡み目の生成と検出に不可欠な低騒音環境が確保される。さらに、この装置は、BECを全光学的に作成するための革新的なアプローチを特徴とし、様々な構成のためのフレキシブルなシステムを提供し、迅速なターンアラウンドタイムの要求を満たす。 Einstein-Elevatorにおけるこの技術の実証が成功すれば、宇宙への将来の展開の道が開けることになる。 The INTENTAS project aims to develop an atomic sensor utilizing entangled Bose-Einstein condensates (BECs) in a microgravity environment. This key achievement is necessary to advance the capability for measurements that benefit from both entanglement-enhanced sensitivities and extended interrogation times. The project addresses significant challenges related to size, weight, and power management (SWaP) specific to the experimental platform at the Einstein-Elevator in Hannover. The design ensures a low-noise environment essential for the creation and detection of entanglement. Additionally, the apparatus features an innovative approach to the all-optical creation of BECs, providing a flexible system for various configurations and meeting the requirements for rapid turnaround times. Successful demonstration of this technology in the Einstein-Elevator will pave the way for a future deployment in space, where its potential applications will unlock high-precision quantum sensing.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# 生成AIの文脈における文学メタファーの展望 A Perspective on Literary Metaphor in the Context of Generative AI ( http://arxiv.org/abs/2409.01053v1 ) ライセンス: Link先を確認	Imke van Heerden, Anil Bas,	(参考訳) 本研究は,創作テキスト生成と文芸理論の交わりにおいて,文芸メタファーの役割と,多種多様な意味を生み出す能力について考察する。この点において、文学的比喩は特定の言語の発展に不可欠である。原語の含意が文質を向上させるかどうかを検討するため,アフリカーンスでLSTMに基づく言語モデルを訓練した。ネットワークは、魅力的に斬新な音声の人物を含むフレーズを生成する。具体的には、AIがどのようにデファミリアライゼーション技術として活用されるかに重点を置いている。テキスト生成に関する文学的視点を提供することで、本論文は美的価値、解釈、評価に関する思慮に富んだ疑問を提起する。 At the intersection of creative text generation and literary theory, this study explores the role of literary metaphor and its capacity to generate a range of meanings. In this regard, literary metaphor is vital to the development of any particular language. To investigate whether the inclusion of original figurative language improves textual quality, we trained an LSTM-based language model in Afrikaans. The network produces phrases containing compellingly novel figures of speech. Specifically, the emphasis falls on how AI might be utilised as a defamiliarisation technique, which disrupts expected uses of language to augment poetic expression. Providing a literary perspective on text generation, the paper raises thought-provoking questions on aesthetic value, interpretation and evaluation.	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# Follow-Your-Canvas: 大規模コンテンツ生成による高解像度ビデオアウトペインティング Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation ( http://arxiv.org/abs/2409.01055v1 ) ライセンス: Link先を確認	Qihua Chen, Yue Ma, Hongfa Wang, Junkun Yuan, Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, Wei Liu,	(参考訳) 本稿では,大規模なコンテンツ生成を伴う高解像度ビデオアウトペインティングについて検討する。我々は、ビデオを大きくアウトペイントしようとする際に既存の手法が直面する一般的な問題として、低品質なコンテンツの生成とGPUメモリによる制限を挙げている。これらの課題に対処するため,Follow-Your-Canvas という拡散ベースの手法を提案する。基本設計は2つある。まず,「単発」のアウトペインティングという一般的な手法を使わずに,タスクを空間的ウィンドウに分散し,シームレスにマージする。これにより、GPUメモリに制約されることなく、どんなサイズや解像度の動画もアウトペイントできる。次に、ソース映像とその相対位置関係を各ウィンドウの生成過程に注入する。これにより、各ウィンドウ内の生成された空間レイアウトが、ソースビデオと調和する。これら2つの設計を組み合わせることで、空間的・時間的整合性を維持しつつ、リッチなコンテンツで高解像度のアウトペインティング映像を生成することができる。 Follow-Your-Canvas は 512X512 から 1152X2048 (9X) までといった大規模なビデオアウトペインティングに優れており、高品質で美的な結果が得られる。様々な解像度とスケールのセットアップで最高の定量的結果が得られる。コードはhttps://github.com/mayuelala/FollowYourCanvasで公開されている。 This paper explores higher-resolution video outpainting with extensive content generation. We point out common issues faced by existing methods when attempting to largely outpaint videos: the generation of low-quality content and limitations imposed by GPU memory. To address these challenges, we propose a diffusion-based method called Follow-Your-Canvas. It builds upon two core designs. First, instead of employing the common practice of "single-shot" outpainting, we distribute the task across spatial windows and seamlessly merge them. It allows us to outpaint videos of any size and resolution without being constrained by GPU memory. Second, the source video and its relative positional relation are injected into the generation process of each window. It makes the generated spatial layout within each window harmonize with the source video. Coupling with these two designs enables us to generate higher-resolution outpainting videos with rich content while keeping spatial and temporal consistency.
Follow-Your-Canvas excels in large-scale video outpainting, e.g., from 512X512 to 1152X2048 (9X), while producing high-quality and aesthetically pleasing results. It achieves the best quantitative results across various resolution and scale setups. The code is released on https://github.com/mayuelala/FollowYourCanvas	翻訳日:2024-09-06 07:49:16 公開日:2024-09-02
# No Peer, No Cry: フォールトインジェクションによるネットワークアプリケーションファズリング No Peer, no Cry: Network Application Fuzzing via Fault Injection ( http://arxiv.org/abs/2409.01059v1 ) ライセンス: Link先を確認	Nils Bars, Moritz Schloegel, Nico Schiller, Lukas Bernhard, Thorsten Holz,	(参考訳) ネットワーク対応アプリケーションは、特にインターネットに接続された場合、あらゆる種類の攻撃にさらされる。結果として、NginxやcurlのようなクライアントアプリケーションのようなWebサーバは、メモリ安全性違反を排除すべく、コードのセキュリティと強化にあらゆる努力を払っています。ファジングはソフトウェアのバグを発見するための最も成功したアプローチの1つだと証明されているからだ。しかし、ネットワークアプリケーションをファジィングすることに焦点を当てた驚くべき研究はほとんどない。基礎となる理由を研究すると、コミュニケーションのインタラクティブな性質、そのステートフルさ、交換メッセージの保護が典型的なファジィザを非効率にすることがわかった。記録されたメッセージを再生したり、それをオンザフライで修正しようとする試みは、特定のターゲットに対してのみ機能し、しばしば通信の早期終了につながる。本稿では、これらの課題を詳細に議論し、既存のプロトコル状態空間への取り組みの焦点がいかにして緩和し難いかを強調する。我々は、メッセージを変更するのではなく、フォールトインジェクションに依存する、根本的に異なるアプローチを提案する。効果的に、私たちはコミュニケーションピアの1つを、その出力がターゲットピアの期待に合わない奇妙な状態に強制します。重要なことは、この奇妙なピアはプロトコルメッセージを適切に暗号化/署名することができ、現在のファジィザの根本的な課題を克服できます。事実上、通信システムをそのままにしておくが、小さな汚職を発生させる。サーバまたはクライアントを奇妙なピアにすることができるので、クライアントサイドのネットワークアプリケーションを効果的にテストできるのは、私たちのアプローチが初めてです。 16の目標を評価した結果,Fuzztruction-Netは,他のファジィよりもカバー範囲やバグの点で優れていることがわかった。全体として、Fuzztruction-Netは、WebサーバのNginxやApache HTTPd、OpenSSHクライアントなど、よくテストされたソフトウェアの23のバグを発見した。 Network-facing applications are commonly exposed to all kinds of attacks, especially when connected to the internet. As a result, web servers like Nginx or client applications such as curl make every effort to secure and harden their code to rule out memory safety violations. One would expect this to include regular fuzz testing, as fuzzing has proven to be one of the most successful approaches to uncovering bugs in software. Yet, surprisingly little research has focused on fuzzing network applications. When studying the underlying reasons, we find that the interactive nature of communication, its statefulness, and the protection of exchanged messages render typical fuzzers ineffective. 
Attempts to replay recorded messages or modify them on the fly only work for specific targets and often lead to early termination of communication. In this paper, we discuss these challenges in detail, highlighting how the focus of existing work on protocol state space promises little relief. We propose a fundamentally different approach that relies on fault injection rather than modifying messages. Effectively, we force one of the communication peers into a weird state where its output no longer matches the expectations of the target peer, potentially uncovering bugs. Importantly, this weird peer can still properly encrypt/sign the protocol message, overcoming a fundamental challenge of current fuzzers. In effect, we leave the communication system intact but introduce small corruptions. Since we can turn either the server or the client into the weird peer, our approach is the first that can effectively test client-side network applications. Evaluating 16 targets, we show that Fuzztruction-Net outperforms other fuzzers in terms of coverage and bugs found. Overall, Fuzztruction-Net uncovered 23 new bugs in well-tested software, such as the web servers Nginx and Apache HTTPd and the OpenSSH client.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# ランダム消去によるモデル反転攻撃に対する防御 Defending against Model Inversion Attacks via Random Erasing ( http://arxiv.org/abs/2409.01062v1 ) ライセンス: Link先を確認	Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ngai-man Cheung,	(参考訳) Model Inversion(MI)は、マシンラーニングモデルの悪用を通じてプライベートトレーニングデータを再構築することに焦点を当てた、プライバシー侵害の一種である。 MI攻撃に対抗するため、SOTA(State-of-the-art)MI防衛法はトレーニング損失と矛盾する正規化に依存し、プライバシ保護とモデルユーティリティの間に明確な緊張を生じさせる。本稿では,MI攻撃に対する防御方法を新たに提案する。我々の手法は新たな視点を採り、トレーニングデータに焦点をあてる。我々のアイデアは、過去にデータ拡張技術として応用されたランダム消去(RE)に関する新しい知見に基づいており、閉塞下でのモデルの精度を向上させる。我々の研究では、MI攻撃精度の劣化にREを適用することに重点を置いている。我々の重要な洞察は、MI攻撃は高次元のプライベートイメージを再構築するために、モデル内に符号化された大量のプライベートトレーニングデータ情報を必要とすることである。そこで本研究では,トレーニング中のモデルに提示されるプライベート情報を減らすためにREを適用することを提案する。その結果,MI復元の精度が著しく低下し,攻撃精度が低下する可能性が示唆された。一方、モデルの自然な精度は適度にしか影響しない。本手法は,既存の防衛手法を実装・補完することが極めて容易である。提案手法は,モデルのプライバシと実用性のバランスをとる上で,SOTAの性能を実現することができることを示す。その結果,MI攻撃,ネットワークアーキテクチャ,アタック構成にまたがる既存の防御よりも,我々の手法が優れていることを一貫して示している。 Model Inversion (MI) is a type of privacy violation that focuses on reconstructing private training data through abusive exploitation of machine learning models. To defend against MI attacks, state-of-the-art (SOTA) MI defense methods rely on regularizations that conflict with the training loss, creating explicit tension between privacy protection and model utility. In this paper, we present a new method to defend against MI attacks. Our method takes a new perspective and focuses on training data. Our idea is based on a novel insight on Random Erasing (RE), which has been applied in the past as a data augmentation technique to improve the model accuracy under occlusion. In our work, we instead focus on applying RE for degrading MI attack accuracy. Our key insight is that MI attacks require significant amount of private training data information encoded inside the model in order to reconstruct high-dimensional private images. Therefore, we propose to apply RE to reduce private information presented to the model during training. 
We show that this can lead to substantial degradation in MI reconstruction quality and attack accuracy. Meanwhile, natural accuracy of the model is only moderately affected. Our method is very simple to implement and complementary to existing defense methods. Our extensive experiments of 23 setups demonstrate that our method can achieve SOTA performance in balancing privacy and utility of the models. The results consistently demonstrate the superiority of our method over existing defenses across different MI attacks, network architectures, and attack configurations.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
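Random Erasing, as summarised in the abstract above, hides part of each training image. A minimal sketch of the basic operation follows, assuming a grayscale image stored as a list of rows. It is deliberately simplified: the rectangle has a fixed area fraction and a constant fill, and the function name and parameters are illustrative, not taken from the paper or from any library.

```python
import random


def random_erase(image, area_frac=0.25, fill=0, rng=None):
    """Return a copy of `image` with one random rectangle set to `fill`.

    `image` is a list of equal-length rows. The rectangle keeps the
    image's aspect ratio and covers roughly `area_frac` of the pixels.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Scale both sides by sqrt(area_frac) so the area scales by area_frac.
    erase_h = max(1, min(height, round(height * area_frac ** 0.5)))
    erase_w = max(1, min(width, round(width * area_frac ** 0.5)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    erased = [row[:] for row in image]  # copy rows; the input is untouched
    for y in range(top, top + erase_h):
        for x in range(left, left + erase_w):
            erased[y][x] = fill
    return erased


# Usage: erase about a quarter of an 8x8 image, reproducibly.
sample = [[1] * 8 for _ in range(8)]
masked = random_erase(sample, area_frac=0.25, rng=random.Random(0))
```

Applying such a transform to every training image means the model never sees a complete private image, which is the intuition the abstract gives for the reduced inversion accuracy.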
# ハイブリッドアクティブ推論モデルにおける学習 Learning in Hybrid Active Inference Models ( http://arxiv.org/abs/2409.01066v1 ) ライセンス: Link先を確認	Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley,	(参考訳) 人工知能におけるオープンな問題は、システムが本質的に連続的な問題を解決するのに有用な離散的な抽象化を柔軟に学習する方法である。計算神経科学におけるこれまでの研究は、能動的推論の形式主義の下で意思決定中に離散変数と連続変数を機能的に統合することを検討した(Parr, Friston & de Vries, 2017; Parr & Friston, 2018)。しかし、その焦点はカテゴリー決定の表現的物理的実装であり、階層的混合生成モデルが知られていると仮定される。結果として、このフレームワークが学習にどのように拡張されるのかは不明だ。そこで本研究では,高レベル離散型アクティブ・推論・プランナが低レベル連続型アクティブ・推論・コントローラの上に位置する,新しい階層型ハイブリッド・アクティブ・推論・エージェントを提案する。複素連続力学の断片的線形分解による有意な離散表現のエンドツーエンド学習を実現するリカレントスイッチング線形力学系(rSLDS)の最近の研究を活用している(Linderman et al , 2016)。 rSLDSが学習した表現は,(1)オプションフレームワークを連想させる手法で時間的に制約されたサブゴールを指定できるようにし,(2)情報理論的な探索ボーナスを活用できるように,(2)離散空間への探索を解除し,(3)離散プランナーの低レベル問題に対する近似解を「キャッシュ」する。提案手法を連続マウンテンカータスクに適用し,探索の強化による高速なシステム識別と,抽象的なサブゴールのデライン化による計画成功を実証する。 An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS) which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). 
The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space allowing us to exploit information-theoretic exploration bonuses and (3) `cache' the approximate solutions to low-level problems in the discrete planner. We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning through the delineation of abstract sub-goals.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# グローバル・ローカル・デフォルマブル・トランスフォーメーションによるプログレッシブ網膜画像登録 Progressive Retinal Image Registration via Global and Local Deformable Transformations ( http://arxiv.org/abs/2409.01068v1 ) ライセンス: Link先を確認	Yepeng Liu, Baosheng Yu, Tian Chen, Yuliang Gu, Bo Du, Yongchao Xu, Jun Cheng,	(参考訳) 網膜画像登録は眼科診断過程において重要な役割を担っている。異なる網膜画像間の視角や解剖学的構造にばらつきがあるため、キーポイントベースのアプローチは、その堅牢性と低レイテンシにより、網膜画像登録の主流となる。これらの手法は通常、網膜表面が平面であると仮定し、画像間の大域的な変換を表すホモグラフィ行列を得るために特徴マッチングを採用する。しかし、このような平面仮説は、網膜表面がほぼ湾曲しているため、必然的に登録誤差を生じさせる。この制限は、視角に有意な差がある画像対を登録する場合に顕著である。この問題に対処するため,HybridRetinaと呼ばれるハイブリッドレジストレーションフレームワークを提案する。そこで我々は,GAMorphと呼ばれるキーポイント検出器と変形ネットワークを用いて,大域的な変換と局所的な変形可能な変換を推定する。具体的には,GAMorphのトレーニングを指導するために,多段階の画素関係知識を統合する。さらに,画像の幾何学的先行を含むエッジアテンションモジュールを利用することで,画像の変形領域が臨床的興味のある血管領域により集中することを保証する。 FIREとFLoRI21という2つの広く使われているデータセットの実験により、提案したHybridRetinaは最先端の手法よりも大幅に優れていることが示された。コードはhttps://github.com/lyp-deeplearning/awesome-retinal-registrationで公開されている。 Retinal image registration plays an important role in the ophthalmological diagnosis process. Since there exist variances in viewing angles and anatomical structures across different retinal images, keypoint-based approaches become the mainstream methods for retinal image registration thanks to their robustness and low latency. These methods typically assume the retinal surfaces are planar, and adopt feature matching to obtain the homography matrix that represents the global transformation between images. Yet, such a planar hypothesis inevitably introduces registration errors since retinal surface is approximately curved. This limitation is more prominent when registering image pairs with significant differences in viewing angles. To address this problem, we propose a hybrid registration framework called HybridRetina, which progressively registers retinal images with global and local deformable transformations. 
For that, we use a keypoint detector and a deformation network called GAMorph to estimate the global transformation and local deformable transformation, respectively. Specifically, we integrate multi-level pixel relation knowledge to guide the training of GAMorph. Additionally, we utilize an edge attention module that includes the geometric priors of the images, ensuring the deformation field focuses more on the vascular regions of clinical interest. Experiments on two widely-used datasets, FIRE and FLoRI21, show that our proposed HybridRetina significantly outperforms some state-of-the-art methods. The code is available at https://github.com/lyp-deeplearning/awesome-retinal-registration.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# 大規模量子ネットワーク展開のための青写真 A blueprint for large-scale quantum-network deployments ( http://arxiv.org/abs/2409.01069v1 ) ライセンス: Link先を確認	Alberto Sebastián-Lombraña, Hans H. Brunner, Juan P. Brito, Rubén B. Méndez, Rafael J. Vicente, Jaime S. Buruaga, Laura Ortiz, Chi-Hang Fred Fung, Momtchil Peev, José M. Rivas-Moscoso, Felipe Jiménez, Antonio Pastor, Diego R. López, Jesús Folgueira, Vicente Martín,	(参考訳) 量子通信(Quantum Communications)は、暗号、量子コンピューティング、クロック同期などの潜在的な応用の進歩を約束する分野である。しかし、量子現象に基づく通信は外部の障害からの極度の分離を必要とし、古典的な現象と共に量子信号の伝送が困難になる。すでに展開されている光ネットワークにおいて、レガシトラフィックも持つ量子通信を導入するために、さまざまな技術がテストされている。これは物理的なレイヤだけでなく、運用層や管理層でも問題が発生します。ネットワーク運用者の間で広く受け入れられるためには、量子的資源と古典的資源の共同管理と運用、標準の遵守、品質と法的保証に対処する必要がある。この記事では、MadQCI(Madrid Quantum Communication Infrastructure)テストベッドにデプロイされ、評価された上記の問題に対するソリューションの詳細な説明を紹介する。このネットワークは、2つの異なるオペレータのプロダクションノードに複数のプロバイダから量子キー分散モジュールをインストールすることで、通信エコシステムに量子通信を統合するように設計されている。モジュールは130km以上の光ファイバーを配置した光スイッチネットワークを介して接続された。テストは、既存の古典的ネットワークのレガシートラフィックを保護する厳格なサービスレベルの合意に従って実施された。目標は、光学トランスポートと暗号化の変更を制限し、可能な限り多くの標準に準拠しながら、あらゆるレベルで完全な量子古典互換を実現することであった。この取り組みは、大規模な量子ネットワーク展開の基盤として使用できるブループリントとして機能することを目的としていた。 MadQCIの機能を示すために、エンドツーエンドの暗号化サービスがデプロイされ、さまざまなユースケースが紹介された。 Quantum Communications is a field that promises advances in cryptography, quantum computing and clock synchronisation, among other potential applications. However, communication based on quantum phenomena requires an extreme level of isolation from external disturbances, making the transmission of quantum signals together with classical ones difficult. A range of techniques has been tested to introduce quantum communications in already deployed optical networks which also carry legacy traffic. This comes with challenges, not only at the physical layer but also at the operations and management layer. 
To achieve a broad acceptance among network operators, the joint management and operation of quantum and classical resources, compliance with standards, and quality and legal assurance need to be addressed. This article presents a detailed account of solutions to the above issues, deployed and evaluated in the MadQCI (Madrid Quantum Communication Infrastructure) testbed. This network is designed to integrate quantum communications in the telecommunications ecosystem by installing quantum-key-distribution modules from multiple providers in production nodes of two different operators. The modules were connected through an optical-switched network with more than 130 km of deployed optical fibre. The tests were done in compliance with strict service level agreements that protected the legacy traffic of the pre-existing classical network. The goal was to achieve full quantum-classical compatibility at all levels, while limiting the modifications of optical transport and encryption and complying with as many standards as possible. This effort was intended to serve as a blueprint, which can be used as the foundation of large-scale quantum network deployments. To demonstrate the capabilities of MadQCI, end-to-end encryption services were deployed and a variety of use-cases were showcased.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# VideoLLaMB:リカレントメモリブリッジによる長文ビデオ理解 VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges ( http://arxiv.org/abs/2409.01071v1 ) ライセンス: Link先を確認	Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng,	(参考訳) 近年の大規模ビデオ言語モデルの進歩は、リアルタイムプランニングや詳細なインタラクションにおいて大きな可能性を秘めている。しかし、それらの高い計算要求と注釈付きデータセットの不足は、学術研究者にとって実用性を制限している。本稿では,橋梁内の時間的メモリトークンを利用して,歴史的視覚データとともにビデオシーケンス全体を符号化し,意味的連続性を効果的に保ち,様々なタスクにおけるモデル性能を向上させるための,新しいフレームワークであるVideoLLaMBを紹介する。このアプローチには、リカレントメモリトークンと、ビデオを独立したセマンティックユニットに分割してセマンティックな整合性を維持するSceneTillingアルゴリズムが含まれている。実証的に、VideoLLaMBは既存のビデオ言語モデルを大きく上回り、3つのVideoQAベンチマークで競合製品よりも5.5ポイント、エゴセントリックプランニングでは2.06ポイント改善されている。 MVBench の総合的な結果から, VideoLLaMB-7B は, 従来の 7B モデルと同等の LLM モデルよりも著しく良好な結果が得られることが示された。ビデオ長が最大8倍になるにもかかわらず、PLLaVAとして頑丈な性能を維持している。さらに,ビデオハイスタック(NIAVH)ベンチマークのフレーム検索結果から,長大なビデオ内の特定のフレームを正確に識別する VideoLLaMB の長所を検証した。我々のSceneTillingアルゴリズムは、追加のトレーニングを必要とせずに、ストリーミングビデオキャプションを直接生成することを可能にする。 16フレームでトレーニングされたVideoLLaMBは、リニアGPUメモリスケーリングを備えた1台のNvidia A100 GPU上で最大320フレームをサポートする。 Recent advancements in large-scale video-language models have shown significant potential for real-time planning and detailed interactions. However, their high computational demands and the scarcity of annotated datasets limit their practicality for academic researchers. In this work, we introduce VideoLLaMB, a novel framework that utilizes temporal memory tokens within bridge layers to allow for the encoding of entire video sequences alongside historical visual data, effectively preserving semantic continuity and enhancing model performance across various tasks. This approach includes recurrent memory tokens and a SceneTilling algorithm, which segments videos into independent semantic units to preserve semantic integrity. Empirically, VideoLLaMB significantly outstrips existing video-language models, demonstrating a 5.5 points improvement over its competitors across three VideoQA benchmarks, and 2.06 points on egocentric planning. 
Comprehensive results on the MVBench show that VideoLLaMB-7B achieves markedly better results than previous 7B models of same LLM. Remarkably, it maintains robust performance as PLLaVA even as video length increases up to 8 times. Besides, the frame retrieval results on our specialized Needle in a Video Haystack (NIAVH) benchmark, further validate VideoLLaMB's prowess in accurately identifying specific frames within lengthy videos. Our SceneTilling algorithm also enables the generation of streaming video captions directly, without necessitating additional training. In terms of efficiency, VideoLLaMB, trained on 16 frames, supports up to 320 frames on a single Nvidia A100 GPU with linear GPU memory scaling, ensuring both high performance and cost-effectiveness, thereby setting a new foundation for long-form video-language models in both academic and practical applications.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# 悪天候条件下でのロバストなオンラインドメイン適応セマンティックセグメンテーションに向けて Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions ( http://arxiv.org/abs/2409.01072v1 ) ライセンス: Link先を確認	Taorong Liu, Jing Xiao, Liang Liao, Chia-Wen Lin,	(参考訳) オンラインドメイン適応(OnDA)は、急激な気象イベントなど、ドメイン間の明確な境界を欠いたままモデルのデプロイ中に発生する予期せぬドメイン変更を、最小限のコストで処理するように設計されている。しかし、現在のドメインに適応するためにモデル自体にのみ依存する既存のOnDAメソッドは、連続的なドメインシフトの中で曖昧なクラスを誤って識別し、この誤った知識を次のドメインに渡してしまう。そこで本研究では,ドメインシフトを動的に検出し,トレーニングコストとエラー伝搬を最小化するようにハイパーパラメータを調整する,ロバストなオンラインドメイン適応セマンティックセグメンテーションフレームワークRODASS (Robust Online Domain Adaptive Semantic Segmentation) を提案する。具体的には、高度に乱れた領域を動的に選択してマスクすることで、曖昧なクラスにおけるエラーの蓄積を緩和し、動的自然環境における外部ノイズに対するモデルの堅牢性を高める、Dynamic Ambiguous Patch Mask (DAP Mask) 戦略を導入する。さらに、ターゲットドメインシーンをクラスレベルのソースバッファで拡張し、高い不確実性とノイズのあるラベルを低減し、適応を加速し、オンラインドメイン適応のためのより効率的なソリューションを提供するドメイン認識混合手法である、Dynamic Source Class Mix (DSC Mix) を提案する。提案手法は,約40フレーム/秒(FPS)を維持しながら,広く使用されているOnDAベンチマークの最先端手法より優れている。 Online Domain Adaptation (OnDA) is designed to handle unforeseeable domain changes at minimal cost that occur during the deployment of the model, lacking clear boundaries between domains, such as sudden weather events. However, existing OnDA methods that rely solely on the model itself to adapt to the current domain often misidentify ambiguous classes amidst continuous domain shifts and pass on this erroneous knowledge to the next domain. To tackle this, we propose RODASS, a Robust Online Domain Adaptive Semantic Segmentation framework, which dynamically detects domain shifts and adjusts hyper-parameters to minimize training costs and error propagation.
Specifically, we introduce the Dynamic Ambiguous Patch Mask (DAP Mask) strategy, which dynamically selects highly disturbed regions and masks these regions, mitigating error accumulation in ambiguous classes and enhancing the model's robustness against external noise in dynamic natural environments. Additionally, we present the Dynamic Source Class Mix (DSC Mix), a domain-aware mix method that augments target domain scenes with class-level source buffers, reducing the high uncertainty and noisy labels, thereby accelerating adaptation and offering a more efficient solution for online domain adaptation. Our approach outperforms state-of-the-art methods on widely used OnDA benchmarks while maintaining approximately 40 frames per second (FPS).	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# SCOPE: LLMの埋め込みによる手話文脈処理 SCOPE: Sign Language Contextual Processing with Embedding from LLMs ( http://arxiv.org/abs/2409.01073v1 ) ライセンス: Link先を確認	Yuqi Liu, Wenqian Zhang, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu,	(参考訳) 世界中の約7000万人の聴覚障害者が使用する手話は、視覚的および文脈的な情報を伝える視覚言語である。視覚に基づく手話認識(SLR)と翻訳(SLT)の現在の手法は、限られたデータセットの多様性と文脈に関連のある情報の無視により、対話シーンに苦慮している。これらの課題に対処するために,新しいコンテキスト認識型SLRおよびSLTフレームワークであるSCOPE(Sign Language Contextual Processing with Embedding from LLMs)を紹介する。 SLRでは,多モードエンコーダを用いて対話コンテキストを利用し,光度レベル認識を強化する。後続のSLTでは、事前の会話コンテキストを取り入れたLarge Language Model(LLM)をさらに微調整する。また,72時間の中国語手話ビデオを含む新しい手話データセットを,様々なシナリオにおける文脈対話に貢献する。我々のSCOPEフレームワークは,Phoenix-2014T,CSL-Daily,SCOPEデータセットなど,複数のデータセット上で最先端のパフォーマンスを実現している。さらに,Deafコミュニティの参加者による調査は,実世界の応用における我々のアプローチの堅牢性と有効性をさらに検証した。私たちのデータセットとコードはどちらも、さらなる研究を促進するためにオープンソース化されます。 Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign language Contextual Processing with Embedding from LLMs), a novel context-aware vision-based SLR and SLT framework. For SLR, we utilize dialogue contexts through a multi-modal encoder to enhance gloss-level recognition. For subsequent SLT, we further fine-tune a Large Language Model (LLM) by incorporating prior conversational context. We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T, CSL-Daily, and our SCOPE dataset. 
Moreover, surveys conducted with participants from the Deaf community further validate the robustness and effectiveness of our approach in real-world applications. Both our dataset and code will be open-sourced to facilitate further research.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# Bootstrap SGD:アルゴリズムの安定性とロバスト性 Bootstrap SGD: Algorithmic Stability and Robustness ( http://arxiv.org/abs/2409.01074v1 ) ライセンス: Link先を確認	Andreas Christmann, Yunwen Lei,	(参考訳) 本稿では,確率勾配降下(SGD)に対する経験的ブートストラップ法を用いて,分離可能なヒルベルト空間上の経験的リスクを最小限に抑える手法について,アルゴリズム的安定性と統計的ロバスト性の観点から検討する。最初の2つのアプローチは平均に基づいており、理論的観点から検討されている。アルゴリズム的安定性に基づくタイプ1とタイプ2のブートストラップSGDの一般化解析を行う。また、ブートストラップSGDを用いて、中央曲線の純粋に分布自由な点方向の信頼区間を構築することが可能であることを実証するために、ブートストラップSGDの別のタイプを提案する。 In this paper, some methods for using the empirical bootstrap approach for stochastic gradient descent (SGD) to minimize the empirical risk over a separable Hilbert space are investigated from the viewpoint of algorithmic stability and statistical robustness. The first two types of approaches are based on averages and are investigated from a theoretical point of view. A generalization analysis for bootstrap SGD of Type 1 and Type 2 based on algorithmic stability is carried out. Another type of bootstrap SGD is proposed to demonstrate that it is possible to construct purely distribution-free pointwise confidence intervals of the median curve using bootstrap SGD.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
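The bootstrap-plus-SGD idea in the entry above can be illustrated with a minimal sketch. It is not the paper's Type 1, Type 2, or third construction: it is a generic percentile bootstrap around a one-dimensional linear model, and every name in it (`sgd_fit`, `bootstrap_sgd_band`, the learning rate, the number of resamples) is a hypothetical choice made only for illustration.

```python
import random

def sgd_fit(points, lr=0.05, epochs=20, rng=None):
    """Fit y ~ a*x + b by plain SGD on the squared loss (illustrative model)."""
    rng = rng or random.Random(0)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:
            x, y = points[i]
            err = a * x + b - y
            a -= lr * err * x
            b -= lr * err
    return a, b

def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an already sorted list."""
    k = min(len(sorted_vals) - 1, max(0, round(q * (len(sorted_vals) - 1))))
    return sorted_vals[k]

def bootstrap_sgd_band(points, xs_eval, n_boot=50, alpha=0.1, seed=0):
    """Return (lower, median, upper) per evaluation point from bootstrap SGD fits."""
    rng = random.Random(seed)
    n = len(points)
    curves = []
    for _ in range(n_boot):
        # Resample the data with replacement, then rerun SGD on the resample.
        resample = [points[rng.randrange(n)] for _ in range(n)]
        a, b = sgd_fit(resample, rng=rng)
        curves.append([a * x + b for x in xs_eval])
    band = []
    for j in range(len(xs_eval)):
        vals = sorted(c[j] for c in curves)
        band.append((quantile(vals, alpha / 2),
                     quantile(vals, 0.5),
                     quantile(vals, 1 - alpha / 2)))
    return band
```

Calling `bootstrap_sgd_band(points, [-1.0, 0.0, 1.0])` on a list of `(x, y)` pairs yields one `(lower, median, upper)` triple per evaluation point. The stability and robustness guarantees discussed in the abstract are not reproduced by this toy.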
# 効率性を超えて: 一般化のための分子データプルーニング Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization ( http://arxiv.org/abs/2409.01081v1 ) ライセンス: Link先を確認	Dingshuo Chen, Zhixun Li, Yuyan Ni, Guibin Zhang, Ding Wang, Qiang Liu, Shu Wu, Jeffrey Xu Yu, Liang Wang,	(参考訳) 様々な分子タスクや大量のデータセットの出現により、効率的なトレーニングの実施は、この地域で急務だが未調査の課題となっている。データプルーニング(DP)は、トレーニングの負担を減らし、あまり影響力のないサンプルをフィルタリングし、トレーニングのコアセットを形成する。しかし、分子タスクの事前訓練モデルへの依存が高まると、従来のドメイン内DPメソッドは互換性がなくなる。そこで本研究では,データ解析を事前訓練したモデルに適用する,ソースフリーなデータ解析シナリオに焦点を当てた,拡張一般化(MolPeg)のための分子データ解析フレームワークを提案する。トレーニング中に異なる更新ペースで2つのモデルを維持することにより、損失差に基づいてサンプルの情報量を測定する新しいスコアリング機能を導入する。 MolPegはプラグイン・アンド・プレイのフレームワークとして、ソースドメインとターゲットドメインの両方の認識を実現し、4つの下流タスクで既存のDPメソッドを一貫して上回ります。注目すべきは、HIVおよびPCBAデータセット上のデータの60～70%をプルーニングしても、フルデータセットトレーニングから得られるパフォーマンスを上回ることができることだ。我々の研究は、効率的なデータ処理メトリクスの発見が、転送学習における効率の向上と優れた一般化の両方に有効な道をもたらすことを示唆している。 With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), as an oft-stated approach to saving training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. Therefore, we propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which focuses on the source-free data pruning scenario, where data pruning is applied with pretrained models. By maintaining two models with different updating paces during training, we introduce a novel scoring function to measure the informativeness of samples based on the loss discrepancy. As a plug-and-play framework, MolPeg realizes the perception of both source and target domain and consistently outperforms existing DP methods across four downstream tasks. 
Remarkably, it can surpass the performance obtained from full-dataset training, even when pruning up to 60-70% of the data on the HIV and PCBA datasets. Our work suggests that the discovery of effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
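The scoring idea in the MolPeg entry (two models updated at different paces, samples ranked by loss discrepancy) can be sketched as follows. This is a toy under stated assumptions, not the MolPeg implementation: the slow model is taken to be an exponential moving average of the fast one, the model is a single scalar weight, and keeping the samples with the largest discrepancy is an assumed selection rule. All function names are hypothetical.

```python
def squared_loss(w, x, y):
    """Squared error of the scalar linear model y ~ w * x."""
    return (w * x - y) ** 2

def prune_by_loss_discrepancy(data, w_online, w_reference, keep_ratio):
    """Keep the samples whose loss differs most between the two models."""
    scored = sorted(
        data,
        key=lambda xy: abs(squared_loss(w_online, *xy)
                           - squared_loss(w_reference, *xy)),
        reverse=True,
    )
    n_keep = max(1, int(len(data) * keep_ratio))
    return scored[:n_keep]

def train_with_pruning(data, epochs=5, lr=0.1, ema=0.9, keep_ratio=0.4):
    """Train a fast model on a pruned subset while a slow copy trails it."""
    w_online, w_reference = 0.0, 0.0
    for _ in range(epochs):
        kept = prune_by_loss_discrepancy(data, w_online, w_reference, keep_ratio)
        for x, y in kept:
            # Fast pace: one gradient step per retained sample.
            w_online -= lr * 2 * (w_online * x - y) * x
        # Slow pace: assumed exponential moving average of the fast model.
        w_reference = ema * w_reference + (1 - ema) * w_online
    return w_online, w_reference
```

With `keep_ratio=0.4`, each epoch trains on 40% of the data, which corresponds to the 60% pruning level mentioned in the abstract. Whether such pruning helps generalization in practice is the paper's empirical claim and is not demonstrated by this sketch.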
# 画像検索のための証拠変換器 Evidential Transformers for Improved Image Retrieval ( http://arxiv.org/abs/2409.01082v1 ) ライセンス: Link先を確認	Danilo Dordevic, Suryansh Kumar,	(参考訳) 本稿では,画像検索を改良し,頑健にするための不確実性駆動型トランスモデルであるEvidential Transformerを紹介する。本稿では,コンテンツベース画像検索(CBIR)にいくつかの貢献を行う。我々は,画像検索に確率的手法を取り入れ,堅牢で信頼性の高い結果を得る。さらに,Global Context Vision Transformer (GC ViT) アーキテクチャを利用して,複数のデータセットの最先端検索結果を改善する。 SOP(Stanford Online Products)とCUB-200-2011データセットのすべてのテスト設定でCBIRに新しいベンチマークを設定することで、我々のアプローチの信頼性を一貫して実証した。 We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. In this paper, we make several contributions to content-based image retrieval (CBIR). We incorporate probabilistic methods into image retrieval, achieving robust and reliable results, with evidential classification surpassing traditional training based on multiclass classification as a baseline for deep metric learning. Furthermore, we improve the state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, setting a new benchmark in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# フローマッチングによるアフォーマンス型ロボット操作 Affordance-based Robot Manipulation with Flow Matching ( http://arxiv.org/abs/2409.01083v1 ) ライセンス: Link先を確認	Fan Zhang, Michael Gienger,	(参考訳) 本稿では,人間を含むマルチタスクデータを収集する場合,特に日常の生活環境において,視覚的空き時間モデルに基づいて,ロボットの軌道を効果的に学習する,という2つの基本的な課題に焦点を当てた支援ロボット操作の枠組みを提案する。学習可能なテキストを凍結視覚モデルにプリペイドするパラメータ効率の高いプロンプトチューニング手法を用いて,マルチタスクシナリオにおける操作能力の予測を行う。そこで本研究では,教師付きフローマッチング手法を用いて,ロボットの軌道を手頃な価格で案内する手法を提案する。フローマッチングは、望まれるロボット軌道にランダムなウェイポイントを流れる条件プロセスとして、ロボットビズモータポリシーを表す。最後に、私たちのフレームワークをテストするために、デイリーリビングのアクティビティにまたがる10のタスクからなる現実世界のデータセットを紹介します。提案手法では, パラメータ効率を満足しつつ, 言語プロンサによる操作能力向上のためのプロンプトチューニング手法が, 競合性能を達成し, データスケールにおける他の微調整プロトコルよりも優れていた。単一フローマッチングポリシによるマルチタスクロボット軌道の学習も,特にマルチモーダルロボット動作分布を考慮すれば,代替動作クローン法よりも一貫してパフォーマンスが向上する。本フレームワークは,ロボット操作のためのフローマッチングにより,相性モデル学習と軌道生成をシームレスに統合する。 We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge by employing a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. Then we propose to learn robot trajectories guided by affordances in a supervised Flow Matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. 
Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordance with language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while satisfying parameter efficiency. Learning multi-task robot trajectories with a single flow matching policy also leads to consistently better performance than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
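The phrase "flowing random waypoints to desired robot trajectories" in the entry above can be made concrete with a small sketch of how flow matching training targets are commonly built. It assumes a straight-line path between a Gaussian noise sample and the target waypoints, which is one common choice and not necessarily the paper's. Conditioning on the affordance is omitted, and all function names are hypothetical.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to target x1 at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity field along the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_sample(x1, rng):
    """One training tuple (t, x_t, v_target) for a flattened waypoint vector x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # random waypoints
    t = rng.random()                        # time in [0, 1)
    return t, interpolate(x0, x1, t), target_velocity(x0, x1)

def euler_integrate(x0, velocity_fn, steps=10):
    """Generate a trajectory by integrating a velocity field from t=0 to t=1."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, a network would regress `v_target` from `(t, x_t)` and the conditioning input. At inference, `euler_integrate` would be called with the learned network in place of `velocity_fn`.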
# DPD編集:マルチモーダルファッション画像編集のための詳細保存拡散モデル DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing ( http://arxiv.org/abs/2409.01086v1 ) ライセンス: Link先を確認	Xiaolong Wang, Zhi-Qi Cheng, Jue Wang, Xiaojiang Peng,	(参考訳) ファッション画像編集は、デザイン概念をインタラクティブに視覚化することで、デザイナーが創造的なアイデアを伝える上で重要なツールである。現在のファッション画像編集技術は、マルチモーダルプロンプトと強力な拡散モデルによって進歩しているが、しばしば編集領域を正確に識別し、望ましいテクスチャの詳細を保存するのに苦労する。これらの課題に対処するために,我々は,Detail-Preserved Diffusion Models (DPDEdit) と呼ばれる潜在拡散モデルに基づく,新しいマルチモーダルなファッション画像編集アーキテクチャを導入する。 DPDEditは、テキストプロンプト、地域マスク、人間のポーズイメージ、衣料品のテクスチャイメージを統合することで、拡散モデルのファッション画像生成をガイドする。編集領域を正確に特定するために、まず、ユーザのテキスト記述に基づいて編集領域を予測し、他の条件と組み合わせてローカル編集を行う。テクスチャのテクスチャの詳細を対象のファッションイメージに転送するために,テクスチャ注入と精細化機構を提案する。具体的には、このメカニズムは、テキスト記述とテクスチャイメージを統合するために分離されたクロスアテンション層を使用し、補助的なU-Netを組み込んで、生成されたテクスチャテクスチャの高周波の詳細を保存する。さらに,マルチモーダルな言語モデルを用いてVITON-HDデータセットを拡張し,テクスチャ画像とテクスチャ記述を用いたペアサンプルを生成する。広汎な実験により,DPDEditは与えられたマルチモーダル入力と画像の忠実度とコヒーレンスの観点から,最先端の手法よりも優れていた。 Fashion image editing is a crucial tool for designers to convey their creative ideas by visualizing design concepts interactively. Current fashion image editing techniques, though advanced with multimodal prompts and powerful diffusion models, often struggle to accurately identify editing regions and preserve the desired garment texture detail. To address these challenges, we introduce a new multimodal fashion image editing architecture based on latent diffusion models, called Detail-Preserved Diffusion Models (DPDEdit). DPDEdit guides the fashion image generation of diffusion models by integrating text prompts, region masks, human pose images, and garment texture images. To precisely locate the editing region, we first introduce Grounded-SAM to predict the editing region based on the user's textual description, and then combine it with other conditions to perform local editing. To transfer the detail of the given garment texture into the target fashion image, we propose a texture injection and refinement mechanism. 
Specifically, this mechanism employs a decoupled cross-attention layer to integrate textual descriptions and texture images, and incorporates an auxiliary U-Net to preserve the high-frequency details of generated garment texture. Additionally, we extend the VITON-HD dataset using a multimodal large language model to generate paired samples with texture images and textual descriptions. Extensive experiments show that our DPDEdit outperforms state-of-the-art methods in terms of image fidelity and coherence with the given multimodal inputs.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# キーフレーズ予測のための事前学習言語モデル:レビュー Pre-Trained Language Models for Keyphrase Prediction: A Review ( http://arxiv.org/abs/2409.01087v1 ) ライセンス: Link先を確認	Muhammad Umair, Tangina Sultana, Young-Koo Lee,	(参考訳) キーフレーズ予測(KP)は、その内容を要約できる文書中のキーフレーズを特定するのに不可欠である。しかし、近年の自然言語処理(NLP)の進歩により、ディープラーニング技術を用いたより効率的なKPモデルが開発されている。事前学習言語モデルを用いたキーフレーズ抽出と生成の併用による包括的探索の制限は,文献における重要なギャップを浮き彫りにし,本研究は,この欠損を橋渡しし,過去の調査の限界に対処するための統一的かつ詳細な分析を提供するよう,我々の調査論文を説得する。そこで本研究では,キーフレーズ抽出(KPE)とキーフレーズ生成(KPG)の2種類のタスクについて,異なる学習技術(教師あり,教師なし,半教師付き,自己教師付き)を用いて,大規模テキストコーパスで学習する,キーフレーズ予測のための事前学習言語モデル(PLM-KP)のトピックを広く検討する。 PLM-KPE と KPG に適切な分類法を導入し,これらの2つの NLP の課題を強調した。さらに,キーフレーズの予測に期待できる今後の方向性を指摘する。 Keyphrase Prediction (KP) is essential for identifying keyphrases in a document that can summarize its content. However, recent Natural Language Processing (NLP) advances have led to more efficient KP models based on deep learning techniques. The lack of a comprehensive joint exploration of keyphrase extraction and generation using pre-trained language models highlights a critical gap in the literature, motivating this survey to bridge the gap and offer a unified, in-depth analysis that addresses the limitations of previous surveys. This paper extensively examines the topic of pre-trained language models for keyphrase prediction (PLM-KP), which are trained on large text corpora via different learning (supervised, unsupervised, semi-supervised, and self-supervised) techniques, to provide respective insights into these two types of tasks in NLP, namely Keyphrase Extraction (KPE) and Keyphrase Generation (KPG). We introduce appropriate taxonomies for PLM-KPE and KPG to highlight these two main tasks of NLP. Moreover, we point out some promising future directions for predicting keyphrases.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# 分散学習に基づくプライバシ保護記録リンクの実現に向けて Towards Split Learning-based Privacy-Preserving Record Linkage ( http://arxiv.org/abs/2409.01088v1 ) ライセンス: Link先を確認	Michail Zervas, Alexandros Karakasidis,	(参考訳) ユーザデータのプライバシが要求されるアプリケーションを容易にするために、Split Learningが最近導入された。しかし、プライバシ保存記録リンク(Privacy-Preserving Record Linkage)は、異なるデータ所有者のデータベース間で同一の現実世界のエンティティを識別する問題であるが、追加情報は開示されていない。本稿では,プライバシ保存記録マッチングのための分割学習の可能性について検討し,従来型の集中型SVM技術に対する最小のマッチング効果を示す参照セットの利用を通じて,新たなトレーニング手法を導入する。 Split Learning has been recently introduced to facilitate applications where user data privacy is a requirement. However, it has not been thoroughly studied in the context of Privacy-Preserving Record Linkage, a problem in which the same real-world entity should be identified among databases from different dataholders, but without disclosing any additional information. In this paper, we investigate the potentials of Split Learning for Privacy-Preserving Record Matching, by introducing a novel training method through the utilization of Reference Sets, which are publicly available data corpora, showcasing minimal matching impact against a traditional centralized SVM-based technique.	翻訳日:2024-09-06 07:38:47 公開日:2024-09-02
# CARIn:シングルDNNおよびマルチDNNワークロードのための不均一デバイスに対する制約認識と応答推論 CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads ( http://arxiv.org/abs/2409.01089v1 ) ライセンス: Link先を確認	Ioannis Panopoulos, Stylianos I. Venieris, Iakovos S. Venieris,	(参考訳) 近年のディープラーニングアプリケーションの絶え間ない拡大は、リアルタイム処理の急激な要求、プライバシーの懸念の高まり、さまざまなドメイン間のレイテンシの低減などによって、デバイス上での実行に対する重要なシフトを引き起こしている。本稿では,モバイルデバイス上でのディープニューラルネットワーク(DNN)の実行を最適化する上で,デバイスの不均一性,マルチDNN実行,動的ランタイム適応といった課題に対処する。 CARInは、ユーザ定義のサービスレベルの目的の下で、シングルDNNおよびマルチDNNアプリケーションの最適化デプロイ用に設計された新しいフレームワークである。 MOOソルバとして表現型多目的最適化フレームワークとランタイム対応ソート・検索アルゴリズム(RASS)を活用して、CARInは、マルチDNN実行に伴うリソース競合問題に対処しながら、動的条件への効率的な適応を容易にする。特に、RASSは一連の構成を生成し、その後の実行時適応を予測し、環境変動に応じて迅速に低オーバーヘッドの調整を行う。テキスト分類、シーン認識、顔分析など、さまざまなタスクにわたる広範囲な評価は、畳み込みニューラルネットワークやトランスフォーマー、現実的なユースケースなど、さまざまなモデルアーキテクチャにおけるCARInの汎用性を示している。現状のOODInフレームワークとは対照的に,単一モデルの設計では1.92倍,最大10.69倍に達した。さらに,マルチDNNアプリケーションにおいてハードウェアを意識しない設計に比べて,最大4.06倍の高速化を実現している。最後に,環境問題に対する最適設計の特定に係わる時間的オーバーヘッドを効果的に排除しつつ,その性能を維持する。 The relentless expansion of deep learning applications in recent years has prompted a pivotal shift toward on-device execution, driven by the urgent need for real-time processing, heightened privacy concerns, and reduced latency across diverse domains. This article addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices, with a focus on device heterogeneity, multi-DNN execution, and dynamic runtime adaptation. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications under user-defined service-level objectives. Leveraging an expressive multi-objective optimisation framework and a runtime-aware sorting and search algorithm (RASS) as the MOO solver, CARIn facilitates efficient adaptation to dynamic conditions while addressing resource contention issues associated with multi-DNN execution. 
Notably, RASS generates a set of configurations, anticipating subsequent runtime adaptation, ensuring rapid, low-overhead adjustments in response to environmental fluctuations. Extensive evaluation across diverse tasks, including text classification, scene recognition, and face analysis, showcases the versatility of CARIn across various model architectures, such as Convolutional Neural Networks and Transformers, and realistic use cases. We observe a substantial enhancement in the fair treatment of the problem's objectives, reaching 1.92x when compared to single-model designs and up to 10.69x in contrast to the state-of-the-art OODIn framework. Additionally, we achieve a significant gain of up to 4.06x over hardware-unaware designs in multi-DNN applications. Finally, our framework sustains its performance while effectively eliminating the time overhead associated with identifying the optimal design in response to environmental challenges.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# ディジタルツインネットワークの2時間同期とマイグレーション:マルチエージェント深部強化学習アプローチ Two-Timescale Synchronization and Migration for Digital Twin Networks: A Multi-Agent Deep Reinforcement Learning Approach ( http://arxiv.org/abs/2409.01092v1 ) ライセンス: Link先を確認	Wenshuai Liu, Yaru Fu, Yongna Guo, Fu Lee Wang, Wen Sun, Yan Zhang,	(参考訳) デジタル双生児(DT)は、物理的世界のリアルタイム状態を表現し、自己維持システムを実現するための有望なイネーブラーとして登場した。実際には、モバイルユーザ(MU)のような物理デバイスのDTは、レイテンシを低減するために、マルチアクセスエッジコンピューティング(MEC)ネットワークに一般的にデプロイされる。 DTの精度と忠実性を確保するためには、MUがDTと定期的にステータスを同期させることが不可欠である。しかし、MUモビリティはDT同期に重大な課題をもたらす。まず、MUモビリティはDTマイグレーションをトリガーし、同期障害を引き起こす可能性がある。次に、MUはDTの忠実性を保証するためにDTと頻繁に同期する必要がある。それでも、MUモビリティによって引き起こされるMECサーバ間のDTマイグレーションは、頻繁に発生する可能性がある。そこで本稿では,MUの長期平均エネルギー消費を最小限に抑えるために,非凸確率問題を確立することにより,信頼性を考慮した2段階のDT同期・マイグレーションフレームワークを提案する。我々はリアプノフ理論を用いて信頼性制約を変換し、新しい問題を部分的に観測可能なマルコフ決定過程(POMDP)として再構成する。さらに,ベータ分布(Beta-HAPPO)法による不均一なエージェント近似ポリシー最適化手法を開発し,その解法を提案する。シミュレーションの結果, 提案手法は, 他のベンチマークと比較すると, 省エネ性を大幅に向上することがわかった。 Digital twins (DTs) have emerged as a promising enabler for representing the real-time states of physical worlds and realizing self-sustaining systems. In practice, DTs of physical devices, such as mobile users (MUs), are commonly deployed in multi-access edge computing (MEC) networks for the sake of reducing latency. To ensure the accuracy and fidelity of DTs, it is essential for MUs to regularly synchronize their status with their DTs. However, MU mobility introduces significant challenges to DT synchronization. Firstly, MU mobility triggers DT migration which could cause synchronization failures. Secondly, MUs require frequent synchronization with their DTs to ensure DT fidelity. Nonetheless, DT migration among MEC servers, caused by MU mobility, may occur infrequently. Accordingly, we propose a two-timescale DT synchronization and migration framework with reliability consideration by establishing a non-convex stochastic problem to minimize the long-term average energy consumption of MUs. 
We use Lyapunov theory to convert the reliability constraints and reformulate the new problem as a partially observable Markov decision process (POMDP). Furthermore, we develop a heterogeneous agent proximal policy optimization with Beta distribution (Beta-HAPPO) method to solve it. Numerical results show that our proposed Beta-HAPPO method achieves significant improvements in energy savings when compared with other benchmarks.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# DS MYOLO: シナリオ駆動のためのSSMに基づく信頼性の高いオブジェクト検出器 DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios ( http://arxiv.org/abs/2409.01093v1 ) ライセンス: Link先を確認	Yang Li, Jianli Xiao,	(参考訳) 正確なリアルタイムオブジェクト検出により、高度な運転支援システムの安全性が向上し、運転シナリオに不可欠なコンポーネントとなる。ディープラーニング技術の急速な発展に伴い、CNNベースのリアルタイムオブジェクト検出器YOLOが注目されている。しかし、CNNのローカルな焦点はパフォーマンスのボトルネックをもたらす。検出器性能をさらに向上するため、研究者らはグローバルな受容場を利用するトランスフォーマーベースの自己認識機構を導入したが、その2次複雑さは計算コストを大幅に上回っている。最近、マンバは線形複雑であり、地球規模の選択的走査によって大きな進歩を遂げた。マンバの卓越した性能に触発されて,我々は新しい物体検出器DS MYOLOを提案する。この検出器は、単純化された選択的走査型融合ブロック(SimVSS Block)を通してグローバルな特徴情報をキャプチャし、ネットワークの深い特徴を効果的に統合する。さらに,計算複雑性を低く保ちながら,チャネル間の特徴的相互作用を向上させる効率的なチャネルアテンション畳み込み(ECAConv)を導入する。 CCTSDB 2021およびVLD-45駆動シナリオデータセットの大規模な実験により、DS MYOLOは、同様のスケールのYOLOシリーズのリアルタイムオブジェクト検出器において、大きな可能性と競争上の優位性を示すことが示された。 Accurate real-time object detection enhances the safety of advanced driver-assistance systems, making it an essential component in driving scenarios. With the rapid development of deep learning technology, CNN-based YOLO real-time object detectors have gained significant attention. However, the local focus of CNNs results in performance bottlenecks. To further enhance detector performance, researchers have introduced Transformer-based self-attention mechanisms to leverage global receptive fields, but their quadratic complexity incurs substantial computational costs. Recently, Mamba, with its linear complexity, has made significant progress through global selective scanning. Inspired by Mamba's outstanding performance, we propose a novel object detector: DS MYOLO. This detector captures global feature information through a simplified selective scanning fusion block (SimVSS Block) and effectively integrates the network's deep features. Additionally, we introduce an efficient channel attention convolution (ECAConv) that enhances cross-channel feature interaction while maintaining low computational complexity. 
Extensive experiments on the CCTSDB 2021 and VLD-45 driving scenarios datasets demonstrate that DS MYOLO exhibits significant potential and competitive advantage among similarly scaled YOLO series real-time object detectors.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# OCMG-Net:非構造点雲のニューラル配向正規化 OCMG-Net: Neural Oriented Normal Refinement for Unstructured Point Clouds ( http://arxiv.org/abs/2409.01100v1 ) ライセンス: Link先を確認	Yingrui Wu, Mingyang Zhao, Weize Quan, Jian Shi, Xiaohong Jia, Dong-Ming Yan,	(参考訳) 非構造点雲から指向性正規項を推定するための頑健な精錬法を提案する。計算の複雑さに悩まされたり、望ましい精度を達成できなかった従来の手法とは対照的に、我々の新しいフレームワークは、特徴空間に手話方向とデータ拡張を取り入れ、初期指向の正規性を洗練させ、効率と精度のバランスを損なう。従来の手法ではノイズによる方向の不整合の問題に対処するため,クリーンな点の雲に最も近い点でアノテートされた正規を補正することにより,推定誤差を忠実に最小化する,Chamfer Normal Distanceと呼ばれる新しい指標を導入する。このメトリクスは、課題に取り組むだけでなく、ネットワークトレーニングを支援し、ノイズに対するネットワークの堅牢性を大幅に向上させる。さらに,マルチスケールな局所的特徴集約と階層的幾何情報融合を統合し,複雑な幾何学的詳細をより効果的に捕捉し,スケール選択のあいまいさを顕著に低減する,革新的なデュアル並列アーキテクチャを提案する。室内および屋外シナリオ間の合成および実世界のデータセット間の非指向性および指向性正規推定タスクにおいて,本手法の優位性と汎用性を示す。コードはhttps://github.com/YingruiWoo/OCMG-Net.gitで公開されている。 We present a robust refinement method for estimating oriented normals from unstructured point clouds. In contrast to previous approaches that either suffer from high computational complexity or fail to achieve desirable accuracy, our novel framework incorporates sign orientation and data augmentation in the feature space to refine the initial oriented normals, striking a balance between efficiency and accuracy. To address the issue of noise-caused direction inconsistency existing in previous approaches, we introduce a new metric called the Chamfer Normal Distance, which faithfully minimizes the estimation error by correcting the annotated normal with the closest point found on the potentially clean point cloud. This metric not only tackles the challenge but also aids in network training and significantly enhances network robustness against noise. Moreover, we propose an innovative dual-parallel architecture that integrates Multi-scale Local Feature Aggregation and Hierarchical Geometric Information Fusion, which enables the network to capture intricate geometric details more effectively and notably reduces ambiguity in scale selection. 
Extensive experiments demonstrate the superiority and versatility of our method in both unoriented and oriented normal estimation tasks across synthetic and real-world datasets among indoor and outdoor scenarios. The code is available at https://github.com/YingruiWoo/OCMG-Net.git.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# 進化的ソフトアクター批判によるAIオリンピックの挑戦 AI Olympics challenge with Evolutionary Soft Actor Critic ( http://arxiv.org/abs/2409.01104v1 ) ライセンス: Link先を確認	Marco Calì, Alberto Sinigaglia, Niccolò Turcato, Ruggero Carli, Gian Antonio Susto,	(参考訳) 本報告では、IROS 2024で開催されるAIオリンピック大会の解決策について述べる。私たちのソリューションは、モデルフリーのDeep Reinforcement Learningアプローチと進化戦略を組み合わせています。使用したアルゴリズムを簡潔に記述し、そのアプローチの詳細を提供する。 In the following report, we describe the solution we propose for the AI Olympics competition held at IROS 2024. Our solution is based on a Model-free Deep Reinforcement Learning approach combined with an evolutionary strategy. We will briefly describe the algorithms that have been used and then provide details of the approach.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# Poster: O-RANセキュリティテストラボの開発 Poster: Developing an O-RAN Security Test Lab ( http://arxiv.org/abs/2409.01107v1 ) ライセンス: Link先を確認	Sotiris Michaelides, David Rupprecht, Katharina Kohls,	(参考訳) Open Radio Access Networks (ORAN) は、数年前に提案された新しいアーキテクチャアプローチであり、5Gの現在の次世代無線アクセスネットワーク(NG-RAN)の拡張である。 ORANは、さまざまなRadio Access Networks(RAN)コンポーネント間のオープンインターフェースを実装し、マシンラーニングや仮想化、デアグリゲーションといったモダンなテクノロジをRANに導入することで、少数のベンダによってコントロールされる、クローズドなRAN市場を破ることを目指している。しかし、ORANのアーキテクチャ設計は、そのセキュリティに関する懸念や議論を引き起こしており、これはその大きな欠点の1つと考えられている。 ORANに関するいくつかの理論的リスク分析が実施されているが、私たちの知る限りでは、まだ1つの実践的リスク解析も行われていない。本ポスターでは,ORAN 5Gネットワークを最小限かつ将来的に展開する手法について論じる。 Open Radio Access Networks (ORAN) is a new architectural approach, having been proposed only a few years ago, and it is an expansion of the current Next Generation Radio Access Networks (NG-RAN) of 5G. ORAN aims to break this closed RAN market that is controlled by a handful of vendors, by implementing open interfaces between the different Radio Access Networks (RAN) components, and by introducing modern technologies to the RAN like machine learning, virtualization, and disaggregation. However, the architectural design of ORAN was recently causing concerns and debates about its security, which is considered one of its major drawbacks. Several theoretical risk analyses related to ORAN have been conducted, but to the best of our knowledge, not even a single practical one has been performed yet. In this poster, we discuss and propose a way for a minimal, future-proof deployment of an ORAN 5G network, able to accommodate various hands-on security analyses for its different elements.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# SOOD-ImageNet:Semantic Out-Of-Distribution Image ClassificationとSemantic Segmentationのための大規模データセット SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation ( http://arxiv.org/abs/2409.01109v1 ) ライセンス: Link先を確認	Alberto Bacchin, Davide Allegro, Stefano Ghidoni, Emanuele Menegatti,	(参考訳) コンピュータビジョンにおけるアウト・オブ・ディストリビューション(OOD)の検出は重要な研究領域であり、関連するベンチマークは実際のシナリオにおけるモデルの一般化可能性とその適用性を評価する上で重要な役割を果たす。しかし、文献における既存のOODベンチマークには、1)潜在的な課題としてセマンティックシフトを見落としている場合が多く、(2)現代のモデルのトレーニングに使用される大規模なデータセットと比較して、その規模は限られている。これらのギャップに対処するために,OOD条件下でのイメージ分類やセマンティックセグメンテーションなどのコンピュータビジョンタスクのために設計された,56のクラスにまたがる約1.6万の画像からなる新しいデータセットSOOD-ImageNetを紹介し,セマンティックシフトの問題に焦点をあてる。我々は、人間の正確なチェックによって補完される現代の視覚言語モデルの能力を活用する革新的なデータエンジンを開発することで、必要なスケーラビリティと品質を確保した。我々は,SOOD-ImageNetにおける様々なモデルの広範囲なトレーニングと評価を通じて,OOD研究をコンピュータビジョンで大きく前進させる可能性を示す。プロジェクトページはhttps://github.com/bach05/SOODImageNet.gitで公開されている。 Out-of-Distribution (OOD) detection in computer vision is a crucial research area, with related benchmarks playing a vital role in assessing the generalizability of models and their applicability in real-world scenarios. However, existing OOD benchmarks in the literature suffer from two main limitations: (1) they often overlook semantic shift as a potential challenge, and (2) their scale is limited compared to the large datasets used to train modern models. To address these gaps, we introduce SOOD-ImageNet, a novel dataset comprising around 1.6M images across 56 classes, designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions, with a particular focus on the issue of semantic shift. We ensured the necessary scalability and quality by developing an innovative data engine that leverages the capabilities of modern vision-language models, complemented by accurate human checks. 
Through extensive training and evaluation of various models on SOOD-ImageNet, we showcase its potential to significantly advance OOD research in computer vision. The project page is available at https://github.com/bach05/SOODImageNet.git.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# 連続対称性群に対する量子スピン鎖の対称性保護状態の分類 Classification of symmetry protected states of quantum spin chains for continuous symmetry groups ( http://arxiv.org/abs/2409.01112v1 ) ライセンス: Link先を確認	Bruno de Oliveira Carvalho, Wojciech De Roeck, Tijl Jappens,	(参考訳) 量子スピン系の対称性保護状態(SPT)は、いくつかの著者によって研究された。有限オンサイト対称性群 $G$ に対応する SPT は、Kapustin et al [J. Math. Phys. (2021)] によって確立された第2コホモロジー群 $H^2(G,U(1))$ によって分類される。この結果はコンパクト位相対称性群 $G$ の場合に拡張する。我々はまた、我々の分類結果が局所的に有界なオンサイト次元を持つスピン鎖のクラスに収まるという意味で、既存の結果を強化する。 Symmetry protected states (SPT's) of quantum spin systems were studied by several authors. For one-dimensional systems (spin chains), there is an essentially complete and rigorous understanding: SPT's corresponding to finite on-site symmetry groups $G$ are classified by the second cohomology group $H^2(G,U(1))$, as established by Kapustin et al. [J. Math. Phys. (2021)]. We extend this result to the case of compact topological symmetry groups $G$. We also strengthen the existing results in the sense that our classification result holds within the class of spin chains with locally bounded on-site dimensions.	翻訳日:2024-09-06 07:26:52 published:2024-09-02
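For readers unfamiliar with the classifying object $H^2(G,U(1))$, the standard textbook definitions can be summarized as follows. This is a reminder under the simplifying assumption of a unitary (linear, not anti-linear) projective representation $u$ of $G$ with trivial action on $U(1)$. The symbols $u$, $\omega$ and $\beta$ are notation introduced here and not taken from the paper, and the regularity conditions on $\omega$ needed for compact topological groups are ignored.

```latex
\begin{align}
  u(g)\,u(h) &= \omega(g,h)\,u(gh), \\
  \omega(g,h)\,\omega(gh,k) &= \omega(h,k)\,\omega(g,hk), \\
  \omega'(g,h) &= \frac{\beta(g)\,\beta(h)}{\beta(gh)}\,\omega(g,h).
\end{align}
```

The first line defines the 2-cocycle $\omega : G \times G \to U(1)$, and the second is the cocycle condition that follows from associativity of $u(g)u(h)u(k)$. The third line shows that rephasing $u(g) \mapsto \beta(g)\,u(g)$ changes $\omega$ by a coboundary. $H^2(G,U(1))$ is the group of cocycles modulo coboundaries.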
# KMTalk:キーモーション埋め込みによる音声駆動型3D顔アニメーション KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding ( http://arxiv.org/abs/2409.01113v1 ) ライセンス: Link先を確認	Zhihao Xu, Shengjie Gong, Jiapeng Tang, Lingyu Liang, Yining Huang, Haojie Li, Shuangping Huang,	(参考訳) キーモーション埋め込みを用いた音声系列から3次元顔の動きを合成する新しい手法を提案する。データ駆動技術の最近の進歩にもかかわらず、音声信号と3D顔メッシュの正確なマッピングは依然として難しい。シーケンス全体の直接回帰は、しばしば問題の性質が不適切なため、過度に滑らかな結果をもたらす。そこで本研究では,キーモーションキャプチャを導入して3次元顔画像を生成するプログレッシブ学習機構を提案する。具体的には、言語ベースのキーモーション獲得とモーダル間動作完了という2つのモジュールを通して、言語とデータ駆動の先行情報を統合する。前者は重要な動きを識別し、関連する3D表情を学習し、正確な唇音声同期を保証する。後者は、キーモーションを音声機能によってガイドされた3D音声の完全なシーケンスに拡張し、時間的コヒーレンスとオーディオ-視覚的整合性を改善する。既存の最先端手法と比較して、より鮮明で一貫した会話顔アニメーションを生成する上で、我々のアプローチが優れていることを示す。提案手法を既存の手法と統合することにより,提案手法の有効性を裏付ける結果が得られた。コードと重みはプロジェクトのWebサイトにある: \url{https://github.com/ffxzh/KMTalk}。 We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. 
Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach. Our code and weights will be at the project website: https://github.com/ffxzh/KMTalk.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# ランダム畳み込みカーネルに基づく変換を用いた時系列分類:プーリング演算子と入力表現が重要である Time series classification with random convolution kernels based transforms: pooling operators and input representations matter ( http://arxiv.org/abs/2409.01115v1 ) ライセンス: Link先を確認	Mouhamadou Mansour Lo, Gildas Morvan, Mathieu Rossi, Fabrice Morganti, David Mercier,	(参考訳) 本稿では,SelF-Rocketと呼ばれるMiniRocketをベースとした,高速時系列分類(TSC)のための新しいアプローチを提案する。ランダムな畳み込みカーネルに基づく既存のアプローチとは異なり、トレーニングプロセス中に最適な入力表現とプーリング演算子を動的に選択する。 SelF-Rocketはカリフォルニア大学リバーサイド校(UCR)のベンチマークデータセットで最先端の精度を実現している。 This article presents a new approach based on MiniRocket, called SelF-Rocket, for fast time series classification (TSC). Unlike existing approaches based on random convolution kernels, it dynamically selects the best couple of input representations and pooling operator during the training process. SelF-Rocket achieves state-of-the-art accuracy on the University of California Riverside (UCR) TSC benchmark datasets.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
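The terms "input representation" and "pooling operator" in the entry above can be illustrated with a small sketch. It computes two pooling operators that are standard in the Rocket family (proportion of positive values, PPV, and max) on two representations of a series (raw and first difference). The kernel used here and the `features` helper are hypothetical, and the selection procedure that SelF-Rocket applies during training is not reproduced.

```python
def convolve_valid(series, kernel, bias=0.0):
    """Valid (no padding) 1D cross-correlation of a series with a kernel."""
    k = len(kernel)
    return [sum(w * series[i + j] for j, w in enumerate(kernel)) + bias
            for i in range(len(series) - k + 1)]

def ppv(values):
    """Proportion of positive values, the pooling operator used by MiniRocket."""
    return sum(v > 0 for v in values) / len(values)

def first_difference(series):
    """Alternative input representation: consecutive differences."""
    return [b - a for a, b in zip(series, series[1:])]

def features(series, kernel):
    """One feature per (input representation, pooling operator) pair."""
    out = {}
    for rep_name, rep in (("raw", series), ("diff", first_difference(series))):
        conv = convolve_valid(rep, kernel)
        out[(rep_name, "ppv")] = ppv(conv)
        out[(rep_name, "max")] = max(conv)
    return out
```

A selection step in the spirit of the abstract would then compare classifiers trained on each `(representation, pooling)` feature set and keep the best pair. How that comparison is carried out is left open here.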
# オープンソース・ソフトウェア・ソリューションの公共セクター買収におけるソフトロックイン--市町村営Eサービスプラットフォームを事例として Soft-lockins in Public Sector Acquisitions of Open Source Software-solutions: A Case Study on a Municipal E-Service Platform ( http://arxiv.org/abs/2409.01118v1 ) ライセンス: Link先を確認	Per Persson, Johan Linåker,	(参考訳) 背景: オープンソースソフトウェア(OSS)は、しばしばロックインのリスクを軽減するオプションと見なされる。しかし、単一ベンダのOSSは、知識の非対称性と技術的な障壁のために、依然としてソフトなロックインをもたらす可能性がある。 Aim: この研究は、このようなソフトロックインをレンダリングするアクターを調査します。研究設計: 190以上の自治体で使用されているE-service Platform(ESP)の質的なケーススタディを行う。結果: ユーザ主導のロックイン要因は, 限定的かつ不透明なコミュニケーション, 調達の制限的資格要件, 保守の混乱, 現状の快適性など, 重要なカテゴリーとして出現した。技術的なロックイン要因には、不十分なドキュメント、依存性管理の問題、限定的なテストカバレッジなどがある。結論: 自治体間の快適で保守的な文化の存在に対処するために、強いリーダーシップと継続的な訓練が必要である。オープンソースのStewards、すなわちOSSプロジェクトの中立的なホストは、これらのタスクにおいて自治体をサポートすると同時に、より広範なサプライヤエコシステムを実現するためのオープンで競争力のあるコラボレーションを促進するのに役立つ。 Background: Open Source Software (OSS) is often seen as an option to mitigate risks of lock-ins. Yet, single-vendor OSS can still result in soft lock-ins due to knowledge asymmetries and technical barriers. Aim: This study explores actors that render such soft lock-ins. Research design: We conduct a qualitative case study of an E-service Platform (ESP) used by more than 190 municipalities. Results: User-driven lock-in factors emerged as a significant category, including limited and non-transparent communication, restrictive qualification requirements in procurement, confusion on maintainership, and comfort in the status quo. Technical lock-in factors include inadequate documentation, dependency management issues, and limited test coverage. Conclusions: Strong leadership and continuous training are needed to address the comfortable, conservative culture present among municipalities. Open Source Stewards, i.e., neutral hosts for OSS projects, can support municipalities in these tasks while also helping to foster an open, competitive collaboration that can enable a broader supplier ecosystem.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# 非線形波動方程式の孤立波シミュレーションのための2段階初期値反復物理学インフォームドニューラルネットワーク Two-stage initial-value iterative physics-informed neural networks for simulating solitary waves of nonlinear wave equations ( http://arxiv.org/abs/2409.01124v1 ) ライセンス: Link先を確認	Jin Song, Ming Zhong, George Em Karniadakis, Zhenya Yan,	(参考訳) 従来の数値反復法と物理インフォームドニューラルネットワーク(PINN)に基づく、非線形波動方程式の孤立波計算のための新しい2段階初期値反復ニューラルネットワーク(IINN)アルゴリズムを提案する。具体的には、IINNフレームワークは2つのサブネットワークで構成され、そのうちの1つは与えられた初期値に適合するために使用され、もう1つは物理情報を取り込み、最初のサブネットワークに基づいてトレーニングを継続する。重要なことに、IINN法は、与えられた初期値を除いて、境界条件を含む追加のデータ情報を必要としない。提案手法の有効性を示すための理論的保証を提供する。提案したIINN法は,1次元非線形シュレーディンガー(NLS)方程式(ポテンシャルあり・なし),PT対称光学格子を持つ1次元飽和NLS方程式,1次元集束・非集束結合NLS方程式,KdV方程式,ポテンシャルを持つ2次元NLS方程式,ポテンシャルを持つ2次元修正GP方程式,(2+1)次元KP方程式,ポテンシャルを持つ3次元NLS方程式など,様々な非線形波動方程式における解の学習に効率的に適用される。これらの応用は,本手法の有効性を示す証拠となる。最後に,従来の手法と比較することにより,提案手法の利点を実証する。 We propose a new two-stage initial-value iterative neural network (IINN) algorithm for solitary wave computations of nonlinear wave equations based on traditional numerical iterative methods and physics-informed neural networks (PINNs). Specifically, the IINN framework consists of two subnetworks, one of which is used to fit a given initial value, and the other incorporates physical information and continues training on the basis of the first subnetwork. Importantly, the IINN method does not require any additional data information including boundary conditions, apart from the given initial value. Corresponding theoretical guarantees are provided to demonstrate the effectiveness of our IINN method. 
The proposed IINN method is efficiently applied to learn some types of solutions in different nonlinear wave equations, including the one-dimensional (1D) nonlinear Schrödinger (NLS) equation (with and without potentials), the 1D saturable NLS equation with PT-symmetric optical lattices, the 1D focusing-defocusing coupled NLS equations, the KdV equation, the two-dimensional (2D) NLS equation with potentials, the 2D amended GP equation with a potential, the (2+1)-dimensional KP equation, and the 3D NLS equation with a potential. These applications serve as evidence for the efficacy of our method. Finally, by comparing with the traditional methods, we demonstrate the advantages of the proposed IINN method.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# 雑音チャネル上の通信におけるロバスト表現の学習 Learning Robust Representations for Communications over Noisy Channels ( http://arxiv.org/abs/2409.01129v1 ) ライセンス: Link先を確認	Sudharsan Senthil, Shubham Paul, Nambi Seshadri, R. David Koilpillai,	(参考訳) ディープラーニング(DL)ベースの通信システムは、従来の数学的モデル化システムよりも利点がある。 FCNN(Fully Connected Neural Networks)は、ディープラーニングアーキテクチャである。最適化問題を解くことはよく知られているが、既存の文献では、通信モデルの堅牢な表現を学ばないことが示唆されている。本研究は,既存の古典モデルからインスピレーションを受けずに,エンドツーエンドの通信システムを学習するFCNNの可能性を探るものである。本研究は,厳密な電力制約の下でシンボルの堅牢な表現を生成するために,コスト関数の変動によるドメイン知識の付与が与える影響について検討する。さらに,Barlow Twinsフレームワークにインスパイアされた新しいエンコーダ構造を導入する。最後に,SNR(Signal to Noise Ratio)の感度について,しばしば見落とされがちな課題に対処し,通信システムにおけるその重要性を強調するトレーニング戦略を導入する。このような手法がより信頼性の高いモデルを生み出すことを実証する。 A deep learning (DL)-based communication system offers advantages over traditional mathematically modelled systems, as the former may be jointly optimized. FCNNs (Fully Connected Neural Networks) are common Deep Learning architectures. Though they are well known to solve optimization problems, existing literature suggests that they fail to learn robust representations for communication models. This work explores the potential of FCNNs to learn an end-to-end communication system without taking any inspiration from existing classical models. The study investigates the impact of imbibing domain knowledge by varying cost functions to generate robust representations of symbols under strict power constraints. Additionally, we introduce a novel encoder structure inspired by the Barlow Twins framework. Finally, we introduce a training strategy that addresses the often-overlooked issue of training Signal to Noise Ratio (SNR) sensitivity, highlighting its importance in communication systems. We demonstrate that such a method leads to more reliable models.	翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# 変性体からの絡み合い変換のための誤差指数 Error exponents for entanglement transformations from degenerations ( http://arxiv.org/abs/2409.01130v1 ) ライセンス: Link先を確認	Dávid Bugár, Péter Vrana,	(参考訳) 本稿では, 純粋な多粒子状態間の漸近型LOCC変換における速度と強い逆指数のトレードオフ関係について検討する。一対の状態の間の単一コピー確率変換は、速度 1 での漸近変換が可能であり、指数関数的に成功確率が減少することを意味する。しかし、漸近変換が非ゼロ確率で実現可能である可能性はあるが、同じ速度の有限個のコピーの間には、確率的にさえ変換が存在しない。そのような場合、最適成功確率が指数関数的に減少するかどうかは分かっていない。漸近的変換の実現可能性を示すための基本的な道具は変性である。任意の退化は、初期状態のコピーとGHZ状態のサブ線形数からターゲット状態の同じコピー数への確率的LOCC変換をもたらす。これらのプロトコルは自由に選択できるパラメータを含むが、選択は成功確率に影響を与える。本稿では、パラメータの漸近的最適選択を特徴付け、結果のプロトコルのエラー指数に対するシングルレター式を導出する。特にこれは、確率変換が退化から生じるときの成功確率の指数的な下界を意味する。 This paper explores the trade-off relation between the rate and the strong converse exponent for asymptotic LOCC transformations between pure multipartite states. Any single-copy probabilistic transformation between a pair of states implies that an asymptotic transformation at rate 1 is possible with an exponentially decreasing success probability. However, it is possible that an asymptotic transformation is feasible with nonzero probability, but there is no transformation between any finite number of copies with the same rate, even probabilistically. In such cases it is not known if the optimal success probability decreases exponentially or faster. A fundamental tool for showing the feasibility of an asymptotic transformation is degeneration. Any degeneration gives rise to a sequence of stochastic LOCC transformations from copies of the initial state plus a sublinear number of GHZ states to the same number of copies of the target state. These protocols involve parameters that can be freely chosen, but the choice affects the success probability. In this paper, we characterize an asymptotically optimal choice of the parameters and derive a single-letter expression for the error exponent of the resulting protocol. In particular, this implies an exponential lower bound on the success probability when the stochastic transformation arises from a degeneration.	
翻訳日:2024-09-06 07:26:52 公開日:2024-09-02
# 単眼画像から奥行きを理解できる大規模言語モデル Large Language Models Can Understanding Depth from Monocular Images ( http://arxiv.org/abs/2409.01133v1 ) ライセンス: Link先を確認	Zhongyi Xia, Tianzhao Wu,	(参考訳) 単眼深度推定はコンピュータビジョンアプリケーションにおいて重要な機能である。本稿では,大規模言語モデル(LLM)が,効率的な資源利用と一貫したニューラルネットワークアーキテクチャを用いて,最小限の教師信号で深度を効果的に解釈できることを示す。 本稿では,言語理解を通して深度を解読するマルチモーダルフレームワークLLM-MDEを紹介する。具体的には、LLM-MDEは、事前訓練されたLLMの深度推定能力を高めるために、クロスモーダル再プログラミングと適応的プロンプト推定モジュールという2つの主要な戦略を採用している。これらの戦略は、それぞれ視覚表現をテキストプロトタイプと整合させ、単眼画像に基づいてプロンプトを自動生成する。実世界のMDEデータセットに関する総合的な実験により、資源使用を最小化しながらfew-shot/zero-shotタスクに優れるLLM-MDEの有効性と優位性が確認された。ソースコードは公開されている。 Monocular depth estimation is a critical function in computer vision applications. This paper shows that large language models (LLMs) can effectively interpret depth with minimal supervision, using efficient resource utilization and a consistent neural network architecture. We introduce LLM-MDE, a multimodal framework that deciphers depth through language comprehension. Specifically, LLM-MDE employs two main strategies to enhance the pretrained LLM's capability for depth estimation: cross-modal reprogramming and an adaptive prompt estimation module. These strategies align vision representations with text prototypes and automatically generate prompts based on monocular images, respectively. Comprehensive experiments on real-world MDE datasets confirm the effectiveness and superiority of LLM-MDE, which excels in few-/zero-shot tasks while minimizing resource use. The source code is available.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# 2-Lévy指数分数Kerr媒体におけるソリトン崩壊の抑制、変調不安定性、ローグ波励起 Suppression of soliton collapses, modulational instability, and rogue-wave excitation in two-Lévy-index fractional Kerr media ( http://arxiv.org/abs/2409.01135v1 ) ライセンス: Link先を確認	Ming Zhong, Yong Chen, Zhenya Yan, Boris A. Malomed,	(参考訳) レーザー系において、2つの分数分散/回折項はそれぞれのLévy指数 $\alpha_{1},\, \alpha_{2}\in (1, 2]$ で定量化され、Kerr非線形性は自己集束または非集束である。いくつかの基本ソリトンは変分近似を用いて得られ、数値的な結果と比較して検証される。ソリトン崩壊は、Lévy指数 $\alpha =1$ を1つだけ持つ1次元3次分数非線形Schrödinger方程式で生じるが、2-Lévy指数分数非線形Schrödinger系では抑制できることを見出した。ソリトンの安定性は、ガウスパルスとの衝突や系のパラメータの断熱的変動に対しても検討される。連続波の変調不安定性を2-Lévy指数系でも検討した。特に、2つの回折係数が反対の符号を持つとき、非集束非線形性の場合にも変調不安定性が生じうる。変調不安定性の結果を用いて、Kerr非線形性の両符号に対して、連続波上に1次および2次のローグ波を生成する。 s in laser systems with two fractional-dispersion/diffraction terms, quantified by their Lévy indices, $\alpha_{1},\, \alpha_{2}\in (1, 2]$, and self-focusing or defocusing Kerr nonlinearity. Some fundamental solitons are obtained by means of the variational approximation, which are verified by comparison with numerical results. We find that the soliton collapse, exhibited by the one-dimensional cubic fractional nonlinear Schrödinger equation with only one Lévy index $\alpha =1$, can be suppressed in the two-Lévy-index fractional nonlinear Schrödinger system. Stability of the solitons is also explored against collisions with Gaussian pulses and adiabatic variation of the system parameters. Modulation instability of continuous waves is investigated in the two-Lévy-index system too. In particular, the modulation instability may occur in the case of the defocusing nonlinearity when two diffraction coefficients have opposite signs. Using results for the modulation instability, we produce first- and second-order rogue waves on top of continuous waves, for both signs of the Kerr nonlinearity.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# 希少物体のための合成衛星画像の生成:モデルと計量の実証的比較 Generating Synthetic Satellite Imagery for Rare Objects: An Empirical Comparison of Models and Metrics ( http://arxiv.org/abs/2409.01138v1 ) ライセンス: Link先を確認	Tuong Vy Nguyen, Johannes Hoster, Alexander Glaser, Kristian Hildebrand, Felix Biessmann,	(参考訳) 生成的ディープラーニングアーキテクチャは、現実的で高解像度の偽画像を生成することができ、社会に深刻な影響を及ぼす可能性がある。この文脈における重要な疑問は、特にニッチドメインにおいて、現実的なイメージを生成するのがどの程度簡単か、ということである。特定の画像内容を得るために必要な反復的なプロセスは、自動化と制御が困難である。特に稀なクラスでは、忠実性(生成的アプローチが現実的な画像を生成するか)とアライメント(生成を人間の入力でどの程度うまく誘導できるか)の評価が依然として難しい。本研究では,合成衛星画像を生成するために微調整した生成アーキテクチャの大規模な実証評価について述べる。稀な物体カテゴリの例として原子力発電所に着目する。世界中に約400施設しか存在しないため、この制約は、実世界の事例の数が限られることで訓練データとテストデータが制限される他の多くのシナリオの典型例である。我々は,テキスト入力と、建物配置を詳細に指定できるゲームエンジンから得た画像入力という2種類のモダリティで条件付けして合成画像を生成する。生成した画像は, 自動評価によく使用される指標を用いて評価し, その信頼性を評価するために実施したユーザスタディにおける人間の判断と比較した。本研究の結果は, 稀な物体であっても, テキストや詳細な建物配置による本物らしい合成衛星画像の生成が可能であることを示す。先行研究と同様に、自動評価指標は人間の知覚と一致しないことが多く、実際、一般的に使用されている画像品質指標と人間の評価との間には強い負の相関があることが分かった。 Generative deep learning architectures can produce realistic, high-resolution fake imagery -- with potentially drastic societal implications. A key question in this context is: How easy is it to generate realistic imagery, in particular for niche domains. The iterative process required to achieve specific image content is difficult to automate and control. Especially for rare classes, it remains difficult to assess fidelity, meaning whether generative approaches produce realistic imagery and alignment, meaning how (well) the generation can be guided by human input. In this work, we present a large-scale empirical evaluation of generative architectures which we fine-tuned to generate synthetic satellite imagery. We focus on nuclear power plants as an example of a rare object category - as there are only around 400 facilities worldwide, this restriction is exemplary for many other scenarios in which training and test data is limited by the restricted number of occurrences of real-world examples. 
We generate synthetic imagery by conditioning on two kinds of modalities, textual input and image input obtained from a game engine that allows for detailed specification of the building layout. The generated images are assessed by commonly used metrics for automatic evaluation and then compared with human judgement from our conducted user studies to assess their trustworthiness. Our results demonstrate that even for rare objects, generation of authentic synthetic satellite imagery with textual or detailed building layouts is feasible. In line with previous work, we find that automated metrics are often not aligned with human perception -- in fact, we find strong negative correlations between commonly used image quality metrics and human ratings.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# LLM-PQA: LLM強化予測クエリー解法 LLM-PQA: LLM-enhanced Prediction Query Answering ( http://arxiv.org/abs/2409.01140v1 ) ライセンス: Link先を確認	Ziyu Li, Wenjie Zhao, Asterios Katsifodimos, Rihan Hai,	(参考訳) LLM(Large Language Models)の出現は、従来のSQLベースのデータベースシステムの制約を越えて、クエリの処理方法を変更する機会を提供する。しかし、予測クエリにLLMを使用することは、外部MLモデルを採用する必要があり、回答を提供するために推論を行う必要があるため、依然として困難である。本稿では,自然言語で表現された予測クエリに対処する新しいツール LLM-PQA を紹介する。 LLM-PQAは、データレイクとモデル動物園を統合することにより、予測クエリの必要性を予測するためのLLMと検索強化メカニズムを結合する最初の方法である。この統合により、ユーザは多様な異種データと多様なMLモデルにアクセスでき、動的予測クエリ応答が容易になる。さらに、LLM-PQAは、特定のクエリ要求に基づいて、オンデマンドでモデルを動的にトレーニングすることができ、モデル動物園で事前訓練されたモデルがタスクのために利用できなくても、信頼性と関連する結果を保証する。 The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a novel tool that addresses prediction queries formulated in natural language. LLM-PQA is the first to combine the capabilities of LLMs and retrieval-augmented mechanism for the needs of prediction queries by integrating data lakes and model zoos. This integration provides users with access to a vast spectrum of heterogeneous data and diverse ML models, facilitating dynamic prediction query answering. In addition, LLM-PQA can dynamically train models on demand, based on specific query requirements, ensuring reliable and relevant results even when no pre-trained model in a model zoo, available for the task.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# Duplex: Mixture of Experts、グループ化クエリアテンション、連続バッチ処理を備えた大規模言語モデルのためのデバイス Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching ( http://arxiv.org/abs/2409.01141v1 ) ライセンス: Link先を確認	Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn,	(参考訳) 大規模言語モデル(LLM)は、さまざまなコンテキストにまたがる高品質なコンテンツを生成する能力によって台頭してきた。爆発的に増加する計算資源の需要を減らすために、Mixture of Experts(MoE)が登場した。 MoE層は、少ない計算で膨大な数のパラメータを活用できる。最先端の連続バッチ処理を適用するとスループットが向上するが、MoE層やアテンション層でのDRAMアクセスが頻繁に発生する。従来の計算デバイスは、実行時間全体の大半を占め演算強度(Op/B)が低いMoE層とアテンション層の処理に限界がある。 PIM(Processing-in-Memory)アーキテクチャのような低Op/Bを対象とするデバイスのみでMoE層を処理することは、連続バッチ処理によってMoE層のOp/Bが変動するため困難である。これらの課題に対処するため,高Op/B向けのxPUと、低Op/B演算を効果的に実行するLogic-PIMを単一デバイス内に備えたDuplexを提案する。 Duplex は LLM 内の各層の Op/B に基づいて最も適切なプロセッサを選択する。 MoE層のOp/Bは少なくとも1であり、アテンション層のOp/Bはグループ化クエリアテンションでは4〜8の値を持つため、DRAMダイ内に処理ユニットを置き、極めて低いOp/B(1未満)の演算のみを対象とする従来のPIMアーキテクチャは効率的ではない。近年の傾向に基づき、Logic-PIM はシリコン貫通ビア(TSV)を増やしてDRAMダイと論理ダイの間の高帯域通信を可能にし、論理ダイに強力な処理ユニットを配置する。これは数から数十程度までの低Op/B演算の処理に最適である。 xPU と Logic-PIM を最大限に活用するために,エキスパートとアテンションの協調処理を提案する。 Large language models (LLMs) have emerged due to their capability to generate high-quality content across diverse contexts. To reduce their explosively increasing demands for computing resources, a mixture of experts (MoE) has emerged. The MoE layer enables exploiting a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it leads to frequent DRAM access in the MoE and attention layers. We observe that conventional computing devices have limitations when processing the MoE and attention layers, which dominate the total execution time and exhibit low arithmetic intensity (Op/B). Processing MoE layers only with devices targeting low-Op/B such as processing-in-memory (PIM) architectures is challenging due to the fluctuating Op/B in the MoE layer caused by continuous batching. 
To address these challenges, we propose Duplex, which comprises xPU tailored for high-Op/B and Logic-PIM to effectively perform low-Op/B operation within a single device. Duplex selects the most suitable processor based on the Op/B of each layer within LLMs. As the Op/B of the MoE layer is at least 1 and that of the attention layer has a value of 4-8 for grouped query attention, prior PIM architectures are not efficient, which place processing units inside DRAM dies and only target extremely low-Op/B (under one) operations. Based on recent trends, Logic-PIM adds more through-silicon vias (TSVs) to enable high-bandwidth communication between the DRAM die and the logic die and place powerful processing units on the logic die, which is best suited for handling low-Op/B operations ranging from few to a few dozens. To maximally utilize the xPU and Logic-PIM, we propose expert and attention co-processing.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# LATEX-GCL: テキスト属性グラフのコントラスト学習のための大規模言語モデル(LLM)ベースのデータ拡張 LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning ( http://arxiv.org/abs/2409.01145v1 ) ライセンス: Link先を確認	Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu,	(参考訳) Graph Contrastive Learning(GCL)は、自己教師付きグラフ学習の強力なパラダイムであり、さまざまなアプリケーションシナリオで注目されている。しかし、テキスト属性グラフ(TAG)上の学習のためのGCLはまだ検討されていない。特徴埋め込みマスキングのような従来の拡張技術では、TAGのテキスト属性を直接処理できないためである。 GCLをTAGに適用するための素朴な戦略は、テキスト属性を言語モデルを介して特徴埋め込みにエンコードし、その埋め込みを後続のGCLモジュールに入力して処理することである。このような戦略は3つの大きな課題に直面している。 (I) 情報損失を回避できないこと、 (II) テキストエンコーディングフェーズにおける意味的損失、 (III) 制御不能で理解不能な結果をもたらす暗黙的な拡張制約である。本稿では,LATEX-GCLと呼ばれる新しいGCLフレームワークを提案する。LATEX-GCLは、大規模言語モデル(LLM)を用いてテキスト拡張を生成し、LLMの強力な自然言語処理(NLP)能力を利用して前述の3つの制限に対処し、TAGタスクにGCLを適用する道を開く。 4つの高品質なTAGデータセットに対する大規模な実験は、提案したLATEX-GCL法の優位性を示している。ソースコードとデータセットは再現性を容易にするために公開されており、下記のリンクからアクセスできる。 Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet to be explored. Because conventional augmentation techniques like feature embedding masking cannot directly process textual attributes on TAGs. A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model and then feed the embeddings into the following GCL module for processing. Such a strategy faces three key challenges: I) failure to avoid information loss, II) semantic loss during the text encoding phase, and III) implicit augmentation constraints that lead to uncontrollable and incomprehensible results. 
In this paper, we propose a novel GCL framework named LATEX-GCL to utilize Large Language Models (LLMs) to produce textual augmentations and LLMs' powerful natural language processing (NLP) abilities to address the three limitations aforementioned to pave the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG datasets illustrate the superiority of the proposed LATEX-GCL method. The source codes and datasets are released to ease the reproducibility, which can be accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# FMRFT:Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking ( http://arxiv.org/abs/2409.01148v1 ) ライセンス: Link先を確認	Mingyuan Yao, Yukang Huo, Qingbin Tian, Jiayin Zhao, Xiao Liu, Ruifeng Wang, Haihua Wang,	(参考訳) 魚の成長, 異常行動, および病気は, 画像処理による魚の追跡を通じて早期に検出でき, これは工場養殖にとって大きな意義を持つ。しかし、水中での反射に加え、個体間の高い類似性、刺激による急激な遊泳、複数個体の遮蔽といった魚に起因する要因により、魚の多目標追跡は困難である。これらの課題に対処するため,本稿では,複雑なマルチシーンのチョウザメ追跡データセットを構築し,リアルタイムのエンドツーエンド魚追跡モデルであるFMRFTを提案する。このモデルでは,マルチフレーム映像の時系列記憶と高速な特徴抽出を実現するために,メモリ消費の少ないMamba In Mamba (MIM) アーキテクチャを追跡アルゴリズムに導入し,複数の魚が映る映像における連続フレームの相関解析の効率を向上させる。さらに、RT-DETRの優れた特徴相互作用と事前フレーム処理能力を活用し、効果的な追跡アルゴリズムを提供する。 QTSIクエリインタラクション処理モジュールを組み込むことで、モデルは遮蔽されたオブジェクトと冗長な追跡フレームを効果的に処理し、より正確で安定した魚追跡を実現する。このデータセット上で訓練およびテストした結果、IDF1スコアは90.3%、MOTA精度は94.3%であった。実験結果から,FMRFTモデルは魚群における高い類似性と相互遮蔽の課題に効果的に対処でき,工場養殖環境における正確な追跡が可能であることが示された。 Growth, abnormal behavior, and diseases of fish can be early detected by monitoring fish tracking through the method of image processing, which is of great significance for factory aquaculture. However, underwater reflections and some reasons with fish, such as the high similarity, rapid swimming caused by stimuli and multi-object occlusion bring challenges to multi-target tracking of fish. To address these challenges, this paper establishes a complex multi-scene sturgeon tracking dataset and proposes a real-time end-to-end fish tracking model, FMRFT. In this model, the Mamba In Mamba (MIM) architecture with low memory consumption is introduced into the tracking algorithm to realize multi-frame video timing memory and fast feature extraction, which improves the efficiency of correlation analysis for contiguous frames in multi-fish video. Additionally, the superior feature interaction and a priori frame processing capabilities of RT-DETR are leveraged to provide an effective tracking algorithm. 
By incorporating the QTSI query interaction processing module, the model effectively handles occluded objects and redundant tracking frames, resulting in more accurate and stable fish tracking. Trained and tested on the dataset, the model achieves an IDF1 score of 90.3% and a MOTA accuracy of 94.3%. Experimental results demonstrate that the proposed FMRFT model effectively addresses the challenges of high similarity and mutual occlusion in fish populations, enabling accurate tracking in factory farming environments.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# パラメータ自由表現アライメントによるマルチモーダル幻覚の理解 Understanding Multimodal Hallucination with Parameter-Free Representation Alignment ( http://arxiv.org/abs/2409.01151v1 ) ライセンス: Link先を確認	Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao,	(参考訳) 幻覚は、MLLM(Multimodal Large Language Models)において一般的な問題であるが、その根底にある原理はよく分かっていない。本稿では,MLLMのどの成分が物体幻覚に寄与するかを考察する。画像表現自体以外の要因の影響を完全に排除しつつ画像表現を解析するために,追加の訓練パラメータを必要とせずに任意の2つの表現システム間の類似度を測定できるパラメータフリー表現アライメント指標(Pfram)を提案する。特に、Pframは、画像の正解アノテーションで表される人間の表現システムとニューラル表現システムのアライメントも評価できる。オブジェクトアノテーションとのアライメントを評価することで、この指標が、さまざまなモデルアーキテクチャやサイズにまたがる幅広い最先端MLLMにおいて、物体幻覚と強く一貫した相関を示すことを実証する。さらに, この指標を用いて、異なるモジュールの役割, テキスト命令の影響, 代替視覚エンコーダの使用を含む改善の可能性など、MLLMにおける画像表現に関する他の重要な課題について検討する。私たちのコードは https://github.com/yellow-binary-tree/Pfram で利用可能です。 Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations while completely avoiding the influence of all other factors other than the image representation itself, we propose a parametric-free representation alignment metric (Pfram) that can measure the similarities between any two representation systems without requiring additional training parameters. Notably, Pfram can also assess the alignment of a neural representation system with the human representation system, represented by ground-truth annotations of images. By evaluating the alignment with object annotations, we demonstrate that this metric shows strong and consistent correlations with object hallucination across a wide range of state-of-the-art MLLMs, spanning various model architectures and sizes. Furthermore, using this metric, we explore other key issues related to image representations in MLLMs, such as the role of different modules, the impact of textual instructions, and potential improvements including the use of alternative visual encoders. 
Our code is available at: https://github.com/yellow-binary-tree/Pfram.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# 実世界の会話型エンティティリンクはゼロショット以上を必要とする Real World Conversational Entity Linking Requires More Than Zeroshots ( http://arxiv.org/abs/2409.01152v1 ) ライセンス: Link先を確認	Mohanna Hoveyda, Arjen P. de Vries, Maarten de Rijke, Faegheh Hasibi,	(参考訳) 会話におけるエンティティリンク(EL)は、主に、エンティティアノテーション付き会話データセットの不足と、ドメイン固有のロングテールエンティティを含むナレッジベース(KB)の疎さのために、実用的なアプリケーションにおいて顕著な課題に直面している。我々は,資源制約下でのELモデルの有効性を測定するために、的を絞った評価シナリオを設計した。評価には、現実世界のELの複雑さを例示するFandomと、広く使われているWikipediaの2つのKBを用いる。まず、Fandomと、RedditにおけるFandomエンティティに関する議論に基づいて我々が構築した新しいゼロショット会話型エンティティリンクデータセットを用いて、未知の新しいKBに一般化するELモデルの能力を評価する。次に,事前の訓練なしに会話環境へ適応するELモデルの能力を評価する。以上の結果から,既存のゼロショットELモデルは,事前訓練なしに新しいドメイン固有KBに適用されると性能が著しく低下することが示された。その結果,従来の評価手法はゼロショットELにおける実世界の複雑さを捉えるには不十分であり,限られたリソースに適応する会話型ELモデルを設計・評価するための新たなアプローチの必要性が浮き彫りになった。本研究で提案した評価設定とデータセットは公開されている。 Entity linking (EL) in conversations faces notable challenges in practical applications, primarily due to the scarcity of entity-annotated conversational datasets and sparse knowledge bases (KB) containing domain-specific, long-tail entities. We designed targeted evaluation scenarios to measure the efficacy of EL models under resource constraints. Our evaluation employs two KBs: Fandom, exemplifying real-world EL complexities, and the widely used Wikipedia. First, we assess EL models' ability to generalize to a new unfamiliar KB using Fandom and a novel zero-shot conversational entity linking dataset that we curated based on Reddit discussions on Fandom entities. We then evaluate the adaptability of EL models to conversational settings without prior training. Our results indicate that current zero-shot EL models falter when introduced to new, domain-specific KBs without prior training, significantly dropping in performance. Our findings reveal that previous evaluation approaches fall short of capturing real-world complexities for zero-shot EL, highlighting the necessity for new approaches to design and assess conversational EL models to adapt to limited resources. 
The evaluation setup and the dataset proposed in this research are made publicly available.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# 反復リアプノフ法による符号化量子ゲート生成について On encoded quantum gate generation by iterative Lyapunov-based methods ( http://arxiv.org/abs/2409.01153v1 ) ライセンス: Link先を確認	Paulo Sergio Pereira da Silva, Pierre Rouchon,	(参考訳) 本稿では,符号化量子ゲート生成の問題を扱う。その考え方は、合成すべき量子ゲートの次元 $\bar n$ よりも高い次元 $n$ の量子系を考えることである。 $\mathbb{C}^n$ の2つの正規直交部分集合 $\mathbb{E} = \{e_1, e_2, \ldots, e_{\bar n}\}$ と $\mathbb F = \{f_1, f_2, \ldots, f_{\bar n}\}$ が与えられたとき、符号化量子ゲート生成の問題は、すべての初期状態 $e_i$ が、所望の精度で、かつある大域位相 $\phi \in \mathbb{R}$ を除いて $\exp(\jmath \phi) f_i, i=1,2, \ldots ,\bar n$ に導かれるように、区間 $[0, T_f]$ 上で定義された開ループ制御則を得ることである。この問題は、$\bar n = n$ の場合の古典的な(完全な)量子ゲート生成問題、$\bar n = 1$ の場合の状態準備問題、そして $ 1 < \bar n < n$ の場合の符号化ゲート生成問題を含む。したがって、ここでは3つの問題が共通のアプローチで統一される。本研究では、閉量子系における符号化ゲート生成問題を扱うために、Reference Input Generation Algorithm (RIGA) を一般化する。適切なリアプノフ関数は、符号化ゲートの台(support)上への直交射影から導かれる。物理的に興味深い3つのケーススタディ、すなわち2つの結合トランスモン量子ビット、トランスモン量子ビットに結合した共振器モード、および $N=10$ の大次元の場合を含む $N$ 量子ビットの鎖により、この数値アルゴリズムの有用性が示唆される。 The problem of encoded quantum gate generation is studied in this paper. The idea is to consider a quantum system of higher dimension $n$ than the dimension $\bar n$ of the quantum gate to be synthesized. Given two orthonormal subsets $\mathbb{E} = \{e_1, e_2, \ldots, e_{\bar n}\}$ and $\mathbb F = \{f_1, f_2, \ldots, f_{\bar n}\}$ of $\mathbb{C}^n$, the problem of encoded quantum gate generation consists in obtaining an open loop control law defined in an interval $[0, T_f]$ in a way that all initial states $e_i$ are steered to $\exp(\jmath \phi) f_i, i=1,2, \ldots ,\bar n$ up to some desired precision and to some global phase $\phi \in \mathbb{R}$. This problem includes the classical (full) quantum gate generation problem, when $\bar n = n$, the state preparation problem, when $\bar n = 1$, and finally the encoded gate generation when $ 1 < \bar n < n$. Hence, three problems are unified here within a unique common approach. The Reference Input Generation Algorithm (RIGA) is generalized in this work for considering the encoded gate generation problem for closed quantum systems. 
A suitable Lyapunov function is derived from the orthogonal projector on the support of the encoded gate. Three case-studies of physical interest indicate the potential interest of such numerical algorithm: two coupled transmon-qubits, a cavity mode coupled to a transmon-qubit, and a chain of $N$ qubits, including a large dimensional case for which $N=10$.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# ニューラルネットワークによる感染症流行の予測と関連する不確実性 Forecasting infectious disease prevalence with associated uncertainty using neural networks ( http://arxiv.org/abs/2409.01154v1 ) ライセンス: Link先を確認	Michael Morris,	(参考訳) 感染症は人的・経済的に大きな負担をもたらす。疾患の発生を正確に予測することで、公衆衛生機関は既存の疾患や新興疾患に効果的に対応できる。この分野の進歩にもかかわらず、正確な予測モデルの開発は依然として重要な課題である。この論文では、不確実性推定を伴うニューラルネットワーク(NN)を用いた2つの方法論的枠組みを提案する。不確実性推定の欠如は、これまでNNの流行予測への適用を制限してきた重要な要素である。米国におけるインフルエンザ様疾患(ILI)の予測を通じて、これらの枠組みを開発する。第1の提案手法は、過去のILI率とともにWeb検索活動データを観測値として用いて、NNアーキテクチャを訓練する。我々のモデルはベイズ層を組み込んで不確実性区間を生成し、従来のアプローチに対する正当な代替手段となる。最も性能の高いアーキテクチャである反復リカレントニューラルネットワーク(IRNN)は、4つのインフルエンザシーズンにわたる予測タスクにおいて、最先端手法と比べて平均絶対誤差を平均10.3%削減し、Skillを平均17.1%改善する。さらにこの手法を発展させ,IRNNにおけるサンプリング手順を変更して不確実性推定を改善するアーキテクチャであるIRNNsを導入する。第2の枠組みでは、ニューラル常微分方程式を用いて機構的コンパートメントモデルとNNの間のギャップを埋め、コンパートメントモデルが与える物理的制約の恩恵を受ける。我々は、ILI率とWeb検索活動データを組み合わせて予測を行う8つのニューラルODEモデルを評価する。これらはIRNNおよびIRNN0(ILI率のみを用いるIRNN)と比較される。 Web検索活動データなしで訓練されたモデルは、Skillの点でIRNN0を16%上回る。今後は、最高性能のIRNNに匹敵するよう、Web検索データを用いたニューラルODEをより効果的に活用することに注力すべきである。 Infectious diseases pose significant human and economic burdens. Accurately forecasting disease incidence can enable public health agencies to respond effectively to existing or emerging diseases. Despite progress in the field, developing accurate forecasting models remains a significant challenge. This thesis proposes two methodological frameworks using neural networks (NNs) with associated uncertainty estimates - a critical component limiting the application of NNs to epidemic forecasting thus far. We develop our frameworks by forecasting influenza-like illness (ILI) in the United States. Our first proposed method uses Web search activity data in conjunction with historical ILI rates as observations for training NN architectures. Our models incorporate Bayesian layers to produce uncertainty intervals, positioning themselves as legitimate alternatives to more conventional approaches. 
The best performing architecture: iterative recurrent neural network (IRNN), reduces mean absolute error by 10.3% and improves Skill by 17.1% on average in forecasting tasks across four flu seasons compared to the state-of-the-art. We build on this method by introducing IRNNs, an architecture which changes the sampling procedure in the IRNN to improve the uncertainty estimation. Our second framework uses neural ordinary differential equations to bridge the gap between mechanistic compartmental models and NNs; benefiting from the physical constraints that compartmental models provide. We evaluate eight neural ODE models utilising a mixture of ILI rates and Web search activity data to provide forecasts. These are compared with the IRNN and IRNN0 - the IRNN using only ILI rates. Models trained without Web search activity data outperform the IRNN0 by 16% in terms of Skill. Future work should focus on more effectively using neural ODEs with Web search data to compete with the best performing IRNN.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# TempMe: 効率的なテキスト・ビデオ検索のためのビデオ時間的トークンマージ TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval ( http://arxiv.org/abs/2409.01156v1 ) ライセンス: Link先を確認	Leqi Shen, Tianxiang Hao, Sicheng Zhao, Yifeng Zhang, Pengzhang Liu, Yongjun Bao, Guiguang Ding,	(参考訳) ほとんどのテキストビデオ検索手法は、テキストと画像で事前訓練されたCLIPをバックボーンとして使用し、計算オーバーヘッドの高い複雑なモジュールを組み込んでいる。その結果、多くの研究が効率的な微調整に焦点を当てている。効率的な適応における第一の課題は、画像とビデオのモダリティの本質的な違いから生じる。サンプリングされた各ビデオフレームは画像エンコーダによって独立に処理されなければならず、これが複雑さを増大させ、実運用を難しくする。既存の効率的な手法は少数の訓練可能パラメータで微調整を行うが、トークン数が多いために依然として高い推論コストを伴う。本研究では,連続するフレームに含まれる繰り返し情報に起因する時間的冗長性が、モデルの高い複雑さに大きく寄与していると論じる。画像モデル向けの既存のトークン圧縮手法は、フレーム間の時間的冗長性を見落としているため、この固有の課題を解決できない。これらの問題に対処するため,時間的冗長性を低減するTemporal Token Merging(TempMe)を提案する。具体的には、プログレッシブな多粒度フレームワークを導入する。近隣のクリップを徐々に結合することで、異なるフレームにまたがる時間トークンをマージしてビデオレベルの特徴を学習し、複雑さの低減と性能の向上を実現する。大規模な実験により、TempMeの優位性が検証された。従来の効率的なテキストビデオ検索手法と比較して、TempMeは出力トークンを95%、GFLOPsを51%削減し、1.8倍の高速化と4.4%のR-Sum改善を実現した。さらにTempMeは、効率的な微調整手法と完全な微調整手法の双方と効果的に統合でき、堅牢な一般化能力を示す。完全な微調整では、TempMeは7.9%のR-Sum改善を達成し、訓練は1.57倍高速で、GPUメモリ使用量は75.2%である。コードは公開予定である。 Most text-video retrieval methods utilize the text-image pre-trained CLIP as a backbone, incorporating complex modules that result in high computational overhead. As a result, many studies focus on efficient fine-tuning. The primary challenge in efficient adaption arises from the inherent differences between image and video modalities. Each sampled video frame must be processed by the image encoder independently, which increases complexity and complicates practical deployment. Although existing efficient methods fine-tune with small trainable parameters, they still incur high inference costs due to the large token number. In this work, we argue that temporal redundancy significantly contributes to the model's high complexity due to the repeated information in consecutive frames. Existing token compression methods for image models fail to solve the unique challenges, as they overlook temporal redundancy across frames. 
To tackle these problems, we propose Temporal Token Merging (TempMe) to reduce temporal redundancy. Specifically, we introduce a progressive multi-granularity framework. By gradually combining neighboring clips, we merge temporal tokens across different frames and learn video-level features, leading to lower complexity and better performance. Extensive experiments validate the superiority of our TempMe. Compared to previous efficient text-video retrieval methods, TempMe significantly reduces output tokens by 95% and GFLOPs by 51%, while achieving a 1.8X speedup and a 4.4% R-Sum improvement. Additionally, TempMe exhibits robust generalization capabilities by integrating effectively with both efficient and full fine-tuning methods. With full fine-tuning, TempMe achieves a significant 7.9% R-Sum improvement, trains 1.57X faster, and uses 75.2% of the GPU memory. Our code will be released.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# 自動音声キャプションのための補助検索モデルによるEnCLAPの拡張 Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning ( http://arxiv.org/abs/2409.01160v1 ) ライセンス: Link先を確認	Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, Jinjoo Lee,	(参考訳) 本稿では,DCASE2024 Challenge Task6 (Automated Audio Captioning) と Task8 (Language-based Audio Retrieval) について述べる。本稿では,EnCLAP音声キャプションフレームワークに基づくアプローチを開発し,課題の6タスクに最適化する。特に、基礎となるコンポーネントの変更と、再品位プロセスの組み入れについて概説する。さらに、修正したフレームワークの副産物である補足型レトリバーモデルをTask8に送信します。提案システムでは,タスク6のFENSEスコアが0.542,タスク8のmAP@10スコアが0.386,ベースラインモデルが大幅に向上した。 In this technical report, we describe our submission to DCASE2024 Challenge Task6 (Automated Audio Captioning) and Task8 (Language-based Audio Retrieval). We develop our approach building upon the EnCLAP audio captioning framework and optimizing it for Task6 of the challenge. Notably, we outline the changes in the underlying components and the incorporation of the reranking process. Additionally, we submit a supplementary retriever model, a byproduct of our modified framework, to Task8. Our proposed systems achieve FENSE score of 0.542 on Task6 and mAP@10 score of 0.386 on Task8, significantly outperforming the baseline models.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# 性能と効率のバランスをとる:画像テキストの相互作用に基づく多モーダル大言語モデルプルーニング法 Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction ( http://arxiv.org/abs/2409.01162v1 ) ライセンス: Link先を確認	Gaotong Yu, Yi Chen, Jian Xu,	(参考訳) 近年,多モーダル大規模言語モデル (MM-LLM) は多モーダルタスクにおいて大きな成功を収めている。 MM-LLMsフレームワークでは、LLM層における連結テキストと視覚トークンの処理が主な計算消費ステップである。 LLMの入力トークンの長さは、全体的なトレーニングと推論効率に直接影響を及ぼす。そこで本研究では,MM-LLMの視覚的トークンについて検討した。その結果,視覚エンコーダにおける視覚トークンとCLSトークンの類似性は,長いテール分布に従うことがわかった。言い換えれば、少数の視覚トークンだけがCLSトークンと非常によく似ている。そこで我々は,この問題に対処する動的プルーニングアルゴリズムを設計した。まず、異なる入力サンプルに対して、視覚的CLSトークン類似度曲線の屈折点を探索し、対応するセグメンテーション点として使用し、視覚マーカーをトリミングする。このプロセスは、主に視覚エンコーダの出力を減らし、モデルを加速する。そして、LLM層において、連結された視覚テキストトークンを2度目のプルーニングを行う。この過程で、視覚的特徴とテキスト的特徴の相互作用により、テキスト相関の低い視覚的トークンとテキスト的トークンはさらにフィルタリングされ、効率と性能のバランスがとれる。複数のデータセットから得られた結果から,提案手法は元のトークン量の平均22%を使用する場合,元のトークン量と競合する性能を達成できることが示唆された。私たちのソースコードは受理後、公開されます。 Recently, multimodal large language models (MM-LLMs) have achieved great success in many multimodal tasks, but their high computational costs limit their further promotion and application. In the MM-LLMs framework, the main computational consumption step is the processing of concatenated text and visual tokens at the LLM layer. The length of the input token for LLM directly affects the overall training and inference efficiency. In response to this issue, we further studied the visual tokens of MM-LLMs. We found that the similarity between visual and CLS tokens in the visual encoder follows a long-tail distribution. In other words, only a few visual tokens are highly similar to CLS tokens. Therefore, we designed a dynamic pruning algorithm to address this issue. Firstly, for different input samples, we search for the inflection point of their visual CLS token similarity curve and use it as the corresponding segmentation point to trim the visual markers. This process mainly reduces the output of the visual encoder to accelerate the model. 
Then, in the LLM layer, the concatenated visual text tokens are pruned a second time. During this process, due to the interaction between visual and textual features, visual and textual tokens with low text correlation are further filtered, achieving a balance between efficiency and performance. The results on multiple datasets show that our proposed method achieves performance competitive with the original while using an average of 22% of the original token quantity. Our source code will be made publicly available following acceptance.	翻訳日:2024-09-06 07:13:03 公開日:2024-09-02
# PACSBO: おそらくほぼ正しいベイズ最適化 PACSBO: Probably approximately correct safe Bayesian optimization ( http://arxiv.org/abs/2409.01163v1 ) ライセンス: Link先を確認	Abdullah Tokmak, Thomas B. Schön, Dominik Baumann,	(参考訳) 安全なベイズ最適化(BO)アルゴリズムは、システムのダイナミクスを知らずに最適な制御ポリシーを見つけることを約束すると同時に、高い確率で安全性を保証する。これらの保証と引き換えに、一般的なアルゴリズムは滑らかな仮定を必要とする:再生カーネルヒルベルト空間(RKHS)のノルム上の既知の上限。 RKHS は潜在的に無限次元空間であり、実際、その対応する RKHS において未知函数の上界を得る方法は不明である。そこで本研究では,データから未知関数のRKHSノルムの上界を推定し,その理論的性質について検討するアルゴリズムを提案する。さらに、リプシッツに基づく手法と同様に、RKHSノルムをグローバルな対象ではなく局所的な対象として扱い、保守主義を減少させる。 RKHSノルム推定とRKHSノルムの局所解釈を安全なBOアルゴリズムに統合すると、ほぼ正しいベイズ最適化のためのアルゴリズムPACSBOが得られる。 Safe Bayesian optimization (BO) algorithms promise to find optimal control policies without knowing the system dynamics while at the same time guaranteeing safety with high probability. In exchange for those guarantees, popular algorithms require a smoothness assumption: a known upper bound on a norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it is unclear how to, in practice, obtain an upper bound of an unknown function in its corresponding RKHS. In response, we propose an algorithm that estimates an upper bound on the RKHS norm of an unknown function from data and investigate its theoretical properties. Moreover, akin to Lipschitz-based methods, we treat the RKHS norm as a local rather than a global object, and thus reduce conservatism. Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO, an algorithm for probably approximately correct safe Bayesian optimization, for which we provide numerical and hardware experiments that demonstrate its applicability and benefits over popular safe BO algorithms.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 焦点距離とカメラポーズの一般的な物理的変化によるカメラパラメータの変動 Variation of Camera Parameters due to Common Physical Changes in Focal Length and Camera Pose ( http://arxiv.org/abs/2409.01171v1 ) ライセンス: Link先を確認	Hsin-Yi Chen, Chuan-Kai Fu, Jen-Hui Chuang,	(参考訳) カメラ固有のパラメータの正確な校正は、インテリジェントシステムや自動運転車などの分野における様々なコンピュータビジョンベースの応用に不可欠である。しかし、既存の校正方式は、一般的な物理的変化によるカメラパラメータの変動の一般的な傾向を見出すには不適である。本稿では,焦点距離とカメラポーズの変化による大小の変動を,最近提案されたキャリブレーション法で同定できることを実証した。実験結果から、前者は様々なタイプのカメラの主点偏差の傾向(方向)が異なるが、後者は内部レンズの配置が異なるためか、後者は重力方向による偏差に非常によく似た傾向を持つ。最後に, カメラキャリブレーションの異なる方法において, 3次元から2次元への再投射誤差を比較検討した。 Accurate calibration of camera intrinsic parameters is crucial to various computer vision-based applications in the fields of intelligent systems, autonomous vehicles, etc. However, existing calibration schemes are ill-suited to finding general trends in the variation of camera parameters due to common physical changes. In this paper, it is demonstrated that major and minor variations due to changes in focal length and camera pose, respectively, can be identified with a recently proposed calibration method. It is readily observable from the experimental results that the former variations have different trends (directions) of principal point deviation for different types of camera, possibly due to different internal lens configurations, while the latter have very similar trends in the deviation, most likely due to the direction of gravity. Finally, to confirm the validity of such unprecedented findings, 3D to 2D reprojection errors are compared for different methods of camera calibration.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# ブリルアン光学系における合成磁性による低しきい値量子相関 Low threshold quantum correlations via synthetic magnetism in Brillouin optomechanical system ( http://arxiv.org/abs/2409.01172v1 ) ライセンス: Link先を確認	D. R. K. Massembele, P. Djorwé, K. B. Emale, Jia-Xin Peng, A. -H. Abdel-Aty, K. S. Nisar,	(参考訳) 本稿では, ブリュアン光学系における低駆動閾値量子相関を合成磁性に基づいて生成する手法を提案する。提案手法は,2つの光モードに結合した機械的(音響的)共振器を標準振動圧(電気的拘束力)によって構成する。音響モードと光学モードを結合する電気的拘束力は、システム内の後方刺激ブリルアン散乱(BSBS)過程をトリガーする。さらに、機械的および音響的共振器は、結合率$J_m$で機械的に結合される。機械的結合がなければ、生成した量子相関は強い駆動場を必要とする。フォノンホッピング結合を考慮し、合成磁性を誘導し、低結合強度の量子相関を生成する。生成した量子相関は急激な死と再生現象を示し、熱雑音に対して堅牢である。本研究は,量子通信,量子センサ,量子計算タスクに有用な低しきい値量子相関生成法を提案する。 We propose a scheme to generate low driving threshold quantum correlations in a Brillouin optomechanical system based on synthetic magnetism. Our proposal consists of a mechanical (acoustic) resonator coupled to two optical modes through the standard optomechanical radiation pressure (an electrostrictive force). The electrostrictive force that couples the acoustic mode to the optical ones triggers a Backward Stimulated Brillouin Scattering (BSBS) process in the system. Moreover, the mechanical and acoustic resonators are mechanically coupled through the coupling rate $J_m$, which is $\theta$-phase modulated. Without a mechanical coupling, the generated quantum correlations require a strong driving field. By accounting for the phonon-hopping coupling, the synthetic magnetism is induced and the quantum correlations are generated for low coupling strengths. The generated quantum correlations display sudden death and revival phenomena, and are robust against thermal noise. Our results suggest a way for low threshold quantum correlations generation, and are useful for quantum communications, quantum sensors, and quantum computational tasks.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# アウト・オブ・ディストリビューション検出のためのログスケーリング Logit Scaling for Out-of-Distribution Detection ( http://arxiv.org/abs/2409.01175v1 ) ライセンス: Link先を確認	Andrija Djurisic, Rosanne Liu, Mladen Nikolic,	(参考訳) 機械学習とAIモデルのオープンワールド環境への安全なデプロイは、アウト・オブ・ディストリビューション(OOD)データを正確に検出する能力、モデルのトレーニング内容と大きく異なるデータサンプルに重きを置いている。 OOD検出への現在のアプローチは、モデルをさらにトレーニングすることや、もはやアクセスできないかもしれないトレーニングデータに関する統計を必要とすることが多い。さらに、既存のOOD検出メソッドの多くは、異なるアーキテクチャ間で転送された場合のパフォーマンスを維持するのに苦労している。我々の研究は、トレーニングデータ配信へのアクセスを必要とせず、トレーニングされたネットワークをそのまま維持し、さまざまなアーキテクチャにわたって強力なパフォーマンスを維持する、シンプルなポストホック手法を提案することで、これらの課題に対処する。我々の方法である Logit Scaling (LTS) は、その名が示すように、単純にロジットを、分散内(ID) と OOD のサンプルを効果的に区別する方法でスケールする。 CIFAR-10, CIFAR-100, ImageNet, OpenOOD など,様々なスケールでベンチマークを行った。実験では、3つのIDと14のOODデータセットと9つのモデルアーキテクチャがカバーされた。全体として、我々は様々なアーキテクチャにおける最先端性能、堅牢性、適応性を実証し、高度なOOD検出のための普遍的なソリューションへの道を開いた。 The safe deployment of machine learning and AI models in open-world settings hinges critically on the ability to detect out-of-distribution (OOD) data accurately, data samples that contrast vastly from what the model was trained with. Current approaches to OOD detection often require further training the model, and/or statistics about the training data which may no longer be accessible. Additionally, many existing OOD detection methods struggle to maintain performance when transferred across different architectures. Our research tackles these issues by proposing a simple, post-hoc method that does not require access to the training data distribution, keeps a trained network intact, and holds strong performance across a variety of architectures. Our method, Logit Scaling (LTS), as the name suggests, simply scales the logits in a manner that effectively distinguishes between in-distribution (ID) and OOD samples. We tested our method on benchmarks across various scales, including CIFAR-10, CIFAR-100, ImageNet and OpenOOD. The experiments cover 3 ID and 14 OOD datasets, as well as 9 model architectures. 
Overall, we demonstrate state-of-the-art performance, robustness and adaptability across different architectures, paving the way towards a universally applicable solution for advanced OOD detection.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 自律運転におけるオンラインコーナーケース検出のためのエンド・ツー・エンド・エンドとモジュラー・ドライビング・アプローチの統合 Integrating End-to-End and Modular Driving Approaches for Online Corner Case Detection in Autonomous Driving ( http://arxiv.org/abs/2409.01178v1 ) ライセンス: Link先を確認	Gemb Kaljavesi, Xiyan Su, Frank Diermeyer,	(参考訳) オンラインコーナーケース検出は、自動運転車の安全性を確保するために不可欠である。現在の自律運転アプローチは、モジュラーアプローチとエンドツーエンドアプローチに分類することができる。両手法の利点を生かして,エンド・ツー・エンドのアプローチをモジュールシステムに統合したオンラインコーナーケース検出手法を提案する。モジュールシステムは一次駆動タスクを引き継ぎ、エンド・ツー・エンドのネットワークは二次駆動として並列に動作し、システム間の不一致をコーナーケース検出に使用する。本手法を実車に実装し,定性的に評価する。本研究は,2次駆動システムとして,状況認識の優れたエンド・ツー・エンドネットワークが,コーナケースの検出に有効であることを示す。これらのことから,このようなアプローチは自動運転車の安全性を高める可能性を秘めていると考えられる。 Online corner case detection is crucial for ensuring safety in autonomous driving vehicles. Current autonomous driving approaches can be categorized into modular approaches and end-to-end approaches. To leverage the advantages of both, we propose a method for online corner case detection that integrates an end-to-end approach into a modular system. The modular system takes over the primary driving task and the end-to-end network runs in parallel as a secondary one, the disagreement between the systems is then used for corner case detection. We implement this method on a real vehicle and evaluate it qualitatively. Our results demonstrate that end-to-end networks, known for their superior situational awareness, as secondary driving systems, can effectively contribute to corner case detection. These findings suggest that such an approach holds potential for enhancing the safety of autonomous vehicles.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# Recoverable Compression: テキスト情報によるマルチモーダルビジョントークン復元機構 Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information ( http://arxiv.org/abs/2409.01179v1 ) ライセンス: Link先を確認	Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu,	(参考訳) 大規模言語モデリング技術の進歩により、視覚エンコーダと大規模言語モデルを組み合わせた大規模マルチモーダルモデルは、様々な視覚的タスクにおいて例外的な性能を示した。現在の大規模マルチモーダルモデルのほとんどは、ビジュアルエンコーダから得られた視覚的特徴を大きな言語モデルにマッピングし、下流タスクのテキストと並行して入力として使用することでこれを実現している。したがって、視覚トークンの数はモデルのトレーニングと推論速度に直接影響を与える。しかし、大規模なマルチモーダルモデルでは、トークンのプルーニングや圧縮に視覚情報に頼るだけで重要な情報が失われる可能性がある。一方、質問の形式でのテキスト入力には、質問に答えるのに役立つ貴重な情報が含まれており、モデルにさらなる知識を提供する。純粋に視覚的トークンプルーニング法で起こりうる潜在的な単純化と過剰なプルーニングに対処するために,テキスト情報を用いた動的視覚的トークン回復機構を提案する。このメカニズムは、質問テキストと視覚トークンの類似性を利用して、重要なテキスト情報で視覚的に意味のあるトークンを回収し、他の重要でないトークンをマージする。実験により,提案手法は,視覚トークンを平均10%まで圧縮しながら,従来の手法に匹敵する性能を示した。私たちのソースコードは受理後、公開されます。 With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current large-scale multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text for downstream tasks. Therefore, the number of visual tokens directly affects the training and inference speed of the model. There has been significant work on token pruning for visual transformers, but for large multimodal models, only relying on visual information for token pruning or compression may lead to significant loss of important information. On the other hand, the textual input in the form of a question may contain valuable information that can aid in answering the question, providing additional knowledge to the model. 
To address the potential oversimplification and excessive pruning that can occur with most purely visual token pruning methods, we propose a text information-guided dynamic visual token recovery mechanism that does not require training. This mechanism leverages the similarity between the question text and visual tokens to recover visually meaningful tokens with important text information while merging other less important tokens. Experimental results demonstrate that our proposed method achieves comparable performance to the original approach while compressing the visual tokens to an average of 10% of the original quantity. Our source code will be made publicly available following acceptance.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# PitVis-2023 : 内視鏡下下垂体手術ビデオにおけるワークフロー認識の試み PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery ( http://arxiv.org/abs/2409.01184v1 ) ライセンス: Link先を確認	Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Dominik Rivoir, Alejandra Pérez, Santiago Rodriguez, Pablo Arbeláez, Danail Stoyanov, Hani J. Marcus, Sophia Bano,	(参考訳) 最小侵襲手術のビデオに応用されるコンピュータビジョンの分野は、ますます成長している。ワークフロー認識(Workflow recognition)とは、手術のさまざまな側面を自動認識することである。この情報は後に、手術を学ぶとき、生きた手術中、手術ノートを書くときに、臨床医を助けるために使われる。 The Pituitary Vision (PitVis) 2023 Challengeは、内視鏡下垂体手術のビデオで、コミュニティに認識のステップと計測を課している。これは、視覚を制限し歪ませる作業スペースが小さいことや、より正確なモデル予測を必要とする機器とステップの切り替えの頻度が高いことによる、他の最小侵襲の手術と比較してもユニークなタスクである。参加者には25本のビデオが提供され、2023年10月8日、カナダのバンクーバーで開催された内視鏡的ビジョン2023チャレンジの一環としてMICCAI-2023で発表された。さまざまなディープラーニングモデルを使用して、6つの国にまたがる9チームからの18のサブミッションがあった。トップパフォーマンスモデルの共通性は、時空間法とマルチタスク法を採用し、ステップと楽器の認識において、純粋にスペーシャルな単一タスクモデルよりも50%以上、マクロF1スコアが10%以上改善されたことである。したがって、PitVis-2023 Challengeは、最小侵襲手術における最先端のコンピュータビジョンモデルが新しいデータセットに転送可能であることを示した。ベンチマーク結果は論文に記載されており、データセットはhttps://doi.org/10.5522/04/26531686で公開されている。 The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery, including which surgical steps are performed and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery, during live surgery, and when writing operation notes. The Pituitary Vision (PitVis) 2023 Challenge tasks the community with step and instrument recognition in videos of endoscopic pituitary surgery. 
This is a unique task when compared to other minimally invasive surgeries due to the smaller working space, which limits and distorts vision, and the higher frequency of instrument and step switching, which requires more precise model predictions. Participants were provided with 25 videos, with results presented at the MICCAI-2023 conference as part of the Endoscopic Vision 2023 Challenge in Vancouver, Canada, on 08-Oct-2023. There were 18 submissions from 9 teams across 6 countries, using a variety of deep learning models. A commonality between the top-performing models was incorporating spatio-temporal and multi-task methods, with greater than 50% and 10% macro-F1-score improvement over purely spatial single-task models in step and instrument recognition respectively. The PitVis-2023 Challenge therefore demonstrates that state-of-the-art computer vision models in minimally invasive surgery are transferable to a new dataset, with surgery-specific techniques used to enhance performance, progressing the field further. Benchmark results are provided in the paper, and the dataset is publicly available at: https://doi.org/10.5522/04/26531686.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 自己監督型, 生成型学習によるバックドアディフェンス Backdoor Defense through Self-Supervised and Generative Learning ( http://arxiv.org/abs/2409.01185v1 ) ライセンス: Link先を確認	Ivan Sabolić, Ivan Grubišić, Siniša Šegvić,	(参考訳) バックドア攻撃は、手作りのトリガーを導入し、対応するラベルを望ましいターゲットクラスに切り替えることで、トレーニングデータのごく一部を変更する。このようなデータのトレーニングは、選択されたテストサンプルに悪意のある推論を引き起こすバックドアを注入する。ほとんどの防衛は、差別的な学習手順の様々な修正を通じて、このような攻撃を緩和する。対照的に、自己教師付き表現空間におけるクラスごとの分布の生成モデルに基づくアプローチについて検討する。興味深いことに、これらの表現は最近のバックドア攻撃で保存されるか、ひどく乱される。どちらの場合も、クラスごとの生成モデルにより、有毒なデータを検出し、データセットをクリーン化することができます。実験により、クリーン化されたデータセットでのトレーニングは、攻撃の成功率を大幅に低減し、良心的な入力の精度を維持することが示された。 Backdoor attacks change a small portion of training data by introducing hand-crafted triggers and rewiring the corresponding labels towards a desired target class. Training on such data injects a backdoor which causes malicious inference in selected test samples. Most defenses mitigate such attacks through various modifications of the discriminative learning procedure. In contrast, this paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space. Interestingly, these representations get either preserved or heavily disturbed under recent backdoor attacks. In both cases, we find that per-class generative models allow us to detect poisoned data and cleanse the dataset. Experiments show that training on the cleansed dataset greatly reduces the attack success rate and retains the accuracy on benign inputs.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# ボソン・スピンモデルにおけるホレボ境界と客観性 Holevo bound and objectivity in the boson-spin model ( http://arxiv.org/abs/2409.01186v1 ) ライセンス: Link先を確認	Tae-Hun Lee, Jarosław K. Korbicz,	(参考訳) 量子系における客観的で古典的な性質の出現は、量子情報理論の現代言語で説明できる。本稿では、そのような分析の例を示す。我々は、量子チャネル理論をオープン量子系のボソンスピンモデルに適用し、リコイルレス近似とフロケ理論を用いて、チャネルの容量を束縛するホレボ量を用いて、中央系の情報をその環境にブロードキャストする。本研究は, 短期体制を解析し, キャパシティの初期成長と漸近体制を2次的に示す。温度や環境のトンネルエネルギーなどのモデルパラメータへの複雑な依存も分析し、ホレヴォ境界が最大に達するようなeg状態を示す。 Emergence of objective, classical properties in quantum systems can be described in the modern language of quantum information theory. In this work, we present an example of such an analysis. We apply the quantum channel theory to a boson-spin model of open quantum systems and calculate, using the recoilless approximation and the Floquet theory, the Holevo quantity, which bounds the capacity of the channel, broadcasting information about the central system into its environment. We analyze both the short-time regime, showing quadratic in time initial growth of the capacity, and the asymptotic regime. Complicated dependence on the model parameters, such as temperature, tunneling energy for the environment, etc., is also analyzed, showing e.g. regimes where the Holevo bound reaches its maximum.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 空間モード分解法による単一光子超解光分光 Single-photon super-resolved spectroscopy from spatial-mode demultiplexing ( http://arxiv.org/abs/2409.01190v1 ) ライセンス: Link先を確認	Luigi Santamaria Amato, Fabrizio Sgobba, Deborah Pallotti, Cosmo Lupo,	(参考訳) サブ回折分解能を持つ非コヒーレント光の分光を実証する。原理実証実験では、回折限界以下で分離する不整点のような一対の源のスペクトルを解析する。この2つの源は惑星系を模しており、恒星の光源は明るく、惑星の光源は暗くなっている。 2つの画像が重なり合っているため、二次音源のスペクトル情報を取得することは困難である。この制限は、空間モードデマルチプレクシングに基づく構造化された測定を利用して解決され、光は横磁場のエルミート・ガウス成分で最初にソートされ、光子検出によって測定される。これにより、2つの源からの光子を効果的に分離することができます。太陽系外惑星の大気スペクトロスコピーを強化するための応用が提案されている。空間デマルチプレクシングに基づく超高分解能イメージングのいくつかの実験が過去数年間に行われ、有望な結果が得られた。ここでは、私たちの知識を最大限に活用するために、この概念を分光領域に拡張する。 We demonstrate spectroscopy of incoherent light with sub-diffraction resolution. In a proof-of-principle experiment we analyze the spectrum of a pair of incoherent point-like sources whose separation is below the diffraction limit. The two sources mimic a planetary system, with a brighter source for the star and a dimmer one for the planet. Acquiring spectral information about the secondary source is hard because the two images have a substantial overlap. This limitation is solved by leveraging a structured measurement based on spatial-mode demultiplexing, where light is first sorted in its Hermite-Gaussian components in the transverse field, then measured by photon detection. This allows us to effectively decouple the photons coming from the two sources. An application is suggested to enhance exoplanets' atmosphere spectroscopy. A number of experiments of super-resolution imaging based on spatial demultiplexing have been conducted in the past few years, with promising results. Here, for the first time to the best of our knowledge, we extend this concept to the domain of spectroscopy.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# CLIBE: トランスフォーマーベースNLPモデルにおける動的バックドアの検出 CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models ( http://arxiv.org/abs/2409.01193v1 ) ライセンス: Link先を確認	Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji,	(参考訳) バックドアはNLPモデルに注入され、入力テキストにトリガーと呼ばれる特定の機能が含まれており、攻撃者が秘密に選択した場合に誤動作を誘発する。静的テキストトリガで使用される固定語、フレーズ、文とは異なり、NLP動的バックドアは抽象的および潜在的なテキスト機能に関連する設計トリガーを攻撃し、従来の静的バックドア攻撃よりもかなりステルス性が高い。しかし、NLPバックドア検出に関する既存の研究は、主に静的バックドア攻撃に対する防御に焦点を当てているが、NLPモデルにおける動的バックドアの検出は明らかにされていない。本稿では, Transformer ベースの NLP モデルで動的バックドアを検出する最初のフレームワークである CLIBE を提案する。 CLIBEは、ターゲットラベルとして限られた数の参照サンプルを分類するように、注目層に最適化された重量摂動を組み込むことで、疑似トランスフォーマーモデルに「ファウショット摂動」を注入する。その後、CLIBEは、この数発の摂動の一般化能力を利用して、元のモデルが動的バックドアを含むかどうかを判断する。 3つの高度なNLP動的バックドア攻撃,2つの広く使用されているトランスフォーマーフレームワーク,および4つの実世界の分類タスクに対する広範囲な評価は,CLIBEの有効性を強く検証する。また,様々なアダプティブアタックに対するCLIBEの堅牢性を示す。さらに、CLIBEを用いて、Hugging Face上で49の人気のTransformerモデルを精査し、動的バックドアを含む確率の高いモデルを見つける。我々はHugging Faceにコンタクトを取り、このモデルのバックドア動作の詳細な証拠を提供した。さらに、CLIBEを拡張し、有害な振る舞いを示すように修正されたバックドアテキスト生成モデルを検出する。私たちの知る限り、CLIBEは、入力テストサンプルをトリガーすることなく、テキスト生成モデルのバックドアを検出することができる最初のフレームワークです。 Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static backdoor attacks. However, existing research on NLP backdoor detection primarily focuses on defending against static backdoor attacks, while detecting dynamic backdoors in NLP models remains largely unexplored. This paper presents CLIBE, the first framework to detect dynamic backdoors in Transformer-based NLP models. 
CLIBE injects a "few-shot perturbation" into the suspect Transformer model by crafting optimized weight perturbation in the attention layers to make the perturbed model classify a limited number of reference samples as a target label. Subsequently, CLIBE leverages the generalization ability of this few-shot perturbation to determine whether the original model contains a dynamic backdoor. Extensive evaluation on three advanced NLP dynamic backdoor attacks, two widely-used Transformer frameworks, and four real-world classification tasks strongly validates the effectiveness of CLIBE. We also demonstrate the robustness of CLIBE against various adaptive attacks. Furthermore, we employ CLIBE to scrutinize 49 popular Transformer models on Hugging Face and discover one exhibiting a high probability of containing a dynamic backdoor. We have contacted Hugging Face and provided detailed evidence of this model's backdoor behavior. Moreover, we extend CLIBE to detect backdoor text generation models modified to exhibit toxic behavior. To the best of our knowledge, CLIBE is the first framework capable of detecting backdoors in text generation models without access to trigger input test samples.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 新生児脳における学習に基づく繊維配向分布推定における地表面構造の影響 Ground-truth effects in learning-based fiber orientation distribution estimation in neonatal brains ( http://arxiv.org/abs/2409.01195v1 ) ライセンス: Link先を確認	Rizhong Lin, Hamza Kebiri, Ali Gholipour, Yufei Chen, Jean-Philippe Thiran, Davood Karimi, Meritxell Bach Cuadra,	(参考訳) 拡散磁気共鳴イメージング(Diffusion Magnetic Resonance Imaging, DMRI)は、脳の微細構造を生体内で描写する非侵襲的な方法である。ファイバ配向分布(英: Fiber orientation distributions、FOD)は、ホワイトマターファイバーの構成をマッピングするのに広く用いられる数学的表現である。近年、深層ニューラルネットワークを用いたFOD推定は成功しており、特に拡散測定の少ない新生児では成功している。これらの方法は、主にマルチシェルの制約付き球状脱畳(MSMT-CSD)を用いて再構成されたターゲットFODに基づいて訓練されている。本稿では,MSMT-CSDとS3T-CSDによる制約付き球面デコンボリューション(SS3T-CSD)の両面において,U-Netアーキテクチャに基づく最先端モデルのトレーニングにより,この仮説を検証する。以上の結果より, SS3T-CSDとSS3T-CSDとの単繊維推定ボクセルの比率はMSMT-CSDよりも現実的であることが示唆された。さらに入力勾配方向の増大はMSMT-CSDよりもSS3T-CSDの性能を著しく向上させる。最後に、年齢領域シフト設定では、SS3T-CSDは年齢群全体で堅牢なパフォーマンスを維持しており、より正確な新生児脳画像撮影の可能性を示している。 Diffusion Magnetic Resonance Imaging (dMRI) is a non-invasive method for depicting brain microstructure in vivo. Fiber orientation distributions (FODs) are mathematical representations extensively used to map white matter fiber configurations. Recently, FOD estimation with deep neural networks has seen growing success, in particular, those of neonates estimated with fewer diffusion measurements. These methods are mostly trained on target FODs reconstructed with multi-shell multi-tissue constrained spherical deconvolution (MSMT-CSD), which might not be the ideal ground truth for developing brains. Here, we investigate this hypothesis by training a state-of-the-art model based on the U-Net architecture on both MSMT-CSD and single-shell three-tissue constrained spherical deconvolution (SS3T-CSD). Our results suggest that SS3T-CSD might be more suited for neonatal brains, given that the ratio between single and multiple fiber-estimated voxels with SS3T-CSD is more realistic compared to MSMT-CSD. 
Additionally, increasing the number of input gradient directions significantly improves performance with SS3T-CSD over MSMT-CSD. Finally, in an age domain-shift setting, SS3T-CSD maintains robust performance across age groups, indicating its potential for more accurate neonatal brain imaging.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 電気・磁気・マルチポール相互作用のための統一偏極形式 A unifying polarization formalism for electric- and magnetic-multipole interactions ( http://arxiv.org/abs/2409.01197v1 ) ライセンス: Link先を確認	R. Casini, R. Manso Sainz, A. Lopez Ariste, N. Kaikati,	(参考訳) 偏極のための球面テンソル形式は、任意の順序の電気的および磁気的マルチポール遷移の処理に拡張する。我々は、原子系と偏光場との相互作用を記述する作用素のテンソル形式を導出するために、球面波の膨張に頼っており、これは自然界の偏光特性を記述する球面テンソルの導入につながっている。直接応用として、フォーマリズムは電気四極子転移における放射の散乱に影響を及ぼす放射異方性をモデル化し、磁場の存在下でのハンル効果をモデル化するために用いられる。 We extend the spherical tensorial formalism for polarization to the treatment of electric- and magnetic-multipole transitions of any order. We rely on the spherical-wave expansion to derive the tensor form of the operator describing the interaction of the atomic system with a polarized radiation field, which naturally leads to the introduction of spherical tensors describing the polarization properties of the interacting field. As a direct application, the formalism is used to model the radiation anisotropy affecting the scattering of radiation in an electric-quadrupole transition, and the associated Hanle effect in the presence of a magnetic field.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# OD-VAE:潜時ビデオ拡散モデル改善のための全次元ビデオ圧縮機 OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model ( http://arxiv.org/abs/2409.01199v1 ) ライセンス: Link先を確認	Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan,	(参考訳) 可変オートエンコーダ (VAE) は遅延表現に動画を圧縮し、遅延ビデオ拡散モデル (LVDM) に先行する重要なコンポーネントである。再現品質が同じであれば、ビデオに対するVAEの圧縮が十分であればなるほど、LVDMはより効率的になります。しかし、ほとんどのLVDMは、ビデオの圧縮が空間次元でのみ行われ、時間次元ではしばしば無視される2D画像VAEを使用している。正確な再現を約束しながら、より簡潔な潜在表現を得るために、VAE内のビデオの時間的圧縮を実行する方法はめったにない。このギャップを埋めるために、時間的・空間的に動画を圧縮できるOD-VAEという全次元圧縮VAEを提案する。 OD-VAEのより十分な圧縮は、ビデオ再構成に大きな課題をもたらすが、細かな設計によって高い再構成精度を達成することができる。映像再構成品質と圧縮速度のトレードオフを改善するために、OD-VAEの4つの変種を導入分析する。さらに、OD-VAEをより効率的にトレーニングするための新しいテール初期化を設計し、GPUメモリに制限のある任意の長さの動画をOD-VAEが扱えるようにするための新しい推論戦略を提案する。ビデオ再構成とLVDMに基づくビデオ生成に関する総合的な実験により,提案手法の有効性と有効性を示した。 Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ignored in the temporal dimension. How to conduct temporal compression for videos in a VAE to obtain more concise latent representations while promising accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimension compression VAE, named OD-VAE, which can temporally and spatially compress videos. Although OD-VAE's more sufficient compression brings a great challenge to video reconstruction, it can still achieve high reconstructed accuracy by our fine design. To obtain a better trade-off between video reconstruction quality and compression speed, four variants of OD-VAE are introduced and analyzed. 
In addition, a novel tail initialization is designed to train OD-VAE more efficiently, and a novel inference strategy is proposed to enable OD-VAE to handle videos of arbitrary length with limited GPU memory. Comprehensive experiments on video reconstruction and LVDM-based video generation demonstrate the effectiveness and efficiency of our proposed methods.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# EnCLAP++: 自動オーディオキャプションパフォーマンスを最適化するためのEnCLAPフレームワークの分析 EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance ( http://arxiv.org/abs/2409.01201v1 ) ライセンス: Link先を確認	Jaeyeon Kim, Minjeon Jeon, Jaeyoon Jung, Sang Hoon Woo, Jinjoo Lee,	(参考訳) 本研究では,音声の自動キャプションにおける最先端モデルであるEnCLAPフレームワークの解析と最適化を目的とする。本研究では,音響エンコーダコンポーネントの変更の影響について検討し,異なるデータセットスケールでの事前学習について検討し,再分類方式の有効性について検討する。生成されたキャプションの広範な実験と定量的解析により,オリジナルをはるかに上回る拡張版であるEnCLAP++を開発した。 In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.	翻訳日:2024-09-06 07:01:54 公開日:2024-09-02
# 一般産業インテリジェンスに向けて:IIoT強化型連続型大規模モデルに関する調査 Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models ( http://arxiv.org/abs/2409.01207v1 ) ライセンス: Link先を確認	Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang, Weihua Li, Zuozhu Liu, Howard H. Yang, Guangjie Han,	(参考訳) 現在、Industrial Internet of Things(IIoT)のほとんどのアプリケーションは、依然としてCNNベースのニューラルネットワークに依存している。言語、ビジョン、マルチモーダルモデルを含むトランスフォーマーベースの大規模モデル(LM)は、AIGC(AIGC)において印象的な能力を示してきたが、検出、計画、制御といった産業分野への応用は依然として比較的限られている。産業環境に事前訓練されたLMを配置することは、タスクの複雑さ、データの多様性、ユーザ要求の動的性質などにより、安定性と可塑性の課題に直面することが多い。これらの課題に対処するために、事前学習と微調整の戦略と継続的学習は効果的なソリューションであることが証明され、モデルが動的要求に適応し、推論と意思決定能力を継続的に最適化できるようになりました。本稿では, GII における LM と GII 上の LM の2つの重要な領域に着目し, IIoT による汎用産業情報 (GII) への LM の統合について検討する。前者は産業用アプリケーションの課題に対して最適化されたソリューションを提供するためにLMを活用することに焦点を当て、後者は産業用デバイス、エッジコンピューティング、クラウドコンピューティングを含む協調シナリオにおけるLM学習と推論機能の継続的な最適化について研究している。本稿では,より汎用的で適応的な未来に向けて,GIIの総合的な理論枠組みと研究方向性を確立することを目的とした,GIIの今後の発展に関する知見を提供する。 Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based neural networks. Although Transformer-based large models (LMs), including language, vision, and multimodal models, have demonstrated impressive capabilities in AI-generated content (AIGC), their application in industrial domains, such as detection, planning, and control, remains relatively limited. Deploying pre-trained LMs in industrial environments often encounters the challenge of stability and plasticity due to the complexity of tasks, the diversity of data, and the dynamic nature of user demands. To address these challenges, the pre-training and fine-tuning strategy, coupled with continual learning, has proven to be an effective solution, enabling models to adapt to dynamic demands while continuously optimizing their inference and decision-making capabilities. This paper surveys the integration of LMs into IIoT-enhanced General Industrial Intelligence (GII), focusing on two key areas: LMs for GII and LMs on GII. 
The former focuses on leveraging LMs to provide optimized solutions for industrial application challenges, while the latter investigates continuous optimization of LMs learning and inference capabilities in collaborative scenarios involving industrial devices, edge computing, and cloud computing. This paper provides insights into the future development of GII, aiming to establish a comprehensive theoretical framework and research direction for GII, thereby advancing GII towards a more general and adaptive future.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# 欠測データ付き混合型データの統計的ジャンプモデル Statistical Jump Model for Mixed-Type Data with Missing Data Imputation ( http://arxiv.org/abs/2409.01208v1 ) ライセンス: Link先を確認	Federico P. Cortese, Antonio Pievatolo,	(参考訳) 本稿では,混合型データに対する統計的ジャンプモデルを導入することで,時間的進化を伴う混合型データをクラスタリングすることの課題に対処する。この新しいフレームワークは、状態の持続性、解釈可能性の向上、状態スイッチの頻度の低減、および欠落したデータの効率的な処理を含む。このモデルは、状態条件の手段とモードで容易に解釈でき、実践者や政策立案者にはアクセス可能である。本研究では, 従来の大気質指標と比較して, 大気質の持続的な状態の推測において, その優位性を示すとともに, 大気質データへの実証的応用を通じて, 本手法の有効性を検証した。コントリビューションには、混合型時間クラスタリングの堅牢な方法、効果的なデータ管理の欠如、環境モニタリングの実践的洞察が含まれている。 In this paper, we address the challenge of clustering mixed-type data with temporal evolution by introducing the statistical jump model for mixed-type data. This novel framework incorporates regime persistence, enhancing interpretability and reducing the frequency of state switches, and efficiently handles missing data. The model is easily interpretable through its state-conditional means and modes, making it accessible to practitioners and policymakers. We validate our approach through extensive simulation studies and an empirical application to air quality data, demonstrating its superiority in inferring persistent air quality regimes compared to the traditional air quality index. Our contributions include a robust method for mixed-type temporal clustering, effective missing data management, and practical insights for environmental monitoring.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# MobileIQA:知識蒸留を用いた非参照画像品質評価のためのモバイルレベルのディバイスオピニオンネットワークのエクスプロイト MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation ( http://arxiv.org/abs/2409.01212v1 ) ライセンス: Link先を確認	Zewen Chen, Sunhan Xu, Yun Zeng, Haochen Guo, Jian Guo, Shuai Liu, Juan Wang, Bing Li, Weiming Hu, Dehua Liu, Hesong Li,	(参考訳) 高解像度(HR)画像の需要が高まる中、NR-IQA(No-Reference Image Quality Assessment)が注目されるようになり、モバイルデバイス上でのリアルタイムな画質向上とユーザエクスペリエンスの向上を実現している。しかし、既存のNR-IQA法では、HR画像を小さな解像度にリサイズまたはトリミングすることが多く、重要な詳細が失われる。そして、そのほとんどは計算量が多いため、計算資源が限られているため、モバイル機器への応用を妨げている。これらの課題に対処するため,高解像度入力により画像の詳細を保存しながら,画像品質を効率的に評価する,軽量なバックボーンを用いた新しい手法であるMobileIQAを提案する。 MobileIQAは、多視点アテンション学習(MAL)モジュールを用いて、データセットアノテーションプロセス中に異なるアノテータによって提供される主観的な意見をシミュレートする。モデルは教師モデルを使用して、知識蒸留を通して学生モデルの学習を誘導する。この方法は高い性能を維持しながら計算複雑性を著しく低減する。実験により、MobileIQAは、評価指標と計算効率において、新しいIQA法よりも優れていることが示された。コードは https://github.com/chencn2020/MobileIQA で入手できる。 With the rising demand for high-resolution (HR) images, No-Reference Image Quality Assessment (NR-IQA) gains more attention, as it can evaluate image quality in real time on mobile devices and enhance user experience. However, existing NR-IQA methods often resize or crop the HR images to a small resolution, which leads to a loss of important details. Moreover, most of them have high computational complexity, which hinders their application on mobile devices due to limited computational resources. To address these challenges, we propose MobileIQA, a novel approach that utilizes lightweight backbones to efficiently assess image quality while preserving image details through high-resolution input. MobileIQA employs the proposed multi-view attention learning (MAL) module to capture diverse opinions, simulating subjective opinions provided by different annotators during the dataset annotation process. The model uses a teacher model to guide the learning of a student model through knowledge distillation. 
This method significantly reduces computational complexity while maintaining high performance. Experiments demonstrate that MobileIQA outperforms novel IQA methods on evaluation metrics and computational efficiency. The code is available at https://github.com/chencn2020/MobileIQA.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
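The MobileIQA abstract says a teacher model guides a student model through knowledge distillation. The following is only a minimal sketch of a generic response-based distillation loss for a quality-score regressor, not the authors' implementation: the function names, the use of mean squared error, and the `alpha` weighting are all illustrative assumptions.

```python
def mse(pred, ref):
    """Mean squared error between two equal-length lists of scores."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def distillation_loss(student_scores, teacher_scores, human_scores, alpha=0.5):
    """Hypothetical combined loss (assumption, not taken from the paper).

    alpha weights the supervised term (student vs. human opinion scores);
    1 - alpha weights the distillation term (student vs. teacher outputs).
    """
    supervised = mse(student_scores, human_scores)
    distilled = mse(student_scores, teacher_scores)
    return alpha * supervised + (1 - alpha) * distilled


# Toy batch of three images with made-up quality scores.
student = [3.1, 4.0, 2.2]
teacher = [3.0, 4.2, 2.0]
labels = [3.5, 4.0, 1.8]
print(distillation_loss(student, teacher, labels, alpha=0.7))
```

With `alpha=1.0` the student ignores the teacher entirely; lowering `alpha` pulls the lightweight student toward the teacher's predictions.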
# スキュー特徴密度を考慮した教師付きパターン認識 Supervised Pattern Recognition Involving Skewed Feature Densities ( http://arxiv.org/abs/2409.01213v1 ) ライセンス: Link先を確認	Alexandre Benatti, Luciano da F. Costa,	(参考訳) パターン認識は、多くの科学および技術活動の基礎となる特に重要な課題である。同時に、パターン認識には、データ要素を表現する機能の選択や、可能な各変換など、いくつかの課題が含まれている。本研究は, 類似度指数に基づくユークリッド距離の分類ポテンシャルと相似類似度指数に基づく相似性指数を, k-neighbors による分類法を用いて比較した。重なりのない, あるいは重複しない, それぞれの密度を特徴とする2つの群が与えられた場合, ユークリッド距離に基づくk近傍法の性能を類似度指標として定量的に評価するために, 異なるタイプの変換が得られた。より具体的には、隣り合う2つの群の密度の間の交点を分類する精度が比較に考慮される。また,データ要素間の比較のシャープさが,各教師付き分類性能とは無関係であることが確認された。 Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technological activities. At the same time, pattern recognition involves several challenges, including the choice of features to represent the data elements, as well as possible respective transformations. In the present work, the classification potential of the Euclidean distance and a dissimilarity index based on the coincidence similarity index are compared by using the k-neighbors supervised classification method respectively to features resulting from several types of transformations of one- and two-dimensional symmetric densities. Given two groups characterized by respective densities without or with overlap, different types of respective transformations are obtained and employed to quantitatively evaluate the performance of k-neighbors methodologies based on the Euclidean distance and the coincidence similarity index. More specifically, the accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account for the comparison. Several interesting results are described and discussed, including the enhanced potential of the dissimilarity index for classifying datasets with right skewed feature densities, as well as the identification that the sharpness of the comparison between data elements can be independent of the respective supervised classification performance.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
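The abstract above compares k-neighbors classification under the Euclidean distance and under a dissimilarity derived from the coincidence similarity index. As a rough sketch of that comparison, the snippet below assumes non-negative features, takes the coincidence similarity as the product of the Jaccard and interiority indices, and assumes the dissimilarity is one minus that similarity; the paper may use a different variant, and the toy data are made up.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def coincidence_similarity(x, y):
    """Jaccard index times interiority index (non-negative features assumed)."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    union = sum(max(a, b) for a, b in zip(x, y))
    smaller = min(sum(x), sum(y))
    if union == 0 or smaller == 0:
        return 0.0
    return (shared / union) * (shared / smaller)


def coincidence_dissimilarity(x, y):
    """Assumed conversion from similarity to dissimilarity."""
    return 1.0 - coincidence_similarity(x, y)


def knn_predict(train_x, train_y, query, k, dissimilarity):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: dissimilarity(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy one-dimensional data: two groups with well-separated feature values.
train_x = [[1.0], [1.2], [5.0], [5.5]]
train_y = ["a", "a", "b", "b"]
for name, fn in [("euclidean", euclidean_distance),
                 ("coincidence", coincidence_dissimilarity)]:
    print(name, knn_predict(train_x, train_y, [1.1], 3, fn))
```

Passing the dissimilarity as a function keeps the k-neighbors routine identical for both measures, so any difference in accuracy comes from the measure alone.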
# PythonエコシステムにおけるSBOM生成ツール:詳細分析 SBOM Generation Tools in the Python Ecosystem: an In-Detail Analysis ( http://arxiv.org/abs/2409.01214v1 ) ライセンス: Link先を確認	Serena Cofano, Giacomo Benedetti, Matteo Dell'Amico,	(参考訳) SBOM(Software Bills of Materials)は、ソフトウェアを構成するコンポーネントをリストアップすることで透明性を向上させるもので、ソフトウェアサプライチェーン攻撃の実施問題に対する重要な対策である。 SBOM生成ツールは、プロジェクトソースファイルを取り込み、SBOMを出力として提供し、ソフトウェアエコシステムと相互作用する。 SBOMはセキュリティ実践者にとって大幅に改善されているが、完全かつ正しいSBOMを提供することは依然として未解決の問題である。本稿では,SBOMの完全性と正しさに影響を及ぼす問題の原因をPyPIエコシステムに焦点をあてて検討する。我々はCycloneDX標準を用いて4つの人気のあるSBOM生成ツールを分析する。私たちの分析では、依存関係のバージョン、メタデータファイル、リモート依存関係、オプションの依存関係に関する問題を取り上げています。さらに,PyPIエコシステムにおけるメタデータの標準が欠如していることから,体系的な問題を見出した。これにはメタデータファイルの存在の矛盾や、コンテンツのフォーマットのバリエーションが含まれている。 Software Bills of Material (SBOMs), which improve transparency by listing the components constituting software, are a key countermeasure to the mounting problem of Software Supply Chain attacks. SBOM generation tools take project source files and provide an SBOM as output, interacting with the software ecosystem. While SBOMs are a substantial improvement for security practitioners, providing a complete and correct SBOM is still an open problem. This paper investigates the causes of the issues affecting SBOM completeness and correctness, focusing on the PyPI ecosystem. We analyze four popular SBOM generation tools using the CycloneDX standard. Our analysis highlights issues related to dependency versions, metadata files, remote dependencies, and optional dependencies. Additionally, we identified a systematic issue with the lack of standards for metadata in the PyPI ecosystem. This includes inconsistencies in the presence of metadata files as well as variations in how their content is formatted.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# ESP-PCT:ポイントクラウドトランスにおける時間的・空間的冗長性の効率的な圧縮によるVRセマンティックパフォーマンスの向上 ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers ( http://arxiv.org/abs/2409.01216v1 ) ライセンス: Link先を確認	Luoyu Mei, Shuai Wang, Yun Cheng, Ruofeng Liu, Zhimeng Yin, Wenchao Jiang, Shuai Wang, Wei Gong,	(参考訳) セマンティック認識は仮想現実(VR)アプリケーションにおいて重要なものであり、没入的でインタラクティブな体験を可能にする。有望なアプローチは、ミリ波(mmWave)信号を利用して点雲を生成することである。しかし、現在のmmWaveポイントクラウドモデルの高い計算とメモリ要求は、その効率と信頼性を妨げている。この制限に対処するため,本論文では,VRアプリケーションに適した2段階のセマンティック・セマンティック・パフォーマンス・ポイント・クラウド・トランスフォーマであるESP-PCTを紹介する。 ESP-PCTは、センサポイントクラウドデータの精度を活用し、ローカライゼーションとフォーカスステージをエンドツーエンドで共同でトレーニングする意味認識プロセスを最適化する。各種VRセマンティック認識条件でESP-PCTを評価し,認識効率を大幅に向上させた。特に、ESP-PCTは計算要求(FLOP)を76.9%削減し、メモリ使用量を78.2%削減し、93.2%の精度を達成している。これらのことは、高い精度と冗長性を低下させることにより、VRセマンティック認識におけるESP-PCTの可能性を強調している。このプロジェクトのコードとデータは \url{https://github.com/lymei-SEU/ESP-PCT} で公開されている。 Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a novel Enhanced Semantic Performance Point Cloud Transformer with a two-stage semantic recognition framework tailored for VR applications. ESP-PCT takes advantage of the accuracy of sensory point cloud data and optimizes the semantic recognition process, where the localization and focus stages are trained jointly in an end-to-end manner. We evaluate ESP-PCT on various VR semantic recognition conditions, demonstrating substantial enhancements in recognition efficiency. 
Notably, ESP-PCT achieves an accuracy of 93.2% while simultaneously reducing the computational requirements (FLOPs) by 76.9% and memory usage by 78.2% compared to the existing Point Transformer model. These results underscore ESP-PCT's potential in VR semantic recognition by achieving high accuracy and reducing redundancy. The code and data of this project are available at https://github.com/lymei-SEU/ESP-PCT.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# 低資源テキストから音声への多言語学習戦略 A multilingual training strategy for low resource Text to Speech ( http://arxiv.org/abs/2409.01217v1 ) ライセンス: Link先を確認	Asma Amalas, Mounir Ghogho, Mohamed Chetouani, Rachid Oulad Haj Thami,	(参考訳) 近年, 音声合成技術は, ニューラルテキスト・トゥ・スピーチ(TTS)の進歩により, 高品質な音声合成を実現している。しかし、これらのTTSモデルは、生産にコストがかかり、既存のすべての言語にはスケーラビリティが低い膨大なデータに依存しており、特に低リソース言語にはほとんど注目されない。知識伝達のような技術により、データセット作成の負担を軽減することができる。そこで本稿では,ソーシャルメディアからのデータを小さなTTSデータセット構築に使用することができるか,低リソース言語における言語間変換学習(TL)が,このタイプのデータを扱うことができるか,という2つの側面について検討する。本稿では,単言語コーパスの学習の代替として,多言語モデリングがどの程度活用できるかを具体的に評価する。そこで本稿では,対象とする低リソース言語に対するTTSモデルをトレーニングするために,外国語からのデータをどのように選択し,プールするかを検討する。以上の結果から,多言語事前学習は単言語事前学習よりも,生成した音声の明瞭さと自然性を高めることが示唆された。 Recent advances in neural Text to Speech (TTS) have made it possible to produce high-quality synthesised speech. However, such TTS models depend on large amounts of data that are costly to produce and hard to scale to all existing languages, especially since little attention is given to low-resource languages. With techniques such as knowledge transfer, the burden of creating datasets can be alleviated. In this paper, we therefore investigate two aspects: firstly, whether data from social media can be used for a small TTS dataset construction, and secondly, whether cross-lingual transfer learning (TL) for a low-resource language can work with this type of data. In this respect, we specifically assess to what extent multilingual modeling can be leveraged as an alternative to training on monolingual corpora. To do so, we explore how data from foreign languages may be selected and pooled to train a TTS model for a target low-resource language. Our findings show that multilingual pre-training is better than monolingual pre-training at increasing the intelligibility and naturalness of the generated speech.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# 画像検索技術の概観:データ強化と逆学習アプローチ A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches ( http://arxiv.org/abs/2409.01219v1 ) ライセンス: Link先を確認	Kim Jinwoo,	(参考訳) 画像検索はコンピュータビジョンにおいて重要な研究課題であり、オンライン製品検索からセキュリティ監視システムまで幅広い応用の見通しがある。近年,ディープラーニングの進歩により画像検索の精度と効率が著しく向上している。しかし、既存の手法はまだ多くの課題に直面しており、特に大規模データセット、クロスドメイン検索、照明の変動、閉塞、視点などの現実的な条件から生じるイメージ摂動を扱う。これらの課題に対処するため,画像検索の分野では,データ拡張手法や逆学習法が広く応用されている。データ拡張は、より多様なトレーニングサンプルを生成し、現実世界のバリエーションをシミュレートし、オーバーフィッティングを減らすことで、モデルの一般化能力と堅牢性を高める。一方、敵の攻撃と防衛は、潜在的な攻撃に対するモデルの堅牢性を改善し、実用的な応用における信頼性を確保するために、訓練中に摂動を導入する。本稿では,画像検索における最新の研究成果を包括的に要約し,検索性能向上におけるデータ強化と対人学習技術の役割に着目した。今後の方向性や潜在的な課題についても論じる。 Image retrieval is a crucial research topic in computer vision, with broad application prospects ranging from online product searches to security surveillance systems. In recent years, the accuracy and efficiency of image retrieval have significantly improved due to advancements in deep learning. However, existing methods still face numerous challenges, particularly in handling large-scale datasets, cross-domain retrieval, and image perturbations that can arise from real-world conditions such as variations in lighting, occlusion, and viewpoint. Data augmentation techniques and adversarial learning methods have been widely applied in the field of image retrieval to address these challenges. Data augmentation enhances the model's generalization ability and robustness by generating more diverse training samples, simulating real-world variations, and reducing overfitting. Meanwhile, adversarial attacks and defenses introduce perturbations during training to improve the model's robustness against potential attacks, ensuring reliability in practical applications. This review comprehensively summarizes the latest research advancements in image retrieval, with a particular focus on the roles of data augmentation and adversarial learning techniques in enhancing retrieval performance. 
Future directions and potential challenges are also discussed.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# THInC: コンピュータ・ヒューム検出のための理論駆動型フレームワーク THInC: A Theory-Driven Framework for Computational Humor Detection ( http://arxiv.org/abs/2409.01232v1 ) ライセンス: Link先を確認	Victor De Marez, Thomas Winters, Ayla Rigouts Terryn,	(参考訳) ヒューマンコミュニケーションと認知の基本的な側面は、社会的なエンゲージメントにおいて重要な役割を果たすためである。ユーモアに関する理論は何世紀にもわたって進化してきたが、単一の総合的なユーモア理論についてはまだ合意が得られていない。同様に、大規模言語モデルの最近の進歩にもかかわらず、コンピュータでユーモアを認識することは重要な課題である。さらに、ユーモアを検出するためのほとんどの計算手法は、既存のユーモア理論に基づいていない。本稿では、THInC(Theory-driven Humor Interpretation and Classification)と呼ばれる複数のユーモア理論に基づく、ユーモア分類のための解釈可能なフレームワークを作成することにより、ユーモア理論研究と計算ユーモア検出の長年のギャップを埋めることに貢献した。 THInCは解釈可能なGA2M分類器をアンサンブルし、それぞれ異なるユーモア理論を表す。私たちは、理論の異なる側面を定量的に反映するプロキシ機能を積極的に作成するために、透明なフローを設計しました。このフレームワークの実装により、F1スコアは0.85となる。フレームワークの連想的解釈可能性により、プロキシの有効性の分析、ジョークの特徴と理論のアライメント、グローバルに貢献する特徴の同定が可能になる。本稿では,様々なユーモア理論から情報を得て,理論駆動型ユーモア分類の今後の発展のための基盤を提供するユーモア検出フレームワークの構築に向けた先駆的な取り組みを示す。また、ユーモア理論を定量的に自動比較する第一歩として機能する。 Humor is a fundamental aspect of human communication and cognition, as it plays a crucial role in social engagement. Although theories about humor have evolved over centuries, there is still no agreement on a single, comprehensive humor theory. Likewise, computationally recognizing humor remains a significant challenge despite recent advances in large language models. Moreover, most computational approaches to detecting humor are not based on existing humor theories. This paper contributes to bridging this long-standing gap between humor theory research and computational humor detection by creating an interpretable framework for humor classification, grounded in multiple humor theories, called THInC (Theory-driven Humor Interpretation and Classification). THInC ensembles interpretable GA2M classifiers, each representing a different humor theory. We engineered a transparent flow to actively create proxy features that quantitatively reflect different aspects of theories. An implementation of this framework achieves an F1 score of 0.85. 
The associative interpretability of the framework enables analysis of proxy efficacy, alignment of joke features with theories, and identification of globally contributing features. This paper marks a pioneering effort in creating a humor detection framework that is informed by diverse humor theories and offers a foundation for future advancements in theory-driven humor classification. It also serves as a first step in automatically comparing humor theories in a quantitative manner.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# SoK: 自動運転車の画像処理パイプラインのセキュリティ SoK: Security of the Image Processing Pipeline in Autonomous Vehicles ( http://arxiv.org/abs/2409.01234v1 ) ライセンス: Link先を確認	Michael Kühr, Mohammad Hamad, Pedram MohajerAnsari, Mert D. Pesé, Sebastian Steinhorst,	(参考訳) カメラは自動運転車にとって重要なセンサーだ。それらは、知覚を含む多くの安全クリティカルなタスクに不可欠な画像をキャプチャする。これらのイメージを処理するには、複数のレイヤを持つ複雑なパイプラインを使用する。このパイプラインに対するセキュリティ攻撃は、乗客の安全とシステムパフォーマンスに深刻な影響を及ぼす可能性がある。しかし、多くの攻撃はパイプラインの異なるレイヤを見落としており、その実現可能性と影響は様々である。画像処理パイプラインの品質と堅牢性を改善する研究は行われているが、これらの取り組みは、その潜在的な相乗効果を意識せずに、しばしばセキュリティ研究と並行して機能する。本研究では,自律走行車における画像処理パイプラインのセキュリティとロバスト性の研究を組み合わせることで,このギャップを埋めることを目的とする。我々は,自動車セキュリティ標準ISO 21434による攻撃のリスクを分類し,システムセキュリティ全体のすべてのレイヤを検討する必要性を強調した。我々はまた、既存の堅牢性の研究が、現在の研究ギャップに対処して、攻撃の影響を軽減するのにどのように役立つかを実証する。最後に、各層にまたがる様々なパラメータに影響を及ぼすことができる組込みテストベッドを提案し、研究者は異なる防御戦略の効果と攻撃効果を分析できる。ユースケース分析により,このようなテスト環境の重要性を実証し,強靭性に関する研究の一例として,HDR画像を用いて視覚障害を緩和する方法を示す。 Cameras are crucial sensors for autonomous vehicles. They capture images that are essential for many safety-critical tasks, including perception. To process these images, a complex pipeline with multiple layers is used. Security attacks on this pipeline can severely affect passenger safety and system performance. However, many attacks overlook different layers of the pipeline, and their feasibility and impact vary. While there has been research to improve the quality and robustness of the image processing pipeline, these efforts often work in parallel with security research, without much awareness of their potential synergy. In this work, we aim to bridge this gap by combining security and robustness research for the image processing pipeline in autonomous vehicles. We classify the risk of attacks using the automotive security standard ISO 21434, emphasizing the need to consider all layers for overall system security. We also demonstrate how existing robustness research can help mitigate the impact of attacks, addressing the current research gap. 
Finally, we present an embedded testbed that can influence various parameters across all layers, allowing researchers to analyze the effects of different defense strategies and attack impacts. We demonstrate the importance of such a test environment through a use-case analysis and show how blinding attacks can be mitigated using HDR imaging as an example of robustness-related research.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# MRIおよびメタボロミクスに基づく年齢スコアは、多コホート・フェデレーション・ラーニングによって示される死亡予測に相乗的に作用する MRI-based and metabolomics-based age scores act synergetically for mortality prediction shown by multi-cohort federated learning ( http://arxiv.org/abs/2409.01235v1 ) ライセンス: Link先を確認	Pedro Mateus, Swier Garst, Jing Yu, Davy Cats, Alexander G. J. Harms, Mahlet Birhanu, Marian Beekman, P. Eline Slagboom, Marcel Reinders, Jeroen van der Grond, Andre Dekker, Jacobus F. A. Jansen, Magdalena Beran, Miranda T. Schram, Pieter Jelle Visser, Justine Moonen, Mohsen Ghanbari, Gennady Roshchupkin, Dina Vojinovic, Inigo Bermejo, Hailiang Mei, Esther E. Bron,	(参考訳) 生物学的年齢スコアは、生理的バイオマーカーに基づいて時系列を推定することにより、老化を特徴づける新しいツールである。様々なスコアが老化関連の結果と関連している。脳MRI画像(BrainAge)による年齢スコアとメタボミクスバイオマーカー(MetaboAge)による年齢スコアの関係について検討した。我々は3つのコホートでBrainAgeを推定するために、連合型ディープラーニングモデルを訓練した。フェデレートされたBrainAgeモデルでは,コホート全体の年齢予測誤差が局所訓練モデルよりも有意に低かった。コホート間の年齢間隔を調和させることにより、BrainAgeの精度が向上した。その後,連合と生存分析を用いてBrainAgeとMetaboAgeを比較した。その結果,BrainAgeとMetaboAgeの相関は小さかった。そこで本研究では,老化過程の異なる側面を捉えた老化スコアについて検討した。 Biological age scores are an emerging tool to characterize aging by estimating chronological age based on physiological biomarkers. Various scores have shown associations with aging-related outcomes. This study assessed the relation between an age score based on brain MRI images (BrainAge) and an age score based on metabolomic biomarkers (MetaboAge). We trained a federated deep learning model to estimate BrainAge in three cohorts. The federated BrainAge model yielded significantly lower error for age prediction across the cohorts than locally trained models. Harmonizing the age interval between cohorts further improved BrainAge accuracy. Subsequently, we compared BrainAge with MetaboAge using federated association and survival analyses. The results showed a small association between BrainAge and MetaboAge as well as a higher predictive value for the time to mortality of both scores combined than for the individual scores. 
Hence, our study suggests that both aging scores capture different aspects of the aging process.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# 信頼に値するハイパースペクトル画像分類のための空間認識コンフォーマル予測 Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification ( http://arxiv.org/abs/2409.01236v1 ) ライセンス: Link先を確認	Kangdao Liu, Tianhao Sun, Hao Zeng, Yongshan Zhang, Chi-Man Pun, Chi-Man Vong,	(参考訳) ハイパースペクトル画像(HSI)分類では、各ピクセルに特定のラベルを割り当て、様々な土地被覆カテゴリを識別する。深層分類器はこの分野で高い予測精度を示してきたが、その不確実性を定量化することは重要な課題であり、重要な文脈での応用を妨げる。本研究ではまず,HSI分類の文脈において,不確実性定量化の新たな手法である Conformal Prediction (CP) の適用性について理論的に評価する。次に、信頼に値する予測セットをHSI分類器に提供するコンフォメーション手順を提案し、これらのセットが真のラベルをユーザ特定確率で含むことを保証するカバレッジ保証を提供する。この基盤を基盤として,HSIに固有の必須空間情報を,空間相関の高い画素の非整合点を集約して組み込んだ Spatial-Aware Conformal Prediction (SACP) を導入する。理論的および実証的な結果は、SACP が HSI 分類において標準 CP より優れていることを示している。ソースコードは https://github.com/J4ckLiu/SACP でアクセスできる。 Hyperspectral image (HSI) classification involves assigning specific labels to each pixel to identify various land cover categories. Although deep classifiers have shown high predictive accuracy in this field, quantifying their uncertainty remains a significant challenge, which hinders their application in critical contexts. This study first theoretically evaluates the applicability of Conformal Prediction (CP), an emerging technique for uncertainty quantification, in the context of HSI classification. We then propose a conformal procedure that provides HSI classifiers with trustworthy prediction sets, offering coverage guarantees that ensure these sets contain the true labels with a user-specified probability. Building on this foundation, we introduce Spatial-Aware Conformal Prediction (SACP), which incorporates essential spatial information inherent in HSIs by aggregating non-conformity scores of pixels with high spatial correlation. Both theoretical and empirical results demonstrate that SACP outperforms standard CP in HSI classification. The source code is accessible at https://github.com/J4ckLiu/SACP.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
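The abstract above describes prediction sets with a user-specified coverage level and an aggregation of non-conformity scores over spatially correlated pixels. The sketch below shows standard split conformal prediction with the score `1 - probability`, followed by a purely hypothetical neighbor-averaging step; the averaging rule, the `weight` parameter, and all function names are assumptions for illustration and are not taken from the SACP code.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return sorted(calibration_scores)[k - 1]


def scores_from_probs(probs):
    """Non-conformity score per label: one minus the predicted probability."""
    return [1.0 - p for p in probs]


def aggregate_scores(pixel_scores, neighbors, weight=0.5):
    """Hypothetical spatial step: blend each pixel's scores with its neighbors' mean."""
    aggregated = []
    for i, own in enumerate(pixel_scores):
        if not neighbors[i]:
            aggregated.append(list(own))
            continue
        mixed = []
        for label in range(len(own)):
            mean_nb = sum(pixel_scores[j][label] for j in neighbors[i]) / len(neighbors[i])
            mixed.append((1 - weight) * own[label] + weight * mean_nb)
        aggregated.append(mixed)
    return aggregated


def prediction_set(label_scores, threshold):
    """All labels whose non-conformity score does not exceed the threshold."""
    return [label for label, s in enumerate(label_scores) if s <= threshold]


# Toy example: three pixels in a row, three land-cover classes.
probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.6, 0.3, 0.1]]
neighbors = [[1], [0, 2], [1]]
scores = aggregate_scores([scores_from_probs(p) for p in probs], neighbors)
qhat = conformal_quantile([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6], alpha=0.5)
print([prediction_set(s, qhat) for s in scores])
```

In practice the calibration scores would be the (aggregated) scores of the true labels on held-out labeled pixels; the hard-coded list here only stands in for that step.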
# CyberCortex.AI: 自律ロボットと複雑自動化のためのAIベースのオペレーティングシステム CyberCortex.AI: An AI-based Operating System for Autonomous Robotics and Complex Automation ( http://arxiv.org/abs/2409.01241v1 ) ライセンス: Link先を確認	Sorin Grigorescu, Mihai Zaha,	(参考訳) 自律型ロボットと複雑な自動化アプリケーションを制御するための基盤となるフレームワークは、知覚制御タスクをスケジューリングできるオペレーティングシステム(OS)であり、他のロボットピアやリモートクラウドコンピュータにリアルタイムのデータ通信を提供する。本稿では、異種AIベースのロボティクスと複雑な自動化アプリケーションを実現するために設計されたロボットOSであるCyberCortex.AIを紹介する。 CyberCortex.AIは分散分散OSで、ロボット同士の対話やクラウド上の高性能コンピュータ(HPC)との通信を可能にする。ロボットのセンサーと制御データは、その後ロボットにデプロイされるAIアルゴリズムのトレーニングを目的として、HPCシステムに向けてストリームされる。ロボットの各機能(例えば、知覚データ取得、経路計画、動作制御など)は、インターネットを介して共有されるいわゆるDataBlock of Filterの中で実行される。データは、いわゆる Temporal Addressable Memory (TAM) を通じて格納され、各フィルタの入力と出力の間のゲートウェイとして機能する。 CyberCortex.AIには2つの主要なコンポーネントがある。 (i) ロボットの組み込みハードウェア上で動作するDataBlockのリアルタイム実装であるCyberCortex.AI.inferenceシステム (ii) クラウド上のHPCコンピュータ上で動作するCyberCortex.AI.dojoで、AIアルゴリズムの設計、トレーニング、デプロイに使用される。本稿では,(i) Unitree A1脚ロボットとAnafi Parrot 4Kドローンをベースとした森林火災防止システムと,(ii) CyberCortex.AIを用いた自律走行システムを提案する。 The underlying framework for controlling autonomous robots and complex automation applications is an Operating System (OS) capable of scheduling perception-and-control tasks, as well as providing real-time data communication to other robotic peers and remote cloud computers. In this paper, we introduce CyberCortex.AI, a robotics OS designed to enable heterogeneous AI-based robotics and complex automation applications. CyberCortex.AI is a decentralized distributed OS which enables robots to talk to each other, as well as to High Performance Computers (HPC) in the cloud. Sensory and control data from the robots is streamed towards HPC systems with the purpose of training AI algorithms, which are afterwards deployed on the robots. Each functionality of a robot (e.g. sensory data acquisition, path planning, motion control, etc.) 
is executed within a so-called DataBlock of Filters shared through the internet, where each filter is computed either locally on the robot itself, or remotely on a different robotic system. The data is stored and accessed via a so-called Temporal Addressable Memory (TAM), which acts as a gateway between each filter's input and output. CyberCortex.AI has two main components: (i) the CyberCortex.AI.inference system, which is a real-time implementation of the DataBlock running on the robots' embedded hardware, and (ii) the CyberCortex.AI.dojo, which runs on an HPC computer in the cloud and is used to design, train and deploy AI algorithms. We present a quantitative and qualitative performance analysis of the proposed approach using two collaborative robotics applications: (i) a forest fire prevention system based on a Unitree A1 legged robot and an Anafi Parrot 4K drone, and (ii) an autonomous driving system which uses CyberCortex.AI for collaborative perception and motion control.	翻訳日:2024-09-06 06:47:21 公開日:2024-09-02
# 符号摂動和法のサンプル複雑度 Sample Complexity of the Sign-Perturbed Sums Method ( http://arxiv.org/abs/2409.01243v1 ) ライセンス: Link先を確認	Szabolcs Szentpéteri, Balázs Csanád Csáji,	(参考訳) 独立雑音項や対称雑音項などの微妙な統計的仮定の下で,真のシステムパラメータに対する正確で漸近的でない信頼領域を構成するサイン・パーステッド・サムズ法(SPS)のサンプル複雑性について検討する。 SPSの標準的なバージョンは線形回帰問題を扱うが、閉ループのセットアップであっても確率線形(力学)システムや非線形および非パラメトリック問題にも一般化できる。この手法の強い整合性は厳密に証明されたが、アルゴリズムのサンプルの複雑さはスカラー線形回帰問題に対してのみ解析された。本稿では,一般線形回帰問題に対するSPSのサンプル複雑性について検討する。有限試料径のSPS信頼領域の直径に対して高い確率上界を確立し,SPS領域が古典的漸近的信頼楕円体と同じ最適な速度で収縮することを示す。最後に,SPS信頼領域の理論的境界と経験的大きさの差について実験的に検討した。 We study the sample complexity of the Sign-Perturbed Sums (SPS) method, which constructs exact, non-asymptotic confidence regions for the true system parameters under mild statistical assumptions, such as independent and symmetric noise terms. The standard version of SPS deals with linear regression problems, however, it can be generalized to stochastic linear (dynamical) systems, even with closed-loop setups, and to nonlinear and nonparametric problems, as well. Although the strong consistency of the method was rigorously proven, the sample complexity of the algorithm was only analyzed so far for scalar linear regression problems. In this paper we study the sample complexity of SPS for general linear regression problems. We establish high probability upper bounds for the diameters of SPS confidence regions for finite sample sizes and show that the SPS regions shrink at the same, optimal rate as the classical asymptotic confidence ellipsoids. Finally, the difference between the theoretical bounds and the empirical sizes of SPS confidence regions is investigated experimentally.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# 安全強化学習における安全な探索の再考 Revisiting Safe Exploration in Safe Reinforcement learning ( http://arxiv.org/abs/2409.01245v1 ) ライセンス: Link先を確認	David Eckel, Baohe Zhang, Joschka Bödecker,	(参考訳) 安全強化学習(SafeRL)は、安全という概念で標準的な強化学習を拡張している。しかし、この指標はコストの上昇の程度を区別できず、頻繁な軽度イベントと同等の重大なコストイベントを扱い、リスクの高い振る舞いを招き、安全でない探索をもたらす可能性がある。本研究では, トレーニング中の安全対策として, 安全でないステップの重大度を連続的発生に基づいて評価する, 新たな測定基準であるEMCCを導入する。この指標は特に、長期間の安全違反と時折の安全違反の区別に有効である。 EMCCをオン・アンド・オフ・ポリシーのアルゴリズムに応用し,その安全性をベンチマークする。最後に,ベンチマークによる評価を行い,アルゴリズム設計のための高速な評価を可能にする,新しい軽量ベンチマークタスクを提案する。 Safe reinforcement learning (SafeRL) extends standard reinforcement learning with the idea of safety, where safety is typically defined through the constraint of the expected cost return of a trajectory being below a set limit. However, this metric fails to distinguish how costs accrue, treating infrequent severe cost events as equal to frequent mild ones, which can lead to riskier behaviors and result in unsafe exploration. We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training by assessing the severity of unsafe steps based on their consecutive occurrence. This metric is particularly effective for distinguishing between prolonged and occasional safety violations. We apply EMCC to both on- and off-policy algorithms to benchmark their safe exploration capability. Finally, we validate our metric through a set of benchmarks and propose a new lightweight benchmark task, which allows fast evaluation for algorithm design.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
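The abstract above introduces expected maximum consecutive cost steps (EMCC) as a way to separate prolonged from occasional safety violations. One plausible reading, sketched below, takes the longest run of consecutive cost-incurring steps in each training episode and averages it over episodes; the exact definition in the paper may differ, and the `threshold` parameter and function names are assumptions.

```python
def max_consecutive_cost_steps(costs, threshold=0.0):
    """Length of the longest run of steps whose cost exceeds the threshold."""
    longest = 0
    current = 0
    for cost in costs:
        if cost > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def emcc_estimate(episodes, threshold=0.0):
    """Monte Carlo estimate: mean of the per-episode longest unsafe run."""
    if not episodes:
        raise ValueError("at least one episode is required")
    runs = [max_consecutive_cost_steps(ep, threshold) for ep in episodes]
    return sum(runs) / len(runs)


# Two toy episodes with the same total cost but different violation patterns.
occasional = [1, 0, 1, 0, 1, 0]
prolonged = [0, 1, 1, 1, 0, 0]
print(sum(occasional), sum(prolonged))
print(max_consecutive_cost_steps(occasional), max_consecutive_cost_steps(prolonged))
```

Both toy episodes have the same cost return, so a constraint on expected cost alone cannot tell them apart, while the longest-run statistic can.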
# 連続波場を用いた水素充填中空コアファイバの周波数変換 Frequency conversion in a hydrogen-filled hollow-core fiber using continuous-wave fields ( http://arxiv.org/abs/2409.01246v1 ) ライセンス: Link先を確認	Anica Hamer, Frank Vewinger, Thorsten Peters, Michael H. Frosz, Simon Stellmer,	(参考訳) 光ファイバーに基づく大規模量子ネットワークでは、光子はいわゆるフライング量子ビットとして情報の基本キャリアである。これらはまた、可視または近赤外波長で動作する原子と固体のプラットフォームからなるハイブリッドアーキテクチャの異なるコンポーネント間の相互接続や、テレコムバンド内の光リンクとして機能する。量子周波数変換は、その量子状態を保ちながら単一の光子の色を変える経路である。現在、このプロセスには非線形結晶が使用されている。しかし、その性能は、受信帯域幅、チューニング可能性、偏光感度、および望ましくない背景放射によって制限される。有望な代替手段は、気体中の刺激されたラマン散乱に基づいている。ここでは,水素充填反共振中空コアファイバの偏光保存周波数変換について述べる。このアプローチは光ファイバーネットワークへのシームレスな統合と単一エミッタへのインタフェースを約束する。パルスポンプ場を用いた関連する実験とは違い、2つのコヒーレント連続波ポンプ場を利用する。 In large-area quantum networks based on optical fibers, photons are the fundamental carriers of information as so-called flying qubits. They may also serve as the interconnect between different components of a hybrid architecture, which might comprise atomic and solid state platforms operating at visible or near-infrared wavelengths, as well as optical links in the telecom band. Quantum frequency conversion is the pathway to change the color of a single photon while preserving its quantum state. Currently, nonlinear crystals are utilized for this process. However, their performance is limited by their acceptance bandwidth, tunability, polarization sensitivity, as well as undesired background emission. A promising alternative is based on stimulated Raman scattering in gases. Here, we demonstrate polarization-preserving frequency conversion in a hydrogen-filled anti-resonant hollow-core fiber. This approach holds promises for seamless integration into optical fiber networks and interfaces to single emitters. Disparate from related experiments that employ a pulsed pump field, we here take advantage of two coherent continuous-wave pump fields.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# 大規模言語モデルにおけるリスク評価のための会話複雑度 Conversational Complexity for Assessing Risk in Large Language Models ( http://arxiv.org/abs/2409.01247v1 ) ライセンス: Link先を確認	John Burden, Manuel Cebrian, Jose Hernandez-Orallo,	(参考訳) 大きな言語モデル(LLM)は二重用途ジレンマを示し、特に対話的相互作用を通じて、有害な可能性を持ちながら有益なアプリケーションを可能にする。様々な安全対策にもかかわらず、先進的なLLMは脆弱なままである。ケビン・ルースのBingとの有名な会話は、長期にわたる対話の後有害なアウトプットを引き起こした。これは、同様のコンテンツをより簡単に作成できる単純な初期のジェイルブレイクとは対照的であり、疑問を提起する: LLMから有害な情報を引き出すのに、どのくらいの会話努力が必要か? 本稿では,特定の応答を得るために使用される会話長を定量化する会話長(CL)と,その応答につながるユーザの命令シーケンスのコルモゴロフ複雑性として定義される会話複雑度(CC)の2つの尺度を提案する。 Kolmogorov複雑性の計算不能性に対処するため,リファレンスLCMを用いてCCを近似し,ユーザ命令の圧縮性を評価する。このアプローチを大規模な赤チームデータセットに適用し、有害で無害な会話の長さと複雑さの統計的分布を定量的に分析する。我々の経験から、この分布分析とCCの最小化はAIの安全性を理解するための貴重なツールであり、有害な情報のアクセシビリティに関する洞察を与えてくれることが示唆されている。この研究は、LLMの安全性に対する新たな視点の基礎を確立し、害を与える経路のアルゴリズム的な複雑さを中心にしている。 Large Language Models (LLMs) present a dual-use dilemma: they enable beneficial applications while harboring potential for harm, particularly through conversational interactions. Despite various safeguards, advanced LLMs remain vulnerable. A watershed case was Kevin Roose's notable conversation with Bing, which elicited harmful outputs after extended interaction. This contrasts with simpler early jailbreaks that produced similar content more easily, raising the question: How much conversational effort is needed to elicit harmful information from LLMs? We propose two measures: Conversational Length (CL), which quantifies the conversation length used to obtain a specific response, and Conversational Complexity (CC), defined as the Kolmogorov complexity of the user's instruction sequence leading to the response. To address the incomputability of Kolmogorov complexity, we approximate CC using a reference LLM to estimate the compressibility of user instructions. Applying this approach to a large red-teaming dataset, we perform a quantitative analysis examining the statistical distribution of harmful and harmless conversational lengths and complexities. 
Our empirical findings suggest that this distributional analysis and the minimisation of CC serve as valuable tools for understanding AI safety, offering insights into the accessibility of harmful information. This work establishes a foundation for a new perspective on LLM safety, centered around the algorithmic complexity of pathways to harm.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
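As a rough illustration of the two measures in the abstract above, the sketch below computes a length and a compression-based complexity proxy for a list of user turns. It makes two loud assumptions: CL is taken to be the total UTF-8 byte length of the user instructions, and `zlib` stands in for the reference LLM that the paper uses to estimate compressibility. A general-purpose compressor is a much cruder upper bound on Kolmogorov complexity than a language model, so the numbers are indicative only. All function names are hypothetical.

```python
import random
import zlib
from typing import Sequence


def conversational_length(user_turns: Sequence[str]) -> int:
    """CL proxy (assumption): total UTF-8 bytes across all user instructions."""
    return sum(len(turn.encode("utf-8")) for turn in user_turns)


def conversational_complexity_proxy(user_turns: Sequence[str]) -> int:
    """CC proxy in bits: compressed size of the concatenated user instructions.

    zlib is a stand-in here; the paper approximates CC with a reference LLM.
    """
    joined = "\n".join(user_turns).encode("utf-8")
    return 8 * len(zlib.compress(joined, 9))


# Two hypothetical conversations of identical length.
repetitive = ["tell me more"] * 20
rng = random.Random(0)  # fixed seed so the example is reproducible
varied = ["".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(12))
          for _ in range(20)]

if __name__ == "__main__":
    for name, turns in (("repetitive", repetitive), ("varied", varied)):
        print(name, conversational_length(turns), conversational_complexity_proxy(turns))
```

The two conversations have the same CL, but the repetitive one compresses far better. That is the distinction CC is meant to capture: long but simple interactions versus long and intricate ones.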
# 逆算法:逆算法ロバスト性判定法の検討と評価 Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness ( http://arxiv.org/abs/2409.01249v1 ) ライセンス: Link先を確認	Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio, Giorgio Giacinto, Fabio Roli,	(参考訳) 近年の研究では、ニューラルネットワークのプルーニング技術が提案され、ネットワークのサイズを減らし、敵の例に対する堅牢性を保っている。これらの手法は, 複雑で明瞭な設計を伴い, 相違を解析し, 公正かつ正確な比較を行うのが困難である。本研究では,これらの課題を,現在の敵作法を調査し,パイプライン,いつ産卵するか,具体例,どのように産卵するかという2つの主要な次元に基づいて分類する新しい分類法を提案することによって克服する。次に、現在の経験分析の限界を強調し、それに対応するための新しい公正な評価ベンチマークを提案する。最終的に,現在の逆解析手法の実証的再評価を行い,その結果について考察し,高い性能の逆解析手法の共通特性と共通問題を明らかにする。 https://github.com/pralab/AdversarialPruningBenchmarkで公開されているベンチマークへのコントリビューションを歓迎します。 Recent work has proposed neural network pruning techniques to reduce the size of a network while preserving robustness against adversarial examples, i.e., well-crafted inputs inducing a misclassification. These methods, which we refer to as adversarial pruning methods, involve complex and articulated designs, making it difficult to analyze the differences and establish a fair and accurate comparison. In this work, we overcome these issues by surveying current adversarial pruning methods and proposing a novel taxonomy to categorize them based on two main dimensions: the pipeline, defining when to prune; and the specifics, defining how to prune. We then highlight the limitations of current empirical analyses and propose a novel, fair evaluation benchmark to address them. We finally conduct an empirical re-evaluation of current adversarial pruning methods and discuss the results, highlighting the shared traits of top-performing adversarial pruning methods, as well as common issues. We welcome contributions in our publicly-available benchmark at https://github.com/pralab/AdversarialPruningBenchmark	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# GAS: アクティベーション支援型非同期フェデレーションラーニング GAS: Generative Activation-Aided Asynchronous Split Federated Learning ( http://arxiv.org/abs/2409.01251v1 ) ライセンス: Link先を確認	Jiarong Yang, Yuan Liu,	(参考訳) Split Federated Learning (SFL)は、クライアントとサーバ間の共有モデルを分割し、協調的にトレーニングする。最近のSFL研究は、クライアントからサーバへのアクティベーションとクライアント側モデルの同期送信を想定している。しかし、クライアント間の計算能力と通信能力の大幅な変化により、アクティベーションとクライアント側モデルが非同期にサーバにやってくる。非同期による遅延はSFLの性能を著しく低下させる。この問題に対処するために,アクティベーションバッファとモデルバッファをサーバに埋め込んで,それぞれに非同期に送信されるアクティベーションとクライアント側モデルを管理する非同期SFLフレームワークを検討する。さらに、非同期アクティベーション送信がリソース豊富なクライアントからのアクティベーションを頻繁に受信するので、サーバサイドモデルのバイアスのある更新につながるため、生成アクティベーション支援非同期SFL(GAS)を提案する。 GASでは、受信したアクティベーションに基づいて各ラベルのアクティベーション分布を保持し、バイアスの程度に応じてこれらの分布からアクティベーションを生成する。これらの生成アクティベーションは、サーバサイドモデルの更新を支援し、より正確な更新を保証するために使用される。より厳密な収束境界を導出し,提案手法の有効性を実証した。 Split Federated Learning (SFL) splits and collaboratively trains a shared model between clients and server, where clients transmit activations and client-side models to server for updates. Recent SFL studies assume synchronous transmission of activations and client-side models from clients to server. However, due to significant variations in computational and communication capabilities among clients, activations and client-side models arrive at server asynchronously. The delay caused by asynchrony significantly degrades the performance of SFL. To address this issue, we consider an asynchronous SFL framework, where an activation buffer and a model buffer are embedded on the server to manage the asynchronously transmitted activations and client-side models, respectively. Furthermore, as asynchronous activation transmissions cause the buffer to frequently receive activations from resource-rich clients, leading to biased updates of the server-side model, we propose Generative activations-aided Asynchronous SFL (GAS). In GAS, the server maintains an activation distribution for each label based on received activations and generates activations from these distributions according to the degree of bias. 
These generative activations are then used to assist in updating the server-side model, ensuring more accurate updates. We derive a tighter convergence bound, and our experiments demonstrate the effectiveness of the proposed method.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# 単眼深度3次元モデリングによる自律走行のリアルタイム予測 Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling ( http://arxiv.org/abs/2409.01256v1 ) ライセンス: Link先を確認	Haicheng Liao, Yongkang Li, Chengyue Wang, Songning Lai, Zhenning Li, Zilin Bian, Jaeyoung Lee, Zhiyong Cui, Guohui Zhang, Chengzhong Xu,	(参考訳) 交通事故予測の第一の目的は、自動運転技術の安全性と信頼性を高める上で重要な課題であるダシュカムビデオを用いて、潜在的な事故をリアルタイムで予測することである。本研究では,高度な3Dシーンモデリングのためのモノクルディープキューを組み込むことにより,現在のSOTA(State-of-the-art (SOTA))2D手法を超えて予測能力を著しく向上させる,革新的なフレームワークであるAccNetを紹介する。本稿では,交通事故データセットにおけるスキュードデータ分散の課題に対処し,早期予測のためのバイナリ適応損失(BA-LEA)を提案する。この新たな損失関数は、マルチタスク学習戦略とともに、予測モデルの焦点を事故前の臨界瞬間にシフトさせる。平均精度 (AP) や平均タイム・トゥ・アクシデント (mTTA) といった重要な指標を用いて, 予測精度に優れるDashcam Accident Dataset (DAD) , Car Crash Dataset (CCD) , AnAn Accident Detection (A3D) , DADA-2000 Dataset (DADA-2000) の3つのベンチマークデータセット上で, フレームワークの性能を厳格に評価する。 The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by incorporating monocular depth cues for sophisticated 3D scene modeling. Addressing the prevalent challenge of skewed data distribution in traffic accident datasets, we propose the Binary Adaptive Loss for Early Anticipation (BA-LEA). This novel loss function, together with a multi-task learning strategy, shifts the focus of the predictive model towards the critical moments preceding an accident. 
We rigorously evaluate the performance of our framework on three benchmark datasets--Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), and AnAn Accident Detection (A3D), and DADA-2000 Dataset--demonstrating its superior predictive accuracy through key metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA).	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# 二重機械学習がパネルデータに到達 -- 約束、落とし穴、潜在的な解決策 Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions ( http://arxiv.org/abs/2409.01266v1 ) ライセンス: Link先を確認	Jonathan Fuhr, Dominik Papies,	(参考訳) 機械学習(ML)アルゴリズムを用いた因果効果の推定は、適切なフレームワークで使用すれば、機能的なフォーム仮定を緩和するのに役立ちます。しかしながら、これらのフレームワークの多くは断面データの設定を前提としていますが、研究者はしばしばパネルデータにアクセスでき、従来の方法では、ユニット間の不均一性を扱うのに役立ちます。本稿では、観測されていない異種性の存在下でのパネルデータに対して、ダブル/デバイアスド機械学習(DML)(Chernozhukov et al , 2018)を適用する方法について検討する。この適応は、DMLのクロスフィッティング手順が独立データを前提としており、観測されていない不均一性は、非線形に観測された不均一な設定において必ずしも追加的に分離可能であるとは限らないため、困難である。様々なシミュレーションにおいて,直感的に魅力的な推定器の性能を評価する。クロスフィット仮定の違反は効果推定の正確性にほとんど不適切であると考えられるが、多くの手法では観測されていない不均一性の存在を適切に考慮できない。しかし,DMLにおける相関ランダムエフェクトアプローチ(Mundlak, 1978)に基づく予測モデルを用いることで,観測された共同設立者数に対して大きなサンプルサイズを考慮し,正確な係数推定が可能であることが判明した。また、観測された共同設立者に対する観測されていない異種性の影響が、ほとんどの代替手法の性能に重要な役割を担っていることも示している。 Estimating causal effect using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. However, most of these frameworks assume settings with cross-sectional data, whereas researchers often have access to panel data, which in traditional methods helps to deal with unobserved heterogeneity between units. In this paper, we explore how we can adapt double/debiased machine learning (DML) (Chernozhukov et al., 2018) for panel data in the presence of unobserved heterogeneity. This adaptation is challenging because DML's cross-fitting procedure assumes independent data and the unobserved heterogeneity is not necessarily additively separable in settings with nonlinear observed confounding. We assess the performance of several intuitively appealing estimators in a variety of simulations. While we find violations of the cross-fitting assumptions to be largely inconsequential for the accuracy of the effect estimates, many of the considered methods fail to adequately account for the presence of unobserved heterogeneity. 
However, we find that using predictive models based on the correlated random effects approach (Mundlak, 1978) within DML leads to accurate coefficient estimates across settings, given a sample size that is large relative to the number of observed confounders. We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role for the performance of most alternative methods.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# DAVIDE:深度を意識した動画のデブリ DAVIDE: Depth-Aware Video Deblurring ( http://arxiv.org/abs/2409.01274v1 ) ライセンス: Link先を確認	German F. Torres, Jussi Kalliola, Soumya Tripathy, Erman Acar, Joni-Kristian Kämäräinen,	(参考訳) ビデオのデブロアリングは、ぼやけたフレームの連続からシャープな詳細を回復することを目的としている。携帯電話における深度センサの普及と,深度情報により深度を誘導する可能性にもかかわらず,深度認識の難読化はわずかしか注目されていない。本稿では,映像の深度情報の影響を研究するために,DAVIDEデータセットについて紹介する。データセットは、同期されたぼかし、シャープ、ディープビデオで構成されている。本稿では,既存の深度RGBビデオデブロアリングモデルに深度情報を注入する方法について検討し,深度対応ビデオデブロアリングのための強力なベースラインを提案する。ビデオデブリにおける深度情報の意義を明らかにするとともに,深度手がかりが有用である症例について考察した。さらに, この結果から, 深度が劣化性能を向上する一方で, モデルに時間的コンテキストが長くなると, この効果は低下することが示された。プロジェクトページ: https://germanftv.github.io/DAVIDE.github.io/ Video deblurring aims at recovering sharp details from a sequence of blurry frames. Despite the proliferation of depth sensors in mobile phones and the potential of depth information to guide deblurring, depth-aware deblurring has received only limited attention. In this work, we introduce the 'Depth-Aware VIdeo DEblurring' (DAVIDE) dataset to study the impact of depth information in video deblurring. The dataset comprises synchronized blurred, sharp, and depth videos. We investigate how the depth information should be injected into the existing deep RGB video deblurring models, and propose a strong baseline for depth-aware video deblurring. Our findings reveal the significance of depth information in video deblurring and provide insights into the use cases where depth cues are beneficial. In addition, our results demonstrate that while the depth improves deblurring performance, this effect diminishes when models are provided with a longer temporal context. Project page: https://germanftv.github.io/DAVIDE.github.io/ .	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# CHSH不等式の不整合 An inconsistency in the CHSH inequality ( http://arxiv.org/abs/2409.01275v1 ) ライセンス: Link先を確認	Andrea Aiello,	(参考訳) CHSHの不等式の違反は、量子力学と局所的で現実的な隠れ変数理論の間の不可逆的な衝突を示すと考えられている。我々は、CHSH不等式を証明する数学的仮定が、実際、そのような不等式をテストする実験の物理学とは相容れないことを示した。これは、現在利用可能な実験データに基づいて、局所的な現実的な隠れ変数理論を排除できないことを意味する。しかし、CHSH不等式の実験的な証明は原則として可能であることも示しているが、実際、そのような実験をどのように実装するかは定かではない。 Violation of the CHSH inequality supposedly demonstrates an irreconcilable conflict between quantum mechanics and local, realistic hidden variable theories. We show that the mathematical assumptions underlying the proof of the CHSH inequality are, in fact, incompatible with the physics of the experiments testing such inequality. This implies that we cannot dismiss local realistic hidden variable theories on the basis of currently available experimental data yet. However, we also show that an experimental proof of CHSH inequality is, in principle, possible, but it is unclear how to implement, in practice, such an experiment.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
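For readers unfamiliar with the inequality discussed in the abstract above, the sketch below reproduces only the textbook statement, not the paper's argument about its assumptions. It enumerates all deterministic local assignments of outcomes ±1 to show the bound |S| ≤ 2, and evaluates S with the standard singlet-state correlation E(a, b) = -cos(a - b) at the usual measurement angles. The sign convention S = E(a,b) - E(a,b') + E(a',b) + E(a',b') and the function names are choices made for this illustration.

```python
import itertools
import math


def chsh(e_ab: float, e_ab2: float, e_a2b: float, e_a2b2: float) -> float:
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return e_ab - e_ab2 + e_a2b + e_a2b2


def local_deterministic_bound() -> int:
    """Largest |S| over all assignments of fixed outcomes +/-1 to A, A', B, B'."""
    best = 0
    for a, a2, b, b2 in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(chsh(a * b, a * b2, a2 * b, a2 * b2)))
    return best


def singlet_correlation(angle_a: float, angle_b: float) -> float:
    """Quantum prediction for spin measurements on the singlet state."""
    return -math.cos(angle_a - angle_b)


def singlet_chsh(a: float = 0.0, a2: float = math.pi / 2,
                 b: float = math.pi / 4, b2: float = 3 * math.pi / 4) -> float:
    """S for the singlet state at the standard CHSH angles."""
    return chsh(singlet_correlation(a, b), singlet_correlation(a, b2),
                singlet_correlation(a2, b), singlet_correlation(a2, b2))


if __name__ == "__main__":
    print(local_deterministic_bound())  # classical bound
    print(abs(singlet_chsh()))          # quantum value, 2 * sqrt(2)
```

The gap between the bound of 2 and the quantum value of 2√2 is the violation whose interpretation the paper questions.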
# ビジネスプロセス改善の体系的レビュー:運用研究とビジネスプロセスマネジメントの概念の融合における成果と可能性 A Systematic Review of Business Process Improvement: Achievements and Potentials in Combining Concepts from Operations Research and Business Process Management ( http://arxiv.org/abs/2409.01276v1 ) ライセンス: Link先を確認	Michel Kunkler, Felix Schumann, Stefanie Rinderle-Ma,	(参考訳) ビジネスプロセスマネジメントとオペレーションリサーチは、どちらも組織における価値創造を強化することを目的としている2つの研究分野です。ビジネス・プロセス・マネジメントは歴史的に正確なモデルの提供に重点を置いてきたが、オペレーティング・リサーチはトラクタブル・モデルとそのソリューションの構築に重点を置いてきた。この体系的な文献レビューは、両方の分野から組み合わせた概念を用いた作品を特定し分析する。特に、ビジネスプロセスモデルがどのように数学的モデルとして概念化され、どの最適化技術がこれらのモデルに適用されたかを分析する。その結果,資源配分とスケジューリングの問題に強い焦点が当てられている。現在のアプローチは、多くの問題に対する確率的な性質のサポートを欠いていることが多く、リソース関連の情報やデータの観点からの情報といった、プロセスモデルやイベントログからの情報のみをわずかに利用する。 Business Process Management and Operations Research are two research fields that both aim to enhance value creation in organizations. While Business Process Management has historically emphasized on providing precise models, Operations Research has focused on constructing tractable models and their solutions. This systematic literature review identifies and analyzes work that uses combined concepts from both disciplines. In particular, it analyzes how business process models have been conceptualized as mathematical models and which optimization techniques have been applied to these models. Results indicate a strong focus on resource allocation and scheduling problems. Current approaches often lack support of the stochastic nature of many problems, and do only sparsely use information from process models or from event logs, such as resource-related information or information from the data perspective.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# 画像分類に対する1インデックスベクトル量子化ベースの敵対的攻撃 One-Index Vector Quantization Based Adversarial Attack on Image Classification ( http://arxiv.org/abs/2409.01282v1 ) ライセンス: Link先を確認	Haiju Fan, Xiaona Qin, Shuang Chen, Hubert P. H. Shum, Ming Li,	(参考訳) ストレージと送信を改善するため、画像は一般的に圧縮される。ベクトル量子化(VQ)は、他の圧縮手法を上回る高い圧縮率を持つ、広く使われている圧縮法である。これにもかかわらず、画像分類における既存の敵対的攻撃法は、ほとんどがピクセル領域で実行されており、圧縮領域で実行されるものはわずかであるため、現実のシナリオでは適用しにくい。本稿では,差分進化アルゴリズムにより敵対的画像を生成する,VQ領域における新たなワンインデックス攻撃手法を提案する。ワンインデックス攻撃方法は、圧縮されたデータストリーム内の単一のインデックスを変更して、復元された画像が誤って分類されるようにする。攻撃を実現するには単一のVQインデックスを変更するだけでよい。提案手法は,実際の攻撃シナリオにより近い半ブラックボックス攻撃に属する。本稿では,Resnet,NIN,VGG16の3つの画像分類モデルに対して,本手法を適用した。 CIFAR-10 と Fashion MNIST の画像は平均してそれぞれ 55.9% と 77.4% が、高いレベルの誤分類信頼性と低いレベルの画像摂動で攻撃に成功している。 To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain to generate adversarial images by a differential evolution algorithm, successfully resulting in image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. It only needs to modify a single VQ index to realize an attack, which limits the number of perturbed indexes. The proposed method belongs to a semi-black-box attack, which is more in line with the actual attack scenario. We apply our method to attack three popular image classification models, i.e., Resnet, NIN, and VGG16. On average, 55.9% and 77.4% of the images in CIFAR-10 and Fashion MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# IoMTの医療・患者に対する包括的影響 Comprehensive up-to-date impact of the IoMT in healthcare and patients ( http://arxiv.org/abs/2409.01287v1 ) ライセンス: Link先を確認	Guy. Mouanda,	(参考訳) インターネット・オブ・メディカル・モノ(IoMT)は、医療データの収集・拡散に多数の技術を適用し、医療サービスの特徴、有効性、可用性を開発することを目的とした、急速に拡大する分野である。 IoMTデバイスには、ウェアラブルセンサー、インプラント可能なデバイス、スマートホームメソッド、遠隔医療ポリシー、モバイルアプリケーションが含まれている。 IoMTの応用は、慢性疾患の管理、遠隔患者の監視、緊急対応、臨床診断支援、健康増進、健康など多岐にわたる。本稿では,この発展領域の長所,短所,展望の方向性について考察する。また、IoMTの倫理的、法的、社会的意味や、IoMT環境の危険性や脆弱性についても検討する。 The Internet of Medical Things (IoMT) is a quickly expanding field that intends to develop the features, effectiveness, and availability of healthcare services by applying numerous technologies to gather and diffuse medical data. IoMT devices incorporate wearable sensors, implantable devices, smart home methods, telemedicine policies, and mobile applications. IoMT applications range from chronic disease administration, remote patient monitoring, emergency response, and clinical decision support to health promotion and wellness. This paper aligns on the advantages, defies, and outlook directions of this developing domain. The paper also examines the ethical, legal, and social implications of IoMT, as well as the possible risks and vulnerabilities of the IoMT environment	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# De Broglie-Bohm量子力学 De Broglie-Bohm Quantum Mechanics ( http://arxiv.org/abs/2409.01294v1 ) ライセンス: Link先を確認	Antony Valentini,	(参考訳) De Broglie-Bohmパイロット波の量子力学の定式化について概説し、場の理論、高エネルギー物理学、重力、宇宙論への応用を強調した。 We provide an overview of the de Broglie-Bohm pilot-wave formulation of quantum mechanics, emphasising its applications to field theory, high-energy physics, gravitation, and cosmology.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# 絡み合いの離散診断としての位相次数と$Δ$VAEへの応用 Topological degree as a discrete diagnostic for disentanglement, with applications to the $Δ$VAE ( http://arxiv.org/abs/2409.01303v1 ) ライセンス: Link先を確認	Mahefa Ratsisetraina Ravelonanosy, Vlado Menkovski, Jacobus W. Portegies,	(参考訳) 本研究では, 単位球面$\mathcal{S}^2$の拡散変分オートエンコーダ(\Delta$VAE)を潜在空間として用いて, トポロジカル・幾何学的構造を捉え, 潜在因子を分解する能力について検討する。そこで本研究では,データ多様体から潜在空間への写像であるエンコーダの位相次数 (topological degree of the encoder) を新たに導入する。ホモロジー理論のツールを用いて、この次数を計算するアルゴリズムを導出し、実装する。トレーニング手順から得られたモデルのエンコーダのエンコーダの度合いをアルゴリズムを用いて計算する。実験の結果、$\Delta$VAEはLSBDスコアが比較的小さいことを示し、初期化後の度合に関わらず、訓練後のエンコーダの次数は$-1$または$+1$となり、その結果、エンコーダは少なくとも同相であることを示す。 We investigate the ability of Diffusion Variational Autoencoder ($\Delta$VAE) with unit sphere $\mathcal{S}^2$ as latent space to capture topological and geometrical structure and disentangle latent factors in datasets. For this, we introduce a new diagnostic of disentanglement: namely the topological degree of the encoder, which is a map from the data manifold to the latent space. By using tools from homology theory, we derive and implement an algorithm that computes this degree. We use the algorithm to compute the degree of the encoder of models that result from the training procedure. Our experimental results show that the $\Delta$VAE achieves relatively small LSBD scores, and that regardless of the degree after initialization, the degree of the encoder after training becomes $-1$ or $+1$, which implies that the resulting encoder is at least homotopic to a homeomorphism.	翻訳日:2024-09-06 06:37:11 公開日:2024-09-02
# ニューラルネットワークを用いた高精度実空間電子密度 Highly Accurate Real-space Electron Densities with Neural Networks ( http://arxiv.org/abs/2409.01306v1 ) ライセンス: Link先を確認	Lixue Cheng, P. Bernát Szabó, Zeno Schätzle, Derk Kooi, Jonas Köhler, Klaas J. H. Giesbertz, Frank Noé, Jan Hermann, Paola Gori-Giorgi, Adam Foster,	(参考訳) 量子化学における変分ab-initio法は、波動関数への直接アクセスを提供する他の方法の中でも際立っている。これは原則として、エネルギー以外の他の観測可能な興味の抽出を可能にするが、実際、この抽出は技術的に困難であり、計算的に非現実的であることが多い。ここでは,電子密度を量子化学において観測可能な中心となるものとみなし,その密度を既知の漸近特性を捉えるニューラルネットワークを用いて表現し,スコアマッチングとノイズコントラスト推定により波動関数からトレーニングすることにより,実空間多電子波関数から正確な密度を求める新しい手法を提案する。深層学習型 ansätze (deep QMC) を用いた変分量子モンテカルロを用いて、基底セット誤差のない高精度な波動関数を得るとともに、新しい手法を用いて、双極子モーメント、原子間力、接触密度、その他の密度に基づく特性を計算して、対応する正確な電子密度を求める。 Variational ab-initio methods in quantum chemistry stand out among other methods in providing direct access to the wave function. This allows in principle straightforward extraction of any other observable of interest, besides the energy, but in practice this extraction is often technically difficult and computationally impractical. Here, we consider the electron density as a central observable in quantum chemistry and introduce a novel method to obtain accurate densities from real-space many-electron wave functions by representing the density with a neural network that captures known asymptotic properties and is trained from the wave function by score matching and noise-contrastive estimation. We use variational quantum Monte Carlo with deep-learning ansätze (deep QMC) to obtain highly accurate wave functions free of basis set errors, and from them, using our novel method, correspondingly accurate electron densities, which we demonstrate by calculating dipole moments, nuclear forces, contact densities, and other density-based properties.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# クープマン演算子理論によるニューラルネットワーク層を線形演算として表現する Representing Neural Network Layers as Linear Operations via Koopman Operator Theory ( http://arxiv.org/abs/2409.01308v1 ) ライセンス: Link先を確認	Nishant Suresh Aswani, Saif Eddin Jabari, Muhammad Shafique,	(参考訳) 単純なニューラルネットワークの強い性能は、しばしばその非線形活性化に起因する。しかし、ニューラルネットワークの線形ビューは、ネットワークの理解と制御をよりアプローチしやすくする。ニューラルネットワークの動的システムビューから、クープマン作用素理論と動的モード分解(DMD)との接続を利用して、新しい視点を提供する。同時に、システムを適切な可観測空間に埋め込むことで、動的システムを線形化するフレームワークを提供する。ニューラルネットワークを力学系として再フレーミングすることにより、事前学習された多層パーセプトロン(MLP)の非線形層を有限次元線形作用素に置き換えることができることを示す。さらに、DMD の固有値と SVD の右特異ベクトルを分析し、時間遅延座標がネットワーク層を線形化するクープマン理論において、単純かつ高効率な観測可能空間を提供することを示す。その結果、Yin-YangデータセットでトレーニングされたMLPの層をDMDモデルからの予測に置き換え、元の98.4%と比較して最大97.3%のモデル精度を実現した。さらに、MNISTデータセットでトレーニングされたMLPのレイヤを95.8%に置き換える。 The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more approachable. We draw from a dynamical systems view of neural networks, offering a fresh perspective by using Koopman operator theory and its connections with dynamic mode decomposition (DMD). Together, they offer a framework for linearizing dynamical systems by embedding the system into an appropriate observable space. By reframing a neural network as a dynamical system, we demonstrate that we can replace the nonlinear layer in a pretrained multi-layer perceptron (MLP) with a finite-dimensional linear operator. In addition, we analyze the eigenvalues of DMD and the right singular vectors of SVD, to present evidence that time-delayed coordinates provide a straightforward and highly effective observable space for Koopman theory to linearize a network layer. Consequently, we replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%. 
In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# 集団開発における望ましくないパターンとは何か What Could Possibly Go Wrong: Undesirable Patterns in Collective Development ( http://arxiv.org/abs/2409.01312v1 ) ライセンス: Link先を確認	Mikhail Evtikhiev, Ekaterina Koshchenko, Vladimir Kovalenko,	(参考訳) ソフトウェア開発は、しばしば技術的取り組みと見なされるが、基本的には、チームメンバ間のコラボレーションを必要とする社会的活動である。これを認めて、ソフトウェア開発コミュニティは、コラボレーションに関連する潜在的な欠点に対処するための戦略を考案した。様々な研究がソフトウェア工学における社会的ダイナミクスを捉えようと試みている。本研究では,多くのチームワーク問題を識別する手法を開発し,それに対応する様々なアプローチを提案する。しかしながら、一部のチームワークの問題はまだ検討されておらず、実践者の認識から共通のパターンへの包括的ボトムアップ調査が必要である。本稿では, 集団開発における望ましくないパターンの概念を紹介する。詳細な38回の探索的なインタビューを通じて,42のパターンを識別・分類し,その起源と結果を明らかにする。その後の調査では、それぞれ436名と968名の参加者が、望ましくないパターンの重要性と頻度を調査し、これらのパターンを管理する潜在的なツールや特徴を評価した。この研究は、望ましくないパターンの微妙な理解に寄与し、その影響を評価し、産業応用のための実用的ツールと特徴を提案する。この発見は、より詳細な研究と、協調的なソフトウェアエンジニアリングプラクティスを強化するツールの開発のための貴重な基盤を提供する。 Software development, often perceived as a technical endeavor, is fundamentally a social activity requiring collaboration among team members. Acknowledging this, the software development community has devised strategies to address possible collaboration-related shortcomings. Various studies have attempted to capture the social dynamics within software engineering. In these studies, the authors developed methods to identify numerous teamwork issues and proposed various approaches to address them. However, certain teamwork issues remain unstudied, necessitating a comprehensive bottom-up exploration from practitioner's perceptions to common patterns. This paper introduces the concept of undesirable patterns in collective development, referring to potential teamwork problems that may escalate if unaddressed. Through 38 in-depth exploratory interviews, we identify and classify 42 patterns, revealing their origins and consequences. Subsequent surveys, 436 and 968 participants each, explore the significance and frequency of the undesirable patterns, and evaluate potential tools and features to manage these patterns. 
The study contributes a nuanced understanding of undesirable patterns, evaluating their impact and proposing pragmatic tools and features for industrial application. The findings provide a valuable foundation for further in-depth studies and the development of tools to enhance collaborative software engineering practices.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# イメージジェネレータの診断精度向上のための平均埋め込み Disentangling Mean Embeddings for Better Diagnostics of Image Generators ( http://arxiv.org/abs/2409.01314v1 ) ライセンス: Link先を確認	Sebastian G. Gruber, Pascal Tobias Ziegler, Florian Buettner,	(参考訳) イメージジェネレータの評価は、特定の画像領域に対する微妙な洞察を提供することにおいて、従来のメトリクスの限界のため、依然として課題である。画像のすべての領域が同様の容易さで学習されるわけではないため、これは重要な問題である。本研究では,中心核アライメントによる個々の画素クラスタに対するコサイン類似性の積に平均埋め込みのコサイン類似性を解き放つ新しい手法を提案する。これにより、クラスタワイズ性能が全体の画像生成性能に与える影響を定量化することができる。実世界の様々なユースケースにおいて、モデル誤動作の画素領域を識別する可能性や説明性をいかに向上させるかを示す。 The evaluation of image generators remains a challenge due to the limitations of traditional metrics in providing nuanced insights into specific image regions. This is a critical problem as not all regions of an image may be learned with similar ease. In this work, we propose a novel approach to disentangle the cosine similarity of mean embeddings into the product of cosine similarities for individual pixel clusters via central kernel alignment. Consequently, we can quantify the contribution of the cluster-wise performance to the overall image generation performance. We demonstrate how this enhances the explainability and the likelihood of identifying pixel regions of model misbehavior across various real-world use cases.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# 多周波ニューラルボルン反復法による2次元逆散乱問題の解法 Multi-frequency Neural Born Iterative Method for Solving 2-D Inverse Scattering Problems ( http://arxiv.org/abs/2409.01315v1 ) ライセンス: Link先を確認	Daoqi Liu, Tao Shan, Maokun Li, Fan Yang, Shenheng Xu,	(参考訳) 本研究では,多周波電磁法(EM)逆散乱問題(ISP)に対処する深層学習に基づくイメージング手法を提案する。深層学習技術とEM物理法則を組み合わせることで、単周波ニューラルBIMの原理に導かれる多周波ニューラルボルン反復法(NeuralBIM)の開発に成功した。この手法は,マルチタスク学習技術とNeuralBIMの効率的な反復インバージョン処理を統合し,堅牢な多周波ボルン反復インバージョンモデルを構築する。トレーニング中、モデルはホモシステマティックな不確実性によって導かれるマルチタスク学習アプローチを採用し、各周波数データの重みを適応的に割り当てる。さらに、ISPの物理法則に制約された教師なし学習法を用いて、コントラストや全フィールドデータを必要としない多周波ニューラルBIMモデルを訓練する。多周波ニューラルBIMの有効性は、ISPを解くための精度と計算効率の向上を実証し、合成および実験データを通して検証する。さらに、この手法は強力な一般化機能と耐雑音性を示す。多周波ニューラルBIM法は、多周波EMデータに対する新しい逆変換法を探索し、多周波データの電磁ISPに有効な解を提供する。 In this work, we propose a deep learning-based imaging method for addressing the multi-frequency electromagnetic (EM) inverse scattering problem (ISP). By combining deep learning technology with EM physical laws, we have successfully developed a multi-frequency neural Born iterative method (NeuralBIM), guided by the principles of the single-frequency NeuralBIM. This method integrates multitask learning techniques with NeuralBIM's efficient iterative inversion process to construct a robust multi-frequency Born iterative inversion model. During training, the model employs a multitask learning approach guided by homoscedastic uncertainty to adaptively allocate the weights of each frequency's data. Additionally, an unsupervised learning method, constrained by the physical laws of ISP, is used to train the multi-frequency NeuralBIM model, eliminating the need for contrast and total field data. The effectiveness of the multi-frequency NeuralBIM is validated through synthetic and experimental data, demonstrating improvements in accuracy and computational efficiency for solving ISP. Moreover, this method exhibits strong generalization capabilities and noise resistance. 
The multi-frequency NeuralBIM method explores a novel inversion method for multi-frequency EM data and provides an effective solution for the electromagnetic ISP of multi-frequency data.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# LoGex:ガイド拡散による極めて稀な病理組織クラスの尾部検出の改善 LoGex: Improved tail detection of extremely rare histopathology classes via guided diffusion ( http://arxiv.org/abs/2409.01317v1 ) ライセンス: Link先を確認	Maximilian Mueller, Matthias Hein,	(参考訳) 現実的な医療環境では、データは本質的に長い尾を持つことが多く、ほとんどのサンプルは少数のクラスと稀なクラスの長い尾に集中しており、通常は少数のサンプルしか含まれていない。この分布は、希少な条件が検出しにくく、限られたデータのために分類することが難しいため、重大な課題である。本稿では,レアクラスを分類する代わりに,配布外データとして確実に検出することを目的とする。我々はローランク適応(LoRA)と拡散誘導を利用して、検出問題に対するターゲット合成データを生成する。本研究は, 頭骨の分類精度を低下させることなく, 尾骨の10サンプルのみを用いて, 組織学的課題におけるOOD検出性能を著しく改善した。 In realistic medical settings, the data are often inherently long-tailed, with most samples concentrated in a few classes and a long tail of rare classes, usually containing just a few samples. This distribution presents a significant challenge because rare conditions are critical to detect and difficult to classify due to limited data. In this paper, rather than attempting to classify rare classes, we aim to detect these as out-of-distribution data reliably. We leverage low-rank adaption (LoRA) and diffusion guidance to generate targeted synthetic data for the detection problem. We significantly improve the OOD detection performance on a challenging histopathological task with only ten samples per tail class without losing classification accuracy on the head classes.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# 量子虚時間発展による強相関量子多体系の近似基底状態の生成 Generating Approximate Ground States of Strongly Correlated Quantum Many-Body Systems Through Quantum Imaginary Time Evolution ( http://arxiv.org/abs/2409.01320v1 ) ライセンス: Link先を確認	Michael P. Kaicher, Florian Dommert, Christopher Wever, Maximilian Amsler, Michael Kühn,	(参考訳) 量子多体系の基底状態を生成する、あるいはその特性を調べるために設計された量子アルゴリズムの多くは、所望の基底状態と大きな重なりを持つ初期状態を入力として必要とする。そのような基底状態を準備する1つのアプローチが虚時間発展(Imaginary Time Evolution, ITE)である。最近の[Motta, M., Sun, C., Tan, A.T.K. et al. (2020)]の研究は、ITEをユニタリ演算子の列で近似する方法を示すアルゴリズム(本稿では量子虚時間発展(Quantum Imaginary Time Evolution, QITE)と呼ぶ)を導入しており、QITEは初期のフォールトトレラント量子コンピュータ上で実装できる可能性がある。本研究では,格子および分子電子構造ハミルトニアンのITEを近似するQITEアルゴリズムの能力に関するヒューリスティックな研究を行う。産業応用上関心のあるものを含む幅広い系に対して,良好な古典的初期状態が与えられた場合のQITEアルゴリズムの性能を数値的に調べ,QITEがITEの挙動を定性的に再現し,古典的平均場解を改善できるかどうかを確認する。本研究で検討する系は、短距離および長距離相互作用を示す様々な格子幾何の1次元および2次元格子系から、分子電子構造ハミルトニアンの活性空間まで多岐にわたる。 QITEとITEの比較に加え、虚時間発展させたフェルミオンガウス状態が、古典コンピュータ上で効率よく計算でき、任意の格子幾何・次元の一般的なスピンハミルトニアンに対して量子コンピュータ上で効率よく実装できる初期状態として機能しうることを明示的に示す。 Most quantum algorithms designed to generate or probe properties of the ground state of a quantum many-body system require as input an initial state with a large overlap with the desired ground state. One approach for preparing such a ground state is Imaginary Time Evolution (ITE). Recent work by [Motta, M., Sun, C., Tan, A.T.K. et al. (2020)] introduced an algorithm -- which we will refer to as Quantum Imaginary Time Evolution (QITE) -- that shows how ITE can be approximated by a sequence of unitary operators, making QITE potentially implementable on early fault-tolerant quantum computers. In this work, we provide a heuristic study of the capabilities of the QITE algorithm in approximating the ITE of lattice and molecular electronic structure Hamiltonians. 
We numerically study the performance of the QITE algorithm when provided with a good classical initial state for a large class of systems, some of which are of interest to industrial applications, and check if QITE is able to qualitatively replicate the ITE behavior and improve over a classical mean-field solution. The systems we consider in this work range from one- and two-dimensional lattice systems of various lattice geometries displaying short- and long-range interactions, to active spaces of molecular electronic structure Hamiltonians. In addition to the comparison of QITE and ITE, we explicitly show how imaginary time evolved fermionic Gaussian states can serve as initial states which can be efficiently computed on classical computers and efficiently implemented on quantum computers for generic spin Hamiltonians in arbitrary lattice geometries and dimensions, which can be of independent interest.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# ガイド・アンド・リスケール:効果的なチューニング不要の実画像編集のためのセルフガイド機構 Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing ( http://arxiv.org/abs/2409.01322v1 ) ライセンス: Link先を確認	Vadim Titov, Madina Khalmatova, Alexandra Ivanova, Dmitry Vetrov, Aibek Alanov,	(参考訳) 近年の大規模テキスト・画像生成モデルの発展にもかかわらず、これらのモデルで実画像を操作することは依然として難しい問題である。既存の編集手法の主な制限は、幅広い画像編集において一貫した品質を発揮できないか、あるいは入力画像固有の外観を維持するために、時間のかかるハイパーパラメータチューニングや拡散モデルの微調整を必要とすることである。本稿では,誘導機構による拡散サンプリングプロセスの修正に基づく新しい手法を提案する。本研究では,入力画像の全体構造と、編集すべきでない局所領域の外観を保存するための自己誘導技術について検討する。特に,ソース画像の局所的および大域的構造を保存することを目的としたレイアウト保存エネルギー関数を明示的に導入する。さらに,生成中に分類器フリーガイダンスと提案するガイダーのノルムのバランスをとることで,雑音分布の保存を可能にするノイズ再スケーリング機構を提案する。このような誘導アプローチは、拡散モデルの微調整も正確な反転過程も必要としない。その結果,提案手法は高速かつ高品質な編集機構を提供する。実験では,人手による評価と定量的解析により,提案手法が人間により好まれる所望の編集を実現し,編集品質と原画像の保存との間でより良いトレードオフを達成することを示す。私たちのコードは https://github.com/FusionBrainLab/Guide-and-Rescale で利用可能です。 Despite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. The main limitations of existing editing methods are that they either fail to perform with consistent quality on a wide range of image edits or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-specific appearance of the input image. We propose a novel approach that is built upon a modified diffusion sampling process via the guidance mechanism. In this work, we explore the self-guidance technique to preserve the overall structure of the input image and its local regions appearance that should not be edited. In particular, we explicitly introduce layout-preserving energy functions that are aimed to save local and global structures of the source image. Additionally, we propose a noise rescaling mechanism that allows to preserve noise distribution by balancing the norms of classifier-free guidance and our proposed guiders during generation. 
Such a guiding approach does not require fine-tuning the diffusion model and exact inversion process. As a result, the proposed method provides a fast and high-quality editing mechanism. In our experiments, we show through human evaluation and quantitative analysis that the proposed method allows to produce desired editing which is more preferable by humans and also achieves a better trade-off between editing quality and preservation of the original image. Our code is available at https://github.com/FusionBrainLab/Guide-and-Rescale.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# 自律的なロコ操作課題における接地言語モデル Grounding Language Models in Autonomous Loco-manipulation Tasks ( http://arxiv.org/abs/2409.01326v1 ) ライセンス: Link先を確認	Jin Wang, Nikos Tsagarakis,	(参考訳) 行動自律性を持ったヒューマノイドロボットは、私たちの日常生活における理想的な協力者とされてきた。固定ベースのロボットアームと比較して、ヒューマノイドロボットはより大きな操作スペースを提供し、制御と計画の難しさを大幅に増大させる。汎用型ヒューマノイドロボットへの急速な進歩にもかかわらず、ほとんどの研究は、身体全体の調整とタスク計画に関する研究がほとんどなく、移動性と操作性の両方を含む長期的タスクをオープンエンドの言語指導下で実証する可能性を制限して、移動能力に重点を置いている。本研究では,異なるシナリオにおけるタスクに基づいて行動を学び,選択し,計画する新しいフレームワークを提案する。我々は、強化学習(RL)と全身最適化を組み合わせることで、ロボットの動きを生成し、それらをモーションライブラリーに格納する。我々はさらに,大規模言語モデル(LLM)の計画と推論機能を活用し,一連の動作プリミティブからなる階層的なタスクグラフを構築し,より高レベルな計画で下位レベルの実行をブリッジする。 CENTAUROロボットを用いたシミュレーションおよび実世界の実験により、言語モデルに基づくプランナーは、非構造化シーンにおける自由テキストコマンドからの高い自律性を証明し、新しいロコ操作タスクに効率的に適応できることが示されている。 Humanoid robots with behavioral autonomy have consistently been regarded as ideal collaborators in our daily lives and promising representations of embodied intelligence. Compared to fixed-based robotic arms, humanoid robots offer a larger operational space while significantly increasing the difficulty of control and planning. Despite the rapid progress towards general-purpose humanoid robots, most studies remain focused on locomotion ability with few investigations into whole-body coordination and tasks planning, thus limiting the potential to demonstrate long-horizon tasks involving both mobility and manipulation under open-ended verbal instructions. In this work, we propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios. We combine reinforcement learning (RL) with whole-body optimization to generate robot motions and store them into a motion library. We further leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph that comprises a series of motion primitives to bridge lower-level execution with higher-level planning. 
Experiments in simulation and real-world using the CENTAURO robot show that the language model based planner can efficiently adapt to new loco-manipulation tasks, demonstrating high autonomy from free-text commands in unstructured scenes.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# SPDiffusion:多概念テキスト画像生成のための意味的保護拡散 SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation ( http://arxiv.org/abs/2409.01327v1 ) ライセンス: Link先を確認	Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li,	(参考訳) 近年のテキスト・ツー・イメージモデルでは,高品質な画像の生成に顕著な成功を収めている。しかし、複数の文字やオブジェクトを含む画像を生成するマルチコンセプト生成をタスクすると、既存のメソッドは属性の混乱に悩まされ、重度のテキストイメージの不整合が発生する。属性の混乱は、潜在特徴のある領域が複数のまたは間違ったプロンプトトークンに付随する場合に発生する。本研究では,意味的保護拡散(SPDiffusion, Semantic Protection Diffusion)を提案する。 SPDiffusion frameworkでは、各領域とトークンの関連性を表すセマンティック保護マスク(SP-Mask)を設計し、生成プロセスにおいて無関係なトークンが特定の領域に与える影響を保護するためのセマンティック保護クロスアテンション(SP-Attn)を提案する。提案手法を評価するため,多種多様なマルチコンセプト・ベンチマークを作成し,SPDiffusionはこのベンチマークの最先端結果を達成し,その有効性を実証した。当社の方法は,ControlNet,Story Diffusion,PhotoMaker,PixArt-alphaなど,他の多くのアプリケーションメソッドやバックボーンと組み合わせて,マルチコンセプト機能を強化し,高い互換性とスケーラビリティを示す。 Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features attend to multiple or incorrect prompt tokens. In this work, we propose novel Semantic Protection Diffusion (SPDiffusion) to protect the semantics of regions from the influence of irrelevant tokens, eliminating the confusion of non-corresponding attributes. In the SPDiffusion framework, we design a Semantic Protection Mask (SP-Mask) to represent the relevance of the regions and the tokens, and propose a Semantic Protection Cross-Attention (SP-Attn) to shield the influence of irrelevant tokens on specific regions in the generation process. To evaluate our method, we created a diverse multi-concept benchmark, and SPDiffusion achieves state-of-the-art results on this benchmark, proving its effectiveness. 
Our method can be combined with many other application methods or backbones, such as ControlNet, Story Diffusion, PhotoMaker and PixArt-alpha to enhance their multi-concept capabilities, demonstrating strong compatibility and scalability.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# プライバシ保護機械学習における画像データセット機能の影響評価 Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning ( http://arxiv.org/abs/2409.01329v1 ) ライセンス: Link先を確認	Lucas Lange, Maurice-Maximilian Heykeroth, Erhard Rahm,	(参考訳) 機械学習(ML)はコンピュータビジョンを含む多くの分野において重要である。しかし、センシティブなデータに基づいてトレーニングされたMLモデルは、攻撃や情報漏洩が可能であるため、セキュリティ上の課題に直面している。プライバシ保存機械学習(PPML)は、差分プライバシー(DP)を使用して、ユーティリティとプライバシのバランスをとることで、この問題に対処する。本研究では,私的および非私的畳み込みニューラルネットワーク(CNN)モデルの有用性と脆弱性に影響を与える画像データセットの特徴を明らかにする。複数のデータセットとプライバシ予算を分析することで、不均衡なデータセットはマイノリティクラスで脆弱性を増大させるが、DPはこの問題を緩和する。クラスの少ないデータセットは、モデルユーティリティとプライバシの両方を改善し、高いエントロピーまたは低いFisher Discriminant Ratio(FDR)データセットは、ユーティリティとプライバシのトレードオフを悪化させる。これらの洞察は、画像データセットのユーティリティプライバシトレードオフを推定し、最適化する実践者や研究者にとって貴重なガイダンスを提供する。 Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while high entropy or low Fisher Discriminant Ratio (FDR) datasets deteriorate the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in estimating and optimizing the utility-privacy trade-off in image datasets, helping to inform data and privacy modifications for better outcomes based on dataset characteristics.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# デジタル病理組織学とディープラーニングを用いた小児脳腫瘍分類:多施設スウェーデンコホートを用いたSOTA法の評価 Pediatric brain tumor classification using digital histopathology and deep learning: evaluation of SOTA methods on a multi-center Swedish cohort ( http://arxiv.org/abs/2409.01330v1 ) ライセンス: Link先を確認	Iulian Emil Tampu, Per Nyman, Christoforos Spyretos, Ida Blystad, Alia Shamikh, Gabriela Prochazka, Teresita Díaz de Ståhl, Johanna Sandgren, Peter Lundberg, Neda Haj-Hosseini,	(参考訳) 脳腫瘍は小児や若年成人で最も一般的な固形腫瘍であるが、大規模な病理組織学的データセットの不足により、この集団における計算病理学の適用は限られている。本研究は、多施設スウェーデンコホートのヘマトキシリン・エオシン染色全スライド画像(WSI)から小児脳腫瘍を分類するために、最先端の組織学特化型基盤モデルから得られたパッチ特徴量に対して、2つの弱教師付き多重インスタンス学習(MIL)アプローチを実装した。スウェーデンの6つの大学病院から、脳腫瘍と診断された540人の被験者(年齢8.5$\pm$4.9歳)のWSIが集められた。インスタンス(パッチ)レベルの特徴量は、事前訓練された3つの特徴抽出器(ResNet50, UNI, CONCH)を使用してWSIから取得した。インスタンスは、患者レベルの分類のために、アテンションベースMIL (ABMIL) またはクラスタリング制約付きアテンションMIL (CLAM) を用いて集約した。モデルは、小児脳腫瘍の階層的分類に基づく3つの分類課題(腫瘍カテゴリ,ファミリー,タイプ)で評価した。モデルの汎化は、2つのセンターのデータで訓練し、他の4つのセンターのデータでテストすることで評価した。モデルの解釈可能性はアテンションマッピングにより評価した。最も高い分類性能はUNI特徴量とABMIL集約を用いた場合に得られ,マシューズ相関係数は腫瘍カテゴリ,ファミリー,タイプの分類でそれぞれ0.86$\pm$0.04,0.63$\pm$0.04,0.53$\pm$0.05であった。汎化の評価では、UNIとCONCHの特徴量を利用したモデルがResNet50を用いたモデルよりも優れていた。しかし,in-siteからout-of-siteテストへの性能低下は,特徴抽出器間で同程度であった。以上の結果から,多施設の全国データセットにおいて,様々な階層レベルでの小児脳腫瘍の診断における最先端の計算病理学的手法の可能性と,まずまずの汎化性が示された。 Brain tumors are the most common solid tumors in children and young adults, but the scarcity of large histopathology datasets has limited the application of computational pathology in this group. This study implements two weakly supervised multiple-instance learning (MIL) approaches on patch-features obtained from state-of-the-art histology-specific foundation models to classify pediatric brain tumors in hematoxylin and eosin whole slide images (WSIs) from a multi-center Swedish cohort. WSIs from 540 subjects (age 8.5$\pm$4.9 years) diagnosed with brain tumor were gathered from the six Swedish university hospitals. Instance (patch)-level features were obtained from WSIs using three pre-trained feature extractors: ResNet50, UNI and CONCH. 
Instances were aggregated using attention-based MIL (ABMIL) or clustering-constrained attention MIL (CLAM) for patient-level classification. Models were evaluated on three classification tasks based on the hierarchical classification of pediatric brain tumors: tumor category, family and type. Model generalization was assessed by training on data from two of the centers and testing on data from four other centers. Model interpretability was evaluated through attention-mapping. The highest classification performance was achieved using UNI features and ABMIL aggregation, with Matthews correlation coefficient of 0.86$\pm$0.04, 0.63$\pm$0.04, and 0.53$\pm$0.05, for tumor category, family and type classification, respectively. When evaluating generalization, models utilizing UNI and CONCH features outperformed those using ResNet50. However, the drop in performance from the in-site to out-of-site testing was similar across feature extractors. These results show the potential of state-of-the-art computational pathology methods in diagnosing pediatric brain tumors at different hierarchical levels with fair generalizability on a multi-center national dataset.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
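The entry above reports results as a Matthews correlation coefficient (MCC). As a reminder of what that score measures, here is a minimal sketch of the binary special case. The study itself is multi-class, so this is an illustration of the metric only and not the authors' evaluation code; the function name is an assumption:

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))

    Returns 0.0 when the denominator is zero, a common convention for
    degenerate confusion matrices.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts: a perfect classifier and a chance-level one.
print(matthews_corrcoef_binary(tp=5, tn=5, fp=0, fn=0))
print(matthews_corrcoef_binary(tp=1, tn=1, fp=1, fn=1))
```

MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is typical of tumor-type cohorts.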
# Few-shot Guidanceによるテスト時間適応の強化 Enhancing Test Time Adaptation with Few-shot Guidance ( http://arxiv.org/abs/2409.01341v1 ) ライセンス: Link先を確認	Siqi Luo, Yi Xin, Yuntao Du, Zhongwei Wan, Tao Tan, Guangtao Zhai, Xiaohong Liu,	(参考訳) 深層ニューラルネットワークは、トレーニング(ソース)データとテスト(ターゲット)データの間のドメインシフトに直面すると、大幅な性能低下を起こすことが多い。この問題に対処するために、事前訓練されたソースモデルを適応させて分布外のストリーミングターゲットデータを処理するテスト時間適応(TTA)手法が提案されている。これらの手法は一定の緩和をもたらすが、ドメインシフト補正のための信頼できるメカニズムを欠いており、現実のアプリケーションではドメインシフトが不規則になることが多い。そこで我々は,TTAに少数ショットのサポートセットを加えた新規かつ実用的な設定であるFew-Shot Test Time Adaptation (FS-TTA) を開発した。「少ない入力で大きな効果」という原則に従い、FS-TTAは未知のターゲットドメインでの盲目的な探索を減らす。さらに,FS-TTAに取り組むための2段階のフレームワークを提案する。 (i) 過学習を避けるために特徴多様性拡張モジュールを用いつつ、少数ショットのサポートセットで事前訓練されたソースモデルを微調整する。 (ii) モデル適応のための高品質な擬似ラベルを生成するため、プロトタイプメモリバンクのガイダンスに基づくテスト時間適応を行う。 3つのクロスドメイン分類ベンチマークに関する広範な実験を通じて、FS-TTAと提案フレームワークの優れた性能と信頼性を実証した。 Deep neural networks often encounter significant performance drops when facing domain shifts between training (source) and test (target) data. To address this issue, Test Time Adaptation (TTA) methods have been proposed to adapt pre-trained source model to handle out-of-distribution streaming target data. Although these methods offer some relief, they lack a reliable mechanism for domain shift correction, which can often be erratic in real-world applications. In response, we develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA. Adhering to the principle of few inputs, big gains, FS-TTA reduces blind exploration in unseen target domains. Furthermore, we propose a two-stage framework to tackle FS-TTA, including (i) fine-tuning the pre-trained source model with few-shot support set, along with using feature diversity augmentation module to avoid overfitting, (ii) implementing test time adaptation based on prototype memory bank guidance to produce high quality pseudo-label for model adaptation. 
Through extensive experiments on three cross-domain classification benchmarks, we demonstrate the superior performance and reliability of our FS-TTA and framework.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# Mutual Benefit: 公の場での自動運転車データの共有 Mutual Benefit: The Case for Sharing Autonomous Vehicle Data with the Public ( http://arxiv.org/abs/2409.01342v1 ) ライセンス: Link先を確認	David Goedicke, Natalie Chyi, Alexandra Bremers, Stacey Li, James Grimmelmann, Wendy Ju,	(参考訳) 自動運転は、公道で頻繁にテストされる、広く研究されている技術である。これらのテストから生成されたデータは、この技術を前進させる各企業にとって重要な競争要素である。本稿では、このデータの一部が、信頼された団体を通じて、実験中のコミュニティに対する補償と統制の形で共有することで、一般市民により明確な利益をもたらすべきだという規範的考え方を論じる。この議論を支持するために、どのようなデータを共有することができるか、自動運転車のデータを共有する倫理的ケース、現在AVデータの共有方法に関するケーススタディ、類似の交通業界から既存のデータ共有プラットフォームを引き合いに出し、どのようにデータを共有するべきかを推奨し、なぜそのようなデータ共有を奨励すべきかという議論を締めくくる。 Autonomous driving is a widely researched technology that is frequently tested on public roads. The data generated from these tests represent an essential competitive element for the respective companies moving this technology forward. In this paper, we argue for the normative idea that a part of this data should more explicitly benefit the general public by sharing it through a trusted entity as a form of compensation and control for the communities that are being experimented upon. To support this argument, we highlight what data is available to be shared, make the ethical case for sharing autonomous vehicle data, present case studies in how AV data is currently shared, draw from existing data-sharing platforms from similar transportation industries to make recommendations on how data should be shared and conclude with arguments as to why such data-sharing should be encouraged.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# 手続き的Q&Aのための類推拡張生成と手続きメモリの組み合わせ Pairing Analogy-Augmented Generation with Procedural Memory for Procedural Q&A ( http://arxiv.org/abs/2409.01344v1 ) ライセンス: Link先を確認	K Roth, Rushil Gupta, Simon Halle, Bang Liu,	(参考訳) RAGパラダイムのLLMは様々なタスクで顕著な性能を示しているが、未知のドメイン、特に手続き的質問応答のような複雑なタスクでは依然として性能が低い。本研究では,テキストベースの手続きを操作するための新しい形式と構造を導入する。この形式に基づいて、LangChain Pythonドキュメントからスクレイピングした新しいデータセットLCStepも提示する。さらに、従来のRAGシステムを拡張し、人間の類推的推論と、過去の経験を取り込んで未知の問題を解決する能力から着想を得た新しいシステムAAG(analogy-augmented generation)を提案する。提案手法は,専門知識に適応するために,カスタムの手続きメモリストアを備えた凍結言語モデルを用いる。LCStep,RecipeNLG,CHAMPの各データセットにおいて,ペアワイズのLLMベース評価でAAGが少数ショットおよびRAGのベースラインを上回ること,RecipeNLGについては人間による評価でも裏付けられることを示す。 While LLMs in the RAG paradigm have shown remarkable performance on a variety of tasks, they still under-perform on unseen domains, especially on complex tasks like procedural question answering. In this work, we introduce a novel formalism and structure for manipulating text-based procedures. Based on this formalism, we further present a novel dataset called LCStep, scraped from the LangChain Python docs. Moreover, we extend the traditional RAG system to propose a novel system called analogy-augmented generation (AAG), that draws inspiration from human analogical reasoning and ability to assimilate past experiences to solve unseen problems. The proposed method uses a frozen language model with a custom procedure memory store to adapt to specialized knowledge. We demonstrate that AAG outperforms few-shot and RAG baselines on LCStep, RecipeNLG, and CHAMP datasets under a pairwise LLM-based evaluation, corroborated by human evaluation in the case of RecipeNLG.	翻訳日:2024-09-06 06:25:12 公開日:2024-09-02
# 言語モデルは引き出された知識による準備から恩恵を受ける Language Models Benefit from Preparation with Elicited Knowledge ( http://arxiv.org/abs/2409.01345v1 ) ライセンス: Link先を確認	Jiacan Yu, Hannah An, Lenhart K. Schubert,	(参考訳) ゼロショットの思考の連鎖(CoT)アプローチは、複数の推論ステップを必要とするタスクに対する言語モデル(LM)の質問応答(QA)でしばしば用いられ、通常は「Let's think step by step.」というプロンプトで強化される。しかしながら、一部のQAタスクは、推論ステップの連鎖よりも、関連する知識へのアクセスに依存している。そこで本研究では, LMの2つのインスタンスを用い, 1つ目(LM1)が関連情報を生成し, 2つ目(LM2)がその情報に基づいて質問に答える, PREPと呼ばれる単純で汎用的なプロンプト手法を提案する。 PREPは汎用的で、ユーザのドメイン知識に依存しないよう設計されており、特別なプロンプトエンジニアリングを必要とせずに様々なQAタスクに適用できる。提案手法の有効性を評価するため,人工物の部品と材料組成に関する大規模な図式的データセットから,100問の二択問題からなるデータセットを作成する。これらの質問は、2つの人工物のうちどちらが、別の人工物と材料を共有する可能性が低いかを問う。このような質問は、異なる人工物の部品構造における共有材料に関するLMの知識を調べる。提案手法を,我々のデータセットと3つの公開済みコモンセンス推論データセットで検証する。我々の手法の平均精度は、テストしたすべてのデータセットにおいて、テストした他のすべての手法よりも一貫して高い。 The zero-shot chain of thought (CoT) approach is often used in question answering (QA) by language models (LMs) for tasks that require multiple reasoning steps, typically enhanced by the prompt "Let's think step by step." However, some QA tasks hinge more on accessing relevant knowledge than on chaining reasoning steps. We introduce a simple general prompting technique, called PREP, that involves using two instances of LMs: the first (LM1) generates relevant information, and the second (LM2) answers the question based on this information. PREP is designed to be general and independent of the user's domain knowledge, making it applicable across various QA tasks without the need for specialized prompt engineering. To evaluate the effectiveness of our prompting method, we create a dataset of 100 binary-choice questions, derived from an extensive schematic dataset on artifact parts and material composition. These questions ask which of two artifacts is less likely to share materials with another artifact. 
Such questions probe the LM's knowledge of shared materials in the part structure of different artifacts. We test our method on our dataset and three published commonsense reasoning datasets. The average accuracy of our method is consistently higher than that of all the other tested methods across all the tested datasets.	翻訳日:2024-09-06 06:11:05 公開日:2024-09-02
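The two-instance flow described in the entry above (LM1 produces relevant information, LM2 answers using it) can be sketched as follows. This is a hedged illustration only: the prompt wording, the function name `prep_answer` and the callable-based interface are hypothetical, and the paper's actual prompts are not given in the abstract.

```python
from typing import Callable


def prep_answer(question: str,
                lm1: Callable[[str], str],
                lm2: Callable[[str], str]) -> str:
    """Two-stage prompting: elicit knowledge with lm1, then answer with lm2.

    lm1 and lm2 are any functions mapping a prompt string to a completion
    string. Both prompt templates below are hypothetical placeholders.
    """
    knowledge_prompt = (
        "List facts that are relevant to the following question.\n"
        f"Question: {question}"
    )
    knowledge = lm1(knowledge_prompt)

    answer_prompt = (
        f"Information:\n{knowledge}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return lm2(answer_prompt)


# Usage with stand-in functions in place of real language models.
if __name__ == "__main__":
    fake_lm1 = lambda prompt: "Chairs and tables are often made of wood."
    fake_lm2 = lambda prompt: "chair"
    print(prep_answer("Which artifact shares materials with a table?", fake_lm1, fake_lm2))
```

Passing the models in as callables keeps the sketch independent of any particular LM client library.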
# ターゲット駆動蒸留:目標時間選択と分離誘導による連続蒸留 Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance ( http://arxiv.org/abs/2409.01347v1 ) ライセンス: Link先を確認	Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu,	(参考訳) 連続蒸留法は拡散モデルの生成タスクを加速させることで大きな成功を収めた。しかし, 従来の連続蒸留法では, 目標の時間ステップの選択に単純かつ簡単な手法が用いられていたため, 画像のぼやけや細かな損失に悩まされることが多い。これらの制約に対処するため,(1)ターゲット駆動蒸留(Target-Driven Distillation, TDD)を導入し,(1)目標タイムステップの微妙な選択戦略を採用し,トレーニング効率を向上する;(2)トレーニング中に分離したガイダンスを活用する;(2)推論期間中のガイダンス尺度の学習後にTDDを開放する;(3)非等価サンプリングとx0クリッピングをオプションで装備することで,画像サンプリングをより柔軟かつ正確に行えるようにする。実験では、TDDが数ステップの世代で最先端のパフォーマンスを達成することを検証する。 Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies in selecting target timesteps, they usually struggle with blurs and detail losses in generated images. To address these limitations, we introduce Target-Driven Distillation (TDD), which (1) adopts a delicate selection strategy of target timesteps, increasing the training efficiency; (2) utilizes decoupled guidances during training, making TDD open to post-tuning on guidance scale during inference periods; (3) can be optionally equipped with non-equidistant sampling and x0 clipping, enabling a more flexible and accurate way for image sampling. Experiments verify that TDD achieves state-of-the-art performance in few-step generation, offering a better choice among consistency distillation models.	翻訳日:2024-09-06 06:11:05 公開日:2024-09-02
# PatternPaint: 生成AIと塗装技術を用いたレイアウトパターンの生成 PatternPaint: Generating Layout Patterns Using Generative AI and Inpainting Techniques ( http://arxiv.org/abs/2409.01348v1 ) ライセンス: Link先を確認	Guanglei Zhou, Bhargav Korrapati, Gaurav Rajavendra Reddy, Jiang Hu, Yiran Chen, Dipto G. Thakurta,	(参考訳) VLSIレイアウトパターンの生成は、幅広いDFM(Design For Manufacturability)研究に不可欠である。本研究では,設計規則法則的金属配置パターンを作成するための生成機械学習モデルの可能性について検討する。提案手法は, 複雑な設計規則設定において法的なパターンを生成でき, 高い多様性を達成できることを示す。フレキシブルな設定を備えた設計システムは、局所的な変更を伴うパターン生成と、設計規則違反訂正の両方をサポートする。提案手法はIntel 18A Process Design Kit (PDK) で検証され,20の開始パターンしか持たない多種多様なDRC対応パターンライブラリを生成することができる。 Generation of VLSI layout patterns is essential for a wide range of Design For Manufacturability (DFM) studies. In this study, we investigate the potential of generative machine learning models for creating design rule legal metal layout patterns. Our results demonstrate that the proposed model can generate legal patterns in complex design rule settings and achieves a high diversity score. The designed system, with its flexible settings, supports both pattern generation with localized changes, and design rule violation correction. Our methodology is validated on Intel 18A Process Design Kit (PDK) and can produce a wide range of DRC-compliant pattern libraries with only 20 starter patterns.	翻訳日:2024-09-06 06:11:05 公開日:2024-09-02
# Spectron: 敵対的リファインメントを伴う条件付きトランスフォーマーを用いたターゲット話者抽出 Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement ( http://arxiv.org/abs/2409.01352v1 ) ライセンス: Link先を確認	Tathagata Bandyopadhyay,	(参考訳) 近年,自然言語処理,コンピュータビジョン,信号処理など,多くのディープラーニングアプリケーションにおいて,アテンションベースのトランスフォーマーがデファクトスタンダードになっている。本論文では,モノラルのマルチスピーカ混合音声信号からターゲット話者の音声を抽出するトランスフォーマーベースのエンドツーエンドモデルを提案する。既存の話者抽出法とは異なり、話者埋め込みの一貫性と波形エンコーダの可逆性を課す2つの追加目的関数を導入し、話者エンコーダと音声分離器を共同で訓練することで、話者条件付き埋め込みをよりよく捉える。さらに,抽出した音声の知覚品質を向上するために,マルチスケール判別器を利用する。実験の結果,分離器のバックボーンにデュアルパストランスフォーマーを用い,提案する訓練パラダイムと組み合わせることで,CNNベースラインを$3.12$ dB改善できることがわかった。最後に、我々のアプローチを最近の最先端手法と比較し、我々のモデルが追加のデータ依存を生じさせることなく、既存手法を平均で$4.1$ dB上回ることを示す。 Recently, attention-based transformers have become a de facto standard in many deep learning applications including natural language processing, computer vision, signal processing, etc. In this paper, we propose a transformer-based end-to-end model to extract a target speaker's speech from a monaural multi-speaker mixed audio signal. Unlike existing speaker extraction methods, we introduce two additional objectives to impose speaker embedding consistency and waveform encoder invertibility and jointly train both speaker encoder and speech separator to better capture the speaker conditional embedding. Furthermore, we leverage a multi-scale discriminator to refine the perceptual quality of the extracted speech. Our experiments show that the use of a dual path transformer in the separator backbone along with proposed training paradigm improves the CNN baseline by $3.12$ dB points. Finally, we compare our approach with recent state-of-the-art methods and show that our model outperforms existing methods by $4.1$ dB points on average without creating additional data dependency.	翻訳日:2024-09-06 06:11:05 公開日:2024-09-02
# ピクセルからオブジェクトへ:局所的および大域的アグリゲーションを用いた部分と対象のセグメンテーションのための階層的アプローチ From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation ( http://arxiv.org/abs/2409.01353v1 ) ライセンス: Link先を確認	Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei,	(参考訳) 本稿では,高精細画像分割作業のための階層型トランスフォーマーモデルを導入し,オブジェクト分割の包括的範囲で分割の粒度を効果的にブリッジする。このアプローチの核心は多面的表現戦略であり、個々のピクセルからスーパーピクセルへ体系的に進行し、最終的には凝集性グループ形成へと発展する。このアーキテクチャは、ローカルアグリゲーションとグローバルアグリゲーションという2つの重要なアグリゲーション戦略によって支えられている。局所アグリゲーションはスーパーピクセルを形成するために使用され、画像データの固有の冗長性を利用してオブジェクトの特定の部分と密に整合したセグメントを生成し、オブジェクトレベルの監視によってガイドされる。対照的に、グローバルアグリゲーションはこれらのスーパーピクセルをインターリンクし、それらを大きなグループに編成し、オブジェクト全体と相関し、部分レベルの監視の恩恵を受ける。このデュアルアグリゲーションフレームワークは、計算効率を保ちながら、様々な監視入力への多彩な適応を保証する。本手法は, 異なる監督モダリティ間の適応性と計算管理性のバランスを改善し, セグメンテーション性能の大幅な向上を図っている。 PartImageNetデータセットでテストすると,従来の状態よりも2.8%,mIoUスコアが0.8%,オブジェクトセグメンテーションが0.8%向上した。同様に、Pascal Partデータセットでは、それぞれ1.5%と2.0%のパフォーマンス向上を記録している。 In this paper, we introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks, effectively bridging the granularity of part segmentation with the comprehensive scope of object segmentation. At the heart of our approach is a multi-level representation strategy, which systematically advances from individual pixels to superpixels, and ultimately to cohesive group formations. This architecture is underpinned by two pivotal aggregation strategies: local aggregation and global aggregation. Local aggregation is employed to form superpixels, leveraging the inherent redundancy of the image data to produce segments closely aligned with specific parts of the object, guided by object-level supervision. In contrast, global aggregation interlinks these superpixels, organizing them into larger groups that correlate with entire objects and benefit from part-level supervision. This dual aggregation framework ensures a versatile adaptation to varying supervision inputs while maintaining computational efficiency. 
Our methodology notably improves the balance between adaptability across different supervision modalities and computational manageability, culminating in significant enhancement in segmentation performance. When tested on the PartImageNet dataset, our model achieves a substantial increase, outperforming the previous state-of-the-art by 2.8% and 0.8% in mIoU scores for part and object segmentation, respectively. Similarly, on the Pascal Part dataset, it records performance enhancements of 1.5% and 2.0% for part and object segmentation, respectively.	翻訳日:2024-09-06 06:11:05 公開日:2024-09-02
# 説明空間: 時系列解釈可能性の新しい視点 Explanation Space: A New Perspective into Time Series Interpretability ( http://arxiv.org/abs/2409.01354v1 ) ライセンス: Link先を確認	Shahbaz Rezaei, Xin Liu,	(参考訳) 深層学習モデルの人間による理解可能な説明は、多くの重要かつ敏感なアプリケーションに必要である。各入力特徴(分類器の決定のために)の重要性を直接入力に投影できる画像や表データとは異なり、時系列識別可能な特徴(例えば支配周波数)は、ユーザが容易に理解できる時間領域に現れにくいことが多い。さらに、ほとんどの説明手法は、どんな特徴も欠如していることを示す指標として基準値を必要とする。しかしながら、視覚タスクのブラックピクセルや表データのゼロ/平均値として定義される特徴の欠如の概念は、時系列ではよく定義されていない。表と視覚ドメインから時系列ドメインへの説明可能なAIメソッド(XAI)の採用にもかかわらず、これらの違いは実際にはこれらのXAIメソッドの適用を制限する。本稿では,既存の手法を用いて時間領域で訓練されたモデルを他の説明空間で解釈できる簡易かつ効果的な手法を提案する。それぞれが特定の時系列でこれらの問題を緩和できる4つの説明空間を提案する。トレーニングされたモデルやXAIメソッドを変更することなく,既存のプラットフォームで簡単に適用することができる。 Human understandable explanation of deep learning models is necessary for many critical and sensitive applications. Unlike image or tabular data where the importance of each input feature (for the classifier's decision) can be directly projected into the input, time series distinguishable features (e.g. dominant frequency) are often hard to manifest in time domain for a user to easily understand. Moreover, most explanation methods require a baseline value as an indication of the absence of any feature. However, the notion of lack of feature, which is often defined as black pixels for vision tasks or zero/mean values for tabular data, is not well-defined in time series. Despite the adoption of explainable AI methods (XAI) from tabular and vision domain into time series domain, these differences limit the application of these XAI methods in practice. In this paper, we propose a simple yet effective method that allows a model originally trained on time domain to be interpreted in other explanation spaces using existing methods. We suggest four explanation spaces that each can potentially alleviate these issues in certain types of time series. Our method can be readily adopted in existing platforms without any change to trained models or XAI methods.	翻訳日:2024-09-06 06:11:05 公開日:2024-09-02
# Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain ( http://arxiv.org/abs/2409.01357v1 ) License: see link	Antoine Louis, Gijs van Dijck, Gerasimos Spanakis	Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
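To make "fusion method" and "fusing scores with carefully tuned weights" concrete, here is a minimal sketch of two common fusion schemes. The abstract does not say which methods the paper uses, so reciprocal rank fusion, min-max normalization, the constant `k=60`, and all document ids and scores are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Rank-based fusion: each list adds 1 / (k + rank) to a document's score."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))

def weighted_score_fusion(score_maps, weights):
    """Score-based fusion: weighted sum of min-max normalized retriever scores."""
    fused = defaultdict(float)
    for scores, weight in zip(score_maps, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in scores.items():
            fused[doc_id] += weight * (score - lo) / span
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical outputs of a lexical and a dense retriever for one query.
lexical = {"a": 10.0, "b": 5.0, "c": 0.0}
dense = {"a": 0.2, "b": 0.9, "c": 0.1}
by_rank = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
by_score = weighted_score_fusion([lexical, dense], weights=(0.3, 0.7))
```

The rank-based variant has no weights to tune, while the score-based variant depends on them: with weights `(1.0, 0.0)` it reduces to the lexical ranking alone.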
# A Survey and Comparison of Post-quantum and Quantum Blockchains ( http://arxiv.org/abs/2409.01358v1 ) License: see link	Zebo Yang, Haneen Alfauri, Behrooz Farkiani, Raj Jain, Roberto Di Pietro, Aiman Erbad	Blockchains have gained substantial attention from academia and industry for their ability to facilitate decentralized trust and communications. However, the rapid progress of quantum computing poses a significant threat to the security of existing blockchain technologies. Notably, the emergence of Shor's and Grover's algorithms raises concerns regarding the compromise of the cryptographic systems underlying blockchains. Consequently, it is essential to develop methods that reinforce blockchain technology against quantum attacks. In response to this challenge, two distinct approaches have been proposed. The first approach involves post-quantum blockchains, which aim to utilize classical cryptographic algorithms resilient to quantum attacks. The second approach explores quantum blockchains, which leverage the power of quantum computers and networks to rebuild the foundations of blockchains. This paper aims to provide a comprehensive overview and comparison of post-quantum and quantum blockchains while exploring open questions and remaining challenges in these domains. It offers an in-depth introduction, examines differences in blockchain structure, security, privacy, and other key factors, and concludes by discussing current research trends.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
# Correlating Time Series with Interpretable Convolutional Kernels ( http://arxiv.org/abs/2409.01362v1 ) License: see link	Xinyu Chen, HanQin Cai, Fuqiang Liu, Jinhua Zhao	This study addresses the problem of convolutional kernel learning in univariate, multivariate, and multidimensional time series data, which is crucial for interpreting temporal patterns in time series and supporting downstream machine learning tasks. First, we propose formulating convolutional kernel learning for univariate time series as a sparse regression problem with a non-negative constraint, leveraging the properties of circular convolution and circulant matrices. Second, to generalize this approach to multivariate and multidimensional time series data, we use tensor computations, reformulating the convolutional kernel learning problem in the form of tensors. This is further converted into a standard sparse regression problem through vectorization and tensor unfolding operations. In the proposed methodology, the optimization problem is addressed using the existing non-negative subspace pursuit method, enabling the convolutional kernel to capture temporal correlations and patterns. To evaluate the proposed model, we apply it to several real-world time series datasets. On the multidimensional rideshare and taxi trip data from New York City and Chicago, the convolutional kernels reveal interpretable local correlations and cyclical patterns, such as weekly seasonality. In the context of multidimensional fluid flow data, both local and nonlocal correlations captured by the convolutional kernels can reinforce tensor factorization, leading to performance improvements in fluid flow reconstruction tasks. Thus, this study lays an insightful foundation for automatically learning convolutional kernels from time series data, with an emphasis on interpretability through sparsity and non-negativity constraints.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
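The property of circular convolution and circulant matrices that the abstract relies on can be checked in a few lines. This is a sketch of the identity only, not of the paper's optimization: the kernel values, the series, and the helper names are assumptions for illustration.

```python
def circular_convolve(kernel, x):
    """y[t] = sum_k kernel[k] * x[(t - k) mod T], with len(kernel) <= T."""
    T = len(x)
    return [sum(kernel[k] * x[(t - k) % T] for k in range(len(kernel)))
            for t in range(T)]

def circulant(x):
    """T x T circulant matrix with first column x: C[t][k] = x[(t - k) mod T]."""
    T = len(x)
    return [[x[(t - k) % T] for k in range(T)] for t in range(T)]

def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

x = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.0, 0.25]                         # sparse and non-negative
padded = kernel + [0.0] * (len(x) - len(kernel))  # zero-pad the kernel to length T

via_convolution = circular_convolve(kernel, x)
via_matrix = matvec(circulant(x), padded)
```

Because the convolution equals a matrix built from the data times the kernel, the kernel enters linearly, which is what allows kernel learning to be posed as a regression over the kernel entries with sparsity and non-negativity constraints.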
# Low-Energy Test of Quantum Gravity via Angular Momentum Entanglement ( http://arxiv.org/abs/2409.01364v1 ) License: see link	Trinidad B. Lantaño, Luciano Petruzziello, Susana F. Huelga, Martin B. Plenio	Currently envisaged tests for probing the quantum nature of the gravitational interaction in the low-energy regime typically focus either on the quantized center-of-mass degrees of freedom of two spherically-symmetric test masses or on the rotational degrees of freedom of non-symmetric masses under a gravitational interaction governed by the Newtonian potential. In contrast, here we investigate the interaction between the angular momenta of spherically-symmetric test masses considering a tree-level relativistic correction related to frame-dragging that leads to an effective dipolar interaction between the angular momenta. In this approach, the mass of the probes is not directly relevant; instead, their angular momentum plays the central role. We demonstrate that, while the optimal entangling rate is achieved with a maximally delocalized initial state, significant quantum correlations can still arise between two rotating systems even when each is initialized in an eigenstate of rotation. Additionally, we examine the robustness of the generated entanglement against typical sources of noise and observe that our combination of angular momentum and spherically-symmetric test-masses mitigates the impact of many common noise sources.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
# CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification ( http://arxiv.org/abs/2409.01366v1 ) License: see link	Junhui He, Shangyu Wu, Weidong Wen, Chun Jason Xue, Qingan Li	Deploying large language models (LLMs) on edge devices presents significant challenges due to the substantial computational overhead and memory requirements. Activation sparsification can mitigate these challenges by reducing the number of activated neurons during inference. Existing methods typically employ thresholding-based sparsification based on the statistics of activation tensors. However, these methods do not explicitly model the impact of activation sparsification on performance, leading to suboptimal performance degradation. To address this issue, this paper reformulates the activation sparsification problem by introducing a new objective that optimizes the sparsification decisions. Building on this reformulation, we propose CHESS, a general activation sparsification approach via CHannel-wise thrEsholding and Selective Sparsification. First, channel-wise thresholding assigns a unique threshold to each activation channel in the feed-forward network (FFN) layers. Then, selective sparsification involves applying thresholding-based activation sparsification to specific layers within the attention modules. Finally, we detail the implementation of sparse kernels to accelerate LLM inference. Experimental results demonstrate that the proposed CHESS achieves lower performance degradation over 8 downstream tasks while activating fewer parameters compared to existing methods, thus speeding up the LLM inference by up to 1.27x.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
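The channel-wise thresholding step can be illustrated with a toy example. The abstract does not say how CHESS derives its thresholds, so the per-channel quantile rule below, along with the function names and calibration data, is an assumption made only to show what "a unique threshold per channel" means in practice.

```python
def channel_thresholds(calibration, sparsity):
    """One threshold per channel: a quantile of |activation| in that channel.

    `calibration` is a list of activation vectors (one per token); the
    quantile rule is a hypothetical stand-in for the paper's objective.
    """
    n_channels = len(calibration[0])
    thresholds = []
    for c in range(n_channels):
        magnitudes = sorted(abs(row[c]) for row in calibration)
        index = min(int(sparsity * len(magnitudes)), len(magnitudes) - 1)
        thresholds.append(magnitudes[index])
    return thresholds

def sparsify(activations, thresholds):
    """Zero every entry whose magnitude is below its own channel's threshold."""
    return [a if abs(a) >= t else 0.0 for a, t in zip(activations, thresholds)]

# Two channels with very different scales.
calibration = [[0.1, 5.0], [0.2, -6.0], [0.3, 7.0], [0.4, -8.0]]
thresholds = channel_thresholds(calibration, sparsity=0.5)
kept = sparsify([0.35, -6.5], thresholds)
```

With a single global threshold, the small-scale channel would be zeroed almost always and the large-scale channel almost never; separate thresholds let each channel be pruned relative to its own range.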
# Debiasing Graph Representation Learning based on Information Bottleneck ( http://arxiv.org/abs/2409.01367v1 ) License: see link	Ziyi Zhang, Mingxuan Ouyang, Wanyu Lin, Hao Lan, Lei Yang	Graph representation learning has shown superior performance in numerous real-world applications, such as finance and social networks. Nevertheless, most existing works might make discriminatory predictions due to insufficient attention to fairness in their decision-making processes. This oversight has prompted a growing focus on fair representation learning. Among recent explorations on fair representation learning, prior works based on adversarial learning usually induce unstable or counterproductive performance. To achieve fairness in a stable manner, we present the design and implementation of GRAFair, a new framework based on a variational graph auto-encoder. The crux of GRAFair is the Conditional Fairness Bottleneck, where the objective is to capture the trade-off between the utility of representations and sensitive information of interest. By applying variational approximation, we can make the optimization objective tractable. Particularly, GRAFair can be trained to produce informative representations of tasks while containing little sensitive information without adversarial training. Experiments on various real-world datasets demonstrate the effectiveness of our proposed method in terms of fairness, utility, robustness, and stability.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
# Quantum panprotopsychism and the structure and subject-summing combination problem ( http://arxiv.org/abs/2409.01368v1 ) License: see link	Rodolfo Gambini, Jorge Pullin	In a previous paper, we have shown that an ontology of quantum mechanics in terms of states and events with internal phenomenal aspects, that is, a form of panprotopsychism, is well suited to explaining the phenomenal aspects of consciousness. We have proved there that the palette and grain combination problems of panpsychism and panprotopsychism arise from implicit hypotheses based on classical physics about supervenience that are inappropriate at the quantum level, where an exponential number of emergent properties and states arise. In this article, we address what is probably the first and most important combination problem of panpsychism: the subject-summing problem originally posed by William James. We begin by identifying the physical counterparts of the subjects of experience within the quantum panprotopsychic approach presented in that article. To achieve this, we turn to the notion of subject of experience inspired by the idea of prehension proposed by Whitehead and show that this notion can be adapted to the quantum ontology of objects and events. Due to the indeterminacy of quantum mechanics and its causal openness, this ontology also seems to be suitable for the analysis of the remaining aspects of the structure combination problem, which shows how the structuration of consciousness could have evolved from primitive animals to humans. The analysis imposes conditions on possible implementations of quantum cognition mechanisms in the brain and suggests new problems and strategies to address them, in particular with regard to the structuring of experiences in animals with different degrees of evolutionary development.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
# Imitating Language via Scalable Inverse Reinforcement Learning ( http://arxiv.org/abs/2409.01369v1 ) License: see link	Markus Wulfmeier, Michael Bloesch, Nino Vieillard, Arun Ahuja, Jorg Bornschein, Sandy Huang, Artem Sokolov, Matt Barnes, Guillaume Desjardins, Alex Bewley, Sarah Maria Elisabeth Bechtle, Jost Tobias Springenberg, Nikola Momchev, Olivier Bachem, Matthieu Geist, Martin Riedmiller	The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum likelihood estimation (MLE) for next token prediction led to its role as the predominant paradigm. However, the broader field of imitation learning can more effectively utilize the sequential structure underlying autoregressive generation. We focus on investigating the inverse reinforcement learning (IRL) perspective on imitation, extracting rewards and directly optimizing sequences instead of individual token likelihoods, and evaluate its benefits for fine-tuning large language models. We provide a new angle, reformulating inverse soft-Q-learning as a temporal difference regularized extension of MLE. This creates a principled connection between MLE and IRL and allows trading off added complexity with increased performance and diversity of generations in the supervised fine-tuning (SFT) setting. We find clear advantages for IRL-based imitation, in particular for retaining diversity while maximizing task performance, rendering IRL a strong alternative on fixed SFT datasets even without online data generation. Our analysis of IRL-extracted reward functions further indicates benefits for more robust reward functions via tighter integration of supervised and preference-based LLM post-training.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
# Quantum Computing for Discrete Optimization: A Highlight of Three Technologies ( http://arxiv.org/abs/2409.01373v1 ) License: see link	Alexey Bochkarev, Raoul Heese, Sven Jäger, Philine Schiewe, Anita Schöbel	Quantum optimization has emerged as a promising frontier of quantum computing, providing novel numerical approaches to mathematical optimization problems. The main goal of this paper is to facilitate interdisciplinary research between the Operations Research (OR) and Quantum Computing communities by providing an OR scientist's perspective on selected quantum-powered methods for discrete optimization. To this end, we consider three quantum-powered optimization approaches that make use of different types of quantum hardware available on the market. To illustrate these approaches, we solve three classical optimization problems: the Traveling Salesperson Problem, Weighted Maximum Cut, and Maximum Independent Set. With a general OR audience in mind, we attempt to provide an intuition behind each approach along with key references, describe the corresponding high-level workflow, and highlight crucial practical considerations. In particular, we emphasize the importance of problem formulations and device-specific configurations, and their impact on the amount of resources required for computation (where we focus on the number of qubits). These points are illustrated with a series of experiments on three types of quantum computers: a neutral atom machine from QuEra, a quantum annealer from D-Wave, and a gate-based device from IBM.	Translated: 2024-09-06 06:11:05 Published: 2024-09-02
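As an illustration of why the problem formulation drives the qubit count, the sketch below writes Weighted Maximum Cut as a QUBO (quadratic unconstrained binary optimization) with one binary variable per vertex and solves a tiny instance by brute force. This is the textbook encoding, not necessarily the one used in the paper; the graph and function names are assumptions, and no quantum hardware or vendor SDK is involved.

```python
from itertools import product

def maxcut_qubo(n, edges):
    """Upper-triangular Q such that x^T Q x = -(cut weight) for binary x.

    Each edge (i, j, w) contributes -w*x_i - w*x_j + 2*w*x_i*x_j, which is
    -w when the endpoints are on different sides and 0 otherwise.
    """
    Q = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        i, j = min(i, j), max(i, j)
        Q[i][i] -= w
        Q[j][j] -= w
        Q[i][j] += 2 * w
    return Q

def qubo_energy(Q, x):
    """Evaluate x^T Q x over the upper triangle."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def brute_force_minimum(Q):
    """Exhaustive search over all 2^n assignments; only viable for tiny n."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(Q, x))

# Weighted triangle: the best cut isolates vertex 2 (weight 2 + 3 = 5).
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
Q = maxcut_qubo(3, edges)
best = brute_force_minimum(Q)
```

In this encoding the number of binary variables equals the number of vertices. Formulations that need one-hot encodings or penalty terms, as is typical for the Traveling Salesperson Problem, use more variables for the same instance size, and how variables map to physical qubits depends on the device.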
# H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark ( http://arxiv.org/abs/2409.01374v1 ) License: see link	Solim LeGris, Wai Keen Vong, Brenden M. Lake, Todd M. Gureckis	The Abstraction and Reasoning Corpus (ARC) is a visual program synthesis benchmark designed to test challenging out-of-distribution generalization in humans and machines. Since 2019, limited progress has been observed on the challenge using existing artificial intelligence methods. Comparing human and machine performance is important for the validity of the benchmark. While previous work explored how well humans can solve tasks from the ARC benchmark, they either did so using only a subset of tasks from the original dataset, or from variants of ARC, and therefore only provided a tentative estimate of human performance. In this work, we obtain a more robust estimate of human performance by evaluating 1729 humans on the full set of 400 training and 400 evaluation tasks from the original ARC problem set. We estimate that average human performance lies between 73.3% and 77.2% correct with a reported empirical average of 76.2% on the training set, and between 55.9% and 68.9% correct with a reported empirical average of 64.2% on the public evaluation set. However, we also find that 790 out of the 800 tasks were solvable by at least one person in three attempts, suggesting that the vast majority of the publicly available ARC tasks are in principle solvable by typical crowd-workers recruited over the internet. Notably, while these numbers are slightly lower than earlier estimates, human performance still greatly exceeds current state-of-the-art approaches for solving ARC. To facilitate research on ARC, we publicly release our dataset, called H-ARC (human-ARC), which includes all of the submissions and action traces from human participants.	Translated: 2024-09-06 04:14:12 Published: 2024-09-02
# Characterizing Noise of Driven Controlled Field Using the Central Spin Model ( http://arxiv.org/abs/2409.01375v1 ) License: see link	R. Jafari, A. Asadian, M. Abdi, Alireza Akbari	We analyze the coherence dynamics of a central spin coupled to a spin chain with a time-dependent noisy magnetic field, focusing on how noise influences the system's decoherence. Our results show that decoherence due to the nonequilibrium critical dynamics of the environment is amplified in the presence of uncorrelated and correlated Gaussian noise. We demonstrate that the decoherence factor consistently signals the critical points, and exhibits exponential scaling with the system size, the square of noise intensity, and the noise correlation time at the critical points. We find that strong coupling between the qubit and the environment allows partial revivals of coherence, which diminish with increasing noise intensity or decreasing noise correlation time. In contrast, weak coupling leads to monotonically enhanced decoherence. The numerical results illustrate that the revivals decay and scale exponentially with noise intensity. Moreover, the revivals increase with the noise correlation time, scaling linearly or as a power law depending on whether the correlated noise is fast or slow. Additionally, we explore the non-Markovianity of the dynamics, finding that it decays in the presence of noise but increases as the noise correlation time grows. Our findings have potential applications in the noise spectroscopy of external signals.	Translated: 2024-09-06 04:14:12 Published: 2024-09-02
# Content, Nudges and Incentives: A Study on the Effectiveness and Perception of Embedded Phishing Training ( http://arxiv.org/abs/2409.01378v1 ) License: see link	Daniele Lain, Tarek Jost, Sinisa Matetic, Kari Kostiainen, Srdjan Capkun	A common form of phishing training in organizations is the use of simulated phishing emails to test employees' susceptibility to phishing attacks, and the immediate delivery of training material to those who fail the test. This widespread practice is dubbed embedded training; however, its effectiveness in decreasing the likelihood of employees falling for phishing again in the future is questioned by the contradictory findings of several recent field studies. We investigate embedded phishing training in three aspects. First, we observe that the practice incorporates different components -- knowledge gains from its content, nudges and reminders from the test itself, and the deterrent effect of potential consequences -- our goal is to study which ones are more effective, if any. Second, we explore two potential improvements to training, namely its timing and the use of incentives. Third, we analyze employees' reception and perception of the practice. For this, we conducted a large-scale mixed-methods (quantitative and qualitative) study on the employees of a partner company. Our study contributes several novel findings on the training practice: in particular, its effectiveness comes from its nudging effect, i.e., the periodic reminder of the threat rather than from its content, which is rarely consumed by employees due to lack of time and perceived usefulness. Further, delaying training to ease time pressure is as effective as currently established practices, while rewards do not improve secure behavior. Finally, some of our results support previous findings with increased ecological validity, e.g., that phishing is an attention problem, rather than a knowledge one, even for the most susceptible employees, and thus enforcing training does not help.	Translated: 2024-09-06 04:14:12 Published: 2024-09-02
# Membership Inference Attacks Against In-Context Learning ( http://arxiv.org/abs/2409.01380v1 ) License: see link	Rui Wen, Zheng Li, Michael Backes, Yang Zhang	Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., 95% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that synthesizes the strengths of the aforementioned strategies, achieving an accuracy advantage of over 95% in most cases. Furthermore, we investigate three potential defenses targeting data, instruction, and output. Results demonstrate that combining defenses from orthogonal dimensions significantly reduces privacy leakage and offers enhanced privacy assurances.	Translated: 2024-09-06 04:14:12 Published: 2024-09-02
# Automatic Detection of LLM-generated Code: A Case Study of Claude 3 Haiku ( http://arxiv.org/abs/2409.01382v1 ) License: see link	Musfiqur Rahman, SayedHassan Khatoonabadi, Ahmad Abdellatif, Emad Shihab	Using Large Language Models (LLMs) has gained popularity among software developers for generating source code. However, the use of LLM-generated code can introduce risks of adding suboptimal, defective, and vulnerable code. This makes it necessary to devise methods for the accurate detection of LLM-generated code. Toward this goal, we perform a case study of Claude 3 Haiku (or Claude 3 for brevity) on the CodeSearchNet dataset. We divide our analyses into two parts: function-level and class-level. We extract 22 software metric features, such as Code Lines and Cyclomatic Complexity, for each level of granularity. We then analyze code snippets generated by Claude 3 and their human-authored counterparts using the extracted features to understand how unique the code generated by Claude 3 is. In the following step, we use the unique characteristics of Claude 3-generated code to build Machine Learning (ML) models and identify which features of the code snippets make them more detectable by ML models. Our results indicate that Claude 3 tends to generate longer functions, but shorter classes than humans, and this characteristic can be used to detect Claude 3-generated code with ML models with 82% and 66% accuracies for function-level and class-level snippets, respectively.	Translated: 2024-09-06 04:14:12 Published: 2024-09-02
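Two of the features named in the abstract, Code Lines and Cyclomatic Complexity, can be approximated for Python functions with the standard `ast` module. This is a rough sketch under stated assumptions: the paper's tooling and exact metric definitions are not given in the abstract, the set of branch nodes counted here is a simplification, and `function_metrics` is a hypothetical helper.

```python
import ast

# Simplified set of decision points; real tools also count boolean
# operators, comprehensions, match cases, and more.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def function_metrics(source):
    """Return {function name: {"code_lines": ..., "cyclomatic": ...}}."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            branches = sum(isinstance(n, _BRANCH_NODES) for n in ast.walk(node))
            metrics[node.name] = {
                # Lines spanned by the definition (requires Python 3.8+).
                "code_lines": node.end_lineno - node.lineno + 1,
                # One path through the function plus one per decision point.
                "cyclomatic": 1 + branches,
            }
    return metrics

sample = (
    "def f(x):\n"
    "    if x > 0:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        pass\n"
    "    return 0\n"
)
features = function_metrics(sample)
```

Feature dictionaries like these, computed for generated and human-written snippets, would form the rows of the table a classifier is trained on.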
# Non-linear steady states of subwavelength atomic arrays at low light intensities and beyond mean field ( http://arxiv.org/abs/2409.01386v1 ) License: see link	Orazio Scarlatella, Nigel R. Cooper	Subwavelength atomic arrays constitute a novel light-matter platform with long-range interactions and collective dissipation that can host novel non-equilibrium many-body states. Here we investigate their steady states under coherent driving. While in the low-drive intensity regime they have often been described in terms of linear, non-interacting theories, we show that such a description is inadequate in subwavelength regimes. There, we point out that non-linearities can have large effects down to a vanishing drive intensity in the limit of a large number of atoms. Then we investigate the role of fluctuations beyond Gutzwiller mean-field theory within a Dynamical Mean Field Theory (DMFT) approach in the regime of intermediate drive intensity. We show that these have a dramatic impact on the steady-state phase diagram, including suppressing a range of non-homogeneous instabilities and phases predicted in mean-field theory.	Translated: 2024-09-06 04:14:12 Published: 2024-09-02
# ディープラーニングを用いたVLSIハイパーグラフ分割 VLSI Hypergraph Partitioning with Deep Learning ( http://arxiv.org/abs/2409.01387v1 ) ライセンス: Link先を確認	Muhammad Hadir Khan, Bugra Onal, Eren Dogan, Matthew R. Guthaus,	(参考訳) 分割はコンピュータ科学における既知の問題であり、この分野の進歩は設計の品質と効率に大きく影響しうるため、チップ設計のワークフローにおいて重要である。ディープラーニング(DL)技術、特にグラフニューラルネットワーク(GNN)に関わる技術は、インダクティブ学習とトランスダクティブ学習の両方を用いて、様々なノード、エッジ、グラフ予測タスクで強いパフォーマンスを示している。 GNNにおいて最近関心を集めている顕著な領域は、プーリング層とそのグラフ分割への応用である。これらの手法は、ソーシャルグラフ、計算グラフ、その他のランダムグラフにおいて有望な結果をもたらしているが、VLSIハイパーグラフネットリストの文脈では、その有効性はまだ検討されていない。本研究では,実世界のネットリスト特性をエミュレートし,解のカット品質について既知の上限を有する,新しい合成分割ベンチマークを提案する。我々は、これらのベンチマークを先行研究と区別し、既存の最先端分割アルゴリズムとGNNベースのアプローチを併せて評価し、それぞれの利点と欠点を明らかにする。 Partitioning is a known problem in computer science and is critical in chip design workflows, as advancements in this area can significantly influence design quality and efficiency. Deep Learning (DL) techniques, particularly those involving Graph Neural Networks (GNNs), have demonstrated strong performance in various node, edge, and graph prediction tasks using both inductive and transductive learning methods. A notable area of recent interest within GNNs is pooling layers and their application to graph partitioning. While these methods have yielded promising results across social, computational, and other random graphs, their effectiveness has not yet been explored in the context of VLSI hypergraph netlists. In this study, we introduce a new set of synthetic partitioning benchmarks that emulate real-world netlist characteristics and possess a known upper bound for solution cut quality. We distinguish these benchmarks from prior work and evaluate existing state-of-the-art partitioning algorithms alongside GNN-based approaches, highlighting their respective advantages and disadvantages.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# CV-Probes:視覚的言語理解における語彙と世界知識の相互作用に関する研究 CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding ( http://arxiv.org/abs/2409.01389v1 ) ライセンス: Link先を確認	Ivana Beňová, Michal Gregor, Albert Gatt,	(参考訳) 本研究では,様々な視覚言語(VL)モデルが文脈依存・非文脈依存の動詞句を接地する能力について検討した。 CV-Probesデータセットは,文脈依存動詞(例,「beg」)と非文脈依存動詞(例,「sit」)を含む,文脈理解を明示的に研究するためのデータセットである。モデル予測に対する動詞トークンの寄与を評価するためにMM-SHAP評価を用いる。以上の結果から,VLモデルは文脈依存動詞句を効果的に理解するのに苦慮していることが明らかとなった。これらの知見は,VLモデルのコンテキストを正確に統合する上での課題を浮き彫りにして,VLモデルのトレーニングと評価における方法論の改善の必要性を示唆している。 This study investigates the ability of various vision-language (VL) models to ground context-dependent and non-context-dependent verb phrases. To do that, we introduce the CV-Probes dataset, designed explicitly for studying context understanding, containing image-caption pairs with context-dependent verbs (e.g., "beg") and non-context-dependent verbs (e.g., "sit"). We employ the MM-SHAP evaluation to assess the contribution of verb tokens towards model predictions. Our results indicate that VL models struggle to ground context-dependent verb phrases effectively. These findings highlight the challenges in training VL models to integrate context accurately, suggesting a need for improved methodologies in VL model training and evaluation.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# スペクトルからの量子メレオロジーとサブシステム Quantum mereology and subsystems from the spectrum ( http://arxiv.org/abs/2409.01391v1 ) ライセンス: Link先を確認	Nicolas Loizeau, Dries Sels,	(参考訳) 量子系を記述する最小限の構成要素は、ハミルトニアン、初期状態、およびサブシステムへの分解を符号化する好ましいテンソル積構造である。サブシステムがシステム全体のスペクトルから現れるトップダウンアプローチについて検討する。このアプローチは量子メレオロジー(quantum mereology)と呼ばれてきた。まず、システムをサブシステムに分解することは、スペクトルを他のスペクトルに分解することと同値であることを示す。次に、サブシステムの数(系の体積)はスペクトル自身から推測できると論じる。局所モデルでは、この情報はガウス状態密度に対する有限サイズ補正に符号化されている。 The minimal ingredients to describe a quantum system are a Hamiltonian, an initial state, and a preferred tensor product structure that encodes a decomposition into subsystems. We explore a top-down approach in which the subsystems emerge from the spectrum of the whole system. This approach has been referred to as quantum mereology. First we show that decomposing a system into subsystems is equivalent to decomposing a spectrum into other spectra. Then we argue that the number of subsystems (the volume of the system) can be inferred from the spectrum itself. In local models, this information is encoded in finite size corrections to the Gaussian density of states.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# GenAgent: ワークフローの自動生成によるコラボレーションAIシステムの構築 -- ComfyUIのケーススタディ GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI ( http://arxiv.org/abs/2409.01392v1 ) ライセンス: Link先を確認	Xiangyuan Xue, Zeyu Lu, Di Huang, Wanli Ouyang, Lei Bai,	(参考訳) これまでのAI研究は、インテリジェンスと能力を最大化するモノリシックモデルの開発に重点を置いてきた。対照的に、この記事では、ワークフローを使用してモデル、データソース、パイプラインを統合し、複雑で多様なタスクを解決する、コラボレーションAIシステムという別のアプローチを探求する。我々は、複雑なワークフローを自動的に生成するLLMベースのフレームワークであるGenAgentを紹介し、モノリシックモデルよりも柔軟性とスケーラビリティを提供する。 GenAgentの中核的なイノベーションは、ワークフローをコードで表現することであり、ワークフローと協調エージェントをステップバイステップで構築することにある。我々は、ComfyUIプラットフォームにGenAgentを実装し、新しいベンチマークOpenComfyを提案する。その結果、GenAgentは実行レベルおよびタスクレベルの評価においてベースラインアプローチよりも優れており、より優れた効率と安定性で複雑なワークフローを生成する能力を示している。 Much previous AI research has focused on developing monolithic models to maximize their intelligence and capability, with the primary goal of enhancing performance on specific tasks. In contrast, this paper explores an alternative approach: collaborative AI systems that use workflows to integrate models, data sources, and pipelines to solve complex and diverse tasks. We introduce GenAgent, an LLM-based framework that automatically generates complex workflows, offering greater flexibility and scalability compared to monolithic models. The core innovation of GenAgent lies in representing workflows with code, alongside constructing workflows with collaborative agents in a step-by-step manner. We implement GenAgent on the ComfyUI platform and propose a new benchmark, OpenComfy. The results demonstrate that GenAgent outperforms baseline approaches in both run-level and task-level evaluations, showing its capability to generate complex workflows with superior effectiveness and stability.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# カオス力学予測のための有限サンプル量子リザバーコンピュータの最適トレーニング Optimal training of finitely-sampled quantum reservoir computers for forecasting of chaotic dynamics ( http://arxiv.org/abs/2409.01394v1 ) ライセンス: Link先を確認	Osama Ahmed, Felix Tennie, Luca Magri,	(参考訳) 現在のノイズ中間スケール量子(NISQ)時代には、ノイズの存在は量子コンピューティングアルゴリズムの性能を悪化させる。量子リザバーコンピューティング(QRC)は量子機械学習アルゴリズムの一種であるが、調整された様々な種類のノイズの恩恵を受けることができる。本稿では,QRCとリカレンスフリー量子リザバーコンピューティング(RF-QRC)のカオス的時系列予測能力に及ぼす有限サンプリングノイズの影響を解析する。まず、リカレントループがなくても、RF-QRCはリーキー積分ニューロンを用いて、過去のリザバー状態に関する時間情報を含むことを示す。これにより、RF-QRCはQuantum Extreme Learning Machines (QELM)とは異なる。第2に, 有限サンプリングノイズはQRCとRF-QRCの両方の予測能力を劣化させ, ノイズの伝搬によりQRCにより大きな悪影響を及ぼすことを示す。第3に、2つの手法を用いて有限サンプル量子リザバーコンピューティングフレームワークのトレーニングを最適化する。 (a)ノイズを含むリザバー活性化状態からなるデータ行列に適用される特異値分解(SVD)、及び (b) ノイズを含むリザバー活性化状態から高周波成分を取り除くためのデータフィルタリング技術である。リザバー活性化状態のデノイズにより、より小さな訓練損失で信号対雑音比が向上することを示す。最後に、RF-QRCにおけるノイズを含むリザバー活性化信号のトレーニングとデノイズは、リカレント接続を持つQRCアーキテクチャと比較して、複数の量子処理ユニット(QPU)上で高度に並列化可能であることを示す。これらの解析は乱流に関連する原型的なカオス力学系に対して数値的に示される。この研究は、近い将来の量子ハードウェア上での時系列予測に有限サンプルの量子リザバーコンピューティングを用いる機会を開く。 In the current Noisy Intermediate Scale Quantum (NISQ) era, the presence of noise deteriorates the performance of quantum computing algorithms. Quantum Reservoir Computing (QRC) is a type of Quantum Machine Learning algorithm, which, however, can benefit from different types of tuned noise. In this paper, we analyse the effect that finite-sampling noise has on the chaotic time-series prediction capabilities of QRC and Recurrence-free Quantum Reservoir Computing (RF-QRC). First, we show that, even without a recurrent loop, RF-QRC contains temporal information about previous reservoir states using leaky integrated neurons. This makes RF-QRC different from Quantum Extreme Learning Machines (QELM). Second, we show that finite sampling noise degrades the prediction capabilities of both QRC and RF-QRC while affecting QRC more due to the propagation of noise. 
Third, we optimize the training of the finite-sampled quantum reservoir computing framework using two methods: (a) Singular Value Decomposition (SVD) applied to the data matrix containing noisy reservoir activation states; and (b) data-filtering techniques to remove high frequencies from the noisy reservoir activation states. We show that denoising the reservoir activation states improves the signal-to-noise ratios with smaller training loss. Finally, we demonstrate that the training and denoising of the noisy reservoir activation signals in RF-QRC are highly parallelizable on multiple Quantum Processing Units (QPUs) as compared to the QRC architecture with recurrent connections. The analyses are numerically showcased on prototypical chaotic dynamical systems with relevance to turbulence. This work opens opportunities for using quantum reservoir computing with finite samples for time-series forecasting on near-term quantum hardware.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# 最適化量子回路干渉による誤差フィルタリング Error filtration from optimized quantum circuit interference ( http://arxiv.org/abs/2409.01398v1 ) ライセンス: Link先を確認	Aaqib Ali, Giovanni Scala, Cosmo Lupo,	(参考訳) ノイズの多い量子ビットのエラーを軽減するための最適化されたハードウェア戦略を開発する。提案手法は, 誤差フィルタリングの物理原理に基づいて, 補助量子ビットを利用する。信号とアンシラは共に局所雑音を受けるが、構成的干渉(および場合によってはポストセレクション)により、信号量子ビット内の雑音のレベルを下げることができる。量子ビットを最も効果的に干渉させる最適なユニタリを求めるために、一組の普遍ゲートから始め、勾配降下あるいは確率近似によって適切な汎関数を最適化する。我々は,エンタングルメント忠実度,量子フィッシャー情報(量子センシングへの応用),CHSH値(非局所性のテストおよび量子暗号への応用)など,様々な応用に対応する複数の性能指数に対して,1個,2個,3個の補助量子ビットを用いてそのアプローチをベンチマークした。 1つおよび2つのアンシラの場合には、最適なユニタリに対するアンザッツから得られる明示的な表現も提供する。 We develop an optimized hardware strategy to mitigate errors in a noisy qubit. Our scheme builds on the physical principle of error filtration and exploits auxiliary qubits. Both signal and ancillas are subject to local noise, yet constructive interference (and in some cases post-selection) allows us to reduce the level of noise in the signal qubit. Seeking the optimal unitary that makes the qubits interfere in the most effective way, we start with a set of universal gates and proceed by optimizing suitable functionals by gradient descent or stochastic approximation. We benchmark our approach against a number of figures of merit that correspond to different applications, including entanglement fidelity, quantum Fisher information (for applications in quantum sensing), and CHSH value (for applications in tests of non-locality and quantum cryptography), with one, two, and three ancillary qubits. With one and two ancillas we also provide explicit expressions from an ansatz for an optimal unitary.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# 格子系におけるミクロカノニカル自由キュムラント Microcanonical Free Cumulants in lattice systems ( http://arxiv.org/abs/2409.01404v1 ) ライセンス: Link先を確認	Felix Fritzsch, Tomaž Prosen, Silvia Pappalardi,	(参考訳) 近年, 固有状態熱化仮説(ETH)の完全版が自由確率を用いて体系化されている。本稿では,ミクロカノニカルアンサンブル内の多体ダイナミクスに対する自由キュムラントのアプローチについて詳述する。ミクロカノニカル平均とカノニカル平均の差は、示量的な演算子の時間依存の揺らぎに現れることが知られている。したがって、ミクロカノニカルアンサンブルは、自由確率の応用を示量的な可観測量という広いクラスに拡張するために不可欠である。有限エネルギー密度における示量的な可観測量に対して、非可積分スピン鎖ハミルトニアンで我々のアプローチの有効性を数値的に示す。以上の結果より,ETHの完全版の性質,特に交差寄与の抑制と非交差寄与の因数分解が確認され,ミクロカノニカル自由キュムラントが局所的および示量的な可観測量の両方についてETHの滑らかな相関を符号化していることが示された。 Recently, the full version of the Eigenstate Thermalization Hypothesis (ETH) has been systematized using Free Probability. In this paper, we present a detailed discussion of the Free Cumulants approach to many-body dynamics within the microcanonical ensemble. Differences between the latter and canonical averages are known to manifest in the time-dependent fluctuations of extensive operators. Thus, the microcanonical ensemble is essential to extend the application of Free Probability to the broad class of extensive observables. We numerically demonstrate the validity of our approach in a non-integrable spin chain Hamiltonian for extensive observables at finite energy density. Our results confirm the full ETH properties, specifically the suppression of crossing contributions and the factorization of non-crossing ones, thus demonstrating that the microcanonical free cumulants encode ETH smooth correlations for both local and extensive observables.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# $\mathtt{emuflow}$: 結合宇宙論解析のための正規化フロー $\mathtt{emuflow}$: Normalising Flows for Joint Cosmological Analysis ( http://arxiv.org/abs/2409.01407v1 ) ライセンス: Link先を確認	Arrykrishna Mootoovaloo, Carlos García-García, David Alonso, Jaime Ruiz-Zapatero,	(参考訳) 宇宙論で関心を集める天文学的データセットの多様性と精度が増大していることから、最良の宇宙論的制約は、異なる実験のデータを組み合わせることで常に得られる。尤度レベルでは、そうする際の複雑さの1つは、それぞれの実験のデータを記述する高次元のパラメータモデルについて周辺化する必要があることである。これらには、比較的少数の関心のある宇宙論パラメータと、多数の「ニュイサンス」パラメータの両方が含まれる。そのため、複数の実験の結合パラメータ空間をサンプリングすることは、非常に計算コストの高い操作となりうる。これは、共通の宇宙論パラメータの集合にのみ依存する、先行する実験の周辺宇宙論的事後分布から直接サンプリングできれば、大幅に単純化できる。本稿では, 正規化フローによって周辺事後分布をエミュレートすることで, これが実現可能であることを示す。得られた学習済みの正規化フローモデルは、研究対象のパラメータ空間の次元を増大させることなく、独立したデータセットからの宇宙論的制約を効率的に組み合わせるために使用できる。本手法は,実験間に大きなテンションが存在する場合でも,実際の宇宙論データセットの事後分布と,異なるデータセットの同時分布を正確に記述できることを示す。結果として得られる結合制約は、同じデータセットを尤度レベルで組み合わせるのに要する時間のごく一部で得られる。一般の関心の高い公開宇宙論データセットの正規化フローモデルを構築し、その学習および宇宙論パラメータ推論での利用に用いるソフトウェアとともに公開する。 Given the growth in the variety and precision of astronomical datasets of interest for cosmology, the best cosmological constraints are invariably obtained by combining data from different experiments. At the likelihood level, one complication in doing so is the need to marginalise over large-dimensional parameter models describing the data of each experiment. These include both the relatively small number of cosmological parameters of interest and a large number of "nuisance" parameters. Sampling over the joint parameter space for multiple experiments can thus become a very computationally expensive operation. This can be significantly simplified if one could sample directly from the marginal cosmological posterior distribution of preceding experiments, depending only on the common set of cosmological parameters. In this paper, we show that this can be achieved by emulating marginal posterior distributions via normalising flows. 
The resulting trained normalising flow models can be used to efficiently combine cosmological constraints from independent datasets without increasing the dimensionality of the parameter space under study. We show that the method is able to accurately describe the posterior distribution of real cosmological datasets, as well as the joint distribution of different datasets, even when significant tension exists between experiments. The resulting joint constraints can be obtained in a fraction of the time it would take to combine the same datasets at the level of their likelihoods. We construct normalising flow models for a set of public cosmological datasets of general interest and make them available, together with the software used to train them and to exploit them in cosmological parameter inference.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# 第一原理に基づくデータセット蒸留 : コア情報抽出と目的学習の統合 Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning ( http://arxiv.org/abs/2409.01410v1 ) ライセンス: Link先を確認	Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen,	(参考訳) データセット蒸留(DD)は、トレーニングデータのコア情報をキャプチャして、後者でトレーニングされたモデルで同等のパフォーマンスを達成する合成データセットの構築に焦点を当てる、ますます重要な技術である。 DDには幅広い応用があるが、それを支持する理論はあまり進化していない。 DDの新しい手法は、特定の学習タスクを指向するのではなく、共通のベンチマークセットで比較される。そこで本研究では,DDの形式的モデルとして,対象とする最適化問題を正確に評価するには,関心の応用に関連する推論タスクを指定する必要がある,と論じる。このタスク固有の焦点がなければ、DD問題は未定であり、特定のタスクに対するDDアルゴリズムの選択はヒューリスティックである。我々の形式化は、様々なモデリング環境にまたがるDDの新たな応用を明らかにします。我々は,この広角レンズを用いて既存のDD法を解析し,その強度と限界を最適DD操作に対する精度と忠実度の観点から明らかにした。最後に,現代環境において重要な2つのケーススタディについて,数値的な結果を示す。まず、医療データ分析における重要な課題として、交差するが同一ではない異なるデータセットから知識をマージして、通常小さなサンプル設定である大きなデータセットを構築する。第2に,物理インフォームドニューラルネットワーク(PINN)の境界条件を越えた分布誤差を考察し,DDがより物理的に忠実なデータを提供する可能性を示す。このDDの一般的な定式化を確立することにより、DDを理解するための新たな研究パラダイムを確立し、そこから新たなDD技術が生まれることを目標にしている。 Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather than oriented towards any particular learning task. In this work, we present a formal model of DD, arguing that a precise characterization of the underlying optimization problem must specify the inference task associated with the application of interest. Without this task-specific focus, the DD problem is under-specified, and the selection of a DD algorithm for a particular task is merely heuristic. Our formalization reveals novel applications of DD across different modeling environments. 
We analyze existing DD methods through this broader lens, highlighting their strengths and limitations in terms of accuracy and faithfulness to optimal DD operation. Finally, we present numerical results for two case studies important in contemporary settings. Firstly, we address a critical challenge in medical data analysis: merging the knowledge from different datasets composed of intersecting, but not identical, sets of features, in order to construct a larger dataset in what is usually a small sample setting. Secondly, we consider out-of-distribution error across boundary conditions for physics-informed neural networks (PINNs), showing the potential for DD to provide more physically faithful data. By establishing this general formulation of DD, we aim to establish a new research paradigm by which DD can be understood and from which new DD techniques can arise.	翻訳日:2024-09-06 04:14:12 公開日:2024-09-02
# 性能を考慮した自己構成型マルチエージェントネットワーク:協調とネットワーク設計を同時に行うための分散劣モジュラアプローチ Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design ( http://arxiv.org/abs/2409.01411v1 ) ライセンス: Link先を確認	Zirui Xu, Vasileios Tzoumas,	(参考訳) 我々の知る限り初めて、マルチエージェントネットワークが通信トポロジを自己構成し、マルチエージェント計画におけるスケーラビリティと最適性のトレードオフをバランスさせることを可能にする厳密なアプローチを導入する。我々は,交通監視,イベント検出,環境探索といった複雑なタスクを実行するために,エージェント間通信を通じて多数の分散エージェントが協調する,ユビキタスな協調的自律性の未来を動機としている。しかし、そのような大規模ネットワークにおける情報の爆発は、既存の準最適協調アルゴリズムの計算と通信の要求によって引き起こされる非現実的な決定時間のために、現在その展開を制限している。この課題を克服するために、準最適性保証も備えたスケーラブルなアルゴリズムであるAlterNAting COordination and Network-Design Algorithm (Anaconda)を提案する。エージェントの帯域幅制約の下で、Anacondaは、ネットワークの行動協調の近似性能が最大になるように、エージェントがローカルな通信近傍を最適化することを可能にする。最先端のアルゴリズムと比較すると、Anacondaは、完全に切断されたネットワークから完全に集中化されたネットワークまで、任意のタイプのネットワークに対して準最適性の保証を定量化するエニータイムな自己構成型アルゴリズムであり、スパースネットワークでは決定速度の点で1桁高速である。このアルゴリズムを開発するために、分散化、すなわち通信を最小限に抑えた分散協調に起因する準最適性のコストを定量化する。また,多腕バンディットや,濃度制約付き劣モジュラ最大化に関する文献から着想を得たツールも採用している。我々はAnacondaを領域監視のシミュレーションシナリオで実証し、最先端のアルゴリズムと比較する。 We introduce the first, to our knowledge, rigorous approach that enables multi-agent networks to self-configure their communication topology to balance the trade-off between scalability and optimality during multi-agent planning. We are motivated by the future of ubiquitous collaborative autonomy where numerous distributed agents will be coordinating via agent-to-agent communication to execute complex tasks such as traffic monitoring, event detection, and environmental exploration. But the explosion of information in such large-scale networks currently curtails their deployment due to impractical decision times induced by the computational and communication requirements of the existing near-optimal coordination algorithms. To overcome this challenge, we present the AlterNAting COordination and Network-Design Algorithm (Anaconda), a scalable algorithm that also enjoys near-optimality guarantees. 
Subject to the agents' bandwidth constraints, Anaconda enables the agents to optimize their local communication neighborhoods such that the action-coordination approximation performance of the network is maximized. Compared to the state of the art, Anaconda is an anytime self-configurable algorithm that quantifies its suboptimality guarantee for any type of network, from fully disconnected to fully centralized, and that, for sparse networks, is an order of magnitude faster in terms of decision speed. To develop the algorithm, we quantify the suboptimality cost due to decentralization, i.e., due to communication-minimal distributed coordination. We also employ tools inspired by the literature on multi-armed bandits and submodular maximization subject to cardinality constraints. We demonstrate Anaconda in simulated scenarios of area monitoring and compare it with a state-of-the-art algorithm.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
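The abstract above refers to submodular maximization subject to cardinality constraints. For orientation, the sketch below shows the classical centralized greedy heuristic for a coverage objective, which is a standard textbook instance of that problem class. It is not the Anaconda algorithm, which is distributed and also designs the network. The function name and the toy "sensor coverage" framing are assumptions made for this example.

```python
def greedy_max_coverage(candidates, k):
    """Greedily pick up to k candidate sets, maximizing covered items.

    candidates: dict mapping a candidate name to the set of items it covers.
    Returns (chosen names in pick order, set of covered items).
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, items in candidates.items():
            if name in chosen:
                continue
            gain = len(items - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining candidate adds coverage
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered
```

For monotone submodular objectives such as coverage, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value under a cardinality constraint, which is why it is a common baseline for coordination problems of this kind.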
# スパース学習のための確率的反復型ハード閾値 Probabilistic Iterative Hard Thresholding for Sparse Learning ( http://arxiv.org/abs/2409.01413v1 ) ライセンス: Link先を確認	Matteo Bergamaschi, Andrea Cristofari, Vyacheslav Kungurtsev, Francesco Rinaldi,	(参考訳) サンプルサイズに対する次元の点でデータ条件が不利な統計モデリングでは、真のモデルに隠れたスパース性を見つけることが、正確な統計モデルを定式化する上で重要となりうる。いわゆる「l0ノルム」は、ベクトルの非ゼロ成分の数を数えるもので、最適化問題に組み込むことでスパース性を強制する、強力で信頼性の高いメカニズムである。しかし、計算上の必要性から勾配のノイズを含む推定値を評価しなければならないビッグデータ設定では、確実に収束する手法に関する文献は乏しい。本稿では,濃度制約付きの期待値目的最適化問題を解くためのアプローチを提案する。基礎となる確率過程の収束を証明し、2つの機械学習問題における性能を実証する。 For statistical modeling wherein the data regime is unfavorable in terms of dimensionality relative to the sample size, finding hidden sparsity in the ground truth can be critical in formulating an accurate statistical model. The so-called "l0 norm", which counts the number of non-zero components in a vector, is a strong, reliable mechanism for enforcing sparsity when incorporated into an optimization problem. However, in big data settings wherein noisy estimates of the gradient must be evaluated out of computational necessity, the literature is scant on methods that reliably converge. In this paper we present an approach towards solving expectation objective optimization problems with cardinality constraints. We prove convergence of the underlying stochastic process, and demonstrate the performance on two Machine Learning problems.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
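To make the setting in the abstract above concrete, here is a minimal sketch of plain iterative hard thresholding driven by mini-batch gradient estimates for a least-squares objective under a cardinality constraint. This is the generic textbook scheme, not the probabilistic variant the paper proposes, whose details are not given in the abstract. The function names, step size, batch size, and the least-squares loss are all assumptions made for this example.

```python
import random


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and set the rest to zero."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def stochastic_iht(A, b, k, step=0.1, iters=500, batch=4, seed=0):
    """Approximately minimise the mean of (a_i . x - b_i)^2 / 2 with at most k non-zeros.

    A is a list of rows, b the list of targets. Each iteration takes a noisy
    gradient step on a random mini-batch, then projects onto k-sparse vectors.
    """
    rng = random.Random(seed)
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        rows = [rng.randrange(len(A)) for _ in range(batch)]
        grad = [0.0] * n
        for r in rows:
            resid = sum(A[r][j] * x[j] for j in range(n)) - b[r]
            for j in range(n):
                grad[j] += resid * A[r][j] / batch
        x = hard_threshold([x[j] - step * grad[j] for j in range(n)], k)
    return x
```

The projection step is what enforces the "l0" constraint: after every update the iterate has at most `k` non-zero components by construction.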
# 位相ポートレート・スケッチによる常微分方程式のアクティブシンボリック発見 Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching ( http://arxiv.org/abs/2409.01416v1 ) ライセンス: Link先を確認	Nan Jiang, Md Nasim, Yexiang Xue,	(参考訳) 軌道データから常微分方程式(ODE)を発見することは、AIによる科学的発見において重要な課題である。最近のODEのシンボリックな発見法は、主に事前に収集された固定のトレーニングデータセットに依存しており、図1の実験で観察されるように、しばしば最適以下の性能をもたらす。アクティブラーニングに着想を得て,予測されたODEを評価するために有益な軌道データを問い合わせる方法を探る。ここでデータは、軌道の初期条件を指定することで得られる。カオス理論によれば、力学系の初期条件の小さな変化が大きく異なる軌道をもたらしうるため、多数の初期条件の集合を保持する必要が生じる。この課題に対処するために、位相ポートレート・スケッチによる常微分方程式のアクティブシンボリック発見(APPS)を導入する。個々の初期条件を直接選択する代わりに、APPSはまず情報量の多い領域を特定し、その領域内の初期条件のバッチをサンプリングする。従来のアクティブラーニング手法と比較して、APPSは大量のデータを保持する必要性を排除している。大規模な実験により、APPSは受動的に収集されたデータセットを用いるベースライン手法よりも正確なODE式を一貫して発見することが示された。 Discovering Ordinary Differential Equations (ODEs) from trajectory data is a crucial task in AI-driven scientific discovery. Recent methods for symbolic discovery of ODEs primarily rely on fixed training datasets collected a-priori, often leading to suboptimal performance, as observed in our experiments in Figure 1. Inspired by active learning, we explore methods for querying informative trajectory data to evaluate predicted ODEs, where data are obtained by the specified initial conditions of the trajectory. Chaos theory indicates that small changes in the initial conditions of a dynamical system can result in vastly different trajectories, necessitating the maintenance of a large set of initial conditions of the trajectory. To address this challenge, we introduce Active Symbolic Discovery of Ordinary Differential Equations via Phase Portrait Sketching (APPS). Instead of directly selecting individual initial conditions, APPS first identifies an informative region and samples a batch of initial conditions within that region. Compared to traditional active learning methods, APPS eliminates the need for maintaining a large amount of data. 
Extensive experiments demonstrate that APPS consistently discovers more accurate ODE expressions than baseline methods using passively collected datasets.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# ドントケアを利用した量子状態準備回路最適化 Quantum State Preparation Circuit Optimization Exploiting Don't Cares ( http://arxiv.org/abs/2409.01418v1 ) ライセンス: Link先を確認	Hanyu Wang, Daniel Bochen Tan, Jason Cong,	(参考訳) 量子状態の準備は量子レジスタを初期化するものであり、量子アルゴリズムの実行に不可欠である。より少ない2量子ビットゲートで量子ビットを効率的にエンタングルさせる状態準備回路を設計すれば、精度が向上し、デバイス上の結合制約が緩和される。既存の方法は初期回路を合成し、コンパイラを利用してユニタリ等価性を保ちながら回路のゲート数を削減する。本研究では、局所的なユニタリ等価性を破っても状態準備の全体的な結果が変わらない条件(すなわち、ドントケア)が量子回路内に多数存在することを明らかにする。そこで,元の回路内で置き換え可能なそのようなユニタリを特定するピープホール最適化アルゴリズムを提案する。これらのドントケア条件を利用することで、提案アルゴリズムは従来の手法に比べて2量子ビットゲート数の36%削減を達成する。 Quantum state preparation initializes the quantum registers and is essential for running quantum algorithms. Designing state preparation circuits that entangle qubits efficiently with fewer two-qubit gates enhances accuracy and alleviates coupling constraints on devices. Existing methods synthesize an initial circuit and leverage compilers to reduce the circuit's gate count while preserving the unitary equivalency. In this study, we identify numerous conditions within the quantum circuit where breaking local unitary equivalences does not alter the overall outcome of the state preparation (i.e., don't cares). We introduce a peephole optimization algorithm that identifies such unitaries for replacement in the original circuit. Exploiting these don't care conditions, our algorithm achieves a 36% reduction in the number of two-qubit gates compared to prior methods.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# フィッシャー平均化による消去符号化ニューラルネットワーク推論 Erasure Coded Neural Network Inference via Fisher Averaging ( http://arxiv.org/abs/2409.01420v1 ) ライセンス: Link先を確認	Divyansh Jhunjhunwala, Neharika Jali, Gauri Joshi, Shiqiang Wang,	(参考訳) 消去符号化コンピューティングは、サーバのストラグリングや不均一なトラフィック変動といった要因によって引き起こされるテールレイテンシを低減するために、クラウドシステムでうまく利用されてきた。現在、クラウドコンピューティングのトラフィックの大部分は共有リソース上でのニューラルネットワークの推論で構成されており、推論クエリの応答時間も同じ要因によって悪影響を受ける。しかし、現在の消去符号化技術は主に行列-ベクトルや行列-行列の乗算のような線形計算に重点を置いているため、高度に非線形なニューラルネットワーク関数には適用できない。本論文では、ニューラルネットワーク上で符号化を行う方法、すなわち、2つ以上のニューラルネットワークモデルが与えられたとき、それらの出力の線形結合を出力する符号化モデルを構築する方法の設計を目指す。我々は,この問題をKL重心(barycenter)問題として定式化し,対角フィッシャー情報を活用して,所望の出力の線形結合を近似的に出力する符号化モデルを作成する実用的なアルゴリズムCOINを提案する。実世界のビジョンデータセットで学習されたニューラルネットワークに対して消去符号化を行う実験を行い、COINを用いた復号出力の精度が、計算効率が極めて高い一方で、他のベースラインよりも著しく高いことを示す。 Erasure-coded computing has been successfully used in cloud systems to reduce tail latency caused by factors such as straggling servers and heterogeneous traffic variations. A majority of cloud computing traffic now consists of inference on neural networks on shared resources where the response time of inference queries is also adversely affected by the same factors. However, current erasure coding techniques are largely focused on linear computations such as matrix-vector and matrix-matrix multiplications and hence do not work for the highly non-linear neural network functions. In this paper, we seek to design a method to code over neural networks, that is, given two or more neural network models, how to construct a coded model whose output is a linear combination of the outputs of the given neural networks. We formulate the problem as a KL barycenter problem and propose a practical algorithm COIN that leverages the diagonal Fisher information to create a coded model that approximately outputs the desired linear combination of outputs. 
We conduct experiments to perform erasure coding over neural networks trained on real-world vision datasets and show that the accuracy of the decoded outputs using COIN is significantly higher than other baselines while being extremely compute-efficient.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
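The abstract above says the coded model is built using diagonal Fisher information. As a rough illustration of what a diagonal-Fisher-weighted combination of model parameters looks like, the sketch below averages each parameter across models with weights proportional to that parameter's Fisher value. This is the generic Fisher-weighted averaging rule; whether COIN uses exactly this form is an assumption, and the function name, the `weights` argument, and the `eps` stabiliser are hypothetical choices for this example.

```python
def fisher_average(params, fishers, weights=None, eps=1e-8):
    """Merge several parameter vectors using per-parameter Fisher weights.

    params:  list of parameter vectors, one per model (equal lengths).
    fishers: list of diagonal Fisher estimates, same shapes as params.
    weights: optional per-model mixing coefficients (defaults to uniform).
    """
    m = len(params)
    weights = weights or [1.0 / m] * m
    merged = []
    for j in range(len(params[0])):
        num = sum(w * f[j] * p[j] for w, f, p in zip(weights, fishers, params))
        den = sum(w * f[j] for w, f in zip(weights, fishers))
        merged.append(num / (den + eps))  # eps guards against a zero denominator
    return merged
```

Parameters that one model is very sensitive to (large Fisher value) are pulled toward that model's value, while parameters with similar sensitivity across models end up near the plain average.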
# DiffCSG:ラスタライゼーションによる微分可能なCSG DiffCSG: Differentiable CSG via Rasterization ( http://arxiv.org/abs/2409.01421v1 ) ライセンス: Link先を確認	Haocheng Yuan, Adrien Bousseau, Hao Pan, Chengquan Zhang, Niloy J. Mitra, Changjian Li,	(参考訳) 微分可能レンダリングは、シーンパラメータ(形状、材料、照明)をターゲット画像に最もよく適合するように最適化できるため、逆レンダリングと機械学習の鍵となる要素である。微分可能レンダリングでは、各シーンパラメータが微分可能な操作を通じてピクセル値に関連付けられる必要がある。 3Dメッシュレンダリングアルゴリズムは微分可能な形で実装されているが、基礎となるブール演算が通常、複雑なブラックボックスのメッシュ処理ライブラリで実行されるため、これらのアルゴリズムは広く使われている形状のパラメトリック表現であるコンストラクティブ・ソリッド・ジオメトリー(CSG)には直接拡張されない。 CSGモデルを微分可能な形でレンダリングするアルゴリズムDiffCSGを提案する。我々のアルゴリズムはCSGラスタライゼーションに基づいており、結果として得られるメッシュを明示的に計算することなくプリミティブ間のブール演算の結果を表示するため、ブラックボックスのメッシュ処理を回避できる。本稿では,CSGラスタライゼーションを微分可能レンダリングパイプライン内に実装する方法について述べ,特にプリミティブの交差部に沿ってアンチエイリアシングを適用し,そうした重要な領域で勾配を得られるよう配慮する。我々のアルゴリズムはシンプルで高速で、現代の機械学習の構成に容易に組み込むことができ、CSGプリミティブの直接編集および画像ベースの編集を含む、コンピュータ支援設計のための幅広いアプリケーションを可能にする。コードとデータ:https://yyyyyhc.github.io/DiffCSG/。 Differentiable rendering is a key ingredient for inverse rendering and machine learning, as it allows scene parameters (shape, materials, lighting) to be optimized to best fit target images. Differentiable rendering requires that each scene parameter relates to pixel values through differentiable operations. While 3D mesh rendering algorithms have been implemented in a differentiable way, these algorithms do not directly extend to Constructive-Solid-Geometry (CSG), a popular parametric representation of shapes, because the underlying boolean operations are typically performed with complex black-box mesh-processing libraries. We present an algorithm, DiffCSG, to render CSG models in a differentiable manner. Our algorithm builds upon CSG rasterization, which displays the result of boolean operations between primitives without explicitly computing the resulting mesh and, as such, bypasses black-box mesh processing. We describe how to implement CSG rasterization within a differentiable rendering pipeline, taking special care to apply antialiasing along primitive intersections to obtain gradients in such critical areas. 
Our algorithm is simple and fast, can be easily incorporated into modern machine learning setups, and enables a range of applications for computer-aided design, including direct and image-based editing of CSG primitives. Code and data: https://yyyyyhc.github.io/DiffCSG/.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# 拡散モデルと近似政策最適化の統合による強化学習におけるサンプル効率の向上と探索 Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization ( http://arxiv.org/abs/2409.01427v1 ) ライセンス: Link先を確認	Gao Tianci, Dmitriev D. Dmitry, Konstantin A. Neusypin, Yang Bo, Rao Shengren,	(参考訳) 強化学習(RL)の最近の進歩は、特に高次元および複雑なタスクにおいて、大規模データとディープニューラルネットワークによって加速されている。 PPO(Proximal Policy Optimization)のようなオンラインRL手法は動的シナリオでは有効であるが、かなりのリアルタイムデータを必要とする。 Offline RLは、大規模なデータセットからの事前学習ポリシーによってこの問題に対処するが、その成功はデータの品質と多様性に依存している。本研究では,オフラインデータセットのための高品質な仮想トラジェクトリを生成するために拡散モデルを組み込むことにより,PPOアルゴリズムを強化するフレームワークを提案する。このアプローチは探索とサンプル効率を改善し、複雑なタスクにおける累積報酬、収束速度、戦略安定性を大きく向上させる。 RLにおける拡散モデルの可能性、特にオフラインデータセットについて検討し、オンラインRLをオフライン環境に拡張し、拡散モデルによるPPOの性能改善を実験的に検証する。これらの知見は、RLを高次元の複雑なタスクに適用するための新しい洞察と方法を提供する。最後に、私たちはコードをhttps://github.com/TianciGao/DiffPPOでオープンソース化しました。 Recent advancements in reinforcement learning (RL) have been fueled by large-scale data and deep neural networks, particularly for high-dimensional and complex tasks. Online RL methods like Proximal Policy Optimization (PPO) are effective in dynamic scenarios but require substantial real-time data, posing challenges in resource-constrained or slow simulation environments. Offline RL addresses this by pre-learning policies from large datasets, though its success depends on the quality and diversity of the data. This work proposes a framework that enhances PPO algorithms by incorporating a diffusion model to generate high-quality virtual trajectories for offline datasets. This approach improves exploration and sample efficiency, leading to significant gains in cumulative rewards, convergence speed, and strategy stability in complex tasks. 
Our contributions are threefold: we explore the potential of diffusion models in RL, particularly for offline datasets, extend the application of online RL to offline environments, and experimentally validate the performance improvements of PPO with diffusion models. These findings provide new insights and methods for applying RL to high-dimensional, complex tasks. Finally, we open-source our code at https://github.com/TianciGao/DiffPPO	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# グラフ上の凸ラベリングの自己指向学習 Self-Directed Learning of Convex Labelings on Graphs ( http://arxiv.org/abs/2409.01428v1 ) ライセンス: Link先を確認	Georgy Sokolov, Maximilian Thiessen, Margarita Akhmejanova, Fabio Vitale, Francesco Orabona,	(参考訳) 本研究では,自己指向型学習の設定において,与えられたグラフのクラスタを学習する問題について検討する。この学習設定はオンライン学習の変種であり、ノードが提示される順序を敵対者が決定するのではなく、学習者が自律的かつ適応的に選択する。ユークリッド半空間、線形関数、一般の抽象的な多クラス仮説クラスの自己指向学習は近年検討されているが、グラフ上の自己指向ノード分類に特化した結果はこれまで存在しなかった。本稿では,効率的なアルゴリズムを開発することで,この問題に取り組む。より具体的には、(測地的に)凸なクラスタの場合、すなわち、同じラベルを共有する任意の2つのノードについて、その間のすべての最短経路上のすべてのノードも同じラベルを共有する場合に焦点を当てる。特に、2つの凸クラスタを持つグラフに対して、わずか$3(h(G)+1)^4 \ln n$回の誤りしか犯さない多項式時間アルゴリズムを考案する。ここで$n$はノードの総数、$h(G)$はHadwiger数、すなわちグラフ$G$の最大のクリークマイナーのサイズである。また、我々のアルゴリズムはクラスタがわずかに非凸である場合にも頑健であり、なおも$n$に対して対数的な誤り上界を達成することを示す。最後に、強く連結されたノードが同じクラスに属する傾向にある、より標準的なホモフィリックなクラスタの場合について、単純で効率的なアルゴリズムを考案する。 We study the problem of learning the clusters of a given graph in the self-directed learning setup. This learning setting is a variant of online learning, where rather than an adversary determining the sequence in which nodes are presented, the learner autonomously and adaptively selects them. While self-directed learning of Euclidean halfspaces, linear functions, and general abstract multi-class hypothesis classes was recently considered, no results previously existed specifically for self-directed node classification on graphs. In this paper, we address this problem by developing efficient algorithms for it. More specifically, we focus on the case of (geodesically) convex clusters, i.e., for every two nodes sharing the same label, all nodes on every shortest path between them also share the same label. In particular, we devise a polynomial-time algorithm that makes only $3(h(G)+1)^4 \ln n$ mistakes on graphs with two convex clusters, where $n$ is the total number of nodes and $h(G)$ is the Hadwiger number, i.e., the size of the largest clique minor of the graph $G$. We also show that our algorithm is robust to the case that clusters are slightly non-convex, still achieving a mistake bound logarithmic in $n$. 
Finally, for the more standard case of homophilic clusters, where strongly connected nodes tend to belong to the same class, we devise a simple and efficient algorithm.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# 量子同期ダイナミクスにおけるクビット速度の影響の検討 Investigating the Impact of Qubit Velocity on Quantum Synchronization Dynamics ( http://arxiv.org/abs/2409.01429v1 ) ライセンス: Link先を確認	Amir Hossein Houshmand Almani, Alireza Nourmandipour, Ali Mortezapour,	(参考訳) 本稿では,フシミ$Q$関数を用いて位相空間の進化を解析し,散逸性空洞環境における移動量子ビットの量子同期ダイナミクスについて検討する。本研究は, キュービット速度とシステムデチューニングの相互作用に着目し, キュービットと空洞場の結合強度の範囲について検討した。弱い結合状態(\lambda = 5\gamma$)では、システムは急速にデコヒールし、最小限の同期を示す。逆に、強い結合状態(\lambda = 0.01\gamma$)では、位相ロックや発振挙動を含むより複雑なダイナミクスが観察され、より優れた同期度を示す。この結果から,量子系のコヒーレンスを高めるための潜在的な経路として,量子ビットの速度とデチューンの影響が示唆された。これらの洞察は量子コンピューティングや量子通信に重要な意味を持ち、同期制御は量子状態の安定性とセキュリティを向上させる。さらに、環境パラメータを通して同期を操作する能力は、量子状態の正確な制御が不可欠である量子力学およびセンシングにおける新しい応用を示唆している。 We investigate the quantum synchronization dynamics of a moving qubit within a dissipative cavity environment, leveraging the Husimi $Q$-function to analyze the phase space evolution. The study explores a range of coupling strengths between the qubit and the cavity field, focusing on the interplay between qubit velocity and system detuning. In the weak coupling regime ($\lambda = 5\gamma$), the system rapidly decoheres, exhibiting minimal synchronization. Conversely, in the strong coupling regime ($\lambda = 0.01\gamma$), we observe more intricate dynamics, including phase locking and oscillatory behavior, indicating a better degree of synchronization. Our findings reveal that the qubit's velocity and detuning influence synchronization, offering potential pathways to enhance coherence in quantum systems. These insights have significant implications for quantum computing and quantum communication, where controlling synchronization can improve the stability and security of quantum states. Moreover, the ability to manipulate synchronization through environmental parameters suggests new applications in quantum metrology and sensing, where precise control of quantum states is essential.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# 層適応スカラー化モデルアグリゲーションによるビザンチンレジリエント・フェデレーション学習の実現 Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation ( http://arxiv.org/abs/2409.01435v1 ) ライセンス: Link先を確認	Jiahao Xu, Zikai Zhang, Rui Hu,	(参考訳) フェデレートラーニング(FL)は、複数のクライアントがローカルデータを共有せずに、協調的にモデルをトレーニングすることを可能にする。しかし、FLシステムは、悪質なモデルの更新をアップロードすることでモデルのトレーニングプロセスを妨害することを目的とした、よく設計されたByzantine攻撃に対して脆弱である。既存のロバストアグリゲーションルールに基づく防御手法は、モデル更新の異なる層にまたがる大きさと方向の多様性を見落とし、特に非IID設定において、ロバスト性のパフォーマンスが制限される。これらの課題に対処するため、我々は、階層的適応アグリゲーションと事前アグリゲーション・スパレーションを組み合わせたLayer-Adaptive Sparsified Model Aggregation (LASA)アプローチを提案する。特に、LASAには、アグリゲーション前の各クライアントからの更新をスペーシングし、悪意のあるパラメータの影響を減らし、その後のフィルタリングプロセスにおいて重要でないパラメータから干渉を最小限にする事前集約スペーシフィケーションモジュールが含まれている。分割された更新に基づいて、レイヤワイド適応フィルタは、アグリゲーションのためにすべてのクライアントにわたって、大きさと方向の両方のメトリクスを使用して、良質な層を適応的に選択する。 LASA の詳細な理論的堅牢性解析と LASA と統合された FL のレジリエンス解析について述べる。様々なIIDおよび非IIDデータセットに対して大規模な実験を行う。その結果,LASAの有効性が示された。コードは \url{https://github.com/JiiahaoXU/LASA} で公開されている。 Federated Learning (FL) enables multiple clients to collaboratively train a model without sharing their local data. Yet the FL system is vulnerable to well-designed Byzantine attacks, which aim to disrupt the model training process by uploading malicious model updates. Existing robust aggregation rule-based defense methods overlook the diversity of magnitude and direction across different layers of the model updates, resulting in limited robustness performance, particularly in non-IID settings. To address these challenges, we propose the Layer-Adaptive Sparsified Model Aggregation (LASA) approach, which combines pre-aggregation sparsification with layer-wise adaptive aggregation to improve robustness. Specifically, LASA includes a pre-aggregation sparsification module that sparsifies updates from each client before aggregation, reducing the impact of malicious parameters and minimizing the interference from less important parameters for the subsequent filtering process. 
Based on sparsified updates, a layer-wise adaptive filter then adaptively selects benign layers using both magnitude and direction metrics across all clients for aggregation. We provide the detailed theoretical robustness analysis of LASA and the resilience analysis for the FL integrated with LASA. Extensive experiments are conducted on various IID and non-IID datasets. The numerical results demonstrate the effectiveness of LASA. Code is available at https://github.com/JiiahaoXU/LASA.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# Kvasir-VQA: テキストイメージのペアGIトラクトデータセット Kvasir-VQA: A Text-Image Pair GI Tract Dataset ( http://arxiv.org/abs/2409.01437v1 ) ライセンス: Link先を確認	Sushant Gautam, Andrea Storås, Cise Midoglu, Steven A. Hicks, Vajira Thambawita, Pål Halvorsen, Michael A. Riegler,	(参考訳) 我々は,HyperKvasirとKvasir-Instrumentのデータセットから派生した拡張データセットであるKvasir-VQAを紹介した。各種GI路条件および手術器具にまたがる6,500個の注釈付き画像からなるデータセットで、イエス/ノー、選択、位置、数値を含む複数の質問タイプをサポートする。データセットは、画像キャプション、視覚質問回答(VQA)、テキストベースの合成医療画像の生成、オブジェクト検出、分類などのアプリケーションを対象としている。本実験は, 3つのタスクのトレーニングモデルにおけるデータセットの有効性を実証し, 医用画像解析および診断における重要な応用を示すものである。また、各タスクの評価指標を示し、データセットのユーザビリティと汎用性を強調します。データセットとサポートアーティファクトはhttps://datasets.simula.no/kvasir-vqaで公開されている。 We introduce Kvasir-VQA, an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations to facilitate advanced machine learning tasks in Gastrointestinal (GI) diagnostics. This dataset comprises 6,500 annotated images spanning various GI tract conditions and surgical instruments, and it supports multiple question types including yes/no, choice, location, and numerical count. The dataset is intended for applications such as image captioning, Visual Question Answering (VQA), text-based generation of synthetic medical images, object detection, and classification. Our experiments demonstrate the dataset's effectiveness in training models for three selected tasks, showcasing significant applications in medical image analysis and diagnostics. We also present evaluation metrics for each task, highlighting the usability and versatility of our dataset. The dataset and supporting artifacts are available at https://datasets.simula.no/kvasir-vqa.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# 回路レベルの雑音下での量子LDPC符号のニア線形時間復号アルゴリズム An almost-linear time decoding algorithm for quantum LDPC codes under circuit-level noise ( http://arxiv.org/abs/2409.01440v1 ) ライセンス: Link先を確認	Antonio deMarti iOlius, Imanol Etxezarreta Martinez, Joschka Roffe, Josu Etxezarreta Martinez,	(参考訳) フォールトトレラントな量子コンピュータは、量子エラー補正測定情報をリアルタイムで復号する古典的コプロセッサと組み合わせて設計されなければならない。本研究では、量子低密度パリティチェック符号のほぼ線形時間デコーダとして、信念伝搬と順序付きタナーフォレスト(BP+OTF)アルゴリズムを導入する。 OTF後処理段階は、木のような構造になるまで、復号グラフからキュービットを除去する。ループフリーなOTFグラフがシンドロームを生成する量子ビットのサブセットをサポートすると、BP復号が収束することが保証される。回路レベルのノイズ下での性能を向上させるため,検出誤差モデルをスペーシングする手法を提案する。本手法は、転送行列を用いて、全検出器グラフから疎化グラフへのソフト情報をマッピングし、シンドローム抽出回路からの致命的なエラー伝搬情報を保存する。我々のBP+OTF実装は、まず標準BPを全検出器グラフに適用し、続いてスパシファイドグラフ上でBP+OTF後処理を行った。数値シミュレーションにより,BP+OTFデコーダは,全段階にわたってほぼ線形な実行複雑性を維持しつつ,最先端の反転型デコーダの桁数で論理誤差の抑制を実現することを示した。 Fault-tolerant quantum computers must be designed in conjunction with classical co-processors that decode quantum error correction measurement information in real-time. In this work, we introduce the belief propagation plus ordered Tanner forest (BP+OTF) algorithm as an almost-linear time decoder for quantum low-density parity-check codes. The OTF post-processing stage removes qubits from the decoding graph until it has a tree-like structure. Provided that the resultant loop-free OTF graph supports a subset of qubits that can generate the syndrome, BP decoding is then guaranteed to converge. To enhance performance under circuit-level noise, we introduce a technique for sparsifying detector error models. This method uses a transfer matrix to map soft information from the full detector graph to the sparsified graph, preserving critical error propagation information from the syndrome extraction circuit. Our BP+OTF implementation first applies standard BP to the full detector graph, followed by BP+OTF post-processing on the sparsified graph. 
Numerical simulations show that the BP+OTF decoder achieves logical error suppression within an order of magnitude of state-of-the-art inversion-based decoders while maintaining almost-linear runtime complexity across all stages.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# 波長下格子における有効幾何学的フラストレーションと長距離相互作用からの多体位相 Many-body phases from effective geometrical frustration and long-range interactions in a subwavelength lattice ( http://arxiv.org/abs/2409.01443v1 ) ライセンス: Link先を確認	Domantas Burba, Gediminas Juzeliūnas, Ian B. Spielman, Luca Barbiero,	(参考訳) 幾何学的なフラストレーションと長距離結合は、物理学を通して異なる性質を持つ量子相を作るための鍵となる貢献者である。両成分がラマン誘導サブ波長格子に自然に出現する手法を提案する。ラマン結合型多成分量子ガスは、長距離相互作用を持つ非常に多目的なフラストレーションを持つハバード・ハミルトニアンを実現できることを最初に実証した。深いサブ波長の格子周期は、調整可能な範囲と崩壊を伴う強い長距離粒子間反発をもたらす。フラストレーションと長距離結合の組み合わせは, 共振器の多体相を生成することを数値的に示している。この結果は,量子シミュレーションにおける長距離相互作用とフラストレーションを効率的に組み合わせるための強力なアプローチである。 Geometrical frustration and long-range couplings are key contributors to create quantum phases with different properties throughout physics. We propose a scheme where both ingredients naturally emerge in a Raman induced subwavelength lattice. We first demonstrate that Raman-coupled multicomponent quantum gases can realize a highly versatile frustrated Hubbard Hamiltonian with long-range interactions. The deeply subwavelength lattice period leads to strong long-range interparticle repulsion with tunable range and decay. We numerically demonstrate that the combination of frustration and long-range couplings generates many-body phases of bosons, including a range of density-wave and superfluid phases with broken translational and time reversal symmetries, respectively. Our results thus represent a powerful approach for efficiently combining long-range interactions and frustration in quantum simulations.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
# ケースミックス変化下における予測モデルの性能に関する因果的視点--予後と診断予測に異なる対応性を示す識別と校正 A causal viewpoint on prediction model performance under changes in case-mix: discrimination and calibration respond differently for prognosis and diagnosis predictions ( http://arxiv.org/abs/2409.01444v1 ) ライセンス: Link先を確認	Wouter A. C. van Amsterdam,	(参考訳) 予測モデルは、診断、予後、および治療計画において重要な臨床的決定を通知する。これらのモデルの予測性能は通常、識別と校正によって評価される。しかし、データの分布の変化はモデルの性能に影響を与える。医療における典型的な変化はケースミックスの変化であり、例えば、心臓血管のリスク管理では、一般開業医は、第3次病院の専門医とは異なる患者の混在を見出している。本研究は,予測タスクの因果方向に基づいて,ケースミックスシフトが識別と校正に与える影響を区別する新しい枠組みを提案する。予測が因果方向にある場合(しばしば予後予測の場合)、キャリブレーションはケースミックスシフトの下で安定するが、識別はしない。逆に、反因果方向(しばしば診断予測を伴う)で予測する場合、識別は安定しているが、校正は行われない。循環器疾患予測モデルを用いたシミュレーション研究と実証検証により,この枠組みの意義が示された。この枠組みは, 様々な臨床領域における予測モデルの評価と展開に重要な洞察を与え, 予測課題の因果構造を理解することの重要性を強調している。 Prediction models inform important clinical decisions, aiding in diagnosis, prognosis, and treatment planning. The predictive performance of these models is typically assessed through discrimination and calibration. However, changes in the distribution of the data impact model performance. In healthcare, a typical change is a shift in case-mix: for example, for cardiovascular risk management, a general practitioner sees a different mix of patients than a specialist in a tertiary hospital. This work introduces a novel framework that differentiates the effects of case-mix shifts on discrimination and calibration based on the causal direction of the prediction task. When prediction is in the causal direction (often the case for prognosis predictions), calibration remains stable under case-mix shifts, while discrimination does not. Conversely, when predicting in the anti-causal direction (often with diagnosis predictions), discrimination remains stable, but calibration does not. A simulation study and empirical validation using cardiovascular disease prediction models demonstrate the implications of this framework. 
This framework provides critical insights for evaluating and deploying prediction models across different clinical settings, emphasizing the importance of understanding the causal structure of the prediction task.	翻訳日:2024-09-06 04:02:22 公開日:2024-09-02
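The stability claims in this abstract can be checked on a toy example. The sketch below is not the paper's simulation study: the three-valued feature, the risk values, and every function name are assumptions chosen for illustration. It computes AUC (discrimination) and the largest gap between predicted and true risk (calibration) exactly from small discrete distributions, once for a causal task where P(Y|X) is fixed and P(X) shifts, and once for an anti-causal task where P(X|Y) is fixed and the prevalence P(Y) shifts.

```python
from itertools import product

def auc(score, px_y1, px_y0):
    """Exact AUC: P(score of a case > score of a control) + 0.5 * P(tie)."""
    total = 0.0
    for x1, x0 in product(px_y1, px_y0):
        weight = px_y1[x1] * px_y0[x0]
        if score[x1] > score[x0]:
            total += weight
        elif score[x1] == score[x0]:
            total += 0.5 * weight
    return total

def class_conditionals(px, py_x):
    """Bayes' rule: P(X|Y=1) and P(X|Y=0) from P(X) and P(Y=1|X)."""
    p1 = sum(px[x] * py_x[x] for x in px)
    px_y1 = {x: px[x] * py_x[x] / p1 for x in px}
    px_y0 = {x: px[x] * (1 - py_x[x]) / (1 - p1) for x in px}
    return px_y1, px_y0

def posterior(prev, px_y1, px_y0):
    """Bayes' rule: P(Y=1|X) from the prevalence P(Y=1) and P(X|Y)."""
    return {x: prev * px_y1[x] / (prev * px_y1[x] + (1 - prev) * px_y0[x])
            for x in px_y1}

def marginal(prev, px_y1, px_y0):
    """P(X) implied by the prevalence and P(X|Y)."""
    return {x: prev * px_y1[x] + (1 - prev) * px_y0[x] for x in px_y1}

def max_calibration_gap(model, truth):
    """Largest absolute difference between predicted and true risk."""
    return max(abs(model[x] - truth[x]) for x in model)

# Causal task (prognosis-like): P(Y|X) is the invariant mechanism and the
# case-mix shift changes P(X). The model predicts the true P(Y=1|X).
risk = {0: 0.1, 1: 0.5, 2: 0.9}
px_source = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
px_target = {0: 0.1, 1: 0.8, 2: 0.1}
causal_auc_source = auc(risk, *class_conditionals(px_source, risk))
causal_auc_target = auc(risk, *class_conditionals(px_target, risk))
# P(Y|X) is unchanged by construction, so the calibration gap is zero.
causal_gap_target = max_calibration_gap(risk, risk)

# Anti-causal task (diagnosis-like): P(X|Y) is invariant and the case-mix
# shift changes the prevalence. The model was fitted at prevalence 0.5.
px_y1 = {0: 0.1, 1: 0.3, 2: 0.6}
px_y0 = {0: 0.6, 1: 0.3, 2: 0.1}
model = posterior(0.5, px_y1, px_y0)
truth_target = posterior(0.1, px_y1, px_y0)
anti_auc_source = auc(model, *class_conditionals(marginal(0.5, px_y1, px_y0), model))
anti_auc_target = auc(model, *class_conditionals(marginal(0.1, px_y1, px_y0), truth_target))
anti_gap_target = max_calibration_gap(model, truth_target)

print(causal_auc_source, causal_auc_target, causal_gap_target)
print(anti_auc_source, anti_auc_target, anti_gap_target)
```

In this toy setting the causal task keeps a zero calibration gap while its AUC drops after the shift, and the anti-causal task keeps its AUC while the calibration gap becomes large, which mirrors the pattern the abstract describes.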
# 海からのシンク:大規模データセットからビデオを取り出す Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets ( http://arxiv.org/abs/2409.01445v1 ) ライセンス: Link先を確認	Ishan Rajendrakumar Dave, Fabian Caba Heilbron, Mubarak Shah, Simon Jenni,	(参考訳) テンポラルビデオアライメントは、オブジェクトのインタラクションやアクションフェーズの遷移といった重要なイベントを2つのビデオで同期することを目的としている。このような方法は様々なビデオ編集、処理、理解の作業に役立てることができる。しかし、既存のアプローチは、アライメントに適したビデオペアが与えられるという制限的な仮定の下で動作し、より広範な適用性を著しく制限する。そこで我々は,時間的アライメントを検索問題として再検討し,AVR(Alignable Video Retrieval)の課題を紹介した。クエリビデオが与えられた場合、我々は大量のクリップから整合しやすいビデオを識別し、時間的にクエリに同期させることができる。これを達成するために、私たちは3つの重要な貢献をします。 1) ビデオ整合性指標であるDRAQを導入し, 最適な整合性ビデオを特定し, 再ランク付けする。 2)複数のオフザシェルフ特徴表現のアライメント性能を改善するために,有効で一般化可能なフレームレベルの映像特徴設計を提案する。 3) サイクル一貫性メトリクスを用いたAVRのための新しいベンチマークと評価プロトコルを提案する。大規模なKinetics700を含む3つのデータセットに関する実験は、多様なデータセットから調整可能なビデオペアを識別する手法の有効性を実証する。 Project Page: https://daveishan.github.io/avr-webpage/ Temporal video alignment aims to synchronize the key events like object interactions or action phase transitions in two videos. Such methods could benefit various video editing, processing, and understanding tasks. However, existing approaches operate under the restrictive assumption that a suitable video pair for alignment is given, significantly limiting their broader applicability. To address this, we re-pose temporal alignment as a search problem and introduce the task of Alignable Video Retrieval (AVR). Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query. To achieve this, we make three key contributions: 1) we introduce DRAQ, a video alignability indicator to identify and re-rank the best alignable video from a set of candidates; 2) we propose an effective and generalizable frame-level video feature design to improve the alignment performance of several off-the-shelf feature representations, and 3) we propose a novel benchmark and evaluation protocol for AVR using cycle-consistency metrics. 
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach in identifying alignable video pairs from diverse datasets. Project Page: https://daveishan.github.io/avr-webpage/.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# 多出力混合回帰と分類を用いたランドスケープ対応自動アルゴリズム構成 Landscape-Aware Automated Algorithm Configuration using Multi-output Mixed Regression and Classification ( http://arxiv.org/abs/2409.01446v1 ) ライセンス: Link先を確認	Fu Xing Long, Moritz Frenzel, Peter Krause, Markus Gitterle, Thomas Bäck, Niki van Stein,	(参考訳) ランドスケープ・アウェア・アルゴリズム選択問題において,特徴に基づく予測モデルの有効性は,実践的応用のためのトレーニングデータの表現性に強く依存する。本研究では,モデル学習におけるランダム生成関数 (RGF) の可能性について検討する。これは,広く使用されているブラックボックス最適化ベンチマーク (BBOB) スイートと比較して,より多様な最適化問題クラスをカバーする。問題インスタンスのランドスケープ特性に基づいて最適なアルゴリズムを選択し、そのハイパーパラメータを微調整する自動アルゴリズム構成(AAC)に焦点を当てる。 RGFやマルチアフィンBBOB(MA-BBOB)関数などの異なるトレーニングデータセットを用いて,多出力混合回帰および分類タスクの処理における高密度ニューラルネットワーク(NN)モデルの性能を解析した。 5d と 20d の BBOB 関数に関する結果に基づいて,提案手法を用いて,最適に近い構成を同定することができる。さらに、予測された構成は、多くの場合、単一最適解法と競合する。全体としては、RGFとMA-BBOB関数の組み合わせでトレーニングされたNNモデルを使用することで、より優れたパフォーマンスを持つ構成を最もよく識別することができる。 In landscape-aware algorithm selection problem, the effectiveness of feature-based predictive models strongly depends on the representativeness of training data for practical applications. In this work, we investigate the potential of randomly generated functions (RGF) for the model training, which cover a much more diverse set of optimization problem classes compared to the widely-used black-box optimization benchmarking (BBOB) suite. Correspondingly, we focus on automated algorithm configuration (AAC), that is, selecting the best suited algorithm and fine-tuning its hyperparameters based on the landscape features of problem instances. Precisely, we analyze the performance of dense neural network (NN) models in handling the multi-output mixed regression and classification tasks using different training data sets, such as RGF and many-affine BBOB (MA-BBOB) functions. Based on our results on the BBOB functions in 5d and 20d, near optimal configurations can be identified using the proposed approach, which can most of the time outperform the off-the-shelf default configuration considered by practitioners with limited knowledge about AAC. 
Furthermore, the predicted configurations are competitive against the single best solver in many cases. Overall, configurations with better performance can be best identified by using NN models trained on a combination of RGF and MA-BBOB functions.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# ゼロサム確率ゲームにおけるペイオフ型独立学習の終局収束 Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games ( http://arxiv.org/abs/2409.01447v1 ) ライセンス: Link先を確認	Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman,	(参考訳) 本稿では,2人プレイヤゼロサム行列と確率ゲームについて考察し,2人プレイヤ間のペイオフベース,収束,有理,対称な学習ダイナミクスを開発する。具体的には、行列ゲームに対する学習ダイナミクスは、スムーズ化された最適応答ダイナミクスに基づいており、一方確率ゲームに対する学習ダイナミクスは、行列ゲームに対する学習ダイナミクスの上に構築され、最小値の反復を付加する。我々の知る限り、我々の理論的結果は、最後の保証付き学習力学の有限サンプル解析を初めて提示する。行列ゲーム設定では、結果は、ナッシュ分布を見つけるために$O(\epsilon^{-1})$のサンプル複雑性と、ナッシュ平衡を求めるために$O(\epsilon^{-8})$のサンプル複雑性を意味する。確率ゲーム設定では、結果はナッシュ均衡を求めるために$O(\epsilon^{-8})$のサンプル複雑性をも意味している。これらの結果を確立するために、主な課題は、(おそらく)異なる時間スケールで進化する複数の結合および確率的反復からなる確率近似アルゴリズムを扱うことである。この課題を克服するため,我々は,確率近似アルゴリズムの収束挙動を研究対象とする,リアプノフをベースとした手法を開発した。 In this paper, we consider two-player zero-sum matrix and stochastic games and develop learning dynamics that are payoff-based, convergent, rational, and symmetric between the two players. Specifically, the learning dynamics for matrix games are based on the smoothed best-response dynamics, while the learning dynamics for stochastic games build upon those for matrix games, with additional incorporation of the minimax value iteration. To our knowledge, our theoretical results present the first finite-sample analysis of such learning dynamics with last-iterate guarantees. In the matrix game setting, the results imply a sample complexity of $O(\epsilon^{-1})$ to find the Nash distribution and a sample complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium. In the stochastic game setting, the results also imply a sample complexity of $O(\epsilon^{-8})$ to find a Nash equilibrium. To establish these results, the main challenge is to handle stochastic approximation algorithms with multiple sets of coupled and stochastic iterates that evolve on (possibly) different time scales. 
To overcome this challenge, we developed a coupled Lyapunov-based approach, which may be of independent interest to the broader community studying the convergence behavior of stochastic approximation algorithms.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# FinePseudo: 半監督された細粒度行動認識のための時間的合理性による擬似ラベルの改善 FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition ( http://arxiv.org/abs/2409.01448v1 ) ライセンス: Link先を確認	Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Mubarak Shah,	(参考訳) 実生活における行動認識の応用は、スポーツ分析、AR/VRにおけるユーザーインタラクション、手術ビデオなど、微妙な動きのきめ細かい理解を必要とすることが多い。微粒な動作はアノテートするのによりコストがかかるが、既存の半監督的な動作認識は主に粗粒な動作認識に焦点を当てている。シーンバイアスがないため、きめ細かいアクションはより難しいため、これらのアクションを分類するにはアクションフェーズを理解する必要がある。したがって、既存の粗い半教師付き手法は効果的に機能しない。本研究は, 半教師付き細粒度行動認識(FGAR)を初めて徹底的に検討した。我々は、動的時間ワープ(DTW)のようなアライメント距離が、FGARで以前は明らかにされていなかった、きめ細かい動作を比較するのに適したアクションフェーズアウェア尺度を提供することを観察した。しかし、通常のDTW距離はペアワイズであり、ペア間の厳密なアライメントを前提としているため、きめ細かい動作の分類にはあまり適していない。このようなアライメント距離を限定ラベル設定で利用するために,きめ細かい動作のペアを効果的に識別するアライナビリティ検証に基づくメトリック学習手法を提案する。学習可能な整合性スコアは、一次ビデオエンコーダの擬似ラベルを洗練するために、より優れた位相認識尺度を提供する。協調的な擬似ラベルベースのフレームワークである FinePseudo は,4つのきめ細かいアクション認識データセットである Diving48, FineGym99, FineGym288, FineDiving において従来手法を著しく上回り,既存の粗粒度データセットである Kinetics400 と Something-SomethingV2 でも改善を示す。また、オープンワールドの半教師付きセットアップにおいて、新しい未ラベルクラスを扱う上で、協調的な擬似ラベルの堅牢性を示す。 Project Page: https://daveishan.github.io/finepsuedo-webpage/ Real-life applications of action recognition often require a fine-grained understanding of subtle movements, e.g., in sports analytics, user interactions in AR/VR, and surgical videos. Although fine-grained actions are more costly to annotate, existing semi-supervised action recognition has mainly focused on coarse-grained action recognition. Since fine-grained actions are more challenging due to the absence of scene bias, classifying these actions requires an understanding of action-phases. Hence, existing coarse-grained semi-supervised methods do not work effectively. In this work, we thoroughly investigate semi-supervised fine-grained action recognition (FGAR) for the first time. 
We observe that alignment distances like dynamic time warping (DTW) provide a suitable action-phase-aware measure for comparing fine-grained actions, a concept previously unexploited in FGAR. However, since regular DTW distance is pairwise and assumes strict alignment between pairs, it is not directly suitable for classifying fine-grained actions. To utilize such alignment distances in a limited-label setting, we propose an Alignability-Verification-based Metric learning technique to effectively discriminate between fine-grained action pairs. Our learnable alignability score provides a better phase-aware measure, which we use to refine the pseudo-labels of the primary video encoder. Our collaborative pseudo-labeling-based framework FinePseudo significantly outperforms prior methods on four fine-grained action recognition datasets: Diving48, FineGym99, FineGym288, and FineDiving, and shows improvement on existing coarse-grained datasets: Kinetics400 and Something-SomethingV2. We also demonstrate the robustness of our collaborative pseudo-labeling in handling novel unlabeled classes in open-world semi-supervised setups. Project Page: https://daveishan.github.io/finepsuedo-webpage/.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
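For readers unfamiliar with the alignment distance this abstract refers to, the following is a minimal sketch of classic dynamic time warping. It is the textbook pairwise DTW, not the learnable alignability score proposed in the paper. The scalar per-frame values and the name `dtw_distance` are assumptions made for illustration; real video pipelines would compare per-frame feature vectors.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# The same "action phases" performed at two different speeds.
fast = [0, 1, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw_distance(fast, slow))
```

Because DTW may repeat frames, the two sequences above align at zero cost even though their lengths differ, which is the phase-aware behaviour the abstract points to. The distance is still defined only for a pair of sequences, which is the limitation the abstract notes for classification.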
# 強化学習におけるトレース単位を用いたリアルタイム反復学習 Real-Time Recurrent Learning using Trace Units in Reinforcement Learning ( http://arxiv.org/abs/2409.01449v1 ) ライセンス: Link先を確認	Esraa Elelimy, Adam White, Michael Bowling, Martha White,	(参考訳) リカレントニューラルネットワーク(RNN)は、部分的に観測可能な環境で表現を学ぶために使用される。オンライン学習や環境との継続的な対話を行うエージェントに対しては、RTRL(Real-time Recurrent Learning)を用いてRNNをトレーニングすることが望ましい。有望な方向はリニアリカレントアーキテクチャ(LRU)を使用することで、高密度リカレント重みを複素値の対角線に置き換え、RTRLを効率的にする。本研究では、これらの知見に基づいて、オンラインRLにおけるRNNのトレーニングに軽量で効果的なアプローチを提供する。 RTU(Recurrent Trace Units)は,RTLのトレーニングにおいて,LRUに対して大きなパフォーマンス上のメリットがあるにもかかわらず,LRUに対する小さな修正である。 RTUは、いくつかの部分観測可能な環境において、計算量を大幅に減らしながら、他の再帰的アーキテクチャを著しく上回っている。 Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments. For agents that learn online and continually interact with the environment, it is desirable to train RNNs with real-time recurrent learning (RTRL); unfortunately, RTRL is prohibitively expensive for standard RNNs. A promising direction is to use linear recurrent architectures (LRUs), where dense recurrent weights are replaced with a complex-valued diagonal, making RTRL efficient. In this work, we build on these insights to provide a lightweight but effective approach for training RNNs in online RL. We introduce Recurrent Trace Units (RTUs), a small modification on LRUs that we nonetheless find to have significant performance benefits over LRUs when trained with RTRL. We find RTUs significantly outperform other recurrent architectures across several partially observable environments while using significantly less computation.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# 3D-LSPTM:喉頭画像を用いた喉頭癌検診のための3D-LSPTMによる3D-Large-scale Pretrained Modelの自動フレームワーク 3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos ( http://arxiv.org/abs/2409.01459v1 ) ライセンス: Link先を確認	Meiyu Qiu, Yun Li, Wenjun Huang, Haoyun Zhang, Weiping Zheng, Wenbin Lei, Xiaomao Fan,	(参考訳) 喉頭癌は耳鼻咽喉科において高い死亡率を有する悪性疾患であり、ヒトの健康に重大な脅威をもたらす。伝統的に喉頭科医は喉頭内視鏡的ビデオで手動で喉頭癌を視認するが、それは非常に時間がかかり、主観的である。本研究では,喉頭癌検出のための3D-LSPTMと呼ばれる3次元大規模事前訓練モデルを用いた新しい自動フレームワークを提案する。まず, 第一提携病院サンヤットセン大学から, 倫理委員会の承認を得て1,109本の喉頭鏡映像を収集した。次に,C3D,TimeSformer,Video-Swin-Transformerの3次元大規模事前訓練モデルを用いて,喉頭癌検出のための微細調整技術を開発した。今回提案した3D-LSPTMは喉頭癌検出の課題において有望な性能を発揮することが確認された。特に、Video-Swin-Transformerのバックボーンを持つ3D-LSPTMは92.4%の正解率、95.6%の感度、94.1%の適合率、94.8%の$F_1$スコアを達成することができる。 Laryngeal cancer is a malignant disease with a high mortality rate in otorhinolaryngology, posing a significant threat to human health. Traditionally, laryngologists visually inspect laryngoscopic videos for laryngeal cancer by hand, which is quite time-consuming and subjective. In this study, we propose a novel automatic framework via 3D-large-scale pretrained models termed 3D-LSPTM for laryngeal cancer detection. Firstly, we collect 1,109 laryngoscopic videos from the First Affiliated Hospital Sun Yat-sen University with the approval of the Ethics Committee. Then we utilize the 3D-large-scale pretrained models of C3D, TimeSformer, and Video-Swin-Transformer, with the merit of advanced video feature extraction, for laryngeal cancer detection with fine-tuning techniques. Extensive experiments show that our proposed 3D-LSPTM can achieve promising performance on the task of laryngeal cancer detection. Particularly, 3D-LSPTM with the backbone of Video-Swin-Transformer can achieve 92.4% accuracy, 95.6% sensitivity, 94.1% precision, and a 94.8% $F_1$ score.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# 弱値の時間微分 Time Derivatives of Weak Values ( http://arxiv.org/abs/2409.01460v1 ) ライセンス: Link先を確認	Xavier Oriols,	(参考訳) 物理的性質の時間微分は、しばしば別の意味のある性質をもたらす。弱い値は期待値から導出できない経験的洞察を与えるため,弱い値の時間微分から得られる物理的特性について検討する。これは一般にゲージ不変弱値の時間微分が弱値でもゲージ不変元でもないことを示している。弱い値の時間微分がゲージ不変の弱い値であることを保証するために、2つの条件が提示される。そのような条件下では、局所的なエレンフェストのような定理が提示される: 位置のユニークな測定された弱い値は、弱値の1階と2階の時間微分を通して、系の局所速度と加速度に関する情報を与えるが、それ以上の測定は行わない。これらの結果は、実験者が弱値理論を実験室のセットアップに変換するためのガイドラインとなり、革新的な量子技術への道を開いた。例えば、弱い値の1階と2階の時間微分から、特定の位置と時間で電磁場がどのように得られるかを示す。 The time derivative of a physical property frequently gives rise to another meaningful property. Since weak values offer empirical insights that cannot be derived from expectation values, this paper investigates what physical properties can be obtained from the time derivative of weak values. It shows that, in general, the time derivative of a gauge-invariant weak value is neither a weak value nor a gauge-invariant element. Two conditions are presented to ensure that a time derivative of a weak value is also a gauge-invariant weak value. Under such conditions, a local Ehrenfest-like theorem is presented: a uniquely measured weak value of the position gives information on the local velocity and the acceleration of the system, through the first-order and second-order time derivatives of the weak value, without further measurements. These results serve as guidelines for experimentalists to translate the weak value theory into laboratory setups, paving the way for innovative quantum technologies. An example illustrates how the electromagnetic field can be obtained, at specific positions and times, from a weak value's first- and second-order time derivatives.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# カナダ移民移住セクターのための人間中心型AI応用 Human-Centered AI Applications for Canada's Immigration Settlement Sector ( http://arxiv.org/abs/2409.01461v1 ) ライセンス: Link先を確認	Isar Nejadgholi, Maryam Molamohammadi, Kimiya Missaghi, Samir Bakhtawar,	(参考訳) AIは移民の文脈で頻繁に適用されているが、これらのアプリケーションは選択とスクリーニングに重点を置いており、主に州や当局の強化に役立っている。対照的に、この論文は、情報へのアクセスが重要であり、サービスプロバイダが過大評価される、カナダの移民解決段階におけるAIの可能性を強調している。決済部門を信頼できるAIアプリケーションの主要な候補として強調することで、移民を直接支援するユニークな能力を示しているが、AI研究では未調査である。我々は、新参者の統合を促進する、人間中心で責任あるAIソリューションに対するビジョンを概説する。私たちはAI研究者に、私たちの仕事の上に構築し、サービス提供者や政府組織と活発に協力し、権限、包括性、安全を付与するAIツールを開発するよう呼びかけています。 While AI has been frequently applied in the context of immigration, most of these applications focus on selection and screening, which primarily serve to empower states and authorities, raising concerns due to their understudied reliability and high impact on immigrants' lives. In contrast, this paper emphasizes the potential of AI in Canada's immigration settlement phase, a stage where access to information is crucial and service providers are overburdened. By highlighting the settlement sector as a prime candidate for reliable AI applications, we demonstrate its unique capacity to empower immigrants directly, yet it remains under-explored in AI research. We outline a vision for human-centred and responsible AI solutions that facilitate the integration of newcomers. We call on AI researchers to build upon our work and engage in multidisciplinary research and active collaboration with service providers and government organizations to develop tailored AI tools that are empowering, inclusive and safe.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# ベイズ推論のためのスタイン輸送 Stein transport for Bayesian inference ( http://arxiv.org/abs/2409.01464v1 ) ライセンス: Link先を確認	Nikolas Nüsken,	(参考訳) ベイズ推論の新しい手法である$\textit{Stein transport}$を導入し、温度分布の事前定義された曲線に沿って粒子のアンサンブルを効率的に押し上げる。駆動ベクトル場は再生カーネルヒルベルト空間から選択され、適切なカーネルリッジ回帰定式化またはスタイン幾何学における無限小最適輸送写像として導出することができる。ステイン輸送の更新方程式は、スタイン変分勾配降下(SVGD)と似ているが、時間変化のスコア関数と粒子に付着する比重を導入する。 SVGDは長期の極限における収束に依存するが、スタイン輸送はその後部近似に有限時間で$t=1$で到達する。平均場限界について,正則化と有限粒子効果による誤差を考察し,スタイン輸送を生死ダイナミクスとフィッシャー-ラオ勾配流に接続する。 SVGDと比較して,Stein輸送は計算予算を大幅に削減した上で,より正確な後方近似に到達するだけでなく,SVGDでよく見られる分散崩壊現象を効果的に緩和することを示した。 We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. While SVGD relies on convergence in the long-time limit, Stein transport reaches its posterior approximation at finite time $t=1$. Studying the mean-field limit, we discuss the errors incurred by regularisation and finite-particle effects, and we connect Stein transport to birth-death dynamics and Fisher-Rao gradient flows. In a series of experiments, we show that in comparison to SVGD, Stein transport not only often reaches more accurate posterior approximations with a significantly reduced computational budget, but that it also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
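Since the abstract describes Stein transport by contrast with Stein variational gradient descent, a small sketch of the standard SVGD update may help as a baseline. This is plain SVGD only; it does not implement Stein transport's time-varying score or particle weights. The one-dimensional Gaussian target, the fixed RBF bandwidth, the step size, and all names are assumptions chosen for illustration.

```python
import math

def svgd_step(particles, score, step=0.1, bandwidth=1.0):
    """One SVGD update in 1-D with an RBF kernel of fixed bandwidth.

    Each particle x_i moves along
        phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i) ].
    """
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * h2))
            attraction = k * score(xj)           # pulls toward high density
            repulsion = -(xj - xi) / h2 * k      # gradient of k w.r.t. x_j
            phi += attraction + repulsion
        updated.append(xi + step * phi / n)
    return updated

# Assumed target: a Gaussian with mean 2 and unit variance, whose score
# (gradient of the log density) is -(x - 2).
def target_score(x):
    return -(x - 2.0)

# Start from 20 particles spread evenly over [-1, 1].
particles = [-1.0 + 2.0 * i / 19 for i in range(20)]
for _ in range(500):
    particles = svgd_step(particles, target_score)

mean = sum(particles) / len(particles)
print(mean)
```

The loop runs many iterations because SVGD approaches its target only in the long-time limit; the abstract contrasts this with Stein transport, which follows a tempered path and reaches its approximation at the finite time $t=1$.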
# PoliPrompt: 政治科学のための高性能コスト効果 LLM ベースのテキスト分類フレームワーク PoliPrompt: A High-Performance Cost-Effective LLM-Based Text Classification Framework for Political Science ( http://arxiv.org/abs/2409.01466v1 ) ライセンス: Link先を確認	Menglin Liu, Ge Shi,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、広範な特徴工学、ヒューマンラベリング、タスク固有の訓練を必要とする伝統的な機械学習手法を超越して、政治科学におけるテキスト分類効率を高める新たな道を開いた。しかし、高い分類精度を達成する上での有効性は疑問視されている。本稿では,LLMを利用して実験コストを最小化しながら,分類精度を向上させる3段階のインコンテキスト学習手法を提案する。提案手法は, 自動拡張プロンプト生成, 適応的指数選択, および高度LLMにより改良された2つの弱いLDM間の相違を解消するコンセンサス機構を含む。我々は,BBCの報道,カバノー最高裁判所の確認,2018年の選挙キャンペーン広告のデータセットを用いて,我々のアプローチを検証する。その結果、従来の機械学習の限界を効果的に解決し、政治的科学におけるテキスト分析のスケーラブルで信頼性の高いソリューションを提供しながら、F1スコア(ゼロショット分類では+0.36)を管理可能な経済コスト(78%)で大幅に改善した。 Recent advancements in large language models (LLMs) have opened new avenues for enhancing text classification efficiency in political science, surpassing traditional machine learning methods that often require extensive feature engineering, human labeling, and task-specific training. However, their effectiveness in achieving high classification accuracy remains questionable. This paper introduces a three-stage in-context learning approach that leverages LLMs to improve classification accuracy while minimizing experimental costs. Our method incorporates automatic enhanced prompt generation, adaptive exemplar selection, and a consensus mechanism that resolves discrepancies between two weaker LLMs, refined by an advanced LLM. We validate our approach using datasets from the BBC news reports, Kavanaugh Supreme Court confirmation, and 2018 election campaign ads. The results show significant improvements in classification F1 score (+0.36 for zero-shot classification) with manageable economic costs (-78% compared with human labeling), demonstrating that our method effectively addresses the limitations of traditional machine learning while offering a scalable and reliable solution for text analysis in political science.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# オープンエンディング進化力学のプラットフォームとしてのSwarm Systems Swarm Systems as a Platform for Open-Ended Evolutionary Dynamics ( http://arxiv.org/abs/2409.01469v1 ) ライセンス: Link先を確認	Hiroki Sayama,	(参考訳) 人工スワムシステムはコンピュータ科学、ロボティクス、工学、その他の技術分野で広く研究され、主に事前に定義された目的を達成するために堅牢な分散システムを実装するためのプラットフォームとして利用されている。しかし、そのようなスワム系、特に異種系は、事前定義された目標に向かって収束せず、多様な可能性を探究し、新しい出力を無限に生成するオープンな進化力学を作成するための理想的なプラットフォームとして利用することもできる。本稿では,Swarm Chemistryとその変種を具体例として,設計空間の濃度跳躍,マルチスケール構造・挙動とその多様性,頑健な自己組織化,創発的パターンの自己修復・生態的相互作用など,不均一なSwarmシステムの有用な特性を示す。科学、工学、芸術・エンターテイメントへの応用、さらなる研究の方向性についても論じる。 Artificial swarm systems have been extensively studied and used in computer science, robotics, engineering and other technological fields, primarily as a platform for implementing robust distributed systems to achieve pre-defined objectives. However, such swarm systems, especially heterogeneous ones, can also be utilized as an ideal platform for creating open-ended evolutionary dynamics that do not converge toward pre-defined goals but keep exploring diverse possibilities and generating novel outputs indefinitely. In this article, we review Swarm Chemistry and its variants as concrete sample cases to illustrate beneficial characteristics of heterogeneous swarm systems, including the cardinality leap of design spaces, multiscale structures/behaviors and their diversity, and robust self-organization, self-repair and ecological interactions of emergent patterns, all of which serve as the driving forces for open-ended evolutionary processes. Applications to science, engineering, and art/entertainment as well as the directions of further research are also discussed.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# Phantom: セミスーパービジョンの学習に対する未ターゲットの毒殺攻撃(フルバージョン) Phantom: Untargeted Poisoning Attacks on Semi-Supervised Learning (Full Version) ( http://arxiv.org/abs/2409.01470v1 ) ライセンス: Link先を確認	Jonathan Knauer, Phillip Rieger, Hossein Fereidooni, Ahmad-Reza Sadeghi,	(参考訳) Deep Neural Networks(DNN)は、ますます複雑なタスクを処理することができる。ソーシャルネットワークなどのユーザ生成コンテンツを持つプラットフォームからデータを収集することで、DNNをトレーニングするための大規模なデータセットの取得が大幅に簡単になった。これらの進歩にもかかわらず、手作業によるラベリングプロセスは、時間とコストの両面で大きな課題である。これに対して、Semi-Supervised Learning (SSL)アプローチが登場し、データセットのごく一部をラベル付けする必要があり、大多数はラベル付けされていない。しかし、ソーシャルネットワークのような信頼できない情報源からのデータを活用すれば、攻撃者が操作されたサンプルを簡単に注入できるため、新たなセキュリティリスクも生じる。 SSLのセキュリティに関する以前の研究は、主に訓練されたモデルにバックドアを注入することに焦点を当てていたが、より困難な未標的の毒殺攻撃には注意が払われなかった。本稿では、SSLにおける最初の未標的の毒殺攻撃であるPhantomを紹介し、少数の操作済み画像をラベルなしデータセットに注入することにより、トレーニングプロセスを妨害する。既存の攻撃と異なり、我々のアプローチでは、被害者を制御せずに、ソーシャルネットワークに画像を投稿するなど、操作されたサンプルをほとんど追加するしかありません。 PhantomはSSLアルゴリズムに実際の画像のピクセルを見落としさせ、実際の画像に重ねられた名前の悪質なパターンにのみ依存させる。 6つの異なるデータセットと3つの現実世界のソーシャルメディアプラットフォーム(Facebook、Instagram、Pinterest)に対するPhantomの有効性を示します。既に操作されたサンプルのごく一部(例: 5 %)は、結果モデルの精度を10 %削減し、高い割合は、単純分類器に匹敵する性能をもたらす。以上の結果から,ユーザが生成したコンテンツプラットフォームを害する危険性が指摘され,特定のタスクにおいてSSLに適さないことが示唆された。 Deep Neural Networks (DNNs) can handle increasingly complex tasks, albeit they require rapidly expanding training datasets. Collecting data from platforms with user-generated content, such as social networks, has significantly eased the acquisition of large datasets for training DNNs. Despite these advancements, the manual labeling process remains a substantial challenge in terms of both time and cost. In response, Semi-Supervised Learning (SSL) approaches have emerged, where only a small fraction of the dataset needs to be labeled, leaving the majority unlabeled. However, leveraging data from untrusted sources like social networks also creates new security risks, as potential attackers can easily inject manipulated samples. 
Previous research on the security of SSL primarily focused on injecting backdoors into trained models, while less attention was given to the more challenging untargeted poisoning attacks. In this paper, we introduce Phantom, the first untargeted poisoning attack in SSL that disrupts the training process by injecting a small number of manipulated images into the unlabeled dataset. Unlike existing attacks, our approach only requires adding a few manipulated samples, such as posting images on social networks, without the need to control the victim. Phantom causes SSL algorithms to overlook the actual images' pixels and to rely only on maliciously crafted patterns that it superimposes on the real images. We show Phantom's effectiveness for 6 different datasets and 3 real-world social-media platforms (Facebook, Instagram, Pinterest). Even small fractions of manipulated samples (e.g., 5%) reduce the accuracy of the resulting model by 10%, with higher percentages leading to a performance comparable to a naive classifier. Our findings demonstrate the threat of poisoning user-generated content platforms, rendering them unsuitable for SSL in specific tasks.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# 構造的分解からの再構成による画像ラベルからのセマンティックセグメンテーション Semantic Segmentation from Image Labels by Reconstruction from Structured Decomposition ( http://arxiv.org/abs/2409.01472v1 ) ライセンス: Link先を確認	Xuanrui Zeng,	(参考訳) 画像タグからの弱教師付きイメージセグメンテーション(WSSS)は、制約の少ない性質のため、依然として困難である。ほとんどの研究は、クラスアクティベーションマップ(CAM)の抽出と、様々な追加の正規化に重点を置いている。主流とは対照的に、マスクを用いて画像の分解から再構築する問題としてWSSSの枠組みを提案し、ほとんどの正規化が新たな問題の枠組みに暗黙的に埋め込まれている。提案手法は,初期実験において有望な結果を示し,背景曖昧性問題に対する堅牢性を示した。我々のコードは \url{https://github.com/xuanrui-work/WSSSByRec} で利用可能です。 Weakly supervised image segmentation (WSSS) from image tags remains challenging due to its under-constraint nature. Most mainstream work focus on the extraction of class activation map (CAM) and imposing various additional regularization. Contrary to the mainstream, we propose to frame WSSS as a problem of reconstruction from decomposition of the image using its mask, under which most regularization are embedded implicitly within the framework of the new problem. Our approach has demonstrated promising results on initial experiments, and shown robustness against the problem of background ambiguity. Our code is available at \url{https://github.com/xuanrui-work/WSSSByRec}.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# 量子力学と最大速度境界における情報の伝播について On Propagation of Information in Quantum Mechanics and Maximal Velocity Bounds ( http://arxiv.org/abs/2409.01473v1 ) ライセンス: Link先を確認	Israel Michael Sigal, Xiaoxu Wu,	(参考訳) 我々は、量子力学における量子情報の進化に関連する重要な概念を再考し、指数誤差境界を持つ状態や観測可能な状態に対する量子情報の伝播の最大速度について一様境界を証明した。我々の結果は、特にリーブ・ロビンソン境界の量子力学的バージョンであり、量子情報の伝播に様々な制約をもたらすことが知られていることを示唆している。本稿では,最大速度境界を証明するための新しい手法を提案する。 We revisit key notions related to evolution of quantum information in quantum mechanics and prove uniform bounds on the maximal speed of propagation of quantum information for states and observables with exponential error bounds. Our results imply, in particular, a quantum mechanical version of the Lieb-Robinson bound, which is known to yield various constraints on propagation of quantum information. We propose a novel approach to proving maximal speed bounds.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
# Actor-Critic アルゴリズムの適合勾配近似 Compatible Gradient Approximations for Actor-Critic Algorithms ( http://arxiv.org/abs/2409.01477v1 ) ライセンス: Link先を確認	Baturay Saglam, Dionysis Kalogerias,	(参考訳) 決定論的ポリシー勾配アルゴリズムは、連続システムの制御においてアクター批判的手法の基礎となるが、しばしば、入力アクションに対する批評家の値推定の導出に依存するため、不正確な問題に遭遇する。この依存には、関数近似の下で困難なタスクである、正確なアクション値勾配計算が必要である。本稿では,アクション空間内の2点確率勾配推定を通じて,アクション値勾配のゼロ階近似を用いることで,そのような精度の必要性を回避できるアクタ批判アルゴリズムを提案する。このアプローチは、決定論的ポリシー勾配スキームに固有の互換性の問題に対して、有効かつ効果的に対処する。さらに実験結果から,本アルゴリズムが現在の最先端手法の性能を上回ることが確認された。 Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with respect to input actions. This reliance requires precise action-value gradient computations, a task that proves challenging under function approximation. We introduce an actor-critic algorithm that bypasses the need for such precision by employing a zeroth-order approximation of the action-value gradient through two-point stochastic gradient estimation within the action space. This approach provably and effectively addresses compatibility issues inherent in deterministic policy gradient schemes. Empirical results further demonstrate that our algorithm not only matches but frequently exceeds the performance of current state-of-the-art methods.	翻訳日:2024-09-06 03:48:38 公開日:2024-09-02
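The abstract above describes approximating the action-value gradient with a two-point zeroth-order estimate in the action space. The sketch below shows a generic two-point estimator with Gaussian perturbation directions. It is an illustration under assumptions, not the paper's algorithm: `toy_critic`, the step size `delta`, and the sample counts are all chosen here for the example.

```python
import random


def two_point_estimate(q, action, direction, delta):
    """Single two-point estimate of dQ/da along one perturbation direction.

    Evaluates the critic at action + delta*direction and action - delta*direction
    and scales the direction by the resulting finite-difference slope.
    No derivative of q is ever taken.
    """
    plus = [a + delta * u for a, u in zip(action, direction)]
    minus = [a - delta * u for a, u in zip(action, direction)]
    slope = (q(plus) - q(minus)) / (2.0 * delta)
    return [slope * u for u in direction]


def action_gradient(q, action, n_samples=2000, delta=0.1, seed=0):
    """Average many two-point estimates over random Gaussian directions."""
    rng = random.Random(seed)
    total = [0.0] * len(action)
    for _ in range(n_samples):
        direction = [rng.gauss(0.0, 1.0) for _ in action]
        estimate = two_point_estimate(q, action, direction, delta)
        total = [t + e for t, e in zip(total, estimate)]
    return [t / n_samples for t in total]


def toy_critic(action):
    """Hypothetical critic for a fixed state: a concave quadratic in the action.

    Its true gradient at the origin is (2, -4).
    """
    return -(action[0] - 1.0) ** 2 - (action[1] + 2.0) ** 2


single = two_point_estimate(toy_critic, [0.0, 0.0], [1.0, 0.0], 0.1)
averaged = action_gradient(toy_critic, [0.0, 0.0], n_samples=20000)
```

A single estimate only recovers the gradient component along its direction, so the averaged estimate is the one that approaches the full gradient. How the paper chooses directions, step sizes, and batch sizes is not stated in the abstract.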
# 言語生成と検索のためのマスケミキサー Masked Mixers for Language Generation and Retrieval ( http://arxiv.org/abs/2409.01482v1 ) ライセンス: Link先を確認	Benjamin L. Badger,	(参考訳) 入力要素の厳密な部分集合に選択的に焦点をあてる注意機構は、今日の言語モデルにおいてほぼどこでも見られる。私たちは、注意力の使用のマイナス面として、入力に存在するほとんどの情報が必然的に失われていると仮定します。この考え方を支持するために、変換器における入力表現の精度が低いが、自己認識をマスク畳み込みに置き換えるマスクミキサーと呼ばれるものにおいて、より正確な表現を求める。 TinyStoriesに適用されたマスク付きミキサーは、初期のトランスフォーマー実装よりも因果言語タスクを効率よく学習し、最適化された現在の実装よりも若干効率が低い。このデータセットで観測される最も効率的な学習アルゴリズムはトランスフォーマー・マザード・ミキサーハイブリッドであり、これらのモデルが直交的に学習することを示唆している。変換器が提示する情報損失は, 生成よりも検索に負担がかかると仮定し, 既存の生成モデル埋め込みに基づく検索モデルの効率的なトレーニング手法を提案する。この方法により, マスクミキサーの埋め込みは, 変圧器の埋め込みに比べて, より優れた要約から物語の検索をもたらすことがわかった。 Attention mechanisms that confer selective focus on a strict subset of input elements are nearly ubiquitous in language models today. We posit there to be downside to the use of attention: most information present in the input is necessarily lost. In support of this idea we observe poor input representation accuracy in transformers, but find more accurate representation in what we term masked mixers which replace self-attention with masked convolutions. Applied to TinyStories the masked mixer learns causal language tasks more efficiently than early transformer implementations and somewhat less efficiently than optimized, current implementations. The most efficient learning algorithm observed for this dataset is a transformer-masked mixer hybrid, suggesting that these models learn in an orthogonal manner. We hypothesized that the information loss exhibited by transformers would be much more detrimental to retrieval than generation, and to test this we introduce an efficient training approach for retrieval models based on existing generative model embeddings. With this method, embeddings from masked mixers are found to result in far better summary-to-story retrieval compared to embeddings from transformers.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# タスク特定エキスパート・プルーニングによる非効率評価によるSMoE言語モデルの再検討 Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning ( http://arxiv.org/abs/2409.01483v1 ) ライセンス: Link先を確認	Soumajyoti Sarkar, Leonard Lausen, Volkan Cevher, Sheng Zha, Thomas Brox, George Karypis,	(参考訳) SMOE(Sparse Mixture of Expert)モデルは、言語モデリングにおける高密度モデルに代わるスケーラブルな代替品として登場した。これらのモデルはトランスブロックで条件付きアクティブなフィードフォワードサブネットワークを使用し、モデル全体のパラメータとサンプルごとの計算を分離することができる。しかし、大きなトークン引き込みSMoEモデルは大きな課題に直面している: 推論の間、モデル全体がシーケンスまたはバッチに使用される必要があり、結果として分散設定において、トークンごとのスパースアクティベーションの利点を相殺する高いレイテンシが生じる。本研究は,SMoEアーキテクチャの設計に関する意思決定に対して,事前学習における専門家数の選択を主眼として,タスク固有のモデルプルーニングについて検討する。本研究は,スクラッチから学習した小型SMoEモデルに対して,タスク上で個別に評価・比較する場合に優位性があるかどうかを考察する。そこで本研究では,適応型タスク対応プルーニング手法UNCURLを導入し,MoE層当たりの専門家数をオフラインで学習する手法を提案する。以上の結果から, プレトレーニングに使用する専門家数に依存して, モデル性能を低下させるしきい値プルーニング因子が明らかとなった。これらの知見は,SMoEアーキテクチャの事前学習におけるモデル設計選択の理解に寄与する。 Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. These models use conditionally activated feedforward subnetworks in transformer blocks, allowing for a separation between total model parameters and per-example computation. However, large token-routed SMoE models face a significant challenge: during inference, the entire model must be used for a sequence or a batch, resulting in high latencies in a distributed setting that offsets the advantages of per-token sparse activation. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures, mainly modulating the choice of expert counts in pretraining. We investigate whether such pruned models offer advantages over smaller SMoE models trained from scratch, when evaluating and comparing them individually on tasks. To that end, we introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training. 
Our findings reveal a threshold pruning factor for the reduction that depends on the number of experts used in pretraining, above which the reduction starts to degrade model performance. These insights contribute to our understanding of model design choices when pretraining with SMoE architectures and are particularly useful when considering task-specific inference optimization for later stages.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# 量子回路の透かし Watermarking of Quantum Circuits ( http://arxiv.org/abs/2409.01484v1 ) ライセンス: Link先を確認	Rupshali Roy, Swaroop Ghosh,	(参考訳) 量子回路は、量子開発者とユーザの知的財産権(IP)を構成し、敵エージェント、例えば量子クラウドプロバイダ、あるいはクラウドに存在する不正な敵によって盗難から保護される必要がある。これは、量子回路/algorithms\textquotesingle{} IPとその出力を追跡するために、短期量子デバイスに適用可能な低オーバーヘッド手法の探索を必要とする。本稿では,回路設計をクローンする場合のオーナシップを証明するために,このような2つの軽量な透かし技術を提案する。第1の技法では、回路の出力において、回転ゲートを他のゲートと組み合わせたアシラキュービットに配置する。第2の方法は、回路の中央にランダムゲートのセットを挿入し、その逆を回路からバリアで分離する。これらのモデルをベンチマーク回路に組み合わせて適用し、回路深さ、2ビットゲート数、試験成功確率(PST)、著者の確率的証明(PPA)を最先端技術と比較する。 PSTは、非透かしベンチマークに対して0.53\%のマイナス値で減少し、既存の技術よりも22.69\%高い。回路深さは最先端に対して最大27.7 %まで減少している。 PPAは、既存の透かしよりも天文学的に小さい。 Quantum circuits constitute Intellectual Property (IP) of the quantum developers and users, which needs to be protected from theft by adversarial agents, e.g., the quantum cloud provider or a rogue adversary present in the cloud. This necessitates the exploration of low-overhead techniques applicable to near-term quantum devices, to trace the quantum circuits/algorithms\textquotesingle{} IP and their output. We present two such lightweight watermarking techniques to prove ownership in the event of an adversary cloning the circuit design. For the first technique, a rotation gate is placed on ancilla qubits combined with other gate(s) at the output of the circuit. For the second method, a set of random gates are inserted in the middle of the circuit followed by its inverse, separated from the circuit by a barrier. These models are combined and applied on benchmark circuits, and the circuit depth, 2-qubit gate count, probability of successful trials (PST), and probabilistic proof of authorship (PPA) are compared against the state-of-the-art. The PST is reduced by a minuscule 0.53\% against the non-watermarked benchmarks and is up to 22.69\% higher compared to existing techniques. The circuit depth has been reduced by up to 27.7\% as against the state-of-the-art. The PPA is astronomically smaller than existing watermarks.	
翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
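The second watermarking technique in the abstract above inserts random gates followed by their inverse in the middle of a circuit. A minimal single-qubit sketch of why this leaves the circuit's function unchanged is given below. It is an illustration only, not the authors' scheme: the gate set, `add_watermark`, and the seed are assumptions, and the barrier mentioned in the abstract is not modeled (in a real toolchain it would be needed to stop the compiler from cancelling the inserted pair).

```python
import cmath
import math
import random


def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(m):
    """Conjugate transpose, which is the inverse of a unitary gate."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def rz(theta):
    return [[cmath.exp(-0.5j * theta), 0j], [0j, cmath.exp(0.5j * theta)]]


def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[complex(c), -1j * s], [-1j * s, complex(c)]]


def circuit_unitary(gates):
    """Multiply gates in application order (first gate acts first)."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for gate in gates:
        result = matmul(gate, result)
    return result


def add_watermark(gates, seed, n_gates=3):
    """Insert n_gates random rotations plus their inverses mid-circuit.

    The seed plays the role of the owner's secret (an assumption of this
    sketch): the same seed regenerates the same inserted gates.
    """
    rng = random.Random(seed)
    key = []
    for _ in range(n_gates):
        make_gate = rng.choice([rx, rz])
        key.append(make_gate(rng.uniform(0.0, 2.0 * math.pi)))
    # The inverse of a gate sequence is the reversed sequence of daggers.
    inverse = [dagger(g) for g in reversed(key)]
    middle = len(gates) // 2
    return gates[:middle] + key + inverse + gates[middle:]


def max_difference(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


original = [rx(0.3), rz(1.1), rx(2.0), rz(0.7)]
marked = add_watermark(original, seed=42)
difference = max_difference(circuit_unitary(original), circuit_unitary(marked))
```

The watermarked circuit is six gates longer yet implements the same unitary up to floating-point error, which is the trade-off the abstract quantifies through circuit depth and PST.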
# EarthGen:トップダウンビューから世界を生成する EarthGen: Generating the World from Top-Down Views ( http://arxiv.org/abs/2409.01491v1 ) ライセンス: Link先を確認	Ansh Sharma, Albert Xiao, Praneet Rathi, Rohit Kundu, Albert Zhai, Yuan Shen, Shenlong Wang,	(参考訳) そこで本研究では,広域地形モデリングのための新しい手法を提案する。我々のモデルの中核は超解像拡散モデルのカスケードであり、複数の解像度で一貫した画像を生成するために組み合わせることができる。この概念をタイル状生成法で適用することで、数千平方kmのリアルな地球表面を高解像度で生成できるスケーラブルなシステムが得られる。提案手法は,Bing Mapsから収集したデータセット上で評価し,超高解像度の1024倍ズーム処理において,超高解像度のベースラインよりも優れていることを示す。また,対話型ギガピクセルスケール生成マップを用いて,多様でコヒーレントなシーンを作成できることを示す。最後に、制御可能なワールドジェネレーションや3Dシーン生成を含む新しいコンテンツ作成アプリケーションを実現するために、我々のシステムをいかに拡張できるかを実証する。 In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of superresolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this concept with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surfaces at high resolution. We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom. We also demonstrate its ability to create diverse and coherent scenes via an interactive gigapixel-scale generated map. Finally, we demonstrate how our system can be extended to enable novel content creation applications including controllable world generation and 3D scene generation.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# 言語モデルOSのための圧縮機-レトリバーアーキテクチャ The Compressor-Retriever Architecture for Language Model OS ( http://arxiv.org/abs/2409.01495v1 ) ライセンス: Link先を確認	Yuan Yang, Siheng Xiong, Ehsan Shareghi, Faramarz Fekri,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、複数のモダリティにまたがって情報を集約・処理する能力を大幅に強化し、マルチモーダルデータクエリ、ツールの使用、Webインタラクション、長いドキュメントの処理など、幅広いタスクを実行できるようになった。これらの能力は、LLMを単なるチャットボットから、現実世界と対話できる汎用エージェントに変換するための道を開く。本稿では,オペレーティングシステム(OS)のコアコンポーネントとして言語モデルを使用する概念を考察し,RAMとして機能するコンテキストウィンドウに格納されたデータを処理するCPUとして効果的に機能する。このようなLM OSを実現する上で重要な課題は、コンテキストウィンドウサイズ制限による現在のセッションベースのインタラクションパラダイムによって制限された、長いコンテキストの管理とセッション間のステートフル性を保証することだ。そこで本研究では,生涯のコンテキスト管理のために設計されたモデルに依存しないコンプレッサー・レトリバーを提案する。検索拡張生成のような他の長期コンテキストソリューションとは異なり、我々のアプローチはベースモデルのフォワード関数のみを用いてコンテキストを圧縮・取得し、エンドツーエンドの微分可能性を保証する。予備的な実験では、このアーキテクチャがコンテキスト内学習タスクにおいて有効であることを示し、完全にステートフルなLLM OSの開発への一歩を踏み出した。プロジェクトリポジトリは、https://github.com/gblackout/LM-OSで入手できる。 Recent advancements in large language models (LLMs) have significantly enhanced their capacity to aggregate and process information across multiple modalities, enabling them to perform a wide range of tasks such as multimodal data querying, tool usage, web interactions, and handling long documents. These capabilities pave the way for transforming LLMs from mere chatbots into general-purpose agents capable of interacting with the real world. This paper explores the concept of using a language model as the core component of an operating system (OS), effectively acting as a CPU that processes data stored in a context window, which functions as RAM. A key challenge in realizing such an LM OS is managing the life-long context and ensuring statefulness across sessions, a feature limited by the current session-based interaction paradigm due to context window size limit. To address this, we introduce compressor-retriever, a model-agnostic architecture designed for life-long context management. 
Unlike other long-context solutions such as retrieval-augmented generation, our approach exclusively uses the base model's forward function to compress and retrieve context, ensuring end-to-end differentiability. Preliminary experiments demonstrate the effectiveness of this architecture in in-context learning tasks, marking a step towards the development of a fully stateful LLM OS. Project repo available at: https://github.com/gblackout/LM-OS	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# 幾何学的量子機械学習はバーコード分類に有効か? Can Geometric Quantum Machine Learning Lead to Advantage in Barcode Classification? ( http://arxiv.org/abs/2409.01496v1 ) ライセンス: Link先を確認	Chukwudubem Umeano, Stefano Scali, Oleksandr Kyriienko,	(参考訳) 2つのベクトル(画像やバーコードとして視覚化されている)を区別し、それらが互いに関連しているかどうかを学習する問題を考える。そこで我々は,グローバルな相関関係に基づく類似対と異種対の分類を可能にし,少数のサンプルからの一般化を可能にする,埋め込み対称性を備えた幾何量子機械学習(GQML)アプローチを開発した。従来開発されたGQMLアルゴリズムとは違って,単元パラメトリゼーションよりも優れた対称性を考慮した計測適応を提案する。我々は、古典的なディープニューラルネットワークと畳み込みニューラルネットワークとシームズアーキテクチャとの類似性テストについて、GQMLを比較した。量子ネットワークが従来のネットワークよりも優れていることを示す。データセットを構成するために使用される相関分布を解析することにより、この性能の違いを説明する。類似性テストは、BQP複雑性クラスと多項式階層の間の最大分離が証明された問題と関係する。アドバンテージを実現する能力は、データのロード方法に大きく依存するが、量子機械学習の利点と類似した問題を論じる。 We consider the problem of distinguishing two vectors (visualized as images or barcodes) and learning if they are related to one another. For this, we develop a geometric quantum machine learning (GQML) approach with embedded symmetries that allows for the classification of similar and dissimilar pairs based on global correlations, and enables generalization from just a few samples. Unlike GQML algorithms developed to date, we propose to focus on symmetry-aware measurement adaptation that outperforms unitary parametrizations. We compare GQML for similarity testing against classical deep neural networks and convolutional neural networks with Siamese architectures. We show that quantum networks largely outperform their classical counterparts. We explain this difference in performance by analyzing correlated distributions used for composing our dataset. We relate the similarity testing with problems that showcase a proven maximal separation between the BQP complexity class and the polynomial hierarchy. While the ability to achieve advantage largely depends on how data are loaded, we discuss how similar problems can benefit from quantum machine learning.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# DiversityMedQA:大規模言語モデルを用いた医学診断におけるデモグラフィックバイアスの評価 DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models ( http://arxiv.org/abs/2409.01497v1 ) ライセンス: Link先を確認	Rajat Rawat, Hudson McBride, Dhiyaan Nirmal, Rajarshi Ghosh, Jong Moon, Dhruv Alamuri, Sean O'Brien, Kevin Zhu,	(参考訳) 大規模言語モデル(LLM)がヘルスケアで勢いを増すにつれ、人口統計バイアスへの感受性への懸念が高まっている。性別や民族など多種多様な患者層を対象とした医療クエリに対するLCM応答を評価するための新しいベンチマークである {DiversityMedQA} を紹介する。医療委員会試験の質問を含むMedQAデータセットからの質問を摂動することで、さまざまな患者プロファイルの医療診断の微妙な違いを捉えたベンチマークを作成しました。以上の結果から,これらの変動に比較して,モデル性能に顕著な差が認められた。さらに,摂動の正確性を確保するため,各摂動を検証するフィルタ戦略を提案する。本研究では, LLM 医学診断における人口統計バイアスの評価と緩和のための資源として, DiversityMedQA をリリースする。 As large language models (LLMs) gain traction in healthcare, concerns about their susceptibility to demographic biases are growing. We introduce {DiversityMedQA}, a novel benchmark designed to assess LLM responses to medical queries across diverse patient demographics, such as gender and ethnicity. By perturbing questions from the MedQA dataset, which comprises medical board exam questions, we created a benchmark that captures the nuanced differences in medical diagnosis across varying patient profiles. Our findings reveal notable discrepancies in model performance when tested against these demographic variations. Furthermore, to ensure the perturbations were accurate, we also propose a filtering strategy that validates each perturbation. By releasing DiversityMedQA, we provide a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# ディープネットワークベンチマークのための実用的な一般化指標 A practical generalization metric for deep networks benchmarking ( http://arxiv.org/abs/2409.01498v1 ) ライセンス: Link先を確認	Mengqing Huang, Hongchuan Yu, Jianjun Zhang,	(参考訳) 深層学習モデルの一般化誤差の限界を推定し、モデルが一般化する能力を実験的に評価できる実用的な指標への関心が高まっている。この関心は、実際的な考察だけでなく、理論的な推定が実際的な検証を必要とするため、理論的な研究にも不可欠である。しかし、現在、様々なディープネットワークの一般化能力のベンチマークとこれらの理論推定の検証に関する研究が不足している。本稿では,異なるディープネットワークをベンチマークする実用的な一般化基準を提案し,理論的推定の検証のための新しいテストベッドを提案する。その結果,分類作業におけるディープネットワークの一般化能力は,分類精度と未確認データの多様性の両方に依存していることがわかった。提案手法は,ディープラーニングモデルの精度とデータの多様性を定量化し,直感的かつ定量的な評価方法,トレードオフ点を提供する。さらに,実測値と既存の一般化理論推定をベンチマークテストベッドを用いて比較した。利用可能な一般化推定のほとんどは,提案した実測値を用いて得られた実測値と相関しない点に注意が必要である。一方、この発見は理論的な推定の欠点を露呈し、新たな探索を促すために重要である。 There is an ongoing and dedicated effort to estimate bounds on the generalization error of deep learning models, coupled with an increasing interest with practical metrics that can be used to experimentally evaluate a model's ability to generalize. This interest is not only driven by practical considerations but is also vital for theoretical research, as theoretical estimations require practical validation. However, there is currently a lack of research on benchmarking the generalization capacity of various deep networks and verifying these theoretical estimations. This paper aims to introduce a practical generalization metric for benchmarking different deep networks and proposes a novel testbed for the verification of theoretical estimations. Our findings indicate that a deep network's generalization capacity in classification tasks is contingent upon both classification accuracy and the diversity of unseen data. The proposed metric system is capable of quantifying the accuracy of deep learning models and the diversity of data, providing an intuitive and quantitative evaluation method, a trade-off point. Furthermore, we compare our practical metric with existing generalization theoretical estimations using our benchmarking testbed. 
It is discouraging to note that most of the available generalization estimations do not correlate with the practical measurements obtained using our proposed practical metric. On the other hand, this finding is significant as it exposes the shortcomings of theoretical estimations and inspires new exploration.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# NaijaCoderサマーキャンプの講義ノート Lecture Notes from the NaijaCoder Summer Camp ( http://arxiv.org/abs/2409.01499v1 ) ライセンス: Link先を確認	Daniel Alabi, Joseph Ekpenyong, Alida Monaco,	(参考訳) ナイジャコーダーの直接サマーキャンプは、ナイジェリアの高校とプレコラージュの学生のための集中的なプログラムである。プログラムはアルゴリズムとコンピュータプログラミングの基礎に関する無料の教育を提供する。 2024年、収容所は国内2か所で行われた。 (i)連邦首都圏(F.C.T.)、アブハ及び (二)ラゴス州。どちらの場所も教育目的で同じ音符に頼っていた。学生と教師の両方が、メインの対人プログラムが終わった後、レビューできるように、これらのメモを公開メディアで提供しています。 The NaijaCoder in-person summer camps are intensive programs for high school and pre-college students in Nigeria. The programs are meant to provide free instruction on the basics of algorithms and computer programming. In 2024, the camps were held in two locations within the country: (i) the Federal Capital Territory (F.C.T.), Abuja; and (ii) Lagos state. Both locations relied on the same set of notes for instructional purposes. We are providing these notes in a publicly-available medium for both students and teachers to review after the main in-person programs are over.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# 複雑な気象条件下での船舶の航行安全向上のためのリアルタイムマルチシーン視認性向上 Real-Time Multi-Scene Visibility Enhancement for Promoting Navigational Safety of Vessels Under Complex Weather Conditions ( http://arxiv.org/abs/2409.01500v1 ) ライセンス: Link先を確認	Ryan Wen Liu, Yuxu Lu, Yuan Gao, Yu Guo, Wenqi Ren, Fenghua Zhu, Fei-Yue Wang,	(参考訳) 環境認識とナビゲーション支援が可能な可視光カメラは、インテリジェント水上輸送システム(IWTS)において、海洋表層船舶に不可欠なイメージングセンサーとして登場した。しかし、視覚画像の画質は、複雑な気象条件(例えば、ヘイズ、雨、低照度)下での様々な劣化(例えば、可視性、低コントラスト、色歪みなど)に必然的に悩まされる。劣化した視覚情報は、不正確な環境認識と航法リスクのための遅延操作をもたらす。船舶の航行安全を促進するため,悪天候下での視覚的品質向上のために,多くの計算手法が提案されている。しかし、これらの手法の多くは本質的に特定の目的の実装戦略であり、1つの特定の気象タイプでしか利用できない。この制限を克服するために、異なる気象条件下で撮影された劣化画像を適応的に復元するために、エッジパラメータ化とアテンション誘導ニューラルネットワーク(ERANet)の汎用的なマルチシーン可視性向上手法を提案する。特に,私たちのERANetは,低計算コストを維持しながら視品質を向上させるために,チャネルアテンション,空間アテンション,再パラメータ化技術を同時に活用している。標準およびIWTS関連データセットで実施された大規模な実験により、ERANetは画像品質と計算効率の両方において、いくつかの代表的な可視性向上手法より優れていることが示された。 IWTS関連物体検出とシーンセグメンテーションの優れた性能は、複雑な気象条件下でのERANetに基づく可視性向上の後、着実に得られる。 The visible-light camera, which is capable of environment perception and navigation assistance, has emerged as an essential imaging sensor for marine surface vessels in intelligent waterborne transportation systems (IWTS). However, the visual imaging quality inevitably suffers from several kinds of degradations (e.g., limited visibility, low contrast, color distortion, etc.) under complex weather conditions (e.g., haze, rain, and low-lightness). The degraded visual information will accordingly result in inaccurate environment perception and delayed operations for navigational risk. To promote the navigational safety of vessels, many computational methods have been presented to perform visual quality enhancement under poor weather conditions. However, most of these methods are essentially specific-purpose implementation strategies, only available for one specific weather type. 
To overcome this limitation, we propose to develop a general-purpose multi-scene visibility enhancement method, i.e., edge reparameterization- and attention-guided neural network (ERANet), to adaptively restore the degraded images captured under different weather conditions. In particular, our ERANet simultaneously exploits the channel attention, spatial attention, and reparameterization technology to enhance the visual quality while maintaining low computational cost. Extensive experiments conducted on standard and IWTS-related datasets have demonstrated that our ERANet could outperform several representative visibility enhancement methods in terms of both imaging quality and computational efficiency. The superior performance of IWTS-related object detection and scene segmentation could also be steadily obtained after ERANet-based visibility enhancement under complex weather conditions.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# AMG:アバターモーションガイドビデオジェネレーション AMG: Avatar Motion Guided Video Generation ( http://arxiv.org/abs/2409.01502v1 ) ライセンス: Link先を確認	Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang,	(参考訳) 人間の映像生成タスクは、深層生成モデルの進歩によって大きな注目を集めている。人間の動きでリアルなビデオを生成することは、人間の身体トポロジーの複雑さと視覚的アーティファクトへの敏感さのため、自然界では難しい。広範に研究されている2Dメディア生成手法は、巨大な人間のメディアデータセットを活用するが、3Dアバターベースのアプローチとは対照的に、3Dアバターベースのアプローチは、コントロールの自由度を高めながら、フォトリアリズムを欠き、背景シーンとシームレスに調和できない。本稿では,3次元アバターの制御レンダリングにビデオ拡散モデルを適用し,2次元フォトリアリズムと3次元制御性を組み合わせたAMGを提案する。また、ダイナミックカメラビデオから人間のアバターの動きを再構成しレンダリングする新しいデータ処理パイプラインも導入する。 AMGは、カメラの位置、人間の動き、背景スタイルを正確に制御し、多人数拡散ビデオ生成を可能にする最初の方法である。また,提案手法は,ポーズシーケンスや動画の駆動に係わる既存の映像生成手法を,現実性と適応性の観点から上回っていることを示す。 Human video generation task has gained significant attention with the advancement of deep generative models. Generating realistic videos with human movements is challenging in nature, due to the intricacies of human body topology and sensitivity to visual artifacts. The extensively studied 2D media generation methods take advantage of massive human media datasets, but struggle with 3D-aware control; whereas 3D avatar-based approaches, while offering more freedom in control, lack photorealism and cannot be harmonized seamlessly with background scene. We propose AMG, a method that combines the 2D photorealism and 3D controllability by conditioning video diffusion models on controlled rendering of 3D avatars. We additionally introduce a novel data processing pipeline that reconstructs and renders human avatar movements from dynamic camera videos. AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style. We also demonstrate through extensive evaluation that it outperforms existing human video generation methods conditioned on pose sequences or driving videos in terms of realism and adaptability.	翻訳日:2024-09-06 03:35:27 公開日:2024-09-02
# CNNによるSchrodinger cat状態の認識 Recognition of Schrodinger cat state based on CNN ( http://arxiv.org/abs/2409.02132v1 ) ライセンス: Link先を確認	Tao Zhang, Chaoying Zhao,	(参考訳) 猫の状態とコヒーレント状態の分類に畳み込みニューラルネットワークを適用した。当初、非線形プロセスからSchrodinger cat状態とコヒーレントな状態のデータセットを生成し、これらのデータセットを前処理しました。その後、LeNetとResNetのネットワークアーキテクチャを構築し、畳み込みカーネルやストライドなどのパラメータを最適値に調整した。その後、トレーニングセットでLeNetとResNetの両方をトレーニングしました。損失関数の値は、ResNetが猫の状態とコヒーレントな状態の分類においてより優れていることを示している。最後に、トレーニングされたモデルをテストセットで評価し、LeNetで97.5%、ResNetで100%の精度を実現した。異なる$\alpha$を持つ猫の状態とコヒーレント状態を評価し,ある程度の一般化能力を示した。その結果、LeNetはコヒーレントな状態をコヒーレントな特徴のない猫の状態と誤って認識することがあり、ResNetは従来のニューラルネットワークによる猫の状態とコヒーレントな状態を誤って認識する問題に対して、実現可能な解決策を提供することが示された。 We applied convolutional neural networks to the classification of cat states and coherent states. Initially, we generated datasets of Schrodinger cat states and coherent states from nonlinear processes and preprocessed these datasets. Subsequently, we constructed both LeNet and ResNet network architectures, adjusting parameters such as convolution kernels and strides to optimal values. We then trained both LeNet and ResNet on the training sets. The loss function values indicated that ResNet performs better in classifying cat states and coherent states. Finally, we evaluated the trained models on the test sets, achieving an accuracy of 97.5% for LeNet and 100% for ResNet. We evaluated cat states and coherent states with different $\alpha$, demonstrating a certain degree of generalization capability. The results show that LeNet may mistakenly recognize coherent states as cat states without coherent features, while ResNet provides a feasible solution to the problem of mistakenly recognizing cat states and coherent states by traditional neural networks.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
# Edge AI:畳み込みニューラルネットワークのためのモデル圧縮手法の評価 Edge AI: Evaluation of Model Compression Techniques for Convolutional Neural Networks ( http://arxiv.org/abs/2409.02134v1 ) ライセンス: Link先を確認	Samer Francy, Raghubir Singh,	(参考訳) 本研究は,CIFAR-10データセットを用いた画像分類タスクにおけるConvNeXtモデルの圧縮手法を評価する。構造的プルーニング、非構造的プルーニング、動的量子化法を評価し、精度を維持しながらモデルサイズと計算複雑性を低減する。クラウドベースのプラットフォームとエッジデバイスで実施された実験は、これらの技術の性能を評価する。その結果, モデルサイズが著しく減少し, 構造化プルーニング技術により最大75%の削減が達成された。さらに、動的量子化はパラメータ数の最大95%の減少を達成する。微調整されたモデルでは圧縮性能が向上し、圧縮技術とともに事前訓練の利点が示された。非構造化プルーニング法は、計算複雑性が制限された精度と圧縮の傾向を示す。 OTOV3プルーニングと動的量子化の組み合わせにより圧縮性能がさらに向上し、89.7%のサイズが減少し、95%がパラメータ数とMAC数で減少し、3.8%が精度で向上した。エッジデバイスへの最終的な圧縮モデルの展開により、92.5%の精度と20ミリ秒の低推論が可能となり、実世界のエッジコンピューティングアプリケーションにおける圧縮技術の有効性が検証された。 This work evaluates the compression techniques on ConvNeXt models in image classification tasks using the CIFAR-10 dataset. Structured pruning, unstructured pruning, and dynamic quantization methods are evaluated to reduce model size and computational complexity while maintaining accuracy. The experiments, conducted on cloud-based platforms and edge device, assess the performance of these techniques. Results show significant reductions in model size, with up to 75% reduction achieved using structured pruning techniques. Additionally, dynamic quantization achieves a reduction of up to 95% in the number of parameters. Fine-tuned models exhibit improved compression performance, indicating the benefits of pre-training in conjunction with compression techniques. Unstructured pruning methods reveal trends in accuracy and compression, with limited reductions in computational complexity. The combination of OTOV3 pruning and dynamic quantization further enhances compression performance, resulting 89.7% reduction in size, 95% reduction with number of parameters and MACs, and 3.8% increase with accuracy. 
The deployment of the final compressed model on an edge device demonstrates high accuracy (92.5%) and low inference time (20 ms), validating the effectiveness of compression techniques for real-world edge computing applications.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
# 勾配型サンプリングによる並列準量子アニーリングの最適化 Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling ( http://arxiv.org/abs/2409.02135v1 ) ライセンス: Link先を確認	Yuma Ichikawa, Yamato Arai,	(参考訳) 学習ベースの手法は、問題固有のヒューリスティックを自動学習し、手作業によるヒューリスティックを不要にするため、汎用的な解法として注目を集めている。しかしながら、これらの手法はスケーラビリティに関する課題に直面することが多い。これらの問題に対処するため、離散ランゲヴィン力学を用いた組合せ最適化のための改良されたサンプリングアルゴリズム(iSCO)が提案され、いくつかの学習ベースの解法よりも優れた性能を示している。本研究は, 連続緩和による勾配に基づく更新と準量子アニーリング(QQA)を組み合わせた別のアプローチを提案する。 QQA は目的関数を、半積分解が支配する単純凸形式から、変数が 0 または 1 に制限された元の目的関数へ滑らかに遷移する。さらに、GPUを活用した並列実行通信、探索能力の向上、収束の加速も取り入れた。数値実験により,本手法は様々なベンチマーク問題に対してiSCOに匹敵する性能を達成し,競争力のある汎用解法であることが示された。特に,本手法は,iSCOや商用解法,特殊アルゴリズムと比較して,大規模インスタンスの速度と解品質のトレードオフが優れている。 Learning-based methods have gained attention as general-purpose solvers because they can automatically learn problem-specific heuristics, reducing the need for manually crafted heuristics. However, these methods often face challenges with scalability. To address these issues, the improved Sampling algorithm for Combinatorial Optimization (iSCO) using discrete Langevin dynamics has been proposed, demonstrating better performance than several learning-based solvers. This study proposes a different approach that integrates gradient-based update through continuous relaxation, combined with Quasi-Quantum Annealing (QQA). QQA smoothly transitions the objective function from a simple convex form, where half-integral solutions dominate, to the original objective function, where the variables are restricted to 0 or 1. Furthermore, we incorporate parallel run communication leveraging GPUs, enhancing exploration capabilities and accelerating convergence. Numerical experiments demonstrate that our approach is a competitive general-purpose solver, achieving comparable performance to iSCO across various benchmark problems. Notably, our method exhibits superior trade-offs between speed and solution quality for large-scale instances compared to iSCO, commercial solvers, and specialized algorithms.	
翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
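The annealing idea in the abstract above (moving from a convex surrogate dominated by half-integral solutions to the original 0/1 problem) can be illustrated with a small sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the penalty form `sum(1 - (2x - 1)^alpha)`, the linear schedule for `gamma`, the maximum-independent-set objective, and the names `penalty` and `anneal_mis` are all hypothetical. The parallel-run communication mentioned in the abstract is not modeled.

```python
import numpy as np


def penalty(x, alpha=2):
    """Assumed annealing term: zero at binary points, largest at x = 0.5."""
    return float(np.sum(1.0 - (2.0 * x - 1.0) ** alpha))


def anneal_mis(edges, n, steps=2000, lr=0.05, lam=2.0,
               gamma_start=-2.0, gamma_end=5.0, seed=0):
    """Gradient descent on a relaxed maximum-independent-set objective.

    Relaxed objective (hypothetical): -sum(x) + lam * sum_{(i,j)} x_i * x_j
    plus gamma * penalty(x). A negative gamma favors x = 0.5; a positive
    gamma pushes every coordinate toward 0 or 1.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.4, 0.6, size=n)
    for t in range(steps):
        # Linear schedule from gamma_start to gamma_end (an assumption).
        gamma = gamma_start + (gamma_end - gamma_start) * t / (steps - 1)
        grad = -np.ones(n)
        for i, j in edges:
            grad[i] += lam * x[j]
            grad[j] += lam * x[i]
        # Gradient of gamma * sum(1 - (2x - 1)^2) with respect to x.
        grad += -4.0 * gamma * (2.0 * x - 1.0)
        x = np.clip(x - lr * grad, 0.0, 1.0)
    # Final rounding guarantees a 0/1 vector even if a coordinate stalled.
    return np.round(x).astype(int)
```

Calling `anneal_mis([(0, 1), (1, 2), (2, 3)], n=4)` returns a 0/1 vector for a four-node path graph; the sketch does not guarantee that the result is a feasible or optimal independent set.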
# 大規模言語モデルと古典的機械学習:高次元タブラルデータを用いたCOVID-19死亡予測の性能 Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data ( http://arxiv.org/abs/2409.02136v1 ) ライセンス: Link先を確認	Mohammadreza Ghaffarzadeh-Esfahani, Mahdi Ghaffarzadeh-Esfahani, Arian Salahi-Niri, Hossein Toreyhi, Zahra Atf, Amirali Mohsenzadeh-Kermani, Mahshad Sarikhani, Zohreh Tajabadi, Fatemeh Shojaeian, Mohammad Hassan Bagheri, Aydin Feyzi, Mohammadamin Tarighatpayma, Narges Gazmeh, Fateme Heydari, Hossein Afshar, Amirreza Allahgholipour, Farid Alimardani, Ameneh Salehi, Naghmeh Asadimanesh, Mohammad Amin Khalafi, Hadis Shabanipour, Ali Moradi, Sajjad Hossein Zadeh, Omid Yazdani, Romina Esbati, Moozhan Maleki, Danial Samiei Nasr, Amirali Soheili, Hossein Majlesi, Saba Shahsavan, Alireza Soheilipour, Nooshin Goudarzi, Erfan Taherifard, Hamidreza Hatamabadi, Jamil S Samaan, Thomas Savage, Ankit Sakhuja, Ali Soroush, Girish Nadkarni, Ilad Alavi Darazam, Mohamad Amin Pourhoseingholi, Seyed Amir Ahmad Safavi-Naini,	(参考訳) 背景: 本研究は, 従来の機械学習モデル (CML) と大規模言語モデル (LLM) のパフォーマンスを, 高次元の表付きデータセットを用いて評価し, 比較することを目的とした。材料と方法:4つの病院で収集された9,134人の新型コロナウイルス患者のデータを分析した。 XGBoostとランダムフォレスト(RF)を含む7つのCMLモデルを訓練し,評価した。構造化データは、GPT-4やMistral-7bを含む8つのLCMでゼロショット分類のためにテキストに変換された。さらに、Mistral-7bは予測能力を高めるためにQLoRAアプローチを使用して微調整された。結果: CMLモデルでは,XGBoostとRFが最も精度が高く,F1スコアは内部検証0.87,外部検証0.83であった。 LLMカテゴリーでは、GPT-4がF1スコア0.43のトップパフォーマーであった。微調整のMistral-7bはリコールを1%から79%に改善し、F1スコアは0.74となり、外的検証では安定していた。結論: LLMはゼロショット分類において適度な性能を示すが、微調整はそれらの効果を著しく向上させ、CMLモデルに近くなる可能性がある。しかし、CMLは高次元の表データタスクにおいてLLMよりも優れています。 Background: This study aimed to evaluate and compare the performance of classical machine learning models (CMLs) and large language models (LLMs) in predicting mortality associated with COVID-19 by utilizing a high-dimensional tabular dataset. Materials and Methods: We analyzed data from 9,134 COVID-19 patients collected across four hospitals. 
Seven CML models, including XGBoost and random forest (RF), were trained and evaluated. The structured data was converted into text for zero-shot classification by eight LLMs, including GPT-4 and Mistral-7b. Additionally, Mistral-7b was fine-tuned using the QLoRA approach to enhance its predictive capabilities. Results: Among the CML models, XGBoost and RF achieved the highest accuracy, with F1 scores of 0.87 for internal validation and 0.83 for external validation. In the LLM category, GPT-4 was the top performer with an F1 score of 0.43. Fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, resulting in an F1 score of 0.74, which was stable during external validation. Conclusion: While LLMs show moderate performance in zero-shot classification, fine-tuning can significantly enhance their effectiveness, potentially aligning them closer to CML models. However, CMLs still outperform LLMs in high-dimensional tabular data tasks.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
# 分散システムテストのための強化学習におけるReward Augmentation Reward Augmentation in Reinforcement Learning for Testing Distributed Systems ( http://arxiv.org/abs/2409.02137v1 ) ライセンス: Link先を確認	Andrea Borgarelli, Constantin Enea, Rupak Majumdar, Srinidhi Nagendra,	(参考訳) 人気のある分散プロトコル実装のバグは、人気のあるインターネットサービスにおける多くのダウンタイムの源となっている。本稿では,強化学習に基づく分散プロトコル実装のためのランダム化テスト手法について述べる。自然報酬構造は非常に希少であるため、強化学習における探索の成功の鍵は報酬増強である。お互いに構築する2つの異なるテクニックを示します。まず、新しい状態の発見に基づいて崩壊する探索ボーナスを提供する -- 同じ状態が何度も訪れると、報酬は崩壊する。探索ボーナスは、新たなカバレッジポイントの優先順位付けによるカバレッジ誘導ファジィングからの直感を捉え、他のスキームとは対照的に、ボーナスの最大値とQ値の取得がより効果的な探索につながることを示す。第2に、興味深いセマンティックシナリオをキャプチャする述語列として、アルゴリズムのウェイポイントを提供する。 Waypointは、プロトコルに関するデザイナの洞察を利用して、状態空間の‘興味深い’部分に探索を誘導する。我々の報酬構造は、新しいエピソードがキャッシュを実行せずに確実に深い興味深い状態に到達できるようにします。アルゴリズムをGoで実装しました。 RedisRaft, Etcd, RSLの3つの大規模ベンチマークによる評価から, このアルゴリズムは, カバレッジとバグ発見の点で, ベースラインアプローチを著しく上回っていることが示された。 Bugs in popular distributed protocol implementations have been the source of many downtimes in popular internet services. We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning. Since the natural reward structure is very sparse, the key to successful exploration in reinforcement learning is reward augmentation. We show two different techniques that build on one another. First, we provide a decaying exploration bonus based on the discovery of new states -- the reward decays as the same state is visited multiple times. The exploration bonus captures the intuition from coverage-guided fuzzing of prioritizing new coverage points; in contrast to other schemes, we show that taking the maximum of the bonus and the Q-value leads to more effective exploration. Second, we provide waypoints to the algorithm as a sequence of predicates that capture interesting semantic scenarios. Waypoints exploit designer insight about the protocol and guide the exploration to ``interesting'' parts of the state space. 
Our reward structure ensures that new episodes can reliably get to deep interesting states even without execution caching. We have implemented our algorithm in Go. Our evaluation on three large benchmarks (RedisRaft, Etcd, and RSL) shows that our algorithm can significantly outperform baseline approaches in terms of coverage and bug finding.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
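The first technique in the abstract above, a decaying exploration bonus combined with the Q-value through a maximum, can be sketched as a tabular agent. This is only an illustration under stated assumptions: the `1 / (1 + visits)` decay, the place where the maximum is applied (the bootstrap value of the next state), and the class name `BonusQAgent` are hypothetical choices, not the authors' Go implementation. Waypoints are not modeled.

```python
from collections import defaultdict


class BonusQAgent:
    """Tabular Q-learning with a visit-count bonus (illustrative only)."""

    def __init__(self, actions, lr=0.5, discount=0.9):
        self.actions = list(actions)
        self.lr = lr
        self.discount = discount
        self.q = defaultdict(float)     # (state, action) -> Q-value
        self.visits = defaultdict(int)  # state -> number of visits

    def bonus(self, state):
        # Assumed decay: the bonus shrinks as the same state is revisited.
        return 1.0 / (1 + self.visits[state])

    def value(self, state):
        # Take the maximum of the exploration bonus and the best Q-value.
        best_q = max(self.q[(state, a)] for a in self.actions)
        return max(self.bonus(state), best_q)

    def update(self, state, action, reward, next_state):
        self.visits[next_state] += 1
        target = reward + self.discount * self.value(next_state)
        self.q[(state, action)] += self.lr * (target - self.q[(state, action)])
```

With a sparse environment reward (mostly zero), the bonus term is what initially drives the learned values, and it fades for states that are reached repeatedly.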
# 拡散モデルに基づく金融時系列記述器 A Financial Time Series Denoiser Based on Diffusion Model ( http://arxiv.org/abs/2409.02138v1 ) ライセンス: Link先を確認	Zhuohan Wang, Carmine Ventre,	(参考訳) 金融時系列はしばしば低信号対雑音比を示し、正確なデータの解釈と予測、最終的な意思決定に重大な課題を提起する。生成モデルは複雑なデータパターンをシミュレートし予測するための強力なツールとして注目され、拡散モデルは特に効果的な方法として出現している。本稿では、データ予測可能性と取引性能を向上させるために、金融時系列のデノイザとして拡散モデルを利用する新しいアプローチを提案する。条件拡散モデルの前方および逆過程を利用して、ノイズを段階的に加減して除去することにより、ノイズ入力から元のデータを再構成する。実験により,拡散モデルに基づく離散化時系列は,下流の将来の回帰分類タスクにおける性能を著しく向上することを示した。さらに、復号化データから導出される取引信号は、より利益率の高い取引を少ない取引で得ることにより、取引コストを最小化し、全体的な取引効率を向上する。最後に、復号化時系列に基づいて訓練された分類器を用いて、市場の雑音状態を認識し、過剰なリターンが得られることを示す。 Financial time series often exhibit low signal-to-noise ratio, posing significant challenges for accurate data interpretation and prediction and ultimately decision making. Generative models have gained attention as powerful tools for simulating and predicting intricate data patterns, with the diffusion model emerging as a particularly effective method. This paper introduces a novel approach utilizing the diffusion model as a denoiser for financial time series in order to improve data predictability and trading performance. By leveraging the forward and reverse processes of the conditional diffusion model to add and remove noise progressively, we reconstruct original data from noisy inputs. Our extensive experiments demonstrate that diffusion model-based denoised time series significantly enhance the performance on downstream future return classification tasks. Moreover, trading signals derived from the denoised data yield more profitable trades with fewer transactions, thereby minimizing transaction costs and increasing overall trading efficiency. Finally, we show that by using classifiers trained on denoised time series, we can recognize the noising state of the market and obtain excess return.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
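The forward (noise-adding) process mentioned in the abstract above has a standard closed form in DDPM-style diffusion models, sketched below. The linear beta schedule, the step count, and the function names are assumptions for illustration; the paper's conditional model and its trained reverse network are not reproduced, so `recover_x0` simply inverts the forward step given a noise estimate `eps_hat`.

```python
import numpy as np


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


def recover_x0(xt, eps_hat, t, alpha_bar):
    """Invert the forward step given a noise estimate eps_hat."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
```

In a denoiser, `eps_hat` would come from a trained network; passing the true noise back in, as a sanity check, recovers the original series up to floating-point error.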
# ブロックチェーン技術の発展におけるトランスフォーマーモデルの役割:システムレビュー The Role of Transformer Models in Advancing Blockchain Technology: A Systematic Review ( http://arxiv.org/abs/2409.02139v1 ) ライセンス: Link先を確認	Tianxu Liu, Yanbin Wang, Jianguo Sun, Ye Tian, Yanyu Huang, Tao Xue, Peiyue Li, Yiwei Liu,	(参考訳) ブロックチェーン技術が急速に進化するにつれて、効率性、セキュリティ、スケーラビリティの向上に対する需要が増加し、トランスフォーマーモデルは、強力なディープラーニングアーキテクチャとして、さまざまなブロックチェーン課題に対処する上で、前例のない可能性を示している。しかし、ブロックチェーンにおけるTransformerアプリケーションの体系的なレビューには欠けている。本稿では、200以上の関連論文を調査し、ブロックチェーンアプリケーションにおけるTransformerの実践事例と研究の進捗を包括的にレビューすることで、この研究ギャップを埋めることを目的としている。本調査では,異常検出,スマートコントラクトセキュリティ分析,暗号通貨の予測とトレンド分析,コード要約生成などの主要領域について検討した。さまざまなブロックチェーンドメインにわたるトランスフォーマーの進歩を明確にするために、ドメイン指向の分類システムを採用し、現在のブロックチェーン研究における大きな課題に基づいた代表的なメソッドを編成、導入しています。各研究領域について、まず、その背景と目的を紹介し、次に、従来の代表的手法をレビューし、それらの制限を分析し、最後にTransformerモデルによってもたらされた進歩を紹介します。さらに,データプライバシやモデル複雑性,リアルタイム処理要件など,Transformerを活用する上での課題についても検討する。最後に、この記事では、特定のブロックチェーンアプリケーションに適応するために、Transformerアーキテクチャを深く探求することの重要性を強調し、ブロックチェーンテクノロジの発展を促進するその役割について論じる。このレビューは、ブロックチェーン技術と機械学習の統合開発のための新しい視点と研究基盤を提供することを目的としており、ブロックチェーン技術のさらなる革新とアプリケーション拡張をサポートする。 As blockchain technology rapidly evolves, the demand for enhanced efficiency, security, and scalability grows. Transformer models, as powerful deep learning architectures, have shown unprecedented potential in addressing various blockchain challenges. However, a systematic review of Transformer applications in blockchain is lacking. This paper aims to fill this research gap by surveying over 200 relevant papers, comprehensively reviewing practical cases and research progress of Transformers in blockchain applications. Our survey covers key areas including anomaly detection, smart contract security analysis, cryptocurrency prediction and trend analysis, and code summary generation. 
To clearly articulate the advancements of Transformers across various blockchain domains, we adopt a domain-oriented classification system, organizing and introducing representative methods based on major challenges in current blockchain research. For each research domain, we first introduce its background and objectives, then review previous representative methods and analyze their limitations, and finally introduce the advancements brought by Transformer models. Furthermore, we explore the challenges of utilizing Transformers, such as data privacy, model complexity, and real-time processing requirements. Finally, this article proposes future research directions, emphasizing the importance of exploring the Transformer architecture in depth to adapt it to specific blockchain applications, and discusses its potential role in promoting the development of blockchain technology. This review aims to provide new perspectives and a research foundation for the integrated development of blockchain technology and machine learning, supporting further innovation and application expansion of blockchain technology.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
# 下水道における欠陥の同定のための自己教師付き学習 Self-Supervised Learning for Identifying Defects in Sewer Footage ( http://arxiv.org/abs/2409.02140v1 ) ライセンス: Link先を確認	Daniel Otero, Rafael Mateus,	(参考訳) 下水道インフラは、適任職員による時間集約的な手動検査を必要とする、最も高価な近代的な投資の1つである。本研究は,大量のラベル付きデータに頼ることなく,自動解法の必要性に対処する。欠陥検出のためのスケーラブルで費用対効果の高いソリューションを提供する下水道検査にSSL(Self-Supervised Learning)の新たな応用を提案する。我々は、文献にある他のアプローチの少なくとも5倍小さいモデルで競争結果を達成し、より大きなアーキテクチャでトレーニングする場合、利用可能なデータの10%で競争性能を得る。本研究は,資源制限条件下での下水道保守に革命をもたらすSSLの可能性を明らかにするものである。 Sewerage infrastructure is among the most expensive modern investments requiring time-intensive manual inspections by qualified personnel. Our study addresses the need for automated solutions without relying on large amounts of labeled data. We propose a novel application of Self-Supervised Learning (SSL) for sewer inspection that offers a scalable and cost-effective solution for defect detection. We achieve competitive results with a model that is at least 5 times smaller than other approaches found in the literature and obtain competitive performance with 10\% of the available data when training with a larger architecture. Our findings highlight the potential of SSL to revolutionize sewer maintenance in resource-limited settings.	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
# ベクトル空間におけるツール表現の効率的かつスケーラブルな評価 Efficient and Scalable Estimation of Tool Representations in Vector Space ( http://arxiv.org/abs/2409.02141v1 ) ライセンス: Link先を確認	Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, Amir Gholami,	(参考訳) 関数呼び出しとツール使用の最近の進歩は、外部情報ソースと対話し、複雑なタスクを実行することで、大きな言語モデル(LLM)の機能を大幅に強化した。しかし、LLMの限られたコンテキストウィンドウは、多数のツールが利用可能である場合の課題を示し、迅速な長さの管理と精度の維持に効率的な方法が必要である。 LLMの微調整や推論能力の活用といった既存のアプローチでは、頻繁な再トレーニングが必要か、重大な遅延オーバヘッドが発生している。より効率的なソリューションでは、高品質でドメイン固有のデータを必要とするが、より小さなモデルをトレーニングして、クエリーで最も関連性の高いツールを検索する。これらの課題に対処するために、ツール検索アプリケーションのための合成データを生成する新しいフレームワークと、小さなエンコーダモデルを用いた効率的なデータ駆動型ツール検索戦略を提案する。 LLMを活用して,実際のユーザ利用を反映した新しいツール検索データセットであるToolBankを開発した。ツール検索手法としては,(1)ツール2Vec:ツール検索のためのツール埋め込み生成,(2)ツールレフィナ:検索ツールの品質を反復的に改善するステージド検索,(3)LC:フレーミングツール検索を多ラベル分類問題として提案する。これらの新しい方法により、ToolBenchデータセット上のRecall@Kで最大27.28、ToolBank上のRecall@Kで30.5の改善を実現しています。さらに,本手法を厳格に検証するために,さらなる実験結果を示す。私たちのコードは \url{https://github.com/SqueezeAILab/Tool2Vec} で利用可能です。 Recent advancements in function calling and tool use have significantly enhanced the capabilities of large language models (LLMs) by enabling them to interact with external information sources and execute complex tasks. However, the limited context window of LLMs presents challenges when a large number of tools are available, necessitating efficient methods to manage prompt length and maintain accuracy. Existing approaches, such as fine-tuning LLMs or leveraging their reasoning capabilities, either require frequent retraining or incur significant latency overhead. A more efficient solution involves training smaller models to retrieve the most relevant tools for a given query, although this requires high quality, domain-specific data. To address those challenges, we present a novel framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models. 
Empowered by LLMs, we create ToolBank, a new tool retrieval dataset that reflects real human user usage. For tool retrieval methodologies, we propose novel approaches: (1) Tool2Vec: usage-driven tool embedding generation for tool retrieval, (2) ToolRefiner: a staged retrieval method that iteratively improves the quality of retrieved tools, and (3) MLC: framing tool retrieval as a multi-label classification problem. With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank. Additionally, we present further experimental results to rigorously validate our methods. Our code is available at \url{https://github.com/SqueezeAILab/Tool2Vec}	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
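The Recall@K metric reported in the abstract above is commonly computed per query as the fraction of relevant tools that appear among the top K retrieved items. The helper below is a minimal sketch of that usual definition; the function name and the tool names in the usage note are hypothetical, and the paper's exact evaluation protocol (for example, how scores are averaged over queries) is not stated in the abstract.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant tools found in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("relevant must contain at least one tool")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)
```

For a ranked list `["search", "calendar", "email", "weather"]` with relevant tools `{"search", "weather"}`, the value at K = 3 is 0.5 because only one of the two relevant tools is in the top three.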
# CMOB: オープンデータセット、タスク、ベースラインを備えた大規模がんマルチオミクスベンチマーク CMOB: Large-Scale Cancer Multi-Omics Benchmark with Open Datasets, Tasks, and Baselines ( http://arxiv.org/abs/2409.02143v1 ) ライセンス: Link先を確認	Ziwei Yang, Rikuto Kotoge, Zheng Chen, Xihao Piao, Yasuko Matsubara, Yasushi Sakurai,	(参考訳) 機械学習は、がんのマルチオミクス研究の分野で大きな可能性を秘めており、精度医学を進歩させる素晴らしい機会を提供している。しかし、データセットのキュレーションやタスクの定式化に関連する課題は、特に医学的背景を持たない研究者にとって大きなハードルとなる。ここでは,TCGAプラットフォームを統合した最初の大規模がんマルチオミクスベンチマークであるCMOBを紹介した。これにより,データリソースを,十分な準備や専門知識のない機械学習研究者が利用できるようにし,これまでに,32のがんをカバーする20のがんマルチオミクスデータセットのコレクションと,体系的なデータ処理パイプラインが付属する。 CMOBは、よく処理されたデータセットバージョンを提供し、4つの研究で20の有意義なタスクをサポートする。また、CMOBを2つの補完的なリソースと様々な生物学的ツールと統合し、より広範な研究の道を探り、全てのリソースは、ユーザフレンドリで互換性のある統合スクリプトでオープンアクセス可能であり、非専門家が様々なタスクにこの補完的な情報を簡単に組み込めるようにします。選択したデータセットに対して広範な実験を行い、特定のアプリケーションに適した機械学習ベースラインを推奨します。 CMOBを通じて,パーソナライズされたがん治療のための機械学習モデルの開発,検証,臨床翻訳を促進することを目的としている。 CMOBはGitHubで入手できる(\url{https://github.com/chenzRG/Cancer-Multi-Omics-Benchmark})。 Machine learning has shown great potential in the field of cancer multi-omics studies, offering incredible opportunities for advancing precision medicine. However, the challenges associated with dataset curation and task formulation pose significant hurdles, especially for researchers lacking a biomedical background. Here, we introduce the CMOB, the first large-scale cancer multi-omics benchmark that integrates the TCGA platform, making data resources accessible and usable for machine learning researchers without significant preparation and expertise. To date, CMOB includes a collection of 20 cancer multi-omics datasets covering 32 cancers, accompanied by a systematic data processing pipeline. CMOB provides well-processed dataset versions to support 20 meaningful tasks in four studies, with a collection of benchmarks. 
We also integrate CMOB with two complementary resources and various biological tools to explore broader research avenues. All resources are openly accessible with user-friendly and compatible integration scripts that enable non-experts to easily incorporate this complementary information for various tasks. We conduct extensive experiments on selected datasets to offer recommendations on suitable machine learning baselines for specific applications. Through CMOB, we aim to facilitate algorithmic advances and hasten the development, validation, and clinical translation of machine-learning models for personalized cancer treatments. CMOB is available on GitHub (\url{https://github.com/chenzRG/Cancer-Multi-Omics-Benchmark}).	翻訳日:2024-09-05 21:50:21 公開日:2024-09-02
# MOOSS:視覚強化学習におけるスムーズな状態進化のためのマスクによる時間的コントラスト学習 MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning ( http://arxiv.org/abs/2409.02714v1 ) ライセンス: Link先を確認	Jiarui Sun, M. Ugur Akcal, Wei Zhang, Girish Chowdhary,	(参考訳) 視覚強化学習(RL)では、高次元データから情報的状態表現を抽出する複雑さにより、ピクセルベースの観察から学ぶことがサンプル効率に重大な課題を生じさせる。対照的なアプローチのような従来の手法は、サンプル効率を改善するために進歩してきたが、状態の微妙な進化をモデル化するには不足している。この問題に対処するため,視覚的RLにおける状態進化を明示的にモデル化するために,グラフベースの時空間マスキングの助けを借りて時間的コントラストの目的を利用する新しいフレームワークMOOSSを導入する。具体的には,(1)空間的マスキングのための画素ベース観測のグラフ構築と,(2)時間的連続性と状態の変化を強調することで状態表現を充実させるマルチレベルコントラスト学習機構を統合した,自己監督型デュアルコンポーネント戦略を提案する。 MOOSSは、空間的時間的相関から学び、政策学習を促進することによって、状態ダイナミクスの理解を促進する。複数の連続的および離散的な制御ベンチマークに対する総合的な評価により、MOOSSはサンプル効率の観点から従来の最先端の視覚的RL法よりも優れており、本手法の有効性が示されている。私たちのコードはhttps://github.com/jsun57/MOOSS でリリースされています。 In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges on sample efficiency, primarily due to the complexity of extracting informative state representations from high-dimensional data. Previous methods such as contrastive-based approaches have made strides in improving sample efficiency but fall short in modeling the nuanced evolution of states. To address this, we introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking to explicitly model state evolution in visual RL. Specifically, we propose a self-supervised dual-component strategy that integrates (1) a graph construction of pixel-based observations for spatial-temporal masking, coupled with (2) a multi-level contrastive learning mechanism that enriches state representations by emphasizing temporal continuity and change of states. MOOSS advances the understanding of state dynamics by disrupting and learning from spatial-temporal correlations, which facilitates policy learning. 
Our comprehensive evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency, demonstrating the effectiveness of our method. Our code is released at https://github.com/jsun57/MOOSS.	翻訳日:2024-09-05 18:06:49 公開日:2024-09-02
# GET-UP: Radar Points UPsampling を用いたGEomeTric-Aware Depth Estimation GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling ( http://arxiv.org/abs/2409.02720v1 ) ライセンス: Link先を確認	Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille,	(参考訳) 深さ推定は、自動運転車において重要な役割を担い、車両の3D環境の包括的理解を促進する。レーダーは悪天候条件に対する頑丈さと距離を測定する能力を備えており、レーダーカメラの深さ推定に大きな関心を寄せている。しかし、既存のアルゴリズムは、3Dポイントを画像面に投影して画素レベルの特徴抽出を行い、レーダポイントクラウドに含まれる貴重な幾何学的情報を見渡すことによって、本質的にノイズでスパースなレーダデータを処理している。このギャップに対処するために,レーダーデータから2次元情報と3次元情報を交換・集約するために,注目度の高いグラフニューラルネットワーク(GNN)を利用するGET-UPを提案する。この手法は,2次元特徴抽出のみに依存する従来の手法と比較して,空間的関係を取り入れた特徴表現を効果的に強化する。さらに,レーダ点群を密度化し,点位置を補正し,ライダーデータに基づく付加的な3次元特徴を導出する点群アップサンプリングタスクを組み込んだ。最後に、深度推定のためのデコードフェーズにおいて、レーダとカメラの特徴を融合する。提案したGET-UPをnuScenesデータセット上でベンチマークし,従来最高のパフォーマンスモデルよりも15.3%,14.7%改善した。 Depth estimation plays a pivotal role in autonomous driving, facilitating a comprehensive understanding of the vehicle's 3D surroundings. Radar, with its robustness to adverse weather conditions and capability to measure distances, has drawn significant interest for radar-camera depth estimation. However, existing algorithms process the inherently noisy and sparse radar data by projecting 3D points onto the image plane for pixel-level feature extraction, overlooking the valuable geometric information contained within the radar point cloud. To address this gap, we propose GET-UP, leveraging attention-enhanced Graph Neural Networks (GNN) to exchange and aggregate both 2D and 3D information from radar data. This approach effectively enriches the feature representation by incorporating spatial relationships compared to traditional methods that rely only on 2D feature extraction. Furthermore, we incorporate a point cloud upsampling task to densify the radar point cloud, rectify point positions, and derive additional 3D features under the guidance of lidar data. Finally, we fuse radar and camera features during the decoding phase for depth estimation. 
We benchmark our proposed GET-UP on the nuScenes dataset, achieving state-of-the-art performance with a 15.3% and 14.7% improvement in MAE and RMSE over the previously best-performing model.	翻訳日:2024-09-05 17:55:43 公開日:2024-09-02
# ウォールストリートのドーム:ポンプとダンプの暗号操作の解析と検出 The Doge of Wall Street: Analysis and Detection of Pump and Dump Cryptocurrency Manipulations ( http://arxiv.org/abs/2105.00733v2 ) ライセンス: Link先を確認	Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa,	(参考訳) 暗号通貨はますます人気がある。専門家でない人々でさえこうした資産に投資を始めており、今日では暗号通貨取引所は月に1000億ドル以上の取引を処理する。それにもかかわらず、多くの暗号通貨は流動性が低く、市場操作を受けやすい。本稿では,インターネット上のコミュニティによって組織された2つの市場操作(ポンプとダンプと群衆ポンプ)について,詳細な分析を行う。ポンプとダンプは株式市場と同じくらい古い詐欺だ。今では、緩やかに規制された暗号通貨市場において、新たな活力を得た。高度に調整された人々のグループは、通常はTelegramとDiscordでこの詐欺を体系的に手配する。約900件の個人イベントを3年以上にわたって監視した。本報告では,ポンプ群とダンプ群に関する3症例について報告する。検証済みのポンプとダンプのユニークなデータセットを利用して、開始から25秒でポンプとダンプを検出するマシンラーニングモデルを構築し、F1スコアの94.5%の結果を達成しています。 Redditコミュニティーが、世界最大の証券取引所であるウォール街で1,900%以上値上がりした。その後、他のRedditコミュニティが暗号通貨市場での運用を複製した。ターゲットはDogeCoin(DOGE)とRipple(XRP)。これらの操作がどのように発達したのかを再構築し、標準ポンプとダンプとの相違や類似性について議論する。この研究は、暗号通貨市場に影響を与える広範な現象を理解するのに役立ちます。私たちが開発している検出アルゴリズムは、これらのイベントをリアルタイムで効果的に検出し、これらの不正行為が実行された場合に投資家が市場から離れるのを支援する。 Cryptocurrencies are increasingly popular. Even people who are not experts have started to invest in these assets, and nowadays, cryptocurrency exchanges process transactions for over 100 billion US dollars per month. Despite this, many cryptocurrencies have low liquidity and are highly prone to market manipulation. This paper performs an in-depth analysis of two market manipulations organized by communities over the Internet: the pump and dump and the crowd pump. The pump and dump scheme is a fraud as old as the stock market. Now, it has gained new vitality in the loosely regulated market of cryptocurrencies. Groups of highly coordinated people systematically arrange this scam, usually on Telegram and Discord. We monitored these groups for more than 3 years, detecting around 900 individual events. We report on three case studies related to pump and dump groups. 
We leverage our unique dataset of verified pump and dumps to build a machine learning model able to detect a pump and dump within 25 seconds of the moment it starts, achieving an F1-score of 94.5%. Then, we move on to the crowd pump, a new phenomenon that hit the news in the first months of 2021, when a Reddit community inflated the price of GameStop stock (GME) by over 1,900% on Wall Street, the world's largest stock exchange. Later, other Reddit communities replicated the operation on the cryptocurrency markets. The targets were DogeCoin (DOGE) and Ripple (XRP). We reconstruct how these operations developed and discuss differences and analogies with the standard pump and dump. We believe this study helps understand a widespread phenomenon affecting cryptocurrency markets. The detection algorithms we develop effectively detect these events in real-time and help investors stay out of the market when these frauds are in action.	翻訳日:2024-09-04 23:16:54 公開日:2024-09-02
# Token Spammers, Rug Pulls, SniperBots:EthereumおよびBinance Smart Chain(BNB)におけるTokensのエコシステムの解析 Token Spammers, Rug Pulls, and SniperBots: An Analysis of the Ecosystem of Tokens in Ethereum and in the Binance Smart Chain (BNB) ( http://arxiv.org/abs/2206.08202v3 ) ライセンス: Link先を確認	Federico Cernera, Massimo La Morgia, Alessandro Mei, Francesco Sassi,	(参考訳) 本研究では,BNB Smart ChainとEthereumブロックチェーンを2022年3月までに経時的に分析する。トークンと流動性のプールのエコシステムを調査し、両方のブロックチェーン間の類似点と相違点を強調します。トークンの約60%が1日以内でアクティブであることが分かりました。さらに、アドレスの1%が異常な数のトークンを生成する(20%から25%の間)。これらのトークンは、特定のタイプのラグプルを実行するために使い捨てトークンとして使用され、これは1日間のラグプルと呼ばれます。 BNBスマートチェーンにおけるこの操作の妥当性を両ブロックチェーンで確認し,この操作の存在を定量化する。 1日間の暴走が2億4000万ドルという利益を生み出したと見積もっている。最後に、これらの活動に関わる新しい種類のトレーダーボットであるスナイパーボットを提示し、それらの存在を検出し、ラグプル操作における活動の定量化を行う。 In this work, we perform a longitudinal analysis of the BNB Smart Chain and Ethereum blockchain from their inception to March 2022. We study the ecosystem of the tokens and liquidity pools, highlighting analogies and differences between the two blockchains. We discover that about 60% of tokens are active for less than one day. Moreover, we find that 1% of addresses create an anomalous number of tokens (between 20% and 25%). We discover that these tokens are used as disposable tokens to perform a particular type of rug pull, which we call 1-day rug pull. We quantify the presence of this operation on both blockchains discovering its prevalence on the BNB Smart Chain. We estimate that 1-day rug pulls generated $240 million in profits. Finally, we present sniper bots, a new kind of trader bot involved in these activities, and we detect their presence and quantify their activity in the rug pull operations.	翻訳日:2024-09-04 23:16:54 公開日:2024-09-02
# ダイナミックポイントクラウド幾何符号化のためのフレーム間圧縮 Inter-Frame Compression for Dynamic Point Cloud Geometry Coding ( http://arxiv.org/abs/2207.12554v2 ) ライセンス: Link先を確認	Anique Akhtar, Zhu Li, Geert Van der Auwera,	(参考訳) 仮想と混合現実、自律運転、文化遺産といったアプリケーションには、効率的なポイントクラウド圧縮が不可欠です。本稿では,動的点雲幾何圧縮のための深層学習に基づくフレーム間符号化方式を提案する。本稿では,新しい特徴空間間予測ネットワークを用いて,現在のフレームの潜在表現を前フレームで予測する,損失のある幾何学的圧縮手法を提案する。提案するネットワークは,階層型マルチスケール3次元特徴学習によるスパース畳み込みを利用して,前のフレームを用いて現在のフレームを符号化する。提案手法は,前フレームの潜在表現を現在のフレームの座標にマッピングし,現在のフレームの特徴埋め込みを予測するための,特徴領域における動き補償のための新しい予測器ネットワークを提案する。このフレームワークは、予測された特徴と実際の特徴の残余を、学習された確率的因子化エントロピーモデルを用いて圧縮することによって伝達する。受信機では、デコーダは、特徴埋め込みを段階的に再スケーリングすることにより、現在のフレームを階層的に再構築する。提案手法は,移動画像専門家グループ (MPEG) が標準化した最新技術であるビデオベースのポイントクラウド圧縮 (V-PCC) と幾何学ベースのポイントクラウド圧縮 (G-PCC) とを比較した。提案手法は,G-PCCv20Octreeに対する88%以上のBD-Rate(Bjontegaard Delta Rate)の削減,G-PCCv20 Trisoupに対する56%以上のBD-Rateの削減,V-PCCフレーム内符号化モードに対する62%以上のBD-Rateの削減,HEVCを用いたV-PCC Pフレームベースのフレーム間符号化モードに対する52%以上のBD-Rateの削減を実現する。これらの重要なパフォーマンス向上は、MPEGワーキンググループでクロスチェックされ、検証されます。 Efficient point cloud compression is essential for applications like virtual and mixed reality, autonomous driving, and cultural heritage. This paper proposes a deep learning-based inter-frame encoding scheme for dynamic point cloud geometry compression. We propose a lossy geometry compression scheme that predicts the latent representation of the current frame using the previous frame by employing a novel feature space inter-prediction network. The proposed network utilizes sparse convolutions with hierarchical multiscale 3D feature learning to encode the current frame using the previous frame. The proposed method introduces a novel predictor network for motion compensation in the feature domain to map the latent representation of the previous frame to the coordinates of the current frame to predict the current frame's feature embedding. 
The framework transmits the residual of the predicted features and the actual features by compressing them using a learned probabilistic factorized entropy model. At the receiver, the decoder hierarchically reconstructs the current frame by progressively rescaling the feature embedding. The proposed framework is compared to the state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG). The proposed method achieves more than 88% BD-Rate (Bjontegaard Delta Rate) reduction against G-PCCv20 Octree, more than 56% BD-Rate savings against G-PCCv20 Trisoup, more than 62% BD-Rate reduction against V-PCC intra-frame encoding mode, and more than 52% BD-Rate savings against V-PCC P-frame-based inter-frame encoding mode using HEVC. These significant performance gains are cross-checked and verified in the MPEG working group.	翻訳日:2024-09-04 23:16:54 公開日:2024-09-02
# 静的構造から動的構造へ:グラフに基づくディープラーニングによる結合親和性予測の改善 From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning ( http://arxiv.org/abs/2208.10230v4 ) ライセンス: Link先を確認	Yaosen Min, Ye Wei, Peizhuo Wang, Xiaoting Wang, Han Li, Nian Wu, Stefan Bauer, Shuxin Zheng, Yu Shi, Yingheng Wang, Ji Wu, Dan Zhao, Jianyang Zeng,	(参考訳) タンパク質-リガンド結合親和性の正確な予測は、構造に基づく薬物設計において重要な課題である。データ駆動型アフィニティ予測法が近年進歩しているにもかかわらず、その精度は限定的であり、これは静的結晶構造のみを利用するのに対し、実際の結合親和性はタンパク質とリガンドの間の熱力学的アンサンブルによって決定されるためである。そのような熱力学的アンサンブルを近似する効果的な方法は分子動力学(MD)シミュレーションを使用することである。そこで,3,218個のタンパク質-リガンド複合体を含むMDデータセットをキュレートし,MD軌道からタンパク質-リガンド相互作用の幾何学的特徴を学習することにより結合親和性を予測するグラフベースディープラーニングモデルDynaformerを開発した。サイリコ実験では、このモデルがCASF-2016ベンチマークデータセット上で最先端のスコアとランキングの能力を示し、報告された手法よりも優れていた。さらに、Dynaformerを用いた熱ショックタンパク質90(HSP90)の仮想スクリーニングにおいて、20の候補を同定し、それらの結合親和性をさらに実験的に検証する。ダイナフォーマーは、仮想薬物スクリーニングの有望な結果を示し、新しい足場を含む12のヒット化合物(2つはマイクロモルの範囲内)を明らかにした。これらの結果は、この手法が初期の薬物発見プロセスの加速に有望な道を提供することを示した。 Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures while the actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset containing 3,218 different protein-ligand complexes is curated, and Dynaformer, a graph-based deep learning model is further developed to predict the binding affinities by learning the geometric characteristics of the protein-ligand interactions from the MD trajectories. In silico experiments demonstrated that the model exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset, outperforming the methods hitherto reported. 
Moreover, in a virtual screening on heat shock protein 90 (HSP90) using Dynaformer, 20 candidates are identified and their binding affinities are further experimentally validated. Dynaformer displayed promising results in virtual drug screening, revealing 12 hit compounds (two are in the submicromolar range), including several novel scaffolds. Overall, these results demonstrated that the approach offers a promising avenue for accelerating the early drug discovery process.	翻訳日:2024-09-04 23:16:54 公開日:2024-09-02
# PDE数値解におけるブレンディングニューラル演算子と緩和法 Blending Neural Operators and Relaxation Methods in PDE Numerical Solvers ( http://arxiv.org/abs/2208.13273v2 ) ライセンス: Link先を確認	Enrui Zhang, Adar Kahana, Alena Kopaničáková, Eli Turkel, Rishikesh Ranade, Jay Pathak, George Em Karniadakis,	(参考訳) ニューラルネットワークは、関数の高周波成分を表現するのが難しいスペクトルバイアスに悩まされ、緩和法は高周波数を効率的に解けるが、中程度の周波数から低い周波数で停滞する。この2つの手法の弱点を相乗的に組み合わせて、偏微分方程式(PDE)の高速数値解法を大規模に開発する。具体的には,Deep Operator Network(DeepONet)と標準緩和手法を統合したハイブリッド,反復,数値,転移可能な解法であるHINTSを提案する。 HINTSは、DeepONetのスペクトルバイアスを利用して固有モードのスペクトル間の収束挙動をバランスさせ、その結果、一様収束率と、ハイブリッドソルバ全体の例外的な性能をもたらす。さらに、HINTSは大規模多次元システムに適用され、離散化、計算領域、境界条件に関して柔軟である。 Neural networks suffer from spectral bias, having difficulty representing the high-frequency components of a function, while relaxation methods can resolve high frequencies efficiently but stall at moderate to low frequencies. We exploit the complementary weaknesses of the two approaches by combining them synergistically to develop a fast numerical solver of partial differential equations (PDEs) at scale. Specifically, we propose HINTS, a hybrid, iterative, numerical, and transferable solver by integrating a Deep Operator Network (DeepONet) with standard relaxation methods, leading to parallel efficiency and algorithmic scalability for a wide class of PDEs, not tractable with existing monolithic solvers. HINTS balances the convergence behavior across the spectrum of eigenmodes by utilizing the spectral bias of DeepONet, resulting in a uniform convergence rate and hence exceptional performance of the hybrid solver overall. Moreover, HINTS applies to large-scale, multidimensional systems and is flexible with regard to discretizations, computational domain, and boundary conditions.	翻訳日:2024-09-04 23:16:54 公開日:2024-09-02
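The complementarity described in the HINTS abstract above can be illustrated on the relaxation side alone. The following is a minimal sketch, not the authors' implementation: it applies damped Jacobi (one standard relaxation method) to a 1D Poisson-type system with zero boundary values and measures how much of a single sine error mode survives a few sweeps. The grid size, the damping factor `omega = 2/3`, the sweep count, and all function names are assumptions chosen for illustration, and no DeepONet is involved.

```python
import math


def damped_jacobi_sweep(u, f, omega=2.0 / 3.0):
    """One damped-Jacobi sweep for 2*u[i] - u[i-1] - u[i+1] = f[i], zero boundaries."""
    n = len(u)
    new_u = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        jacobi_value = 0.5 * (left + right + f[i])
        # Blend the old value with the plain Jacobi update.
        new_u.append((1.0 - omega) * u[i] + omega * jacobi_value)
    return new_u


def error_after_sweeps(mode, n=31, sweeps=10):
    """Fraction of a sine error mode (max norm) left after `sweeps` sweeps.

    With f = 0 the exact solution is zero, so the iterate itself is the error.
    """
    u = [math.sin((i + 1) * mode * math.pi / (n + 1)) for i in range(n)]
    f = [0.0] * n
    start = max(abs(v) for v in u)
    for _ in range(sweeps):
        u = damped_jacobi_sweep(u, f)
    return max(abs(v) for v in u) / start


if __name__ == "__main__":
    print("lowest-frequency mode remaining:", error_after_sweeps(mode=1))
    print("highest-frequency mode remaining:", error_after_sweeps(mode=31))
```

Under these assumptions the lowest-frequency mode is barely reduced while the highest-frequency mode is almost eliminated. That stall at low frequencies is what a low-frequency-biased neural operator is meant to offset in a hybrid solver.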
# 重量変動潜在因果モデルの同定 Identifying Weight-Variant Latent Causal Models ( http://arxiv.org/abs/2208.14153v6 ) ライセンス: Link先を確認	Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi,	(参考訳) 因果表現学習の課題は、下位レベルの観察に影響を与える潜在的な上位の因果表現を明らかにすることである。しかし、観測データから真の潜伏因果関係を同定する一方で、潜伏変数間の即時因果関係を許容することは依然として困難である。この目的のために、推移性、置換不確定性、スケール不確定性の3つの観測から潜在空間を同定する3つの本質的性質の分析から始める。推移性は潜在因果表現の識別性を阻害する重要な役割を担っている。推移性に起因する同定不可能な問題に対処するため,基礎となる潜在因果モデルが線形-ガウスモデルを満たす新たな識別可能性条件を導入し,因果係数とガウス雑音の分布を追加の観測変数で変調する。いくつかの軽微な仮定の下では、潜伏因果表現が自明な置換とスケーリングまで特定可能であることを示すことができる。さらに、この理論結果に基づいて、潜時因果変数から観測された因果変数へのマッピングとともに、潜時因果表現と因果関係を直接学習する構造的caUsAl変分自動エンコーダを提案する。提案手法は, 漸近的に真のパラメータを学習することを示す。合成および実データを用いた実験結果から,潜在因果表現の学習における識別性と一貫性,および提案手法の有効性が示された。 The task of causal representation learning aims to uncover latent higher-level causal representations that affect lower-level observations. Identifying true latent causal representations from observed data, while allowing instantaneous causal relations among latent variables, remains a challenge, however. To this end, we start from the analysis of three intrinsic properties in identifying latent space from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity acts as a key role in impeding the identifiability of latent causal representations. To address the unidentifiable issue due to transitivity, we introduce a novel identifiability condition where the underlying latent causal model satisfies a linear-Gaussian model, in which the causal coefficients and the distribution of Gaussian noise are modulated by an additional observed variable. Under some mild assumptions, we can show that the latent causal representations can be identified up to trivial permutation and scaling. 
Furthermore, based on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them, together with the mapping from the latent causal variables to the observed ones. We show that the proposed method learns the true parameters asymptotically. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of the proposed method in learning latent causal representations.	翻訳日:2024-09-04 23:05:43 公開日:2024-09-02
# NFTのゲーム:EthereumブロックチェーンにおけるNFTウォッシュトレーディングの特徴 A Game of NFTs: Characterizing NFT Wash Trading in the Ethereum Blockchain ( http://arxiv.org/abs/2212.01225v3 ) ライセンス: Link先を確認	Massimo La Morgia, Alessandro Mei, Alberto Maria Mongardini, Eugenio Nerio Nemmi,	(参考訳) EthereumブロックチェーンのNon-Fungible Token(NFT)市場は2021年に爆発的な成長を遂げ、2022年1月には月間貿易額が60億ドルに達した。しかし、ある当事者がNFTを取引してそのボリュームを人工的に膨らませる市場操作の形で、洗剤取引の可能性に関する懸念が浮上している。本研究は, イーサリアムのNFT市場における洗剤取引が2022年1月までに及ぼす影響を, 複数のアプローチを用いて検討した。洗濯物取引は全NFTコレクションの5.66%に影響を及ぼし、総人工体積は3,406,110,774米ドルである。我々は、NFTの価格を人工的に上昇させ、一部の市場が提供するトークン報酬システムを活用するという、洗浄取引から利益を得るための2つの方法を検討します。以上の結果から,NFTMのトークン報酬システムの利用は,LooksRare上では1.055M以上の利益が得られ,成功率(運用の80%以上)が高く,洗剤取引で高い価格でNFTを再販売するよりもリスクが低い(活動の50%が損なわれる)ことが示唆された。我々の研究は、Ethereumでは洗剤取引が頻繁に行われており、NFTMはそのような不正行為を防ぐために保護機構を実装するべきであることを強調している。 The Non-Fungible Token (NFT) market in the Ethereum blockchain experienced explosive growth in 2021, with a monthly trade volume reaching \$6 billion in January 2022. However, concerns have emerged about possible wash trading, a form of market manipulation in which one party repeatedly trades an NFT to inflate its volume artificially. Our research examines the effects of wash trading on the NFT market in Ethereum from the beginning until January 2022, using multiple approaches. We find that wash trading affects 5.66% of all NFT collections, with a total artificial volume of \$3,406,110,774. We look at two ways to profit from wash trading: Artificially increasing the price of the NFT and taking advantage of the token reward systems provided by some marketplaces. Our findings show that exploiting the token reward systems of NFTMs is much more profitable (mean gain of successful operations is \$1.055M on LooksRare), more likely to succeed (more than 80% of operations), and less risky than reselling an NFT at a higher price using wash trading (50% of activities result in a loss). 
Our research highlights that wash trading is frequent in Ethereum and that NFTMs should implement protective mechanisms to stop such illicit behavior.	翻訳日:2024-09-04 23:05:43 公開日:2024-09-02
# 研究成果の自動チェックを支援する多言語ツールキット A multi-language toolkit for supporting automated checking of research outputs ( http://arxiv.org/abs/2212.02935v2 ) ライセンス: Link先を確認	Richard J. Preen, Maha Albashir, Simon Davy, Jim Smith,	(参考訳) 本稿では、研究者が分析を行う際に、最良の原則に基づく統計開示制御(SDC)技術を自動的に適用することにより、研究者やデータガバナンスチームを支援する研究成果自動チェックパッケージアクロを提案する。 acroは、公開が安全な研究出力、さらなる分析を必要とする出力、プライベートデータを開示する重大なリスクを生じるため公開できない出力の2つを区別する。これは、テーブル、プロット、統計モデルなどの出力を生成するよく知られた分析ツールの上に置かれる軽量Pythonラッパーを使用することで実現される。これにより機能が追加される。 i) 一般的に使用される開示試験の範囲に対して潜在的な開示出力を識別すること。二必要なときは開示緩和戦略を適用すること。三) SDCの適用理由の報告及び (4)信頼性のある研究環境のスタッフがワークフローの合理化に利用できる簡単な要約文書を作成する。研究者が使用する主要な分析プログラミング言語は、Python、R、Staである。 acroコードとドキュメントはMITライセンスでhttps://github.com/AI-SDC/ACROで公開されている。 This article presents the automatic checking of research outputs package acro, which assists researchers and data governance teams by automatically applying best-practice principles-based statistical disclosure control (SDC) techniques on-the-fly as researchers conduct their analyses. acro distinguishes between: research output that is safe to publish; output that requires further analysis; and output that cannot be published because it creates substantial risk of disclosing private data. This is achieved through the use of a lightweight Python wrapper that sits over well-known analysis tools that produce outputs such as tables, plots, and statistical models. This adds functionality to (i) identify potentially disclosive outputs against a range of commonly used disclosure tests; (ii) apply disclosure mitigation strategies where required; (iii) report reasons for applying SDC; and (iv) produce simple summary documents trusted research environment staff can use to streamline their workflow. The major analytical programming languages used by researchers are supported: Python, R, and Stata. The acro code and documentation are available under an MIT license at https://github.com/AI-SDC/ACRO	翻訳日:2024-09-04 23:05:43 公開日:2024-09-02
# デクラミブリングによるニューラルネットワーク説明可能性の限界について On the limits of neural network explainability via descrambling ( http://arxiv.org/abs/2301.07820v3 ) ライセンス: Link先を確認	Shashank Sule, Richard G. Spencer, Wojciech Czaja,	(参考訳) トレーニングニューラルネットワーク(NN)の完全連結層を説明する数学的モデルとして,ニューラルネットワークデクラムブリングの正確な解を特徴付ける。グラフマッチングと複雑性理論に起因したブロケット関数の最小化に問題を再構成することにより、隠蔽層プレアクティベーションの主成分が層重みの最適説明やデクランブラーとして特徴付けられることを示す。典型的なディープラーニングでは,(1) 最大主成分と等方的隠蔽データに対するフーリエ基底の最低周波数モードとの整合性,(2) 信号回復問題のための2層線形NNにおける意味発達の発見,(3) ニューロンを最適に置換することでCNNを説明すること,などが特徴である。数値実験により,隠蔽層データの固有分解がデクランブラーとして理解されていることが示唆された。これらの結果から,SVDは従来考えられていたよりもNNの説明可能性と直接的に関連し,特に操作者学習や物理インフォームドNNの文脈において,NNの隠蔽動作に対する解釈可能なモチーフを発見するための有望な手段を提供する。 We characterize the exact solutions to neural network descrambling--a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem to the minimization of the Brockett function arising in graph matching and complexity theory we show that the principal components of the hidden layer preactivations can be characterized as the optimal explainers or descramblers for the layer weights, leading to descrambled weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms including (1) matching largest principal components with the lowest frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden layer data--now understood as the descramblers--can also reveal the layer's underlying transformation. 
These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought and offers a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in contexts of operator learning or physics-informed NNs, where the input/output data has limited human readability.	翻訳日:2024-09-04 23:05:43 公開日:2024-09-02
# シーン理解のための構造化生成モデル Structured Generative Models for Scene Understanding ( http://arxiv.org/abs/2302.03531v2 ) ライセンス: Link先を確認	Christopher K. I. Williams,	(参考訳) 本稿では静的なシーンの理解に構造化生成モデル (structured generative model, SGM) を用いることを論じる。これは、入力画像(またはマルチビュー画像の集合)から3Dシーンを再構成することを必要とし、画像の内容は、シーンライティングやカメラパラメータなどのグローバル変数とともに、それぞれ独自のタイプ、形状、外観、ポーズを持つインスタンス化されたオブジェクトのモデルによって因果的に説明される。このアプローチはまた、シーン内のオブジェクトの共起と相互関係を説明するシーンモデルを必要とする。 SGMのアプローチは、それが構成的かつ生成的であり、解釈可能性と編集可能性をもたらすという利点がある。 SGMアジェンダを追求するには、オブジェクトやシーンのモデルと、推論を実行するためのアプローチが必要です。まず,「things」(明確な形状を持つ物体カテゴリ)と「stuff」(不定形な空間的広がりを持つカテゴリ)を含むオブジェクトのモデルについて検討する。次に、オブジェクトの相互関係を記述するシーンモデル (scene model) をレビューします。おそらくSGMの最も難しい問題は、1つまたは複数の画像からなる入力から、オブジェクト、照明とカメラパラメータ、およびシーンの相互関係を推論することである。我々は、SGMアジェンダを前進させるために対処する必要がある問題についての議論で締めくくります。 This position paper argues for the use of structured generative models (SGMs) for the understanding of static scenes. This requires the reconstruction of a 3D scene from an input image (or a set of multi-view images), whereby the contents of the image(s) are causally explained in terms of models of instantiated objects, each with their own type, shape, appearance and pose, along with global variables like scene lighting and camera parameters. This approach also requires scene models which account for the co-occurrences and inter-relationships of objects in a scene. The SGM approach has the merits that it is compositional and generative, which lead to interpretability and editability. To pursue the SGM agenda, we need models for objects and scenes, and approaches to carry out inference. We first review models for objects, which include "things" (object categories that have a well defined shape), and "stuff" (categories which have amorphous spatial extent). We then move on to review scene models which describe the inter-relationships of objects. 
Perhaps the most challenging problem for SGMs is inference of the objects, lighting and camera parameters, and scene inter-relationships from input consisting of one or more images. We conclude with a discussion of issues that need addressing to advance the SGM agenda.	翻訳日:2024-09-04 23:05:43 公開日:2024-09-02
# ACE、ジェネリック制約解決器 ACE, a generic constraint solver ( http://arxiv.org/abs/2302.05405v2 ) ライセンス: Link先を確認	Christophe Lecoutre,	(参考訳) 制約プログラミング(CP)は、組合せ制約問題のモデル化と解決に有用な技術である。一方、PyCSP3のようなライブラリを使って、さまざまなアプリケーション分野(スケジューリング、計画、データマイニング、暗号、バイオインフォマティクス、有機化学など)で発生する問題をモデル化することができる。問題インスタンスは特定のモデルやデータから直接生成できる。一方、インスタンス(特にXCSP3フォーマットで表現される)の解決には、本稿で紹介するACEのような制約解決器を使用できます。 ACEは、Javaで開発されたオープンソースの制約解決ツールで、整数変数(0/1-ブール変数を含む)、最先端のテーブル制約、一般的なグローバル制約、検索ヒューリスティックス、(単一基準)最適化に焦点を当てている。 Constraint Programming (CP) is a useful technology for modeling and solving combinatorial constrained problems. On the one hand, one can use a library like PyCSP3 for easily modeling problems arising in various application fields (e.g., scheduling, planning, data-mining, cryptography, bio-informatics, organic chemistry, etc.). Problem instances can then be directly generated from specific models and data. On the other hand, for solving instances (notably, represented in XCSP3 format), one can use a constraint solver like ACE, which is presented in this paper. ACE is an open-source constraint solver, developed in Java, which focuses on integer variables (including 0/1-Boolean variables), state-of-the-art table constraints, popular global constraints, search heuristics and (mono-criterion) optimization.	翻訳日:2024-09-04 23:05:43 公開日:2024-09-02
# 量子格子系における自由核子による全固有状態熱化 Full Eigenstate Thermalization via Free Cumulants in Quantum Lattice Systems ( http://arxiv.org/abs/2303.00713v4 ) ライセンス: Link先を確認	Silvia Pappalardi, Felix Fritzsch, Tomaž Prosen,	(参考訳) ETH(Eigenstate-Thermalization-Hypothesis)は、量子統計力学を理解するための一般的な枠組みとして確立されている。マトリックス要素間の高次相関を考慮に入れ、自由確率の言語を用いて理論的に理性化できるような、いわゆる完全ETH(Full ETH)に注目が集まっているのはつい最近である。本研究では,高次相関器を局所演算子に分解する実験により,物理多体系における全ETHの局所相互作用に関する最初の数値計算を行った。我々は、スピンチェーンハミルトニアンとフロケブリックワークユニタリ回路という、局所的非可積分(カオス)量子多体系の2つのクラスで正確な対角化を行う。 ETH が予測した 4 次自由累積において, 4 時間相関関数のダイナミクスが符号化されていることを示す。周波数への依存は、局所的な多体系の物理的性質を符号化し、ランダム行列の非構造的、回転不変なアンサンブルと区別する。 The Eigenstate-Thermalization-Hypothesis (ETH) has been established as the general framework to understand quantum statistical mechanics. Only recently has the attention been paid to so-called full ETH, which accounts for higher-order correlations among matrix elements, and that can be rationalized theoretically using the language of Free Probability. In this work, we perform the first numerical investigation of the full ETH in physical many-body systems with local interactions by testing the decomposition of higher-order correlators into thermal free cumulants for local operators. We perform exact diagonalization on two classes of local non-integrable (chaotic) quantum many-body systems: spin chain Hamiltonians and Floquet brickwork unitary circuits. We show that the dynamics of four-time correlation functions are encoded in fourth-order free cumulants, as predicted by ETH. Their dependence on frequency encodes the physical properties of local many-body systems and distinguishes them from structureless, rotationally invariant ensembles of random matrices.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# イベントストリームにおける時空間表現学習のためのイベントボクセルセット変換器 Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams ( http://arxiv.org/abs/2303.03856v3 ) ライセンス: Link先を確認	Bochen Xie, Yongjian Deng, Zhanpeng Shao, Qingsong Xu, Youfu Li,	(参考訳) イベントカメラは、シーンをスパースで非同期なイベントストリームとして記録するニューロモルフィックな視覚センサである。イベントベースのほとんどの手法はイベントを高密度のフレームに投影し、従来の視覚モデルを用いて処理する。最近のトレンドは、スパース表現を学習することで効率的なイベント処理を実現するポイントベースのネットワークを開発することである。しかし、既存の作業には、ロバストなローカル情報アグリゲータと効果的な機能インタラクション操作が欠けているため、モデリング能力が制限される可能性がある。そこで本研究では,イベントストリーム上での時空間表現学習を効率的に行うために,イベントVoxel Set Transformer (EVSTr) という注意型モデルを提案する。まずイベントストリームをボクセル集合に変換し、次に階層的にボクセル特徴を集約してロバスト表現を得る。 EVSTrの中核は、ローカル情報アグリゲーションのためのMNEL(Multi-Scale Neighbor Embedding Layer)とグローバル機能インタラクションのためのVSAL(Voxel Self-Attention Layer)という2つのよく設計されたコンポーネントで構成されるイベントボクセルトランスフォーマーエンコーダである。長範囲の時間構造を組み込むためにネットワークを導入し、セグメント化されたボクセル集合のシーケンスから動作パターンを学習するためのセグメントモデリング戦略(S$^{2}$TM)を導入する。提案手法はオブジェクト分類と動作認識を含む2つの認識タスクに基づいて評価される。説得力のあるモデル評価を行うため,挑戦的なシナリオに記録された新しいイベントベースの行動認識データセット(NeuroHAR)を提案する。総合的な実験によると、EVSTrは低モデルの複雑さを維持しながら最先端のパフォーマンスを達成する。 Event cameras are neuromorphic vision sensors that record a scene as sparse and asynchronous event streams. Most event-based methods project events into dense frames and process them using conventional vision models, resulting in high computational complexity. A recent trend is to develop point-based networks that achieve efficient event processing by learning sparse representations. However, existing works may lack robust local information aggregators and effective feature interaction operations, thus limiting their modeling capabilities. To this end, we propose an attention-aware model named Event Voxel Set Transformer (EVSTr) for efficient spatiotemporal representation learning on event streams. It first converts the event stream into voxel sets and then hierarchically aggregates voxel features to obtain robust representations. 
The core of EVSTr is an event voxel transformer encoder that consists of two components: the Multi-Scale Neighbor Embedding Layer (MNEL) for local information aggregation and the Voxel Self-Attention Layer (VSAL) for global feature interaction. To enable the network to incorporate long-range temporal structure, we introduce a segment modeling strategy (S$^{2}$TM) to learn motion patterns from a sequence of segmented voxel sets. The proposed model is evaluated on two recognition tasks: object classification and action recognition. To provide a convincing model evaluation, we present a new event-based action recognition dataset (NeuroHAR) recorded in challenging scenarios. Comprehensive experiments show that EVSTr achieves state-of-the-art performance while maintaining low model complexity.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# 幅広から深部まで:パラメータ効率の知識グラフ埋め込みのための次元リフティングネットワーク From Wide to Deep: Dimension Lifting Network for Parameter-efficient Knowledge Graph Embedding ( http://arxiv.org/abs/2303.12816v4 ) ライセンス: Link先を確認	Borui Cai, Yong Xiang, Longxiang Gao, Di Wu, He Zhang, Jiong Jin, Tom Luan,	(参考訳) エンティティと関係をベクトル表現にマッピングする知識グラフ埋め込み(KGE)は下流アプリケーションに不可欠である。従来のKGE法は知識グラフの複雑な構造を学ぶために高次元表現を必要とするが、大きすぎるモデルパラメータをもたらす。近年、低次元の実体表現によるパラメータの削減や、縮小次元を補う技術(例えば、知識蒸留や再発明された表現形式)の開発が進んでいる。しかし、そのような演算は複雑な計算やモデル設計を導入しており、これは大きな知識グラフの恩恵を受けない可能性がある。従来のKGEモデルのパラメータ効率を改善するための簡単な戦略を探るため、より深いニューラルネットワークは、構成構造のためのより広いネットワークに匹敵する表現性を達成するために指数的に少ないパラメータを必要とする。我々は、すべての実体表現を単層埋め込みネットワークとみなし、高次元の実体表現を採用する従来のKGE法は、埋め込みネットワークを均等に拡張して表現性を得る。パラメータ効率を達成するために、我々はエンティティ表現のためのより深い埋め込みネットワーク、すなわち、狭いエンティティ埋め込み層と多層次元リフトネットワーク(LiftNet)を提案する。 3つの公開データセットの実験により、従来の4つのKGEメソッドと16次元表現を統合することで、512次元表現を採用したオリジナルのモデルと同等のリンク予測精度を実現し、68.4%から96.9%のパラメータを節約した。 Knowledge graph embedding (KGE) that maps entities and relations into vector representations is essential for downstream applications. Conventional KGE methods require high-dimensional representations to learn the complex structure of knowledge graph, but lead to oversized model parameters. Recent advances reduce parameters by low-dimensional entity representations, while developing techniques (e.g., knowledge distillation or reinvented representation forms) to compensate for reduced dimension. However, such operations introduce complicated computations and model designs that may not benefit large knowledge graphs. To seek a simple strategy to improve the parameter efficiency of conventional KGE models, we take inspiration from that deeper neural networks require exponentially fewer parameters to achieve expressiveness comparable to wider networks for compositional structures. 
We view all entity representations as a single-layer embedding network, so conventional KGE methods that adopt high-dimensional entity representations amount to widening the embedding network to gain expressiveness. To achieve parameter efficiency, we instead propose a deeper embedding network for entity representations, i.e., a narrow entity embedding layer plus a multi-layer dimension lifting network (LiftNet). Experiments on three public datasets show that by integrating LiftNet, four conventional KGE methods with 16-dimensional representations achieve link prediction accuracy comparable to that of the original models that adopt 512-dimensional representations, saving 68.4% to 96.9% of the parameters.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# 深層学習モデル変換器の故障とリスクの分析:ONNXエコシステムを事例として Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem ( http://arxiv.org/abs/2303.17708v4 ) ライセンス: Link先を確認	Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis,	(参考訳) ソフトウェアエンジニアは、さまざまな開発フレームワークとランタイム環境を使用して、ディープラーニング(DL)モデルを開発、微調整、デプロイします。 DLモデルコンバータは、フレームワークとランタイム環境の間でモデルを移動します。変換エラーによってモデルの品質が損なわれ、デプロイメントが破壊される。しかし、DLモデルコンバータの故障特性は不明であり、DLインターオペラビリティ技術を使用する場合のリスクが増大する。本稿では,DLモデルコンバータの故障解析を行う。我々は,DL相互運用性ツール,ユースケース,痛点(N=92)について,ソフトウェアエンジニアを調査した。次に、メインの相互運用性ツールであるONNX(PyTorchとTensorFlowのN=200問題)に関連するモデルコンバータの障害を特徴付ける。最後に、我々が研究した失敗の構造的原因に関する2つの仮説を定式化し、検証した。モデル変換器のノード変換段階が欠陥の約75%を占め、報告された障害の33%が意味的に誤りのあるモデルと関連していることがわかった。意味的に不正確なモデルの原因は解明されているが、振る舞いの不整合のあるモデルは演算子シーケンスを共有する。我々の成果は、DLインターオペラビリティソフトウェアをメンテナンス、拡張、検証をより簡単にするための将来の研究を動機付けています。行動寛容とアーキテクチャカバレッジメトリクスの研究は実りあるかもしれない。 Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of the defects and that 33% of reported failures are related to semantically incorrect models. 
The cause of semantically incorrect models is elusive, but models with behaviour inconsistencies share operator sequences. Our results motivate future research on making DL interoperability software simpler to maintain, extend, and validate. Research into behavioural tolerances and architectural coverage metrics could be fruitful.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# アクセス不能情報の理論 A Theory of Inaccessible Information ( http://arxiv.org/abs/2305.05734v2 ) ライセンス: Link先を確認	Jacopo Surace,	(参考訳) 実験的に世界を探索する能力に根本的な限界があったら、どうなるでしょう? この研究では、この問題を真剣に検討する。真理値が実験的にアクセスできないステートメントが存在すると仮定する。つまり、理論上でさえ、これらの文が真か偽かを直接検査する方法は存在しない。さらに、実験的にアクセス可能なステートメントが一定数の到達不能ステートメントの和となる理論を発展させる。例えば、文 "a" と "b" の真理値にはアクセスできないが、文 "a or b" の真理値にはアクセスできない。確率論を直接仮定するのではなく、実験的にアクセシブルで到達不能なステートメントを排他的に定義し、古典論理の規則を用いてこれらの概念を構築する。興味深い構造が現れる。この理論を発展させ、論理構造を確率論的に緩和し、「到達不能情報の理論」と呼ばれる構造に富んだ理論を得る。驚いたことに、到達不能情報の理論の最も単純なモデルは量子力学における量子ビットである。この理論の構築の道筋に沿って、我々は「アクセシビリティ対策」と呼ぶ乗法的情報尺度の族を特徴づけ、研究する。 What would be the consequences if there were fundamental limits to our ability to experimentally explore the world? In this work we seriously consider this question. We assume the existence of statements whose truth value is not experimentally accessible. That is, there is no way, not even in theory, to directly test if these statements are true or false. We further develop a theory in which experimentally accessible statements are a union of a fixed minimum number of inaccessible statements. For example, the value of truth of the statements "a" and "b" is not accessible, but the value of truth of the statement "a or b" is accessible. We do not directly assume probability theory, we exclusively define experimentally accessible and inaccessible statements and build on these notions using the rules of classical logic. We find that an interesting structure emerges. Developing this theory, we relax the logical structure to a probabilistic one, obtaining a theory rich in structure that we call "theory of inaccessible information". Surprisingly, the simplest model of theory of inaccessible information is the qubit in quantum mechanics. Along the path for the construction of this theory, we characterise and study a family of multiplicative information measures that we call "inaccessibility measures".	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# 量子鍵分布における状態遮断側チャネル攻撃と自律的故障検出 State-Blocking Side-Channel Attacks and Autonomous Fault Detection in Quantum Key Distribution ( http://arxiv.org/abs/2305.18006v3 ) ライセンス: Link先を確認	Matt Young, Marco Lucamarini, Stefano Pirandola,	(参考訳) サイドチャネル攻撃により、盗聴者(Eve)はQKDシステムの実践的な実装における脆弱性を利用することで、完全な実装を前提とするセキュリティ証明では考慮されていない利点を得ることができる。そこで本研究では,現在進行中のQKDセッションにおいて,このような攻撃を自律的に検出する手法と,検出速度の限界について検討する。サイドチャネルの能力は非常に一般的で、攻撃自体の様々な実装をカバーしている。本稿では、Alice と Bob が、現在進行中のサイドチャネル攻撃にかかわらず、検知が完了すると、QKD システムの使用を継続するための対策を講じる方法について述べる。これにより、重要インフラにおいて深刻なリスクとなり得るQKDシステムのダウンタイムが防止される。次に、Eveのサイドチャネル機能を拡張し、修正された攻撃戦略を示す。この強化された攻撃は、我々のスキームによって特定条件下で検出できるが、Eveがパラメータを賢く選択すれば、強化された攻撃は検出されない。このことから,プライバシ増幅(Privacy Amplification)に対する意味や,QKD全体のセキュリティについて論じる。最後に、これらのタイプの攻撃が、QKDシステム内の特定の種類の障害とどのように類似しているか、また、我々の検出方法がこれらの障害を検出する方法、そして、どのようにしてQKDの実装に自律的な障害検出と冗長性を付加するかについて考察する。 Side-channel attacks allow an eavesdropper (Eve) to use insecurities in the practical implementation of QKD systems to gain an advantage that is not considered by security proofs that assume perfect implementations. In this work we specify a side-channel capability for Eve that has yet to be considered, before then going on to discuss a scheme to autonomously detect such an attack during an ongoing QKD session, and the limits as to how fast a detection can be made. The side-channel capability is very general and covers a wide variety of possible implementations for the attack itself. We present how Alice and Bob can put in place a countermeasure to continue using the QKD system, once a detection is made, regardless of the ongoing side-channel attack. This prevents downtime of QKD systems, which in critical infrastructure could pose severe risks. We then extend Eve's side-channel capability and present a modified attack strategy. This strengthened attack can be detected under certain conditions by our scheme; however, intelligent parameter choices by Eve allow her strengthened attack to go undetected. 
From this, we discuss the implications this has on Privacy Amplification, and therefore on the security of QKD as a whole. Finally, consideration is given as to how these types of attacks are analogous to certain types of faults in the QKD system, how our detection scheme can also detect these faults, and therefore how this adds autonomous fault detection and redundancy to implementations of QKD.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# 量子温度測定における精度と範囲拡張のための混合熱コヒーレント状態 Mixing thermal coherent states for precision and range enhancement in quantum thermometry ( http://arxiv.org/abs/2306.04369v4 ) ライセンス: Link先を確認	Asghar Ullah, M. Tahir Naseem, Özgür E. Müstecaplıoğlu,	(参考訳) 熱環境と量子システムの間の避けられない相互作用は、典型的には量子コヒーレンスを低下させ、貯水池工学によって抗うことができる。共振器に縦に結合した2レベルシステムと熱浴を結合させることにより, 熱コヒーレント状態の特別な混合を実現することを提案する。共振器の状態は2つの対向した熱コヒーレント状態の特別な混合であるのに対して、2レベル系は熱のままである。この観測は、共振器状態の2次相関係数を評価して検証する。さらに, 共振器の熱コヒーレント状態の混合を利用した量子温度測定の利点を明らかにする。この文脈では、共振器は2レベルシステムによって媒介される浴槽の未知温度を測定するプローブとして機能し、両者の接続を戦略的にブリッジする。以上の結果から,補助プローブの使用により適用温度範囲が拡大する可能性が示唆された。 The unavoidable interaction between thermal environments and quantum systems typically leads to the degradation of quantum coherence, which can be fought against by reservoir engineering. We propose the realization of a special mixture of thermal coherent states by coupling a thermal bath with a two-level system that is longitudinally coupled to a resonator. We find that the state of the resonator is a special mixture of two oppositely displaced thermal coherent states, whereas the two-level system remains thermal. This observation is verified by evaluating the second-order correlation coefficient for the resonator state. Moreover, we reveal the potential benefits of employing the mixture of thermal coherent states of the resonator in quantum thermometry. In this context, the resonator functions as a probe to measure the unknown temperature of a bath mediated by a two-level system, strategically bridging the connection between the two. Our results show that the use of an ancillary-assisted probe may broaden the applicable temperature range.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# テキスト・ツー・イメージ・ジェネレーションの育成実践 The Cultivated Practices of Text-to-Image Generation ( http://arxiv.org/abs/2306.11393v3 ) ライセンス: Link先を確認	Jonas Oppenlaender,	(参考訳) 人間は、誰でも生成人工知能(AI)を使ってデジタル情報を合成できる新しい創造的時代に入った。特にテキスト・ツー・イメージ・ジェネレーションは非常に人気があり、何百万人もの実践者がAI生成画像やAIアートをオンラインで制作している。この章ではまず、テキスト・ツー・イメージ生成に関する健全な共創造的なオンラインエコシステムが急速に出現することを可能にした重要な展開の概要を示し、続いて、このエコシステムにおける重要な要素を高レベルに記述する。 AIアートコミュニティによって受け入れられた創造的なプラクティスである、プロンプトエンジニアリングに特に焦点が当てられている。そして、出現しつつあるこの共創的エコシステムはそれ自体が一つの知的システムを構成すると論じる。このシステムは、人間の創造性をサポートするだけでなく、将来の世代を巻き込み、AIにおける将来の開発努力を制限する可能性がある。この章では、今日のトレーニングデータに固有のバイアス、合成データによる将来の画像生成システムの潜在的な品質劣化、人々の想像力、野心、発展に対するテキスト・ツー・イメージ・ジェネレーションの長期的な影響など、この共同創造的エコシステムを育む潜在的なリスクと危険性について論じている。 Humankind is entering a novel creative era in which anybody can synthesize digital information using generative artificial intelligence (AI). Text-to-image generation, in particular, has become vastly popular and millions of practitioners produce AI-generated images and AI art online. This chapter first gives an overview of the key developments that enabled a healthy co-creative online ecosystem around text-to-image generation to rapidly emerge, followed by a high-level description of key elements in this ecosystem. A particular focus is placed on prompt engineering, a creative practice that has been embraced by the AI art community. It is then argued that the emerging co-creative ecosystem constitutes an intelligent system on its own: a system that both supports human creativity and potentially entraps future generations and limits future development efforts in AI. The chapter discusses the potential risks and dangers of cultivating this co-creative ecosystem, such as the bias inherent in today's training data, potential quality degradation in future image generation systems due to synthetic data becoming commonplace, and the potential long-term effects of text-to-image generation on people's imagination, ambitions, and development.	翻訳日:2024-09-04 22:54:55 公開日:2024-09-02
# バイトペア符号化の形式的展望 A Formal Perspective on Byte-Pair Encoding ( http://arxiv.org/abs/2306.16837v3 ) ライセンス: Link先を確認	Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell,	(参考訳) Byte-Pair Encoding (BPE) は、当初圧縮法として考案されたものの、NLPでデータをトークン化するために使われる一般的なアルゴリズムである。 BPEは一見すると貪欲アルゴリズムのように見えるが、BPEが解決しようとしている基礎となる最適化問題は、まだ定式化されていない。 BPEを組合せ最適化問題として定式化する。劣モジュラ関数を用いて、反復的な貪欲版が最適マージ列の $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$ 近似であることを証明する。ここで ${\sigma(\boldsymbol{\mu}^\star)}$ は、最適マージ列 $\boldsymbol{\mu}^\star$ に対する全後方曲率である。経験的には近似の下位境界は$\approx 0.37$である。我々は、ランタイムの複雑さを$\mathcal{O}\left(N M\right)$から$\mathcal{O}\left(N \log M\right)$に改善するBPEのより高速な実装を提供する。ここで $N$ は系列長、$M$ はマージ回数である。最後に, メモ化を用いて、最適BPEのためのブルートフォースアルゴリズムを最適化する。 Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data in NLP, despite being devised initially as a compression method. BPE appears to be a greedy algorithm at face value, but the underlying optimization problem that BPE seeks to solve has not yet been laid down. We formalize BPE as a combinatorial optimization problem. Via submodular functions, we prove that the iterative greedy version is a $\frac{1}{{\sigma(\boldsymbol{\mu}^\star)}}(1-e^{-{\sigma(\boldsymbol{\mu}^\star)}})$-approximation of an optimal merge sequence, where ${\sigma(\boldsymbol{\mu}^\star)}$ is the total backward curvature with respect to the optimal merge sequence $\boldsymbol{\mu}^\star$. Empirically the lower bound of the approximation is $\approx 0.37$. We provide a faster implementation of BPE which improves the runtime complexity from $\mathcal{O}\left(N M\right)$ to $\mathcal{O}\left(N \log M\right)$, where $N$ is the sequence length and $M$ is the merge count. Finally, we optimize the brute-force algorithm for optimal BPE using memoization.	翻訳日:2024-09-04 22:44:54 公開日:2024-09-02
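The iterative greedy procedure that the BPE abstract above analyzes can be sketched as follows. This is a naive illustration, not the paper's faster implementation: it recounts all adjacent pairs before every merge, so its cost grows roughly with sequence length times merge count. The function names, the character-level starting tokens, and the tie-breaking rule (first pair seen wins) are assumptions made for this example.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Merge each non-overlapping occurrence of `pair`, scanning left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def greedy_bpe(text, num_merges):
    """Return (tokens, merges) after at most `num_merges` greedy merges."""
    tokens = list(text)  # start from single characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break  # fewer than two tokens left, nothing to merge
        # Greedy step: take the most frequent adjacent pair
        # (ties go to the pair that was seen first).
        best = max(pair_counts, key=pair_counts.get)
        tokens = merge_pair(tokens, best)
        merges.append(best)
    return tokens, merges


if __name__ == "__main__":
    print(greedy_bpe("abababcab", 2))
```

Each merge is chosen only from the current pair counts, with no lookahead, which is why the abstract asks how far this greedy sequence can be from an optimal merge sequence.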
# 早期停止を伴う符号勾配降下による高速でロバストなカーネル回帰 Fast Robust Kernel Regression through Sign Gradient Descent with Early Stopping ( http://arxiv.org/abs/2306.16838v6 ) ライセンス: Link先を確認	Oskar Allerbo,	(参考訳) カーネルリッジ回帰(KRR)は線形リッジ回帰の一般化であり、データに関しては非線形だが、モデルパラメータに関しては線形である。本稿ではKRRの目的関数の等価な定式化を導入する。これにより、リッジペナルティを$\ell_\infty$および$\ell_1$ペナルティに置き換えることと、カーネルリッジ回帰を勾配降下の観点から研究することの両方が可能になる。 $\ell_\infty$ と $\ell_1$ のペナルティを用いることで、それぞれロバストなカーネル回帰とスパースなカーネル回帰が得られる。さらに、明示的に正則化されたカーネル回帰と、反復的な勾配ベース手法の早期停止によって得られる解との類似性を調べる。具体的には、$\ell_\infty$正則化を符号勾配降下に、$\ell_1$正則化を前向き段階的回帰(座標降下としても知られる)に、$\ell_2$正則化を勾配降下に対応づけ、最後の場合については両者の差を理論的に上から抑える。$\ell_\infty$正則化と符号勾配降下、および$\ell_1$正則化と座標降下の密接な関係を利用して、ロバストおよびスパースなカーネル回帰のための計算効率の良い手法を提案する。最後に、5つの実データセットで、符号勾配降下によるロバストなカーネル回帰を既存のロバストカーネル回帰手法と比較し、提案手法が精度を損なうことなく1桁から2桁高速であることを示す。 Kernel ridge regression, KRR, is a generalization of linear ridge regression that is non-linear in the data, but linear in the model parameters. Here, we introduce an equivalent formulation of the objective function of KRR, which opens up both for replacing the ridge penalty with the $\ell_\infty$ and $\ell_1$ penalties and for studying kernel ridge regression from the perspective of gradient descent. Using the $\ell_\infty$ and $\ell_1$ penalties, we obtain robust and sparse kernel regression, respectively. We further study the similarities between explicitly regularized kernel regression and the solutions obtained by early stopping of iterative gradient-based methods, where we connect $\ell_\infty$ regularization to sign gradient descent, $\ell_1$ regularization to forward stagewise regression (also known as coordinate descent), and $\ell_2$ regularization to gradient descent, and, in the last case, theoretically bound the differences. We exploit the close relations between $\ell_\infty$ regularization and sign gradient descent, and between $\ell_1$ regularization and coordinate descent to propose computationally efficient methods for robust and sparse kernel regression. 
We finally compare robust kernel regression through sign gradient descent to existing methods for robust kernel regression on five real data sets, demonstrating that our method is one to two orders of magnitude faster, without compromising accuracy.	翻訳日:2024-09-04 22:44:54 公開日:2024-09-02
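The link between early-stopped sign gradient descent and an $\ell_\infty$-type constraint can be made concrete with a small sketch. It is a generic illustration under stated assumptions, not the paper's formulation: the model is $f(x)=\sum_j \alpha_j k(x, x_j)$ with a Gaussian kernel, the loss is the plain squared error, and "early stopping" is simply a fixed step budget. All names (`kernel_matrix`, `fit_sign_gd`, `step`, `max_steps`) are hypothetical. Since every update moves each coefficient by at most `step`, the result always satisfies $\lVert\alpha\rVert_\infty \le$ `step * max_steps`.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def kernel_matrix(xs, bandwidth=1.0):
    """Gram matrix K[i][j] = k(xs[i], xs[j])."""
    return [[gaussian_kernel(xi, xj, bandwidth) for xj in xs] for xi in xs]


def squared_error(K, ys, alpha):
    """Sum of squared residuals of the fit K @ alpha against ys."""
    n = len(ys)
    return sum(
        (ys[i] - sum(K[i][j] * alpha[j] for j in range(n))) ** 2 for i in range(n)
    )


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fit_sign_gd(K, ys, step=0.01, max_steps=20):
    """Sign gradient descent on the squared error, stopped after max_steps updates."""
    n = len(ys)
    alpha = [0.0] * n
    for _ in range(max_steps):
        residual = [ys[i] - sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # Gradient of 0.5 * ||y - K alpha||^2 with respect to alpha is -K^T residual.
        grad = [-sum(K[j][i] * residual[j] for j in range(n)) for i in range(n)]
        # Only the sign of each gradient entry is used, so every step has fixed size.
        alpha = [a - step * sign(g) for a, g in zip(alpha, grad)]
    return alpha


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.0, 1.0, 0.0, -1.0, 0.0]
    K = kernel_matrix(xs)
    alpha = fit_sign_gd(K, ys)
    print(alpha, squared_error(K, ys, alpha))
```

A validation-based stopping rule could replace the fixed budget; the fixed budget is used here only because it makes the coefficient bound obvious.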
# 広帯域非調和ポテンシャルにおける粒子ダイナミクスとデコヒーレンスの解析 Wigner Analysis of Particle Dynamics and Decoherence in Wide Nonharmonic Potentials ( http://arxiv.org/abs/2307.14106v6 ) ライセンス: Link先を確認	Andreu Riera-Campeny, Marc Roda-Llordes, Piotr T. Grochowski, Oriol Romero-Isart,	(参考訳) 非調和ポテンシャルにおける粒子の1次元運動の時間発展を概ね記述したウィグナー関数の解析式を導出する。提案手法は,初期状態のセントロイドの古典力学と,その軌道に関する回転と旋回の両方を考慮に入れた,2つの正確なフレーム変換を含む。その後、定数角と線形化デコヒーレンス近似という2つの重要な近似を用いる。これらの近似は、広いポテンシャルと小さなゆらぎの体制、すなわち、初期状態よりも大きい空間膨張を許容するが、関連する力学長スケール(例えば、旋回点間の距離)よりも小さいポテンシャルに有効である。我々の分析結果は、古典物理学と量子物理学の相互作用と非線形力学におけるデコヒーレンスの影響を解明する。この分析結果は、大粒子のマクロ量子状態を生成するために非線形力学を用いて提案を設計し、最適化し、理解するのに役立つ。 We derive an analytical expression of a Wigner function that approximately describes the time evolution of the one-dimensional motion of a particle in a nonharmonic potential. Our method involves two exact frame transformations, accounting for both the classical dynamics of the centroid of the initial state and the rotation and squeezing about that trajectory. Subsequently, we employ two crucial approximations, namely the constant-angle and linearized-decoherence approximations. These approximations are effective in the regime of wide potentials and small fluctuations, namely potentials that enable spatial expansions orders of magnitude larger than the one of the initial state but that remain smaller compared to the relevant dynamical length scale (e.g., distance between turning points). Our analytical result elucidates the interplay between classical and quantum physics and the impact of decoherence during nonlinear dynamics. This analytical result is instrumental to design, optimize and understand proposals using nonlinear dynamics to generate macroscopic quantum states of massive particles.	翻訳日:2024-09-04 22:44:54 公開日:2024-09-02
# 初期スクリーニング順序問題 The Initial Screening Order Problem ( http://arxiv.org/abs/2307.15398v4 ) ライセンス: Link先を確認	Jose M. Alvarez, Antonio Mastropietro, Salvatore Ruggieri,	(参考訳) 本研究では、従業員採用や大学入学選考などの候補者スクリーニング業務における初期スクリーニング順序(ISO)の役割を検討する。これらの業務では、スクリーニング担当者が候補者プールから$k$人の候補者を選ぶ。 ISOは、スクリーニング担当者が候補者プールを探索する順序を指す。今日では、ISOはオンラインプラットフォームやデータベースクエリのような情報アクセスシステムの出力であることが一般的である。 ISOは、特に人間のスクリーニング担当者の下で、選ばれる$k$人の候補者の最適性と公平性に影響しうるにもかかわらず、文献ではおおむね見落とされてきた。我々は、ISOの下でのスクリーニング担当者の探索行動を記述する2つの問題定式化を定義する。すなわち、最良の$k$人を選ぶbest-$k$と、十分に良い候補者を先着順に$k$人選ぶgood-$k$である。 ISOの影響を調べるために、人間らしいスクリーニング担当者を導入し、アルゴリズムによるスクリーニング担当者と比較する。人間らしいスクリーニング担当者は、疲労のために時間とともに判断が一貫しなくなるものとして想定する。特に、good-$k$問題を解く人間らしいスクリーニング担当者の下では、ISOがグループレベルの公平性を満たしていても個人の公平性を妨げ、選ばれた$k$人の候補者の最適性を損なうことを示す。これは、候補者の評価がISO内での位置に影響される位置バイアスによるものである。アルゴリズムおよび人間らしいスクリーニング担当者について、best-$k$とgood-$k$の問題のパラメータを探索する広範なシミュレーション実験を報告する。シミュレーションフレームワークは複数のスクリーニング設定を扱える柔軟性を持ち、実世界の候補者スクリーニング手続きを実施する代わりとなる。本研究は、ヨーロッパの企業と共同で研究した実世界の候補者スクリーニング問題に動機づけられている。 We investigate the role of the initial screening order (ISO) in candidate screening tasks, such as employee hiring and academic admissions, in which a screener is tasked with selecting $k$ candidates from a candidate pool. The ISO refers to the order in which the screener searches the candidate pool. Today, it is common for the ISO to be the product of an information access system, such as an online platform or a database query. The ISO has been largely overlooked in the literature, despite its potential impact on the optimality and fairness of the chosen $k$ candidates, especially under a human screener. We define two problem formulations describing the search behavior of the screener under the ISO: the best-$k$, where the screener selects the $k$ best candidates; and the good-$k$, where the screener selects the $k$ first good-enough candidates. To study the impact of the ISO, we introduce a human-like screener and compare it to its algorithmic counterpart, where the human-like screener is conceived to be inconsistent over time due to fatigue. 
In particular, our analysis shows that the ISO, under a human-like screener solving for the good-$k$ problem, hinders individual fairness despite meeting group-level fairness, and hampers the optimality of the selected $k$ candidates. This is due to position bias, where a candidate's evaluation is affected by its position within the ISO. We report extensive simulated experiments exploring the parameters of the best-$k$ and good-$k$ problems for the algorithmic and human-like screeners. The simulation framework is flexible enough to account for multiple screening settings, being an alternative to running real-world candidate screening procedures. This work is motivated by a real-world candidate screening problem studied in collaboration with a European company.	翻訳日:2024-09-04 22:44:54 公開日:2024-09-02
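The difference between the two formulations in the abstract above, and why only one of them depends on the screening order, can be sketched as follows. This is an illustrative toy, not the authors' simulation framework: the candidate names, scores, and the `threshold` used to decide "good enough" are invented, and the fatigue model of the human-like screener is deliberately left out.

```python
def best_k(pool, k):
    """Select the k highest-scoring candidates; the whole pool is inspected."""
    ranked = sorted(pool, key=lambda candidate: candidate[1], reverse=True)
    return {name for name, _ in ranked[:k]}


def good_k(pool, k, threshold):
    """Select the first k candidates, in screening order, whose score meets threshold."""
    chosen = []
    for name, score in pool:
        if len(chosen) == k:
            break  # the screener stops as soon as k good-enough candidates are found
        if score >= threshold:
            chosen.append(name)
    return set(chosen)


if __name__ == "__main__":
    # The list order plays the role of the initial screening order (ISO).
    pool = [("A", 0.6), ("B", 0.9), ("C", 0.7), ("D", 0.95)]
    reversed_pool = list(reversed(pool))
    print(best_k(pool, 2), best_k(reversed_pool, 2))
    print(good_k(pool, 2, 0.6), good_k(reversed_pool, 2, 0.6))
```

With distinct scores, `best_k` returns the same set for any ordering, whereas `good_k` can return different sets when the same pool is presented in a different order. That order dependence is the kind of position effect the abstract attributes to the ISO.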
# 自律型負荷熱制御 Autonomous Payload Thermal Control ( http://arxiv.org/abs/2307.15438v3 ) ライセンス: Link先を確認	Alejandro D. Mousist,	(参考訳) 小さな衛星では、熱制御装置、科学機器、電子部品のスペースは少ない。さらに、電子部品の近接により、温度を適切に制御できず、部品寿命とミッション性能が低下する危険性があるため、送電が困難になる。この課題に対処するために、船上衛星におけるインテリジェンスの増加を生かして、深層強化学習を用いた自律型熱制御ツールが提案されている。このツールは、国際宇宙ステーション(ISS)にホストされたデモペイロードに使用される実際のスペースエッジ処理コンピュータで評価された。実験の結果,提案フレームワークは従来の熱制御システムを補完して,運用範囲の温度を維持するためにペイロード処理能力の制御を学べることがわかった。 In small satellites there is less room for heat control equipment, scientific instruments, and electronic components. Furthermore, the near proximity of electronic components makes power dissipation difficult, with the risk of not being able to control the temperature appropriately, reducing component lifetime and mission performance. To address this challenge, taking advantage of the advent of increasing intelligence on board satellites, an autonomous thermal control tool that uses deep reinforcement learning is proposed for learning the thermal control policy onboard. The tool was evaluated in a real space edge processing computer that will be used in a demonstration payload hosted in the International Space Station (ISS). The experiment results show that the proposed framework is able to learn to control the payload processing power to maintain the temperature under operational ranges, complementing traditional thermal control systems.	翻訳日:2024-09-04 22:44:54 公開日:2024-09-02
# 私の言葉で世界を示す: シーンテキストからシーンテキストへの翻訳のための最初のベースラインを確立する Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation ( http://arxiv.org/abs/2308.03024v3 ) ライセンス: Link先を確認	Shreyas Vaidya, Arvind Kumar Sharma, Prajwal Gatti, Anand Mishra,	(参考訳) 本研究では,ソース言語 (e g , Hindi) からターゲット言語 (e g , English) への「視覚的」なシーンテキストの翻訳作業について検討する。視覚翻訳は、シーンテキストの認識と翻訳だけでなく、フォント、サイズ、背景といった元のシーンテキストの視覚的特徴を保存する翻訳画像の生成も含む。このタスクには、限られた文脈での翻訳、翻訳と文字の翻訳の決定、固定された空間境界内での様々なテキストの長さの調整、ターゲット言語におけるソースシーンテキストのフォントと背景スタイルの保存など、いくつかの課題がある。この問題に対処するため、以下の貢献をしている。 (i)本論文では,視覚翻訳を単独問題として初めて考察した。 (II)シーンテキスト認識,機械翻訳,シーンテキスト合成のための最先端モジュールをタスクのベースラインとして組み合わせた視覚翻訳フレームワークを提案する。 (3) 性能改善のために, ベースラインの変種を設計するためのタスク固有の設計拡張セットを提案する。 (四)現時点の文献では、この新たな課題に対する総合的な性能評価が欠如している。このギャップを埋めるために、視覚翻訳を明示的に評価するための自動的およびユーザ支援的な評価指標をいくつか導入する。さらに,ヒンディー語と英語のシーンテキストを翻訳するための提示ベースラインの評価を行った。本実験は,シーンテキスト画像の集合体上で視覚的翻訳を効果的に行うことができるが,提示されたベースラインは視覚的翻訳タスクによって生じる課題に部分的に対処するのみであることを示す。我々は,この新たな課題と既存モデルの限界が,視覚翻訳のさらなる研究を促進すると強く信じている。 In this work, we study the task of ``visually'' translating scene text from a source language (e.g., Hindi) to a target language (e.g., English). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the source scene text, such as font, size, and background. There are several challenges associated with this task, such as translation with limited context, deciding between translation and transliteration, accommodating varying text lengths within fixed spatial boundaries, and preserving the font and background styles of the source scene text in the target language. To address this problem, we make the following contributions: (i) We study visual translation as a standalone problem for the first time in the literature. 
(ii) We present a cascaded framework for visual translation that combines state-of-the-art modules for scene text recognition, machine translation, and scene text synthesis as a baseline for the task. (iii) We propose a set of task-specific design enhancements to design a variant of the baseline to obtain performance improvements. (iv) Currently, the existing related literature lacks any comprehensive performance evaluation for this novel task. To fill this gap, we introduce several automatic and user-assisted evaluation metrics designed explicitly for evaluating visual translation. Further, we evaluate presented baselines for translating scene text between Hindi and English. Our experiments demonstrate that although we can effectively perform visual translation over a large collection of scene text images, the presented baseline only partially addresses challenges posed by visual translation tasks. We firmly believe that this new task and the limitations of existing models, as reported in this paper, should encourage further research in visual translation.	翻訳日:2024-09-04 22:44:54 公開日:2024-09-02
# マルチビジュアル慣性システム:分析・校正・推定 Multi-Visual-Inertial System: Analysis, Calibration and Estimation ( http://arxiv.org/abs/2308.05303v4 ) ライセンス: Link先を確認	Yulin Yang, Patrick Geneva, Guoquan Huang,	(参考訳) 本稿では,多視点慣性系(MVIS)の状態を推定し,任意の数の非同期慣性測定ユニット(IMU)やジャイロスコープ,グローバルおよび(または)ローリングシャッターカメラを最適に融合させるセンサ融合アルゴリズムを開発する。 IMUやカメラの内在性、IMU-IMU(またはカメラ)時空間外在性、ローリングシャッターカメラ(使用)の画像読取時間など、関連する視覚慣性センサーの完全な校正に関心がある。この目的のために,本研究では,ベースIMUとともに補助IMUと(または)ジャイロスコープの融合に利用した,内在性決定型ACI3-to preintegrate IMU測定と新たなIMU統合法を開発した。我々は,IMU-IMUの剛体制約を活用して,補助慣性ポーズの必要を排除し,複雑性を低減しつつ,必要慣性内在およびIMU-IMU時空間外因性パラメータをすべて含む多慣性測定をモデル化した。 MVISの可観測性解析により,慣性センサの数に関わらず,標準の4つの観測不可能な方向が残っていること,IMU-IMU時空間外在性運動と補助慣性内在性運動の退化を初めて確認した。解析とアルゴリズムを検証した広範囲なシミュレーションに加えて、我々は独自のMVISセンサーリグを構築し、25以上の実世界のデータセットを収集し、Kalibrのような最先端のキャリブレーション手法に対するキャリブレーションの提案を実験的に検証した。提案したMVISキャリブレーションにより,コンバージェンスとリピータビリティの向上を図り,コンバージェンスとリピータビリティの向上を図った。 In this paper, we study state estimation of multi-visual-inertial systems (MVIS) and develop sensor fusion algorithms to optimally fuse an arbitrary number of asynchronous inertial measurement units (IMUs) or gyroscopes and global and(or) rolling shutter cameras. We are especially interested in the full calibration of the associated visual-inertial sensors, including the IMU or camera intrinsics and the IMU-IMU(or camera) spatiotemporal extrinsics as well as the image readout time of rolling-shutter cameras (if used). To this end, we develop a new analytic combined IMU integration with intrinsics-termed ACI3-to preintegrate IMU measurements, which is leveraged to fuse auxiliary IMUs and(or) gyroscopes alongside a base IMU. We model the multi-inertial measurements to include all the necessary inertial intrinsic and IMU-IMU spatiotemporal extrinsic parameters, while leveraging IMU-IMU rigid-body constraints to eliminate the necessity of auxiliary inertial poses and thus reducing computational complexity. 
By performing observability analysis of MVIS, we prove that the standard four unobservable directions remain - no matter how many inertial sensors are used, and also identify, for the first time, degenerate motions for IMU-IMU spatiotemporal extrinsics and auxiliary inertial intrinsics. In addition to the extensive simulations that validate our analysis and algorithms, we have built our own MVIS sensor rig and collected over 25 real-world datasets to experimentally verify the proposed calibration against the state-of-the-art calibration method such as Kalibr. We show that the proposed MVIS calibration is able to achieve competing accuracy with improved convergence and repeatability, which is open sourced to better benefit the community.	翻訳日:2024-09-04 22:35:08 公開日:2024-09-02
# 人間とLLMによるニューステキストにおける言語パターンの対比 Contrasting Linguistic Patterns in Human and LLM-Generated News Text ( http://arxiv.org/abs/2308.09067v3 ) ライセンス: Link先を確認	Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, David Vilares,	(参考訳) 人書きニューステキストとは対照的な定量的解析を行い,3つの異なるファミリーと4つのサイズをカバーする6種類のLLMから出力されるLLMを比較検討した。我々の分析は、形態学、統語学、心理計測学、社会言語学的な側面を含む、いくつかの測定可能な言語次元にまたがる。結果は、人間とAIが生成したテキストの様々な測定可能な相違を明らかにした。人間の文章は、より散在した文の長さの分布、より多様な語彙、依存関係と構成要素の明確な使用、より短い構成物、より最適化された依存距離を示す。人間は(恐怖や嫌悪など)強い負の感情を示し、LLMが生成するテキストに比べて喜びを減らし、サイズが大きくなるにつれてこれらのモデルの毒性が増大する傾向にある。 LLMの出力は、人文よりも数字、記号、補助語(目的語を推奨する)が多用され、代名詞も多用される。ヒトのテキストで広く見られる性差別バイアスは、LSMによっても表現され、それら全てにおいて1つを除いて拡大される。 LLMと人間の違いはLLMよりも大きい。 We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from six different LLMs that cover three different families and four sizes in total. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric, and sociolinguistic aspects. The results reveal various measurable differences between human and AI-generated texts. Human texts exhibit more scattered sentence length distributions, more variety of vocabulary, a distinct use of dependency and constituent types, shorter constituents, and more optimized dependency distances. Humans tend to exhibit stronger negative emotions (such as fear and disgust) and less joy compared to text generated by LLMs, with the toxicity of these models increasing as their size grows. LLM outputs use more numbers, symbols and auxiliaries (suggesting objective language) than human texts, as well as more pronouns. The sexist bias prevalent in human text is also expressed by LLMs, and even magnified in all of them but one. Differences between LLMs and humans are larger than between LLMs.	翻訳日:2024-09-04 22:35:08 公開日:2024-09-02
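Two of the dimensions mentioned in the abstract above, the spread of sentence lengths and vocabulary variety, can be approximated with very simple statistics. The sketch below is only an assumption about how such quantities might be measured (population standard deviation of sentence lengths, and type-token ratio); the abstract does not state which metrics the authors used, and the naive regex-based splitting is for illustration only.

```python
import re
import statistics


def sentence_lengths(text):
    """Word counts per sentence, splitting naively on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_spread(text):
    """Population standard deviation of sentence lengths (0.0 for empty text)."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) if lengths else 0.0


def type_token_ratio(text):
    """Distinct lowercase word forms divided by total word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "The cat sat. The cat sat on the mat today."
    print(sentence_lengths(sample), length_spread(sample), type_token_ratio(sample))
```

Type-token ratio is sensitive to text length, so comparisons of this kind are only meaningful between samples of similar size.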
# 神経脱水:限られたデータによるDNNからのブラックボックス透かしの効果的消去 Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data ( http://arxiv.org/abs/2309.03466v2 ) ライセンス: Link先を確認	Yifan Lu, Wenxuan Li, Mi Zhang, Xudong Pan, Min Yang,	(参考訳) 高度に訓練された深層ニューラルネットワーク(DNN)の知的特性を保護するため,DNNモデルの予測行動に埋め込まれたブラックボックスの透かしが,APIアクセスのみを用いて疑似モデルから抽出され,学業・産業ともに人気が高まっている。ウォーターマークの堅牢性は通常、保護されたモデルを盗み、ウォーターマーク除去のパラメータを難読化する攻撃者に対して実装される。しかし、現在の堅牢性評価は主に中程度の攻撃や非現実的な設定下で実行される。既存の削除攻撃は、メインストリームのブラックボックスの透かしのごく一部を破ることしかできず、不完全な除去、透かしの事前の知識への依存、性能劣化、データへの高い依存という4つの重要な側面で不足する可能性がある。本稿では, 透かし非依存の除去攻撃である‘textsc{Neural Dehydration} (\textit{abbrev) を提案する。これはDNNから10のメインストリームのブラックボックスの透かしを効果的に消去する。一般的に、攻撃パイプラインは保護されたモデルの内部を利用して、透かしメッセージを復元し、解放する。さらに,目的のクラス検出とサンプル分割アルゴリズムを設計し,実用的損失を低減し,透かしの5つのスキームでデータフリーな透かし除去を実現する。我々は,3つのベンチマークデータセットとDNNアーキテクチャを用いた10の主流ブラックボックス透かしに対して,textsc{Dehydra} の総合評価を行う。既存の削除攻撃と比較すると、 \textsc{Dehydra} は、盗難されたモデルユーティリティの少なくとも 90 %$ をデータ制限設定下で保存し、トレーニングデータの 2 % 以下、あるいはデータフリーでも、カバーされたすべての透かしの強い除去効果を達成する。 To protect the intellectual property of well-trained deep neural networks (DNNs), black-box watermarks, which are embedded into the prediction behavior of DNN models on a set of specially-crafted samples and extracted from suspect models using only API access, have gained increasing popularity in both academy and industry. Watermark robustness is usually implemented against attackers who steal the protected model and obfuscate its parameters for watermark removal. However, current robustness evaluations are primarily performed under moderate attacks or unrealistic settings. Existing removal attacks could only crack a small subset of the mainstream black-box watermarks, and fall short in four key aspects: incomplete removal, reliance on prior knowledge of the watermark, performance degradation, and high dependency on data. 
In this paper, we propose a watermark-agnostic removal attack called Neural Dehydration (abbrev. Dehydra), which effectively erases all ten mainstream black-box watermarks from DNNs, with only limited or even no data dependence. In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message. We further design target class detection and recovered sample splitting algorithms to reduce the utility loss and achieve data-free watermark removal on five of the watermarking schemes. We conduct a comprehensive evaluation of Dehydra against ten mainstream black-box watermarks on three benchmark datasets and DNN architectures. Compared with existing removal attacks, Dehydra achieves strong removal effectiveness across all the covered watermarks, preserving at least $90\%$ of the stolen model utility, under the data-limited settings, i.e., less than $2\%$ of the training data or even data-free.	翻訳日:2024-09-04 22:35:08 公開日:2024-09-02
# Cs原子のすべての関連準位を持つ密度行列方程式からのCPT共鳴信号の導出と実験結果の確認 Derivation of CPT resonance signals from density-matrix equations with all relevant sublevels of Cs atoms and confirmation of experimental results ( http://arxiv.org/abs/2309.06761v2 ) ライセンス: Link先を確認	K. Matsumoto, S. Kagami, T. Fujisaku, A. Kirihara, S. Yanagimachi, T. Ikegami, A. Morinaga,	(参考訳) コヒーレント・ポピュレーション・トッピング共鳴(Coherent-population-trapping resonance)は、アルカリ原子の基底状態超微粒子準位の間の2光子遷移に現れる量子干渉効果であり、小型のクロックデバイスでしばしば用いられる。この現象の性能を定量的に理解し、予測するには、原子の異なる励起過程に関与するすべての超微細ゼーマン準位間の遷移と緩和を考慮する必要がある。本研究では,Louville密度行列方程式の計算的マルチレベル原子モデルを構築し,Cs蒸気セルによる透過光の振幅と形状をシミュレーションした。実験結果から, 方程式の数値解と解析的研究が, 種々の特性を適切に説明できることが示唆された。 Coherent-population-trapping resonance is a quantum interference effect that appears in the two-photon transitions between the ground-state hyperfine levels of alkali atoms and is often utilized in miniature clock devices. To quantitatively understand and predict the performance of this phenomenon, it is necessary to consider the transitions and relaxations between all hyperfine Zeeman sublevels involved in the different excitation processes of the atom. In this study, we constructed a computational multi-level atomic model of the Liouville density-matrix equation for 32 Zeeman sublevels involved in the $D_1$ line of $^{133}$Cs irradiated by two frequencies with circularly polarized components and then simulated the amplitude and shape of the transmitted light through a Cs vapor cell. We show that the numerical solutions of the equation and analytical investigations adequately explain a variety of the characteristics observed in the experiment.	翻訳日:2024-09-04 22:35:08 公開日:2024-09-02
# NutritionVerse: 食事摂取量推定手法の実証的研究 NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches ( http://arxiv.org/abs/2309.07704v2 ) ライセンス: Link先を確認	Chi-en Amy Tai, Matthew Keller, Saeejith Nair, Yuhao Chen, Yifan Wu, Olivia Markham, Krish Parmar, Pengcheng Xi, Heather Keller, Sharon Kirkpatrick, Alexander Wong,	(参考訳) 栄養失調は、生活の質の低下に直接関連しているため、健康な食事を支援するための政策やプログラムを伝える上で、正確な食事摂取推定が重要である。しかし、食品日記のような自己申告方法には、かなりのバイアスが伴う。従来の食事アセスメント技術やモバイルアプリケーションのような新たな代替手法は、高い時間的コストを伴い、訓練された人員を必要とする可能性がある。最近の研究は、コンピュータビジョンと機械学習を使って食品画像から食事の摂取を自動推定することに重点を置いているが、多様な視点、モダリティ、食品アノテーションを持つ包括的なデータセットが欠如しているため、これらの手法の正確性や現実性を妨げている。この制限に対処するために、NutritionVerse-Synthは、84,984個のフォトリアリスティック合成2Dフードイメージを、関連する食事情報とマルチモーダルアノテーション(奥行き画像、インスタンスマスク、セマンティックマスクを含む)で構築した最初の大規模データセットである。さらに,リアルなイメージデータセットであるNutritionVerse-Real(NutritionVerse-Real)を収集し,リアル性を評価する。これらの新しいデータセットを活用して、間接セグメンテーションと直接予測ネットワークを含む様々な食事摂取推定手法の実証研究であるNutritionVerseを開発し、ベンチマークする。さらに、合成データと実画像との融合に関する洞察を提供するために、合成データに事前訓練された微調整モデルについて述べる。最後に、ダイエットセンシングのための機械学習を加速するオープンイニシアチブの一環として、両方のデータセット(NutritionVerse-Synth、NutritionVerse-Real)をhttps://www.kaggle.com/nutritionverse/datasetsでリリースします。 Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating, as malnutrition has been directly linked to decreased quality of life. However self-reporting methods such as food diaries suffer from substantial bias. Other conventional dietary assessment techniques and emerging alternative approaches such as mobile applications incur high time costs and may necessitate trained personnel. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities and food annotations hinders the accuracy and realism of such methods. 
To address this limitation, we introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 photorealistic synthetic 2D food images with associated dietary information and multimodal annotations (including depth images, instance masks, and semantic masks). Additionally, we collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism. Leveraging these novel datasets, we develop and benchmark NutritionVerse, an empirical study of various dietary intake estimation approaches, including indirect segmentation-based and direct prediction networks. We further fine-tune models pretrained on synthetic data with real images to provide insights into the fusion of synthetic and real data. Finally, we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) on https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to accelerate machine learning for dietary sensing.	翻訳日:2024-09-04 22:35:08 公開日:2024-09-02
# エバネッセント電子波スピン Evanescent Electron Wave Spin ( http://arxiv.org/abs/2309.17325v5 ) ライセンス: Link先を確認	Ju Gao, Fang Shen,	(参考訳) 本研究は、ディラック方程式を解き、境界におけるスピノル波動関数の連続性を保証することにより、有限および無限量子井戸の外側にエバネッセント電子波が存在することを示す。このエバネッセント波は、すべての領域にまたがる電流密度の解析式により、井戸内に閉じ込められた波のスピン特性を共有することを示す。我々の発見は、電子は数学的特異点に制限されず、量子情報(量子エントロピー)はどんな閉じ込めでも漏れる可能性があることを示唆している。これらの結果は、ローレンツ不変電荷と電流密度によって完全に特徴づけられる電子波は、電子の真で唯一の存在であると考えられるべきであることを強調している。 This study demonstrates the existence of an evanescent electron wave outside both finite and infinite quantum wells by solving the Dirac equation and ensuring the continuity of the spinor wavefunction at the boundaries. We show that this evanescent wave shares the spin characteristics of the wave confined within the well, as indicated by analytical expressions for the current density across all regions. Our findings suggest that the electron cannot be confined to a mathematical singularity and that quantum information, or quantum entropy, can leak through any confinement. These results emphasize that the electron wave, fully characterized by Lorentz-invariant charge and current densities, should be considered the true and sole entity of the electron.	翻訳日:2024-09-04 22:35:08 公開日:2024-09-02
# エンタングルメントエントロピー計算のための再総和に基づく量子モンテカルロ Resummation-based Quantum Monte Carlo for Entanglement Entropy Computation ( http://arxiv.org/abs/2310.01490v5 ) ライセンス: Link先を確認	Menghan Song, Ting-Tung Wang, Zi Yang Meng,	(参考訳) 最近開発されたSU($N$)スピンおよびループガスモデルに対する再総和に基づく量子モンテカルロ法をもとに、エンタングルメントエントロピー(EE)を大幅に高い効率で計算する新しいアルゴリズムResumEEを開発した。ResumEEは、指数関数的に小さな値である$\langle e^{-S^{(2)}}\rangle$の計算を指数関数的に高速化する。ここで$S^{(2)}$は2次のRényi EEであり、これにより一般的な2次元量子SU($N$)スピンモデルの$S^{(2)}$を容易に高精度で計算できる。1次元および2次元のSU($2$)ハイゼンベルクスピン系において、従来提案されていた$S^{(2)}$の推定器とアルゴリズムを比較してその優れた性能を示し、続いて$N$を連続的に変化させた2次元SU($N$)ハイゼンベルクモデルにおけるNéel-to-VBS遷移のエンタングルメントスケーリングデータを検出する。ResumEEアルゴリズムは、連続的な$N$を持つSU($N$)スピンモデルのエンタングルメントエントロピーを効率よく精密に評価でき、高度にエンタングルした量子物質の共形場理論データへの信頼できるアクセスを与える。 Based on the recently developed resummation-based quantum Monte Carlo method for the SU($N$) spin and loop-gas models, we develop a new algorithm, dubbed ResumEE, to compute the entanglement entropy (EE) with greatly enhanced efficiency. Our ResumEE exponentially speeds up the computation of the exponentially small value of $\langle e^{-S^{(2)}}\rangle$, where $S^{(2)}$ is the 2nd order Rényi EE, such that the $S^{(2)}$ for generic 2D quantum SU($N$) spin models can be readily computed with high accuracy. We benchmark our algorithm with the previously proposed estimators of $S^{(2)}$ on 1D and 2D SU($2$) Heisenberg spin systems to reveal its superior performance and then use it to detect the entanglement scaling data of the Néel-to-VBS transition in the 2D SU($N$) Heisenberg model with continuously varying $N$. Our ResumEE algorithm is efficient for precisely evaluating the entanglement entropy of SU($N$) spin models with continuous $N$ and provides reliable access to the conformal field theory data for highly entangled quantum matter.	翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
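For orientation, the quantity named in the abstract above can be written out explicitly. The block below uses the standard definition of the second Rényi entropy of a subsystem $A$ with reduced density matrix $\rho_A$; the symbols $A$, $\rho_A$, the coefficient $a$ and the boundary length $\ell$ are notation assumed here rather than taken from the abstract. It is meant only as a sketch of why the exponentiated quantity becomes exponentially small when the entropy grows with the size of the boundary.

```latex
S^{(2)}_A = -\ln \operatorname{Tr}\!\left(\rho_A^{2}\right)
\quad\Longrightarrow\quad
e^{-S^{(2)}_A} = \operatorname{Tr}\!\left(\rho_A^{2}\right).
% If S^{(2)}_A grows like a\,\ell with the boundary length \ell (an area law),
% then e^{-S^{(2)}_A} \sim e^{-a\ell}, which is exponentially small in \ell.
```

Estimating such a small number directly by sampling generally needs a very large number of samples to resolve it against statistical noise, which is the difficulty the abstract says ResumEE is designed to alleviate.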
# EFL書記教育におけるLLM-as-a-tutor--学生-LLMインタラクションの評価に着目して- LLM-as-a-tutor in EFL Writing Education: Focusing on Evaluation of Student-LLM Interaction ( http://arxiv.org/abs/2310.05191v2 ) ライセンス: Link先を確認	Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Hyunseung Lim, Yoonsu Kim, Tak Yeon Lee, Hwajung Hong, Juho Kim, So-Yeon Ahn, Alice Oh,	(参考訳) LLM-as-a-tutorは、英語を外国語(EFL)として書く場合、エッセイにリアルタイムでフィードバックを提供することで、学生を支援することができる。しかし,LLM-as-a-tutorの評価には,教育と一般のユースケースの標準の相違による課題が生じる。このギャップを埋めるために、学生とLLMの相互作用を評価する教育原則を統合する。まず、LLMが英語教師として機能し、学生に合った効果的なエッセイフィードバックを提供する方法について検討する。第2に,EFL書記教育に特化して設計されたLLM-as-a-tutorを評価するための3つの指標を提案し,教育的側面を強調した。この過程で,LEM-as-a-tutorからの質と特性に対するフィードバックを評価した。一方、EFL学習者は、LLM-as-a-tutorとの相互作用から学習結果を評価する。このアプローチは、EFL学習者のニーズに合わせたLLM-as-a-tutorを開発するための基礎を築き、この文脈での書字教育の有効性を推し進める。 In the context of English as a Foreign Language (EFL) writing education, LLM-as-a-tutor can assist students by providing real-time feedback on their essays. However, challenges arise in assessing LLM-as-a-tutor due to differing standards between educational and general use cases. To bridge this gap, we integrate pedagogical principles to assess student-LLM interaction. First, we explore how LLMs can function as English tutors, providing effective essay feedback tailored to students. Second, we propose three metrics to evaluate LLM-as-a-tutor specifically designed for EFL writing education, emphasizing pedagogical aspects. In this process, EFL experts evaluate the feedback from LLM-as-a-tutor regarding quality and characteristics. On the other hand, EFL learners assess their learning outcomes from interaction with LLM-as-a-tutor. This approach lays the groundwork for developing LLMs-as-a-tutor tailored to the needs of EFL learners, advancing the effectiveness of writing education in this context.	翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
# 拡散モデルの含意的概念除去 Implicit Concept Removal of Diffusion Models ( http://arxiv.org/abs/2310.05873v7 ) ライセンス: Link先を確認	Zhili Liu, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok,	(参考訳) テキスト・ツー・イメージ(T2I)拡散モデルはしばしば、透かしや安全でない画像のような望ましくない概念を不注意に生成する。これらの概念は「単純概念」と呼ばれ、訓練中に意図せず学習され、推論中に制御不能に生成される。既存の除去方法は、主にモデルが実際に認識できない概念を認識する能力に依存しているため、暗黙的な概念を排除するのに依然として苦労している。そこで我々は,暗黙的概念の内在的幾何学的特徴を活用し,幾何学的制御に基づく新しい概念除去手法であるGeom-Erasingを提案する。具体的には、不要な暗黙的な概念が特定されると、その概念の存在と幾何学的情報をテキストプロンプトに統合し、アクセス可能な分類器や検出器モデルの助けを借りる。その後、モデルはこれらの情報を識別し、切り離すように最適化され、生成時に負のプロンプトとして採用される。さらに,暗黙的概念が容易に注入される現実の状況を反映した,3つの典型的な暗黙的概念(QRコード,透かし,テキスト)を付与した新しい画像テキストデータセットであるImplicit Concept Dataset(ICD)を導入する。 Geom-Erasingは暗黙的な概念の生成を効果的に軽減し、不適切なイメージプロンプト(I2P)と我々の挑戦的なImplicit Concept Dataset(ICD)ベンチマークで最先端の結果を達成する。 Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. These concepts, termed as the "implicit concepts", could be unintentionally learned during training and then be generated uncontrollably during inference. Existing removal methods still struggle to eliminate implicit concepts primarily due to their dependency on the model's ability to recognize concepts it actually can not discern. To address this, we utilize the intrinsic geometric characteristics of implicit concepts and present the Geom-Erasing, a novel concept removal method based on the geometric-driven control. Specifically, once an unwanted implicit concept is identified, we integrate the existence and geometric information of the concept into the text prompts with the help of an accessible classifier or detector model. Subsequently, the model is optimized to identify and disentangle this information, which is then adopted as negative prompts during generation. 
Moreover, we introduce the Implicit Concept Dataset (ICD), a novel image-text dataset imbued with three typical implicit concepts (i.e., QR codes, watermarks, and text), reflecting real-life situations where implicit concepts are easily injected. Geom-Erasing effectively mitigates the generation of implicit concepts, achieving the state-of-the-art results on the Inappropriate Image Prompts (I2P) and our challenging Implicit Concept Dataset (ICD) benchmarks.	翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
# 文化コンパス:文化的な特徴を持つ攻撃言語検出における伝達学習の成功を予測する Cultural Compass: Predicting Transfer Learning Success in Offensive Language Detection with Cultural Features ( http://arxiv.org/abs/2310.06458v2 ) ライセンス: Link先を確認	Li Zhou, Antonia Karamolegkou, Wenyu Chen, Daniel Hershcovich,	(参考訳) 言語技術の普及は、特に攻撃的言語検出(Offensive Language Detection,OLD)のような文化的ニュアンスに大きく依存する主観的なタスクにおいて、機械学習領域における文化的多様性を考えるためのシフトを必要とする。現在の理解は、これらの課題が文化的価値に大きく影響されていることを強調しているが、文化的特徴がそのような主観的なタスクに対する異文化間移動学習の成功を正確に予測できるかどうかを判断する際、顕著なギャップが存在する。これに対応するために,文化的な特徴の交わりと伝達学習の有効性について検討した。これらの結果から,OLDタスクにおける異文化間移動学習の成功の予測力と,攻撃的な単語距離を用いてさらに改善できることが示唆された。これらの結果に基づいて,文化情報のデータセットへの統合を提唱する。さらに,文化適応性を高めるために,調査などの文化的情報に富んだデータソースを活用することを推奨する。我々の研究は、より包括的で文化的に敏感な言語技術の探求において、一歩前進していることを示している。 The increasing ubiquity of language technology necessitates a shift towards considering cultural diversity in the machine learning realm, particularly for subjective tasks that rely heavily on cultural nuances, such as Offensive Language Detection (OLD). Current understanding underscores that these tasks are substantially influenced by cultural values, however, a notable gap exists in determining if cultural features can accurately predict the success of cross-cultural transfer learning for such subjective tasks. Addressing this, our study delves into the intersection of cultural features and transfer learning effectiveness. The findings reveal that cultural value surveys indeed possess a predictive power for cross-cultural transfer learning success in OLD tasks and that it can be further improved using offensive word distance. Based on these results, we advocate for the integration of cultural information into datasets. Additionally, we recommend leveraging data sources rich in cultural information, such as surveys, to enhance cultural adaptability. Our research signifies a step forward in the quest for more inclusive, culturally sensitive language technologies.	翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
# 超伝導量子シミュレータ上のスピン流体力学の研究 Probing spin hydrodynamics on a superconducting quantum simulator ( http://arxiv.org/abs/2310.06565v3 ) ライセンス: Link先を確認	Yun-Hao Shi, Zheng-Hang Sun, Yong-Yi Wang, Zheng-An Wang, Yu-Ran Zhang, Wei-Guo Ma, Hao-Tian Liu, Kui Zhao, Jia-Cheng Song, Gui-Han Liang, Zheng-Yang Mei, Jia-Chi Zhang, Hao Li, Chi-Tong Chen, Xiaohui Song, Jieci Wang, Guangming Xue, Haifeng Yu, Kaixuan Huang, Zhongcheng Xiang, Kai Xu, Dongning Zheng, Heng Fan,	(参考訳) 量子ダイナミクスにおける流体力学的輸送の性質を特徴づけることは、物質のエキゾチックな非平衡相の基本的な理解に関する貴重な洞察を与える。大規模複雑量子系における無限温度輸送を実験的にシミュレーションすることは、かなりの関心事である。ここでは、制御可能でコヒーレントな超伝導量子シミュレータを用いて、Haar-ランダム状態を効率的に生成できるアナログ量子回路を実験的に実現し、無限温度でのスピン輸送をプローブする。エルゴード的ダイナミクスを持つラダー型量子シミュレータのユニタリ進化過程において、拡散的なスピン輸送を観測した。さらに、強い乱れや傾斜ポテンシャルを受ける系の輸送特性を調べ、熱化の破綻に伴う異常な劣拡散の兆候を明らかにした。我々の研究は、アナログ量子シミュレータ上で無限温度のスピン輸送を探索するスケーラブルな方法を示し、輸送の観点から他の興味深い非平衡現象を研究する道を開くものである。 Characterizing the nature of hydrodynamical transport properties in quantum dynamics provides valuable insights into the fundamental understanding of exotic non-equilibrium phases of matter. Experimentally simulating infinite-temperature transport on large-scale complex quantum systems is of considerable interest. Here, using a controllable and coherent superconducting quantum simulator, we experimentally realize the analog quantum circuit, which can efficiently prepare the Haar-random states, and probe spin transport at infinite temperature. We observe diffusive spin transport during the unitary evolution of the ladder-type quantum simulator with ergodic dynamics. Moreover, we explore the transport properties of the systems subjected to strong disorder or a tilted potential, revealing signatures of anomalous subdiffusion accompanied by the breakdown of thermalization. Our work demonstrates a scalable method of probing infinite-temperature spin transport on analog quantum simulators, which paves the way to study other intriguing out-of-equilibrium phenomena from the perspective of transport.	
翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
# DORec: 2次元自己監督機能を利用した分解対象再構成とセグメンテーション DORec: Decomposed Object Reconstruction and Segmentation Utilizing 2D Self-Supervised Features ( http://arxiv.org/abs/2310.11092v3 ) ライセンス: Link先を確認	Jun Wu, Sicheng Li, Sihui Ji, Yifei Yang, Yue Wang, Rong Xiong, Yiyi Liao,	(参考訳) 個々の物体の3D形状とテクスチャの復元は、操作、ポーズ推定、自律運転などの多くのロボット応用にとって不可欠である。しかし、複雑な背景から対象のオブジェクトを分解することは困難である。既存のアプローチのほとんどは、オブジェクトのインスタンスの認識を得るために、コストのかかる手作業によるラベルに依存しています。近年の2次元自己教師型学習の進歩は、興味の対象を識別する新たな可能性を提供するが、そのようなノイズの多い2次元特徴をクリーンな分解に活用することは困難である。本稿では,ニューラル暗黙表現に基づく分解対象再構成(DORec)ネットワークを提案する。私たちのキーとなるアイデアは、2次元の自己監督機能を使って、前景領域の2次元マスクと、意味的に類似した領域のKクラスターマスクという2つのレベルのマスクを作成することです。これらの相補的なマスクは、堅牢な分解をもたらす。異なるデータセットに対する実験結果は、ポーズ推定などの下流タスクを可能にする様々な背景から、DORecが様々な前景オブジェクトのセグメンテーションと再構成に優れていることを示している。 Recovering 3D geometry and textures of individual objects is crucial for many robotics applications, such as manipulation, pose estimation, and autonomous driving. However, decomposing a target object from a complex background is challenging. Most existing approaches rely on costly manual labels to acquire object instance perception. Recent advancements in 2D self-supervised learning offer new prospects for identifying objects of interest, yet leveraging such noisy 2D features for clean decomposition remains difficult. In this paper, we propose a Decomposed Object Reconstruction (DORec) network based on neural implicit representations. Our key idea is to use 2D self-supervised features to create two levels of masks for supervision: a binary mask for foreground regions and a K-cluster mask for semantically similar regions. These complementary masks result in robust decomposition. Experimental results on different datasets show DORec's superiority in segmenting and reconstructing diverse foreground objects from varied backgrounds enabling downstream tasks such as pose estimation.	翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
# ExtractGPT: 製品属性値抽出のための大規模言語モデルの可能性を探る ExtractGPT: Exploring the Potential of Large Language Models for Product Attribute Value Extraction ( http://arxiv.org/abs/2310.12537v3 ) ライセンス: Link先を確認	Alexander Brinkmann, Roee Shraga, Christian Bizer,	(参考訳) ファセット商品検索や製品比較などの機能を容易にするため、Eコマースプラットフォームは正確な属性/値のペアを含む、正確に構造化された製品データを必要とする。ベンダーはしばしば、オファータイトルとテキスト記述のみからなる非構造化製品記述を提供する。そのため、eコマースプラットフォームでは、タイトルや説明から属性値を抽出することが不可欠である。 BERTのような事前学習言語モデル(PLM)に基づく最先端の属性値抽出(AVE)法は2つの欠点に直面している。 (i) 相当量のタスク特化訓練データを必要とする。 (ii) 微調整モデルは、トレーニングデータに含まれない未知の属性値への汎化に問題がある。本稿では,大規模言語モデル(LLM)を、既存のAVE法に代わる、より訓練データ効率が高く、より堅牢な代替手段として活用する可能性について検討する。ゼロショットシナリオと少数ショットシナリオの両方をカバーする,LLMへの抽出対象属性を記述するためのプロンプトテンプレートを提案する。ゼロショットシナリオでは、属性のテキストベースとJSONベースのターゲットスキーマ表現を比較する。少数ショットシナリオでは、 (i) 属性値の例の提供、 (ii) 文脈内デモンストレーションの選択、 (iii) 位置バイアスを防止するためのシャッフルアンサンブル、 (iv) LLMの微調整を調査する。 GPT-3.5 や GPT-4 のようなホスト型 LLM、およびローカルで実行できるオープンソース LLM と組み合わせてプロンプトテンプレートを評価した。我々は,LLMの性能を、PLMベースの手法であるSU-OpenTag,AVEQA,MAVEQAと比較した。最高の平均F1スコア86%はGPT-4で達成された。 Llama-3-70B は GPT-4 より 3% 劣るだけであり、競争力のあるオープンソース代替となっている。同じトレーニングデータを用いた場合、このプロンプト/GPT-4の組み合わせは、F1スコアで平均6%、最高のPLMベースラインを上回っている。 In order to facilitate features such as faceted product search and product comparison, e-commerce platforms require accurately structured product data, including precise attribute/value pairs. Vendors often provide unstructured product descriptions consisting only of an offer title and a textual description. Consequently, extracting attribute values from titles and descriptions is vital for e-commerce platforms. State-of-the-art attribute value extraction methods based on pre-trained language models, such as BERT, face two drawbacks: (i) the methods require significant amounts of task-specific training data and (ii) the fine-tuned models have problems with generalising to unseen attribute values that were not part of the training data. This paper explores the potential of using large language models as a more training data-efficient and more robust alternative to existing AVE methods. 
We propose prompt templates for describing the target attributes of the extraction to the LLM, covering both zero-shot and few-shot scenarios. In the zero-shot scenario, textual and JSON-based target schema representations of the attributes are compared. In the few-shot scenario, we investigate (i) the provision of example attribute values, (ii) the selection of in-context demonstrations, (iii) shuffled ensembling to prevent position bias, and (iv) fine-tuning the LLM. We evaluate the prompt templates in combination with hosted LLMs, such as GPT-3.5 and GPT-4, and open-source LLMs which can be run locally. We compare the performance of the LLMs to the PLM-based methods SU-OpenTag, AVEQA, and MAVEQA. The highest average F1-score of 86% was achieved by GPT-4. Llama-3-70B performs only 3% worse than GPT-4, making it a competitive open-source alternative. Given the same training data, this prompt/GPT-4 combination outperforms the best PLM baseline by an average of 6% F1-score.	翻訳日:2024-09-04 22:24:42 公開日:2024-09-02
# 陰謀マネーマシン:Telegramの陰謀チャンネルとその利益モデルを明らかにする The Conspiracy Money Machine: Uncovering Telegram's Conspiracy Channels and their Profit Model ( http://arxiv.org/abs/2310.15977v2 ) ライセンス: Link先を確認	Vincenzo Imperati, Massimo La Morgia, Alessandro Mei, Alberto Maria Mongardini, Francesco Sassi,	(参考訳) 近年、ソーシャルメディアの主要プラットフォームは、ますます厳格なモデレーションポリシーを実践し、陰謀論に関連するコンテンツに対する禁止や制限をもたらしている。これらの制限を回避するために、陰謀論者は、より少ない制限で彼らの見解を表現し、広めることができるTelegramのような代替手段に目を向けている。 Telegramは、管理者だけがメッセージをブロードキャストできる仮想ルームであるチャンネルと、より寛容なコンテンツポリシーを提供する。これらの特徴は、陰謀チャンネルの複雑なエコシステムにとって格好の温床を生み出した。本稿では,このエコシステムを明らかにする。まず,陰謀チャンネルを検出する手法を提案する。次に、陰謀チャンネルが17,000以上のチャンネルからなる4つの異なるコミュニティにクラスタリングできることを発見した。最後に,「陰謀マネーマシン」を明らかにし,ほとんどの陰謀チャンネルが加入者から積極的に利益を得ようとしていることを示す。陰謀論者はeコマースプラットフォームを利用して、疑わしい商品を販売したり、アフィリエイトリンクを通じて宣伝して利益を上げたりする。さらに,陰謀チャンネルは寄付やクラウドファンディングのプラットフォームを利用してキャンペーンの資金を集めることを観察する。この事業には何十万もの寄付者が関与し、約6600万ドルの売上を生み出していると判定した。 In recent years, major social media platforms have implemented increasingly strict moderation policies, resulting in bans and restrictions on conspiracy theory-related content. To circumvent these restrictions, conspiracy theorists are turning to alternatives, such as Telegram, where they can express and spread their views with fewer limitations. Telegram offers channels, virtual rooms where only administrators can broadcast messages, and a more permissive content policy. These features have created the perfect breeding ground for a complex ecosystem of conspiracy channels. In this paper, we illuminate this ecosystem. First, we propose an approach to detect conspiracy channels. Then, we discover that conspiracy channels can be clustered into four distinct communities comprising over 17,000 channels. Finally, we uncover the "Conspiracy Money Machine," revealing how most conspiracy channels actively seek to profit from their subscribers. We find conspiracy theorists leverage e-commerce platforms to sell questionable products or lucratively promote them through affiliate links. 
Moreover, we observe that conspiracy channels use donation and crowdfunding platforms to raise funds for their campaigns. We determine that this business involves hundreds of thousands of donors and generates a turnover of almost $66 million.	翻訳日:2024-09-04 22:14:48 公開日:2024-09-02
# AMERICANO: 談話駆動分解とエージェントインタラクションによる論証生成 AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction ( http://arxiv.org/abs/2310.20352v2 ) ライセンス: Link先を確認	Zhe Hu, Hou Pong Chan, Yu Yin,	(参考訳) 論証生成は自然言語処理において難しい課題であり、厳密な推論と適切なコンテンツ構成を必要とする。複雑なタスクを中間段階に分解する最近のチェーン・オブ・ソート(chain-of-thought)プロンプトにインスパイアされ、エージェントインタラクションによる論証生成のための新しいフレームワークであるAmericanoを提案する。提案手法では,生成過程を議論理論に基づく逐次的な行動に分解し,まず行動を順に実行して論証的な談話成分を生成し、次にそれらの成分を条件として最終的な論証を生成する。人間の執筆プロセスをさらに模倣し、現在の自己回帰言語モデルの左から右への生成パラダイムを改善するために、受け取ったフィードバックに基づいて論証の草案を自動的に評価・改良する論証改良モジュールを導入する。 Reddit/CMVデータセットのサブセットを用いて,反論生成タスクで提案フレームワークの評価を行った。その結果,提案手法はエンド・ツー・エンドとチェーン・オブ・ソートのプロンプト手法のどちらよりも優れており,多様で豊かな内容を持つ、より一貫性と説得力のある論証を生成できることがわかった。 Argument generation is a challenging task in natural language processing, which requires rigorous reasoning and proper content organization. Inspired by recent chain-of-thought prompting that breaks down a complex task into intermediate steps, we propose Americano, a novel framework with agent interaction for argument generation. Our approach decomposes the generation process into sequential actions grounded on argumentation theory, which first executes actions sequentially to generate argumentative discourse components, and then produces a final argument conditioned on the components. To further mimic the human writing process and improve the left-to-right generation paradigm of current autoregressive language models, we introduce an argument refinement module which automatically evaluates and refines argument drafts based on feedback received. We evaluate our framework on the task of counterargument generation using a subset of the Reddit/CMV dataset. The results show that our method outperforms both end-to-end and chain-of-thought prompting methods and can generate more coherent and persuasive arguments with diverse and rich contents.	翻訳日:2024-09-04 22:14:48 公開日:2024-09-02
# Seneca: Javaオブジェクトデシリアライズのためのテイントベースのコールグラフ構築 Seneca: Taint-Based Call Graph Construction for Java Object Deserialization ( http://arxiv.org/abs/2311.00943v2 ) ライセンス: Link先を確認	Joanna C. S. Santos, Mehdi Mirakhorli, Ali Shokri,	(参考訳) オブジェクトのシリアライゼーションとデシリアライゼーションは、ファイル、メモリ、データベース内でのオブジェクトの保存、マシン間での転送、プロセス間のリモートインタラクションなどに広く利用されている。このメカニズムは動的な言語機能であるリフレクションに依存しており、静的解析に深刻な課題をもたらす。現在の最先端のコールグラフ構築アルゴリズムは、オブジェクトのシリアライズ/デシリアライズを完全にはサポートしていない。すなわち、オブジェクトがシリアライズおよびデシリアライズされる際に呼び出されるコールバックメソッドを発見できない。コールグラフは、複数のタイプの解析(例えば脆弱性検出)のためのコアデータ構造であるため、コールバックメソッドを介して発生する隠された(脆弱な)パスをコールグラフが捉えていなければ、適切な解析を行うことはできない。本稿では,コールグラフ構築において健全性を向上させたシリアライゼーション処理手法であるSenecaを提案する。提案手法は,健全なコールグラフを構築するために,テイント解析とAPIモデリングに依存している。我々は,信頼できないオブジェクトデシリアライズ脆弱性の検出において,健全性,精度,性能,有用性に関して,我々のアプローチを評価した。この結果から,Senecaはシリアライズ機能に関して健全なコールグラフを作成できることがわかった。得られたコールグラフは実行時の大きなオーバーヘッドを発生させず、信頼できないオブジェクトデシリアライゼーションによる脆弱なパスの識別に有用であることが示されている。 Object serialization and deserialization are widely used for storing and preserving objects in files, memory, or database as well as for transporting them across machines, enabling remote interaction among processes and many more. This mechanism relies on reflection, a dynamic language feature that introduces serious challenges for static analyses. Current state-of-the-art call graph construction algorithms do not fully support object serialization/deserialization, i.e., they are unable to uncover the callback methods that are invoked when objects are serialized and deserialized. Since call graphs are a core data structure for multiple types of analysis (e.g., vulnerability detection), an appropriate analysis cannot be performed since the call graph does not capture hidden (vulnerable) paths that occur via callback methods. In this paper, we present Seneca, an approach for handling serialization with improved soundness in the context of call graph construction. Our approach relies on taint analysis and API modeling to construct sound call graphs. 
We evaluated our approach with respect to soundness, precision, performance, and usefulness in detecting untrusted object deserialization vulnerabilities. Our results show that Seneca can create sound call graphs with respect to serialization features. The resulting call graphs do not incur significant runtime overhead and were shown to be useful for performing identification of vulnerable paths caused by untrusted object deserialization.	翻訳日:2024-09-04 22:14:48 公開日:2024-09-02
# LLMの物理シミュレーション能力 Physics simulation capabilities of LLMs ( http://arxiv.org/abs/2312.02091v2 ) ライセンス: Link先を確認	Mohamad Ali-Dib, Kristen Menou,	(参考訳) [Abridged abstract]Large Language Models (LLMs)は、学部レベルから大学院レベルの物理教科書の問題の一部を解くことができ、コーディングに精通している。これら2つの能力を組み合わせることで、いつかAIシステムが物理的な世界をシミュレートし予測できるようになるかもしれない。本稿では、PhDレベルから研究レベルの計算物理問題に対するSOTA (State-of-the-art) LLMの評価を行う。物理・天体物理学領域における符号化能力を引き出すために, 文書が整備され広く利用されているパッケージの使用を条件としてLLM生成を行う。我々は、天体力学(REBOUND)、恒星物理学(MESA)、1次元流体力学(Dedalus)、非線形力学(SciPy)において、$\sim 50$のオリジナルかつ挑戦的な問題を提供する。我々の問題は一意な解を持たないため、異なるタイプのエラー(コーディング、物理、必要性、十分性)を含む行数と、その問題の重要な物理的要素を捉えることに焦点を当てた、より「教育的な」合否(Pass-Fail)指標という、いくつかのソフトな指標でLLMの性能を評価する。予想通り、今日のSOTA LLM(GPT4)はゼロショットではほとんどの問題に失敗するが、解の約40\%は合格点を得られる可能性がある。生成されたコード行の約$70-90 \%$は必要かつ十分で正しい(コーディング \& 物理)。物理とコーディングのエラーが最も一般的で、不必要な行や不十分な行もいくつかある。問題のクラスと難易度によって有意なばらつきが観察される。我々は計算物理領域におけるGPT4のいくつかの障害モードを同定する。我々の予備的調査は、古典物理学における現在の計算能力のスナップショットを提供し、AIシステムが物理シミュレーション能力において基本的な自律性に達するための明らかな改善目標を指摘する。 [Abridged abstract] Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding. Combining these two capabilities could one day enable AI systems to simulate and predict the physical world. We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems. We condition LLM generation on the use of well-documented and widely-used packages to elicit coding capabilities in the physics and astrophysics domains. We contribute $\sim 50$ original and challenging problems in celestial mechanics (with REBOUND), stellar physics (with MESA), 1D fluid dynamics (with Dedalus) and non-linear dynamics (with SciPy). Since our problems do not admit unique solutions, we evaluate LLM performance on several soft metrics: counts of lines that contain different types of errors (coding, physics, necessity and sufficiency) as well as a more "educational" Pass-Fail metric focused on capturing the salient physical ingredients of the problem at hand. 
As expected, today's SOTA LLM (GPT4) zero-shot fails most of our problems, although about 40\% of the solutions could plausibly get a passing grade. About $70-90 \%$ of the code lines produced are necessary, sufficient and correct (coding \& physics). Physics and coding errors are the most common, with some unnecessary or insufficient lines. We observe significant variations across problem class and difficulty. We identify several failure modes of GPT4 in the computational physics domain. Our reconnaissance work provides a snapshot of current computational capabilities in classical physics and points to obvious improvement targets if AI systems are ever to reach a basic level of autonomy in physics simulation capabilities.	翻訳日:2024-09-04 22:02:40 公開日:2024-09-02
# FRDiff : 拡散モデルのユニバーサルトレーニングフリー加速のための特徴再利用 FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models ( http://arxiv.org/abs/2312.03517v3 ) ライセンス: Link先を確認	Junhyuk So, Jungwon Lee, Eunhyeok Park,	(参考訳) 拡散モデルの相当な計算コストは、特に高品質な画像生成に必要な繰り返しのデノイジングステップのため、その普及の大きな障害となっている。いくつかの研究は、微調整なしで高度なODEソルバを用いてスコア関数評価(NFE)の数を減らし、この問題に対処しようとしているが、デノイジング反復回数の減少は細部を更新する機会を逃し、顕著な品質劣化をもたらす。本研究では,拡散モデルに固有の時間的冗長性を活用する高度な加速手法を提案する。時間的類似度の高い特徴マップの再利用は、出力品質を損なうことなく計算資源を節約する新たな機会を開く。この直観の実用的メリットを実現するために、我々は広範囲な分析を行い、新しい手法であるFRDiffを提案する。 FRDiffは、削減されたNFEと特徴再利用の両方の利点を活用するように設計されており、様々な生成タスクにおいて忠実度とレイテンシのトレードオフのバランスをとるParetoフロンティアを実現している。 The substantial computational costs of diffusion models, especially due to the repeated denoising steps necessary for high-quality image generation, present a major obstacle to their widespread adoption. While several studies have attempted to address this issue by reducing the number of score function evaluations (NFE) using advanced ODE solvers without fine-tuning, the decreased number of denoising iterations misses the opportunity to update fine details, resulting in noticeable quality degradation. In our work, we introduce an advanced acceleration technique that leverages the temporal redundancy inherent in diffusion models. Reusing feature maps with high temporal similarity opens up a new opportunity to save computation resources without compromising output quality. To realize the practical benefits of this intuition, we conduct an extensive analysis and propose a novel method, FRDiff. FRDiff is designed to harness the advantages of both reduced NFE and feature reuse, achieving a Pareto frontier that balances fidelity and latency trade-offs in various generative tasks.	翻訳日:2024-09-04 22:02:40 公開日:2024-09-02
# Pearl: 生産対応強化学習エージェント Pearl: A Production-ready Reinforcement Learning Agent ( http://arxiv.org/abs/2312.03814v2 ) ライセンス: Link先を確認	Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu,	(参考訳) 強化学習(Reinforcement Learning, RL)は、長期的な目標を最適化するための汎用的なフレームワークである。実世界の多くの問題はRLで定式化できるが、実演的なRLポリシーの学習と展開には、探索・探索ジレンマ、部分観測可能性、動的行動空間、安全上の懸念など、いくつかの重要な課題に対処するように設計されたシステムが必要である。これらの課題の重要性は認識されているが、既存のオープンソースRLライブラリはそれらに明示的に対処していない。本稿では、これらの課題をモジュラー方式で受け入れるように設計された、生産対応RLソフトウェアパッケージであるPearlを紹介する。ベンチマーク結果の提示に加えて、運用ユースケースのメリットを示すために、Pearlが進行中の業界採用事例も取り上げている。 PearlはGitHubのgithub.com/facebookresearch/pearlでオープンソース公開されている。公式ウェブサイトはpearlagent.github.ioである。 Reinforcement learning (RL) is a versatile framework for optimizing long-term goals. Although many real-world problems can be formalized with RL, learning and deploying a performant RL policy requires a system designed to address several important challenges, including the exploration-exploitation dilemma, partial observability, dynamic action spaces, and safety concerns. While the importance of these challenges has been well recognized, existing open-source RL libraries do not explicitly address them. This paper introduces Pearl, a Production-Ready RL software package designed to embrace these challenges in a modular way. In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io.	翻訳日:2024-09-04 22:02:40 公開日:2024-09-02
# プライバシを意識した文書ビジュアル質問応答 Privacy-Aware Document Visual Question Answering ( http://arxiv.org/abs/2312.10108v2 ) ライセンス: Link先を確認	Rubèn Tito, Khanh Nguyen, Marlon Tobaben, Raouf Kerkouche, Mohamed Ali Souibgui, Kangsoo Jung, Joonas Jälkö, Vincent Poulain D'Andecy, Aurelie Joseph, Lei Kang, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas,	(参考訳) Document Visual Question Answering (DocVQA)は、文書理解の中心的なタスクへと急速に成長してきた。しかし、文書には機密情報や著作権情報が含まれているにもかかわらず、現在のDocVQAの方法はいずれも強力なプライバシー保証を提供していない。本研究では、DocVQAの領域におけるプライバシを初めて探求し、DocVQAに使用される最先端のマルチモーダルLLMのプライバシ問題を強調し、可能なソリューションを探る。具体的には,インボイス処理を現実的な文書理解シナリオとして重視し,インボイス文書と関連する質問や回答からなる大規模DocVQAデータセットを提案する。我々は,異なる企業における文書の現実の分布を反映したフェデレーション学習方式を採用し,請求書提供者のデータが保護すべき機密情報である場合について検討する。プライベートでないモデルは記憶する傾向があり、その挙動によってプライベートな情報が露出する可能性があることを実証する。次に、このマルチモーダルシナリオにおいて、フェデレーション学習と差分プライバシを用いたベースライン訓練手法を評価する。ここでは、視覚(文書画像)と言語(OCRトークン)という2つの入力モダリティのいずれか、または両方を通じて機密情報が露出する可能性がある。最後に,モデルの記憶効果を利用した攻撃を設計し,代表的なDocVQAモデルを探る上での有効性を実証する。 Document Visual Question Answering (DocVQA) has quickly grown into a central task of document understanding. But despite the fact that documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong privacy guarantees. In this work, we explore privacy in the domain of DocVQA for the first time, highlighting privacy issues in state of the art multi-modal LLM models used for DocVQA, and explore possible solutions. Specifically, we focus on invoice processing as a realistic document understanding scenario, and propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers. We employ a federated learning scheme, that reflects the real-life distribution of documents in different businesses, and we explore the use case where the data of the invoice provider is the sensitive information to be protected. We demonstrate that non-private models tend to memorise, a behaviour that can lead to exposing private information. 
We then evaluate baseline training schemes employing federated learning and differential privacy in this multi-modal scenario, where the sensitive information might be exposed through either or both of the two input modalities: vision (document image) or language (OCR tokens). Finally, we design attacks exploiting the memorisation effect of the model, and demonstrate their effectiveness in probing representative DocVQA models.	翻訳日:2024-09-04 21:42:39 公開日:2024-09-02
# CLIPを用いたPrompt Vision-Language Fusionによる歩行者属性認識 Pedestrian Attribute Recognition via CLIP based Prompt Vision-Language Fusion ( http://arxiv.org/abs/2312.10692v2 ) ライセンス: Link先を確認	Xiao Wang, Jiandong Jin, Chenglong Li, Jin Tang, Cheng Zhang, Wei Wang,	(参考訳) 既存の歩行者属性認識(PAR)アルゴリズムでは、歩行者画像と属性ラベルの関係が不十分なため、視覚特徴学習のバックボーンネットワークとしてCNN(例えばResNet)を採用している。本稿では、PARを視覚言語融合問題として定式化し、歩行者画像と属性ラベルの関係をフル活用する。具体的には、まず属性句を文に拡張し、次に事前学習された視覚言語モデルCLIPを、視覚画像と属性記述の特徴埋め込みのためのバックボーンとして採用する。対照的な学習目的は、CLIPベースの機能空間において、ビジョンと言語モダリティをうまく結びつけ、CLIPで使用されるトランスフォーマー層は、ピクセル間の長距離関係をキャプチャすることができる。次に、デュアル機能を効果的に融合するためにマルチモーダルトランスを採用し、フィードフォワードネットワークを用いて属性を予測する。ネットワークを効率よく最適化するために、領域認識型プロンプトチューニング手法を提案し、ごく少数のパラメータ(プロンプトベクトルと分類ヘッドのみ)を調整し、事前学習されたVLモデルとマルチモーダルトランスフォーマーの両方を修正する。提案するPARアルゴリズムは, 微調整手法と比較して0.75%しか学習可能なパラメータを調整できない。 RAPv1, RAPv2, WIDER, PA100K, PETA-ZS, RAP-ZSデータセットなど,PARの標準設定とゼロショット設定の両方で、新たな最先端パフォーマンスを実現している。ソースコードと事前トレーニングされたモデルはhttps://github.com/Event-AHU/OpenPARでリリースされる。 Existing pedestrian attribute recognition (PAR) algorithms adopt pre-trained CNN (e.g., ResNet) as their backbone network for visual feature learning, which might obtain sub-optimal results due to the insufficient employment of the relations between pedestrian images and attribute labels. In this paper, we formulate PAR as a vision-language fusion problem and fully exploit the relations between pedestrian images and attribute labels. Specifically, the attribute phrases are first expanded into sentences, and then the pre-trained vision-language model CLIP is adopted as our backbone for feature embedding of visual images and attribute descriptions. The contrastive learning objective connects the vision and language modalities well in the CLIP-based feature space, and the Transformer layers used in CLIP can capture the long-range relations between pixels. Then, a multi-modal Transformer is adopted to fuse the dual features effectively and feed-forward network is used to predict attributes. 
To optimize our network efficiently, we propose the region-aware prompt tuning technique to adjust very few parameters (i.e., only the prompt vectors and classification heads) and fix both the pre-trained VL model and multi-modal Transformer. Our proposed PAR algorithm adjusts only 0.75% of the learnable parameters compared with the fine-tuning strategy. It also achieves new state-of-the-art performance on both standard and zero-shot settings for PAR, including the RAPv1, RAPv2, WIDER, PA100K, PETA-ZS, and RAP-ZS datasets. The source code and pre-trained models will be released on https://github.com/Event-AHU/OpenPAR.	翻訳日:2024-09-04 21:42:39 公開日:2024-09-02
# 分布シフト下におけるプライベート転移学習のための公開表現の利点について On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift ( http://arxiv.org/abs/2312.15551v4 ) ライセンス: Link先を確認	Pratiksha Thaker, Amrith Setlur, Zhiwei Steven Wu, Virginia Smith,	(参考訳) 公開データによる事前学習は、差分プライベートなモデルトレーニングを改善するための有望なアプローチである。しかし、近年の研究では、このパラダイムを研究する多くの肯定的な研究成果は分布内タスクのみを考慮しており、事前学習データと微調整データの間に分布シフトがある設定には適用できない可能性がある、と指摘している。データの機密性を考えると、プライベートタスクを微調整する際にはこのようなシナリオが生じやすい。本研究では、公開データからのゼロショット性能とプライベートデータによるスクラッチからのトレーニングの両方が使用に耐えないほど弱い結果をもたらすような、大きな分布シフトの設定においても、公開特徴量がスクラッチからのプライベートトレーニングに比べて最大67%までプライベートトレーニングの精度を向上させることができることを、3つのタスクにわたって経験的に示す。この現象の理論的説明として、公開データとプライベートデータが低次元表現を共有している場合、公開データのみからプライベートタスクを学習することが不可能であっても、公開表現はプライベートトレーニングのサンプル複雑性を改善できることを示す。総じて、我々の結果は、極端な分布シフトという現実的な設定において、公開データがプライベートトレーニングを実用的なものにできることを示す証拠を提供する。 Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data -- a scenario that is likely when finetuning private tasks due to the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67\% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. 
Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.	翻訳日:2024-09-04 21:42:39 公開日:2024-09-02
# 半教師型医用画像セグメンテーションのためのデュアルスケール強化・クロスジェネレーション一貫性学習 Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation ( http://arxiv.org/abs/2312.16039v2 ) ライセンス: Link先を確認	Yunqi Gu, Tao Zhou, Yizhe Zhang, Yi Zhou, Kelei He, Chen Gong, Huazhu Fu,	(参考訳) 医用画像のセグメンテーションはコンピュータ支援診断において重要な役割を担っている。しかし、既存の手法は完全な教師付きトレーニングに大きく依存しており、時間のかかるピクセル単位のアノテーションを伴う大量のラベル付きデータを必要とする。さらに, 形状, サイズ, 位置のばらつきにより, 病変の正確なセグメンテーションは困難である。これらの課題に対処するため、我々は、半教師付き医用画像セグメンテーションのための新しいDual-scale Enhanced and Cross-generative consistency learning framework(DEC-Seg)を提案する。まず,レベルをまたいで隣接する層を統合し、異なる解像度にわたる特徴表現能力を高めるクロスレベル・フィーチャー・アグリゲーション(CFA, Cross-level Feature Aggregation)モジュールを提案する。スケールの変動に対処するため,異なるスケールで同じ入力画像から生成されたセグメンテーションマップの一貫性を保証する,スケール強化された一貫性制約を提案する。この制約は病変の大きさの変動に対処し、モデルの堅牢性を改善するのに役立つ。さらに,クロスセグメンテーションマップを用いて,原画像と摂動画像を再構成するクロスジェネレーション一貫性スキームを提案する。この一貫性の制約により、効果的な特徴表現を抽出し、セグメンテーション性能を高めることができる。スケール情報をさらに活用するため、異なるスケールで動作する2つのデコーダの特徴を統合し、より正確なセグメンテーションマップを作成するDual-scale Complementary Fusion (DCF)モジュールを提案する。複数の医用セグメンテーションタスク(ポリープ、皮膚病変、脳グリオーマ)の広範な実験結果から、他の最先端の半教師付きセグメンテーション手法に対するDEC-Segの有効性が示された。実装コードはhttps://github.com/taozh2017/DECSegで公開される。 Medical image segmentation plays a crucial role in computer-aided diagnosis. However, existing methods heavily rely on fully supervised training, which requires a large amount of labeled data with time-consuming pixel-wise annotations. Moreover, accurately segmenting lesions poses challenges due to variations in shape, size, and location. To address these issues, we propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image Segmentation (DEC-Seg). First, we propose a Cross-level Feature Aggregation (CFA) module that integrates cross-level adjacent layers to enhance the feature representation ability across different resolutions. To address scale variation, we present a scale-enhanced consistency constraint, which ensures consistency in the segmentation maps generated from the same input image at different scales. 
This constraint helps handle variations in lesion sizes and improves the robustness of the model. Furthermore, we propose a cross-generative consistency scheme, in which the original and perturbed images can be reconstructed using cross-segmentation maps. This consistency constraint allows us to mine effective feature representations and boost the segmentation performance. To further exploit the scale information, we propose a Dual-scale Complementary Fusion (DCF) module that integrates features from two scale-specific decoders operating at different scales to help produce more accurate segmentation maps. Extensive experimental results on multiple medical segmentation tasks (polyp, skin lesion, and brain glioma) demonstrate the effectiveness of our DEC-Seg against other state-of-the-art semi-supervised segmentation approaches. The implementation code will be released at https://github.com/taozh2017/DECSeg.	翻訳日:2024-09-04 21:42:39 公開日:2024-09-02
# 統合センシング・通信システムにおけるディープラーニングに基づくターゲット・ツー・ユーザ・アソシエーション Deep Learning-based Target-To-User Association in Integrated Sensing and Communication Systems ( http://arxiv.org/abs/2401.12801v2 ) ライセンス: Link先を確認	Lorenzo Cazzella, Marouan Mizmizi, Dario Tagliaferri, Damiano Badini, Matteo Matteucci, Umberto Spagnolini,	(参考訳) ISAC(Integrated Sensing and Communication)システムでは、レーダターゲットと通信ユーザ機器(UE)をマッチングすることは、プロアクティブハンドオーバやビーム予測など、いくつかの通信タスクに有用である。本稿では,基地局 (BS) が次の2つの目的を持つマルチインプット・マルチアウトプット (MIMO) レーダを備えたレーダ支援通信システムについて考察する。 (i) 通信用ビーム空間における車両レーダ目標と車両機器(VE)とを関連付けること。 (ii) レーダデータから各VEのビームフォーミングベクトルを予測すること。提案するターゲット・ツー・ユーザ(T2U)アソシエーションは2つの段階から構成される。まず、レンジ・角度画像から車両レーダ目標を検出し、それぞれについてビームフォーミングベクトルを推定する。次に、推定されたターゲットごとのビームフォーミングベクトルを、BSで通信に使用されているものと照合し、T2Uアソシエーションを行う。マルチターゲット検出とビーム推定の同時実行は、シミュレーションによるレンジ・角度レーダ画像で訓練されたYOLO(you only look once)モデルを修正することで得られる。さまざまな都市部の車両移動シナリオにおけるシミュレーション結果から,提案するT2U手法は、BSアンテナアレイのサイズとともに増加する正しいアソシエーションの確率を与えることが示され,ビーム空間におけるVEの分離性がそれに応じて向上することが浮き彫りになった。さらに, 修正されたYOLOアーキテクチャは, ビーム予測とレーダ目標検出の両方を効果的に行うことができ、後者の平均適合率(mAP)はアンテナアレイのサイズが異なっても同程度であることを示す。 In Integrated Sensing and Communication (ISAC) systems, matching the radar targets with communication user equipments (UEs) is useful for several communication tasks, such as proactive handover and beam prediction. In this paper, we consider a radar-assisted communication system where a base station (BS) is equipped with a multiple-input-multiple-output (MIMO) radar that has a double aim: (i) associate vehicular radar targets to vehicular equipments (VEs) in the communication beamspace and (ii) predict the beamforming vector for each VE from radar data. The proposed target-to-user (T2U) association consists of two stages. First, vehicular radar targets are detected from range-angle images, and, for each, a beamforming vector is estimated. Then, the inferred per-target beamforming vectors are matched with the ones utilized at the BS for communication to perform target-to-user (T2U) association. 
Joint multi-target detection and beam inference is obtained by modifying the you only look once (YOLO) model, which is trained over simulated range-angle radar images. Simulation results over different urban vehicular mobility scenarios show that the proposed T2U method provides a probability of correct association that increases with the size of the BS antenna array, highlighting the respective increase of the separability of the VEs in the beamspace. Moreover, we show that the modified YOLO architecture can effectively perform both beam prediction and radar target detection, with similar performance in mean average precision on the latter over different antenna array sizes.	翻訳日:2024-09-04 21:31:42 公開日:2024-09-02
# 甲骨文字の認識・解読のためのオープンデータセット An open dataset for oracle bone script recognition and decipherment ( http://arxiv.org/abs/2401.15365v4 ) ライセンス: Link先を確認	Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Jinpeng Wan, Haisu Guan, Zhebin Kuang, Lianwen Jin, Xiang Bai, Yuliang Liu,	(参考訳) 古代中国で知られる最古の書記体系の一つである甲骨文字は、3000年前にさかのぼる殷王朝の人文・地理を研究する学者にとって、貴重な研究資料である。これらの文字の歴史的・文化的意義は、いくら強調してもし過ぎることはない。しかし、時間の経過によってその意味の多くが不明瞭になっており、これらの古代のテキストを解読することは大きな課題となっている。人工知能(AI)の出現により、AIを用いて甲骨文字(Oracle Bone Characters, OBC)の解読を支援することが現実的な選択肢となっている。しかし、この分野の進歩は高品質なデータセットの欠如によって妨げられてきた。この問題に対処するために、本論文ではHUST-OBCデータセットの作成について詳述する。このデータセットは、解読済みの1,588文字の画像77,064枚と未解読の9,411文字の画像62,989枚、合計140,053枚の画像を含み、多様な情報源から収集されている。このデータセットが、未知のOBCを解読する将来の研究を刺激し、支援することを期待している。すべてのコードとデータセットはhttps://github.com/Yuliang-Liu/Open-Oracle で公開されている。 Oracle bone script, one of the earliest known forms of ancient Chinese writing, presents invaluable research materials for scholars studying the humanities and geography of the Shang Dynasty, dating back 3,000 years. The immense historical and cultural significance of these writings cannot be overstated. However, the passage of time has obscured much of their meaning, presenting a significant challenge in deciphering these ancient texts. With the advent of Artificial Intelligence (AI), employing AI to assist in deciphering Oracle Bone Characters (OBCs) has become a feasible option. Yet, progress in this area has been hindered by a lack of high-quality datasets. To address this issue, this paper details the creation of the HUST-OBC dataset. This dataset encompasses 77,064 images of 1,588 individual deciphered characters and 62,989 images of 9,411 undeciphered characters, with a total of 140,053 images, compiled from diverse sources. The hope is that this dataset could inspire and assist future research in deciphering those unknown OBCs. All the code and datasets are available at https://github.com/Yuliang-Liu/Open-Oracle.	翻訳日:2024-09-04 21:31:42 公開日:2024-09-02
# ハイブリッド量子回路におけるノイズ誘起相転移 Noise-induced phase transitions in hybrid quantum circuits ( http://arxiv.org/abs/2401.16631v2 ) ライセンス: Link先を確認	Shuo Liu, Ming-Rui Li, Shi-Xin Zhang, Shao-Kai Jian, Hong Yao,	(参考訳) 実際の物理系に固有の量子ノイズの存在は、局所的なランダムなユニタリと中間回路の測定を持つハイブリッド量子回路の物理学に強い影響を与える。大きさに依存しない発生確率を持つ量子ノイズは、測定によって引き起こされる絡み合い相転移の消失と、1つのエリアロー相の出現につながる。本研究では、サイズ依存確率$q=p/L^{\alpha}$における量子ノイズの影響について検討する。我々は,体積法則からパワー(面積)法則への雑音誘起エンタングルメント相転移を,$\alpha=1$のときの$p$増加として同定した。実効統計モデルの助けを借りて、2種類のスピン配置の競合から生じる位相遷移が一階述語であることを明らかにし、ノイズ誘起符号遷移と同じ解析的理解を共有する。この統合された図は、絡み合い行動と情報保護の能力との間の関係の理解をさらに深めている。 $\alpha \neq 1$ の場合、1つのスピン構成は常に$p$ によらず支配的であり、したがって相転移は消滅する。さらに,サイズ依存バルクノイズと境界雑音の差を強調した。我々は、安定化回路シミュレーションによる解析的予測を広範囲な数値計算で検証する。 The presence of quantum noises inherent to real physical systems can strongly impact the physics in hybrid quantum circuits with local random unitaries and mid-circuit measurements. The quantum noises with a size-independent occurring probability can lead to the disappearance of a measurement-induced entanglement phase transition and the emergence of a single area-law phase. In this work, we investigate the effects of quantum noises with size-dependent probabilities $q=p/L^{\alpha}$ where $\alpha$ represents the scaling exponent. We have identified a noise-induced entanglement phase transition from a volume law to a power (area) law in the presence (absence) of measurements as $p$ increases when $\alpha=1$. With the help of an effective statistical model, we reveal that the phase transition is of first-order arising from the competition between two types of spin configurations and shares the same analytical understanding as the noise-induced coding transition. This unified picture further deepens the understanding of the connection between entanglement behavior and the capacity of information protection. When $\alpha \neq 1$, one spin configuration always dominates regardless of $p$ and thus the phase transition disappears. 
Moreover, we highlight the difference between the effects of size-dependent bulk noise and boundary noises. We validate our analytical predictions with extensive numerical results from stabilizer circuit simulations.	翻訳日:2024-09-04 21:31:42 公開日:2024-09-02
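A small numeric sketch can make the role of the exponent in $q=p/L^{\alpha}$ concrete. It assumes, purely for illustration, that noise acts independently on each of the $L$ qubits in a layer, so the expected number of noise events per layer is $Lq = pL^{1-\alpha}$. This counting argument is not taken from the paper and says nothing about the entanglement phases themselves; the function names are hypothetical.

```python
def noise_probability(p, system_size, alpha):
    """Size-dependent noise probability q = p / L**alpha from the abstract."""
    return p / system_size ** alpha


def expected_noise_events_per_layer(p, system_size, alpha):
    """Expected noise events in one layer, assuming independent noise on each qubit."""
    return system_size * noise_probability(p, system_size, alpha)


for alpha in (0.5, 1.0, 2.0):
    counts = [expected_noise_events_per_layer(0.3, L, alpha) for L in (16, 64, 256)]
    print(alpha, counts)
```

Under this assumption the expected count stays at $p$ for every system size only when $\alpha=1$; it shrinks with $L$ for $\alpha>1$ and grows with $L$ for $\alpha<1$.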
# 非マルコフ開量子ダイナミクスの量子シミュレーションに向けて:普遍的でコンパクトな理論 Towards Quantum Simulation of Non-Markovian Open Quantum Dynamics: A Universal and Compact Theory ( http://arxiv.org/abs/2401.17255v3 ) ライセンス: Link先を確認	Xiang Li, Su-Xiang Lyu, Yao Wang, Rui-Xue Xu, Xiao Zheng, YiJing Yan,	(参考訳) 非マルコビアン性(非マルコビアン性、Non-Markovianity)は、その時間進化史におけるオープン量子系の複雑な依存であり、様々な科学分野において大きな意味を持つ。しかし、複雑な非マルコフ効果を正確に特徴づけることは、数値シミュレーションにとって大きな課題となっている。量子コンピューティング技術は将来性を示すが、実用的な量子アルゴリズムの実装を可能にする普遍理論は解明されてきた。本稿では,2次量子化(DQME-SQ)において,ディシパトン埋め込み量子マスター方程式を導入することで,このギャップに対処する。この正確でコンパクトな理論は、量子回路による表現可能性と任意のガウス環境への普遍的適用性という2つの大きな利点を提供する。ボゾンおよびフェルミオン環境における非マルコフ散逸動力学のディジタル量子シミュレーションによりこれらの能力を実証する。 DQME-SQフレームワークは、急速に進歩する量子コンピューティング技術を活用することで、複雑なオープン量子システムの効率的な探索のための新たな地平を開く。 Non-Markovianity, the intricate dependence of an open quantum system on its temporal evolution history, holds tremendous implications across various scientific disciplines. However, accurately characterizing the complex non-Markovian effects has posed a formidable challenge for numerical simulations. While quantum computing technologies show promise, a universal theory enabling practical quantum algorithm implementation has been elusive. We address this gap by introducing the dissipaton-embedded quantum master equation in second quantization (DQME-SQ). This exact and compact theory offers two key advantages: representability by quantum circuits and universal applicability to any Gaussian environment. We demonstrate these capabilities through digital quantum simulations of non-Markovian dissipative dynamics in both bosonic and fermionic environments. The DQME-SQ framework opens a new horizon for the efficient exploration of complex open quantum systems by leveraging the rapidly advancing quantum computing technologies.	翻訳日:2024-09-04 19:43:36 公開日:2024-09-02
# EuroPED-NN: 不確実性を考慮したサロゲートモデル EuroPED-NN: Uncertainty aware surrogate model ( http://arxiv.org/abs/2402.00760v2 ) ライセンス: Link先を確認	A. Panera Alvarez, A. Ho, A. Jarvinen, S. Saarelma, S. Wiesen, JET Contributors, the AUG team,	(参考訳) 本研究は、ノイズ対照事前分布を用いたベイズニューラルネットワーク(BNN-NCP)手法により、EuroPEDプラズマペデスタルモデルの不確実性を考慮したサロゲートモデルの生成に成功した。このモデルはJET-ILWペデスタルデータベースのデータとそれに続くモデル評価を用いて訓練され、EuroPED-NNを構成する。BNN-NCP手法は、不確実性を考慮したサロゲートモデルの生成に適した手法であることが示された。通常のニューラルネットワークの出力結果と一致しつつ、予測の信頼度を不確実性として提供する。さらに、サロゲートモデルの不確実性を用いて分布外(OOD)領域を明示する。これにより、モデルの堅牢性と信頼性に関する重要な知見が得られる。EuroPED-NNは物理的に検証されている。第1に、プラズマ電流$I_p$の増加に対する電子密度$n_e\!\left(\psi_{\text{pol}}=0.94\right)$を解析し、第2に、EuroPEDモデルに関連する$\Delta-\beta_{p,ped}$関係を検証した。このことは、サロゲートモデルが学習した基礎物理の頑健さを裏付けるものである。さらに、この手法を用いて、実験データを入力とするEuroPEDライクなモデル、すなわちJETデータベース上で機能する不確実性を考慮した実験モデルを構築した。どちらのモデルも、$\sim 50$のAUGショットでも検証されている。 This work successfully generates an uncertainty-aware surrogate model of the EuroPED plasma pedestal model using the Bayesian neural network with noise contrastive prior (BNN-NCP) technique. This model is trained using data from the JET-ILW pedestal database and subsequent model evaluations, conforming to EuroPED-NN. The BNN-NCP technique has been proven to be a suitable method for generating uncertainty-aware surrogate models. It matches the output results of a regular neural network while providing confidence estimates for predictions as uncertainties. Additionally, it highlights out-of-distribution (OOD) regions using surrogate model uncertainties. This provides critical insights into model robustness and reliability. EuroPED-NN has been physically validated, first, analyzing electron density $n_e\!\left(\psi_{\text{pol}}=0.94\right)$ with respect to increasing plasma current, $I_p$, and second, validating the $\Delta-\beta_{p,ped}$ relation associated with the EuroPED model. This affirms the robustness of the underlying physics learned by the surrogate model. 
On top of that, the method was used to develop a EuroPED-like model fed with experimental data, i.e. an uncertainty-aware experimental model, which is functional on the JET database. Both models have also been tested on $\sim 50$ AUG shots.	翻訳日:2024-09-04 19:43:36 公開日:2024-09-02
# S-NeRF++:ニューラル再構成と生成による自律走行シミュレーション S-NeRF++: Autonomous Driving Simulation via Neural Reconstruction and Generation ( http://arxiv.org/abs/2402.02112v2 ) ライセンス: Link先を確認	Yurui Chen, Junge Zhang, Ziyang Xie, Wenye Li, Feihu Zhang, Jiachen Lu, Li Zhang,	(参考訳) 自動運転シミュレーションシステムは、自動運転データを強化し、複雑な交通シナリオと稀な交通シナリオをシミュレートし、ナビゲーションの安全性を確保する上で重要な役割を果たす。しかし、手動モデリングや2次元画像編集に大きく依存する従来のシミュレーションシステムは、広いシーンへのスケーリングと現実的なシミュレーションデータの生成に苦労した。本研究では,ニューラル再構成に基づく革新的な自律運転シミュレーションシステムであるS-NeRF++を提案する。 nuScenesやWaymoといった、広く使用されている自動運転データセットに基づいて、S-NeRF++は、多くの現実的なストリートシーンと、高いレンダリング品質のフォアグラウンドオブジェクトを生成し、操作とシミュレーションにかなりの柔軟性を提供する。具体的には、S-NeRF++は大規模シーンと移動車両を合成するための強化された神経放射場であり、シーンパラメータ化とカメラポーズ学習を改善している。このシステムは、ノイズと疎結合なLiDARデータを効果的に利用し、トレーニングを洗練し、奥行きの外れに対処し、高品質な再構築とノベルビューレンダリングを確実にする。また, 総合的なシナリオ作成を支援するため, さまざまな前景車両を再構成・生成し, 照明効果と影効果を巧みに統合し, シミュレーションの現実性をさらに向上する高度前景融合パイプラインを開発し, 多様な前景資産銀行も提供する。 S-NeRF++によって提供される高品質なシミュレーションデータにより、認識手法は複数の自律走行下流タスクのパフォーマンス向上を享受し、さらに提案したシミュレータの有効性を実証する。 Autonomous driving simulation system plays a crucial role in enhancing self-driving data and simulating complex and rare traffic scenarios, ensuring navigation safety. However, traditional simulation systems, which often heavily rely on manual modeling and 2D image editing, struggled with scaling to extensive scenes and generating realistic simulation data. In this study, we present S-NeRF++, an innovative autonomous driving simulation system based on neural reconstruction. Trained on widely-used self-driving datasets such as nuScenes and Waymo, S-NeRF++ can generate a large number of realistic street scenes and foreground objects with high rendering quality as well as offering considerable flexibility in manipulation and simulation. Specifically, S-NeRF++ is an enhanced neural radiance field for synthesizing large-scale scenes and moving vehicles, with improved scene parameterization and camera pose learning. 
The system effectively utilizes noisy and sparse LiDAR data to refine training and address depth outliers, ensuring high-quality reconstruction and novel-view rendering. It also provides a diverse foreground asset bank by reconstructing and generating different foreground vehicles to support comprehensive scenario creation. Moreover, we have developed an advanced foreground-background fusion pipeline that skillfully integrates illumination and shadow effects, further enhancing the realism of our simulations. With the high-quality simulated data provided by our S-NeRF++, we found that perception methods enjoy performance boosts on several autonomous driving downstream tasks, further demonstrating our proposed simulator's effectiveness.	翻訳日:2024-09-04 19:43:36 公開日:2024-09-02
# AlphaFoldとフローマッチングの融合によるタンパク質アンサンブル生成 AlphaFold Meets Flow Matching for Generating Protein Ensembles ( http://arxiv.org/abs/2402.04845v2 ) ライセンス: Link先を確認	Bowen Jing, Bonnie Berger, Tommi Jaakkola,	(参考訳) タンパク質の生物学的機能は、しばしば動的な構造アンサンブルに依存する。本研究では、タンパク質のコンフォメーション・ランドスケープを学習・サンプリングするためのフローベース生成モデリング手法を開発する。AlphaFoldやESMFoldのような高精度な単一状態予測器を転用し、独自のフローマッチングフレームワークの下で微調整することで、AlphaFlowおよびESMFlowと呼ぶ、タンパク質構造の配列条件付き生成モデルを得る。PDBで訓練・評価した場合、本手法はMSAサブサンプリングを用いたAlphaFoldと比較して、精度と多様性の優れた組み合わせを提供する。全原子MDから得たアンサンブルでさらに訓練すると、本手法は未知のタンパク質に対して、コンフォメーションの柔軟性、位置分布、および高次のアンサンブル観測量を正確に捉える。さらに、本手法は、MD軌道を複製する場合よりも短い実時間(wall-clock)で特定の平衡特性に収束しつつ静的なPDB構造を多様化でき、高コストな物理ベースのシミュレーションの代替としての可能性を示す。コードはhttps://github.com/bjing2016/alphaflow で公開されている。 The biological functions of proteins often depend on dynamic structural ensembles. In this work, we develop a flow-based generative modeling approach for learning and sampling the conformational landscapes of proteins. We repurpose highly accurate single-state predictors such as AlphaFold and ESMFold and fine-tune them under a custom flow matching framework to obtain sequence-conditioned generative models of protein structure called AlphaFlow and ESMFlow. When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins. Moreover, our method can diversify a static PDB structure with faster wall-clock convergence to certain equilibrium properties than replicate MD trajectories, demonstrating its potential as a proxy for expensive physics-based simulations. Code is available at https://github.com/bjing2016/alphaflow.	翻訳日:2024-09-04 19:43:36 公開日:2024-09-02
# 短期量子コンピュータにおけるテンソルネットワークノイズ特性 Tensor network noise characterization for near-term quantum computers ( http://arxiv.org/abs/2402.08556v2 ) ライセンス: Link先を確認	Stefano Mangini, Marco Cattaneo, Daniel Cavalcanti, Sergei Filippov, Matteo A. C. Rossi, Guillermo García-Pérez,	(参考訳) 現在の量子デバイスにおけるノイズのキャラクタリゼーションは、その計算能力を完全に活用する上で、最重要事項である。しかし、数十量子ビットからなるシステムでは、直接量子プロセストモグラフィーは実現不可能となる。テンソルネットワークに基づく有望な代替手法が最近提案された [Nat. Commun. 14, 2858 (2023)]。本稿では,近距離量子コンピュータにおけるノイズチャネルのキャラクタリゼーションに適応し,その性能を徹底的に検討する。特に,実験により実現可能なトモグラフィーサンプルを用いて,量子回路の各層に影響を及ぼす現実的な相関ノイズモデルを正確に解析し,最大20量子ビットのシステム上での性能について検討する。さらに,本手法と最近提案されたノイズ対応テンソルネットワーク誤差低減プロトコルを組み合わせることで,ノイズの多い回路における結果の補正を行い,深部回路インスタンスにおいても正確な推定を行う。これにより、テンソルネットワークに基づくノイズキャラクタリゼーションプロトコルは、短期量子コンピューティング時代の実用的なエラーキャラクタリゼーションと緩和のための貴重なツールとして位置づけられる。 Characterization of noise in current near-term quantum devices is of paramount importance to fully use their computational power. However, direct quantum process tomography becomes unfeasible for systems composed of tens of qubits. A promising alternative method based on tensor networks was recently proposed [Nat. Commun. 14, 2858 (2023)]. In this paper, we adapt it for the characterization of noise channels on near-term quantum computers and investigate its performance thoroughly. In particular, we show how experimentally feasible tomographic samples are sufficient to accurately characterize realistic correlated noise models affecting individual layers of quantum circuits, and study its performance on systems composed of up to 20 qubits. Furthermore, we combine this noise characterization method with a recently proposed noise-aware tensor network error mitigation protocol for correcting outcomes in noisy circuits, resulting accurate estimations even on deep circuit instances. This positions the tensor-network-based noise characterization protocol as a valuable tool for practical error characterization and mitigation in the near-term quantum computing era.	翻訳日:2024-09-04 19:43:36 公開日:2024-09-02
# ResQuNNs:量子畳み込みニューラルネットワークにおけるディープラーニングの実現に向けて ResQuNNs:Towards Enabling Deep Learning in Quantum Convolution Neural Networks ( http://arxiv.org/abs/2402.09146v5 ) ライセンス: Link先を確認	Muhammad Kashif, Muhammad Shafique,	(参考訳) 本稿では、訓練可能な量子畳み込み(quanvolutional)層を導入し、それに伴う重要な課題に対処することにより、量子畳み込みニューラルネットワーク(QuNN)の性能を向上させる新しい枠組みを提案する。従来の量子畳み込み層は、特徴抽出には有益であるものの、ほとんどが静的であり、適応性は限られていた。既存の最先端手法とは異なり、本研究ではこれらの層内での訓練を可能にすることでこの制限を克服し、QuNNの柔軟性と可能性を大幅に向上させる。しかし、訓練可能な量子畳み込み層を複数導入すると、主にこれらの層にまたがる勾配へのアクセスが困難になるため、勾配に基づく最適化が複雑になる。これを解決するために、残差学習の概念を活用し、層間にスキップ接続を追加することで勾配の流れを促進する新しいアーキテクチャ、Residual Quanvolutional Neural Networks(ResQuNNs)を提案する。量子畳み込み層の間に残差ブロックを挿入することにより、ネットワーク全体で勾配へのアクセスが改善され、訓練性能が向上する。さらに、QuNN内における残差ブロックの戦略的配置に関する実証的証拠を示す。広範な実験により、ネットワーク内のすべての層に勾配を行き渡らせ、結果として効率的な訓練をもたらす残差ブロックの効率的な構成を特定した。本研究の結果は、残差ブロックの正確な位置がQuNNの性能向上の最大化に重要な役割を果たすことを示唆している。我々の結果は、量子深層学習の進化における大きな一歩であり、理論的発展と実用的な量子コンピューティング応用の両方に新たな道を開くものである。 In this paper, we present a novel framework for enhancing the performance of Quanvolutional Neural Networks (QuNNs) by introducing trainable quanvolutional layers and addressing the critical challenges associated with them. Traditional quanvolutional layers, although beneficial for feature extraction, have largely been static, offering limited adaptability. Unlike the state of the art, our research overcomes this limitation by enabling training within these layers, significantly increasing the flexibility and potential of QuNNs. However, the introduction of multiple trainable quanvolutional layers induces complexities in gradient-based optimization, primarily due to the difficulty in accessing gradients across these layers. To resolve this, we propose a novel architecture, Residual Quanvolutional Neural Networks (ResQuNNs), leveraging the concept of residual learning, which facilitates the flow of gradients by adding skip connections between layers. 
By inserting residual blocks between quanvolutional layers, we ensure enhanced gradient access throughout the network, leading to improved training performance. Moreover, we provide empirical evidence on the strategic placement of these residual blocks within QuNNs. Through extensive experimentation, we identify an efficient configuration of residual blocks, which enables gradients across all the layers in the network that eventually results in efficient training. Our findings suggest that the precise location of residual blocks plays a crucial role in maximizing the performance gains in QuNNs. Our results mark a substantial step forward in the evolution of quantum deep learning, offering new avenues for both theoretical development and practical quantum computing applications.	翻訳日:2024-09-04 19:43:36 公開日:2024-09-02
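To see why skip connections can help gradients reach early layers, the toy sketch below compares the input gradient of a plain stack of layers with that of a residual stack, using the chain rule. It is a purely classical stand-in: the scalar `tanh` layer, the weights, and the function names are assumptions chosen for illustration and are not the quanvolutional layers or residual configurations studied in the paper.

```python
import math


def layer(x, w):
    """Toy scalar layer f(x) = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(x, w):
    """Derivative of the toy layer with respect to its input."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def input_gradient(x, weights, residual):
    """Return d(output)/d(input) for a stack of toy layers via the chain rule."""
    grad = 1.0
    for w in weights:
        local = layer_grad(x, w)
        if residual:
            grad *= 1.0 + local  # y = x + f(x): the skip path contributes the 1
            x = x + layer(x, w)
        else:
            grad *= local        # y = f(x): gradients multiply and can shrink
            x = layer(x, w)
    return grad


weights = [0.1] * 5
print("plain stack:   ", input_gradient(0.5, weights, residual=False))
print("residual stack:", input_gradient(0.5, weights, residual=True))
```

In the plain stack every factor is at most the layer weight, so the product shrinks quickly with depth; in the residual stack every factor is at least 1, which is the textbook motivation for residual learning.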
# 未知の領域へ: 自己学習型大規模言語モデル Into the Unknown: Self-Learning Large Language Models ( http://arxiv.org/abs/2402.09147v3 ) ライセンス: Link先を確認	Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko,	(参考訳) 自己学習型LLMの主要な問題である「何を学ぶべきか」に取り組む。本研究では、LLMが自身の幻覚(hallucination)の自己評価を通じて、これまで未知であった知識を自律的に学習できるようにする自己学習型LLMフレームワークを提案する。モデルにとって未知の原子的知識を特定するためのPoint in the Unknown(PiU)という概念と、PiUを自動的に特定する4つの手法を導入し、現時点で未知の知識をモデルに取り込むことだけに焦点を当てた自己学習ループの構築を可能にする。さらに、LLMの自己学習能力を測る評価指標を開発した。実験の結果、少なくとも3Bパラメータを持ち、ある程度の指示学習(instruction training)を経たLLMは、自己学習をうまく行えることがわかった。また、自己学習を行ったモデルと行っていないモデルの性能を比較することで、自己学習の有効性を示した。我々の自己学習の概念は、より効率的なLLMの更新を可能にし、LLMの知識交換に新たな展望を開く。 We address the main problem of self-learning LLM: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of its own hallucinations. We introduce a concept called Point in the Unknown (PiU) to identify atomic knowledge unknown to a model, along with four methods for automatic PiU identification, facilitating the creation of a self-learning loop that focuses exclusively on the absorption of currently unknown knowledge into the model. Additionally, we developed evaluation metrics to gauge an LLM's self-learning capability. Our experiments revealed that LLMs with at least 3B parameters that have undergone some instruction training would be able to perform self-learning well. We further proved the effectiveness of self-learning by comparing the performance of a model that has undergone self-learning to a model that has not. Our self-learning concept allows more efficient LLM updates and opens new perspectives for LLM knowledge exchange.	翻訳日:2024-09-04 19:31:47 公開日:2024-09-02
# HPCクラスタ上での大規模変分量子アルゴリズムのシミュレータ実証 Simulator Demonstration of Large Scale Variational Quantum Algorithm on HPC Cluster ( http://arxiv.org/abs/2402.11878v2 ) ライセンス: Link先を確認	Mikio Morita, Yoshinori Tomita, Junpei Koyama, Koichi Kimura,	(参考訳) 量子アルゴリズムの研究がより高度で複雑になるにつれて、量子シミュレータ技術の進歩がますます求められている。状態ベクトルシミュレーションは、量子ビット数に対して指数関数的に計算ノードのCPUとメモリ資源を消費する。さらに、変分量子アルゴリズムでは、古典最適化による多数の繰り返し実行も大きな負荷となる。この問題には、多数の計算ノードや効率的に動作するシミュレーションフレームワークを用意することで対処されてきた。本研究では、新たに提案する2つの手法により量子シミュレーションの高速化を目指した。すなわち、対象とする問題設定に応じてMPI並列と分散処理並列の比率を調整して限られた計算資源を効率的に利用すること、および計算結果の精度への影響を考慮してハミルトニアンをスリム化することである。InfiniBandで相互接続された最大1024個のFUJITSU Processor A64FX(スーパーコンピュータ「富岳」でも使用されているプロセッサ)からなるHPCクラスタ上で、変分量子固有値ソルバ(VQE)を用いてフェルミオンモデルの基底状態エネルギー計算を行った。VQEシミュレーションにおいて200倍の高速化を達成し、32量子ビットの基底状態エネルギー計算を許容可能な時間で実証した。この結果は、30量子ビットを超える状態ベクトルシミュレーションを現実的に利用して、変分量子アルゴリズムの研究をさらに進められることを示している。 Advances in quantum simulator technology are increasingly required because research on quantum algorithms is becoming more sophisticated and complex. State vector simulation utilizes CPU and memory resources in computing nodes exponentially with respect to the number of qubits; furthermore, in a variational quantum algorithm, the large number of repeated runs by classical optimization is also a heavy load. This problem has been addressed by preparing numerous computing nodes or simulation frameworks that work effectively. This study aimed to accelerate quantum simulation using two newly proposed methods: to efficiently utilize limited computational resources by adjusting the ratio of the MPI and distributed processing parallelism corresponding to the target problem settings and to slim down the Hamiltonian by considering the effect of accuracy on the calculation result. Ground-state energy calculations of a fermionic model were performed using the variational quantum eigensolver (VQE) on an HPC cluster with up to 1024 FUJITSU Processor A64FX connected to each other by InfiniBand; the processor is also used on the supercomputer Fugaku. 
We achieved 200 times higher speed over VQE simulations and demonstrated 32-qubit ground-state energy calculations in acceptable time. This result indicates that > 30 qubit state vector simulations can be realistically utilized to further research on variational quantum algorithms.	翻訳日:2024-09-04 19:31:47 公開日:2024-09-02
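The exponential memory cost of state vector simulation mentioned above is easy to quantify. The sketch below assumes one double-precision complex amplitude (16 bytes) per basis state; real simulators may use other precisions and need extra working memory, and the function names are hypothetical.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n amplitudes of an n-qubit state vector."""
    return (2 ** num_qubits) * bytes_per_amplitude


def to_gib(num_bytes):
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2 ** 30


for n in (20, 30, 32):
    print(n, "qubits:", to_gib(state_vector_bytes(n)), "GiB")
```

Under this assumption a 30-qubit state vector occupies 16 GiB and a 32-qubit one 64 GiB, and every additional qubit doubles the requirement, which is why distributing the vector across many nodes becomes necessary.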
# MM-Soc:ソーシャルメディアプラットフォームにおけるマルチモーダル大言語モデルのベンチマーク MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms ( http://arxiv.org/abs/2402.14154v3 ) ライセンス: Link先を確認	Yiqiao Jin, Minje Choi, Gaurav Verma, Jindong Wang, Srijan Kumar,	(参考訳) ソーシャルメディアプラットフォームは、テキスト、画像、ビデオを含むマルチモーダルな情報交換のためのハブであり、オンライン空間でのインタラクションに伴う情報や感情を機械が理解することは困難である。MLLM(Multimodal Large Language Models)は、これらの課題に対する有望な解決策として登場したが、人間の感情や、誤情報のような複雑な内容を正確に解釈することに苦慮している。本稿では、マルチモーダルなソーシャルメディアコンテンツに対するMLLMの理解を評価するための総合的なベンチマークであるMM-Socを紹介する。MM-Socは、主要なマルチモーダルデータセットを集約するとともに、新しい大規模なYouTubeタグ付けデータセットを組み込み、誤情報検出、ヘイトスピーチ検出、ソーシャルコンテキスト生成など、さまざまなタスクを対象としている。4つのオープンソースMLLMの10種類のサイズバリエーションを網羅的に評価した結果、大きな性能差が明らかとなり、モデルの社会的理解能力を向上させる必要性が浮き彫りになった。分析の結果、ゼロショット設定では、さまざまな種類のMLLMが総じてソーシャルメディアのタスクの扱いに困難を示すことが明らかとなった。しかし、MLLMは微調整後に性能向上を示しており、改善の道筋が示唆される。コードとデータはhttps://github.com/claws-lab/MMSoc.git で公開されている。 Social media platforms are hubs for multimodal information exchange, encompassing text, images, and videos, making it challenging for machines to comprehend the information or emotions associated with interactions in online spaces. Multimodal Large Language Models (MLLMs) have emerged as a promising solution to these challenges, yet they struggle to accurately interpret human emotions and complex content such as misinformation. This paper introduces MM-Soc, a comprehensive benchmark designed to evaluate MLLMs' understanding of multimodal social media content. MM-Soc compiles prominent multimodal datasets and incorporates a novel large-scale YouTube tagging dataset, targeting a range of tasks including misinformation detection, hate speech detection, and social context generation. Through our exhaustive evaluation on ten size-variants of four open-source MLLMs, we have identified significant performance disparities, highlighting the need for advancements in models' social understanding capabilities. 
Our analysis reveals that, in a zero-shot setting, various types of MLLMs generally exhibit difficulties in handling social media tasks. However, MLLMs demonstrate performance improvements post fine-tuning, suggesting potential pathways for improvement. Our code and data are available at https://github.com/claws-lab/MMSoc.git.	翻訳日:2024-09-04 19:31:47 公開日:2024-09-02
# 非エルミート・フォック皮膚効果による多体量子傷の増強 Enhanced many-body quantum scars from the non-Hermitian Fock skin effect ( http://arxiv.org/abs/2403.02395v2 ) ライセンス: Link先を確認	Ruizhe Shen, Fang Qin, Jean-Yves Desaules, Zlatko Papić, Ching Hua Lee,	(参考訳) 拡張されたブロッホ波とは対照的に、非エルミタンポンピングに由来するいわゆる皮膚効果により、単一粒子は空間的に局所化することができる。ここでは, 速度論的に制約された多体系において, 皮膚効果がFock空間内の動的増幅として現れることを示す。我々は、この非エルミートフォック皮膚効果をPXPモデルの非対称バージョンで例示し、量子多体傷の非エルミート的類似であるエルゴード性破壊固有状態をもたらすことを示す。これらの非エルミート傷の特徴は、外的障害に対する頑丈さの強化である。傾斜Bose-Hubbard光学格子におけるレーザー誘起損失による非エルミタン傷拡大の実験的実現法を提案する。さらに,IBMの量子プロセッサ上で,このようなスカーエンハンスメントのディジタルシミュレーションを実装した。以上の結果から,Fockスキン効果は,汎用オープン量子システムにおいて堅牢な非エルゴード状態を生成する強力なツールとなることが示唆された。 In contrast with extended Bloch waves, a single particle can become spatially localized due to the so-called skin effect originating from non-Hermitian pumping. Here we show that in kinetically-constrained many-body systems, the skin effect can instead manifest as dynamical amplification within the Fock space, beyond the intuitively expected and previously studied particle localization and clustering. We exemplify this non-Hermitian Fock skin effect in an asymmetric version of the PXP model and show that it gives rise to ergodicity-breaking eigenstates, the non-Hermitian analogs of quantum many-body scars. A distinguishing feature of these non-Hermitian scars is their enhanced robustness against external disorders. We propose an experimental realization of the non-Hermitian scar enhancement in a tilted Bose-Hubbard optical lattice with laser-induced loss. Additionally, we implement digital simulations of such scar enhancement on the IBM quantum processor. Our results show that the Fock skin effect provides a powerful tool for creating robust non-ergodic states in generic open quantum systems.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# グローバルに安定なニューラル・イミテーション・ポリシー Globally Stable Neural Imitation Policies ( http://arxiv.org/abs/2403.04118v2 ) ライセンス: Link先を確認	Amin Abyaneh, Mariana Sosa Guzmán, Hsiu-Chin Lin,	(参考訳) 模倣学習は、ソリューション空間のスクラッチから政策学習の資源集約的で時間を要する性質を緩和するための効果的なアプローチを示す。結果として得られた政策は専門家のデモンストレーションを確実に模倣することができるが、国家空間の未調査領域での予測可能性に欠けることが多く、摂動に直面した場合に重大な安全上の懸念が生じる。これらの課題に対処するために,形式的安定性を保証するポリシーを生成する模倣学習システムであるSNDS(Stable Neural Dynamical System)を導入する。我々は、リアプノフの定理に基づく安定性の表現を容易にするニューラルポリシーアーキテクチャをデプロイし、そのポリシーとそれに対応するリアプノフ候補を共同で訓練し、グローバルな安定性を確保する。シミュレーション実験を行い、実世界のマニピュレータアームにトレーニングされたポリシーをうまく展開することで、我々のアプローチを検証する。実験の結果,提案手法は,従来の模倣学習手法に係わる不安定性,精度,計算強度の問題を克服し,複雑な計画シナリオにおける安定的な政策学習のための有望な解であることが示された。 Imitation learning presents an effective approach to alleviate the resource-intensive and time-consuming nature of policy learning from scratch in the solution space. Even though the resulting policy can mimic expert demonstrations reliably, it often lacks predictability in unexplored regions of the state-space, giving rise to significant safety concerns in the face of perturbations. To address these challenges, we introduce the Stable Neural Dynamical System (SNDS), an imitation learning regime which produces a policy with formal stability guarantees. We deploy a neural policy architecture that facilitates the representation of stability based on Lyapunov theorem, and jointly train the policy and its corresponding Lyapunov candidate to ensure global stability. We validate our approach by conducting extensive experiments in simulation and successfully deploying the trained policies on a real-world manipulator arm. The experimental results demonstrate that our method overcomes the instability, accuracy, and computational intensity problems associated with previous imitation learning methods, making our method a promising solution for stable policy learning in complex planning scenarios.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# ローレンツ量子コンピュータのパワー The Power of Lorentz Quantum Computer ( http://arxiv.org/abs/2403.04170v2 ) ライセンス: Link先を確認	Qi Zhang, Biao Wu,	(参考訳) 本稿では,最近提案されたローレンツ量子コンピュータ(LQC)の,従来の量子コンピュータと比較して優れた性能を示す。計算複雑性クラスとして有界誤差ローレンツ量子多項式時間 (BLQP) を導入し、複雑性クラス ${\text P}^{\sharp \text{P}}$ と等価性を示す。最大独立集合 PP (probabilistic polynomial-time) の問題を効率的に解き、結果として${\text P}^{\sharp \text{P}}$ を多項式時間で解くLQCアルゴリズムを提案する。さらに、アーロンソンの提案したポストセレクションによる量子コンピューティングはLQCで効率的にシミュレートできるが、その逆ではないことを示す。 We demonstrate the superior capabilities of the recently proposed Lorentz quantum computer (LQC) compared to conventional quantum computers. We introduce an associated computational complexity class termed bounded-error Lorentz quantum polynomial-time (BLQP), demonstrating its equivalence to the complexity class ${\text P}^{\sharp \text{P}}$. We present LQC algorithms that efficiently solve the problem of maximum independent set, PP (probabilistic polynomial-time), and consequently ${\text P}^{\sharp \text{P}}$, all within polynomial time. Additionally, we show that the quantum computing with postselection proposed by Aaronson can be efficiently simulated by LQC, but not vice versa.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# 大域的推定のための最適戦略の厳密な階層:大域的推定と局所的推定をリンクする Strict hierarchy of optimal strategies for global estimations: Linking global estimations with local ones ( http://arxiv.org/abs/2403.06585v3 ) ライセンス: Link先を確認	Zhao-Yi Zhou, Jing-Tao Qiu, Da-Jian Zhang,	(参考訳) 量子力学における決定的かつ挑戦的な問題は、推定戦略において達成可能な最終的な精度を確かめることである。局所的および大域的推定のパラダイムは2つあるが、現在の研究は主に局所的推定に限られており、興味のパラメータがほぼ知られている場合に有用である。このレターでは,少数の測定データでも確実に動作し,パラメータについて十分な事前知識を持たないグローバル推定へのパラダイムシフトを目標としている。ここでの鍵となる革新は、仮想想像時間進化と呼ばれる技術を開発することである。これは、グローバルな推定で得られた情報と仮想的な局所的な推定のための量子フィッシャー情報との等価性を確立するものである。これは、局所的な見積もりに適した強力なツールを活用することで、グローバルな見積もりの領域における課題を克服する興味深い経路を提供する。提案手法は,グローバルな評価戦略の達成可能な精度の厳密な階層を明らかにし,局所的推定における従来の知恵とは対照的な予期せぬ結果を明らかにするものである。 A crucial yet challenging issue in quantum metrology is to ascertain the ultimate precision achievable in estimation strategies. While there are two paradigms of estimations, local and global, current research is largely confined to local estimations, which are useful once the parameter of interest is approximately known. In this Letter we target a paradigm shift towards global estimations, which can operate reliably even with a few measurement data and no substantial prior knowledge about the parameter. The key innovation here is to develop a technique, dubbed virtual imaginary time evolution, which establishes an equality between the information gained in a global estimation and the quantum Fisher information for a virtual local estimation. This offers an intriguing pathway to surmount challenges in the realm of global estimations by leveraging powerful tools tailored for local estimations. We explore our technique to reveal a strict hierarchy of achievable precision for different global estimation strategies and uncover unexpected results contrary to conventional wisdom in local estimations.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# レイヤー2以降、画像は1/2のトークンに値する:大規模視覚言語モデルのためのプラグ・アンド・プレイ推論高速化 An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models ( http://arxiv.org/abs/2403.06764v3 ) ライセンス: Link先を確認	Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang,	(参考訳) 本研究では、LVLM(Large Vision-Language Models)、特にLLaVA-1.5、QwenVL-Chat、Video-LLaVAなどの代表的なモデルにおける非効率な注意現象を同定する。視覚トークンに対する注意計算は、一般的なLVLMの深い層では極めて非効率であることがわかり、テキストデータの処理と比べてより疎な(スパースな)アプローチの必要性が示唆される。この目的のために、初期層で適応的な注意パターンを学習し、その後の層で視覚トークンをプルーニングすることにより計算効率を最適化する、汎用的なプラグ・アンド・プレイ手法FastVを導入する。評価の結果、FastVは幅広い画像・映像理解タスクの性能を犠牲にすることなく、計算コストを劇的に削減できることが示された(例えば、LLaVA-1.5-13BのFLOPsを45%削減)。FastVの計算効率と性能のトレードオフは、高度にカスタマイズ可能で、パレート効率的である。13BパラメータモデルのFLOPsを7Bパラメータモデルよりも低い予算まで圧縮しながら、なお優れた性能を維持できる。我々は、FastVがエッジデバイスや商用モデルへのLVLMの展開において実用的な価値を持つと考えている。コードはhttps://github.com/pkunlp-icler/FastV で公開されている。 In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs, suggesting a need for a sparser approach compared to textual data handling. To this end, we introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency by learning adaptive attention patterns in early layers and pruning visual tokens in subsequent ones. Our evaluations demonstrate FastV's ability to dramatically reduce computational costs (e.g., a 45% reduction in FLOPs for LLaVA-1.5-13B) without sacrificing performance in a wide range of image and video understanding tasks. The computational efficiency and performance trade-off of FastV are highly customizable and Pareto-efficient. It can compress the FLOPs of a 13B-parameter model to achieve a lower budget than that of a 7B-parameter model, while still maintaining superior performance. 
We believe FastV has practical value for deploying LVLMs on edge devices and in commercial models. Code is released at https://github.com/pkunlp-icler/FastV.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# 非識別基準の修正による生成言語モデルへのフェアネスの一般化 Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria ( http://arxiv.org/abs/2403.08564v3 ) ライセンス: Link先を確認	Sara Sterlie, Nina Weng, Aasa Feragen,	(参考訳) 大規模言語モデルなどのジェネレーティブAIは,近年,急速な発展を遂げている。これらのモデルが一般に普及するにつれて、アプリケーションにおける有害なバイアスの持続性と増幅に関する懸念が生じる。性別のステレオタイプは、彼らが対象とする個人に対して有害で制限されうる。本稿では、ジェンダーバイアスを広汎な社会的構成として認識し、生成言語モデルにおけるジェンダーバイアスの存在を明らかにする方法と定量化方法について考察する。特に、独立性、分離性、充足性という3つのよく知られた非識別基準のジェネレーティブAIアナログを導出する。これらの基準を実際に実施するために、我々は、職業性ステレオタイプに焦点を当てた各基準のためのプロンプトを設計し、特に、医療試験を利用して、生成的AIコンテキストに基礎的真理を導入する。本研究は,このような対話型言語モデルにおける職業性バイアスの存在に対処するものである。 Generative AI, such as large language models, has undergone rapid development within recent years. As these models become increasingly available to the public, concerns arise about perpetuating and amplifying harmful biases in applications. Gender stereotypes can be harmful and limiting for the individuals they target, whether they consist of misrepresentation or discrimination. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each of the criteria with a focus on occupational gender stereotype, specifically utilizing the medical test to introduce the ground truth in the generative AI context. Our results address the presence of occupational gender bias within such conversational language models.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# 2+1Dアベリアゲージ理論における任意整数スピンの量子多体スカー Quantum Many-Body Scars for Arbitrary Integer Spin in 2+1D Abelian Gauge Theories ( http://arxiv.org/abs/2403.08892v3 ) ライセンス: Link先を確認	Thea Budde, Marina Krstić Marinković, Joao C. Pinto Barros,	(参考訳) 量子多体スカー(Quantum Many-Body Scars)の存在は、様々な量子多体系にまたがって確立されている。これらはスピン1/2量子リンクモデルに対応するゲージ理論を含む。高いスピンを持つゲージ理論における量子スカーの確立は、正確な対角化に依存する既存の数値法ではアクセスできない。我々は、任意の大きさの整数スピンを持つ純ゲージ理論のスカーを2+1$Dで体系的に同定し、そこでは電場はリンク当たり2S+1$状態に制限される。明示的な解析的構成を通じて、任意の整数スピンに対する2+1$Dゲージ理論において、傷跡の存在が広く見られることを示す。これらの結果は,小回転スピンと$S=1$量子リンクモデルに対して数値的に確認する。我々の分析構造は、既存の数値法で探索できる量やスピンよりもはるかに遠い傷痕の存在を確立し、興味深い非平衡現象への量子シミュレーション実験を導出することができる。 The existence of Quantum Many-Body Scars, which prevents thermalization from certain initial states after a long time, has been established across different quantum many-body systems. These include gauge theories corresponding to spin-1/2 quantum link models. Establishing quantum scars in gauge theories with high spin is not accessible with existing numerical methods, which rely on exact diagonalization. We systematically identify scars for pure gauge theories with arbitrarily large integer spin $S$ in $2+1$D, where the electric field is restricted to $2S+1$ states per link. Through an explicit analytic construction, we show that the presence of scars is widespread in $2+1$D gauge theories for arbitrary integer spin. We confirm these findings numerically for small truncated spin and $S=1$ quantum link models. Our analytic construction establishes the presence of scars far beyond volumes and spins that can be probed with existing numerical methods and can guide quantum simulation experiments toward interesting non-equilibrium phenomena, inaccessible otherwise.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# 変分量子固有値ソルバーの物理的動機に基づく改良 Physically motivated improvements of Variational Quantum Eigensolvers ( http://arxiv.org/abs/2403.09624v2 ) ライセンス: Link先を確認	Nonia Vaquero-Sabater, Abel Carreras, Román Orús, Nicholas J. Mayhall, David Casanova,	(参考訳) 適応的微分組立擬トロッター変分量子固有値ソルバー (ADAPT-VQE) は、ノイズのある量子デバイスを用いた量子化学における電子構造問題に対する重要かつ有望なアプローチとして登場した。しかし、既存の技術的制約を克服するため、本研究ではADAPT-VQEの有効性の向上を目指す。電子構造理論からの洞察を生かし、計算負荷を増やさずに状態準備を最適化すること、およびアンザッツの拡張を誘導して、より簡潔な波動関数を得つつ厳密解への収束を速めることに集中する。これらの改良により回路はより浅くなり、実証されたように測定要求も減少する。本研究はこれらの改良を記述し、H4モデルの1次元、2次元、3次元の配置、および水分子におけるそれらの性能を評価する。究極的には、この研究はADAPT-VQEの効率を高める物理的動機に基づく戦略の有効性を示し、量子化学シミュレーションにおいて重要な一歩を踏み出した。 The Adaptive Derivative-Assembled Pseudo-Trotter Variational Quantum Eigensolver (ADAPT-VQE) has emerged as a pivotal promising approach for electronic structure challenges in quantum chemistry with noisy quantum devices. Nevertheless, to surmount existing technological constraints, this study endeavors to enhance ADAPT-VQE's efficacy. Leveraging insights from electronic structure theory, we concentrate on optimizing state preparation without added computational burden and guiding ansatz expansion to yield more concise wavefunctions with expedited convergence toward exact solutions. These advancements culminate in shallower circuits and, as demonstrated, reduced measurement requirements. This research delineates these enhancements and assesses their performance across mono, di, and tridimensional arrangements of H4 models, as well as in the water molecule. Ultimately, this work attests to the viability of physically-motivated strategies in fortifying ADAPT-VQE's efficiency, marking a significant stride in quantum chemistry simulations.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# DarkGS: 暗黒でのロボット探査をめざす3Dガウスとニューラル照明の学習 DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark ( http://arxiv.org/abs/2403.10814v2 ) ライセンス: Link先を確認	Tianyi Zhang, Kaining Huang, Weiming Zhi, Matthew Johnson-Roberson,	(参考訳) 人間は、限られた、あるいは様々なレベルの照明の下でも、環境の一貫性のある精神モデルを構築することができる。私たちは同じ能力でロボットを養うことを望んでいます。本稿では, 照明条件が悪く, 移動光源で光写実的シーン表現を構築することの課題に対処する。我々は,照明を学習問題としてモデル化する作業にアプローチし,シーン再構築を支援するために開発した照明モデルを利用する。我々は,Neural Light Simulator (NeLiS) を用いて,カメラライトシステムのモデル化とキャリブレーションを行う革新的なフレームワークを提案する。さらに,新しい視点からリアルタイムで光リアルなレンダリングが可能な3次元ガウスシーンモデルを作成するためにNeLiSを応用したDarkGSを提案する。本研究では,様々な実環境におけるシミュレータとシステムの適用性およびロバスト性を示す。 Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments.	翻訳日:2024-09-04 19:15:46 公開日:2024-09-02
# TRAM:3D映像から見る人間の世界的軌道と動き TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos ( http://arxiv.org/abs/2403.17346v2 ) ライセンス: Link先を確認	Yufu Wang, Ziyun Wang, Lingjie Liu, Kostas Daniilidis,	(参考訳) 我々は,TRAMを提案する。TRAMは人間のグローバルな軌道と動きを,線内ビデオから再構成する2段階の手法である。 TRAMはSLAMを強固にし、ダイナミックな人間の存在下でカメラの動きを回復させ、シーン背景を用いてモーションスケールを導出する。回収したカメラをメートルスケールの基準フレームとして使用し、人間の運動運動を抑えるためのビデオトランスフォーマーモデル(VIMO)を導入する。これら2つの動きを合成することにより、世界空間における3次元人間の正確な回復を実現し、これまでの作業との大きな差でグローバルな動き誤差を低減できる。 https://yufu-wang.github.io/tram4d/ We propose TRAM, a two-stage method to reconstruct a human's global trajectory and motion from in-the-wild videos. TRAM robustifies SLAM to recover the camera motion in the presence of dynamic humans and uses the scene background to derive the motion scale. Using the recovered camera as a metric-scale reference frame, we introduce a video transformer model (VIMO) to regress the kinematic body motion of a human. By composing the two motions, we achieve accurate recovery of 3D humans in the world space, reducing global motion errors by a large margin from prior work. https://yufu-wang.github.io/tram4d/	翻訳日:2024-09-04 19:02:17 公開日:2024-09-02
# 変調型クロスアテンションメモリによる高能率映像オブジェクト分割 Efficient Video Object Segmentation via Modulated Cross-Attention Memory ( http://arxiv.org/abs/2403.17937v2 ) ライセンス: Link先を確認	Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan,	(参考訳) 近年,半教師付きビデオオブジェクトセグメンテーションにおいて,トランスフォーマーに基づくアプローチが有望な結果を示している。しかし、これらのアプローチは数フレームごとにメモリバンクを頻繁に拡張するためGPUメモリの要求が増加し、一般に長いビデオでは苦戦する。我々は,頻繁なメモリ拡張を必要とせずに時間的な滑らかさをモデル化するため,最適化された動的な長期変調クロスアテンション(MCA)メモリを導入する,MAVOSというトランスフォーマーベースの手法を提案する。提案したMCAは,映像長に関わらず,局所的特徴とグローバルな特徴を多種多様な粒度で効果的に符号化し,一貫した速度を効率的に維持する。複数のベンチマーク、LVOS、Long-Time Video、DAVIS 2017の大規模な実験では、提案したコントリビューションの有効性が実時間推論に結びつき、長いビデオのセグメンテーション精度を低下させることなく、メモリ要求が著しく削減された。既存の最良のトランスフォーマーベースのアプローチと比較して、MAVOSはスピードを7.6倍にし、GPUメモリはショートビデオとロングビデオのデータセットで同等のセグメンテーション性能で87%削減しました。特にLVOSデータセットでは、単一のV100 GPU上で37フレーム/秒(FPS)で動作しながら、J&Fスコアが63.3%に達しています。私たちのコードとモデルは、https://github.com/Amshaker/MAVOS で公開されます。 Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation. However, these approaches typically struggle on long videos due to increased GPU memory demands, as they frequently expand the memory bank every few frames. We propose a transformer-based approach, named MAVOS, that introduces an optimized and dynamic long-term modulated cross-attention (MCA) memory to model temporal smoothness without requiring frequent memory expansion. The proposed MCA effectively encodes both local and global features at various levels of granularity while efficiently maintaining consistent speed regardless of the video length. Extensive experiments on multiple benchmarks, LVOS, Long-Time Video, and DAVIS 2017, demonstrate the effectiveness of our proposed contributions leading to real-time inference and markedly reduced memory demands without any degradation in segmentation accuracy on long videos. 
Compared to the best existing transformer-based approach, our MAVOS increases the speed by 7.6x, while significantly reducing the GPU memory by 87% with comparable segmentation performance on short and long video datasets. Notably on the LVOS dataset, our MAVOS achieves a J&F score of 63.3% while operating at 37 frames per second (FPS) on a single V100 GPU. Our code and models will be publicly available at: https://github.com/Amshaker/MAVOS.	翻訳日:2024-09-04 19:02:17 公開日:2024-09-02
# 低絡み状態の典型的熱化 Typical thermalization of low-entanglement states ( http://arxiv.org/abs/2403.18007v4 ) ライセンス: Link先を確認	Christian Bertoni, Clara Wassner, Giacomo Guarnieri, Jens Eisert,	(参考訳) 閉量子系のユニタリ進化から熱化を証明することは、現在でも部分的にしか解決されていない最も古い問題の一つである。いくつかの試みにより、固有状態熱化仮説と呼ばれる仮説が様々な定式化され、初期状態の特定の条件下での熱化が導かれる。しかし、これらの条件は仮説の正確な定式化に敏感である。本研究は, 熱化を検証する実験スキームや量子シミュレーションなど, 多くの自然な物理的設定において操作的に準備可能な低絡み合い初期状態の重要事例に焦点を当てる。運用上重要な正確な条件下でこれらの状態が熱化されることを証明する。より具体的には、避けられない有限分解能の議論に動機づけられて、初期状態の絡み合いが低いときに局所的な熱化につながる局所ハミルトニアン上のランダムエネルギー滑らか化を定義する。最後に、そのような変換は局所的なGibbs状態にも、スペクトルに関する一般的な滑らかさ条件の下での短時間ダイナミクスにも影響しないことを示す。 Proving thermalization from the unitary evolution of a closed quantum system is one of the oldest questions that is still nowadays only partially resolved. Several efforts have led to various formulations of what is called the eigenstate thermalization hypothesis, which leads to thermalization under certain conditions on the initial states. These conditions, however, are sensitive to the precise formulation of the hypothesis. In this work, we focus on the important case of low entanglement initial states, which are operationally accessible in many natural physical settings, including experimental schemes for testing thermalization and for quantum simulation. We prove thermalization of these states under precise conditions that have operational significance. More specifically, motivated by arguments of unavoidable finite resolution, we define a random energy smoothing on local Hamiltonians that leads to local thermalization when the initial state has low entanglement. Finally we show that such a transformation affects neither the Gibbs state locally nor, under generic smoothness conditions on the spectrum, the short-time dynamics.	翻訳日:2024-09-04 19:02:17 公開日:2024-09-02
# 限定監督下での揚力モデリング Uplift Modeling Under Limited Supervision ( http://arxiv.org/abs/2403.19289v4 ) ライセンス: Link先を確認	George Panagopoulos, Daniele Malitesta, Fragkiskos D. Malliaros, Jun Pang,	(参考訳) 電子商取引における因果効果の推定には、大規模な環境では実用的でないような費用がかかる傾向がある。このような治療効果を実際の介入なしに予測するために機械学習を活用することは、リスクを減らすための標準的なプラクティスである。しかし、既存の治療効果予測法は、実際の実験から構築され、本質的にはリスクが伴う、相当な大きさの訓練セットに依存する傾向にある。本研究では,電子商取引データに共通するグラフに依存して,必要なトレーニングセットのサイズを小さくするグラフニューラルネットワークを提案する。具体的には、ラベル付きインスタンスが制限されたノード回帰として問題を認識し、従来の因果効果推定器に似た2モデルニューラルアーキテクチャを開発し、符号化のための様々なメッセージパス層をテストする。さらに、追加的なステップとして、モデルと取得関数を組み合わせることで、極めて低い実験予算で設定したトレーニングセットの作成をガイドする。各ステップは他のモデルや治療ポリシーと別々に使用できるので、フレームワークは柔軟です。実大規模ネットワークにおける実験は、実験リスクを減らすために限られた監督で一般化できるモデルの必要性を浮き彫りにし、多くの場合、ランダムに近い動作を行う、最先端技術に対する我々の方法論の明確な優位性を示している。 Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work we propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with extremely low experimental budget. The framework is flexible since each step can be used separately with other models or treatment policies. 
The experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize with limited supervision to reduce experimental risks.	翻訳日:2024-09-04 19:02:17 公開日:2024-09-02
# PhysORD:オフロード運転における物理拡散運動予測のための神経・シンボリックアプローチ PhysORD: A Neuro-Symbolic Approach for Physics-infused Motion Prediction in Off-road Driving ( http://arxiv.org/abs/2404.01596v2 ) ライセンス: Link先を確認	Zhipeng Zhao, Bowen Li, Yi Du, Taimeng Fu, Chen Wang,	(参考訳) 運動予測は自律オフロード走行において重要であるが、車両と地形の間の複雑な相互作用のため、オンロード走行よりもはるかに多くの課題が生じる。従来の物理学に基づくアプローチは、力学系と外乱を正確にモデル化することの難しさに直面する。対照的に、データ駆動型ニューラルネットワークは広範なデータセットを必要とし、基本的な物理法則を明示的に把握するのに苦労するため、一般化性能が低くなりやすい。両方の手法の利点を融合することにより、ニューロシンボリックアプローチは有望な方向を示す。これらの手法は物理法則をニューラルネットワークに組み込み、一般化能力を大幅に向上させる可能性がある。しかし、オフロード運転の実際の設定では、事前の作業は評価されなかった。このギャップを埋めるために、我々は、オイラー・ラグランジュ方程式(Euler-Lagrange equation)という保存則をオフロード駆動時の運動予測のためのデータ駆動ニューラルモデルに統合する、ニューラルシンボリックアプローチであるPhysORDを提案する。実験の結果,PhysORDは不確かさをモデル化することで車体の動きを正確に予測し,外乱を許容できることがわかった。精度と効率の両方で既存の手法を上回り、長期予測においてデータ効率の学習能力と一般化能力を示す。 Motion prediction is critical for autonomous off-road driving; however, it presents significantly more challenges than on-road driving because of the complex interaction between the vehicle and the terrain. Traditional physics-based approaches encounter difficulties in accurately modeling dynamic systems and external disturbance. In contrast, data-driven neural networks require extensive datasets and struggle with explicitly capturing the fundamental physical laws, which can easily lead to poor generalization. By merging the advantages of both methods, neuro-symbolic approaches present a promising direction. These methods embed physical laws into neural models, potentially significantly improving generalization capabilities. However, no prior works were evaluated in real-world settings for off-road driving. To bridge this gap, we present PhysORD, a neural-symbolic approach integrating the conservation law, i.e., the Euler-Lagrange equation, into data-driven neural models for motion prediction in off-road driving. Our experiments showed that PhysORD can accurately predict vehicle motion and tolerate external disturbance by modeling uncertainties. 
It outperforms existing methods both in accuracy and efficiency and demonstrates data-efficient learning and generalization ability in long-term prediction.	翻訳日:2024-09-04 18:50:14 公開日:2024-09-02
# 高速ユニタリダイナミクスを用いた断熱除去のための明示的式 Explicit formulas for adiabatic elimination with fast unitary dynamics ( http://arxiv.org/abs/2404.01802v2 ) ライセンス: Link先を確認	Angela Riva, Alain Sarlette, Pierre Rouchon,	(参考訳) 開量子系における高速減衰自由度のいわゆる「アディアバティック消去」は、時間スケールの分離において一連の展開によって行うことができる。関連する計算は、残りの自由度(中心多様体)が単に遅くなるのではなく、高速なユニタリダイナミクスに従う場合、はるかに困難である。本稿では, シルヴェスター方程式と随伴ダイナミクスを用いた定式化が, 身体的関心の設定のために高次で体系的, 明示的な表現をもたらすかを明らかにする。 The so-called ``adiabatic elimination'' of fast decaying degrees of freedom in open quantum systems can be performed with a series expansion in the timescale separation. The associated computations are significantly more difficult when the remaining degrees of freedom (center manifold) follow fast unitary dynamics instead of just being slow. This paper highlights how a formulation with Sylvester's equation and with adjoint dynamics leads to systematic, explicit expressions at high orders for settings of physical interest.	翻訳日:2024-09-04 18:50:14 公開日:2024-09-02
# DeVAIC: AI生成コードのセキュリティアセスメントツール DeVAIC: A Tool for Security Assessment of AI-generated Code ( http://arxiv.org/abs/2404.07548v2 ) ライセンス: Link先を確認	Domenico Cotroneo, Roberta De Luca, Pietro Liguori,	(参考訳) コンテキスト: AIコードジェネレータは、コード記述とソフトウェア開発に革命をもたらしていますが、潜在的に信頼できないソースコードを含む大規模なデータセットでのトレーニングは、セキュリティ上の懸念を引き起こします。さらに、これらのジェネレータは、現在のソリューションを使った評価が難しい不完全なコードスニペットを生成することができる。目的: この研究は、AI生成されたPythonコードのセキュリティを評価するツールであるDeVAIC(AI生成コードの脆弱性の検出)を導入し、不完全なコードを調べるという課題を克服する。方法: 脆弱なサンプルを収集し, 実装パターンを抽出し, 提案ツールを開発するための正規表現を作成する手法を踏襲した。 DeVAICの実装には正規表現に基づく一連の検出ルールが含まれており、OWASPトップ10の脆弱性カテゴリに該当する35の共通弱度列挙(CWE)をカバーする。結果: 人気の高い4つのAIモデルを使用してPythonコードを生成しました。 DeVAICは、最先端のソリューションと比較してセキュリティ上の脆弱性を検出する能力に統計的に有意な差を示し、F1スコアと精度は94%で、コードスニペットあたりの計算コストは平均0.14秒であった。結論: 提案されたツールは、不完全なコードであっても、脆弱性検出のための軽量で効率的なソリューションを提供する。 Context: AI code generators are revolutionizing code writing and software development, but their training on large datasets, including potentially untrusted source code, raises security concerns. Furthermore, these generators can produce incomplete code snippets that are challenging to evaluate using current solutions. Objective: This research work introduces DeVAIC (Detection of Vulnerabilities in AI-generated Code), a tool to evaluate the security of AI-generated Python code, which overcomes the challenge of examining incomplete code. Method: We followed a methodological approach that involved gathering vulnerable samples, extracting implementation patterns, and creating regular expressions to develop the proposed tool. The implementation of DeVAIC includes a set of detection rules based on regular expressions that cover 35 Common Weakness Enumerations (CWEs) falling under the OWASP Top 10 vulnerability categories. Results: We utilized four popular AI models to generate Python code, which we then used as a foundation to evaluate the effectiveness of our tool. 
DeVAIC demonstrated a statistically significant difference in its ability to detect security vulnerabilities compared to the state-of-the-art solutions, showing an F1 Score and Accuracy of 94% while maintaining a low computational cost of 0.14 seconds per code snippet, on average. Conclusions: The proposed tool provides a lightweight and efficient solution for vulnerability detection even on incomplete code.	翻訳日:2024-09-04 18:50:14 公開日:2024-09-02
# 製品の可視性を高めるために大規模言語モデルを操作する Manipulating Large Language Models to Increase Product Visibility ( http://arxiv.org/abs/2404.07981v2 ) ライセンス: Link先を確認	Aounon Kumar, Himabindu Lakkaraju,	(参考訳) 大規模言語モデル(LLM)は、ユーザクエリに適した自然言語応答を提供するために、検索エンジンに統合されつつある。顧客とエンドユーザーも、迅速かつ簡単な購入決定のためにこれらのモデルに依存している。本研究では,製品の可視性を高めるため,LLMからのレコメンデーションを操作できるかどうかを検討する。慎重に作成したメッセージである戦略的テキストシーケンス (STS) を製品の情報ページに追加することで, LLM のトップレコメンデーションとしてリストアップされる可能性を大幅に高められることを示す。 STSの影響を理解するために、架空のコーヒーマシンのカタログを使用して、LLMの推奨にめったに現れない製品と、通常2位にランクされる製品という2つのターゲット製品に対する効果を分析する。戦略的テキストシーケンスは、トップレコメンデーションとして現れる可能性を高めることにより、両製品の可視性を大幅に向上させる。 LLM生成した検索応答を操作するこの能力は、ベンダーにかなりの競争上の優位性を与え、公正な市場競争を妨害する可能性がある。検索エンジン最適化(SEO)が、検索エンジン検索結果のランクを上げるためにWebページをカスタマイズする方法に革命をもたらしたのと同じように、LLMの推奨に影響を与えることは、AI駆動の検索サービスのコンテンツ最適化に大きな影響を及ぼす可能性がある。実験用のコードは https://github.com/aounon/llm-rank-optimizer で公開されている。 Large language models (LLMs) are increasingly being integrated into search engines to provide natural language responses tailored to user queries. Customers and end-users are also becoming more dependent on these models for quick and easy purchase decisions. In this work, we investigate whether recommendations from LLMs can be manipulated to enhance a product's visibility. We demonstrate that adding a strategic text sequence (STS) -- a carefully crafted message -- to a product's information page can significantly increase its likelihood of being listed as the LLM's top recommendation. To understand the impact of STS, we use a catalog of fictitious coffee machines and analyze its effect on two target products: one that seldom appears in the LLM's recommendations and another that usually ranks second. We observe that the strategic text sequence significantly enhances the visibility of both products by increasing their chances of appearing as the top recommendation. This ability to manipulate LLM-generated search responses provides vendors with a considerable competitive advantage and has the potential to disrupt fair market competition. 
Just as search engine optimization (SEO) revolutionized how webpages are customized to rank higher in search engine results, influencing LLM recommendations could profoundly impact content optimization for AI-driven search services. Code for our experiments is available at https://github.com/aounon/llm-rank-optimizer.	翻訳日:2024-09-04 18:50:14 公開日:2024-09-02
# 自発パラメトリックダウン変換に基づく単一光子源の多重モード特性のキャラクタリゼーション Characterization of the multimode nature of single-photon sources based on spontaneous parametric down conversion ( http://arxiv.org/abs/2404.10682v2 ) ライセンス: Link先を確認	Emil R. Hellebek, Klaus Mølmer, Anders S. Sørensen,	(参考訳) 単一光子源は多くの将来的な量子技術に必要な成分である。単一光子源の候補の1つは、自発パラメトリックダウン変換と隠蔽光子検出の組み合わせである。このような光源から放射される光パルスは、通常は単一モードとして扱われるが、この処理は不完全である。ダウン変換過程のボゴリューボフの正確な処理に基づいて, 完全なマルチモード記述を開発する。次に,本研究は,光子の正確な検出時間に依存することなく,かつ検出前後の狭い窓にのみ光子を受容した場合に,最も重要な物理的メカニズムを示し,単一光子の状態の成功確率と純度を解析的に推定することができる摂動的かつ効果的な治療方法を提案する。これにより、ポンプパルスの3つの異なる仮定の下で発光された光を特徴付けることができる。超短パルスによる自然パラメトリックダウン変換では、単モード記述が正確であるのに対して、長いポンプパルスと連続ポンプでは多重モード記述が必要である。本研究の成果は,マルチモード特性に依存した単一光子源に基づく量子情報プロトコルの設計を導くのに有用である。 Single-photon sources are necessary components for many prospective quantum technologies. One candidate for a single-photon source is spontaneous parametric down conversion combined with a heralding photon detection. The heralded light pulse from such a source, is typically treated as single-mode, this treatment, however, is incomplete. We develop a full multimode description based on the exact Bogoliubov treatment of the down conversion process. We then provide a perturbative and effective treatment, which illustrates the most important physical mechanisms and permits analytical estimates of the success probability and purity of single-photon states under practical heralding conditions, both without relying on the precise detection time of the heralding photon and when accepting photons only in a narrow window around the time of the detection. This permits us to characterize the emitted light under three different assumptions for the pump pulse. For spontaneous parametric down conversion with a very short pump pulse, we find the single-mode description to be accurate, while for longer pump pulses and continuous pumping, a multimode description is necessary. 
Our findings can be used to guide the design of quantum information protocols based on heralded single-photon sources, as their performance may depend on the multimode nature of the sources.	翻訳日:2024-09-04 18:50:14 公開日:2024-09-02
# TrajDeleter: オフライン強化学習エージェントにおける軌道フォーミングの実現 TrajDeleter: Enabling Trajectory Forgetting in Offline Reinforcement Learning Agents ( http://arxiv.org/abs/2404.12530v2 ) ライセンス: Link先を確認	Chen Gong, Kecen Li, Jin Yao, Tianhao Wang,	(参考訳) 強化学習(RL)は、環境と相互作用する経験からエージェントを訓練する。オンラインインタラクションが現実的でないシナリオでは、事前にコンパイルされたデータセットを使用してエージェントをトレーニングするオフラインRLが人気を集めている。この新しいパラダイムは、医療やエネルギー管理など、さまざまな現実世界の領域で顕著な効果を示す一方で、トレーニングデータセットとトレーニングされたエージェントの両方からの特定のトラジェクトリの影響を、エージェントが迅速かつ完全に排除する必要性が高まっている。この問題に対処するために、オフラインRLエージェントの軌道未学習のための最初の実践的アプローチであるTrajdeleterを提唱する。 Trajdeleterのキーとなるアイデアは、エージェントを誘導して、未学習の軌跡に関連する状態に遭遇した際のパフォーマンス低下を示すことである。同時に、他のトラジェクトリに直面するとき、エージェントが元のパフォーマンスレベルを維持する。さらに、TrajdeleterがオフラインのRLエージェントから影響の特定の軌跡をうまく除去するかどうかを簡易かつ効率的な評価方法であるTrajauditorを導入する。 6つのオフラインRLアルゴリズムと3つのタスクで実施された大規模な実験は、トラジデレターがスクラッチから再トレーニングするのに必要な時間の約1.5%しか必要としていないことを示した。目標軌道の94.8%を効果的に解き放つが、未学習の後も実際の環境相互作用は良好である。レプリケーションパッケージとエージェントパラメータはオンラインで利用できる。 Reinforcement learning (RL) trains an agent from experiences interacting with the environment. In scenarios where online interactions are impractical, offline RL, which trains the agent using pre-collected datasets, has become popular. While this new paradigm presents remarkable effectiveness across various real-world domains, like healthcare and energy management, there is a growing demand to enable agents to rapidly and completely eliminate the influence of specific trajectories from both the training dataset and the trained agents. To meet this problem, this paper advocates Trajdeleter, the first practical approach to trajectory unlearning for offline RL agents. The key idea of Trajdeleter is to guide the agent to demonstrate deteriorating performance when it encounters states associated with unlearning trajectories. Simultaneously, it ensures the agent maintains its original performance level when facing other remaining trajectories. 
Additionally, we introduce Trajauditor, a simple yet efficient method to evaluate whether Trajdeleter successfully eliminates the specific trajectories of influence from the offline RL agent. Extensive experiments conducted on six offline RL algorithms and three tasks demonstrate that Trajdeleter requires only about 1.5% of the time needed for retraining from scratch. It effectively unlearns an average of 94.8% of the targeted trajectories yet still performs well in actual environment interactions after unlearning. The replication package and agent parameters are available online.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# TRNet:ノイズロバスト音声認識における音声強調を利用した2レベルリファインメントネットワーク TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition ( http://arxiv.org/abs/2404.12979v2 ) ライセンス: Link先を確認	Chengxin Chen, Pengyuan Zhang,	(参考訳) 音声感情認識(SER)における永続的課題の1つは、ユビキタス環境騒音であり、実際にSERの性能を劣化させることがしばしばある。本稿では,この課題に対処するため,TRNetと呼ばれる2レベルリファインメントネットワークを提案する。具体的には、事前訓練された音声強調モジュールを用いて、フロントエンド雑音の低減と雑音レベルの推定を行う。その後、クリーン音声スペクトログラムとその対応する深部表現を参照信号として利用し、モデル学習時のスペクトル歪みと強調音声の表現シフトを洗練させる。実験により,提案したTRNetは,ノイズフリー環境での性能を損なうことなく,一致した環境と一致しない環境の両方において,提案システムの堅牢性を大幅に向上することを確認した。 One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in deteriorating SER performance in practice. In this paper, we introduce a Two-level Refinement Network, dubbed TRNet, to address this challenge. Specifically, a pre-trained speech enhancement module is employed for front-end noise reduction and noise level estimation. Later, we utilize clean speech spectrograms and their corresponding deep representations as reference signals to refine the spectrogram distortion and representation shift of enhanced speech during model training. Experimental results validate that the proposed TRNet substantially promotes the robustness of the proposed system in both matched and unmatched noisy environments, without compromising its performance in noise-free environments.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# プログラム環境ファジング Program Environment Fuzzing ( http://arxiv.org/abs/2404.13951v3 ) ライセンス: Link先を確認	Ruijie Meng, Gregory J. Duck, Abhik Roychoudhury,	(参考訳) プログラムは独立して実行されるのではなく、プログラムの振る舞いを駆動する実行環境と相互作用する。したがって、ソフトウェア検証手法は、複雑な環境相互作用の影響を捉える必要がある。プログラム環境は、ファイル、データベース、設定、ネットワークソケット、ヒューマン・ユーザ・インタラクションなどによってもたらされる。シンボリック実行やモデル検査における環境キャプチャの従来のアプローチは、手作業を伴う環境モデリングを用いる。本稿では,グレーボックスファジングの拡張に基づいて,異なるアプローチをとる。プログラムが与えられた場合、まずカーネル/ユーザモード境界で観測されたすべての環境相互作用をシステムコールの形式で記録する。次に、元の記録された相互作用の下でプログラムをリプレイするが、今回は環境モデリングなしで異なるプログラム環境の効果を得るために、選択的な突然変異を適用する。ファジングキャンペーンにおける繰り返しの(フィードバック駆動)変異によって、クラッシュする振る舞いを引き起こすプログラム環境を探すことができる。私たちのEnvFuzzツールは、よく知られた現実世界のプロトコル実装とGUIアプリケーションで、これまで未知であった33のバグを発見した。その多くはセキュリティ上の脆弱性であり、16のCVEが割り当てられている。 Computer programs are not executed in isolation, but rather interact with the execution environment which drives the program behaviors. Software validation methods thus need to capture the effect of possibly complex environmental interactions. Program environments may come from files, databases, configurations, network sockets, human-user interactions, and more. Conventional approaches for environment capture in symbolic execution and model checking employ environment modeling, which involves manual effort. In this paper, we take a different approach based on an extension of greybox fuzzing. Given a program, we first record all observed environmental interactions at the kernel/user-mode boundary in the form of system calls. Next, we replay the program under the original recorded interactions, but this time with selective mutations applied, in order to get the effect of different program environments -- all without environment modeling. Via repeated (feedback-driven) mutations over a fuzzing campaign, we can search for program environments that induce crashing behaviors. Our EnvFuzz tool found 33 previously unknown bugs in well-known real-world protocol implementations and GUI applications. Many of these are security vulnerabilities and 16 CVEs were assigned.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# 代用勾配スパイクニューラルネットワークによる音声知覚中のニューラル振動の探索 Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks ( http://arxiv.org/abs/2404.14024v2 ) ライセンス: Link先を確認	Alexandre Bittar, Philip N. Garner,	(参考訳) 脳内の認知過程を理解するには、大規模に神経力学を複製できる洗練されたモデルが必要である。本稿では、ディープラーニングフレームワークと互換性があり、スケーラブルな、生理学的にインスピレーションを受けた音声認識アーキテクチャを示し、エンドツーエンドの勾配降下トレーニングが中枢スパイクニューラルネットワークにおける神経振動の出現に繋がることを示す。これらの振動を示唆する重要な周波数間結合は、音声処理中にネットワーク層内およびネットワーク層間で測定されるが、背景雑音入力を処理する際にはそのような相互作用は観測されない。さらに,神経活動の調節と同期化において,スパイク周波数適応やリカレント接続などのフィードバック機構が重要な阻害的役割を担い,認識性能の向上に寄与することが示唆された。全体として、人間の聴覚経路で顕著に観察される同期現象の理解を深める上で、我々のアーキテクチャは、ニューロモルフィック技術に関連して、動的かつ効率的な情報処理を示す。 Understanding cognitive processes in the brain demands sophisticated models capable of replicating neural dynamics at large scales. We present a physiologically inspired speech recognition architecture, compatible and scalable with deep learning frameworks, and demonstrate that end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network. Significant cross-frequency couplings, indicative of these oscillations, are measured within and across network layers during speech processing, whereas no such interactions are observed when handling background noise inputs. Furthermore, our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance. Overall, on top of developing our understanding of synchronisation phenomena notably observed in the human auditory pathway, our architecture exhibits dynamic and efficient information processing, with relevance to neuromorphic technology.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# Mamba3D: 状態空間モデルによる3Dポイントクラウド分析のためのローカル機能強化 Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model ( http://arxiv.org/abs/2404.14966v2 ) ライセンス: Link先を確認	Xu Han, Yuan Tang, Zhaoxuan Wang, Xianzhi Li,	(参考訳) 既存のTransformerベースのポイントクラウド分析モデルは2次複雑さに悩まされ、ポイントクラウドの解像度の低下と情報損失を招く。対照的に、状態空間モデル(SSM)に基づく新しいMambaモデルは、線形複雑性のみで複数の領域においてTransformerを上回る。しかし、Mambaの直接的な採用は、ポイントクラウドタスクで十分なパフォーマンスを達成できない。本研究では,ポイントクラウド学習に適した状態空間モデルであるMamba3Dを提案する。具体的には,局所的幾何学的特徴を抽出するシンプルな局所ノルムプール(LNP)ブロックを提案する。さらに、より優れたグローバルな特徴を得るために、トークンフォワードSSMと特徴チャネルで動作する新しい後方SSMの両方を備えた双方向SSM(bi-SSM)を導入する。大規模な実験結果から、Mamba3Dは事前トレーニングの有無に関わらず、複数のタスクでTransformerベースの手法や同時期の研究を上回ることがわかった。特に、Mamba3DはScanObjectNNで92.6%(スクラッチからトレーニング)、ModelNet40分類タスクで95.1%(シングルモーダル事前トレーニング)の総合的な精度で複数のSoTAを達成している。コードとウェイトは https://github.com/xhanxu/Mamba3D で公開されています。 Existing Transformer-based models for point cloud analysis suffer from quadratic complexity, leading to compromised point cloud resolution and information loss. In contrast, the newly proposed Mamba model, based on state space models (SSM), outperforms Transformer in multiple areas with only linear complexity. However, the straightforward adoption of Mamba does not achieve satisfactory performance on point cloud tasks. In this work, we present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction, achieving superior performance, high efficiency, and scalability potential. Specifically, we propose a simple yet effective Local Norm Pooling (LNP) block to extract local geometric features. Additionally, to obtain better global features, we introduce a bidirectional SSM (bi-SSM) with both a token forward SSM and a novel backward SSM that operates on the feature channel. Extensive experimental results show that Mamba3D surpasses Transformer-based counterparts and concurrent works in multiple tasks, with or without pre-training. 
Notably, Mamba3D achieves multiple SoTA, including an overall accuracy of 92.6% (train from scratch) on the ScanObjectNN and 95.1% (with single-modal pre-training) on the ModelNet40 classification task, with only linear complexity. Our code and weights are available at https://github.com/xhanxu/Mamba3D.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# 中心スピンモデルにおける測定誘起相転移:双対空間アプローチにおける第2レニイエントロピー Measurement induced phase transition in the central spin model: second Rényi entropy in dual space approach ( http://arxiv.org/abs/2404.15717v2 ) ライセンス: Link先を確認	V. V. Belov, W. V. Pogosov,	(参考訳) 我々は, 測定過程が存在する場合の中心スピンモデルの力学について数値的研究を行う。このモデルは、そのトポロジーにより実験的な探索が有望であり、中心粒子と量子浴を異なるサブシステムとして自然に区別し、絡み合い相転移を調べることができる。この系における測定誘起相転移を特徴づけるために、双対空間における第二Rényiエントロピーに基づく最近開発された手法を用いる。シミュレーションでは、デコヒーレンス、エネルギー緩和、ゲートエラーを考慮する。臨界測定速度を決定し, 相互エントロピーに基づく簡単なアプローチで予測した値とは大きく異なることを示す。 We conduct a numerical investigation of the dynamics of the central spin model in the presence of measurement processes. This model holds promise for experimental exploration due to its topology, which facilitates the natural distinction of a central particle and the quantum bath as different subsystems, allowing for the examination of entanglement phase transitions. To characterize the measurement-induced phase transition in this system, we employ a recently developed method based on the second Rényi entropy in dual space. Our simulations account for decoherence, energy relaxation, and gate errors. We determine critical measurement rates and demonstrate that they significantly differ from those predicted by a simple approach based on mutual entropy.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# アシスタントを用いた心理療法チャットボットのドメイン特異的改善 Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant ( http://arxiv.org/abs/2404.16160v2 ) ライセンス: Link先を確認	Cheng Kang, Daniel Novak, Katerina Urbanova, Yuqing Cheng, Yong Hu,	(参考訳) 大規模言語モデル (LLM) は、人手による命令データを用いた特定のタスクに対する印象的な一般化機能を実証している。しかし、そのような指導データに対する限られた量、多様性、専門知識は、ドメイン固有の指示が与えられた場合の精神療法タスクにおけるLLMのパフォーマンスに関する懸念を提起する。まず、AlexanderStreet療法に基づくドメイン特化補助命令を提案し、次に、適応微調整法と検索強化法を用いて、事前学習したLLMを改善する。自動評価と人的評価を用いて言語質を定量的に評価することにより、心理療法補助指導における事前学習のLLMが、最先端のLLM応答ベースラインを上回っていることを観察する。我々の助教授アプローチは、トレーニング済みのLSMに指示を合わせ、トレーニング済みのLSMにより心理学的な知識を与える半注釈法を提供する。 Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we firstly propose Domain-Specific Assistant Instructions based on AlexanderStreet therapy, and secondly, we use an adaption fine-tuning method and retrieval augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that pre-trained LLMs on Psychotherapy Assistant Instructions outperform state-of-the-art LLMs response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and provide pre-trained LLMs with more psychotherapy knowledge.	翻訳日:2024-09-04 18:40:27 公開日:2024-09-02
# 量子リンクシミュレータにおける部分閉じ込め Partial confinement in a quantum-link simulator ( http://arxiv.org/abs/2404.18095v2 ) ライセンス: Link先を確認	Zheng Tang, Fei Zhu, Yi-Fan Luo, Wei Zheng, Li Chen,	(参考訳) 冷却原子に基づく量子シミュレーションにおいて,高エネルギー素粒子の閉じ込め・分解特性が注目されている。しかし、分断と分断の間の中間状態である分断は未解明のままである。部分閉じ込めは、荷電粒子の凝縮挙動が相対的な位置にあるという現象をカプセル化する。本稿では,スピン-1量子リンクモデルが,部分的閉じ込めを探索するための優れたプラットフォームを提供することを示す。我々は、平衡力学と非平衡力学の両方の文脈において、部分閉じ込めから生じる物理学を包括的に研究する。低温原子を用いた潜在的実験装置についても論じる。我々の研究は、ゲージ対称性の対象となる最先端の人工量子系における閉じ込め関連物理学の研究に、シンプルで実現可能なルーチンを提供する。 Confinement/deconfinement, captivating attributes of high-energy elementary particles, have recently garnered wide attention in quantum simulations based on cold atoms. Yet, the partial confinement, an intermediate state between the confinement and deconfinement, remains underexplored. The partial confinement encapsulates the phenomenon that the confining behavior of charged particles is contingent upon their relative positions. In this paper, we demonstrate that the spin-1 quantum link model provides an excellent platform for exploring partial confinement. We conduct a comprehensive investigation of the physics emerging from partial confinement in both the context of equilibrium and non-equilibrium dynamics. Potential experimental setups using cold atoms are also discussed. Our work offers a simple and feasible routine for the study of confinement-related physics in the state-of-the-art artificial quantum systems subject to gauge symmetries.	翻訳日:2024-09-04 18:30:43 公開日:2024-09-02
# GUing:ビジョンランゲージモデルを用いたモバイルGUI検索エンジン GUing: A Mobile GUI Search Engine using a Vision-Language Model ( http://arxiv.org/abs/2405.00145v2 ) ライセンス: Link先を確認	Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej,	(参考訳) アプリ開発者は、自身のアプリを設計し改善するためのインスピレーションの源として、他のアプリのグラフィカルユーザインタフェース(GUI)を使用する。近年の研究では、クラウドソースまたはGUIの自動探索によって取得されたスクリーンショットデータセットから、特定のテキストクエリにマッチするGUI設計の検索が提案されている。しかし、このようなテキストからGUIへの検索手法はGUI要素のテキスト情報のみを利用し、アイコンや背景画像などの視覚情報を無視する。さらに、検索されたスクリーンショットは、アプリ開発者によって操られず、特定の入力データを必要とする重要なアプリ機能がないことが多い。本稿では,GUIClipと呼ばれる視覚言語モデルに基づくGUI検索エンジンであるGUingを提案する。このために、Google Playアプリの紹介画像から最初に収集した画像は、通常最も代表的なスクリーンショットを表示し、しばしばアプリベンダーによってキャプション(ラベル付き)される。そこで我々は,これらの画像からキャプションを分類し,収穫し,抽出する自動パイプラインを開発した。その中には303Kのアプリスクリーンショットが含まれており、そのうち135Kにはキャプションがある。私たちはこのデータセットを使って新しい視覚言語モデルをトレーニングしました。我々は、関連する作業や手動実験から、様々なデータセットに対するアプローチを評価した。その結果,テキストからGUIへの検索では,最大0.69のRecall@10,最大0.91のHIT@10が得られた。また、GUI分類やスケッチ・ツー・GUI検索など他のGUIタスクに対するGUIClipの性能についても検討した。 App developers use the Graphical User Interface (GUI) of other apps as a source of inspiration for designing and improving their own apps. Recent research has thus suggested retrieving relevant GUI designs that match a certain text query from screenshot datasets acquired through crowdsourced or automated exploration of GUIs. However, such text-to-GUI retrieval approaches only leverage the textual information of the GUI elements, neglecting visual information such as icons or background images. In addition, retrieved screenshots are not steered by app developers and often lack important app features that require particular input data. To overcome these limitations, this paper proposes GUing, a GUI search engine based on a vision-language model called GUIClip, which we trained specifically for the problem of designing app GUIs. 
For this, we first collected from Google Play app introduction images which usually display the most representative screenshots and are often captioned (i.e.~labeled) by app vendors. Then, we developed an automated pipeline to classify, crop, and extract the captions from these images. This resulted in a large dataset which we share with this paper: including 303k app screenshots, out of which 135k have captions. We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind in GUI retrieval. We evaluated our approach on various datasets from related work and in manual experiment. The results demonstrate that our model outperforms previous approaches in text-to-GUI retrieval achieving a Recall@10 of up to 0.69 and a HIT@10 of 0.91. We also explored the performance of GUIClip for other GUI tasks including GUI classification and sketch-to-GUI retrieval with encouraging results.	翻訳日:2024-09-04 18:30:43 公開日:2024-09-02
# 言語間の感性分析:英語への機械翻訳前後の評価 Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English ( http://arxiv.org/abs/2405.02887v2 ) ライセンス: Link先を確認	Aekansh Kathunia, Mohammad Kaif, Nalin Arora, N Narotam,	(参考訳) 約780の言語がインドだけで話されている。この言語的多様性にもかかわらず、感性分析の研究は主に英語のテキストデータに焦点を当てており、その結果、英語の感情資源が不均等に利用できるようになった。本稿では,機械翻訳を行った多言語データセットおよびテキストを対象とした感性分析タスクにおけるトランスフォーマーモデルの性能について検討する。異なる言語文脈におけるこれらのモデルの有効性を比較することで、それらの性能変化と様々な言語における感情分析の潜在的な影響について洞察を得ることができる。また,今後の課題と今後の課題についても論じる。 People communicate in more than 7,000 languages around the world, with around 780 languages spoken in India alone. Despite this linguistic diversity, research on Sentiment Analysis has predominantly focused on English text data, resulting in a disproportionate availability of sentiment resources for English. This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation. By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages. We also discuss the shortcomings and potential for future work towards the end.	翻訳日:2024-09-04 18:30:43 公開日:2024-09-02
# Unicorn: 畳み込みニューラル正規微分方程式を用いた海氷予測のためのU-Net Unicorn: U-Net for Sea Ice Forecasting with Convolutional Neural Ordinary Differential Equations ( http://arxiv.org/abs/2405.03929v2 ) ライセンス: Link先を確認	Jaesung Park, Sungchul Hong, Yoonseo Cho, Jong-June Jeon,	(参考訳) 北極の海氷は地球規模の気候動態に欠かせない。しかし、海氷の正確な予測は、複数の変数間の複雑な相互作用のために大きな課題となる。複数の入力と強力なパフォーマンスをシームレスに統合する能力を活用することで、多くの研究が海氷予測のためのニューラルネットワークに転換している。本稿では,毎週の海氷予測を目的とした,Unicornという新しい深層建築について紹介する。本モデルでは,アーキテクチャ内に複数の時系列画像を統合することにより,予測性能を向上する。さらに、U-Netアーキテクチャにボトルネック層を組み込み、畳み込み演算を伴うニューラル常微分方程式として機能し、潜伏変数の時空間ダイナミクスを捉える。 1998年から2021年にかけてのデータセットを用いた実データ解析により,海氷濃度予測作業における最先端モデルに対する大幅な改善が示された。ベンチマークモデルと比較して平均12%のMAE改善を実現している。さらに,本手法は,海氷範囲予測における既存の手法よりも優れており,約18%の分類性能向上を実現している。これらの実験結果は,提案手法の優位性を示すものである。 Sea ice at the North Pole is vital to global climate dynamics. However, accurately forecasting sea ice poses a significant challenge due to the intricate interaction among multiple variables. Leveraging the capability to integrate multiple inputs and powerful performances seamlessly, many studies have turned to neural networks for sea ice forecasting. This paper introduces a novel deep architecture named Unicorn, designed to forecast weekly sea ice. Our model integrates multiple time series images within its architecture to enhance its forecasting performance. Moreover, we incorporate a bottleneck layer within the U-Net architecture, serving as neural ordinary differential equations with convolution operations, to capture the spatiotemporal dynamics of latent variables. Through real data analysis with datasets spanning from 1998 to 2021, our proposed model demonstrates significant improvements over state-of-the-art models in the sea ice concentration forecasting task. It achieves an average MAE improvement of 12% compared to benchmark models. Additionally, our method outperforms existing approaches in sea ice extent forecasting, achieving a classification performance improvement of approximately 18%. 
These experimental results show the superiority of our proposed model.	翻訳日:2024-09-04 18:30:43 公開日:2024-09-02
# ERATTA: 大規模言語モデルで答えるテーブルのための極端なRAG ERATTA: Extreme RAG for Table To Answers with Large Language Models ( http://arxiv.org/abs/2405.03963v3 ) ライセンス: Link先を確認	Sohini Roychowdhury, Marko Krema, Anvar Mahammad, Brian Moore, Arijit Mukherjee, Punit Prakashchandra,	(参考訳) 検索拡張生成(RAG)を備えた大規模言語モデル(LLM)は、近年、スケーラブルな生成AIソリューションに最適な選択肢となっている。 AIエージェント(agentic-RAG)で実装されたRAGは、最近普及しているが、エンタープライズレベルのデータプラクティスにおいては不安定なコストと信頼性の低いパフォーマンスに悩まされている。 LLMにRAGを組み込んだ既存のほとんどのユースケースは、汎用的あるいは極端にドメイン固有であり、RAG-LLMアプローチのスケーラビリティと一般化性に疑問を呈している。本研究では,データ認証,ユーザクエリルーティング,データ検索,エンタープライズデータテーブルからの質問応答機能へのカスタムプロンプトを実現するために,複数のLLMを起動可能な,ユニークなLLMベースのシステムを提案する。ここでのソーステーブルは高度に変動し、サイズが大きく、提案されたフレームワークはクエリ毎に10秒未満で構造化された応答を可能にする。さらに,LLM応答の幻覚を検知し,報告する5つの評価指標からなるスコアリングモジュールを提案する。提案するシステムと評価基準は,持続可能性,財務状況,ソーシャルメディア領域において,数百のユーザクエリに対して,90%以上の信頼性スコアを達成している。提案した極端なRAGアーキテクチャの拡張は、LLMを用いた異種ソースクエリを可能にする。 Large language models (LLMs) with retrieval-augmented generation (RAG) have been the optimal choice for scalable generative AI solutions in the recent past. Although RAG implemented with AI agents (agentic-RAG) has been recently popularized, it suffers from unstable cost and unreliable performance for Enterprise-level data-practices. Most existing use-cases that incorporate RAG with LLMs have been either generic or extremely domain specific, thereby questioning the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system where multiple LLMs can be invoked to enable data authentication, user-query routing, data-retrieval and custom prompting for question-answering capabilities from Enterprise-data tables. The source tables here are highly fluctuating and large in size and the proposed framework enables structured responses in under 10 seconds per query. Additionally, we propose a five metric scoring module that detects and reports hallucinations in the LLM responses. 
Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health and social media domains. Extensions to the proposed extreme RAG architectures can enable heterogeneous source querying using LLMs.	翻訳日:2024-09-04 18:30:43 公開日:2024-09-02
# ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation ( http://arxiv.org/abs/2405.04818v2 ) ライセンス: Link先を確認	Ana Brassard, Benjamin Heinzerling, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui,	(参考訳) 自由文の説明の質を評価することは多面的、主観的、労働集約的な課題である。大規模言語モデル(LLM)は、一貫性、スケーラビリティ、コスト効率の面で魅力的な代替手段である。本研究では,3500のフリーテキスト説明とアスペクトワイドな品質評価のデータセットであるACORNを紹介し,LLMが説明をどのように評価するかを検証する。その結果,より大きなモデルはアノテータ間の合意を維持または向上させるラベルを出力し,人間の評価者間で予想されるばらつきの範囲内にあることが示唆された。しかし、多数決による人間の評価との相関は品質面によって異なるため、完全な置き換えにはならないことが示唆された。一方、LLMを少数の人間の評価者の補助として使用することで、元の多数決ラベルとの相関が向上するケースもある。しかし、この効果は人間の評価者が不足しているケースに限られており、人間の評価者を1人追加するほうが全てのケースにおいてより顕著な効果を示した。全体としては、LLMを人間の評価者の完全な代替として使用することは推奨しないが、最終的に的を絞った人間の関与で終わる構成で使用することを推奨する。データは https://github.com/a-brassard/ACORN で入手できる。 Evaluating the quality of free-text explanations is a multifaceted, subjective, and labor-intensive task. Large language models (LLMs) present an appealing alternative due to their potential for consistency, scalability, and cost-efficiency. In this work, we present ACORN, a new dataset of 3,500 free-text explanations and aspect-wise quality ratings, and use it to evaluate how LLMs rate explanations. We observed that larger models outputted labels that maintained or increased the inter-annotator agreement, suggesting that they are within the expected variance between human raters. However, their correlation with majority-voted human ratings varied across different quality aspects, indicating that they are not a complete replacement. In turn, using LLMs as a supplement to a smaller group of human raters in some cases improved the correlation with the original majority labels. However, the effect was limited to cases where human raters were scarce, and an additional human rater had a more pronounced effect in all cases. Overall, we recommend against using LLMs as a complete replacement for human raters but encourage using them in configurations that end with targeted human involvement. 
Data available here: https://github.com/a-brassard/ACORN	翻訳日:2024-09-04 18:30:43 公開日:2024-09-02
# MAPL:半教師付き異常検出のためのメモリ拡張と擬似ラベル化 MAPL: Memory Augmentation and Pseudo-Labeling for Semi-Supervised Anomaly Detection ( http://arxiv.org/abs/2405.06198v2 ) ライセンス: Link先を確認	Junzhuo Chen,	(参考訳) 大規模なラベルなしデータと識別の難しい異常は、ほとんどの産業現場で緊急に克服する必要がある問題である。この問題に対処するために、メモリ拡張と擬似ラベル化(Memory Augmentation and Pseudo-Labeling, MAPL)と呼ばれる、産業環境における表面欠陥を検出する新しい方法論を導入する。この手法はまず異常シミュレーション戦略を導入し、シミュレーションされた異常サンプルを生成することにより、稀または未知の異常型を認識するモデルの能力を大幅に向上する。模擬異常サンプルのラベル付けの欠如に対処するため, 1分類アンサンブルに基づく擬似ラベル法を用い, 鍵となる擬似ラベル化ハイパーパラメータを自動的に選択することにより, 限定ラベルデータの場合のモデルのロバスト性を向上する。一方、メモリプール内の入力サンプルと正常サンプルとの差を解析することにより、異常領域を効果的に予測するメモリ強化学習機構を導入する。MAPLはエンド・ツー・エンドの学習フレームワークを用いて入力データから直接異常領域を識別し、検出の効率とリアルタイム性能を最適化する。最近開発されたBHADデータセット(MVTec AD [1], Visa [2], MDPP [3] を含む)での広範囲な試行により、MAPL は平均画像レベルAUROCスコア86.2%を達成し、オリジナルの MemSeg [4] モデルと比較して5.1%の向上を示した。ソースコードは https://github.com/jzc777/MAPL で公開されている。 Large unlabeled data and difficult-to-identify anomalies are the urgent issues that need to be overcome in most industrial scenes. In order to address this issue, a new methodology for detecting surface defects in industrial settings is introduced, referred to as Memory Augmentation and Pseudo-Labeling (MAPL). The methodology first introduces an anomaly simulation strategy, which significantly improves the model's ability to recognize rare or unknown anomaly types by generating simulated anomaly samples. To cope with the problem of the lack of labeling of anomalous simulated samples, a pseudo-labeler method based on a one-classifier ensemble was employed in this study, which enhances the robustness of the model in the case of limited labeling data by automatically selecting key pseudo-labeling hyperparameters. Meanwhile, a memory-enhanced learning mechanism is introduced to effectively predict abnormal regions by analyzing the difference between the input samples and the normal samples in the memory pool. 
An end-to-end learning framework is employed by MAPL to identify the abnormal regions directly from the input data, which optimizes the efficiency and real-time performance of detection. By conducting extensive trials on the recently developed BHAD dataset (including MVTec AD [1], Visa [2], and MDPP [3]), MAPL achieves an average image-level AUROC score of 86.2%, demonstrating a 5.1% enhancement compared to the original MemSeg [4] model. The source code is available at https://github.com/jzc777/MAPL.	翻訳日:2024-09-04 18:20:55 公開日:2024-09-02
# Qsyn: NISQ時代以降のための開発者フレンドリーな量子回路合成フレームワーク Qsyn: A Developer-Friendly Quantum Circuit Synthesis Framework for NISQ Era and Beyond ( http://arxiv.org/abs/2405.07197v2 ) ライセンス: Link先を確認	Mu-Te Lau, Chin-Yi Cheng, Cheng-Hua Lu, Chia-Hsu Chuang, Yi-Hsiang Kuo, Hsiang-Chun Yang, Chien-Tung Kuo, Hsin-Yu Chen, Chen-Ying Tung, Cheng-En Tsai, Guan-Hao Chen, Leng-Kai Lin, Ching-Huan Wang, Tzu-Hsu Wang, Chung-Yang Ric Huang,	(参考訳) 本稿では、新しい量子回路合成(QCS)フレームワークであるQsynを紹介し、開発者がQCSアルゴリズムとツールを研究、開発、試験、実験し、そしてフレームワークに貢献できるようにする。 1) 開発者が様々なテストシナリオを簡単に設計し、アルゴリズムで柔軟に実験できるように、リッチなコマンドラインインターフェースを設計します。 2) 開発者がアルゴリズムを極端に最適化できるように,異なる抽象レベルの量子回路上で多くのデータ表現に詳細なアクセスを提供する。 (3)私たちは,開発者が開発品質を,最新のソフトウェアエンジニアリングのベストプラクティスで確保できるように,厳格な開発フローと環境を定義します。筆者らは,T-Count Optimizationアルゴリズムの開発を実演し,最近のQCSフレームワークと同等に比較して,性能上の優位性を示す。 In this paper, we introduce a new quantum circuit synthesis (QCS) framework, Qsyn, for developers to research, develop, test, experiment, and then contribute their QCS algorithms and tools to the framework. Our framework is more developer-friendly than other modern QCS frameworks in three aspects: (1) We design a rich command-line interface so that developers can easily design various testing scenarios and flexibly conduct experiments on their algorithms. (2) We offer detailed access to many data representations on different abstract levels of quantum circuits so that developers can optimize their algorithms to the extreme. (3) We define a rigid developing flow and environment so that developers can ensure their development qualities with the best modern software engineering practices. We illustrate the friendliness of our framework with a showcase of developing a T-Count Optimization algorithm and demonstrate our performance superiority with fair comparisons to other modern QCS frameworks.	翻訳日:2024-09-04 18:20:55 公開日:2024-09-02
# PeRFlow:Universal Plug-and-Play AcceleratorとしてのPiecewise Rectified Flow PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator ( http://arxiv.org/abs/2405.07510v5 ) ライセンス: Link先を確認	Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng,	(参考訳) 拡散モデルを高速化するフローベース手法であるPecewise Rectified Flow(PeRFlow)を提案する。 PeRFlowは、生成フローのサンプリングプロセスを複数の時間ウィンドウに分割し、リフロー操作を通じて各間隔の軌跡を直線化し、断片的な線形フローに近づく。 PeRFlowは数ステップの世代で優れたパフォーマンスを達成する。さらに、専用のパラメータ化を通じて、PeRFlowモデルは事前訓練された拡散モデルから知識を継承する。このように、トレーニングは高速に収束し、得られたモデルは、事前訓練された拡散モデルに基づいて様々なワークフローと互換性のある普遍的なプラグアンドプレイアクセラレータとして機能する、有利な転送能力を示す。トレーニングと推論のためのコードも公開されている。 https://github.com/magic-research/piecewise-rectified-flow We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated parameterizations, the PeRFlow models inherit knowledge from the pretrained diffusion models. Thus, the training converges fast and the obtained models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models. Codes for training and inference are publicly released. https://github.com/magic-research/piecewise-rectified-flow	翻訳日:2024-09-04 18:20:55 公開日:2024-09-02
# スクイージングによる量子強調多相推定 Squeezing-induced quantum-enhanced multiphase estimation ( http://arxiv.org/abs/2405.11705v2 ) ライセンス: Link先を確認	Le Bin Ho,	(参考訳) 本研究は,多相量子メートル法において,スクイーズ法が測定精度を向上する方法について検討する。これらの手法は単相推定においてよく研究され,効果的に用いられているが,多相状態における使用法はまだ検討されていない。多相シナリオにおける量子エンハンスメントのメカニズムを解明することによって、このギャップを埋める。我々の分析は、量子クレーマー・ラオ境界を達成するための最適条件に関する理論的および数値的な洞察を与え、スクイーズによる量子拡大多相推定の可能性とメカニズムを理解するのに役立ちます。この研究は量子力学とセンシング技術の進歩の新たな可能性を開く。 We investigate how squeezing techniques can improve the measurement precision in multiphase quantum metrology. While these methods are well-studied and effectively used in single-phase estimations, their usage in multiphase situations has yet to be examined. We fill this gap by investigating the mechanism of quantum enhancement in the multiphase scenarios. Our analysis provides theoretical and numerical insights into the optimal condition for achieving the quantum Cramer-Rao bound, helping us understand the potential and mechanism for quantum-enhanced multiphase estimations with squeezing. This research opens up new possibilities for advancements in quantum metrology and sensing technologies.	翻訳日:2024-09-04 18:11:10 公開日:2024-09-02
# 学習環境における生成型AI活用における学生の知覚の質的・定量的分析 Qualitative and quantitative analysis of student's perceptions in the use of generative AI in educational environments ( http://arxiv.org/abs/2405.13487v2 ) ライセンス: Link先を確認	Sergio Altares-López, José M. Bengochea-Guevara, Carlos Ranz, Héctor Montes, Angela Ribeiro,	(参考訳) 教育における生成人工知能の効果的な統合は、将来の世代を準備するための基本的な側面である。本研究の目的は,教室内における制御された学生とAIの相互作用の知覚を定量的かつ質的な視点から分析することである。この分析には、AIツールの倫理的意味と日常的使用の評価、およびAIツールが学生にSTEMのキャリアを追求することを奨励するかどうかの理解が含まれる。教育改善のためのいくつかのポイントが見出され、その一例は、教師が新しい技術に取り組み、技術に関連する科目だけでなく、すべての科目で指導法を適応させることの難しさである。 The effective integration of generative artificial intelligence in education is a fundamental aspect to prepare future generations. The objective of this study is to analyze from a quantitative and qualitative point of view the perception of controlled student-AI interaction within the classroom. This analysis includes assessing the ethical implications and everyday use of AI tools, as well as understanding whether AI tools encourage students to pursue STEM careers. Several points for improvement in education are found, such as the challenge of getting teachers to engage with new technologies and adapt their methods in all subjects, not just those related to technologies.	翻訳日:2024-09-04 18:11:10 公開日:2024-09-02
# 分数量子力学と量子重力現象論の出会い Fractional quantum mechanics meets quantum gravity phenomenology ( http://arxiv.org/abs/2405.13544v2 ) ライセンス: Link先を確認	Gislaine Varão, Iarley P. Lobo, Valdir B. Bezerra,	(参考訳) 本稿では、量子重力現象論にインスパイアされた修正Schrödinger発展に関する以前の知見を拡張する。このアプローチと分数量子力学の接続を確立することで、深紫外領域で観察される挙動に似たフラクタル次元の出現を特徴とする、量子重力の潜在的な深赤外領域に関する洞察を提供する。さらに,ボース・アインシュタイン凝縮体を用いたこの領域の実験的検討について考察する。特に、この類似性の直接的な帰結として、分数量子力学を探索する一般的な実験は、量子重力の等価モデルとして機能する可能性がある。このような系における非局所的な挙動の例を同定し、量子重力における非局所性の類似現象を示唆する。 This paper extends previous findings on the modified Schrödinger evolution inspired by quantum gravity phenomenology. By establishing a connection between this approach and fractional quantum mechanics, we provide insights into a potential deep infrared regime of quantum gravity, characterized by the emergence of fractal dimensions, similar to behaviors observed in the deep ultraviolet regime. Additionally, we explore the experimental investigations of this regime using Bose-Einstein condensates. Notably, our analysis reveals a direct implication of this analogy: general experiments probing fractional quantum mechanics may serve as equivalent models of quantum gravity. We identify instances of nonlocal behavior in such systems, suggesting an analogous phenomenon of nonlocality in quantum gravity.	翻訳日:2024-09-04 18:11:10 公開日:2024-09-02
# 一般化確率論における絡み合いスワッピングと繰り返しCHSHゲーム Entanglement-swapping in generalised probabilistic theories, and iterated CHSH games ( http://arxiv.org/abs/2405.13819v4 ) ライセンス: Link先を確認	Lionel J. Dmello, Laurens T. Ligthart, David Gross,	(参考訳) 量子論よりも「より強い絡み合い」を持つ理論があるが、それらがツィレルソンの有界より上のCHSH値を示すという意味では、そのような理論の既知のすべての例は、厳密に小さな測定セットを持っている。したがって、二分項状態と測定の両方を必要とするタスクでは、QMよりもパフォーマンスが良くない。両分割状態と測定の両方を含む最も単純な情報処理タスクの1つは、絡み合いの交換である。本稿では,一般化確率論(GPT)における絡み合いのスワッピングについて検討する。特に, GPT のパワーを計測して非古典的相関を保ち, 絡み合いのラウンド数$n$の後に得られる最大のCHSH値を用いて, 繰り返しCHSHゲームを導入する。我々の主な成果は、任意のラウンド数でCHSH値が4ドルに達するGPTの構築である。これはワイレンマンとコルベックによって最近提起されたそのようなゲームに対する量子論の最適性に関する問題に対処する。この問題に対処する上で直面する課題は、絡み合いスワッピングが適切に定義された操作であるGPTを構築するための一般的な枠組みが存在しないことである。そこで本研究では,両部GPTを多部GPTに変換するアルゴリズム構成を導入する。 While there exist theories that have states "more strongly entangled" than quantum theory, in the sense that they show CHSH values above Tsirelson's bound, all known examples of such theories have a strictly smaller set of measurements. Therefore, in tasks which require both bipartite states and measurements, they do not perform better than QM. One of the simplest information processing tasks involving both bipartite states and measurements is that of entanglement swapping. In this paper, we study entanglement swapping in generalised probabilistic theories (GPTs). In particular, we introduce the iterated CHSH game, which measures the power of a GPT to preserve non-classical correlations, in terms of the largest CHSH value obtainable after $n$ rounds of entanglement swapping. Our main result is the construction of a GPT that achieves a CHSH value of $4$ after an arbitrary number of rounds. This addresses a question about the optimality of quantum theory for such games recently raised by Weilenmann and Colbeck. One challenge faced when treating this problem is that there seems to be no general framework for constructing GPTs in which entanglement swapping is a well-defined operation. 
Therefore, we introduce an algorithmic construction that turns a bipartite GPT into a multipartite GPT that supports entanglement swapping, if consistently possible.	翻訳日:2024-09-04 18:11:10 公開日:2024-09-02
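For orientation on the CHSH values mentioned in the abstract above, the following minimal sketch computes the CHSH value for three textbook cases: the best deterministic classical strategy (2), a maximally entangled qubit pair with standard optimal angles (Tsirelson's bound, 2√2), and a Popescu-Rohrlich box (4). The PR box is used here only as a well-known example that reaches 4; it is not the GPT constructed in the paper, and the function names are illustrative.

```python
import itertools
import math


def chsh(correlator):
    """CHSH value S = E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    return (correlator(0, 0) + correlator(0, 1)
            + correlator(1, 0) - correlator(1, 1))


def classical_max():
    """Best CHSH value over all deterministic +/-1 assignments."""
    best = -4
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        outcomes_a, outcomes_b = (a0, a1), (b0, b1)
        best = max(best, chsh(lambda x, y: outcomes_a[x] * outcomes_b[y]))
    return best


# Measurement angles in a single plane; for a maximally entangled pair
# the correlator of two such measurements is cos(a - b).
ALICE_ANGLES = (0.0, math.pi / 2)
BOB_ANGLES = (math.pi / 4, -math.pi / 4)


def quantum_correlator(x, y):
    return math.cos(ALICE_ANGLES[x] - BOB_ANGLES[y])


def pr_box_correlator(x, y):
    """PR box: outputs are anticorrelated only when both inputs are 1."""
    return -1.0 if (x == 1 and y == 1) else 1.0


values = {
    "classical": classical_max(),
    "quantum": chsh(quantum_correlator),
    "pr_box": chsh(pr_box_correlator),
}
```

The gap between the quantum and PR-box values is what "above Tsirelson's bound" refers to; the iterated game in the abstract asks how much of that value survives repeated entanglement swapping.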
# 微分方程式を解くニューラルネットワークの学習において、自動微分は不可欠である Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations ( http://arxiv.org/abs/2405.14099v3 ) ライセンス: Link先を確認	Chuqi Chen, Yahong Yang, Yang Xiang, Wenrui Hao,	(参考訳) ニューラルネットワークベースのアプローチは、科学と工学における偏微分方程式(PDE)の解法において、特に複雑なドメインや経験データの導入を特徴とするシナリオにおいて、非常に有望であることを示している。 PDEのニューラルネットワーク手法の利点の1つは、その自動微分(AD)であり、微分を計算するために近傍の局所点を必要とする従来の有限差分(FD)近似とは異なり、標本点自身だけを必要とする。本稿では、ニューラルネットワークのトレーニングにおけるADの利点を定量的に示す。トランキャットエントロピーの概念は、トレーニング特性を特徴づけるために導入された。具体的には、ランダム特徴モデルと2層ニューラルネットワークを用いた総合的な実験および理論的解析により、決定されたトランケートエントロピーが、ランダム特徴モデルの残留損失と、ADおよびFD法のニューラルネットワークのトレーニング速度を定量化するための信頼性の高い指標であることがわかった。実験および理論的分析により,ADはPDEの解法においてFDより優れていることが示された。 Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or incorporation of empirical data. One advantage of the neural network methods for PDEs lies in its automatic differentiation (AD), which necessitates only the sample points themselves, unlike traditional finite difference (FD) approximations that require nearby local points to compute derivatives. In this paper, we quantitatively demonstrate the advantage of AD in training neural networks. The concept of truncated entropy is introduced to characterize the training property. Specifically, through comprehensive experimental and theoretical analyses conducted on random feature models and two-layer neural networks, we discover that the defined truncated entropy serves as a reliable metric for quantifying the residual loss of random feature models and the training speed of neural networks for both AD and FD methods. Our experimental and theoretical analyses demonstrate that, from a training perspective, AD outperforms FD in solving PDEs.	翻訳日:2024-09-04 18:11:10 公開日:2024-09-02
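The distinction drawn in the abstract above, that automatic differentiation needs only the sample point while finite differences need neighbouring points, can be seen in a small sketch. This uses hand-rolled forward-mode dual numbers and a central difference on a test function chosen for illustration; it does not reproduce the paper's truncated-entropy analysis or any neural network training.

```python
import math


class Dual:
    """Dual number val + der*eps with eps**2 = 0, for forward-mode AD."""

    def __init__(self, val, der=0.0):
        self.val = float(val)
        self.der = float(der)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule carried alongside the value.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sin(x):
    """sin with the chain rule applied to the derivative part."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def f(x):
    """Illustrative test function f(x) = x^2 sin(x)."""
    return x * x * sin(x)


def ad_derivative(fn, x):
    """Derivative from a single evaluation at the sample point x."""
    return fn(Dual(x, 1.0)).der


def fd_derivative(fn, x, h):
    """Central difference: needs the two neighbouring points x - h and x + h."""
    return (fn(Dual(x + h)).val - fn(Dual(x - h)).val) / (2.0 * h)


x0 = 1.0
exact = 2 * x0 * math.sin(x0) + x0 ** 2 * math.cos(x0)
ad_error = abs(ad_derivative(f, x0) - exact)
fd_error = abs(fd_derivative(f, x0, h=1e-2) - exact)
```

With this step size the finite-difference error is dominated by truncation (of order h squared), whereas the AD result agrees with the closed-form derivative to rounding error.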
# フォールトトレラントML:効率的なメタアグリゲーションと同期トレーニング Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training ( http://arxiv.org/abs/2405.14759v3 ) ライセンス: Link先を確認	Tehila Dahan, Kfir Y. Levy,	(参考訳) 本稿では,分散機械学習(ML)システムにおけるビザンチン・ロバスト学習の挑戦的枠組みについて検討し,効率性と実用性の両方に焦点をあてる。分散MLシステムは複雑なMLタスクに不可欠なものとなり、ビザンチンの障害に対するレジリエンスを確保する。最初のコントリビューションは、CTMA(Centered Trimmed Meta Aggregator)の導入です。これは、低計算要求を必要としながら、ベースラインアグリゲータを最適なパフォーマンスレベルにアップグレードする効率的なメタアグリゲータです。さらに,ビザンチン文脈における2重モーメント戦略に基づいて,最近開発された勾配推定手法を提案する。本稿では,ビザンチン・ロバスト訓練の理論的・実践的優位性,特にチューニングプロセスの簡素化と多数のハイパーパラメータへの依存軽減について述べる。この手法の有効性は確率凸最適化(SCO)フレームワークの理論的な洞察に支えられ、実証的な証拠によって裏付けられる。 In this paper, we investigate the challenging framework of Byzantine-robust training in distributed machine learning (ML) systems, focusing on enhancing both efficiency and practicality. As distributed ML systems become integral for complex ML tasks, ensuring resilience against Byzantine failures-where workers may contribute incorrect updates due to malice or error-gains paramount importance. Our first contribution is the introduction of the Centered Trimmed Meta Aggregator (CTMA), an efficient meta-aggregator that upgrades baseline aggregators to optimal performance levels, while requiring low computational demands. Additionally, we propose harnessing a recently developed gradient estimation technique based on a double-momentum strategy within the Byzantine context. Our paper highlights its theoretical and practical advantages for Byzantine-robust training, especially in simplifying the tuning process and reducing the reliance on numerous hyperparameters. The effectiveness of this technique is supported by theoretical insights within the stochastic convex optimization (SCO) framework and corroborated by empirical evidence.	翻訳日:2024-09-04 18:11:10 公開日:2024-09-02
# Quantum Self-Propulsion of an Inhomogeneous Object out of Thermal Equilibrium ( http://arxiv.org/abs/2405.15061v2 ) License: see link	Kimball A. Milton, Nima Pourtolami, Gerard Kennedy,	In an earlier paper, we explored how quantum vacuum torque can arise: a body or nanoparticle that is out of thermal equilibrium with its environment experiences a spontaneous torque. But this requires that the body be composed of nonreciprocal material, which seems to necessitate the presence of an external influence, such as a magnetic field. Then the electric polarizability of the particle has a real part that is nonsymmetric. This effect occurs to first order in the polarizability. To that order, no self-propulsive force can arise. Here, we consider second-order effects, and show that spontaneous forces can arise in vacuum, without requiring exotic electromagnetic properties. Thermal nonequilibrium is still necessary, but the electric susceptibility of the body need only be inhomogeneous. We investigate four examples of such a body: a needle composed of distinct halves; a sphere and a ball, each hemisphere being made of a different substance; and a thin slab, each face of which is different. The results found are consistent with previous numerical investigations. Here, we take into account the skin depth of metal surfaces.
We also consider the frictional forces that would cause the body to acquire a terminal velocity, which might be observable. More likely to be important is relaxation to thermal equilibrium, which can still lead to a terminal velocity that might be experimentally verifiable. A general treatment of such forces on a moving body, expressed in momentum space, is provided, which incorporates both propulsive and frictional forces. The source of the propulsive force is the nonsymmetric pattern of radiation from different parts of the body, the higher reflectivity of the metal portion playing a crucial role.	Translated: 2024-09-04 18:11:10 Published: 2024-09-02
# Eliciting Informative Text Evaluations with Large Language Models ( http://arxiv.org/abs/2405.15077v4 ) License: see link	Yuxuan Lu, Shengwei Xu, Yichi Zhang, Yuqing Kong, Grant Schoenebeck,	Peer prediction mechanisms motivate high-quality feedback with provable guarantees. However, current methods only apply to rather simple reports, like multiple-choice or scalar numbers. We aim to broaden these techniques to the larger domain of text-based reports, drawing on the recent developments in large language models. This vastly increases the applicability of peer prediction mechanisms as textual feedback is the norm in a large variety of feedback channels: peer reviews, e-commerce customer reviews, and comments on social media. We introduce two mechanisms, the Generative Peer Prediction Mechanism (GPPM) and the Generative Synopsis Peer Prediction Mechanism (GSPPM). These mechanisms utilize LLMs as predictors, mapping from one agent's report to a prediction of her peer's report. Theoretically, we show that when the LLM prediction is sufficiently accurate, our mechanisms can incentivize high effort and truth-telling as an (approximate) Bayesian Nash equilibrium.
Empirically, we confirm the efficacy of our mechanisms through experiments conducted on two real datasets: the Yelp review dataset and the ICLR OpenReview dataset. On the ICLR dataset, our mechanisms can differentiate three quality levels (human-written reviews, GPT-4-generated reviews, and GPT-3.5-generated reviews) in terms of expected scores. Additionally, GSPPM penalizes LLM-generated reviews more effectively than GPPM.	Translated: 2024-09-04 18:11:10 Published: 2024-09-02
# Dissociation of Faithful and Unfaithful Reasoning in LLMs ( http://arxiv.org/abs/2405.15092v2 ) License: see link	Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen,	Large language models (LLMs) often improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. We investigate how LLMs recover from errors in Chain of Thought. Through analysis of error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, which occurs when models arrive at the correct answer despite invalid reasoning text. We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer. Critically, these factors have divergent effects on faithful and unfaithful recoveries. Our results indicate that there are distinct mechanisms driving faithful and unfaithful error recoveries. Selective targeting of these mechanisms may be able to drive down the rate of unfaithful reasoning and improve model interpretability.	Translated: 2024-09-04 18:11:10 Published: 2024-09-02
# Biometrics and Behavior Analysis for Detecting Distractions in e-Learning ( http://arxiv.org/abs/2405.15434v3 ) License: see link	Álvaro Becerra, Javier Irigoyen, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez, Mutlu Cukurova,	In this article, we explore computer vision approaches to detect abnormal head pose during e-learning sessions and we introduce a study on the effects of mobile phone usage during these sessions. We utilize behavioral data collected from 120 learners monitored while participating in MOOC learning sessions. Our study focuses on the influence of phone-usage events on behavior and physiological responses, specifically attention, heart rate, and meditation, before, during, and after phone usage. Additionally, we propose an approach for estimating head pose events using images taken by the webcam during the MOOC learning sessions to detect phone-usage events. Our hypothesis suggests that head posture undergoes significant changes when learners interact with a mobile phone, contrasting with the typical behavior seen when learners face a computer during e-learning sessions. We propose an approach designed to detect deviations in head posture from the average observed during a learner's session, operating as a semi-supervised method. This system flags events indicating alterations in head posture for subsequent human review and selection of mobile phone usage occurrences with a sensitivity over 90%.	Translated: 2024-09-04 18:11:10 Published: 2024-09-02
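The idea of flagging deviations in head posture from the session average can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a per-frame head pitch angle has already been estimated from the webcam, and it flags frames more than `k` standard deviations from the session mean. The function name, the threshold `k`, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev

def flag_head_pose_events(pitch_deg, k=2.0):
    """Return indices of frames whose pitch deviates from the session mean by more than k std."""
    mu = mean(pitch_deg)
    sigma = pstdev(pitch_deg)
    if sigma == 0:
        return []  # perfectly constant pose: nothing to flag
    return [i for i, p in enumerate(pitch_deg) if abs(p - mu) > k * sigma]

# Hypothetical per-frame head pitch in degrees: mostly facing the screen,
# with two frames looking down, as one might when checking a phone.
session_pitch = [0, 1, -1, 2, 0, -2, 1, 0, -35, -38, 1, 0, -1, 2, 0, 1]
events = flag_head_pose_events(session_pitch)
```

Flagged frames would then go to a human reviewer, in line with the review step the abstract describes.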
# HyperInterval: Hypernetwork approach to training weight interval regions in continual learning ( http://arxiv.org/abs/2405.15444v3 ) License: see link	Patryk Krukowski, Anna Bielawska, Kamil Książek, Paweł Wawrzyński, Paweł Batorski, Przemysław Spurek,	Recently, a new Continual Learning (CL) paradigm was presented to control catastrophic forgetting, called Interval Continual Learning (InterContiNet), which relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, making intervals difficult to manage. To address this issue, we introduce HyperInterval (the source code is available at https://github.com/gmum/HyperInterval), a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, preserving the response of the target network for the previous task embeddings.
Interval arithmetic works with a more manageable, lower-dimensional embedding space rather than directly preparing intervals in a high-dimensional weight space. Our model allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, the hypernetwork is used only for training and, finally, we can utilize one set of weights. HyperInterval obtains significantly better results than InterContiNet and gives SOTA results on several benchmarks.	Translated: 2024-09-04 18:11:10 Published: 2024-09-02
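As a rough illustration of how an interval in embedding space can be mapped to an interval of target weights, the sketch below propagates an axis-aligned box through a single linear layer using standard interval arithmetic. A real hypernetwork has several layers and nonlinearities; the layer sizes, the names `hyper_w` and `hyper_b`, and the numbers are hypothetical and not taken from the HyperInterval code.

```python
def interval_linear(weights, bias, lower, upper):
    """Propagate the box [lower, upper] through y = W x + b and return output bounds."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight maps the lower input bound to the lower
            # output bound; a negative weight swaps the roles of the bounds.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

# Hypothetical one-layer "hypernetwork": 2-d task embedding -> 2 target weights.
hyper_w = [[1.0, -2.0], [0.5, 0.5]]
hyper_b = [0.0, 1.0]
emb_lo, emb_hi = [-0.1, 0.2], [0.1, 0.4]  # interval embedding for one task
w_lo, w_hi = interval_linear(hyper_w, hyper_b, emb_lo, emb_hi)
```

Every embedding inside `[emb_lo, emb_hi]` yields target weights inside `[w_lo, w_hi]`, which is the containment property that interval-based guarantees rely on.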
# VAAD: Visual Attention Analysis Dashboard applied to e-Learning ( http://arxiv.org/abs/2405.20091v4 ) License: see link	Miriam Navarro, Álvaro Becerra, Roberto Daza, Ruth Cobos, Aythami Morales, Julian Fierrez,	In this paper, we present an approach in the Multimodal Learning Analytics field. Within this approach, we have developed a tool to visualize and analyze eye movement data collected during learning sessions in online courses. The tool is named VAAD, an acronym for Visual Attention Analysis Dashboard. These eye movement data have been gathered using an eye-tracker and subsequently processed and visualized for interpretation. The purpose of the tool is to conduct a descriptive analysis of the data by facilitating its visualization, enabling the identification of differences and learning patterns among various learner populations. Additionally, it integrates a predictive module capable of anticipating learner activities during a learning session. Consequently, VAAD holds the potential to offer valuable insights into online learning behaviors from both descriptive and predictive perspectives.	Translated: 2024-09-04 18:00:58 Published: 2024-09-02
# UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation ( http://arxiv.org/abs/2406.01154v3 ) License: see link	Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan,	Ultrasound is widely used in clinical practice due to its affordability, portability, and safety. However, current AI research often overlooks combined disease prediction and tissue segmentation. We propose UniUSNet, a universal framework for ultrasound image classification and segmentation. This model handles various ultrasound types, anatomical positions, and input formats, excelling in both segmentation and classification tasks. Trained on a comprehensive dataset with over 9.7K annotations from 7 distinct anatomical positions, our model matches state-of-the-art performance and surpasses single-dataset and ablated models. Zero-shot and fine-tuning experiments show strong generalization and adaptability with minimal fine-tuning. We plan to expand our dataset and refine the prompting mechanism, with model weights and code available at (https://github.com/Zehui-Lin/UniUSNet).	Translated: 2024-09-04 18:00:58 Published: 2024-09-02
# ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search ( http://arxiv.org/abs/2406.03816v2 ) License: see link	Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang,	Recent methodologies in LLM self-training mostly rely on the LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning). In this paper, we develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step value to train policy and reward models. ReST-MCTS* circumvents the per-step manual annotation typically used to train process rewards by tree-search-based reinforcement learning: given oracle final correct answers, ReST-MCTS* is able to infer the correct process rewards by estimating the probability that a given step can help lead to the correct answer. These inferred rewards serve dual purposes: they act as value targets for further refining the process reward model and also facilitate the selection of high-quality traces for policy model self-training.
We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget. We then show that by using traces searched by this tree-search policy as training data, we can continuously enhance the three language models for multiple iterations, and outperform other self-training algorithms such as ReST$^\text{EM}$ and Self-Rewarding LM.	Translated: 2024-09-04 18:00:58 Published: 2024-09-02
# Stabilizing Extreme Q-learning by Maclaurin Expansion ( http://arxiv.org/abs/2406.04896v2 ) License: see link	Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada,	In offline reinforcement learning, in-sample learning methods have been widely used to prevent performance degradation caused by evaluating out-of-distribution actions from the dataset. Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution, enabling it to model the soft optimal value function in an in-sample manner. It has demonstrated strong performance in both offline and online reinforcement learning settings. However, issues remain, such as the instability caused by the exponential term in the loss function and the risk of the error distribution deviating from the Gumbel distribution. Therefore, we propose Maclaurin Expanded Extreme Q-learning to enhance stability. In this method, applying Maclaurin expansion to the loss function in XQL enhances stability against large errors. This approach involves adjusting the modeled value function between the value function under the behavior policy and the soft optimal value function, thus achieving a trade-off between stability and optimality depending on the order of expansion. It also enables adjustment of the error distribution assumption from a normal distribution to a Gumbel distribution.
Our method significantly stabilizes learning in online RL tasks from DM Control, where XQL was previously unstable. Additionally, it improves performance in several offline RL tasks from D4RL.	Translated: 2024-09-04 18:00:58 Published: 2024-09-02
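The effect of truncating the exponential term can be seen in a small numerical sketch. It assumes the Gumbel-regression loss takes the form exp(z) - z - 1 in a scaled error z, whose Maclaurin series is the sum of z^k / k! for k >= 2. The exact loss, scaling, and truncation used in the paper may differ, and the function names here are hypothetical.

```python
import math

def gumbel_loss(z):
    """Assumed Gumbel-regression loss on a scaled error z: exp(z) - z - 1."""
    return math.exp(z) - z - 1.0

def maclaurin_loss(z, order):
    """Series of exp(z) - z - 1 truncated at `order`: sum_{k=2}^{order} z**k / k!."""
    return sum(z ** k / math.factorial(k) for k in range(2, order + 1))

# Order 2 reduces to z**2 / 2, i.e. a squared-error (normal-assumption) loss.
second_order = maclaurin_loss(3.0, 2)

# For a large error, the truncated loss grows polynomially, not exponentially.
large_error = 10.0
truncated = maclaurin_loss(large_error, 4)
full = gumbel_loss(large_error)

# A high-order truncation recovers the full loss for moderate errors.
gap = abs(maclaurin_loss(1.0, 20) - gumbel_loss(1.0))
```

Under these assumptions, the expansion order interpolates between a squared-error loss at order 2 and the full Gumbel loss as the order grows, which mirrors the stability and optimality trade-off described in the abstract.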
# Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification ( http://arxiv.org/abs/2406.05677v2 ) License: see link	Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou,	In the medical field, managing high-dimensional massive medical imaging data and performing reliable medical analysis from it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices. This necessitates effective dataset compression techniques to reduce storage, transmission, and computational cost. However, existing coreset selection methods are primarily designed for natural image datasets, and exhibit doubtful effectiveness when applied to medical image datasets due to challenges such as intra-class variation and inter-class similarity. In this paper, we propose a novel coreset selection strategy termed Evolution-aware VAriance (EVA), which captures the evolutionary process of model training through a dual-window approach and reflects the fluctuation of sample importance more precisely through variance measurement. Extensive experiments on medical image datasets demonstrate the effectiveness of our strategy over previous SOTA methods, especially at high compression rates. EVA achieves 98.27% accuracy with only 10% training data, compared to 97.20% for the full training set.
None of the compared baseline methods can exceed Random at 5% selection rate, while EVA outperforms Random by 5.61%, showcasing its potential for efficient medical image analysis.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
# Remote Implementation of Hidden or Partially Unknown Quantum Operators using Optimal Resources: A Generalized View ( http://arxiv.org/abs/2406.06223v2 ) License: see link	Satish Kumar, Kuldeep Gangwar, Anirban Pathak,	Two protocols are proposed for two closely linked but different variants of remote implementation of quantum operators of specific forms. The first protocol is designed for the remote implementation of the single qubit hidden quantum operator, whereas the second one is designed for the remote implementation of the partially unknown single qubit quantum operator. In both cases, a two-qubit maximally entangled state, entangled in the spatial degree of freedom, is used. The quantum resources used here are optimal and easy to realize and maintain in comparison to the multi-partite or multi-mode entangled states used in earlier works. The impact of photon loss due to interaction with the environment is analyzed for both schemes. The proposed protocols are also generalized to their controlled, bidirectional, cyclic, controlled cyclic, and controlled bidirectional versions, and it is shown that either a Bell state alone or products of Bell states will be sufficient to perform these tasks, with some additional classical communication in the controlled cases only.
This is in sharp contrast to the earlier proposals that require large entangled states. In addition, it is noted that remote implementation of hidden or partially unknown operators involving multiple controllers and/or multiple players who jointly apply the desired operator(s) would require quantum channels more complex than the Bell states and their products. Explicit forms of such quantum channels are also provided.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
# Photo-induced dynamics with continuous and discrete quantum baths ( http://arxiv.org/abs/2406.07047v3 ) License: see link	Zhaoxuan Xie, Mattia Moroder, Ulrich Schollwöck, Sebastian Paeckel,	The ultrafast quantum dynamics of photophysical processes in complex molecules is an extremely challenging computational problem with a wide variety of fascinating applications in quantum chemistry and biology. Inspired by recent developments in open quantum systems, we introduce a pure-state unraveled hybrid-bath method that describes a continuous environment via a set of discrete, effective bosonic degrees of freedom using a Markovian embedding. Our method is capable of describing both a continuous spectral density and sharp peaks embedded in it. Thereby, we overcome the limitations of previous methods, which either capture long-time memory effects using the unitary dynamics of a set of discrete vibrational modes or use memoryless Markovian environments employing a Lindblad or Redfield master equation. We benchmark our method against two paradigmatic problems from quantum chemistry and biology. We demonstrate that compared to unitary descriptions, a significantly smaller number of bosonic modes suffices to describe the excitonic dynamics accurately, yielding a computational speed-up of nearly an order of magnitude.
Furthermore, we take into account explicitly the effect of a $\delta$-peak in the spectral density of a light-harvesting complex, demonstrating the strong impact of the long-time memory of the environment on the dynamics.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
# Stable collective charging of ultracold atoms quantum batteries ( http://arxiv.org/abs/2406.07397v2 ) License: see link	Abel Rojo-Francàs, Felipe Isaule, Alan C. Santos, Bruno Juliá-Díaz, Nikolaj Thomas Zinner,	We propose a novel quantum battery realized with a few interacting particles in a three-well system with different on-site energies, which could be realized with ultracold atom platforms. We prepare the initial state in the lowest energy well and charge the battery using a Spatial Adiabatic Passage (SAP)-based protocol, enabling the population of a higher energy well. We examine the charging under varying interaction strengths and reveal that the consideration of collective charging results in an intriguing oscillatory behavior of the final charge for finite interactions, through diabatic evolution. Our findings open a new avenue for building stable and controllable quantum batteries.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
# MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation ( http://arxiv.org/abs/2406.07529v3 ) License: see link	Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio,	Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference.
Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
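Two ingredients mentioned in this abstract, merging with scaling coefficients and selecting a Pareto set, can be sketched in a few lines. The sketch assumes merging adds scaled task vectors (fine-tuned minus pre-trained weights) to the pre-trained weights, and that each candidate coefficient setting has already been scored on two tasks. The quadratic surrogate that MAP uses to avoid those evaluations is not reproduced here, and all names and numbers are hypothetical.

```python
def merge(theta_pre, task_vectors, coeffs):
    """Merged parameters: pre-trained weights plus scaled task vectors."""
    merged = list(theta_pre)
    for c, tv in zip(coeffs, task_vectors):
        merged = [m + c * t for m, t in zip(merged, tv)]
    return merged

def pareto_front(points):
    """Keep the points not dominated by any other (higher is better on every task)."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front

# Toy two-parameter model with two task vectors.
merged = merge([1.0, 1.0], [[1.0, 0.0], [0.0, 2.0]], coeffs=[0.5, 0.25])

# Hypothetical (task 1 accuracy, task 2 accuracy) for five coefficient settings.
scores = [(0.90, 0.60), (0.80, 0.80), (0.60, 0.90), (0.70, 0.70), (0.85, 0.60)]
front = pareto_front(scores)
```

Here the last two settings are dominated and drop out, leaving three trade-off points for a practitioner to choose from.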
# On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models ( http://arxiv.org/abs/2406.08486v2 ) License: see link	Hashmat Shadab Malik, Numan Saeed, Asif Hanif, Muzammal Naseer, Mohammad Yaqub, Salman Khan, Fahad Shahbaz Khan,	Volumetric medical segmentation models have achieved significant success on organ and tumor-based segmentation tasks in recent years. However, their vulnerability to adversarial attacks remains largely unexplored, raising serious concerns regarding the real-world deployment of tools employing such models in the healthcare sector. This underscores the importance of investigating the robustness of existing models. In this context, our work aims to empirically examine the adversarial robustness across current volumetric segmentation architectures, encompassing Convolutional, Transformer, and Mamba-based models. We extend this investigation across four volumetric segmentation datasets, evaluating robustness under both white-box and black-box adversarial attacks. Overall, we observe that while both pixel and frequency-based attacks perform reasonably well under the white-box setting, the latter performs significantly better under transfer-based black-box attacks.
Across our experiments, we observe that transformer-based models show higher robustness than convolution-based models, with Mamba-based models being the most vulnerable. Additionally, we show that large-scale training of volumetric segmentation models improves the model's robustness against adversarial attacks. The code and robust models are available at https://github.com/HashmatShadab/Robustness-of-Volumetric-Medical-Segmentation-Models.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
# CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models ( http://arxiv.org/abs/2406.10311v2 ) License: see link	Wenjing Zhang, Xuejiao Lei, Zhaoxiang Liu, Meijuan An, Bikun Yang, KaiKai Zhao, Kai Wang, Shiguo Lian,	With the profound development of large language models (LLMs), their safety concerns have garnered increasing attention. However, there is a scarcity of Chinese safety benchmarks for LLMs, and the existing safety taxonomies are inadequate, lacking comprehensive safety detection capabilities in authentic Chinese scenarios. In this work, we introduce CHiSafetyBench, a dedicated safety benchmark for evaluating LLMs' capabilities in identifying risky content and refusing to answer risky questions in Chinese contexts. CHiSafetyBench incorporates a dataset that covers a hierarchical Chinese safety taxonomy consisting of 5 risk areas and 31 categories. This dataset comprises two types of tasks: multiple-choice questions and question-answering, evaluating LLMs from the perspectives of risk content identification and the ability to refuse to answer risky questions, respectively. Utilizing this benchmark, we validate the feasibility of automatic evaluation as a substitute for human evaluation and conduct comprehensive automatic safety assessments on mainstream Chinese LLMs.
Our experiments reveal the varying performance of different models across various safety domains, indicating that all models possess considerable potential for improvement in Chinese safety capabilities. Our dataset is publicly available at https://github.com/UnicomAI/UnicomBenchmark/tree/main/CHiSafetyBench.	Translated: 2024-09-04 17:51:09 Published: 2024-09-02
# 厳密性と有用性のバランス: 多肢選択問題のための大規模言語モデルにおける認知バイアスの緩和 Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions ( http://arxiv.org/abs/2406.10999v2 ) ライセンス: Link先を確認	Liman Wang, Hanyang Zhong,	(参考訳) 本稿では,大規模言語モデル(LLM)の意思決定過程における認知バイアスの役割を考察し,すべてのバイアスを取り除くという従来の目標に挑戦する。ある種の認知バイアスは、適切にバランスをとれば、合理的な逸脱やヒューリスティックな近道を通じて意思決定効率を高めうることを示す。ヒューリスティックなモデレーションと、不確実な場合にLLMが回答を保留できる棄権オプションを導入することで、エラー率を減らし、意思決定精度を向上し、意思決定率を最適化する。専門家のコラボレーションを通じて開発されたBalance Rigor and Utility(BRU)データセットを用いて、認知バイアスを的を絞って検査することでLLMの決定が人間の推論とより緊密に一致し、信頼性が高まることを示し、今後の改善のための戦略を示唆する。このアプローチは、認知バイアスを活用する新しい方法を提供し、様々なアプリケーションにおけるLLMの実用性を改善する。 This paper examines the role of cognitive biases in the decision-making processes of large language models (LLMs), challenging the conventional goal of eliminating all biases. We show that certain cognitive biases, when properly balanced, can enhance decision-making efficiency through rational deviations and heuristic shortcuts. By introducing heuristic moderation and an abstention option, which allows LLMs to withhold responses when uncertain, we reduce error rates, improve decision accuracy, and optimize decision rates. Using the Balance Rigor and Utility (BRU) dataset, developed through expert collaboration, our findings demonstrate that targeted inspection of cognitive biases aligns LLM decisions more closely with human reasoning, enhancing reliability and suggesting strategies for future improvements. This approach offers a novel way to leverage cognitive biases to improve the practical utility of LLMs across various applications.	翻訳日:2024-09-04 17:51:09 公開日:2024-09-02
# DIDChain: 分散識別子とブロックチェーンによるサプライチェーンデータ管理の強化 DIDChain: Advancing Supply Chain Data Management with Decentralized Identifiers and Blockchain ( http://arxiv.org/abs/2406.11356v2 ) ライセンス: Link先を確認	Patrick Herbke, Sid Lamichhane, Kaustabh Barman, Sanjeet Raj Pandey, Axel Küpper, Andreas Abraham, Markus Sabadello,	(参考訳) サプライチェーンのデータ管理は、トレーサビリティ、透明性、信頼性の課題に直面している。これらの問題は、データサイロと通信障壁に起因する。ブロックチェーン技術を活用したフレームワークであるDIDChain、分散ID、InterPlanetary File Systemを紹介する。 DIDChainはサプライチェーンデータ管理を改善する。プライバシの懸念に対処するため、DIDChainでは、パブリックブロックチェーンの透明性とプライベートシステムのコントロールを組み合わせた、ハイブリッドブロックチェーンアーキテクチャを採用している。我々のハイブリッドアプローチはサプライチェーンイベントの信頼性と信頼性を保っている。また、サプライチェーンの参加者のデータのプライバシ要件も尊重している。 DIDChainの中心はcheqdインフラストラクチャである。チェクド・インフラストラクチャーは、乳生産の乳園からチーズ製造者へ移動する資産のような資産イベントのデジタルトレースを可能にする。この研究では、資産は原材料と製品である。 cheqdインフラストラクチャは、サプライチェーンデータ管理における資産のトレーサビリティと信頼性を保証する。ブロックチェーン対応サプライチェーンシステムへの私たちの貢献は、DIDChainの堅牢性を示しています。 DIDChainによるブロックチェーンテクノロジの統合は、データサイロと通信障壁に対するソリューションを提供する。 DIDChainでは,サプライチェーンのインフラを産業に転換する枠組みを提案する。 Supply chain data management faces challenges in traceability, transparency, and trust. These issues stem from data silos and communication barriers. This research introduces DIDChain, a framework leveraging blockchain technology, Decentralized Identifiers, and the InterPlanetary File System. DIDChain improves supply chain data management. To address privacy concerns, DIDChain employs a hybrid blockchain architecture that combines public blockchain transparency with the control of private systems. Our hybrid approach preserves the authenticity and reliability of supply chain events. It also respects the data privacy requirements of the participants in the supply chain. Central to DIDChain is the cheqd infrastructure. The cheqd infrastructure enables digital tracing of asset events, such as an asset moving from the milk-producing dairy farm to the cheese manufacturer. In this research, assets are raw materials and products. 
The cheqd infrastructure ensures the traceability and reliability of assets in the management of supply chain data. Our contribution to blockchain-enabled supply chain systems demonstrates the robustness of DIDChain. Integrating blockchain technology through DIDChain offers a solution to data silos and communication barriers. With DIDChain, we propose a framework to transform the supply chain infrastructure across industries.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# 分散型識別子と検証可能なクレデンシャルを用いた履歴書のライフサイクル管理 Lifecycle Management of Resumés with Decentralized Identifiers and Verifiable Credentials ( http://arxiv.org/abs/2406.11535v2 ) ライセンス: Link先を確認	Patrick Herbke, Anish Sapkota, Sid Lamichhane,	(参考訳) 応募書類への信頼は、迅速かつ効率的な採用プロセスに不可欠である。応募者は、雇用主が遅延や不正な情報のリスクを伴わずに信用できる検証可能なクレデンシャルを提示しなければならない。本稿では、分散アプリケーション、分散識別子、検証可能なクレデンシャルを活用することにより、信頼課題に対処する、デジタル履歴書クレデンシャルを管理するための信頼フレームワークを紹介する。我々は,仲介者なしで検証可能なクレデンシャルをリアルタイムに発行し,保存し,検証するためのフレームワークを提案する。欧州ブロックチェーンサービスインフラストラクチャを信頼アンカーとして統合する例を示す。さらに,検証時間を短縮し、採用や専門資格認定など,さまざまな分野にわたる信頼性の高いクレデンシャルエコシステムを育成する、合理化された応募プロセスを実証する。 Trust in applications is crucial for fast and efficient hiring processes. Applicants must present verifiable credentials that employers can trust without delays or the risk of fraudulent information. This paper introduces a trust framework for managing digital resumé credentials, addressing trust challenges by leveraging Decentralized Applications, Decentralized Identifiers, and Verifiable Credentials. We propose a framework for real-time issuance, storage, and verification of Verifiable Credentials without intermediaries. We showcase the integration of the European Blockchain Service Infrastructure as a trust anchor. Furthermore, we demonstrate a streamlined application process, reducing verification times and fostering a reliable credentialing ecosystem across various sectors, including recruitment and professional certification.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# リポジトリレベルコード生成における文脈の影響について On the Impacts of Contexts on Repository-Level Code Generation ( http://arxiv.org/abs/2406.11927v3 ) ライセンス: Link先を確認	Nam Le Hai, Dung Manh Nguyen, Nghi D. Q. Bui,	(参考訳) CodeLLMは、コード生成タスクに広く採用されているが、複雑なコンテキスト依存を伴うリポジトリレベルのコード生成を扱う能力は、まだ十分に探索されていない。私たちの研究は、実行可能で機能的に正しいコードを生成するためにリポジトリレベルのコンテキストを活用することの重要性を強調している。我々は、リポジトリレベルのコード生成を評価するために設計された新しいベンチマークであるRepoExecを提示する。RepoExecは、実行可能性、包括的なテストケース生成による機能的正確性、クロスファイルコンテキストの正確な利用の3つの重要な側面に焦点をあてる。本研究は,開発者が本質的なコード依存(コンテキスト)を指定する制御されたシナリオについて検討し,それらを効果的に統合することをモデルに求める。さらに,CodeLLMの依存関係を活用する能力を高める命令チューニングデータセットと,コンテキスト利用の定量化を目的とした新しいメトリックであるDependency Invocation Rate(DIR)を導入する。実験結果から,事前学習したLLMは正確性において優れた性能を示す一方で,命令調整モデルは文脈利用とデバッグ能力に優れていることが明らかになった。 RepoExecは、コード機能と開発者の意図との整合性を評価するための包括的な評価フレームワークを提供し、実世界のアプリケーションに向けたより信頼性の高いCodeLLMの開発を前進させる。データセットとソースコードは https://github.com/FSoft-AI4Code/RepoExec で入手できる。 CodeLLMs have gained widespread adoption for code generation tasks, yet their capacity to handle repository-level code generation with complex contextual dependencies remains underexplored. Our work underscores the critical importance of leveraging repository-level contexts to generate executable and functionally correct code. We present RepoExec, a novel benchmark designed to evaluate repository-level code generation, with a focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts. Our study examines a controlled scenario where developers specify essential code dependencies (contexts), challenging models to integrate them effectively. Additionally, we introduce an instruction-tuned dataset that enhances CodeLLMs' ability to leverage dependencies, along with a new metric, Dependency Invocation Rate (DIR), to quantify context utilization. 
Experimental results reveal that while pretrained LLMs demonstrate superior performance in terms of correctness, instruction-tuned models excel in context utilization and debugging capabilities. RepoExec offers a comprehensive evaluation framework for assessing code functionality and alignment with developer intent, thereby advancing the development of more reliable CodeLLMs for real-world applications. The dataset and source code are available at https://github.com/FSoft-AI4Code/RepoExec.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
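To make the idea of a context-utilization metric concrete, here is a minimal sketch of a dependency-invocation ratio. The abstract names the Dependency Invocation Rate (DIR) but does not give its formula, so the definition used below (the share of provided dependency names that appear as call targets in the generated code) is an assumption, and `dependency_invocation_rate`, `snippet`, and the dependency names are hypothetical.

```python
import ast

def dependency_invocation_rate(generated_code, dependencies):
    """Share of the given dependency names that appear as call targets in the code.

    Assumed definition for illustration only; the paper's exact DIR formula
    is not stated in the abstract.
    """
    called = set()
    for node in ast.walk(ast.parse(generated_code)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):         # plain call: helper(x)
                called.add(func.id)
            elif isinstance(func, ast.Attribute):  # method or module call: mod.helper(x)
                called.add(func.attr)
    if not dependencies:
        return 0.0
    return sum(1 for name in dependencies if name in called) / len(dependencies)

# Hypothetical generated function and hypothetical dependency names.
snippet = "def total(items):\n    return add_tax(sum_prices(items))\n"
rate = dependency_invocation_rate(snippet, ["sum_prices", "add_tax", "format_receipt"])
```

Here two of the three listed dependencies are called, so the ratio is two thirds. Matching on names alone ignores aliasing and imports, which a real evaluation would need to handle.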
# Chain-of-ThoughtがLLMの算術的推論を引き出す仕組みを説明する統一的なレンズとしてのニューロン活性化の検討 An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning of LLMs ( http://arxiv.org/abs/2406.12288v3 ) ライセンス: Link先を確認	Daking Rai, Ziyu Yao,	(参考訳) 大規模言語モデル(LLM)は、Chain-of-Thought(CoT)プロンプトを与えられたときに強い算術的推論能力を示している。しかし、CoTプロンプトがLLMによってどのように処理されるかについては限定的な理解しか得られていない。これを解明するため、先行研究は主に、CoTプロンプトの異なるコンポーネントをアブレーションし、その結果生じるLLM性能の変化を経験的に観察することに焦点を当ててきた。しかし、これらのコンポーネントがLLMの推論において重要である理由は探究されていない。このギャップを埋めるために、本稿では、先行研究による観察を統一的に説明するためのレンズとして「ニューロン活性化」について検討する。具体的には、Llama2を例として、LLMのフィードフォワード層内で算術的推論能力を活性化させた可能性のあるニューロンを調べる。この調査を容易にするため、算術的推論を示唆するニューロンをGPT-4に基づいて自動的に同定する手法も提案する。解析の結果、LLMのフィードフォワード層における推論ニューロンの活性化は、CoTプロンプトにおける様々な成分の重要性を説明でき、今後の研究は、より完全な理解のためにそれを拡張できることが判明した。 Large language models (LLMs) have shown strong arithmetic reasoning capabilities when prompted with Chain-of-Thought (CoT) prompts. However, we have only a limited understanding of how CoT prompts are processed by LLMs. To demystify this, prior work has primarily focused on ablating different components in the CoT prompt and empirically observing the resulting change in LLM performance. Yet, the reason why these components are important to LLM reasoning is not explored. To fill this gap, in this work, we investigate "neuron activation" as a lens to provide a unified explanation for observations made by prior work. Specifically, we look into neurons within the feed-forward layers of LLMs that may have activated their arithmetic reasoning capabilities, using Llama2 as an example. To facilitate this investigation, we also propose an approach based on GPT-4 to automatically identify neurons that imply arithmetic reasoning. Our analyses revealed that the activation of reasoning neurons in the feed-forward layers of an LLM can explain the importance of various components in a CoT prompt, and future research can extend it for a more complete understanding.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# 保健 LLM 研究における多様性の分析 : サイエントメトリック・パースペクティブ Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective ( http://arxiv.org/abs/2406.13152v2 ) ライセンス: Link先を確認	David Restrepo, Chenwei Wu, Constanza Vásquez-Venegas, João Matos, Jack Gallifant, Leo Anthony Celi, Danielle S. Bitterman, Luis Filipe Nakayama,	(参考訳) 医療における大規模言語モデル (LLMs) の展開は, 臨床意思決定, 管理効率, 患者の予後を向上する大きな可能性を示唆している。しかしながら、これらのモデルの開発と適用における多様なグループの過小評価はバイアスを持続させ、不平等な医療提供につながる可能性がある。本稿では、2021年1月1日から2024年7月1日までのデータを含む、医療のためのLSM研究の総合的な科学的分析について述べる。著者、国、資金源を含むPubMedおよびDimensionsのメタデータを分析することにより、LCM研究への貢献者の多様性を評価する。高所得国(HICs)の男性作家や貢献者を中心に,男女差や地理的格差が顕著であった。本稿では,学術誌の包括性を評価するため,Giniの多様性に基づく新しい雑誌の多様性指標を提案する。医療におけるLLMの適正な適用を確保するためには,より大きな表現の必要性を強調した。我々は、人工知能研究における多様性と傾きを高めるための実行可能な戦略を提案し、医療革新においてより包括的で公平な未来を育むという究極の目標を掲げる。 The deployment of large language models (LLMs) in healthcare has demonstrated substantial potential for enhancing clinical decision-making, administrative efficiency, and patient outcomes. However, the underrepresentation of diverse groups in the development and application of these models can perpetuate biases, leading to inequitable healthcare delivery. This paper presents a comprehensive scientometric analysis of LLM research for healthcare, including data from January 1, 2021, to July 1, 2024. By analyzing metadata from PubMed and Dimensions, including author affiliations, countries, and funding sources, we assess the diversity of contributors to LLM research. Our findings highlight significant gender and geographic disparities, with a predominance of male authors and contributions primarily from high-income countries (HICs). We introduce a novel journal diversity index based on Gini diversity to measure the inclusiveness of scientific publications. Our results underscore the necessity for greater representation in order to ensure the equitable application of LLMs in healthcare. 
We propose actionable strategies to enhance diversity and inclusivity in artificial intelligence research, with the ultimate goal of fostering a more inclusive and equitable future in healthcare innovation.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
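As a small worked example of a Gini-style diversity score, the sketch below computes the Gini-Simpson index, one minus the sum of squared category shares. The abstract says only that the journal diversity index is "based on Gini diversity", so this specific form is an assumption, and the `HIC`/`LMIC` labels are hypothetical sample data rather than figures from the study.

```python
from collections import Counter

def gini_diversity(labels):
    """Gini-Simpson index: 1 - sum(p_i^2) over the category shares p_i.

    Returns 0.0 when every label is identical and approaches 1.0 as labels
    spread evenly over many categories. Assumes a non-empty list.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Hypothetical author-country income groups for two journals.
balanced = gini_diversity(["HIC", "LMIC", "HIC", "LMIC"])  # even split
skewed = gini_diversity(["HIC"] * 9 + ["LMIC"])            # 90% from one group
```

The evenly split sample scores 0.5, the maximum for two categories, while the 90/10 sample scores about 0.18, which is the kind of contrast such an index is meant to surface.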
# 強化学習における時間差分化の慣用性 An Idiosyncrasy of Time-discretization in Reinforcement Learning ( http://arxiv.org/abs/2406.14951v2 ) ライセンス: Link先を確認	Kris De Asis, Richard S. Sutton,	(参考訳) 多くの強化学習アルゴリズムは、エージェントが固定的な時間ステップで環境と相互作用するという仮定に基づいて構築される。しかし、物理系は時間的に連続しており、デジタル的に制御するには時間差の粒度を選択する必要がある。さらに、このようなシステムは環境状態の進行に先立って決定が下されるのを待たず、離散化の選択が強化学習アルゴリズムにどのように影響するかを研究する必要がある。本研究では,連続時間と離散時間の関係について考察する。具体的には、離散時間アルゴリズムを離散化された連続時間環境に適用し、簡単な修正で戻り値の定義をよりよく整合させることができることに留意する。この観察は、時間差の粒度が選択される環境や、そのような粒度が本質的に確率的な環境を扱う場合の実践的考察である。 Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity when digitally controlling them. Furthermore, such systems do not wait for decisions to be made before advancing the environment state, necessitating the study of how the choice of discretization may affect a reinforcement learning algorithm. In this work, we consider the relationship between the definitions of the continuous-time and discrete-time returns. Specifically, we acknowledge an idiosyncrasy with naively applying a discrete-time algorithm to a discretized continuous-time environment, and note how a simple modification can better align the return definitions. This observation is of practical consideration when dealing with environments where time-discretization granularity is a choice, or situations where such granularity is inherently stochastic.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
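The mismatch between the two return definitions can be illustrated with a constant reward signal. The sketch below compares a naive discrete-time return with a step-size-scaled one against the closed-form continuous-time integral. Scaling each reward by `dt` and discounting by `gamma ** dt` per step are assumptions made here (a Riemann-sum approximation); the abstract does not spell out the paper's exact modification.

```python
import math

def continuous_return(reward, gamma, horizon):
    """Closed-form integral of gamma**t * reward over [0, horizon], constant reward."""
    return reward * (gamma ** horizon - 1.0) / math.log(gamma)

def naive_discrete_return(reward, gamma, horizon, dt):
    """Sum of discounted per-step rewards with no step-size scaling."""
    steps = round(horizon / dt)
    step_gamma = gamma ** dt  # discount applied per step of length dt
    return sum(step_gamma ** k * reward for k in range(steps))

def scaled_discrete_return(reward, gamma, horizon, dt):
    """Riemann-sum approximation: every reward is weighted by the step size dt."""
    return dt * naive_discrete_return(reward, gamma, horizon, dt)

target = continuous_return(1.0, 0.9, 10.0)
naive = naive_discrete_return(1.0, 0.9, 10.0, 0.01)
scaled = scaled_discrete_return(1.0, 0.9, 10.0, 0.01)
```

With these settings the scaled sum stays close to the integral, while the naive sum grows roughly in proportion to `1 / dt`, so its magnitude depends on the chosen granularity rather than on the underlying task.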
# 室温量子技術のための量子エミッタの近接場強結合と絡み合い Near-field Strong Coupling and Entanglement of Quantum Emitters for Room-temperature Quantum Technologies ( http://arxiv.org/abs/2406.15171v2 ) ライセンス: Link先を確認	Daniel D. A. Clarke, Ortwin Hess,	(参考訳) 近年、量子ナノフォトニクスはナノテクノロジーの豊富なネクサスにフォトニック量子情報処理を導入し、物理的コンパクト性、エネルギー効率、動作速度、温度堅牢性、スケーラビリティの観点から、現在の技術的限界を超えて量子技術の進歩を目覚ましい見込みを与えている。この観点からは、ナノプラズマ空洞の量子電磁力学がナノスケールの空間的および超高速の時間的状態まで量子技術を駆動し、それを周囲の温度まで上昇させるという、特に説得力のある可能性を明らかにする最近の研究をいくつか取り上げる。我々の視点は、量子プラズモンバイオセンシングの革新的な提案、超高速単一光子放出の推進、強いカップリング体制における近接場多粒子の絡み合いの実現、産業レベルのデバイスの使用に重点を置いている。我々は,超高速で室温の量子ナノテクノロジーにおいて,プラズモニックデバイスの特徴と機能がどのように現代の研究指令を形作っているかを強調した展望で結論付けた。 In recent years, quantum nanophotonics has forged a rich nexus of nanotechnology with photonic quantum information processing, offering remarkable prospects for advancing quantum technologies beyond their current technical limits in terms of physical compactness, energy efficiency, operation speed, temperature robustness and scalability. In this perspective, we highlight a number of recent studies that reveal the especially compelling potential of nanoplasmonic cavity quantum electrodynamics for driving quantum technologies down to nanoscale spatial and ultrafast temporal regimes, whilst elevating them to ambient temperatures. Our perspective encompasses innovative proposals for quantum plasmonic biosensing, driving ultrafast single-photon emission and achieving near-field multipartite entanglement in the strong coupling regime, with a notable emphasis on the use of industry-grade devices. We conclude with an outlook emphasizing how the bespoke characteristics and functionalities of plasmonic devices are shaping contemporary research directives in ultrafast and room-temperature quantum nanotechnologies.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# SEDMamba: ロボット支援手術における効率的なエラー検出のためのボトルネック機構と微細から粗い時間融合による選択的状態空間モデルの強化 SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery ( http://arxiv.org/abs/2406.15920v2 ) ライセンス: Link先を確認	Jialang Xu, Nazir Sirajudeen, Matthew Boal, Nader Francis, Danail Stoyanov, Evangelos Mazomenos,	(参考訳) 外科的エラーの自動検出は、ロボット支援手術を改善することができる。有望な進歩にもかかわらず、既存の手法は計算効率を保ちながら長期的な依存関係を確立するために、豊富な時間的コンテキストを捉えるという課題に直面している。本稿では,選択的状態空間モデル(SSM)を外科的エラー検出に組み込み、線形計算量での効率的な長系列モデリングを可能にするSEDMambaという新しい階層モデルを提案する。 SEDMambaは、長時間ビデオにおける外科的エラーの検出と時間的局所化のために、ボトルネック機構と微細から粗い時間的融合(FCTF)によって選択的SSMを強化する。ボトルネック機構は空間次元内の特徴を圧縮して復元し、計算複雑性を低減させる。 FCTFは、複数の拡張(dilated)1D畳み込み層を使用して、様々なスケール範囲にわたる時間情報を統合し、様々な長さのエラーに対応する。我々の研究は、実際の手術症例におけるエラー検出を支援するために、この種のものとしては初となる、フレームレベルのin-vivo外科的エラーデータセットにも貢献する。具体的には、オープンソースの根治的前立腺切除術データセット(SAR-RARP50)において、縫合作業中のエラーをアノテートするために、臨床的に検証された観察的臨床ヒューマン信頼性評価ツール(OCHRA)を用いる。実験の結果,SEDMambaは計算複雑性を大幅に低減しつつ、AUCで少なくとも1.82%、APで3.80%の性能向上を達成し、最先端の手法よりも優れていた。対応するエラーアノテーション、コード、モデルは https://github.com/wzjialang/SEDMamba でリリースされる。 Automated detection of surgical errors can improve robotic-assisted surgery. Despite promising progress, existing methods still face challenges in capturing rich temporal context to establish long-term dependencies while maintaining computational efficiency. In this paper, we propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection, facilitating efficient long sequence modelling with linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. The bottleneck mechanism compresses and restores features within their spatial dimension, thereby reducing computational complexity. 
FCTF utilizes multiple dilated 1D convolutional layers to merge temporal information across diverse scale ranges, accommodating errors of varying duration. Our work also contributes the first-of-its-kind, frame-level, in-vivo surgical error dataset to support error detection in real surgical cases. Specifically, we deploy the clinically validated observational clinical human reliability assessment tool (OCHRA) to annotate the errors during suturing tasks in an open-source radical prostatectomy dataset (SAR-RARP50). Experimental results demonstrate that our SEDMamba outperforms state-of-the-art methods with at least 1.82% AUC and 3.80% AP performance gains with significantly reduced computational complexity. The corresponding error annotations, code and models will be released at https://github.com/wzjialang/SEDMamba.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# AudioBench: オーディオ大言語モデルのためのユニバーサルベンチマーク AudioBench: A Universal Benchmark for Audio Large Language Models ( http://arxiv.org/abs/2406.16020v3 ) ライセンス: Link先を確認	Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen,	(参考訳) 我々はAudio Large Language Models(AudioLLMs)を評価するために設計されたユニバーサルベンチマークであるAudioBenchを紹介する。 8つの異なるタスクと26のデータセットを含み、そのうち7つは新しく提案されたデータセットである。この評価は、音声(speech)理解、音響シーン理解、および声(voice)の理解(パラ言語)の3つの主要な側面をターゲットにしている。最近の進歩にもかかわらず、オーディオ信号に条件付けされた命令追従能力に関するAudioLLMsの包括的なベンチマークが欠如している。 AudioBenchは、データセットと望ましい評価指標を設定することで、このギャップに対処する。さらに、5つの人気モデルの能力を評価し、すべてのタスクで一貫して優れた単一のモデルは存在しないことを発見した。我々は、AudioLLMsの研究見通しを概説し、我々のオープンソースの評価ツールキット、データ、およびリーダーボードが将来のモデル開発に堅牢なテストベッドを提供することを期待している。 We introduce AudioBench, a universal benchmark designed to evaluate Audio Large Language Models (AudioLLMs). It encompasses 8 distinct tasks and 26 datasets, 7 of which are newly proposed. The evaluation targets three main aspects: speech understanding, audio scene understanding, and voice understanding (paralinguistic). Despite recent advancements, a comprehensive benchmark for AudioLLMs on instruction-following capabilities conditioned on audio signals is still lacking. AudioBench addresses this gap by setting up datasets as well as desired evaluation metrics. We also evaluated the capabilities of five popular models and found that no single model excels consistently across all tasks. We outline the research outlook for AudioLLMs and anticipate that our open-sourced evaluation toolkit, data, and leaderboard will offer a robust testbed for future model developments.	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# FastMem: Promptの高速覚書化により,大規模言語モデルのコンテキスト認識性が向上 FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models ( http://arxiv.org/abs/2406.16069v2 ) ライセンス: Link先を確認	Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko,	(参考訳) 大きな言語モデル(LLM)は、一貫性のあるテキストを生成するのに優れているが、コンテキスト認識に苦しむことが多く、提供された情報に忠実に従わなければならないタスクにおいて不正確である。我々は,命令を微調整したLLMの文脈認識を高速な記憶により向上させる新しい手法であるFastMemを紹介する。 FastMemは、最後のFeed-Forward Network (FFN)モジュールのみを微調整することで、推論前のプロンプトの可能性を最大化する。このターゲットのアプローチは、過度に適合することなく効率的な最適化を保証し、モデルの理解能力を大幅に改善し、コンテキストを正確に追従する。本実験は, 読解理解, テキスト要約, 出力構造への順守において, かなりの効果を示した。例えば、FastMemはNQ-SWAPデータセット上のLlama 3-8B-Instの精度を59.1%から71.6%に改善し、Qwen 1.5-4B-Chatの出力構造失敗率を34.9%から25.5%に下げる。大規模な実験の結果は、さまざまなアプリケーションにおけるLLMの信頼性と精度を高める堅牢なソリューションを提供するFastMemの可能性を浮き彫りにしている。私たちのコードは、https://github.com/IAAR-Shanghai/FastMemで利用可能です。 Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before inference by fine-tuning only the last Feed-Forward Network (FFN) module. This targeted approach ensures efficient optimization without overfitting, significantly improving the model's ability to comprehend and accurately follow the context. Our experiments demonstrate substantial gains in reading comprehension, text summarization and adherence to output structures. For instance, FastMem improves the accuracy of Llama 3-8B-Inst on the NQ-SWAP dataset from 59.1% to 71.6%, and reduces the output structure failure rate of Qwen 1.5-4B-Chat from 34.9% to 25.5%. 
Extensive experimental results highlight FastMem's potential to offer a robust solution to enhance the reliability and accuracy of LLMs in various applications. Our code is available at: https://github.com/IAAR-Shanghai/FastMem	翻訳日:2024-09-04 17:41:09 公開日:2024-09-02
# 線形判別分析におけるミスデータの直接処理による分類精度と解釈性の向上 Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability ( http://arxiv.org/abs/2407.00710v2 ) ライセンス: Link先を確認	Tuan L. Vo, Uyen Dang, Thu Nguyen,	(参考訳) 人工知能(AI)モデルの採用が重要な現実世界の応用へと拡大するにつれて、これらのモデルの説明可能性が最も重要となる。線形判別分析(LDA)は、その解釈可能な性質から、クラス分布のモデル化と特徴の線形結合によるクラス分離の強化により、分類において一般的な選択肢である。しかし、現実世界のデータセットは不完全なデータに悩まされることが多く、分類精度とモデル解釈可能性の両方に重大な課題が生じる。本稿では、LDAを拡張して、計算不要な値でデータセットを扱えるように拡張した、重み付き欠落線形識別分析(WLDA)という、新しい頑健な分類手法を提案する。提案手法は,欠落したエントリをペナライズする重み行列を革新的に組み込んで,不完全データを直接パラメータ推定する。この手法はLDAの解釈可能性を保持するだけでなく、欠落したデータに悩まされるシナリオにおける分類性能を大幅に向上させる。我々はWLDAの特性を確立するために詳細な理論解析を行い、その説明可能性について徹底的に評価する。さまざまなデータセットにわたる実験結果は、WLDAが従来のメソッド、特にトレーニングとテスト両方のデータセットで欠落した値が一般的である困難な環境で、一貫してパフォーマンスを向上していることを示している。この進歩は、分類精度を改善し、不完全なデータに直面してモデルの透明性を維持するための重要なツールを提供する。 As the adoption of Artificial Intelligence (AI) models expands into critical real-world applications, ensuring the explainability of these models becomes paramount, particularly in sensitive fields such as medicine and finance. Linear Discriminant Analysis (LDA) remains a popular choice for classification due to its interpretable nature, derived from its capacity to model class distributions and enhance class separation through linear combinations of features. However, real-world datasets often suffer from incomplete data, posing substantial challenges for both classification accuracy and model interpretability. In this paper, we introduce a novel and robust classification method, termed Weighted missing Linear Discriminant Analysis (WLDA), which extends LDA to handle datasets with missing values without the need for imputation. Our approach innovatively incorporates a weight matrix that penalizes missing entries, thereby refining parameter estimation directly on incomplete data. 
This methodology not only preserves the interpretability of LDA but also significantly enhances classification performance in scenarios plagued by missing data. We conduct an in-depth theoretical analysis to establish the properties of WLDA and thoroughly evaluate its explainability. Experimental results across various datasets demonstrate that WLDA consistently outperforms traditional methods, especially in challenging environments where missing values are prevalent in both training and test datasets. This advancement provides a critical tool for improving classification accuracy and maintaining model transparency in the face of incomplete data.	翻訳日:2024-09-04 17:31:13 公開日:2024-09-02
# 自由二項決定図による量子状態生成 Quantum State Preparation via Free Binary Decision Diagram ( http://arxiv.org/abs/2407.01671v3 ) ライセンス: Link先を確認	Yu Tanaka, Hayata Yamasaki, Mio Murao,	(参考訳) 量子状態準備(QSP)は、量子状態の古典的な記述が与えられたときにその量子状態を作成する、量子計算の基本的なタスクである。$n$-qubit量子状態の古典的な記述は、一般に$\exp(O(n))$個のパラメータを持ちうるため、最悪の場合には本質的に非効率である。しかし、多くの実用的な場合には、適切なデータ構造を用いてこのような大規模データを圧縮して表現できる。例えば、ブール関数を簡潔に表現する、2つの終端ノードを持つ根付き有向非巡回グラフである自由二項決定図(FBDD)が利用できる。ここでは、量子状態の古典的な記述が重み付きエッジを持つFBDDによって与えられるとき、QSPのための量子アルゴリズムを構築し、この設定におけるQSPの空間計算量と時間計算量を分析する。 $\mathrm{exp}(O(n))$個ではなく$N=O(\mathrm{poly}(n))$個のノードを持つ重み付きFBDDで表現できる$n$-qubit状態の非自明な例を提供する。 $N$個のノードを持つ重み付きFBDDで表される任意の量子状態が、$N$個の補助量子ビットを用いて$O(N)$サイズの量子回路で作成できることを示し、他のBDDベースのQSPと比較してQSPに必要な回路サイズを指数関数的に改善する。また、$N=O(n^2)$個のノードを持つ重み付きFBDDと$O(n^2)$個の補助量子ビットで表現できるが、振幅増幅に基づくQSPでは効率的に生成できない$n$-qubit状態の別の例も提示する。これらの結果は、効率的なQSPの可能性を広げるためのツールとしてFBDDを使うためのテクニックを提供する。 Quantum state preparation (QSP) is a fundamental task in quantum computation to prepare a quantum state for a given classical description of the quantum state. The classical description of an $n$-qubit quantum state may have $\exp(O(n))$ parameters in general, which are inherently inefficient to deal with in the worst case; however, in many practical cases, we may be able to employ suitable data structures to represent such large-scale data in a compressed way, e.g., by using a free binary decision diagram (FBDD), a rooted directed acyclic graph with two terminal nodes to concisely represent a Boolean function. We here construct a quantum algorithm for QSP when the classical description of a quantum state is given by an FBDD with weighted edges, and analyze the space and time complexity of QSP in this setting. We provide a nontrivial example of an $n$-qubit state that can be represented by a weighted FBDD with $N=O(\mathrm{poly}(n))$ nodes rather than $\mathrm{exp}(O(n))$. 
We show that any quantum state represented by the weighted FBDD with $N$ nodes can be prepared by an $O(N)$-sized quantum circuit using $N$ ancillary qubits, exponentially improving the required circuit size for QSP compared to other BDD-based QSPs. We also provide another example of an $n$-qubit state that can be represented by a weighted FBDD with $N=O(n^2)$ nodes, and $O(n^2)$ ancillary qubits, but cannot be prepared efficiently by a QSP based on the amplitude amplification. These results provide techniques to employ FBDDs as a tool for broadening the possibility of efficient QSP.	翻訳日:2024-09-04 17:31:13 公開日:2024-09-02
# NeuFair: ドロップアウトによるニューラルネットワークのフェアネス修復 NeuFair: Neural Network Fairness Repair with Dropout ( http://arxiv.org/abs/2407.04268v3 ) ライセンス: Link先を確認	Vishnu Asutosh Dasu, Ashish Kumar, Saeid Tizpaz-Niari, Gang Tan,	(参考訳) 本稿では,ディープニューラルネットワーク(DNN)における後処理バイアス緩和としてのニューロンのドロップアウトについて検討する。神経駆動型ソフトウェアソリューションは、社会的に重要な領域において、重要な公正性に影響を及ぼす。ニューラルネットワークは、データから統計的パターンを見つけるのに非常に適しているが、過去のデータから既存のバイアスをエンコードして増幅することができる。既存のバイアス軽減アルゴリズムでは、入力データセットや学習アルゴリズムを変更する必要があることが多い。ランダムにニューロンを落とすことによるトレーニング中に過剰な適合を防げる一般的なドロップアウト手法は、事前訓練されたDNNの公平性を改善するための効果的な、より侵入的なアプローチである可能性があると仮定する。しかし、ドロップするニューロンの理想的な集合を見つけることは組合せ問題である。我々は,事前学習したDNNにおける不公平さをトレーニング後の推論におけるドロップアウトによって軽減する,後処理のランダム化アルゴリズムであるNeuFairを提案する。我々のランダム化検索は、モデルの実用性を維持しながら差別を最小限に抑える目的によって導かれる。ランダム化アルゴリズムの設計は, モデルの性能劣化を最小限に抑えつつ, 公平性(最大69%)を向上させるのに有効であり, 効率的であることを示す。本稿では,これらの現象を直感的に説明し,探索アルゴリズムの様々なハイパーパラメータが結果に与える影響を慎重に検討する。最後に、NeuFairと異なる最先端バイアス緩和器を経験的、概念的に比較する。 This paper investigates neuron dropout as a post-processing bias mitigation for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While neural networks are exceptionally good at finding statistical patterns from data, they may encode and amplify existing biases from the historical data. Existing bias mitigation algorithms often require modifying the input dataset or the learning algorithms. We posit that the prevalent dropout methods that prevent over-fitting during training by randomly dropping neurons may be an effective and less intrusive approach to improve the fairness of pre-trained DNNs. However, finding the ideal set of neurons to drop is a combinatorial problem. We propose NeuFair, a family of post-processing randomized algorithms that mitigate unfairness in pre-trained DNNs via dropouts during inference after training. Our randomized search is guided by an objective to minimize discrimination while maintaining the model's utility. 
We show that our design of randomized algorithms is effective and efficient in improving fairness (up to 69%) with minimal or no model performance degradation. We provide intuitive explanations of these phenomena and carefully examine the influence of various hyperparameters of search algorithms on the results. Finally, we empirically and conceptually compare NeuFair to different state-of-the-art bias mitigators.	翻訳日:2024-09-04 17:31:13 公開日:2024-09-02
# United We Stand: 参加型分散型マルチエージェント計画 United We Stand: Decentralized Multi-Agent Planning With Attrition ( http://arxiv.org/abs/2407.08254v2 ) ライセンス: Link先を確認	Nhat Nguyen, Duong Nguyen, Gianluca Rizzo, Hung Nguyen,	(参考訳) 分散計画は情報収集タスクのための協調型マルチエージェントシステムの鍵となる要素である。しかし、現実的な大規模デプロイメントシナリオではエージェント障害の頻度が高いにもかかわらず、現在のアプローチは、まったく収束しない、あるいはリソース(例えばエネルギー)の非常に非効率な利用によって、障害の存在下ではパフォーマンスが悪くなっている。本研究では,Attritable MCTS (A-MCTS) を提案する。これは、各エージェントの局所的な貢献の推定にグローバルな報酬関数を使うことと、協調のための後悔のマッチングに基づいている。異なるシナリオ下での現実的なデータハーベストング問題における有効性を評価する。 A-MCTSは高故障率でも効率よく適応できることを理論的および実験的に示す。その結果、頻繁な障害が存在する場合、我々のソリューションは、グローバルなユーティリティとスケーラビリティの観点から、最も優れた既存アプローチよりも大幅に改善されていることが示唆された。 Decentralized planning is a key element of cooperative multi-agent systems for information gathering tasks. However, despite the high frequency of agent failures in realistic large deployment scenarios, current approaches perform poorly in the presence of failures, by not converging at all, and/or by making very inefficient use of resources (e.g. energy). In this work, we propose Attritable MCTS (A-MCTS), a decentralized MCTS algorithm capable of timely and efficient adaptation to changes in the set of active agents. It is based on the use of a global reward function for the estimation of each agent's local contribution, and regret matching for coordination. We evaluate its effectiveness in realistic data-harvesting problems under different scenarios. We show both theoretically and experimentally that A-MCTS enables efficient adaptation even under high failure rates. Results suggest that, in the presence of frequent failures, our solution improves substantially over the best existing approaches in terms of global utility and scalability.	翻訳日:2024-09-04 17:21:21 公開日:2024-09-02
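The coordination rule named in the abstract, regret matching, can be sketched on its own. The code below shows only the generic rule (play each action with probability proportional to its positive cumulative regret); how A-MCTS couples it with tree search and the global reward function is not described in the abstract, so nothing here should be read as the authors' implementation. The utility values are hypothetical.

```python
def regret_matching_policy(cumulative_regret):
    """Play each action with probability proportional to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total == 0.0:
        # No action is regretted yet: fall back to the uniform policy.
        return [1.0 / len(positive)] * len(positive)
    return [p / total for p in positive]

def update_regret(cumulative_regret, utilities, played):
    """Add, for every action, how much better it would have done than the one played."""
    return [r + (u - utilities[played]) for r, u in zip(cumulative_regret, utilities)]

# Hypothetical utilities of three candidate actions for one agent.
regret = [0.0, 0.0, 0.0]
initial_policy = regret_matching_policy(regret)
regret = update_regret(regret, [1.0, 3.0, 2.0], played=0)
next_policy = regret_matching_policy(regret)
```

After one round in which the agent played the worst action, probability mass moves to the two better actions in proportion to how much each was regretted, here two thirds and one third.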
# マルチブランチ深部畳み込みネットワークとLSTM-CNNを用いた心臓音の分類 Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNN ( http://arxiv.org/abs/2407.10689v2 ) ライセンス: Link先を確認	Seyed Amir Latifi, Hassan Ghassemian, Maryam Imani,	(参考訳) 本稿では,クリニックにおける低コストシステムを用いて,より高精度かつ信頼性の高い心疾患の迅速かつ低コスト診断法を提案する。心臓疾患の自動診断の第一の限界は、正確で許容できるラベル付き標本の希少性であり、準備に費用がかかる。この問題に対処するため,本研究では2つの手法を提案する。最初の方法は、人間の聴覚処理にインスパイアされた独自のマルチブランチディープ畳み込みニューラルネットワーク(MBDCN)アーキテクチャで、様々なサイズの畳み込みフィルタと音声信号パワースペクトルを入力として利用することによって特徴抽出を最適化するように設計されている。第二の手法はLong Short-term memory-Convolutional Neural (LSCN)モデルと呼ばれ、ネットワークアーキテクチャにはLong Short-Term Memory (LSTM)ネットワークブロックが含まれており、時間領域における特徴抽出を改善する。 LSTMブロックとともに一次元畳み込み層からなる複数の並列分岐を結合するという革新的なアプローチは、音声信号処理タスクにおいて優れた結果を得るのに役立つ。実験により,提案手法が最先端技術よりも優れていることが示された。 LSCNネットワークによる心臓音の総合的分類精度は96%以上である。このネットワークの効率は、Mel Frequency Cepstral Coefficients (MFCC) やウェーブレット変換のような一般的な特徴抽出法と比較すると顕著である。そこで本手法は,心臓音の自動解析において有望な結果を示し,心血管疾患の診断と早期発見に有効である可能性が示唆された。 This paper presents a fast and cost-effective method for diagnosing cardiac abnormalities with high accuracy and reliability using low-cost systems in clinics. The primary limitation of automatic diagnosis of cardiac diseases is the scarcity of correct and acceptable labeled samples, which can be expensive to prepare. To address this issue, two methods are proposed in this work. The first method is a unique Multi-Branch Deep Convolutional Neural Network (MBDCN) architecture inspired by human auditory processing, specifically designed to optimize feature extraction by employing various sizes of convolutional filters and audio signal power spectrum as input. In the second method, called the Long Short-Term Memory-Convolutional Neural (LSCN) model, the network architecture additionally includes Long Short-Term Memory (LSTM) network blocks to improve feature extraction in the time domain. 
The innovative approach of combining multiple parallel branches of one-dimensional convolutional layers with LSTM blocks helps achieve superior results in audio signal processing tasks. The experimental results demonstrate the superiority of the proposed methods over state-of-the-art techniques. The overall classification accuracy of heart sounds with the LSCN network is more than 96%. The efficiency of this network is significant compared to common feature extraction methods such as Mel Frequency Cepstral Coefficients (MFCC) and wavelet transform. Therefore, the proposed method shows promising results in the automatic analysis of heart sounds and has potential applications in the diagnosis and early detection of cardiovascular diseases.	翻訳日:2024-09-04 17:21:21 公開日:2024-09-02
# 過平滑化理論の簡易化 Simplifying the Theory on Over-Smoothing ( http://arxiv.org/abs/2407.11876v2 ) ライセンス: Link先を確認	Andreas Roth,	(参考訳) グラフの畳み込みは、不規則な幾何学構造を持つデータに対して効率的に操作できる能力によって人気を集めている。しかし、グラフの畳み込みは過剰な平滑化を引き起こす。しかし、現在では多くの異なる定義や直観が共存しており、相容れない方向に焦点を当てた研究に繋がる。本稿では,過平滑化がパワーイテレーションの特別な場合であることを示すことによって,これらの方向を整列させようとする。これにより、オーバー・スムーシングに関する既存の理論が大幅に単純化され、よりアクセスしやすくなる。この理論に基づいて、オーバースムーシングの一般化形式としてのランク崩壊の包括的定義を提供し、対応する計量としてランクワン距離を導入する。 14の手法を実証的に評価したところ,これまでに知られていたよりも多くのモデルがこの問題に悩まされていることがわかった。 Graph convolutions have gained popularity due to their ability to efficiently operate on data with an irregular geometric structure. However, graph convolutions cause over-smoothing, which refers to representations becoming more similar with increased depth. However, many different definitions and intuitions currently coexist, leading to research efforts focusing on incompatible directions. This paper attempts to align these directions by showing that over-smoothing is merely a special case of power iteration. This greatly simplifies the existing theory on over-smoothing, making it more accessible. Based on the theory, we provide a novel comprehensive definition of rank collapse as a generalized form of over-smoothing and introduce the rank-one distance as a corresponding metric. Our empirical evaluation of 14 commonly used methods shows that more models than were previously known suffer from this issue.	翻訳日:2024-09-04 17:21:21 公開日:2024-09-02
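The claim that over-smoothing is a special case of power iteration can be illustrated with a small sketch. The Python code below assumes a plain mean-aggregation operator (row-normalised adjacency with self-loops) applied repeatedly, with no learned weights or nonlinearities. That simplification, the toy graph, and all function names are assumptions of this example, not the paper's general setting.

```python
def normalized_adjacency(edges, n):
    """Row-normalised adjacency matrix with self-loops (a simple mean aggregator)."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def matmul(a, b):
    """Dense matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_spread(x):
    """Largest gap between any two node representations, taken per feature."""
    return max(max(col) - min(col) for col in zip(*x))


def smooth(edges, features, depth):
    """Apply the aggregation operator `depth` times, as power iteration would."""
    a = normalized_adjacency(edges, len(features))
    x = features
    for _ in range(depth):
        x = matmul(a, x)
    return x
```

On a connected graph the spread between node representations shrinks toward zero as depth grows, so the feature matrix approaches rank one. This is the behaviour the rank-one distance mentioned in the abstract is meant to quantify.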
# 臨床データを用いたICU脳卒中患者の死亡予測の高度な予測モデル Advanced Predictive Modeling for Enhanced Mortality Prediction in ICU Stroke Patients Using Clinical Data ( http://arxiv.org/abs/2407.14211v2 ) ライセンス: Link先を確認	Armin Abdollahi, Negin Ashrafi, Maryam Pishgar,	(参考訳) 背景:ストロークは成人の障害と死亡の第二の要因である。毎年1700万人が脳卒中を患っており、約85%が虚血性脳卒中である。集中治療室(ICU)における虚血性脳卒中患者の死亡予測は、治療戦略の最適化、資源配分、生存率の向上に不可欠である。方法:MIMIC-IVデータベースからICU虚血性脳卒中患者の診断,バイタルサイン,臨床検査,治療,治療,臨床ノートなどのデータを得た。ストローク患者は無作為にトレーニング (70%, n=2441), テスト (15%, n=523), 検証 (15%, n=523) に分けた。データ不均衡に対処するために、SMOTE(Synthetic Minority Over-Sampling Technique)を適用した。モデル開発のために30の特徴を選定し,最も優れた研究で使用される1095から特徴数を著しく減らした。我々は、死亡リスクを評価するためのディープラーニングモデルを開発し、比較のためにいくつかのベースライン機械学習モデルを実装した。結果: 特徴選択と深層学習にXGBoostを併用したXGB-DLモデルにより, 偽陽性を効果的に最小化した。 Model の AUROC は初日 0.865 (95% CI: 0.821 - 0.905) から 4 日目に 0.903 (95% CI: 0.868 - 0.936) に改善され、MIMIC-IV データベースで 0.945 AUROC (95% CI: 0.944 - 0.947) を使用した3,646 ICU 死亡患者のデータを使用した。他のMLモデルもAUROCの観点からは良好に動作したが、より具体的な点からDeep Learningを選択した。結論: 改良された特徴選択とデータクリーニングにより, 既存モデルと比較して13%のAUROCの改善が得られたが, 以前の研究では1095から30に減少した。 Background: Stroke is second-leading cause of disability and death among adults. Approximately 17 million people suffer from a stroke annually, with about 85% being ischemic strokes. Predicting mortality of ischemic stroke patients in intensive care unit (ICU) is crucial for optimizing treatment strategies, allocating resources, and improving survival rates. Methods: We acquired data on ICU ischemic stroke patients from MIMIC-IV database, including diagnoses, vital signs, laboratory tests, medications, procedures, treatments, and clinical notes. Stroke patients were randomly divided into training (70%, n=2441), test (15%, n=523), and validation (15%, n=523) sets. To address data imbalances, we applied Synthetic Minority Over-sampling Technique (SMOTE). We selected 30 features for model development, significantly reducing feature number from 1095 used in the best study. 
We developed a deep learning model to assess mortality risk and implemented several baseline machine learning models for comparison. Results: The XGB-DL model, which combines XGBoost for feature selection with deep learning, effectively minimized false positives. The model's AUROC improved from 0.865 (95% CI: 0.821 - 0.905) on the first day to 0.903 (95% CI: 0.868 - 0.936) by the fourth day, using data from 3,646 ICU mortality patients in the MIMIC-IV database, with an AUROC of 0.945 (95% CI: 0.944 - 0.947) during training. Although other ML models also performed well in terms of AUROC, we chose deep learning for its higher specificity. Conclusions: Through enhanced feature selection and data cleaning, the proposed model demonstrates a 13% AUROC improvement compared to existing models while reducing the number of features from 1095 in previous studies to 30.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
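The abstract above applies SMOTE to address class imbalance. The Python sketch below shows the core interpolation idea only: a synthetic minority sample is placed on the segment between a minority sample and one of its nearest minority neighbours. The function name, parameters, and toy data are assumptions for illustration; the study's actual preprocessing is not described in the abstract.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic samples by interpolating between minority points.

    `minority` is a list of feature vectors; `k` is the number of nearest
    neighbours considered (both are illustrative parameters).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the test or validation sets.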
# ハウスホルダー変換を用いたフラグメント量子埋め込み-アンサンブルに基づく多状態拡張 Fragment quantum embedding using the Householder transformation: a multi-state extension based on ensembles ( http://arxiv.org/abs/2407.14278v2 ) ライセンス: Link先を確認	Filip Cernatic, Emmanuel Fromager, Saad Yalouz,	(参考訳) Yalouz et al (J. Chem. Phys. 157, 214112, 2022) と Sekaran et al (Phys. Rev. B 104, 035121, 2021; Computation 10, 45, 2022) による最近の研究では、拡張系にフラグメントを埋め込む新しいツールとしてハウスホルダー変換を用いて密度行列埋め込み理論 (DMET) が再定式化されている。この変換は参照となる非相互作用1電子縮約密度行列に適用され、その後の基底状態計算に不可欠なフラグメントのバス軌道を構築する。本研究は, これまでの展開を発展させ, ハウスホルダー変換の利用を, 基底状態と励起状態を含む複数の電子状態の記述へと拡張するものである。アンサンブル非相互作用密度行列に基づいて, 連続したハウスホルダー変換によって厳密なフラグメント埋め込みが実現可能であり, その結果バス軌道の集合が大きくなることを示す。追加のバス軌道の数は, 参照アンサンブル密度行列において分数占有された自然軌道の数に比例することを解析的に証明する。また、通常のDMETのバス構築との関連も示す。次に、このアンサンブル埋め込みツールをシングルショットDMET計算に用いて、ハバード格子モデルとab initio水素系において、基底状態と第1励起状態の両方を記述する。最後に,自己無撞着性によるアンサンブル埋め込みの改良と今後の展望について考察する。 In recent works by Yalouz et al. (J. Chem. Phys. 157, 214112, 2022) and Sekaran et al. (Phys. Rev. B 104, 035121, 2021; Computation 10, 45, 2022), Density Matrix Embedding Theory (DMET) has been reformulated through the use of the Householder transformation as a novel tool to embed a fragment within extended systems. The transformation was applied to a reference non-interacting one-electron reduced density matrix to construct fragments' bath orbitals, which are crucial for subsequent ground state calculations. In the present work, we expand upon these previous developments and extend the utilization of the Householder transformation to the description of multiple electronic states, including ground and excited states. Based on an ensemble noninteracting density matrix, we demonstrate the feasibility of achieving exact fragment embedding through successive Householder transformations, resulting in a larger set of bath orbitals. We analytically prove that the number of additional bath orbitals scales directly with the number of fractionally occupied natural orbitals in the reference ensemble density matrix.
A connection with the regular DMET bath construction is also made. Then, we illustrate the use of this ensemble embedding tool in single-shot DMET calculations to describe both ground and first excited states in a Hubbard lattice model and an ab initio hydrogen system. Lastly, we discuss avenues for enhancing ensemble embedding through self-consistency and explore potential future directions.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
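For readers unfamiliar with the Householder transformation used above, the following Python sketch shows the generic reflection P = I - 2 v v^T / (v^T v) that maps a nonzero vector onto a multiple of the first basis vector. It illustrates the linear-algebra building block only; how the paper applies it to density-matrix columns to build bath orbitals is not reproduced here, and the function names are assumptions.

```python
import math


def householder_vector(x):
    """Return v such that (I - 2 v v^T / v^T v) x is a multiple of e_1.

    Assumes x is a nonzero vector.
    """
    norm = math.sqrt(sum(c * c for c in x))
    sign = 1.0 if x[0] >= 0 else -1.0
    v = list(x)
    v[0] += sign * norm  # this sign choice avoids numerical cancellation
    return v


def reflect(v, x):
    """Apply the Householder reflection defined by v to the vector x."""
    vv = sum(c * c for c in v)
    vx = sum(a * b for a, b in zip(v, x))
    return [xi - 2.0 * vx / vv * vi for xi, vi in zip(x, v)]
```

For example, reflecting x = [3, 4, 0] with its own Householder vector yields [-5, 0, 0]: the norm is preserved and every component after the first is zeroed.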
# PD-APE:3次元視覚グラウンドのための適応位置符号化を用いた並列デコーディングフレームワーク PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding ( http://arxiv.org/abs/2407.14491v2 ) ライセンス: Link先を確認	Chenshu Hou, Liang Peng, Xiaopei Wu, Xiaofei He, Wenxiao Wang,	(参考訳) 3Dビジュアルグラウンドは、特定の自然言語記述にマッチする3Dポイントクラウドシーン内のオブジェクトを特定することを目的としている。これは、モデルが対象のオブジェクト自体にフォーカスするだけでなく、その記述が満たされているかどうかを判断するために周囲の環境も考慮する必要がある。これまでのほとんどの研究は、同じモジュール内で両方のタスクを達成しようとするが、容易に注意をそらすことになる。この目的のために、ターゲットオブジェクト属性と周辺レイアウトを別々にデコードするデュアルブランチデコーディングフレームワークPD-APEを提案する。具体的には、ターゲットオブジェクトブランチでは、デコーダがターゲットオブジェクトの特徴(例、カテゴリ、色)を記述するテキストトークンを処理し、ターゲットオブジェクト自体に注意を払うようにクエリを誘導する。周辺ブランチでは、クエリは周囲の環境情報を保持する他のテキストトークンと一致し、アテンションマップはテキストに記述されたレイアウトを正確にキャプチャする。提案されたデュアルブランチ設計に適合し、クエリは各ブランチの特定の目的に関連する点に集中することができる。さらに,各分岐に対して適応的な位置符号化法を設計する。対象のオブジェクトブランチでは、位置エンコーディングはシードポイントと予測された3Dボックスの間の相対的な位置に依存する。周辺ブランチでは、アテンションマップは視覚的特徴とテキスト的特徴の信頼性によってガイドされ、クエリは貴重なレイアウト情報を持つポイントに集中することができる。 ScanReferとNr3Dという、広く採用されている2つの3Dビジュアルグラウンドデータセットで、私たちは最先端の技術を超越していることが、大規模な実験で示されています。 3D visual grounding aims to identify objects in 3D point cloud scenes that match specific natural language descriptions. This requires the model to not only focus on the target object itself but also to consider the surrounding environment to determine whether the descriptions are met. Most previous works attempt to accomplish both tasks within the same module, which can easily lead to a distraction of attention. To this end, we propose PD-APE, a dual-branch decoding framework that separately decodes target object attributes and surrounding layouts. Specifically, in the target object branch, the decoder processes text tokens that describe features of the target object (e.g., category and color), guiding the queries to pay attention to the target object itself. In the surrounding branch, the queries align with other text tokens that carry surrounding environment information, making the attention maps accurately capture the layout described in the text. 
Benefiting from the proposed dual-branch design, the queries are allowed to focus on points relevant to each branch's specific objective. Moreover, we design an adaptive position encoding method for each branch respectively. In the target object branch, the position encoding relies on the relative positions between seed points and predicted 3D boxes. In the surrounding branch, the attention map is additionally guided by the confidence between visual and text features, enabling the queries to focus on points that have valuable layout information. Extensive experiments demonstrate that we surpass the state-of-the-art on two widely adopted 3D visual grounding datasets, ScanRefer and Nr3D.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
# OriGen: Code-to-Code AugmentationとセルフリフレクションによるRTLコード生成の強化 OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection ( http://arxiv.org/abs/2407.16237v2 ) ライセンス: Link先を確認	Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, Yun Liang,	(参考訳) 近年,GPT-4やClaude3-Opusといった商業モデルで顕著な進歩が見られるように,レジスタ転送レベル(RTL)コードの生成において,LLM(Large Language Models)が有意な可能性を実証している。しかしながら、これらのプロプライエタリなLLMは、プライバシとセキュリティに関する懸念を引き起こすことが多い。オープンソースのLLMはこれらの問題に対する解決策を提供するが、RTLコード生成タスクでは、主に高品質のオープンソースRTLデータセットが不足しているため、商用モデルよりもパフォーマンスが低いのが一般的である。この課題に対処するために,自己反省機能を組み込んだオープンソースフレームワークであるOriGenと,高品質で大規模なRTLコードを生成するための新たなデータセット拡張手法を紹介する。提案手法では,オープンソースのRTLコードデータセットの品質向上のために,コード-コード拡張手法を用いている。さらに、OriGenは、コンパイラフィードバックを活用するセルフリフレクションプロセスを通じて、構文エラーを修正できる。実験の結果、OriGenはRTLコード生成において、他のオープンソース代替よりも大幅に優れていることが示された。 VerilogEval-Humanベンチマークのpass@1メトリックでは、これまで最高性能だったオープンソースLLMを12.8%上回り、GPT-4 Turboをも上回っている。さらに、OriGenは自己反省と誤り訂正の優れた能力を示し、自己反省能力を評価するために設計されたベンチマークでGPT-4を19.9%上回っている。 Recent studies have demonstrated the significant potential of Large Language Models (LLMs) in generating Register Transfer Level (RTL) code, with notable advancements showcased by commercial models such as GPT-4 and Claude3-Opus. However, these proprietary LLMs often raise concerns regarding privacy and security. While open-source LLMs offer solutions to these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets. To address this challenge, we introduce OriGen, a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology for generating high-quality, large-scale RTL code. Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets. Furthermore, OriGen can rectify syntactic errors through a self-reflection process that leverages compiler feedback.
Experimental results demonstrate that OriGen significantly outperforms other open-source alternatives in RTL code generation. It surpasses the previous best-performing open-source LLM by 12.8% and even exceeds GPT-4 Turbo in the pass@1 metric on the VerilogEval-Human benchmark. Moreover, OriGen exhibits superior capabilities in self-reflection and error correction, outperforming GPT-4 by 19.9% on a benchmark designed to evaluate self-reflection capabilities.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
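The pass@1 figure quoted above belongs to the pass@k family of metrics. A minimal Python sketch of the commonly used unbiased pass@k estimator follows, assuming n generated samples per problem of which c pass the tests. The abstract does not state how many samples were drawn, so the numbers in the example are purely illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased estimate of the chance that at least one of k samples passes.

    n: total samples generated for a problem
    c: number of those samples that pass the tests
    k: sample budget being evaluated (k <= n)
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 the estimator reduces to c / n, the fraction of passing samples; for instance, 3 passing samples out of 10 gives a pass@1 of about 0.3. The benchmark score is the mean of this value over all problems.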
# マルチパーティ量子システムにおける量子相互情報のファミリー Family of Quantum Mutual Information in Multiparty Quantum Systems ( http://arxiv.org/abs/2407.16365v3 ) ライセンス: Link先を確認	Asutosh Kumar,	(参考訳) マルチパーティシステム内の情報のキャラクタリゼーションは、重要かつ複雑である。本稿では、一般化された条件付き相互情報の概念と、多人数の量子的相互情報測定のファミリについて述べる。我々は、これらの概念の性質を解釈し、説明し、また未解決の問題を指摘する。一般化された条件付き相互情報は、多パーティ量子システムの様々なコンポーネント間の相互依存性と相関をカプセル化するのに役立つ。さらに、多党間の量子相互情報の様々な定式化は、古典的、量子的、および全体的相関のより深い理解に寄与する。これらの洞察は、量子情報理論の分野における基礎研究を促進する可能性がある。 The characterization of information within a multiparty system is both significant and complex. This paper presents the concept of generalized conditional mutual information, along with a family of multiparty quantum mutual information measures. We provide interpretations and delineate the properties of these concepts, while also pointing out certain unresolved issues. The generalized conditional mutual information serves to encapsulate the interdependencies and correlations among various components of a multiparty quantum system. Additionally, various formulations of multiparty quantum mutual information contribute to a deeper comprehension of classical, quantum, and total correlations. These insights have the potential to propel fundamental research in the field of quantum information theory.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
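As background for the entry above, the standard two-party quantities that multiparty measures build on can be written as follows. These are the textbook definitions; the generalized conditional mutual information and the multiparty family introduced in the paper are not given in the abstract and are not reproduced here.

```latex
% von Neumann entropy of a state \rho
S(\rho) = -\operatorname{Tr}\left(\rho \log \rho\right)

% quantum mutual information of a bipartite state \rho_{AB}
I(A:B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB})

% quantum conditional mutual information of a tripartite state \rho_{ABC}
I(A:B\,|\,C) = S(\rho_{AC}) + S(\rho_{BC}) - S(\rho_{ABC}) - S(\rho_C)
```

For a pure bipartite state, S(\rho_{AB}) = 0 and S(\rho_A) = S(\rho_B), so I(A:B) = 2 S(\rho_A).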
# 難解な数学質問のAIによる生成 AI-Assisted Generation of Difficult Math Questions ( http://arxiv.org/abs/2407.21009v2 ) ライセンス: Link先を確認	Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal,	(参考訳) 現在のLLMトレーニングは、数学的推論をコア能力として位置づけている。公開されているソースが完全にタップされているため、多様で挑戦的な数学問題に対する需要は計り知れない。人間の専門家だけを頼りにすることは時間も費用もかかるが、LSMが生み出す質問には必要な多様性と難易度が欠けていることが多い。本稿では,LLMの強みとループ型アプローチを組み合わせることで,多種多様な難解な数学問題を生成する設計枠組みを提案する。我々は,LLMのメタ認知能力(Didolkar et al , 2024)を活用し,既存の数学データセットからコア"スキル"を抽出する。これらのスキルは、ランダムなコアスキルのペアでLLMに促すことによって、新しくて難しい質問を生成する基盤となる。各質問における2つの異なるスキルの使用により、そのような質問を見つけることは、LLMと人間の両方にとって「配布外」タスクとなる。私たちのパイプラインでは、マルチターンプロンプトを通じて質問やソリューションを反復的に生成し、洗練するためにLLMを採用しています。人間のアノテータは質問を検証し、さらに洗練し、その効率はさらなるLSM相互作用によって向上する。このパイプラインをMATHデータセット(Hendrycks et al , 2021)から抽出したスキルに適用することにより,MATH$^2$ – 高品質な数学質問のデータセットが得られた。 (a)MATH$^2$における全てのモデルのMATHよりも低い性能 (b)MATH$^2$の質問をコンテキスト内例として使用する場合,MATH上でのパフォーマンスが向上する。数学に重点を置いているが、我々の方法論は構造化推論を必要とする他の領域に適用できるようであり、スケーラブルな監視のコンポーネントとして考えられる。 MATH$^2$における成功率はMATHの正方形であり、MATH$^2$における問題の解決には2つの異なる数学スキルの非自明な組み合わせが必要であることを示唆している。 Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. 
The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multi-turn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline to skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH$^2$, a dataset of higher-quality math questions, as evidenced by: (a) lower performance of all models on MATH$^2$ than on MATH; (b) higher performance on MATH when using MATH$^2$ questions as in-context examples. Although focused on mathematics, our methodology seems applicable to other domains requiring structured reasoning, and potentially as a component of scalable oversight. Also of interest is a striking relationship observed between models' performance on MATH and on the new dataset: the success rate on MATH$^2$ is the square of the success rate on MATH, suggesting that successfully solving a question in MATH$^2$ requires a nontrivial combination of two distinct math skills.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
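The squared relationship reported above has a simple reading, sketched here under the assumption (ours, for illustration) that the two skills in a question are exercised independently. The symbols and the numeric value are hypothetical and not taken from the paper.

```latex
% p: a model's success rate on MATH (roughly, on a single-skill question)
% A MATH^2 question needs two distinct skills to succeed together:
\Pr[\text{solve a MATH}^2 \text{ question}] \approx p \cdot p = p^2

% Illustrative numbers only: p = 0.8 \;\Rightarrow\; p^2 = 0.64
```

Under this reading, a model that is only slightly weaker on MATH falls behind more visibly on MATH$^2$, since the gap is compounded by squaring.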
# Barlow Twins Deep Neural Network for Advanced 1D Drug-Target Interaction Prediction Barlow Twins Deep Neural Network for Advanced 1D Drug-Target Interaction Prediction ( http://arxiv.org/abs/2408.00040v2 ) ライセンス: Link先を確認	Maximilian G. Schuh, Davide Boldini, Annkathrin I. Bohne, Stephan A. Sieber,	(参考訳) 薬物と標的の相互作用の正確な予測は、薬物発見を促進するために重要である。時間とコストを削減することによって、機械学習とディープラーニングは、この面倒な発見プロセスを加速することができる。新たなアプローチであるBarlowDTIでは、ターゲットタンパク質の構造を考慮しつつ、強力なBarlow Twinsアーキテクチャを特徴抽出に活用する。提案手法は,1次元の入力のみを用いて,複数のベンチマークに対して最先端の予測性能を実現する。勾配推力機械を基礎となる予測装置として用いることで、計算資源を十分に必要とせず、高速かつ効率的な予測が可能になる。また、個別のトレーニングサンプルに基づいて、モデルがどのように決定に達するかについても検討する。共結晶構造を比較することで,BarlowDTIは触媒活性および安定化残基を効果的に利用し,一次元入力データからモデルを一般化する能力を強調した。さらに、既存のメソッドに対して新たなベースラインをベンチマークする。これらのイノベーションは、薬物と標的の相互作用予測の効率と効果を改善し、薬物開発を加速し、分子間相互作用の理解を深めるための堅牢なツールを提供する。したがって、私たちはhttps://www.bio.nat.tum.de/oc2/barlowdtiで自由にアクセスできる使いやすいWebインターフェースを提供しています。 Accurate prediction of drug-target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature-extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive performance against multiple established benchmarks using only one-dimensional input. The use of gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for substantial computational resources. We also investigate how the model reaches its decision based on individual training samples. By comparing co-crystal structures, we find that BarlowDTI effectively exploits catalytically active and stabilising residues, highlighting the model's ability to generalise from one-dimensional input data. In addition, we further benchmark new baselines against existing methods. 
Together, these innovations improve the efficiency and effectiveness of drug-target interaction predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions. Therefore, we provide an easy-to-use web interface that can be freely accessed at https://www.bio.nat.tum.de/oc2/barlowdti.	翻訳日:2024-09-04 17:11:28 公開日:2024-09-02
# 解釈可能性を考慮した時間的知識グラフにおける異常のオンライン検出 Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability ( http://arxiv.org/abs/2408.00872v2 ) ライセンス: Link先を確認	Jiasheng Zhang, Rex Ying, Jie Shao,	(参考訳) 時間的知識グラフ(TKG)は、実体間の関係の進化を捉える上で貴重な資源であるが、しばしばノイズに悩まされ、堅牢な異常検出機構を必要とする。既存の動的グラフ異常検出アプローチは、TKG内のノードとエッジのカテゴリによって導入されたリッチなセマンティクスを捉えるのに苦労するが、TKG埋め込み手法は解釈可能性に欠け、異常検出の信頼性を損なう。さらに,これらの手法は,知識更新によるパターン変化やセマンティックドリフトへの適応を阻害する。これらの課題に対処するために、TKGにおけるオンライン異常検出の解釈に適した効率的なTKG要約手法であるAnoTを導入する。 AnoTは、TKGを新しい規則グラフにまとめることから始まり、TKGの複雑なパターンの柔軟な推論を可能にする。新しい知識が出現すると、AnoTはルールグラフのノードにそれをマッピングし、ルールグラフを逆行して知識の異常スコアを導出する。トラバーサルは到達可能なノードを生成し、新しい知識の妥当性や異常を解釈可能な証拠を与える。全体として、AnoTは、オフラインのTKG要約とオンラインスコアリングのための検出器、新しい知識に基づくリアルタイムルールグラフ更新のための更新器、およびルールグラフの近似誤差を推定するモニターを含む、検出器・アップダッタ・モニタアーキテクチャを具現化している。 4つの実世界のデータセットの実験結果は、AnoTが既存の手法を精度と相互運用性の点ではるかに上回っていることを示している。すべての生データセットとAnoTの実装はhttps://github.com/zjs123/ANoTで提供されている。 Temporal knowledge graphs (TKGs) are valuable resources for capturing evolving relationships among entities, yet they are often plagued by noise, necessitating robust anomaly detection mechanisms. Existing dynamic graph anomaly detection approaches struggle to capture the rich semantics introduced by node and edge categories within TKGs, while TKG embedding methods lack interpretability, undermining the credibility of anomaly detection. Moreover, these methods falter in adapting to pattern changes and semantic drifts resulting from knowledge updates. To tackle these challenges, we introduce AnoT, an efficient TKG summarization method tailored for interpretable online anomaly detection in TKGs. AnoT begins by summarizing a TKG into a novel rule graph, enabling flexible inference of complex patterns in TKGs. When new knowledge emerges, AnoT maps it onto a node in the rule graph and traverses the rule graph recursively to derive the anomaly score of the knowledge. 
The traversal yields reachable nodes that furnish interpretable evidence for the validity or the anomalousness of the new knowledge. Overall, AnoT embodies a detector-updater-monitor architecture, encompassing a detector for offline TKG summarization and online scoring, an updater for real-time rule graph updates based on emerging knowledge, and a monitor for estimating the approximation error of the rule graph. Experimental results on four real-world datasets demonstrate that AnoT surpasses existing methods significantly in terms of accuracy and interoperability. All of the raw datasets and the implementation of AnoT are available at https://github.com/zjs123/ANoT.	翻訳日:2024-09-04 17:01:34 公開日:2024-09-02
# 深い非弾性過程の領域における空間と時間の追加次元 Additional dimensions of space and time in the domain of deep inelastic processes ( http://arxiv.org/abs/2408.02696v2 ) ライセンス: Link先を確認	B. B. Levchenko,	(参考訳) 我々は、有名なハイゼンベルクの不確実性関係とランダウ=ピエルスの不確実性関係が、新しい不確実性関係に属する「隠れた」角変数を暗黙的に含んでいることを証明した。得られた関係に基づいて、間接測定において仮想粒子の速度$U^*$を推定する式を導出した。間接測定理論と導出式を適用し, DIS HERAデータから仮想光子の群速度の絶対値を推定した。 HERAデータから, 仮想光子の速度は自由空間における光速$c$を超えること($U^*>c$)が示された。仮想光子と仮説タキオン粒子の性質はほぼ同一である。粒子相互作用の領域では、新しい角パラメータは位相空間幾何学のタイプと時空連続体の次元性と密接に関連している。$Q^2=0\, \mathrm{GeV}^2$ における正規化条件 $U^* =c$ の問題は、I. Bars が開発した「Two-Time Physics」の枠組みの中で自然に解けることが示唆されている。 2T-物理学は、位相空間における局所シンプレクティック$\mathrm{Sp(2,R)}$ゲージ対称性と、時間的な余剰次元と空間的な余剰次元を1つずつ持つシグネチャ$\mathrm{(1+1',d+1')}$の時空幾何学を持つ理論である。 We prove that the well-known Heisenberg uncertainty relations and Landau-Peierls uncertainty relations implicitly contain "hidden" angular variables, which belong to new uncertainty relations. Based on the obtained relations, we derive a formula for estimating the speed $U^*$ of a virtual particle in indirect measurements. We applied the theory of indirect measurements and the derived formula to estimate the modulus of the group velocity of virtual photons from the DIS HERA data. The HERA data indicate that the speed of virtual photons exceeds the speed of light $c$ in free space, $U^*>c$. The properties of virtual photons and a hypothetical tachyon particle are almost identical. It is found that in the realm of particle interaction, the new angular parameters are closely related to the type of the phase-space geometry and dimensionality of the space-time continuum. It is suggested that the problem of the normalization condition $U^* =c$ at $Q^2=0\, \mathrm{GeV}^2$ can be solved naturally within the framework of "Two-Time Physics" developed by I. Bars. 2T-physics is the theory with local symplectic $\mathrm{Sp(2,R)}$ gauge symmetry in phase-space and the space-time geometry of signature $\mathrm{(1+1',d+1')}$ with one extra time-like and one extra space-like dimension.	翻訳日:2024-09-04 17:01:34 公開日:2024-09-02
# バイアスノイズに対する高次元量子XYZ積符号 High-dimensional quantum XYZ product codes for biased noise ( http://arxiv.org/abs/2408.03123v2 ) ライセンス: Link先を確認	Zhipeng Liang, Zhengzhong Yi, Fusheng Yang, Jiahan Chen, Zicheng Wang, Xuan Wang,	(参考訳) 量子XYZ製品は、3つの古典的なコードを使用してCSS以外のコードのクラスを構築することができる。しかし,本研究に先立ち,その誤り訂正性能は深く研究されず,このコード構築法が高次元に一般化できるかどうかが疑問視される。本稿では,3 つの繰り返し符号の XYZ 生成物の特殊例である 3 次元トーリック符号の非 CSS 変種と見なせる 3 次元チャモン符号の誤り訂正性能について検討する。第2に、XYZ製品は4次元に一般化でき、4次元(4D)XYZ製品コード構築を提案し、4次元ホモロジー製品の変種と見なすことができ、4つの古典的コードまたは2つのCSSコードを用いて非CSSコードのクラスを構築する。 4Dホモロジー製品と比較して、4D XYZ製品は高次元またはコード距離の非CSSコードを構築することができることを示す。第3に、4D Chamon コードと 4D XYZ コンカレントコードという4D XYZ 製品の特別な例を2つ検討する。完全分離されたバイナリ信念伝搬と順序付き統計復号を併用したシミュレーションの結果,同じ2つのCSSコードを用いて4D XYZ製品は,4Dホモロジー製品によって構築されるCSSコードよりも,誤り訂正性能のよい非CSSコードを構築することができることがわかった。 Quantum XYZ product can construct a class of non-CSS codes by using three classical codes. However, before this work, their error-correcting performance is not studied in depth and whether this code construction method can be generalized to higher dimension is an open question. In this paper, we first study the error-correcting performance of the 3D Chamon code, which can be seen as a non-CSS variant of the 3D toric code and a special instance of the XYZ product of three repetition codes. Second, we show that XYZ product can be generalized to four dimension and propose four-dimensional (4D) XYZ product code construction, which can be seen as a variant of 4D homological product and constructs a class of non-CSS codes by using 4 classical codes or 2 CSS codes. Compared with 4D homological product, we show that 4D XYZ product can construct non-CSS codes with higher dimension or code distance. Third, we consider two special instances of 4D XYZ product, which we name 4D Chamon code and 4D XYZ concatenated code. 
Exploiting fully decoupled binary belief propagation combined with ordered statistics decoding, our simulation results show that, using the same two CSS codes, the 4D XYZ product can construct non-CSS codes with better error-correcting performance for $Z$-biased noise than the CSS codes constructed by the 4D homological product, which is more relevant to practical quantum computing systems.	翻訳日:2024-09-04 17:01:34 公開日:2024-09-02
# GuidedNet: ラベル付きデータガイドによる半スーパービジョンのマルチオーガンセグメンテーション GuidedNet: Semi-Supervised Multi-Organ Segmentation via Labeled Data Guide Unlabeled Data ( http://arxiv.org/abs/2408.04914v2 ) ライセンス: Link先を確認	Haochen Zhao, Hui Meng, Deqian Yang, Xiaozheng Xie, Xiaoze Wu, Qingfeng Li, Jianwei Niu,	(参考訳) 半教師付き多臓器医用画像セグメンテーションは, 医師による疾患の診断と治療計画の改善を支援し, 臓器アノテーションに必要な時間と労力を削減する。既存の最先端の手法では, ラベル付きデータを正解ラベルで訓練し, ラベルなしデータを擬似ラベルで訓練する。しかし, この2つのトレーニングフローは分離されており, ラベル付きデータとラベルなしデータの相互関係を反映していない。この問題に対処するため, ラベル付きデータからの知識を活用してラベルなしデータのトレーニングをガイドする, GuidedNetと呼ばれる半教師付き多臓器セグメンテーション手法を提案する。本研究の主な目的は、ラベルなしデータにおける擬似ラベルの品質の向上と、小さな臓器および複雑な臓器に対するネットワークの学習能力の向上である。鍵となる概念は、特徴空間において互いに近いラベル付きデータとラベルなしデータのボクセル特徴は、同じクラスに属する可能性が高いということである。これに基づいて、3D一貫性ガウス混合モデル(3D-CGMM)は、ラベル付きデータの特徴分布を活用して、生成された擬似ラベルを補正するように設計されている。さらに、ラベル付きデータから得られた事前知識を活用してラベルなしデータのトレーニングをガイドする知識伝達クロス擬似教師(KT-CPS)戦略を導入し、小さな臓器および複雑な臓器のセグメンテーション精度を向上させる。 FLARE22とAMOSの2つの公開データセットに関する大規模な実験は、GuidedNetが最先端のパフォーマンスを達成できることを示した。提案したモデルのソースコードはhttps://github.com/kimjisoo12/GuidedNetで公開されている。 Semi-supervised multi-organ medical image segmentation aids physicians in improving disease diagnosis and treatment planning and reduces the time and effort required for organ annotation. Existing state-of-the-art methods train the labeled data with ground truths and train the unlabeled data with pseudo-labels. However, the two training flows are separate, which does not reflect the interrelationship between labeled and unlabeled data. To address this issue, we propose a semi-supervised multi-organ segmentation method called GuidedNet, which leverages the knowledge from labeled data to guide the training of unlabeled data.
The primary goals of this study are to improve the quality of pseudo-labels for unlabeled data and to enhance the network's learning capability for both small and complex organs. A key concept is that voxel features from labeled and unlabeled data that are close to each other in the feature space are more likely to belong to the same class. On this basis, a 3D Consistent Gaussian Mixture Model (3D-CGMM) is designed to leverage the feature distributions from labeled data to rectify the generated pseudo-labels. Furthermore, we introduce a Knowledge Transfer Cross Pseudo Supervision (KT-CPS) strategy, which leverages the prior knowledge obtained from the labeled data to guide the training of the unlabeled data, thereby improving the segmentation accuracy for both small and complex organs. Extensive experiments on two public datasets, FLARE22 and AMOS, demonstrated that GuidedNet is capable of achieving state-of-the-art performance. The source code of our proposed model is available at https://github.com/kimjisoo12/GuidedNet.	翻訳日:2024-09-04 16:51:50 公開日:2024-09-02
# 複合推論における包括的強化型ハイブリッドRAGシステム A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning ( http://arxiv.org/abs/2408.05141v3 ) ライセンス: Link先を確認	Ye Yuan, Chengwu Liu, Jingyang Yuan, Gongbo Sun, Siqi Li, Ming Zhang,	(参考訳) Retrieval-augmented Generation (RAG) は、大規模言語モデル(LLM)がそれらの精度を高め、外部知識ベースを統合することで幻覚を減らすことを可能にするフレームワークである。本稿では,検索品質,拡張推論能力,数値計算能力の向上など,総合的な最適化によって強化されたハイブリッドRAGシステムを提案する。我々はWebページのテキストチャンクとテーブルを洗練し、幻覚を減らす属性予測器を追加し、LLMナレッジ・エクストラクタとナレッジ・グラフ・エクストラクタを実行し、最後にすべての参照で推論戦略を構築した。我々は,メタCRAG KDD Cup 2024コンペティションを通じてCRAGデータセットのシステム評価を行った。局所評価とオンライン評価の両方で,我々のシステムは複雑な推論能力を大幅に向上させることを示した。局所評価では,ベースラインモデルと比較して精度が大幅に向上し,誤差率も大幅に低下し,スコアの顕著な増加を実現した。一方,提案システムの性能と一般化能力を実証し,オンラインアセスメントにおける卓越した成果を得た。我々のシステムのソースコードは \url{https://gitlab.aicrowd.com/shizueyy/crag-new} で公開されている。 Retrieval-augmented generation (RAG) is a framework enabling large language models (LLMs) to enhance their accuracy and reduce hallucinations by integrating external knowledge bases. In this paper, we introduce a hybrid RAG system enhanced through a comprehensive suite of optimizations that significantly improve retrieval quality, augment reasoning capabilities, and refine numerical computation ability. We refined the text chunks and tables in web pages, added attribute predictors to reduce hallucinations, conducted LLM Knowledge Extractor and Knowledge Graph Extractor, and finally built a reasoning strategy with all the references. We evaluated our system on the CRAG dataset through the Meta CRAG KDD Cup 2024 Competition. Both the local and online evaluations demonstrate that our system significantly enhances complex reasoning capabilities. In local evaluations, we have significantly improved accuracy and reduced error rates compared to the baseline model, achieving a notable increase in scores. In the meanwhile, we have attained outstanding results in online assessments, demonstrating the performance and generalization capabilities of the proposed system. 
The source code for our system is released at https://gitlab.aicrowd.com/shizueyy/crag-new.	翻訳日:2024-09-04 16:51:50 公開日:2024-09-02
# 局所演算と古典的資源による量子相関の分散 Distributing quantum correlations through local operations and classical resources ( http://arxiv.org/abs/2408.05490v2 ) ライセンス: Link先を確認	Adam G. Hawkins, Hannah McAleese, Mauro Paternostro,	(参考訳) ネットワークの各ノードに量子相関を分配することは、量子ネットワークの重要な側面である。本稿では,古典的相関しか持たない情報キャリアの混合状態を用いて,グローバルな量子相関を量子メモリに分散できる,ロバストで物理的に動機付けられたプロトコルを提案する。これに加えて, 分布は測定アウトカム独立であり, 分布は局所的ユニタリ演算と射影測定のみを用いて行われる。また、このプロトコルの大規模ネットワークへのスケーリングについて検討し、量子相関の構造を概説し、その局所的な演算への依存を示す。 Distributing quantum correlations to each node of a network is a key aspect of quantum networking. Here, we present a robust, physically-motivated protocol by which global quantum correlations, as characterised by the discord, can be distributed to quantum memories using a mixed state of information carriers which possess only classical correlations. In addition to this, said distribution is measurement-outcome independent, and the distribution is done using only bilocal unitary operations and projective measurements. We also explore the scaling of this protocol for larger networks and illustrate the structure of the quantum correlations, showing its dependence on the local operations performed.	翻訳日:2024-09-04 16:51:50 公開日:2024-09-02
# SABER-6D: Shape Representation Based Implicit Object Pose Estimation ( http://arxiv.org/abs/2408.05867v2 ) License: see link	Shishir Reddy Vutukur, Mengkejiergeli Ba, Benjamin Busam, Matthias Kayser, Gurprit Singh	In this paper, we propose a novel encoder-decoder architecture, named SABER, to learn the 6D pose of the object in the embedding space by learning shape representation at a given pose. This model enables us to learn pose by performing shape representation at a target pose from RGB image input. We perform shape representation as an auxiliary task, which helps us learn the rotation space for an object based on 2D images. An image encoder predicts the rotation in the embedding space, and the DeepSDF-based decoder learns to represent the object's shape at the given pose. As our approach is shape-based, the pipeline is suitable for any type of object irrespective of its symmetry. Moreover, we need only a CAD model of the objects to train SABER. Our pipeline is based on synthetic data and can also handle symmetric objects without symmetry labels; thus, no additional labeled training data is needed. The experimental evaluation shows that our method achieves close to benchmark results for both symmetric and asymmetric objects on the Occlusion-LineMOD and T-LESS datasets.	Translated: 2024-09-04 16:51:50 Published: 2024-09-02
# Exploring and Learning Structure: Active Inference Approach in Navigational Agents ( http://arxiv.org/abs/2408.05982v2 ) License: see link	Daria de Tinguy, Tim Verbelen, Bart Dhoedt	Drawing inspiration from animal navigation strategies, we introduce a novel computational model for navigation and mapping, rooted in biologically inspired principles. Animals exhibit remarkable navigation abilities by efficiently using memory, imagination, and strategic decision-making to navigate complex and aliased environments. Building on these insights, we integrate traditional cognitive mapping approaches with an Active Inference Framework (AIF) to learn an environment structure in a few steps. Through the incorporation of topological mapping for long-term memory and AIF for navigation planning and structure learning, our model can dynamically apprehend environmental structures and expand its internal map with predicted beliefs during exploration. Comparative experiments with the Clone-Structured Cognitive Graph (CSCG) model highlight our model's ability to rapidly learn environmental structures in a single episode, with minimal navigation overlap. This is achieved without prior knowledge of the dimensions of the environment or the type of observations, showcasing its robustness and effectiveness in navigating ambiguous environments.	Translated: 2024-09-04 16:51:50 Published: 2024-09-02
# BadMerging: Backdoor Attacks Against Model Merging ( http://arxiv.org/abs/2408.07362v2 ) License: see link	Jinghuai Zhang, Jianfeng Chi, Zheng Li, Kunlin Cai, Yang Zhang, Yuan Tian	Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these independently fine-tuned models. MM directly combines multiple fine-tuned task-specific models into a merged model without additional training, and the resulting model shows enhanced capabilities in multiple tasks. Although MM provides great utility, it may come with security risks because an adversary can exploit MM to affect multiple downstream tasks. However, the security risks of MM have barely been studied. In this paper, we first find that MM, as a new learning paradigm, introduces unique challenges for existing backdoor attacks due to the merging process. To address these challenges, we introduce BadMerging, the first backdoor attack specifically designed for MM. Notably, BadMerging allows an adversary to compromise the entire merged model by contributing as few as one backdoored task-specific model. BadMerging comprises a two-stage attack mechanism and a novel feature-interpolation-based loss to enhance the robustness of embedded backdoors against changes in the merging parameters. Considering that a merged model may incorporate tasks from different domains, BadMerging can jointly compromise the tasks provided by the adversary (on-task attack) and by other contributors (off-task attack), and it solves the corresponding unique challenges with novel attack designs. Extensive experiments show that BadMerging achieves remarkable attacks against various MM algorithms. Our ablation study demonstrates that the proposed attack designs can progressively contribute to the attack performance. Finally, we show that prior defense mechanisms fail to defend against our attacks, highlighting the need for more advanced defense.	Translated: 2024-09-04 16:51:50 Published: 2024-09-02
# LiveFC: A System for Live Fact-Checking of Audio Streams ( http://arxiv.org/abs/2408.07448v2 ) License: see link	Venktesh V, Vinay Setty	The advances in the digital era have led to rapid dissemination of information. This has also aggravated the spread of misinformation and disinformation, which has potentially serious consequences, such as civil unrest. While fact-checking aims to combat this, manual fact-checking is cumbersome and not scalable. While automated fact-checking approaches exist, they do not operate in real-time and do not always account for the spread of misinformation through different modalities. This is particularly important as proactive fact-checking on live streams in real-time can help people be informed of false narratives and prevent catastrophic consequences that may cause civil unrest. This is particularly relevant with the rapid dissemination of information through video on social media platforms or other streams like political rallies and debates. Hence, in this work we develop a platform named LiveFC that can aid in fact-checking live audio streams in real-time. LiveFC has a user-friendly interface that displays the claims detected along with their veracity and evidence for live streams, with associated speakers for claims from the respective segments. The app can be accessed at http://livefc.factiverse.ai and a screen recording of the demo can be found at https://bit.ly/3WVAoIw.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# Enhanced Quantum Energy Teleportation using a 3-Qubit System ( http://arxiv.org/abs/2408.07997v3 ) License: see link	Md Shoyib Hassan, Syed Emad Uddin Shubha, M. R. C Mahdy	Quantum Energy Teleportation (QET) is a novel method that leverages quantum entanglement to transfer energy between two distant locations without any physical movement of the energy. The first realization of QET on superconducting hardware, utilizing a 2-qubit system, demonstrated an average energy retrieval efficiency of 35.4% (observing only V) by the receiver, Bob. In this paper, we present a new approach using a 3-qubit system to enhance the energy efficiency of QET. To achieve this, we have incorporated a novel 3-qubit ground-state Hamiltonian H that conforms to the constraints of zero mean energy and the anti-commutative properties of the operations on the observable of the senders and receiver. Our experimental results show a significant improvement in energy retrieval, achieving an average efficiency of 65.5% (observing only V), which is significantly higher than that of the 2-qubit system in practical usage. This advancement not only marks a step forward in practical quantum energy applications but also provides a new framework for future research in quantum energy teleportation and related quantum technologies.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# EagleEye: Attention to Unveil Malicious Event Sequences from Provenance Graphs ( http://arxiv.org/abs/2408.09217v2 ) License: see link	Philipp Gysel, Candid Wüest, Kenneth Nwafor, Otakar Jašek, Andrey Ustyuzhanin, Dinil Mon Divakaran	Securing endpoints is challenging due to the evolving nature of threats and attacks. With endpoint logging systems becoming mature, provenance-graph representations enable the creation of sophisticated behavior rules. However, adapting to the pace of emerging attacks is not scalable with rules. This led to the development of ML models capable of learning from endpoint logs. However, there are still open challenges: i) malicious patterns of malware are spread across long sequences of events, and ii) ML classification results are not interpretable. To address these issues, we develop and present EagleEye, a novel system that i) uses rich features from provenance graphs for behavior event representation, including command-line embeddings, ii) extracts long sequences of events and learns event embeddings, and iii) trains a lightweight Transformer model to classify behavior sequences as malicious or not. We evaluate and compare EagleEye against state-of-the-art baselines on two datasets, namely a new real-world dataset from a corporate environment, and the public DARPA dataset. On the DARPA dataset, at a false-positive rate of 1%, EagleEye detects approximately 89% of all malicious behavior, outperforming two state-of-the-art solutions by an absolute margin of 38.5%. Furthermore, we show that the Transformer's attention mechanism can be leveraged to highlight the most suspicious events in a long sequence, thereby providing interpretation of malware alerts.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# Ancestral Reinforcement Learning: Unifying Zeroth-Order Optimization and Genetic Algorithms for Reinforcement Learning ( http://arxiv.org/abs/2408.09493v2 ) License: see link	So Nakashima, Tetsuya J. Kobayashi	Reinforcement Learning (RL) offers a fundamental framework for discovering optimal action strategies through interactions within unknown environments. Recent advances have shown that the performance and applicability of RL can be significantly enhanced by exploiting a population of agents in various ways. Zeroth-Order Optimization (ZOO) leverages an agent population to estimate the gradient of the objective function, enabling robust policy refinement even in non-differentiable scenarios. As another application, Genetic Algorithms (GA) boost the exploration of policy landscapes by mutational generation of policy diversity in an agent population and its refinement by selection. A natural question is whether we can have the best of both worlds that an agent population can offer. In this work, we propose Ancestral Reinforcement Learning (ARL), which synergistically combines the robust gradient estimation of ZOO with the exploratory power of GA. The key idea in ARL is that each agent within a population infers the gradient by exploiting the history of its ancestors, i.e., the ancestor population in the past, while maintaining the diversity of policies in the current population as in GA. We also theoretically reveal that the populational search in ARL implicitly induces the KL-regularization of the objective function, resulting in enhanced exploration. Our results extend the applicability of populational algorithms for RL.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# On Learning Action Costs from Input Plans ( http://arxiv.org/abs/2408.10889v2 ) License: see link	Marianela Morales, Alberto Pozanco, Giuseppe Canonaco, Sriram Gopalakrishnan, Daniel Borrajo, Manuela Veloso	Most of the work on learning action models focuses on learning the actions' dynamics from input plans. This allows us to specify the valid plans of a planning task. However, very little work focuses on learning action costs, which in turn allows us to rank the different plans. In this paper we introduce a new problem: that of learning the costs of a set of actions such that a set of input plans are optimal under the resulting planning model. To solve this problem we present $LACFIP^k$, an algorithm to learn actions' costs from unlabeled input plans. We provide theoretical and empirical results showing how $LACFIP^k$ can successfully solve this task.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse ( http://arxiv.org/abs/2408.10901v2 ) License: see link	Zhongliang Guo, Lei Fang, Jingyu Lin, Yifei Qian, Shuai Zhao, Zeyu Wang, Junhao Dong, Cunjian Chen, Ognjen Arandjelović, Chun Pong Lau	Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. However, these generative techniques raise concerns about data misappropriation and intellectual property infringement. Adversarial attacks on machine learning models have been extensively studied, and a well-established body of research has extended these techniques as a benign metric to prevent the underlying misuse of generative AI. Current approaches to safeguarding images from manipulation by LDMs are limited by their reliance on model-specific knowledge and their inability to significantly degrade the semantic quality of generated images. In response to these shortcomings, we propose the Posterior Collapse Attack (PCA), based on the observation that VAEs suffer from posterior collapse during training. Our method minimizes dependence on the white-box information of target models to get rid of the implicit reliance on model-specific knowledge. By accessing only a small number of LDM parameters, specifically only the VAE encoder of LDMs, our method causes a substantial semantic collapse in generation quality, particularly in perceptual consistency, and demonstrates strong transferability across various model architectures. Experimental results show that PCA achieves superior perturbation effects on image generation of LDMs with lower runtime and VRAM. Our method outperforms existing techniques, offering a more robust and generalizable solution that is helpful in alleviating the socio-technical challenges posed by the rapidly evolving landscape of generative AI.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# Does quantum information require additional structure? ( http://arxiv.org/abs/2408.11183v2 ) License: see link	Ryszard Horodecki	We consider interpretative problems of models of physical reality, including the phenomenon of quantum information, in the context of Heisenberg's classification of fundamental physical theoretical models according to the role of the universal constants: Planck's constant $h$ and the speed of light $c$. We introduce the correspondence principle between physical reality and mathematical models, and we discuss its significance in relation to physical models. We consider the status of quantum information in the standard quantum model, and based on the correspondence principle, we propose an interpretation of the wave function as a mathematical representation of quantum information. In this context, we consider Clauser's analysis of the incompatibility of the formulations of quantum theory in laboratory space and configuration space in the context of local realism. Then, we introduce the hypothesis of a quantum space of directly unobserved relations, which precede correlations in the "classical" Minkowski space-time and are compatible with the Reichenbach common cause principle. Finally, we present the Chyliński model as an example of quantum relational space, which predicts potentially measurable effects for the bound states.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# Video Diffusion Models are Strong Video Inpainter ( http://arxiv.org/abs/2408.11402v2 ) License: see link	Minhyeok Lee, Suhwan Cho, Chajin Shin, Jungho Lee, Sunghun Yang, Sangyoun Lee	Propagation-based video inpainting using optical flow at the pixel or feature level has recently garnered significant attention. However, it has limitations such as the inaccuracy of optical flow prediction and the propagation of noise over time. These issues result in non-uniform noise and time consistency problems throughout the video, which are particularly pronounced when the removed area is large and involves substantial movement. To address these issues, we propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI). We design FFF-VDI inspired by the capabilities of pre-trained image-to-video diffusion models that can transform the first frame image into a highly natural video. To apply this to the video inpainting task, we propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code. Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video. The proposed model addresses the limitations of existing methods that rely on optical flow quality, producing much more natural and temporally consistent videos. This proposed approach is the first to effectively integrate image-to-video diffusion models into video inpainting tasks. Through various comparative experiments, we demonstrate that the proposed model can robustly handle diverse inpainting types with high quality.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# MCDubber: Multimodal Context-Aware Expressive Video Dubbing ( http://arxiv.org/abs/2408.11593v2 ) License: see link	Yuan Zhao, Zhenqi Jia, Rui Liu, De Hu, Feilong Bao, Guanglai Gao	Automatic Video Dubbing (AVD) aims to take the given script and generate speech that aligns with lip motion and prosody expressiveness. Current AVD models mainly utilize visual information of the current sentence to enhance the prosody of synthesized speech. However, it is crucial to consider whether the prosody of the generated dubbing aligns with the multimodal context, as the dubbing will be combined with the original context in the final video. This aspect has been overlooked in previous studies. To address this issue, we propose a Multimodal Context-aware video Dubbing model, termed MCDubber, to convert the modeling object from a single sentence to a longer sequence with context information to ensure the consistency of the global context prosody. MCDubber comprises three main components: (1) a context duration aligner, which aims to learn the context-aware alignment between the text and lip frames; (2) a context prosody predictor, which seeks to read the global context visual sequence and predict the context-aware global energy and pitch; and (3) a context acoustic decoder, which ultimately predicts the global context mel-spectrogram with the assistance of adjacent ground-truth mel-spectrograms of the target sentence. Through this process, MCDubber fully considers the influence of multimodal context on the prosody expressiveness of the current sentence when dubbing. The mel-spectrogram belonging to the target sentence, extracted from the output context mel-spectrograms, is the final required dubbing audio. Extensive experiments on the Chem benchmark dataset demonstrate that our MCDubber significantly improves dubbing expressiveness compared to all advanced baselines. The code and demos are available at https://github.com/XiaoYuanJun-zy/MCDubber.	Translated: 2024-09-04 16:42:00 Published: 2024-09-02
# Automated Synthesis of Fault-Tolerant State Preparation Circuits for Quantum Error Correction Codes ( http://arxiv.org/abs/2408.11894v2 ) License: see link	Tom Peham, Ludwig Schmid, Lucas Berent, Markus Müller, Robert Wille	A central ingredient in fault-tolerant quantum algorithms is the initialization of a logical state for a given quantum error-correcting code from a set of noisy qubits. A scheme that has demonstrated promising results for small code instances that are realizable on currently available hardware composes a non-fault-tolerant state preparation step with a verification step that checks for spreading errors. Known circuit constructions of this scheme are mostly obtained manually, and no algorithmic techniques for constructing depth- or gate-optimal circuits exist. As a consequence, the current state of the art exploits this scheme only for specific code instances and mostly for the special case of distance 3 codes. In this work, we propose an automated approach for synthesizing fault-tolerant state preparation circuits for arbitrary CSS codes. We utilize methods based on satisfiability solving (SAT) techniques to construct fault-tolerant state preparation circuits consisting of depth- and gate-optimal preparation and verification circuits. We also provide heuristics that can synthesize fault-tolerant state preparation circuits for code instances where no optimal solution can be obtained in an adequate timeframe. Moreover, we give a general construction for non-deterministic state preparation circuits beyond distance 3. Numerical evaluations using $d=3$ and $d=5$ codes confirm that the generated circuits exhibit the desired scaling of the logical error rates. The resulting methods are publicly available as part of the Munich Quantum Toolkit (MQT) at https://github.com/cda-tum/mqt-qecc. Such methods are an important step in providing fault-tolerant circuit constructions that can aid in near-term demonstration of fault-tolerant quantum computing.	Translated: 2024-09-04 16:32:02 Published: 2024-09-02
# Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model ( http://arxiv.org/abs/2408.12606v2 ) License: see link	Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen	Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure, offering a noninvasive method for personalized breast cancer management. We have curated the largest multiparametric breast MRI dataset, involving 5,205 patients from three hospitals in the north, southeast, and southwest of China, for the development and extensive evaluation of our model. MOME demonstrated accurate and robust identification of breast cancer. It achieved comparable performance for malignancy recognition to that of four senior radiologists and significantly outperformed a junior radiologist, with 0.913 AUROC, 0.948 AUPRC, 0.905 F1 score, and 0.723 MCC. Our findings suggest that MOME could reduce the need for biopsies in BI-RADS 4 patients with a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694. The model further supports scalable and interpretable inference, adapting to missing modalities and providing decision explanations by highlighting lesions and measuring modality contributions. MOME exemplifies a discriminative, robust, scalable, and interpretable multimodal model, paving the way for noninvasive, personalized management of breast cancer patients based on multiparametric breast imaging data.	Translated: 2024-09-04 16:32:02 Published: 2024-09-02
# エクイティ中心の公衆衛生決定のための機械学習の公正性を特徴付けるバイアスと予測指標の探索-ナラティブレビュー Exploring Bias and Prediction Metrics to Characterise the Fairness of Machine Learning for Equity-Centered Public Health Decision-Making: A Narrative Review ( http://arxiv.org/abs/2408.13295v2 ) ライセンス: Link先を確認	Shaina Raza, Arash Shaban-Nejad, Elham Dolatabadi, Hiroshi Mamiya,	(参考訳) 背景: 機械学習(ML)の急速な進歩は、公衆衛生研究、監視、意思決定を強化する新しい機会を表している。しかし、アルゴリズムバイアス、予測された人口健康結果の体系的誤り、そしてMLの公衆衛生適用による包括的理解の欠如がある。この物語レビューの目的は、これらのバイアスを評価するために、MLが生み出すバイアスの種類と定量的なメトリクスを調べることである。方法: PubMed, MEDLINE, IEEE (Institute of Electrical and Electronics Engineers), ACM (Association for Computing Machinery) Digital Library, Science Direct, Springer Natureの検索を行った。我々は,2008年から2023年にかけて英語で出版されたML領域と公衆および人口の健康状態の指標として,バイアスの種類や指標を記述した研究をキーワードとして用いた。結果:72項目が包括的基準を満たした。私たちのレビューでは、これらのバイアスを株式の観点から評価するために、一般的に説明されるバイアスの種類と量的指標を特定しました。結論: このレビューは, 公衆衛生に関するMLの評価枠組みを, 公平の観点から定式化する上で有効である。 Background: The rapid advancement of Machine Learning (ML) represents novel opportunities to enhance public health research, surveillance, and decision-making. However, there is a lack of comprehensive understanding of algorithmic bias, systematic errors in predicted population health outcomes, resulting from the public health application of ML. The objective of this narrative review is to explore the types of bias generated by ML and quantitative metrics to assess these biases. Methods : We performed search on PubMed, MEDLINE, IEEE (Institute of Electrical and Electronics Engineers), ACM (Association for Computing Machinery) Digital Library, Science Direct, and Springer Nature. We used keywords to identify studies describing types of bias and metrics to measure these in the domain of ML and public and population health published in English between 2008 and 2023, inclusive. Results: A total of 72 articles met the inclusion criteria. Our review identified the commonly described types of bias and quantitative metrics to assess these biases from an equity perspective. 
Conclusion: The review will help formalize the evaluation framework for ML in public health from an equity perspective.	翻訳日:2024-09-04 16:32:02 公開日:2024-09-02
# 拡散モデルエキスパートの連鎖による無訓練長ビデオ生成 Training-free Long Video Generation with Chain of Diffusion Model Experts ( http://arxiv.org/abs/2408.13423v3 ) ライセンス: Link先を確認	Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu,	(参考訳) ビデオ生成モデルは、映画製作などの分野で大きな可能性を秘めている。しかし、現在のビデオ拡散モデルでは、高い計算コストが必要であり、ビデオ生成タスクの複雑さのため、最適以下の結果が得られる。本稿では,ビデオ生成をより簡単なサブタスクに分解する,効率的な高品質なビデオ生成フレームワークである \textbf{ConFiner} を提案する。オフザシェルフ拡散モデルの専門家の鎖で高品質なビデオを生成することができ、それぞれが切り離されたサブタスクを担当している。改良期間中に,複数の拡散専門家の能力を単一のサンプリングにマージできるコーディネート・デノナイジングを導入する。さらに,ConFiner-Long フレームワークを設計し,ConFiner 上で3つの制約戦略で長いコヒーレントなビデオを生成する。実験の結果、推測コストのわずか10%のコストで、私たちのConFinerは、すべての客観的および主観的メトリクスでLavieやModelscopeのような代表モデルを超えています。そしてConFiner-Longは、600フレームまでの高品質でコヒーレントなビデオを生成することができる。 Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models need high computational costs and produce suboptimal results due to high complexity of video generation task. In this paper, we propose \textbf{ConFiner}, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure \textbf{con}trol and spatial-temporal re\textbf{fine}ment. It can generate high-quality videos with chain of off-the-shelf diffusion model experts, each expert responsible for a decoupled subtask. During the refinement, we introduce coordinated denoising, which can merge multiple diffusion experts' capabilities into a single sampling. Furthermore, we design ConFiner-Long framework, which can generate long coherent video with three constraint strategies on ConFiner. Experimental results indicate that with only 10\% of the inference cost, our ConFiner surpasses representative models like Lavie and Modelscope across all objective and subjective metrics. And ConFiner-Long can generate high-quality and coherent videos with up to 600 frames.	翻訳日:2024-09-04 16:21:29 公開日:2024-09-02
# MLR-Copilot:大規模言語モデルエージェントに基づく自律型機械学習研究 MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents ( http://arxiv.org/abs/2408.14033v2 ) ライセンス: Link先を確認	Ruochen Li, Teerth Patel, Qingyun Wang, Xinya Du,	(参考訳) 機械学習の研究は、技術的進歩とイノベーションに不可欠であり、その固有の複雑さ、実験の遅いペース、専門的な専門知識の必要性により、しばしば重大な課題に直面している。そこで我々は,大規模言語モデル(MLR-Copilot)を用いた自律機械学習研究という,大規模言語モデル(LLM)エージェントを用いた研究アイデアの自動生成と実装による機械学習研究の生産性向上を目的とした,新たな体系的フレームワークを提案する。このフレームワークは、研究アイデア生成、実験実装、実装実行の3つのフェーズで構成されている。第一に、既存の研究論文は、LLMを動力とするIdeaAgentを介して仮説と実験計画を生成するために使用される。次に、実装生成フェーズはこれらの計画をExperimentAgentで実行可能なものに翻訳する。このフェーズは、検索されたプロトタイプコードを活用し、任意に候補モデルとデータを検索する。最後に、ExperimentAgentが管理する実行フェーズでは、人間のフィードバックと反復デバッグのためのメカニズムを使って実験を行い、実行可能な研究成果を達成する可能性を高める。我々は,5つの機械学習研究課題に関するフレームワークを評価し,研究の進展とイノベーションを促進するためのフレームワークの可能性を示す実験結果を示した。 Machine learning research, crucial for technological advancements and innovation, often faces significant challenges due to its inherent complexity, slow pace of experimentation, and the necessity for specialized expertise. Motivated by this, we present a new systematic framework, autonomous Machine Learning Research with large language models (MLR-Copilot), designed to enhance machine learning research productivity through the automatic generation and implementation of research ideas using Large Language Model (LLM) agents. The framework consists of three phases: research idea generation, experiment implementation, and implementation execution. First, existing research papers are used to generate hypotheses and experimental plans via IdeaAgent powered by LLMs. Next, the implementation generation phase translates these plans into executables with ExperimentAgent. This phase leverages retrieved prototype code and optionally retrieves candidate models and data. Finally, the execution phase, also managed by ExperimentAgent, involves running experiments with mechanisms for human feedback and iterative debugging to enhance the likelihood of achieving executable research outcomes. 
We evaluate our framework on five machine learning research tasks, and the experimental results show the framework's potential to facilitate research progress and innovation.	翻訳日:2024-09-04 16:21:29 公開日:2024-09-02
# 空間課題における大規模言語モデルの評価:マルチタスクベンチマークによる検討 Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study ( http://arxiv.org/abs/2408.14438v3 ) ライセンス: Link先を確認	Liuchang Xu, Shuo Zhao, Qingming Lin, Luyao Chen, Qianqian Luo, Sensen Wu, Xinyue Ye, Hailin Feng, Zhenhong Du,	(参考訳) ChatGPTやGeminiといった大規模言語モデルの出現は、自然言語理解からコード生成まで、さまざまな能力を評価することの重要性を強調している。しかし,空間的課題におけるそれらの性能は包括的に評価されていない。本研究では,空間的タスクにおけるいくつかの高度なモデルの性能を体系的に探索し,比較するために設計された,新しいマルチタスク空間評価データセットを導入することで,このギャップに対処する。データセットは、空間的理解と経路計画を含む12の異なるタスクタイプを含み、それぞれが検証された正確な答えを持っている。 2相試験により,OpenAIのgpt-3.5-turbo,gpt-4o,ZhipuAIのglm-4を含む複数のモデルを評価した。最初はゼロショットテストを行い、続いてデータセットを難易度で分類し、迅速なチューニングテストを実行しました。結果,gpt-4oは第1相において平均71.3%の総合的精度を示した。ムーンショット-v1-8kは全体的に若干性能が劣ったが、地名認識タスクではgpt-4oを上回った。調査はまた、特定のタスクにおけるモデルパフォーマンスに対する迅速な戦略の影響を強調している。例えば、Chain-of-Thought(COT)戦略では、経路計画におけるgpt-4oの精度が12.4%から87.5%に向上し、一方、1ショット戦略では、マッピングタスクにおけるv1-8kの精度が10.1%から76.3%に向上した。 The advent of large language models such as ChatGPT, Gemini, and others has underscored the importance of evaluating their diverse capabilities, ranging from natural language understanding to code generation. However, their performance on spatial tasks has not been comprehensively assessed. This study addresses this gap by introducing a novel multi-task spatial evaluation dataset, designed to systematically explore and compare the performance of several advanced models on spatial tasks. The dataset encompasses twelve distinct task types, including spatial understanding and path planning, each with verified, accurate answers. We evaluated multiple models, including OpenAI's gpt-3.5-turbo, gpt-4o, and ZhipuAI's glm-4, through a two-phase testing approach. Initially, we conducted zero-shot testing, followed by categorizing the dataset by difficulty and performing prompt tuning tests. Results indicate that gpt-4o achieved the highest overall accuracy in the first phase, with an average of 71.3%. 
Although moonshot-v1-8k slightly underperformed overall, it surpassed gpt-4o in place name recognition tasks. The study also highlights the impact of prompt strategies on model performance in specific tasks. For example, the Chain-of-Thought (COT) strategy increased gpt-4o's accuracy in path planning from 12.4% to 87.5%, while a one-shot strategy enhanced moonshot-v1-8k's accuracy in mapping tasks from 10.1% to 76.3%.	翻訳日:2024-09-04 16:21:29 公開日:2024-09-02
# セマンティックな特徴融合誘導による多モード有向物体検出へのセグメンテーションモデルの適用 Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance ( http://arxiv.org/abs/2408.15063v3 ) ライセンス: Link先を確認	Kunpeng Wang, Danying Lin, Chenglong Li, Zhengzheng Tu, Bin Luo,	(参考訳) 既存のSOD(Multi-modal Salient Object Detection)手法は、スクラッチからのトレーニングモデルによる有効性を示すが、制限されたマルチモーダルデータは、これらの手法が最適性に達することを妨げている。本稿では,マルチモーダルSODのためのSAM(Pre-trained Segment Anything Model)の強力な特徴表現とゼロショット一般化能力を探求し,活用するための新しいフレームワークを提案する。最近のビジョンの基本モデルとして機能しているにもかかわらず、特に難易度の高いシーンにおいて、クラスに依存しないSAMを正確に理解し、検出するために駆動するのは簡単ではない。この目的のために,SODタスクにSAMを適応させるために,SODタスクに多モードサリエンシ固有の知識を組み込んだse\underline{m}antic f\underline{e}ature fu\underline{s}ion guidanc\underline{e} (Sammese) を用いた \underline{SAM} を開発した。しかし、SAMが単一モーダルデータに基づいて訓練し、複数のモーダル入力の相補的な利点を直接マイニングし、それらを包括的に利用して正確な正当性予測を実現することは困難である。これらの課題に対処するために,我々はまず,可視・熱・深度画像のペアから情報を統合することで,堅牢なマルチモーダル意味特徴を抽出する多モーダル補完融合モジュールを設計する。そして、抽出したマルチモーダルなセマンティック特徴をSAM画像エンコーダとマスクデコーダの両方に供給し、微調整とプロンプトを行う。具体的には、画像エンコーダにおいて、シングルモーダルSAMをマルチモーダル情報に適応させるために、マルチモーダルアダプタを提案する。マスクデコーダでは, 各種のサリエンシ・キューで対応する埋め込みを生成するために, セマンティック・ジオメトリ・プロンプト生成戦略を提案する。 RGB-D と RGB-T SOD のベンチマーク実験により,提案手法の有効性が示された。コードは \url{https://github.com/Angknpng/Sammese} で入手できる。 Although most existing multi-modal salient object detection (SOD) methods demonstrate effectiveness through training models from scratch, the limited multi-modal data hinders these methods from reaching optimality. In this paper, we propose a novel framework to explore and exploit the powerful feature representation and zero-shot generalization ability of the pre-trained Segment Anything Model (SAM) for multi-modal SOD. Despite serving as a recent vision fundamental model, driving the class-agnostic SAM to comprehend and detect salient objects accurately is non-trivial, especially in challenging scenes. 
To this end, we develop \underline{SAM} with se\underline{m}antic f\underline{e}ature fu\underline{s}ion guidanc\underline{e} (Sammese), which incorporates multi-modal saliency-specific knowledge into SAM to adapt SAM to multi-modal SOD tasks. However, it is difficult for SAM trained on single-modal data to directly mine the complementary benefits of multi-modal inputs and comprehensively utilize them to achieve accurate saliency prediction. To address these issues, we first design a multi-modal complementary fusion module to extract robust multi-modal semantic features by integrating information from visible and thermal or depth image pairs. Then, we feed the extracted multi-modal semantic features into both the SAM image encoder and mask decoder for fine-tuning and prompting, respectively. Specifically, in the image encoder, a multi-modal adapter is proposed to adapt the single-modal SAM to multi-modal information. In the mask decoder, a semantic-geometric prompt generation strategy is proposed to produce corresponding embeddings with various saliency cues. Extensive experiments on both RGB-D and RGB-T SOD benchmarks show the effectiveness of the proposed framework. The code will be available at \url{https://github.com/Angknpng/Sammese}.	翻訳日:2024-09-04 16:21:29 公開日:2024-09-02
# COMETにおける落とし穴と展望 Pitfalls and Outlooks in Using COMET ( http://arxiv.org/abs/2408.15366v2 ) ライセンス: Link先を確認	Vilém Zouhar, Pinzhen Chen, Tsz Kin Lam, Nikita Moghe, Barry Haddow,	(参考訳) COMETメトリックは導入以来,翻訳品質に関する人間による判断と強い相関関係にあることから,機械翻訳コミュニティで先駆的な役割を果たしてきた。その成功は、品質評価のために微調整された事前訓練された多言語モデルであることに由来する。しかし、機械学習モデルであることは、広く知られていないかもしれない新しい落とし穴を生じさせる。我々はこれらの予期せぬ行動を3つの側面から調査する。 1) 技術: 時代遅れのソフトウェアバージョン及び計算精度 2) データ: テスト時の空のコンテンツ,言語ミスマッチ,翻訳調,及びトレーニングにおける分布及びドメインバイアス 3) 使用状況と報告: 文献におけるマルチリファレンスサポートとモデル参照。これらの問題は、COMETのスコアが論文間や技術的な設定間で比較可能ではないことを意味しており、我々は各問題の解決に向けた見解を示す。さらに,ソフトウェアとモデル構成のシグネチャおよび適切な引用を生成できるSacreCOMETパッケージを公開する。この作業の目標は、コミュニティがCOMETメトリックをよりうまく活用できるようにすることです。 Since its introduction, the COMET metric has blazed a trail in the machine translation community, given its strong correlation with human judgements of translation quality. Its success stems from being a modified pre-trained multilingual model finetuned for quality assessment. However, it being a machine learning model also gives rise to a new set of pitfalls that may not be widely known. We investigate these unexpected behaviours from three aspects: 1) technical: obsolete software versions and compute precision; 2) data: empty content, language mismatch, and translationese at test time as well as distribution and domain biases in training; 3) usage and reporting: multi-reference support and model referencing in the literature. All of these problems imply that COMET scores are not comparable between papers or even technical setups, and we put forward our perspective on fixing each issue. Furthermore, we release the SacreCOMET package that can generate a signature for the software and model configuration as well as an appropriate citation. The goal of this work is to help the community make more sound use of the COMET metric.	翻訳日:2024-09-04 16:21:29 公開日:2024-09-02
# チャネルプルーニングのための効果的な情報理論フレームワーク An Effective Information Theoretic Framework for Channel Pruning ( http://arxiv.org/abs/2408.16772v2 ) ライセンス: Link先を確認	Yihao Chen, Zefang Wang,	(参考訳) チャネルプルーニングは畳み込みニューラルネットワークの高速化と圧縮のための有望な方法である。しかし、現在のプルーニングアルゴリズムは、レイヤーワイズプルーニング比を適切に割り当て、説得力のある基準で最も重要でないチャネルを破棄する方法が未解決のままである。本稿では,ニューラルネットの情報理論と解釈可能性を用いた新しいチャネルプルーニング手法を提案する。具体的には,情報エントロピーを,畳み込み層において期待される情報量とみなす。さらに、行列を線型方程式の系と仮定すると、高階行列はそれに対してより多くの解が存在することを示し、より不確実性を示す。情報理論の観点からは、ランクは情報量を記述することもできる。ニューラルネットワークにおいて、畳み込み層の2つの情報指標としてランクとエントロピーを考慮し、融合結果を「情報集中」と定義する、それらの妥協点に達するための融合関数を提案する。レイヤワイズプルーニング比を事前に定義する場合、情報集中をヒューリスティックな工学的チューニングの代わりに参照として使用し、より解釈可能なソリューションを提供する。さらに、ニューラルネットワークの解釈可能性において強力なツールであるShapley値を利用して、チャネルコントリビューションを評価し、その性能を維持しながら、モデル圧縮の最も重要でないチャネルを破棄する。大規模な実験により,本手法の有効性と有望な性能が示された。例えば、45.5%のFLOPを削減し、CIFAR-10上でResNet-56のパラメータを40.3%削除すると、精度が0.21%向上する。さらに,Top-1/Top-5の精度0.43%/0.11%の損失を,41.6%のFLOPを削減し,ImageNet上でResNet-50のパラメータを35.0%削除することで得る。 Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms still leave two problems unsolved: how to assign layer-wise pruning ratios properly and how to discard the least important channels with a convincing criterion. In this paper, we present a novel channel pruning approach via information theory and interpretability of neural networks. Specifically, we regard information entropy as the expected amount of information for convolutional layers. In addition, if we regard a matrix as a system of linear equations, a higher rank means that more solutions to it exist, which indicates more uncertainty. From the point of view of information theory, the rank can also describe the amount of information. In a neural network, considering the rank and entropy as two information indicators of convolutional layers, we propose a fusion function to reach a compromise between them, where the fusion results are defined as ``information concentration''. 
When pre-defining layer-wise pruning ratios, we employ the information concentration as a reference instead of heuristic and engineering tuning to provide a more interpretable solution. Moreover, we leverage Shapley values, which are a potent tool in the interpretability of neural networks, to evaluate the channel contributions and discard the least important channels for model compression while maintaining its performance. Extensive experiments demonstrate the effectiveness and promising performance of our method. For example, our method improves the accuracy by 0.21% when reducing 45.5% FLOPs and removing 40.3% parameters for ResNet-56 on CIFAR-10. Moreover, our method obtains loss in Top-1/Top-5 accuracies of 0.43%/0.11% by reducing 41.6% FLOPs and removing 35.0% parameters for ResNet-50 on ImageNet.	翻訳日:2024-09-04 16:21:29 公開日:2024-09-02
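The Shapley-value step in the abstract above can be illustrated with a minimal sketch. It computes exact Shapley values for three hypothetical channels by averaging marginal contributions over all orderings. The channel names and the `toy_accuracy` value function are invented for illustration and are not taken from the paper, which would presumably rely on an approximation, since exact enumeration is only feasible for a handful of channels.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_accuracy(kept):
    """Hypothetical accuracy of a layer given the set of kept channels."""
    score = 0.0
    if "c0" in kept:
        score += 0.5
    if "c1" in kept:
        score += 0.3
    if "c0" in kept and "c2" in kept:
        score += 0.2  # c2 only helps when c0 is also kept
    return score


channels = ["c0", "c1", "c2"]
phi = shapley_values(channels, toy_accuracy)
# phi is approximately {"c0": 0.6, "c1": 0.3, "c2": 0.1}
least_important = min(phi, key=phi.get)  # the candidate to prune first
```

The values sum to the accuracy of the full channel set, which is the efficiency property that makes Shapley values a natural way to split a layer's performance among its channels.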
# DiffLoad:拡散モデルによる電力負荷予測の不確かさの定量化 DiffLoad: Uncertainty Quantification in Electrical Load Forecasting with the Diffusion Model ( http://arxiv.org/abs/2306.01001v5 ) ライセンス: Link先を確認	Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Yi Wang,	(参考訳) 電力需要予測は、単位コミットメントや経済派遣を含む電力システムの意思決定において重要な役割を担っている。再生可能エネルギー源の統合と、新型コロナウイルスのパンデミックなどの外部イベントの発生により、負荷予測の不確実性が急速に高まっている。負荷予測の不確実性は, てんかん性不確実性と失読性不確実性という2つのタイプに分けられる。このような不確実性を分離することで、意思決定者は、その不確実性がどの程度あるかをよりよく理解し、次の意思決定に対する信頼を高めることができる。本稿では, エピステミック不確かさを推定するための拡散型Seq2Seq構造を提案し, 強靭性付加コーシー分布を用いてアレタリック不確かさを推定する。本手法は,負荷予測の精度を確保するだけでなく,2種類の不確実性を分離し,異なる負荷レベルに適用できることを示す。関連するコードは \url{https://anonymous.4open.science/r/DiffLoad-4714/} にある。 Electrical load forecasting plays a crucial role in decision-making for power systems, including unit commitment and economic dispatch. The integration of renewable energy sources and the occurrence of external events, such as the COVID-19 pandemic, have rapidly increased uncertainties in load forecasting. The uncertainties in load forecasting can be divided into two types: epistemic uncertainty and aleatoric uncertainty. Separating these types of uncertainties can help decision-makers better understand where and to what extent the uncertainty is, thereby enhancing their confidence in the following decision-making. This paper proposes a diffusion-based Seq2Seq structure to estimate epistemic uncertainty and employs the robust additive Cauchy distribution to estimate aleatoric uncertainty. Our method not only ensures the accuracy of load forecasting but also demonstrates the ability to separate the two types of uncertainties and be applicable to different levels of loads. The relevant code can be found at \url{https://anonymous.4open.science/r/DiffLoad-4714/}.	翻訳日:2024-09-04 12:51:25 公開日:2024-09-02
# 一般確率論における量子チャネルの不適合性 The incompatibility of quantum channels in general probabilistic theories ( http://arxiv.org/abs/2403.01392v3 ) ライセンス: Link先を確認	Masataka Yamada, Takayuki Miyadera,	(参考訳) 量子論において、同時に実行できない操作の集合が存在する。これらの操作の集合は非互換と呼ばれる。この不整合性の定義は一般確率論(GPT)にまで拡張されるが、合成系の定義に対する互換集合の集合の依存性は十分には研究されていない。量子チャネルについては、従来の合成系に基づいてヒルベルト空間のテンソル積を用いて整合性を定義する。しかし、GPTでは合成系は一意に定義されておらず、状態の集合は最小テンソルから最大テンソルに変化する。本稿では、通常の量子互換に加えて、エフェクト空間の合成系における最小テンソルを用いたmin-tensor-compatibilityを導入し、量子ビット上のノイズのあるアイデンティティチャネルを用いたそれらの関係について検討する。その結果、min-tensor互換チャネル対の集合は、量子互換チャネル対の集合よりも厳密に幅が広いことがわかった。さらに、運用の観点から、ほぼ量子互換のチャネル対の概念を導入する。この概念は、相性検証における相関関数がチャネルと効果の局所的再解釈によって実現される場合に対応する。ほぼ量子互換なチャネル対の集合は、すべての min-テンソル互換なチャネル対の集合よりも厳密に狭いことを実証する。 In quantum theory, there exist sets of operations that cannot be performed simultaneously. These sets of operations are referred to as incompatible. While this definition of incompatibility extends to general probabilistic theories (GPTs), the dependency of the set of compatible sets on the definition of composite systems has not been thoroughly investigated. For quantum channels, compatibility is defined using the tensor product of Hilbert spaces, based on the conventional composite system. However, in GPTs, composite systems are not uniquely defined, and the set of states can vary from the minimal tensor to the maximal tensor. In this paper, in addition to the usual quantum compatibility, we introduce min-tensor-compatibility using the minimal tensor on the composite system of effect spaces and investigate their relationship employing noisy identity channels on qubits. As a result, we found that the set of min-tensor-compatible channel pairs is strictly broader than the set of quantum-compatible channel pairs. Furthermore, we introduce the concept of almost quantum compatible pairs of channels from an operational perspective. 
This concept corresponds to cases where the correlation functions in the verification of compatibility can be realized through a channel and local reinterpretation of effects. We demonstrate that the set of all almost quantum compatible channel pairs is strictly narrower than the set of all min-tensor-compatible channel pairs.	翻訳日:2024-09-04 12:51:25 公開日:2024-09-02
# ボルンルールは計測ノイズの結果か? Is the Born rule a result of measurement noise? ( http://arxiv.org/abs/2407.03139v3 ) ライセンス: Link先を確認	Frank Torres,	(参考訳) ボルン則は、偏りのない量子測定で観測される固有状態の確率分布を主張するが、それを保持する理由はいまだ解明されていない。シュロディンガー方程式力学(英語版)によりボルン則がどのように説明されるかについて、ある測定が、ある測定固有状態が任意に小さな許容範囲に収まるまで、ランダムなゆらぎに対応する系を含む場合について論じる。時間に依存した確率的ユニタリ行列 U(t) のクラスで、この振る舞いを生成するランダムウォークダイナミクスについて述べる。また、このユニタリ行列のクラスに相当するシュロディンガー方程式における確率ポテンシャルエネルギーのクラスについても論じる。この分析は、予測されたランダムウォークメカニズムに実際に従う計測方法や、ボルンルールの確率から逸脱する信頼性の高い測定装置を設計できるかどうかなど、考慮すべきいくつかの疑問を提起する。興味深いことに、もしこのランダムウォーク機構に何らかの測定が従えば、量子系を確率的な「ノイズ」に露出させることは、単に望ましくない副作用ではなく、そのような測定の本質的な部分である。この特徴は、量子センシングと量子コンピューティングにおけるノイズの低減に影響を及ぼすであろう。これは進行中の作業の草案です。質問や提案は歓迎です。 The Born rule asserts the probability distribution of eigenstates observed in unbiased quantum measurements, but the reason it holds remains elusive. This manuscript discusses how the Born rule might be explained by Schrodinger equation dynamics, if a measurement comprises a system responding to random fluctuations until it is within an arbitrarily small tolerance of a measurement eigenstate. We describe the random walk dynamics that produce this behavior in terms of a class of time-dependent, stochastic unitary matrices U(t). We also discuss the class of stochastic potential energies in the Schrodinger equation that is equivalent to this class of unitary matrices. This analysis raises some questions worth considering, including how to determine if any measurements actually follow the predicted random walk mechanism and whether a reliable measurement apparatus could be designed that deviates from Born rule probabilities. Interestingly, if any measurements do follow this random walk mechanism, then exposing a quantum system to stochastic 'noise' is an intrinsic part of such a measurement, not merely an unwanted side effect. This characteristic would have implications for reducing noise in quantum sensing and quantum computing. This is a draft of a work in progress. 
Questions and suggestions are welcome.	翻訳日:2024-09-04 12:51:25 公開日:2024-09-02
# デルタエンジンによる仮想世界を進化させる Evolving Virtual World with Delta-Engine ( http://arxiv.org/abs/2408.05842v4 ) ライセンス: Link先を確認	Hongqiu Wu, Zekai Xu, Tianyang Xu, Shize Wei, Yan Wang, Jiale Hong, Weiqi Wu, Hai Zhao, Min Zhang, Zhezhi He,	(参考訳) 本稿では,人々が住むことができるサイバー空間である \emph{virtual world} に焦点を当てる。理想的な仮想世界は、私たちの現実世界と非常によく似ている。重要な側面の1つは、その進化する性質であり、個人が成長し、それによって客観的世界に影響を与える能力に反映されている。このような力学は予測不可能であり、既存のシステムの範囲を超えている。そこで我々は,この仮想世界を駆動する特別なエンジンである \textbf{\emph{Delta-Engine}} を提案する。 $\Delta$は、世界の進化とエンジンのスケーラビリティを関連付ける。ベースエンジンとニューラルプロキシで構成される。ベースエンジンは仮想世界のプロトタイプをプログラムし、トリガーが与えられたら、ニューラルプロキシはベースエンジン上で \emph{incremental prediction} を通じて新しいスニペットを生成する。本稿ではデルタエンジンのフルスタック導入について述べる。デルタエンジンの重要な特徴は、世界内の未知の要素へのスケーラビリティである。技術的には、ニューラルプロキシとベースエンジンの完全なコワーキング、高品質なデータとの整合性から導かれる。本稿では,ベースエンジンをプロキシに組み込むエンジン指向の微調整手法を提案する。次に、人間とLLMの協調設計を議論し、新しい興味深いデータを効率よく生成する。最終的に,デルタエンジンの性能を総合的に評価する3つの評価原則を提案する。 In this paper, we focus on the \emph{virtual world}, a cyberspace where people can live. An ideal virtual world shares great similarity with our real world. One of the crucial aspects is its evolving nature, reflected by individuals' capability to grow and thereby influence the objective world. Such dynamics are unpredictable and beyond the reach of existing systems. For this, we propose a special engine called \textbf{\emph{Delta-Engine}} to drive this virtual world. $\Delta$ associates the world's evolution with the engine's scalability. It consists of a base engine and a neural proxy. The base engine programs the prototype of the virtual world; given a trigger, the neural proxy generates new snippets on the base engine through \emph{incremental prediction}. This paper presents a full-stack introduction to the delta-engine. The key feature of the delta-engine is its scalability to unknown elements within the world. Technically, it derives from the perfect co-work of the neural proxy and the base engine, and the alignment with high-quality data. 
We introduce an engine-oriented fine-tuning method that embeds the base engine into the proxy. We then discuss a human-LLM collaborative design to produce novel and interesting data efficiently. Finally, we propose three evaluation principles to comprehensively assess the performance of a delta-engine: naive evaluation, incremental evaluation, and adversarial evaluation.	翻訳日:2024-09-04 12:51:25 公開日:2024-09-02
# 人中心自律意思決定システムのための信頼と責任のあるAI Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems ( http://arxiv.org/abs/2408.15550v2 ) ライセンス: Link先を確認	Farzaneh Dehghani, Mahsa Dibaji, Fahim Anzum, Lily Dey, Alican Basdemir, Sayeh Bayat, Jean-Christophe Boucher, Steve Drew, Sarah Elaine Eaton, Richard Frayne, Gouri Ginde, Ashley Harris, Yani Ioannou, Catherine Lebel, John Lysack, Leslie Salgado Arzuaga, Emma Stanley, Roberto Souza, Ronnie de Souza Santos, Lana Wells, Tyler Williamson, Matthias Wilms, Zaman Wahid, Mark Ungrin, Marina Gavrilova, Mariana Bento,	(参考訳) 人工知能(AI)は、革命的な意思決定プロセスの道を開いた。しかしながら、ブラックボックスの性質は、バイアスと透明性に関連する重要な倫理的課題を呈している。 AIアプリケーションはバイアスに大きく影響され、一貫性がなく信頼性の低い結果を示し、大きなコストと結果をもたらし、不平等を強調し、リソースへのアクセスを不平等にします。したがって、安全で信頼性があり、倫理的で信頼できるAIシステムの開発が不可欠である。カルガリー大学のトランスディシプリナリー・スカラーシップ・イニシアチブ(Transdisciplinary Scholarship Initiative)の一部である、Trustworthy and Responsible AIと協力する我々の研究チームは、公正さ、バイアス緩和、再現性、一般化、解釈可能性、信頼性などを含む、Trustworthy and Responsible AIの研究を行っています。本稿では,AIバイアス,定義,検出と緩和の方法,およびバイアスを評価するメトリクスの複雑さをレビューし,議論する。また、人間中心の意思決定のさまざまな領域におけるAIの信頼性と広範な適用に関するオープンな課題や、責任と信頼に値するAIモデルを育成するためのガイドラインについても論じる。 Artificial Intelligence (AI) has paved the way for revolutionary decision-making processes, which if harnessed appropriately, can contribute to advancements in various sectors, from healthcare to economics. However, its black box nature presents significant ethical challenges related to bias and transparency. AI applications are hugely impacted by biases, presenting inconsistent and unreliable findings, leading to significant costs and consequences, highlighting and perpetuating inequalities and unequal access to resources. Hence, developing safe, reliable, ethical, and Trustworthy AI systems is essential. 
Our team of researchers working with Trustworthy and Responsible AI, part of the Transdisciplinary Scholarship Initiative within the University of Calgary, conducts research on Trustworthy and Responsible AI, including fairness, bias mitigation, reproducibility, generalization, interpretability, and authenticity. In this paper, we review and discuss the intricacies of AI biases, definitions, methods of detection and mitigation, and metrics for evaluating bias. We also discuss open challenges with regard to the trustworthiness and widespread application of AI across diverse domains of human-centric decision making, as well as guidelines to foster Responsible and Trustworthy AI models.	翻訳日:2024-09-04 12:43:33 公開日:2024-09-02
# 大規模言語モデルを用いた説得ゲーム Persuasion Games using Large Language Models ( http://arxiv.org/abs/2408.15879v2 ) ライセンス: Link先を確認	Ganesh Prasath Ramani, Shirish Karande, Santhosh V, Yash Bhatia,	(参考訳) 大型言語モデル (LLM) は、人間のような文章を解釈し、生成することのできる、恐ろしい道具として登場した。本稿では,LCMがユーザ視点を形作り,その決定を特定のタスクに影響を及ぼす可能性について考察する。この機能は、投資、クレジットカード、保険といった様々な分野のアプリケーションを見つけ、適切な保険政策、投資計画、クレジットカード、小売、そして行動変革支援システム(BCSS)のユーザーを支援する。エージェントのコンソーシアムが協調的に動作する高度なマルチエージェントフレームワークを提案する。主エージェントは説得対話を通じて直接ユーザエージェントと対話し、補助エージェントは情報検索、応答分析、説得戦略の開発、事実の検証を行う。我々の実験から得られた実証的な証拠は、この協調手法がLLMの説得力を高めることを証明している。ユーザエージェントの説得力に対する抵抗を継続的に分析し、ルールベースとLCMベースの抵抗パーポーションマッピング技術を組み合わせて対処する。我々は、シミュレートされたペルソナを採用し、保険、銀行、小売ドメインで会話を生成し、さまざまなパーソナタイプを認識し、調整し、影響を与える大規模言語モデル(LLM)の能力を評価する。同時に, LLMシミュレートされたペルソナの抵抗機構について検討した。説得は、対話前後の計測可能な調査、会話におけるLLM生成スコア、およびユーザ決定(購入または非購入)によって定量化される。 Large Language Models (LLMs) have emerged as formidable instruments capable of comprehending and producing human-like text. This paper explores the potential of LLMs, to shape user perspectives and subsequently influence their decisions on particular tasks. This capability finds applications in diverse domains such as Investment, Credit cards and Insurance, wherein they assist users in selecting appropriate insurance policies, investment plans, Credit cards, Retail, as well as in Behavioral Change Support Systems (BCSS). We present a sophisticated multi-agent framework wherein a consortium of agents operate in collaborative manner. The primary agent engages directly with user agents through persuasive dialogue, while the auxiliary agents perform tasks such as information retrieval, response analysis, development of persuasion strategies, and validation of facts. Empirical evidence from our experiments demonstrates that this collaborative methodology significantly enhances the persuasive efficacy of the LLM. 
We continuously analyze the resistance of the user agent to persuasive efforts and counteract it by employing a combination of rule-based and LLM-based resistance-persuasion mapping techniques. We employ simulated personas and generate conversations in insurance, banking, and retail domains to evaluate the proficiency of large language models (LLMs) in recognizing, adjusting to, and influencing various personality types. Concurrently, we examine the resistance mechanisms employed by LLM-simulated personas. Persuasion is quantified via measurable surveys before and after interaction, LLM-generated scores on the conversation, and user decisions (purchase or non-purchase).	翻訳日:2024-09-04 12:43:33 公開日:2024-09-02
# データ効率の良い一般化は基礎モデルのバイアスを悪化させるか? Does Data-Efficient Generalization Exacerbate Bias in Foundation Models? ( http://arxiv.org/abs/2408.16154v2 ) ライセンス: Link先を確認	Dilermando Queiroz, Anderson Carlos, Maíra Fatoretto, Luis Filipe Nakayama, André Anjos, Lilian Berton,	(参考訳) ファンデーションモデルは、様々なドメインでラベル効率を持つ堅牢なモデルとして登場した。医用画像では,ラベル付きデータの取得が困難であるため,診断の進歩に寄与する。しかし、事前学習中に機密属性の存在に偏った大量のラベル付きデータを使用することが、モデルの公平性に影響を与えるかどうかは不明である。本研究は,ブラジルの多ラベル眼科学データセット(BRSET)を微調整する際のファンデーションモデル(RetFound)のバイアスについて検討する。モデル評価は、教師付き学習と比較して、基礎モデルが、最大AUCと最低AUCの男女・年齢グループ間のギャップを減じる可能性を示唆している。しかし、データ効率の一般化では、データ量が減少するとバイアスが増大する。これらの結果は,データ制限のある実生活シナリオにファンデーションモデルをデプロイする場合,公平性の問題の可能性を検討する必要があることを示唆している。 Foundation models have emerged as robust models with label efficiency in diverse domains. In medical imaging, these models contribute to the advancement of medical diagnoses due to the difficulty in obtaining labeled data. However, it is unclear whether using a large amount of unlabeled data, biased by the presence of sensitive attributes during pre-training, influences the fairness of the model. This research examines the bias in the Foundation model (RetFound) when it is applied to fine-tune the Brazilian Multilabel Ophthalmological Dataset (BRSET), which has a different population than the pre-training dataset. The model evaluation, in comparison with supervised learning, shows that the Foundation Model has the potential to reduce the gap between the maximum AUC and minimum AUC evaluations across gender and age groups. However, in a data-efficient generalization, the model increases the bias when the data amount decreases. These findings suggest that when deploying a Foundation Model in real-life scenarios with limited data, the possibility of fairness issues should be considered.	翻訳日:2024-09-04 12:43:33 公開日:2024-09-02
# ロバスト制約マルコフ決定過程におけるエピグラフ形式による近似的ポリシー同定 Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form ( http://arxiv.org/abs/2408.16286v2 ) ライセンス: Link先を確認	Toshinori Kitamura, Tadashi Kozuno, Wataru Kumagai, Kenta Hoshino, Yohei Hosoe, Kazumi Kasaura, Masashi Hamaya, Paavo Parmas, Yutaka Matsuo,	(参考訳) 不確実な環境に対する安全なポリシーを設計することは、現実世界の制御アプリケーションにおいて不可欠である。しかし、この課題はマルコフ決定プロセス(MDP)フレームワークの中では不十分である。本稿では, 環境全体にわたる最悪のシナリオにおける制約を満足しつつ, 累積コストを最小化する, 頑健な制約付きMDP (RCMDP) における準最適ポリシを同定できるアルゴリズムを提案する。まず、政策勾配法による従来のラグランジアン最大ミン定式化は、その内部最小化問題における目的関数と制約関数との矛盾する勾配の和に遭遇することによって、最適解に閉じ込められることを証明した。この問題に対処するために、RCMDP問題のエピグラフ形式を活用し、目的あるいは制約のいずれかから単一の勾配を選択することで競合を解決する。エピグラフ形式に基づいて、ポリシー勾配のサブルーチンを持つバイナリ探索アルゴリズムを提案し、ポリシー評価を$\tilde{\mathcal{O}}(\varepsilon^{-4})$でRCMDPで$\varepsilon$-optimal Policyを識別することを証明した。 Designing a safe policy for uncertain environments is crucial in real-world control applications. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm capable of identifying a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional Lagrangian max-min formulation with policy gradient methods can become trapped in suboptimal solutions by encountering a sum of conflicting gradients from the objective and constraint functions during its inner minimization problem. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a binary search algorithm with a policy gradient subroutine and prove that it identifies an $\varepsilon$-optimal policy in an RCMDP with $\tilde{\mathcal{O}}(\varepsilon^{-4})$ policy evaluations.	
翻訳日:2024-09-04 12:43:33 公開日:2024-09-02
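The epigraph idea in the abstract above can be sketched on a toy problem. Instead of forming a Lagrangian, one searches for the smallest threshold b such that some candidate x satisfies both objective(x) <= b and constraint(x) <= 0, and the inner step minimizes max(objective(x) - b, constraint(x)). Everything concrete below is an assumption made for illustration: the finite candidate grid, the two-element environment set `ENVS`, the cost and budget functions, and the brute-force inner minimization, which stands in for the paper's policy gradient subroutine.

```python
def epigraph_binary_search(candidates, objective, constraint, lo, hi, tol=1e-6):
    """Binary search for the smallest threshold b such that some candidate x
    satisfies objective(x) <= b and constraint(x) <= 0."""

    def inner(b):
        # Inner problem of the epigraph form, solved here by brute force.
        return min(candidates, key=lambda x: max(objective(x) - b, constraint(x)))

    best = None
    while hi - lo > tol:
        b = (lo + hi) / 2
        x = inner(b)
        if max(objective(x) - b, constraint(x)) <= 0:
            best, hi = x, b  # threshold b is achievable: try a smaller one
        else:
            lo = b  # threshold b is too ambitious: raise the lower bound
    return best, hi


# Hypothetical uncertainty set: each environment scales cost and resource use.
ENVS = [0.8, 1.2]


def worst_cost(x):
    """Worst-case cost over the environment set."""
    return max(e * (1 - x) ** 2 for e in ENVS)


def worst_violation(x):
    """Worst-case constraint value; feasible when <= 0 (budget of 0.65)."""
    return max(e * x for e in ENVS) - 0.65


grid = [i / 10 for i in range(11)]
best_x, best_b = epigraph_binary_search(grid, worst_cost, worst_violation, 0.0, 2.0)
# best_x is 0.5 and best_b is approximately 0.3 on this toy problem
```

Each inner step looks at only one of the two terms inside the max at a time, which mirrors the abstract's remark that the epigraph form selects a single gradient from either the objective or the constraints rather than summing conflicting ones.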
# ディープラーニングに基づくラベルなし非参照画像品質評価基準:ナトリウムMRIデノイングへの応用 A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising ( http://arxiv.org/abs/2408.16481v2 ) ライセンス: Link先を確認	Shuaiyu Yuan, Tristan Whitmarsh, Dimitri A Kessler, Otso Arponen, Mary A McLean, Gabrielle Baxter, Frank Riemer, Aneurin J Kennerley, William J Brackenbury, Fiona J Gilbert, Joshua D Kaggie,	(参考訳) ナトリウムMRIのような新しい多核MRI技術は、本質的に低信号のため、画像品質が低下することが多い。画像のデノナイジングのような後処理手法は、画像の強調のために開発されている。しかし,これらの強調画像の評価は,特にナトリウムMRIのような高解像度・高信号画像が参照されていない場合を考えると困難である。非参照画像品質評価(NR-IQA)メトリクスは、この問題を解決するためのアプローチである。既存の学習ベースのNR-IQAメトリクスは、主観的な人間の意見から派生したラベルや、SNR(Signal-to-Noise Ratio)のようなメトリクスに依存している。深層学習(DL)モデルは特徴的トレーニングセットに特化している点に特有な特徴があることに留意する。そこで本研究では,新しいDLベースのNR-IQAメトリックであるモデルスペシャライゼーション・メトリック(MSM)を提案する。 MSMは、入力画像の品質を評価するために、入力画像とモデルの予測との差を測定する。シミュレートされた歪みを加えたプロトンT1強調MR画像とデノイズされたナトリウムMR画像の両方で行った実験により、MSMは様々なシミュレートされたノイズや歪みに対して優れた評価性能を示すことが示された。 MSMはまた、専門家の評価とかなりの合意を結び、コーエンのカッパ係数の0.6528を達成し、既存のNR-IQA測定値を上回った。 New multinuclear MRI techniques, such as sodium MRI, generally suffer from low image quality due to an inherently low signal. Postprocessing methods, such as image denoising, have been developed for image enhancement. However, the assessment of these enhanced images is challenging, especially when high-resolution, high-signal reference images are lacking, as in sodium MRI. No-reference Image Quality Assessment (NR-IQA) metrics are approaches to solve this problem. Existing learning-based NR-IQA metrics rely on labels derived from subjective human opinions or metrics like Signal-to-Noise Ratio (SNR), which are either time-consuming or lack accurate ground truths, resulting in unreliable assessment. 
We note that deep learning (DL) models have a unique characteristic in that they are specialized to a characteristic training set, meaning that deviations between the input testing data from the training data will reduce prediction accuracy. Therefore, we propose a novel DL-based NR-IQA metric, the Model Specialization Metric (MSM), which does not depend on ground-truth images or labels. MSM measures the difference between the input image and the model's prediction for evaluating the quality of the input image. Experiments conducted on both simulated distorted proton T1-weighted MR images and denoised sodium MR images demonstrate that MSM exhibits a superior evaluation performance on various simulated noises and distortions. MSM also has a substantial agreement with the expert evaluations, achieving an averaged Cohen's Kappa coefficient of 0.6528, outperforming the existing NR-IQA metrics.	翻訳日:2024-09-04 12:43:33 公開日:2024-09-02
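
The abstract above reports agreement with expert evaluations as an averaged Cohen's Kappa coefficient. As a minimal sketch of how kappa is computed for two raters with nominal labels, the function below follows the standard definition (observed agreement corrected for chance agreement). The labels in the usage lines are invented for illustration and are not from the paper. How the paper averages kappa across raters or conditions is not stated in the abstract.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Invented example labels (not from the paper).
metric_labels = ["good", "good", "bad", "bad"]
expert_labels = ["good", "bad", "bad", "bad"]
kappa = cohens_kappa(metric_labels, expert_labels)
```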
# RLCP:テキスト・画像拡散モデルのための強化学習に基づく著作権保護手法 RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model ( http://arxiv.org/abs/2408.16634v2 ) ライセンス: Link先を確認	Zhuan Shi, Jing Yan, Xiaoli Tang, Lingjuan Lyu, Boi Faltings,	(参考訳) テキストから画像への生成モデルの高度化は、著作権侵害の基準と保護を定義し、強制する上で複雑な問題を引き起こしている。ウォーターマーキングやデータセットの重複といった既存の手法は、標準化されたメトリクスの欠如と拡散モデルにおける著作権侵害に対処する固有の複雑さのために、包括的なソリューションを提供できない。これらの課題に対処するため,テキスト・ツー・イメージ拡散モデルのための強化学習に基づく著作権保護手法を提案し,モデル生成データセットの品質を維持しながら著作権侵害コンテンツの生成を最小限にする。当社のアプローチは,著作権法と裁判所による侵害の先例に基づく,新たな著作権基準の導入から始まります。そこで,我々はDDPO(Denoising Diffusion Policy Optimization)フレームワークを用いて多段階の意思決定プロセスを通じてモデルを誘導し,提案した著作権基準を組み込んだ報酬関数を用いてモデルを最適化する。さらに、故障モードを緩和し、RL微調整を安定化するために、正規化用語としてKL発散を用いる。著作権と非著作権の画像の混合データセットを用いた実験により,画像品質を維持しながら著作権侵害のリスクを著しく低減することを示した。 The increasing sophistication of text-to-image generative models has led to complex challenges in defining and enforcing copyright infringement criteria and protection. Existing methods, such as watermarking and dataset deduplication, fail to provide comprehensive solutions due to the lack of standardized metrics and the inherent complexity of addressing copyright infringement in diffusion models. To deal with these challenges, we propose a Reinforcement Learning-based Copyright Protection(RLCP) method for Text-to-Image Diffusion Model, which minimizes the generation of copyright-infringing content while maintaining the quality of the model-generated dataset. Our approach begins with the introduction of a novel copyright metric grounded in copyright law and court precedents on infringement. We then utilize the Denoising Diffusion Policy Optimization (DDPO) framework to guide the model through a multi-step decision-making process, optimizing it using a reward function that incorporates our proposed copyright metric. Additionally, we employ KL divergence as a regularization term to mitigate some failure modes and stabilize RL fine-tuning. 
Experiments conducted on 3 mixed datasets of copyright and non-copyright images demonstrate that our approach significantly reduces copyright infringement risk while maintaining image quality.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02
# 連成・個別成分分析による拡散モデルにおける局所編集の実現 Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis ( http://arxiv.org/abs/2408.16845v2 ) ライセンス: Link先を確認	Theodoros Kouzelis, Manos Plitsis, Mihalis A. Nicolaou, Yannis Panagakis,	(参考訳) 拡散モデル(DM)の最近の進歩は、視覚合成と編集タスクに大きな進展をもたらし、DMをGAN(Generative Adversarial Networks)の強力な競合として確立した。しかし、DMの潜在空間はGANほどよく理解されていない。最近の研究は、意味的潜在空間の性質を示すことが知られているデノイジングネットワークのボトルネック層を活用して、DMの潜在空間における教師なしの意味発見に取り組んでいる。しかし、これらのアプローチはグローバル属性の発見に限られている。本稿では、DMにおける局所的な画像操作の課題に取り組み、事前学習されたDMのデノイジングネットワークが学習した潜在的意味を分解する教師なし手法を提案する。任意の画像と定義された関心領域が与えられたとき、デノイジングネットワークのヤコビアンを利用して、関心領域と潜在空間の対応する部分空間との関係を確立する。さらに、これらの部分空間の連成成分と個別成分を分離して、局所的な画像操作を可能にする潜在方向を同定する。一度発見された方向は、異なる画像に適用して意味的に一貫した編集を行うことができ、本手法は実用的な応用に適している。種々のデータセットに対する実験結果から,本手法は最先端手法と比べて、より局所化され、より忠実度の高いセマンティック編集を作成できることを示した。 Recent advances in Diffusion Models (DMs) have led to significant progress in visual synthesis and editing tasks, establishing them as a strong competitor to Generative Adversarial Networks (GANs). However, the latent space of DMs is not as well understood as that of GANs. Recent research has focused on unsupervised semantic discovery in the latent space of DMs by leveraging the bottleneck layer of the denoising network, which has been shown to exhibit properties of a semantic latent space. However, these approaches are limited to discovering global attributes. In this paper, we address the challenge of local image manipulation in DMs and introduce an unsupervised method to factorize the latent semantics learned by the denoising network of pre-trained DMs. Given an arbitrary image and defined regions of interest, we utilize the Jacobian of the denoising network to establish a relation between the regions of interest and their corresponding subspaces in the latent space. Furthermore, we disentangle the joint and individual components of these subspaces to identify latent directions that enable local image manipulation. 
Once discovered, these directions can be applied to different images to produce semantically consistent edits, making our method suitable for practical applications. Experimental results on various datasets demonstrate that our method can produce semantic edits that are more localized and have better fidelity compared to the state-of-the-art.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02
# 深部畳み込みニューラルネットの疾患分類と画像モダリティにおける異種医療画像データセットへの影響 Disease Classification and Impact of Pretrained Deep Convolution Neural Networks on Diverse Medical Imaging Datasets across Imaging Modalities ( http://arxiv.org/abs/2408.17011v2 ) ライセンス: Link先を確認	Jutika Borah, Kumaresh Sarmah, Hidam Kumarjit Singh,	(参考訳) 胸部X線、全スライド画像、光コヒーレンス断層撮影などのイメージング技術は、それぞれ様々な医学的肺および眼疾患のスクリーニングおよび検出に役立っている。本稿では,2進分類と多進分類の異なる多種多様な医用画像データセット間での伝達学習を伴う,事前訓練された深部畳み込みニューラルネットワークの使用の複雑さについて検討する。我々は10のネットワークアーキテクチャとモデルファミリーを用いて総合的な性能解析を行い、それぞれ事前学習とランダム初期化を行った。その結果,固定特徴抽出器として事前訓練したモデルを用いることで,データセットに関係なく性能が低下することが判明した。対照的に、病理組織学のスライド画像全体の顕微鏡観察により、より良い性能が得られる。また、より深く複雑なアーキテクチャが必ずしも最高のパフォーマンスをもたらすとは限らないことも判明した。この観察は、ImageNetの改善が医療画像タスクと平行ではないことを示唆している。医療領域内では、ネットワークアーキテクチャのパフォーマンスは、データセットのシフトを伴うモデルファミリによって異なる。これは、特定のモダリティ内のモデルのパフォーマンスが、同じ領域内の別のモダリティに対して決定的でないことを示している。本研究は, 医用画像における深層学習技術の応用についてより深く理解し, 5つの異なる実験環境下での, 異なる医用画像データセットにおける事前学習ネットワークの影響を明らかにする。 Imaging techniques such as Chest X-rays, whole slide images, and optical coherence tomography serve as the initial screening and detection for a wide variety of medical pulmonary and ophthalmic conditions respectively. This paper investigates the intricacies of using pretrained deep convolutional neural networks with transfer learning across diverse medical imaging datasets with varying modalities for binary and multiclass classification. We conducted a comprehensive performance analysis with ten network architectures and model families each with pretraining and random initialization. Our finding showed that the use of pretrained models as fixed feature extractors yields poor performance irrespective of the datasets. Contrary, histopathology microscopy whole slide images have better performance. It is also found that deeper and more complex architectures did not necessarily result in the best performance. This observation implies that the improvements in ImageNet are not parallel to the medical imaging tasks. 
Within a medical domain, the performance of the network architectures varies within model families with shifts in datasets. This indicates that the performance of models within a specific modality may not be conclusive for another modality within the same domain. This study provides a deeper understanding of the applications of deep learning techniques in medical imaging and highlights the impact of pretrained networks across different medical imaging datasets under five different experimental settings.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02
# シュロディンガーダイナミクスを用いた量子計算におけるユニタリゲートの研究と実装 Study And Implementation of Unitary Gates in Quantum Computation Using Schrodinger Dynamics ( http://arxiv.org/abs/2408.17035v2 ) ライセンス: Link先を確認	Kumar Gautam,	(参考訳) この論文は、原子や振動子などの物理系を電場や磁場で摂動させることによって量子ゲートを実現するという概念を探求している。基本的な考え方は、時間非依存のハミルトニアン $H_0$ を $f(t)V$ という形の時変ハミルトニアンで摂動させると($f(t)$ は時間のスカラー関数、$V$ は $H_0$ と可換でないエルミート作用素)、時変ハミルトニアン $H_0+f(t)V$ に対応するシュロディンガー進化によってユニタリ作用素の大きなクラスが実現できるということである。これはリー群とリー代数におけるベーカー・キャンベル・ハウスドルフの公式の帰結である。本論文はこの考えに基づく2つの問題を扱う。まず、調和振動子を時間に依存しない非調和項で摂動させ、$U_g=e^{-\iota T H_1}$ を計算する。次に、調和ハミルトニアンを線形な時間依存項で摂動させ、時刻 $T$ における $H(t)$ に対応するユニタリ進化を計算する。このゲートは $U(T)=U(T,\epsilon,f)=T\{e^{-\iota\int_0^TH(t)dt}\}$ と表すことができる。非調和ゲート $U_g$ は、制御ユニタリゲートや量子フーリエ変換ゲートなど、量子計算でよく使われる多くのゲートに置き換えられ、制御電場は適切に選択される。この論文はまた可制御性の問題にも取り組み、任意の初期波動関数 $|\psi_i\rangle$ と任意の最終波動関数 $|\psi_f\rangle$ に対して $U(T,f)|\psi_i\rangle=|\psi_f\rangle$ を満たす時間のスカラー実数値関数 $f(t), 0\leq t\leq T$ が存在するための条件を調べる。ユニタリ進化核をその打ち切ったダイソン級数で置き換えることにより、部分解が得られた。すべての設計手順において、現れるゲートは無限次元であり、原子と電磁場の相互作用は制御可能な時間の関数によって変調される。 This thesis explores the concept of realizing quantum gates using physical systems like atoms and oscillators perturbed by electric and magnetic fields. The basic idea is that if a time-independent Hamiltonian $H_0$ is perturbed by a time-varying Hamiltonian of the form $f(t)V$, where $f(t)$ is a scalar function of time and $V$ is a Hermitian operator that does not commute with $H_0$, then a large class of unitary operators can be realized via the Schrodinger evolution corresponding to the time-varying Hamiltonian $H_0+f(t)V$. This is a consequence of the Baker-Campbell-Hausdorff formula in Lie groups and Lie algebras. The thesis addresses two problems based on this idea: first, taking a Harmonic oscillator and perturbing it with a time-independent anharmonic term, and then computing $U_g=e^{-\iota T H_1}$. 
Then, perturbing the harmonic Hamiltonian with a linear time-dependent term, and calculating the unitary evolution corresponding to $H(t)$ at time $T$. This gate can be expressed as $U(T)=U(T,\epsilon,f)=T\{e^{-\iota\int_0^TH(t)dt}\}$. The anharmonic gate $U_g$ is replaced by a host of commonly used gates in quantum computation, such as controlled unitary gates and quantum Fourier transform gates. The control electric field is selected appropriately. The thesis also addresses the controllability issue, determining under what conditions there exists a scalar real valued function of time $f(t), 0\leq t\leq T$ such that if $|\psi_i\rangle$ is any initial wave function and $|\psi_f\rangle$ is any final wave function, then $U(T,f)|\psi_i\rangle=|\psi_f\rangle$. A partial solution was obtained by replacing the unitary evolution kernel by its Dyson series truncated version. In all design procedures, the gates that appear are infinite-dimensional, with an interaction between the atom and the electromagnetic field modulated by a controllable function of time.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02
# 敵対的一貫性蒸留による即時敵対的浄化 Instant Adversarial Purification with Adversarial Consistency Distillation ( http://arxiv.org/abs/2408.17064v2 ) ライセンス: Link先を確認	Chun Tong Lei, Hon Ming Yam, Zhongliang Guo, Chun Pong Lau,	(参考訳) ニューラルネットワークは、画像分類を含む幅広い応用において顕著な性能を示す一方で、微妙な敵対的ノイズに弱いことも知られている。 DiffPureのような拡散ベースの浄化法が提案されているが、これらは時間を要する。本稿では,拡散モデルにおける1回のニューラルファンクション評価(NFE)で敵対的画像を浄化できる拡散ベースの浄化モデルである One Step Control Purification (OSCP)を提案する。一段階の浄化にはLCM(Latent Consistency Model)とControlNetを使用する。 OSCPは,他の拡散ベースの浄化法に比べて計算負荷が小さく時間効率がよく,ImageNetで74.19%の防御成功率を,1回の浄化あたりわずか0.1秒で達成する。さらに, 一貫性蒸留と敵対的摂動の間には根本的な不整合がある。この不整合に対処するため, 自然多様体と敵対的多様体を効果的に橋渡しし, 潜在空間のダイナミクスをよりきめ細かく調和させる新しい一貫性蒸留フレームワークであるガウス敵対的ノイズ蒸留(GAND)を提案する。実験の結果, GAND はフルファインチューニング(FFT)を必要とせず, LoRA などのPEFTで十分であることがわかった。 Neural networks, despite their remarkable performance in widespread applications, including image classification, are also known to be vulnerable to subtle adversarial noise. Although some diffusion-based purification methods have been proposed, for example, DiffPure, those methods are time-consuming. In this paper, we propose One Step Control Purification (OSCP), a diffusion-based purification model that can purify the adversarial image in one Neural Function Evaluation (NFE) in diffusion models. We use Latent Consistency Model (LCM) and ControlNet for our one-step purification. OSCP is computationally friendly and time efficient compared to other diffusion-based purification methods; we achieve defense success rate of 74.19% on ImageNet, only requiring 0.1s for each purification. Moreover, there is a fundamental incongruence between consistency distillation and adversarial perturbation. To address this ontological dissonance, we propose Gaussian Adversarial Noise Distillation (GAND), a novel consistency distillation framework that facilitates a more nuanced reconciliation of the latent space dynamics, effectively bridging the natural and adversarial manifolds. 
Our experiments show that the GAND does not need a Full Fine Tune (FFT); PEFT, e.g., LoRA is sufficient.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02
# RISSOLE:ブロックワイズ生成と検索誘導によるパラメータ効率拡散モデル RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance ( http://arxiv.org/abs/2408.17095v2 ) ライセンス: Link先を確認	Avideep Mukherjee, Soumya Banerjee, Piyush Rai, Vinay P. Namboodiri,	(参考訳) 拡散ベースのモデルは素晴らしい生成能力を誇示している。しかし、それらには膨大な数のパラメータがあり、結果としてモデルのサイズが膨大になるため、リソース制約のあるデバイスへのデプロイには適さない。ブロックワイズ生成は、画像全体を一度に生成するのではなく、一度に1ブロックを生成できるため、コンパクトな(パラメータ効率のよい)深層生成モデルを設計する上で有望な代替となる。しかし、生成したブロック間のコヒーレンスを確保することは簡単ではないため、ブロックワイズ生成もかなり難しい。そこで我々は,RAGモジュールによって検索された画像の対応するブロックを利用して,ブロックワイズ拡散モデルのトレーニングおよび生成段階を条件に,検索拡張生成(RAG)アプローチを設計する。我々の条件付きスキームは、訓練中に異なるブロックをまたがってコヒーレンスを保証し、その結果、世代間でコヒーレンスを保証します。ベースモデルとして潜在拡散モデル(LDM)を用いて,本手法を実証するが,他のデノナイジング拡散モデルと併用することができる。本稿では,提案手法によるコヒーレンス問題の解法を検証するために,モデルサイズがコンパクトで生成品質に優れたアプローチの有効性を実証するための実体実験を報告する。 Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes, thus making them unsuitable for deployment on resource-constraint devices. Block-wise generation can be a promising alternative for designing compact-sized (parameter-efficient) deep generative models since the model can generate one block at a time instead of generating the whole image at once. However, block-wise generation is also considerably challenging because ensuring coherence across generated blocks can be non-trivial. To this end, we design a retrieval-augmented generation (RAG) approach and leverage the corresponding blocks of the images retrieved by the RAG module to condition the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation. While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models. 
We validate the solution of the coherence problem through the proposed approach by reporting substantive experiments to demonstrate our approach's effectiveness in compact model size and excellent generation quality.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02
# LSMS:医療画像参照セグメンテーションのための言語誘導型大規模メドセグメンタ LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation ( http://arxiv.org/abs/2408.17347v2 ) ライセンス: Link先を確認	Shuyi Ouyang, Jinyang Zhang, Xiangye Lin, Xilai Wang, Qingqing Chen, Yen-Wei Chen, Lanfen Lin,	(参考訳) 従来の医用画像分割法は、診断と治療のための特定の病変の特定を医師に促すのに不十分である。テキストを指導形式として利用することにより、与えられた言語表現に基づいて画像中の特定の病変をセグメント化する必要があるMIRS(Medicical Image Referring Segmentation)と呼ばれる新しいタスクを導入する。医用画像のさまざまな対象スケールのため、MIRSは、言語指導の下での正確な位置決めとセグメンテーションのために、堅牢な視覚言語モデリングと包括的マルチスケールインタラクションを要求する。しかし、これらの要求を満たすために既存の医用画像分割法は不足しており、セグメント化の精度は不十分である。言語誘導型スケール認識型MedSegmentor (LSMS) というアプローチを提案し,(1) 多様な畳み込みカーネルを利用して,豊富な視覚的知識を習得し,言語的特徴と密接な相互作用を行うことにより,病変の局所化能力を向上させる。(2) - 複数スケールのマルチモーダル特徴をグローバルにモデル化するフルスケールデコーダ。 MIRSに適したデータセットが欠如していることに対処し、RefHL-Seg(Reference Hepatic Lesion Segmentation)と呼ばれるビジョン言語医療データセットを構築した。本データセットは,231例の腹部CTスライス2,283例からなる。各種データセットにおけるMIRSと従来の医用画像分割作業における LSMS の性能を検証した。 LSMSは計算コストが低いすべてのデータセットで一貫してパフォーマンスが向上します。コードとデータセットがリリースされる。 Conventional medical image segmentation methods have been found inadequate in facilitating physicians with the identification of specific lesions for diagnosis and treatment. Given the utility of text as an instructional format, we introduce a novel task termed Medical Image Referring Segmentation (MIRS), which requires segmenting specified lesions in images based on the given language expressions. Due to the varying object scales in medical images, MIRS demands robust vision-language modeling and comprehensive multi-scale interaction for precise localization and segmentation under linguistic guidance. However, existing medical image segmentation methods fall short in meeting these demands, resulting in insufficient segmentation accuracy. 
In response, we propose an approach named Language-guided Scale-aware MedSegmentor (LSMS), incorporating two appealing designs: (1)~a Scale-aware Vision-Language Attention module that leverages diverse convolutional kernels to acquire rich visual knowledge and interact closely with linguistic features, thereby enhancing lesion localization capability; (2)~a Full-Scale Decoder that globally models multi-modal features across various scales, capturing complementary information between scales to accurately outline lesion boundaries. Addressing the lack of suitable datasets for MIRS, we constructed a vision-language medical dataset called Reference Hepatic Lesion Segmentation (RefHL-Seg). This dataset comprises 2,283 abdominal CT slices from 231 cases, with corresponding textual annotations and segmentation masks for various liver lesions in images. We validated the performance of LSMS for MIRS and conventional medical image segmentation tasks across various datasets. Our LSMS consistently outperforms on all datasets with lower computational costs. The code and datasets will be released.	翻訳日:2024-09-04 12:24:11 公開日:2024-09-02

# MaskMol: アクティビティ・クリフのための知識誘導型分子画像事前学習フレームワーク

MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs ( http://arxiv.org/abs/2409.12926v1 )

ライセンス: Link先を確認

Zhixiang Cheng, Hongxin Xiang, Pengsen Ma, Li Zeng, Xin Jin, Xixi Yang, Jianxin Lin, Yang Deng, Bosheng Song, Xinxin Feng, Changhui Deng, Xiangxiang Zeng,

(参考訳) 活性崖(英: Activity cliffs)は、構造的に類似しているが、その機能に顕著な違いを示す分子の対を指すもので、モデル表現の崩壊を招き、モデルを区別することが困難になる。我々の研究は、分子の類似性が増加するにつれて、グラフベースの手法はこれらのニュアンスを捉えるのに苦労する一方で、画像ベースのアプローチは事実上区別を保っていることを示唆している。そこで我々は知識誘導型分子画像自己教師型学習フレームワークMaskMolを開発した。 MaskMolは、原子、結合、サブ構造といった複数のレベルの分子知識を考慮し、分子画像の表現を正確に学習する。ピクセルマスキングタスクを利用することで、MaskMolは分子画像からきめ細かい情報を抽出し、微妙な構造変化を特定するために既存のディープラーニングモデルの限界を克服する。実験結果から,20のマクロ分子標的における活動崖推定と複合機能予測におけるMaskMolの精度と伝達性を示し,25の最先端ディープラーニングおよび機械学習アプローチを上回った。可視化分析は、活動崖関連分子サブ構造を同定する上で、MaskMolの高い生物学的解釈可能性を明らかにする。特に MaskMol を用いて腫瘍治療に用いられる候補EP4阻害剤を同定した。本研究は, 活動崖に対する意識を高めるだけでなく, 分子画像表現学習と仮想スクリーニング, 薬物発見の進展, 構造活性関係(SAR)の新たな洞察をもたらす新しい手法も導入する。

Activity cliffs, which refer to pairs of molecules that are structurally similar but show significant differences in their potency, can lead to model representation collapse and make the model challenging to distinguish them. Our research indicates that as molecular similarity increases, graph-based methods struggle to capture these nuances, whereas image-based approaches effectively retain the distinctions. Thus, we developed MaskMol, a knowledge-guided molecular image self-supervised learning framework. MaskMol accurately learns the representation of molecular images by considering multiple levels of molecular knowledge, such as atoms, bonds, and substructures. By utilizing pixel masking tasks, MaskMol extracts fine-grained information from molecular images, overcoming the limitations of existing deep learning models in identifying subtle structural changes. Experimental results demonstrate MaskMol's high accuracy and transferability in activity cliff estimation and compound potency prediction across 20 different macromolecular targets, outperforming 25 state-of-the-art deep learning and machine learning approaches. Visualization analyses reveal MaskMol's high biological interpretability in identifying activity cliff-relevant molecular substructures. Notably, through MaskMol, we identified candidate EP4 inhibitors that could be used to treat tumors. This study not only raises awareness about activity cliffs but also introduces a novel method for molecular image representation learning and virtual screening, advancing drug discovery and providing new insights into structure-activity relationships (SAR).

翻訳日:2024-11-07 12:48:01 公開日:2024-09-02

# ニューラルネットワークのための高能率汎用光加速器

An Efficient General-Purpose Optical Accelerator for Neural Networks ( http://arxiv.org/abs/2409.12966v1 )

ライセンス: Link先を確認

Sijie Fei, Amro Eldebiky, Grace Li Zhang, Bing Li, Ulf Schlichtmann,

(参考訳) 汎用光アクセラレータ(GOA)は、低レイテンシと低エネルギー消費により、ディープニューラルネットワーク(DNN)を加速する有望なプラットフォームとして登場した。このようなアクセラレータは、通常、所定の数のインターリーブされたマッハ・ツェンダー干渉計(MZI)から構成される。しかし、このインターリーブ型アーキテクチャは、重み行列とGOAアーキテクチャのミスマッチにより、様々なサイズのニューラルネットワークを加速する際の効率が低い。本研究では,ニューラルネットワークのGOAへのマッピング効率を高めるために,ハイブリッドGOAアーキテクチャを提案する。このアーキテクチャでは、独立したMZIモジュールがマイクロリング共振器(MRR)で接続され、それらを組み合わせて大きなニューラルネットワークを効率的に処理することができる。これらのモジュールはそれぞれ、可変係数で調整された入力を持つユニタリ行列を実装している。提案アーキテクチャのパラメータは遺伝的アルゴリズムを用いて探索する。ニューラルネットワークの精度を高めるために、選択された重み行列を特異値分解(SVD)により複数のユニタリ行列に展開する。ニューラルネットワークのカーネルも、オンチップの計算リソースを使い切るように調整される。実験の結果,所定の数のMZIのもとで,提案アーキテクチャ上でのニューラルネットワークのマッピング効率は,VGG16とResnet18について,データセットCifar10およびCifar100でそれぞれ21.87%,21.20%,24.69%,25.52%向上した。また、消費エネルギーと計算遅延をそれぞれ67%以上と21%以上削減することができる。

General-purpose optical accelerators (GOAs) have emerged as a promising platform to accelerate deep neural networks (DNNs) due to their low latency and energy consumption. Such an accelerator is usually composed of a given number of interleaving Mach-Zehnder interferometers (MZIs). This interleaving architecture, however, has a low efficiency when accelerating neural networks of various sizes due to the mismatch between weight matrices and the GOA architecture. In this work, a hybrid GOA architecture is proposed to enhance the mapping efficiency of neural networks onto the GOA. In this architecture, independent MZI modules are connected with microring resonators (MRRs), so that they can be combined to process large neural networks efficiently. Each of these modules implements a unitary matrix with inputs adjusted by tunable coefficients. The parameters of the proposed architecture are searched using a genetic algorithm. To enhance the accuracy of neural networks, selected weight matrices are expanded to multiple unitary matrices by applying singular value decomposition (SVD). The kernels in neural networks are also adjusted to use up the on-chip computational resources. Experimental results show that with a given number of MZIs, the mapping efficiency of neural networks on the proposed architecture can be enhanced by 21.87%, 21.20%, 24.69%, and 25.52% for VGG16 and Resnet18 on datasets Cifar10 and Cifar100, respectively. The energy consumption and computation latency can also be reduced by over 67% and 21%, respectively.

翻訳日:2024-11-07 12:36:59 公開日:2024-09-02

# プログラミング課題の採点において、人間はどの程度一貫しているのか?

How Consistent Are Humans When Grading Programming Assignments? ( http://arxiv.org/abs/2409.12967v1 )

ライセンス: Link先を確認

Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi,

(参考訳) 学生に一貫した総括的評価を提供することは重要である。与えられる成績は、大学での進級や将来のキャリアの見通しに影響するからである。小規模なコホートは通常、クラスリーダーのような単一の評価者によって評価されるが、大規模なコホートは複数の評価者によって評価されることが多く、採点が一貫しなくなるリスクが高まる。プログラミング課題に対する人間の採点の一貫性を調べるために、我々は28人の参加者それぞれに、CS1入門レベルのJava課題40件を採点してもらい、正確性、コードのエレガンス、可読性、ドキュメントについて成績とフィードバックを提供してもらった。40件の課題は20件ずつの2つのバッチに分割した。2つ目のバッチでは、1つ目のバッチから課題を1件複製し、個々の評価者の内部一貫性を分析した。グループの評価者間信頼性はクリッペンドルフの $\alpha$ で測定した。評価に基づいて暫定的な結論を出すには $\alpha > 0.667$ が推奨される。我々のグループは一貫しておらず、正確性の採点では平均 $\alpha = 0.2$、コードのエレガンス、可読性、ドキュメントでは平均 $\alpha < 0.1$ であった。採点者個人の一貫性を測るため、バッチ1とバッチ2の重複課題に与えられた成績の差を測定した。課題が重複であることに気づかなかった22名の参加者のうち、正確性、コードのエレガンス、可読性、ドキュメントのすべてに同じ成績を与えたのは1名だけであった。成績差の平均は、正確性で1.79、コードのエレガンス、可読性、ドキュメントで1.6未満であった。以上の結果から、本研究における人間の採点者は、学生の成果物に与える成績について合意できず、個人としても一貫しないことが多いことが示された。これは、人間の採点を「ゴールドスタンダード」とする考え方に欠陥がある可能性を示唆し、共有ルーブリックだけでは一貫性の確保に不十分であることを浮き彫りにしている。

Providing consistent summative assessment to students is important, as the grades they are awarded affect their progression through university and future career prospects. While small cohorts are typically assessed by a single assessor, such as the class leader, larger cohorts are often assessed by multiple assessors, which increases the risk of inconsistent grading. To investigate the consistency of human grading of programming assignments, we asked 28 participants to each grade 40 CS1 introductory Java assignments, providing grades and feedback for correctness, code elegance, readability and documentation; the 40 assignments were split into two batches of 20. In the second batch of 20, we duplicated one assignment from the first to analyse the internal consistency of individual assessors. We measured the inter-rater reliability of the groups using Krippendorff's $\alpha$; an $\alpha > 0.667$ is recommended to make tentative conclusions based on the rating. Our groups were inconsistent, with an average $\alpha = 0.2$ when grading correctness and an average $\alpha < 0.1$ for code elegance, readability and documentation. To measure the individual consistency of graders, we measured the distance between the grades they awarded for the duplicated assignment in batch one and batch two. Of the 22 participants who didn't notice that the assignment was a duplicate, only one awarded the same grade for correctness, code elegance, readability and documentation. The average grade difference was 1.79 for correctness and less than 1.6 for code elegance, readability and documentation. Our results show that human graders in our study cannot agree on the grade to give a piece of student work and are often individually inconsistent, suggesting that the idea of a "gold standard" of human grading might be flawed and highlighting that a shared rubric alone is not enough to ensure consistency.

翻訳日:2024-11-07 12:36:59 公開日:2024-09-02
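
The inter-rater reliability statistic named in the abstract above, Krippendorff's $\alpha$, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code. It assumes numeric grades compared with the interval (squared-difference) distance, which the abstract does not specify, and the function name and toy grades are hypothetical. $\alpha = 1 - D_o/D_e$, where $D_o$ is the observed disagreement within each graded item and $D_e$ is the disagreement expected if grades were paired at random.

```python
from typing import Sequence


def krippendorff_alpha_interval(units: Sequence[Sequence[float]]) -> float:
    """Krippendorff's alpha with the interval metric (assumed here, not taken from the paper).

    units[u] holds the grades that different raters gave to item u.
    Items with fewer than two grades cannot be paired and are ignored.
    """
    pairable = [list(u) for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one item graded by two or more raters")

    # Observed disagreement: ordered pairs within each item, weighted by 1 / (m_u - 1).
    d_o = 0.0
    for u in pairable:
        within = sum((a - b) ** 2
                     for i, a in enumerate(u)
                     for j, b in enumerate(u) if i != j)
        d_o += within / (len(u) - 1)
    d_o /= n

    # Expected disagreement: ordered pairs over all pooled grades.
    d_e = sum((a - b) ** 2
              for i, a in enumerate(values)
              for j, b in enumerate(values) if i != j) / (n * (n - 1))
    if d_e == 0:
        return 1.0  # every grade is identical, so there is nothing to disagree about
    return 1.0 - d_o / d_e


# Toy grades (invented): two raters who disagree by one point on both items.
alpha_disagree = krippendorff_alpha_interval([[1, 2], [1, 2]])
# Toy grades (invented): two raters in perfect agreement on three items.
alpha_agree = krippendorff_alpha_interval([[1, 1], [2, 2], [3, 3]])
```

Values near 1 indicate agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement, which is why the averages of 0.2 and below 0.1 reported in the abstract fall well short of the 0.667 threshold.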

# MITHOS: 学校における社会と感情の交流を支援する対話型複合現実感トレーニング

MITHOS: Interactive Mixed Reality Training to Support Professional Socio-Emotional Interactions at Schools ( http://arxiv.org/abs/2409.12968v1 )

ライセンス: Link先を確認

Lara Chehayeb, Chirag Bhuvaneshwara, Manuel Anglet, Bernhard Hilpert, Ann-Kristin Meyer, Dimitra Tsovaltzi, Patrick Gebhard, Antje Biermann, Sinah Auchtor, Nils Lauinger, Julia Knopf, Andreas Kaiser, Fabian Kersting, Gregor Mehlmann, Florian Lingenfelser, Elisabeth André,

(参考訳) 対立状態に苦しむ教師は、しばしば恥と自白を経験するが、それは無能感に関係しているが、怒りとして外在することがある。混合信号のセンシングは、感情の規制を発達させるための緊急規則に失敗し、生徒が自分の感情を混乱させ、感情の規制を妨げてしまう可能性がある。したがって、感情の個々の経験に利益をもたらすだけでなく、効果的な対人感情の規制を育み、状況の管理方法に影響を与えることができる。 MITHOSは、教室衝突時の現実的なシチュエーション学習の機会を通じて、教師のコンフリクト解決スキルを訓練することを目的としたシステムである。 4つの段階において、MITHOSは教師の社会的感情的自己認識、視点決定、肯定的な態度をサポートする。以下に示す。イ自由な社会的相互作用を訓練し、相互の学生・エージェントの反応から自然の社会的フィードバックを受ける安全な仮想環境 b) アバターを通した空間的状況的視点 c) 共同規制プロセスによる感情経験に関する個別の仮想リフレクションガイダンス d)専門的行動戦略に関する専門家のフィードバック。本章は、半自動ウィザード・オブ・オズ(WoZ)システムにおける4つのステージとその実装について述べる。 WoZシステムは、完全に自動化されたハイブリッド(機械学習とモデルベース)システムの開発に使用されるデータを収集し、基礎となる心理的およびコンフリクト解決モデルを検証する。本稿では、シナリオリアリズムの観点からアプローチを検証する結果と、行動類似性を伴う自己認識の先行者に対する外部アバター類似性の効果の体系的検証について述べる。この章は、人間中心で一般化可能なXRのための学際的な研究を行うための共通の方法論に貢献し、それをサポートするように設計されたシステムを提示している。

Teachers in challenging conflict situations often experience shame and self-blame, which relate to the feeling of incompetence but may externalise as anger. Sensing mixed signals fails the contingency rule for developing affect regulation and may result in confusion for students about their own emotions and hinder their emotion regulation. Therefore, being able to constructively regulate emotions not only benefits individual experience of emotions but also fosters effective interpersonal emotion regulation and influences how a situation is managed. MITHOS is a system aimed at training teachers' conflict resolution skills through realistic situative learning opportunities during classroom conflicts. In four stages, MITHOS supports teachers' socio-emotional self-awareness, perspective-taking and positive regard. It provides: a) a safe virtual environment to train free social interaction and receive natural social feedback from reciprocal student-agent reactions, b) spatial situational perspective taking through an avatar, c) individual virtual reflection guidance on emotional experiences through co-regulation processes, and d) expert feedback on professional behavioural strategies. This chapter presents the four stages and their implementation in a semi-automatic Wizard-of-Oz (WoZ) System. The WoZ system affords collecting data that are used for developing the fully automated hybrid (machine learning and model-based) system, and to validate the underlying psychological and conflict resolution models. We present results validating the approach in terms of scenario realism, as well as a systematic testing of the effects of external avatar similarity on antecedents of self-awareness with behavior similarity. The chapter contributes to a common methodology of conducting interdisciplinary research for human-centered and generalisable XR and presents a system designed to support it.

翻訳日:2024-11-07 12:36:59 公開日:2024-09-02

# 目を通して見る:視覚言語モデルを用いた視覚的視点の評価

Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models ( http://arxiv.org/abs/2409.12969v1 )

ライセンス: Link先を確認

Gracjan Góral, Alicja Ziarko, Michal Nauman, Maciej Wołczyk,

(参考訳) 視覚的視点取り(VPT)は、他人の視点を理解する能力であり、個人が他人の行動を予測することを可能にする。例えば、運転手は歩行者が見ているものを評価することで事故を避けることができる。人間は通常、このスキルを幼少期に開発するが、最近登場したビジョン言語モデル(VLM)がそのような能力を持っているかどうかは不明だ。さらに、これらのモデルが現実世界にますます展開されるにつれて、VPTのようなニュアンスなタスクをどのように実行するかを理解することが不可欠です。本稿では、VPTスキルをテストするために、Isle-BricksとIsle-Dotsという2つの手動でキュレートしたデータセットを導入し、それを12のVLMの評価に利用した。すべてのモデルにおいて、視点を取る必要がある場合、大幅なパフォーマンス低下が観測される。さらに、オブジェクト検出タスクのパフォーマンスはVPTタスクのパフォーマンスと相関が低く、既存のベンチマークではこの問題を理解するのに十分でない可能性があることを示唆している。コードとデータセットはhttps://sites.google.com/view/perspective-takeで確認できる。

Visual perspective-taking (VPT), the ability to understand the viewpoint of another person, enables individuals to anticipate the actions of other people. For instance, a driver can avoid accidents by assessing what pedestrians see. Humans typically develop this skill in early childhood, but it remains unclear whether the recently emerging Vision Language Models (VLMs) possess such capability. Furthermore, as these models are increasingly deployed in the real world, understanding how they perform nuanced tasks like VPT becomes essential. In this paper, we introduce two manually curated datasets, Isle-Bricks and Isle-Dots for testing VPT skills, and we use it to evaluate 12 commonly used VLMs. Across all models, we observe a significant performance drop when perspective-taking is required. Additionally, we find performance in object detection tasks is poorly correlated with performance on VPT tasks, suggesting that the existing benchmarks might not be sufficient to understand this problem. The code and the dataset will be available at https://sites.google.com/view/perspective-taking

翻訳日:2024-11-07 12:36:59 公開日:2024-09-02

# TRACE: 分散クリックストリームイベントシーケンスからのトランスフォーマーベースのユーザ表現

TRACE: Transformer-based user Representations from Attributed Clickstream Event sequences ( http://arxiv.org/abs/2409.12972v1 )

ライセンス: Link先を確認

William Black, Alexander Manlove, Jack Pennington, Andrea Marchini, Ercument Ilhan, Vilda Markeviciute,

(参考訳) 旅行eコマースのWebサイトをナビゲートするユーザにとって、製品の調査と購入のプロセスは、長い期間にわたって多くのセッションにまたがる複雑なブラウジングパターンをもたらすことが多い。結果として得られたクリックストリームデータは、これらのユーザージャーニーを年代記し、パーソナライズされたレコメンデーションを大幅に強化できる洞察を導き出す貴重な機会を提供する。本稿では,リアルタイムレコメンデーションアプリケーションのために,ライブマルチセッションクリックストリームからリッチなユーザ埋め込みを生成するためのトランスフォーマーベースの新しいアプローチであるTRACEを紹介する。 TRACEは、複数のユーザセッションにまたがるサイト全体のページビューシーケンスを活用して、長期的なエンゲージメントをモデル化する。マルチタスク学習フレームワークを用いて、TRACEは、低次元表現に蒸留された包括的なユーザの好みと意図をキャプチャする。 TRACE がバニラトランスフォーマーや LLM スタイルのアーキテクチャよりも優れていることを実ユーザ旅行の大規模な旅行eコマースデータセットに関する大規模な実験を通じて実証する。学習した埋め込みの可視化は、潜伏したユーザ状態と行動に対応する有意義なクラスタを明らかにし、ナンスなユーザインタラクションと嗜好をキャプチャしてレコメンデーションシステムを強化するTRACEの可能性を強調している。

For users navigating travel e-commerce websites, the process of researching products and making a purchase often results in intricate browsing patterns that span numerous sessions over an extended period of time. The resulting clickstream data chronicle these user journeys and present valuable opportunities to derive insights that can significantly enhance personalized recommendations. We introduce TRACE, a novel transformer-based approach tailored to generate rich user embeddings from live multi-session clickstreams for real-time recommendation applications. Prior works largely focus on single-session product sequences, whereas TRACE leverages site-wide page view sequences spanning multiple user sessions to model long-term engagement. Employing a multi-task learning framework, TRACE captures comprehensive user preferences and intents distilled into low-dimensional representations. We demonstrate TRACE's superior performance over vanilla transformer and LLM-style architectures through extensive experiments on a large-scale travel e-commerce dataset of real user journeys, where the challenges of long page-histories and sparse targets are particularly prevalent. Visualizations of the learned embeddings reveal meaningful clusters corresponding to latent user states and behaviors, highlighting TRACE's potential to enhance recommendation systems by capturing nuanced user interactions and preferences.

翻訳日:2024-11-07 12:36:59 公開日:2024-09-02

# 有限オートマタによる大規模言語モデルの宣言的統合と管理:自動化・コミュニケーション・倫理への応用

Declarative Integration and Management of Large Language Models through Finite Automata: Application to Automation, Communication, and Ethics ( http://arxiv.org/abs/2409.13693v1 )

ライセンス: Link先を確認

Thierry Petit, Arnault Pachot, Claire Conan-Vrinat, Alexandre Dubarry,

(参考訳) 本稿では,Large Language Models(LLM)を共有履歴と宣言的に組み合わせて設計した革新的なアーキテクチャを紹介する。我々のアプローチは汎用的で宣言的であり、イベント管理システムと組み合わされた有限オートマトンの構築に依存している。開発ツールは、プログラミングの最小限の労力、特にポジティブ心理学の手法をAIに統合するために、LLMの効率的で複雑な統合を容易にするために作られた。この手法の柔軟性は、自動化、コミュニケーション、倫理の応用例を通して実証される。

This article introduces an innovative architecture designed to declaratively combine Large Language Models (LLMs) with shared histories and triggers to identify the most appropriate LLM for a given task. Our approach is general and declarative, relying on the construction of finite automata coupled with an event management system. The developed tool is crafted to facilitate the efficient and complex integration of LLMs with minimal programming effort, especially, but not only, for integrating methods of positive psychology into AI. The flexibility of our technique is demonstrated through applied examples in automation, communication, and ethics.
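
As a rough illustration of the idea in this abstract (a finite automaton plus events deciding which model handles a message, with one history shared by all models), a minimal Python sketch could look like the following. Every name here (`LLMAutomaton`, `summarizer`, `ethics_checker`, the state and event labels) is hypothetical and not taken from the paper, and the two handlers are plain functions standing in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM call: receives the shared history and a message.
Handler = Callable[[List[str], str], str]


def summarizer(history: List[str], message: str) -> str:
    """Mock 'general purpose' model."""
    return f"summary({message})"


def ethics_checker(history: List[str], message: str) -> str:
    """Mock 'ethics review' model."""
    return f"ethics_review({message})"


@dataclass
class LLMAutomaton:
    """Finite automaton whose current state selects which (mock) LLM answers."""
    transitions: Dict[Tuple[str, str], str]  # (state, event) -> next state
    handlers: Dict[str, Handler]             # state -> handler active in that state
    state: str
    history: List[str] = field(default_factory=list)  # shared by all handlers

    def fire(self, event: str, message: str) -> str:
        # Follow a declared transition if one exists; otherwise stay in place.
        self.state = self.transitions.get((self.state, event), self.state)
        reply = self.handlers[self.state](self.history, message)
        self.history.append(reply)
        return reply


def build_demo() -> LLMAutomaton:
    """Declarative description of a two-state automaton (assumed example)."""
    return LLMAutomaton(
        transitions={
            ("summarize", "sensitive_topic"): "review",
            ("review", "cleared"): "summarize",
        },
        handlers={"summarize": summarizer, "review": ethics_checker},
        state="summarize",
    )
```

The transition table and the handler table are plain data, which is one way to read "declarative" here: changing the routing means editing the dictionaries, not the control flow.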

翻訳日:2024-11-07 05:57:35 公開日:2024-09-02

# プロジェクトコンフリクトを成功に導く - ステイルメイト(膠着状態)を解決するための複雑なシステム設計アプローチ

Confronting Project Conflicts into Success: a Complex Systems Design Approach to Resolving Stalemates ( http://arxiv.org/abs/2409.10549v1 )

ライセンス: Link先を確認

L. G. Teuber, A. R. M. Wolfert,

(参考訳) 今日の複雑なプロジェクト開発では、ステークホルダーがしばしば遅すぎる。また、多くの場合、システムの振る舞いのみに焦点を当て、個々の利害関係者の好みを統合しない、一方的な技術的な焦点がある。これにより、ステークホルダーは"社会的"から生まれるのではなく、"技術的"衝突に陥る。さらに、ステークホルダーは多面的な開発プロセスにしばしば関与します。したがって、システム現実とステークホルダーの利益の両方を共同合意と技術枠組みに統合する純粋に連想的かつアプリオリ的なアプローチが必要である。最先端のPreferendusは、実証済みのOpen Design Systems(Odesys)方法論に組み込まれたコンピュータ支援設計エンジンであり、成功への複雑さに直面する中立的なツールである。 Preferendusは、風力発電に関連する多くの自由度、プロジェクトの制約、および多くの利害関係者の客観的機能のための、最良の汎用ソリューションを共同で作成するためにデプロイされる。そこで本研究では, 選択型コンジョイント分析(CBCA)手法を用いて, 個々の利害関係者の重み付けを透過的に行うための構造化された利害関係者判断手法を導入する。また、個々の利害関係者選好関数の初期推定値を得ることができる。議論の余地のある外因性要因を内在的デザインパラメータとしてモデル化することにより、どの要因が技術的にも社会的にも(未)解決可能であり、利害と現実が結合しているかを示す。

In today's complex project development, stakeholders are often involved too late. There is also in many cases a one-sided technical focus that only focuses on the system's behaviour and does not integrate the individual stakeholder preferences. This locks stakeholders into a 'technical' conflict instead of being able to emerge from it 'socially'. Moreover, stakeholders are often involved a-posteriori in a multi-faceted development process which is non-transparent, leading to stalemates or even artefacts that nobody ever wants. There is thus a need for a purely associative and a-priori design-supported approach that integrates both system's reality and stakeholder's interests within a joint agreement and technical framework. The state-of-the-art Preferendus, the computer-aided design engine embedded within the proven Open Design Systems (Odesys) methodology, is a neutral tool in confronting complexity into success. The Preferendus is deployed to co-creatively generate a best-fit-for-common-purpose solution for a number of wind farm related degrees of freedom, project constraints and given a number of stakeholder objective functions. Since the Preferendus design potential for a stalemate depends strongly on stakeholder interest, importance and trust, in this paper a structured stakeholder judgement approach is introduced to transparently arrive at individual stakeholder weights using a choice-based conjoint analysis (CBCA) method. This method also allows for obtaining an initial estimate for the individual stakeholder preference functions. By modelling disputable exogenous factors as endogenous design parameters, it is also shown for which factors the stalemate problem is indeed both technically and socially (un)solvable, while interests and reality are conjoined.

翻訳日:2024-09-22 21:22:31 公開日:2024-09-02

# エージェント・ソサエティ: 現実世界の骨格と大規模言語モデルによるテクスチャの融合

Agentic Society: Merging skeleton from real world and texture from Large Language Model ( http://arxiv.org/abs/2409.10550v1 )

ライセンス: Link先を確認

Yuqi Bai, Kun Sun, Huishi Yin,

(参考訳) 大規模言語モデル(LLM)やエージェント技術の最近の進歩は、社会科学実験のシミュレーションに有望な解決策を提供するが、多くの人が必要とする実世界の人口のデータが利用可能であることは、依然として大きな課題である。本稿では,人口統計データとLCMを用いて仮想人口生成を行い,資源要件を著しく低減し,実世界のデータに関連するプライバシーコンプライアンス問題を回避し,統計的真理性を維持した新たな枠組みについて検討する。実世界の国勢調査データに基づいて,まず人口統計特性を反映したペルソナを作成した。次に、画像生成モデルに類似した手法を用いて、複雑な詳細でこれらのペルソナを豊かにするためにLLMを用いるが、テキストデータに適用する。さらに,人格特性テスト,特に,生成したペルソナの深さと現実性を高めるビッグファイブモデルに基づいて,LLMの能力に対する本手法の有効性を評価するための枠組みを提案する。予備実験と分析により,社会科学実験における多様な人間の行動のシミュレーションに不可欠な多様性を持つペルソナを創出することが実証された。しかし, 評価結果から, 現在のLSMの能力に限界があるため, 統計的真理性の弱い兆候しか得られないことが示唆された。我々の研究から得た洞察は、人間の価値観と現実の複雑さを反映することの間のLCM内の緊張も強調する。厳密で厳密なテストは、さらなる研究を求めている。私たちのコードはhttps://github.com/baiyuqi/agentic-society.gitで公開されています。

Recent advancements in large language models (LLMs) and agent technologies offer promising solutions to the simulation of social science experiments, but the availability of the real-world population data required by many of them still poses a major challenge. This paper explores a novel framework that leverages census data and LLMs to generate virtual populations, significantly reducing resource requirements and bypassing privacy compliance issues associated with real-world data, while maintaining statistical truthfulness. Drawing on real-world census data, our approach first generates a persona that reflects demographic characteristics of the population. We then employ LLMs to enrich these personas with intricate details, using techniques akin to those in image generative models but applied to textual data. Additionally, we propose a framework for the evaluation of the feasibility of our method with respect to the capability of LLMs based on personality trait tests, specifically the Big Five model, which also enhances the depth and realism of the generated personas. Through preliminary experiments and analysis, we demonstrate that our method produces personas with the variability essential for simulating diverse human behaviors in social science experiments. However, the evaluation results show that only weak signs of statistical truthfulness can be produced due to the limited capability of current LLMs. Insights from our study also highlight the tension within LLMs between aligning with human values and reflecting real-world complexities. Thorough and rigorous testing calls for further research. Our code is released at https://github.com/baiyuqi/agentic-society.git

翻訳日:2024-09-22 21:22:31 公開日:2024-09-02

# AI Literacy for All: Adjustable Interdisciplinary Socio-Technical Curriculum

AI Literacy for All: Adjustable Interdisciplinary Socio-technical Curriculum ( http://arxiv.org/abs/2409.10552v1 )

ライセンス: Link先を確認

Sri Yash Tadimalla, Mary Lou Maher,

(参考訳) 本稿では、AIの学際的理解を促進するカリキュラム「AI Literacy for All」と、その社会技術的意味、およびあらゆるレベルの教育への実践的応用について述べる。人工知能(AI)の急速な進化により、従来のAI教育カリキュラムを超えて、AIリテラシーが必要である。 AIリテラシーは、パブリックリテラシー、デザイナのための能力構築、AI概念の概念理解、ドメイン固有のスキルアップなど、さまざまな方法で概念化されている。これらの概念化のほとんどは、ChatGPTのようなジェネレーティブAI(Gen-AI)ツールの公開前に確立された。 AI教育は、AIの原則と応用に焦点を合わせ、AIの原則の熟達、これらの技術の基礎となる数学的基礎、AIソリューションを実装するために必要なプログラミングと数学的スキルを強調している。 AI Literacy for Allでは、技術的および非技術的学習結果を含むバランスの取れたカリキュラムを強調し、学際的な社会技術的文脈において、AI技術の概念的理解と批判的評価を可能にする。本稿では、AIリテラシーの4つの柱として、AIのスコープと技術的側面を理解すること、知識と責任のある方法でGen-AIと対話する方法を学ぶこと、倫理と責任のあるAIの社会技術的問題、そしてAIの社会的および将来の意味について述べる。 AI教育のすべての学習成果をコンピュータサイエンス専攻に含めることが重要であるが、学習成果は、非CS専攻、高校サマーキャンプ、成人労働者、一般人など、他の学習状況に合わせて調整することができる。本稿では、AIへの参加を広げる手段として、より学際的な社会技術アプローチを提供するために、AIリテラシー教育のシフトを提唱する。

This paper presents a curriculum, "AI Literacy for All," to promote an interdisciplinary understanding of AI, its socio-technical implications, and its practical applications for all levels of education. With the rapid evolution of artificial intelligence (AI), there is a need for AI literacy that goes beyond the traditional AI education curriculum. AI literacy has been conceptualized in various ways, including public literacy, competency building for designers, conceptual understanding of AI concepts, and domain-specific upskilling. Most of these conceptualizations were established before the public release of Generative AI (Gen-AI) tools like ChatGPT. AI education has focused on the principles and applications of AI through a technical lens that emphasizes the mastery of AI principles, the mathematical foundations underlying these technologies, and the programming and mathematical skills necessary to implement AI solutions. In AI Literacy for All, we emphasize a balanced curriculum that includes technical and non-technical learning outcomes to enable a conceptual understanding and critical evaluation of AI technologies in an interdisciplinary socio-technical context. The paper presents four pillars of AI literacy: understanding the scope and technical dimensions of AI, learning how to interact with Gen-AI in an informed and responsible way, the socio-technical issues of ethical and responsible AI, and the social and future implications of AI. While it is important to include all learning outcomes for AI education in a Computer Science major, the learning outcomes can be adjusted for other learning contexts, including non-CS majors, high school summer camps, the adult workforce, and the public. This paper advocates for a shift in AI literacy education to offer a more interdisciplinary socio-technical approach as a pathway to broaden participation in AI.

翻訳日:2024-09-22 21:22:31 公開日:2024-09-02

# 「反転」大学:LLM支援生涯学習環境

"Flipped" University: LLM-Assisted Lifelong Learning Environment ( http://arxiv.org/abs/2409.10553v1 )

ライセンス: Link先を確認

Kirill Krinkin, Tatiana Berlenko,

(参考訳) 人工知能技術の急速な発展、特にLarge Language Models (LLMs)は、生涯学習の風景に革命をもたらした。本稿では,LLMが支援する自己構築型生涯学習環境の概念的枠組みを提案する。知識と技能の急速な非現実化に追従する上で、従来の教育制度の欠如を強調している。提案する枠組みは、制度化された教育からパーソナライズされた自己駆動型学習への転換を強調する。 LLMの自然言語機能を活用して、動的かつ適応的な学習体験を提供し、知識獲得を支援する個人知的エージェントの作成を促進する。このフレームワークは、パーソナルワールドモデルの構築、学習の二重モード(トレーニングと探索)、再利用可能な学習アーティファクトの作成など、生涯学習の原則を統合する。さらに、効果的な学習軌跡を維持する上で、好奇心駆動学習と反射的実践の重要性を強調している。この論文は、単に知識を構造化したり伝達したりするのではなく、グローバルな知識の整合性を支援することに焦点を当て、教育機関の「華やかな」大学への進化を構想している。

The rapid development of artificial intelligence technologies, particularly Large Language Models (LLMs), has revolutionized the landscape of lifelong learning. This paper introduces a conceptual framework for a self-constructed lifelong learning environment supported by LLMs. It highlights the inadequacies of traditional education systems in keeping pace with the rapid deactualization of knowledge and skills. The proposed framework emphasizes the transformation from institutionalized education to personalized, self-driven learning. It leverages the natural language capabilities of LLMs to provide dynamic and adaptive learning experiences, facilitating the creation of personal intellectual agents that assist in knowledge acquisition. The framework integrates principles of lifelong learning, including the necessity of building personal world models, the dual modes of learning (training and exploration), and the creation of reusable learning artifacts. Additionally, it underscores the importance of curiosity-driven learning and reflective practices in maintaining an effective learning trajectory. The paper envisions the evolution of educational institutions into "flipped" universities, focusing on supporting global knowledge consistency rather than merely structuring and transmitting knowledge.

翻訳日:2024-09-22 21:22:31 公開日:2024-09-02

# 自律運転のためのビジョンベース深部強化学習におけるオフライン学習エンコーダの検討

An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving ( http://arxiv.org/abs/2409.10554v1 )

ライセンス: Link先を確認

Shawan Mohammed, Alp Argun, Nicolas Bonnotte, Gerd Ascheid,

(参考訳) 本研究は、自律運転(AD)のような複雑な部分観測可能なマルコフ決定プロセス(POMDP)において、DRL(Deep Reinforcement Learning)が直面する課題について検討し、これらの環境における視覚に基づくナビゲーションのソリューションを提案する。部分可観測性はRL性能を著しく低下させ、センサ情報とデータ融合を増強して、よりマルコフ的な環境を反映させることにより、これを緩和することができる。しかし、これはより複雑な知覚モジュールを必要とし、RLによるトレーニングは固有の制限のために複雑である。ニューラルネットワークアーキテクチャが複雑化するにつれて、報酬関数がエラー信号としての有効性は低下する。空や特定の物体のようなイメージ内のタスク非関連要素は、さらなる複雑さを生じさせる。我々の研究は、オフラインで訓練されたエンコーダを用いて、自己教師付き学習を通じて大規模なビデオデータセットを活用し、一般化可能な表現を学習する。そして、DRLを通してこれらの表現の上にヘッドネットワークをトレーニングし、CARLA ADシミュレータでエゴ車両を制御することを学習する。本研究では,エンコーダのオフライン学習における学習方法の違いが,AD課題におけるDRLエージェントの性能に及ぼす影響を広範囲に調査する。さらに,CARLAシミュレータにおいて,BDD100Kの運転映像から得られた特徴を直接転送することで,車線追従や衝突回避をゼロショット学習方式で実現することを示す。最後に、転送された表現を効率的に活用するためのRLネットワークに対する様々なアーキテクチャ決定の影響について検討する。そこで本研究では,環境の適切な表現と,それらをRLネットワークに転送する最適な方法を紹介し,検証する。

Our research investigates the challenges Deep Reinforcement Learning (DRL) faces in complex, Partially Observable Markov Decision Processes (POMDP) such as autonomous driving (AD), and proposes a solution for vision-based navigation in these environments. Partial observability reduces RL performance significantly, and this can be mitigated by augmenting sensor information and data fusion to reflect a more Markovian environment. However, this necessitates an increasingly complex perception module, whose training via RL is complicated due to inherent limitations. As the neural network architecture becomes more complex, the reward function's effectiveness as an error signal diminishes since the only source of supervision is the reward, which is often noisy, sparse, and delayed. Task-irrelevant elements in images, such as the sky or certain objects, pose additional complexities. Our research adopts an offline-trained encoder to leverage large video datasets through self-supervised learning to learn generalizable representations. Then, we train a head network on top of these representations through DRL to learn to control an ego vehicle in the CARLA AD simulator. This study presents a broad investigation of the impact of different learning schemes for offline training of encoders on the performance of DRL agents in challenging AD tasks. Furthermore, we show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in the CARLA simulator, in a zero-shot learning fashion. Finally, we explore the impact of various architectural decisions for the RL networks to utilize the transferred representations efficiently. Therefore, in this work, we introduce and validate an optimal way for obtaining suitable representations of the environment, and transferring them to RL networks.

翻訳日:2024-09-22 21:22:31 公開日:2024-09-02

# 医療におけるMLLMの民主化:資源制約環境における効果的な医療診断のためのTinyLLaVA-Med

Democratizing MLLMs in Healthcare: TinyLLaVA-Med for Efficient Healthcare Diagnostics in Resource-Constrained Settings ( http://arxiv.org/abs/2409.12184v1 )

ライセンス: Link先を確認

Aya El Mir, Lukelo Thadei Luoga, Boyuan Chen, Muhammad Abdullah Hanif, Muhammad Shafique,

(参考訳) 医療にMLLM(Multi-Modal Large Language Model)を配置することは、その高い計算要求と重要なメモリ要求によって妨げられ、Nvidia Jetson Xavierのようなリソース制約のあるデバイスでは特に困難である。この問題は、高度な診断を必要とするがリソースが限られている遠隔医療環境では特に顕著である。本稿では,汎用MLLMであるTinyLLaVAの最適化手法を提案する。この適応には、LLaVA-Medトレーニングパイプラインからインスピレーションを得て、医療データセット上での命令チューニングと微調整のTinyLLaVAが含まれる。提案手法は計算複雑性と消費電力の最小化に成功し,TinyLLaVA-Medは18.9W,メモリは11.9GB,VQA-RADは64.54%,SLAKEは70.70%であった。そのため、TinyLLaVA-Medは、計算資源の少ないハードウェア制約環境において、本質的な機能を維持し、最先端モデルに近い精度を提供する。

Deploying Multi-Modal Large Language Models (MLLMs) in healthcare is hindered by their high computational demands and significant memory requirements, which are particularly challenging for resource-constrained devices like the Nvidia Jetson Xavier. This problem is particularly evident in remote medical settings where advanced diagnostics are needed but resources are limited. In this paper, we introduce an optimization method for the general-purpose MLLM, TinyLLaVA, which we have adapted and renamed TinyLLaVA-Med. This adaptation involves instruction-tuning and fine-tuning TinyLLaVA on a medical dataset by drawing inspiration from the LLaVA-Med training pipeline. Our approach successfully minimizes computational complexity and power consumption, with TinyLLaVA-Med operating at 18.9W and using 11.9GB of memory, while achieving accuracies of 64.54% on VQA-RAD and 70.70% on SLAKE for closed-ended questions. Therefore, TinyLLaVA-Med achieves deployment viability in hardware-constrained environments with low computational resources, maintaining essential functionalities and delivering accuracies close to state-of-the-art models.

翻訳日:2024-09-22 21:12:27 公開日:2024-09-02

# 非線形力学系のスパース同定によるグラフ構造データからの支配方程式の発見

Discovering Governing equations from Graph-Structured Data by Sparse Identification of Nonlinear Dynamical Systems ( http://arxiv.org/abs/2409.04463v1 )

ライセンス: Link先を確認

Mohammad Amin Basiri, Sina Khanmohammadi,

(参考訳) 機械学習(ML)と疎性促進技術の組み合わせは、データから支配方程式を直接抽出し、科学と工学の様々な分野における計算モデルに革命をもたらす。発見された力学モデルは、気候科学、神経科学、生態学、財務学、疫学などの課題に対処するために用いられる。しかし、力学系を発見するための既存のスパース同定法のほとんどは、サブシステム間の相互作用を考慮せずにシステム全体を一つのものとして扱う。結果として、そのようなモデルは創発的なシステムの振る舞いの小さな変化を捉えることができない。そこで我々は,グラフ構造データ(SINDyG)から非線形力学系のスパース同定法を開発し,ネットワーク構造をスパース回帰に組み込んで,基礎となるネットワーク力学を説明するモデルパラメータを同定した。 SINDyGは、精度とモデルの単純さを改善しながら、ネットワーク力学の制御方程式を発見する。

The combination of machine learning (ML) and sparsity-promoting techniques is enabling direct extraction of governing equations from data, revolutionizing computational modeling in diverse fields of science and engineering. The discovered dynamical models could be used to address challenges in climate science, neuroscience, ecology, finance, epidemiology, and beyond. However, most existing sparse identification methods for discovering dynamical systems treat the whole system as one without considering the interactions between subsystems. As a result, such models are not able to capture small changes in the emergent system behavior. To address this issue, we developed a new method called Sparse Identification of Nonlinear Dynamical Systems from Graph-structured data (SINDyG), which incorporates the network structure into sparse regression to identify model parameters that explain the underlying network dynamics. SINDyG discovers the governing equations of network dynamics while offering improvements in accuracy and model simplicity.
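
To make the "sparse regression" step in this abstract concrete, here is a minimal pure-Python sketch of sequentially thresholded least squares, the regression commonly used in standard SINDy. It does not include the graph-structure weighting that the abstract attributes to SINDyG, whose details are not given here. The function names, the candidate library `[1, x, x^2]`, and the toy system `dx/dt = -2x` are assumptions chosen for illustration.

```python
from typing import List


def solve_least_squares(theta: List[List[float]], y: List[float],
                        cols: List[int]) -> List[float]:
    """Least squares restricted to columns `cols`, via the normal equations."""
    k = len(cols)
    # Augmented normal-equation matrix [A^T A | A^T y] for the active columns.
    aug = [
        [sum(row[cols[i]] * row[cols[j]] for row in theta) for j in range(k)]
        + [sum(row[cols[i]] * yi for row, yi in zip(theta, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def stlsq(theta: List[List[float]], y: List[float],
          threshold: float = 0.1, iterations: int = 5) -> List[float]:
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(iterations):
        if not active:
            break
        solution = solve_least_squares(theta, y, active)
        coef = [0.0] * n_terms
        for idx, value in zip(active, solution):
            coef[idx] = value
        # Keep only terms whose coefficient magnitude reaches the threshold.
        active = [i for i in active if abs(coef[i]) >= threshold]
        coef = [c if abs(c) >= threshold else 0.0 for c in coef]
    return coef


# Toy data: states x and exact derivatives for the assumed system dx/dt = -2x.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
library = [[1.0, x, x * x] for x in xs]   # candidate terms: 1, x, x^2
derivatives = [-2.0 * x for x in xs]
coefficients = stlsq(library, derivatives)
```

On this noise-free toy data the constant and quadratic terms fall below the threshold and are dropped, leaving only the coefficient on `x`. A graph-aware variant would additionally use the network's connectivity when deciding which candidate terms to penalize or keep.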

翻訳日:2024-09-15 05:31:27 公開日:2024-09-02

# 病理診断のための脳波言語モデル

EEG-Language Modeling for Pathology Detection ( http://arxiv.org/abs/2409.07480v1 )

ライセンス: Link先を確認

Sam Gijsen, Kerstin Ritter,

(参考訳) マルチモーダル言語モデリングは、大規模言語モデルの進歩を活用して、有能なマルチモーダルモデルを事前訓練する最近のブレークスルーを構成する。事前学習中の自然言語の統合は、特にコンピュータビジョンにおいて、学習された表現を大幅に改善することが示されている。しかし、機能的脳データ領域における多モーダル言語モデリングの有効性、特に病理診断の進歩は未解明のままである。本研究は、臨床報告と15,000件の脳波で訓練した脳波言語モデルの先駆けとなる。我々は,この新たな領域にマルチモーダルアライメントを行う手法を拡張し,脳波言語モデルのトレーニングに有用なレポート中のテキスト情報について検討する。以上の結果から,患者の臨床経過,脳波の描写,医師の解釈など,さまざまな報告セグメントに曝露されることから,モデルがより豊かな表現を学習できることが示唆された。より狭い臨床テキスト情報に曝露されたモデルと比較して,臨床報告に基づいて脳波を検索するモデルが(その逆も)極めて高い精度で見つかる。しかし、これは対照的な学習アプローチを使用する場合にのみ観察される。特にアノテーションの少ないレギュレーションでは、ゼロショット分類と線形プローブの両方で示されるように、脳波言語モデルの表現は、脳波のみのモデルと比較して、病理診断を大幅に改善することができる。これらの結果は,脳活動データと臨床テキストの統合の可能性を強調し,脳波言語モデルが臨床応用の大きな進展を示すことを示唆している。

Multimodal language modeling constitutes a recent breakthrough which leverages advances in large language models to pretrain capable multimodal models. The integration of natural language during pretraining has been shown to significantly improve learned representations, particularly in computer vision. However, the efficacy of multimodal language modeling in the realm of functional brain data, specifically for advancing pathology detection, remains unexplored. This study pioneers EEG-language models trained on clinical reports and 15000 EEGs. We extend methods for multimodal alignment to this novel domain and investigate which textual information in reports is useful for training EEG-language models. Our results indicate that models learn richer representations from being exposed to a variety of report segments, including the patient's clinical history, description of the EEG, and the physician's interpretation. Compared to models exposed to narrower clinical text information, we find such models to retrieve EEGs based on clinical reports (and vice versa) with substantially higher accuracy. Yet, this is only observed when using a contrastive learning approach. Particularly in regimes with few annotations, we observe that representations of EEG-language models can significantly improve pathology detection compared to those of EEG-only models, as demonstrated by both zero-shot classification and linear probes. In sum, these results highlight the potential of integrating brain activity data with clinical text, suggesting that EEG-language models represent significant progress for clinical applications.

翻訳日:2024-09-15 05:01:16 公開日:2024-09-02

# 限られた結果データを用いた治療効果の効率的な評価におけるサロゲートの役割について

On the role of surrogates in the efficient estimation of treatment effects with limited outcome data ( http://arxiv.org/abs/2003.12408v4 )

ライセンス: Link先を確認

Nathan Kallus, Xiaojie Mao,

(参考訳) 多くの実験的、観察的な研究において、関心の結果を観察することはしばしば困難またはコストがかかり、平均治療効果(ATE)を推定する有効なサンプルサイズが減少する。一次利害関係にない結果のみを代理する単位にデータを組み込むことは、ATE推定の精度を高めることができる。我々は、厳格な代理条件を課すことを控え、サロゲートを目標とする結果の完全な代替として許容する。代わりに、未確立の処理の割り当てや欠如、それに対応する重複条件以外の仮定を伴わずに、サロゲート結果の豊富な観察によって、対象とする結果の可利用かつ限定的な観察を補う。ポテンシャルゲインを定量化するために、圧倒的な単位数と同等数の単位が欠落した場合に、ATE推定と代理無しの効率境界の差を導出する。我々は,これらの効率向上を実現するために,ロバストなATE推定と推論手法を開発した。職種訓練の長期学習効果を実証的に実証した。

In many experimental and observational studies, the outcome of interest is often difficult or expensive to observe, reducing effective sample sizes for estimating average treatment effects (ATEs) even when identifiable. We study how incorporating data on units for which only surrogate outcomes not of primary interest are observed can increase the precision of ATE estimation. We refrain from imposing stringent surrogacy conditions, which permit surrogates as perfect replacements for the target outcome. Instead, we supplement the available, albeit limited, observations of the target outcome with abundant observations of surrogate outcomes, without any assumptions beyond unconfounded treatment assignment and missingness and corresponding overlap conditions. To quantify the potential gains, we derive the difference in efficiency bounds on ATE estimation with and without surrogates, both when an overwhelming or comparable number of units have missing outcomes. We develop robust ATE estimation and inference methods that realize these efficiency gains. We empirically demonstrate the gains by studying long-term-earning effects of job training.

翻訳日:2024-09-07 07:35:31 公開日:2024-09-02

# Bitcoin時代のポンプとダンプ:暗号通貨市場操作のリアルタイム検出

Pump and Dumps in the Bitcoin Era: Real Time Detection of Cryptocurrency Market Manipulations ( http://arxiv.org/abs/2005.06610v2 )

ライセンス: Link先を確認

Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa,

(参考訳) ここ数年、暗号通貨はますます人気を博している。専門家でない人々でさえ、これらの証券に投資し始め、今日では暗号通貨取引所は月に1000億ドル以上で取引を処理する。しかし、多くの暗号通貨は流動性が低く、市場操作のスキームが非常に高い。本稿では,インターネット上のコミュニティによって組織されたポンプ・ダンプ方式の詳細な分析を行う。これらのコミュニティがどのように組織化され、どのように詐欺を行うかを観察します。次に,ポンプ群とダンプ群に関する2つのケーススタディを報告する。最後に,この不正をリアルタイムに検出する手法を導入することで,ポンプやダンプの仕組みが動作している場合に,投資家が市場から外れないようにする。

In recent years, cryptocurrencies have become increasingly popular. Even people who are not experts have started to invest in these securities, and nowadays cryptocurrency exchanges process transactions for over 100 billion US dollars per month. However, many cryptocurrencies have low liquidity and are therefore highly prone to market manipulation schemes. In this paper, we perform an in-depth analysis of pump and dump schemes organized by communities over the Internet. We observe how these communities are organized and how they carry out the fraud. Then, we report on two case studies related to pump and dump groups. Lastly, we introduce an approach to detect the fraud in real time that outperforms the current state of the art, so as to help investors stay out of the market when a pump and dump scheme is in action.
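
The abstract does not describe the detector itself, so the following is only a generic sketch of what "real-time" flagging of abnormal trading activity can look like: each new volume reading is compared with a trailing window of earlier readings. The function name, the window length, the z-score threshold, and the sample volumes are all assumptions for illustration, and this is not the authors' method.

```python
from statistics import mean, stdev
from typing import List


def flag_volume_spikes(volumes: List[float], window: int = 5,
                       z_threshold: float = 3.0) -> List[int]:
    """Return indices whose volume exceeds the trailing-window mean
    by more than `z_threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(volumes)):
        past = volumes[i - window:i]          # only data available before time i
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and (volumes[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical per-interval traded volumes with one abrupt burst.
sample_volumes = [10, 11, 9, 10, 12, 10, 11, 95, 12, 10]
spikes = flag_volume_spikes(sample_volumes)
```

Because each decision uses only past observations, the check can run as data arrives. A practical detector would likely combine several signals (for example price and order-level features) rather than volume alone.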

翻訳日:2024-09-07 07:35:31 公開日:2024-09-02

# フロッケ力学の量子カオス測度

Quantum chaos measures for Floquet dynamics ( http://arxiv.org/abs/2007.07283v3 )

ライセンス: Link先を確認

Amin A. Nizami,

(参考訳) キックローターのような周期的に蹴られたフロケットシステムは、カオスのパラダイム的で実証的な単純なモデルである。非可積分量子力学には、ロシミットエコー(英語版)、自己相関関数(英語版)、OTOC(英語版)などのカオス的挙動の存在(または遷移)の診断尺度がいくつか存在する。我々はこれらの測度を、駆動量子系のユニタリフロケット作用素の固有系の観点から解析的に計算する。これらの式を用いて、トーラス上の量子キックローターの時間的変動を、積分可能かつカオス的ケースに対して決定する。キックローターのより単純な可積分変種に対しては、その力学の表現論的導出を与える。

Periodically kicked Floquet systems such as the kicked rotor are a paradigmatic and illustrative simple model of chaos. For non-integrable quantum dynamics there are several diagnostic measures of the presence of (or the transition to) chaotic behaviour including the Loschmidt echo, autocorrelation function and OTOC. We analytically compute these measures in terms of the eigensystem of the unitary Floquet operator of driven quantum systems. We use these expressions to determine the time variation of the measures for the quantum kicked rotor on the torus, for the integrable as well as the chaotic case. For a simpler integrable variant of the kicked rotor, we also give a representation theoretic derivation of its dynamics.

翻訳日:2024-09-07 07:35:31 公開日:2024-09-02

# 画像のカラー化: 調査とデータセット

Image Colorization: A Survey and Dataset ( http://arxiv.org/abs/2008.10774v4 )

ライセンス: Link先を確認

Saeed Anwar, Muhammad Tahir, Chongyi Li, Ajmal Mian, Fahad Shahbaz Khan, Abdul Wahab Muzaffar,

(参考訳) 画像のカラー化は、グレースケールの画像やビデオフレームのRGB色を推定し、美的および知覚的品質を改善する。過去10年間で、画像のカラー化のためのディープラーニング技術は大幅に進歩し、これらの技術の体系的な調査とベンチマークが必要である。本稿では、最近の最先端のディープラーニングベースの画像カラー化技術に関する総合的な調査を行い、それらの基本的なブロックアーキテクチャ、入力、オプティマイザ、損失関数、トレーニングプロトコル、トレーニングデータなどについて述べる。既存のカラー化テクニックを7つのクラスに分類し、ベンチマークデータセットや評価指標など、パフォーマンスを管理する重要な要因について論じる。既存のデータセットの制限を強調し、着色に特化した新しいデータセットを導入します。我々は既存のデータセットと提案した画像の両方を用いて、既存の画像のカラー化手法を広範囲に実験的に評価する。最後に,既存の手法の限界について議論し,この急速に進化する深層画像の着色に関する課題に対して,可能な解決策と今後の研究方向性を推奨する。データセットと評価のためのコードは https://github.com/saeed-anwar/ColorSurvey で公開されている。

Image colorization estimates RGB colors for grayscale images or video frames to improve their aesthetic and perceptual quality. Over the last decade, deep learning techniques for image colorization have significantly progressed, necessitating a systematic survey and benchmarking of these techniques. This article presents a comprehensive survey of recent state-of-the-art deep learning-based image colorization techniques, describing their fundamental block architectures, inputs, optimizers, loss functions, training protocols, training data, etc. It categorizes the existing colorization techniques into seven classes and discusses important factors governing their performance, such as benchmark datasets and evaluation metrics. We highlight the limitations of existing datasets and introduce a new dataset specific to colorization. We perform an extensive experimental evaluation of existing image colorization methods using both existing datasets and our proposed one. Finally, we discuss the limitations of existing methods and recommend possible solutions and future research directions for this rapidly evolving topic of deep image colorization. The dataset and codes for evaluation are publicly available at https://github.com/saeed-anwar/ColorSurvey.

翻訳日:2024-09-07 07:30:16 公開日:2024-09-02

# Beta-CoRM:$n$-gramプロファイル分析のためのベイズ的アプローチ

Beta-CoRM: A Bayesian Approach for $n$-gram Profiles Analysis ( http://arxiv.org/abs/2011.11558v3 )

ライセンス: Link先を確認

José A. Perusquía, Jim E. Griffin, Cristiano Villa,

(参考訳) $n$-gramプロファイルは、クラスタリングや分類のために、潜在的に異なる長さの長いシーケンスを分析するのに成功し、広く利用されている。主に、この目的のために機械学習アルゴリズムが使われているが、予測性能にもかかわらず、これらの手法は隠れた構造を発見したり、データの完全な確率的表現を提供することはできない。バイナリ属性として使われる$n$-gramプロファイルのために設計されたベイズ生成モデルの新しいクラスが、この問題に対処するために設計されている。提案したモデリングの柔軟性により、生成モデルにおける特徴選択への簡単なアプローチを考えることができる。さらに,合成および実データシナリオに適用した高速な推論手順のためにスライスサンプリングアルゴリズムを導出し,特徴選択が分類精度を向上させることを示す。

$n$-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Mainly, machine learning algorithms have been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. A novel class of Bayesian generative models for $n$-gram profiles used as binary attributes has been designed to address this. The flexibility of the proposed modelling allows us to consider a straightforward approach to feature selection in the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure, which is applied to synthetic and real data scenarios and shows that feature selection can improve classification accuracy.
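
For readers unfamiliar with "$n$-gram profiles used as binary attributes", the sketch below shows one plausible way to build such a representation: each sequence becomes a 0/1 vector recording which $n$-grams occur in it at least once. The function names and the toy sequences are assumptions for illustration. The Bayesian model and the slice sampler from the abstract are not reproduced here.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(seq: Sequence[str], n: int) -> Set[NGram]:
    """Set of distinct contiguous n-grams occurring in `seq`."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def binary_profiles(sequences: List[Sequence[str]],
                    n: int) -> Tuple[List[NGram], List[List[int]]]:
    """Return the sorted n-gram vocabulary and one 0/1 row per sequence."""
    per_sequence = [ngrams(s, n) for s in sequences]
    vocab = sorted(set().union(*per_sequence))
    rows = [[1 if g in grams else 0 for g in vocab] for grams in per_sequence]
    return vocab, rows


# Two toy sequences of different lengths (hypothetical data).
vocab, rows = binary_profiles([["a", "b", "a", "b"], ["b", "a", "c"]], n=2)
```

Sequences of different lengths map to vectors of the same length, which is what makes the profiles convenient inputs for clustering or classification.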

翻訳日:2024-09-07 07:30:16 公開日:2024-09-02

# Insta-YOLO: リアルタイムインスタンスセグメンテーション

INSTA-YOLO: Real-Time Instance Segmentation ( http://arxiv.org/abs/2102.06777v3 )

ライセンス: Link先を確認

Eslam Mohamed, Abdelrahman Shaker, Ahmad El-Sallab, Mayada Hadhoud,

(参考訳) インスタンスセグメンテーションは、近年、様々なコンピュータビジョンアプリケーションで大きな注目を集めている。これは、同じクラスに属している場合でも、シーンの異なるオブジェクトに異なるIDを提供することを目的としている。これは様々なシナリオ、特にオクルージョンにおいて有用である。インスタンスセグメンテーションは通常、2段階のパイプラインとして実行される。まず、検出されたボックス領域内でオブジェクトを検出し、セマンティックセグメンテーションを行う。このプロセスは、特にセグメンテーション部分において、コストのかかるアップサンプリングを伴う。さらに、LiDARポイントクラウドや空中オブジェクト検出のようないくつかのアプリケーションでは、2段階のパイプラインに余分な複雑さをもたらすように、指向するボックスを予測する必要があることが多い。本稿では,リアルタイムインスタンス分割のための一段階のエンドツーエンドディープラーニングモデルであるInsta-YOLOを提案する。提案モデルはYOLOワンショットオブジェクト検出器にインスパイアされ,ボックス回帰損失はローカライゼーションヘッドの多項式回帰に置き換わる。この修正により、セグメント化アップサンプリングデコーダを完全に省略し、多項式出力係数からインスタンス分割輪郭を生成することができる。加えて、このアーキテクチャはオブジェクト指向オブジェクトに自然に適合します。当社のモデルは,Carvana,Cityscapes,Airbusの3つのデータセットで評価する。その結果,本モデルはmAPで競争力のある精度を達成し,GTX-1080 GPU上で速度が2倍に大きく向上した。

Instance segmentation has recently gained huge attention in various computer vision applications. It aims at providing different IDs to different objects of the scene, even if they belong to the same class. This is useful in various scenarios, especially in occlusions. Instance segmentation is usually performed as a two-stage pipeline. First, an object is detected, then semantic segmentation is performed within the detected box area. This process involves costly up-sampling, especially for the segmentation part. Moreover, for some applications, such as LiDAR point clouds and aerial object detection, it is often required to predict oriented boxes, which adds extra complexity to the two-stage pipeline. In this paper, we propose Insta-YOLO, a novel one-stage end-to-end deep learning model for real-time instance segmentation. The proposed model is inspired by the YOLO one-shot object detector, with the box regression loss replaced by polynomial regression in the localization head. This modification enables us to skip the segmentation up-sampling decoder altogether and produces the instance segmentation contour from the polynomial output coefficients. In addition, this architecture is a natural fit for oriented objects. We evaluate our model on three datasets, namely, Carvana, Cityscapes and Airbus. The results show our model achieves competitive accuracy in terms of mAP with significant improvement in speed by 2x on GTX-1080 GPU.

翻訳日:2024-09-07 07:30:16 公開日:2024-09-02

# GAN-HA:新しい異種二重識別器ネットワークと近赤外・可視画像融合のための新しい注意基盤融合戦略を備えた生成逆数ネットワーク

GAN-HA: A generative adversarial network with a novel heterogeneous dual-discriminator network and a new attention-based fusion strategy for infrared and visible image fusion ( http://arxiv.org/abs/2404.15992v3 )

ライセンス: Link先を確認

Guosheng Lu, Zile Fang, Jiaju Tian, Haowen Huang, Yuelong Xu, Zhuolin Han, Yaoming Kang, Can Feng, Zhigang Zhao,

(参考訳) 赤外線・可視画像融合(IVIF)は、可視画像からテクスチャの詳細を統合しつつ、赤外線画像からの熱放射情報を保存することを目的としている。熱放射情報は主として画像強度で表現されるが、テクスチャの詳細は画像勾配で表現されるのが一般的である。しかし、既存の二重識別器生成敵ネットワーク(GAN)は、赤外線と可視画像情報の異なる学習ニーズを完全に考慮していない2つの構造的に同一の識別器に依存していることが多い。そこで本研究では,異種二重識別器ネットワークと注意型融合戦略(GAN-HA)を備えた新しいGANを提案する。具体的には、赤外画像と可視画像の本質的な違いを認識し、熱放射情報とテクスチャの詳細を同時に捉える新しい異種二重識別ネットワークを提案する。このネットワーク内の2つの判別器は構造的に異なり、赤外画像のための有能な判別器と、可視画像のための詳細な判別器を含む。彼らはそれぞれ、リッチな画像強度情報と画像勾配情報を学ぶことができる。さらに、異なるソース画像からの学習情報を適切に強調するために、ジェネレータ内に新しい注目ベースの融合戦略を設計し、融合結果の情報表現能力を向上させる。このようにして、GAN-HAによって生成された融合画像は、熱標的の塩分濃度とテクスチャの鋭さの両方をより効果的に維持することができる。様々な公開データセットに対する大規模な実験は、他の最先端(SOTA)アルゴリズムよりもGAN-HAの方が優れていることを示し、実用的な応用の可能性を示している。

Infrared and visible image fusion (IVIF) aims to preserve thermal radiation information from infrared images while integrating texture details from visible images. Thermal radiation information is mainly expressed through image intensities, while texture details are typically expressed through image gradients. However, existing dual-discriminator generative adversarial networks (GANs) often rely on two structurally identical discriminators for learning, which do not fully account for the distinct learning needs of infrared and visible image information. To this end, this paper proposes a novel GAN with a heterogeneous dual-discriminator network and an attention-based fusion strategy (GAN-HA). Specifically, recognizing the intrinsic differences between infrared and visible images, we propose, for the first time, a novel heterogeneous dual-discriminator network to simultaneously capture thermal radiation information and texture details. The two discriminators in this network are structurally different, including a salient discriminator for infrared images and a detailed discriminator for visible images. They are able to learn rich image intensity information and image gradient information, respectively. In addition, a new attention-based fusion strategy is designed in the generator to appropriately emphasize the learned information from different source images, thereby improving the information representation ability of the fusion result. In this way, the fused images generated by GAN-HA can more effectively maintain both the salience of thermal targets and the sharpness of textures. Extensive experiments on various public datasets demonstrate the superiority of GAN-HA over other state-of-the-art (SOTA) algorithms while showcasing its higher potential for practical applications.
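
The distinction the abstract draws between image intensities (thermal radiation) and image gradients (texture) can be made concrete with a minimal sketch. This is not GAN-HA's discriminator design, which the abstract does not specify; the function names, the forward-difference gradient, and the toy images are assumptions chosen only for illustration.

```python
def mean_intensity(img):
    """Average pixel value of a 2D image given as a list of rows."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


def gradient_magnitude(img):
    """L1 gradient magnitude using forward differences.

    At the last row/column the neighbour is clamped to the pixel itself,
    so the difference there is zero.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = abs(gx) + abs(gy)
    return out


# Hypothetical toy inputs: a bright, flat "thermal" patch and a checkerboard "texture".
flat_hot = [[0.9] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
```

In this toy setting the flat patch has high mean intensity and zero gradient everywhere, while the checkerboard has a lower mean intensity but large gradients, which is the kind of difference that motivates treating the two sources differently.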

翻訳日:2024-09-07 03:22:33 公開日:2024-09-02

# 暗黒物質とダークエネルギーの代替としての宇宙スケールの量子効果

Quantum Effects on Cosmic Scales as an Alternative to Dark Matter and Dark Energy ( http://arxiv.org/abs/2409.02954v1 )

ライセンス: Link先を確認

Da-Ming Chen, Lin Wang,

(参考訳) スピンねじれ理論 (英: spin-torsion theory) は、アインシュタインの一般相対性理論 (GR) に微小粒子のスピンを組み込むことで拡張する重力に対するゲージ理論である。本研究では、スピンねじれ理論をさらに発展させ、自由落下するマクロ粒子を含む球対称および静的重力系について検討する。我々は、マクロな物質の量子スピンが宇宙スケールで注目されるようになると仮定する。さらに、ディラックスピノルとディラック方程式は、粒子とその関連する過程のすべての重要な物理的特性を適切に捉えていると仮定する。このアプローチの重要な側面は、ディラック方程式の定数質量をスケール関数で置換することであり、量子効果と重力系のスケールとの接続を確立することができる。このメカニズムは、マクロな物質の量子効果がスケール依存であり、微小粒子では観測されない現象である局所的に減少することを保証している。任意の物質密度分布について、我々の理論は質量式内の量子ポテンシャルエネルギー(QPE)という追加の量子項を予測する。 QPEは時間拡張と距離収縮を誘導し、重力井戸を模倣する。宇宙論に適用すると、QPEはアインシュタインが静的宇宙論モデルで重力のバランスをとるために導入した宇宙定数に匹敵するものとして機能する。 QPEはまた、ハッブル赤方偏移の起源(伝統的には宇宙の膨張に由来する)のもっともらしい説明も提供している。予測光度距離-赤方偏移関係は、SNe Iaの宇宙試料から得られたSNe Iaデータと非常によく一致している。銀河の文脈では、QPEはダークマターに相当するものとして機能する。予測された円速度はSPARC (Spitzer Photometry and Accurate Rotation Curves database) の回転曲線データとよく一致している。

The spin-torsion theory is a gauge theory approach to gravity that expands upon Einstein's general relativity (GR) by incorporating the spin of microparticles. In this study, we further develop the spin-torsion theory to examine spherically symmetric and static gravitational systems that involve free-falling macroscopic particles. We posit that the quantum spin of macroscopic matter becomes noteworthy at cosmic scales. We further assume that the Dirac spinor and Dirac equation adequately capture all essential physical characteristics of the particles and their associated processes. A crucial aspect of our approach involves substituting the constant mass in the Dirac equation with a scale function, allowing us to establish a connection between quantum effects and the scale of gravitational systems. This mechanism ensures that the quantum effect of macroscopic matter is scale-dependent and diminishes locally, a phenomenon not observed in microparticles. For any given matter density distribution, our theory predicts an additional quantum term, the quantum potential energy (QPE), within the mass expression. The QPE induces time dilation and distance contraction, and thus mimics a gravitational well. When applied to cosmology, the QPE serves as a counterpart to the cosmological constant introduced by Einstein to balance gravity in his static cosmological model. The QPE also offers a plausible explanation for the origin of Hubble redshift (traditionally attributed to the universe's expansion). The predicted luminosity distance--redshift relation aligns remarkably well with SNe Ia data from the cosmological sample of SNe Ia. In the context of galaxies, the QPE functions as the equivalent of dark matter. The predicted circular velocities align well with rotation curve data from the SPARC (Spitzer Photometry and Accurate Rotation Curves database) sample.

翻訳日:2024-09-07 01:16:35 公開日:2024-09-02

# ジェネレーティブAIによるコード生成ツールがソフトウェアエンジニアの雇用に与える影響:リクルーターの経験、知覚、戦略

The Impact of Generative AI-Powered Code Generation Tools on Software Engineer Hiring: Recruiters' Experiences, Perceptions, and Strategies ( http://arxiv.org/abs/2409.00875v1 )

ライセンス: Link先を確認

Alyssia Chen, Timothy Huo, Yunhee Nam, Dan Port, Anthony Peruma,

(参考訳) ChatGPTやGitHub CopilotといったGenerative AI(GenAI)ツールの急速な進歩は、コード生成タスクを自動化することで、ソフトウェアエンジニアリングを変革している。これらのツールは開発者の生産性を向上させる一方で、ソフトウェアエンジニアリング候補の真の能力と潜在能力を評価する際に、組織や専門家を雇う上での課題も提示している。業界と学界の両方でこれらのツールに関する研究は存在するが、これらのツールが採用プロセスにどのように影響するかについては、研究の欠如がある。そこで本研究では,GenAIを利用したコード生成ツールに対する採用者の経験と認識,および候補評価の課題と戦略について検討する。業界の専門家32人を対象に行った調査では、ほとんどの参加者はそのようなツールに精通しているが、ほとんどの組織は、これらのツールの使用・知識を考慮に入れた候補評価手法を調整していない。面接中、候補者がこれらのツールの使用を許可すべきかどうかについては意見が分かれており、多くの参加者は、これらのツールを使用する上で、効果的に自分のスキルを発揮できる候補者を評価する。さらに、ほとんどの参加者は、GenAIを利用したコード生成ツールをコンピュータサイエンスカリキュラムに組み込むことが重要であると考えており、それを行う上で重要なリスクとメリットについて言及している。

The rapid advancements in Generative AI (GenAI) tools, such as ChatGPT and GitHub Copilot, are transforming software engineering by automating code generation tasks. While these tools improve developer productivity, they also present challenges for organizations and hiring professionals in evaluating software engineering candidates' true abilities and potential. Although there is existing research on these tools in both industry and academia, there is a lack of research on how these tools specifically affect the hiring process. Therefore, this study aims to explore recruiters' experiences and perceptions regarding GenAI-powered code generation tools, as well as their challenges and strategies for evaluating candidates. Findings from our survey of 32 industry professionals indicate that although most participants are familiar with such tools, the majority of organizations have not adjusted their candidate evaluation methods to account for candidates' use/knowledge of these tools. There are mixed opinions on whether candidates should be allowed to use these tools during interviews, with many participants valuing candidates who can effectively demonstrate their skills in using these tools. Additionally, most participants believe that it is important to incorporate GenAI-powered code generation tools into computer science curricula and mention the key risks and benefits of doing so.

翻訳日:2024-09-06 08:40:50 公開日:2024-09-02

# アディティブ・マニュファクチャリング(付加製造)におけるデジタルツイン: システマティックレビュー

Digital Twins in Additive Manufacturing: A Systematic Review ( http://arxiv.org/abs/2409.00877v1 )

ライセンス: Link先を確認

Md Manjurul Ahsan, Benjamin Bevans, Chris Billings, Alexander Riensche, Yingtao Liu, Shivakumar Raman, Zahed Siddique,

(参考訳) Digital Twins (DT) は、AMマシンの物理的コンポーネントの仮想レプリカを作成する能力によって、リアルタイム生産監視に役立っているため、アディティブ・マニュファクチャリング (AM) で人気が高まっている。機械学習(ML)、拡張現実(AR)、シミュレーションベースのモデルといった高度な技術は、製造プロセスにおいてインテリジェントで適応可能なDTを開発する上で重要な役割を果たします。しかし、スケーラビリティ、高品質なデータの統合、DT開発におけるリアルタイムアプリケーションに必要な計算能力について疑問が残る。 AMにおけるDTの現在の状態を理解することは、これらの課題に対処し、AMプロセスを進める上でそのポテンシャルを完全に活用するために不可欠である。この機会を考慮して、本研究は以下の4つの研究課題に対処することで、AMにおけるDTの総合的な概要を提供することを目的としている。 (1) AMで使用されるDTの主な種類と、その具体的な応用は何か? (2) 最近のDTの開発と実装にはどのようなものがあるか? (3) プロセス改善とハイブリッド製造にDTはどのように使われているか? (4) DTはインダストリー4.0技術とどのように統合されているか? 現在の応用と技術について議論することで、AMやDTの研究者や実践者に対して、より深い理解と今後の研究の方向性を提供することを目指している。

Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. However, questions remain regarding scalability, the integration of high-quality data, and the computational power required for real-time applications in developing DTs. Understanding the current state of DTs in AM is essential to address these challenges and fully utilize their potential in advancing AM processes. Considering this opportunity, this work aims to provide a comprehensive overview of DTs in AM by addressing the following four research questions: (1) What are the key types of DTs used in AM and their specific applications? (2) What are the recent developments and implementations of DTs? (3) How are DTs employed in process improvement and hybrid manufacturing? (4) How are DTs integrated with Industry 4.0 technologies? By discussing current applications and techniques, we aim to offer a better understanding and potential future research directions for researchers and practitioners in AM and DTs.

翻訳日:2024-09-06 08:40:50 公開日:2024-09-02

# ガウス的不安定チャネルとガウス的操舵の計算可能な定量化

Gaussian unsteerable channels and computable quantifications of Gaussian steering ( http://arxiv.org/abs/2409.00878v1 )

ライセンス: Link先を確認

Taotao Yan, Jie Guo, Jinchuan Hou, Xiaofei Qi, Kan He,

(参考訳) 連続変数系に対するガウスの操舵に関する現在の量子資源理論は欠陥があり不完全である。その主な欠点は、ガウスの不安定な状態からガウスの不安定な状態へ変換するガウスのチャネルのアーキテクチャの不十分な理解に起因し、自由な操作の限定的な選択に繋がる。本稿では,そのような$(m+n)$-mode Gaussianチャネルの構造を深く探求し,ガウス的非ステアブルチャネルのクラスと最大ガウス的非ステアブルチャネルのクラスを導入する。また、2つの量子化も提案する: $\mathcal{J}_{j}$ $(j=1,2)$ of $(m+n)$-mode Gaussian steering from $A$ to $B$。ガウス状態の共分散行列にのみ依存するため、$\mathcal{J}_{j}$の値の計算は単純で効率的である。 $\mathcal{J}_{j}$s は真のガウス的ステアリング測度ではないが、あるガウス的不安定チャネルの下での非増加のような良い性質を持っている。さらに、${\mathcal J}_2$ とガウスの操舵測度 $\mathcal N_3$ を比較すると、${\mathcal J}_2$ があるクラス$(1+1)$-mode Gaussian純状態における $\mathcal N_3$ の上界であることが分かる。例として、マルコフ環境におけるガウスステアリングの挙動を議論するために$\mathcal J_2$を応用し、量子ステアリングにおける急激な崩壊の興味深い現象を明らかにする1+1$モードガウス状態について述べる。

The current quantum resource theory for Gaussian steering for continuous-variable systems is flawed and incomplete. Its primary shortcoming stems from an inadequate comprehension of the architecture of Gaussian channels transforming Gaussian unsteerable states into Gaussian unsteerable states, resulting in a restricted selection of free operations. In the present paper, we explore in depth the structure of such $(m+n)$-mode Gaussian channels, and introduce the class of the Gaussian unsteerable channels and the class of maximal Gaussian unsteerable channels, both of which may be chosen as the free operations, which completes the resource theory for Gaussian steering from $A$ to $B$ by Alice's Gaussian measurements. We also propose two quantifications $\mathcal{J}_{j}$ $(j=1,2)$ of $(m+n)$-mode Gaussian steering from $A$ to $B$. The computation of the value of $\mathcal{J}_{j}$ is straightforward and efficient, as it solely relies on the covariance matrices of Gaussian states, eliminating the need for any optimization procedures. Though the $\mathcal{J}_{j}$ are not genuine Gaussian steering measures, they have some nice properties, such as being non-increasing under certain Gaussian unsteerable channels. Additionally, we compare ${\mathcal J}_2$ with the Gaussian steering measure $\mathcal N_3$, which is based on the Uhlmann fidelity, revealing that ${\mathcal J}_2$ is an upper bound of $\mathcal N_3$ on a certain class of $(1+1)$-mode Gaussian pure states. As an illustration, we apply $\mathcal J_2$ to discuss the behaviour of Gaussian steering for a special class of $(1+1)$-mode Gaussian states in Markovian environments, which uncovers the intriguing phenomenon of rapid decay in quantum steering.
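
The abstract does not define $\mathcal{J}_{j}$, so the sketch below does not implement the paper's quantifiers. It only illustrates why covariance-matrix-based quantities are cheap to evaluate, using the standard Schur-complement criterion for $A \to B$ steerability under Gaussian measurements in the $(1+1)$-mode case (the quantity studied by Kogias et al.). The conventions are assumptions: the covariance matrix is written in block form with $2\times 2$ blocks $A$, $B$, $C$, and the vacuum covariance matrix is the identity.

```python
import math


def schur_complement_of_A(A, B, C):
    """Return M = B - C^T A^{-1} C for 2x2 blocks given as nested lists."""
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    a_inv = [[A[1][1] / det_a, -A[0][1] / det_a],
             [-A[1][0] / det_a, A[0][0] / det_a]]
    # t = A^{-1} C
    t = [[sum(a_inv[i][k] * C[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # ct_t = C^T A^{-1} C
    ct_t = [[sum(C[k][i] * t[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return [[B[i][j] - ct_t[i][j] for j in range(2)] for i in range(2)]


def gaussian_steering_a_to_b(A, B, C):
    """Steerability A -> B for a (1+1)-mode Gaussian state.

    With nu = sqrt(det M), the state is steerable by Gaussian measurements
    exactly when nu < 1; the returned value is max(0, -ln nu).
    """
    M = schur_complement_of_A(A, B, C)
    det_m = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return max(0.0, -math.log(math.sqrt(det_m)))


def two_mode_squeezed_vacuum(r):
    """Blocks (A, B, C) of a two-mode squeezed vacuum with squeezing r."""
    c, s = math.cosh(2 * r), math.sinh(2 * r)
    return [[c, 0.0], [0.0, c]], [[c, 0.0], [0.0, c]], [[s, 0.0], [0.0, -s]]
```

For example, `gaussian_steering_a_to_b(*two_mode_squeezed_vacuum(0.5))` is positive, while the product vacuum (`r = 0`) gives zero. No optimization is involved, only a few 2x2 matrix operations.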

翻訳日:2024-09-06 08:40:50 公開日:2024-09-02

# パラメータ数を超える:専門家のソフトな混ざり合いに暗黙のバイアス

Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts ( http://arxiv.org/abs/2409.00879v1 )

ライセンス: Link先を確認

Youngseog Chung, Dhruv Malik, Jeff Schneider, Yuanzhi Li, Aarti Singh,

(参考訳) スパースミキチャー・オブ・エキスパート(MoE)モデルに関する伝統的な見解は、単一の大規模専門家を訓練する代わりに、計算コストがかかるため、多数の小規模専門家を訓練できるというものである。小さい専門家の総パラメータ数が特異な大専門家のそれと等しければ、我々は、計算的トラクタビリティを得ながら、専門家の表現力を保ち、専門家の専門化を促進することを期待する。最近導入されたSoft MoEは、Sparse MoEの離散ルーティング機構をトークンを滑らかに混合する微分可能なゲーティング関数に置き換えている。このスムーズなゲーティング関数はスパースMoEに関連する様々なトレーニング不安定性を緩和するが、ソフトMoEの表現力に影響を及ぼす暗黙のバイアスを誘発するか、専門家の専門化の可能性は明らかでない。単元的に強力な専門家を持つSoft MoEは、単純な凸関数を表現できないことを証明した。このことは、Soft MoEの成功は、一大専門家の表現力を総合的に模倣する多くの小さな専門家の伝統的な視点では説明できないこと、そして複数の専門家が(固定された総パラメータ数であっても)優れた表現力を達成するために実際に必要であることを正当化している。本研究は,Soft MoEのエキスパート専門化の概念を導入し,パラメータの総数を変えながら,以下の(計算上は難解な)課題を考察する。入力が与えられたら、この入力のラベルを予測するための専門的なサブセットを見つけるにはどうすればよいのか? 経験的に、小さな専門家がたくさんいると、アーキテクチャは暗黙的に偏りがあり、専門的な専門家のサブセットを効率的に近似できることを示している。提案手法は推論時の計算量を削減するために容易に実装できる。

The traditional viewpoint on Sparse Mixture of Experts (MoE) models is that instead of training a single large expert, which is computationally expensive, we can train many small experts. The hope is that if the total parameter count of the small experts equals that of the singular large expert, then we retain the representation power of the large expert while gaining computational tractability and promoting expert specialization. The recently introduced Soft MoE replaces the Sparse MoE's discrete routing mechanism with a differentiable gating function that smoothly mixes tokens. While this smooth gating function successfully mitigates the various training instabilities associated with Sparse MoE, it is unclear whether it induces implicit biases that affect Soft MoE's representation power or potential for expert specialization. We prove that Soft MoE with a single arbitrarily powerful expert cannot represent simple convex functions. This justifies that Soft MoE's success cannot be explained by the traditional viewpoint of many small experts collectively mimicking the representation power of a single large expert, and that multiple experts are actually necessary to achieve good representation power (even for a fixed total parameter count). Continuing along this line of investigation, we introduce a notion of expert specialization for Soft MoE, and while varying the number of experts yet fixing the total parameter count, we consider the following (computationally intractable) task. Given any input, how can we discover the expert subset that is specialized to predict this input's label? We empirically show that when there are many small experts, the architecture is implicitly biased in a fashion that allows us to efficiently approximate the specialized expert subset. Our method can be easily implemented to potentially reduce computation during inference.
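
The differentiable gating described above can be sketched as follows. This is a minimal toy version of a Soft MoE layer with one slot per expert, written from the general description of the mechanism (softmax-based dispatch and combine weights); the function names, the list-based data layout, and the one-slot-per-expert simplification are assumptions, not the authors' implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_moe_layer(tokens, phi, experts):
    """Toy Soft MoE layer with one slot per expert.

    tokens:  list of n token vectors (each a list of d floats)
    phi:     list of len(experts) slot-parameter vectors (each d floats)
    experts: list of callables mapping a d-vector to an output vector
    """
    n, s, d = len(tokens), len(experts), len(tokens[0])
    # logits[i][j]: affinity between token i and slot j
    logits = [[sum(x * p for x, p in zip(tok, phi[j])) for j in range(s)]
              for tok in tokens]
    # Dispatch weights: for each slot, a softmax over all tokens.
    dispatch = [softmax([logits[i][j] for i in range(n)]) for j in range(s)]
    # Combine weights: for each token, a softmax over all slots.
    combine = [softmax(logits[i]) for i in range(n)]
    # Each slot input is a convex combination of the tokens.
    slot_inputs = [[sum(dispatch[j][i] * tokens[i][k] for i in range(n))
                    for k in range(d)] for j in range(s)]
    slot_outputs = [experts[j](slot_inputs[j]) for j in range(s)]
    # Each token output is a convex combination of the slot outputs.
    d_out = len(slot_outputs[0])
    return [[sum(combine[i][j] * slot_outputs[j][k] for j in range(s))
             for k in range(d_out)] for i in range(n)]
```

In this simplified sketch, a single expert with a single slot sends the same output to every token, because the combine weights over one slot are all 1. This only hints at why the number of experts can matter beyond parameter count; it is not the paper's proof, which concerns a more general setting.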

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 組込み型展開のためのVAEを用いたアウト・オブ・ディストリビューション検出器の圧縮

Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment ( http://arxiv.org/abs/2409.00880v1 )

ライセンス: Link先を確認

Aditya Bansal, Michael Yuhas, Arvind Easwaran,

(参考訳) アウト・オブ・ディストリビューション(OOD)検出器は、機械学習モデルのトレーニングディストリビューションの外でサンプルを識別することで、組み込みサイバー物理システムの安全モニターとして機能し、潜在的に安全でないアクションを防ぐことができる。しかし、OOD検出器はディープニューラルネットワークを使ってしばしば実装されるため、メモリと電力の制約のある組み込みシステムのリアルタイムな期限を満たすことは困難である。我々は,OOD検出を潜在空間で行う変分オートエンコーダ(VAE)に基づくOOD検出器のクラスを検討し,量子化,プルーニング,知識蒸留を適用した。これらの手法は他の深層モデルに対しても検討されてきたが、潜在空間のOOD検出に組み合わせた効果は検討されていない。これらの技術はVAEのテスト損失を増加させるが、これはOOD検出性能の比例的な低下には対応せず、組込みCPUやGPU上でリアルタイムに推測できるリーンOOD検出器を開発するために活用する。本稿では,3つの圧縮技術を組み合わせて,OOD検出器のAUROCを維持しながら,メモリと実行時間を著しく短縮する設計手法を提案する。この手法をJetson Nano上に2つの既存のOOD検出器を用いて実証し、GPUとCPUの推論時間をそれぞれ20%と28%削減し、AUROCをベースラインの5%以内に保つ。

Out-of-distribution (OOD) detectors can act as safety monitors in embedded cyber-physical systems by identifying samples outside a machine learning model's training distribution to prevent potentially unsafe actions. However, OOD detectors are often implemented using deep neural networks, which makes it difficult to meet real-time deadlines on embedded systems with memory and power constraints. We consider the class of variational autoencoder (VAE) based OOD detectors where OOD detection is performed in latent space, and apply quantization, pruning, and knowledge distillation. These techniques have been explored for other deep models, but no work has considered their combined effect on latent space OOD detection. While these techniques increase the VAE's test loss, this does not correspond to a proportional decrease in OOD detection performance and we leverage this to develop lean OOD detectors capable of real-time inference on embedded CPUs and GPUs. We propose a design methodology that combines all three compression techniques and yields a significant decrease in memory and execution time while maintaining AUROC for a given OOD detector. We demonstrate this methodology with two existing OOD detectors on a Jetson Nano and reduce GPU and CPU inference time by 20% and 28% respectively while keeping AUROC within 5% of the baseline.
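
Two of the three compression techniques named above, pruning and quantization, can be illustrated on a flat list of weights. This is a generic sketch under stated assumptions (magnitude-based pruning, symmetric 8-bit uniform quantization, hypothetical function names); the abstract does not say which variants the authors use, and knowledge distillation is not shown.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Map quantized integers back to approximate float weights."""
    return [q * scale for q in q_weights]
```

A typical use would prune first and then quantize the surviving weights, for example `q, scale = quantize_int8(prune_by_magnitude(w, 0.5))`. The reconstruction error of each weight is bounded by half the quantization step, which is one reason a higher reconstruction loss need not translate into a proportional loss in detection performance.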

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# SAFE:ソフトウェア脆弱性検出のための意味的・統語的関係の活用における大規模言語モデルの改善

SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection ( http://arxiv.org/abs/2409.00882v1 )

ライセンス: Link先を確認

Van Nguyen, Surya Nepal, Tingmin Wu, Xingliang Yuan, Carsten Rudolph,

(参考訳) ソフトウェア脆弱性(SV)は、安全クリティカルなセキュリティシステムにとって、一般的かつ重要な懸念事項として浮上している。これにより、ソフトウェア脆弱性検出(SVD)のための機械学習やディープラーニングなど、AIベースの手法の利用が大幅に進歩した。 AIベースの手法はSVDで有望なパフォーマンスを示しているが、実際の、複雑で多様なソースコードデータセットに対する効果は、実際には限られている。そこで本研究では,SVDのソースコードデータから意味的・統語的関係を学習し,活用する大規模言語モデルの能力を高める新しいフレームワークを提案する。その結果,ソフトウェア脆弱性検出(SVD)問題に効果的に対処するため,ソースコードデータから基本知識の取得を可能とし,意味的・統語的関連性(セマンティック・アソシエーション)を十分に活用することが可能になる。実世界の3つの挑戦的データセット(ReVeal、D2A、Devign)に対する厳密で広範な実験結果は、我々のアプローチが最先端のベースラインと最先端のベースラインよりも優れていることを示している。要約すると、当社のSAFEアプローチは、F1測定で4.79%から9.15%、リコールで16.93%から21.70%のハイパフォーマンスを実現しています。

Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems. This has spurred significant advancements in utilizing AI-based methods, including machine learning and deep learning, for software vulnerability detection (SVD). While AI-based methods have shown promising performance in SVD, their effectiveness on real-world, complex, and diverse source code datasets remains limited in practice. To tackle this challenge, in this paper, we propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for SVD. As a result, our approach can enable the acquisition of fundamental knowledge from source code data while adeptly utilizing crucial relationships, i.e., semantic and syntactic associations, to effectively address the SVD problem. The rigorous and extensive experimental results on three real-world challenging datasets (i.e., ReVeal, D2A, and Devign) demonstrate the superiority of our approach over the effective and state-of-the-art baselines. In summary, averaged across all datasets used, our SAFE approach improves over the baselines by 4.79% to 9.15% in F1-measure and by 16.93% to 21.70% in Recall.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 海馬分節とアルツハイマー病診断のためのハイブリッドパラメーター高能率微調整法

A Novel Hybrid Parameter-Efficient Fine-Tuning Approach for Hippocampus Segmentation and Alzheimer's Disease Diagnosis ( http://arxiv.org/abs/2409.00884v1 )

ライセンス: Link先を確認

Wangang Cheng, Guanghua He, Keli Hu, Mingyu Fang, Liang Dong, Zhong Li, Hancan Zhu,

(参考訳) 深層学習法は医用画像のセグメンテーションを著しく進歩させたが、その成功は手動で注釈付けされた大量のデータに基づいており、正確なラベル付けには専門的な専門知識が必要である。さらに、これらの手法は、特に3次元の医療画像処理において、かなりの計算資源を必要とすることが多い。したがって、注釈付きデータや計算資源を限定した医用画像分割のための深層学習技術の適用は、依然として重要な課題である。本稿では,ハイブリッド並列およびシリアルアーキテクチャを用いたHyPSと呼ばれる,パラメータ効率の高いファインチューニング手法を提案する。 HyPSはモデルパラメータの最小限のサブセットを更新し、トレーニング済みモデルの本来の知識トラクチャーを維持しながら、下流タスクに関連する特定の特徴を学習する能力を向上する。医用画像分割のための最先端SwinUNETRモデルに適用する。当初、このモデルはBraTs2021データセットで事前トレーニングされ、その後HyPS法が3つの異なる海馬データセットに転送される。さらに, セグメンテーションの結果をもとに, ADNIデータセットから海馬の体積を算出し, それらをメタデータと組み合わせて病型分類を行った。アルツハイマー病(AD)と認知正常(CN)の個人、および早期軽度認知障害(EMCI)と後期軽度認知障害(LMCI)の区別において、HyPSはそれぞれ83.78%と64.29%の分類精度を達成した。以上の結果から,HyPS法はトレーニング済みモデルを用いた海馬セグメンテーションを効果的に促進するだけでなく,アルツハイマー病の検出を支援する可能性も示唆された。私たちのコードは公開されています。

Deep learning methods have significantly advanced medical image segmentation, yet their success hinges on large volumes of manually annotated data, which require specialized expertise for accurate labeling. Additionally, these methods often demand substantial computational resources, particularly for three-dimensional medical imaging tasks. Consequently, applying deep learning techniques for medical image segmentation with limited annotated data and computational resources remains a critical challenge. In this paper, we propose a novel parameter-efficient fine-tuning strategy, termed HyPS, which employs a hybrid parallel and serial architecture. HyPS updates a minimal subset of model parameters, thereby retaining the pre-trained model's original knowledge structure while enhancing its ability to learn specific features relevant to downstream tasks. We apply this strategy to the state-of-the-art SwinUNETR model for medical image segmentation. Initially, the model is pre-trained on the BraTS2021 dataset, after which the HyPS method is employed to transfer it to three distinct hippocampus datasets. Extensive experiments demonstrate that HyPS outperforms baseline methods, especially in scenarios with limited training samples. Furthermore, based on the segmentation results, we calculated the hippocampal volumes of subjects from the ADNI dataset and combined these with metadata to classify disease types. In distinguishing Alzheimer's disease (AD) from cognitively normal (CN) individuals, as well as early mild cognitive impairment (EMCI) from late mild cognitive impairment (LMCI), HyPS achieved classification accuracies of 83.78% and 64.29%, respectively. These findings indicate that the HyPS method not only facilitates effective hippocampal segmentation using pre-trained models but also holds potential for aiding Alzheimer's disease detection. Our code is publicly available.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# ユーザプロファイルを考慮した事前学習モデルとパラメータ効率の良いファインチューニングによるユーザ特異的対話生成

User-Specific Dialogue Generation with User Profile-Aware Pre-Training Model and Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2409.00887v1 )

ライセンス: Link先を確認

Atsushi Otsuka, Kazuya Matsuo, Ryo Ishii, Narichika Nomoto, Hiroaki Sugiyama,

(参考訳) 本稿では,ユーザ固有のダイアログについて述べる。パーソナライズされた対話を人格記述で定義した仮想ユーザ対話に焦点をあてた以前の研究とは対照的に、ユーザ固有の対話は、人格に基づく対話以外の実際のユーザ対話を再現することを目的としている。対象ユーザの対話履歴を用いた微調整は,ユーザ固有のモデルの効率的な学習方法である。しかし、少量のデータのために過度に適合し、破壊する傾向がある。そこで本研究では,パラメータ効率の良い微調整と,ユーザプロファイルを含む事前学習された対話モデルを組み合わせることで,ユーザ固有モデルの学習手法を提案する。パラメータ効率の良い微調整は、モデル全体に少数のパラメータを追加するため、少量のトレーニングデータでも効率的にトレーニングすることができ、モデル破壊に対して堅牢である。さらに、自動推論されたユーザプロファイルに対する簡単なプロンプトを追加して学習した事前学習モデルは、微調整中のトレーニングデータが少ない場合でも、ユーザのプロファイルに関する知識を増強した音声を生成することができる。実験では,ユーザの個人情報を含むプロンプトを用いて,提案モデルと大言語モデル発話生成を比較した。実ユーザの発話を再現する実験により,提案モデルでは,小さいモデルであっても,比較手法よりも再現性の高い発話を生成できることが判明した。

This paper addresses user-specific dialogs. In contrast to previous research on personalized dialogue focused on achieving virtual user dialogue as defined by persona descriptions, user-specific dialogue aims to reproduce real-user dialogue beyond persona-based dialogue. Fine-tuning using the target user's dialogue history is an efficient learning method for a user-specific model. However, it is prone to overfitting and model destruction due to the small amount of data. Therefore, we propose a learning method for user-specific models by combining parameter-efficient fine-tuning with a pre-trained dialogue model that includes user profiles. Parameter-efficient fine-tuning adds a small number of parameters to the entire model, so even small amounts of training data can be trained efficiently and are robust to model destruction. In addition, the pre-trained model, which is learned by adding simple prompts for automatically inferred user profiles, can generate speech with enhanced knowledge of the user's profile, even when there is little training data during fine-tuning. In experiments, we compared the proposed model with large-language-model utterance generation using prompts containing users' personal information. Experiments reproducing real users' utterances revealed that the proposed model can generate utterances with higher reproducibility than the compared methods, even with a small model.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 配列モデルにおける過度パラメータ化による適応性の向上

Improving Adaptivity via Over-Parameterization in Sequence Models ( http://arxiv.org/abs/2409.00894v1 )

ライセンス: Link先を確認

Yicheng Li, Qian Lin,

(参考訳) カーネルの固有関数がカーネル回帰において重要な役割を果たすことはよく知られている。いくつかの例を通して、同じ固有関数の集合であっても、これらの関数の順序が回帰結果に大きな影響を及ぼすことを示した。カーネルを対角化することでモデルを単純化し、列モデルの領域に過度にパラメータ化された勾配降下を導入し、固定された固有関数集合の様々な順序の影響を捉える。この方法は様々な固有関数順序の影響を探索するために設計されている。理論的には、過パラメータ化勾配流は信号の基盤構造に適応し、バニラ勾配流法を著しく上回ることを示す。さらに,より深いパラメータ化により,モデルの一般化能力がさらに向上することを示す。これらの結果は、オーバーパラメータ化のメリットに関する新たな視点を提供するだけでなく、カーネル体制を超えたニューラルネットワークの適応性と一般化の可能性に関する洞察を提供する。

It is well known that eigenfunctions of a kernel play a crucial role in kernel regression. Through several examples, we demonstrate that even with the same set of eigenfunctions, the order of these functions significantly impacts regression outcomes. Simplifying the model by diagonalizing the kernel, we introduce an over-parameterized gradient descent in the setting of the sequence model to capture the effects of various orders of a fixed set of eigenfunctions. This method is designed to explore the impact of varying eigenfunction orders. Our theoretical results show that the over-parameterized gradient flow can adapt to the underlying structure of the signal and significantly outperform the vanilla gradient flow method. Moreover, we also demonstrate that deeper over-parameterization can further enhance the generalization capability of the model. These results not only provide a new perspective on the benefits of over-parameterization but also offer insights into the adaptivity and generalization potential of neural networks beyond the kernel regime.
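
A toy version of over-parameterized gradient descent in a sequence model can make the idea concrete. The sketch assumes a two-factor parameterization $\theta_j = a_j b_j$ of each coordinate, with $b_j$ initialized at zero and $a_j$ at a coordinate-specific scale; this parameterization, the step size, and the function name are illustrative assumptions, not details taken from the abstract.

```python
def overparam_gd(z, a0, lr=0.05, steps=2000):
    """Gradient descent on 0.5 * (z - a * b)**2 for one coordinate.

    z:  observed value of the coordinate (signal plus noise)
    a0: initial scale of the factor a; the factor b starts at 0
    Returns the fitted value theta = a * b after `steps` updates.
    """
    a, b = a0, 0.0
    for _ in range(steps):
        residual = z - a * b
        # Simultaneous update of both factors along the negative gradient.
        a, b = a + lr * b * residual, b + lr * a * residual
    return a * b
```

With early stopping, a coordinate that starts from a small scale is fitted much more slowly than one that starts from a large scale: `overparam_gd(2.0, 0.01, steps=20)` is still close to zero, whereas `overparam_gd(2.0, 1.0)` has converged to the observation. Unlike a fixed ordering of eigenfunctions, the scales `a` themselves change during training, which is the kind of adaptivity the abstract refers to.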

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# シャロウフェイクとディープフェイクの局所化のためのノイズとエッジ抽出に基づく二分岐法

A Noise and Edge extraction-based dual-branch method for Shallowfake and Deepfake Localization ( http://arxiv.org/abs/2409.00896v1 )

ライセンス: Link先を確認

Deepak Dagar, Dinesh Kumar Vishwakarma,

(参考訳) マルチメディアの信頼性は、高度な画像操作ローカライゼーション(IML)技術によってますます評価され、その結果、IMLフィールドが出現している。有効な操作モデルは、操作された部分と正当な部分の間の非意味的な差分の特徴を抽出し、アーティファクトを利用する必要がある。これは2つの領域間の直接比較を必要とする。現在のモデルでは、手作りの特徴に基づく機能アプローチ、畳み込みニューラルネットワーク(CNN)、あるいは両方を組み合わせたハイブリッドアプローチが採用されている。ハンドクラフト機能アプローチは事前にタンパリングを前提としており、それによって様々なタンパ処理の処理効率が制限されるが、CNNはアーティファクトに対処するには不十分なセマンティック情報をキャプチャする。これらの制約に対処するため,従来のCNN機能と手動で設計した特徴雑音を統合するデュアルブランチモデルを開発した。このモデルはデュアルブランチ戦略を採用しており、一方のブランチはノイズ特性を統合し、もう一方のブランチは階層的なConvNextモジュールを使用してRGB機能を統合する。さらに、エッジ監視損失を利用して境界操作情報を取得し、エッジの正確な位置決めを行う。さらに、この機能拡張モジュールを使用して属性の表示を最適化し、洗練する。 shallowfakesデータセット (CASIA, COVERAGE, COLUMBIA, NIST16) とディープフェイクデータセット Faceforensics++ (FF++) は、他のベースラインモデルと比較して特徴と優れたパフォーマンスを抽出する優れた能力を示すために、徹底的なテストを実施した。 AUCの得点は99%だった。このモデルは比較において優れており、既存の最先端モデル(SoTA)よりも容易に優れている。

The trustworthiness of multimedia is being increasingly evaluated by advanced Image Manipulation Localization (IML) techniques, resulting in the emergence of the IML field. An effective manipulation model necessitates the extraction of non-semantic differential features between manipulated and legitimate sections to utilize artifacts. This requires direct comparisons between the two regions. Current models employ either feature approaches based on handcrafted features, convolutional neural networks (CNNs), or a hybrid approach that combines both. Handcrafted feature approaches presuppose tampering in advance, hence restricting their effectiveness in handling various tampering procedures, but CNNs capture semantic information, which is insufficient for addressing manipulation artifacts. In order to address these constraints, we have developed a dual-branch model that integrates manually designed feature noise with conventional CNN features. This model employs a dual-branch strategy, where one branch integrates noise characteristics and the other branch integrates RGB features using the hierarchical ConvNext Module. In addition, the model utilizes edge supervision loss to acquire boundary manipulation information, resulting in accurate localization at the edges. Furthermore, this architecture utilizes a feature augmentation module to optimize and refine the presentation of attributes. Thorough testing on the shallowfake datasets (CASIA, COVERAGE, COLUMBIA, NIST16) and the deepfake dataset FaceForensics++ (FF++) demonstrates the model's outstanding ability to extract features and its superior performance compared to other baseline models. The AUC score reached 99%. The model is superior in comparison and easily outperforms the existing state-of-the-art (SoTA) models.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 空に侵入する:地球観測星団におけるデータ遅延とオーバーフロー攻撃

Infiltrating the Sky: Data Delay and Overflow Attacks in Earth Observation Constellations ( http://arxiv.org/abs/2409.00897v1 )

ライセンス: Link先を確認

Xiaojian Wang, Ruozhou Yu, Dejun Yang, Guoliang Xue,

(参考訳) 低地球軌道(LEO)地球観測(EO)衛星は、地球を観測する方法を変えました。移動カメラのように、EO衛星は異なるミッションと優先順位の星座に形成され、処理のために地上に送信する必要がある膨大なデータを捕捉する。しかし、EO衛星はダウンリンク通信能力が非常に限られており、送信帯域、地上局の数と位置、高速衛星移動による小さな送信窓によって制限されている。資源利用を最適化するために、EOコンステレーションは、通信効率の最大化のために、通信スペクトルと地上局を共有することが期待されている。本稿では,EOコンステレーションにおける資源競争による新たな攻撃面について検討し,地球観測データの遅延や低下を正統なEOサービスを用いて検討する。具体的には、攻撃者は高優先度要求を注入して、一時的に低優先度データ送信ウィンドウをプリエンプトすることができる。さらに、予測可能な衛星力学を利用することで、攻撃者は低優先度の衛星から重要なデータを知的にターゲットし、配信を遅らせるか、データを不可逆的に落とすかのどちらかを示す。我々は、データ遅延攻撃とデータオーバーフロー攻撃の2つの攻撃を定式化し、攻撃者が攻撃戦略を考案するのを支援するアルゴリズムを設計し、典型的なシナリオにおけるその実現可能性や最適性を分析する。次に、実世界の衛星画像と軌道データを用いてトレース駆動シミュレーションを行い、現実的な衛星通信環境下でこれらの攻撃を発射する確率を評価する。これらの攻撃に対する防御の可能性についても論じる。

Low Earth Orbit (LEO) Earth Observation (EO) satellites have changed the way we monitor Earth. Acting like moving cameras, EO satellites are formed in constellations with different missions and priorities, and capture vast data that needs to be transmitted to the ground for processing. However, EO satellites have very limited downlink communication capability, limited by transmission bandwidth, number and location of ground stations, and small transmission windows due to high velocity satellite movement. To optimize resource utilization, EO constellations are expected to share communication spectrum and ground stations for maximum communication efficiency. In this paper, we investigate a new attack surface exposed by resource competition in EO constellations, targeting the delay or drop of Earth monitoring data using legitimate EO services. Specifically, an attacker can inject high-priority requests to temporarily preempt low-priority data transmission windows. Furthermore, we show that by utilizing predictable satellite dynamics, an attacker can intelligently target critical data from low-priority satellites, either delaying its delivery or irreversibly dropping the data. We formulate two attacks, the data delay attack and the data overflow attack, design algorithms to assist attackers in devising attack strategies, and analyze their feasibility or optimality in typical scenarios. We then conduct trace-driven simulations using real-world satellite images and orbit data to evaluate the success probability of launching these attacks under realistic satellite communication settings. We also discuss possible defenses against these attacks.
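
The resource-competition effect behind the data overflow attack can be pictured with a very small buffer model. This is a toy sketch and not the paper's formulation: the discrete time steps, the fixed capture and downlink amounts, and the function name are all assumptions made for illustration.

```python
def dropped_data(capacity, capture_per_step, downlink_per_window, has_window):
    """Total data dropped by a satellite with bounded on-board storage.

    has_window: list of booleans, one per time step; False means the
    satellite's downlink window at that step was preempted by a
    higher-priority request.
    """
    stored = 0
    dropped = 0
    for window_available in has_window:
        stored += capture_per_step
        if stored > capacity:
            # Storage overflow: the excess data is lost irreversibly.
            dropped += stored - capacity
            stored = capacity
        if window_available:
            stored = max(0, stored - downlink_per_window)
    return dropped
```

With capacity 10, capture 3 and downlink 4 per step, a satellite that keeps all its windows never drops data, while losing four consecutive windows already causes drops. A defense discussion could use the same model to ask how much spare storage or how many guaranteed windows are needed to tolerate a given run of preemptions.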

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 深部ReLUニューラルネットワークを用いたソボレフとベソフ関数の最適近似について

On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks ( http://arxiv.org/abs/2409.00901v1 )

ライセンス: Link先を確認

Yunfei Yang,

(参考訳) 本稿では, ソボレフ空間 $\mathcal{W}^{s,q}([0,1]^d)$ および Besov 空間 $\mathcal{B}^s_{q,r}([0,1]^d)$ において, 誤差が$L^p([0,1]^d)$ノルムで測定された場合, 幅が$W$ で深さが$L$ の深いReLUニューラルネットワークによって近似できる問題について検討する。この問題はいくつかの最近の研究によって研究され、ソボレフ埋め込み条件が 1/q −1/p<s/d$ であるときに、$p=q=\infty$ のときの対数係数への近似率 $\mathcal{O}((WL)^{-2s/d})$ と、固定幅のネットワークに対する $\mathcal{O}(L^{-2s/d})$ が成立するときに得られる。これらの結果を一般化するために、$\mathcal{O}((WL)^{-2s/d})$が実際にソボレフ埋め込み条件の下で成り立つことを示す。この値は対数因子に最適であることが知られている。我々の証明の鍵となるツールは、幅と深さの異なる深部ReLUニューラルネットワークを用いてスパースベクトルを符号化することである。

This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $\mathcal{O}((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $\mathcal{O}(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q -1/p<s/d$ holds. We generalize these results by showing that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.
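As a compact restatement of the result described above, the display below writes the claimed rate as a single formula. The symbol $\mathcal{NN}(W,L)$ for the set of ReLU networks of width $W$ and depth $L$, the unit-ball normalization, and the use of $\lesssim$ to hide constants and logarithmic factors are notational assumptions made here, not notation taken from the paper.

$$
\sup_{\|f\|_{\mathcal{W}^{s,q}([0,1]^d)} \le 1} \; \inf_{g \in \mathcal{NN}(W,L)} \|f - g\|_{L^p([0,1]^d)} \;\lesssim\; (WL)^{-2s/d}, \qquad \text{provided } \frac{1}{q} - \frac{1}{p} < \frac{s}{d}.
$$

Read this way, with the illustrative values $s=1$ and $d=2$, doubling the product $WL$ scales the bound by $2^{-2s/d} = 1/2$, again ignoring logarithmic factors. The Besov case would presumably read the same with the $\mathcal{B}^s_{q,r}([0,1]^d)$ norm in place of the Sobolev norm.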

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# MV-Match:植物栄養失調のドメイン適応同定のためのマルチビューマッチング

MV-Match: Multi-View Matching for Domain-Adaptive Identification of Plant Nutrient Deficiencies ( http://arxiv.org/abs/2409.00903v1 )

ライセンス: Link先を確認

Jinhui Yi, Yanan Luo, Marion Deichmann, Gabriel Schaaf, Juergen Gall,

(参考訳) 栄養不足の早期、非侵襲的、オンサイト検出は、栄養不足による作物の大きな損失を防ぐためのタイムリーな行動を可能にするために重要である。ラベル付きデータを取得するのは非常に高価ですが、作物の複数のビューから画像を集めるのは簡単です。実用的な応用に関連があるにもかかわらず、ラベル付けされたソースドメインとラベル付けされていないターゲットドメインに対して複数のビューが利用できる教師なしのドメイン適応は、未調査の研究領域である。そこで本研究では,ソース領域とターゲット領域における複数のカメラビューを活用して,教師なし領域適応を実現する手法を提案する。 2つの栄養失調データセットに対する提案手法の評価を行った。提案手法は、他の教師なし領域適応法と比較して、両データセットの最先端結果を実現する。データセットとソースコードは https://github.com/jh-yi/MV-Match で入手できる。

An early, non-invasive, and on-site detection of nutrient deficiencies is critical to enable timely actions to prevent major losses of crops caused by lack of nutrients. While acquiring labeled data is very expensive, collecting images from multiple views of a crop is straightforward. Despite its relevance for practical applications, unsupervised domain adaptation where multiple views are available for the labeled source domain as well as the unlabeled target domain is an unexplored research area. In this work, we thus propose an approach that leverages multiple camera views in the source and target domain for unsupervised domain adaptation. We evaluate the proposed approach on two nutrient deficiency datasets. The proposed method achieves state-of-the-art results on both datasets compared to other unsupervised domain adaptation methods. The dataset and source code are available at https://github.com/jh-yi/MV-Match.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 不完全な車両軌道予測のためのマルチスケールテンポラル核融合変圧器

Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction ( http://arxiv.org/abs/2409.00904v1 )

ライセンス: Link先を確認

Zhanwen Liu, Chao Li, Yang Wang, Nan Yang, Xing Fan, Jiaqi Ma, Xiangmo Zhao,

(参考訳) 運動予測は自律走行システムにおいて重要な役割を担い、周囲の車両の予測に基づいて、より正確な局所経路計画と運転決定を自動運転車が達成できるようにする。しかし、既存の手法では、実際の交通シナリオにおける軌道予測性能を必然的に低下させるオブジェクトの閉塞や知覚障害などによる潜在的な欠落値を無視している。この制限に対処するために,Multi-scale Temporal Fusion Transformer (MTFT) という,Multi-scale Attention Head (MAH) とContinuity Representation-guided Multi-scale Fusion (CRMF) モジュールからなる,不完全な車両軌道予測のための新しいエンドツーエンドフレームワークを提案する。具体的には、MAHはマルチヘッドアテンション機構を利用して、異なる時間的粒度から軌道のマルチスケールの運動表現を並列にキャプチャし、欠落値が予測に与える悪影響を軽減する。さらに、マルチスケールの運動表現をCRMFモジュールに入力してマルチスケールの融合を行い、車両の頑健な時間的特徴を得る。融合過程において、車両の運動の連続性表現は、最初に時間ステップを通して抽出され、融合を誘導し、結果として生じる時間的特徴が詳細な情報と車両の運動の全体的傾向の両方を包含し、車両の運動傾向と一致する将来の軌道の正確な復号を容易にする。高速道路と都市の交通シナリオから得られた4つのデータセットについて,提案モデルの評価を行った。実験結果から, 不完全な車両軌道予測タスクにおいて最先端モデルを上回る性能が示され, 例えばHighDデータセットでは総合的な性能が39%以上向上した。

Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local-path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect the potential missing values caused by object occlusion, perception failures, etc., which inevitably degrades the trajectory prediction performance in real traffic scenarios. To address this limitation, we propose a novel end-to-end framework for incomplete vehicle trajectory prediction, named Multi-scale Temporal Fusion Transformer (MTFT), which consists of the Multi-scale Attention Head (MAH) and the Continuity Representation-guided Multi-scale Fusion (CRMF) module. Specifically, the MAH leverages the multi-head attention mechanism to parallelly capture multi-scale motion representation of trajectory from different temporal granularities, thus mitigating the adverse effect of missing values on prediction. Furthermore, the multi-scale motion representation is input into the CRMF module for multi-scale fusion to obtain the robust temporal feature of the vehicle. During the fusion process, the continuity representation of vehicle motion is first extracted across time steps to guide the fusion, ensuring that the resulting temporal feature incorporates both detailed information and the overall trend of vehicle motion, which facilitates the accurate decoding of future trajectory that is consistent with the vehicle's motion trend. We evaluate the proposed model on four datasets derived from highway and urban traffic scenarios. The experimental results demonstrate its superior performance in the incomplete vehicle trajectory prediction task compared with state-of-the-art models, e.g., a comprehensive performance improvement of more than 39% on the HighD dataset.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# ViRED: エンジニアリング図面における視覚的関係の予測

ViRED: Prediction of Visual Relations in Engineering Drawings ( http://arxiv.org/abs/2409.00909v1 )

ライセンス: Link先を確認

Chao Gu, Ke Lin, Yiyang Luo, Jiahui Hou, Xiang-Yang Li,

(参考訳) エンジニアリング図面を正確に理解するためには,図面内の画像とその記述表との対応性を確立することが不可欠である。既存の文書理解手法は主にテキストを主なモダリティとして重視するが、実際の画像情報を含む文書には適さない。視覚的関係検出の分野では、タスクの構造は本質的に、描画中のすべてのエンティティペア間の関係を評価する能力を制限する。この問題に対処するため、電気工学図面における表と回路の関係を識別する視覚に基づく関係検出モデルViREDを提案する。我々のモデルは、主にビジョンエンコーダ、オブジェクトエンコーダ、リレーショナルデコーダの3つの部分から構成される。性能を評価するために、PyTorchを用いてViREDを実装する。 ViREDの有効性を検証するために,我々は一連の実験を行った。実験結果から,本手法は工学的描画データセットにおいて,関係予測のタスクにおいて96\%の精度を達成し,既存の手法よりも大幅に改善したことを示す。結果は、単一のエンジニアリング図面に多数のオブジェクトがある場合でも、ViREDは高速に推論できることを示している。

To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model mainly consists of three parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED using PyTorch to evaluate its performance. To validate the efficacy of ViRED, we conduct a series of experiments. The experimental results indicate that, within the engineering drawing dataset, our approach attained an accuracy of 96\% in the task of relation prediction, marking a substantial improvement over existing methodologies. The results also show that ViRED performs inference quickly even when a single engineering drawing contains numerous objects.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 磁気三層膜における非共線層間交換結合の内在的駆動

Driving noncollinear interlayer exchange coupling intrinsically in magnetic trilayers ( http://arxiv.org/abs/2409.00911v1 )

ライセンス: Link先を確認

Guan-Wei Peng, Hung-Chin Wang, Yu-Jie Zhong, Chao-Cheng Kaun, Ching-Hao Chang,

(参考訳) 非磁性スペーサを強磁性側層で挟んだ金属三層膜は、スピントロニクスデバイスを実現するための重要なプラットフォームとなっている。最近の実験では、導電スペーサの幅や性質を操作することにより、側層間の非共線磁気アライメントが誘導されることが示されている。理論解析の結果,スペーサ幅の変化は層間交換結合(IEC)に大きく影響し,非共線アライメントをもたらすことが明らかとなった。解析的手法および第一原理法によるFe/Ag/Fe三層膜の研究から、Agスペーサの特定の幅において、側層の磁気モーメントが垂直になる傾向があることを示す。このアライメントはAg量子井戸状態によって媒介され、三層膜全体にわたってスピンスパイラルを示す。以上の結果から,非共線IECは磁気デバイスを制御する自由度を提供し,輸送特性を向上させてスピントロニクス技術を後押しすることが明らかとなった。

A metallic trilayer of ferromagnetic side layers sandwiching a nonmagnetic spacer has become a pivotal platform for spintronic devices. Recent experiments demonstrate that manipulating the width or the nature of the conducting spacer induces noncollinear magnetic alignment between the side layers. Our theoretical analysis reveals that altering the width of the spacer significantly affects the interlayer exchange coupling (IEC), resulting in noncollinear alignment. Through analytic and first-principles methods, our study on the Fe/Ag/Fe trilayer shows that at a specific width of the Ag spacer, the magnetic moments of the side layers tend to be perpendicular. This alignment is mediated by Ag quantum well states, exhibiting spin spirals across the trilayer. Our results reveal that the noncollinear IEC offers a degree of freedom to control magnetic devices and boost spintronic technology with improved transport capabilities.

翻訳日:2024-09-06 08:30:49 公開日:2024-09-02

# 外観に基づく視線推定改善のための複数データセットのマージ

Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation ( http://arxiv.org/abs/2409.00912v1 )

ライセンス: Link先を確認

Liang Wu, Bertram E. Shi,

(参考訳) 外観に基づく視線推定器のトレーニングとテストのために、複数のデータセットが作成されている。直感的には、より多くのデータがより良いパフォーマンスをもたらすはずです。しかし、1つの推定器をトレーニングするためにデータセットを組み合わせても、視線推定性能が向上することは滅多にない。理由の一つは、視線サンプルを得るための実験プロトコルの違いであり、その結果、頭部ポーズ、視線角度、照明などの分布に違いが生じる可能性がある。もう一つの理由は、視線角度を定義する方法の不一致(ラベルミスマッチ)である。本稿では、複数のデータセットを活用して視線推定性能を向上させるために、推定器アーキテクチャの変更と視線適応モジュールの導入という2つのイノベーションを提案する。ほとんどの最先端推定器は、両眼と顔全体の画像から抽出された情報を並列に融合するか、最初に両眼の情報を統合してから顔の情報と組み合わせる。提案するTwo-stage Transformer-based Gaze-feature Fusion (TTGF) 法は、トランスフォーマーを用いて各眼と顔の情報を別々にマージし、その後両眼間でマージする。頭部ポーズの変化は左右の眼画像に異なる影響を与えるため、これにより頭部ポーズに対する不変性が向上すると考えられる。提案するGaze Adaptation Module (GAM) 法は,各データセットに視線適応モジュールを適用して,単一の共有推定器から得られた視線推定値を補正することにより,アノテーションの不一致を処理する。これにより、ラベル付けの違いに関わらず、データセット間で情報を結合することができます。実験から、これらのイノベーションは、個別にも組み合わせても(10%から20%)SOTAの視線推定性能を改善することが示されています。私たちのコードは https://github.com/HKUST-NISL/GazeSetMerge から入手可能です。

Multiple datasets have been created for training and testing appearance-based gaze estimators. Intuitively, more data should lead to better performance. However, combining datasets to train a single estimator rarely improves gaze estimation performance. One reason may be differences in the experimental protocols used to obtain the gaze samples, resulting in differences in the distributions of head poses, gaze angles, illumination, etc. Another reason may be the inconsistency between methods used to define gaze angles (label mismatch). We propose two innovations to improve the performance of gaze estimation by leveraging multiple datasets, a change in the estimator architecture and the introduction of a gaze adaptation module. Most state-of-the-art estimators merge information extracted from images of the two eyes and the entire face either in parallel or combine information from the eyes first then with the face. Our proposed Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately and then merge across the two eyes. We argue that this improves head pose invariance since changes in head pose affect left and right eye images in different ways. Our proposed Gaze Adaptation Module (GAM) method handles annotation inconsistency by applying a Gaze Adaptation Module for each dataset to correct gaze estimates from a single shared estimator. This enables us to combine information across datasets despite differences in labeling. Our experiments show that these innovations improve gaze estimation performance over the SOTA both individually and collectively (by 10% - 20%). Our code is available at https://github.com/HKUST-NISL/GazeSetMerge.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# ネステロフ加速勾配法の一般化連続時間モデル

Generalized Continuous-Time Models for Nesterov's Accelerated Gradient Methods ( http://arxiv.org/abs/2409.00913v1 )

ライセンス: Link先を確認

Chanwoong Park, Youngchae Cho, Insoon Yang,

(参考訳) 近年の研究では、ネステロフの加速勾配法を連続時間モデルで理解することへの関心が高まっている。しかし、既存のほとんどの研究はネステロフの方法の特定のクラスに焦点を当てており、これは深い理解と統一された視点の達成を妨げる。この欠点に対処するため、我々は、Nesterovの手法の幅広い範囲をカバーする一般化された連続時間モデルを提示した。主な貢献は以下の通りである。まず、一般化されたモデルの収束率を特定し、それらから派生した任意の特定の連続時間モデルに対する収束率を決定する必要をなくす。第2に,既存の6つの連続時間モデルが一般化されたモデルの特別な場合であることを示し,これらのモデルを分析し,理解するための統一ツールとして,我々のフレームワークを位置づけた。第三に、一般化されたモデルに基づくネステロフの手法の再起動方式を設計し、目的関数値の単調な減少を確実にすることを示す。モデルの広範な適用性のため、このスキームは元の再起動スキームと比較して、より広範なNesterovの手法のクラスに使用することができる。第4に、一般化されたモデルと連続時間における勾配流の関連を明らかにすることにより、一般化されたモデルの加速収束速度が勾配流の時間再パラメータ化に起因することを示す。理論的解析と結果を支援するための数値実験結果を提供する。

Recent research has indicated a substantial rise in interest in understanding Nesterov's accelerated gradient methods via their continuous-time models. However, most existing studies focus on specific classes of Nesterov's methods, which hinders the attainment of an in-depth understanding and a unified perspective. To address this deficit, we present generalized continuous-time models that cover a broad range of Nesterov's methods, including those previously studied under existing continuous-time frameworks. Our key contributions are as follows. First, we identify the convergence rates of the generalized models, eliminating the need to determine the convergence rate for any specific continuous-time model derived from them. Second, we show that six existing continuous-time models are special cases of our generalized models, thereby positioning our framework as a unifying tool for analyzing and understanding these models. Third, we design a restart scheme for Nesterov's methods based on our generalized models and show that it ensures a monotonic decrease in objective function values. Owing to the broad applicability of our models, this scheme can be applied to a broader class of Nesterov's methods compared to the original restart scheme. Fourth, we uncover a connection between our generalized models and gradient flow in continuous time, showing that the accelerated convergence rates of our generalized models can be attributed to a time reparametrization in gradient flow. Numerical experiment results are provided to support our theoretical analyses and results.
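For readers unfamiliar with the methods being modeled, the following is a minimal sketch of one classical Nesterov scheme, the variant with momentum coefficient $(k-1)/(k+2)$, whose well-known continuous-time limit is the ODE $\ddot{X} + \frac{3}{t}\dot{X} + \nabla f(X) = 0$. It is not the generalized model of the paper. The test problem (a diagonal quadratic with eigenvalues 1 and 1000), the starting point, and the iteration count are illustrative assumptions.

```python
# Illustrative problem (an assumption, not from the paper):
# f(x) = 0.5 * sum(eig_i * x_i^2), with gradient Lipschitz constant L = 1000.
EIGS = [1.0, 1000.0]

def f(x):
    return 0.5 * sum(e * xi * xi for e, xi in zip(EIGS, x))

def grad(x):
    return [e * xi for e, xi in zip(EIGS, x)]

def grad_descent(grad_fn, x0, step, iters):
    """Plain gradient descent with a fixed step size."""
    x = list(x0)
    for _ in range(iters):
        g = grad_fn(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def nesterov(grad_fn, x0, step, iters):
    """Classical Nesterov scheme with momentum (k - 1) / (k + 2)."""
    x_prev = list(x0)
    y = list(x0)
    for k in range(1, iters + 1):
        g = grad_fn(y)
        # Gradient step taken from the extrapolated point y.
        x = [yi - step * gi for yi, gi in zip(y, g)]
        beta = (k - 1) / (k + 2)
        # Extrapolate along the direction of the last displacement.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev = x
    return x_prev

x0 = [1.0, 1.0]
step = 1.0 / 1000.0  # 1 / L
f_gd = f(grad_descent(grad, x0, step, 200))
f_nag = f(nesterov(grad, x0, step, 200))
print(f"gradient descent: {f_gd:.4f}, Nesterov: {f_nag:.4f}")
```

On this ill-conditioned problem the accelerated iterate reaches a lower objective value than plain gradient descent after the same number of gradient evaluations, which is the behavior that continuous-time analyses of this kind aim to explain.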

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 有限熱貯留層下での量子熱機関の内部サイクルにおけるカルノー限界を超えて

Beyond the Carnot Limit in the Internal Cycles of a Quantum Heat Engine under Finite Heat Reservoirs ( http://arxiv.org/abs/2409.00914v1 )

ライセンス: Link先を確認

L. -L. Yan, M. -R. Yun, M. Li, S. -L. Su, K. -F. Cui, Gang Chen, M. Feng,

(参考訳) 本研究では, 2つの有限熱貯留層に結合した微視的熱エンジンの量子カルノーサイクルを解析的に検討する。その内部サイクルは, 余剰な量子資源(例えばコヒーレンスやスクイーズ特性)を消費することなく, 通常のカルノー限界よりも高い効率を持ちうる。エンジンは時間依存で動作し、内部サイクルと外部サイクルの両方が協調して完全なカルノーサイクルを達成し、エンジンの効率は熱貯留層の熱容量と作動物質に依存する。最大効率と最大出力の解析結果から, 微視的エンジンの高性能化の背景となるメカニズムを明らかにするとともに, 有限サイズの熱貯留層が果たす重要な役割を明らかにした。提案手法は, あらゆる微視的熱力学系に対して有効であり, 現在の実験室条件下で完全に実現可能である。

We investigate, in an analytical fashion, quantum Carnot cycles of a microscopic heat engine coupled to two finite heat reservoirs, whose internal cycles could own higher efficiency than the standard Carnot limit without consuming extra quantum resources, e.g., coherence or squeezing properties. The engine runs time-dependently, involving both the internal and external cycles to collaboratively accomplish a complete Carnot cycle, and the efficiency of the engine depends on the reservoirs heat capacities and the working substance. Our analytical results of the maximum efficiency and the maximum power output clarify the mechanism behind the high performance of the microscopic engines, displaying the key roles played by the finite-sized heat reservoirs. Our proposal is generally valid for any microscopic thermodynamic system and fully feasible under current laboratory conditions.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 大次元における内積核回帰のピンスカー境界について

On the Pinsker bound of inner product kernel regression in large dimensions ( http://arxiv.org/abs/2409.00915v1 )

ライセンス: Link先を確認

Weihao Lu, Jialin Ding, Haobo Zhang, Qian Lin,

(参考訳) 特に球面$\mathbb{S}^{d}$上の内積核に関する最近の研究に基づいて、そのような設定における内積核回帰に対するピンスカー境界について検討する。具体的には、サンプルサイズ$n$ が $\alpha d^{\gamma}(1+o_{d}(1))$ によって与えられるシナリオに対処する。我々は、この設定でカーネル回帰の正確なミニマックスリスクを決定し、ミニマックス率だけでなく、過剰リスクに関連するピンスカー定数と呼ばれる正確な定数も特定した。

Building on recent studies of large-dimensional kernel regression, particularly those involving inner product kernels on the sphere $\mathbb{S}^{d}$, we investigate the Pinsker bound for inner product kernel regression in such settings. Specifically, we address the scenario where the sample size $n$ is given by $\alpha d^{\gamma}(1+o_{d}(1))$ for some $\alpha, \gamma>0$. We have determined the exact minimax risk for kernel regression in this setting, not only identifying the minimax rate but also the exact constant, known as the Pinsker constant, associated with the excess risk.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# MMT-BERT:Multitrack Music TransformerとMusicBERTを用いたコード認識シンボリック音楽生成

MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT ( http://arxiv.org/abs/2409.00919v1 )

ライセンス: Link先を確認

Jinlong Zhu, Keigo Sakurai, Ren Togo, Takahiro Ogawa, Miki Haseyama,

(参考訳) シンボリック・マルチトラック音楽生成に特化して設計された新しいシンボリック・ミュージック表現とジェネレーティブ・アディバーショナル・ネットワーク(GAN)フレームワークを提案する。シンボリック・ミュージック・ジェネレーションの主なテーマは、音楽データの事前処理とディープラーニング・フレームワークの実装である。シンボリック・ミュージック・ジェネレーションに特化した現在の技術は、一般的に2つの重要な課題に直面する: 弦と音階に関する情報の不足を訓練するデータと、シンボリック・ミュージック・表現のユニークな形式に適合した特別に設計されたモデル・アーキテクチャの必要性。本稿では,MusicLang コード解析モデルを用いた新しい記号的音楽表現を導入することで,上記の問題を解決する。本稿では,その表現に適応したMT-BERTアーキテクチャを提案する。頑健なマルチトラック・ミュージック・ジェネレータを構築するため,事前学習したMusicBERTモデルを微調整して判別器として機能し,相対論的標準損失を取り入れた。このアプローチは,MusicBERT内に符号化されたシンボリック音楽の深い理解に支えられ,本手法が生み出す音楽の協和性と人間性を裏付けるものである。実験により,最先端の手法を厳格に追従するアプローチの有効性が示された。

We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation. The main theme of symbolic music generation primarily encompasses the preprocessing of music data and the implementation of a deep learning framework. Current techniques dedicated to symbolic music generation generally encounter two significant challenges: training data's lack of information about chords and scales and the requirement of specially designed model architecture adapted to the unique format of symbolic music representation. In this paper, we solve the above problems by introducing new symbolic music representation with MusicLang chord analysis model. We propose our MMT-BERT architecture adapting to the representation. To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator, and incorporate relativistic standard loss. This approach, supported by the in-depth understanding of symbolic music encoded within MusicBERT, fortifies the consonance and humanity of music generated by our method. Experimental results demonstrate the effectiveness of our approach which strictly follows the state-of-the-art methods.
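The abstract names a "relativistic standard loss" without giving its form. The sketch below assumes the commonly used relativistic standard GAN formulation, in which the discriminator is trained on the difference between the critic scores of a real and a generated sample. The function names, the pairing of samples, and the example scores are hypothetical and are not taken from the MMT-BERT code.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def relativistic_losses(real_scores, fake_scores):
    """Return (discriminator_loss, generator_loss) for paired critic scores.

    Assumed formulation:
      D loss = mean(-log sigmoid(C(real) - C(fake)))
      G loss = mean(-log sigmoid(C(fake) - C(real)))
    """
    diffs = [r - f for r, f in zip(real_scores, fake_scores)]
    # -log(sigmoid(d)) equals softplus(-d).
    d_loss = sum(softplus(-d) for d in diffs) / len(diffs)
    g_loss = sum(softplus(d) for d in diffs) / len(diffs)
    return d_loss, g_loss

# Hypothetical critic outputs for three real and three generated pieces.
d_loss, g_loss = relativistic_losses([2.0, 1.5, 0.5], [-1.0, 0.0, 0.5])
print(f"discriminator loss: {d_loss:.4f}, generator loss: {g_loss:.4f}")
```

When the critic cannot tell real from generated samples (equal scores), both losses equal $\log 2$; the discriminator lowers its loss by scoring real samples above generated ones, and the generator lowers its loss by closing that gap.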

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# ToolACE: LLM関数呼び出しのポイントを獲得する

ToolACE: Winning the Points of LLM Function Calling ( http://arxiv.org/abs/2409.00920v1 )

ライセンス: Link先を確認

Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen,

(参考訳) 関数呼び出しは大きな言語モデルのアプリケーション境界を大幅に拡張し、高品質で多様なトレーニングデータがこの機能のアンロックに不可欠である。しかし、実際の関数呼び出しデータは収集と注釈が難しい一方で、既存のパイプラインで生成された合成データは、カバレッジと正確性に欠ける傾向にある。本稿では,高精度で複雑で多様なツール学習データを生成するための自動エージェントパイプラインであるToolACEを提案する。 ToolACEは、新しい自己進化合成プロセスを活用して、26,507の多様なAPIの包括的なAPIプールをキュレートする。ダイアログは、複数のエージェント間の相互作用を通じてさらに生成され、形式化された思考プロセスによってガイドされる。データ精度を確保するため、ルールベースとモデルベースのチェックを組み合わせた二重層検証システムを実装した。我々は、合成データに基づいてトレーニングされたモデルが、8Bパラメータだけで、最新のGPT-4モデルに匹敵する、バークレー・ファンクション・カリング・リーダーボードで最先端のパフォーマンスを達成することを実証した。我々のモデルとデータのサブセットは https://huggingface.co/Team-ACE で公開されています。

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.
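To make the idea of a rule-based check concrete, here is a minimal sketch of what the first layer of such a verification might look like: a generated function call is compared against an API definition for name, required arguments, unexpected arguments, and argument types. The schema layout, the `get_weather` API, and the `check_call` helper are hypothetical illustrations and do not describe ToolACE's actual implementation.

```python
# Hypothetical API definition; the schema layout is an assumption for illustration.
WEATHER_API = {
    "name": "get_weather",
    "parameters": {
        "city": {"type": str, "required": True},
        "days": {"type": int, "required": False},
    },
}

def check_call(api, call):
    """Return a list of rule violations for one generated function call."""
    errors = []
    if call.get("name") != api["name"]:
        errors.append(f"unknown function: {call.get('name')}")
        return errors
    params = api["parameters"]
    args = call.get("arguments", {})
    # Rule 1: every required parameter must be present.
    for key, spec in params.items():
        if spec["required"] and key not in args:
            errors.append(f"missing required argument: {key}")
    # Rule 2: no undeclared arguments, and declared ones must have the right type.
    for key, value in args.items():
        if key not in params:
            errors.append(f"unexpected argument: {key}")
        elif not isinstance(value, params[key]["type"]):
            errors.append(f"wrong type for argument: {key}")
    return errors

good = {"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}
bad = {"name": "get_weather", "arguments": {"days": "three", "zip": "75001"}}
print(check_call(WEATHER_API, good))
print(check_call(WEATHER_API, bad))
```

A sample that passes checks of this kind could then be handed to a model-based reviewer for errors that rules cannot express, such as an argument value that contradicts the user's request.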

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 型付きホールを用いた大規模言語モデルの静的文脈化

Statically Contextualizing Large Language Models with Typed Holes ( http://arxiv.org/abs/2409.00921v1 )

ライセンス: Link先を確認

Andrew Blinn, Xiang Li, June Hyung Kim, Cyrus Omar,

(参考訳) 大規模言語モデル(LLM)は、プログラム合成のランドスケープを形変えた。しかし、現代のLLMベースのコード補完システムは、特にトレーニングデータやカーソルに近い定義で作業する場合に、適切なコンテキストが欠如しているため、壊れたコードを幻覚させることが多い。本稿では,言語サーバが公開している言語型とバインディング構造との密接な統合が,この文脈化問題にトークン効率のよい方法で対処できることを実証する。要するに、AIにもIDEが必要だ、と私たちは主張するのです! 特に,LLMコード生成をHazelのライブプログラムスケッチ環境に統合する。 Hazel Language Serverは、エラーがあっても、穴のタイプと型付けのコンテキストを特定し、有意義なプログラムスケッチが常に利用可能であることを保証します。これにより、コードベース全体のコンテキスト情報をカーソルにレキシカルにローカルでなくても、必ずしも同じファイルにローカルでなくても、開発者の目標にセマンティックにローカルになる可能性がある。 LLMによって合成された補完は、言語サーバーとのさらなる対話を通じて反復的に洗練される。これらの手法を評価するために,MVU (Model-view-update) WebアプリケーションのデータセットであるMVUBenchを紹介する。これらのアプリケーションは、アプリケーション固有のデータ構造に依存しているため、課題として機能する。型定義によるコンテキスト化は,特に影響が大きいことが分かりました。 Hazelのコンテキストでアイデアを導入し、MVUBenchをTypeScriptに移植して、これらのメソッドを高レベルの言語に適用可能であることを検証しました。最後に、言語サーバが実装できる言語サーバプロトコル(LSP)の保守的な拡張であるChatLSPの概要を述べる。

Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate context, particularly when working with definitions not in the training data nor near the cursor. This paper demonstrates that tight integration with the type and binding structure of a language, as exposed by its language server, can address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server identifies the type and typing context of the hole being filled, even in the presence of errors, ensuring that a meaningful program sketch is always available. This allows prompting with codebase-wide contextual information not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications. These applications serve as challenge problems due to their reliance on application-specific data structures. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# ProphetFuzz: 大規模言語モデルによるドキュメントのみによるハイリスクオプションの組み合わせの完全な自動予測とファジング

ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model ( http://arxiv.org/abs/2409.00922v1 )

ライセンス: Link先を確認

Dawei Wang, Geng Zhou, Li Chen, Dan Li, Yukai Miao,

(参考訳) オプションの組み合わせに関連する脆弱性は、膨大な検索スペースのため、ソフトウェアのセキュリティテストにおいて重大な課題となる。従来の研究は、主に突然変異やフィルタリング技術によってこの課題に対処してきたが、すべてのオプションの組み合わせを脆弱性の可能性が等しいものとして非効率に扱っていたため、非脆弱なターゲットにかなりの時間が費やされ、結果としてテスト効率が低下した。本稿では,慎重に設計したプロンプトエンジニアリングを用いて大規模言語モデル(LLM)にリスクの高いオプションの組み合わせ(脆弱性を含む可能性が高いもの)を予測させ,人間の介入なしにファズテストを自動的に実施する。我々はProphetFuzzというツールを開発し、関連する3つの研究から収集された52のプログラムからなるデータセット上で評価した。実験全体では10.44 CPU年を消費した。 ProphetFuzzは、1プログラムあたり平均わずか8.69ドルのコストで、1748のハイリスクオプションの組み合わせを予測した。 72時間のファジングの後、ProphetFuzzは予測されたハイリスクオプションの組み合わせの12.30\%に関連する364のユニークな脆弱性を発見し、これは同じ時間枠で最先端の手法が発見した数より32.85\%多い。さらに、ProphetFuzzを使用して、これらのプログラムの最新バージョンで継続的なファジングを行い、140の脆弱性を発見し、そのうち93件が開発者によって確認され、21件にCVE番号が付与された。

Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, thus wasting considerable time on non-vulnerable targets and resulting in low testing efficiency. In this paper, we utilize carefully designed prompt engineering to drive the large language model (LLM) to predict high-risk option combinations (i.e., more likely to contain vulnerabilities) and perform fuzz testing automatically without human intervention. We developed a tool called ProphetFuzz and evaluated it on a dataset comprising 52 programs collected from three related studies. The entire experiment consumed 10.44 CPU years. ProphetFuzz successfully predicted 1748 high-risk option combinations at an average cost of only \$8.69 per program. Results show that after 72 hours of fuzzing, ProphetFuzz discovered 364 unique vulnerabilities associated with 12.30\% of the predicted high-risk option combinations, which was 32.85\% higher than that found by state-of-the-art in the same timeframe. Additionally, using ProphetFuzz, we conducted persistent fuzzing on the latest versions of these programs, uncovering 140 vulnerabilities, with 93 confirmed by developers and 21 awarded CVE numbers.
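The figures quoted above can be combined into a few back-of-the-envelope quantities. The short script below does that arithmetic; the derived values (total prediction cost, approximate number of vulnerable combinations, confirmation share) are computed here from the numbers in the abstract and are not results reported by the authors.

```python
# Numbers quoted in the abstract.
programs = 52
avg_cost_usd = 8.69            # average prediction cost per program
predicted_combinations = 1748  # predicted high-risk option combinations
vulnerable_share = 0.1230      # share of predictions tied to a vulnerability
latest_version_vulns = 140
confirmed = 93
cve_assigned = 21

# Derived quantities (illustrative arithmetic only).
total_cost_usd = programs * avg_cost_usd
vulnerable_combinations = predicted_combinations * vulnerable_share
confirmed_share = confirmed / latest_version_vulns
cve_share = cve_assigned / latest_version_vulns

print(f"total prediction cost: about ${total_cost_usd:.2f}")
print(f"combinations tied to a vulnerability: about {vulnerable_combinations:.0f}")
print(f"confirmed by developers: {confirmed_share:.1%}")
print(f"assigned a CVE number: {cve_share:.1%}")
```

Under these assumptions, roughly one in eight predicted combinations led to a vulnerability, and about two thirds of the vulnerabilities found in the latest versions were confirmed by developers.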

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 地下駐車場の稼働予測アルゴリズムの開発

Development of Occupancy Prediction Algorithm for Underground Parking Lots ( http://arxiv.org/abs/2409.00923v1 )

ライセンス: Link先を確認

Shijie Wang,

(参考訳) 本研究の中心となる目的は,地下などの悪環境下での自律運転が直面する認識課題に対処することである。当初,本論文は地下のガレージでデータ収集を開始している。 CARLAシミュレーション環境内にシミュレーションされた地下ガレージモデルを構築し、このシミュレーション環境でセマンティックキティフォーマットの接地真実データを収集する。その後、トランスフォーマーベースのOccupancy Networkモデルを統合し、このシナリオ内での占有グリッド予測タスクを完了させる。包括的なBEV認識フレームワークは、薄暗い、挑戦的な自律運転環境において、ニューラルネットワークモデルの精度を高めるように設計されている。最後に,提案手法の地下シナリオにおける知覚性能の精度を検証する実験を行った。提案手法は自作の地下ガレージデータセットであるSUSTech-COE-ParkingLotでテストし,良好な結果を得た。

The core objective of this study is to address the perception challenges faced by autonomous driving in adverse environments like basements. Initially, this paper commences with data collection in an underground garage. A simulated underground garage model is established within the CARLA simulation environment, and SemanticKITTI format occupancy ground truth data is collected in this simulated setting. Subsequently, the study integrates a Transformer-based Occupancy Network model to complete the occupancy grid prediction task within this scenario. A comprehensive BEV perception framework is designed to enhance the accuracy of neural network models in dimly lit, challenging autonomous driving environments. Finally, experiments validate the accuracy of the proposed solution's perception performance in basement scenarios. The proposed solution is tested on our self-constructed underground garage dataset, SUSTech-COE-ParkingLot, yielding satisfactory results.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# MedSAM-U:信頼性の高いMedSAMのための不確かさ誘導オートマルチプロンプト適応

MedSAM-U: Uncertainty-Guided Auto Multi-Prompt Adaptation for Reliable MedSAM ( http://arxiv.org/abs/2409.00924v1 )

ライセンス: Link先を確認

Nan Zhou, Ke Zou, Kai Ren, Mengting Luo, Linchao He, Meng Wang, Yidi Chen, Yi Zhang, Hu Chen, Huazhu Fu,

(参考訳) 医用セグメンテーションモデル (MedSAM) は, 医用画像のセグメンテーションにおいて顕著な性能を示し, この分野に大きな注目を集めている。しかし、異なるプロンプトタイプや場所に対する感度が問題となる。本稿では,MedSAMの精度を高める信頼性の高いプロンプトの開発に焦点を当て,これらの課題に対処する。我々はMedSAM-Uを導入する。MedSAM-Uは、より信頼性が高く正確な医用画像セグメンテーションのために、マルチプロンプト入力を自動的に洗練するための不確実性誘導フレームワークである。具体的には、まずMedSAMと統合されたMPA-MedSAMをトレーニングし、多様なMedSAM入力に適応させる。次に、不確実性誘導型マルチプロンプトを用いて、プロンプトと初期セグメンテーション結果に関する不確実性を効果的に推定する。特に、新しい不確実性誘導プロンプト適応手法が自動的に適用され、信頼性の高いプロンプトとその対応するセグメンテーション結果が導出される。複数のモードからのデータセットを用いてMedSAM-Uを検証し、普遍的な画像分割モデルを訓練する。 MedSAMと比較して、5つの異なるモーダルデータセットの実験結果から、提案したMedSAM-Uは、不確実性誘導されたプロンプトで平均1.7\%から20.5\%の性能向上を達成することが示された。

The Medical Segment Anything Model (MedSAM) has shown remarkable performance in medical image segmentation, drawing significant attention in the field. However, its sensitivity to varying prompt types and locations poses challenges. This paper addresses these challenges by focusing on the development of reliable prompts that enhance MedSAM's accuracy. We introduce MedSAM-U, an uncertainty-guided framework designed to automatically refine multi-prompt inputs for more reliable and precise medical image segmentation. Specifically, we first train a Multi-Prompt Adapter integrated with MedSAM, creating MPA-MedSAM, to adapt to diverse multi-prompt inputs. We then employ uncertainty-guided multi-prompt to effectively estimate the uncertainties associated with the prompts and their initial segmentation results. In particular, a novel uncertainty-guided prompts adaptation technique is then applied automatically to derive reliable prompts and their corresponding segmentation outcomes. We validate MedSAM-U using datasets from multiple modalities to train a universal image segmentation model. Compared to MedSAM, experimental results on five distinct modal datasets demonstrate that the proposed MedSAM-U achieves an average performance improvement of 1.7\% to 20.5\% across uncertainty-guided prompts.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 授業場面における学生行動に向けて:新しいデータセットとベースライン

Towards Student Actions in Classroom Scenes: New Dataset and Baseline ( http://arxiv.org/abs/2409.00926v1 )

ライセンス: Link先を確認

Zhuolin Tan, Chenqiang Gao, Anyong Qin, Ruixin Chen, Tiecheng Song, Feng Yang, Deyu Meng,

(参考訳) 学生行動の分析は、教育研究において重要かつ困難な課題である。既存の取り組みは、教室の微妙なアクションダイナミクスを捉えるために、アクセス可能なデータセットが欠如していることによって妨げられている。本稿では,複雑な教室シーンを対象としたSAV(Multi-label student action video)データセットを提案する。データセットは、758の異なる教室から得られた4,324の慎重にトリミングされたビデオクリップで構成され、それぞれに教室内で生徒が示す15種類の行動がラベル付けされている。既存の行動データセットと比較して、我々のデータセットは、さまざまな実際の教室シナリオ、高品質のビデオデータ、微妙な動きの違い、密集した物体のエンゲージメント、大きなスケールの違い、様々な射撃角度、視覚的閉塞など、ユニークな課題を提供することで際立っている。データセットの複雑さが増大すると、アクション検出をベンチマークする新たな機会と課題がもたらされる。また,小型で高密度な対象領域における局所的な重要な細部への注意を高めるための,新しいベースライン手法であるビジュアルトランスフォーマーを提案する。提案手法は,SAVとAVAでそれぞれ67.9\%と27.4\%の平均精度(mAP)を達成した。この論文は、データセットを提供するだけでなく、教育方法論や学習成果を変革するAI駆動型教育ツールのさらなる研究も求めている。コードとデータセットは https://github.com/Ritatanz/SAV で公開される。

Analyzing student actions is an important and challenging task in educational research. Existing efforts have been hampered by the lack of accessible datasets to capture the nuanced action dynamics in classrooms. In this paper, we present a new multi-label student action video (SAV) dataset for complex classroom scenes. The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each labeled with 15 different actions displayed by students in classrooms. Compared to existing behavioral datasets, our dataset stands out by providing a wide range of real classroom scenarios, high-quality video data, and unique challenges, including subtle movement differences, dense object engagement, significant scale differences, varied shooting angles, and visual occlusion. The increased complexity of the dataset brings new opportunities and challenges for benchmarking action detection. Innovatively, we also propose a new baseline method, a visual transformer for enhancing attention to key local details in small and dense object regions. Our method achieves excellent performance with mean Average Precision (mAP) of 67.9\% and 27.4\% on SAV and AVA, respectively. This paper not only provides the dataset but also calls for further research into AI-driven educational tools that may transform teaching methodologies and learning outcomes. The code and dataset will be released at https://github.com/Ritatanz/SAV.

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 自己判断: 適応自己評価による選択的指導

Self-Judge: Selective Instruction Following with Alignment Self-Evaluation ( http://arxiv.org/abs/2409.00935v1 )

ライセンス: Link先を確認

Hai Ye, Hwee Tou Ng,

(参考訳) 事前訓練された大規模言語モデル(LLM)は、命令チューニングを通じて人間の指示に従うように調整することができる。しかし、テスト時のデータ分布がシフトするため、LLMは常に命令を正確に実行するとは限らず、チャットアシスタントとして振る舞う際に事実誤りや不整合なコンテンツを生成する可能性がある。そこで本研究では,命令追従におけるLLMの信頼性を高めるために,予想される応答品質が低い場合にはシステムが命令の実行を辞退する,選択的命令追従の研究を提案する。我々は、モデル応答の数値的品質スコアを予測できる判定モデルを訓練する。データ不足に対処するために、人間が注釈付けした品質スコアを必要とせずに判定モデルを開発するための新しい自己学習フレームワークであるSelf-Jを導入する。提案手法はモデル固有の自己評価能力を利用して,ラベル付き命令チューニングデータから応答品質に関する情報を抽出する。自己評価を容易にするためにゴールド参照応答を組み込み、応答サンプルとゴールド参照のセマンティックな類似性を評価することで再較正を行う。トレーニング段階では,参照なし推定の能力を高めるために,正則化手法として自己蒸留を実装した。一般的な指示追従タスクにおけるアライメント評価を検証するため,Hugging Faceから大規模で高品質な命令を収集し,モデルの訓練と評価を行った。5つのオープンソースモデルを用いた広範な実験により,提案手法は,GPT-4やGPT-3.5-turboから蒸留した教師ありモデルなどの強力なベースラインよりも,GPT-4との相関がはるかに高いことを示す。我々の分析は、ドメイン間でのモデルの強い一般化を示している。さらに、我々の判定モデルは優れた報酬モデルとしても機能し、例えば、判定モデルを用いたbest-of-32サンプリングにより、WizardLM-13B-V1.2のスコアをAlpacaEvalのv1では89.17から92.48に、v2では12.03から15.90に引き上げる。

Pre-trained large language models (LLMs) can be tailored to adhere to human instructions through instruction tuning. However, due to shifts in the distribution of test-time data, they may not always execute instructions accurately, potentially generating factual errors or misaligned content when acting as chat assistants. To enhance the reliability of LLMs in following instructions, we propose the study of selective instruction following, whereby the system declines to execute instructions if the anticipated response quality is low. We train judge models that can predict numerical quality scores for model responses. To address data scarcity, we introduce Self-J, a novel self-training framework for developing judge models without needing human-annotated quality scores. Our method leverages the model's inherent self-evaluation capability to extract information about response quality from labeled instruction-tuning data. It incorporates a gold reference answer to facilitate self-evaluation and recalibrates by assessing the semantic similarity between the response sample and the gold reference. During the training phase, we implement self-distillation as a regularization technique to enhance the capability of reference-free estimation. To validate alignment evaluation on general instruction-following tasks, we collect large-scale high-quality instructions from Hugging Face for model training and evaluation. Extensive experiments on five open-source models show that our method correlates much more with GPT-4 than strong baselines, e.g., supervised models distilled from GPT-4 and GPT-3.5-turbo. Our analysis shows our model's strong generalization across domains. Additionally, our judge models serve as good reward models, e.g., boosting WizardLM-13B-V1.2 from 89.17 to 92.48 and from 12.03 to 15.90 in version v1 and v2 of AlpacaEval respectively using best-of-32 sampling with our judge models.
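The two uses of a judge model described above, declining low-quality responses and best-of-N reranking, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `judge` callable, the function names, and the `threshold` parameter are all hypothetical, and any scoring function that maps an (instruction, response) pair to a number could be plugged in.

```python
from typing import Callable, List, Optional

# Hypothetical judge interface: (instruction, response) -> quality score.
Judge = Callable[[str, str], float]


def best_of_n(instruction: str, responses: List[str], judge: Judge) -> str:
    """Return the candidate response that the judge scores highest."""
    return max(responses, key=lambda response: judge(instruction, response))


def selective_answer(
    instruction: str,
    responses: List[str],
    judge: Judge,
    threshold: float,
) -> Optional[str]:
    """Return the best candidate, or None (decline) if its score is too low.

    `threshold` is an assumed tuning knob, not a value from the abstract.
    """
    best = best_of_n(instruction, responses, judge)
    if judge(instruction, best) < threshold:
        return None  # decline to execute the instruction
    return best
```

With 32 sampled candidates in `responses`, `best_of_n` corresponds to the best-of-32 setting mentioned in the abstract.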

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# センシティブなトピックの自動検出のための大規模言語モデル

Large Language Models for Automatic Detection of Sensitive Topics ( http://arxiv.org/abs/2409.00940v1 )

ライセンス: Link先を確認

Ruoyu Wen, Stephanie Elena Crowe, Kunal Gupta, Xinyue Li, Mark Billinghurst, Simon Hoermann, Dwain Allan, Alaeddin Nassani, Thammathip Piumsomboon,

(参考訳) 安全なオンラインコミュニティを維持するためには、コンテンツモデレーションにおいて、センシティブな情報の検出が不可欠である。従来は手作業で行われてきたこのプロセスを支援することで、人間のモデレーターが圧倒的で面倒な作業から解放され、潜在的なリスクをもたらす可能性のあるフラグ付きコンテンツのみに集中できるようになる。急速に進歩する大規模言語モデル(LLM)は、自然言語を理解し処理する能力で知られており、このプロセスをサポートする潜在的なソリューションを提供する。本研究は,2つのオンラインデータセットにおいて,メンタルウェルビーイング領域のセンシティブなメッセージを検出する5つのLLMの能力を調査し,正解率(accuracy),適合率(precision),再現率(recall),F1スコア,一貫性の観点からその性能を評価する。以上の結果から, LLM は簡便かつ高精度な検出ツールとしてモデレーションワークフローに組み込める可能性が示唆された。最高のパフォーマンスモデルであるGPT-4oは、平均正解率99.5\%、F1スコア0.99を達成した。我々は、モデレーションワークフローでLLMを使うことの利点と潜在的な課題について論じ、将来の研究は、この技術を利用する際の倫理的考慮事項に対処すべきだと提案する。

Sensitive information detection is crucial in content moderation to maintain safe online communities. Assisting in this traditionally manual process could relieve human moderators from overwhelming and tedious tasks, allowing them to focus solely on flagged content that may pose potential risks. Rapidly advancing large language models (LLMs) are known for their capability to understand and process natural language and so present a potential solution to support this process. This study explores the capabilities of five LLMs for detecting sensitive messages in the mental well-being domain within two online datasets and assesses their performance in terms of accuracy, precision, recall, F1 scores, and consistency. Our findings indicate that LLMs have the potential to be integrated into the moderation workflow as a convenient and precise detection tool. The best-performing model, GPT-4o, achieved an average accuracy of 99.5\% and an F1-score of 0.99. We discuss the advantages and potential challenges of using LLMs in the moderation workflow and suggest that future research should address the ethical considerations of utilising this technology.
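For reference, the reported metrics can be computed from binary confusion-matrix counts as in the sketch below. The function name and the convention of returning 0.0 for an undefined ratio are assumptions made for illustration; the abstract does not say how the authors handle those edge cases or how consistency is measured.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion counts.

    Undefined ratios (zero denominators) are reported as 0.0 by assumption.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```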

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# VQ-Flow:階層ベクトル量子化によるマルチクラス異常検出のための正規化フローのモデリング

VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization ( http://arxiv.org/abs/2409.00942v1 )

ライセンス: Link先を確認

Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen,

(参考訳) 複雑なデータ分布をモデル化する能力で有名な確率モデルのカテゴリである正規化フローは、教師なし異常検出において顕著な効果を示した。本稿では,クラスラベルが与えられないまま複数クラスの正常データが混在するマルチクラス異常検出における正規化フローの可能性について検討する。ベクトル量子化(VQ)の統合により,多クラス正常データの異なる概念を教師なしで識別する能力をフローモデルに与え,VQ-Flowと呼ばれる新しいフローベース統一手法を実現する。具体的には,VQ-Flowは階層的ベクトル量子化を用いて,概念識別のための概念プロトタイプコードブック (Conceptual Prototype Codebook, CPC) と,それに付随して概念固有の正常パターンを捉える概念固有パターンコードブック (Concept-Specific Pattern Codebook, CSPC) の2つの関連するコードブックを推定する。 VQ-Flowのフローモデルは、CSPCでキャプチャされた概念固有のパターンで条件付けされており、異なる概念に関連する特定の正常パターンをモデル化することができる。さらに、CPCは、概念認識分布モデリングのためのVQ-Flowを可能にし、概念プロトタイプ上で再パラメータ化された混合ガウス分布を通して、複雑な多クラス正常分布を忠実に模倣する。ベクトル量子化の導入により、提案したVQ-Flowは、統一的なトレーニングスキーム内での多クラス異常検出において最先端の手法を推し進め、MVTec ADにおいてDet./Loc. AUROC 99.5%/98.3%を達成する。コードベースはhttps://github.com/cool-xuan/vqflowで公開されている。

Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of vector quantization (VQ), we empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow. Specifically, our VQ-Flow leverages hierarchical vector quantization to estimate two relative codebooks: a Conceptual Prototype Codebook (CPC) for concept distinction and its concomitant Concept-Specific Pattern Codebook (CSPC) to capture concept-specific normal patterns. The flow models in VQ-Flow are conditioned on the concept-specific patterns captured in CSPC, capable of modeling specific normal patterns associated with different concepts. Moreover, CPC further enables our VQ-Flow for concept-aware distribution modeling, faithfully mimicking the intricate multi-class normal distribution through a mixed Gaussian distribution reparametrized on the conceptual prototypes. Through the introduction of vector quantization, the proposed VQ-Flow advances the state-of-the-art in multi-class anomaly detection within a unified training scheme, yielding the Det./Loc. AUROC of 99.5%/98.3% on MVTec AD. The codebase is publicly available at https://github.com/cool-xuan/vqflow.
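To make the idea of a two-level codebook lookup concrete, the sketch below assigns a feature vector first to its nearest concept prototype and then to the nearest pattern inside that concept's own codebook. It is only an illustration under assumed conventions (squared Euclidean distance, one pattern codebook per concept, hypothetical function names); the abstract does not specify how VQ-Flow actually estimates or queries the CPC and CSPC.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vector: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to `vector`."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )


def hierarchical_quantize(
    vector: Vector,
    prototypes: List[Vector],
    pattern_codebooks: List[List[Vector]],
) -> Tuple[int, int]:
    """Two-level lookup: concept prototype first, then a concept-specific pattern.

    `prototypes` plays the role of a concept-level codebook and
    `pattern_codebooks[c]` the pattern codebook attached to concept c
    (both are assumed structures for this sketch).
    """
    concept = nearest_code(vector, prototypes)
    pattern = nearest_code(vector, pattern_codebooks[concept])
    return concept, pattern
```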

翻訳日:2024-09-06 08:21:03 公開日:2024-09-02

# 大規模言語モデルを用いた合成音声会話生成のためのフレームワーク

A Framework for Synthetic Audio Conversations Generation using Large Language Models ( http://arxiv.org/abs/2409.00946v1 )

ライセンス: Link先を確認

Kaung Myat Kyaw, Jonathan Hoyin Chan,

(参考訳) 本稿では,複数のペルソナ設定を持つ大言語モデル(LLM)を用いて合成会話音声を生成するためのフレームワークであるConversaSynthを紹介する。このフレームワークはまず、さまざまなトピックにわたる多様で一貫性のあるテキストベースの対話を生成し、その後、TTS(text-to-speech)システムを使用して音声に変換する。実験の結果、ConversaSynthは高品質な合成音声データセットを効果的に生成し、音声タグ付け、音声分類、マルチスピーカ音声認識のためのモデルの訓練と評価を大幅に向上させることができることがわかった。その結果、ConversaSynthが生成した合成データセットには、かなりの多様性とリアリズムがあり、堅牢で適応可能なオーディオベースのAIシステムの開発に適していることが示唆された。

In this paper, we introduce ConversaSynth, a framework designed to generate synthetic conversation audio using large language models (LLMs) with multiple persona settings. The framework first creates diverse and coherent text-based dialogues across various topics, which are then converted into audio using text-to-speech (TTS) systems. Our experiments demonstrate that ConversaSynth effectively generates high-quality synthetic audio datasets, which can significantly enhance the training and evaluation of models for audio tagging, audio classification, and multi-speaker speech recognition. The results indicate that the synthetic datasets generated by ConversaSynth exhibit substantial diversity and realism, making them suitable for developing robust, adaptable audio-based AI systems.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# XNet v2: 制限が少なく、結果が良く、より普遍性が高い

XNet v2: Fewer Limitations, Better Results and Greater Universality ( http://arxiv.org/abs/2409.00947v1 )

ライセンス: Link先を確認

Yanfeng Zhou, Lingrui Li, Zichen Wang, Guole Liu, Ziwen Liu, Ge Yang,

(参考訳) XNetは、完全教師ありおよび半教師ありのバイオメディカルセグメンテーションのための、ウェーブレットベースのX字型統一アーキテクチャを導入している。しかし、これまでのところXNetは、画像に高周波(HF)情報が欠けている場合の性能低下、生画像の活用不足、融合の不十分さといった制限に直面している。これらの問題に対処するため、低周波・高周波補完モデルであるXNet v2を提案する。 XNet v2は、ウェーブレットベースの画像レベルの相補的融合を行い、融合結果と生画像を3つの異なるサブネットワークに入力して整合性損失を構築する。さらに,低周波(LF)情報とHF情報の転送を促進する特徴レベルの融合モジュールを導入する。 XNet v2は、半教師ありセグメンテーションにおいて最先端の性能を達成しつつ、完全教師あり学習でも競争力のある結果を維持する。さらに重要なのは、XNetが失敗するシナリオにおいて、XNet v2が優れていることだ。 XNetと比較して、XNet v2はより少ない制限、より良い結果、より大きな普遍性を示す。 3つの2Dデータセットと2つの3Dデータセットに関する大規模な実験は、XNet v2の有効性を示している。コードはhttps://github.com/Yanfeng-Zhou/XNetv2で入手できる。

XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. So far, however, XNet still faces limitations, including performance degradation when images lack high-frequency (HF) information, underutilization of raw images, and insufficient fusion. To address these issues, we propose XNet v2, a low- and high-frequency complementary model. XNet v2 performs wavelet-based image-level complementary fusion and feeds the fusion results, along with the raw images, into three different sub-networks to construct a consistency loss. Furthermore, we introduce a feature-level fusion module to enhance the transfer of low-frequency (LF) information and HF information. XNet v2 achieves state-of-the-art results in semi-supervised segmentation while maintaining competitive results in fully-supervised learning. More importantly, XNet v2 excels in scenarios where XNet fails. Compared to XNet, XNet v2 exhibits fewer limitations, better results and greater universality. Extensive experiments on three 2D and two 3D datasets demonstrate the effectiveness of XNet v2. Code is available at https://github.com/Yanfeng-Zhou/XNetv2.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# 汎用ロボット学習のための意味制御可能な拡張

Semantically Controllable Augmentations for Generalizable Robot Learning ( http://arxiv.org/abs/2409.00951v1 )

ライセンス: Link先を確認

Zoey Chen, Zhao Mandi, Homanga Bharadhwaj, Mohit Sharma, Shuran Song, Abhishek Gupta, Vikash Kumar,

(参考訳) ロボット操作の現実に見えないシナリオへの一般化には、トレーニング中にさまざまなデータセットを公開する必要がある。しかし、運用コストが高いため、大規模な実世界のデータセットの収集は困難である。これらの課題にもかかわらず、ロボット学習が一般化するには、ロボットの直接的な経験を超えて、データや事前のソースを活用することが不可欠である。本研究では,大量のWebスクラッドデータに対して事前学習された画像テキスト生成モデルが,そのようなデータソースとして機能することを示す。これらの生成モデルは、ロボットの直接体験を超えた幅広い現実のシナリオを含み、ロボットエージェントが現実世界の一般化を余分なコストで支援する追加の世界に露出する新しい合成体験を合成することができる。特に,本手法では,事前学習した生成モデルをデータ拡張の有効なツールとして活用する。本稿では,実世界の一般化を可能にする豊富なバリエーションを誘導しながら,意味制御可能な拡張とロボットデータセットの高速乗算のための生成的拡張フレームワークを提案する。ロボットデータの多種多様な拡張に基づいて、シミュレーションとキッチンやテーブルトップのような目に見えない現実環境の両方において、スケーラブルなロボット操作ポリシーがいかに訓練され、デプロイされるかを示す。実世界の多様なロボットアプリケーションにおける画像テキスト生成モデルの有効性を実証することにより、我々の生成拡張フレームワークは、人間の余分なコストでロボット学習の一般化を促進するためのスケーラブルで効率的な経路を提供する。

Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training. However, collecting large real-world datasets is intractable due to high operational costs. For robot learning to generalize despite these challenges, it is essential to leverage sources of data or priors beyond the robot's direct experience. In this work, we posit that image-text generative models, which are pre-trained on large corpora of web-scraped data, can serve as such a data source. These generative models encompass a broad range of real-world scenarios beyond a robot's direct experience and can synthesize novel synthetic experiences that expose robotic agents to additional world priors aiding real-world generalization at no extra cost. In particular, our approach leverages pre-trained generative models as an effective tool for data augmentation. We propose a generative augmentation framework for semantically controllable augmentations and rapidly multiplying robot datasets while inducing rich variations that enable real-world generalization. Based on diverse augmentations of robot data, we show how scalable robot manipulation policies can be trained and deployed both in simulation and in unseen real-world environments such as kitchens and table-tops. By demonstrating the effectiveness of image-text generative models in diverse real-world robotic applications, our generative augmentation framework provides a scalable and efficient path for boosting generalization in robot learning at no extra human cost.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# 多体断熱通路:不安定、カオス、量子古典対応

Many-body adiabatic passage: Instability, chaos, and quantum classical correspondence ( http://arxiv.org/abs/2409.00952v1 )

ライセンス: Link先を確認

Anant Vijay Varma, Amichay Vardi, Doron Cohen,

(参考訳) 相互作用するボソン系の断熱通路は、相互作用と粒子間の絡み合いによって大きく影響を受ける。我々は,低次元カオス(3サイト連鎖)および高次元カオス(3サイト以上)を示すBose-Hubbard鎖におけるSTIRAP様のスキームを考える。転送プロトコルによって生成されるダイナミクスは、平均場古典的処理、トランケート・ウィグナー半古典的処理、および全多体量子シミュレーションにおいて現れる古典的および量子的カオス指紋を示す。

Adiabatic passage in systems of interacting bosons is substantially affected by interactions and inter-particle entanglement. We consider STIRAP-like schemes in Bose-Hubbard chains that exhibit low-dimensional chaos (a 3 site chain), and high-dimensional chaos (more than 3 sites). The dynamics that is generated by a transfer protocol exhibits striking classical and quantum chaos fingerprints that are manifest in the mean-field classical treatment, in the truncated-Wigner semiclassical treatment, and in the full many-body quantum simulations.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# PNVC:実践的なINRベースのビデオ圧縮を目指して

PNVC: Towards Practical INR-based Video Compression ( http://arxiv.org/abs/2409.00953v1 )

ライセンス: Link先を確認

Ge Gao, Ho Man Kwan, Fan Zhang, David Bull,

(参考訳) ニューラルビデオ圧縮は、最近、レート品質のパフォーマンスの観点から、従来のビデオコーデックと競合する大きな可能性を示している。しかしながら、これらの学習ビデオコーデックは、デコード複雑性(オートエンコーダベースの方法)や/またはシステム遅延(暗黙のニューラル表現(INR)ベースのモデル)に関連する様々な問題と関連付けられており、現在、それらが実用的なアプリケーションにデプロイされることを防いでいる。本稿では,実用的なニューラルビデオコーデックをターゲットとして,自動エンコーダと過度に適合したソリューションを革新的に組み合わせた,新しいINRベースのコーディングフレームワークであるPNVCを提案する。我々のアプローチは、新しい構造的再パラメータ化に基づくアーキテクチャ、階層的品質制御、変調に基づくエントロピーモデリング、スケールアウェアな位置埋め込みなど、いくつかの設計革新の恩恵を受けている。低遅延(LD)とランダムアクセス(RA)の両方をサポートしているため、PNVCは既存のINRベースのコーデックよりも優れており、HEVC HM 18.0(LD)に対して35%以上のBDレートの保存を実現している。これは、INRベースのビデオコーディングにとって重要な一歩であり、実践的なデプロイメントに向かっている。ソースコードは公開評価のために利用できる。

Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# 物理インフォームドニューラルネットワークを用いたディジタル画像相関法

Physics-Informed Neural Network Based Digital Image Correlation Method ( http://arxiv.org/abs/2409.00956v1 )

ライセンス: Link先を確認

Boda Li, Shichao Zhou, Qinwei Ma, Shaopeng Ma,

(参考訳) ディジタル画像相関(DIC)は、フルフィールドの変形測定のための実験力学における鍵となる技術であり、従来は変位場を決定するためにサブセットマッチングに頼ってきた。しかし、不均一な変形シナリオでは、形状関数やサブセットサイズのような最適なパラメータを選択することは困難である。最近のディープラーニングベースのDICアプローチは、教師ありと教師なしの両方で、ニューラルネットワークを使用してスペックル画像を変形場にマッピングし、手動チューニングなしで正確な測定を提供する。しかし,これらの手法ではスペックル画像の特徴を抽出するために複雑なネットワークアーキテクチャを必要とし,それが解の精度を保証するわけではない。本稿では,物理インフォームドニューラルネットワーク(PINN)に基づく新しいDIC手法であるPINN-DICを提案する。従来のアプローチとは異なり、PINN-DICは、座標領域を入力として、変位場を出力する単純な全結合ニューラルネットワークを使用する。 DIC制御方程式を損失関数に統合することにより、PINN-DICは、反復最適化により参照および変形スペックル画像から直接変位場を抽出する。シミュレーションおよび実実験による評価は、PINN-DICが不均一な変位場における深層学習ベースのDICの精度を維持しつつ、3つの異なる利点を提供することを示している。 1) 座標から変位場を直接フィッティングすることによる、より単純なネットワークでの精度の向上。 2) 最小限のパラメータ調整による不規則境界変位場の効果的な取り扱い。 3) 包括的なDIC結果解析のための、他のニューラルネットワークベースの機械的解析手法との容易な統合。

Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. Recent deep learning-based DIC approaches, both supervised and unsupervised, use neural networks to map speckle images to deformation fields, offering precise measurements without manual tuning. However, these methods require complex network architectures to extract speckle image features, which does not guarantee solution accuracy. This paper introduces PINN-DIC, a novel DIC method based on Physics-Informed Neural Networks (PINNs). Unlike traditional approaches, PINN-DIC uses a simple fully connected neural network that takes the coordinate domain as input and outputs the displacement field. By integrating the DIC governing equation into the loss function, PINN-DIC directly extracts the displacement field from reference and deformed speckle images through iterative optimization. Evaluations on simulated and real experiments demonstrate that PINN-DIC maintains the accuracy of deep learning-based DIC in non-uniform fields while offering three distinct advantages: 1) enhanced precision with a simpler network by directly fitting the displacement field from coordinates, 2) effective handling of irregular boundary displacement fields with minimal parameter adjustments, and 3) easy integration with other neural network-based mechanical analysis methods for comprehensive DIC result analysis.
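The abstract does not write out the loss. One plausible form, assuming the "DIC governing equation" is the usual gray-level conservation between the reference image $f$ and the deformed image $g$, and writing $\mathbf{u}_\theta$ for the displacement field predicted by the network with parameters $\theta$ at $N$ sampled coordinates $\mathbf{x}_i$ (all symbols here are introduced for illustration only), would be:

```latex
% Assumed gray-level conservation: f(x) = g(x + u(x)).
% Mean squared photometric residual over N sampled coordinates.
\mathcal{L}(\theta)
  = \frac{1}{N} \sum_{i=1}^{N}
    \Bigl[ f(\mathbf{x}_i)
         - g\bigl(\mathbf{x}_i + \mathbf{u}_\theta(\mathbf{x}_i)\bigr) \Bigr]^{2}
```

Minimizing such a residual over $\theta$ by gradient descent would correspond to the iterative optimization mentioned above; the paper's actual loss may contain additional or different terms.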

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# 音声と音声の同時翻訳における最先端化には,何が必要か?

What does it take to get state of the art in simultaneous speech-to-speech translation? ( http://arxiv.org/abs/2409.00965v1 )

ライセンス: Link先を確認

Vincent Wilmet, Johnson Du,

(参考訳) 本稿では, 同時音声間(speech-to-speech)モデルの性能において観測される遅延特性の詳細な解析を行い, 特に幻覚による遅延スパイクに着目する。様々な入力パラメータや条件を体系的に実験することにより、レイテンシのスパイクを最小限に抑え、全体的な性能を改善する方法を提案する。この結果から,注意深い入力管理と戦略的パラメータ調整を組み合わせることで,音声間モデルの遅延挙動を著しく改善できることが示唆された。

This paper presents an in-depth analysis of the latency characteristics observed in simultaneous speech-to-speech model's performance, particularly focusing on hallucination-induced latency spikes. By systematically experimenting with various input parameters and conditions, we propose methods to minimize latency spikes and improve overall performance. The findings suggest that a combination of careful input management and strategic parameter adjustments can significantly enhance speech-to-speech model's latency behavior.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# 低次多項式による相関確率ブロックモデル検出のための計算遷移

A computational transition for detecting correlated stochastic block models by low-degree polynomials ( http://arxiv.org/abs/2409.00966v1 )

ライセンス: Link先を確認

Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li,

(参考訳) 一対のランダムグラフにおける相関性の検出は、近年広く研究されている基本的な統計的および計算上の問題である。この研究では、相関(スパース)確率ブロックモデル $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$を共通の親確率ブロックモデル $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric community, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, subsampling probability $s$とみなす。このモデルを同一辺密度$\mathcal{G}(n,\tfrac{\lambda s}{n})$の独立したErd\H{o}s-R\'enyiグラフと区別する検出問題に対して、隣接行列のエントリの \emph{low-degree polynomials} に基づくテストに焦点を合わせ、容易かつ難しい規則を分離するしきい値を決定する。より正確には、このテストのクラスがこれらの2つのモデルを区別できることは、$s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is the Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum thresholdである場合に限る。低次硬さの証明は、低次硬さ計算の条件変種に基づいている。

Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communities, average degree $\lambda=O(1)$, divergence parameter $\epsilon$, and subsampling probability $s$. For the detection problem of distinguishing this model from a pair of independent Erd\H{o}s-R\'enyi graphs with the same edge density $\mathcal{G}(n,\tfrac{\lambda s}{n})$, we focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices, and we determine the threshold that separates the easy and hard regimes. More precisely, we show that this class of tests can distinguish these two models if and only if $s> \min \{ \sqrt{\alpha}, \frac{1}{\lambda \epsilon^2} \}$, where $\alpha\approx 0.338$ is the Otter's constant and $\frac{1}{\lambda \epsilon^2}$ is the Kesten-Stigum threshold. Our proof of low-degree hardness is based on a conditional variant of the low-degree likelihood calculation.
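The threshold stated above is easy to evaluate numerically. The sketch below simply encodes the condition $s > \min\{\sqrt{\alpha}, 1/(\lambda\epsilon^2)\}$ with the approximate value $\alpha \approx 0.338$ quoted in the abstract; the function names are hypothetical, and the code makes no statement about how the low-degree tests themselves are constructed.

```python
import math

# Approximate value of Otter's constant as quoted in the abstract.
OTTER_ALPHA = 0.338


def low_degree_threshold(lam: float, eps: float, alpha: float = OTTER_ALPHA) -> float:
    """min(sqrt(alpha), 1 / (lam * eps^2)): the stated low-degree threshold on s."""
    return min(math.sqrt(alpha), 1.0 / (lam * eps ** 2))


def low_degree_detectable(
    s: float, lam: float, eps: float, alpha: float = OTTER_ALPHA
) -> bool:
    """True when the subsampling probability s exceeds the threshold."""
    return s > low_degree_threshold(lam, eps, alpha)
```

For small $\lambda\epsilon^2$ the Kesten-Stigum term $1/(\lambda\epsilon^2)$ exceeds $\sqrt{\alpha}$, so the Otter term is the binding one; for large $\lambda\epsilon^2$ the Kesten-Stigum term takes over.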

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# グラフニューラルネットワークを用いた深層強化学習による統合プロセス計画とスケジューリング問題の解法

Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning ( http://arxiv.org/abs/2409.00968v1 )

ライセンス: Link先を確認

Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge,

(参考訳) 統合プロセス計画とスケジューリング(IPPS)問題は、プロセスルート計画とショップスケジューリングを組み合わせることで、生産の効率化と資源利用の最大化を実現している。混合整数線形計画法(MILP)とヒューリスティックアルゴリズムを用いる従来の手法では、IPPSを解く際の解の質と速度のバランスが良くない。本稿では,新しいエンドツーエンドのDeep Reinforcement Learning(DRL)手法を提案する。我々は、IPPS問題をマルコフ決定プロセス(MDP)としてモデル化し、不均一グラフニューラルネットワーク(GNN)を用いて、操作、機械、ジョブ間の複雑な関係を捉える。スケジューリング戦略の最適化にはPPO(Proximal Policy Optimization)を用いる。実験の結果,提案手法は従来手法と比較して,大規模IPPSインスタンスのソリューション効率と品質を著しく向上させ,現代のインテリジェント製造システムにおいて優れたスケジューリング戦略を提供することが示された。

The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms cannot balance solution quality and speed well when solving IPPS. In this paper, we propose a novel end-to-end Deep Reinforcement Learning (DRL) method. We model the IPPS problem as a Markov Decision Process (MDP) and employ a Heterogeneous Graph Neural Network (GNN) to capture the complex relationships among operations, machines, and jobs. To optimize the scheduling strategy, we use Proximal Policy Optimization (PPO). Experimental results show that, compared to traditional methods, our approach significantly improves solution efficiency and quality in large-scale IPPS instances, providing superior scheduling strategies for modern intelligent manufacturing systems.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# 解釈可能な畳み込みSyncNet

Interpretable Convolutional SyncNet ( http://arxiv.org/abs/2409.00971v1 )

ライセンス: Link先を確認

Sungjoon Park, Jaesub Yun, Donggeon Lee, Minsik Park,

(参考訳) 実環境のビデオはさまざまな理由で同期がずれる可能性があるため、同期されたビデオを必要とするタスクでは、ビデオを再同期するために同期ネット(sync-net)が使用される。これまでのSOTA(State-of-the-art)同期ネットは、InfoNCE損失を使用するか、トランスフォーマーアーキテクチャに依存するか、あるいはその両方である。残念なことに、前者はモデルの出力を解釈しにくくし、後者は大きな画像の扱いに適しておらず、同期ネットの有用性を制限している。本研究では,バイナリクロスエントロピー(BCE)損失とInfoNCE損失に着想を得たバランスドBCE損失(BBCE)を用いて畳み込み同期ネットを訓練する。 InfoNCE損失とは対照的に、BBCE損失は複雑なサンプリングスキームを必要としない。我々のモデルはより大きな画像をうまく扱うことができ、その出力には確率論的解釈を与えることができる。確率論的解釈により、オフセット時の確率やオフスクリーン比などのメトリクスを定義し、音声視覚(AV)音声データセットの同期品質を評価することができる。さらに、我々のモデルは、LRS2データセットで96.5\%、LRS3データセットで93.8\%のSOTA精度を達成している。

Because videos in the wild can be out of sync for various reasons, a sync-net is used to bring the video back into sync for tasks that require synchronized videos. Previous state-of-the-art (SOTA) sync-nets use InfoNCE loss, rely on the transformer architecture, or both. Unfortunately, the former makes the model's output difficult to interpret, and the latter is unfriendly with large images, thus limiting the usefulness of sync-nets. In this work, we train a convolutional sync-net using the balanced BCE loss (BBCE), a loss inspired by the binary cross entropy (BCE) and the InfoNCE losses. In contrast to the InfoNCE loss, the BBCE loss does not require complicated sampling schemes. Our model can better handle larger images, and its output can be given a probabilistic interpretation. The probabilistic interpretation allows us to define metrics such as probability at offset and offscreen ratio to evaluate the sync quality of audio-visual (AV) speech datasets. Furthermore, our model achieves SOTA accuracy of $96.5\%$ on the LRS2 dataset and $93.8\%$ on the LRS3 dataset.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# IVGF:Fusion-Guided Infrared and Visible General Framework

IVGF: The Fusion-Guided Infrared and Visible General Framework ( http://arxiv.org/abs/2409.00973v1 )

ライセンス: Link先を確認

Fangcen Liu, Chenqiang Gao, Fang Chen, Pengcheng Li, Junjie Guo, Deyu Meng,

(参考訳) セマンティックセグメンテーションやオブジェクト検出といった赤外線および可視光のデュアルモダリティタスクは、相補的な情報を融合することにより、極端な場面でも堅牢な性能を達成することができる。現在のほとんどのメソッドはタスク固有のフレームワークを設計しており、複数のタスクにまたがる一般化に制限がある。本稿では、多くの高レベル視覚タスクに容易に拡張可能な、融合誘導型赤外線可視光一般フレームワークIVGFを提案する。まず、一般表現を抽出するために、SOTAの赤外線および可視光の基盤モデルを採用する。そして,高レベル視覚タスクに向けてこれらの汎用表現のセマンティクス情報を強化するために,特徴マップとトークンのための特徴拡張モジュールとトークン拡張モジュールをそれぞれ設計する。さらに,2つのモダリティの相補的な情報を探索して効果的に融合するための注意誘導融合モジュールを提案する。加えて,データ拡張としてcutout&mix拡張戦略を採用することで,モデルが2つのモダリティ間の領域的な相補性をマイニングする能力をさらに向上する。広範囲な実験により、IVGFはセマンティックセグメンテーションやオブジェクト検出タスクにおいて、最先端のデュアルモダリティ手法よりも優れていることが示された。詳細なアブレーション研究は各モジュールの有効性を実証し、別の実験では、デュアルモダリティのセマンティックセグメンテーションタスクにおける提案手法のモダリティ欠落への耐性について検討している。

Infrared and visible dual-modality tasks such as semantic segmentation and object detection can achieve robust performance even in extreme scenes by fusing complementary information. Most current methods design task-specific frameworks, which are limited in generalization across multiple tasks. In this paper, we propose a fusion-guided infrared and visible general framework, IVGF, which can be easily extended to many high-level vision tasks. Firstly, we adopt the SOTA infrared and visible foundation models to extract the general representations. Then, to enrich the semantic information of these general representations for high-level vision tasks, we design the feature enhancement module and token enhancement module for feature maps and tokens, respectively. Besides, the attention-guided fusion module is proposed to fuse the two modalities effectively by exploring their complementary information. Moreover, we adopt the cutout&mix augmentation strategy for data augmentation, which further improves the model's ability to mine the regional complementarity between the two modalities. Extensive experiments show that the IVGF outperforms state-of-the-art dual-modality methods in the semantic segmentation and object detection tasks. The detailed ablation studies demonstrate the effectiveness of each module, and another experiment explores the anti-missing modality ability of the proposed method in the dual-modality semantic segmentation task.

翻訳日:2024-09-06 08:08:59 公開日:2024-09-02

# フェデレーションラーニングにおけるプライバシの強化:現実世界の医療アプリケーションのためのセキュアなアグリゲーション

Enhancing Privacy in Federated Learning: Secure Aggregation for Real-World Healthcare Applications ( http://arxiv.org/abs/2409.00974v1 )

ライセンス: Link先を確認

Riccardo Taiello, Sergen Cansiz, Marc Vesin, Francesco Cremonesi, Lucia Innocenti, Melek Önen, Marco Lorenzi,

(参考訳) 現実のシナリオ、特にヘルスケアにフェデレートドラーニング(FL)をデプロイすることは、コミュニケーションとセキュリティに課題をもたらす。特に、フェデレーションアグリゲーション手順に関して、研究者は、クライアントが送信するモデルのパラメータに対するプライバシー保証を提供するセキュアアグリゲーション(SA)スキームの研究に注力してきた。しかしながら、現在利用可能なFLフレームワークにおけるSAの実用性は、計算と通信のボトルネックのために制限されている。このギャップを埋めるために、オープンソースのFed-BioMedフレームワークにおけるSAの実装について検討する。我々は、一連の医療データ分析問題に対する広範なベンチマークを提供することにより、2つのSAプロトコル、Joye-Libert (JL) と Low Overhead Masking (LOM) を実装し、比較する。 4つのデータセットの理論的および実験的評価により、SAプロトコルはタスク精度を維持しながら、効果的にプライバシを保護することが示されている。トレーニング中の計算オーバーヘッドは、CPU上で1%未満、大規模モデルのGPUで50%未満であり、保護フェーズは10秒未満である。 Fed-BioMedにSAを組み込んでも、非SAシナリオと比較したタスク精度への影響は2%以下にとどまる。全体として、本研究では、現実世界の医療アプリケーションにおけるSAの実現可能性を示し、センシティブなアプリケーションにおけるプライバシ保護技術の採用に向けたギャップを減らすことに寄与している。

Deploying federated learning (FL) in real-world scenarios, particularly in healthcare, poses challenges in communication and security. In particular, with respect to the federated aggregation procedure, researchers have been focusing on the study of secure aggregation (SA) schemes to provide privacy guarantees over the model's parameters transmitted by the clients. Nevertheless, the practical availability of SA in current FL frameworks is limited by computational and communication bottlenecks. To fill this gap, this study explores the implementation of SA within the open-source Fed-BioMed framework. We implement and compare two SA protocols, Joye-Libert (JL) and Low Overhead Masking (LOM), by providing extensive benchmarks in a panel of healthcare data analysis problems. Our theoretical and experimental evaluations on four datasets demonstrate that SA protocols effectively protect privacy while maintaining task accuracy. Computational overhead during training is less than 1% on a CPU and less than 50% on a GPU for large models, with protection phases taking less than 10 seconds. Incorporating SA into Fed-BioMed impacts task accuracy by no more than 2% compared to non-SA scenarios. Overall, this study demonstrates the feasibility of SA in real-world healthcare applications and contributes to reducing the gap towards the adoption of privacy-preserving technologies in sensitive applications.
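The general principle behind masking-based secure aggregation can be illustrated with a toy example: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum while individual updates stay hidden. The sketch below is a generic textbook-style illustration, not the JL or LOM protocol as implemented in Fed-BioMed. The modulus, the function names, and above all the use of a single seeded generator in place of pairwise key agreement are simplifying assumptions, and updates are assumed to be already quantized to non-negative integers.

```python
import random
from typing import List

MODULUS = 2 ** 32  # assumed size of the integer ring used for masking


def pairwise_masks(num_clients: int, length: int, seed: int) -> List[List[int]]:
    """Build one mask per client such that all masks sum to zero mod MODULUS.

    A real protocol would derive each pairwise mask from a secret shared by
    the two clients; a single seeded RNG is used here only for illustration.
    """
    rng = random.Random(seed)
    masks = [[0] * length for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.randrange(MODULUS) for _ in range(length)]
            for k in range(length):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS  # client i adds
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS  # client j subtracts
    return masks


def mask_update(update: List[int], mask: List[int]) -> List[int]:
    """Hide a quantized model update by adding the client's mask."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server-side sum; the pairwise masks cancel out in the total."""
    length = len(masked_updates[0])
    return [sum(upd[k] for upd in masked_updates) % MODULUS for k in range(length)]
```

The server only ever sees the masked vectors, yet the aggregate equals the plain sum of the updates as long as that sum stays below the modulus.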

Translated: 2024-09-06 08:08:59 Published: 2024-09-02

# Regret Analysis for Randomized Gaussian Process Upper Confidence Bound ( http://arxiv.org/abs/2409.00979v1 )

License: see the linked page

Shion Takeno, Yu Inatsu, Masayuki Karasuyama

Gaussian process upper confidence bound (GP-UCB) is a theoretically established algorithm for Bayesian optimization (BO), where we assume the objective function $f$ follows a GP. One notable drawback of GP-UCB is that the theoretical confidence parameter $\beta$, which increases with the iterations, is too large. To alleviate this drawback, this paper analyzes a randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB), which uses a confidence parameter generated from the shifted exponential distribution. We analyze the expected regret and the conditional expected regret, where the expectation and the probability are taken with respect to $f$ and the noise, and with respect to the randomness of the BO algorithm, respectively. In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite. Finally, we show numerical experiments using synthetic and benchmark functions and real-world emulators.
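
The randomized confidence parameter can be sketched in a few lines of Python. This is a minimal illustration over a finite domain, assuming the GP posterior means and standard deviations are already available; the function names and the `shift` and `rate` values are placeholders chosen for the example, not the constants analyzed in the paper.

```python
import math
import random


def sample_confidence(shift, rate, rng):
    """Draw a confidence parameter from a shifted exponential distribution."""
    # expovariate(rate) samples Exp(rate); adding the shift moves its support
    # to [shift, infinity).
    return shift + rng.expovariate(rate)


def select_next(means, stds, shift, rate, rng):
    """Pick the index maximizing mean + sqrt(beta) * std for a freshly sampled beta."""
    beta = sample_confidence(shift, rate, rng)
    scores = [m + math.sqrt(beta) * s for m, s in zip(means, stds)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, beta


rng = random.Random(0)
# Hypothetical posterior over three candidate inputs.
index, beta = select_next([0.1, 0.9, 0.5], [0.3, 0.05, 0.4], shift=1.0, rate=0.5, rng=rng)
```

Because `beta` is redrawn at every iteration rather than grown on a schedule, the exploration bonus does not have to increase with the iteration count.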

Translated: 2024-09-06 08:08:59 Published: 2024-09-02

# Thermodynamic and energetic constraints on out-of-equilibrium tunneling rates ( http://arxiv.org/abs/2409.00981v1 )

License: see the linked page

Ludovico Tesser, Matteo Acciai, Christian Spånslätt, Inès Safi, Janine Splettstoesser

We study bipartite quantum systems kept at different temperatures where a tunnel coupling between the two subsystems induces transitions. We find two independent constraints on the temperature-bias-dependent, out-of-equilibrium tunneling rates between the two subsystems, both of which turn out to be particularly restrictive when the coupled quantum systems are small. These bounds take the form of a thermodynamic constraint and an energetic constraint, as they are associated with the dissipated heat and with the absorbed energy required to establish and deplete the temperature bias, respectively. The derived constraints apply to a large class of experimentally accessible quantum systems: except for the restriction to the tunneling regime, they hold for arbitrary subsystem Hamiltonians, including interactions or non-linear energy spectra. These results hold for a large class of experimentally relevant systems, ranging from molecular junctions to coupled cavities, and can be tested by, for instance, measuring the out-of-equilibrium tunneling current and its noise.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# GCCRR: A Short Sequence Gait Cycle Segmentation Method Based on Ear-Worn IMU ( http://arxiv.org/abs/2409.00983v1 )

License: see the linked page

Zhenye Xu, Yao Guo

This paper addresses the critical task of gait cycle segmentation using short sequences from ear-worn IMUs, a practical and non-invasive approach for home-based monitoring and rehabilitation of patients with impaired motor function. While previous studies have focused on IMUs positioned on the lower limbs, ear-worn IMUs offer a unique advantage in capturing gait dynamics with minimal intrusion. To address the challenges of gait cycle segmentation using short sequences, we introduce the Gait Characteristic Curve Regression and Restoration (GCCRR) method, a novel two-stage approach designed for fine-grained gait phase segmentation. The first stage transforms the segmentation task into a regression task on the Gait Characteristic Curve (GCC), which is a one-dimensional feature sequence incorporating periodic information. The second stage restores the gait cycle using peak detection techniques. Our method employs Bi-LSTM-based deep learning algorithms for regression to ensure reliable segmentation for short gait sequences. Evaluation on the HamlynGait dataset demonstrates that GCCRR achieves over 80% Accuracy, with a Timestamp Error below one sampling interval. Despite these promising results, the performance lags behind methods that use more extensive sensor systems, highlighting the need for larger, more diverse datasets. Future work will focus on data augmentation using motion capture systems and improving algorithmic generalizability.
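
The second stage, restoring cycles from a periodic one-dimensional curve by peak detection, can be sketched as follows. This is only an illustration: the sine wave stands in for a regressed curve, and the assumption that each peak marks a cycle boundary, the `min_height` threshold, and the function names are hypothetical rather than taken from the paper.

```python
import math


def find_peaks(curve, min_height=0.0):
    """Return indices of local maxima whose value is at least min_height."""
    peaks = []
    for i in range(1, len(curve) - 1):
        rises = curve[i] > curve[i - 1]
        does_not_rise_after = curve[i] >= curve[i + 1]
        if curve[i] >= min_height and rises and does_not_rise_after:
            peaks.append(i)
    return peaks


def cycles_from_peaks(peaks):
    """Treat each pair of consecutive peaks as the start and end of one cycle."""
    return list(zip(peaks, peaks[1:]))


# Synthetic stand-in for a regressed curve: period of 20 samples, 100 samples long.
curve = [math.sin(2 * math.pi * t / 20) for t in range(100)]
peaks = find_peaks(curve, min_height=0.5)
cycles = cycles_from_peaks(peaks)
```

A height threshold is one simple way to suppress small spurious maxima in a noisy regressed curve; a minimum distance between peaks would be another.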

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces ( http://arxiv.org/abs/2409.00985v1 )

License: see the linked page

Jiapeng Yu, Yuqian Wu, Yajing Zhan, Wenhao Guo, Zhou Xu, Raymond Lee

Online question-and-answer (Q&A) systems based on large language models (LLMs) have progressively shifted from recreational to professional use. This paper proposes a multi-agent framework with environmental reinforcement learning (E-RL) for code correction, called the Code Learning (Co-Learning) community, which helps beginners correct code errors independently. It evaluates the performance of multiple LLMs on an original dataset of 702 erroneous codes and uses that performance as the reward or punishment criterion for E-RL. The current agent analyzes the input erroneous code and selects the appropriate LLM-based agent to achieve optimal error-correction accuracy and reduce correction time. Experimental results show a 3% improvement in Precision score and a 15% improvement in time cost compared with the method without E-RL. Our source code is available at: https://github.com/yuqian2003/Co_Learning

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language ( http://arxiv.org/abs/2409.00986v1 )

License: see the linked page

Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro

Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers because of their sensitivity to variations in visual information such as lip appearance. To address this challenge, speaker-adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality. The effectiveness of adapting to the language information of the target speaker, such as vocabulary choice, has not been explored in previous works. Moreover, existing datasets for speaker adaptation have limited vocabulary size and pose variations, which limits the validation of previous speaker-adaptive methods in real-world scenarios. To address these issues, we propose a novel speaker-adaptive lip reading method that adapts a pre-trained model to target speakers at both vision and language levels. Specifically, we integrate prompt tuning and the LoRA approach, applying them to a pre-trained lip reading model to effectively adapt the model to target speakers. In addition, to validate its effectiveness in real-world scenarios, we introduce a new dataset, VoxLRS-SA, derived from VoxCeleb2 and LRS3. It contains a vocabulary of approximately 100K words, offers diverse pose variations, and enables the validation of adaptation methods in wild, sentence-level lip reading for the first time. Through various experiments, we demonstrate that the existing speaker-adaptive method also improves performance in the wild at the sentence level. Moreover, we show that the proposed adaptation method achieves larger improvements on the target speaker than previous works.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Self-Supervised Multi-Scale Network for Blind Image Deblurring via Alternating Optimization ( http://arxiv.org/abs/2409.00988v1 )

License: see the linked page

Lening Guo, Jing Yu, Ning Zhang, Chuangbai Xiao

Blind image deblurring is a challenging low-level vision task that involves estimating the unblurred image when the blur kernel is unknown. In this paper, we present a self-supervised multi-scale blind image deblurring method that jointly estimates the latent image and the blur kernel via alternating optimization. In the image estimation step, we construct a multi-scale generator network with multiple inputs and multiple outputs to collaboratively estimate latent images at various scales, supervised by an image pyramid constructed from only the blurred image. This generator places architectural constraints on the network and avoids the need for a mathematical expression of image priors. In the blur kernel estimation step, the blur kernel at each scale is independently estimated with a direct solution to a quadratic regularized least-squares model, which adapts flexibly to the proposed multi-scale generator for image estimation. Thanks to the collaborative estimation across multiple scales, our method avoids the computationally intensive coarse-to-fine propagation and additional image deblurring processes used in traditional mathematical optimization-based methods. Quantitative and qualitative experimental results on synthetic and realistic datasets demonstrate the superior performance of our method, especially for handling large and real-world blurs.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# 3D Priors-Guided Diffusion for Blind Face Restoration ( http://arxiv.org/abs/2409.00991v1 )

License: see the linked page

Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren

Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success in this field. However, these methods encounter challenges in achieving a balance between realism and fidelity, particularly in complex degradation scenarios. To inherit the exceptional realistic generation ability of diffusion models while remaining constrained by identity-aware fidelity, we propose a novel diffusion-based framework that embeds 3D facial priors as structure and identity constraints into a denoising diffusion process. Specifically, to obtain more accurate 3D prior representations, the 3D facial image is reconstructed by a 3D Morphable Model (3DMM) using an initial restored face image that has been processed by a pretrained restoration network. A customized multi-level feature extraction method is employed to exploit both structural and identity information of 3D facial images, which are then mapped into the noise estimation process. To enhance the fusion of identity information into the noise estimation, we propose a Time-Aware Fusion Block (TAFB). This module offers a more efficient and adaptive fusion of weights for denoising, considering the dynamic nature of the denoising process in the diffusion model, which involves initial structure refinement followed by texture detail enhancement. Extensive experiments demonstrate that our network performs favorably against state-of-the-art algorithms on synthetic and real-world datasets for blind face restoration.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Physics-informed DeepONet with stiffness-based loss functions for structural response prediction ( http://arxiv.org/abs/2409.00994v1 )

License: see the linked page

Bilal Ahmed, Yuqing Qiu, Diab W. Abueidda, Waleed El-Sekelly, Borja Garcia de Soto, Tarek Abdoun, Mostafa E. Mobasher

Finite element modeling is a well-established tool for structural analysis, yet modeling complex structures often requires extensive pre-processing, significant analysis effort, and considerable time. This study addresses this challenge by introducing an innovative method for real-time prediction of structural static responses using DeepONet, which relies on a novel approach to physics-informed networks driven by structural balance laws. This approach offers the flexibility to accurately predict responses under various load classes and magnitudes. The trained DeepONet can generate solutions for the entire domain within a fraction of a second. This capability effectively eliminates the need for the extensive remodeling and analysis typically required for each new case in FE modeling. We apply the proposed method to two structures: a simple 2D beam structure and a comprehensive 3D model of a real bridge. To predict multiple variables with DeepONet, we utilize two strategies: a split branch/trunk and multiple DeepONets combined into a single DeepONet. In addition to data-driven training, we introduce a novel physics-informed training approach. This method leverages structural stiffness matrices to enforce fundamental equilibrium and energy conservation principles, resulting in two novel physics-informed loss functions: energy conservation and static equilibrium using the Schur complement. We use various combinations of loss functions to achieve an error rate of less than 5% with significantly reduced training time. This study shows that DeepONet, enhanced with hybrid loss functions, can accurately and efficiently predict displacements and rotations at each mesh point, with reduced training time.
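
As a rough illustration of what stiffness-based loss terms can look like, the Python sketch below penalizes the static equilibrium residual $Ku - f$ and the mismatch between strain energy and external work for a linear system. It is a simplified sketch under stated assumptions, not the paper's formulation: the Schur complement reduction mentioned above is omitted, and the function names and the 2-by-2 example system are hypothetical.

```python
def matvec(K, u):
    """Multiply a dense stiffness matrix (list of rows) by a displacement vector."""
    return [sum(k_ij * u_j for k_ij, u_j in zip(row, u)) for row in K]


def equilibrium_loss(K, u, f):
    """Mean squared residual of the static equilibrium equation K u = f."""
    residual = [ku_i - f_i for ku_i, f_i in zip(matvec(K, u), f)]
    return sum(r * r for r in residual) / len(residual)


def energy_loss(K, u, f):
    """Squared gap between strain energy (1/2 u^T K u) and external work (1/2 f^T u)."""
    strain_energy = 0.5 * sum(u_i * ku_i for u_i, ku_i in zip(u, matvec(K, u)))
    external_work = 0.5 * sum(f_i * u_i for f_i, u_i in zip(f, u))
    return (strain_energy - external_work) ** 2


# Hypothetical two-degree-of-freedom system.
K = [[2.0, -1.0], [-1.0, 2.0]]
f = [1.0, 0.0]
u_exact = [2.0 / 3.0, 1.0 / 3.0]   # satisfies K u = f
u_guess = [1.0, 1.0]               # stands in for an imperfect network prediction
```

In a training loop, `u` would be the network output and these terms would be added, with weights, to the data-driven loss.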

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning ( http://arxiv.org/abs/2409.00997v1 )

License: see the linked page

Keer Lu, Zheng Liang, Xiaonan Nie, Da Pan, Shusen Zhang, Keshi Zhao, Weipeng Chen, Zenan Zhou, Guosheng Dong, Wentao Zhang, Bin Cui

The effectiveness of long-context modeling is important for Large Language Models (LLMs) in various applications. Despite their potential, LLMs' efficacy in processing long context does not consistently meet expectations, posing significant challenges for efficient management of prolonged sequences in training. This difficulty is compounded by the scarcity of comprehensive and diverse training datasets suitable for long sequences, which stems from inherent length biases across different data sources, and by the logistical complexities of massive data management for training in extended contexts. In this work, we introduce DataSculpt, a data construction framework designed to strategically augment the data architecture for extended-context training. Our thorough evaluations demonstrate DataSculpt's remarkable capacity to boost long-context training performance, achieving improvements including an 18.09% increase in retrieval augmentation, 21.23% in summarization, 21.27% in reading comprehension, and a 3.81% rise in code completion, all while preserving the models' overall proficiency with a 4.88% improvement.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Harnessing Quantum Extreme Learning Machines for image classification ( http://arxiv.org/abs/2409.00998v1 )

License: see the linked page

A. De Lorenzis, M. P. Casado, M. P. Estarellas, N. Lo Gullo, T. Lux, F. Plastina, A. Riera, J. Settino

Interest in quantum machine learning is growing because of the possibility of developing efficient solutions to problems that are difficult to tackle with classical methods. In this context, the research work presented here focuses on the use of quantum machine learning techniques for image classification tasks. We exploit a quantum extreme learning machine by taking advantage of the rich feature map provided by the quantum reservoir substrate. We systematically analyse different phases of the quantum extreme learning machine process, from the dataset preparation to the final image classification. In particular, we investigate the impact of encoding through Principal Component Analysis and the use of Auto-Encoders, as well as the dynamics of the model through the use of different Hamiltonians for the quantum reservoir. Our results show that the introduction of a quantum reservoir systematically improves the accuracy of the classifier. Additionally, while different encodings can lead to significantly different performances, Hamiltonians with varying degrees of connectivity exhibit the same discrimination rate, provided they are interacting.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Noisy Probabilistic Error Cancellation and Generalized Physical Implementability ( http://arxiv.org/abs/2409.01000v1 )

License: see the linked page

Tian-Ren Jin, Kai Xu, Yu-Ran Zhang, Heng Fan

Decoherence noise significantly affects the performance of practical quantum processors. Probabilistic error cancellation, a quantum error mitigation method, quasiprobabilistically simulates the inverse operations of the noise, which are not physical channels, to cancel the noise. Physical implementability is the minimal cost of simulating a non-physical quantum operation with physical channels through a quasiprobabilistic decomposition. In practice, however, this cancellation may itself be affected by noise, and the implementable channels are not all of the physical channels, so physical implementability is not sufficient to fully describe the practical situation of the probabilistic error cancellation method. Therefore, we generalize physical implementability to an arbitrary convex set of free quantum resources and discuss several of its properties. We also demonstrate how to optimally cancel an error channel with the noisy Pauli basis. We expect that the properties and structure of this generalization will be investigated comprehensively and that it will find more applications in the field of quantum information processing.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Beyond ChatGPT: Enhancing Software Quality Assurance Tasks with Diverse LLMs and Validation Techniques ( http://arxiv.org/abs/2409.01001v1 )

License: see the linked page

Ratnadira Widyasari, David Lo, Lizi Liao

With the advancement of Large Language Models (LLMs), their application in Software Quality Assurance (SQA) has increased. However, the current focus of these applications is predominantly on ChatGPT. There remains a gap in understanding the performance of various LLMs in this critical domain. This paper aims to address this gap by conducting a comprehensive investigation into the capabilities of several LLMs across two SQA tasks: fault localization and vulnerability detection. We conducted comparative studies using GPT-3.5, GPT-4o, and four other publicly available LLMs (LLaMA-3-70B, LLaMA-3-8B, Gemma-7B, and Mixtral-8x7B) to evaluate their effectiveness in these tasks. Our findings reveal that several LLMs can outperform GPT-3.5 in both tasks. Additionally, even the lower-performing LLMs provided unique correct predictions, suggesting the potential of combining different LLMs' results to enhance overall performance. By implementing a voting mechanism to combine the LLMs' results, we achieved more than a 10% improvement over GPT-3.5 in both tasks. Furthermore, we introduced a cross-validation approach that refines an LLM's answer by validating it against another LLM's answer using a validation prompt. This approach led to performance improvements of 16% in fault localization and 12% in vulnerability detection compared to GPT-3.5, with a 4% improvement compared to the best-performing LLMs. Our analysis also indicates that the inclusion of explanations in the LLMs' results affects the effectiveness of the cross-validation technique.
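
A voting mechanism over several models' answers can be as simple as the plurality vote sketched below in Python. The abstract does not specify the voting rule, so this is only an assumed, minimal variant: the function name, the tie-breaking rule (first answer seen wins), and the example labels are all hypothetical.

```python
def majority_vote(predictions):
    """Return the most frequent prediction; ties go to the earliest one in the list."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = {}
    for prediction in predictions:
        counts[prediction] = counts.get(prediction, 0) + 1
    best_count = max(counts.values())
    # Scan in the original order so that ties are resolved deterministically.
    for prediction in predictions:
        if counts[prediction] == best_count:
            return prediction


# Hypothetical per-model answers for one vulnerability-detection sample.
answers = ["vulnerable", "not vulnerable", "vulnerable", "vulnerable", "not vulnerable"]
decision = majority_vote(answers)
```

Ordering the input by each model's standalone accuracy would make the tie-break favor the strongest model, which is one plausible design choice among several.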

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos ( http://arxiv.org/abs/2409.01003v1 )

License: see the linked page

Qian Li, Shuojue Yang, Daiyun Shen, Yueming Jin

Reconstructing endoscopic videos is crucial for high-fidelity visualization and the efficiency of surgical operations. Despite its importance, existing 3D reconstruction methods encounter several challenges, including stringent demands for accuracy, imprecise camera positioning, intricate dynamic scenes, and the necessity for rapid reconstruction. Addressing these issues, this paper presents the first camera-pose-free scene reconstruction framework, Free-DyGS, tailored for dynamic surgical videos and leveraging 3D Gaussian splatting technology. Our approach employs a frame-by-frame reconstruction strategy and is divided into four distinct phases: Scene Initialization, Joint Learning, Scene Expansion, and Retrospective Learning. We introduce a Generalizable Gaussians Parameterization module within the Scene Initialization and Scene Expansion phases to efficiently generate Gaussian attributes for each pixel from the RGBD frames. The Joint Learning phase is designed to estimate scene deformation and camera pose simultaneously, facilitated by an innovative flexible deformation module. In the Scene Expansion phase, the set of Gaussian points gradually grows as the camera moves. The Retrospective Learning phase is dedicated to enhancing the precision of scene deformation through the reassessment of prior frames. The efficacy of the proposed Free-DyGS is substantiated through experiments on two datasets: the StereoMIS and Hamlyn datasets. The experimental outcomes show that Free-DyGS surpasses conventional baseline models in both rendering fidelity and computational efficiency.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence ( http://arxiv.org/abs/2409.01007v1 )

License: see the linked page

Edward Y. Chang

This booklet, "Unlocking the Wisdom of Large Language Models," serves as an introduction to the comprehensive work "The Path to Artificial General Intelligence." Through a series of nine aphorisms, we distill key insights and principles that underpin the larger exploration of AI's future through adversarial LLM dialogue. We propose this approach as a potential path to realizing artificial general intelligence (AGI). This booklet also includes the titles, abstracts, and introductions of the chapters in the main book, and presents the first two chapters in their entirety.

Translated: 2024-09-06 07:59:10 Published: 2024-09-02

# Fitting trees to $\ell_1$-hyperbolic distances ( http://arxiv.org/abs/2409.01010v1 )

License: see the linked page

Joon-Hyeok Yim, Anna C. Gilbert

Building trees to represent or to fit distances is a critical component of phylogenetic analysis, metric embeddings, approximation algorithms, geometric graph neural nets, and the analysis of hierarchical data. Much of the previous algorithmic work, however, has focused on generic metric spaces (i.e., those with no a priori constraints). Leveraging several ideas from the mathematical analysis of hyperbolic geometry and geometric group theory, we study the tree fitting problem as finding the relation between the hyperbolicity (ultrametricity) vector and the error of tree (ultrametric) embedding. That is, we define a vector of hyperbolicity (ultrametricity) values over all triples of points and compare the $\ell_p$ norms of this vector with the $\ell_q$ norm of the distortion of the best tree fit to the distances. This formulation allows us to define the average hyperbolicity (ultrametricity) in terms of a normalized $\ell_1$ norm of the hyperbolicity vector. Furthermore, we can interpret the classical tree fitting result of Gromov as a $p = q = \infty$ result. We present an algorithm, HCCRootedTreeFit, such that the $\ell_1$ error of the output embedding is analytically bounded in terms of the $\ell_1$ norm of the hyperbolicity vector (i.e., $p = q = 1$), and we show that this result is tight. Furthermore, this algorithm has significantly different theoretical and empirical performance compared to Gromov's result and related algorithms. Finally, we show, using HCCRootedTreeFit and related tree fitting algorithms, that supposedly standard data sets for hierarchical data analysis and geometric graph neural networks have radically different tree fits from those of synthetic, truly tree-like data sets, suggesting that a much more refined analysis of these standard data sets is called for.
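
The notion of a hyperbolicity vector over triples of points can be illustrated with Gromov products. The Python sketch below fixes a base point $w$, computes $\min\{(x|z)_w, (y|z)_w\} - (x|y)_w$ for every ordered triple, and averages the positive parts as a normalized $\ell_1$ quantity. The fixed base point, the clipping at zero, and the function names are assumptions made for this sketch; the paper's exact definitions may differ.

```python
from itertools import permutations


def gromov_product(d, x, y, w):
    """Gromov product (x|y)_w for a distance matrix d."""
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])


def hyperbolicity_values(d, base=0):
    """Four-point gaps over all ordered triples of non-base points."""
    points = [p for p in range(len(d)) if p != base]
    values = []
    for x, y, z in permutations(points, 3):
        smaller = min(gromov_product(d, x, z, base), gromov_product(d, y, z, base))
        values.append(smaller - gromov_product(d, x, y, base))
    return values


def average_hyperbolicity(d, base=0):
    """Normalized l1 norm of the positive part of the hyperbolicity values."""
    values = hyperbolicity_values(d, base)
    if not values:  # fewer than four points: nothing to measure
        return 0.0
    return sum(max(v, 0.0) for v in values) / len(values)


# A path on four vertices is a tree metric; a 4-cycle is not.
path = [[abs(i - j) for j in range(4)] for i in range(4)]
cycle = [[min(abs(i - j), 4 - abs(i - j)) for j in range(4)] for i in range(4)]
```

For a tree metric every gap is at most zero, so the average is zero for `path`, while the 4-cycle yields a strictly positive value.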

翻訳日:2024-09-06 07:59:10 公開日:2024-09-02

# チューバンボスリップスクリプトのためのマルチモーダルマルチグラニュリティトケナイザ

Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts ( http://arxiv.org/abs/2409.01011v1 )

ライセンス: Link先を確認

Yingfa Chen, Chenlong Hu, Cong Feng, Chenyang Song, Shi Yu, Xu Han, Zhiyuan Liu, Maosong Sun,

(参考訳) 本研究では,古代中国における春・秋・戦国期(紀元前771-256年)に用いられた中竹スリップ(CBS)スクリプトに着目し,古代中国文字の分析に特化して設計された多モード多粒性トークンについて述べる。一つの文字が複数のサブ文字の組み合わせである古代中国語の複雑な階層構造を考えると、トークンライザはまず文字検出を採用して文字境界の特定を行い、文字レベルとサブ文字レベルの両方で文字認識を行う。さらに,学術コミュニティを支援するために,100K以上の注釈付き文字画像スキャンを備えたCBSの大規模データセットも収集した。我々のデータセット上に構築された音声タグ付けタスクでは、私たちのトークンライザを使うことで、主流のサブワードトークンライザと比較してF1スコアが5.5%向上します。我々の研究は、特定の文字のさらなる調査に役立つだけでなく、他の形態の漢文についての研究を進める可能性も持っている。

This study presents a multi-modal multi-granularity tokenizer specifically designed for analyzing ancient Chinese scripts, focusing on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China. Considering the complex hierarchical structure of ancient Chinese scripts, where a single character may be a combination of multiple sub-characters, our tokenizer first adopts character detection to locate character boundaries, and then conducts character recognition at both the character and sub-character levels. Moreover, to support the academic community, we have also assembled the first large-scale dataset of CBSs with over 100K annotated character image scans. On the part-of-speech tagging task built on our dataset, using our tokenizer gives a 5.5% relative improvement in F1-score compared to mainstream sub-word tokenizers. Our work not only aids in further investigations of the specific script but also has the potential to advance research on other forms of ancient Chinese scripts.

翻訳日:2024-09-06 07:59:10 公開日:2024-09-02

# リコメンデーションのための多様性向上型コラボレーションメトリックラーニング

Improved Diversity-Promoting Collaborative Metric Learning for Recommendation ( http://arxiv.org/abs/2409.01012v1 )

ライセンス: Link先を確認

Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang,

(参考訳) コラボレーティブ・メトリック・ラーニング(CML)は、最近、レコメンデーション・システム(RS)において一般的な方法として現れ、メトリック・ラーニングとコラボレーティブ・フィルタリングのギャップを埋めている。 RSの慣例に従い、既存のプラクティスはモデル設計においてユニークなユーザー表現を利用する。本稿では,ユーザが複数のカテゴリの関心を持つ,困難なシナリオに焦点を当てる。この設定の下では、ユニークなユーザ表現は、特にアイテムカテゴリの分布が不均衡な場合に、優先バイアスを引き起こす可能性がある。この問題に対処するため,本稿では,ユーザの少数派の関心を概ね無視する目的で,‘textit{Diversity-Promoting Collaborative Metric Learning}’ (DPCML) と呼ばれる新しい手法を提案する。 DPCMLの背景にある重要な考え方は、ユーザがアイテムに対する好みを集約するシステムにおいて、埋め込みセットの中で最小のアイテム-ユーザ距離を取ることで、各ユーザに対して複数の表現セットを導入することである。具体的には、2つの効果的な割り当て戦略をインスタンス化し、各ユーザに対して適切な量のベクトルを探索する。一方、マルチベクタ表現戦略をより良くするために、textit{Diversity Control Regularization Scheme} (DCRS) が開発されている。理論的には、DPCMLは従来のCMLよりも小さな一般化誤差を誘導できることを示す。さらに, CMLに基づくアプローチでは, 対の目的によって引き起こされる計算負担を軽減するために, 通常は textit{ negative sample} を必要とする。本稿では,One-Way partial AUC(OPAUC)の観点から広く採用されているハード・アウェア・サンプリングの基本的な限界を明らかにし,CMLのパラダイムに対する効果的なサンプリング代替案を開発する。最後に、さまざまなベンチマークデータセットに関する包括的な実験は、DPCMLの有効性を物語っている。コードは \url{https://github.com/statusrank/LibCML} で入手できる。

Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit a unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), with the hope of considering the commonly ignored minority interest of the user. The key idea behind DPCML is to introduce a set of multiple representations for each user in the system, where a user's preference toward an item is aggregated by taking the minimum item-user distance among the user's embedding set. Specifically, we instantiate two effective assignment strategies to explore a proper quantity of vectors for each user. Meanwhile, a Diversity Control Regularization Scheme (DCRS) is developed to better accommodate the multi-vector representation strategy. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we notice that CML-based approaches usually require negative sampling to reduce the heavy computational burden caused by the pairwise objective therein. In this paper, we reveal the fundamental limitation of the widely adopted hard-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML. Code is available at https://github.com/statusrank/LibCML.
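The minimum-distance aggregation over a user's embedding set can be sketched as follows. This is only an illustration of the scoring rule stated in the abstract, not the LibCML code; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def multi_vector_distance(user_vectors, item_vector):
    """Distance from a user to an item: the minimum over the user's embedding set."""
    return min(math.dist(u, item_vector) for u in user_vectors)


def single_vector_distance(user_vector, item_vector):
    """Conventional CML scoring with one vector per user."""
    return math.dist(user_vector, item_vector)


# A user with two separated interests, and an item close to the second one.
user_set = [(0.0, 0.0), (10.0, 10.0)]
user_single = (5.0, 5.0)  # one vector forced to sit between both interests
item = (9.0, 10.0)
```

With two vectors the item sits at distance 1 from the user, whereas a single averaged vector places it at distance √41 ≈ 6.4. That is the kind of preference bias the multi-vector design is meant to reduce.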

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# SeCo-INR: 医用画像超解像のための意味的条件付きインシシトニューラル表現

SeCo-INR: Semantically Conditioned Implicit Neural Representations for Improved Medical Image Super-Resolution ( http://arxiv.org/abs/2409.01013v1 )

ライセンス: Link先を確認

Mevan Ekanayake, Zhifeng Chen, Gary Egan, Mehrtash Harandi, Zhaolin Chen,

(参考訳) Inlicit Neural Representations (INR)は、大規模なトレーニングデータセットを必要とせずに、信号の連続的な表現を学習する能力のために、最近ディープラーニングの分野を進歩させた。医用画像の超高分解能化のためにINR法が研究されているが, 医用画像における局所化先行への適応性は広く研究されていない。医用画像には、INRの精度と堅牢性を高めるために貴重な局所的な事前情報を提供する、豊富な解剖学的分類が含まれている。本研究では,医療画像から局所的な先行値を用いてINRを条件付けし,高精度なモデルフィッティングと補間機能を実現する,Semantically Conditioned INR (SeCo-INR) と呼ばれる新しいフレームワークを提案する。本フレームワークは、医用画像のセマンティックセグメンテーション特徴の連続表現を学習し、それを用いて画像の各セマンティック領域に対して最適なINRを導出する。我々は,いくつかの医用画像モダリティを用いてフレームワークを試験し,最先端の手法と比較して高い定量スコアとよりリアルな超解像出力を得た。

Implicit Neural Representations (INRs) have recently advanced the field of deep learning due to their ability to learn continuous representations of signals without the need for large training datasets. Although INR methods have been studied for medical image super-resolution, their adaptability to localized priors in medical images has not been extensively explored. Medical images contain rich anatomical divisions that could provide valuable local prior information to enhance the accuracy and robustness of INRs. In this work, we propose a novel framework, referred to as the Semantically Conditioned INR (SeCo-INR), that conditions an INR using local priors from a medical image, enabling accurate model fitting and interpolation capabilities to achieve super-resolution. Our framework learns a continuous representation of the semantic segmentation features of a medical image and utilizes it to derive the optimal INR for each semantic region of the image. We tested our framework using several medical imaging modalities and achieved higher quantitative scores and more realistic super-resolution outputs compared to state-of-the-art methods.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# 鳥の視点からストリートビューへ:潜伏拡散モデルを用いた多次元および条件付き画像の作成

From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model ( http://arxiv.org/abs/2409.01014v1 )

ライセンス: Link先を確認

Xiaojie Xu, Tianshuo Xu, Fulong Ma, Yingcong Chen,

(参考訳) 本研究では,Blord's-Eye View(BEV)生成を探索し,BEVマップを対応する多視点ストリートイメージに変換する。 BEVは、マルチセンサー融合を支援する統一空間表現で価値があり、様々な自律運転アプリケーションにおいて重要な役割を担っている。 BEVマップから正確なストリートビュー画像を作成することは、複雑な交通シナリオを描写し、運転アルゴリズムを強化するために不可欠である。同時に、拡散に基づく条件付き画像生成モデルは、多種多様で高品質で条件に整合した結果が得られ、顕著な結果を示した。それでも、これらのモデルのトレーニングには、かなりのデータと計算資源が必要である。したがって、特定の条件生成タスクのための安定拡散のような先進的なモデルを微調整する方法が、有望な道として現れる。本稿では,BEVレイアウトから画像を生成するための実用的なフレームワークを提案する。提案手法は,ニューラルビュー変換とストリート画像生成の2つの主要コンポーネントから構成される。ニューラルビュー変換フェーズは、BEVとパースペクティブビューの形状対応を学習することにより、BEVマップをアライメントされたマルチビューセマンティックセマンティックセマンティクスマップに変換する。その後、Street Image Generation フェーズでは、これらのセグメンテーションを、微調整された潜在拡散モデルを導く条件として利用する。この微調整プロセスにより、ビューとスタイルの一貫性が保証される。本モデルでは,交通状況下での大規模な事前学習拡散モデルの生成能力を活用し,多種多様かつ条件に整合したストリートビュー画像を生成する。

We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving applications. Creating accurate street-view images from BEV maps is essential for portraying complex traffic scenarios and enhancing driving algorithms. Concurrently, diffusion-based conditional image generation models have demonstrated remarkable outcomes, adept at producing diverse, high-quality, and condition-aligned results. Nonetheless, the training of these models demands substantial data and computational resources. Hence, exploring methods to fine-tune these advanced models, like Stable Diffusion, for specific conditional generation tasks emerges as a promising avenue. In this paper, we introduce a practical framework for generating images from a BEV layout. Our approach comprises two main components: the Neural View Transformation and the Street Image Generation. The Neural View Transformation phase converts the BEV map into aligned multi-view semantic segmentation maps by learning the shape correspondence between the BEV and perspective views. Subsequently, the Street Image Generation phase utilizes these segmentations as a condition to guide a fine-tuned latent diffusion model. This finetuning process ensures both view and style consistency. Our model leverages the generative capacity of large pretrained diffusion models within traffic contexts, effectively yielding diverse and condition-coherent street view images.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# Fed-MUnet:脳腫瘍分離のための多モードフェデレーションUnet

Fed-MUnet: Multi-modal Federated Unet for Brain Tumor Segmentation ( http://arxiv.org/abs/2409.01020v1 )

ライセンス: Link先を確認

Ruojun Zhou, Lisha Qu, Lei Zhang, Ziming Li, Hongwei Yu, Bing Luo,

(参考訳) 深層学習に基づく手法は、単モード磁気共鳴イメージング(MRI)画像とマルチモード磁気共鳴イメージング(MRI)画像の両方を用いて脳腫瘍のセグメンテーションに広く用いられている。最近の研究の多くは、診療所間のデータ共有という本質的な課題のために、集中的なトレーニングに重点を置いている。プライバシーの懸念を軽減するために、研究者は脳腫瘍のセグメンテーションタスクにフェデレートラーニング(FL)メソッドを導入した。しかし、現在では単一のモーダルMRIに焦点が当てられており、マルチモーダルMRIについては限定的な研究がなされている。この課題には、複雑な構造、大規模パラメータ、マルチモーダルMRIを用いたFLベースの手法の過剰適合問題などが含まれる。以上の課題に対処するため,我々は,FLトレーニングに適した脳腫瘍セグメンテーション(Fed-MUnet)のための新しいマルチモーダルFLフレームワークを提案する。我々は、公開されているBraTS2022データセットを用いて、我々のアプローチを評価した。実験により,本フレームワークは分散学習とプライバシ保護のFL特性を実現することを示す。増強腫瘍, 腫瘍コア, 腫瘍全体について, 5つの主要指標の平均値はそれぞれ87.5%, 90.6%, 92.2%であり, SOTA法よりも高い値を示した。パラメータ数、浮動小数点演算量(FLOP)、推論の観点では、Fed-MUnetは最先端のセグメンテーションバックボーンと比較してパレート最適であり、高いパフォーマンスを実現し、プライバシー問題に取り組む。私たちのコードは https://github.com/Arnold-Jun/Fed-MUnet でオープンソース化されています。

Deep learning-based techniques have been widely utilized for brain tumor segmentation using both single-modal and multi-modal Magnetic Resonance Imaging (MRI) images. Most current studies focus on centralized training due to the intrinsic challenge of data sharing across clinics. To mitigate privacy concerns, researchers have introduced Federated Learning (FL) methods to brain tumor segmentation tasks. However, such methods currently focus on single-modal MRI, with limited study of multi-modal MRI. The challenges include the complex structure, large number of parameters, and overfitting issues of FL-based methods using multi-modal MRI. To address the above challenges, we propose a novel multi-modal FL framework for brain tumor segmentation (Fed-MUnet) that is suitable for FL training. We evaluate our approach with the BraTS2022 datasets, which are publicly available. The experimental results demonstrate that our framework retains the distributed-learning and privacy-preserving properties of FL. For the enhancing tumor, tumor core and whole tumor, the means of five major metrics were 87.5%, 90.6% and 92.2%, respectively, which were higher than those of SOTA methods while preserving privacy. In terms of parameter count, number of floating-point operations (FLOPs) and inference, Fed-MUnet is Pareto optimal compared with the state-of-the-art segmentation backbones while achieving higher performance and addressing the privacy issue. Our code is open-sourced at https://github.com/Arnold-Jun/Fed-MUnet.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# SINET:水中画像強調のための空間駆動型解釈型ニューラルネットワーク

SINET: Sparsity-driven Interpretable Neural Network for Underwater Image Enhancement ( http://arxiv.org/abs/2409.01022v1 )

ライセンス: Link先を確認

Gargi Panda, Soumitra Kundu, Saumik Bhattacharya, Aurobinda Routray,

(参考訳) 水中画像の品質向上は海洋研究と技術の発展に不可欠である。この研究は、水中画像強調(UIE)タスクのための空間駆動型解釈型ニューラルネットワーク(SINET)を導入する。純粋な深層学習法とは異なり、我々のネットワークアーキテクチャは、新しいチャネル固有の畳み込みスパース符号化(CCSC)モデルに基づいており、基礎となる画像強調プロセスの良好な解釈性を保証する。 SINETの鍵となる特徴は、3つのスパース特徴推定ブロック(SFEB)を用いて3つの色チャネルから有意な特徴を推定することである。 SFEBのアーキテクチャは、$\ell_1$ regulaized convolutional sparse coding (CSC) 問題を解決するための反復アルゴリズムをアンロールすることによって設計されている。我々の実験によると、SINETは最先端のPSNRの値を$1.05$dB、計算複雑性を$3873$で上回っている。

Improving the quality of underwater images is essential for advancing marine research and technology. This work introduces a sparsity-driven interpretable neural network (SINET) for the underwater image enhancement (UIE) task. Unlike pure deep learning methods, our network architecture is based on a novel channel-specific convolutional sparse coding (CCSC) model, ensuring good interpretability of the underlying image enhancement process. The key feature of SINET is that it estimates the salient features from the three color channels using three sparse feature estimation blocks (SFEBs). The architecture of the SFEB is designed by unrolling an iterative algorithm for solving the $\ell_1$-regularized convolutional sparse coding (CSC) problem. Our experiments show that SINET surpasses the state-of-the-art PSNR value by $1.05$ dB with $3873$ times lower computational complexity.
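To make "unrolling an iterative algorithm" concrete, here is a minimal sketch of ISTA, a standard iterative solver for $\ell_1$-regularized sparse coding. It uses a small dense dictionary instead of convolutions and is not the SFEB architecture; the abstract does not say which iterative algorithm is unrolled, so ISTA is an assumption, as are the function names. An unrolled network would typically fix the number of iterations and learn the step size and threshold in each one.

```python
def soft_threshold(values, threshold):
    """Proximal operator of the l_1 norm: shrink each entry toward zero."""
    return [
        max(abs(v) - threshold, 0.0) * ((v > 0) - (v < 0))
        for v in values
    ]


def ista(dictionary, signal, lam, step, n_iter):
    """Approximately minimize 0.5 * ||signal - D x||^2 + lam * ||x||_1.

    dictionary is a list of rows (m x n), signal has length m.
    Each loop body corresponds to one layer of an unrolled network.
    """
    m, n = len(dictionary), len(dictionary[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual D x - signal.
        residual = [
            sum(dictionary[i][j] * x[j] for j in range(n)) - signal[i]
            for i in range(m)
        ]
        # Gradient of the quadratic term: D^T (D x - signal).
        grad = [
            sum(dictionary[i][j] * residual[i] for i in range(m))
            for j in range(n)
        ]
        # Gradient step followed by shrinkage.
        x = soft_threshold(
            [x[j] - step * grad[j] for j in range(n)], step * lam
        )
    return x


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
codes = ista(identity, [3.0, -0.5, 1.5], lam=1.0, step=1.0, n_iter=5)
```

With an identity dictionary the solution is simply the soft-thresholded signal, so the small entry is driven to zero and the others shrink by `lam`.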

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# データ分割におけるランダム性による予測精度の変動と間隔推定による公正評価

Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation ( http://arxiv.org/abs/2409.01025v1 )

ライセンス: Link先を確認

Isao Goto,

(参考訳) 本稿では,機械学習アルゴリズムを用いて予測モデルを構築する際の「簡単な問題」に答えようとする。様々な疾患の診断および予測モデルは、大規模なコホート研究と機械学習アルゴリズムのデータを用いて提案されているが、その一般化性には課題がある。この課題のいくつかの原因が指摘されており、ランダムなデータセットの分割がその1つと考えられている。本研究では,AutoML(Automatic Machine Learning framework)とオープン糖尿病データを用いて,「初期状態」に依存した33,600の糖尿病診断モデルを構築し,その予測精度を評価した。その結果,予測精度は初期状態依存分布であった。この分布は正規分布に従うことができるため,予測モデルの精度を正確に比較するために,統計的間隔推定を用いて予測精度の予測間隔を推定する。

This paper attempts to answer a "simple question" in building predictive models using machine learning algorithms. Although diagnostic and predictive models for various diseases have been proposed using data from large cohort studies and machine learning algorithms, challenges remain in their generalizability. Several causes for this challenge have been pointed out, and partitioning of the dataset with randomness is considered to be one of them. In this study, we constructed 33,600 diabetes diagnosis models with "initial state" dependent randomness using autoML (automatic machine learning framework) and open diabetes data, and evaluated their prediction accuracy. The results showed that the prediction accuracy had an initial state-dependent distribution. Since this distribution could follow a normal distribution, we estimated the expected interval of prediction accuracy using statistical interval estimation in order to fairly compare the accuracy of the prediction models.
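The interval estimation step can be sketched as below. The abstract does not specify which interval was used, so a normal-approximation interval over per-split accuracies is an assumption, and the accuracy values are made-up placeholders, not results from the paper.

```python
from statistics import NormalDist, mean, stdev


def accuracy_interval(accuracies, coverage=0.95):
    """Interval expected to contain `coverage` of model accuracies.

    Assumes accuracies across random data splits are roughly normal.
    """
    centre = mean(accuracies)
    spread = stdev(accuracies)  # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return centre - z * spread, centre + z * spread


# Placeholder accuracies from models trained on different random splits.
split_accuracies = [0.70, 0.72, 0.74, 0.76, 0.78]
low, high = accuracy_interval(split_accuracies)
```

Under this assumption, two models would be compared on whether their intervals overlap, not on a single split's score.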

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# 顔偽物検出のための偽物発見学習

Learning to Discover Forgery Cues for Face Forgery Detection ( http://arxiv.org/abs/2409.01030v1 )

ライセンス: Link先を確認

Jiahe Tian, Peng Chen, Cai Yu, Xiaomeng Fu, Xi Wang, Jiao Dai, Jizhong Han,

(参考訳) フォージェリーキューのピクセルレベルのアノテーションである位置操作マップは、顔フォージェリー検出において解釈可能な検出結果を提供するのに不可欠である。関連する学習オブジェクトは、検出器の分類性能を改善するための補助的なタスクとして広く採用されているが、実際の顔と偽顔を比較して、操作マップを監督として取得する必要がある。この要件は、未確認の顔に適用性を制限するとともに、現実のシナリオに矛盾する。さらに、使用した比較手法は、圧縮やアップサンプリングによって導入されたノイズを含む、すべての変化したピクセルに注釈を付ける。このようなマップを監督として使用すると、悪用可能な手がかりの学習が妨げられ、モデルが過度に適合する傾向がある。これらの問題に対処するために,フォージェリーキューディスカバリ (FoCus) と呼ばれる弱教師付きモデルを導入する。 FoCusは、注意マップ内の鍛造された領域を検知するいくつかの検出器とは異なり、部分的かつ不正確な偽造の手がかりを捕捉する欠点を補うように設計されている。具体的には、分類中の偽の手がかりを特定するための分類注意領域提案モジュールと、よりリッチな手がかりの学習を容易にするための補完学習モジュールを提案する。生成した操作マップは、顔偽造検知器を強化するためにより良い監視を行うことができる。提案したFoCusの操作マップの可視化は,既存手法と比較して高い解釈性とロバスト性を示す。 5つのデータセットと4つのマルチタスクモデルに対する実験は、FoCusがデータセット内およびデータセット内の両方で有効であることを示す。

Locating manipulation maps, i.e., pixel-level annotation of forgery cues, is crucial for providing interpretable detection results in face forgery detection. Related learning objects have also been widely adopted as auxiliary tasks to improve the classification performance of detectors whereas they require comparisons between paired real and forged faces to obtain manipulation maps as supervision. This requirement restricts their applicability to unpaired faces and contradicts real-world scenarios. Moreover, the used comparison methods annotate all changed pixels, including noise introduced by compression and upsampling. Using such maps as supervision hinders the learning of exploitable cues and makes models prone to overfitting. To address these issues, we introduce a weakly supervised model in this paper, named Forgery Cue Discovery (FoCus), to locate forgery cues in unpaired faces. Unlike some detectors that claim to locate forged regions in attention maps, FoCus is designed to sidestep their shortcomings of capturing partial and inaccurate forgery cues. Specifically, we propose a classification attentive regions proposal module to locate forgery cues during classification and a complementary learning module to facilitate the learning of richer cues. The produced manipulation maps can serve as better supervision to enhance face forgery detectors. Visualization of the manipulation maps of the proposed FoCus exhibits superior interpretability and robustness compared to existing methods. Experiments on five datasets and four multi-task models demonstrate the effectiveness of FoCus in both in-dataset and cross-dataset evaluations.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# パラメータ効率の良い微調整におけるタスク特化方向のパワーの解放

Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning ( http://arxiv.org/abs/2409.01035v1 )

ライセンス: Link先を確認

Chongjie Si, Zhiyi Shi, Shifan Zhang, Xiaokang Yang, Hanspeter Pfister, Wei Shen,

(参考訳) 大規模な言語モデルは、下流のタスクで素晴らしいパフォーマンスを示すが、全てのパラメータを完全に微調整する際には、リソース消費がかなり必要である。これを軽減するために、LoRAのようなパラメータ効率の良い微調整(PEFT)戦略が開発されている。本稿では,大規模モデルを事前学習状態からPEFTにおけるタスク固有化へ移行させる上で重要な,タスク固有方向の概念を掘り下げる。本稿では,これらの方向性を明確に定義し,その特性と実用化の課題を探求する枠組みを提案する。そこで我々は,微調整過程においてタスク固有方向の影響を最大化し,目標タスクに対するモデル性能を向上させることを目的とした,新しいアプローチであるLoRA-Dashを導入する。広汎な実験によりLoRA-Dashの有効性が確定され、詳細な分析によりLoRA-Dashの基礎となるメカニズムが明らかにされた。コードは https://github.com/Chongjie-Si/Subspace-Tuning で公開されている。

Large language models demonstrate impressive performance on downstream tasks, yet require extensive resources when all parameters are fully fine-tuned. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions, which are critical for transitioning large models from pre-trained states to task-specific enhancements in PEFT. We propose a framework to clearly define these directions and explore their properties and practical utilization challenges. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of task-specific directions during the fine-tuning process, thereby enhancing model performance on targeted tasks. Extensive experiments have conclusively demonstrated the effectiveness of LoRA-Dash, and in-depth analyses further reveal the underlying mechanisms of LoRA-Dash. The code is available at https://github.com/Chongjie-Si/Subspace-Tuning.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# NYK-MS:カートゥーン・キャプション・データセットの多モードメタファーとサーカスム理解ベンチマーク

NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset ( http://arxiv.org/abs/2409.01037v1 )

ライセンス: Link先を確認

Ke Chang, Hao Li, Junzhao Zhang, Yunfang Wu,

(参考訳) メタファーとサルカズムは人々のコミュニケーション、特にインターネットや10代の若者に人気があるミームにおいて一般的な比喩表現である。我々はNYK-MS(NewYorKer for Metaphor and Sarcasm)という新しいベンチマークを作成し、比喩理解タスクの1,583のサンプルと皮肉理解タスクの1,578のサンプルを含む。これらのタスクにはメタファ/サルカズムが含まれているか、どの単語やオブジェクトがメタファ/サルカズムを含んでいるか、何を風刺しているか、なぜそれがメタファ/サルカズムを含んでいるのか、そして7つのタスクのすべてが少なくとも3つのアノテーションによって十分に注釈付けされている。一貫性と品質を向上させるために、いくつかのラウンドでデータセットに注釈を付け、GUIとGPT-4Vを使って効率を上げる。ベンチマークに基づいて、多くの実験を行います。ゼロショット実験では,Large Language Models (LLM) とLarge Multi-modal Models (LMM) が分類タスクをうまく行うことができず,スケールが大きくなるにつれて,他の5つのタスクのパフォーマンスが向上することを示した。従来のプレトレインモデルを用いた実験では,拡張法とアライメント法により,ベンチマークが以前のデータセットと整合性を証明し,両モードの双方を理解するためにモデルが必要であることを示す。

Metaphor and sarcasm are common figurative expressions in people's communication, especially on the Internet and in the memes popular among teenagers. We create a new benchmark named NYK-MS (NewYorKer for Metaphor and Sarcasm), which contains 1,583 samples for metaphor understanding tasks and 1,578 samples for sarcasm understanding tasks. The tasks include deciding whether a sample contains metaphor/sarcasm, which word or object carries the metaphor/sarcasm, what it satirizes, and why it contains metaphor/sarcasm; all 7 tasks are well-annotated by at least 3 annotators. We annotate the dataset over several rounds to improve consistency and quality, and use a GUI and GPT-4V to raise our efficiency. Based on the benchmark, we conduct extensive experiments. In the zero-shot experiments, we show that Large Language Models (LLMs) and Large Multi-modal Models (LMMs) cannot do the classification tasks well, and that as the scale increases, the performance on the other 5 tasks improves. In the experiments on traditional pre-trained models, we show the enhancement obtained with augmentation and alignment methods, which proves that our benchmark is consistent with previous datasets and requires the model to understand both modalities.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# 街路地図を用いた降雨時のロバスト車両位置推定と追跡

Robust Vehicle Localization and Tracking in Rain using Street Maps ( http://arxiv.org/abs/2409.01038v1 )

ライセンス: Link先を確認

Yu Xiang Tan, Malika Meghjani,

(参考訳) GPSによる車両のローカライゼーションと追跡は、トンネルセグメントや密集した都市部でよく経験される不安定な位置情報に悩まされている。また、視覚オドメトリー(VO)と視覚慣性オドメトリー(VIO)は、視覚入力の閉塞やぼやけを引き起こす悪天候条件の影響を受けやすい。本稿では,道路網を用いた地図情報を用いた車両位置推定手法を提案し,特に降雨やトンネルを走行するような悪条件のシナリオにおいて,ドリフトするオドメトリ推定と間欠的なGPS計測を補正する。具体的には、断続的なGPS、ドリフトするIMUおよびVOの推定値を2次元マップ情報と統合する、ロバストな車両のローカライゼーションと追跡のためのフレキシブルな融合アルゴリズムである。われわれのアプローチをMap-Fusionと呼んでいる。本提案手法は,晴天・降雨条件にまたがる異なる国々の地理的に多様な4つのデータセットに対して,強固に評価する。これらのデータセットには、トンネルやアンダーパスにおける難解な視覚セグメントも含まれている。マップ情報の統合により、Map-Fusionアルゴリズムは、すべてのデータセットにまたがる最先端VOおよびVIOアプローチの誤差を低減する。また,提案したアルゴリズムを実環境およびハードウェア制約された移動ロボット上でリアルタイムに検証する。 Map-Fusionは、150mのルートにおいて、晴天で2.46m、雨天で6.05mの誤差を達成した。

GPS-based vehicle localization and tracking suffers from unstable positional information, commonly experienced in tunnel segments and in dense urban areas. Also, both Visual Odometry (VO) and Visual Inertial Odometry (VIO) are susceptible to adverse weather conditions that cause occlusions or blur in the visual input. In this paper, we propose a novel approach for vehicle localization that uses street-network-based map information to correct drifting odometry estimates and intermittent GPS measurements, especially in adversarial scenarios such as driving in rain and tunnels. Specifically, our approach is a flexible fusion algorithm that integrates intermittent GPS, drifting IMU and VO estimates together with 2D map information for robust vehicle localization and tracking. We refer to our approach as Map-Fusion. We robustly evaluate our proposed approach on four geographically diverse datasets from different countries ranging across clear and rainy weather conditions. These datasets also include challenging visual segments in tunnels and underpasses. We show that with the integration of the map information, our Map-Fusion algorithm reduces the error of the state-of-the-art VO and VIO approaches across all datasets. We also validate our proposed algorithm in a real-world environment and in real time on a hardware-constrained mobile robot. Map-Fusion achieved 2.46m error in clear weather and 6.05m error in rainy weather for a 150m route.
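One simple way to picture how a street map can correct a drifting position estimate is to snap the estimate onto the nearest road segment. This sketch is a deliberately simplified stand-in, not the Map-Fusion algorithm, whose fusion step is not described in the abstract. The function names and the toy road network are hypothetical.

```python
import math


def project_to_segment(point, start, end):
    """Closest point to `point` on the 2D segment from `start` to `end`."""
    px, py = point
    ax, ay = start
    bx, by = end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return start  # degenerate segment
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_to_roads(point, segments):
    """Snap a drifting position estimate onto the nearest road segment."""
    candidates = (project_to_segment(point, a, b) for a, b in segments)
    return min(candidates, key=lambda q: math.dist(point, q))


# Toy street network: an L-shaped road made of two segments.
roads = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 10.0))]
corrected = snap_to_roads((12.0, 4.0), roads)
```

A real system would weigh such a map constraint against GPS, IMU and VO uncertainty instead of snapping outright.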

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# 修正Q-ラーニングアルゴリズムを用いた多目的タスク学習の高速化

Accelerated Multi-objective Task Learning using Modified Q-learning Algorithm ( http://arxiv.org/abs/2409.01046v1 )

ライセンス: Link先を確認

Varun Prakash Rajamohan, Senthil Kumar Jagatheesaperumal,

(参考訳) ロボットは産業において広範囲の応用を見出す。近年,家庭シナリオにおいてもロボットの影響が急速に拡大している。 Q-learningアルゴリズムは、目標を達成するための報酬を最大化することを目的としている。本稿では,Q-SDを用いたQ-learningと呼ばれるQ-ラーニングアルゴリズムの修正版を提案する。このアルゴリズムはタスク学習を強化し、タスク完了をより意味のあるものにする。ロボットマニピュレータ(エージェント)は、テーブルクリーニングのタスクにQ-SDアルゴリズムを適用する。 Q-SDを用いて、エージェントは、マニピュレータの移動距離を最小化しながらタスクを達成するために必要なステップのシーケンスを取得する。テーブルを異なる次元のグリッドに分割します。第1のグリッド数は3倍、第2のグリッドは4倍の4倍である。 Q-SDアルゴリズムを用いて、これらの2つの環境で得られた最大成功率は、それぞれ86%と59%であった。さらに,従来のQ-ラーニングアルゴリズムと比較して,これら2つの環境においてエージェントが移動した平均距離の減少は,それぞれ8.61%,6.7%であった。

Robots find extensive applications in industry. In recent years, the influence of robots has also increased rapidly in domestic scenarios. The Q-learning algorithm aims to maximise the reward for reaching the goal. This paper proposes a modified version of the Q-learning algorithm, known as Q-learning with scaled distance metric (Q-SD). This algorithm enhances task learning and makes task completion more meaningful. A robotic manipulator (agent) applies the Q-SD algorithm to the task of table cleaning. Using Q-SD, the agent acquires the sequence of steps necessary to accomplish the task while minimising the manipulator's movement distance. We partition the table into grids of different dimensions. The first is a 3 × 3 grid, and the second is a 4 × 4 grid. Using the Q-SD algorithm, the maximum success rates obtained in these two environments were 86% and 59%, respectively. Moreover, compared to the conventional Q-learning algorithm, the drop in average distance moved by the agent in these two environments using the Q-SD algorithm was 8.61% and 6.7%, respectively.
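For reference, the standard tabular Q-learning update is sketched below, together with one hypothetical way a movement distance could enter the reward. The abstract does not state how Q-SD applies its scaled distance metric, so `distance_scaled_reward` and its `scale` parameter are assumptions for illustration only.

```python
def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update on a dict keyed by (state, action)."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]


def distance_scaled_reward(base_reward, distance, scale=0.1):
    """Hypothetical shaping: penalize the reward by the distance moved."""
    return base_reward - scale * distance


# One update for cleaning a grid cell two units away from the current cell.
q_table = {}
actions = ["cell_0", "cell_1", "cell_2"]
reward = distance_scaled_reward(1.0, distance=2.0)
updated = q_update(q_table, "start", "cell_2", reward, "at_cell_2", actions)
```

Under this kind of shaping, two action sequences that both clean the table are ranked by how far the manipulator travels.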

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# INTENTAS -- 微小重力のためのエンタングルメント強化原子センサー

INTENTAS -- An entanglement-enhanced atomic sensor for microgravity ( http://arxiv.org/abs/2409.01051v1 )

ライセンス: Link先を確認

O. Anton, I. Bröckel, D. Derr, A. Fieguth, M. Franzke, M. Gärtner, E. Giese, J. S. Haase, J. Hamann, A. Heidt, S. Kanthak, C. Klempt, J. Kruse, M. Krutzik, S. Kubitza, C. Lotz, K. Müller, J. Pahl, E. M. Rasel, M. Schiemangk, W. P. Schleich, S. Schwertfeger, A. Wicht, L. Wörner,

(参考訳) INTENTASプロジェクトは、微小重力環境下で絡み合ったボース=アインシュタイン凝縮体(BEC)を利用した原子センサーを開発することを目的としている。この重要な成果は、絡み合いの強い感性と長い尋問時間の両方から恩恵を受ける測定能力を向上させるために必要である。このプロジェクトは、ハノーファーのアインシュタイン・エレベータの実験プラットフォームに特有のサイズ、重量、電力管理(SWaP)に関する重要な課題に対処している。この設計により、絡み目の生成と検出に不可欠な低騒音環境が確保される。さらに、この装置は、BECを全光学的に作成するための革新的なアプローチを特徴とし、様々な構成のためのフレキシブルなシステムを提供し、迅速なターンアラウンドタイムの要求を満たす。 Einstein-Elevatorにおけるこの技術の実証が成功すれば、宇宙への将来の展開の道が開けることになる。

The INTENTAS project aims to develop an atomic sensor utilizing entangled Bose-Einstein condensates (BECs) in a microgravity environment. This key achievement is necessary to advance the capability for measurements that benefit from both entanglement-enhanced sensitivities and extended interrogation times. The project addresses significant challenges related to size, weight, and power management (SWaP) specific to the experimental platform at the Einstein-Elevator in Hannover. The design ensures a low-noise environment essential for the creation and detection of entanglement. Additionally, the apparatus features an innovative approach to the all-optical creation of BECs, providing a flexible system for various configurations and meeting the requirements for rapid turnaround times. Successful demonstration of this technology in the Einstein-Elevator will pave the way for a future deployment in space, where its potential applications will unlock high-precision quantum sensing.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# 生成AIの文脈における文学メタファーの展望

A Perspective on Literary Metaphor in the Context of Generative AI ( http://arxiv.org/abs/2409.01053v1 )

ライセンス: Link先を確認

Imke van Heerden, Anil Bas,

(参考訳) 本研究は,創作テキスト生成と文芸理論の交わりにおいて,文芸メタファーの役割と,多種多様な意味を生み出す能力について考察する。この点において、文学的比喩は特定の言語の発展に不可欠である。原語の含意が文質を向上させるかどうかを検討するため,アフリカーンスでLSTMに基づく言語モデルを訓練した。ネットワークは、魅力的に斬新な音声の人物を含むフレーズを生成する。具体的には、AIがどのようにデファミリアライゼーション技術として活用されるかに重点を置いている。テキスト生成に関する文学的視点を提供することで、本論文は美的価値、解釈、評価に関する思慮に富んだ疑問を提起する。

At the intersection of creative text generation and literary theory, this study explores the role of literary metaphor and its capacity to generate a range of meanings. In this regard, literary metaphor is vital to the development of any particular language. To investigate whether the inclusion of original figurative language improves textual quality, we trained an LSTM-based language model in Afrikaans. The network produces phrases containing compellingly novel figures of speech. Specifically, the emphasis falls on how AI might be utilised as a defamiliarisation technique, which disrupts expected uses of language to augment poetic expression. Providing a literary perspective on text generation, the paper raises thought-provoking questions on aesthetic value, interpretation and evaluation.

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# Follow-Your-Canvas: 大規模コンテンツ生成による高分解能ビデオ露光

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation ( http://arxiv.org/abs/2409.01055v1 )

ライセンス: Link先を確認

Qihua Chen, Yue Ma, Hongfa Wang, Junkun Yuan, Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, Wei Liu,

(参考訳) 本稿では,大規模なコンテンツ生成による高精細映像の画質向上について検討する。我々は、ビデオに大きく勝とうとする既存の手法が直面する一般的な問題として、低品質なコンテンツの生成とGPUメモリによる制限を挙げている。これらの課題に対処するため,<textit{Follow-Your-Canvas} という拡散型手法を提案する。基本設計は2つある。まず,「単発」のアウトペイントという一般的な手法を使わずに,タスクを空間的ウィンドウに分散し,シームレスにマージする。これにより、GPUメモリに制約されることなく、どんなサイズや解像度の動画にも勝ることができます。次に、ソース映像とその相対位置関係を各ウィンドウの生成工程に注入する。これにより、各ウィンドウ内の生成された空間レイアウトが、ソースビデオと調和する。これら2つの設計と組み合わせることで、空間的・時間的整合性を維持しつつ、リッチなコンテンツで高解像度の露光映像を生成することができる。 Follow-Your-Canvas は 512X512 から 1152X2048 (9X) までの大規模なビデオ撮影で優れており、高品質で美的な結果が得られる。様々な解像度とスケールのセットアップで最高の定量的結果が得られる。コードはhttps://github.com/mayuelala/FollowYourCanvasで公開されている。

This paper explores higher-resolution video outpainting with extensive content generation. We point out common issues faced by existing methods when attempting to outpaint videos by a large margin: the generation of low-quality content and limitations imposed by GPU memory. To address these challenges, we propose a diffusion-based method called Follow-Your-Canvas. It builds upon two core designs. First, instead of employing the common practice of "single-shot" outpainting, we distribute the task across spatial windows and seamlessly merge them. This allows us to outpaint videos of any size and resolution without being constrained by GPU memory. Second, the source video and its relative positional relation are injected into the generation process of each window. This makes the generated spatial layout within each window harmonize with the source video. Combining these two designs enables us to generate higher-resolution outpainted videos with rich content while keeping spatial and temporal consistency. Follow-Your-Canvas excels in large-scale video outpainting, e.g., from 512×512 to 1152×2048 (9×), while producing high-quality and aesthetically pleasing results. It achieves the best quantitative results across various resolution and scale setups. The code is released at https://github.com/mayuelala/FollowYourCanvas.
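The idea of distributing outpainting across spatial windows can be sketched by tiling the target canvas with overlapping windows. The window size of 512 and overlap of 128 are hypothetical values chosen for illustration; the abstract does not give the actual window layout or merging scheme.

```python
def window_starts(total, window, overlap):
    """Start offsets of overlapping windows that cover [0, total)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    if window >= total:
        return [0]
    stride = window - overlap
    starts = list(range(0, total - window, stride))
    starts.append(total - window)  # last window is flush with the border
    return starts


def tile_canvas(height, width, window=512, overlap=128):
    """Boxes (top, left, bottom, right) of the windows covering the canvas."""
    return [
        (top, left, min(top + window, height), min(left + window, width))
        for top in window_starts(height, window, overlap)
        for left in window_starts(width, window, overlap)
    ]


# Target canvas from the abstract's example: 1152 x 2048.
tiles = tile_canvas(1152, 2048)
```

Each window has a fixed size, so peak memory depends on the window, not on the full canvas; the overlap regions are where neighbouring windows would be blended.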

翻訳日:2024-09-06 07:49:16 公開日:2024-09-02

# No Peer, No Cry: フォールトインジェクションによるネットワークアプリケーションファズリング

No Peer, no Cry: Network Application Fuzzing via Fault Injection ( http://arxiv.org/abs/2409.01059v1 )

ライセンス: Link先を確認

Nils Bars, Moritz Schloegel, Nico Schiller, Lukas Bernhard, Thorsten Holz,

Network-facing applications are commonly exposed to all kinds of attacks, especially when connected to the internet. As a result, web servers like Nginx or client applications such as curl make every effort to secure and harden their code to rule out memory safety violations. One would expect this to include regular fuzz testing, as fuzzing has proven to be one of the most successful approaches to uncovering bugs in software. Yet, surprisingly little research has focused on fuzzing network applications. When studying the underlying reasons, we find that the interactive nature of communication, its statefulness, and the protection of exchanged messages render typical fuzzers ineffective. Attempts to replay recorded messages or modify them on the fly only work for specific targets and often lead to early termination of communication. In this paper, we discuss these challenges in detail, highlighting how the focus of existing work on protocol state space promises little relief. We propose a fundamentally different approach that relies on fault injection rather than modifying messages. Effectively, we force one of the communication peers into a weird state where its output no longer matches the expectations of the target peer, potentially uncovering bugs. Importantly, this weird peer can still properly encrypt/sign the protocol message, overcoming a fundamental challenge of current fuzzers. In effect, we leave the communication system intact but introduce small corruptions. Since we can turn either the server or the client into the weird peer, our approach is the first that can effectively test client-side network applications. Evaluating 16 targets, we show that Fuzztruction-Net outperforms other fuzzers in terms of coverage and bugs found. Overall, Fuzztruction-Net uncovered 23 new bugs in well-tested software, such as the web servers Nginx and Apache HTTPd and the OpenSSH client.
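
The reason a "weird peer" can still sign its messages can be shown with a toy sketch. This is not Fuzztruction-Net's mechanism, which injects faults into real peer programs; the key, the message layout, and the `flip_length` fault below are all hypothetical. The sketch only shows that a corruption applied before authentication yields a message the receiver accepts, whereas a corruption on the wire is rejected.

```python
import hashlib
import hmac
import json

KEY = b"shared-secret"  # hypothetical pre-shared key, for illustration only


def send(payload, fault=None):
    """Serialize and authenticate a message; `fault` corrupts state before signing."""
    if fault is not None:
        payload = fault(dict(payload))
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return body, tag


def receive(body, tag):
    """Verify the MAC and parse the message; reject anything modified in transit."""
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("bad MAC")
    return json.loads(body)


def flip_length(payload):
    """Hypothetical injected fault: corrupt the length field inside the sender."""
    payload["length"] ^= 0xFFFF
    return payload
```

Calling `receive(*send({"length": 4, "data": "abcd"}, fault=flip_length))` passes verification and hands the receiver an inconsistent length field, while editing `body` after signing raises `ValueError`.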

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Defending against Model Inversion Attacks via Random Erasing

Defending against Model Inversion Attacks via Random Erasing ( http://arxiv.org/abs/2409.01062v1 )

License: see the linked page

Viet-Hung Tran, Ngoc-Bao Nguyen, Son T. Mai, Hans Vandierendonck, Ngai-man Cheung,

Model Inversion (MI) is a type of privacy violation that focuses on reconstructing private training data through abusive exploitation of machine learning models. To defend against MI attacks, state-of-the-art (SOTA) MI defense methods rely on regularizations that conflict with the training loss, creating explicit tension between privacy protection and model utility. In this paper, we present a new method to defend against MI attacks. Our method takes a new perspective and focuses on training data. Our idea is based on a novel insight into Random Erasing (RE), which has been applied in the past as a data augmentation technique to improve model accuracy under occlusion. In our work, we instead focus on applying RE to degrade MI attack accuracy. Our key insight is that MI attacks require a significant amount of private training data information encoded inside the model in order to reconstruct high-dimensional private images. Therefore, we propose to apply RE to reduce the private information presented to the model during training. We show that this can lead to substantial degradation in MI reconstruction quality and attack accuracy. Meanwhile, the natural accuracy of the model is only moderately affected. Our method is very simple to implement and complementary to existing defense methods. Our extensive experiments across 23 setups demonstrate that our method can achieve SOTA performance in balancing privacy and utility of the models. The results consistently demonstrate the superiority of our method over existing defenses across different MI attacks, network architectures, and attack configurations.
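
A minimal sketch of Random Erasing as a training-time transform may help make the idea concrete. It is a simplification written for illustration and not the authors' implementation: it erases a single square-ish rectangle of a fixed area fraction, whereas common RE variants also randomize the area and aspect ratio. The function name, `area_frac`, and `fill` are assumptions.

```python
import random


def random_erase(image, area_frac=0.25, fill=0, rng=None):
    """Return a copy of `image` (a list of pixel rows) with one random rectangle set to `fill`."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    # Side lengths chosen so the erased area is roughly area_frac of the image.
    eh = max(1, min(h, round(h * area_frac ** 0.5)))
    ew = max(1, min(w, round(w * area_frac ** 0.5)))
    top = rng.randint(0, h - eh)
    left = rng.randint(0, w - ew)
    out = [row[:] for row in image]  # leave the private original untouched
    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r][c] = fill
    return out
```

Applied independently at every training step, each private image is only ever shown to the model with part of its content hidden, which is the lever the abstract describes for limiting what the model can encode.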

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Learning in Hybrid Active Inference Models

Learning in Hybrid Active Inference Models ( http://arxiv.org/abs/2409.01066v1 )

License: see the linked page

Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley,

An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions, and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS), which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space, allowing us to exploit information-theoretic exploration bonuses, and (3) 'cache' the approximate solutions to low-level problems in the discrete planner. We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning through the delineation of abstract sub-goals.

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Progressive Retinal Image Registration via Global and Local Deformable Transformations

Progressive Retinal Image Registration via Global and Local Deformable Transformations ( http://arxiv.org/abs/2409.01068v1 )

License: see the linked page

Yepeng Liu, Baosheng Yu, Tian Chen, Yuliang Gu, Bo Du, Yongchao Xu, Jun Cheng,

Retinal image registration plays an important role in the ophthalmological diagnosis process. Since viewing angles and anatomical structures vary across retinal images, keypoint-based approaches have become the mainstream methods for retinal image registration thanks to their robustness and low latency. These methods typically assume that retinal surfaces are planar, and adopt feature matching to obtain the homography matrix that represents the global transformation between images. Yet, such a planar hypothesis inevitably introduces registration errors since the retinal surface is curved. This limitation is more prominent when registering image pairs with significant differences in viewing angles. To address this problem, we propose a hybrid registration framework called HybridRetina, which progressively registers retinal images with global and local deformable transformations. For that, we use a keypoint detector and a deformation network called GAMorph to estimate the global transformation and local deformable transformation, respectively. Specifically, we integrate multi-level pixel relation knowledge to guide the training of GAMorph. Additionally, we utilize an edge attention module that includes the geometric priors of the images, ensuring the deformation field focuses more on the vascular regions of clinical interest. Experiments on two widely-used datasets, FIRE and FLoRI21, show that our proposed HybridRetina significantly outperforms some state-of-the-art methods. The code is available at https://github.com/lyp-deeplearning/awesome-retinal-registration.
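
The two-stage idea (a global homography followed by a local correction) can be sketched for a single point. This is an illustrative assumption rather than the HybridRetina code: the abstract does not say how the two transformations are composed, so the additive displacement in `progressive_warp` and the `displacement` callback are hypothetical.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)  # divide by the homogeneous coordinate


def progressive_warp(H, displacement, point):
    """Global planar alignment first, then a local offset (assumed additive)."""
    gx, gy = apply_homography(H, point)
    dx, dy = displacement(gx, gy)  # stand-in for a learned deformation field
    return (gx + dx, gy + dy)
```

A homography alone maps planes to planes, so whatever error comes from the curvature of the retina has to be absorbed by the second, non-rigid term.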

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# A Blueprint for Large-Scale Quantum-Network Deployments

A blueprint for large-scale quantum-network deployments ( http://arxiv.org/abs/2409.01069v1 )

License: see the linked page

Alberto Sebastián-Lombraña, Hans H. Brunner, Juan P. Brito, Rubén B. Méndez, Rafael J. Vicente, Jaime S. Buruaga, Laura Ortiz, Chi-Hang Fred Fung, Momtchil Peev, José M. Rivas-Moscoso, Felipe Jiménez, Antonio Pastor, Diego R. López, Jesús Folgueira, Vicente Martín,

Quantum Communications is a field that promises advances in cryptography, quantum computing and clock synchronisation, among other potential applications. However, communication based on quantum phenomena requires an extreme level of isolation from external disturbances, making the transmission of quantum signals together with classical ones difficult. A range of techniques has been tested to introduce quantum communications in already deployed optical networks which also carry legacy traffic. This comes with challenges, not only at the physical layer but also at the operations and management layer. To achieve a broad acceptance among network operators, the joint management and operation of quantum and classical resources, compliance with standards, and quality and legal assurance need to be addressed. This article presents a detailed account of solutions to the above issues, deployed and evaluated in the MadQCI (Madrid Quantum Communication Infrastructure) testbed. This network is designed to integrate quantum communications in the telecommunications ecosystem by installing quantum-key-distribution modules from multiple providers in production nodes of two different operators. The modules were connected through an optical-switched network with more than 130 km of deployed optical fibre. The tests were done in compliance with strict service level agreements that protected the legacy traffic of the pre-existing classical network. The goal was to achieve full quantum-classical compatibility at all levels, while limiting the modifications of optical transport and encryption and complying with as many standards as possible. This effort was intended to serve as a blueprint, which can be used as the foundation of large-scale quantum network deployments. To demonstrate the capabilities of MadQCI, end-to-end encryption services were deployed and a variety of use-cases were showcased.

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges ( http://arxiv.org/abs/2409.01071v1 )

License: see the linked page

Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng,

Recent advancements in large-scale video-language models have shown significant potential for real-time planning and detailed interactions. However, their high computational demands and the scarcity of annotated datasets limit their practicality for academic researchers. In this work, we introduce VideoLLaMB, a novel framework that utilizes temporal memory tokens within bridge layers to allow for the encoding of entire video sequences alongside historical visual data, effectively preserving semantic continuity and enhancing model performance across various tasks. This approach includes recurrent memory tokens and a SceneTilling algorithm, which segments videos into independent semantic units to preserve semantic integrity. Empirically, VideoLLaMB significantly outperforms existing video-language models, demonstrating a 5.5-point improvement over its competitors across three VideoQA benchmarks and a 2.06-point improvement on egocentric planning. Comprehensive results on MVBench show that VideoLLaMB-7B achieves markedly better results than previous 7B models built on the same LLM. Remarkably, it remains as robust as PLLaVA even as video length increases up to 8 times. In addition, the frame retrieval results on our specialized Needle in a Video Haystack (NIAVH) benchmark further validate VideoLLaMB's prowess in accurately identifying specific frames within lengthy videos. Our SceneTilling algorithm also enables the generation of streaming video captions directly, without necessitating additional training. In terms of efficiency, VideoLLaMB, trained on 16 frames, supports up to 320 frames on a single Nvidia A100 GPU with linear GPU memory scaling, ensuring both high performance and cost-effectiveness, thereby setting a new foundation for long-form video-language models in both academic and practical applications.

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions

Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions ( http://arxiv.org/abs/2409.01072v1 )

License: see the linked page

Taorong Liu, Jing Xiao, Liang Liao, Chia-Wen Lin,

Online Domain Adaptation (OnDA) is designed to handle, at minimal cost, unforeseeable domain changes that occur during model deployment and lack clear boundaries between domains, such as sudden weather events. However, existing OnDA methods that rely solely on the model itself to adapt to the current domain often misidentify ambiguous classes amidst continuous domain shifts and pass on this erroneous knowledge to the next domain. To tackle this, we propose RODASS, a Robust Online Domain Adaptive Semantic Segmentation framework, which dynamically detects domain shifts and adjusts hyper-parameters to minimize training costs and error propagation. Specifically, we introduce the Dynamic Ambiguous Patch Mask (DAP Mask) strategy, which dynamically selects highly disturbed regions and masks them, mitigating error accumulation in ambiguous classes and enhancing the model's robustness against external noise in dynamic natural environments. Additionally, we present the Dynamic Source Class Mix (DSC Mix), a domain-aware mix method that augments target domain scenes with class-level source buffers, reducing the high uncertainty and noisy labels, thereby accelerating adaptation and offering a more efficient solution for online domain adaptation. Our approach outperforms state-of-the-art methods on widely used OnDA benchmarks while maintaining approximately 40 frames per second (FPS).

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# SCOPE: Sign Language Contextual Processing with Embedding from LLMs

SCOPE: Sign Language Contextual Processing with Embedding from LLMs ( http://arxiv.org/abs/2409.01073v1 )

License: see the linked page

Yuqi Liu, Wenqian Zhang, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu,

Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign language Contextual Processing with Embedding from LLMs), a novel context-aware vision-based SLR and SLT framework. For SLR, we utilize dialogue contexts through a multi-modal encoder to enhance gloss-level recognition. For subsequent SLT, we further fine-tune a Large Language Model (LLM) by incorporating prior conversational context. We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Experimental results demonstrate that our SCOPE framework achieves state-of-the-art performance on multiple datasets, including Phoenix-2014T, CSL-Daily, and our SCOPE dataset. Moreover, surveys conducted with participants from the Deaf community further validate the robustness and effectiveness of our approach in real-world applications. Both our dataset and code will be open-sourced to facilitate further research.

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Bootstrap SGD: Algorithmic Stability and Robustness

Bootstrap SGD: Algorithmic Stability and Robustness ( http://arxiv.org/abs/2409.01074v1 )

License: see the linked page

Andreas Christmann, Yunwen Lei,

In this paper, several ways of using the empirical bootstrap approach for stochastic gradient descent (SGD) to minimize the empirical risk over a separable Hilbert space are investigated from the viewpoint of algorithmic stability and statistical robustness. The first two types of approaches are based on averages and are investigated from a theoretical point of view. A generalization analysis for bootstrap SGD of Type 1 and Type 2 based on algorithmic stability is carried out. Another type of bootstrap SGD is proposed to demonstrate that it is possible to construct purely distribution-free pointwise confidence intervals of the median curve using bootstrap SGD.
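
As a rough illustration of combining the bootstrap with SGD, the sketch below fits a one-dimensional linear model by SGD on several bootstrap resamples and then summarizes the predictions pointwise. It does not reproduce the paper's Type 1 and Type 2 schemes, which the abstract does not define, and the trimmed-range interval is a simple empirical stand-in for the paper's distribution-free construction. All function names and hyperparameters are assumptions.

```python
import random
import statistics


def sgd_linear(data, epochs=50, lr=0.05, rng=None):
    """Fit y ~ w*x + b by plain SGD on the squared loss; returns (w, b)."""
    rng = rng or random.Random(0)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # one shuffled pass over the data
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def bootstrap_sgd(data, n_boot=20, seed=0):
    """Run SGD once per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        models.append(sgd_linear(resample, rng=rng))
    return models


def predict_summary(models, x, tail=0.1):
    """Pointwise mean, median, and a trimmed range of the bootstrap predictions at x."""
    preds = sorted(w * x + b for w, b in models)
    k = int(tail * len(preds))  # predictions trimmed from each end
    return {
        "mean": statistics.fmean(preds),
        "median": statistics.median(preds),
        "low": preds[k],
        "high": preds[-1 - k],
    }
```

The "mean" entry corresponds to the averaging flavour mentioned in the abstract, while "median", "low", and "high" illustrate the kind of pointwise summary a median-based interval would report.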

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization

Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization ( http://arxiv.org/abs/2409.01081v1 )

License: see the linked page

Dingshuo Chen, Zhixun Li, Yuyan Ni, Guibin Zhang, Ding Wang, Qiang Liu, Shu Wu, Jeffrey Xu Yu, Liang Wang,

With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), an oft-stated approach to reducing training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. Therefore, we propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which focuses on the source-free data pruning scenario, where data pruning is applied with pretrained models. By maintaining two models with different updating paces during training, we introduce a novel scoring function to measure the informativeness of samples based on the loss discrepancy. As a plug-and-play framework, MolPeg realizes the perception of both the source and target domains and consistently outperforms existing DP methods across four downstream tasks. Remarkably, it can surpass the performance obtained from full-dataset training, even when pruning up to 60-70% of the data on the HIV and PCBA datasets. Our work suggests that the discovery of effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.
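
The scoring idea (two models updated at different paces, samples ranked by loss discrepancy) can be sketched as follows. This is a guess at the general shape, not MolPeg's actual scoring function: the exponential-moving-average update for the slow model, the absolute-difference score, and keeping the highest-scoring samples are all assumptions.

```python
def ema_update(slow, fast, momentum=0.9):
    """Move the slow model's parameters a small step toward the fast model's."""
    return [momentum * s + (1 - momentum) * f for s, f in zip(slow, fast)]


def prune_by_loss_discrepancy(losses_fast, losses_slow, keep_ratio=0.4):
    """Keep the samples whose loss differs most between the fast and slow models."""
    scores = [abs(f - s) for f, s in zip(losses_fast, losses_slow)]
    n_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])  # indices of retained samples, in dataset order
```

With a `keep_ratio` of 0.3 to 0.4, this corresponds to the 60-70% pruning regime quoted in the abstract.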

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Evidential Transformers for Improved Image Retrieval

Evidential Transformers for Improved Image Retrieval ( http://arxiv.org/abs/2409.01082v1 )

License: see the linked page

Danilo Dordevic, Suryansh Kumar,

We introduce the Evidential Transformer, an uncertainty-driven transformer model for improved and robust image retrieval. In this paper, we make several contributions to content-based image retrieval (CBIR). We incorporate probabilistic methods into image retrieval, achieving robust and reliable results, with evidential classification surpassing traditional training based on multiclass classification as a baseline for deep metric learning. Furthermore, we improve the state-of-the-art retrieval results on several datasets by leveraging the Global Context Vision Transformer (GC ViT) architecture. Our experimental results consistently demonstrate the reliability of our approach, setting a new benchmark in CBIR in all test settings on the Stanford Online Products (SOP) and CUB-200-2011 datasets.

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Affordance-based Robot Manipulation with Flow Matching

Affordance-based Robot Manipulation with Flow Matching ( http://arxiv.org/abs/2409.01083v1 )

License: see the linked page

Fan Zhang, Michael Gienger,

We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understanding tasks, especially in daily living scenarios where gathering multi-task data involving humans requires strenuous effort; second, effectively learning robot trajectories by grounding the visual affordance model. We tackle the first challenge by employing a parameter-efficient prompt tuning method that prepends learnable text prompts to the frozen vision model to predict manipulation affordances in multi-task scenarios. Then we propose to learn robot trajectories guided by affordances with a supervised Flow Matching method. Flow matching represents a robot visuomotor policy as a conditional process of flowing random waypoints to desired robot trajectories. Finally, we introduce a real-world dataset with 10 tasks across Activities of Daily Living to test our framework. Our extensive evaluation highlights that the proposed prompt tuning method for learning manipulation affordance with a language prompter achieves competitive performance and even outperforms other finetuning protocols across data scales, while satisfying parameter efficiency. Learning multi-task robot trajectories with a single flow matching policy also leads to consistently better performance than alternative behavior cloning methods, especially given multimodal robot action distributions. Our framework seamlessly unifies affordance model learning and trajectory generation with flow matching for robot manipulation.
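
The phrase "flowing random waypoints to desired robot trajectories" can be made concrete with the commonly used straight-line form of flow matching. The sketch assumes that linear interpolation path; the abstract does not state which variant the authors use, and the function names are hypothetical. `flow_matching_pair` builds one supervised training example, and `integrate` shows how a velocity field is rolled out at inference time.

```python
def flow_matching_pair(x0, x1, t):
    """Point on the straight path from noise x0 to target x1 at time t, plus its velocity target."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant along a straight path
    return xt, velocity


def integrate(velocity_fn, x0, steps=10):
    """Euler-integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, a network would regress `velocity` from `xt`, `t`, and the conditioning (here, the affordance); at test time the learned network takes the place of `velocity_fn` and random waypoints are integrated into a trajectory.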

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing ( http://arxiv.org/abs/2409.01086v1 )

License: see the linked page

Xiaolong Wang, Zhi-Qi Cheng, Jue Wang, Xiaojiang Peng,

Fashion image editing is a crucial tool for designers to convey their creative ideas by visualizing design concepts interactively. Current fashion image editing techniques, though advanced with multimodal prompts and powerful diffusion models, often struggle to accurately identify editing regions and preserve the desired garment texture detail. To address these challenges, we introduce a new multimodal fashion image editing architecture based on latent diffusion models, called Detail-Preserved Diffusion Models (DPDEdit). DPDEdit guides the fashion image generation of diffusion models by integrating text prompts, region masks, human pose images, and garment texture images. To precisely locate the editing region, we first introduce Grounded-SAM to predict the editing region based on the user's textual description, and then combine it with other conditions to perform local editing. To transfer the detail of the given garment texture into the target fashion image, we propose a texture injection and refinement mechanism. Specifically, this mechanism employs a decoupled cross-attention layer to integrate textual descriptions and texture images, and incorporates an auxiliary U-Net to preserve the high-frequency details of generated garment texture. Additionally, we extend the VITON-HD dataset using a multimodal large language model to generate paired samples with texture images and textual descriptions. Extensive experiments show that our DPDEdit outperforms state-of-the-art methods in terms of image fidelity and coherence with the given multimodal inputs.

Translated: 2024-09-06 07:38:47 Published: 2024-09-02

# Pre-Trained Language Models for Keyphrase Prediction: A Review

Pre-Trained Language Models for Keyphrase Prediction: A Review ( http://arxiv.org/abs/2409.01087v1 )

License: see the linked page

Muhammad Umair, Tangina Sultana, Young-Koo Lee,

Keyphrase Prediction (KP) is essential for identifying keyphrases in a document that can summarize its content. Recent Natural Language Processing (NLP) advances have produced more efficient KP models using deep learning techniques. However, the lack of a comprehensive survey that jointly covers keyphrase extraction and generation with pre-trained language models is a critical gap in the literature; our survey aims to bridge this gap with a unified, in-depth analysis that addresses the limitations of previous surveys. This paper extensively examines pre-trained language models for keyphrase prediction (PLM-KP), which are trained on large text corpora via different learning techniques (supervised, unsupervised, semi-supervised, and self-supervised), to provide insights into the two corresponding NLP tasks, namely Keyphrase Extraction (KPE) and Keyphrase Generation (KPG). We introduce appropriate taxonomies for PLM-KPE and KPG to highlight these two main tasks of NLP. Moreover, we point out some promising future directions for predicting keyphrases.

翻訳日:2024-09-06 07:38:47 公開日:2024-09-02

# 分散学習に基づくプライバシ保護記録リンクの実現に向けて

Towards Split Learning-based Privacy-Preserving Record Linkage ( http://arxiv.org/abs/2409.01088v1 )

ライセンス: Link先を確認

Michail Zervas, Alexandros Karakasidis,

(参考訳) ユーザデータのプライバシが要求されるアプリケーションを容易にするために、Split Learningが最近導入された。しかし、プライバシ保存記録リンク(Privacy-Preserving Record Linkage)は、異なるデータ所有者のデータベース間で同一の現実世界のエンティティを識別する問題であるが、追加情報は開示されていない。本稿では,プライバシ保存記録マッチングのための分割学習の可能性について検討し,従来型の集中型SVM技術に対する最小のマッチング効果を示す参照セットの利用を通じて,新たなトレーニング手法を導入する。

Split Learning has been recently introduced to facilitate applications where user data privacy is a requirement. However, it has not been thoroughly studied in the context of Privacy-Preserving Record Linkage, a problem in which the same real-world entity should be identified among databases from different dataholders, but without disclosing any additional information. In this paper, we investigate the potentials of Split Learning for Privacy-Preserving Record Matching, by introducing a novel training method through the utilization of Reference Sets, which are publicly available data corpora, showcasing minimal matching impact against a traditional centralized SVM-based technique.

翻訳日:2024-09-06 07:38:47 公開日:2024-09-02

# CARIn:シングルDNNおよびマルチDNNワークロードのための不均一デバイスに対する制約認識と応答推論

CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads ( http://arxiv.org/abs/2409.01089v1 )

ライセンス: Link先を確認

Ioannis Panopoulos, Stylianos I. Venieris, Iakovos S. Venieris,

(参考訳) 近年のディープラーニングアプリケーションの絶え間ない拡大は、リアルタイム処理の急激な要求、プライバシーの懸念の高まり、さまざまなドメイン間のレイテンシの低減などによって、デバイス上での実行に対する重要なシフトを引き起こしている。本稿では,モバイルデバイス上でのディープニューラルネットワーク(DNN)の実行を最適化する上で,デバイスの不均一性,マルチDNN実行,動的ランタイム適応といった課題に対処する。 CARInは、ユーザ定義のサービスレベルの目的の下で、シングルDNNおよびマルチDNNアプリケーションの最適化デプロイ用に設計された新しいフレームワークである。 MOOソルバとして表現型多目的最適化フレームワークとランタイム対応ソート・検索アルゴリズム(RASS)を活用して、CARInは、マルチDNN実行に伴うリソース競合問題に対処しながら、動的条件への効率的な適応を容易にする。特に、RASSは一連の構成を生成し、その後の実行時適応を予測し、環境変動に応じて迅速に低オーバーヘッドの調整を行う。テキスト分類、シーン認識、顔分析など、さまざまなタスクにわたる広範囲な評価は、畳み込みニューラルネットワークやトランスフォーマー、現実的なユースケースなど、さまざまなモデルアーキテクチャにおけるCARInの汎用性を示している。現状のOODInフレームワークとは対照的に,単一モデルの設計では1.92倍,最大10.69倍に達した。さらに,マルチDNNアプリケーションにおいてハードウェアを意識しない設計に比べて,最大4.06倍の高速化を実現している。最後に,環境問題に対する最適設計の特定に係わる時間的オーバーヘッドを効果的に排除しつつ,その性能を維持する。

The relentless expansion of deep learning applications in recent years has prompted a pivotal shift toward on-device execution, driven by the urgent need for real-time processing, heightened privacy concerns, and reduced latency across diverse domains. This article addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices, with a focus on device heterogeneity, multi-DNN execution, and dynamic runtime adaptation. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications under user-defined service-level objectives. Leveraging an expressive multi-objective optimisation framework and a runtime-aware sorting and search algorithm (RASS) as the MOO solver, CARIn facilitates efficient adaptation to dynamic conditions while addressing resource contention issues associated with multi-DNN execution. Notably, RASS generates a set of configurations, anticipating subsequent runtime adaptation, ensuring rapid, low-overhead adjustments in response to environmental fluctuations. Extensive evaluation across diverse tasks, including text classification, scene recognition, and face analysis, showcases the versatility of CARIn across various model architectures, such as Convolutional Neural Networks and Transformers, and realistic use cases. We observe a substantial enhancement in the fair treatment of the problem's objectives, reaching 1.92x when compared to single-model designs and up to 10.69x in contrast to the state-of-the-art OODIn framework. Additionally, we achieve a significant gain of up to 4.06x over hardware-unaware designs in multi-DNN applications. Finally, our framework sustains its performance while effectively eliminating the time overhead associated with identifying the optimal design in response to environmental challenges.
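
As a rough illustration of the multi-objective view taken in the abstract above, the following sketch filters candidate deployment configurations down to the Pareto-optimal ones. It is not the RASS algorithm from the paper; the configuration names and the objective values (latency in milliseconds, error rate) are hypothetical placeholders chosen only to make the example run.

```python
def dominates(a, b):
    """Return True if objective tuple a dominates b (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(configs):
    """Return the sorted names of configurations that no other configuration dominates.

    configs maps a configuration name to a tuple of objective values.
    """
    front = []
    for name, objectives in configs.items():
        dominated = any(
            dominates(other, objectives)
            for other_name, other in configs.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (latency_ms, error_rate) pairs, not measurements from the paper.
candidates = {
    "cpu_fp32": (40.0, 0.10),
    "gpu_fp16": (15.0, 0.12),
    "npu_int8": (8.0, 0.15),
    "cpu_int8": (30.0, 0.16),
}
print(pareto_front(candidates))
```

A runtime system could precompute such a front once and then switch between its members as conditions change, which is one plausible reading of "anticipating subsequent runtime adaptation".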

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# ディジタルツインネットワークの2時間同期とマイグレーション:マルチエージェント深部強化学習アプローチ

Two-Timescale Synchronization and Migration for Digital Twin Networks: A Multi-Agent Deep Reinforcement Learning Approach ( http://arxiv.org/abs/2409.01092v1 )

ライセンス: Link先を確認

Wenshuai Liu, Yaru Fu, Yongna Guo, Fu Lee Wang, Wen Sun, Yan Zhang,

(参考訳) デジタル双生児(DT)は、物理的世界のリアルタイム状態を表現し、自己維持システムを実現するための有望なイネーブラーとして登場した。実際には、モバイルユーザ(MU)のような物理デバイスのDTは、レイテンシを低減するために、マルチアクセスエッジコンピューティング(MEC)ネットワークに一般的にデプロイされる。 DTの精度と忠実性を確保するためには、MUがDTと定期的にステータスを同期させることが不可欠である。しかし、MUモビリティはDT同期に重大な課題をもたらす。まず、MUモビリティはDTマイグレーションをトリガーし、同期障害を引き起こす可能性がある。次に、MUはDTの忠実性を保証するためにDTと頻繁に同期する必要がある。それでも、MUモビリティによって引き起こされるMECサーバ間のDTマイグレーションは、頻繁に発生する可能性がある。そこで本稿では,MUの長期平均エネルギー消費を最小限に抑えるために,非凸確率問題を確立することにより,信頼性を考慮した2段階のDT同期・マイグレーションフレームワークを提案する。我々はリアプノフ理論を用いて信頼性制約を変換し、新しい問題を部分的に観測可能なマルコフ決定過程(POMDP)として再構成する。さらに,ベータ分布(Beta-HAPPO)法による不均一なエージェント近似ポリシー最適化手法を開発し,その解法を提案する。シミュレーションの結果, 提案手法は, 他のベンチマークと比較すると, 省エネ性を大幅に向上することがわかった。

Digital twins (DTs) have emerged as a promising enabler for representing the real-time states of physical worlds and realizing self-sustaining systems. In practice, DTs of physical devices, such as mobile users (MUs), are commonly deployed in multi-access edge computing (MEC) networks for the sake of reducing latency. To ensure the accuracy and fidelity of DTs, it is essential for MUs to regularly synchronize their status with their DTs. However, MU mobility introduces significant challenges to DT synchronization. Firstly, MU mobility triggers DT migration, which could cause synchronization failures. Secondly, MUs require frequent synchronization with their DTs to ensure DT fidelity. Nonetheless, DT migration among MEC servers, caused by MU mobility, may occur infrequently. Accordingly, we propose a two-timescale DT synchronization and migration framework with reliability consideration by establishing a non-convex stochastic problem to minimize the long-term average energy consumption of MUs. We use Lyapunov theory to convert the reliability constraints and reformulate the new problem as a partially observable Markov decision process (POMDP). Furthermore, we develop a heterogeneous agent proximal policy optimization with Beta distribution (Beta-HAPPO) method to solve it. Numerical results show that our proposed Beta-HAPPO method achieves significant improvements in energy savings when compared with other benchmarks.
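
The abstract above mentions a policy built on the Beta distribution. One common reason for that choice is that a Beta sample lies in [0, 1], so it maps onto a bounded action without clipping. The sketch below shows only this sampling step, under the assumption that the shape parameters come from a policy network; the function names and the action range are illustrative and are not taken from the paper.

```python
import random


def sample_beta_action(alpha, beta, low, high, rng=random):
    """Draw one bounded action from a Beta(alpha, beta) policy head.

    alpha, beta: positive shape parameters (assumed to be network outputs).
    low, high: bounds of the physical action, e.g. a transmit power range.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("Beta shape parameters must be positive")
    u = rng.betavariate(alpha, beta)  # u lies in [0, 1]
    return low + (high - low) * u     # affine map onto [low, high]


def beta_mean_action(alpha, beta, low, high):
    """Mean of the rescaled Beta distribution, usable as a deterministic action."""
    return low + (high - low) * alpha / (alpha + beta)
```

Compared with a Gaussian policy followed by clipping, this keeps every sampled action inside the feasible range by construction.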

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# DS MYOLO: シナリオ駆動のためのSSMに基づく信頼性の高いオブジェクト検出器

DS MYOLO: A Reliable Object Detector Based on SSMs for Driving Scenarios ( http://arxiv.org/abs/2409.01093v1 )

ライセンス: Link先を確認

Yang Li, Jianli Xiao,

(参考訳) 正確なリアルタイムオブジェクト検出により、高度な運転支援システムの安全性が向上し、運転シナリオに不可欠なコンポーネントとなる。ディープラーニング技術の急速な発展に伴い、CNNベースのリアルタイムオブジェクト検出器YOLOが注目されている。しかし、CNNのローカルな焦点はパフォーマンスのボトルネックをもたらす。検出器性能をさらに向上するため、研究者らはグローバルな受容場を利用するトランスフォーマーベースの自己認識機構を導入したが、その2次複雑さは計算コストを大幅に上回っている。最近、マンバは線形複雑であり、地球規模の選択的走査によって大きな進歩を遂げた。マンバの卓越した性能に触発されて,我々は新しい物体検出器DS MYOLOを提案する。この検出器は、単純化された選択的走査型融合ブロック(SimVSS Block)を通してグローバルな特徴情報をキャプチャし、ネットワークの深い特徴を効果的に統合する。さらに,計算複雑性を低く保ちながら,チャネル間の特徴的相互作用を向上させる効率的なチャネルアテンション畳み込み(ECAConv)を導入する。 CCTSDB 2021およびVLD-45駆動シナリオデータセットの大規模な実験により、DS MYOLOは、同様のスケールのYOLOシリーズのリアルタイムオブジェクト検出器において、大きな可能性と競争上の優位性を示すことが示された。

Accurate real-time object detection enhances the safety of advanced driver-assistance systems, making it an essential component in driving scenarios. With the rapid development of deep learning technology, CNN-based YOLO real-time object detectors have gained significant attention. However, the local focus of CNNs results in performance bottlenecks. To further enhance detector performance, researchers have introduced Transformer-based self-attention mechanisms to leverage global receptive fields, but their quadratic complexity incurs substantial computational costs. Recently, Mamba, with its linear complexity, has made significant progress through global selective scanning. Inspired by Mamba's outstanding performance, we propose a novel object detector: DS MYOLO. This detector captures global feature information through a simplified selective scanning fusion block (SimVSS Block) and effectively integrates the network's deep features. Additionally, we introduce an efficient channel attention convolution (ECAConv) that enhances cross-channel feature interaction while maintaining low computational complexity. Extensive experiments on the CCTSDB 2021 and VLD-45 driving scenarios datasets demonstrate that DS MYOLO exhibits significant potential and competitive advantage among similarly scaled YOLO series real-time object detectors.
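
To make the idea of channel attention with low computational cost more concrete, here is a minimal pure-Python sketch of efficient channel attention in the general style of ECA-Net: global average pooling per channel, a small 1D convolution across neighbouring channels, and a sigmoid gate. This is an assumption about the general mechanism; the ECAConv block in the paper may be structured differently, and the kernel weights below would be learned in practice.

```python
import math


def eca_weights(feature_map, kernel):
    """Compute one attention weight per channel.

    feature_map: list of channels, each an H x W list of lists of floats.
    kernel: odd-length list of 1D convolution weights shared across channels.
    """
    # 1) Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # 2) Local cross-channel interaction: 1D convolution with zero padding.
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(pooled))
    ]
    # 3) Gate: sigmoid maps each value to (0, 1).
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]


def apply_eca(feature_map, kernel):
    """Rescale every channel of feature_map by its attention weight."""
    weights = eca_weights(feature_map, kernel)
    return [
        [[w * value for value in row] for row in channel]
        for w, channel in zip(weights, feature_map)
    ]
```

The cost of the attention step grows with the number of channels times the kernel size, rather than with the square of the number of channels as in a fully connected squeeze-and-excitation layer.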

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# OCMG-Net:非構造点雲のニューラル配向正規化

OCMG-Net: Neural Oriented Normal Refinement for Unstructured Point Clouds ( http://arxiv.org/abs/2409.01100v1 )

ライセンス: Link先を確認

Yingrui Wu, Mingyang Zhao, Weize Quan, Jian Shi, Xiaohong Jia, Dong-Ming Yan,

(参考訳) 非構造点雲から指向性正規項を推定するための頑健な精錬法を提案する。計算の複雑さに悩まされたり、望ましい精度を達成できなかった従来の手法とは対照的に、我々の新しいフレームワークは、特徴空間に手話方向とデータ拡張を取り入れ、初期指向の正規性を洗練させ、効率と精度のバランスを損なう。従来の手法ではノイズによる方向の不整合の問題に対処するため,クリーンな点の雲に最も近い点でアノテートされた正規を補正することにより,推定誤差を忠実に最小化する,Chamfer Normal Distanceと呼ばれる新しい指標を導入する。このメトリクスは、課題に取り組むだけでなく、ネットワークトレーニングを支援し、ノイズに対するネットワークの堅牢性を大幅に向上させる。さらに,マルチスケールな局所的特徴集約と階層的幾何情報融合を統合し,複雑な幾何学的詳細をより効果的に捕捉し,スケール選択のあいまいさを顕著に低減する,革新的なデュアル並列アーキテクチャを提案する。室内および屋外シナリオ間の合成および実世界のデータセット間の非指向性および指向性正規推定タスクにおいて,本手法の優位性と汎用性を示す。コードはhttps://github.com/YingruiWoo/OCMG-Net.gitで公開されている。

We present a robust refinement method for estimating oriented normals from unstructured point clouds. In contrast to previous approaches that either suffer from high computational complexity or fail to achieve desirable accuracy, our novel framework incorporates sign orientation and data augmentation in the feature space to refine the initial oriented normals, striking a balance between efficiency and accuracy. To address the issue of noise-caused direction inconsistency existing in previous approaches, we introduce a new metric called the Chamfer Normal Distance, which faithfully minimizes the estimation error by correcting the annotated normal with the closest point found on the potentially clean point cloud. This metric not only tackles the challenge but also aids in network training and significantly enhances network robustness against noise. Moreover, we propose an innovative dual-parallel architecture that integrates Multi-scale Local Feature Aggregation and Hierarchical Geometric Information Fusion, which enables the network to capture intricate geometric details more effectively and notably reduces ambiguity in scale selection. Extensive experiments demonstrate the superiority and versatility of our method in both unoriented and oriented normal estimation tasks across synthetic and real-world datasets among indoor and outdoor scenarios. The code is available at https://github.com/YingruiWoo/OCMG-Net.git.
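
The abstract describes the Chamfer Normal Distance as correcting the annotated normal with the closest point on the clean point cloud. A minimal sketch of that idea follows: each noisy query point is matched to its nearest clean point, and the predicted normal is compared against the clean normal at that match. The averaging and the use of an angular error in degrees are assumptions made here for illustration; the exact definition in the paper may differ.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def nearest_index(point, cloud):
    """Index of the point in cloud closest to point (brute-force search)."""
    return min(
        range(len(cloud)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, cloud[j])),
    )


def chamfer_normal_error(noisy_pts, pred_normals, clean_pts, clean_normals, oriented=True):
    """Mean angular error (degrees) against normals at the nearest clean points.

    With oriented=False the sign of the normal is ignored.
    """
    total = 0.0
    for point, normal in zip(noisy_pts, pred_normals):
        j = nearest_index(point, clean_pts)
        cosine = _dot(_unit(normal), _unit(clean_normals[j]))
        if not oriented:
            cosine = abs(cosine)
        cosine = max(-1.0, min(1.0, cosine))  # guard against rounding
        total += math.degrees(math.acos(cosine))
    return total / len(noisy_pts)
```

The point of the nearest-point lookup is that a noisy point may have drifted away from the surface location where its annotated normal was valid, so comparing against the normal at the closest clean point gives a more consistent target.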

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# 進化的ソフトアクター批判によるAIオリンピックの挑戦

AI Olympics challenge with Evolutionary Soft Actor Critic ( http://arxiv.org/abs/2409.01104v1 )

ライセンス: Link先を確認

Marco Calì, Alberto Sinigaglia, Niccolò Turcato, Ruggero Carli, Gian Antonio Susto,

(参考訳) 次回報告では、IROS 2024で開催されるAIオリンピック大会の解決策について述べる。私たちのソリューションは、モデルフリーのDeep Reinforcement Learningアプローチと進化戦略を組み合わせています。使用済みのアルゴリズムを簡潔に記述し、そのアプローチの詳細を提供する。

In the following report, we describe the solution we propose for the AI Olympics competition held at IROS 2024. Our solution is based on a Model-free Deep Reinforcement Learning approach combined with an evolutionary strategy. We will briefly describe the algorithms that have been used and then provide details of the approach.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# Poster: O-RANセキュリティテストラボの開発

Poster: Developing an O-RAN Security Test Lab ( http://arxiv.org/abs/2409.01107v1 )

ライセンス: Link先を確認

Sotiris Michaelides, David Rupprecht, Katharina Kohls,

(参考訳) Open Radio Access Networks (ORAN) は、数年前に提案された新しいアーキテクチャアプローチであり、5Gの現在の次世代無線アクセスネットワーク(NG-RAN)の拡張である。 ORANは、さまざまなRadio Access Networks(RAN)コンポーネント間のオープンインターフェースを実装し、マシンラーニングや仮想化、デアグリゲーションといったモダンなテクノロジをRANに導入することで、少数のベンダによってコントロールされる、クローズドなRAN市場を破ることを目指している。しかし、ORANのアーキテクチャ設計は、そのセキュリティに関する懸念や議論を引き起こしており、これはその大きな欠点の1つと考えられている。 ORANに関するいくつかの理論的リスク分析が実施されているが、私たちの知る限りでは、まだ1つの実践的リスク解析も行われていない。本ポスターでは,ORAN 5Gネットワークを最小限かつ将来的に展開する手法について論じる。

Open Radio Access Networks (ORAN) is a new architectural approach, proposed only a few years ago, that extends the current Next Generation Radio Access Networks (NG-RAN) of 5G. ORAN aims to break the closed RAN market, which is controlled by a handful of vendors, by implementing open interfaces between the different Radio Access Networks (RAN) components and by introducing modern technologies to the RAN, such as machine learning, virtualization, and disaggregation. However, the architectural design of ORAN has recently raised concerns and debates about its security, which is considered one of its major drawbacks. Several theoretical risk analyses related to ORAN have been conducted, but to the best of our knowledge, not even a single practical one has been performed yet. In this poster, we discuss and propose a way for a minimal, future-proof deployment of an ORAN 5G network, able to accommodate various hands-on security analyses for its different elements.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# SOOD-ImageNet:Semantic Out-Of-Distribution Image ClassificationとSemantic Segmentationのための大規模データセット

SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation ( http://arxiv.org/abs/2409.01109v1 )

ライセンス: Link先を確認

Alberto Bacchin, Davide Allegro, Stefano Ghidoni, Emanuele Menegatti,

(参考訳) コンピュータビジョンにおけるアウト・オブ・ディストリビューション(OOD)の検出は重要な研究領域であり、関連するベンチマークは実際のシナリオにおけるモデルの一般化可能性とその適用性を評価する上で重要な役割を果たす。しかし、文献における既存のOODベンチマークには、1)潜在的な課題としてセマンティックシフトを見落としている場合が多く、(2)現代のモデルのトレーニングに使用される大規模なデータセットと比較して、その規模は限られている。これらのギャップに対処するために,OOD条件下でのイメージ分類やセマンティックセグメンテーションなどのコンピュータビジョンタスクのために設計された,56のクラスにまたがる約1.6万の画像からなる新しいデータセットSOOD-ImageNetを紹介し,セマンティックシフトの問題に焦点をあてる。我々は、人間の正確なチェックによって補完される現代の視覚言語モデルの能力を活用する革新的なデータエンジンを開発することで、必要なスケーラビリティと品質を確保した。我々は,SOOD-ImageNetにおける様々なモデルの広範囲なトレーニングと評価を通じて,OOD研究をコンピュータビジョンで大きく前進させる可能性を示す。プロジェクトページはhttps://github.com/bach05/SOODImageNet.gitで公開されている。

Out-of-Distribution (OOD) detection in computer vision is a crucial research area, with related benchmarks playing a vital role in assessing the generalizability of models and their applicability in real-world scenarios. However, existing OOD benchmarks in the literature suffer from two main limitations: (1) they often overlook semantic shift as a potential challenge, and (2) their scale is limited compared to the large datasets used to train modern models. To address these gaps, we introduce SOOD-ImageNet, a novel dataset comprising around 1.6M images across 56 classes, designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions, with a particular focus on the issue of semantic shift. We ensured the necessary scalability and quality by developing an innovative data engine that leverages the capabilities of modern vision-language models, complemented by accurate human checks. Through extensive training and evaluation of various models on SOOD-ImageNet, we showcase its potential to significantly advance OOD research in computer vision. The project page is available at https://github.com/bach05/SOODImageNet.git.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# 連続対称性群に対する量子スピン鎖の対称性保護状態の分類

Classification of symmetry protected states of quantum spin chains for continuous symmetry groups ( http://arxiv.org/abs/2409.01112v1 )

ライセンス: Link先を確認

Bruno de Oliveira Carvalho, Wojciech De Roeck, Tijl Jappens,

(参考訳) 量子スピン系の対称性保護状態(SPT)は、いくつかの著者によって研究された。有限オンサイト対称性群 $G$ に対応する SPT は、Kapustin et al [J. Math. Phys. (2021)] によって確立された第2コホモロジー群 $H^2(G,U(1))$ によって分類される。この結果はコンパクト位相対称性群 $G$ の場合に拡張する。我々はまた、我々の分類結果が局所的に有界なオンサイト次元を持つスピン鎖のクラスに収まるという意味で、既存の結果を強化する。

Symmetry protected states (SPT's) of quantum spin systems were studied by several authors. For one-dimensional systems (spin chains), there is an essentially complete and rigorous understanding: SPT's corresponding to finite on-site symmetry groups $G$ are classified by the second cohomology group $H^2(G,U(1))$, as established by Kapustin et al. [J. Math. Phys. (2021)]. We extend this result to the case of compact topological symmetry groups $G$. We also strengthen the existing results in the sense that our classification result holds within the class of spin chains with locally bounded on-site dimensions.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# KMTalk:キーモーション埋め込みによる音声駆動型3D顔アニメーション

KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding ( http://arxiv.org/abs/2409.01113v1 )

ライセンス: Link先を確認

Zhihao Xu, Shengjie Gong, Jiapeng Tang, Lingyu Liang, Yining Huang, Haojie Li, Shuangping Huang,

(参考訳) キーモーション埋め込みを用いた音声系列から3次元顔の動きを合成する新しい手法を提案する。データ駆動技術の最近の進歩にもかかわらず、音声信号と3D顔メッシュの正確なマッピングは依然として難しい。シーケンス全体の直接回帰は、しばしば問題の性質が不適切なため、過度に滑らかな結果をもたらす。そこで本研究では,キーモーションキャプチャを導入して3次元顔画像を生成するプログレッシブ学習機構を提案する。具体的には、言語ベースのキーモーション獲得とモーダル間動作完了という2つのモジュールを通して、言語とデータ駆動の先行情報を統合する。前者は重要な動きを識別し、関連する3D表情を学習し、正確な唇音声同期を保証する。後者は、キーモーションを音声機能によってガイドされた3D音声の完全なシーケンスに拡張し、時間的コヒーレンスとオーディオ-視覚的整合性を改善する。既存の最先端手法と比較して、より鮮明で一貫した会話顔アニメーションを生成する上で、我々のアプローチが優れていることを示す。提案手法を既存の手法と統合することにより,提案手法の有効性を裏付ける結果が得られた。コードと重みはプロジェクトのWebサイトにある: \url{https://github.com/ffxzh/KMTalk}。

We present a novel approach for synthesizing 3D facial motions from audio sequences using key motion embeddings. Despite recent advancements in data-driven techniques, accurately mapping between audio signals and 3D facial meshes remains challenging. Direct regression of the entire sequence often leads to over-smoothed results due to the ill-posed nature of the problem. To this end, we propose a progressive learning mechanism that generates 3D facial animations by introducing key motion capture to decrease cross-modal mapping uncertainty and learning complexity. Concretely, our method integrates linguistic and data-driven priors through two modules: the linguistic-based key motion acquisition and the cross-modal motion completion. The former identifies key motions and learns the associated 3D facial expressions, ensuring accurate lip-speech synchronization. The latter extends key motions into a full sequence of 3D talking faces guided by audio features, improving temporal coherence and audio-visual consistency. Extensive experimental comparisons against existing state-of-the-art methods demonstrate the superiority of our approach in generating more vivid and consistent talking face animations. Consistent enhancements in results through the integration of our proposed learning scheme with existing methods underscore the efficacy of our approach. Our code and weights will be at the project website: https://github.com/ffxzh/KMTalk.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# ランダム畳み込みカーネルに基づく変換を用いた時系列分類:プーリング演算子と入力表現が重要である

Time series classification with random convolution kernels based transforms: pooling operators and input representations matter ( http://arxiv.org/abs/2409.01115v1 )

ライセンス: Link先を確認

Mouhamadou Mansour Lo, Gildas Morvan, Mathieu Rossi, Fabrice Morganti, David Mercier,

(参考訳) 本稿では,SelF-Rocketと呼ばれるMiniRocketをベースとした,高速時系列分類(TSC)のための新しいアプローチを提案する。ランダムな畳み込みカーネルに基づく既存のアプローチとは異なり、トレーニングプロセス中に最適な入力表現とプーリング演算子を動的に選択する。 SelF-Rocketはカリフォルニア大学リバーサイド校(UCR)のベンチマークデータセットで最先端の精度を実現している。

This article presents a new approach based on MiniRocket, called SelF-Rocket, for fast time series classification (TSC). Unlike existing approaches based on random convolution kernels, it dynamically selects the best combination of input representation and pooling operator during the training process. SelF-Rocket achieves state-of-the-art accuracy on the University of California Riverside (UCR) TSC benchmark datasets.
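
To illustrate what "input representations" and "pooling operators" can mean for random-convolution-kernel methods, the sketch below convolves a series with one kernel and summarises the result with two pooling operators: the proportion of positive values (PPV), which Rocket-family methods are known to use, and the maximum. The raw series and its first difference serve as two example representations. Which representations and operators SelF-Rocket actually considers is not stated in the abstract, so this particular selection is an assumption.

```python
def convolve_valid(series, kernel):
    """1D sliding dot product without padding."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[i + j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def first_difference(series):
    """Alternative input representation: consecutive differences."""
    return [b - a for a, b in zip(series, series[1:])]


def ppv(activations, bias=0.0):
    """Proportion of positive values after adding a bias."""
    return sum(1 for a in activations if a + bias > 0) / len(activations)


def max_pool(activations, bias=0.0):
    """Maximum activation after adding a bias."""
    return max(activations) + bias


def kernel_features(series, kernel, bias=0.0):
    """One feature per (representation, pooling operator) pair for a single kernel."""
    features = {}
    for rep_name, rep in (("raw", series), ("diff", first_difference(series))):
        activations = convolve_valid(rep, kernel)
        features[(rep_name, "ppv")] = ppv(activations, bias)
        features[(rep_name, "max")] = max_pool(activations, bias)
    return features
```

A selection step could then train a cheap classifier on each feature group and keep the pair that validates best, which is one plausible way to realise the dynamic selection the abstract describes.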

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# オープンソース・ソフトウェア・ソリューションの公共セクター買収におけるソフトロイン--市町村営Eサービスプラットフォームを事例として

Soft-lockins in Public Sector Acquisitions of Open Source Software-solutions: A Case Study on a Municipal E-Service Platform ( http://arxiv.org/abs/2409.01118v1 )

ライセンス: Link先を確認

Per Persson, Johan Linåker,

(参考訳) 背景: オープンソースソフトウェア(OSS)は、しばしばロックインのリスクを軽減するオプションと見なされる。しかし、単一ベンダのOSSは、知識の対称性と技術的な障壁のために、依然としてソフトなロックインをもたらす可能性がある。 Aim: この研究は、このようなソフトロックインをレンダリングするアクターを調査します。研究設計: 190以上の自治体で使用されているE-service Platform(ESP)の質的なケーススタディを行う。結果: ユーザ主導のロックイン要因は, 限定的・透過的コミュニケーション, 調達の制限的資格要件, 保守の混乱, 現状の快適性など, 重要なカテゴリーとして出現した。技術的なロックイン要因には、不十分なドキュメント、依存性管理の問題、限定的なテストカバレッジなどがある。結論: 自治体間の快適で保守的な文化の存在に対処するために、強いリーダーシップと継続的な訓練が必要である。オープンソースのStewards、すなわちOSSプロジェクトの中立的なホストは、これらのタスクにおいて自治体をサポートすると同時に、より広範なサプライヤエコシステムを実現するためのオープンで競争力のあるコラボレーションを促進するのに役立つ。

Background: Open Source Software (OSS) is often seen as an option to mitigate risks of lock-ins. Yet, single-vendor OSS can still result in soft lock-ins due to knowledge asymmetries and technical barriers. Aim: This study explores actors that render such soft lock-ins. Research design: We conduct a qualitative case study of an E-service Platform (ESP) used by more than 190 municipalities. Results: User-driven lock-in factors emerged as a significant category, including limited and non-transparent communication, restrictive qualification requirements in procurement, confusion on maintainership, and comfort in the status quo. Technical lock-in factors include inadequate documentation, dependency management issues, and limited test coverage. Conclusions: Strong leadership and continuous training are needed to address the presence of a comfortable and conservative culture among municipalities. Open Source Stewards, i.e., neutral hosts for OSS projects, can support municipalities in these tasks while also helping to foster an open, competitive collaboration that can enable a broader supplier ecosystem.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# 非線形波動方程式の孤立波シミュレーションのための2段階初期値反復物理学インフォームドニューラルネットワーク

Two-stage initial-value iterative physics-informed neural networks for simulating solitary waves of nonlinear wave equations ( http://arxiv.org/abs/2409.01124v1 )

ライセンス: Link先を確認

Jin Song, Ming Zhong, George Em Karniadakis, Zhenya Yan,

(参考訳) 従来の数値反復法と物理インフォームドニューラルネットワーク(PINN)に基づく非線形波動方程式の孤立波計算のための新しい2段階初期値反復ニューラルネットワーク(IINN)を提案する。具体的には、IINNフレームワークは2つのサブネットワークで構成され、そのうちの1つは与えられた初期値に適合するために使用され、もう1つは物理情報を含み、最初のサブネットワークに基づいてトレーニングを継続する。重要なことに、IINN法は、与えられた初期値とは別に、境界条件を含む追加のデータ情報を必要としない。提案手法の有効性を示すための理論的保証を提供する。提案したIINN法は,1次元非線形シュレーディンガー方程式(NLS),PT-対称光学格子を持つ1次元飽和NLS方程式,KdV方程式,電位を持つ2次元NLS方程式,電位を持つ2次元修正GP方程式,2+1次元KP方程式,3次元NLS方程式など,様々な非線形波動方程式の解の学習に有効である。これらの応用は,本手法の有効性を示す証拠となる。最後に,従来の手法と比較することにより,提案手法の利点を実証する。

We propose a new two-stage initial-value iterative neural network (IINN) algorithm for solitary wave computations of nonlinear wave equations based on traditional numerical iterative methods and physics-informed neural networks (PINNs). Specifically, the IINN framework consists of two subnetworks, one of which is used to fit a given initial value, and the other incorporates physical information and continues training on the basis of the first subnetwork. Importantly, the IINN method does not require any additional data information including boundary conditions, apart from the given initial value. Corresponding theoretical guarantees are provided to demonstrate the effectiveness of our IINN method. The proposed IINN method is efficiently applied to learn some types of solutions in different nonlinear wave equations, including the one-dimensional (1D) nonlinear Schrödinger (NLS) equation (with and without potentials), the 1D saturable NLS equation with PT-symmetric optical lattices, the 1D focusing-defocusing coupled NLS equations, the KdV equation, the two-dimensional (2D) NLS equation with potentials, the 2D amended GP equation with a potential, the (2+1)-dimensional KP equation, and the 3D NLS equation with a potential. These applications serve as evidence for the efficacy of our method. Finally, by comparing with the traditional methods, we demonstrate the advantages of the proposed IINN method.
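
The two stages described above can be pictured as two loss functions: a data-fitting loss against the initial value, and a physics loss built from the residual of the governing equation. The sketch below evaluates both for the stationary focusing 1D NLS equation, written here under the assumed convention $\phi'' - \mu\phi + \phi^3 = 0$, whose exact solitary wave is $\phi(x) = \sqrt{2\mu}\,\mathrm{sech}(\sqrt{\mu}\,x)$. No neural network is trained; plain functions stand in for the subnetworks, and the finite-difference residual replaces automatic differentiation. These simplifications and all function names are illustrative assumptions.

```python
import math


def soliton(x, mu=1.0):
    """Exact solitary wave of phi'' - mu*phi + phi**3 = 0 (assumed convention)."""
    return math.sqrt(2.0 * mu) / math.cosh(math.sqrt(mu) * x)


def fit_loss(model, initial_value, xs):
    """Stage 1: mean squared mismatch between the model and the given initial value."""
    return sum((model(x) - initial_value(x)) ** 2 for x in xs) / len(xs)


def residual(model, x, mu=1.0, h=1e-3):
    """Equation residual at x, with a central difference for the second derivative."""
    second = (model(x + h) - 2.0 * model(x) + model(x - h)) / (h * h)
    return second - mu * model(x) + model(x) ** 3


def physics_loss(model, xs, mu=1.0):
    """Stage 2: mean squared residual over collocation points."""
    return sum(residual(model, x, mu) ** 2 for x in xs) / len(xs)


def gaussian_guess(x):
    """A typical initial value: close in shape to the soliton but not a solution."""
    return math.exp(-x * x)


points = [0.5 * i for i in range(-10, 11)]
print(physics_loss(gaussian_guess, points))  # clearly nonzero
print(physics_loss(soliton, points))         # close to zero
```

In this picture, stage 1 would drive `fit_loss` down so that the network starts near the initial guess, and stage 2 would then reduce `physics_loss` starting from those weights, without needing boundary data.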

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# 雑音チャネル上の通信におけるロバスト表現の学習

Learning Robust Representations for Communications over Noisy Channels ( http://arxiv.org/abs/2409.01129v1 )

ライセンス: Link先を確認

Sudharsan Senthil, Shubham Paul, Nambi Seshadri, R. David Koilpillai,

(参考訳) ディープラーニング(DL)ベースの通信システムは、従来の数学的モデル化システムよりも利点がある。 FCNN(Fully Connected Neural Networks)は、ディープラーニングアーキテクチャである。最適化問題を解くことはよく知られているが、既存の文献では、通信モデルの堅牢な表現を学ばないことが示唆されている。本研究は,既存の古典モデルからインスピレーションを受けずに,エンドツーエンドの通信システムを学習するFCNNの可能性を探るものである。本研究は,厳密な電力制約の下でシンボルの堅牢な表現を生成するために,コスト関数の変動によるドメイン知識の付与が与える影響について検討する。さらに,Barlow Twinsフレームワークにインスパイアされた新しいエンコーダ構造を導入する。最後に,SNR(Signal to Noise Ratio)の感度について,しばしば見落とされがちな課題に対処し,通信システムにおけるその重要性を強調するトレーニング戦略を導入する。このような手法がより信頼性の高いモデルを生み出すことを実証する。

A deep learning (DL)-based communication system offers advantages over traditional mathematically modelled systems, as the former may be jointly optimized. FCNNs (Fully Connected Neural Networks) are common Deep Learning architectures. Though they are well known to solve optimization problems, existing literature suggests that they fail to learn robust representations for communication models. This work explores the potential of FCNNs to learn an end-to-end communication system without taking any inspiration from existing classical models. The study investigates the impact of imbibing domain knowledge by varying cost functions to generate robust representations of symbols under strict power constraints. Additionally, we introduce a novel encoder structure inspired by the Barlow Twins framework. Finally, we introduce a training strategy that addresses the often-overlooked issue of training Signal to Noise Ratio (SNR) sensitivity, highlighting its importance in communication systems. We demonstrate that such a method leads to more reliable models.
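
Two ingredients named in the abstract, a strict power constraint on the transmitted symbols and a training SNR, can be sketched in a few lines. The functions below normalise a block of real-valued encoder outputs to a target average power and then pass them through an additive white Gaussian noise channel at a chosen SNR. This is a generic textbook formulation, not the authors' code; real systems usually work with complex symbols, and the names here are illustrative.

```python
import math
import random


def normalize_power(symbols, target_power=1.0):
    """Scale real-valued symbols so that their average power equals target_power."""
    average = sum(s * s for s in symbols) / len(symbols)
    if average == 0:
        raise ValueError("cannot normalise an all-zero block")
    scale = math.sqrt(target_power / average)
    return [s * scale for s in symbols]


def awgn(symbols, snr_db, rng=random):
    """Add white Gaussian noise whose variance is set by the SNR in decibels."""
    signal_power = sum(s * s for s in symbols) / len(symbols)
    noise_variance = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_variance)
    return [s + rng.gauss(0.0, sigma) for s in symbols]
```

Training at a single value of `snr_db` can make a learned encoder sensitive to that value, which is the kind of issue a training strategy over SNR is meant to address.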

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# 変性体からの絡み合い変換のための誤差指数

Error exponents for entanglement transformations from degenerations ( http://arxiv.org/abs/2409.01130v1 )

ライセンス: Link先を確認

Dávid Bugár, Péter Vrana,

(参考訳) 本稿では, 純粋な多粒子状態間の漸近型LOCC変換における速度と強い逆指数のトレードオフ関係について検討する。一対の状態の間の単一コピー確率変換は、速度 1 での漸近変換が可能であり、指数関数的に成功確率が減少することを意味する。しかし、漸近変換が非ゼロ確率で実現可能である可能性はあるが、同じ速度の有限個のコピーの間には、確率的にさえ変換が存在しない。そのような場合、最適成功確率が指数関数的に減少するかどうかは分かっていない。漸近的変換の実現可能性を示すための基本的な道具は変性である。任意の退化は、初期状態のコピーとGHZ状態のサブ線形数からターゲット状態の同じコピー数への確率的LOCC変換をもたらす。これらのプロトコルは自由に選択できるパラメータを含むが、選択は成功確率に影響を与える。本稿では、パラメータの漸近的最適選択を特徴付け、結果のプロトコルのエラー指数に対するシングルレター式を導出する。特にこれは、確率変換が退化から生じるときの成功確率の指数的な下界を意味する。

This paper explores the trade-off relation between the rate and the strong converse exponent for asymptotic LOCC transformations between pure multipartite states. Any single-copy probabilistic transformation between a pair of states implies that an asymptotic transformation at rate 1 is possible with an exponentially decreasing success probability. However, it is possible that an asymptotic transformation is feasible with nonzero probability, but there is no transformation between any finite number of copies with the same rate, even probabilistically. In such cases it is not known if the optimal success probability decreases exponentially or faster. A fundamental tool for showing the feasibility of an asymptotic transformation is degeneration. Any degeneration gives rise to a sequence of stochastic LOCC transformations from copies of the initial state plus a sublinear number of GHZ states to the same number of copies of the target state. These protocols involve parameters that can be freely chosen, but the choice affects the success probability. In this paper, we characterize an asymptotically optimal choice of the parameters and derive a single-letter expression for the error exponent of the resulting protocol. In particular, this implies an exponential lower bound on the success probability when the stochastic transformation arises from a degeneration.

翻訳日:2024-09-06 07:26:52 公開日:2024-09-02

# 単眼画像から奥行きを理解できる大規模言語モデル

Large Language Models Can Understanding Depth from Monocular Images ( http://arxiv.org/abs/2409.01133v1 )

ライセンス: Link先を確認

Zhongyi Xia, Tianzhao Wu,

(参考訳) 単眼深度推定はコンピュータビジョンアプリケーションにおいて重要な機能である。本稿では,資源利用の効率化と一貫したニューラルネットワークアーキテクチャを用いて,大規模言語モデル(LLM)を最小限の監視で効果的に解釈可能であることを示す。 LLM-MDEは,言語理解を通して深度を解読するマルチモーダルフレームワークである。具体的には、LLM-MDEは、事前訓練されたLLMの深度推定能力を高めるために、クロスモーダルプログラミングと適応的なプロンプト推定モジュールの2つの主要な戦略を採用している。これらの戦略は、視覚表現をテキストプロトタイプと整合させ、それぞれ単眼画像に基づいてプロンプトを自動生成する。実世界のMDEデータセットに関する総合的な実験により、資源使用を最小化しながら、数秒/ゼロのタスクに優れるLLM-MDEの有効性と優位性が確認された。ソースコードは公開されている。

Monocular depth estimation is a critical function in computer vision applications. This paper shows that large language models (LLMs) can effectively interpret depth with minimal supervision, using efficient resource utilization and a consistent neural network architecture. We introduce LLM-MDE, a multimodal framework that deciphers depth through language comprehension. Specifically, LLM-MDE employs two main strategies to enhance the pretrained LLM's capability for depth estimation: cross-modal reprogramming and an adaptive prompt estimation module. These strategies align vision representations with text prototypes and automatically generate prompts based on monocular images, respectively. Comprehensive experiments on real-world MDE datasets confirm the effectiveness and superiority of LLM-MDE, which excels in few-/zero-shot tasks while minimizing resource use. The source code is available.

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# 2-Lévy-index分数Kerr媒体におけるソリトン崩壊、変調不安定、ローグ波励起の抑制

Suppression of soliton collapses, modulational instability, and rogue-wave excitation in two-Lévy-index fractional Kerr media ( http://arxiv.org/abs/2409.01135v1 )

ライセンス: Link先を確認

Ming Zhong, Yong Chen, Zhenya Yan, Boris A. Malomed,

(参考訳) L'{e}vy indices, $\alpha_{1}\, \alpha_{2}\in (1, 2]$, and self-focusing or defocusing Kerr linearity。いくつかの基本ソリトンは変分近似を用いて得られ、数値的な結果と比較して検証される。ソリトン崩壊は、L\'{e}vy index $\alpha =1$の1次元立方乗分数非線形Schr\"{o}dinger方程式で示され、2-L\'{e}vy-index分数非線形Schr\"{o}dinger系で抑制できる。ソリトンの安定性は、ガウスパルスとの衝突や系のパラメータの断熱的変動に対しても検討される。連続波の変調不安定性を2-L\'{e}vy-index系でも検討した。特に、変調不安定性は、2つの回折係数が反対の符号を持つとき、デフォーカス非線形性(英語版)の場合に生じることがある。変調不安定性の結果を用いて, 連続波上に1次および2次ローグ波を発生させ, ケーラー非線形性の両符号を求める。

[…]s in laser systems with two fractional-dispersion/diffraction terms, quantified by their Lévy indices, $\alpha_{1},\, \alpha_{2}\in (1, 2]$, and self-focusing or defocusing Kerr nonlinearity. Some fundamental solitons are obtained by means of the variational approximation, which are verified by comparison with numerical results. We find that the soliton collapse, exhibited by the one-dimensional cubic fractional nonlinear Schrödinger equation with only one Lévy index $\alpha =1$, can be suppressed in the two-Lévy-index fractional nonlinear Schrödinger system. Stability of the solitons is also explored against collisions with Gaussian pulses and adiabatic variation of the system parameters. Modulation instability of continuous waves is investigated in the two-Lévy-index system too. In particular, the modulation instability may occur in the case of the defocusing nonlinearity when two diffraction coefficients have opposite signs. Using results for the modulation instability, we produce first- and second-order rogue waves on top of continuous waves, for both signs of the Kerr nonlinearity.

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# 希少物体のための合成衛星画像の生成:モデルと計量の実証的比較

Generating Synthetic Satellite Imagery for Rare Objects: An Empirical Comparison of Models and Metrics ( http://arxiv.org/abs/2409.01138v1 )

ライセンス: Link先を確認

Tuong Vy Nguyen, Johannes Hoster, Alexander Glaser, Kristian Hildebrand, Felix Biessmann,

(参考訳) 生成的ディープラーニングアーキテクチャは、現実的で高解像度の偽画像を生成することができる。この文脈における重要な疑問は、特にニッチドメインにおいて、現実的なイメージを生成するのがどの程度簡単か、ということです。特定の画像の内容を達成するのに必要な反復的なプロセスは、自動化と制御が困難である。特に稀なクラスでは、生成的アプローチが現実的なイメージとアライメントを生み出すかどうかを評価できない。本研究では,合成衛星画像を生成するために微調整した生成アーキテクチャの大規模評価について述べる。この制限は、世界中の約400の施設にしか存在しないため、実世界の事例の限られた回数で訓練とテストデータが制限される他の多くのシナリオに例えられる。我々は,ゲームエンジンから得られた2種類のモーダル性,テキスト入力,画像入力を条件付けて合成画像を生成する。生成した画像は, 自動評価のためによく使用される指標を用いて評価し, そして, その信頼性を評価するために実施したユーザスタディからの人的判断と比較した。本研究は, 稀な物体であっても, テキストや詳細な建築レイアウトによる合成衛星画像の生成が可能であることを示す。実際、一般的に使用されている画像品質メトリクスと人間の評価との間には、強い負の相関関係があることが分かっています。

Generative deep learning architectures can produce realistic, high-resolution fake imagery, with potentially drastic societal implications. A key question in this context is: how easy is it to generate realistic imagery, in particular for niche domains? The iterative process required to achieve specific image content is difficult to automate and control. Especially for rare classes, it remains difficult to assess fidelity, meaning whether generative approaches produce realistic imagery, and alignment, meaning how well the generation can be guided by human input. In this work, we present a large-scale empirical evaluation of generative architectures which we fine-tuned to generate synthetic satellite imagery. We focus on nuclear power plants as an example of a rare object category: as there are only around 400 facilities worldwide, this restriction is exemplary for many other scenarios in which training and test data are limited by the small number of real-world examples. We generate synthetic imagery by conditioning on two kinds of modalities: textual input, and image input obtained from a game engine that allows for detailed specification of the building layout. The generated images are assessed with commonly used metrics for automatic evaluation and then compared with human judgments from the user studies we conducted to assess their trustworthiness. Our results demonstrate that even for rare objects, generation of authentic synthetic satellite imagery from text or detailed building layouts is feasible. In line with previous work, we find that automated metrics are often not aligned with human perception; in fact, we find strong negative correlations between commonly used image quality metrics and human ratings.
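
The comparison between automated metrics and human ratings described above can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' evaluation code: the choice of Spearman correlation and all numbers are assumptions made up for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five generated images (not data from the paper).
metric_scores = [0.91, 0.85, 0.60, 0.42, 0.30]  # automated quality metric
human_ratings = [1.5, 2.0, 3.5, 4.0, 4.8]       # mean rating from a user study
rho = spearman(metric_scores, human_ratings)     # close to -1 for this toy data
```

A value of `rho` near -1 means the metric ranks images in the opposite order to the human raters, which is the kind of disagreement the abstract reports.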

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# LLM-PQA: LLM強化予測クエリー解法

LLM-PQA: LLM-enhanced Prediction Query Answering ( http://arxiv.org/abs/2409.01140v1 )

ライセンス: Link先を確認

Ziyu Li, Wenjie Zhao, Asterios Katsifodimos, Rihan Hai,

(参考訳) LLM(Large Language Models)の出現は、従来のSQLベースのデータベースシステムの制約を越えて、クエリの処理方法を変更する機会を提供する。しかし、予測クエリにLLMを使用することは、外部MLモデルを採用する必要があり、回答を提供するために推論を行う必要があるため、依然として困難である。本稿では,自然言語で表現された予測クエリに対処する新しいツール LLM-PQA を紹介する。 LLM-PQAは、データレイクとモデル動物園を統合することにより、予測クエリの必要性を予測するためのLLMと検索強化メカニズムを結合する最初の方法である。この統合により、ユーザは多様な異種データと多様なMLモデルにアクセスでき、動的予測クエリ応答が容易になる。さらに、LLM-PQAは、特定のクエリ要求に基づいて、オンデマンドでモデルを動的にトレーニングすることができ、モデル動物園で事前訓練されたモデルがタスクのために利用できなくても、信頼性と関連する結果を保証する。

The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a novel tool that addresses prediction queries formulated in natural language. LLM-PQA is the first to combine the capabilities of LLMs and a retrieval-augmented mechanism for the needs of prediction queries by integrating data lakes and model zoos. This integration provides users with access to a vast spectrum of heterogeneous data and diverse ML models, facilitating dynamic prediction query answering. In addition, LLM-PQA can dynamically train models on demand, based on specific query requirements, ensuring reliable and relevant results even when no pre-trained model in the model zoo is available for the task.
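
The "retrieve a model, otherwise train one on demand" behavior described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not LLM-PQA's implementation: the names `model_zoo`, `data_lake`, `train_linear`, and `answer_prediction_query` are hypothetical, the LLM step that maps a natural-language query to a task is omitted, and a least-squares line stands in for on-demand training.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]


def train_linear(data: List[Tuple[float, float]]) -> Model:
    """Fit y = a*x + b by ordinary least squares (stand-in for on-demand training)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    a = sum((x - mx) * (y - my) for x, y in data) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def answer_prediction_query(
    task: str,
    x: float,
    model_zoo: Dict[str, Model],
    data_lake: Dict[str, List[Tuple[float, float]]],
) -> float:
    """Use a model from the zoo if one exists; otherwise train one and cache it."""
    model = model_zoo.get(task)
    if model is None:
        model = train_linear(data_lake[task])  # hypothetical on-demand training
        model_zoo[task] = model
    return model(x)
```

For example, with `data_lake = {"price": [(1, 2), (2, 4), (3, 6)]}` and an empty zoo, the first call for task `"price"` trains and caches a model, and later calls reuse it.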

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# Duplex: 混合エキスパート(MoE)、グループクエリアテンション、連続バッチ処理を備えた大規模言語モデルのためのデバイス

Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching ( http://arxiv.org/abs/2409.01141v1 )

ライセンス: Link先を確認

Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn,

(参考訳) 大規模言語モデル(LLM)は、さまざまなコンテキストにまたがる高品質なコンテンツを生成する能力によって台頭してきた。爆発的に増加する計算資源の需要を抑えるために、混合エキスパート(MoE)が登場した。 MoE層は、少ない計算で膨大な数のパラメータを活用できる。最先端の連続バッチ処理を適用するとスループットは向上するが、MoE層やアテンション層でDRAMアクセスが頻発する。従来の計算装置は、実行時間全体を支配し演算強度(Op/B)が低いMoE層とアテンション層の処理に限界がある。 PIM(Processing-in-Memory)アーキテクチャのような低Op/B向けのデバイスだけでMoE層を処理することは、連続バッチ処理によってMoE層のOp/Bが変動するため困難である。これらの課題に対処するため、高Op/Bに適したxPUと、低Op/B演算を効率的に実行するLogic-PIMを1台のデバイスに組み合わせたDuplexを提案する。 Duplex は LLM 内の各層の Op/B に基づいて最も適切なプロセッサを選択する。 MoE層のOp/Bは少なくとも1であり、グループクエリアテンションではアテンション層のOp/Bが4〜8となるため、DRAMダイ内に処理ユニットを配置し極端に低いOp/B(1未満)の演算のみを対象とする従来のPIMアーキテクチャは効率的ではない。近年の傾向に基づき、Logic-PIM はシリコン貫通ビア(TSV)を増やしてDRAMダイと論理ダイの間の高帯域通信を可能にし、論理ダイに強力な処理ユニットを配置する。これは、Op/Bが数から数十の範囲にある低Op/B演算の処理に最も適している。 xPU と Logic-PIM を最大限に活用するために、エキスパートとアテンションの協調処理を提案する。

Large language models (LLMs) have emerged due to their capability to generate high-quality content across diverse contexts. To reduce their explosively increasing demands for computing resources, the mixture of experts (MoE) has emerged. The MoE layer enables exploiting a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it leads to frequent DRAM access in the MoE and attention layers. We observe that conventional computing devices have limitations when processing the MoE and attention layers, which dominate the total execution time and exhibit low arithmetic intensity (Op/B). Processing MoE layers only with devices targeting low Op/B, such as processing-in-memory (PIM) architectures, is challenging because continuous batching makes the Op/B of the MoE layer fluctuate. To address these challenges, we propose Duplex, which combines an xPU tailored for high-Op/B operations with Logic-PIM, which effectively performs low-Op/B operations, within a single device. Duplex selects the most suitable processor based on the Op/B of each layer within LLMs. As the Op/B of the MoE layer is at least 1 and that of the attention layer is 4-8 for grouped query attention, prior PIM architectures, which place processing units inside DRAM dies and only target extremely low-Op/B (under one) operations, are not efficient. Based on recent trends, Logic-PIM adds more through-silicon vias (TSVs) to enable high-bandwidth communication between the DRAM die and the logic die and places powerful processing units on the logic die, which is best suited for handling low-Op/B operations ranging from a few to a few dozen. To maximally utilize the xPU and Logic-PIM, we propose expert and attention co-processing.
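
The idea of choosing a processor by arithmetic intensity (Op/B) can be sketched as follows. This is an illustrative sketch only: the cost model (one matrix multiplication with 2-byte elements), the threshold of 32 Op/B, and the function names are assumptions, not values or interfaces from the paper.

```python
def gemm_op_per_byte(tokens: int, d_in: int, d_out: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity of a (tokens x d_in) @ (d_in x d_out) matrix multiply.

    Counts 2 operations per multiply-accumulate and assumes weights, inputs,
    and outputs are each moved once (a simplification).
    """
    ops = 2 * tokens * d_in * d_out
    bytes_moved = bytes_per_elem * (d_in * d_out + tokens * d_in + tokens * d_out)
    return ops / bytes_moved


def pick_processor(op_per_byte: float, threshold: float = 32.0) -> str:
    """Route low-Op/B work to Logic-PIM and high-Op/B work to the xPU.

    The threshold is a hypothetical value chosen for illustration.
    """
    return "Logic-PIM" if op_per_byte < threshold else "xPU"


# With few tokens per expert the layer is memory-bound; with many it is compute-bound.
few_tokens = gemm_op_per_byte(tokens=1, d_in=4096, d_out=4096)
many_tokens = gemm_op_per_byte(tokens=512, d_in=4096, d_out=4096)
choices = (pick_processor(few_tokens), pick_processor(many_tokens))
```

Because continuous batching changes how many tokens reach each expert, the same layer can fall on either side of the threshold from one iteration to the next, which is why a per-layer, per-step choice is useful.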

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# LATEX-GCL:Large Language Models (LLMs)-based data Augmentation for Text-Attributed Graph Contrastive Learning

LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning ( http://arxiv.org/abs/2409.01145v1 )

ライセンス: Link先を確認

Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu,

(参考訳) Graph Contrastive Learning(GCL)は、自己教師付きグラフ学習の強力なパラダイムであり、さまざまな応用シナリオで注目されている。しかし、特徴埋め込みマスキングのような従来の拡張技術ではテキスト属性を直接処理できないため、テキスト属性グラフ(TAG)上での学習にGCLを用いる方法はまだ検討されていない。 GCLをTAGに適用する素朴な戦略は、テキスト属性を言語モデルで特徴埋め込みにエンコードし、その埋め込みを後続のGCLモジュールに入力して処理することである。このような戦略は3つの大きな課題に直面する。 (I) 情報損失を回避できないこと、(II) テキストエンコーディング段階での意味的損失、(III) 暗黙的な拡張制約が制御不能で理解不能な結果をもたらすことである。本稿では、LATEX-GCLと呼ばれる新しいGCLフレームワークを提案する。これは大規模言語モデル(LLM)を用いてテキスト拡張を生成し、LLMの強力な自然言語処理(NLP)能力を利用して上記の3つの制限に対処することで、TAGタスクへのGCL適用への道を開く。 4つの高品質なTAGデータセットに対する大規模な実験は、提案したLATEX-GCL法の優位性を示している。ソースコードとデータセットは再現性を容易にするために公開されており、https://anonymous.4open.science/r/LATEX-GCL-0712 からアクセスできる。

Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet to be explored, because conventional augmentation techniques like feature embedding masking cannot directly process textual attributes on TAGs. A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model and then feed the embeddings into the following GCL module for processing. Such a strategy faces three key challenges: I) failure to avoid information loss, II) semantic loss during the text encoding phase, and III) implicit augmentation constraints that lead to uncontrollable and incomprehensible results. In this paper, we propose a novel GCL framework named LATEX-GCL, which uses Large Language Models (LLMs) to produce textual augmentations and leverages their powerful natural language processing (NLP) abilities to address the three aforementioned limitations, paving the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG datasets illustrate the superiority of the proposed LATEX-GCL method. The source code and datasets are released to ease reproducibility and can be accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# FMRFT:Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking

FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking ( http://arxiv.org/abs/2409.01148v1 )

ライセンス: Link先を確認

Mingyuan Yao, Yukang Huo, Qingbin Tian, Jiayin Zhao, Xiao Liu, Ruifeng Wang, Haihua Wang,

(参考訳) 魚の成長, 異常行動, および魚の病気は, 画像処理による魚の追跡によって早期に検出できる。しかし、水中での反射や、高い類似性、刺激による急激な水泳、多目的閉塞などのいくつかの理由により、魚の多目的追跡に困難が生じる。これらの課題に対処するため,本稿では,複雑なマルチシーン・スタージョン追跡データセットを構築し,リアルタイム魚追跡モデルであるFMRFTを提案する。このモデルでは,マルチフレーム映像のタイミング記憶と高速特徴抽出を実現するために,低メモリ消費のMamba In Mamba (MIM) アーキテクチャを導入し,マルチフィッシュ映像における連続フレームの相関解析の効率を向上させる。さらに、RT-DETRの優れた特徴相互作用と事前フレーム処理機能を活用し、効率的な追跡アルゴリズムを提供する。 QTSIクエリインタラクション処理モジュールを組み込むことで、モデルは隠蔽されたオブジェクトと冗長なトラッキングフレームを効果的に処理し、より正確で安定した魚追跡を実現する。データセット上でトレーニングおよびテストが行われ、IDF1スコアは90.3%、MOTA精度は94.3%である。実験結果から,FMRFTモデルでは魚の群集における相似性と相互排除の課題に効果的に対処でき,工場の農業環境における正確な追跡が可能であることが示唆された。

Growth, abnormal behavior, and diseases of fish can be detected early by tracking fish through image processing, which is of great significance for factory aquaculture. However, underwater reflections and fish-related factors, such as high visual similarity, rapid swimming caused by stimuli, and multi-object occlusion, make multi-target tracking of fish challenging. To address these challenges, this paper establishes a complex multi-scene sturgeon tracking dataset and proposes a real-time end-to-end fish tracking model, FMRFT. In this model, the Mamba In Mamba (MIM) architecture with low memory consumption is introduced into the tracking algorithm to realize multi-frame video timing memory and fast feature extraction, which improves the efficiency of correlation analysis for contiguous frames in multi-fish video. Additionally, the superior feature interaction and a priori frame processing capabilities of RT-DETR are leveraged to provide an effective tracking algorithm. By incorporating the QTSI query interaction processing module, the model effectively handles occluded objects and redundant tracking frames, resulting in more accurate and stable fish tracking. Trained and tested on the dataset, the model achieves an IDF1 score of 90.3% and a MOTA accuracy of 94.3%. Experimental results demonstrate that the proposed FMRFT model effectively addresses the challenges of high similarity and mutual occlusion in fish populations, enabling accurate tracking in factory farming environments.
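
For readers unfamiliar with the two tracking scores quoted above, the sketch below shows their standard definitions. The error counts in the example are hypothetical and are not taken from the paper.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth objects summed over all frames.
    """
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    Counts are identity-level true positives, false positives, and false negatives.
    """
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Hypothetical counts for a short clip (illustration only).
example_mota = mota(false_negatives=40, false_positives=25, id_switches=5, num_gt=1000)
example_idf1 = idf1(idtp=450, idfp=50, idfn=50)
```

MOTA mostly reflects detection quality (misses and false alarms), while IDF1 rewards keeping the same identity on the same fish over time, so the two scores can diverge when similar-looking fish occlude each other.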

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# パラメータ自由表現アライメントによるマルチモーダル幻覚の理解

Understanding Multimodal Hallucination with Parameter-Free Representation Alignment ( http://arxiv.org/abs/2409.01151v1 )

ライセンス: Link先を確認

Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao,

(参考訳) 幻覚は、MLLM(Multimodal Large Language Models)において一般的な問題であるが、根底にある原理はよく分かっていない。本稿では、MLLMのどの構成要素が物体幻覚に寄与するかを考察する。画像表現自体以外の要因の影響を避けつつ画像表現を解析するために、任意の2つの表現システム間の類似度を、追加の訓練パラメータを必要とせずに測定できるパラメータフリー表現アライメント指標(Pfram)を提案する。特に、Pframは、画像の正解アノテーションで表される人間の表現システムとニューラル表現システムとのアライメントも評価できる。オブジェクトアノテーションとのアライメントを評価することで、この指標が、さまざまなモデルアーキテクチャやサイズにまたがる幅広い最先端MLLMにおいて、物体幻覚と強く一貫した相関を示すことを明らかにする。さらに、この指標を用いて、異なるモジュールの役割、テキスト命令の影響、代替視覚エンコーダの使用を含む改善の可能性など、MLLMにおける画像表現に関する他の重要な課題についても検討する。私たちのコードは https://github.com/yellow-binary-tree/Pfram で利用可能である。

Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations while avoiding the influence of all factors other than the image representation itself, we propose a parameter-free representation alignment metric (Pfram) that can measure the similarities between any two representation systems without requiring additional training parameters. Notably, Pfram can also assess the alignment of a neural representation system with the human representation system, represented by ground-truth annotations of images. By evaluating the alignment with object annotations, we demonstrate that this metric shows strong and consistent correlations with object hallucination across a wide range of state-of-the-art MLLMs, spanning various model architectures and sizes. Furthermore, using this metric, we explore other key issues related to image representations in MLLMs, such as the role of different modules, the impact of textual instructions, and potential improvements including the use of alternative visual encoders. Our code is available at: https://github.com/yellow-binary-tree/Pfram.
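
One way to compare two representation systems without training any parameters is to check whether they agree on which items are neighbors. The sketch below uses mutual k-nearest-neighbor overlap; this is an assumption chosen for illustration and is not necessarily how Pfram is defined (the abstract does not give the formula). All names and data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(vectors, i, k):
    """Indices of the k items closest to item i (excluding i itself)."""
    ranked = sorted((sq_dist(vectors[i], v), j) for j, v in enumerate(vectors) if j != i)
    return {j for _, j in ranked[:k]}


def knn_alignment(rep_a, rep_b, k):
    """Average fraction of shared k-nearest neighbors across two representations.

    rep_a[i] and rep_b[i] must describe the same item (e.g. the same image).
    Returns a value in [0, 1]; no parameters are trained.
    """
    n = len(rep_a)
    total = 0.0
    for i in range(n):
        shared = nearest_neighbors(rep_a, i, k) & nearest_neighbors(rep_b, i, k)
        total += len(shared) / k
    return total / n


# Toy data: rep_b is rep_a rescaled (same neighborhoods); rep_c groups items differently.
rep_a = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
rep_b = [[0.0, 0.0], [10.0, 0.0], [50.0, 50.0], [60.0, 50.0]]
rep_c = [[0.0], [5.0], [1.0], [6.0]]
```

Because only neighbor sets are compared, the two systems may have different dimensionality, which is what allows a neural representation to be compared with annotation-derived vectors.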

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# 実世界の会話型エンティティリンクはゼロショット以上を必要とする

Real World Conversational Entity Linking Requires More Than Zeroshots ( http://arxiv.org/abs/2409.01152v1 )

ライセンス: Link先を確認

Mohanna Hoveyda, Arjen P. de Vries, Maarten de Rijke, Faegheh Hasibi,

(参考訳) 会話におけるエンティティリンク(EL)は、主にドメイン固有のロングテールエンティティを含むエンティティアノテーション付き会話データセットとスパースナレッジベース(KB)の不足により、実用的なアプリケーションにおいて顕著な課題に直面している。我々は,資源制約下でのELモデルの有効性を評価するための評価シナリオを設計した。評価には、Fandom、現実世界のEL複雑度を例示するFandom、広く使われているWikipediaの2つのKBが使われている。まず、Fandomを用いた新しい不慣れKBに一般化するELモデルの能力と、RedditのFandomエンティティに関する議論に基づいて収集したゼロショット対話エンティティリンクデータセットを評価する。次に,ELモデルの事前学習を伴わずに,会話環境への適応性を評価する。以上の結果から,既存のゼロショットELモデルは,事前トレーニングを伴わずに新しいドメイン固有KBに導入され,性能が著しく低下していることが示唆された。その結果,従来の評価手法はゼロショットELにおける実世界の複雑さを捉えるには不十分であり,限られたリソースに適応するための会話型ELモデルの設計と評価のための新たなアプローチの必要性が浮き彫りになった。本研究で提案した評価設定とデータセットを公開している。

Entity linking (EL) in conversations faces notable challenges in practical applications, primarily due to the scarcity of entity-annotated conversational datasets and sparse knowledge bases (KB) containing domain-specific, long-tail entities. We designed targeted evaluation scenarios to measure the efficacy of EL models under resource constraints. Our evaluation employs two KBs: Fandom, exemplifying real-world EL complexities, and the widely used Wikipedia. First, we assess EL models' ability to generalize to a new unfamiliar KB using Fandom and a novel zero-shot conversational entity linking dataset that we curated based on Reddit discussions on Fandom entities. We then evaluate the adaptability of EL models to conversational settings without prior training. Our results indicate that current zero-shot EL models falter when introduced to new, domain-specific KBs without prior training, significantly dropping in performance. Our findings reveal that previous evaluation approaches fall short of capturing real-world complexities for zero-shot EL, highlighting the necessity for new approaches to design and assess conversational EL models to adapt to limited resources. The evaluation setup and the dataset proposed in this research are made publicly available.

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# 反復リアプノフ法による符号化量子ゲート生成について

On encoded quantum gate generation by iterative Lyapunov-based methods ( http://arxiv.org/abs/2409.01153v1 )

ライセンス: Link先を確認

Paulo Sergio Pereira da Silva, Pierre Rouchon,

(参考訳) 本稿では、符号化量子ゲート生成の問題を扱う。その考え方は、合成される量子ゲートの次元 $\bar n$ よりも高い次元 $n$ の量子系を考えることである。 $\mathbb{C}^n$ の2つの正規直交部分集合 $\mathbb{E} = \{e_1, e_2, \ldots, e_{\bar n}\}$ と $\mathbb F = \{f_1, f_2, \ldots, f_{\bar n}\}$ が与えられたとき、符号化量子ゲート生成の問題は、すべての初期状態 $e_i$ が、所望の精度と大域位相 $\phi \in \mathbb{R}$ を除いて $\exp(\jmath \phi) f_i, i=1,2, \ldots ,\bar n$ に導かれるように、区間 $[0, T_f]$ で定義された開ループ制御則を得ることである。この問題には、$\bar n = n$ の場合の古典的な(完全な)量子ゲート生成問題、$\bar n = 1$ の場合の状態準備問題、そして $ 1 < \bar n < n$ の場合の符号化ゲート生成問題が含まれる。したがって、ここでは3つの問題が共通のアプローチで統一される。閉量子系における符号化ゲート生成問題を扱うために、\emph{Reference Input Generation Algorithm (RIGA)} を一般化する。適切なリアプノフ関数は、符号化ゲートの台の上への直交射影から導かれる。物理的に興味深い3つの事例研究が、この数値アルゴリズムの有用性を示す。すなわち、2つの結合トランスモン量子ビット、トランスモン量子ビットに結合したキャビティモード、および $N=10$ の大きな次元の場合を含む $N$ 量子ビットの鎖である。

The problem of encoded quantum gate generation is studied in this paper. The idea is to consider a quantum system of higher dimension $n$ than the dimension $\bar n$ of the quantum gate to be synthesized. Given two orthonormal subsets $\mathbb{E} = \{e_1, e_2, \ldots, e_{\bar n}\}$ and $\mathbb F = \{f_1, f_2, \ldots, f_{\bar n}\}$ of $\mathbb{C}^n$, the problem of encoded quantum gate generation consists in obtaining an open-loop control law defined on an interval $[0, T_f]$ such that all initial states $e_i$ are steered to $\exp(\jmath \phi) f_i, i=1,2, \ldots ,\bar n$ up to some desired precision and to some global phase $\phi \in \mathbb{R}$. This problem includes the classical (full) quantum gate generation problem, when $\bar n = n$, the state preparation problem, when $\bar n = 1$, and finally the encoded gate generation problem, when $ 1 < \bar n < n$. Hence, three problems are unified here within a single common approach. The \emph{Reference Input Generation Algorithm (RIGA)} is generalized in this work to address the encoded gate generation problem for closed quantum systems. A suitable Lyapunov function is derived from the orthogonal projector onto the support of the encoded gate. Three case studies of physical interest indicate the potential of such a numerical algorithm: two coupled transmon qubits, a cavity mode coupled to a transmon qubit, and a chain of $N$ qubits, including a large-dimensional case for which $N=10$.
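
The steering condition above can be written out explicitly. The block below is a sketch under assumptions that the abstract does not state: a bilinear Schrödinger model with drift $H_0$, control Hamiltonians $H_k$, and $m$ controls $u_k$ (all hypothetical symbols), and a phase-insensitive figure of merit that is offered for illustration and is not claimed to be the paper's Lyapunov function.

```latex
% Assumed model: closed system, unitary propagator X(t), m control inputs u_k(t).
\dot X(t) = -\jmath \Big( H_0 + \sum_{k=1}^{m} u_k(t)\, H_k \Big) X(t), \qquad X(0) = I_n .

% Encoded gate generation: find u on [0, T_f] such that, for some \phi \in \mathbb{R},
X(T_f)\, e_i \approx \exp(\jmath \phi)\, f_i , \qquad i = 1, 2, \ldots, \bar n .

% Illustrative figure of merit, insensitive to the global phase \phi:
\mathcal{F} = \Big| \frac{1}{\bar n} \sum_{i=1}^{\bar n} f_i^{\dagger}\, X(T_f)\, e_i \Big|^{2} \in [0, 1],
\qquad
\mathcal{F} = 1 \iff X(T_f)\, e_i = \exp(\jmath \phi)\, f_i \ \text{for all } i \text{ with a common } \phi .
```

Since $X(T_f)$ is unitary and the $e_i$, $f_i$ are unit vectors, each inner product has modulus at most one, so $\mathcal{F} = 1$ forces all $\bar n$ inner products to equal the same unit complex number. Setting $\bar n = n$ or $\bar n = 1$ recovers the full-gate and state-preparation cases mentioned in the abstract.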

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# ニューラルネットワークによる感染症流行の予測と関連する不確実性

Forecasting infectious disease prevalence with associated uncertainty using neural networks ( http://arxiv.org/abs/2409.01154v1 )

ライセンス: Link先を確認

Michael Morris,

(参考訳) 感染症は人的・経済的に重荷を負う。病気の発生を正確に予測することで、公衆衛生機関は既存の疾患や新興疾患に効果的に対応できる。この分野の進歩にもかかわらず、正確な予測モデルの開発は依然として重要な課題である。この論文では、ニューラルネットワーク(NN)と関連する不確実性推定(NNの流行予測への適用を制限する重要なコンポーネント)を用いた2つの方法論フレームワークを提案する。米国におけるインフルエンザ様疾患(ILI)を予測し,その枠組みを整備する。提案手法は,従来のILIレートと連動してWeb検索活動データを用いて,NNアーキテクチャのトレーニングを行う。我々のモデルは不確実区間を生成するためにベイズ層を組み込んでおり、より伝統的なアプローチの正当な代替品として位置づけている。最高のアーキテクチャ: 反復リカレントニューラルネットワーク(IRNN)は平均絶対誤差を10.3%削減し、4つのインフルエンザシーズンにおける予測タスクの平均で17.1%改善する。提案手法は,IRNNにおけるサンプリング手順を変更し,不確実性評価を改善するアーキテクチャであるIRNNを導入して構築する。第2のフレームワークでは、ニューラル常微分方程式を使用して、機械的コンパートメンタルモデルとNN間のギャップを埋めます。我々は、ILIレートとWeb検索活動データを組み合わせた8つのニューラルODEモデルを評価し、予測を行った。これらはIRNNとIRNN0(IRNNはILIレートのみを使用する)と比較される。 Web検索活動データなしで訓練されたモデルは、スキルの点でIRNN0を16%上回っている。今後は、最高のパフォーマンスのIRNNと競合するために、Web検索データを使ったニューラルODEをより効果的に活用することに注力する必要がある。

Infectious diseases pose significant human and economic burdens. Accurately forecasting disease incidence can enable public health agencies to respond effectively to existing or emerging diseases. Despite progress in the field, developing accurate forecasting models remains a significant challenge. This thesis proposes two methodological frameworks using neural networks (NNs) with associated uncertainty estimates, a critical component whose absence has limited the application of NNs to epidemic forecasting thus far. We develop our frameworks by forecasting influenza-like illness (ILI) in the United States. Our first proposed method uses Web search activity data in conjunction with historical ILI rates as observations for training NN architectures. Our models incorporate Bayesian layers to produce uncertainty intervals, positioning themselves as legitimate alternatives to more conventional approaches. The best-performing architecture, the iterative recurrent neural network (IRNN), reduces mean absolute error by 10.3% and improves Skill by 17.1% on average in forecasting tasks across four flu seasons compared to the state of the art. We build on this method by introducing IRNNs, an architecture which changes the sampling procedure in the IRNN to improve the uncertainty estimation. Our second framework uses neural ordinary differential equations (neural ODEs) to bridge the gap between mechanistic compartmental models and NNs, benefiting from the physical constraints that compartmental models provide. We evaluate eight neural ODE models utilising a mixture of ILI rates and Web search activity data to provide forecasts. These are compared with the IRNN and with IRNN0, the IRNN using only ILI rates. Models trained without Web search activity data outperform the IRNN0 by 16% in terms of Skill. Future work should focus on using neural ODEs with Web search data more effectively to compete with the best-performing IRNN.

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# TempMe: 効率的なテキスト・ビデオ検索のためのビデオ時間トークンマージ

TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval ( http://arxiv.org/abs/2409.01156v1 )

ライセンス: Link先を確認

Leqi Shen, Tianxiang Hao, Sicheng Zhao, Yifeng Zhang, Pengzhang Liu, Yongjun Bao, Guiguang Ding,

(参考訳) ほとんどのテキストビデオ検索手法は、テキストイメージを事前訓練したCLIPをバックボーンとして使用し、計算オーバーヘッドの高い複雑なモジュールを組み込む。その結果、多くの研究が効率的な微調整に焦点を当てた。効率的な適応の第一の課題は、画像とビデオのモダリティの固有の相違から生じる。各サンプルビデオフレームは、画像エンコーダによって独立して処理されなければならない。既存の効率的な方法は訓練可能なパラメータを小さく微調整するが、大きなトークン数のために高い推論コストを発生させる。本研究では,時間的冗長性は連続するフレームにおける繰り返し情報により,モデルの複雑さに大きく寄与する,と論じる。既存の画像モデルのトークン圧縮手法では、フレーム間の時間的冗長性を見落としているため、ユニークな課題を解決できない。これらの課題に対処するため,時間的冗長性を低減するため,TempMe(TempMe)を提案する。具体的には、プログレッシブ・マルチグラニュラリティ・フレームワークを導入する。近隣のクリップを徐々に組み合わせることで、異なるフレームに時間トークンをマージし、ビデオレベルの特徴を学習することで、複雑さの低減とパフォーマンスの向上を実現します。大規模な実験により、TempMeの優位性が検証された。従来の効率的なテキストビデオ検索手法と比較して、TempMeは出力トークンを95%、GFLOPを51%削減し、1.8倍の高速化と4.4%のR-Sum改善を実現した。さらにTempMeは、効率的かつ完全な微調整手法を効果的に統合することで、堅牢な一般化能力を示す。完全な微調整により、TempMeは7.9%のR-Sumの改善、1.57倍の高速化、75.2%のGPUメモリ使用率を実現している。私たちのコードは解放されます。

Most text-video retrieval methods utilize the text-image pre-trained CLIP as a backbone, incorporating complex modules that result in high computational overhead. As a result, many studies focus on efficient fine-tuning. The primary challenge in efficient adaptation arises from the inherent differences between image and video modalities. Each sampled video frame must be processed by the image encoder independently, which increases complexity and complicates practical deployment. Although existing efficient methods fine-tune only a small number of trainable parameters, they still incur high inference costs due to the large number of tokens. In this work, we argue that temporal redundancy significantly contributes to the model's high complexity due to the repeated information in consecutive frames. Existing token compression methods for image models fail to solve these unique challenges, as they overlook temporal redundancy across frames. To tackle these problems, we propose Temporal Token Merging (TempMe) to reduce temporal redundancy. Specifically, we introduce a progressive multi-granularity framework. By gradually combining neighboring clips, we merge temporal tokens across different frames and learn video-level features, leading to lower complexity and better performance. Extensive experiments validate the superiority of our TempMe. Compared to previous efficient text-video retrieval methods, TempMe significantly reduces output tokens by 95% and GFLOPs by 51%, while achieving a 1.8X speedup and a 4.4% R-Sum improvement. Additionally, TempMe exhibits robust generalization capabilities by integrating effectively with both efficient and full fine-tuning methods. With full fine-tuning, TempMe achieves a significant 7.9% R-Sum improvement, trains 1.57X faster, and uses 75.2% of the GPU memory. Our code will be released.
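
The intuition behind merging temporally redundant tokens can be shown with a toy example. The sketch below is a generic similarity-based merge between two consecutive frames; it is not the TempMe algorithm, and the cosine threshold of 0.95 and the running-average merge rule are assumptions. Token vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def merge_frames(frame_a, frame_b, threshold=0.95):
    """Merge tokens of frame_b into similar tokens of frame_a.

    A token of frame_b that closely matches a token of frame_a is averaged
    into it; otherwise it is kept as a new token.
    """
    merged = [list(t) for t in frame_a]
    counts = [1] * len(merged)
    for token in frame_b:
        sims = [cosine(token, ref) for ref in frame_a]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= threshold:
            c = counts[best]
            merged[best] = [(m * c + x) / (c + 1) for m, x in zip(merged[best], token)]
            counts[best] += 1
        else:
            merged.append(list(token))
            counts.append(1)
    return merged


# Two frames with two tokens each; one token repeats across frames.
frame_1 = [[1.0, 0.0], [0.0, 1.0]]
frame_2 = [[1.0, 0.0], [1.0, 1.0]]
merged_tokens = merge_frames(frame_1, frame_2)  # 3 tokens instead of 4
```

Applying such a merge repeatedly over neighboring clips shrinks the token count roughly in proportion to how static the video is, which is the source of the inference savings the abstract describes.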

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# 自動音声キャプションのための補助検索モデルによるEnCLAPの拡張

Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning ( http://arxiv.org/abs/2409.01160v1 )

ライセンス: Link先を確認

Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, Jinjoo Lee,

(参考訳) 本稿では,DCASE2024 Challenge Task6 (Automated Audio Captioning) と Task8 (Language-based Audio Retrieval) について述べる。本稿では,EnCLAP音声キャプションフレームワークに基づくアプローチを開発し,課題の6タスクに最適化する。特に、基礎となるコンポーネントの変更と、再品位プロセスの組み入れについて概説する。さらに、修正したフレームワークの副産物である補足型レトリバーモデルをTask8に送信します。提案システムでは,タスク6のFENSEスコアが0.542,タスク8のmAP@10スコアが0.386,ベースラインモデルが大幅に向上した。

In this technical report, we describe our submission to DCASE2024 Challenge Task6 (Automated Audio Captioning) and Task8 (Language-based Audio Retrieval). We develop our approach by building upon the EnCLAP audio captioning framework and optimizing it for Task6 of the challenge. Notably, we outline the changes in the underlying components and the incorporation of a reranking process. Additionally, we submit a supplementary retriever model, a byproduct of our modified framework, to Task8. Our proposed systems achieve a FENSE score of 0.542 on Task6 and an mAP@10 score of 0.386 on Task8, significantly outperforming the baseline models.

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# 性能と効率のバランスをとる:画像テキストの相互作用に基づく多モーダル大言語モデルプルーニング法

Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction ( http://arxiv.org/abs/2409.01162v1 )

ライセンス: Link先を確認

Gaotong Yu, Yi Chen, Jian Xu,

(参考訳) 近年,多モーダル大規模言語モデル (MM-LLM) は多モーダルタスクにおいて大きな成功を収めている。 MM-LLMsフレームワークでは、LLM層における連結テキストと視覚トークンの処理が主な計算消費ステップである。 LLMの入力トークンの長さは、全体的なトレーニングと推論効率に直接影響を及ぼす。そこで本研究では,MM-LLMの視覚的トークンについて検討した。その結果,視覚エンコーダにおける視覚トークンとCLSトークンの類似性は,長いテール分布に従うことがわかった。言い換えれば、少数の視覚トークンだけがCLSトークンと非常によく似ている。そこで我々は,この問題に対処する動的プルーニングアルゴリズムを設計した。まず、異なる入力サンプルに対して、視覚的CLSトークン類似度曲線の屈折点を探索し、対応するセグメンテーション点として使用し、視覚マーカーをトリミングする。このプロセスは、主に視覚エンコーダの出力を減らし、モデルを加速する。そして、LLM層において、連結された視覚テキストトークンを2度目のプルーニングを行う。この過程で、視覚的特徴とテキスト的特徴の相互作用により、テキスト相関の低い視覚的トークンとテキスト的トークンはさらにフィルタリングされ、効率と性能のバランスがとれる。複数のデータセットから得られた結果から,提案手法は元のトークン量の平均22%を使用する場合,元のトークン量と競合する性能を達成できることが示唆された。私たちのソースコードは受理後、公開されます。

Recently, multimodal large language models (MM-LLMs) have achieved great success in many multimodal tasks, but their high computational costs limit their wider adoption and application. In the MM-LLM framework, the main computational cost lies in processing the concatenated text and visual tokens in the LLM layers. The length of the LLM's input token sequence directly affects the overall training and inference efficiency. In response to this issue, we further studied the visual tokens of MM-LLMs. We found that the similarity between visual tokens and the CLS token in the visual encoder follows a long-tail distribution. In other words, only a few visual tokens are highly similar to the CLS token. Therefore, we designed a dynamic pruning algorithm to address this issue. First, for each input sample, we search for the inflection point of its visual-CLS token similarity curve and use it as the segmentation point for trimming the visual tokens. This process mainly reduces the output of the visual encoder to accelerate the model. Then, in the LLM layers, the concatenated visual and text tokens are pruned a second time. During this process, thanks to the interaction between visual and textual features, visual and textual tokens with low text correlation are further filtered out, achieving a balance between efficiency and performance. The results on multiple datasets show that our proposed method can achieve performance competitive with the original model while using an average of 22% of the original number of tokens. Our source code will be made publicly available following acceptance.
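
The first pruning stage described above, cutting the visual tokens at the bend of their sorted CLS-similarity curve, can be sketched as follows. The abstract does not say how the inflection point is located, so the "largest gap below the chord" heuristic used here is an assumption, as are the function names and the toy similarities.

```python
def knee_index(sorted_desc):
    """Index where a descending long-tail curve bends.

    Heuristic (an assumption): the point lying furthest below the straight
    line joining the first and last values. Curves with fewer than three
    points are left uncut.
    """
    n = len(sorted_desc)
    if n < 3:
        return n
    first, last = sorted_desc[0], sorted_desc[-1]
    best_index, best_gap = 0, -1.0
    for i, value in enumerate(sorted_desc):
        chord = first + (last - first) * i / (n - 1)
        gap = chord - value
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


def prune_visual_tokens(similarities):
    """Return the (sorted) indices of visual tokens kept after pruning.

    similarities[i] is the similarity between visual token i and the CLS token.
    Tokens ranked above the knee are kept; at least one token is always kept.
    """
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    curve = [similarities[i] for i in order]
    keep = max(1, knee_index(curve))
    return sorted(order[:keep])


# Toy long-tail similarities: only tokens 1 and 4 resemble the CLS token.
cls_similarity = [0.1, 0.9, 0.05, 0.2, 0.85, 0.15, 0.1, 0.05]
kept = prune_visual_tokens(cls_similarity)
```

Because the cut point is found per sample, an image with many CLS-like tokens keeps more of them than a visually simple one, which is what makes the pruning dynamic rather than a fixed ratio.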

翻訳日:2024-09-06 07:13:03 公開日:2024-09-02

# PACSBO: おそらくほぼ正しい安全なベイズ最適化

PACSBO: Probably approximately correct safe Bayesian optimization ( http://arxiv.org/abs/2409.01163v1 )

ライセンス: Link先を確認

Abdullah Tokmak, Thomas B. Schön, Dominik Baumann,

(参考訳) 安全なベイズ最適化(BO)アルゴリズムは、システムのダイナミクスを知らずに最適な制御ポリシーを見つけると同時に、高い確率で安全性を保証することを約束する。これらの保証と引き換えに、一般的なアルゴリズムは滑らかさの仮定、すなわち再生核ヒルベルト空間(RKHS)におけるノルムの既知の上界を必要とする。 RKHS は潜在的に無限次元の空間であり、未知関数の対応する RKHS におけるノルムの上界を実際にどのように得るかは明らかでない。そこで本研究では、データから未知関数のRKHSノルムの上界を推定するアルゴリズムを提案し、その理論的性質を調べる。さらに、リプシッツに基づく手法と同様に、RKHSノルムを大域的な対象ではなく局所的な対象として扱うことで、保守性を低減する。 RKHSノルムの推定とRKHSノルムの局所的解釈を安全なBOアルゴリズムに統合することで、おそらくほぼ正しい安全なベイズ最適化のためのアルゴリズムPACSBOが得られる。数値実験とハードウェア実験により、その適用可能性と、一般的な安全なBOアルゴリズムに対する利点を示す。

Safe Bayesian optimization (BO) algorithms promise to find optimal control policies without knowing the system dynamics while at the same time guaranteeing safety with high probability. In exchange for those guarantees, popular algorithms require a smoothness assumption: a known upper bound on a norm in a reproducing kernel Hilbert space (RKHS). The RKHS is a potentially infinite-dimensional space, and it is unclear how to, in practice, obtain an upper bound of an unknown function in its corresponding RKHS. In response, we propose an algorithm that estimates an upper bound on the RKHS norm of an unknown function from data and investigate its theoretical properties. Moreover, akin to Lipschitz-based methods, we treat the RKHS norm as a local rather than a global object, and thus reduce conservatism. Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO, an algorithm for probably approximately correct safe Bayesian optimization, for which we provide numerical and hardware experiments that demonstrate its applicability and benefits over popular safe BO algorithms.
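
To see why an upper bound on the RKHS norm is hard to obtain, it helps to compute what data give directly. The norm of the kernel interpolant of the samples, $\sqrt{y^\top K^{-1} y}$, never exceeds the norm of the function that generated them, so it is only a lower bound. The sketch below computes this quantity; it is not the PACSBO estimator, and the squared-exponential kernel, lengthscale, and sample points are assumptions for illustration.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2 * lengthscale ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def interpolant_rkhs_norm(xs, ys, lengthscale=1.0):
    """RKHS norm of the kernel interpolant: sqrt(y^T K^{-1} y)."""
    gram = [[rbf(a, b, lengthscale) for b in xs] for a in xs]
    alpha = solve(gram, ys)
    return math.sqrt(sum(a * y for a, y in zip(alpha, ys)))


# Ground truth f(x) = k(x, 0), whose RKHS norm is exactly 1 for this kernel.
sample_points = [-1.0, 0.5, 2.0]
sample_values = [rbf(x, 0.0) for x in sample_points]
lower_bound = interpolant_rkhs_norm(sample_points, sample_values)
```

Here `lower_bound` stays at or below the true norm of 1 and reaches it only when the samples pin the function down completely, which illustrates the gap that a data-driven upper bound has to cover.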

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 焦点距離とカメラ姿勢の一般的な物理的変化によるカメラパラメータの変動

Variation of Camera Parameters due to Common Physical Changes in Focal Length and Camera Pose ( http://arxiv.org/abs/2409.01171v1 )

ライセンス: Link先を確認

Hsin-Yi Chen, Chuan-Kai Fu, Jen-Hui Chuang,

(参考訳) カメラ固有のパラメータの正確な校正は、インテリジェントシステムや自動運転車などの分野における様々なコンピュータビジョンベースの応用に不可欠である。しかし、既存の校正方式は、一般的な物理的変化によるカメラパラメータの変動の一般的な傾向を見出すには不適である。本稿では,焦点距離とカメラポーズの変化による大小の変動を,最近提案されたキャリブレーション法で同定できることを実証した。実験結果から、前者は様々なタイプのカメラの主点偏差の傾向(方向)が異なるが、後者は内部レンズの配置が異なるためか、後者は重力方向による偏差に非常によく似た傾向を持つ。最後に, カメラキャリブレーションの異なる方法において, 3次元から2次元への再投射誤差を比較検討した。

Accurate calibration of camera intrinsic parameters is crucial to various computer vision-based applications in the fields of intelligent systems, autonomous vehicles, etc. However, existing calibration schemes are not suited to finding the general trend of the variation of camera parameters due to common physical changes. In this paper, it is demonstrated that major and minor variations due to changes in focal length and camera pose, respectively, can be identified with a recently proposed calibration method. It is readily observable from the experimental results that the former variations have different trends (directions) of principal point deviation for different types of camera, possibly due to different internal lens configurations, while the latter have very similar trends in the deviation, which is most likely due to the direction of gravity. Finally, to confirm the validity of these unprecedented findings, 3D-to-2D reprojection errors are compared for different methods of camera calibration.
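
The 3D-to-2D reprojection error used for validation can be sketched with a plain pinhole model. This is a minimal illustration under assumptions: no lens distortion, 3D points already expressed in the camera frame, and made-up intrinsics and observations.

```python
import math


def project(point, fx, fy, cx, cy):
    """Project a 3D point (camera frame, z > 0) with a distortion-free pinhole model."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def rms_reprojection_error(points_3d, pixels, fx, fy, cx, cy):
    """Root-mean-square pixel distance between projected and observed points."""
    squared = []
    for point, (u_obs, v_obs) in zip(points_3d, pixels):
        u, v = project(point, fx, fy, cx, cy)
        squared.append((u - u_obs) ** 2 + (v - v_obs) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# One hypothetical point observed at pixel (523, 644).
points = [(1.0, 2.0, 4.0)]
observed = [(523.0, 644.0)]
error_nominal = rms_reprojection_error(points, observed, 800.0, 800.0, 320.0, 240.0)
error_shifted = rms_reprojection_error(points, observed, 800.0, 800.0, 323.0, 244.0)
```

In this toy case the error drops from 5 pixels to 0 once the principal point is shifted by (3, 4) pixels, which shows how a principal point deviation of the kind discussed in the abstract shows up directly in the reprojection error.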

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# ブリルアン光学系における合成磁性による低しきい値量子相関

Low threshold quantum correlations via synthetic magnetism in Brillouin optomechanical system ( http://arxiv.org/abs/2409.01172v1 )

ライセンス: Link先を確認

D. R. K. Massembele, P. Djorwé, K. B. Emale, Jia-Xin Peng, A. -H. Abdel-Aty, K. S. Nisar,

(参考訳) 本稿では, ブリュアン光学系における低駆動閾値量子相関を合成磁性に基づいて生成する手法を提案する。提案手法は,2つの光モードに結合した機械的(音響的)共振器を標準振動圧(電気的拘束力)によって構成する。音響モードと光学モードを結合する電気的拘束力は、システム内の後方刺激ブリルアン散乱(BSBS)過程をトリガーする。さらに、機械的および音響的共振器は、結合率$J_m$で機械的に結合される。機械的結合がなければ、生成した量子相関は強い駆動場を必要とする。フォノンホッピング結合を考慮し、合成磁性を誘導し、低結合強度の量子相関を生成する。生成した量子相関は急激な死と再生性フェノネナを示し、熱雑音に対して堅牢である。本研究は,量子通信,量子センサ,量子計算タスクに有用な低しきい値量子相関生成法を提案する。

We propose a scheme to generate low-driving-threshold quantum correlations in a Brillouin optomechanical system based on synthetic magnetism. Our proposal consists of a mechanical (acoustic) resonator coupled to two optical modes through the standard optomechanical radiation pressure (an electrostrictive force). The electrostrictive force that couples the acoustic mode to the optical ones triggers a Backward Stimulated Brillouin Scattering (BSBS) process in the system. Moreover, the mechanical and acoustic resonators are mechanically coupled through the coupling rate $J_m$, which is $\theta$-phase modulated. Without the mechanical coupling, the generated quantum correlations require a strong driving field. By accounting for the phonon-hopping coupling, synthetic magnetism is induced and the quantum correlations are generated for low coupling strengths. The generated quantum correlations display sudden death and revival phenomena, and are robust against thermal noise. Our results suggest a way to generate low-threshold quantum correlations, and are useful for quantum communications, quantum sensors, and quantum computational tasks.

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# アウト・オブ・ディストリビューション検出のためのログスケーリング

Logit Scaling for Out-of-Distribution Detection ( http://arxiv.org/abs/2409.01175v1 )

ライセンス: Link先を確認

Andrija Djurisic, Rosanne Liu, Mladen Nikolic,

(参考訳) 機械学習とAIモデルのオープンワールド環境への安全なデプロイは、アウト・オブ・ディストリビューション(OOD)データを正確に検出する能力、モデルのトレーニング内容と大きく異なるデータサンプルに重きを置いている。 OOD検出への現在のアプローチは、モデルをさらにトレーニングすることや、もはやアクセスできないかもしれないトレーニングデータに関する統計を必要とすることが多い。さらに、既存のOOD検出メソッドの多くは、異なるアーキテクチャ間で転送された場合のパフォーマンスを維持するのに苦労している。我々の研究は、トレーニングデータ配信へのアクセスを必要とせず、トレーニングされたネットワークをそのまま維持し、さまざまなアーキテクチャにわたって強力なパフォーマンスを維持する、シンプルなポストホック手法を提案することで、これらの課題に対処する。我々の方法である Logit Scaling (LTS) は、その名が示すように、単純にロジットを、分散内(ID) と OOD のサンプルを効果的に区別する方法でスケールする。 CIFAR-10, CIFAR-100, ImageNet, OpenOOD など,様々なスケールでベンチマークを行った。実験では、3つのIDと14のOODデータセットと9つのモデルアーキテクチャがカバーされた。全体として、我々は様々なアーキテクチャにおける最先端性能、堅牢性、適応性を実証し、高度なOOD検出のための普遍的なソリューションへの道を開いた。

The safe deployment of machine learning and AI models in open-world settings hinges critically on the ability to accurately detect out-of-distribution (OOD) data, i.e., data samples that differ markedly from what the model was trained on. Current approaches to OOD detection often require further training of the model and/or statistics about the training data, which may no longer be accessible. Additionally, many existing OOD detection methods struggle to maintain performance when transferred across different architectures. Our research tackles these issues by proposing a simple, post-hoc method that does not require access to the training data distribution, keeps a trained network intact, and holds strong performance across a variety of architectures. Our method, Logit Scaling (LTS), as the name suggests, simply scales the logits in a manner that effectively distinguishes between in-distribution (ID) and OOD samples. We tested our method on benchmarks across various scales, including CIFAR-10, CIFAR-100, ImageNet and OpenOOD. The experiments cover 3 ID and 14 OOD datasets, as well as 9 model architectures. Overall, we demonstrate state-of-the-art performance, robustness and adaptability across different architectures, paving the way towards a universally applicable solution for advanced OOD detection.
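
The abstract does not spell out the LTS scaling rule, so the following is only a minimal sketch of the general family it belongs to: a post-hoc score computed from the logits of a frozen network. It uses the well-known log-sum-exp ("energy") score as a stand-in; the function names, the temperature parameter, and the threshold value are illustrative assumptions, not the authors' method.

```python
import math

def energy_score(logits, temperature=1.0):
    """Temperature-scaled log-sum-exp of the logits.

    Higher values suggest an in-distribution sample. This is a generic
    stand-in score, NOT the LTS rule from the paper.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))

def is_ood(logits, threshold, temperature=1.0):
    """Flag a sample as OOD when its score falls below a chosen threshold."""
    return energy_score(logits, temperature) < threshold

# Hypothetical logits: one confident prediction, one diffuse prediction.
confident = [9.0, 0.5, -1.0]
diffuse = [0.3, 0.2, 0.1]
THRESHOLD = 5.0  # assumed; in practice chosen on held-out ID data

print(is_ood(confident, THRESHOLD), is_ood(diffuse, THRESHOLD))
```

Because the score depends only on the logits, nothing about the trained network or its training data has to be touched, which is the property the abstract emphasizes.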

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 自律運転におけるオンラインコーナーケース検出のためのエンド・ツー・エンド・エンドとモジュラー・ドライビング・アプローチの統合

Integrating End-to-End and Modular Driving Approaches for Online Corner Case Detection in Autonomous Driving ( http://arxiv.org/abs/2409.01178v1 )

ライセンス: Link先を確認

Gemb Kaljavesi, Xiyan Su, Frank Diermeyer,

(参考訳) オンラインコーナーケース検出は、自動運転車の安全性を確保するために不可欠である。現在の自律運転アプローチは、モジュラーアプローチとエンドツーエンドアプローチに分類することができる。両手法の利点を生かして,エンド・ツー・エンドのアプローチをモジュールシステムに統合したオンラインコーナーケース検出手法を提案する。モジュールシステムは一次駆動タスクを引き継ぎ、エンド・ツー・エンドのネットワークは二次駆動として並列に動作し、システム間の不一致をコーナーケース検出に使用する。本手法を実車に実装し,定性的に評価する。本研究は,2次駆動システムとして,状況認識の優れたエンド・ツー・エンドネットワークが,コーナケースの検出に有効であることを示す。これらのことから,このようなアプローチは自動運転車の安全性を高める可能性を秘めていると考えられる。

Online corner case detection is crucial for ensuring safety in autonomous driving vehicles. Current autonomous driving approaches can be categorized into modular approaches and end-to-end approaches. To leverage the advantages of both, we propose a method for online corner case detection that integrates an end-to-end approach into a modular system. The modular system takes over the primary driving task and the end-to-end network runs in parallel as a secondary one; the disagreement between the two systems is then used for corner case detection. We implement this method on a real vehicle and evaluate it qualitatively. Our results demonstrate that end-to-end networks, known for their superior situational awareness, can effectively contribute to corner case detection when used as secondary driving systems. These findings suggest that such an approach holds potential for enhancing the safety of autonomous vehicles.
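
The disagreement idea can be sketched as follows. The abstract does not say which quantity is compared, so this example assumes both systems output a planned trajectory as a list of (x, y) waypoints in metres and uses the mean waypoint distance as the disagreement measure; the function names and the 1 m threshold are hypothetical.

```python
import math

def trajectory_disagreement(modular, end_to_end):
    """Mean Euclidean distance between matching (x, y) waypoints, in metres."""
    if len(modular) != len(end_to_end) or not modular:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, q) for p, q in zip(modular, end_to_end)]
    return sum(dists) / len(dists)

def is_corner_case(modular, end_to_end, threshold_m=1.0):
    """Report a corner case when the two planners disagree too strongly."""
    return trajectory_disagreement(modular, end_to_end) > threshold_m

# Hypothetical plans: the primary (modular) plan and two secondary plans.
primary = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
secondary_agree = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2)]
secondary_swerve = [(0.0, 0.0), (1.0, 0.5), (2.0, 3.0)]

print(is_corner_case(primary, secondary_agree))   # small deviation
print(is_corner_case(primary, secondary_swerve))  # large deviation
```

The modular plan is still the one that is executed; the secondary plan only serves as a monitor.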

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# Recoverable Compression: テキスト情報によるマルチモーダルビジョントークン復元機構

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information ( http://arxiv.org/abs/2409.01179v1 )

ライセンス: Link先を確認

Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu,

(参考訳) 大規模言語モデリング技術の進歩により、視覚エンコーダと大規模言語モデルを組み合わせた大規模マルチモーダルモデルは、様々な視覚的タスクにおいて例外的な性能を示した。現在の大規模マルチモーダルモデルのほとんどは、ビジュアルエンコーダから得られた視覚的特徴を大きな言語モデルにマッピングし、下流タスクのテキストと並行して入力として使用することでこれを実現している。したがって、視覚トークンの数はモデルのトレーニングと推論速度に直接影響を与える。しかし、大規模なマルチモーダルモデルでは、トークンのプルーニングや圧縮に視覚情報に頼るだけで重要な情報が失われる可能性がある。一方、質問の形式でのテキスト入力には、質問に答えるのに役立つ貴重な情報が含まれており、モデルにさらなる知識を提供する。純粋に視覚的トークンプルーニング法で起こりうる潜在的な単純化と過剰なプルーニングに対処するために,テキスト情報を用いた動的視覚的トークン回復機構を提案する。このメカニズムは、質問テキストと視覚トークンの類似性を利用して、重要なテキスト情報で視覚的に意味のあるトークンを回収し、他の重要でないトークンをマージする。実験により,提案手法は,視覚トークンを平均10%まで圧縮しながら,従来の手法に匹敵する性能を示した。私たちのソースコードは受理後、公開されます。

With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current large-scale multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text for downstream tasks. Therefore, the number of visual tokens directly affects the training and inference speed of the model. There has been significant work on token pruning for visual transformers, but for large multimodal models, only relying on visual information for token pruning or compression may lead to significant loss of important information. On the other hand, the textual input in the form of a question may contain valuable information that can aid in answering the question, providing additional knowledge to the model. To address the potential oversimplification and excessive pruning that can occur with most purely visual token pruning methods, we propose a text information-guided dynamic visual token recovery mechanism that does not require training. This mechanism leverages the similarity between the question text and visual tokens to recover visually meaningful tokens with important text information while merging other less important tokens. Experimental results demonstrate that our proposed method achieves comparable performance to the original approach while compressing the visual tokens to an average of 10% of the original quantity. Our source code will be made publicly available following acceptance.
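
A simplified sketch of text-guided token selection is given below. It assumes visual tokens and the question embedding live in the same vector space, ranks tokens by cosine similarity to the question, keeps the top few, and averages the remainder into one merged token. The actual recovery mechanism in the paper is more involved; `recover_and_merge` and its parameters are illustrative names only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recover_and_merge(visual_tokens, text_embedding, keep):
    """Keep the `keep` tokens most similar to the text; average the rest into one."""
    order = sorted(range(len(visual_tokens)),
                   key=lambda i: cosine(visual_tokens[i], text_embedding),
                   reverse=True)
    kept_idx = sorted(order[:keep])  # preserve the original token order
    kept = [visual_tokens[i] for i in kept_idx]
    rest = [visual_tokens[i] for i in order[keep:]]
    if rest:
        dim = len(rest[0])
        kept.append([sum(tok[d] for tok in rest) / len(rest) for d in range(dim)])
    return kept

# Toy 2-D example with hypothetical embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
question = [1.0, 0.0]
compressed = recover_and_merge(tokens, question, keep=2)
print(len(tokens), "->", len(compressed))
```

Merging rather than dropping the low-similarity tokens keeps a coarse summary of the discarded content, at the cost of one extra token.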

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# PitVis-2023 : 内視鏡下下垂体手術ビデオにおけるワークフロー認識の試み

PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery ( http://arxiv.org/abs/2409.01184v1 )

ライセンス: Link先を確認

Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Dominik Rivoir, Alejandra Pérez, Santiago Rodriguez, Pablo Arbeláez, Danail Stoyanov, Hani J. Marcus, Sophia Bano,

(参考訳) 最小侵襲手術のビデオに応用されるコンピュータビジョンの分野は、ますます成長している。ワークフロー認識(Workflow recognition)とは、どの手術ステップが実行され、どの手術器具が使用されているかなど、手術のさまざまな側面を自動認識することである。この情報は後に、手術を学ぶとき、実際の手術中、手術記録を書くときに、臨床医を助けるために使われる。 The Pituitary Vision (PitVis) 2023 Challengeは、内視鏡下垂体手術のビデオにおける手術ステップと器具の認識をコミュニティに課している。これは、視野を制限し歪ませる作業スペースの小ささや、より正確なモデル予測を必要とする器具とステップの切り替え頻度の高さにより、他の最小侵襲手術と比較してもユニークなタスクである。参加者には25本のビデオが提供され、結果は2023年10月8日にカナダのバンクーバーで開催されたMICCAI-2023において、Endoscopic Vision 2023 Challengeの一環として発表された。さまざまなディープラーニングモデルを使用して、6か国9チームから18件の提出があった。上位モデルに共通していたのは時空間法とマルチタスク法の採用であり、純粋に空間的な単一タスクモデルと比べて、マクロF1スコアがステップ認識で50%以上、器具認識で10%以上改善された。したがって、PitVis-2023 Challengeは、最小侵襲手術における最先端のコンピュータビジョンモデルが新しいデータセットに転用可能であることを示した。ベンチマーク結果は論文に記載されており、データセットはhttps://doi.org/10.5522/04/26531686 で公開されている。

The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery, including which surgical steps are performed and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery, during live surgery, and when writing operation notes. The Pituitary Vision (PitVis) 2023 Challenge tasks the community with step and instrument recognition in videos of endoscopic pituitary surgery. This is a unique task when compared to other minimally invasive surgeries due to the smaller working space, which limits and distorts vision, and the higher frequency of instrument and step switching, which requires more precise model predictions. Participants were provided with 25 videos, with results presented at the MICCAI-2023 conference as part of the Endoscopic Vision 2023 Challenge in Vancouver, Canada, on 08-Oct-2023. There were 18 submissions from 9 teams across 6 countries, using a variety of deep learning models. A commonality between the top-performing models was the use of spatio-temporal and multi-task methods, with greater than 50% and 10% macro-F1-score improvement over purely spatial single-task models in step and instrument recognition, respectively. The PitVis-2023 Challenge therefore demonstrates that state-of-the-art computer vision models in minimally invasive surgery are transferable to a new dataset, with surgery-specific techniques used to enhance performance, progressing the field further. Benchmark results are provided in the paper, and the dataset is publicly available at: https://doi.org/10.5522/04/26531686.
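
Since the reported gains are in macro-F1, a short reminder of how that metric behaves may help. The sketch below computes the unweighted mean of per-class F1 scores; the step labels are made up for illustration and are not taken from the PitVis dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical per-frame step labels: a long step and a rare, short step.
truth = ["approach", "approach", "approach", "approach", "closure"]
always_common = ["approach"] * 5  # a model that never predicts the rare step

accuracy = sum(t == p for t, p in zip(truth, always_common)) / len(truth)
print(accuracy, macro_f1(truth, always_common))
```

A model that ignores the rare step still reaches 80% accuracy here but only 4/9 macro-F1, which is why the metric suits workflows with frequent, short step and instrument changes.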

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 自己監督型, 生成型学習によるバックドアディフェンス

Backdoor Defense through Self-Supervised and Generative Learning ( http://arxiv.org/abs/2409.01185v1 )

ライセンス: Link先を確認

Ivan Sabolić, Ivan Grubišić, Siniša Šegvić,

(参考訳) バックドア攻撃は、手作りのトリガーを導入し、対応するラベルを望ましいターゲットクラスに切り替えることで、トレーニングデータのごく一部を変更する。このようなデータのトレーニングは、選択されたテストサンプルに悪意のある推論を引き起こすバックドアを注入する。ほとんどの防衛は、差別的な学習手順の様々な修正を通じて、このような攻撃を緩和する。対照的に、自己教師付き表現空間におけるクラスごとの分布の生成モデルに基づくアプローチについて検討する。興味深いことに、これらの表現は最近のバックドア攻撃で保存されるか、ひどく乱される。どちらの場合も、クラスごとの生成モデルにより、有毒なデータを検出し、データセットをクリーン化することができます。実験により、クリーン化されたデータセットでのトレーニングは、攻撃の成功率を大幅に低減し、良心的な入力の精度を維持することが示された。

Backdoor attacks change a small portion of training data by introducing hand-crafted triggers and rewiring the corresponding labels towards a desired target class. Training on such data injects a backdoor which causes malicious inference in selected test samples. Most defenses mitigate such attacks through various modifications of the discriminative learning procedure. In contrast, this paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space. Interestingly, these representations get either preserved or heavily disturbed under recent backdoor attacks. In both cases, we find that per-class generative models allow us to detect poisoned data and cleanse the dataset. Experiments show that training on the cleansed dataset greatly reduces the attack success rate and retains the accuracy on benign inputs.
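
The per-class generative idea can be illustrated with a deliberately simple stand-in: fit one diagonal Gaussian per label on feature vectors and flag samples that some other class explains better than their own label. The paper works in a self-supervised representation space with its own generative models; the toy features, function names, and the Gaussian choice here are assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_diag_gaussians(features, labels, eps=1e-6):
    """Fit a per-class diagonal Gaussian (mean, variance per dimension)."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    models = {}
    for y, xs in by_class.items():
        dim = len(xs[0])
        mean = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        var = [sum((x[d] - mean[d]) ** 2 for x in xs) / len(xs) + eps
               for d in range(dim)]
        models[y] = (mean, var)
    return models

def log_likelihood(x, model):
    """Log-density of x under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def flag_suspicious(features, labels, models):
    """Indices whose features fit another class better than their own label."""
    flagged = []
    for i, (x, y) in enumerate(zip(features, labels)):
        best = max(models, key=lambda c: log_likelihood(x, models[c]))
        if best != y:
            flagged.append(i)
    return flagged

# Toy features: class 0 near (0, 0), class 1 near (5, 5).
feats = [(0.0, 0.1), (0.1, -0.1), (-0.1, 0.0), (0.05, 0.05),
         (5.0, 5.1), (4.9, 5.0), (5.1, 4.9), (5.0, 5.0),
         (0.0, 0.0)]            # looks like class 0 ...
labs = [0, 0, 0, 0, 1, 1, 1, 1,
        1]                      # ... but is labelled 1 (poisoned)

models = fit_diag_gaussians(feats, labs)
suspicious = flag_suspicious(feats, labs, models)
cleansed = [(x, y) for i, (x, y) in enumerate(zip(feats, labs))
            if i not in suspicious]
print(suspicious, len(cleansed))
```

This covers only the case where the representation of a poisoned sample is preserved (it still resembles its original class); the heavily disturbed case mentioned in the abstract would need an outlier test instead.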

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# ボソン・スピンモデルにおけるホレボ境界と客観性

Holevo bound and objectivity in the boson-spin model ( http://arxiv.org/abs/2409.01186v1 )

ライセンス: Link先を確認

Tae-Hun Lee, Jarosław K. Korbicz,

(参考訳) 量子系における客観的で古典的な性質の出現は、量子情報理論の現代言語で説明できる。本稿では、そのような分析の例を示す。我々は、量子チャネル理論をオープン量子系のボソン・スピンモデルに適用し、無反跳近似とフロケ理論を用いて、中央系の情報をその環境にブロードキャストするチャネルの容量を制限するホレボ量を計算する。本研究は、容量の初期成長が時間の2次であることを示す短時間領域と、漸近領域の両方を解析する。温度や環境のトンネルエネルギーなどのモデルパラメータへの複雑な依存性も解析し、例えばホレボ境界が最大に達する領域を示す。

The emergence of objective, classical properties in quantum systems can be described in the modern language of quantum information theory. In this work, we present an example of such an analysis. We apply quantum channel theory to a boson-spin model of open quantum systems and calculate, using the recoilless approximation and Floquet theory, the Holevo quantity, which bounds the capacity of the channel broadcasting information about the central system into its environment. We analyze both the short-time regime, showing an initial growth of the capacity that is quadratic in time, and the asymptotic regime. The complicated dependence on model parameters, such as the temperature and the tunneling energy of the environment, is also analyzed, showing, e.g., regimes where the Holevo bound reaches its maximum.
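
For reference, the Holevo quantity mentioned above has a standard textbook definition, reproduced here together with a simple two-state special case. The notation (ensemble $\{p_i, \rho_i\}$, overlap $c$) is generic and not taken from the paper.

```latex
% Holevo quantity of an ensemble {p_i, rho_i}; it upper-bounds the
% accessible (classical) information I(X:Y) extractable by any measurement.
\chi\bigl(\{p_i,\rho_i\}\bigr)
  = S\Bigl(\sum_i p_i \rho_i\Bigr) - \sum_i p_i\, S(\rho_i),
\qquad
S(\rho) = -\operatorname{Tr}\,\rho \log_2 \rho ,
\qquad
I(X{:}Y) \le \chi .

% Special case: two equiprobable pure states with overlap
% c = |\langle \psi_0 | \psi_1 \rangle|. The average state has
% eigenvalues (1 \pm c)/2, so
\chi = h\!\left(\frac{1+c}{2}\right),
\qquad
h(x) = -x \log_2 x - (1-x)\log_2(1-x).
```

In the two-state case, orthogonal states ($c = 0$) give the maximum $\chi = 1$ bit, while identical states ($c = 1$) give $\chi = 0$.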

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 空間モード分解法による単一光子超解光分光

Single-photon super-resolved spectroscopy from spatial-mode demultiplexing ( http://arxiv.org/abs/2409.01190v1 )

ライセンス: Link先を確認

Luigi Santamaria Amato, Fabrizio Sgobba, Deborah Pallotti, Cosmo Lupo,

(参考訳) サブ回折分解能を持つ非コヒーレント光の分光を実証する。原理実証実験では、回折限界以下で分離する不整点のような一対の源のスペクトルを解析する。この2つの源は惑星系を模しており、恒星の光源は明るく、惑星の光源は暗くなっている。 2つの画像が重なり合っているため、二次音源のスペクトル情報を取得することは困難である。この制限は、空間モードデマルチプレクシングに基づく構造化された測定を利用して解決され、光は横磁場のエルミート・ガウス成分で最初にソートされ、光子検出によって測定される。これにより、2つの源からの光子を効果的に分離することができます。太陽系外惑星の大気スペクトロスコピーを強化するための応用が提案されている。空間デマルチプレクシングに基づく超高分解能イメージングのいくつかの実験が過去数年間に行われ、有望な結果が得られた。ここでは、私たちの知識を最大限に活用するために、この概念を分光領域に拡張する。

We demonstrate spectroscopy of incoherent light with sub-diffraction resolution. In a proof-of-principle experiment we analyze the spectrum of a pair of incoherent point-like sources whose separation is below the diffraction limit. The two sources mimic a planetary system, with a brighter source for the star and a dimmer one for the planet. Acquiring spectral information about the secondary source is hard because the two images have a substantial overlap. This limitation is solved by leveraging a structured measurement based on spatial-mode demultiplexing, where light is first sorted in its Hermite-Gaussian components in the transverse field, then measured by photon detection. This allows us to effectively decouple the photons coming from the two sources. An application is suggested to enhance exoplanets' atmosphere spectroscopy. A number of experiments of super-resolution imaging based on spatial demultiplexing have been conducted in the past few years, with promising results. Here, for the first time to the best of our knowledge, we extend this concept to the domain of spectroscopy.
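
The role of the Hermite-Gaussian sorting step can be made explicit with a standard result for a Gaussian point-spread function, stated here under the assumption that the demultiplexer modes are matched to the PSF and aligned with the bright source; the symbols $\sigma$, $x_0$ and $Q$ are generic notation rather than the paper's.

```latex
% Field amplitude at the image plane for a point source displaced by x_0
% (the intensity profile has standard deviation sigma):
\psi_{x_0}(x) = (2\pi\sigma^2)^{-1/4}
  \exp\!\left[-\frac{(x-x_0)^2}{4\sigma^2}\right].

% Probability that a photon from this source is detected in the
% Hermite-Gaussian mode of order q:
P(q \mid x_0)
  = \bigl|\langle \mathrm{HG}_q \mid \psi_{x_0} \rangle\bigr|^2
  = e^{-Q}\,\frac{Q^{q}}{q!},
\qquad
Q = \frac{x_0^{2}}{4\sigma^{2}} .
```

An on-axis source ($x_0 = 0$) populates only the fundamental mode $q = 0$, so photons detected in the $q \ge 1$ channels come from the displaced, dimmer source, and it is their spectrum that can then be analyzed.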

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# CLIBE: トランスフォーマーベースNLPモデルにおける動的バックドアの検出

CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models ( http://arxiv.org/abs/2409.01193v1 )

ライセンス: Link先を確認

Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji,

(参考訳) バックドアはNLPモデルに注入され、入力テキストにトリガーと呼ばれる特定の機能が含まれており、攻撃者が秘密に選択した場合に誤動作を誘発する。静的テキストトリガで使用される固定語、フレーズ、文とは異なり、NLP動的バックドアは抽象的および潜在的なテキスト機能に関連する設計トリガーを攻撃し、従来の静的バックドア攻撃よりもかなりステルス性が高い。しかし、NLPバックドア検出に関する既存の研究は、主に静的バックドア攻撃に対する防御に焦点を当てているが、NLPモデルにおける動的バックドアの検出は明らかにされていない。本稿では, Transformer ベースの NLP モデルで動的バックドアを検出する最初のフレームワークである CLIBE を提案する。 CLIBEは、ターゲットラベルとして限られた数の参照サンプルを分類するように、注目層に最適化された重量摂動を組み込むことで、疑似トランスフォーマーモデルに「ファウショット摂動」を注入する。その後、CLIBEは、この数発の摂動の一般化能力を利用して、元のモデルが動的バックドアを含むかどうかを判断する。 3つの高度なNLP動的バックドア攻撃,2つの広く使用されているトランスフォーマーフレームワーク,および4つの実世界の分類タスクに対する広範囲な評価は,CLIBEの有効性を強く検証する。また,様々なアダプティブアタックに対するCLIBEの堅牢性を示す。さらに、CLIBEを用いて、Hugging Face上で49の人気のTransformerモデルを精査し、動的バックドアを含む確率の高いモデルを見つける。我々はHugging Faceにコンタクトを取り、このモデルのバックドア動作の詳細な証拠を提供した。さらに、CLIBEを拡張し、有害な振る舞いを示すように修正されたバックドアテキスト生成モデルを検出する。私たちの知る限り、CLIBEは、入力テストサンプルをトリガーすることなく、テキスト生成モデルのバックドアを検出することができる最初のフレームワークです。

Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static backdoor attacks. However, existing research on NLP backdoor detection primarily focuses on defending against static backdoor attacks, while detecting dynamic backdoors in NLP models remains largely unexplored. This paper presents CLIBE, the first framework to detect dynamic backdoors in Transformer-based NLP models. CLIBE injects a "few-shot perturbation" into the suspect Transformer model by crafting optimized weight perturbation in the attention layers to make the perturbed model classify a limited number of reference samples as a target label. Subsequently, CLIBE leverages the generalization ability of this few-shot perturbation to determine whether the original model contains a dynamic backdoor. Extensive evaluation on three advanced NLP dynamic backdoor attacks, two widely-used Transformer frameworks, and four real-world classification tasks strongly validates the effectiveness of CLIBE. We also demonstrate the robustness of CLIBE against various adaptive attacks. Furthermore, we employ CLIBE to scrutinize 49 popular Transformer models on Hugging Face and discover one exhibiting a high probability of containing a dynamic backdoor. We have contacted Hugging Face and provided detailed evidence of this model's backdoor behavior. Moreover, we extend CLIBE to detect backdoor text generation models modified to exhibit toxic behavior. To the best of our knowledge, CLIBE is the first framework capable of detecting backdoors in text generation models without access to trigger input test samples.

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 新生児脳における学習に基づく繊維配向分布推定における地表面構造の影響

Ground-truth effects in learning-based fiber orientation distribution estimation in neonatal brains ( http://arxiv.org/abs/2409.01195v1 )

ライセンス: Link先を確認

Rizhong Lin, Hamza Kebiri, Ali Gholipour, Yufei Chen, Jean-Philippe Thiran, Davood Karimi, Meritxell Bach Cuadra,

(参考訳) 拡散磁気共鳴イメージング(Diffusion Magnetic Resonance Imaging, DMRI)は、脳の微細構造を生体内で描写する非侵襲的な方法である。ファイバ配向分布(英: Fiber orientation distributions、FOD)は、ホワイトマターファイバーの構成をマッピングするのに広く用いられる数学的表現である。近年、深層ニューラルネットワークを用いたFOD推定は成功しており、特に拡散測定の少ない新生児では成功している。これらの方法は、主にマルチシェルの制約付き球状脱畳(MSMT-CSD)を用いて再構成されたターゲットFODに基づいて訓練されている。本稿では,MSMT-CSDとS3T-CSDによる制約付き球面デコンボリューション(SS3T-CSD)の両面において,U-Netアーキテクチャに基づく最先端モデルのトレーニングにより,この仮説を検証する。以上の結果より, SS3T-CSDとSS3T-CSDとの単繊維推定ボクセルの比率はMSMT-CSDよりも現実的であることが示唆された。さらに入力勾配方向の増大はMSMT-CSDよりもSS3T-CSDの性能を著しく向上させる。最後に、年齢領域シフト設定では、SS3T-CSDは年齢群全体で堅牢なパフォーマンスを維持しており、より正確な新生児脳画像撮影の可能性を示している。

Diffusion Magnetic Resonance Imaging (dMRI) is a non-invasive method for depicting brain microstructure in vivo. Fiber orientation distributions (FODs) are mathematical representations extensively used to map white matter fiber configurations. Recently, FOD estimation with deep neural networks has seen growing success, in particular, those of neonates estimated with fewer diffusion measurements. These methods are mostly trained on target FODs reconstructed with multi-shell multi-tissue constrained spherical deconvolution (MSMT-CSD), which might not be the ideal ground truth for developing brains. Here, we investigate this hypothesis by training a state-of-the-art model based on the U-Net architecture on both MSMT-CSD and single-shell three-tissue constrained spherical deconvolution (SS3T-CSD). Our results suggest that SS3T-CSD might be more suited for neonatal brains, given that the ratio between single and multiple fiber-estimated voxels with SS3T-CSD is more realistic compared to MSMT-CSD. Additionally, increasing the number of input gradient directions significantly improves performance with SS3T-CSD over MSMT-CSD. Finally, in an age domain-shift setting, SS3T-CSD maintains robust performance across age groups, indicating its potential for more accurate neonatal brain imaging.

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 電気・磁気・マルチポール相互作用のための統一偏極形式

A unifying polarization formalism for electric- and magnetic-multipole interactions ( http://arxiv.org/abs/2409.01197v1 )

ライセンス: Link先を確認

R. Casini, R. Manso Sainz, A. Lopez Ariste, N. Kaikati,

(参考訳) 偏極のための球面テンソル形式は、任意の順序の電気的および磁気的マルチポール遷移の処理に拡張する。我々は、原子系と偏光場との相互作用を記述する作用素のテンソル形式を導出するために、球面波の膨張に頼っており、これは自然界の偏光特性を記述する球面テンソルの導入につながっている。直接応用として、フォーマリズムは電気四極子転移における放射の散乱に影響を及ぼす放射異方性をモデル化し、磁場の存在下でのハンル効果をモデル化するために用いられる。

We extend the spherical tensorial formalism for polarization to the treatment of electric- and magnetic-multipole transitions of any order. We rely on the spherical-wave expansion to derive the tensor form of the operator describing the interaction of the atomic system with a polarized radiation field, which naturally leads to the introduction of spherical tensors describing the polarization properties of the interacting field. As a direct application, the formalism is used to model the radiation anisotropy affecting the scattering of radiation in an electric-quadrupole transition, and the associated Hanle effect in the presence of a magnetic field.

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# OD-VAE:潜時ビデオ拡散モデル改善のための全次元ビデオ圧縮機

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model ( http://arxiv.org/abs/2409.01199v1 )

ライセンス: Link先を確認

Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan,

(参考訳) 可変オートエンコーダ (VAE) は遅延表現に動画を圧縮し、遅延ビデオ拡散モデル (LVDM) に先行する重要なコンポーネントである。再現品質が同じであれば、ビデオに対するVAEの圧縮が十分であればなるほど、LVDMはより効率的になります。しかし、ほとんどのLVDMは、ビデオの圧縮が空間次元でのみ行われ、時間次元ではしばしば無視される2D画像VAEを使用している。正確な再現を約束しながら、より簡潔な潜在表現を得るために、VAE内のビデオの時間的圧縮を実行する方法はめったにない。このギャップを埋めるために、時間的・空間的に動画を圧縮できるOD-VAEという全次元圧縮VAEを提案する。 OD-VAEのより十分な圧縮は、ビデオ再構成に大きな課題をもたらすが、細かな設計によって高い再構成精度を達成することができる。映像再構成品質と圧縮速度のトレードオフを改善するために、OD-VAEの4つの変種を導入分析する。さらに、OD-VAEをより効率的にトレーニングするための新しいテール初期化を設計し、GPUメモリに制限のある任意の長さの動画をOD-VAEが扱えるようにするための新しい推論戦略を提案する。ビデオ再構成とLVDMに基づくビデオ生成に関する総合的な実験により,提案手法の有効性と有効性を示した。

The Variational Autoencoder (VAE), which compresses videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). For the same reconstruction quality, the more strongly the VAE compresses videos, the more efficient the LVDMs are. However, most LVDMs utilize a 2D image VAE, which compresses videos only in the spatial dimension and often ignores the temporal dimension. How to perform temporal compression of videos in a VAE to obtain more concise latent representations while still ensuring accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimension compression VAE, named OD-VAE, which can temporally and spatially compress videos. Although OD-VAE's stronger compression poses a great challenge to video reconstruction, it can still achieve high reconstruction accuracy thanks to its careful design. To obtain a better trade-off between video reconstruction quality and compression speed, four variants of OD-VAE are introduced and analyzed. In addition, a novel tail initialization is designed to train OD-VAE more efficiently, and a novel inference strategy is proposed to enable OD-VAE to handle videos of arbitrary length with limited GPU memory. Comprehensive experiments on video reconstruction and LVDM-based video generation demonstrate the effectiveness and efficiency of our proposed methods.

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# EnCLAP++: 自動オーディオキャプションパフォーマンスを最適化するためのEnCLAPフレームワークの分析

EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance ( http://arxiv.org/abs/2409.01201v1 )

ライセンス: Link先を確認

Jaeyeon Kim, Minjeon Jeon, Jaeyoon Jung, Sang Hoon Woo, Jinjoo Lee,

(参考訳) 本研究では,音声の自動キャプションにおける最先端モデルであるEnCLAPフレームワークの解析と最適化を目的とする。本研究では,音響エンコーダコンポーネントの変更の影響について検討し,異なるデータセットスケールでの事前学習について検討し,再分類方式の有効性について検討する。生成されたキャプションの広範な実験と定量的解析により,オリジナルをはるかに上回る拡張版であるEnCLAP++を開発した。

In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.

翻訳日:2024-09-06 07:01:54 公開日:2024-09-02

# 一般産業インテリジェンスに向けて:IIoT強化型連続型大規模モデルに関する調査

Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models ( http://arxiv.org/abs/2409.01207v1 )

ライセンス: Link先を確認

Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang, Weihua Li, Zuozhu Liu, Howard H. Yang, Guangjie Han,

(参考訳) 現在、Industrial Internet of Things(IIoT)のほとんどのアプリケーションは、依然としてCNNベースのニューラルネットワークに依存している。言語、ビジョン、マルチモーダルモデルを含むトランスフォーマーベースの大規模モデル(LM)は、AIGC(AIGC)において印象的な能力を示してきたが、検出、計画、制御といった産業分野への応用は依然として比較的限られている。産業環境に事前訓練されたLMを配置することは、タスクの複雑さ、データの多様性、ユーザ要求の動的性質などにより、安定性と可塑性の課題に直面することが多い。これらの課題に対処するために、事前学習と微調整の戦略と継続的学習は効果的なソリューションであることが証明され、モデルが動的要求に適応し、推論と意思決定能力を継続的に最適化できるようになりました。本稿では, GII における LM と GII 上の LM の2つの重要な領域に着目し, IIoT による汎用産業情報 (GII) への LM の統合について検討する。前者は産業用アプリケーションの課題に対して最適化されたソリューションを提供するためにLMを活用することに焦点を当て、後者は産業用デバイス、エッジコンピューティング、クラウドコンピューティングを含む協調シナリオにおけるLM学習と推論機能の継続的な最適化について研究している。本稿では,より汎用的で適応的な未来に向けて,GIIの総合的な理論枠組みと研究方向性を確立することを目的とした,GIIの今後の発展に関する知見を提供する。

Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based neural networks. Although Transformer-based large models (LMs), including language, vision, and multimodal models, have demonstrated impressive capabilities in AI-generated content (AIGC), their application in industrial domains, such as detection, planning, and control, remains relatively limited. Deploying pre-trained LMs in industrial environments often encounters the challenge of stability and plasticity due to the complexity of tasks, the diversity of data, and the dynamic nature of user demands. To address these challenges, the pre-training and fine-tuning strategy, coupled with continual learning, has proven to be an effective solution, enabling models to adapt to dynamic demands while continuously optimizing their inference and decision-making capabilities. This paper surveys the integration of LMs into IIoT-enhanced General Industrial Intelligence (GII), focusing on two key areas: LMs for GII and LMs on GII. The former focuses on leveraging LMs to provide optimized solutions for industrial application challenges, while the latter investigates continuous optimization of LMs learning and inference capabilities in collaborative scenarios involving industrial devices, edge computing, and cloud computing. This paper provides insights into the future development of GII, aiming to establish a comprehensive theoretical framework and research direction for GII, thereby advancing GII towards a more general and adaptive future.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# 欠測データ付き混合型データの統計的ジャンプモデル

Statistical Jump Model for Mixed-Type Data with Missing Data Imputation ( http://arxiv.org/abs/2409.01208v1 )

ライセンス: Link先を確認

Federico P. Cortese, Antonio Pievatolo,

(参考訳) 本稿では,混合型データに対する統計的ジャンプモデルを導入することで,時間的進化を伴う混合型データをクラスタリングすることの課題に対処する。この新しいフレームワークは、状態の持続性、解釈可能性の向上、状態スイッチの頻度の低減、および欠落したデータの効率的な処理を含む。このモデルは、状態条件の手段とモードで容易に解釈でき、実践者や政策立案者にはアクセス可能である。本研究では, 従来の大気質指標と比較して, 大気質の持続的な状態の推測において, その優位性を示すとともに, 大気質データへの実証的応用を通じて, 本手法の有効性を検証した。コントリビューションには、混合型時間クラスタリングの堅牢な方法、効果的なデータ管理の欠如、環境モニタリングの実践的洞察が含まれている。

In this paper, we address the challenge of clustering mixed-type data with temporal evolution by introducing the statistical jump model for mixed-type data. This novel framework incorporates regime persistence, enhancing interpretability and reducing the frequency of state switches, and efficiently handles missing data. The model is easily interpretable through its state-conditional means and modes, making it accessible to practitioners and policymakers. We validate our approach through extensive simulation studies and an empirical application to air quality data, demonstrating its superiority in inferring persistent air quality regimes compared to the traditional air quality index. Our contributions include a robust method for mixed-type temporal clustering, effective missing data management, and practical insights for environmental monitoring.
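
The persistence mechanism of a jump model can be sketched with a small dynamic program. Given a table of per-time, per-state losses, it finds the state sequence that minimizes the total loss plus a fixed penalty for every state switch. This shows only the generic jump-penalized assignment step; the mixed-type loss and the missing-data handling of the paper are not reproduced, and `fit_state_sequence` is a hypothetical name.

```python
def fit_state_sequence(loss, jump_penalty):
    """Minimise sum_t loss[t][s_t] + jump_penalty * (number of state switches).

    `loss[t][k]` is the cost of assigning time step t to state k.
    """
    n_states = len(loss[0])
    cost = list(loss[0])
    back = []
    for row in loss[1:]:
        prev_best = min(range(n_states), key=cost.__getitem__)
        new_cost, pointers = [], []
        for k in range(n_states):
            stay = cost[k]
            switch = cost[prev_best] + jump_penalty
            if stay <= switch:
                new_cost.append(stay + row[k])
                pointers.append(k)
            else:
                new_cost.append(switch + row[k])
                pointers.append(prev_best)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final state.
    state = min(range(n_states), key=cost.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical losses for 6 time steps and 2 states; step 2 is a noisy blip.
losses = [[0.1, 1.0], [0.2, 0.9], [0.6, 0.5],
          [0.1, 1.0], [1.0, 0.1], [0.9, 0.2]]

no_penalty = fit_state_sequence(losses, 0.0)
with_penalty = fit_state_sequence(losses, 1.0)
print(no_penalty, with_penalty)
```

With no penalty the sequence follows every small fluctuation; with a penalty of 1.0 the brief excursion at step 2 is absorbed and only the sustained change near the end remains, which is the reduced switching frequency the abstract refers to.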

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# MobileIQA:知識蒸留を用いた非参照画像品質評価のためのモバイルレベルのディバイスオピニオンネットワークのエクスプロイト

MobileIQA: Exploiting Mobile-level Diverse Opinion Network For No-Reference Image Quality Assessment Using Knowledge Distillation ( http://arxiv.org/abs/2409.01212v1 )

ライセンス: Link先を確認

Zewen Chen, Sunhan Xu, Yun Zeng, Haochen Guo, Jian Guo, Shuai Liu, Juan Wang, Bing Li, Weiming Hu, Dehua Liu, Hesong Li,

(参考訳) 高解像度(HR)画像の需要が高まる中、NR-IQA(No-Reference Image Quality Assessment)が注目されるようになっている。モバイルデバイス上でリアルタイムに画質を評価し、ユーザエクスペリエンスを向上できるためである。しかし、既存のNR-IQA法では、HR画像を小さな解像度にリサイズまたはトリミングすることが多く、重要な詳細が失われる。また、そのほとんどは計算量が多く、計算資源が限られているモバイル機器への応用を妨げている。これらの課題に対処するため,高解像度入力により画像の詳細を保存しながら,画像品質を効率的に評価する,軽量なバックボーンを用いた新しい手法であるMobileIQAを提案する。 MobileIQAは、多視点アテンション学習(MAL)モジュールを用いて多様な意見を捉え、データセットアノテーションプロセス中に異なるアノテータによって提供される主観的な意見をシミュレートする。モデルは教師モデルを使用して、知識蒸留を通して学生モデルの学習を誘導する。この方法は高い性能を維持しながら計算複雑性を著しく低減する。実験により、MobileIQAは、評価指標と計算効率において、新しいIQA法よりも優れていることが示された。コードは https://github.com/chencn2020/MobileIQA で入手できる。

With the rising demand for high-resolution (HR) images, No-Reference Image Quality Assessment (NR-IQA) is gaining more attention, as it can evaluate image quality in real time on mobile devices and enhance user experience. However, existing NR-IQA methods often resize or crop the HR images to a small resolution, which leads to a loss of important details. Moreover, most of them have high computational complexity, which hinders their application on mobile devices due to limited computational resources. To address these challenges, we propose MobileIQA, a novel approach that utilizes lightweight backbones to efficiently assess image quality while preserving image details through high-resolution input. MobileIQA employs the proposed multi-view attention learning (MAL) module to capture diverse opinions, simulating subjective opinions provided by different annotators during the dataset annotation process. The model uses a teacher model to guide the learning of a student model through knowledge distillation. This method significantly reduces computational complexity while maintaining high performance. Experiments demonstrate that MobileIQA outperforms novel IQA methods on evaluation metrics and computational efficiency. The code is available at https://github.com/chencn2020/MobileIQA.
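
As a rough illustration of teacher-guided training for a quality-score regressor, the sketch below mixes a supervised term (student versus human score) with an imitation term (student versus teacher). The abstract does not state the loss used by MobileIQA, so the mean-squared-error form, the weighting parameter `alpha`, and all numbers are assumptions.

```python
def distillation_loss(student_scores, teacher_scores, labels, alpha=0.5):
    """(1 - alpha) * MSE(student, labels) + alpha * MSE(student, teacher)."""
    n = len(labels)
    supervised = sum((s - y) ** 2 for s, y in zip(student_scores, labels)) / n
    imitation = sum((s - t) ** 2
                    for s, t in zip(student_scores, teacher_scores)) / n
    return (1 - alpha) * supervised + alpha * imitation

# Hypothetical normalised quality scores for two images.
student = [0.5, 0.5]
teacher = [0.5, 1.0]
human = [1.0, 1.0]

loss = distillation_loss(student, teacher, human, alpha=0.5)
print(loss)
```

Setting `alpha` to 0 recovers plain supervised training, and setting it to 1 makes the lightweight student imitate the teacher alone.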

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# スキュー特徴密度を考慮した教師付きパターン認識

Supervised Pattern Recognition Involving Skewed Feature Densities ( http://arxiv.org/abs/2409.01213v1 )

ライセンス: Link先を確認

Alexandre Benatti, Luciano da F. Costa,

(参考訳) パターン認識は、多くの科学および技術活動の基礎となる特に重要な課題である。同時に、パターン認識には、データ要素を表現する機能の選択や、可能な各変換など、いくつかの課題が含まれている。本研究は, 類似度指数に基づくユークリッド距離の分類ポテンシャルと相似類似度指数に基づく相似性指数を, k-neighbors による分類法を用いて比較した。重なりのない, あるいは重複しない, それぞれの密度を特徴とする2つの群が与えられた場合, ユークリッド距離に基づくk近傍法の性能を類似度指標として定量的に評価するために, 異なるタイプの変換が得られた。より具体的には、隣り合う2つの群の密度の間の交点を分類する精度が比較に考慮される。また,データ要素間の比較のシャープさが,各教師付き分類性能とは無関係であることが確認された。

Pattern recognition constitutes a particularly important task underlying a great deal of scientific and technological activities. At the same time, pattern recognition involves several challenges, including the choice of features to represent the data elements, as well as possible respective transformations. In the present work, the classification potential of the Euclidean distance and of a dissimilarity index based on the coincidence similarity index are compared by applying the k-neighbors supervised classification method to features resulting from several types of transformations of one- and two-dimensional symmetric densities. Given two groups characterized by respective densities without or with overlap, different types of respective transformations are obtained and employed to quantitatively evaluate the performance of k-neighbors methodologies based on the Euclidean distance and the coincidence similarity index. More specifically, the accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account for the comparison. Several interesting results are described and discussed, including the enhanced potential of the dissimilarity index for classifying datasets with right-skewed feature densities, as well as the identification that the sharpness of the comparison between data elements can be independent of the respective supervised classification performance.
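
The difference between the two comparisons can be seen in a tiny k-neighbors example. The sketch assumes non-negative features and takes the coincidence similarity to be the product of the Jaccard index (sum of minima over sum of maxima) and the interiority index (sum of minima over the smaller total); this form, the function names, and the data are assumptions for illustration and may differ in detail from the formulation used in the paper.

```python
import math
from collections import Counter

def coincidence_similarity(x, y):
    """Jaccard index times interiority index, for non-negative feature vectors."""
    mins = sum(min(a, b) for a, b in zip(x, y))
    maxs = sum(max(a, b) for a, b in zip(x, y))
    smaller = min(sum(x), sum(y))
    if maxs == 0 or smaller == 0:
        return 0.0
    return (mins / maxs) * (mins / smaller)

def coincidence_dissimilarity(x, y):
    return 1.0 - coincidence_similarity(x, y)

def euclidean(x, y):
    return math.dist(x, y)

def knn_predict(query, points, labels, k, dissimilarity):
    """Majority label among the k points least dissimilar to the query."""
    ranked = sorted(range(len(points)),
                    key=lambda i: dissimilarity(query, points[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Right-skewed toy feature: values spread out multiplicatively.
points = [[1.0], [2.0], [8.0], [16.0]]
labels = ["low", "low", "high", "high"]
query = [4.5]

by_euclid = knn_predict(query, points, labels, 1, euclidean)
by_coincidence = knn_predict(query, points, labels, 1, coincidence_dissimilarity)
print(by_euclid, by_coincidence)
```

The Euclidean distance compares absolute differences, so 4.5 is closest to 2.0; the coincidence-based index compares values proportionally, so 4.5 is closer to 8.0. On a right-skewed feature these two notions of proximity can lead to different decisions.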

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# PythonエコシステムにおけるSBOM生成ツール:詳細分析

SBOM Generation Tools in the Python Ecosystem: an In-Detail Analysis ( http://arxiv.org/abs/2409.01214v1 )

ライセンス: Link先を確認

Serena Cofano, Giacomo Benedetti, Matteo Dell'Amico,

(参考訳) SBOM(Software Bills of Materials)は、ソフトウェアを構成するコンポーネントをリストアップすることで透明性を向上させるもので、ソフトウェアサプライチェーン攻撃の実施問題に対する重要な対策である。 SBOM生成ツールは、プロジェクトソースファイルを取り込み、SBOMを出力として提供し、ソフトウェアエコシステムと相互作用する。 SBOMはセキュリティ実践者にとって大幅に改善されているが、完全かつ正しいSBOMを提供することは依然として未解決の問題である。本稿では,SBOMの完全性と正しさに影響を及ぼす問題の原因をPyPIエコシステムに焦点をあてて検討する。我々はCycloneDX標準を用いて4つの人気のあるSBOM生成ツールを分析する。私たちの分析では、依存関係のバージョン、メタデータファイル、リモート依存関係、オプションの依存関係に関する問題を取り上げています。さらに,PyPIエコシステムにおけるメタデータの標準が欠如していることから,体系的な問題を見出した。これにはメタデータファイルの存在の矛盾や、コンテンツのフォーマットのバリエーションが含まれている。

Software Bills of Material (SBOMs), which improve transparency by listing the components constituting software, are a key countermeasure to the mounting problem of Software Supply Chain attacks. SBOM generation tools take project source files and provide an SBOM as output, interacting with the software ecosystem. While SBOMs are a substantial improvement for security practitioners, providing a complete and correct SBOM is still an open problem. This paper investigates the causes of the issues affecting SBOM completeness and correctness, focusing on the PyPI ecosystem. We analyze four popular SBOM generation tools using the CycloneDX standard. Our analysis highlights issues related to dependency versions, metadata files, remote dependencies, and optional dependencies. Additionally, we identified a systematic issue with the lack of standards for metadata in the PyPI ecosystem. This includes inconsistencies in the presence of metadata files as well as variations in how their content is formatted.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# ESP-PCT:ポイントクラウドトランスにおける時間的・空間的冗長性の効率的な圧縮によるVRセマンティックパフォーマンスの向上

ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers ( http://arxiv.org/abs/2409.01216v1 )

ライセンス: Link先を確認

Luoyu Mei, Shuai Wang, Yun Cheng, Ruofeng Liu, Zhimeng Yin, Wenchao Jiang, Shuai Wang, Wei Gong,

(参考訳) セマンティック認識は仮想現実(VR)アプリケーションにおいて重要なものであり、没入的でインタラクティブな体験を可能にする。有望なアプローチは、ミリ波(mmWave)信号を利用して点雲を生成することである。しかし、現在のmmWaveポイントクラウドモデルの高い計算とメモリ要求は、その効率と信頼性を妨げている。この制限に対処するため,本論文では,VRアプリケーションに適した2段階のセマンティック認識フレームワークを備えた新しいEnhanced Semantic Performance Point Cloud TransformerであるESP-PCTを紹介する。 ESP-PCTは、センサポイントクラウドデータの精度を活用して意味認識プロセスを最適化し、ローカライゼーションとフォーカスの各ステージをエンドツーエンドで共同でトレーニングする。各種VRセマンティック認識条件でESP-PCTを評価し,認識効率の大幅な向上を示した。特に、ESP-PCTは93.2%の精度を達成しつつ、既存のPoint Transformerモデルと比較して計算要求(FLOP)を76.9%、メモリ使用量を78.2%削減している。これらの結果は、高い精度の達成と冗長性の削減により、VRセマンティック認識におけるESP-PCTの可能性を強調している。このプロジェクトのコードとデータは https://github.com/lymei-SEU/ESP-PCT で公開されている。

Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a novel Enhanced Semantic Performance Point Cloud Transformer with a two-stage semantic recognition framework tailored for VR applications. ESP-PCT takes advantage of the accuracy of sensory point cloud data and optimizes the semantic recognition process, where the localization and focus stages are trained jointly in an end-to-end manner. We evaluate ESP-PCT on various VR semantic recognition conditions, demonstrating substantial enhancements in recognition efficiency. Notably, ESP-PCT achieves a remarkable accuracy of 93.2% while simultaneously reducing the computational requirements (FLOPs) by 76.9% and memory usage by 78.2% compared to the existing Point Transformer model. These results underscore ESP-PCT's potential in VR semantic recognition by achieving high accuracy and reducing redundancy. The code and data of this project are available at https://github.com/lymei-SEU/ESP-PCT.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# 低資源テキストから音声への多言語学習戦略

A multilingual training strategy for low resource Text to Speech ( http://arxiv.org/abs/2409.01217v1 )

ライセンス: Link先を確認

Asma Amalas, Mounir Ghogho, Mohamed Chetouani, Rachid Oulad Haj Thami,

(参考訳) 近年, 音声合成技術は, ニューラルテキスト・トゥ・スピーチ(TTS)の進歩により, 高品質な音声合成を実現している。しかし、これらのTTSモデルは、作成にコストがかかりうる膨大なデータに依存しており、既存のすべての言語に拡張することは難しく、特に低リソース言語にはほとんど注目が払われていない。知識伝達のような技術により、データセット作成の負担を軽減することができる。そこで本稿では,ソーシャルメディアからのデータを小規模なTTSデータセットの構築に使用できるか,そして低リソース言語に対する言語間転移学習(TL)がこのタイプのデータで機能するか,という2つの側面について検討する。具体的には,単言語コーパスでの学習の代替として,多言語モデリングをどの程度活用できるかを評価する。そのために,対象とする低リソース言語のTTSモデルをトレーニングするために,外国語のデータをどのように選択し,プールするかを検討する。以上の結果から,多言語事前学習は単言語事前学習よりも,生成した音声の明瞭さと自然性を高めることが示唆された。

Recent advances in neural Text to Speech (TTS) have made it possible to produce high quality synthesised speech. However, such TTS models depend on extensive amounts of data that can be costly to produce and are hardly scalable to all existing languages, especially since little attention is given to low resource languages. With techniques such as knowledge transfer, the burden of creating datasets can be alleviated. In this paper, we therefore investigate two aspects: firstly, whether data from social media can be used for a small TTS dataset construction, and secondly, whether cross lingual transfer learning (TL) for a low resource language can work with this type of data. In this regard, we specifically assess to what extent multilingual modeling can be leveraged as an alternative to training on monolingual corpora. To do so, we explore how data from foreign languages may be selected and pooled to train a TTS model for a target low resource language. Our findings show that multilingual pre-training is better than monolingual pre-training at increasing the intelligibility and naturalness of the generated speech.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# 画像検索技術の概観:データ強化と逆学習アプローチ

A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches ( http://arxiv.org/abs/2409.01219v1 )

ライセンス: Link先を確認

Kim Jinwoo,

(参考訳) 画像検索はコンピュータビジョンにおいて重要な研究課題であり、オンライン製品検索からセキュリティ監視システムまで幅広い応用の見通しがある。近年,ディープラーニングの進歩により画像検索の精度と効率が著しく向上している。しかし、既存の手法はまだ多くの課題に直面しており、特に大規模データセット、クロスドメイン検索、照明の変動、閉塞、視点などの現実的な条件から生じるイメージ摂動を扱う。これらの課題に対処するため,画像検索の分野では,データ拡張手法や逆学習法が広く応用されている。データ拡張は、より多様なトレーニングサンプルを生成し、現実世界のバリエーションをシミュレートし、オーバーフィッティングを減らすことで、モデルの一般化能力と堅牢性を高める。一方、敵の攻撃と防衛は、潜在的な攻撃に対するモデルの堅牢性を改善し、実用的な応用における信頼性を確保するために、訓練中に摂動を導入する。本稿では,画像検索における最新の研究成果を包括的に要約し,検索性能向上におけるデータ強化と対人学習技術の役割に着目した。今後の方向性や潜在的な課題についても論じる。

Image retrieval is a crucial research topic in computer vision, with broad application prospects ranging from online product searches to security surveillance systems. In recent years, the accuracy and efficiency of image retrieval have significantly improved due to advancements in deep learning. However, existing methods still face numerous challenges, particularly in handling large-scale datasets, cross-domain retrieval, and image perturbations that can arise from real-world conditions such as variations in lighting, occlusion, and viewpoint. Data augmentation techniques and adversarial learning methods have been widely applied in the field of image retrieval to address these challenges. Data augmentation enhances the model's generalization ability and robustness by generating more diverse training samples, simulating real-world variations, and reducing overfitting. Meanwhile, adversarial attacks and defenses introduce perturbations during training to improve the model's robustness against potential attacks, ensuring reliability in practical applications. This review comprehensively summarizes the latest research advancements in image retrieval, with a particular focus on the roles of data augmentation and adversarial learning techniques in enhancing retrieval performance. Future directions and potential challenges are also discussed.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# THInC: 計算論的ユーモア検出のための理論駆動型フレームワーク

THInC: A Theory-Driven Framework for Computational Humor Detection ( http://arxiv.org/abs/2409.01232v1 )

ライセンス: Link先を確認

Victor De Marez, Thomas Winters, Ayla Rigouts Terryn,

(参考訳) ユーモアは、社会的な関わりにおいて重要な役割を果たすことから、人間のコミュニケーションと認知の基本的な側面である。ユーモアに関する理論は何世紀にもわたって進化してきたが、単一の総合的なユーモア理論についてはまだ合意が得られていない。同様に、大規模言語モデルの最近の進歩にもかかわらず、コンピュータでユーモアを認識することは依然として重要な課題である。さらに、ユーモアを検出するためのほとんどの計算手法は、既存のユーモア理論に基づいていない。本稿では、THInC(Theory-driven Humor Interpretation and Classification)と呼ばれる、複数のユーモア理論に基づく解釈可能なユーモア分類フレームワークを作成することにより、ユーモア理論研究と計算ユーモア検出の間の長年のギャップを埋めることに貢献する。 THInCは、それぞれ異なるユーモア理論を表す解釈可能なGA2M分類器をアンサンブルする。我々は、理論の異なる側面を定量的に反映するプロキシ特徴量を能動的に作成するための透明なフローを設計した。このフレームワークの実装は、F1スコア0.85を達成する。フレームワークの連想的解釈可能性により、プロキシの有効性の分析、ジョークの特徴と理論のアライメント、グローバルに貢献する特徴の同定が可能になる。本稿は,様々なユーモア理論から情報を得たユーモア検出フレームワークの構築に向けた先駆的な取り組みであり、理論駆動型ユーモア分類の今後の発展のための基盤を提供する。また、ユーモア理論を定量的に自動比較する第一歩としても機能する。

Humor is a fundamental aspect of human communication and cognition, as it plays a crucial role in social engagement. Although theories about humor have evolved over centuries, there is still no agreement on a single, comprehensive humor theory. Likewise, computationally recognizing humor remains a significant challenge despite recent advances in large language models. Moreover, most computational approaches to detecting humor are not based on existing humor theories. This paper contributes to bridging this long-standing gap between humor theory research and computational humor detection by creating an interpretable framework for humor classification, grounded in multiple humor theories, called THInC (Theory-driven Humor Interpretation and Classification). THInC ensembles interpretable GA2M classifiers, each representing a different humor theory. We engineered a transparent flow to actively create proxy features that quantitatively reflect different aspects of theories. An implementation of this framework achieves an F1 score of 0.85. The associative interpretability of the framework enables analysis of proxy efficacy, alignment of joke features with theories, and identification of globally contributing features. This paper marks a pioneering effort in creating a humor detection framework that is informed by diverse humor theories and offers a foundation for future advancements in theory-driven humor classification. It also serves as a first step in automatically comparing humor theories in a quantitative manner.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# SoK: 自動運転車の画像処理パイプラインのセキュリティ

SoK: Security of the Image Processing Pipeline in Autonomous Vehicles ( http://arxiv.org/abs/2409.01234v1 )

ライセンス: Link先を確認

Michael Kühr, Mohammad Hamad, Pedram MohajerAnsari, Mert D. Pesé, Sebastian Steinhorst,

(参考訳) カメラは自動運転車にとって重要なセンサーである。それらは、知覚を含む多くの安全クリティカルなタスクに不可欠な画像をキャプチャする。これらのイメージを処理するには、複数のレイヤを持つ複雑なパイプラインを使用する。このパイプラインに対するセキュリティ攻撃は、乗客の安全とシステムパフォーマンスに深刻な影響を及ぼす可能性がある。しかし、多くの攻撃はパイプラインの異なるレイヤを見落としており、その実現可能性と影響は様々である。画像処理パイプラインの品質と堅牢性を改善する研究は行われているが、これらの取り組みは、その潜在的な相乗効果を意識せずに、しばしばセキュリティ研究と並行して進められている。本研究では,自律走行車における画像処理パイプラインのセキュリティとロバスト性の研究を組み合わせることで,このギャップを埋めることを目的とする。我々は,自動車セキュリティ標準ISO 21434を用いて攻撃のリスクを分類し,システム全体のセキュリティのためにすべてのレイヤを検討する必要性を強調する。我々はまた、既存の堅牢性の研究が攻撃の影響を軽減するのにどのように役立つかを示し、現在の研究ギャップに対処する。最後に、すべての層にまたがる様々なパラメータに影響を及ぼすことができる組込みテストベッドを提案し、研究者が異なる防御戦略の効果と攻撃の影響を分析できるようにする。ユースケース分析によりこのようなテスト環境の重要性を実証し,ロバスト性関連の研究の一例として,HDRイメージングを用いてブラインディング攻撃を緩和する方法を示す。

Cameras are crucial sensors for autonomous vehicles. They capture images that are essential for many safety-critical tasks, including perception. To process these images, a complex pipeline with multiple layers is used. Security attacks on this pipeline can severely affect passenger safety and system performance. However, many attacks overlook different layers of the pipeline, and their feasibility and impact vary. While there has been research to improve the quality and robustness of the image processing pipeline, these efforts often work in parallel with security research, without much awareness of their potential synergy. In this work, we aim to bridge this gap by combining security and robustness research for the image processing pipeline in autonomous vehicles. We classify the risk of attacks using the automotive security standard ISO 21434, emphasizing the need to consider all layers for overall system security. We also demonstrate how existing robustness research can help mitigate the impact of attacks, addressing the current research gap. Finally, we present an embedded testbed that can influence various parameters across all layers, allowing researchers to analyze the effects of different defense strategies and attack impacts. We demonstrate the importance of such a test environment through a use-case analysis and show how blinding attacks can be mitigated using HDR imaging as an example of robustness-related research.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# MRIおよびメタボロミクスに基づく年齢スコアは、多コホート・フェデレーション・ラーニングによって示される死亡予測に相乗的に作用する

MRI-based and metabolomics-based age scores act synergetically for mortality prediction shown by multi-cohort federated learning ( http://arxiv.org/abs/2409.01235v1 )

ライセンス: Link先を確認

Pedro Mateus, Swier Garst, Jing Yu, Davy Cats, Alexander G. J. Harms, Mahlet Birhanu, Marian Beekman, P. Eline Slagboom, Marcel Reinders, Jeroen van der Grond, Andre Dekker, Jacobus F. A. Jansen, Magdalena Beran, Miranda T. Schram, Pieter Jelle Visser, Justine Moonen, Mohsen Ghanbari, Gennady Roshchupkin, Dina Vojinovic, Inigo Bermejo, Hailiang Mei, Esther E. Bron,

(参考訳) 生物学的年齢スコアは、生理的バイオマーカーに基づいて暦年齢を推定することにより、老化を特徴づける新しいツールである。様々なスコアが老化関連の結果と関連している。本研究では、脳MRI画像に基づく年齢スコア(BrainAge)とメタボロミクスバイオマーカーに基づく年齢スコア(MetaboAge)の関係を評価した。我々は3つのコホートでBrainAgeを推定するために、連合型ディープラーニングモデルを訓練した。フェデレートされたBrainAgeモデルでは,コホート全体の年齢予測誤差が局所訓練モデルよりも有意に低かった。コホート間の年齢間隔を調和させることにより、BrainAgeの精度がさらに向上した。その後,連合型の関連解析と生存分析を用いてBrainAgeとMetaboAgeを比較した。その結果,BrainAgeとMetaboAgeの関連は小さく、また両スコアを組み合わせた場合、個々のスコアよりも死亡までの時間に対する予測値が高いことが示された。したがって本研究は、2つの老化スコアが老化過程の異なる側面を捉えていることを示唆している。

Biological age scores are an emerging tool to characterize aging by estimating chronological age based on physiological biomarkers. Various scores have shown associations with aging-related outcomes. This study assessed the relation between an age score based on brain MRI images (BrainAge) and an age score based on metabolomic biomarkers (MetaboAge). We trained a federated deep learning model to estimate BrainAge in three cohorts. The federated BrainAge model yielded significantly lower error for age prediction across the cohorts than locally trained models. Harmonizing the age interval between cohorts further improved BrainAge accuracy. Subsequently, we compared BrainAge with MetaboAge using federated association and survival analyses. The results showed a small association between BrainAge and MetaboAge as well as a higher predictive value for the time to mortality of both scores combined than for the individual scores. Hence, our study suggests that both aging scores capture different aspects of the aging process.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# 信頼に値するハイパースペクトル画像分類のための空間認識コンフォーマル予測

Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification ( http://arxiv.org/abs/2409.01236v1 )

ライセンス: Link先を確認

Kangdao Liu, Tianhao Sun, Hao Zeng, Yongshan Zhang, Chi-Man Pun, Chi-Man Vong,

(参考訳) ハイパースペクトル画像(HSI)分類では、各ピクセルに特定のラベルを割り当て、様々な土地被覆カテゴリを識別する。深層分類器はこの分野で高い予測精度を示してきたが、その不確実性を定量化することは重要な課題であり、重要な文脈での応用を妨げる。本研究ではまず,HSI分類の文脈において,不確実性定量化の新たな手法である Conformal Prediction (CP) の適用性について理論的に評価する。次に、信頼に値する予測セットをHSI分類器に提供するコンフォーマル手順を提案し、これらのセットが真のラベルをユーザ指定の確率で含むことを保証するカバレッジ保証を提供する。これを基盤として,空間相関の高い画素の非適合度スコアを集約することで、HSIに固有の必須空間情報を組み込んだ Spatial-Aware Conformal Prediction (SACP) を導入する。理論的および実証的な結果は、SACP が HSI 分類において標準 CP より優れていることを示している。ソースコードは https://github.com/J4ckLiu/SACP でアクセスできる。

Hyperspectral image (HSI) classification involves assigning specific labels to each pixel to identify various land cover categories. Although deep classifiers have shown high predictive accuracy in this field, quantifying their uncertainty remains a significant challenge, which hinders their application in critical contexts. This study first theoretically evaluates the applicability of Conformal Prediction (CP), an emerging technique for uncertainty quantification, in the context of HSI classification. We then propose a conformal procedure that provides HSI classifiers with trustworthy prediction sets, offering coverage guarantees that ensure these sets contain the true labels with a user-specified probability. Building on this foundation, we introduce Spatial-Aware Conformal Prediction (SACP), which incorporates essential spatial information inherent in HSIs by aggregating non-conformity scores of pixels with high spatial correlation. Both theoretical and empirical results demonstrate that SACP outperforms standard CP in HSI classification. The source code is accessible at https://github.com/J4ckLiu/SACP.
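
As a rough illustration of the standard CP baseline mentioned in this abstract, the sketch below implements plain split conformal prediction for a classifier that outputs per-class probabilities. It is not the authors' SACP code: the score (one minus the probability of the true label) is a common choice assumed here, the function names are hypothetical, and the spatial aggregation step is not reproduced because the abstract does not specify it.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold computed from a held-out calibration set."""
    # Non-conformity score: one minus the probability given to the true label.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """All labels whose non-conformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= q_hat]


# Usage with a toy two-class calibration set.
cal_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
cal_labels = [0, 0, 1]
q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
labels = prediction_set([0.85, 0.15], q_hat)
```

With exchangeable calibration and test data, sets built this way contain the true label with probability at least 1 - alpha. A spatially aware variant would modify the per-pixel scores using neighboring pixels before thresholding.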

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# CyberCortex.AI: 自律ロボットと複雑自動化のためのAIベースのオペレーティングシステム

CyberCortex.AI: An AI-based Operating System for Autonomous Robotics and Complex Automation ( http://arxiv.org/abs/2409.01241v1 )

ライセンス: Link先を確認

Sorin Grigorescu, Mihai Zaha,

(参考訳) 自律型ロボットと複雑な自動化アプリケーションを制御するための基盤となるフレームワークは、知覚制御タスクをスケジューリングでき、他のロボットピアやリモートクラウドコンピュータへのリアルタイムデータ通信を提供するオペレーティングシステム(OS)である。本稿では、異種AIベースのロボティクスと複雑な自動化アプリケーションを実現するために設計されたロボットOSであるCyberCortex.AIを紹介する。 CyberCortex.AIは非集中型の分散OSで、ロボット同士の対話やクラウド上の高性能コンピュータ(HPC)との通信を可能にする。ロボットのセンサーと制御データは、AIアルゴリズムのトレーニングを目的としてHPCシステムに向けてストリームされ、トレーニングされたアルゴリズムはその後ロボットにデプロイされる。ロボットの各機能(例えば、知覚データ取得、経路計画、動作制御など)は、インターネットを介して共有されるいわゆるDataBlock of Filtersの中で実行され、各フィルタはロボット自身の上でローカルに、あるいは別のロボットシステム上でリモートに計算される。データは、いわゆる Temporal Addressable Memory (TAM) を通じて格納・アクセスされ、TAMは各フィルタの入力と出力の間のゲートウェイとして機能する。 CyberCortex.AIには2つの主要なコンポーネントがある。 i) ロボットの組み込みハードウェア上で動作するDataBlockのリアルタイム実装であるCyberCortex.AI.inferenceシステム ii) クラウド上のHPCコンピュータ上で動作し、AIアルゴリズムの設計、トレーニング、デプロイに使用されるCyberCortex.AI.dojo。本稿では,2つの協調ロボティクスアプリケーションを用いて、提案手法の定量的および定性的な性能解析を示す。 i) Unitree A1脚ロボットとAnafi Parrot 4Kドローンをベースとした森林火災防止システム ii) 協調的な知覚と動作制御にCyberCortex.AIを用いる自律走行システム。

The underlying framework for controlling autonomous robots and complex automation applications is an Operating System (OS) capable of scheduling perception-and-control tasks, as well as providing real-time data communication to other robotic peers and remote cloud computers. In this paper, we introduce CyberCortex.AI, a robotics OS designed to enable heterogeneous AI-based robotics and complex automation applications. CyberCortex.AI is a decentralized distributed OS which enables robots to talk to each other, as well as to High Performance Computers (HPC) in the cloud. Sensory and control data from the robots is streamed towards HPC systems with the purpose of training AI algorithms, which are afterwards deployed on the robots. Each functionality of a robot (e.g. sensory data acquisition, path planning, motion control, etc.) is executed within a so-called DataBlock of Filters shared through the internet, where each filter is computed either locally on the robot itself, or remotely on a different robotic system. The data is stored and accessed via a so-called Temporal Addressable Memory (TAM), which acts as a gateway between each filter's input and output. CyberCortex.AI has two main components: i) the CyberCortex.AI.inference system, which is a real-time implementation of the DataBlock running on the robots' embedded hardware, and ii) the CyberCortex.AI.dojo, which runs on an HPC computer in the cloud and is used to design, train and deploy AI algorithms. We present a quantitative and qualitative performance analysis of the proposed approach using two collaborative robotics applications: i) a forest fires prevention system based on a Unitree A1 legged robot and an Anafi Parrot 4K drone, as well as ii) an autonomous driving system which uses CyberCortex.AI for collaborative perception and motion control.

翻訳日:2024-09-06 06:47:21 公開日:2024-09-02

# 符号摂動和法のサンプル複雑度

Sample Complexity of the Sign-Perturbed Sums Method ( http://arxiv.org/abs/2409.01243v1 )

ライセンス: Link先を確認

Szabolcs Szentpéteri, Balázs Csanád Csáji,

(参考訳) 独立かつ対称な雑音項といった緩やかな統計的仮定の下で,真のシステムパラメータに対する厳密で非漸近的な信頼領域を構成するSign-Perturbed Sums(SPS)法のサンプル複雑性について検討する。 SPSの標準的なバージョンは線形回帰問題を扱うが、閉ループの設定を含む確率線形(動的)システムや、非線形および非パラメトリック問題にも一般化できる。この手法の強い一致性は厳密に証明されているが、アルゴリズムのサンプル複雑性はこれまでスカラー線形回帰問題に対してのみ解析されていた。本稿では,一般の線形回帰問題に対するSPSのサンプル複雑性について検討する。有限のサンプルサイズに対するSPS信頼領域の直径について高確率の上界を確立し,SPS領域が古典的な漸近的信頼楕円体と同じ最適な速度で縮小することを示す。最後に,SPS信頼領域の理論的な上界と経験的な大きさの差について実験的に検討する。

We study the sample complexity of the Sign-Perturbed Sums (SPS) method, which constructs exact, non-asymptotic confidence regions for the true system parameters under mild statistical assumptions, such as independent and symmetric noise terms. The standard version of SPS deals with linear regression problems, however, it can be generalized to stochastic linear (dynamical) systems, even with closed-loop setups, and to nonlinear and nonparametric problems, as well. Although the strong consistency of the method was rigorously proven, the sample complexity of the algorithm was only analyzed so far for scalar linear regression problems. In this paper we study the sample complexity of SPS for general linear regression problems. We establish high probability upper bounds for the diameters of SPS confidence regions for finite sample sizes and show that the SPS regions shrink at the same, optimal rate as the classical asymptotic confidence ellipsoids. Finally, the difference between the theoretical bounds and the empirical sizes of SPS confidence regions is investigated experimentally.
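
For readers unfamiliar with SPS, a minimal sketch of the membership test for scalar linear regression (y_t = phi_t * theta + noise) may help. It is a simplification under stated assumptions, not the authors' implementation: ties are resolved in favor of inclusion instead of by random tie-breaking, the positive normalization factor is dropped because it does not change the ranking in the scalar case, and the function name and defaults are hypothetical.

```python
import random


def sps_contains(theta, phis, ys, m=20, q=1, seed=0):
    """Return True if theta lies in the SPS region with confidence 1 - q/m."""
    # The same seed must be used for every theta so that all candidates are
    # tested against the same random signs; this is what defines one region.
    rng = random.Random(seed)
    # Regressor times residual at the candidate parameter.
    terms = [phi * (y - phi * theta) for phi, y in zip(phis, ys)]
    # Reference sum without sign perturbation; squared so only magnitude counts.
    reference = sum(terms) ** 2
    perturbed = []
    for _ in range(m - 1):
        signs = [rng.choice((-1, 1)) for _ in terms]
        perturbed.append(sum(s * t for s, t in zip(signs, terms)) ** 2)
    # Rank of the reference among all m values (1 = smallest).
    rank = 1 + sum(1 for value in perturbed if value < reference)
    return rank <= m - q


# Usage: noisy observations of y = 2 * phi.
noise = random.Random(1)
phis = [0.5 + 0.1 * i for i in range(30)]
ys = [2.0 * phi + noise.uniform(-0.5, 0.5) for phi in phis]
least_squares = sum(p * y for p, y in zip(phis, ys)) / sum(p * p for p in phis)
inside = sps_contains(least_squares, phis, ys)
outside = sps_contains(100.0, phis, ys)
```

The least-squares estimate makes the reference sum vanish, so it always belongs to the region, while a parameter far from the data gives residuals of a single sign and a reference sum that dominates the sign-perturbed ones.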

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# 安全強化学習における安全な探索の再考

Revisiting Safe Exploration in Safe Reinforcement learning ( http://arxiv.org/abs/2409.01245v1 )

ライセンス: Link先を確認

David Eckel, Baohe Zhang, Joschka Bödecker,

(参考訳) 安全強化学習(SafeRL)は、安全という概念で標準的な強化学習を拡張したものであり、安全は通常、軌跡の期待コストリターンが設定された上限を下回るという制約によって定義される。しかし、この指標はコストがどのように蓄積するかを区別できず、まれな重大コストイベントを頻繁な軽度イベントと同等に扱うため、よりリスクの高い振る舞いを招き、安全でない探索をもたらす可能性がある。本研究では, 安全でないステップの重大度をその連続的発生に基づいて評価することで、トレーニング中の安全性を扱う新たな指標である expected maximum consecutive cost steps (EMCC) を導入する。この指標は特に、長期間の安全違反と時折の安全違反の区別に有効である。 EMCCをオンポリシーおよびオフポリシーのアルゴリズムに適用し,その安全な探索能力をベンチマークする。最後に,一連のベンチマークで本指標を検証し,アルゴリズム設計のための高速な評価を可能にする,新しい軽量ベンチマークタスクを提案する。

Safe reinforcement learning (SafeRL) extends standard reinforcement learning with the idea of safety, where safety is typically defined through the constraint of the expected cost return of a trajectory being below a set limit. However, this metric fails to distinguish how costs accrue, treating infrequent severe cost events as equal to frequent mild ones, which can lead to riskier behaviors and result in unsafe exploration. We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training by assessing the severity of unsafe steps based on their consecutive occurrence. This metric is particularly effective for distinguishing between prolonged and occasional safety violations. We apply EMCC to both on- and off-policy algorithms to benchmark their safe exploration capability. Finally, we validate our metric through a set of benchmarks and propose a new lightweight benchmark task, which allows fast evaluation for algorithm design.
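
One plausible reading of the metric's name can be sketched as follows. The sketch assumes that a step counts as unsafe when its cost is positive and that the expectation is estimated by averaging over recorded episodes; the paper's exact definition may differ, and the function names are hypothetical.

```python
def max_consecutive_cost_steps(costs):
    """Length of the longest run of consecutive steps with positive cost."""
    longest = current = 0
    for cost in costs:
        current = current + 1 if cost > 0 else 0
        longest = max(longest, current)
    return longest


def emcc(episodes):
    """Sample mean of the longest unsafe run over a batch of episodes."""
    return sum(max_consecutive_cost_steps(costs) for costs in episodes) / len(episodes)


# Usage: two episodes with the same total cost but different structure.
scattered = [0, 1, 0, 1, 0, 1]
prolonged = [0, 1, 1, 1, 0, 0]
estimate = emcc([scattered, prolonged])
```

Both example episodes have a cumulative cost of 3, so a constraint on the cost return cannot tell them apart, whereas the longest unsafe run is 1 for the first and 3 for the second.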

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# 連続波場を用いた水素充填中空コアファイバの周波数変換

Frequency conversion in a hydrogen-filled hollow-core fiber using continuous-wave fields ( http://arxiv.org/abs/2409.01246v1 )

ライセンス: Link先を確認

Anica Hamer, Frank Vewinger, Thorsten Peters, Michael H. Frosz, Simon Stellmer,

(参考訳) 光ファイバーに基づく大規模量子ネットワークでは、光子はいわゆるフライング量子ビットとして情報の基本キャリアである。これらはまた、可視または近赤外波長で動作する原子と固体のプラットフォームからなるハイブリッドアーキテクチャの異なるコンポーネント間の相互接続や、テレコムバンド内の光リンクとして機能する。量子周波数変換は、その量子状態を保ちながら単一の光子の色を変える経路である。現在、このプロセスには非線形結晶が使用されている。しかし、その性能は、受信帯域幅、チューニング可能性、偏光感度、および望ましくない背景放射によって制限される。有望な代替手段は、気体中の刺激されたラマン散乱に基づいている。ここでは,水素充填反共振中空コアファイバの偏光保存周波数変換について述べる。このアプローチは光ファイバーネットワークへのシームレスな統合と単一エミッタへのインタフェースを約束する。パルスポンプ場を用いた関連する実験とは違い、2つのコヒーレント連続波ポンプ場を利用する。

In large-area quantum networks based on optical fibers, photons are the fundamental carriers of information as so-called flying qubits. They may also serve as the interconnect between different components of a hybrid architecture, which might comprise atomic and solid state platforms operating at visible or near-infrared wavelengths, as well as optical links in the telecom band. Quantum frequency conversion is the pathway to change the color of a single photon while preserving its quantum state. Currently, nonlinear crystals are utilized for this process. However, their performance is limited by their acceptance bandwidth, tunability, polarization sensitivity, as well as undesired background emission. A promising alternative is based on stimulated Raman scattering in gases. Here, we demonstrate polarization-preserving frequency conversion in a hydrogen-filled anti-resonant hollow-core fiber. This approach holds promises for seamless integration into optical fiber networks and interfaces to single emitters. Disparate from related experiments that employ a pulsed pump field, we here take advantage of two coherent continuous-wave pump fields.

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# 大規模言語モデルにおけるリスク評価のための会話複雑度

Conversational Complexity for Assessing Risk in Large Language Models ( http://arxiv.org/abs/2409.01247v1 )

ライセンス: Link先を確認

John Burden, Manuel Cebrian, Jose Hernandez-Orallo,

(参考訳) 大きな言語モデル(LLM)は二重用途ジレンマを示し、特に対話的相互作用を通じて、有害な可能性を持ちながら有益なアプリケーションを可能にする。様々な安全対策にもかかわらず、先進的なLLMは脆弱なままである。ケビン・ルースのBingとの有名な会話は、長期にわたる対話の後有害なアウトプットを引き起こした。これは、同様のコンテンツをより簡単に作成できる単純な初期のジェイルブレイクとは対照的であり、疑問を提起する: LLMから有害な情報を引き出すのに、どのくらいの会話努力が必要か? 本稿では,特定の応答を得るために使用される会話長を定量化する会話長(CL)と,その応答につながるユーザの命令シーケンスのコルモゴロフ複雑性として定義される会話複雑度(CC)の2つの尺度を提案する。 Kolmogorov複雑性の計算不能性に対処するため,リファレンスLCMを用いてCCを近似し,ユーザ命令の圧縮性を評価する。このアプローチを大規模な赤チームデータセットに適用し、有害で無害な会話の長さと複雑さの統計的分布を定量的に分析する。我々の経験から、この分布分析とCCの最小化はAIの安全性を理解するための貴重なツールであり、有害な情報のアクセシビリティに関する洞察を与えてくれることが示唆されている。この研究は、LLMの安全性に対する新たな視点の基礎を確立し、害を与える経路のアルゴリズム的な複雑さを中心にしている。

Large Language Models (LLMs) present a dual-use dilemma: they enable beneficial applications while harboring potential for harm, particularly through conversational interactions. Despite various safeguards, advanced LLMs remain vulnerable. A watershed case was Kevin Roose's notable conversation with Bing, which elicited harmful outputs after extended interaction. This contrasts with simpler early jailbreaks that produced similar content more easily, raising the question: How much conversational effort is needed to elicit harmful information from LLMs? We propose two measures: Conversational Length (CL), which quantifies the conversation length used to obtain a specific response, and Conversational Complexity (CC), defined as the Kolmogorov complexity of the user's instruction sequence leading to the response. To address the incomputability of Kolmogorov complexity, we approximate CC using a reference LLM to estimate the compressibility of user instructions. Applying this approach to a large red-teaming dataset, we perform a quantitative analysis examining the statistical distribution of harmful and harmless conversational lengths and complexities. Our empirical findings suggest that this distributional analysis and the minimisation of CC serve as valuable tools for understanding AI safety, offering insights into the accessibility of harmful information. This work establishes a foundation for a new perspective on LLM safety, centered around the algorithmic complexity of pathways to harm.
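
The two measures can be illustrated with a small sketch. It assumes that CL is counted in characters of user input and that CC is approximated by the code length, in bits, that a reference language model assigns to the user's instructions (the sum of negative log2 token probabilities). The per-token probabilities are hypothetical inputs here; the paper's actual estimation pipeline is not reproduced.

```python
import math


def conversational_length(user_turns):
    """Total number of characters across the user's turns (assumed unit)."""
    return sum(len(turn) for turn in user_turns)


def conversational_complexity_bits(token_probs_per_turn):
    """Code length of the user's instructions under a reference model, in bits."""
    return sum(-math.log2(p) for turn in token_probs_per_turn for p in turn)


# Usage: probabilities a hypothetical reference model assigned to each token.
turns = ["hello", "tell me more"]
token_probs = [[0.5, 0.25], [0.125]]
length = conversational_length(turns)
complexity = conversational_complexity_bits(token_probs)
```

Instructions that the reference model finds predictable compress well and receive a low complexity, while unusual or elaborate prompts cost more bits even when their raw length is similar.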

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# 敵対的プルーニング: 敵対的ロバスト性のためのプルーニング手法のサーベイとベンチマーク

Adversarial Pruning: A Survey and Benchmark of Pruning Methods for Adversarial Robustness ( http://arxiv.org/abs/2409.01249v1 )

ライセンス: Link先を確認

Giorgio Piras, Maura Pintor, Ambra Demontis, Battista Biggio, Giorgio Giacinto, Fabio Roli,

(参考訳) 近年の研究では、敵対的サンプル(誤分類を引き起こすよう巧妙に作られた入力)に対する堅牢性を保ちながらネットワークのサイズを削減する、ニューラルネットワークのプルーニング技術が提案されている。我々が敵対的プルーニング手法と呼ぶこれらの手法は, 複雑で込み入った設計を伴うため, 相違点を分析し, 公正かつ正確な比較を行うことが困難である。本研究では,現在の敵対的プルーニング手法を調査し,いつプルーニングするかを定めるパイプラインと,どのようにプルーニングするかを定める具体的手法という2つの主要な次元に基づいて分類する新しい分類法を提案することで、これらの課題を克服する。次に、現在の経験的分析の限界を強調し、それに対処するための新しい公正な評価ベンチマークを提案する。最後に,現在の敵対的プルーニング手法の実証的再評価を行い,その結果について考察し,高性能な敵対的プルーニング手法に共通する特性と共通の問題を明らかにする。 https://github.com/pralab/AdversarialPruningBenchmark で公開されているベンチマークへのコントリビューションを歓迎する。

Recent work has proposed neural network pruning techniques to reduce the size of a network while preserving robustness against adversarial examples, i.e., well-crafted inputs inducing a misclassification. These methods, which we refer to as adversarial pruning methods, involve complex and articulated designs, making it difficult to analyze the differences and establish a fair and accurate comparison. In this work, we overcome these issues by surveying current adversarial pruning methods and proposing a novel taxonomy to categorize them based on two main dimensions: the pipeline, defining when to prune; and the specifics, defining how to prune. We then highlight the limitations of current empirical analyses and propose a novel, fair evaluation benchmark to address them. We finally conduct an empirical re-evaluation of current adversarial pruning methods and discuss the results, highlighting the shared traits of top-performing adversarial pruning methods, as well as common issues. We welcome contributions in our publicly-available benchmark at https://github.com/pralab/AdversarialPruningBenchmark

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# GAS: 生成アクティベーション支援型非同期スプリットフェデレーテッドラーニング

GAS: Generative Activation-Aided Asynchronous Split Federated Learning ( http://arxiv.org/abs/2409.01251v1 )

ライセンス: Link先を確認

Jiarong Yang, Yuan Liu,

(参考訳) Split Federated Learning (SFL)は、クライアントとサーバ間の共有モデルを分割し、協調的にトレーニングする。最近のSFL研究は、クライアントからサーバへのアクティベーションとクライアント側モデルの同期送信を想定している。しかし、クライアント間の計算能力と通信能力の大幅な変化により、アクティベーションとクライアント側モデルが非同期にサーバにやってくる。非同期による遅延はSFLの性能を著しく低下させる。この問題に対処するために,アクティベーションバッファとモデルバッファをサーバに埋め込んで,それぞれに非同期に送信されるアクティベーションとクライアント側モデルを管理する非同期SFLフレームワークを検討する。さらに、非同期アクティベーション送信がリソース豊富なクライアントからのアクティベーションを頻繁に受信するので、サーバサイドモデルのバイアスのある更新につながるため、生成アクティベーション支援非同期SFL(GAS)を提案する。 GASでは、受信したアクティベーションに基づいて各ラベルのアクティベーション分布を保持し、バイアスの程度に応じてこれらの分布からアクティベーションを生成する。これらの生成アクティベーションは、サーバサイドモデルの更新を支援し、より正確な更新を保証するために使用される。より厳密な収束境界を導出し,提案手法の有効性を実証した。

Split Federated Learning (SFL) splits and collaboratively trains a shared model between clients and server, where clients transmit activations and client-side models to server for updates. Recent SFL studies assume synchronous transmission of activations and client-side models from clients to server. However, due to significant variations in computational and communication capabilities among clients, activations and client-side models arrive at server asynchronously. The delay caused by asynchrony significantly degrades the performance of SFL. To address this issue, we consider an asynchronous SFL framework, where an activation buffer and a model buffer are embedded on the server to manage the asynchronously transmitted activations and client-side models, respectively. Furthermore, as asynchronous activation transmissions cause the buffer to frequently receive activations from resource-rich clients, leading to biased updates of the server-side model, we propose Generative activations-aided Asynchronous SFL (GAS). In GAS, the server maintains an activation distribution for each label based on received activations and generates activations from these distributions according to the degree of bias. These generative activations are then used to assist in updating the server-side model, ensuring more accurate updates. We derive a tighter convergence bound, and our experiments demonstrate the effectiveness of the proposed method.

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# 単眼深度3次元モデリングによる自律走行のリアルタイム予測

Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling ( http://arxiv.org/abs/2409.01256v1 )

ライセンス: Link先を確認

Haicheng Liao, Yongkang Li, Chengyue Wang, Songning Lai, Zhenning Li, Zilin Bian, Jaeyoung Lee, Zhiyong Cui, Guohui Zhang, Chengzhong Xu,

(参考訳) 交通事故予測の第一の目的は、ダッシュカム映像を用いて潜在的な事故をリアルタイムで予測することであり、これは自動運転技術の安全性と信頼性を高める上で極めて重要な課題である。本研究では,高度な3Dシーンモデリングのために単眼深度の手がかりを組み込むことにより,現在の最先端(SOTA)の2Dベース手法を超えて予測能力を大きく向上させる,革新的なフレームワークであるAccNetを紹介する。交通事故データセットにおける偏ったデータ分布という一般的な課題に対処するため,Binary Adaptive Loss for Early Anticipation (BA-LEA)を提案する。この新たな損失関数は、マルチタスク学習戦略とともに、予測モデルの焦点を事故直前の重要な瞬間にシフトさせる。Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), AnAn Accident Detection (A3D), DADA-2000 Dataset (DADA-2000) の3つのベンチマークデータセット上でフレームワークの性能を厳格に評価し,平均精度 (AP) や平均タイム・トゥ・アクシデント (mTTA) といった主要な指標を通じて,その優れた予測精度を示す。

The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by incorporating monocular depth cues for sophisticated 3D scene modeling. Addressing the prevalent challenge of skewed data distribution in traffic accident datasets, we propose the Binary Adaptive Loss for Early Anticipation (BA-LEA). This novel loss function, together with a multi-task learning strategy, shifts the focus of the predictive model towards the critical moments preceding an accident. We rigorously evaluate the performance of our framework on four benchmark datasets--Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), AnAn Accident Detection (A3D), and DADA-2000 Dataset--demonstrating its superior predictive accuracy through key metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA).

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# 二重機械学習がパネルデータに到達 -- 約束、落とし穴、潜在的な解決策

Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions ( http://arxiv.org/abs/2409.01266v1 )

ライセンス: Link先を確認

Jonathan Fuhr, Dominik Papies,

(参考訳) 機械学習(ML)アルゴリズムを用いた因果効果の推定は、適切なフレームワークの中で使用すれば、関数形の仮定を緩和するのに役立つ。しかし、これらのフレームワークの多くは横断面データの設定を前提としている一方、研究者はしばしばパネルデータを利用でき、従来の手法ではこれがユニット間の観測されない異質性を扱うのに役立つ。本稿では、観測されない異質性が存在する場合のパネルデータに対して、ダブル/デバイアスド機械学習(DML)(Chernozhukov et al., 2018)を適用する方法について検討する。この適応は、DMLのクロスフィッティング手順が独立なデータを前提としており、また観測された交絡が非線形である設定では観測されない異質性が必ずしも加法的に分離可能であるとは限らないため、困難である。様々なシミュレーションにおいて,直感的に魅力的ないくつかの推定器の性能を評価する。クロスフィッティングの仮定の違反は効果推定の精度にほとんど影響しないことが分かったが、検討した手法の多くは観測されない異質性の存在を適切に考慮できない。しかし,相関ランダム効果アプローチ(Mundlak, 1978)に基づく予測モデルをDML内で用いることで,観測された交絡因子の数に対してサンプルサイズが十分に大きい場合,様々な設定で正確な係数推定が得られることが分かった。また、観測されない異質性が観測された交絡因子に与える影響が、ほとんどの代替手法の性能に重要な役割を果たすことも示す。

Estimating causal effects using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. However, most of these frameworks assume settings with cross-sectional data, whereas researchers often have access to panel data, which in traditional methods helps to deal with unobserved heterogeneity between units. In this paper, we explore how we can adapt double/debiased machine learning (DML) (Chernozhukov et al., 2018) for panel data in the presence of unobserved heterogeneity. This adaptation is challenging because DML's cross-fitting procedure assumes independent data and the unobserved heterogeneity is not necessarily additively separable in settings with nonlinear observed confounding. We assess the performance of several intuitively appealing estimators in a variety of simulations. While we find violations of the cross-fitting assumptions to be largely inconsequential for the accuracy of the effect estimates, many of the considered methods fail to adequately account for the presence of unobserved heterogeneity. However, we find that using predictive models based on the correlated random effects approach (Mundlak, 1978) within DML leads to accurate coefficient estimates across settings, given a sample size that is large relative to the number of observed confounders. We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role for the performance of most alternative methods.

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# DAVIDE: 深度を考慮したビデオデブラーリング

DAVIDE: Depth-Aware Video Deblurring ( http://arxiv.org/abs/2409.01274v1 )

ライセンス: Link先を確認

German F. Torres, Jussi Kalliola, Soumya Tripathy, Erman Acar, Joni-Kristian Kämäräinen,

(参考訳) ビデオデブラーリングは、ぼやけたフレームの連続から鮮明な詳細を回復することを目的としている。携帯電話における深度センサの普及と,深度情報がデブラーリングを誘導する可能性にもかかわらず,深度を考慮したデブラーリングは限られた注目しか集めていない。本稿では,ビデオデブラーリングにおける深度情報の影響を研究するために,「Depth-Aware VIdeo DEblurring」(DAVIDE)データセットを紹介する。データセットは、同期されたぼやけた動画、鮮明な動画、深度動画で構成されている。既存の深層RGBビデオデブラーリングモデルに深度情報をどのように注入すべきかを検討し,深度を考慮したビデオデブラーリングのための強力なベースラインを提案する。我々の結果は,ビデオデブラーリングにおける深度情報の重要性を明らかにするとともに,深度手がかりが有用なユースケースについての知見を提供する。さらに, 深度はデブラーリング性能を向上させるが, モデルにより長い時間的コンテキストが与えられるとこの効果は小さくなることを示す。プロジェクトページ: https://germanftv.github.io/DAVIDE.github.io/

Video deblurring aims at recovering sharp details from a sequence of blurry frames. Despite the proliferation of depth sensors in mobile phones and the potential of depth information to guide deblurring, depth-aware deblurring has received only limited attention. In this work, we introduce the 'Depth-Aware VIdeo DEblurring' (DAVIDE) dataset to study the impact of depth information in video deblurring. The dataset comprises synchronized blurred, sharp, and depth videos. We investigate how the depth information should be injected into the existing deep RGB video deblurring models, and propose a strong baseline for depth-aware video deblurring. Our findings reveal the significance of depth information in video deblurring and provide insights into the use cases where depth cues are beneficial. In addition, our results demonstrate that while the depth improves deblurring performance, this effect diminishes when models are provided with a longer temporal context. Project page: https://germanftv.github.io/DAVIDE.github.io/ .

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# CHSH不等式の不整合

An inconsistency in the CHSH inequality ( http://arxiv.org/abs/2409.01275v1 )

ライセンス: Link先を確認

Andrea Aiello,

(参考訳) CHSHの不等式の違反は、量子力学と局所的で現実的な隠れ変数理論の間の不可逆的な衝突を示すと考えられている。我々は、CHSH不等式を証明する数学的仮定が、実際、そのような不等式をテストする実験の物理学とは相容れないことを示した。これは、現在利用可能な実験データに基づいて、局所的な現実的な隠れ変数理論を排除できないことを意味する。しかし、CHSH不等式の実験的な証明は原則として可能であることも示しているが、実際、そのような実験をどのように実装するかは定かではない。

Violation of the CHSH inequality supposedly demonstrates an irreconcilable conflict between quantum mechanics and local, realistic hidden variable theories. We show that the mathematical assumptions underlying the proof of the CHSH inequality are, in fact, incompatible with the physics of the experiments testing such an inequality. This implies that we cannot yet dismiss local realistic hidden variable theories on the basis of currently available experimental data. However, we also show that an experimental proof of the CHSH inequality is, in principle, possible, although it is unclear how to implement such an experiment in practice.
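
For readers unfamiliar with the inequality under discussion, the following is a minimal sketch of the standard textbook statement, not of the paper's argument. It enumerates all deterministic local assignments to obtain the classical bound and evaluates the usual singlet-state correlation E = -cos(x - y) at the conventional measurement angles. The function names, the angle choice, and the correlation formula are illustrative assumptions taken from the standard presentation; the abstract itself questions whether the assumptions behind this bound match real experiments.

```python
import math
from itertools import product


def chsh(E):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E("a", "b") - E("a", "b'") + E("a'", "b") + E("a'", "b'")


def local_bound():
    """Largest |S| over all deterministic local assignments of +/-1 outcomes."""
    best = 0
    for A, A2, B, B2 in product((-1, 1), repeat=4):
        outcome = {"a": A, "a'": A2, "b": B, "b'": B2}
        S = chsh(lambda x, y: outcome[x] * outcome[y])
        best = max(best, abs(S))
    return best


def singlet_value():
    """|S| for the textbook singlet correlation at the conventional angles."""
    # Assumed angles (radians); these are the usual maximally violating choice.
    angle = {"a": 0.0, "a'": math.pi / 2, "b": math.pi / 4, "b'": 3 * math.pi / 4}
    return abs(chsh(lambda x, y: -math.cos(angle[x] - angle[y])))


LOCAL_BOUND = local_bound()
QUANTUM_VALUE = singlet_value()
```

Under these assumptions the deterministic bound is 2, while the singlet correlation gives 2√2. That gap is what CHSH experiments aim to probe.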

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# ビジネスプロセス改善の体系的レビュー:運用研究とビジネスプロセスマネジメントの概念の融合における成果と可能性

A Systematic Review of Business Process Improvement: Achievements and Potentials in Combining Concepts from Operations Research and Business Process Management ( http://arxiv.org/abs/2409.01276v1 )

ライセンス: Link先を確認

Michel Kunkler, Felix Schumann, Stefanie Rinderle-Ma,

(参考訳) ビジネスプロセスマネジメントとオペレーションリサーチは、どちらも組織における価値創造を強化することを目的としている2つの研究分野です。ビジネス・プロセス・マネジメントは歴史的に正確なモデルの提供に重点を置いてきたが、オペレーティング・リサーチはトラクタブル・モデルとそのソリューションの構築に重点を置いてきた。この体系的な文献レビューは、両方の分野から組み合わせた概念を用いた作品を特定し分析する。特に、ビジネスプロセスモデルがどのように数学的モデルとして概念化され、どの最適化技術がこれらのモデルに適用されたかを分析する。その結果,資源配分とスケジューリングの問題に強い焦点が当てられている。現在のアプローチは、多くの問題に対する確率的な性質のサポートを欠いていることが多く、リソース関連の情報やデータの観点からの情報といった、プロセスモデルやイベントログからの情報のみをわずかに利用する。

Business Process Management and Operations Research are two research fields that both aim to enhance value creation in organizations. While Business Process Management has historically emphasized providing precise models, Operations Research has focused on constructing tractable models and their solutions. This systematic literature review identifies and analyzes work that uses combined concepts from both disciplines. In particular, it analyzes how business process models have been conceptualized as mathematical models and which optimization techniques have been applied to these models. Results indicate a strong focus on resource allocation and scheduling problems. Current approaches often lack support for the stochastic nature of many problems, and only sparsely use information from process models or from event logs, such as resource-related information or information from the data perspective.

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# ワンインデックス・ベクトル量子化に基づく画像分類への敵対的攻撃

One-Index Vector Quantization Based Adversarial Attack on Image Classification ( http://arxiv.org/abs/2409.01282v1 )

ライセンス: Link先を確認

Haiju Fan, Xiaona Qin, Shuang Chen, Hubert P. H. Shum, Ming Li,

(参考訳) ストレージと送信を改善するため、画像は一般的に圧縮される。ベクトル量子化(VQ)は圧縮率の高い圧縮法であり、他の圧縮手法を抑圧する。これにもかかわらず、画像分類における既存の敵対的攻撃法は、圧縮された領域ではほとんど例外なくピクセル領域で実行されており、現実のシナリオでは適用できない。本稿では,VQ領域における新たなワンインデックス攻撃手法を提案する。ワンインデックス攻撃方法は、圧縮されたデータストリーム内の単一のインデックスを変更して、圧縮された画像が誤って分類されるようにする。攻撃を実現するには単一のVQインデックスを変更するだけでよい。提案手法は,実際の攻撃シナリオと一致した半ブラックボックス攻撃に属する。本稿では,Resnet,NIN,VGG16の3つの画像分類モデルに対して,本手法を適用した。 CIFAR-10 と Fashion MNIST の画像の 55.9% と 77.4% はそれぞれ、高いレベルの誤分類信頼性と低いレベルの画像摂動で攻撃に成功している。

To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in real-world scenarios. In this paper, we propose a novel one-index attack method in the VQ domain to generate adversarial images by a differential evolution algorithm, successfully resulting in image misclassification in victim models. The one-index attack method modifies a single index in the compressed data stream so that the decompressed image is misclassified. It only needs to modify a single VQ index to realize an attack, which limits the number of perturbed indexes. The proposed method is a semi-black-box attack, which is more in line with the actual attack scenario. We apply our method to attack three popular image classification models, i.e., Resnet, NIN, and VGG16. On average, 55.9% and 77.4% of the images in CIFAR-10 and Fashion MNIST, respectively, are successfully attacked, with a high level of misclassification confidence and a low level of image perturbation.
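
To make the idea of perturbing a single VQ index concrete, here is a minimal sketch under stated assumptions. The toy codebook, the block values, and the helper names are hypothetical and not taken from the paper. The perturbed position and the replacement index are hard-coded here, whereas the abstract says a differential evolution search selects them.

```python
def vq_encode(blocks, codebook):
    """Map each block to the index of its nearest codeword (squared distance)."""
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: dist(b, codebook[k]))
            for b in blocks]


def vq_decode(indices, codebook):
    """Reconstruct blocks by looking up each index in the codebook."""
    return [list(codebook[i]) for i in indices]


def one_index_perturb(indices, position, new_index):
    """Return a copy of the index stream with exactly one index replaced."""
    out = list(indices)
    out[position] = new_index
    return out


# Hypothetical 4-pixel blocks and a 4-entry codebook.
CODEBOOK = [(0, 0, 0, 0), (255, 255, 255, 255), (0, 255, 0, 255), (128, 128, 128, 128)]
BLOCKS = [(3, 1, 0, 2), (250, 251, 255, 254), (120, 130, 127, 129)]

INDICES = vq_encode(BLOCKS, CODEBOOK)
ATTACKED = one_index_perturb(INDICES, position=2, new_index=2)
CHANGED_BLOCKS = sum(
    1 for o, a in zip(vq_decode(INDICES, CODEBOOK), vq_decode(ATTACKED, CODEBOOK))
    if o != a
)
```

Only one decoded block differs between the clean and perturbed streams. A real attack would additionally query the victim classifier to decide which single change causes misclassification.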

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# IoMTの医療・患者に対する包括的影響

Comprehensive up-to-date impact of the IoMT in healthcare and patients ( http://arxiv.org/abs/2409.01287v1 )

ライセンス: Link先を確認

Guy Mouanda,

(参考訳) インターネット・オブ・メディカル・モノ(IoMT)は、医療データの収集・拡散に多数の技術を適用し、医療サービスの特徴、有効性、可用性を開発することを目的とした、急速に拡大する分野である。 IoMTデバイスには、ウェアラブルセンサー、インプラント可能なデバイス、スマートホームメソッド、遠隔医療ポリシー、モバイルアプリケーションが含まれている。 IoMTの応用は、慢性疾患の管理、遠隔患者の監視、緊急対応、臨床診断支援、健康増進、健康など多岐にわたる。本稿では,この発展領域の長所,短所,展望の方向性について考察する。また、IoMTの倫理的、法的、社会的意味や、IoMT環境の危険性や脆弱性についても検討する。

The Internet of Medical Things (IoMT) is a quickly expanding field that aims to improve the features, effectiveness, and availability of healthcare services by applying numerous technologies to gather and disseminate medical data. IoMT devices include wearable sensors, implantable devices, smart home methods, telemedicine policies, and mobile applications. IoMT applications range from chronic disease management, remote patient monitoring, emergency response, and clinical decision support to health promotion and wellness. This paper focuses on the advantages, challenges, and future directions of this developing domain. The paper also examines the ethical, legal, and social implications of IoMT, as well as the possible risks and vulnerabilities of the IoMT environment.

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# De Broglie-Bohm量子力学

De Broglie-Bohm Quantum Mechanics ( http://arxiv.org/abs/2409.01294v1 )

ライセンス: Link先を確認

Antony Valentini,

(参考訳) De Broglie-Bohmパイロット波の量子力学の定式化について概説し、場の理論、高エネルギー物理学、重力、宇宙論への応用を強調した。

We provide an overview of the de Broglie-Bohm pilot-wave formulation of quantum mechanics, emphasising its applications to field theory, high-energy physics, gravitation, and cosmology.

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# ディスエンタングルメントの離散的診断としての位相的次数と$Δ$VAEへの応用

Topological degree as a discrete diagnostic for disentanglement, with applications to the $Δ$VAE ( http://arxiv.org/abs/2409.01303v1 )

ライセンス: Link先を確認

Mahefa Ratsisetraina Ravelonanosy, Vlado Menkovski, Jacobus W. Portegies,

(参考訳) 本研究では, 単位球面$\mathcal{S}^2$の拡散変分オートエンコーダ(\Delta$VAE)を潜在空間として用いて, トポロジカル・幾何学的構造を捉え, 潜在因子を分解する能力について検討する。そこで本研究では,データ多様体から潜在空間への写像であるエンコーダの位相次数 (topological degree of the encoder) を新たに導入する。ホモロジー理論のツールを用いて、この次数を計算するアルゴリズムを導出し、実装する。トレーニング手順から得られたモデルのエンコーダのエンコーダの度合いをアルゴリズムを用いて計算する。実験の結果、$\Delta$VAEはLSBDスコアが比較的小さいことを示し、初期化後の度合に関わらず、訓練後のエンコーダの次数は$-1$または$+1$となり、その結果、エンコーダは少なくとも同相であることを示す。

We investigate the ability of Diffusion Variational Autoencoder ($\Delta$VAE) with unit sphere $\mathcal{S}^2$ as latent space to capture topological and geometrical structure and disentangle latent factors in datasets. For this, we introduce a new diagnostic of disentanglement: namely the topological degree of the encoder, which is a map from the data manifold to the latent space. By using tools from homology theory, we derive and implement an algorithm that computes this degree. We use the algorithm to compute the degree of the encoder of models that result from the training procedure. Our experimental results show that the $\Delta$VAE achieves relatively small LSBD scores, and that regardless of the degree after initialization, the degree of the encoder after training becomes $-1$ or $+1$, which implies that the resulting encoder is at least homotopic to a homeomorphism.
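
The notion of topological degree can be illustrated in one dimension lower than the paper's setting. The sketch below computes the degree (winding number) of a map from the circle to itself by summing wrapped angle increments. It is only an analogue: the paper works with the sphere $\mathcal{S}^2$ and homology-based tools, and the function name and sampling scheme here are assumptions made for illustration.

```python
import math


def winding_degree(f, samples=360):
    """Degree of a continuous map f: S^1 -> S^1, with angles in radians.

    Walks once around the circle and sums the increments of f, each wrapped
    to [-pi, pi). Assumes the sampling is fine enough that consecutive
    increments stay below pi in absolute value.
    """
    total = 0.0
    prev = f(0.0)
    for k in range(1, samples + 1):
        cur = f(2 * math.pi * k / samples)
        step = (cur - prev + math.pi) % (2 * math.pi) - math.pi
        total += step
        prev = cur
    return round(total / (2 * math.pi))


DEG_IDENTITY = winding_degree(lambda t: t)        # orientation-preserving
DEG_REFLECTION = winding_degree(lambda t: -t)     # orientation-reversing
DEG_DOUBLE = winding_degree(lambda t: 2 * t)      # wraps around twice
DEG_COLLAPSED = winding_degree(lambda t: math.sin(t))  # never completes a loop
```

In this analogue, degree $+1$ or $-1$ corresponds to a map that covers the circle exactly once, which mirrors the abstract's statement about the trained encoder's degree.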

翻訳日:2024-09-06 06:37:11 公開日:2024-09-02

# ニューラルネットワークを用いた高精度実空間電子密度

Highly Accurate Real-space Electron Densities with Neural Networks ( http://arxiv.org/abs/2409.01306v1 )

ライセンス: Link先を確認

Lixue Cheng, P. Bernát Szabó, Zeno Schätzle, Derk Kooi, Jonas Köhler, Klaas J. H. Giesbertz, Frank Noé, Jan Hermann, Paola Gori-Giorgi, Adam Foster,

(参考訳) 量子化学における変分ab-initio法は、波動関数への直接アクセスを提供する他の方法の中でも際立っている。これは原則として、エネルギー以外の他の観測可能な興味の抽出を可能にするが、実際、この抽出は技術的に困難であり、計算的に非現実的であることが多い。ここでは,電子密度を量子化学において観測可能な中心となるものとみなし,その密度を既知の漸近特性を捉えるニューラルネットワークを用いて表現し,スコアマッチングとノイズコントラスト推定により波動関数からトレーニングすることにより,実空間多電子波関数から正確な密度を求める新しい手法を提案する。深層学習型 ansätze (deep QMC) を用いた変分量子モンテカルロを用いて、基底セット誤差のない高精度な波動関数を得るとともに、新しい手法を用いて、双極子モーメント、原子間力、接触密度、その他の密度に基づく特性を計算して、対応する正確な電子密度を求める。

Variational ab-initio methods in quantum chemistry stand out among other methods in providing direct access to the wave function. This allows in principle straightforward extraction of any other observable of interest, besides the energy, but in practice this extraction is often technically difficult and computationally impractical. Here, we consider the electron density as a central observable in quantum chemistry and introduce a novel method to obtain accurate densities from real-space many-electron wave functions by representing the density with a neural network that captures known asymptotic properties and is trained from the wave function by score matching and noise-contrastive estimation. We use variational quantum Monte Carlo with deep-learning ansätze (deep QMC) to obtain highly accurate wave functions free of basis set errors, and from them, using our novel method, correspondingly accurate electron densities, which we demonstrate by calculating dipole moments, nuclear forces, contact densities, and other density-based properties.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# クープマン演算子理論によるニューラルネットワーク層を線形演算として表現する

Representing Neural Network Layers as Linear Operations via Koopman Operator Theory ( http://arxiv.org/abs/2409.01308v1 )

ライセンス: Link先を確認

Nishant Suresh Aswani, Saif Eddin Jabari, Muhammad Shafique,

(参考訳) 単純なニューラルネットワークの強い性能は、しばしばその非線形活性化に起因する。しかし、ニューラルネットワークの線形ビューは、ネットワークの理解と制御をよりアプローチしやすくする。ニューラルネットワークの動的システムビューから、クープマン作用素理論と動的モード分解(DMD)との接続を利用して、新しい視点を提供する。同時に、システムを適切な可観測空間に埋め込むことで、動的システムを線形化するフレームワークを提供する。ニューラルネットワークを力学系として再フレーミングすることにより、事前学習された多層パーセプトロン(MLP)の非線形層を有限次元線形作用素に置き換えることができることを示す。さらに、DMD の固有値と SVD の右特異ベクトルを分析し、時間遅延座標がネットワーク層を線形化するクープマン理論において、単純かつ高効率な観測可能空間を提供することを示す。その結果、Yin-YangデータセットでトレーニングされたMLPの層をDMDモデルからの予測に置き換え、元の98.4%と比較して最大97.3%のモデル精度を実現した。さらに、MNISTデータセットでトレーニングされたMLPの層を置き換え、テストセットで元の97.2%に対して最大95.8%を達成した。

The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more approachable. We draw from a dynamical systems view of neural networks, offering a fresh perspective by using Koopman operator theory and its connections with dynamic mode decomposition (DMD). Together, they offer a framework for linearizing dynamical systems by embedding the system into an appropriate observable space. By reframing a neural network as a dynamical system, we demonstrate that we can replace the nonlinear layer in a pretrained multi-layer perceptron (MLP) with a finite-dimensional linear operator. In addition, we analyze the eigenvalues of DMD and the right singular vectors of SVD, to present evidence that time-delayed coordinates provide a straightforward and highly effective observable space for Koopman theory to linearize a network layer. Consequently, we replace layers of an MLP trained on the Yin-Yang dataset with predictions from a DMD model, achieving a model accuracy of up to 97.3%, compared to the original 98.4%. In addition, we replace layers in an MLP trained on the MNIST dataset, achieving up to 95.8%, compared to the original 97.2% on the test set.
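
The role of time-delayed coordinates can be seen in a scalar toy problem. The sketch below fits a linear map on two delay coordinates to a sampled cosine by least squares, which is a simplified stand-in for a DMD fit. It is an illustration under assumptions, not the paper's procedure for MLP layers: the function name, the signal, and the two-delay model are all chosen here for the example.

```python
import math


def fit_delay_linear_model(series):
    """Least-squares fit of x[k+1] ~ c1 * x[k] + c2 * x[k-1].

    Solves the 2x2 normal equations directly, so no external libraries
    are needed.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for k in range(1, len(series) - 1):
        a, b, y = series[k], series[k - 1], series[k + 1]
        s11 += a * a
        s12 += a * b
        s22 += b * b
        t1 += a * y
        t2 += b * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("delay coordinates are (nearly) collinear")
    c1 = (t1 * s22 - t2 * s12) / det
    c2 = (t2 * s11 - t1 * s12) / det
    return c1, c2


OMEGA = 0.3  # assumed angular step of the toy signal
SERIES = [math.cos(OMEGA * k) for k in range(50)]
C1, C2 = fit_delay_linear_model(SERIES)
EXPECTED_C1 = 2 * math.cos(OMEGA)  # from cos((k+1)w) = 2cos(w)cos(kw) - cos((k-1)w)
EXPECTED_C2 = -1.0
```

A cosine is not a linear function of its current value alone, but it is exactly linear in the pair of current and previous values. This is the sense in which delay embedding can supply an observable space where the dynamics become linear.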

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# 集団開発における望ましくないパターンとは何か

What Could Possibly Go Wrong: Undesirable Patterns in Collective Development ( http://arxiv.org/abs/2409.01312v1 )

ライセンス: Link先を確認

Mikhail Evtikhiev, Ekaterina Koshchenko, Vladimir Kovalenko,

(参考訳) ソフトウェア開発は、しばしば技術的取り組みと見なされるが、基本的には、チームメンバ間のコラボレーションを必要とする社会的活動である。これを認めて、ソフトウェア開発コミュニティは、コラボレーションに関連する潜在的な欠点に対処するための戦略を考案した。様々な研究がソフトウェア工学における社会的ダイナミクスを捉えようと試みている。本研究では,多くのチームワーク問題を識別する手法を開発し,それに対応する様々なアプローチを提案する。しかしながら、一部のチームワークの問題はまだ検討されておらず、実践者の認識から共通のパターンへの包括的ボトムアップ調査が必要である。本稿では, 集団開発における望ましくないパターンの概念を紹介する。詳細な38回の探索的なインタビューを通じて,42のパターンを識別・分類し,その起源と結果を明らかにする。その後の調査では、それぞれ436名と968名の参加者が、望ましくないパターンの重要性と頻度を調査し、これらのパターンを管理する潜在的なツールや特徴を評価した。この研究は、望ましくないパターンの微妙な理解に寄与し、その影響を評価し、産業応用のための実用的ツールと特徴を提案する。この発見は、より詳細な研究と、協調的なソフトウェアエンジニアリングプラクティスを強化するツールの開発のための貴重な基盤を提供する。

Software development, often perceived as a technical endeavor, is fundamentally a social activity requiring collaboration among team members. Acknowledging this, the software development community has devised strategies to address possible collaboration-related shortcomings. Various studies have attempted to capture the social dynamics within software engineering. In these studies, the authors developed methods to identify numerous teamwork issues and proposed various approaches to address them. However, certain teamwork issues remain unstudied, necessitating a comprehensive bottom-up exploration from practitioners' perceptions to common patterns. This paper introduces the concept of undesirable patterns in collective development, referring to potential teamwork problems that may escalate if unaddressed. Through 38 in-depth exploratory interviews, we identify and classify 42 patterns, revealing their origins and consequences. Two subsequent surveys, with 436 and 968 participants respectively, explore the significance and frequency of the undesirable patterns, and evaluate potential tools and features to manage these patterns. The study contributes a nuanced understanding of undesirable patterns, evaluating their impact and proposing pragmatic tools and features for industrial application. The findings provide a valuable foundation for further in-depth studies and the development of tools to enhance collaborative software engineering practices.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# イメージジェネレータの診断精度向上のための平均埋め込み

Disentangling Mean Embeddings for Better Diagnostics of Image Generators ( http://arxiv.org/abs/2409.01314v1 )

ライセンス: Link先を確認

Sebastian G. Gruber, Pascal Tobias Ziegler, Florian Buettner,

(参考訳) イメージジェネレータの評価は、特定の画像領域に対する微妙な洞察を提供することにおいて、従来のメトリクスの限界のため、依然として課題である。画像のすべての領域が同様の容易さで学習されるわけではないため、これは重要な問題である。本研究では,中心核アライメントによる個々の画素クラスタに対するコサイン類似性の積に平均埋め込みのコサイン類似性を解き放つ新しい手法を提案する。これにより、クラスタワイズ性能が全体の画像生成性能に与える影響を定量化することができる。実世界の様々なユースケースにおいて、モデル誤動作の画素領域を識別する可能性や説明性をいかに向上させるかを示す。

The evaluation of image generators remains a challenge due to the limitations of traditional metrics in providing nuanced insights into specific image regions. This is a critical problem as not all regions of an image may be learned with similar ease. In this work, we propose a novel approach to disentangle the cosine similarity of mean embeddings into the product of cosine similarities for individual pixel clusters via central kernel alignment. Consequently, we can quantify the contribution of the cluster-wise performance to the overall image generation performance. We demonstrate how this enhances the explainability and the likelihood of identifying pixel regions of model misbehavior across various real-world use cases.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# 多周波ニューラルボルン反復法による2次元逆散乱問題の解法

Multi-frequency Neural Born Iterative Method for Solving 2-D Inverse Scattering Problems ( http://arxiv.org/abs/2409.01315v1 )

ライセンス: Link先を確認

Daoqi Liu, Tao Shan, Maokun Li, Fan Yang, Shenheng Xu,

(参考訳) 本研究では,多周波電磁法(EM)逆散乱問題(ISP)に対処する深層学習に基づくイメージング手法を提案する。深層学習技術とEM物理法則を組み合わせることで、単周波ニューラルBIMの原理に導かれる多周波ニューラルボルン反復法(NeuralBIM)の開発に成功した。この手法は,マルチタスク学習技術とNeuralBIMの効率的な反復インバージョン処理を統合し,堅牢な多周波ボルン反復インバージョンモデルを構築する。トレーニング中、モデルはホモシステマティックな不確実性によって導かれるマルチタスク学習アプローチを採用し、各周波数データの重みを適応的に割り当てる。さらに、ISPの物理法則に制約された教師なし学習法を用いて、コントラストや全フィールドデータを必要としない多周波ニューラルBIMモデルを訓練する。多周波ニューラルBIMの有効性は、ISPを解くための精度と計算効率の向上を実証し、合成および実験データを通して検証する。さらに、この手法は強力な一般化機能と耐雑音性を示す。多周波ニューラルBIM法は、多周波EMデータに対する新しい逆変換法を探索し、多周波データの電磁ISPに有効な解を提供する。

In this work, we propose a deep learning-based imaging method for addressing the multi-frequency electromagnetic (EM) inverse scattering problem (ISP). By combining deep learning technology with EM physical laws, we have successfully developed a multi-frequency neural Born iterative method (NeuralBIM), guided by the principles of the single-frequency NeuralBIM. This method integrates multitask learning techniques with NeuralBIM's efficient iterative inversion process to construct a robust multi-frequency Born iterative inversion model. During training, the model employs a multitask learning approach guided by homoscedastic uncertainty to adaptively allocate the weights of each frequency's data. Additionally, an unsupervised learning method, constrained by the physical laws of ISP, is used to train the multi-frequency NeuralBIM model, eliminating the need for contrast and total field data. The effectiveness of the multi-frequency NeuralBIM is validated through synthetic and experimental data, demonstrating improvements in accuracy and computational efficiency for solving ISP. Moreover, this method exhibits strong generalization capabilities and noise resistance. The multi-frequency NeuralBIM method explores a novel inversion method for multi-frequency EM data and provides an effective solution for the electromagnetic ISP of multi-frequency data.
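
The abstract mentions weighting each frequency's data by homoscedastic uncertainty. A common form of that idea combines task losses with learnable log-variances, as sketched below. Whether the paper uses exactly this form is an assumption; the function names and example numbers are hypothetical.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses L_i with log-variances s_i = log(sigma_i^2).

    total = sum_i [ 0.5 * exp(-s_i) * L_i + 0.5 * s_i ]
    A larger s_i down-weights task i, while the 0.5 * s_i term keeps s_i
    from growing without bound.
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(0.5 * math.exp(-s) * L + 0.5 * s for L, s in zip(losses, log_vars))


def optimal_log_var(loss):
    """Minimizer of 0.5 * exp(-s) * loss + 0.5 * s over s, for loss > 0."""
    return math.log(loss)


# Hypothetical per-frequency data-misfit losses.
LOSSES = [2.0, 0.5]
UNWEIGHTED = uncertainty_weighted_loss(LOSSES, [0.0, 0.0])
S_STAR = [optimal_log_var(L) for L in LOSSES]
WEIGHTS = [math.exp(-s) for s in S_STAR]
```

At the optimum each weight equals the reciprocal of its loss, so frequencies with larger residuals contribute with a smaller weight. In training, the log-variances would be optimized jointly with the network parameters.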

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# LoGex:ガイド拡散による極めて稀な病理組織クラスの尾部検出の改善

LoGex: Improved tail detection of extremely rare histopathology classes via guided diffusion ( http://arxiv.org/abs/2409.01317v1 )

ライセンス: Link先を確認

Maximilian Mueller, Matthias Hein,

(参考訳) 現実的な医療環境では、データは本質的にロングテールであることが多く、ほとんどのサンプルは少数のクラスに集中し、通常は数サンプルしか含まない稀なクラスが長い裾をなす。この分布は、稀な疾患は検出することが重要である一方、データが限られているため分類が難しいことから、重大な課題となる。本稿では,稀なクラスを分類しようとするのではなく,それらを分布外(OOD)データとして確実に検出することを目的とする。我々は低ランク適応(LoRA)と拡散ガイダンスを利用して、検出問題に向けた合成データを生成する。難しい病理組織学タスクにおいて,ヘッドクラスの分類精度を損なうことなく, テールクラスあたり10サンプルのみでOOD検出性能を大幅に改善した。

In realistic medical settings, the data are often inherently long-tailed, with most samples concentrated in a few classes and a long tail of rare classes, usually containing just a few samples. This distribution presents a significant challenge because rare conditions are critical to detect and difficult to classify due to limited data. In this paper, rather than attempting to classify rare classes, we aim to reliably detect these as out-of-distribution data. We leverage low-rank adaptation (LoRA) and diffusion guidance to generate targeted synthetic data for the detection problem. We significantly improve the OOD detection performance on a challenging histopathological task with only ten samples per tail class without losing classification accuracy on the head classes.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# 量子イマジナリー時間進化による強相関量子多体系の近似基底状態の生成

Generating Approximate Ground States of Strongly Correlated Quantum Many-Body Systems Through Quantum Imaginary Time Evolution ( http://arxiv.org/abs/2409.01320v1 )

ライセンス: Link先を確認

Michael P. Kaicher, Florian Dommert, Christopher Wever, Maximilian Amsler, Michael Kühn,

(参考訳) 量子多体系の基底状態を生成する、あるいはその特性を調べるために設計されたほとんどの量子アルゴリズムは、所望の基底状態と大きな重なりを持つ初期状態を入力として必要とする。そのような基底状態を準備するための1つのアプローチは、虚時間発展(Imaginary Time Evolution, ITE)である。最近の[Motta, M., Sun, C., Tan, A.T.K. et al. (2020)]の研究は、ここでは量子虚時間発展(Quantum Imaginary Time Evolution, QITE)と呼ぶアルゴリズムを導入し、ITEをユニタリ演算子の列で近似できることを示した。これによりQITEは初期のフォールトトレラント量子コンピュータ上で実装できる可能性がある。本研究では,格子および分子電子構造ハミルトニアンのITEを近似するQITEアルゴリズムの能力に関するヒューリスティックな研究を行う。良好な古典的初期状態が与えられた場合のQITEアルゴリズムの性能を,産業応用上関心のあるものを含む幅広い系について数値的に調べ,QITEがITEの挙動を定性的に再現し,古典的平均場解を改善できるかどうかを確認する。この研究で検討する系は、短距離および長距離の相互作用を示す様々な格子幾何の1次元および2次元格子系から、分子電子構造ハミルトニアンの活性空間まで多岐にわたる。 QITE と ITE の比較に加え、虚時間発展させたフェルミオンガウス状態が、古典コンピュータ上で効率よく計算でき、任意の格子幾何と次元の一般的なスピンハミルトニアンに対して量子コンピュータ上で効率よく実装できる初期状態として機能しうることを明示的に示す。これは独立した関心の対象にもなりうる。

Most quantum algorithms designed to generate or probe properties of the ground state of a quantum many-body system require as input an initial state with a large overlap with the desired ground state. One approach for preparing such a ground state is Imaginary Time Evolution (ITE). Recent work by [Motta, M., Sun, C., Tan, A.T.K. et al. (2020)] introduced an algorithm -- which we will refer to as Quantum Imaginary Time Evolution (QITE) -- that shows how ITE can be approximated by a sequence of unitary operators, making QITE potentially implementable on early fault-tolerant quantum computers. In this work, we provide a heuristic study of the capabilities of the QITE algorithm in approximating the ITE of lattice and molecular electronic structure Hamiltonians. We numerically study the performance of the QITE algorithm when provided with a good classical initial state for a large class of systems, some of which are of interest to industrial applications, and check if QITE is able to qualitatively replicate the ITE behavior and improve over a classical mean-field solution. The systems we consider in this work range from one- and two-dimensional lattice systems of various lattice geometries displaying short- and long-range interactions, to active spaces of molecular electronic structure Hamiltonians. In addition to the comparison of QITE and ITE, we explicitly show how imaginary time evolved fermionic Gaussian states can serve as initial states which can be efficiently computed on classical computers and efficiently implemented on quantum computers for generic spin Hamiltonians in arbitrary lattice geometries and dimensions, which can be of independent interest.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# ガイド・アンド・リスケール:効果的なチューニング自由な実画像編集のためのセルフガイド機構

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing ( http://arxiv.org/abs/2409.01322v1 )

ライセンス: Link先を確認

Vadim Titov, Madina Khalmatova, Alexandra Ivanova, Dmitry Vetrov, Aibek Alanov,

(参考訳) 近年の大規模テキスト・画像生成モデルの発展にもかかわらず、実際の画像をこれらのモデルで操作することは難しい問題である。既存の編集方法の主な制限は、入力画像のイメージ固有の外観を維持するために、幅広い画像編集において一貫した品質で実行できないか、あるいは時間を要するハイパーパラメータチューニングや拡散モデルの微調整を必要とすることである。本稿では,誘導機構による拡散サンプリングプロセスの修正に基づく新しい手法を提案する。本研究では,入力画像の全体構造と編集すべきでない局所的な外観を保存するための自己誘導技術について検討する。特に,画像の局所的および大域的構造を保存することを目的としたレイアウト保存エネルギー関数を明示的に導入する。さらに,本研究では,世代間における分類器フリーガイダンスとガイドの基準のバランスをとることで,雑音分布の保存を可能にするノイズ再スケーリング機構を提案する。このような誘導的アプローチは、拡散モデルと正確な反転過程を微調整する必要はない。その結果,提案手法は高速かつ高品質な編集機構を提供する。本実験では,人為的評価と定量的解析により,提案手法により,人間に好適な編集が可能であり,原画像の編集品質と保存のトレードオフが良好であることを示す。私たちのコードは https://github.com/FusionBrainLab/Guide-and-Rescale で利用可能です。

Despite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. The main limitations of existing editing methods are that they either fail to perform with consistent quality on a wide range of image edits or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-specific appearance of the input image. We propose a novel approach that is built upon a modified diffusion sampling process via the guidance mechanism. In this work, we explore the self-guidance technique to preserve the overall structure of the input image and the appearance of its local regions that should not be edited. In particular, we explicitly introduce layout-preserving energy functions that aim to preserve local and global structures of the source image. Additionally, we propose a noise rescaling mechanism that preserves the noise distribution by balancing the norms of classifier-free guidance and our proposed guiders during generation. Such a guiding approach requires neither fine-tuning the diffusion model nor an exact inversion process. As a result, the proposed method provides a fast and high-quality editing mechanism. In our experiments, we show through human evaluation and quantitative analysis that the proposed method produces the desired edits, which are preferred by humans, and also achieves a better trade-off between editing quality and preservation of the original image. Our code is available at https://github.com/FusionBrainLab/Guide-and-Rescale.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# 自律的なロコ操作課題における接地言語モデル

Grounding Language Models in Autonomous Loco-manipulation Tasks ( http://arxiv.org/abs/2409.01326v1 )

ライセンス: Link先を確認

Jin Wang, Nikos Tsagarakis,

(参考訳) 行動自律性を持ったヒューマノイドロボットは、私たちの日常生活における理想的な協力者とされてきた。固定ベースのロボットアームと比較して、ヒューマノイドロボットはより大きな操作スペースを提供し、制御と計画の難しさを大幅に増大させる。汎用型ヒューマノイドロボットへの急速な進歩にもかかわらず、ほとんどの研究は、身体全体の調整とタスク計画に関する研究がほとんどなく、移動性と操作性の両方を含む長期的タスクをオープンエンドの言語指導下で実証する可能性を制限して、移動能力に重点を置いている。本研究では,異なるシナリオにおけるタスクに基づいて行動を学び,選択し,計画する新しいフレームワークを提案する。我々は、強化学習(RL)と全身最適化を組み合わせることで、ロボットの動きを生成し、それらをモーションライブラリーに格納する。我々はさらに,大規模言語モデル(LLM)の計画と推論機能を活用し,一連の動作プリミティブからなる階層的なタスクグラフを構築し,より高レベルな計画で下位レベルの実行をブリッジする。 CENTAUROロボットを用いたシミュレーションおよび実世界の実験により、言語モデルに基づくプランナーは、非構造化シーンにおける自由テキストコマンドからの高い自律性を証明し、新しいロコ操作タスクに効率的に適応できることが示されている。

Humanoid robots with behavioral autonomy have consistently been regarded as ideal collaborators in our daily lives and promising representations of embodied intelligence. Compared to fixed-base robotic arms, humanoid robots offer a larger operational space while significantly increasing the difficulty of control and planning. Despite the rapid progress towards general-purpose humanoid robots, most studies remain focused on locomotion ability with few investigations into whole-body coordination and task planning, thus limiting the potential to demonstrate long-horizon tasks involving both mobility and manipulation under open-ended verbal instructions. In this work, we propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios. We combine reinforcement learning (RL) with whole-body optimization to generate robot motions and store them in a motion library. We further leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph that comprises a series of motion primitives to bridge lower-level execution with higher-level planning. Experiments in simulation and in the real world using the CENTAURO robot show that the language-model-based planner can efficiently adapt to new loco-manipulation tasks, demonstrating high autonomy from free-text commands in unstructured scenes.

翻訳日:2024-09-06 06:25:12 公開日:2024-09-02

# SPDiffusion:多概念テキスト画像生成のための意味的保護拡散

SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation ( http://arxiv.org/abs/2409.01327v1 )

ライセンス: Link先を確認

Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li,

Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation, which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features attends to multiple or incorrect prompt tokens. In this work, we propose a novel Semantic Protection Diffusion (SPDiffusion) to protect the semantics of regions from the influence of irrelevant tokens, eliminating the confusion of non-corresponding attributes. In the SPDiffusion framework, we design a Semantic Protection Mask (SP-Mask) to represent the relevance of the regions and the tokens, and propose a Semantic Protection Cross-Attention (SP-Attn) to shield specific regions from the influence of irrelevant tokens in the generation process. To evaluate our method, we created a diverse multi-concept benchmark, and SPDiffusion achieves state-of-the-art results on this benchmark, proving its effectiveness. Our method can be combined with many other application methods or backbones, such as ControlNet, Story Diffusion, PhotoMaker, and PixArt-alpha, to enhance their multi-concept capabilities, demonstrating strong compatibility and scalability.
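The idea of shielding a region from irrelevant tokens can be illustrated with a generic masked cross-attention. This is only a sketch under stated assumptions: the function name `masked_cross_attention` and the boolean `mask` layout are hypothetical stand-ins, not the paper's SP-Mask or SP-Attn implementation, and every region is assumed to allow at least one token.

```python
import math


def masked_cross_attention(queries, keys, values, mask):
    """Cross-attention where each region only attends to the tokens its mask allows.

    queries: one query vector per image region
    keys, values: one key / value vector per prompt token
    mask[i][j]: True if token j may influence region i (hypothetical mask layout)
    Assumes every region allows at least one token.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for query, allowed in zip(queries, mask):
        # Scaled dot-product logits; blocked tokens get -inf so softmax assigns them 0.
        logits = [
            scale * sum(q * k for q, k in zip(query, key)) if ok else float("-inf")
            for key, ok in zip(keys, allowed)
        ]
        peak = max(logits)
        weights = [math.exp(logit - peak) for logit in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of token values for this region.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs
```

With a mask that pairs each region with a single token, a region's output depends only on that token's value, which is the behavior the abstract describes as protecting a region's semantics.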

Translated: 2024-09-06 06:25:12; published: 2024-09-02

# Assessing the Impact of Image Dataset Features on Privacy-Preserving Machine Learning

http://arxiv.org/abs/2409.01329v1

License: see the linked page

Lucas Lange, Maurice-Maximilian Heykeroth, Erhard Rahm

Machine Learning (ML) is crucial in many sectors, including computer vision. However, ML models trained on sensitive data face security challenges, as they can be attacked and leak information. Privacy-Preserving Machine Learning (PPML) addresses this by using Differential Privacy (DP) to balance utility and privacy. This study identifies image dataset characteristics that affect the utility and vulnerability of private and non-private Convolutional Neural Network (CNN) models. Through analyzing multiple datasets and privacy budgets, we find that imbalanced datasets increase vulnerability in minority classes, but DP mitigates this issue. Datasets with fewer classes improve both model utility and privacy, while datasets with high entropy or a low Fisher Discriminant Ratio (FDR) worsen the utility-privacy trade-off. These insights offer valuable guidance for practitioners and researchers in estimating and optimizing the utility-privacy trade-off in image datasets, helping to inform data and privacy modifications for better outcomes based on dataset characteristics.

Translated: 2024-09-06 06:25:12; published: 2024-09-02

# Pediatric brain tumor classification using digital histopathology and deep learning: evaluation of SOTA methods on a multi-center Swedish cohort

http://arxiv.org/abs/2409.01330v1

License: see the linked page

Iulian Emil Tampu, Per Nyman, Christoforos Spyretos, Ida Blystad, Alia Shamikh, Gabriela Prochazka, Teresita Díaz de Ståhl, Johanna Sandgren, Peter Lundberg, Neda Haj-Hosseini

Brain tumors are the most common solid tumors in children and young adults, but the scarcity of large histopathology datasets has limited the application of computational pathology in this group. This study implements two weakly supervised multiple-instance learning (MIL) approaches on patch features obtained from state-of-the-art histology-specific foundation models to classify pediatric brain tumors in hematoxylin and eosin whole slide images (WSIs) from a multi-center Swedish cohort. WSIs from 540 subjects (age 8.5$\pm$4.9 years) diagnosed with brain tumors were gathered from the six Swedish university hospitals. Instance (patch)-level features were obtained from WSIs using three pre-trained feature extractors: ResNet50, UNI, and CONCH. Instances were aggregated using attention-based MIL (ABMIL) or clustering-constrained attention MIL (CLAM) for patient-level classification. Models were evaluated on three classification tasks based on the hierarchical classification of pediatric brain tumors: tumor category, family, and type. Model generalization was assessed by training on data from two of the centers and testing on data from four other centers. Model interpretability was evaluated through attention mapping. The highest classification performance was achieved using UNI features and ABMIL aggregation, with Matthews correlation coefficients of 0.86$\pm$0.04, 0.63$\pm$0.04, and 0.53$\pm$0.05 for tumor category, family, and type classification, respectively. When evaluating generalization, models utilizing UNI and CONCH features outperformed those using ResNet50. However, the drop in performance from in-site to out-of-site testing was similar across feature extractors. These results show the potential of state-of-the-art computational pathology methods in diagnosing pediatric brain tumors at different hierarchical levels with fair generalizability on a multi-center national dataset.
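To make the aggregation step concrete, here is a minimal sketch of attention-based MIL pooling in one common ungated formulation, where instance weights are a softmax over `w . tanh(V h_k)`. The function name `abmil_pool` and the hand-set parameters `V` and `w` are illustrative assumptions; in practice these parameters are learned, and the study's actual model configuration is not given in the abstract.

```python
import math


def abmil_pool(instances, V, w):
    """Pool patch features into one bag feature with attention weights.

    instances: list of patch feature vectors h_k
    V: projection matrix (list of rows), w: scoring vector (both would normally be learned)
    Returns (bag_feature, attention_weights).
    """
    scores = []
    for h in instances:
        # Hidden representation tanh(V h_k), then a scalar score w . hidden.
        hidden = [math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in V]
        scores.append(sum(w_i * z_i for w_i, z_i in zip(w, hidden)))
    # Numerically stable softmax over the instances of the bag.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # The bag feature is the attention-weighted sum of instance features.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attention, instances)) for d in range(dim)]
    return bag, attention
```

The returned attention weights are also what an attention map would visualize: patches with larger weights contribute more to the patient-level prediction.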

Translated: 2024-09-06 06:25:12; published: 2024-09-02

# Enhancing Test Time Adaptation with Few-shot Guidance

http://arxiv.org/abs/2409.01341v1

License: see the linked page

Siqi Luo, Yi Xin, Yuntao Du, Zhongwei Wan, Tao Tan, Guangtao Zhai, Xiaohong Liu

Deep neural networks often encounter significant performance drops when facing domain shifts between training (source) and test (target) data. To address this issue, Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to handle out-of-distribution streaming target data. Although these methods offer some relief, they lack a reliable mechanism for domain shift correction, which can often be erratic in real-world applications. In response, we develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA. Adhering to the principle of few inputs, big gains, FS-TTA reduces blind exploration in unseen target domains. Furthermore, we propose a two-stage framework to tackle FS-TTA, including (i) fine-tuning the pre-trained source model with a few-shot support set, along with using a feature diversity augmentation module to avoid overfitting, and (ii) implementing test time adaptation based on prototype memory bank guidance to produce high-quality pseudo-labels for model adaptation. Through extensive experiments on three cross-domain classification benchmarks, we demonstrate the superior performance and reliability of FS-TTA and our framework.
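Stage (ii) relies on prototypes to produce pseudo-labels. A generic version of that idea can be sketched as follows, assuming class prototypes are mean features of the few-shot support set and that low-similarity samples are left unlabeled. The names `build_prototypes`, `pseudo_label`, and the `min_similarity` cutoff are hypothetical; the paper's memory bank update rule is not described in the abstract. Feature vectors are assumed to be non-zero.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_prototypes(support):
    """support: list of (feature_vector, label). Returns label -> mean feature."""
    sums, counts = {}, {}
    for feature, label in support:
        acc = sums.setdefault(label, [0.0] * len(feature))
        for i, x in enumerate(feature):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def pseudo_label(feature, prototypes, min_similarity=0.8):
    """Label of the most similar prototype, or None if no prototype is similar enough."""
    best_label, best_sim = None, -1.0
    for label, proto in prototypes.items():
        sim = cosine(feature, proto)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= min_similarity else None
```

Returning `None` for ambiguous samples is one simple way to keep only high-confidence pseudo-labels for adaptation.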

Translated: 2024-09-06 06:25:12; published: 2024-09-02

# Mutual Benefit: The Case for Sharing Autonomous Vehicle Data with the Public

http://arxiv.org/abs/2409.01342v1

License: see the linked page

David Goedicke, Natalie Chyi, Alexandra Bremers, Stacey Li, James Grimmelmann, Wendy Ju

Autonomous driving is a widely researched technology that is frequently tested on public roads. The data generated from these tests represent an essential competitive element for the respective companies moving this technology forward. In this paper, we argue for the normative idea that a part of this data should more explicitly benefit the general public by sharing it through a trusted entity as a form of compensation and control for the communities that are being experimented upon. To support this argument, we highlight what data is available to be shared, make the ethical case for sharing autonomous vehicle data, present case studies of how AV data is currently shared, draw on existing data-sharing platforms from similar transportation industries to make recommendations on how data should be shared, and conclude with arguments as to why such data sharing should be encouraged.

Translated: 2024-09-06 06:25:12; published: 2024-09-02

# Pairing Analogy-Augmented Generation with Procedural Memory for Procedural Q&A

http://arxiv.org/abs/2409.01344v1

License: see the linked page

K Roth, Rushil Gupta, Simon Halle, Bang Liu

While LLMs in the RAG paradigm have shown remarkable performance on a variety of tasks, they still under-perform on unseen domains, especially on complex tasks like procedural question answering. In this work, we introduce a novel formalism and structure for manipulating text-based procedures. Based on this formalism, we further present a novel dataset called LCStep, scraped from the LangChain Python docs. Moreover, we extend the traditional RAG system to propose a novel system called analogy-augmented generation (AAG), which draws inspiration from human analogical reasoning and the ability to assimilate past experiences to solve unseen problems. The proposed method uses a frozen language model with a custom procedure memory store to adapt to specialized knowledge. We demonstrate that AAG outperforms few-shot and RAG baselines on the LCStep, RecipeNLG, and CHAMP datasets under a pairwise LLM-based evaluation, corroborated by human evaluation in the case of RecipeNLG.

Translated: 2024-09-06 06:25:12; published: 2024-09-02

# Language Models Benefit from Preparation with Elicited Knowledge

http://arxiv.org/abs/2409.01345v1

License: see the linked page

Jiacan Yu, Hannah An, Lenhart K. Schubert

The zero-shot chain of thought (CoT) approach is often used in question answering (QA) by language models (LMs) for tasks that require multiple reasoning steps, typically enhanced by the prompt "Let's think step by step." However, some QA tasks hinge more on accessing relevant knowledge than on chaining reasoning steps. We introduce a simple general prompting technique, called PREP, that involves using two instances of LMs: the first (LM1) generates relevant information, and the second (LM2) answers the question based on this information. PREP is designed to be general and independent of the user's domain knowledge, making it applicable across various QA tasks without the need for specialized prompt engineering. To evaluate the effectiveness of our prompting method, we create a dataset of 100 binary-choice questions, derived from an extensive schematic dataset on artifact parts and material composition. These questions ask which of two artifacts is less likely to share materials with another artifact. Such questions probe the LM's knowledge of shared materials in the part structure of different artifacts. We test our method on our dataset and three published commonsense reasoning datasets. The average accuracy of our method is consistently higher than that of all the other tested methods across all the tested datasets.
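The two-instance flow described above can be sketched with plain callables standing in for the language models. The function name `prep_answer` and both prompt wordings are assumptions made for illustration; the abstract does not give the actual PREP prompts.

```python
from typing import Callable


def prep_answer(question: str,
                lm1: Callable[[str], str],
                lm2: Callable[[str], str]) -> str:
    """Two-stage prompting: LM1 elicits knowledge, LM2 answers using it.

    lm1 and lm2 are any functions mapping a prompt string to a completion string.
    The prompt templates below are hypothetical, not taken from the paper.
    """
    # Stage 1: ask the first model for information relevant to the question.
    knowledge_prompt = (
        "List facts that are relevant to answering the question.\n"
        f"Question: {question}"
    )
    knowledge = lm1(knowledge_prompt)

    # Stage 2: give the elicited information to the second model with the question.
    answer_prompt = (
        f"Information:\n{knowledge}\n\n"
        "Using the information above, answer the question.\n"
        f"Question: {question}"
    )
    return lm2(answer_prompt)
```

Because the models are passed in as callables, the same function works with two separate models or with two calls to the same model.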

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

http://arxiv.org/abs/2409.01347v1

License: see the linked page

Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies for selecting target timesteps, they usually struggle with blur and loss of detail in generated images. To address these limitations, we introduce Target-Driven Distillation (TDD), which (1) adopts a delicate selection strategy for target timesteps, increasing the training efficiency; (2) utilizes decoupled guidance during training, making TDD open to post-tuning of the guidance scale during inference; and (3) can optionally be equipped with non-equidistant sampling and x0 clipping, enabling more flexible and accurate image sampling. Experiments verify that TDD achieves state-of-the-art performance in few-step generation, offering a better choice among consistency distillation models.

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# PatternPaint: Generating Layout Patterns Using Generative AI and Inpainting Techniques

http://arxiv.org/abs/2409.01348v1

License: see the linked page

Guanglei Zhou, Bhargav Korrapati, Gaurav Rajavendra Reddy, Jiang Hu, Yiran Chen, Dipto G. Thakurta

Generation of VLSI layout patterns is essential for a wide range of Design For Manufacturability (DFM) studies. In this study, we investigate the potential of generative machine learning models for creating design-rule-legal metal layout patterns. Our results demonstrate that the proposed model can generate legal patterns in complex design rule settings and achieves a high diversity score. The designed system, with its flexible settings, supports both pattern generation with localized changes and design rule violation correction. Our methodology is validated on the Intel 18A Process Design Kit (PDK) and can produce a wide range of DRC-compliant pattern libraries with only 20 starter patterns.

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement

http://arxiv.org/abs/2409.01352v1

License: see the linked page

Tathagata Bandyopadhyay

Recently, attention-based transformers have become a de facto standard in many deep learning applications, including natural language processing, computer vision, and signal processing. In this paper, we propose a transformer-based end-to-end model to extract a target speaker's speech from a monaural multi-speaker mixed audio signal. Unlike existing speaker extraction methods, we introduce two additional objectives to impose speaker embedding consistency and waveform encoder invertibility, and jointly train both the speaker encoder and the speech separator to better capture the speaker conditional embedding. Furthermore, we leverage a multi-scale discriminator to refine the perceptual quality of the extracted speech. Our experiments show that the use of a dual-path transformer in the separator backbone, along with the proposed training paradigm, improves on the CNN baseline by $3.12$ dB. Finally, we compare our approach with recent state-of-the-art methods and show that our model outperforms existing methods by $4.1$ dB on average without creating additional data dependency.

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation

http://arxiv.org/abs/2409.01353v1

License: see the linked page

Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei

In this paper, we introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks, effectively bridging the granularity of part segmentation with the comprehensive scope of object segmentation. At the heart of our approach is a multi-level representation strategy, which systematically advances from individual pixels to superpixels, and ultimately to cohesive group formations. This architecture is underpinned by two pivotal aggregation strategies: local aggregation and global aggregation. Local aggregation is employed to form superpixels, leveraging the inherent redundancy of the image data to produce segments closely aligned with specific parts of the object, guided by object-level supervision. In contrast, global aggregation interlinks these superpixels, organizing them into larger groups that correlate with entire objects and benefit from part-level supervision. This dual aggregation framework ensures a versatile adaptation to varying supervision inputs while maintaining computational efficiency. Our methodology notably improves the balance between adaptability across different supervision modalities and computational manageability, culminating in a significant enhancement in segmentation performance. When tested on the PartImageNet dataset, our model achieves a substantial increase, outperforming the previous state-of-the-art by 2.8% and 0.8% in mIoU scores for part and object segmentation, respectively. Similarly, on the Pascal Part dataset, it records performance enhancements of 1.5% and 2.0% for part and object segmentation, respectively.

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# Explanation Space: A New Perspective into Time Series Interpretability

http://arxiv.org/abs/2409.01354v1

License: see the linked page

Shahbaz Rezaei, Xin Liu

Human-understandable explanations of deep learning models are necessary for many critical and sensitive applications. Unlike image or tabular data, where the importance of each input feature (for the classifier's decision) can be directly projected onto the input, the distinguishing features of a time series (e.g., dominant frequency) are often hard to manifest in the time domain in a way a user can easily understand. Moreover, most explanation methods require a baseline value as an indication of the absence of any feature. However, the notion of a missing feature, which is often defined as black pixels for vision tasks or zero/mean values for tabular data, is not well-defined for time series. Despite the adoption of explainable AI (XAI) methods from the tabular and vision domains in the time series domain, these differences limit the application of these XAI methods in practice. In this paper, we propose a simple yet effective method that allows a model originally trained in the time domain to be interpreted in other explanation spaces using existing methods. We suggest four explanation spaces, each of which can potentially alleviate these issues in certain types of time series. Our method can be readily adopted in existing platforms without any change to trained models or XAI methods.

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain

http://arxiv.org/abs/2409.01357v1

License: see the linked page

Antoine Louis, Gijs van Dijck, Gerasimos Spanakis

Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the best single system, unless fusing scores with carefully tuned weights. These novel insights, among others, expand the applicability of prior findings across a new field and language, and contribute to a deeper understanding of hybrid search in non-English specialized domains.
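The contrast between rank-based fusion and weighted score fusion can be made concrete with two widely used schemes. This is a sketch under assumptions: the abstract does not say which fusion methods were evaluated, so reciprocal rank fusion (with the customary constant `k=60`) and min-max-normalized weighted sums are shown only as representative examples, and the function names are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids: score(d) = sum over systems of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_score_fusion(score_maps, weights):
    """Min-max normalize each system's scores, then rank by their weighted sum."""
    fused = {}
    for scores, weight in zip(score_maps, weights):
        low, high = min(scores.values()), max(scores.values())
        span = (high - low) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * (score - low) / span
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Rank fusion needs no tuning, whereas the weighted variant exposes the per-system weights that the abstract reports must be carefully tuned for fusion to help in-domain models.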

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# A Survey and Comparison of Post-quantum and Quantum Blockchains

http://arxiv.org/abs/2409.01358v1

License: see the linked page

Zebo Yang, Haneen Alfauri, Behrooz Farkiani, Raj Jain, Roberto Di Pietro, Aiman Erbad

Blockchains have gained substantial attention from academia and industry for their ability to facilitate decentralized trust and communications. However, the rapid progress of quantum computing poses a significant threat to the security of existing blockchain technologies. Notably, the emergence of Shor's and Grover's algorithms raises concerns regarding the compromise of the cryptographic systems underlying blockchains. Consequently, it is essential to develop methods that reinforce blockchain technology against quantum attacks. In response to this challenge, two distinct approaches have been proposed. The first approach involves post-quantum blockchains, which aim to utilize classical cryptographic algorithms resilient to quantum attacks. The second approach explores quantum blockchains, which leverage the power of quantum computers and networks to rebuild the foundations of blockchains. This paper aims to provide a comprehensive overview and comparison of post-quantum and quantum blockchains while exploring open questions and remaining challenges in these domains. It offers an in-depth introduction, examines differences in blockchain structure, security, privacy, and other key factors, and concludes by discussing current research trends.

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# Correlating Time Series with Interpretable Convolutional Kernels

http://arxiv.org/abs/2409.01362v1

License: see the linked page

Xinyu Chen, HanQin Cai, Fuqiang Liu, Jinhua Zhao

This study addresses the problem of convolutional kernel learning in univariate, multivariate, and multidimensional time series data, which is crucial for interpreting temporal patterns in time series and supporting downstream machine learning tasks. First, we propose formulating convolutional kernel learning for univariate time series as a sparse regression problem with a non-negative constraint, leveraging the properties of circular convolution and circulant matrices. Second, to generalize this approach to multivariate and multidimensional time series data, we use tensor computations, reformulating the convolutional kernel learning problem in the form of tensors. This is further converted into a standard sparse regression problem through vectorization and tensor unfolding operations. In the proposed methodology, the optimization problem is addressed using the existing non-negative subspace pursuit method, enabling the convolutional kernel to capture temporal correlations and patterns. To evaluate the proposed model, we apply it to several real-world time series datasets. On the multidimensional rideshare and taxi trip data from New York City and Chicago, the convolutional kernels reveal interpretable local correlations and cyclical patterns, such as weekly seasonality. In the context of multidimensional fluid flow data, both local and nonlocal correlations captured by the convolutional kernels can reinforce tensor factorization, leading to performance improvements in fluid flow reconstruction tasks. Thus, this study lays an insightful foundation for automatically learning convolutional kernels from time series data, with an emphasis on interpretability through sparsity and non-negativity constraints.
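The property that lets kernel learning be written as a regression is that a circular convolution equals a circulant matrix of the series multiplied by the kernel, so the output is linear in the kernel entries. The sketch below only checks that identity on a toy series; the helper names are illustrative assumptions, and the paper's sparse, non-negative solver is not reproduced here.

```python
def circular_convolve(x, kernel):
    """y[t] = sum_k kernel[k] * x[(t - k) mod T], with T = len(x)."""
    T = len(x)
    return [
        sum(kernel[k] * x[(t - k) % T] for k in range(len(kernel)))
        for t in range(T)
    ]


def circulant_columns(x, width):
    """First `width` columns of the circulant matrix of x: C[t][k] = x[(t - k) mod T]."""
    T = len(x)
    return [[x[(t - k) % T] for k in range(width)] for t in range(T)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# The convolution is a linear map of the kernel: circular_convolve(x, kernel)
# equals circulant_columns(x, len(kernel)) applied to kernel.
series = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, -0.5]
assert circular_convolve(series, kernel) == matvec(circulant_columns(series, 2), kernel)
```

Once the problem is in this matrix form, the kernel plays the role of the regression coefficients, which is where sparsity and non-negativity constraints can be imposed.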

Translated: 2024-09-06 06:11:05; published: 2024-09-02

# Low-Energy Test of Quantum Gravity via Angular Momentum Entanglement

http://arxiv.org/abs/2409.01364v1

License: see the linked page

Trinidad B. Lantaño, Luciano Petruzziello, Susana F. Huelga, Martin B. Plenio

Currently envisaged tests for probing the quantum nature of the gravitational interaction in the low-energy regime typically focus either on the quantized center-of-mass degrees of freedom of two spherically symmetric test masses or on the rotational degrees of freedom of non-symmetric masses under a gravitational interaction governed by the Newtonian potential. In contrast, here we investigate the interaction between the angular momenta of spherically symmetric test masses, considering a tree-level relativistic correction related to frame-dragging that leads to an effective dipolar interaction between the angular momenta. In this approach, the mass of the probes is not directly relevant; instead, their angular momentum plays the central role. We demonstrate that, while the optimal entangling rate is achieved with a maximally delocalized initial state, significant quantum correlations can still arise between two rotating systems even when each is initialized in an eigenstate of rotation. Additionally, we examine the robustness of the generated entanglement against typical sources of noise and observe that our combination of angular momentum and spherically symmetric test masses mitigates the impact of many common noise sources.

Translated: 2024-09-06 06:11:05 Published: 2024-09-02

# CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification ( http://arxiv.org/abs/2409.01366v1 )

License: see the linked page

Junhui He, Shangyu Wu, Weidong Wen, Chun Jason Xue, Qingan Li,

Deploying large language models (LLMs) on edge devices presents significant challenges due to the substantial computational overhead and memory requirements. Activation sparsification can mitigate these challenges by reducing the number of activated neurons during inference. Existing methods typically employ thresholding-based sparsification based on the statistics of activation tensors. However, these methods do not explicitly model the impact of activation sparsification on performance, which leads to suboptimal performance. To address this issue, this paper reformulates the activation sparsification problem by introducing a new objective that optimizes the sparsification decisions. Building on this reformulation, we propose CHESS, a general activation sparsification approach via CHannel-wise thrEsholding and Selective Sparsification. First, channel-wise thresholding assigns a unique threshold to each activation channel in the feed-forward network (FFN) layers. Then, selective sparsification applies thresholding-based activation sparsification to specific layers within the attention modules. Finally, we detail the implementation of sparse kernels to accelerate LLM inference. Experimental results demonstrate that the proposed CHESS achieves lower performance degradation over 8 downstream tasks while activating fewer parameters than existing methods, thus speeding up LLM inference by up to 1.27x.
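To make the idea of channel-wise thresholding more concrete, the sketch below assigns one magnitude threshold to each activation channel and zeroes any activation that falls below its channel's threshold. It is a minimal sketch under stated assumptions: the thresholds here are taken as a per-channel quantile of calibration activations, which is a hypothetical stand-in for the objective-driven thresholds the abstract describes, and `channel_thresholds`, `sparsify`, and `keep_ratio` are illustrative names, not part of the CHESS implementation.

```python
def channel_thresholds(calibration, keep_ratio):
    """Pick one magnitude threshold per channel.

    calibration: list of activation vectors (one per token), equal lengths.
    keep_ratio:  approximate fraction of activations to keep in each channel.
    Assumption: a plain per-channel quantile, not the paper's objective.
    """
    n_channels = len(calibration[0])
    thresholds = []
    for c in range(n_channels):
        magnitudes = sorted(abs(row[c]) for row in calibration)
        cut = int(len(magnitudes) * (1.0 - keep_ratio))
        cut = min(cut, len(magnitudes) - 1)  # stay in range when keep_ratio is 0
        thresholds.append(magnitudes[cut])
    return thresholds


def sparsify(activation, thresholds):
    """Zero every entry whose magnitude is below its channel's threshold."""
    return [a if abs(a) >= t else 0.0 for a, t in zip(activation, thresholds)]


# Toy calibration set: four tokens, two channels with very different scales.
calibration = [[0.1, -2.0], [0.5, 1.0], [-0.9, 0.2], [0.3, -4.0]]
thresholds = channel_thresholds(calibration, keep_ratio=0.5)
print(thresholds)
print(sparsify([0.4, -3.0], thresholds))
```

Because the two channels have different scales, a single global threshold would either keep almost everything in the second channel or drop almost everything in the first. Per-channel thresholds avoid that mismatch in this toy setting.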

Translated: 2024-09-06 06:11:05 Published: 2024-09-02

# Debiasing Graph Representation Learning based on Information Bottleneck ( http://arxiv.org/abs/2409.01367v1 )

License: see the linked page

Ziyi Zhang, Mingxuan Ouyang, Wanyu Lin, Hao Lan, Lei Yang,

Graph representation learning has shown superior performance in numerous real-world applications, such as finance and social networks. Nevertheless, most existing works might make discriminatory predictions due to insufficient attention to fairness in their decision-making processes. This oversight has prompted a growing focus on fair representation learning. Among recent explorations of fair representation learning, prior works based on adversarial learning usually induce unstable or counterproductive performance. To achieve fairness in a stable manner, we present the design and implementation of GRAFair, a new framework based on a variational graph auto-encoder. The crux of GRAFair is the Conditional Fairness Bottleneck, where the objective is to capture the trade-off between the utility of representations and sensitive information of interest. By applying variational approximation, we can make the optimization objective tractable. In particular, GRAFair can be trained, without adversarial training, to produce representations that are informative for the task while containing little sensitive information. Experiments on various real-world datasets demonstrate the effectiveness of our proposed method in terms of fairness, utility, robustness, and stability.

Translated: 2024-09-06 06:11:05 Published: 2024-09-02

# Quantum panprotopsychism and the structure and subject-summing combination problem ( http://arxiv.org/abs/2409.01368v1 )

License: see the linked page

Rodolfo Gambini, Jorge Pullin,

In a previous paper, we have shown that an ontology of quantum mechanics in terms of states and events with internal phenomenal aspects, that is, a form of panprotopsychism, is well suited to explaining the phenomenal aspects of consciousness. We have proved there that the palette and grain combination problems of panpsychism and panprotopsychism arise from implicit hypotheses based on classical physics about supervenience that are inappropriate at the quantum level, where an exponential number of emergent properties and states arise. In this article, we address what is probably the first and most important combination problem of panpsychism: the subject-summing problem originally posed by William James. We begin by identifying the physical counterparts of the subjects of experience within the quantum panprotopsychic approach presented in that article. To achieve this, we turn to the notion of subject of experience inspired by the idea of prehension proposed by Whitehead and show that this notion can be adapted to the quantum ontology of objects and events. Due to the indeterminacy of quantum mechanics and its causal openness, this ontology also seems to be suitable for the analysis of the remaining aspects of the structure combination problem, which shows how the structuration of consciousness could have evolved from primitive animals to humans. The analysis imposes conditions on possible implementations of quantum cognition mechanisms in the brain and suggests new problems and strategies to address them, in particular with regard to the structuring of experiences in animals with different degrees of evolutionary development.

Translated: 2024-09-06 06:11:05 Published: 2024-09-02

# Imitating Language via Scalable Inverse Reinforcement Learning ( http://arxiv.org/abs/2409.01369v1 )

License: see the linked page

Markus Wulfmeier, Michael Bloesch, Nino Vieillard, Arun Ahuja, Jorg Bornschein, Sandy Huang, Artem Sokolov, Matt Barnes, Guillaume Desjardins, Alex Bewley, Sarah Maria Elisabeth Bechtle, Jost Tobias Springenberg, Nikola Momchev, Olivier Bachem, Matthieu Geist, Martin Riedmiller,

The majority of language model training builds on imitation learning. It covers pretraining and supervised fine-tuning, and it affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum likelihood estimation (MLE) for next-token prediction led to its role as the predominant paradigm. However, the broader field of imitation learning can more effectively utilize the sequential structure underlying autoregressive generation. We focus on investigating the inverse reinforcement learning (IRL) perspective on imitation, extracting rewards and directly optimizing sequences instead of individual token likelihoods, and evaluate its benefits for fine-tuning large language models. We provide a new angle, reformulating inverse soft-Q-learning as a temporal difference regularized extension of MLE. This creates a principled connection between MLE and IRL and allows trading off added complexity against increased performance and diversity of generations in the supervised fine-tuning (SFT) setting. We find clear advantages for IRL-based imitation, in particular for retaining diversity while maximizing task performance, rendering IRL a strong alternative on fixed SFT datasets even without online data generation. Our analysis of IRL-extracted reward functions further indicates benefits for more robust reward functions via tighter integration of supervised and preference-based LLM post-training.

Translated: 2024-09-06 06:11:05 Published: 2024-09-02

# Quantum Computing for Discrete Optimization: A Highlight of Three Technologies ( http://arxiv.org/abs/2409.01373v1 )

License: see the linked page

Alexey Bochkarev, Raoul Heese, Sven Jäger, Philine Schiewe, Anita Schöbel,

Quantum optimization has emerged as a promising frontier of quantum computing, providing novel numerical approaches to mathematical optimization problems. The main goal of this paper is to facilitate interdisciplinary research between the Operations Research (OR) and Quantum Computing communities by providing an OR scientist's perspective on selected quantum-powered methods for discrete optimization. To this end, we consider three quantum-powered optimization approaches that make use of different types of quantum hardware available on the market. To illustrate these approaches, we solve three classical optimization problems: the Traveling Salesperson Problem, Weighted Maximum Cut, and Maximum Independent Set. With a general OR audience in mind, we attempt to provide an intuition behind each approach along with key references, describe the corresponding high-level workflow, and highlight crucial practical considerations. In particular, we emphasize the importance of problem formulations and device-specific configurations, and their impact on the amount of resources required for computation (where we focus on the number of qubits). These points are illustrated with a series of experiments on three types of quantum computers: a neutral atom machine from QuEra, a quantum annealer from D-Wave, and a gate-based device from IBM.
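The abstract above stresses that the problem formulation determines how many qubits are needed. One textbook illustration is the quadratic unconstrained binary optimization (QUBO) form of Weighted Maximum Cut, which uses one binary variable per graph node. The sketch below builds that matrix and checks it by classical brute force. It is a minimal sketch, not the paper's code, and it runs on no quantum hardware; the names `maxcut_qubo`, `energy`, and `brute_force` are hypothetical.

```python
from itertools import product


def maxcut_qubo(n_nodes, weighted_edges):
    """Upper-triangular QUBO matrix Q for weighted Max-Cut.

    Minimising sum_{i<=j} Q[i][j] * x[i] * x[j] over x in {0,1}^n maximises
    the cut weight, because an edge (i, j, w) contributes
    w * (x_i + x_j - 2 * x_i * x_j) to the cut and x_i * x_i == x_i.
    """
    Q = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, w in weighted_edges:
        Q[i][i] -= w
        Q[j][j] -= w
        a, b = min(i, j), max(i, j)
        Q[a][b] += 2 * w
    return Q


def energy(Q, x):
    """QUBO objective; equals minus the weight of the cut encoded by x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force(Q):
    """Classical exhaustive search, feasible only for tiny instances."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: energy(Q, x))


# Triangle graph with edge weights 1, 2, 3.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
Q = maxcut_qubo(3, edges)
best = brute_force(Q)
print(best, -energy(Q, best))  # node assignment and its cut weight
```

In this formulation the number of binary variables, and hence the qubit count in a direct mapping, equals the number of nodes. Formulations that need auxiliary variables, such as common encodings of the Traveling Salesperson Problem, grow faster.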

Translated: 2024-09-06 06:11:05 Published: 2024-09-02

# H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark ( http://arxiv.org/abs/2409.01374v1 )

License: see the linked page

Solim LeGris, Wai Keen Vong, Brenden M. Lake, Todd M. Gureckis,

The Abstraction and Reasoning Corpus (ARC) is a visual program synthesis benchmark designed to test challenging out-of-distribution generalization in humans and machines. Since 2019, limited progress has been observed on the challenge using existing artificial intelligence methods. Comparing human and machine performance is important for the validity of the benchmark. While previous work explored how well humans can solve tasks from the ARC benchmark, it either did so using only a subset of tasks from the original dataset, or from variants of ARC, and therefore only provided a tentative estimate of human performance. In this work, we obtain a more robust estimate of human performance by evaluating 1729 humans on the full set of 400 training and 400 evaluation tasks from the original ARC problem set. We estimate that average human performance lies between 73.3% and 77.2% correct with a reported empirical average of 76.2% on the training set, and between 55.9% and 68.9% correct with a reported empirical average of 64.2% on the public evaluation set. However, we also find that 790 out of the 800 tasks were solvable by at least one person in three attempts, suggesting that the vast majority of the publicly available ARC tasks are in principle solvable by typical crowd-workers recruited over the internet. Notably, while these numbers are slightly lower than earlier estimates, human performance still greatly exceeds current state-of-the-art approaches for solving ARC. To facilitate research on ARC, we publicly release our dataset, called H-ARC (human-ARC), which includes all of the submissions and action traces from human participants.