Fugu-MT: arXiv Paper Translations

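This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.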

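The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).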

Papers with a publication date of 20240920.

Title	Authors	Abstract	Publication date / Translation date
# Formal Power Series on Algebraic Cryptanalysis ( http://arxiv.org/abs/2007.14729v3 ) License: see link	Shuhei Nakamura	In the complexity estimation for an attack that reduces a cryptosystem to solving a system of polynomial equations, the degree of regularity and an upper bound of the first fall degree are often used in cryptanalysis. While the degree of regularity can be easily computed using a univariate formal power series under the semi-regularity assumption, determining an upper bound of the first fall degree requires investigating the concrete syzygies of an input system. In this paper, we investigate an upper bound of the first fall degree for a polynomial system over a sufficiently large field. In this case, we prove that the first fall degree of a non-semi-regular system is bounded above by the degree of regularity, and that the first fall degree of a multi-graded polynomial system is bounded above by a certain value determined from a multivariate formal power series. Moreover, we provide a theoretical assumption for computing the first fall degree of a polynomial system over a sufficiently large field.	Translated: 2024-11-09 15:57:56 Published: 2024-09-20
# Lecture notes on high-dimensional data ( http://arxiv.org/abs/2101.05841v7 ) License: see link	Sven-Ake Wegner	These are lecture notes based on the first part of a course on 'Mathematical Data Science', which I taught to final year BSc students in the UK in 2019-2020. Topics include: concentration of measure in high dimensions; Gaussian random vectors in high dimensions; random projections; separation/disentangling of Gaussian data. A revised version has been published as part of the textbook [Mathematical Introduction to Data Science, Springer, Berlin, Heidelberg, 2024, https://link.springer.com/book/10.1007/978-3-662-69426-8].	Translated: 2024-11-09 15:57:56 Published: 2024-09-20
# Visualising Personal Data Flows: Insights from a Case Study of Booking.com ( http://arxiv.org/abs/2304.09603v5 ) License: see link	Haiyue Yuan, Matthew Boakes, Xiao Ma, Dongmei Cao, Shujun Li	Commercial organisations are holding and processing an ever-increasing amount of personal data. Policies and laws are continually changing to require these companies to be more transparent regarding the collection, storage, processing and sharing of this data. This paper reports our work of taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy. By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policies to inform online users about the true scale and the landscape of personal data flows. This case study can inform us about future research on more data flow-oriented privacy policy analysis and on the construction of a more comprehensive ontology on personal data flows in complicated business ecosystems.	Translated: 2024-11-09 15:13:22 Published: 2024-09-20
# ARTICLE: Annotator Reliability Through In-Context Learning ( http://arxiv.org/abs/2409.12218v2 ) License: see link	Sujan Dutta, Deepak Pandita, Tharindu Cyril Weerasooriya, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh	Ensuring annotator quality in training and evaluation data is a key piece of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, creating a challenging scenario for traditional quality assessment approaches because it is hard to distinguish disagreement due to poor work from that due to differences of opinions between sincere annotators. With the goal of increasing diverse perspectives in annotation while ensuring consistency, we propose ARTICLE, an in-context learning (ICL) framework to estimate annotation quality through self-consistency. We evaluate this framework on two offensive speech datasets using multiple LLMs and compare its performance with traditional methods. Our findings indicate that ARTICLE can be used as a robust method for identifying reliable annotators, hence improving data quality.	Translated: 2024-11-07 15:49:40 Published: 2024-09-20
# Spectrum Broadcast Structures from von Neumann type interaction Hamiltonians with continuous variables ( http://arxiv.org/abs/2409.12372v2 ) License: see link	Alberto Acevedo, Janek Wehr, Jarosław Korbicz	In this paper, we contribute to the mathematical foundations of the recently established theory of Spectrum Broadcast Structures (SBS). These are multipartite quantum states, encoding an operational notion of objectivity and exhibiting a more advanced form of decoherence. We study SBS and asymptotic convergence to SBS in the case of a central system interacting with N environments via the von Neumann-type measurement interactions, ubiquitous in the theory of open quantum systems. We focus on the case where the system is modeled by an infinite-dimensional Hilbert space and the operators associated with the system in the Hamiltonian have purely continuous spectrum. Such a setup yields mathematical complications that have hitherto not been addressed in the theory of SBS.	Translated: 2024-11-07 15:14:47 Published: 2024-09-20
# Unsupervised Reward-Driven Image Segmentation in Automated Scanning Transmission Electron Microscopy Experiments ( http://arxiv.org/abs/2409.12462v2 ) License: see link	Kamyar Barakati, Utkarsh Pratiush, Austin C. Houston, Gerd Duscher, Sergei V. Kalinin	Automated experiments in scanning transmission electron microscopy (STEM) require rapid image segmentation to optimize data representation for human interpretation, decision-making, site-selective spectroscopies, and atomic manipulation. Currently, segmentation tasks are typically performed using supervised machine learning methods, which require human-labeled data and are sensitive to out-of-distribution drift effects caused by changes in resolution, sampling, or beam shape. Here, we operationalize and benchmark a recently proposed reward-driven optimization workflow for on-the-fly image analysis in STEM. This unsupervised approach is much more robust, as it does not rely on human labels and is fully explainable. The explanatory feedback can help the human to verify the decision making and potentially tune the model by selecting the position along the Pareto frontier of reward functions. We establish the timing and effectiveness of this method, demonstrating its capability for real-time performance in high-throughput and dynamic automated STEM experiments. The reward-driven approach allows the construction of explainable, robust analysis workflows and can be generalized to a broad range of image analysis tasks in electron and scanning probe microscopy and chemical imaging.	Translated: 2024-11-07 14:41:29 Published: 2024-09-20
# Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models ( http://arxiv.org/abs/2409.12512v2 ) License: see link	Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, Dacheng Tao, Min Zhang	Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them. The success of KD in auto-regressive language models mainly relies on Reverse KL for mode-seeking and student-generated output (SGO) to combat exposure bias. Our theoretical analyses and experimental validation reveal that while Reverse KL effectively mimics certain features of the teacher distribution, it fails to capture most of its behaviors. Conversely, SGO incurs higher computational costs and presents challenges in optimization, particularly when the student model is significantly smaller than the teacher model. These constraints are primarily due to the immutable distribution of the teacher model, which fails to adjust adaptively to models of varying sizes. We introduce Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model. This strategy abolishes the necessity for on-policy sampling and merely requires minimal updates to the parameters of the teacher's online module during training, thereby allowing dynamic adaptation to the student's distribution to make distillation better. Extensive results across multiple generation datasets show that OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.	Translated: 2024-11-07 14:41:29 Published: 2024-09-20
# Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries ( http://arxiv.org/abs/2409.12640v2 ) License: see link	Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, Rohan Anil, Ethan Dyer, Siamak Shakeri, Roopali Vij, Harsh Mehta, Vinay Ramasesh, Quoc Le, Ed Chi, Yifeng Lu, Orhan Firat, Angeliki Lazaridou, Jean-Baptiste Lespiau, Nithya Attaluri, Kate Olszewska	We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models which is also easy to automatically score. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts which measure the model's ability to do more than retrieve a single piece of information from its context. The central idea of the Latent Structure Queries framework (LSQ) is to construct tasks which require a model to "chisel away" the irrelevant information in the context, revealing a latent structure in the context. To verify a model's understanding of this latent structure, we query the model for details of the structure. Using LSQ, we produce three diagnostic long-context evaluations across code and natural-language domains intended to provide a stronger signal of long-context language model capabilities. We perform evaluations on several state-of-the-art models and demonstrate both that a) the proposed evaluations are high-signal and b) that there is significant room for improvement in synthesizing long-context information.	Translated: 2024-11-07 14:08:12 Published: 2024-09-20
# PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis ( http://arxiv.org/abs/2409.12728v2 ) License: see link	Xinlei Huang, Zhiqi Ma, Dian Meng, Yanran Liu, Shiwei Ruan, Qingqiang Sun, Xubin Zheng, Ziyue Qiao	Spatial multi-modal omics technology, highlighted by Nature Methods as an advanced biological technique in 2023, plays a critical role in resolving biological regulatory processes with spatial context. Recently, graph neural networks based on K-nearest neighbor (KNN) graphs have gained prominence in spatial multi-modal omics methods due to their ability to model semantic relations between sequencing spots. However, the fixed KNN graph fails to capture the latent semantic relations hidden by the inevitable data perturbations during the biological sequencing process, resulting in the loss of semantic information. In addition, the common lack of spot annotation and class number priors in practice further hinders the optimization of spatial multi-modal omics models. Here, we propose a novel spatial multi-modal omics resolved framework, termed PRototype-Aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge. Moreover, a dynamic prototype contrastive learning is proposed based on the dynamic adaptability of Bayesian Gaussian Mixture Models to optimize the multi-modal omics representations for unknown biological priors. Quantitative and qualitative experiments on simulated and real datasets with 7 competing methods demonstrate the superior performance of PRAGA.	Translated: 2024-11-07 13:45:42 Published: 2024-09-20
# Industrial 300$\,$mm wafer processed spin qubits in natural silicon/silicon-germanium ( http://arxiv.org/abs/2409.12731v2 ) License: see link	Thomas Koch, Clement Godfrin, Viktor Adam, Julian Ferrero, Daniel Schroller, Noah Glaeser, Stefan Kubicek, Ruoyu Li, Roger Loo, Shana Massar, George Simion, Danny Wan, Kristiaan De Greve, Wolfgang Wernsdorfer	The realisation of a universal quantum computer will require the operation of thousands to millions of qubits. The possibility of using existing industrial semiconductor fabrication techniques and infrastructure for up-scaling and reproducibility makes silicon-based spin qubits one of the most promising platforms to achieve this goal. The largest semiconductor-based quantum processor to date was realized in a silicon/silicon-germanium heterostructure known for its low charge noise, long qubit coherence times and fast driving speeds, but the high structural complexity creates challenges for industrial implementations. Here we demonstrate quantum dots hosted in a natural Si/SiGe heterostructure fully fabricated by an industrial 300$\,$mm semiconductor wafer process line from heterostructure growth to Co micromagnet monolithic integration. We report charge noise values below 2$\,\mathrm{\mu eV/\sqrt{Hz}}$, spin relaxation times of over 1$\,$s and coherence times $T_2^*$ and $T_2^H$ of 1$\,\mathrm{\mu s}$ and 50$\,\mathrm{\mu s}$ respectively, for quantum wells grown using natural silicon. Further, we achieve Rabi frequencies up to 5$\,$MHz and single qubit gate fidelities above 99$\,\%$. In addition to scalability, the high reproducibility of the 300$\,$mm processes enables the deterministic study of qubit metric dependencies on process parameters, which is essential for optimising qubit quality.	Translated: 2024-11-07 13:45:42 Published: 2024-09-20
# Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization ( http://arxiv.org/abs/2409.12741v2 ) License: see link	Thomas Savage, Stephen Ma, Abdessalem Boukil, Vishwesh Patel, Ekanath Rangan, Ivan Rodriguez, Jonathan H Chen	Large Language Model (LLM) fine tuning is underutilized in the field of medicine. Two of the most common methods of fine tuning are Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO), but there is little guidance informing users when to use either technique. In this investigation, we compare the performance of SFT and DPO for five common natural language tasks in medicine: Classification with text data, Classification with numeric data, Clinical Reasoning, Summarization, and Clinical Triage. We find that SFT alone is sufficient for Classification with text data, whereas DPO improves performance for the more complex tasks of Clinical Reasoning, Summarization and Clinical Triage. Our results establish the role and importance of DPO fine tuning within medicine, and consequently call attention to current software gaps that prevent widespread deployment of this technique.	Translated: 2024-11-07 13:34:43 Published: 2024-09-20
# Investigation on domain adaptation of additive manufacturing monitoring systems to enhance digital twin reusability ( http://arxiv.org/abs/2409.12785v2 ) License: see link	Jiarui Xie, Zhuo Yang, Chun-Chun Hu, Haw-Ching Yang, Yan Lu, Yaoyao Fiona Zhao	Powder bed fusion (PBF) is an emerging metal additive manufacturing (AM) technology that enables rapid fabrication of complex geometries. However, defects such as pores and balling may occur and lead to structural unconformities, thus compromising the mechanical performance of the part. This has become a critical challenge for quality assurance as the nature of some defects is stochastic during the process and invisible from the exterior. To address this issue, digital twin (DT) using machine learning (ML)-based modeling can be deployed for AM process monitoring and control. Melt pool is one of the most commonly observed physical phenomena for process monitoring, usually by high-speed cameras. Once labeled and preprocessed, the melt pool images are used to train ML-based models for DT applications such as process anomaly detection and print quality evaluation. Nonetheless, the reusability of DTs is restricted due to the wide variability of AM settings, including AM machines and monitoring instruments. The performance of the ML models trained using the dataset collected from one setting is usually compromised when applied to other settings. This paper proposes a knowledge transfer pipeline between different AM settings to enhance the reusability of AM DTs. The source and target datasets are collected from the National Institute of Standards and Technology and National Cheng Kung University with different cameras, materials, AM machines, and process parameters. The proposed pipeline consists of four steps: data preprocessing, data augmentation, domain alignment, and decision alignment. Compared with the model trained only using the source dataset, this pipeline increased the melt pool anomaly detection accuracy by 31% without any labeled training data from the target dataset.	Translated: 2024-11-07 13:23:33 Published: 2024-09-20
# Autonomous Visual Fish Pen Inspections for Estimating the State of Biofouling Buildup Using ROV -- Extended Abstract ( http://arxiv.org/abs/2409.12813v2 ) License: see link	Matej Fabijanić, Nadir Kapetanović, Nikola Mišković	The process of fish cage inspections, which is a necessary maintenance task at any fish farm, be it small scale or industrial, is a task that has the potential to be fully automated. Replacing trained divers who perform regular inspections with autonomous marine vehicles would lower the costs of manpower and remove the risks associated with humans performing underwater inspections. Achieving such a level of autonomy implies developing an image processing algorithm that is capable of estimating the state of biofouling buildup. The aim of this work is to propose a complete solution for automating the said inspection process, from developing an autonomous control algorithm for an ROV, to automatically segmenting images of fish cages, and accurately estimating the state of biofouling. The first part is achieved by modifying a commercially available ROV with an acoustic SBL positioning system and developing a closed-loop control system. The second part is realized by implementing a proposed biofouling estimation framework, which relies on AI to perform image segmentation, and by processing images using established computer vision methods to obtain a rough estimate of the distance of the ROV from the fish cage. This also involved developing a labeling tool in order to create a dataset of images for the neural network performing the semantic segmentation to be trained on. The experimental results show the viability of using an ROV fitted with an acoustic transponder for autonomous missions, and demonstrate the biofouling estimation framework's ability to provide accurate assessments, alongside satisfactory distance estimation capabilities. In conclusion, the achieved biofouling estimation accuracy showcases clear potential for use in the aquaculture industry.	Translated: 2024-11-07 13:23:33 Published: 2024-09-20
# Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization ( http://arxiv.org/abs/2409.12903v2 ) License: see link	Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar	The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small language models are less expensive to train, but they often cannot achieve the accuracy of large models. In this paper, we explore an intriguing idea to connect these two different regimes: Can we develop a method to initialize large language models using smaller pre-trained models? Will such initialization bring any benefits in terms of training time and final accuracy? We introduce HyperCloning, a method that can expand the parameters of a pre-trained language model to those of a larger model with increased hidden dimensions. Our method ensures that the larger model retains the functionality of the smaller model. As a result, the larger model already inherits the predictive power and accuracy of the smaller model before the training starts. We demonstrate that training such an initialized model results in significant savings in terms of GPU hours required for pre-training large language models.	Translated: 2024-11-07 12:59:09 Published: 2024-09-20
# CorBin-FL: A Differentially Private Federated Learning Mechanism using Common Randomness ( http://arxiv.org/abs/2409.13133v1 ) License: see link	Hojat Allah Salehi, Md Jueal Mia, S. Sandeep Pradhan, M. Hadi Amini, Farhad Shirani	Federated learning (FL) has emerged as a promising framework for distributed machine learning. It enables collaborative learning among multiple clients, utilizing distributed data and computing resources. However, FL faces challenges in balancing privacy guarantees, communication efficiency, and overall model accuracy. In this work, we introduce CorBin-FL, a privacy mechanism that uses correlated binary stochastic quantization to achieve differential privacy while maintaining overall model accuracy. The approach uses secure multi-party computation techniques to enable clients to perform correlated quantization of their local model updates without compromising individual privacy. We provide theoretical analysis showing that CorBin-FL achieves parameter-level local differential privacy (PLDP), and that it asymptotically optimizes the privacy-utility trade-off between the mean square error utility measure and the PLDP privacy measure. We further propose AugCorBin-FL, an extension that, in addition to PLDP, achieves user-level and sample-level central differential privacy guarantees. For both mechanisms, we derive bounds on privacy parameters and mean squared error performance measures. Extensive experiments on MNIST and CIFAR10 datasets demonstrate that our mechanisms outperform existing differentially private FL mechanisms, including Gaussian and Laplacian mechanisms, in terms of model accuracy under equal PLDP privacy budgets.	Translated: 2024-11-07 11:52:12 Published: 2024-09-20
# ラベルマスキング蒸留によるフェデレートラーニング Federated Learning with Label-Masking Distillation ( http://arxiv.org/abs/2409.13136v1 ) ライセンス: Link先を確認	Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge,	(参考訳) フェデレーション学習は、グローバルサーバの調整を通じて、複数のローカルクライアントに分散したデータ上でモデルを協調的にトレーニングするための、プライバシ保護の方法を提供する。本稿では,クライアントのユーザ行動が異なるため,異なるクライアント間のラベル分布が著しく異なる,フェデレート学習におけるラベル分布スキューに着目した。このようなケースに直面した場合、ほとんどの既存手法は、クライアントにおけるラベル分布情報の不十分な利用により、最適以下に最適化される。そこで我々は,FedLMDと呼ばれるラベルマスキング蒸留手法を提案し,各クライアントのラベル分布を知覚することで,フェデレーション学習を容易にする。トレーニング中のクラス毎のサンプル数に基づいて、ラベルを多数と少数に分類する。クライアントモデルは、ローカルデータから大多数のラベルの知識を学習する。蒸留のプロセスは、グローバルモデルから大多数のラベルの予測を隠蔽し、クライアントのマイノリティなラベル知識の保存に集中できるようにします。一連の実験により, 提案手法は様々なケースで最先端の性能を達成できることが示されている。さらに,クライアントの限られたリソースを考慮し,計算コストを増大させることなく,従来の軽量なアプローチよりも優れた教師を必要としないFedLMD-Tfを提案する。私たちのコードはhttps://github.com/wnma3mz/FedLMDで利用可能です。 Federated learning provides a privacy-preserving manner to collaboratively train models on data distributed over multiple local clients via the coordination of a global server. In this paper, we focus on label distribution skew in federated learning, where due to the different user behavior of the client, label distributions between different clients are significantly different. When faced with such cases, most existing methods will lead to a suboptimal optimization due to the inadequate utilization of label distribution information in clients. Inspired by this, we propose a label-masking distillation approach termed FedLMD to facilitate federated learning via perceiving the various label distributions of each client. We classify the labels into majority and minority labels based on the number of examples per class during training. The client model learns the knowledge of majority labels from local data. The process of distillation masks out the predictions of majority labels from the global model, so that it can focus more on preserving the minority label knowledge of the client. 
A series of experiments shows that the proposed approach achieves state-of-the-art performance in various cases. Moreover, considering the limited resources of clients, we propose a variant, FedLMD-Tf, that does not require an additional teacher and outperforms previous lightweight approaches without increasing computational costs. Our code is available at https://github.com/wnma3mz/FedLMD.	翻訳日:2024-11-07 11:52:12 公開日:2024-09-20
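The label-masking idea in the FedLMD abstract above can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that): the count threshold used to split majority and minority labels, the temperature, and the function names `split_labels` and `masked_distillation_loss` are all assumptions made for illustration. The sketch only shows the core step, which is that the teacher (global model) predictions for majority labels are masked out before distillation, so the distillation term depends only on minority labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_labels(class_counts, threshold=0):
    """Split class indices by local sample count.

    Assumption: a class is 'majority' if the client holds more than
    `threshold` samples of it; the paper's exact rule may differ.
    """
    majority = [c for c, n in enumerate(class_counts) if n > threshold]
    minority = [c for c, n in enumerate(class_counts) if n <= threshold]
    return majority, minority


def masked_distillation_loss(student_logits, teacher_logits, minority,
                             temperature=1.0):
    """KL(teacher || student) computed over minority classes only.

    Majority-class logits are dropped (masked) from both models before
    the softmax, so they cannot influence the distillation term.
    """
    if not minority:
        return 0.0
    t = softmax([teacher_logits[c] for c in minority], temperature)
    s = softmax([student_logits[c] for c in minority], temperature)
    return sum(p * math.log(p / q) for p, q in zip(t, s) if p > 0)


# Example: a client with many samples of classes 0 and 2, few of 1 and 3.
majority, minority = split_labels([50, 0, 30, 2], threshold=5)
loss = masked_distillation_loss([0.0, 2.0, 0.0, 1.0],
                                [5.0, 1.0, 4.0, 2.0], minority)
```

In a full training loop this term would be added to the ordinary cross-entropy loss on local data, which is where the client learns the majority labels; the weighting between the two terms is a further design choice not specified here.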
# リラベル蒸留による深部ネットワーク予測の解釈 Interpret the Predictions of Deep Networks via Re-Label Distillation ( http://arxiv.org/abs/2409.13137v1 ) ライセンス: Link先を確認	Yingying Hua, Shiming Ge, Daichi Zhang,	(参考訳) ブラックボックスのディープネットワークの予測を解釈することで、デプロイメントの信頼性が向上する。本研究では,入力から予測への直接写像を自己超越的に学習するための再ラベル蒸留手法を提案する。画像はVAEサブスペースに投影され、潜在ベクトルをランダムに摂動させることで、いくつかの合成画像を生成する。そして、これらの合成画像は、2つのクラスのうちの1つにアノテートすることができる。その後、ディープネットワークで注釈付けされたラベルを教師として使用し、これらの合成画像をクラスにマッピングすることで、アノテーションを近似する線形学生モデルを訓練する。このようにして、これらの再ラベルされた合成画像はディープネットワークの局所的な分類機構をうまく記述することができ、学習した学生は予測に対してより直感的な説明を提供することができる。本手法の有効性を質的,定量的に検証した。 Interpreting the predictions of a black-box deep network can facilitate the reliability of its deployment. In this work, we propose a re-label distillation approach to learn a direct map from the input to the prediction in a self-supervision manner. The image is projected into a VAE subspace to generate some synthetic images by randomly perturbing its latent vector. Then, these synthetic images can be annotated into one of two classes by identifying whether their labels shift. After that, using the labels annotated by the deep network as teacher, a linear student model is trained to approximate the annotations by mapping these synthetic images to the classes. In this manner, these re-labeled synthetic images can well describe the local classification mechanism of the deep network, and the learned student can provide a more intuitive explanation towards the predictions. Extensive experiments verify the effectiveness of our approach qualitatively and quantitatively.	翻訳日:2024-11-07 11:52:12 公開日:2024-09-20
# 高レベル合成のためのハードウェア設計の比較学習 Learning to Compare Hardware Designs for High-Level Synthesis ( http://arxiv.org/abs/2409.13138v1 ) ライセンス: Link先を確認	Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Rongjian Liang, Weikai Li, Ding Wang, Haoxing Ren, Yizhou Sun, Jason Cong,	(参考訳) 高レベル合成(HLS)は、高レベルコードをハードウェア設計に変換する自動設計プロセスであり、ハードウェアアクセラレーションの迅速な開発を可能にする。 HLSはソースコードに挿入されたディレクティブであるプラグマに依存しており、プラグマは様々な設定と値を持ち、結果として生じるハードウェア設計に大きな影響を及ぼす。 HARPのような最先端のMLベースのHLSメソッドは、まず、ソースコードとプラグマのグラフベースの表現に適用されるグラフニューラルネットワーク(GNN)に基づいて、ディープラーニングモデルを訓練する。その後、設計空間探索(DSE)を行い、プラグマ設計空間を探索し、モデルを用いて候補設計をランク付けし、トップデザインを返却する。しかし、従来のDSE手法は、プラグマ設定とパフォーマンスメトリクスの非常に非線形な関係と、非回避的な方法でパフォーマンスに影響を与えるプラグマ間の複雑な相互作用により、課題に直面している。これらの課題に対処するために,ハードウェア設計を比較して効率的なHLS最適化を行う新しいアプローチである compareXplore を提案する。 CompareXploreは、ペアワイズな選好学習とポイントワイズなパフォーマンス予測を組み合わせたハイブリッドな損失関数を導入し、モデルが相対的な選好と絶対的なパフォーマンスの両方をキャプチャできるようにする。さらに,設計間の最も情報的な差異に着目した新しいノード差注意モジュールを導入し,性能に影響を及ぼす致命的なプラグマを同定する。 CompareXploreは2段階のDSEを採用しており、初期設計プルーニングにポイントワイズ予測モデルが使用され、その後、正確な性能検証のためのペアワイズ比較ステージが採用されている。大規模な実験では、ComparXploreはランキングの指標を大幅に改善し、選択した設計に対して高品質なHLS結果を生成し、既存のSOTA法よりも優れている。 High-level synthesis (HLS) is an automated design process that transforms high-level code into hardware designs, enabling the rapid development of hardware accelerators. HLS relies on pragmas, which are directives inserted into the source code to guide the synthesis process, and pragmas have various settings and values that significantly impact the resulting hardware design. State-of-the-art ML-based HLS methods, such as HARP, first train a deep learning model, typically based on graph neural networks (GNNs) applied to graph-based representations of the source code and pragmas. They then perform design space exploration (DSE) to explore the pragma design space, rank candidate designs using the model, and return the top designs. 
However, traditional DSE methods face challenges due to the highly nonlinear relationship between pragma settings and performance metrics, along with complex interactions between pragmas that affect performance in non-obvious ways. To address these challenges, we propose compareXplore, a novel approach that learns to compare hardware designs for effective HLS optimization. CompareXplore introduces a hybrid loss function that combines pairwise preference learning with pointwise performance prediction, enabling the model to capture both relative preferences and absolute performance. Moreover, we introduce a novel node difference attention module that focuses on the most informative differences between designs, enabling the model to identify critical pragmas impacting performance. CompareXplore adopts a two-stage DSE, where a pointwise prediction model is used for the initial design pruning, followed by a pairwise comparison stage for precise performance verification. In extensive experiments, compareXplore achieves significant improvements in ranking metrics and generates high-quality HLS results for the selected designs, outperforming the existing SOTA method.	翻訳日:2024-11-07 11:52:12 公開日:2024-09-20
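The hybrid loss described in the compareXplore abstract above combines a pairwise preference term with a pointwise prediction term. The following is a minimal sketch of one way such a combination could look, not the paper's actual formulation: the logistic (Bradley-Terry style) pairwise term, the squared-error pointwise term, the mixing weight `alpha`, and the convention that a higher score means a better design are all assumptions for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(pred, target):
    """Squared error between predicted and measured performance."""
    return (pred - target) ** 2


def pairwise_loss(pred_a, pred_b, target_a, target_b):
    """Logistic preference loss on the predicted score difference.

    The label is 1 if design A is truly better, 0 if B is better,
    and 0.5 for ties (assumed convention: higher target is better).
    """
    if target_a > target_b:
        label = 1.0
    elif target_a < target_b:
        label = 0.0
    else:
        label = 0.5
    diff = pred_a - pred_b
    # -label*log(sigmoid(diff)) - (1-label)*log(sigmoid(-diff))
    return label * softplus(-diff) + (1.0 - label) * softplus(diff)


def hybrid_loss(pred_a, pred_b, target_a, target_b, alpha=0.5):
    """Weighted sum of the pairwise term and the mean pointwise term."""
    point = 0.5 * (pointwise_loss(pred_a, target_a)
                   + pointwise_loss(pred_b, target_b))
    pair = pairwise_loss(pred_a, pred_b, target_a, target_b)
    return alpha * pair + (1.0 - alpha) * point


# A correctly ordered prediction is penalized less than a swapped one.
good = hybrid_loss(2.0, 1.0, 2.0, 1.0)
bad = hybrid_loss(1.0, 2.0, 2.0, 1.0)
```

The pairwise term only sees the difference between the two predictions, so it captures relative preference, while the pointwise term anchors each prediction to its measured value; this mirrors the "relative preferences and absolute performance" split in the abstract.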
# G-Fuzz: gVisor用の有向ファジングフレームワーク G-Fuzz: A Directed Fuzzing Framework for gVisor ( http://arxiv.org/abs/2409.13139v1 ) ライセンス: Link先を確認	Yuwei Li, Yuan Chen, Shouling Ji, Xuhong Zhang, Guanglu Yan, Alex X. Liu, Chunming Wu, Zulie Pan, Peng Lin,	(参考訳) gVisorは、Googleが公開しているコンテナ用のアプリケーションレベルのカーネルである。gVisorは軽量で分離性も高いため、多くのIT企業で広く使用されている。上流のgVisorに新しい脆弱性が見つかった場合、下流の開発者が対応するコードをテストしてセキュリティを維持することが重要になる。この目的には有向ファジングが有望である。しかし、既存の有向ファジング手法をgVisorに適用するには多くの課題がある。主な理由は、既存の有向ファザーが主に一般的なC/C++アプリケーション向けであるのに対し、gVisorはGo言語で記述されたOSカーネルだからである。上記の課題に対処するため、gVisor用の有向ファジングフレームワークであるG-Fuzzを提案する。G-Fuzzには、軽量できめ細かな距離計算、ターゲットに関連するsyscallの推論と利用、探索と活用の動的切り替えという3つのコア手法がある。G-Fuzzの手法は汎用的であり、他のOSカーネルにも転用できる。我々はG-Fuzzの性能を評価するために広範な実験を行った。最先端のカーネルファザーであるSyzkallerと比較して、G-Fuzzは大幅に優れた性能を示す。さらに、G-Fuzzの各コア手法の重要性を厳密に評価した。G-Fuzzは産業界で導入され、深刻な脆弱性を複数発見している。 gVisor is a Google-published application-level kernel for containers. As gVisor is lightweight and has sound isolation, it has been widely used in many IT enterprises [Stripe, DigitalOcean, Cloudflare]. When a new vulnerability in the upstream gVisor is found, it is important for downstream developers to test the corresponding code to maintain security. To achieve this aim, directed fuzzing is promising. Nevertheless, there are many challenges in applying existing directed fuzzing methods to gVisor. The core reason is that existing directed fuzzers are mainly for general C/C++ applications, while gVisor is an OS kernel written in the Go language. To address the above challenges, we propose G-Fuzz, a directed fuzzing framework for gVisor. There are three core methods in G-Fuzz: lightweight and fine-grained distance calculation, target-related syscall inference and utilization, and dynamic switching between exploration and exploitation. Note that the methods of G-Fuzz are general and can be transferred to other OS kernels. We conduct extensive experiments to evaluate the performance of G-Fuzz. 
G-Fuzz significantly outperforms Syzkaller, the state-of-the-art kernel fuzzer. Furthermore, we have rigorously evaluated the importance of each core method of G-Fuzz. G-Fuzz has been deployed in industry and has detected multiple serious vulnerabilities.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# スコアベースのマルチビーム点群デノイジング Score-Based Multibeam Point Cloud Denoising ( http://arxiv.org/abs/2409.13143v1 ) ライセンス: Link先を確認	Li Ling, Yiping Xie, Nils Bore, John Folkesson,	(参考訳) マルチビーム音響測深機(MBES)は、海底地形(bathymetry)マッピングのための事実上の標準センサである。近年、安価なMBESセンサーとグローバルマッピングイニシアチブにより、利用可能なデータは指数関数的に増加している。しかし、生のMBESデータには1-25%のノイズが含まれており、Combined Uncertainty and Bathymetric Estimator (CUBE) などのツールを用いた半自動フィルタリングが必要である。本研究では、3D点群コミュニティからインスピレーションを得て、スコアベースの点群デノイジングネットワークをMBESの外れ値検出とデノイジングに適用した。我々は、実際のMBES調査データでこのネットワークを訓練し、評価した。提案手法は古典的な手法よりも優れており、既存の標準的なMBESワークフローに容易に組み込むことができる。将来の研究を促進するために、コードと事前訓練済みモデルはオンラインで利用可能である。 The multibeam echo-sounder (MBES) is the de facto sensor for bathymetry mapping. In recent years, cheaper MBES sensors and global mapping initiatives have led to exponential growth in available data. However, raw MBES data contains 1-25% noise, which requires semi-automatic filtering using tools such as the Combined Uncertainty and Bathymetric Estimator (CUBE). In this work, we draw inspiration from the 3D point cloud community and adapt a score-based point cloud denoising network for MBES outlier detection and denoising. We trained and evaluated this network on real MBES survey data. The proposed method was found to outperform classical methods and can be readily integrated into existing standard MBES workflows. To facilitate future research, the code and pretrained model are available online.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# GASA-UNet:3次元医用画像分割のためのグローバル軸自己注意U-Net GASA-UNet: Global Axial Self-Attention U-Net for 3D Medical Image Segmentation ( http://arxiv.org/abs/2409.13146v1 ) ライセンス: Link先を確認	Chengkun Sun, Russell Stevens Terry, Jiang Bian, Jie Xu,	(参考訳) 複数の臓器の正確なセグメンテーションと画像診断における病理組織の分化は極めて重要であるが、特にニュアンスド分類や曖昧な臓器の境界については困難である。これらの課題に対処するために,GASA-UNetを導入した。このブロックは、異なる解剖学的断面を表す各2次元平面で、画像データを3次元実体として処理する。この空間的文脈内ではVoxelの特徴が定義され、抽出した1DパッチにMHSA(Multi-Head Self-Attention)機構を利用してこれらの平面間の接続を容易にする。位置埋め込み (PE) は我々の注目の枠組みに組み込まれ, 空間的文脈でボクセルの特徴を豊かにし, 組織分類と臓器縁のデライン化を強化した。我々のモデルは, BTCV, AMOS, KiTS23の3つのベンチマークデータセット上で, Diceスコアと正規化表面Dice (NSD) を用いて, より小さな解剖学的構造に対して, セグメンテーション性能の有望な改善を実証した。 Accurate segmentation of multiple organs and the differentiation of pathological tissues in medical imaging are crucial but challenging, especially for nuanced classifications and ambiguous organ boundaries. To tackle these challenges, we introduce GASA-UNet, a refined U-Net-like model featuring a novel Global Axial Self-Attention (GASA) block. This block processes image data as a 3D entity, with each 2D plane representing a different anatomical cross-section. Voxel features are defined within this spatial context, and a Multi-Head Self-Attention (MHSA) mechanism is utilized on extracted 1D patches to facilitate connections across these planes. Positional embeddings (PE) are incorporated into our attention framework, enriching voxel features with spatial context and enhancing tissue classification and organ edge delineation. Our model has demonstrated promising improvements in segmentation performance, particularly for smaller anatomical structures, as evidenced by enhanced Dice scores and Normalized Surface Dice (NSD) on three benchmark datasets, i.e., BTCV, AMOS, and KiTS23.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# QSVMにおける量子カーネルのアンザッツにおける特徴埋め込み配置の影響 The Impact of Feature Embedding Placement in the Ansatz of a Quantum Kernel in QSVMs ( http://arxiv.org/abs/2409.13147v1 ) ライセンス: Link先を確認	Ilmo Salmenperä, Ilmars Kuhtarskis, Arianne Meijer van de Griend, Jukka K. Nurminen,	(参考訳) 量子カーネルの有用な機能マップを設計することは、古典的な機械学習モデルに対するアドバンテージを達成するための重要なタスクである。回路アーキテクチャの選択、すなわち、機能依存ゲートが他のゲートとどのように織り交ぜられるかは、比較的未解明の問題であり、量子埋め込みカーネル(QEK)と呼ばれる量子カーネルのモデルを使用する場合、非常に重要である。我々は,QEKにおける様々なアーキテクチャパターンを研究,分類し,既存のアーキテクチャスタイルが文献が想定しているように振る舞わないことを示す。また、古いものに基づいた新しい代替アーキテクチャも作成し、古いものよりも少ないゲートを含む一方で、同等に機能することを示す。 Designing a useful feature map for a quantum kernel is a critical task when attempting to achieve an advantage over classical machine learning models. The choice of circuit architecture, i.e. how feature-dependent gates should be interwoven with other gates is a relatively unexplored problem and becomes very important when using a model of quantum kernels called Quantum Embedding Kernels (QEK). We study and categorize various architectural patterns in QEKs and show that existing architectural styles do not behave as the literature supposes. We also produce a novel alternative architecture based on the old ones and show that it performs equally well while containing fewer gates than its older counterparts.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# UniTabNet: テーブル構造認識のためのブリッジングビジョンと言語モデル UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition ( http://arxiv.org/abs/2409.13148v1 ) ライセンス: Link先を確認	Zhenrong Zhang, Shuhang Liu, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Yu Hu,	(参考訳) デジタル時代には、テーブル構造認識技術は大量の表データを処理するための重要なツールである。従来の手法は主に表構造回復の視覚的側面に焦点を当てていたが、表内のテキスト意味論、特に記述的なテキスト細胞を効果的に理解できない場合が多い。本稿では,画像・テキストモデルに基づくテーブル構造解析のための新しいフレームワークUniTabNetを提案する。 UniTabNetは‘divide-and-conquer’戦略を採用し、画像とテキストのモデルを使ってテーブルセルを分離し、物理デコーダと論理デコーダを統合して完全なテーブル構造を再構築する。我々は、モデルが関連する領域に焦点を向け、予測精度を高めるビジョンガイドにより、我々のフレームワークをさらに強化する。さらに,テーブルイメージのテクスチャ意味を理解するためのモデル機能を改善するために,Language Guiderを導入する。 PubTabNet、PubTables1M、WTW、iFLYTABなどの卓越したテーブル構造データセットに基づいて、UniTabNetは、新しい最先端のパフォーマンスを実現し、我々のアプローチの有効性を実証する。コードは一般公開される予定だ。 In the digital era, table structure recognition technology is a critical tool for processing and analyzing large volumes of tabular data. Previous methods primarily focus on visual aspects of table structure recovery but often fail to effectively comprehend the textual semantics within tables, particularly for descriptive textual cells. In this paper, we introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model. UniTabNet employs a ``divide-and-conquer'' strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure. We further enhance our framework with the Vision Guider, which directs the model's focus towards pertinent areas, thereby boosting prediction accuracy. Additionally, we introduce the Language Guider to refine the model's capability to understand textual semantics in table images. Evaluated on prominent table structure datasets such as PubTabNet, PubTables1M, WTW, and iFLYTAB, UniTabNet achieves a new state-of-the-art performance, demonstrating the efficacy of our approach. The code will also be made publicly available.	
翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# PIXERによる視覚情報ユーティリティの学習 Learning Visual Information Utility with PIXER ( http://arxiv.org/abs/2409.13151v1 ) ライセンス: Link先を確認	Yash Turkar, Timothy Chase Jr, Christo Aluckal, Karthik Dantu,	(参考訳) 正確な特徴検出は、自律ロボット工学、3D再構成、医療画像、リモートセンシングなど、様々なコンピュータビジョンタスクに欠かせない。視覚特徴の堅牢性向上の進歩にもかかわらず、特定の特徴型アルゴリズムによって処理される前の視覚情報の有用性を計測する手法は存在しない。このギャップに対処するために,PIXER と "Featureness" の概念を導入する。ベイズ学習の一般化を活用することで,モンテカルロサンプリングのようなコストのかかる操作を回避し,広範囲のアプリケーションに適応可能なカスタマイズ可能な特徴定義を許容し,画素の高機能化への寄与の確率と不確実性の両方を定量化する。 PIXERを特徴量選択性のある視覚的オドメトリーで評価し, RMSE軌道における平均31%の改善を実現し, 特徴量が49%減少した。 Accurate feature detection is fundamental for various computer vision tasks, including autonomous robotics, 3D reconstruction, medical imaging, and remote sensing. Despite advancements in enhancing the robustness of visual features, no existing method measures the utility of visual information before processing by specific feature-type algorithms. To address this gap, we introduce PIXER and the concept of "Featureness," which reflects the inherent interest and reliability of visual information for robust recognition, independent of any specific feature type. Leveraging a generalization on Bayesian learning, our approach quantifies both the probability and uncertainty of a pixel's contribution to robust visual utility in a single-shot process, avoiding costly operations such as Monte Carlo sampling and permitting customizable featureness definitions adaptable to a wide range of applications. We evaluate PIXER on visual odometry with featureness selectivity, achieving an average of 31% improvement in RMSE trajectory with 49% fewer features.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# スキップ接続を超えて: 除去特異点のためのプーリングとアンプーリングの設計 Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities ( http://arxiv.org/abs/2409.13154v1 ) ライセンス: Link先を確認	Chengkun Sun, Jinqian Pan, Juoli Jin, Russell Stevens Terry, Jiang Bian, Jie Xu,	(参考訳) 深層畳み込みニューラルネットワーク(CNN)のトレーニングには、除去特異点(elimination singularities)という広範な問題、すなわちノードが一貫して非活性化され、損失ランドスケープ内に退化多様体が生じる問題を含む、特有の課題がある。これらの特異点は、特徴伝播を妨げることで効率的な学習を阻害する。これを軽減するために、Max Pooling、Max Unpooling、3×3畳み込み、スキップ接続を戦略的に組み合わせたアーキテクチャ拡張であるPool Skipを導入する。この構成は、トレーニングプロセスを安定化し、レイヤ間の特徴の整合性を維持するのに役立つ。また、Pool Skipの開発を支える重み慣性(Weight Inertia)仮説を提案し、次元補償およびアフィン補償によって除去特異点に起因する劣化を緩和することに関する理論的知見を提供する。本手法は、分類やセグメンテーションなどのタスクを含め、2次元の自然画像と3次元の医用画像の両方に焦点をあてて、様々なベンチマークで評価する。以上の結果から、より堅牢なCNNトレーニングとモデル性能向上におけるPool Skipの有効性が示された。 Training deep Convolutional Neural Networks (CNNs) presents unique challenges, including the pervasive issue of elimination singularities: the consistent deactivation of nodes, which leads to degenerate manifolds within the loss landscape. These singularities impede efficient learning by disrupting feature propagation. To mitigate this, we introduce Pool Skip, an architectural enhancement that strategically combines Max Pooling, Max Unpooling, a 3×3 convolution, and a skip connection. This configuration helps stabilize the training process and maintain feature integrity across layers. We also propose the Weight Inertia hypothesis, which underpins the development of Pool Skip, providing theoretical insights into mitigating degradation caused by elimination singularities through dimensional and affine compensation. We evaluate our method on a variety of benchmarks, focusing on both 2D natural and 3D medical imaging applications, including tasks such as classification and segmentation. Our findings highlight Pool Skip's effectiveness in facilitating more robust CNN training and improving model performance.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# 局所更新による分散適応最適化の収束性 Convergence of Distributed Adaptive Optimization with Local Updates ( http://arxiv.org/abs/2409.13155v1 ) ライセンス: Link先を確認	Ziheng Cheng, Margalit Glasgow,	(参考訳) 本稿では,局所的な更新(間欠的通信)を用いた分散適応アルゴリズムについて検討する。現代の機械学習モデルの分散トレーニングにおける適応的手法の実証的成功にもかかわらず、適応的手法における局所的更新の理論的利点は、特に通信複雑性の低減の観点から、まだ完全には理解されていない。本稿では,運動量を持つ Local SGD (Local SGDM) と Local Adam が、それぞれ凸および弱凸設定でミニバッチ版よりも優れうることを証明する。我々の解析は、局所的な反復における縮小性を証明する新しい手法に基づいており、これは一般化された滑らかさの仮定と勾配クリッピングの下で局所的な更新の利点を示すうえで、重要だが困難なステップである。 We study distributed adaptive algorithms with local updates (intermittent communication). Despite the great empirical success of adaptive methods in distributed training of modern machine learning models, the theoretical benefits of local updates within adaptive methods, particularly in terms of reducing communication complexity, are not yet fully understood. In this paper, we prove that Local SGD with momentum (Local SGDM) and Local Adam can outperform their minibatch counterparts in convex and weakly convex settings, respectively. Our analysis relies on a novel technique to prove contraction during local iterations, which is a crucial but challenging step to show the advantages of local updates, under a generalized smoothness assumption and gradient clipping.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# RRM:ロバスト・リワードモデルトレーニングは、リワードハッキングを緩和する RRM: Robust Reward Model Training Mitigates Reward Hacking ( http://arxiv.org/abs/2409.13156v1 ) ライセンス: Link先を確認	Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasiia Makarova, Jeremiah Liu, Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh,	(参考訳) リワードモデル(RM)は、大きな言語モデル(LLM)と人間の嗜好の整合において重要な役割を果たす。しかし、特定のプロンプトに結びついたレスポンスペアに依存する従来のRMトレーニングでは、応答長やフォーマットなど、プロンプト非依存のアーティファクトからプロンプト駆動の好みを遠ざけるのに苦労している。本研究では,従来のRMトレーニング手法の基本的制限を明らかにするとともに,好みを決定する際に,RMがコンテキスト信号と無関係なアーティファクトを効果的に区別することができないことを示す。そこで本稿では,これらのアーティファクトに依存しない好みを学習する因果的枠組みを導入し,それらを排除するために設計された新しいデータ拡張手法を提案する。大規模な実験により,提案手法は望ましくないアーティファクトをフィルタし,より堅牢な報酬モデル(RRM)を実現することができた。我々のRRMは、RewardBench上でGemma-2-9b-itでトレーニングされたペアワイズ報酬モデルの性能を改善し、精度を80.61%から84.15%に向上させる。さらに、RMとRRMの両方を用いて2つのDPOポリシーを訓練し、RTMがDPOポリシーを大幅に強化し、MT-Benchスコアが7.27から8.31に、AlpacaEval-2が33.46%から52.49%に改善したことを示す。 Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on RewardBench, increasing accuracy from 80.61% to 84.15%. 
Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# 仮想現実感のための高忠実マスクレスニューラルサーフェス再構成 High-Fidelity Mask-free Neural Surface Reconstruction for Virtual Reality ( http://arxiv.org/abs/2409.13158v1 ) ライセンス: Link先を確認	Haotian Bai, Yize Chen, Lin Wang,	(参考訳) 多視点画像からのオブジェクト中心の表面再構成は、AR/VRのための編集可能なデジタルアセットを作成する上で重要である。幾何学的制約が欠如しているため、既存の方法、例えばNeuSはメッシュ処理でコンパクトな表面を再構築するためにオブジェクトマスクに注釈を付ける必要がある。しかし、マスクの注釈は、その厄介な性質のためにかなりの労働コストをもたらしている。本稿では,多視点オブジェクトマスクを使わずにコンパクトかつ正確な表面を復元することを目的とした,ニューラル暗黙表面再構成のための新しいレンダリングベースフレームワークであるHi-NeuSを提案する。私たちの重要な洞察は、オブジェクト中心のビューの重なり合う領域は、カメラがオブジェクトの周りを周回するときに、自然に関心の対象を浮き彫りにするということです。興味の対象は、複数のビューから蓄積されたレンダリング重量の分布を推定することで特定できる。これにより、多視点レンダリングウェイトを用いて、ニューラルネットワークの符号付き距離関数(SDF)を自己監督的にガイドする幾何学的洗練手法が考案される。具体的には、これらの重みを保ち、それらの分布に基づいて擬似表面を再サンプリングする。これにより、SDFと関心の対象とのアライメントが容易になる。次に、幾何整合性に対するSDFのバイアスを正則化する。さらに, より正確な評価のために, ポストプロセッシングを行わずに, 抽出したメッシュを計測するためにアンマスクド・チャンファー距離(CD)を用いることを提案する。我々のアプローチはNeuSとその変種であるNeuangeloを通じて検証され、異なるNeuSバックボーン間の適応性を実証した。 DTUデータセットの広範囲なベンチマークにより,本手法は表面ノイズを約20%低減し,未加工のCDを約30%改善し,表面の細部を改良した。 Hi-NeuSの優位性はさらに、BlendedMVSとハンドヘルドカメラによるコンテンツ作成に有効である。 Object-centric surface reconstruction from multi-view images is crucial in creating editable digital assets for AR/VR. Due to the lack of geometric constraints, existing methods, e.g., NeuS necessitate annotating the object masks to reconstruct compact surfaces in mesh processing. Mask annotation, however, incurs considerable labor costs due to its cumbersome nature. This paper presents Hi-NeuS, a novel rendering-based framework for neural implicit surface reconstruction, aiming to recover compact and precise surfaces without multi-view object masks. Our key insight is that the overlapping regions in the object-centric views naturally highlight the object of interest as the camera orbits around objects. The object of interest can be specified by estimating the distribution of the rendering weights accumulated from multiple views, which implicitly identifies the surface that a user intends to capture. 
This inspires us to design a geometric refinement approach, which uses multi-view rendering weights to guide the signed distance function (SDF) of neural surfaces in a self-supervised manner. Specifically, it retains these weights to resample a pseudo surface based on their distribution. This facilitates the alignment of the SDF to the object of interest. We then regularize the SDF's bias for geometric consistency. Moreover, we propose to use the unmasked Chamfer Distance (CD) to evaluate the extracted mesh without post-processing for more precise evaluation. Our approach has been validated on NeuS and its variant Neuralangelo, demonstrating its adaptability across different NeuS backbones. Extensive benchmarking on the DTU dataset shows that our method reduces surface noise by about 20% and improves the unmasked CD by around 30%, achieving better surface details. The superiority of Hi-NeuS is further validated on BlendedMVS and handheld camera captures for content creation.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
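The Chamfer Distance used as the evaluation metric in the Hi-NeuS abstract above can be written down compactly. The sketch below computes a symmetric Chamfer Distance between two small point sets by brute force. The averaging convention (mean nearest-neighbour Euclidean distance in each direction, then the mean of the two directions) is an assumption; benchmark protocols such as the DTU evaluation define their own sampling and cropping rules, and the abstract's "unmasked" variant is assumed here to simply mean that no object mask is applied to the points before measuring.

```python
import math


def mean_nearest_distance(source, target):
    """Average distance from each source point to its nearest target point."""
    return sum(min(math.dist(p, q) for q in target) for p in source) / len(source)


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    Brute force, O(len(a) * len(b)); fine for small sets, whereas real
    evaluations typically use a spatial index such as a k-d tree.
    """
    return 0.5 * (mean_nearest_distance(points_a, points_b)
                  + mean_nearest_distance(points_b, points_a))


# Points sampled from a reconstructed mesh vs. a reference scan (toy data).
reconstruction = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0)]
cd = chamfer_distance(reconstruction, reference)
```

Because the metric is symmetric, stray surface fragments far from the reference raise the reconstruction-to-reference term, which is why skipping mask-based cleanup makes the measure stricter.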
# アフリカの未来を守る : タンザニアにおける子どもの安全、学習、技能獲得のためのサイバーセキュリティ戦略 Protecting Africa's Future: Cybersecurity Strategies for Child Safety, Learning, and Skill Acquisition in Tanzania ( http://arxiv.org/abs/2409.13159v1 ) ライセンス: Link先を確認	Ezekia Gilliard, Abdul Maziko, Gideon Rwechungura, Ahmed Abubakar Aliyu, Erasto Kayumbe,	(参考訳) 今日、アフリカの子どもたちはインターネットからリスクが増している。危険物には有害なコンテンツ、暴力、搾取、虐待、無視が含まれる。これらすべてがモバイルとインターネットのテクノロジー利用の増加によって増加しており、安全を危険にさらすだけでなく、将来に必要なスキルを学ぶ能力にも影響を与えている。本稿では,第3世界のアフリカ諸国が直面している,子どものオンライン安全性の確保と,その発達ニーズを支える上での課題について概説する。これは、子供たちをオンラインの脅威から保護し、デジタルリテラシーを強化するために他国が採用した効果的な慣行と政策を強調している。我々は、他の国が児童虐待から保護し、デジタル世界で成功するためのベストプラクティスと政策を共有することに重点を置いています。この研究は、UNICEFや国連などの組織との国際協力の重要性とともに、タンザニア共和国特有のオンライン安全戦略、法的枠組み、レコメンデーションを強調している。アフリカの政策立案者、教育者、サイバーセキュリティの専門家に実践的なガイダンスと勧告を提供し、大陸内外での子供のオンライン安全活動を強化することを目的としている。 Today, children across Africa are at a growing risk from the Internet. Dangers include harmful content, violence, exploitation, abuse, and neglect. All these have increased due to increased mobile and Internet technology use, which not only places their safety at risk but also affects their ability to learn essential skills for their future. This paper provides an overview of the unique challenges faced by third-world African countries in ensuring the online safety of children while also supporting their developmental needs. It highlights effective practices and policies adopted by other nations to safeguard children from online threats and enhance their digital literacy. We are focusing on sharing the best practices and policies other countries have used to protect children from abuse and help them succeed in the digital world. The study emphasizes the online safety strategies, legal frameworks, and recommendations specific to the United Republic of Tanzania, along with the significance of international collaborations with organizations like UNICEF and the UN. 
The goal is to provide African policymakers, educators, and cybersecurity professionals with practical guidance and recommendations to strengthen child online safety initiatives both within and beyond the continent.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# Zero-shot Point Cloud Anomaly Detectionに向けて:マルチビュープロジェクションフレームワーク Towards Zero-shot Point Cloud Anomaly Detection: A Multi-View Projection Framework ( http://arxiv.org/abs/2409.13162v1 ) ライセンス: Link先を確認	Yuqi Cheng, Yunkang Cao, Guoyang Xie, Zhichao Lu, Weiming Shen,	(参考訳) ポイントクラウド内の異常を検出することは、様々な産業アプリケーションにとって重要であるが、従来の教師なしの手法は、データ取得コスト、初期生産制約、製品カテゴリ間の限定的な一般化による課題に直面している。これらの課題を克服するために、トレーニング済みのビジョンランゲージモデル(VLM)を利用して異常を検出する、Multi-View Projection(MVP)フレームワークを導入する。具体的には、MVPは、クラウドデータを多視点深度画像に向けることで、ポイントクラウド異常検出をイメージ異常検出に変換する。ゼロショット画像異常検出法に続いて、予め訓練したVLMを用いて、これらの深度画像上の異常を検出する。事前学習されたVLMは、本質的にゼロショット点雲異常検出に適合せず、特異性に欠ける可能性があることを考慮し、これらのVLMを微調整するための学習可能な視覚的および適応的テキストプロンプト技術の統合を提案し、その検出性能を向上させる。 MVTec 3D-ADとReal3D-ADの広範囲な実験により,提案するMVPフレームワークの優れたゼロショット異常検出性能と高速化技術の有効性が実証された。自動車用プラスチック部品の検査における実世界の評価は,提案手法が実用上の見当たらないシナリオにも一般化可能であることをさらに示している。コードはhttps://github.com/hustCYQ/MVP-PCLIPで入手できる。 Detecting anomalies within point clouds is crucial for various industrial applications, but traditional unsupervised methods face challenges due to data acquisition costs, early-stage production constraints, and limited generalization across product categories. To overcome these challenges, we introduce the Multi-View Projection (MVP) framework, leveraging pre-trained Vision-Language Models (VLMs) to detect anomalies. Specifically, MVP projects point cloud data into multi-view depth images, thereby translating point cloud anomaly detection into image anomaly detection. Following zero-shot image anomaly detection methods, pre-trained VLMs are utilized to detect anomalies on these depth images. Given that pre-trained VLMs are not inherently tailored for zero-shot point cloud anomaly detection and may lack specificity, we propose the integration of learnable visual and adaptive text prompting techniques to fine-tune these VLMs, thereby enhancing their detection performance. 
Extensive experiments on the MVTec 3D-AD and Real3D-AD demonstrate our proposed MVP framework's superior zero-shot anomaly detection performance and the prompting techniques' effectiveness. Real-world evaluations on automotive plastic part inspection further showcase that the proposed method can also be generalized to practical unseen scenarios. The code is available at https://github.com/hustCYQ/MVP-PCLIP.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
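The first step of the MVP abstract above, turning a point cloud into multi-view depth images, can be sketched as follows. This is an illustrative simplification and not the authors' pipeline (their code is at the repository linked above): it assumes an orthographic camera, views obtained by rotating the cloud about the y axis, a convention that smaller z means closer to the camera, and `None` for empty pixels. The function names and the `size` and `num_views` parameters are hypothetical.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def depth_image(points, size=8, angle=0.0):
    """Orthographic depth map of a rotated point cloud.

    Each pixel stores the smallest z that falls into it (assumed to be
    the surface nearest the camera); pixels with no points stay None.
    """
    rotated = [rotate_y(p, angle) for p in points]
    min_x = min(p[0] for p in rotated)
    min_y = min(p[1] for p in rotated)
    span_x = max(p[0] for p in rotated) - min_x
    span_y = max(p[1] for p in rotated) - min_y
    span = max(span_x, span_y) or 1.0  # avoid division by zero
    image = [[None] * size for _ in range(size)]
    for x, y, z in rotated:
        col = min(size - 1, int((x - min_x) / span * size))
        row = min(size - 1, int((y - min_y) / span * size))
        if image[row][col] is None or z < image[row][col]:
            image[row][col] = z
    return image


def multi_view_depth_images(points, num_views=4, size=8):
    """Depth maps from `num_views` evenly spaced rotations about y."""
    return [depth_image(points, size, 2.0 * math.pi * k / num_views)
            for k in range(num_views)]


cloud = [(0.0, 0.0, 1.0), (0.0, 0.0, 3.0), (1.0, 1.0, 2.0)]
views = multi_view_depth_images(cloud, num_views=4, size=2)
```

Each resulting depth map could then be passed to an image anomaly detector, which is the translation from point cloud anomaly detection to image anomaly detection that the abstract describes.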
# 隠れたアクティベーションは十分ではない:ニューラルネットワークの予測に対する一般的なアプローチ Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions ( http://arxiv.org/abs/2409.13163v1 ) ライセンス: Link先を確認	Samuel Leblanc, Aiky Rasolomanana, Marco Armenta,	(参考訳) 本稿では,クイバー表現理論のツールを用いたニューラルネットワーク解析のための新しい数学的枠組みを提案する。このフレームワークにより,ニューラルネットワークが認識する新たなデータサンプルとトレーニングデータとの類似性を定量化することができる。データサンプルから誘導されるクイバー表現を活用することで、従来の隠れ層出力よりも多くの情報を捉える。このクイバー表現は、フォワードパスの計算の複雑さを1つの行列に抽象化し、行列空間における単純な幾何学的および統計的議論を用いてニューラルネットワークの予測を研究することを可能にする。私たちの数学的結果はアーキテクチャ非依存かつタスク非依存であり、広く適用できる。概念実証実験として、MNIST と FashionMNIST のデータセットにおいて、異なる MLP アーキテクチャと複数の敵対的攻撃手法に対する敵対的サンプルの検出問題に本研究の結果を適用した。我々の実験は、公開リポジトリ (https://github.com/MarcoArmenta/Hidden-Activations-are-not-Enough) で再現できる。 We introduce a novel mathematical framework for analyzing neural networks using tools from quiver representation theory. This framework enables us to quantify the similarity between a new data sample and the training data, as perceived by the neural network. By leveraging the induced quiver representation of a data sample, we capture more information than traditional hidden layer outputs. This quiver representation abstracts away the complexity of the computations of the forward pass into a single matrix, allowing us to employ simple geometric and statistical arguments in a matrix space to study neural network predictions. Our mathematical results are architecture-agnostic and task-agnostic, making them broadly applicable. As proof-of-concept experiments, we apply our results to the problem of detecting adversarial examples on the MNIST and FashionMNIST datasets, across different MLP architectures and several adversarial attack methods. Our experiments can be reproduced with our publicly available repository (https://github.com/MarcoArmenta/Hidden-Activations-are-not-Enough).	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# 姿勢制御のためのモジュラ衛星の形態と挙動の協調最適化 Morphology and Behavior Co-Optimization of Modular Satellites for Attitude Control ( http://arxiv.org/abs/2409.13166v1 ) ライセンス: Link先を確認	Yuxing Wang, Jie Li, Cong Yu, Xinyang Li, Simeng Huang, Yongzhe Chang, Xueqian Wang, Bin Liang,	(参考訳) モジュラー衛星の出現は、宇宙探査における柔軟性、レジリエンス、拡張性の新たなパラダイムを導入し、宇宙船工学における重要な転換点となった。姿勢制御などの複雑な課題に対処するためには、衛星の形態的アーキテクチャと制御器の両方が性能の最適化に不可欠である。最適な制御に関するかなりの研究にもかかわらず、特定のミッションの制約に合わせたモジュラー衛星の最適化と実用的な組み立て戦略の開発には大きなギャップが残っている。この研究のギャップは、設計と制御の協調最適化という本質的に複雑な性質から生じる。従来、人工進化によって取り組まれていたこの問題は、サンプル非効率で計算コストのかかる個々のコントローラの適合度に基づいて形態を最適化することである。本稿では、モジュラー衛星の形状と制御を同時に最適化し、姿勢制御ミッションの性能と効率を向上させるための、新しい勾配に基づくアプローチを提案する。我々のモンテカルロシミュレーションは、この共最適化アプローチが、進化に基づくアプローチで設計されたものよりも、ミッション性能のよいモジュラー衛星を産み出すことを示した。さらに,本研究では今後の研究の道筋について論じる。 The emergence of modular satellites marks a significant transformation in spacecraft engineering, introducing a new paradigm of flexibility, resilience, and scalability in space exploration endeavors. In addressing complex challenges such as attitude control, both the satellite's morphological architecture and the controller are crucial for optimizing performance. Despite substantial research on optimal control, there remains a significant gap in developing optimized and practical assembly strategies for modular satellites tailored to specific mission constraints. This research gap primarily arises from the inherently complex nature of co-optimizing design and control, a process known for its notorious bi-level optimization loop. Conventionally tackled through artificial evolution, this issue involves optimizing the morphology based on the fitness of individual controllers, which is sample-inefficient and computationally expensive. In this paper, we introduce a novel gradient-based approach to simultaneously optimize both morphology and control for modular satellites, enhancing their performance and efficiency in attitude control missions. 
Our Monte Carlo simulations demonstrate that this co-optimization approach results in modular satellites with better mission performance compared to those designed by evolution-based approaches. Furthermore, this study discusses potential avenues for future research.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# 電子鼻システムにおけるドリフト補償のための教師なしアテンションベースマルチソースドメイン適応フレームワーク Unsupervised Attention-Based Multi-Source Domain Adaptation Framework for Drift Compensation in Electronic Nose Systems ( http://arxiv.org/abs/2409.13167v1 ) ライセンス: Link先を確認	Wenwen Zhang, Shuhao Hu, Zhengyuan Zhang, Yuanjin Zheng, Qi Jie Wang, Zhiping Lin,	(参考訳) 電子鼻(E-nose)システムを用いた産業環境における危険・有毒・爆発性・可燃性ガスの連続的・長期モニタリングは、ガスセンサの時間変化ドリフトによるガス識別精度の低下という重大な課題に直面している。この問題に対処するために,E-noseシステムにおいて,ドリフト補償を伴うガス識別のための新しい教師なしアテンションベースのマルチソースドメイン共有・プライベート特徴融合適応(AMDS-PFFA)フレームワークを提案する。 AMDS-PFFAモデルは、初期段階で収集された複数のソースドメインからのラベル付きデータを有効利用し、ターゲットドメインからのラベルなしガスセンサアレイドリフト信号中のガスを正確に識別する。このモデルの有効性を検証するため、36ヶ月にわたり収集されたカリフォルニア大学アーバイン校(UCI)の標準ドリフトガスデータセットと、30ヶ月にわたる自社開発のE-noseシステムからのドリフト信号データを用いて、広範囲な実験的評価を行った。近年のドリフト補償法と比較して、AMDS-PFFAモデルは強い収束性とともに最高の平均ガス認識精度を達成し、全ターゲットドメインバッチにわたってUCIデータセットで83.20%、自社開発のE-noseシステムのデータで93.96%に達する。これらの結果は, ドリフト補償によるガス識別におけるAMDS-PFFAモデルの優れた性能を示し, 既存手法を大きく上回っている。 Continuous, long-term monitoring of hazardous, noxious, explosive, and flammable gases in industrial environments using electronic nose (E-nose) systems faces the significant challenge of reduced gas identification accuracy due to time-varying drift in gas sensors. To address this issue, we propose a novel unsupervised attention-based multi-source domain shared-private feature fusion adaptation (AMDS-PFFA) framework for gas identification with drift compensation in E-nose systems. The AMDS-PFFA model effectively leverages labeled data from multiple source domains collected during the initial stage to accurately identify gases in unlabeled gas sensor array drift signals from the target domain. To validate the model's effectiveness, extensive experimental evaluations were conducted using both the University of California, Irvine (UCI) standard drift gas dataset, collected over 36 months, and drift signal data from our self-developed E-nose system, spanning 30 months. 
Compared to recent drift compensation methods, the AMDS-PFFA model achieves the highest average gas recognition accuracy with strong convergence, attaining 83.20% on the UCI dataset and 93.96% on data from our self-developed E-nose system across all target domain batches. These results demonstrate the superior performance of the AMDS-PFFA model in gas identification with drift compensation, significantly outperforming existing methods.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# AI時代の経済政策への挑戦 Economic Policy Challenges for the Age of AI ( http://arxiv.org/abs/2409.13168v1 ) ライセンス: Link先を確認	Anton Korinek,	(参考訳) 本稿では、汎用人工知能(AGI)に向けたAIの変革的な進歩が、経済学者や経済政策立案者にもたらす大きな課題について考察する。私は、AIの時代が、労働の役割を縮小させることによって経済の基本構造をどのように変革するかを検討する。これは前例のない生産性向上をもたらす一方で、雇用の混乱、所得分配、教育と人的資本の価値に対する懸念を提起する。私は、AGI後の労働にどのような役割が残るのか、どの生産要素の重要性が増すのかを探求する。続いて本論文は,(1)不平等と所得分配,(2)教育と技能開発,(3)社会的・政治的安定,(4)マクロ経済政策,(5)反トラストと市場規制,(6)知的財産,(7)環境への影響,(8)グローバルAIガバナンスという,AI時代の経済政策の8つの重要な課題を明らかにする。最後に、経済学者がこれらの課題をより深く理解するためにどのように貢献できるかを強調して結論付けている。 This paper examines the profound challenges that transformative advances in AI towards Artificial General Intelligence (AGI) will pose for economists and economic policymakers. I examine how the Age of AI will revolutionize the basic structure of our economies by diminishing the role of labor, leading to unprecedented productivity gains but raising concerns about job disruption, income distribution, and the value of education and human capital. I explore what roles may remain for labor post-AGI, and which production factors will grow in importance. The paper then identifies eight key challenges for economic policy in the Age of AI: (1) inequality and income distribution, (2) education and skill development, (3) social and political stability, (4) macroeconomic policy, (5) antitrust and market regulation, (6) intellectual property, (7) environmental implications, and (8) global AI governance. It concludes by emphasizing how economists can contribute to a better understanding of these challenges.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
# 層内LPBFモニタリングのための生成拡散モデルによる深層学習に基づく光学画像の超解像 Deep Learning based Optical Image Super-Resolution via Generative Diffusion Models for Layerwise in-situ LPBF Monitoring ( http://arxiv.org/abs/2409.13171v1 ) ライセンス: Link先を確認	Francis Ogoke, Sumesh Kalambettu Suresh, Jesse Adamczyk, Dan Bolintineanu, Anthony Garland, Michael Heiden, Amir Barati Farimani,	(参考訳) レーザー粉体層融合(L-PBF)における欠陥の確率的形成は, 高精度使用例への採用に悪影響を及ぼす。光モニタリング技術は,レイヤワイドイメージングに基づく欠陥の同定に利用することができるが,コストやメモリの制約により,高解像度化が困難である。そこで我々は,低コストで低解像度なビルドプレート画像と詳細な高解像度なビルドプレート画像とを結びつけ,コスト効率のよいプロセス監視を実現するため,生成型ディープラーニングモデルを実装した。そのため,低分解能Webカメラ画像からビルドプレートの現実的な高分解能画像を生成するための条件付き潜伏確率拡散モデルを訓練し,小型の特徴の分布と表面粗さを復元する。まず、ピーク信号対雑音比(PSNR)、構造類似度指標(SSIM)、ウェーブレット共分散測定を用いて、生成した画像の再構成品質を解析し、そのモデルの性能を評価する。さらに,Segment Anything Foundationモデルに基づくフレームワークを設計し,プリント部の3次元形状を再現し,再構成した試料の表面粗さを解析する。最後に、実装されたフレームワークのゼロショット一般化能力を、合成低解像度データを作成することによって、他の部分のジオメトリに拡張する。 The stochastic formation of defects during Laser Powder Bed Fusion (L-PBF) negatively impacts its adoption for high-precision use cases. Optical monitoring techniques can be used to identify defects based on layer-wise imaging, but these methods are difficult to scale to high resolutions due to cost and memory constraints. Therefore, we implement generative deep learning models to link low-cost, low-resolution images of the build plate to detailed high-resolution optical images of the build plate, enabling cost-efficient process monitoring. To do so, a conditional latent probabilistic diffusion model is trained to produce realistic high-resolution images of the build plate from low-resolution webcam images, recovering the distribution of small-scale features and surface roughness. We first evaluate the performance of the model by analyzing the reconstruction quality of the generated images using peak-signal-to-noise-ratio (PSNR), structural similarity index measure (SSIM) and wavelet covariance metrics that describe the preservation of high-frequency information. 
Additionally, we design a framework based upon the Segment Anything foundation model to recreate the 3D morphology of the printed part and analyze the surface roughness of the reconstructed samples. Finally, we explore the zero-shot generalization capabilities of the implemented framework to other part geometries by creating synthetic low-resolution data.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
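The abstract above scores reconstruction quality with PSNR, among other metrics. As a rough illustration only, the following minimal sketch computes PSNR for two equally sized grayscale images stored as nested lists. The function name `psnr` and the 8-bit default `max_value=255.0` are assumptions made for this example and are not taken from the paper or its code.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized images.

    Images are nested lists of pixel intensities; max_value is the largest
    possible intensity (255 for 8-bit images, an assumption of this sketch).
    """
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if len(ref) != len(gen):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([[100, 100]], [[110, 90]]))
```

A higher PSNR means the generated image is closer to the reference pixel by pixel; it says nothing by itself about perceptual or structural similarity, which is presumably why the abstract pairs it with SSIM and wavelet-based metrics.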
# より平坦なミニマのためのバイラテラル・シャープネス考慮最小化 Bilateral Sharpness-Aware Minimization for Flatter Minima ( http://arxiv.org/abs/2409.13173v1 ) ライセンス: Link先を確認	Jiaxin Deng, Junbiao Pang, Baochang Zhang, Qingming Huang,	(参考訳) SAM (Sharpness-Aware Minimization) は、Max-Sharpness (MaxS) を小さくすることで一般化を促進する。実践的な成功にもかかわらず,SAM の一般化向上を支える MaxS が「Flatness Indicator Problem (FIP)」に直面することを実証的に見出した。すなわち,SAM は勾配上昇方向の平坦さしか考慮せず,次の最小化領域が十分に平坦にならない。SAM は本質的に貪欲な探索法であるため,より良い平坦度指標(FI)はニューラルネットワークのより良い一般化をもたらす。本稿では, トレーニング損失と現在の重みを囲む近傍における最小損失との差を利用することを提案し, これをMin-Sharpness (MinS) と呼ぶ。 MaxSとMinSをマージすることで、最適化中により平坦な方向を示すより良いFIを作成した。具体的には、このFIをSAMと組み合わせたバイラテラルSAM (BSAM) を提案し、SAMよりも平坦な最小値を求める。理論解析により、BSAMが局所ミニマに収束することを証明する。大規模な実験により、BSAMは、分類、転移学習、人物姿勢推定、ネットワーク量子化といった様々なタスクにおいて、バニラSAMよりも優れた一般化性能とロバスト性を提供することが示された。コードは https://github.com/ajiaaa/BSAM で公開されている。 Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS). Despite the practical success, we empirically found that the MaxS behind SAM's generalization enhancements faces the "Flatness Indicator Problem" (FIP), where SAM only considers the flatness in the direction of gradient ascent, resulting in a next minimization region that is not sufficiently flat. Because SAM is by nature a greedy search method, a better Flatness Indicator (FI) would bring better generalization of neural networks. In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS). By merging MaxS and MinS, we created a better FI that indicates a flatter direction during the optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds a flatter minimum than that of SAM. The theoretical analysis proves that BSAM converges to local minima. 
Extensive experiments demonstrate that BSAM offers superior generalization performance and robustness compared to vanilla SAM across various tasks, i.e., classification, transfer learning, human pose estimation, and network quantization. Code is publicly available at: https://github.com/ajiaaa/BSAM.	翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
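To make the two flatness indicators in the abstract above concrete, here is a toy one-dimensional sketch. It assumes MaxS is the largest loss increase and MinS the largest loss decrease within a radius `rho` of the current weight, and it finds both by brute-force grid search. The helper name `sharpness_indicators`, the grid search, and the scalar weight are illustrative assumptions; the paper's actual estimator and the way BSAM merges the two quantities are not specified in the abstract and are not reproduced here.

```python
def sharpness_indicators(loss, w, rho, steps=200):
    """Return (MaxS, MinS) for a scalar weight w on a 1-D loss function.

    MaxS = max over |e| <= rho of loss(w + e) - loss(w)
    MinS = loss(w) - min over |e| <= rho of loss(w + e)
    Both are estimated on an evenly spaced grid (illustration only).
    """
    offsets = [-rho + 2.0 * rho * i / steps for i in range(steps + 1)]
    values = [loss(w + e) for e in offsets]
    base = loss(w)
    max_s = max(values) - base  # how much the loss can rise nearby
    min_s = base - min(values)  # how much the loss can fall nearby
    return max_s, min_s


if __name__ == "__main__":
    quadratic = lambda x: x * x
    print(sharpness_indicators(quadratic, w=1.0, rho=0.5))
```

On the quadratic loss at `w = 1.0` with `rho = 0.5`, the loss rises by 1.25 on one side and falls by 0.75 on the other, which illustrates why an indicator that looks only in the ascent direction sees just part of the local geometry.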
# RPAF:大規模リコメンダシステムにおけるキャッシュ割り当てのための強化予測アロケーションフレームワーク RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems ( http://arxiv.org/abs/2409.13175v1 ) ライセンス: Link先を確認	Shuo Su, Xiaoshuang Chen, Yao Wang, Yulin Wu, Ziqiang Zhang, Kaiqiao Zhan, Ben Wang, Kun Gai,	(参考訳) 現代のリコメンデータシステムは計算集約的なインフラ上に構築されており、計算資源が限られているため、特にピーク時に各要求に対してリアルタイムな計算を行うことは困難である。ユーザ側のキャッシュによるリコメンデーションは、リアルタイムのレコメンデーションができない場合に広く使用される。しかし、ユーザ全体のエンゲージメントを最大化するために、リアルタイムおよびキャッシュされたレコメンデーションを割り当てることは困難である。本稿では,キャッシュアロケーションにおける2つの重要な課題,すなわち,バリューストラテジー依存とストリーミングアロケーションを示す。そこで我々は,これらの問題に対処する強化予測割当フレームワーク(RPAF)を提案する。 RPAFは、予測とアロケーション段階を含む強化学習ベースの2段階フレームワークである。予測段階は、値戦略依存性を考慮したキャッシュ選択の値を推定し、割り当て段階は、グローバルな予算制約を満たしつつ、各要求に対するキャッシュ選択を決定する。 RPAF訓練の課題には, グローバル性と予算制約の厳格性が含まれており, この問題に対処するための緩やかなローカルアロケータ (RLA) が提案されている。さらに、ストリーミングアロケーション問題に対処するために、アロケーションステージでPoolRankアルゴリズムが使用される。実験の結果,RPAFは計算予算制約下でのユーザのエンゲージメントを大幅に改善することがわかった。 Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations to maximize the users' overall engagement. This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. Then, we propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices considering the value-strategy dependency, and the allocation stage determines the cache choices for each individual request while satisfying the global budget constraint. 
We show that the challenge of training RPAF includes globality and the strictness of budget constraints, and a relaxed local allocator (RLA) is proposed to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# 説明可能なAIとLLMを用いた適応型エンドツーエンドIoTセキュリティフレームワーク An Adaptive End-to-End IoT Security Framework Using Explainable AI and LLMs ( http://arxiv.org/abs/2409.13177v1 ) ライセンス: Link先を確認	Sudipto Baral, Sajal Saha, Anwar Haque,	(参考訳) IoT(Internet of Things)の指数関数的な成長は、サイバーセキュリティの脅威の複雑さと量を大幅に増加させ、高度でスケーラブルで解釈可能なセキュリティフレームワークの開発を必要としている。本稿では、機械学習(ML)、説明可能なAI(XAI)、大規模言語モデル(LLM)を活用した、リアルタイムIoT攻撃検出および応答のための革新的で包括的なフレームワークを提案する。 SHAP(SHapley Additive exPlanations)やLIME(Local Interpretable Model-Agnostic Explanations)といったXAI技術をモデルに依存しないアーキテクチャに統合することにより、さまざまなMLアルゴリズムにまたがるフレームワークの適応性を確保する。さらに、LLMの組み入れにより、検出決定の解釈可能性とアクセシビリティが向上し、システム管理者に検出された脅威の実行可能で人間に理解可能な説明を提供する。私たちのエンドツーエンドフレームワークは、モデル開発からデプロイメントへのシームレスな移行を促進するだけでなく、既存の研究でしばしば欠落している現実世界のアプリケーション機能も表しています。 CIC-IOT-2023データセット \cite{neto2023ciciot2023} を用いた実験に基づき、GeminiとOPENAIのLLMは攻撃緩和においてそれぞれ独自の強みを示す。Geminiは正確で焦点を絞った戦略を提供し、OPENAIは広範で詳細なセキュリティ対策を提供する。 SHAPアルゴリズムとLIMEアルゴリズムをXAIに組み込むことで、攻撃検出に関する包括的な洞察が得られ、詳細な特徴分析、微調整、誤分類の適応によるモデル改善の機会を強調し、精度を高めることができる。 The exponential growth of the Internet of Things (IoT) has significantly increased the complexity and volume of cybersecurity threats, necessitating the development of advanced, scalable, and interpretable security frameworks. This paper presents an innovative, comprehensive framework for real-time IoT attack detection and response that leverages Machine Learning (ML), Explainable AI (XAI), and Large Language Models (LLM). By integrating XAI techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) with a model-independent architecture, we ensure our framework's adaptability across various ML algorithms. Additionally, the incorporation of LLMs enhances the interpretability and accessibility of detection decisions, providing system administrators with actionable, human-understandable explanations of detected threats. 
Our end-to-end framework not only facilitates a seamless transition from model development to deployment but also represents a real-world application capability that is often lacking in existing research. Based on our experiments with the CIC-IOT-2023 dataset \cite{neto2023ciciot2023}, Gemini and OPENAI LLMS demonstrate unique strengths in attack mitigation: Gemini offers precise, focused strategies, while OPENAI provides extensive, in-depth security measures. Incorporating SHAP and LIME algorithms within XAI provides comprehensive insights into attack detection, emphasizing opportunities for model improvement through detailed feature analysis, fine-tuning, and the adaptation of misclassifications to enhance accuracy.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# API提案における大規模コードモデルの体系的評価:いつ,どれを,どのように A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How ( http://arxiv.org/abs/2409.13178v1 ) ライセンス: Link先を確認	Chaozheng Wang, Shuzheng Gao, Cuiyun Gao, Wenxuan Wang, Chun Yong Chong, Shan Gao, Michael R. Lyu,	(参考訳) API提案は、現代のソフトウェア開発において重要なタスクであり、現在のコンテキストに基づいてサードパーティのAPIを予測し、推奨することでプログラマを支援する。大規模コードモデル(LCM)の最近の進歩は、API提案タスクにおいて有望であることを示している。しかし、それらは主にどのAPIを使うべきかを提案することに重点を置いており、提案されたAPIをいつ使うか、どのように使うかなど、プログラマが実際にAPIを使用する際により多くの支援を必要とする可能性を無視している。このギャップを軽減するため,本論文では,API提案タスクにおけるLCMを体系的に評価する。調査を容易にするために、まず、853の人気のあるJavaプロジェクトで使用されている176のAPIをカバーする、多様なコードスニペットのコレクションを含むベンチマークを構築した。 API提案タスクの3つのシナリオを評価対象とする。(1)API使用の望ましい位置とタイミングを決定することを目的とした ``\textit{when to use}''、(2)与えられたライブラリから適切なAPIを特定することを目的とした ``\textit{which to use}''、(3)与えられたAPIの引数を予測することを目的とした ``\textit{how to use}'' である。この3つのシナリオを考慮すれば、開発者のためのAPIの提案におけるLCMの機能の包括的な評価が可能になる。評価では,3つのシナリオに対して,異なるモデルサイズを持つ9つの一般的なLCMを選択する。また、文脈選択がモデル性能に与える影響を詳細に分析する。 API suggestion is a critical task in modern software development, assisting programmers by predicting and recommending third-party APIs based on the current context. Recent advancements in large code models (LCMs) have shown promise in the API suggestion task. However, they mainly focus on suggesting which APIs to use, ignoring that programmers may demand more assistance while using APIs in practice including when to use the suggested APIs and how to use the APIs. To mitigate the gap, we conduct a systematic evaluation of LCMs for the API suggestion task in the paper. To facilitate our investigation, we first build a benchmark that contains a diverse collection of code snippets, covering 176 APIs used in 853 popular Java projects. 
Three distinct scenarios in the API suggestion task are then considered for evaluation, including (1) ``\textit{when to use}'', which aims at determining the desired position and timing for API usage; (2) ``\textit{which to use}'', which aims at identifying the appropriate API from a given library; and (3) ``\textit{how to use}'', which aims at predicting the arguments for a given API. The consideration of the three scenarios allows for a comprehensive assessment of LCMs' capabilities in suggesting APIs for developers. During the evaluation, we choose nine popular LCMs with varying model sizes for the three scenarios. We also perform an in-depth analysis of the influence of context selection on the model performance ...	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# ConvLSTMTransNet:インターネットトラフィックテレメトリのためのハイブリッドディープラーニングアプローチ ConvLSTMTransNet: A Hybrid Deep Learning Approach for Internet Traffic Telemetry ( http://arxiv.org/abs/2409.13179v1 ) ライセンス: Link先を確認	Sajal Saha, Saikat Das, Glaucio H. S. Carvalho,	(参考訳) 本稿では、時系列予測のためのハイブリッドディープラーニングモデルConvLSTMTransNetと、インターネットトラフィックテレメトリへの具体的な適用について述べる。このモデルは、畳み込みニューラルネットワーク(CNN)、Long Short-Term Memory(LSTM)ネットワーク、およびTransformerエンコーダの強みを統合し、時系列データに固有の複雑な時空間関係をキャプチャする。 ConvLSTMTransNetモデルは、プロバイダエッジルータの高速ポートからサンプリングした実際のインターネットトラフィックデータを用いて、RNN、LSTM、Gated Recurrent Unit (GRU) の3つのベースラインモデルと比較評価した。 Mean Absolute Error (MAE)、Root Mean Squared Error (RMSE)、Weighted Absolute Percentage Error (WAPE)といったパフォーマンス指標を使用して各モデルの精度を評価した。以上の結果から,ConvLSTMTransNetは予測精度において,ベースラインモデルよりも約10%優れていた。 ConvLSTMTransNetは、時間的依存関係を捕捉し、インターネットトラフィックデータから空間的特徴を抽出する能力を高めるという、革新的なアーキテクチャ上の特徴により、従来のモデルを上回る。これらの知見は、より正確な予測を達成するために、インターネットトラフィックデータの複雑さに合わせた高度なアーキテクチャを採用することの重要性を浮き彫りにしている。 In this paper, we present a novel hybrid deep learning model, named ConvLSTMTransNet, designed for time series prediction, with a specific application to internet traffic telemetry. This model integrates the strengths of Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Transformer encoders to capture complex spatial-temporal relationships inherent in time series data. The ConvLSTMTransNet model was evaluated against three baseline models: RNN, LSTM, and Gated Recurrent Unit (GRU), using real internet traffic data sampled from high-speed ports on a provider edge router. Performance metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE) were used to assess each model's accuracy. Our findings demonstrate that ConvLSTMTransNet significantly outperforms the baseline models by approximately 10% in terms of prediction accuracy. 
ConvLSTMTransNet surpasses traditional models due to its innovative architectural features, which enhance its ability to capture temporal dependencies and extract spatial features from internet traffic data. Overall, these findings underscore the importance of employing advanced architectures tailored to the complexities of internet traffic data for achieving more precise predictions.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
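The three accuracy metrics named in the abstract above have standard textbook definitions, sketched below for plain Python lists. The function names and the sample traffic values are made up for illustration, and WAPE is taken here as the sum of absolute errors divided by the sum of absolute actual values, which is the common definition and may differ in detail from the one the paper uses.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def wape(actual, predicted):
    """Weighted Absolute Percentage Error, as a fraction of total actual volume."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / sum(abs(a) for a in actual)


if __name__ == "__main__":
    # Hypothetical traffic volumes and forecasts, for illustration only.
    observed = [100.0, 200.0, 300.0]
    forecast = [110.0, 190.0, 330.0]
    print(mae(observed, forecast), rmse(observed, forecast), wape(observed, forecast))
```

RMSE penalizes large individual misses more heavily than MAE, while WAPE normalizes by total volume, which makes errors comparable across links with very different traffic levels.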
# インターネットトラフィック予測におけるデータ制限の克服:トランスファーラーニングとウェーブレット拡張を用いたLSTMモデル Overcoming Data Limitations in Internet Traffic Forecasting: LSTM Models with Transfer Learning and Wavelet Augmentation ( http://arxiv.org/abs/2409.13181v1 ) ライセンス: Link先を確認	Sajal Saha, Anwar Haque, Greg Sidebottom,	(参考訳) 小型ISPネットワークにおけるインターネットトラフィックの効果的な予測は、データ可用性の制限によって困難である。本稿では, LSTM を用いた2つのモデル LSTMSeq2Seq と LSTMSeq2SeqAtn を用いた転送学習とデータ拡張手法を用いて, この課題を考察する。データセットは実際のインターネットトラフィックテレメトリを表し、さまざまなネットワークドメインにわたる多様なトラフィックパターンに関する洞察を提供する。両モデルとも単段階予測では良好に動作したが,特に長期精度では多段階予測が困難であった。小さなデータセットでは、LSTMSeq2Seqは一般的にLSTMSeq2SeqAtnよりも優れており、より高いモデル複雑性が必ずしもより良いパフォーマンスをもたらすとは限らないことを示している。モデルの有効性は、異なるネットワーク領域で異なり、異なるトラフィック特性の影響を反映している。データ不足に対処するため、離散ウェーブレット変換はデータ拡張に使用され、特に短期的な予測において、モデルの性能が大幅に改善された。分析の結果、限られたデータを持つシナリオでは、データの増大が不可欠であることが判明した。さらに、LSTMSeq2SeqAtnにおける注意機構により、より短期的な予測一貫性は向上するが、より長い予測ではより大きな変動性を提供する。その結果、交通予測における異なるモデリングアプローチの利点と限界が浮き彫りになった。本研究は、特にデータ可用性に制限のある小さなISPネットワークにおいて、交通予測モデルの精度を高める上で、転送学習とデータ拡張の重要性を浮き彫りにしている。 Effective internet traffic prediction in smaller ISP networks is challenged by limited data availability. This paper explores this issue using transfer learning and data augmentation techniques with two LSTM-based models, LSTMSeq2Seq and LSTMSeq2SeqAtn, initially trained on a comprehensive dataset provided by Juniper Networks and subsequently applied to smaller datasets. The datasets represent real internet traffic telemetry, offering insights into diverse traffic patterns across different network domains. Our study revealed that while both models performed well in single-step predictions, multi-step forecasts were challenging, particularly in terms of long-term accuracy. In smaller datasets, LSTMSeq2Seq generally outperformed LSTMSeq2SeqAtn, indicating that higher model complexity does not necessarily translate to better performance. The models' effectiveness varied across different network domains, reflecting the influence of distinct traffic characteristics. 
To address data scarcity, Discrete Wavelet Transform was used for data augmentation, leading to significant improvements in model performance, especially in shorter-term forecasts. Our analysis showed that data augmentation is crucial in scenarios with limited data. Additionally, the study included an analysis of the models' variability and consistency, with attention mechanisms in LSTMSeq2SeqAtn providing better short-term forecasting consistency but greater variability in longer forecasts. The results highlight the benefits and limitations of different modeling approaches in traffic prediction. Overall, this research underscores the importance of transfer learning and data augmentation in enhancing the accuracy of traffic prediction models, particularly in smaller ISP networks with limited data availability.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# $\textit{SKIntern}$: より優れたCoT能力を小規模言語モデルに蒸留するためのシンボリック知識の内部化 $\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models ( http://arxiv.org/abs/2409.13183v1 ) ライセンス: Link先を確認	Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Kang Liu, Jun Zhao,	(参考訳) SLM(Small Language Models)は、LLM(Large Language Models)の高い計算要求とプライバシー上の懸念から注目を集めている。 一部の研究では、LLMから蒸留したCoT(Chains of Thought)データを用いてSLMを微調整し,その推論能力の向上を目指している。さらに、いくつかのCoT蒸留法は、外部の記号的知識を生成プロセスに導入し、SLMの限られた知識記憶、推論能力、およびドメイン外(OOD)一般化を改善する。しかし、記号的知識の導入により計算オーバーヘッドが増加し、潜在的なノイズがもたらされる。本稿では,カリキュラム学習の下で事前定義された線形減衰スケジュールに従う漸進的な微調整により,SLM に記号的知識と few-shot の例を徐々に内在化させる革新的な手法である $\textit{SKIntern}$ を導入する。知識を効率的に内部化することにより、$\textit{SKIntern}$は計算オーバーヘッドを減らし、推論中は質問のみに焦点を当てることで推論プロセスを高速化する。ドメイン内(ID)とドメイン外(OOD)の両方のタスクにおいて、幅広いSLMで最先端のベースラインを5%以上上回り、推論コスト(FLOPで測定される)を最大4倍削減する。私たちのコードは \url{https://github.com/Xnhyacinth/SKIntern} で公開予定です。 Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs). Some studies fine-tune SLMs using Chains of Thought (CoT) data distilled from LLMs, aiming to enhance their reasoning ability. Furthermore, some CoT distillation methods introduce external symbolic knowledge into the generation process to improve the limited knowledge memory, reasoning ability and out-of-domain (OOD) generalization of SLMs. However, the introduction of symbolic knowledge increases computational overhead and introduces potential noise. In this paper, we introduce $\textit{SKIntern}$, an innovative approach that empowers SLMs to internalize symbolic knowledge and few-shot examples gradually through a progressive fine-tuning process, guided by a predefined linear decay schedule under curriculum learning. By efficiently internalizing knowledge, $\textit{SKIntern}$ reduces computational overhead and speeds up the reasoning process by focusing solely on the question during inference. 
It outperforms state-of-the-art baselines by over 5\%, while reducing inference costs (measured in FLOPs) by up to $4\times$ across a wide range of SLMs in both in-domain (ID) and out-of-domain (OOD) tasks. Our code will be available at \url{https://github.com/Xnhyacinth/SKIntern}.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# ASPINN:特異摂動微分方程式を解く漸近戦略 ASPINN: An asymptotic strategy for solving singularly perturbed differential equations ( http://arxiv.org/abs/2409.13185v1 ) ライセンス: Link先を確認	Sen Wang, Peizhi Zhao, Tao Song,	(参考訳) 特異摂動微分方程式 (SPDE) の求解は, 境界層における解の急激な変化のために困難である。本論文では,物理情報ニューラルネットワーク (PINN) と General-Kindred Physics-Informed Neural Networks (GKPINN) の一般化である漸近的物理情報ニューラルネットワーク (ASPINN) を提案する。これは漸近解析の考え方に基づく分解法である。 PINNと比較して、ASPINN法は境界層に指数層が配置されているため、SPDEを解く際に強い適合能力を持つ。 GKPINNとは異なり、ASPINNは全結合層の数を減らし、トレーニングコストをより効果的に削減する。さらに、ASPINNは理論上境界層での解をより正確に近似し、GKPINNと比較して精度も向上する。本稿では,多様なクラスのSPDEを解くことでASPINNの効果を実証し,ASPINN法が境界層問題において有望であることを示す。さらに,MLPの代わりにChebyshev Kolmogorov-Arnold Networks (Chebyshev-KAN)を導入し,様々な実験でより高い性能を実現した。 Solving Singularly Perturbed Differential Equations (SPDEs) presents challenges due to the rapid change of their solutions at the boundary layer. In this manuscript, we propose Asymptotic Physics-Informed Neural Networks (ASPINN), a generalization of the Physics-Informed Neural Networks (PINN) and General-Kindred Physics-Informed Neural Networks (GKPINN) approaches. This is a decomposition method based on the idea of asymptotic analysis. Compared to PINN, the ASPINN method has a strong fitting ability for solving SPDEs due to the placement of exponential layers at the boundary layer. Unlike GKPINN, ASPINN reduces the number of fully connected layers, thereby reducing the training cost more effectively. Moreover, ASPINN theoretically approximates the solution at the boundary layer more accurately, and its accuracy is also improved compared to GKPINN. We demonstrate the effect of ASPINN by solving diverse classes of SPDEs, which clearly shows that the ASPINN method is promising in boundary layer problems. Furthermore, we introduce Chebyshev Kolmogorov-Arnold Networks (Chebyshev-KAN) instead of MLP, achieving better performance in various experiments.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# 適応型大規模言語モデルが糖尿病治療における複数の医療作業を促進する An adapted large language model facilitates multiple medical tasks in diabetes care ( http://arxiv.org/abs/2409.13191v1 ) ライセンス: Link先を確認	Lai Wei, Zhen Ying, Muyang He, Yutong Chen, Qian Yang, Yanzhe Hong, Jiaping Lu, Xiaoying Li, Weiran Huang, Ying Chen,	(参考訳) 糖尿病は世界的な健康上の重荷となる慢性疾患であり、糖尿病管理の最適化には複数のステークホルダーの協力が必要である。大規模言語モデル(LLM)は、様々な医療シナリオにおいて有望であるが、様々な糖尿病タスクにおけるその効果は証明されていない。本研究では,糖尿病特異的LLMを訓練し,検証するための枠組みを導入した。最初に、データ収集、フィルタリング、拡張、改善を含む包括的なデータ処理パイプラインを開発しました。このアプローチは、高品質で糖尿病特異的なデータセットと、完全にスクラッチから構築した複数の評価ベンチマークの作成に寄与する。収集したトレーニングデータセットを用いて糖尿病特異的LLMファミリーを微調整し,他のLLMと比較して各種糖尿病タスクの理解と処理において最先端の能力を示した。さらに, 臨床研究により,パーソナライズされた医療の提供, 医学教育の支援, 臨床業務の合理化など,糖尿病治療におけるモデルの応用可能性が示された。結論として,本研究では,糖尿病特異的LLMファミリーを開発・評価する枠組みを導入し,臨床実践の強化と,さまざまなエンドユーザーに対するパーソナライズされたデータ駆動型の糖尿病支援の提供の可能性を強調した。コードはGitHubで https://github.com/waltonfuture/Diabetica から提供されている。 Diabetes is a chronic disease that poses a significant global health burden, and optimizing diabetes management requires multi-stakeholder collaboration. Large language models (LLMs) have shown promise in various healthcare scenarios, but their effectiveness across a diverse range of diabetes tasks remains unproven. In this study, we introduced a framework to train and validate diabetes-specific LLMs. We first developed a comprehensive data processing pipeline that includes data collection, filtering, augmentation and refinement. This approach contributes to creating a high-quality, diabetes-specific dataset, and several evaluation benchmarks entirely from scratch. Utilizing the collected training dataset, we fine-tuned a diabetes-specific LLM family that demonstrated state-of-the-art proficiency in understanding and processing various diabetes tasks compared to other LLMs. Furthermore, clinical studies showed the potential applications of our models in diabetes care, including providing personalized healthcare, assisting medical education, and streamlining clinical tasks. 
In conclusion, our study introduced a framework to develop and evaluate a diabetes-specific LLM family, and highlighted its potential to enhance clinical practice and provide personalized, data-driven diabetes support for different end users. The code is provided via GitHub at https://github.com/waltonfuture/Diabetica.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# ChemDFM-X:化学のための大規模マルチモーダルモデルを目指して ChemDFM-X: Towards Large Multimodal Model for Chemistry ( http://arxiv.org/abs/2409.13194v1 ) ライセンス: Link先を確認	Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu,	(参考訳) AIツールの急速な開発は、化学を含む自然科学の研究に前例のない支援を提供すると予想されている。しかし、既存の単一モーダルのタスク特化型専門モデルも、新興の汎用大規模マルチモーダルモデル(LMM)も、幅広い化学データモダリティやタスクカテゴリをカバーできない。化学者の真の要求に応えるために,LMMの大きな潜在能力を活用し、真に実用的で有用な研究アシスタントとして機能するクロスモーダルな化学汎用知能(CGI)システムが強く求められている。本稿では,化学のための初のクロスモーダル対話基盤モデル (ChemDFM-X) を導入する。近似計算とタスク固有モデルの予測により、初期モダリティから多様なマルチモーダルデータを生成する。この戦略は、過剰なコストを大幅に削減しつつ十分な化学訓練コーパスを作成し、7.6M件のデータを含む命令チューニングデータセットをもたらす。命令微調整の後、ChemDFM-Xは様々なデータモダリティを持つ様々な化学タスクの広範な実験で評価される。その結果,マルチモーダルおよびモーダル間の知識理解におけるChemDFM-Xの能力が示された。 ChemDFM-Xは、化学における全てのモダリティの整合に向けた重要なマイルストーンであり、CGIに一歩近づくものである。 Rapid developments of AI tools are expected to offer unprecedented assistance to the research of natural science including chemistry. However, neither existing unimodal task-specific specialist models nor emerging general large multimodal models (LMM) can cover the wide range of chemical data modality and task categories. To address the real demands of chemists, a cross-modal Chemical General Intelligence (CGI) system, which serves as a truly practical and useful research assistant utilizing the great potential of LMMs, is in great need. In this work, we introduce the first Cross-modal Dialogue Foundation Model for Chemistry (ChemDFM-X). Diverse multimodal data are generated from an initial modality by approximate calculations and task-specific model predictions. This strategy creates sufficient chemical training corpora, while significantly reducing excessive expense, resulting in an instruction-tuning dataset containing 7.6M data. After instruction finetuning, ChemDFM-X is evaluated on extensive experiments of different chemical tasks with various data modalities. 
The results demonstrate the capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension. ChemDFM-X marks a significant milestone toward aligning all modalities in chemistry, a step closer to CGI.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# BoilerTAI: 教育フォーラムにおけるジェネレーティブAIを用いた指導の強化プラットフォーム BoilerTAI: A Platform for Enhancing Instruction Using Generative AI in Educational Forums ( http://arxiv.org/abs/2409.13196v1 ) ライセンス: Link先を確認	Anvit Sinha, Shruti Goyal, Zachary Sy, Rhianna Kuperus, Ethan Dickey, Andres Bejarano,	(参考訳) コントリビューション: このResearch Categoryトラックのフルペーパーは、Generative AI(GenAI)とオンラインの教育フォーラムをシームレスに統合し、スタッフの教育能力を高めるための新しいアプローチを提供する、実用的でスケーラブルなプラットフォームを記述している。このプラットフォームは、学生ポストとLLM(Large Language Model)との相互作用を円滑に進めることによって、指導スタッフが反応を効率的に管理し、洗練し、承認することを可能にする。この貢献は、指導支援の効率性と効果を高め、学生に提供する応答の質と速度を大幅に向上させ、全体としての学習経験を豊かにする。背景: ヴィゴツキーの社会文化的理論とより知識のある他者(MKO)の概念を基礎として,GenAIが学生とインストラクターの教育対話を強化するために補助的なMKOとして機能するかを検討する。調査質問:GenAIは、教育討論フォーラムに投稿された学生の質問に対して、指導要員の負担軽減にどの程度効果があるか? 方法論: 大規模なプログラミングコースにおいて混合メソッドのアプローチを用いることで、AI-TAは、学生の質問を事前に答えるためにAI支援プラットフォームを使用した。我々は、AI生成応答に対する修正頻度などの効率指標を分析し、AI-TAから定性的なフィードバックを収集した。その結果、AI-TAが生み出す反応に対する学生の反応と、人間のインストラクターが与える反応とでは有意な差は認められなかった。これは、GenAIが適切に管理された場合、教育ニーズを効果的に満たせることを示唆している。さらに、AI-TAは、学習の質を損なうことなく教育効率を高めるGenAIの可能性を指して、クエリに応答するために必要な認知負荷の低減を経験した。 Contribution: This Full paper in the Research Category track describes a practical, scalable platform that seamlessly integrates Generative AI (GenAI) with online educational forums, offering a novel approach to augment the instructional capabilities of staff. The platform empowers instructional staff to efficiently manage, refine, and approve responses by facilitating interaction between student posts and a Large Language Model (LLM). This contribution enhances the efficiency and effectiveness of instructional support and significantly improves the quality and speed of responses provided to students, thereby enriching the overall learning experience. Background: Grounded in Vygotsky's socio-cultural theory and the concept of the More Knowledgeable Other (MKO), the study examines how GenAI can act as an auxiliary MKO to enrich educational dialogue between students and instructors. 
Research Question: How effective is GenAI in reducing the workload of instructional staff when used to pre-answer student questions posted on educational discussion forums? Methodology: Using a mixed-methods approach in large introductory programming courses, human Teaching Assistants (AI-TAs) employed an AI-assisted platform to pre-answer student queries. We analyzed efficiency indicators like the frequency of modifications to AI-generated responses and gathered qualitative feedback from AI-TAs. Findings: The findings indicate no significant difference in student reception to responses generated by AI-TAs compared to those provided by human instructors. This suggests that GenAI can effectively meet educational needs when adequately managed. Moreover, AI-TAs experienced a reduction in the cognitive load required for responding to queries, pointing to GenAI's potential to enhance instructional efficiency without compromising the quality of education.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# 大規模言語モデル学習における局所SGDのスケーリング法則の探索 Exploring Scaling Laws for Local SGD in Large Language Model Training ( http://arxiv.org/abs/2409.13198v1 ) ライセンス: Link先を確認	Qiaozhi He, Xiaomin Zhuang, Zhihua Wu,	(参考訳) 本稿では,ゆるく接続されたデバイスでのトレーニングを容易にする分散最適化アルゴリズムであるLLMトレーニングにおける局所SGDのスケーリング法について検討する。実験により, モデルパラメータ, データセット, 計算資源など, 従来の手法と比較して, 局所的なSGDが競合する結果が得られることを示す。さらに,マルチクラスタセットアップやエッジコンピューティング環境など,様々な実践シナリオにおけるローカルSGDの適用について検討する。本研究は, 実効マルチクラスタLLMトレーニングに必要な条件を明らかにし, LLMトレーニングプロセスにおけるエッジコンピューティングリソースの活用の可能性と限界について検討した。これは、単一の大規模クラスタトレーニングの代替として、その生存性を示すものだ。 This paper investigates scaling laws for local SGD in LLM training, a distributed optimization algorithm that facilitates training on loosely connected devices. Through extensive experiments, we show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources. Furthermore, we explore the application of local SGD in various practical scenarios, including multi-cluster setups and edge computing environments. Our findings elucidate the necessary conditions for effective multi-cluster LLM training and examine the potential and limitations of leveraging edge computing resources in the LLM training process. This demonstrates its viability as an alternative to single large-cluster training.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
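The local SGD idea in the entry above can be illustrated with a toy sketch: each worker takes several gradient steps without communicating, and the workers then synchronise by averaging their parameters. Everything here (the quadratic per-worker objectives, the function name `local_sgd`, the step counts and learning rate) is an illustrative assumption, not the authors' training setup.

```python
import random

def local_sgd(targets, rounds=50, local_steps=4, lr=0.1, seed=0):
    """Minimise the average of f_k(w) = 0.5 * (w - t_k)^2 over workers k.

    Each worker runs `local_steps` noisy gradient steps on its own objective,
    then all workers synchronise by averaging their parameters.
    """
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for t in targets:                      # one iteration per worker
            w = w_global                       # start from the shared model
            for _ in range(local_steps):       # no communication in this loop
                grad = (w - t) + rng.gauss(0.0, 0.01)  # noisy gradient
                w -= lr * grad
            local_models.append(w)
        # Periodic synchronisation: average the local models.
        w_global = sum(local_models) / len(local_models)
    return w_global

# Four workers with different local optima; the global optimum is their mean.
w = local_sgd([1.0, 2.0, 3.0, 6.0])
```

Communication happens once per round instead of once per step, which is why the method suits loosely connected devices; the trade-off is that local models drift apart between synchronisations.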
# CFSP: 粗い活性化情報を持つLLMのための効率的な構造化プルーニングフレームワーク CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information ( http://arxiv.org/abs/2409.13199v1 ) ライセンス: Link先を確認	Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin,	(参考訳) LLM(Large Language Models)の余剰パラメータと計算オーバーヘッドは、現実のアプリケーションに挑戦する。冗長パラメータを除去して非構造的あるいは構造的疎結合を目標とするネットワークプルーニングは,最近,LLM加速のために検討されている。既存のLLMプルーニング作業は、非構造化プルーニングに重点を置いている。対照的に、構造化プルーニングは一般的なデバイスでのレイテンシを低減することができる。しかし、構造的刈り込みを効率的に行い、特に疎度比の高い性能を維持することは依然として課題である。この目的のために、我々は、粗い(インターブロック)ときめ細かい(イントラブロック)アクティベーション情報の両方をガイドプルーニングの重要基準として活用する、CFSPと呼ばれる効率的な構造化プルーニングフレームワークを導入する。プルーニングは、機能アクティベーションを計算するために1つのフォワードパスしか必要としないため、非常に効率的である。具体的には,まず,各ブロックの重み付けを重要度に基づいて,各ブロックに分散予算を割り当てる。さらに,粗い重要度に基づいてトレーニングのオーバーヘッドを適応的に配分し,さらなる性能向上を図るリカバリファインチューニング戦略を導入する。実験結果から, CFSPは, 多様なモデルにおいて, 様々な予算にまたがる既存手法よりも優れていることがわかった。私たちのコードは https://github.com/wyxscir/CFSP で公開されます。 The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains a challenge to perform structured pruning efficiently and maintain performance, especially at high sparsity ratios. To this end, we introduce an efficient structured pruning framework named CFSP, which leverages both Coarse (interblock) and Fine-grained (intrablock) activation information as an importance criterion to guide pruning. The pruning is highly efficient, as it only requires one forward pass to compute feature activations. 
Specifically, we first allocate the sparsity budget across blocks based on their importance and then retain important weights within each block. In addition, we introduce a recovery fine-tuning strategy that adaptively allocates training overhead based on coarse-grained importance to further improve performance. Experimental results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets. Our code will be available at https://github.com/wyxscir/CFSP.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# ニューラル・シンボリック協調蒸留:複雑な推論タスクのための小言語モデルの改善 Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks ( http://arxiv.org/abs/2409.13203v1 ) ライセンス: Link先を確認	Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Jun Zhao,	(参考訳) 本稿では,大規模言語モデル (LLMs, e.g., > 13B) の複雑な推論能力を学習するための知識蒸留法である $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$) を提案する。これらのタスクは、一般的な認知能力だけでなく、専門知識も要求するので、SLM(Small Language Models, SLMs, e.g., $\leq$ 7B)にとって複雑な推論タスクは難しいと我々は主張する。そのため、NesyCDはLLMの一般的な能力と専門知識を異なる方法で蒸留する。一方,教師のLLMからパラメータ化されたニューラルネットワークの学生のSLMにのみ一般能力を蒸留する。一方,複雑な推論課題の専門的能力と非常識的知識については,記号的知識蒸留法を用いて,その専門的知識を記号的知識基盤(KB)内に獲得・保存する。一般的な機能と特殊な機能を分離することにより、提案したNesyCDは、より小さなモデルを活用し、パラメータ化されたニューラルネットワークとシンボリックKBをブレンドすることで、より優れたパフォーマンスを実現することができる。さらに、特殊なKBはよく一般化し、人間によって解釈され、操作される。実験の結果,NesyCDは領域内(BBH, GSM8K)および領域外(AGIEval, ARC)データセット上でのSLMの複雑な推論性能を大幅に向上させることがわかった。特に,LLaMA3-8B と Qwen2-7B は GPT-3.5-turbo を上回り,LLaMA3-70B に近づいた。私たちのコードはhttps://github.com/Xnhyacinth/NesyCDで公開されます。 In this paper, we propose $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., > 13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., $\leq$ 7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for these neural-based SLMs to effectively capture. Therefore, NesyCD distills the general capabilities and specialized knowledge in LLMs in different ways. On the one hand, we distill only general abilities from teacher LLMs into the student SLMs of parameterized neural networks. 
On the other hand, for the specialized abilities and uncommon knowledge of a complex reasoning task, we employ a symbolic knowledge distillation approach to obtain and store the specialized knowledge within a symbolic knowledge base (KB). By decoupling general and specialized capabilities, the proposed NesyCD can achieve superior performance cost-effectively, utilizing smaller models and blending parameterized neural networks with a symbolic KB. Moreover, the specialized KB generalizes well and can be comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enabled LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo in performance and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
# 回帰誘導ニューラルネットワークを用いた環境危険因子による健康リスクの集団不均一性の解明 Unveiling Population Heterogeneity in Health Risks Posed by Environmental Hazards Using Regression-Guided Neural Network ( http://arxiv.org/abs/2409.13205v1 ) ライセンス: Link先を確認	Jong Woo Nam, Eun Young Choi, Jennifer A. Ailshire, Yao-Yi Chiang,	(参考訳) 環境の危険は、特定の個人を不均等に高いリスクに陥らせる。これらの危険が人間の健康を危険にさらす中、最も脆弱な集団の正確な同定は公衆衛生にとって重要である。モデレート多重回帰(MMR)は、リスクへの曝露と他の集団特性の間の相互作用項を線形回帰モデルに付加することにより、これを調査するための簡単な方法を提供する。しかし、脆弱性が多くの特徴の断面に隠されている場合、MMRは意味のある発見を見つける能力に制限されることが多い。本稿では、ニューラルネットワーク(ANN)を用いて予測器を非線形に結合し、局所予測器と相互作用する潜伏表現を生成するハイブリッド手法である回帰誘導ニューラルネットワーク(RegNN)を提案する。大気汚染(PM2.5)が認知機能に与える影響について,ReGNNを用いた調査を行った。従来のMMRモデルに適合する結果と比較することにより,従来のMMRを用いて隠蔽される集団の不均一性をReGNNを用いて発見できることを実証した。本質的には、ReGNNは、個人の健康リスクに対する感受性を効果的に要約し定量化することで、従来の回帰モデルを強化する新しいツールである。 Environmental hazards place certain individuals at disproportionately higher risks. As these hazards increasingly endanger human health, precise identification of the most vulnerable population subgroups is critical for public health. Moderated multiple regression (MMR) offers a straightforward method for investigating this by adding interaction terms between the exposure to a hazard and other population characteristics to a linear regression model. However, when the vulnerabilities are hidden within a cross-section of many characteristics, MMR is often limited in its capabilities to find any meaningful discoveries. Here, we introduce a hybrid method, named regression-guided neural networks (ReGNN), which utilizes artificial neural networks (ANNs) to non-linearly combine predictors, generating a latent representation that interacts with a focal predictor (i.e. variable measuring exposure to an environmental hazard). We showcase the use of ReGNN for investigating the population heterogeneity in the health effects of exposure to air pollution (PM2.5) on cognitive functioning scores. 
We demonstrate that population heterogeneity that would otherwise be hidden using traditional MMR can be found using ReGNN by comparing its results to the fit results of the traditional MMR models. In essence, ReGNN is a novel tool that enhances traditional regression models by effectively summarizing and quantifying an individual's susceptibility to health risks.	翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
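For readers unfamiliar with moderated multiple regression, the contrast drawn in the entry above can be written out schematically. The first line is the standard MMR form with exposure $x$ and covariates $z_1, \dots, z_K$. The second line is only an assumed reading of the abstract, in which a network $f_\theta$ (a hypothetical symbol) compresses the covariates into a single latent index that interacts with the focal predictor; the paper's exact specification may differ.

```latex
% Moderated multiple regression (MMR): one interaction term per covariate
y = \beta_0 + \beta_x x + \sum_{k=1}^{K} \beta_k z_k
    + \sum_{k=1}^{K} \gamma_k \, x z_k + \varepsilon

% Schematic ReGNN-style form (assumed): a single learned latent moderator
y = \beta_0 + \beta_x x + \sum_{k=1}^{K} \beta_k z_k
    + \gamma \, x \, f_\theta(z_1, \dots, z_K) + \varepsilon
```

In the first form the effect of exposure on the outcome is $\beta_x + \sum_k \gamma_k z_k$, which is linear in each covariate; in the second it is $\beta_x + \gamma f_\theta(\mathbf{z})$, so vulnerability can depend on non-linear combinations of characteristics.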
# 倫理的問題に対するレコメンダシステム監査のための統一因果関係 A Unified Causal Framework for Auditing Recommender Systems for Ethical Concerns ( http://arxiv.org/abs/2409.13210v1 ) ライセンス: Link先を確認	Vibhhu Sharma, Shantanu Gupta, Nil-Jana Akpinar, Zachary C. Lipton, Liu Leqi,	(参考訳) 推薦システムがさまざまなドメインに広くデプロイされるようになると、ユーザの信念や好みに影響を及ぼすようになる。推薦システムの監査は、レコメンデーションアルゴリズムの継続的な改善を保証するだけでなく、バイアスや倫理的懸念といった潜在的な問題に対する保護も必要である。本稿では、因果レンズからのレコメンデータシステム監査を考察し、監査基準を定義するための一般的なレシピを提供する。この一般的な因果監査フレームワークの下では、既存の監査指標を分類し、それらのギャップを識別する -- 特に、レコメンデーションプロセスのマルチステップのダイナミクスを考慮しつつ、ユーザエージェンシーを監査するための指標が欠如している。筆者らは,我々のフレームワークを活用して,ユーザ自身の推奨に影響を及ぼす能力と,他のユーザの推奨に影響を及ぼす能力を測定する,未来と過去の対応性と安定性の2つの尺度を提案する。我々は、これらのメトリクスを計算するための勾配ベースのアプローチとブラックボックスアプローチの両方を提供し、監査人がレコメンデータシステムに異なるレベルのアクセスでそれらを計算できるようにする。本実験では,提案手法の有効性を実証し,提案手法を用いてレコメンダシステムの設計を検証した。 As recommender systems become widely deployed in different domains, they increasingly influence their users' beliefs and preferences. Auditing recommender systems is crucial as it not only ensures the continuous improvement of recommendation algorithms but also safeguards against potential issues like biases and ethical concerns. In this paper, we view recommender system auditing from a causal lens and provide a general recipe for defining auditing metrics. Under this general causal auditing framework, we categorize existing auditing metrics and identify gaps in them -- notably, the lack of metrics for auditing user agency while accounting for the multi-step dynamics of the recommendation process. We leverage our framework and propose two classes of such metrics: future- and past-reachability and stability, which measure the ability of a user to influence their own and other users' recommendations, respectively. We provide both a gradient-based and a black-box approach for computing these metrics, allowing the auditor to compute them under different levels of access to the recommender system. 
In our experiments, we demonstrate the efficacy of methods for computing the proposed metrics and inspect the design of recommender systems through these proposed metrics.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# MalMixer: Retrieval-Augmented Semi-Supervised Learningを用いたFew-Shotのマルウェア分類 MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning ( http://arxiv.org/abs/2409.13213v1 ) ライセンス: Link先を確認	Eric Li, Yifan Zhang, Yu Huang, Kevin Leach,	(参考訳) 近年のマルウェアの増殖と増殖は、感染家族による新しいサンプルを迅速に分類する実践者の能力を検証している。労働集約的なリバースエンジニアリングの取り組みとは対照的に、機械学習アプローチはスピードと精度の向上を実証している。しかし、既存のディープラーニングマルウェアのファミリー分類器の多くは、トレーニング前に手動で分析される大量のサンプルを使用して校正されなければならない。さらに、トレーニングセットの範囲を超えて、新しいマルウェアサンプルが出現するにつれて、トレーニングセットを更新するためには、さらなるリバースエンジニアリングの努力を払わなければならない。野生で発見された新しいサンプルの量は、現代の分類器を適切に訓練するのに十分なマルウェアをリバースエンジニアリングする実践者の能力にかなりの圧力を与えている。本稿では,半教師付き学習を用いたマルウェアファミリー分類器であるMalMixerを提案する。本稿では、マルウェアの特徴表現を増強し、半教師付きマルウェアファミリー分類の少数ショット性能を向上させるための新しいドメイン知識認識手法を提案する。そこで,MalMixerは,数ショットのマルウェアファミリー分類設定において,最先端のパフォーマンスを実現する。本研究は、軽量なドメイン認識機能拡張手法の有効性と有効性を確認し、マルウェア分類問題に対処する上で、類似の半教師付き分類器の能力を強調した。 Recent growth and proliferation of malware has tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are painstakingly manually analyzed before training. Furthermore, as novel malware samples arise that are beyond the scope of the training set, additional reverse engineering effort must be employed to update the training set. The sheer volume of new samples found in the wild creates substantial pressure on practitioners' ability to reverse engineer enough malware to adequately train modern classifiers. In this paper, we present MalMixer, a malware family classifier using semi-supervised learning that achieves high accuracy with sparse training data. 
We present a novel domain-knowledge-aware technique for augmenting malware feature representations, enhancing few-shot performance of semi-supervised malware family classification. We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings. Our research confirms the feasibility and effectiveness of lightweight, domain-knowledge-aware feature augmentation methods and highlights the capabilities of similar semi-supervised classifiers in addressing malware classification issues.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 多重忠実度による不誠実な絡み合いの検出 Detecting unfaithful entanglement by multiple fidelities ( http://arxiv.org/abs/2409.13214v1 ) ライセンス: Link先を確認	Ruiqi Zhang, Zhaohui Wei,	(参考訳) 未知の量子状態に対する証明の絡み合いは、量子コンピューティングと量子物理学の基本的な問題である。実装が容易であるため、現代の量子実験におけるこの問題に対する最も一般的なアプローチは、忠実度に基づく絡み合った証人による標的量子状態の検出である。具体的には、対象状態と絡み合った純状態との忠実度が一定の値を超えると、対象状態が絡み合うことが保証される。しかし、近年では、いわゆる不信な量子状態が存在し、絡み合うことができるが、その絡み合いは、忠実性に基づく絡み合いの証人によっては証明できないことが判明している。本稿では,複数の忠実度を組み合わせることで,忠実度に基づく絡み合いをわずかに修正した場合,この手法で不信な量子状態に対する絡み合いを証明できることを,具体例で示す。特に,修正された絡み目の数学的構造を分析し,それらの最適設計を探索するアルゴリズムを提案する。 Certifying entanglement for unknown quantum states experimentally is a fundamental problem in quantum computing and quantum physics. Because of being easy to implement, a most popular approach for this problem in modern quantum experiments is detecting target quantum states with fidelity-based entanglement witnesses. Specifically, if the fidelity between a target state and an entangled pure state exceeds a certain value, the target state can be guaranteed to be entangled. Recently, however, it has been realized that there exist so-called unfaithful quantum states, which can be entangled, but their entanglement cannot be certified by any fidelity-based entanglement witnesses. In this paper, by specific examples we show that if one makes a slight modification to fidelity-based entanglement witnesses by combining multiple fidelities together, it is still possible to certify entanglement for unfaithful quantum states with this popular technique. Particularly, we will analyze the mathematical structure of the modified entanglement witnesses, and propose an algorithm that can search for the optimal designs for them.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
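A fidelity-based entanglement witness of the kind the entry above starts from can be sketched for two qubits. The witness is W = alpha*I - |Phi+><Phi+| with alpha = 1/2, the largest squared overlap any product state can have with a Bell state, so a fidelity above 1/2 certifies entanglement. The isotropic test state and the function names are illustrative choices; the sketch shows only the single-fidelity baseline, not the multi-fidelity construction the paper proposes.

```python
def isotropic_state(p):
    """4x4 density matrix p*|Phi+><Phi+| + (1 - p)*I/4 in the basis 00, 01, 10, 11."""
    bell = [[0.5, 0.0, 0.0, 0.5],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.5, 0.0, 0.0, 0.5]]
    return [[p * bell[i][j] + ((1.0 - p) / 4.0 if i == j else 0.0)
             for j in range(4)] for i in range(4)]

def fidelity_with_bell(rho):
    """<Phi+| rho |Phi+> for |Phi+> = (|00> + |11>)/sqrt(2)."""
    psi = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]
    return sum(psi[i] * rho[i][j] * psi[j] for i in range(4) for j in range(4))

def witness_value(rho, alpha=0.5):
    """Tr(W rho) for W = alpha*I - |Phi+><Phi+|; a negative value certifies entanglement."""
    trace = sum(rho[i][i] for i in range(4))
    return alpha * trace - fidelity_with_bell(rho)

detected = witness_value(isotropic_state(0.5))      # fidelity 0.625 > 1/2
not_detected = witness_value(isotropic_state(0.2))  # fidelity 0.4 < 1/2
```

A non-negative witness value is inconclusive rather than proof of separability, which is exactly the gap that unfaithful states exploit.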
# 3D-GSW:放射場における著作権保護のための3Dガウスめっき透かし 3D-GSW: 3D Gaussian Splatting Watermark for Protecting Copyrights in Radiance Fields ( http://arxiv.org/abs/2409.13222v1 ) ライセンス: Link先を確認	Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim,	(参考訳) 近年, 高速レンダリングと画像品質により, 3次元空間を表現する革新的な手法として, 3次元ガウススプラッティングが注目されている。しかし、3Dガウシアンスプラッティングの著作権保護はまだ導入されていない。本稿では,3次元ガウススプラッティングのための新しい透かし法を提案する。提案手法は,事前学習した3次元ガウススプラッティングモデルを微調整することにより,バイナリメッセージを3次元ガウスに埋め込む。これを実現するために、離散フーリエ変換を用いて高頻度のパッチを見つけ出し、3Dガウス寄与ベクトルに基づいて3Dガウスを分割する周波数誘導密度化(FGD)を提案する。レンダリングされたピクセルの色に対する3Dガウスの寄与であり、レンダリング品質とビット精度の両方を改善している。さらに、レンダリング品質を向上させるために、適応的な勾配マスクを変更する。実験の結果,本手法は3次元ガウシアンに透かしを埋め込むことができ,攻撃に対するキャパシティとロバスト性を高めることができることがわかった。提案手法は最適化コストを削減し,他の手法と比較して最先端の性能を実現する。 Recently, 3D Gaussian splatting has been getting a lot of attention as an innovative method for representing 3D space due to rapid rendering and image quality. However, copyright protection for the 3D Gaussian splatting has not yet been introduced. In this paper, we present a novel watermarking method for 3D Gaussian splatting. The proposed method embeds a binary message into 3D Gaussians by fine-tuning the pre-trained 3D Gaussian splatting model. To achieve this, we present Frequency-Guided Densification (FGD), which uses the Discrete Fourier Transform to find high-frequency patches and splits 3D Gaussians based on the 3D Gaussian Contribution Vector, i.e., each 3D Gaussian's contribution to the rendered pixel colors, improving both rendering quality and bit accuracy. Furthermore, we modify an adaptive gradient mask to enhance rendering quality. Our experiments show that our method can embed a watermark in 3D Gaussians imperceptibly with increased capacity and robustness against attacks. Our method reduces optimization cost and achieves state-of-the-art performance compared to other methods.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 並列化可能な物理シミュレータを用いた非定常物体マニピュレーションのためのインクリメンタルFew-Shot適応 Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators ( http://arxiv.org/abs/2409.13228v1 ) ライセンス: Link先を確認	Fabian Baumeister, Lukas Mack, Joerg Stueckler,	(参考訳) 日々の環境やフレキシブル生産といったオープンワールド環境でタスクを実行するインテリジェントロボットにとって、ショット適応は重要な機能である。本稿では,モデル予測制御のための物理に基づく力学モデルに反復的に適応する,非包括的操作のための新しいアプローチを提案する。ロボットとオブジェクトの相互作用の例として,モデルのパラメータを漸進的に適用する。これは、並列化可能な剛体物理シミュレーションを動的世界モデルとして用いたパラメータのサンプリングベース最適化によって達成される。代わりに、効率的なサンプリングベース最適化を用いたモデル予測制御に最適化されたダイナミクスモデルを用いることができる。シミュレーションおよび実ロボットを用いたいくつかの物体押出実験において,本手法の有効性を検証した。 Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which iteratively adapts a physics-based dynamics model for model-predictive control. We adapt the parameters of the model incrementally with a few examples of robot-object interactions. This is achieved by sampling-based optimization of the parameters using a parallelizable rigid-body physics simulation as dynamic world model. In turn, the optimized dynamics model can be used for model-predictive control using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in several object pushing experiments in simulation and with a real robot.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
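The sampling-based parameter adaptation described in the entry above can be sketched with a toy stand-in for the physics simulator. Here the "simulator" is a closed-form sliding model (a block pushed to speed v0 slides v0^2 / (2*mu*g)), and a cross-entropy-style loop fits the friction coefficient mu to a few observed pushes. The model, the function names and all hyperparameters are assumptions for illustration; the paper uses a parallelizable rigid-body simulator, whereas this sketch evaluates candidates sequentially.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2

def slide_distance(v0, mu):
    """Toy dynamics model: distance a block slides from speed v0 under friction mu."""
    return v0 ** 2 / (2.0 * mu * G)

def fit_friction(observations, iters=20, samples=64, elite=8, seed=0):
    """Estimate mu from (v0, distance) pairs by repeated sampling and refitting."""
    rng = random.Random(seed)
    mean, std = 0.5, 0.3  # initial belief about mu

    def loss(mu):
        return sum((slide_distance(v, mu) - d) ** 2 for v, d in observations)

    for _ in range(iters):
        # Sample candidate parameters; each could be simulated in parallel.
        cands = [min(max(rng.gauss(mean, std), 0.01), 2.0) for _ in range(samples)]
        best = sorted(cands, key=loss)[:elite]
        # Refit the sampling distribution to the best candidates.
        mean = sum(best) / elite
        std = max((sum((m - mean) ** 2 for m in best) / elite) ** 0.5, 1e-4)
    return mean

# A few interaction examples generated with a "true" friction of 0.3.
observations = [(v, slide_distance(v, 0.3)) for v in (0.5, 1.0, 1.5)]
mu_hat = fit_friction(observations)
```

Because the loop only needs forward simulations and no gradients, the same pattern carries over to a black-box simulator, which is what makes parallel evaluation of candidates attractive.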
# 脳腫瘍分離のためのnnU-NetにおけるマルチスケールエンコーダとOmni次元動的畳み込み強化 Multiscale Encoder and Omni-Dimensional Dynamic Convolution Enrichment in nnU-Net for Brain Tumor Segmentation ( http://arxiv.org/abs/2409.13229v1 ) ライセンス: Link先を確認	Sahaj K. Mistry, Sourav Saini, Aashray Gupta, Aayush Gupta, Sunny Rai, Vinit Jakhetiya, Ujjwal Baid, Sharath Chandra Guntuku,	(参考訳) 脳腫瘍の分節はコンピュータ支援診断において重要な役割を担っている。本研究では nnU-Net アーキテクチャを改良した新しいセグメンテーションアルゴリズムを提案する。 nnU-Netアーキテクチャのエンコーダ部では、全次元動的畳み込み層を組み込んで従来の畳み込み層を強化し、特徴表現を改善した。同時に,様々な尺度からの現代的洞察を活用するマルチスケールアテンション戦略を提案する。モデルの有効性はBraTS-2023チャレンジの多様なデータセットで実証される。オムニ次元動的畳み込み(ODConv)層とマルチスケール機能を統合することで、複数の腫瘍セグメンテーションデータセット間でnnU-Netアーキテクチャの性能が大幅に向上する。注目すべきは、BraTS Africaデータセットの検証において、提案したモデルが良好な精度が得られることだ。 ODconvのソースコードと完全なトレーニングコードはGitHubで公開されている。 Brain tumor segmentation plays a crucial role in computer-aided diagnosis. This study introduces a novel segmentation algorithm utilizing a modified nnU-Net architecture. Within the nnU-Net architecture's encoder section, we enhance conventional convolution layers by incorporating omni-dimensional dynamic convolution layers, resulting in improved feature representation. Simultaneously, we propose a multi-scale attention strategy that harnesses contemporary insights from various scales. Our model's efficacy is demonstrated on diverse datasets from the BraTS-2023 challenge. Integrating omni-dimensional dynamic convolution (ODConv) layers and multi-scale features yields substantial improvement in the nnU-Net architecture's performance across multiple tumor segmentation datasets. Remarkably, our proposed model attains good accuracy during validation for the BraTS Africa dataset. The ODconv source code along with full training code is available on GitHub.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# DNNの不確かさと敵攻撃との関係 Relationship between Uncertainty in DNNs and Adversarial Attacks ( http://arxiv.org/abs/2409.13232v1 ) ライセンス: Link先を確認	Abigail Adeniran, Adewale Adeyemo,	(参考訳) ディープニューラルネットワーク(DNN)は、最先端の結果を達成し、多くの課題において人間の精度よりも優れており、自然言語処理、パターン認識、予測、制御最適化など、さまざまな分野に採用されている。しかし、DNNは結果の不確実性を伴うため、あるレベルの信頼の域外にある結果を予測する。これらの不確実性は、敵の攻撃によって悪化する可能性があるモデルまたはデータ制約に起因している。敵攻撃は、DNNに摂動入力を提供することを目的としており、DNNは誤った予測をしたり、モデルの不確実性を増大させる。本稿では,DNNの不確実性と敵攻撃との関係を考察し,敵攻撃がDNNの不確実性をいかに引き起こすかを強調した。 Deep Neural Networks (DNNs) have achieved state-of-the-art results and even outperformed human accuracy in many challenging tasks, leading to the adoption of DNNs in a variety of fields including natural language processing, pattern recognition, prediction, and control optimization. However, DNNs are accompanied by uncertainty about their results, causing them to predict an outcome that is either incorrect or outside of a certain level of confidence. These uncertainties stem from model or data constraints, which could be exacerbated by adversarial attacks. Adversarial attacks aim to provide perturbed input to DNNs, causing the DNN to make incorrect predictions or increasing model uncertainty. In this review, we explore the relationship between DNN uncertainty and adversarial attacks, emphasizing how adversarial attacks might raise DNN uncertainty.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 混合音と人工音のみを用いたフェデレーション環境におけるラベル不均衡のバランス Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise ( http://arxiv.org/abs/2409.13235v1 ) ライセンス: Link先を確認	Kyle Sang, Tahseen Rabbani, Furong Huang,	(参考訳) 分散あるいはフェデレーションされた環境のクライアントは、しばしばラベルの異なるサブセットに向かってスキューされたデータを保持する。このシナリオは、異種または非IDフェデレーション学習と呼ばれ、モデルトレーニングとパフォーマンスを著しく妨げていることが示されている。本研究では,スキューラベル分布のバランスをとるための,単純かつ効果的な拡張戦略の限界について検討する。既存のアルゴリズムでは、ローカルトレーニングデータのミックスアップのような擬似イメージのみをトレーニングしていますが、当社の強化されたクライアントデータセットは、実画像と擬似イメージの両方で構成されています。他の文献とは対照的に,(1) DP-Instahide 変種を用いて画像符号化の復調性を低減し,(2) ツイストとして,訓練なしのStyleGAN が生成する人工ラベル付き「自然ノイズ」を用いて局所データを補う。これらのノイズのある画像は、自然のシーンに存在するパワースペクトルパターンを模倣し、ミキシング画像とともに、クライアント間のラベルの分布を均質化するのに役立ちます。ラベル付きCIFAR-10およびMNIST訓練において,混合と自然雑音による少量の増強が顕著に改善することが実証された。 Clients in a distributed or federated environment will often hold data skewed towards differing subsets of labels. This scenario, referred to as heterogeneous or non-iid federated learning, has been shown to significantly hinder model training and performance. In this work, we explore the limits of a simple yet effective augmentation strategy for balancing skewed label distributions: filling in underrepresented samples of a particular label class using pseudo-images. While existing algorithms exclusively train on pseudo-images such as mixups of local training data, our augmented client datasets consist of both real and pseudo-images. In further contrast to other literature, we (1) use a DP-Instahide variant to reduce the decodability of our image encodings and (2) as a twist, supplement local data using artificially labeled, training-free 'natural noise' generated by an untrained StyleGAN. These noisy images mimic the power spectra patterns present in natural scenes which, together with mixup images, help homogenize label distribution among clients. We demonstrate that small amounts of augmentation via mixups and natural noise markedly improve label-skewed CIFAR-10 and MNIST training.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
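The mixup operation referred to in the entry above forms a pseudo-sample as a convex combination of two inputs and of their one-hot labels. The sketch below shows plain mixup on flat feature lists; it is not the DP-Instahide variant the authors use, and the function names and the Beta parameter are illustrative assumptions.

```python
import random

def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and of their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

def sample_lambda(alpha=0.4, rng=random):
    """Mixing weight drawn from Beta(alpha, alpha), as in standard mixup."""
    return rng.betavariate(alpha, alpha)

# Mix a class-0 sample with a class-1 sample using a fixed weight.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

The soft label records how much of each class went into the pseudo-sample, which is what lets mixed images fill in label classes that a client holds few real examples of.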
# ひずみ定位を急激な不連続性としてモデル化するDeep Ritz法の適用可能性を探る Exploring the ability of the Deep Ritz Method to model strain localization as a sharp discontinuity ( http://arxiv.org/abs/2409.13241v1 ) ライセンス: Link先を確認	Omar León, Víctor Rivera, Angel Vázquez-Patiño, Jacinto Ulloa, Esteban Samaniego,	(参考訳) 本研究では, 変位場における急激な不連続性として固体中のひずみ局在をモデル化するためのDeep Ritz Method (DRM) の可能性について探索的検討を行った。このために、弾塑性固体の変種設定において、正則化された強不連続キネマティクスを用いる。対応する数学的モデルは、ニューラルネットワーク(ANN)を用いて離散化される。アーキテクチャはキネマティクスを処理し、境界値問題の変分文は損失関数によって処理される。このアプローチの背景にある主な考え方は、ANNのトレーニング可能なパラメータを用いて、平衡問題と局所化帯域の位置の両方を解決することである。概念実証として,DRM の枠組み内での弾塑性固体のひずみ局在の計算モデルが実現可能であることを示す。 We present an exploratory study of the possibilities of the Deep Ritz Method (DRM) for the modeling of strain localization in solids as a sharp discontinuity in the displacement field. For this, we use a regularized strong discontinuity kinematics within a variational setting for elastoplastic solids. The corresponding mathematical model is discretized using Artificial Neural Networks (ANNs). The architecture takes care of the kinematics, while the variational statement of the boundary value problem is taken care of by the loss function. The main idea behind this approach is to solve both the equilibrium problem and the location of the localization band by means of trainable parameters in the ANN. As a proof of concept, we show through both 1D and 2D numerical examples that the computational modeling of strain localization for elastoplastic solids within the framework of DRM is feasible.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 単一画像からの閉塞除去のためのディープジェネレーティブ・アドバイサル・ネットワーク Deep Generative Adversarial Network for Occlusion Removal from a Single Image ( http://arxiv.org/abs/2409.13242v1 ) ライセンス: Link先を確認	Sankaraganesh Jonna, Moushumi Medhi, Rajiv Ranjan Sahay,	(参考訳) 今日では、安価な撮像デバイスの能力が強化され、インターネット上でのマルチメディアコンテンツの獲得と共有が大幅に増加しています。画像センサー技術の進歩にもかかわらず、オクルージョン(occlusions)のような厄介な条件は写真撮影を妨げ、監視、検出、認識などのアプリケーションの性能を低下させる可能性がある。オークルージョンセグメンテーションは、スケールのばらつきや照明の変化などにより困難である。同様に、前景の閉塞からシーンを復元することは、閉鎖された領域を正確に推定し、周囲のコンテキストとの整合性を維持するという複雑さのために、重大な課題を引き起こす。特に、画像のデフェンシングは、形状、テクスチャ、色、パターン、そしてしばしば散らかった環境の様々なバリエーションのために、独自の課題を提示している。本研究では,単一画像からの閉塞の自動検出と除去に焦点を当てた。本稿では,完全自動2段階畳み込みニューラルネットワークを提案する。我々は、GANを利用して、構造とテクスチャの両方を含む現実的なコンテンツを、インペイントのための単一ショットで合成する。ゼロショットの一般化を評価するため,提案したフェンス状閉塞セグメンテーションデータセットを用いて,訓練された閉塞検出モデルを評価した。データセットはGitHubにある。 Nowadays, the enhanced capabilities of inexpensive imaging devices have led to a tremendous increase in the acquisition and sharing of multimedia content over the Internet. Despite advances in imaging sensor technology, annoying conditions like occlusions hamper photography and may deteriorate the performance of applications such as surveillance, detection, and recognition. Occlusion segmentation is difficult because of scale variations, illumination changes, and so on. Similarly, recovering a scene from foreground occlusions also poses significant challenges due to the complexity of accurately estimating the occluded regions and maintaining coherence with the surrounding context. In particular, image de-fencing presents its own set of challenges because of the diverse variations in shape, texture, color, patterns, and the often cluttered environment. This study focuses on the automatic detection and removal of occlusions from a single image. We propose a fully automatic, two-stage convolutional neural network for fence segmentation and occlusion completion. 
We leverage generative adversarial networks (GANs) to synthesize realistic content, including both structure and texture, in a single shot for inpainting. To assess zero-shot generalization, we evaluated our trained occlusion detection model on our proposed fence-like occlusion segmentation dataset. The dataset can be found on GitHub.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 認知から認知へ:ソーシャルナビゲーションのための未来認識フレームワーク From Cognition to Precognition: A Future-Aware Framework for Social Navigation ( http://arxiv.org/abs/2409.13244v1 ) ライセンス: Link先を確認	Zeying Gong, Tianshuai Hu, Ronghe Qiu, Junwei Liang,	(参考訳) 混み合った空間で安全に効率的に移動するためには、ロボットは環境の現在の状態を認識できるだけでなく、将来の人間の動きも予測すべきである。本稿では,人間の軌道を明示的に予測し,将来の人間の進路を阻害する罰則を課すことにより,社会的に認識されたナビゲーションに取り組むための強化学習アーキテクチャであるFalconを提案する。現実的な評価を容易にするために,Social-HM3DとSocial-MP3Dの2つの新しいデータセットを含むSocialNavベンチマークを導入する。このベンチマークでは、自然の人間の動きと軌道パターンを取り入れた、シーン面積の大きさに基づいて、適切な量の人間のエージェントが集まっている大規模な写真リアリスティック屋内シーンを提供する。新しいベンチマークでは,最先端の学習手法と古典的なルールベースの経路計画アルゴリズムを用いて,詳細な実験分析を行う。その結果、今後の予測の重要性が示され、我々の手法は、約90%の個人空間コンプライアンスを維持しつつ、55%のタスク成功率を達成することができた。コードとデータセットをリリースします。デモのビデオはhttps://zeying-gong.github.io/projects/falcon/ で見ることができる。 To navigate safely and efficiently in crowded spaces, robots should not only perceive the current state of the environment but also anticipate future human movements. In this paper, we propose a reinforcement learning architecture, namely Falcon, to tackle socially-aware navigation by explicitly predicting human trajectories and penalizing actions that block future human paths. To facilitate realistic evaluation, we introduce a novel SocialNav benchmark containing two new datasets, Social-HM3D and Social-MP3D. This benchmark offers large-scale photo-realistic indoor scenes populated with a reasonable amount of human agents based on scene area size, incorporating natural human movements and trajectory patterns. We conduct a detailed experimental analysis with the state-of-the-art learning-based method and two classic rule-based path-planning algorithms on the new benchmark. The results demonstrate the importance of future prediction and our method achieves the best task success rate of 55% while maintaining about 90% personal space compliance. We will release our code and datasets. Videos of demonstrations can be viewed at https://zeying-gong.github.io/projects/falcon/ .	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# マルチタスク学習によるクロススキャナ腺癌分画の改善 Understanding Stain Separation Improves Cross-Scanner Adenocarcinoma Segmentation with Joint Multi-Task Learning ( http://arxiv.org/abs/2409.13246v1 ) ライセンス: Link先を確認	Ho Heon Kim, Won Chan Jeong, Young Shin Ko, Young Jin Park,	(参考訳) デジタル病理学は、腫瘍の診断とセグメンテーションに大きな進歩をもたらしたが、臓器、組織の準備、取得(ドメインシフトとして知られる)の違いによる画像の多様性は、現在のアルゴリズムの有効性を制限している。 COSAS(Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation)は、セグメンテーションアルゴリズムのドメインシフトに対するレジリエンスを改善することでこの問題に対処する。提案手法では,マルチデコーダオートエンコーダを用いたマルチタスク学習フレームワーク内での汚れ分離による教師なし学習を採用する。このモデルは、染色マトリクスと染色密度を分離し、色の変化を処理し、スキャナー間の一般化を改善する。さらに,ステン強化技術の混合によりモデルの堅牢性を高め,セグメンテーションにU-netアーキテクチャを使用した。本手法の新規性はマルチタスク学習フレームワーク内での染色分離の利用であり,色の変化から組織構造を効果的に切り離すことができる。このアプローチは、異なる病理組織染色のセグメンテーション精度と一般化を改善し、デジタル病理学におけるより信頼性の高い診断ツールの道を開くことを約束する。 Digital pathology has made significant advances in tumor diagnosis and segmentation, but image variability due to differences in organs, tissue preparation, and acquisition - known as domain shift - limits the effectiveness of current algorithms. The COSAS (Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation) challenge addresses this issue by improving the resilience of segmentation algorithms to domain shift, with Task 2 focusing on adenocarcinoma segmentation using a diverse dataset from six scanners, pushing the boundaries of clinical diagnostics. Our approach employs unsupervised learning through stain separation within a multi-task learning framework using a multi-decoder autoencoder. This model isolates stain matrix and stain density, allowing it to handle color variation and improve generalization across scanners. We further enhanced the robustness of the model with a mixture of stain augmentation techniques and used a U-net architecture for segmentation. The novelty of our method lies in the use of stain separation within a multi-task learning framework, which effectively disentangles histological structures from color variations. 
This approach shows promise for improving segmentation accuracy and generalization across different histopathological stains, paving the way for more reliable diagnostic tools in digital pathology.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# T2M-X:部分的注釈付きデータから表現型テキスト対運動生成を学習する T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data ( http://arxiv.org/abs/2409.13251v1 ) ライセンス: Link先を確認	Mingdian Liu, Yilin Liu, Gurunandan Krishnan, Karl S Bayer, Bing Zhou,	(参考訳) テキストプロンプトからヒューマノイドアニメーションを生成することは、アニメーション制作とAR/VR体験に大きな影響を与える。しかし,既存手法では表情や手の動きを除いた身体の動きデータしか生成できない。この制限は、主に全身のモーションデータセットが欠如しているため、プロダクション使用の準備が困難である。このようなデータセットを作成しようとする最近の試みは、人工的に強化されたデータにおける異なる身体部分間の運動の不整合、またはRGBビデオから抽出されたデータ品質の低下をもたらす。本研究では,部分注釈付きデータから表現力のあるテキスト・ツー・モーション生成を学習する2段階のT2M-Xを提案する。 T2M-Xは、高品質なモーション出力を保証するために、体、手、顔用の3つの別個のベクトル量子変分オートエンコーダ(VQ-VAEs)を訓練する。本研究は,データセットの制約に対するロバスト性を示すとともに,定量的および定性的にベースラインを大幅に改善したことを示す。 The generation of humanoid animation from text prompts can profoundly impact animation production and AR/VR experiences. However, existing methods only generate body motion data, excluding facial expressions and hand movements. This limitation, primarily due to a lack of a comprehensive whole-body motion dataset, inhibits their readiness for production use. Recent attempts to create such a dataset have resulted in either motion inconsistency among different body parts in the artificially augmented data or lower quality in the data extracted from RGB videos. In this work, we propose T2M-X, a two-stage method that learns expressive text-to-motion generation from partially annotated data. T2M-X trains three separate Vector Quantized Variational AutoEncoders (VQ-VAEs) for body, hand, and face on respective high-quality data sources to ensure high-quality motion outputs, and a Multi-indexing Generative Pretrained Transformer (GPT) model with motion consistency loss for motion generation and coordination among different body parts. Our results show significant improvements over the baselines both quantitatively and qualitatively, demonstrating its robustness against the dataset limitations.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 知識グラフとLLMの活用による立法システムのサポートと監視 Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems ( http://arxiv.org/abs/2409.13252v1 ) ライセンス: Link先を確認	Andrea Colombo,	(参考訳) 知識グラフ(KG)は、大規模データセットを構造化された相互接続された情報に整理し、さまざまな分野にわたるデータ分析を強化するために使用されている。立法の文脈において、KGsの潜在的な自然な応用の1つは、法律とそれらの記事とより広範な立法の文脈を結びつける複雑な相互接続のセットをモデル化することである。同時に、GPTのような大規模言語モデル(LLM)の台頭は、テキスト生成や文書起草といった法的な応用に新たな機会をもたらしている。彼らの可能性にもかかわらず、法的な文脈におけるLSMの使用は、新しい法律が毎日発行されるため、幻覚の欠如と最新の情報への依存を必要とするため、非常に重要である。本研究は、立法プロセスの相乗効果と支援について、立法知識グラフとLLMを用いて検討する。我々は、立法制度にKGを使うことの利点、LLMが正確なアウトプットを保証することによって立法活動をどのように支援できるか、そして、非技術系ユーザーがそのような技術を彼らの活動に利用できるようにする方法についての3つの主要な疑問に対処する。この目的のために,我々は,立法分析の実施可能性を高めることを目的とした,イタリアの立法に焦点を当てた対話型プラットフォームであるLegis AI Platformを開発した。 Knowledge Graphs (KGs) have been used to organize large datasets into structured, interconnected information, enhancing data analytics across various fields. In the legislative context, one potential natural application of KGs is modeling the intricate set of interconnections that link laws and their articles with each other and the broader legislative context. At the same time, the rise of large language models (LLMs) such as GPT has opened new opportunities in legal applications, such as text generation and document drafting. Despite their potential, the use of LLMs in legislative contexts is critical since it requires the absence of hallucinations and reliance on up-to-date information, as new laws are published on a daily basis. This work investigates how Legislative Knowledge Graphs and LLMs can synergize and support legislative processes. We address three key questions: the benefits of using KGs for legislative systems, how LLM can support legislative activities by ensuring an accurate output, and how we can allow non-technical users to use such technologies in their activities. 
To this aim, we develop Legis AI Platform, an interactive platform focused on Italian legislation that enhances the possibility of conducting legislative analysis and that aims to support lawmaking activities.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# Informative Graph Neural Network を用いたデータドリフトにおける時空間インダクティブ予測 Inductive Spatial Temporal Prediction Under Data Drift with Informative Graph Neural Network ( http://arxiv.org/abs/2409.13253v1 ) ライセンス: Link先を確認	Jialun Zheng, Divya Saxena, Jiannong Cao, Hanchen Yang, Penghui Ruan,	(参考訳) 帰納的時空間予測は、非常にダイナミックなシナリオ(例えば、交通システム、株式市場)に不可欠な、目に見えないデータを予測するために歴史的データを一般化することができる。しかし、外部イベント(都市構造の成長、市場崩壊など)や新たなエンティティ(ロケーション、株式など)は、時間の経過とともにデータドリフトを誘導することで予測精度を損なう可能性がある。既存の研究では、データドリフトに対抗するために不変パターンを抽出するが、パターンの多様性は無視する。この問題に対処するため,多変量パターンを抽出し,データドリフト時の予測精度を向上させるためのインフォーマティブグラフニューラルネットワーク(INF-GNN)を設計した。まず,一意に設計された指標であるRelation Importance (RI) を用いて,安定な実体と異なる空間関係を効果的に選択できる情報サブグラフを構築する。このサブグラフは、近隣のマージを通じて新しいエンティティのデータをさらに一般化する。次に,時間間隔内の影響関数を用いて抽出した貴重なタイムスタンプを強調するための情報的時間記憶バッファを提案する。このメモリバッファは、INF-GNNが影響力のある時間パターンを識別することを可能にする。最後に、RI損失の最適化はパターンの整合性のために設計されている。大規模なデータドリフト下の実世界のデータセットに関する大規模な実験は、INF-GNNが既存の選択肢よりも大幅に優れていることを示した。 Inductive spatial temporal prediction can generalize historical data to predict unseen data, crucial for highly dynamic scenarios (e.g., traffic systems, stock markets). However, external events (e.g., urban structural growth, market crash) and emerging new entities (e.g., locations, stocks) can undermine prediction accuracy by inducing data drift over time. Most existing studies extract invariant patterns to counter data drift but ignore pattern diversity, exhibiting poor generalization to unseen entities. To address this issue, we design an Informative Graph Neural Network (INF-GNN) to distill diversified invariant patterns and improve prediction accuracy under data drift. Firstly, we build an informative subgraph with a uniquely designed metric, Relation Importance (RI), that can effectively select stable entities and distinct spatial relationships. This subgraph further generalizes new entities' data via neighbors merging. 
Secondly, we propose an informative temporal memory buffer to help the model emphasize valuable timestamps extracted using influence functions within time intervals. This memory buffer allows INF-GNN to discern influential temporal patterns. Finally, RI loss optimization is designed for pattern consolidation. Extensive experiments on real-world dataset under substantial data drift demonstrate that INF-GNN significantly outperforms existing alternatives.	翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
# 神経群形成による創発的集団再生 Emergent Collective Reproduction via Evolving Neuronal Flocks ( http://arxiv.org/abs/2409.13254v1 ) ライセンス: Link先を確認	Nam H. Le, Richard Watson, Mike Levin, Chrys Buckley,	(参考訳) この研究は、複雑な生殖集団の出現をシミュレートするために、複雑に自己組織化と自然選択を融合させる新しい人工生命の枠組みであるVitaNovaを通じて、個人性(ETI)の進化的遷移の理解を促進する。捕食者と空間的制約によってそれらに挑戦する環境の中で個々のエージェントを動的にモデル化することで、VitaNovaは単純なエージェントが集合的複製を示す凝集単位へと進化するメカニズムを解明する。この結果は, 自己組織的行動と適応的進化戦略の相乗効果を, ETIの基本的要因として示している。このアプローチは、高次の生物学的個性に対する深い理解に寄与するだけでなく、ETIの実証的研究、現在の理論的枠組みの挑戦、拡張における新たな先例となる。 This study facilitates the understanding of evolutionary transitions in individuality (ETIs) through a novel artificial life framework, named VitaNova, that intricately merges self-organization and natural selection to simulate the emergence of complex, reproductive groups. By dynamically modelling individual agents within an environment that challenges them with predators and spatial constraints, VitaNova elucidates the mechanisms by which simple agents evolve into cohesive units exhibiting collective reproduction. The findings underscore the synergy between self-organized behaviours and adaptive evolutionary strategies as fundamental drivers of ETIs. This approach not only contributes to a deeper understanding of higher-order biological individuality but also sets a new precedent in the empirical investigation of ETIs, challenging and extending current theoretical frameworks.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# 1H遷移金属化合物のハイブリッド次トポロジカル相と遷移 Hybrid-Order Topological Phase And Transition in 1H Transition Metal Compounds ( http://arxiv.org/abs/2409.13258v1 ) ライセンス: Link先を確認	Ning-Jing Yang, Zhigao Huang, Jian-Min Zhang,	(参考訳) 近年のハイブリッドトポロジカル状態(Nature 628, 527 (2024))の実験的観測から着想を得て, 1H遷移金属化合物(TMC)中のハイブリッド-オーダートポロジカル絶縁体(HOTI)を予測し, フェルミ準位付近で2階と1階のトポロジカル状態が共存することを示した。当初、1H-TMCはd軌道のバンドギャップのために2階の位相位相を示す。 p-軌道とd-軌道がカップリングすると、一階の位相特性が現れる。このハイブリッド秩序トポロジカル相転移は結晶場効果によって調整可能である。第一原理計算と組み合わせて、WTe2とNbSe2の相転移を説明する。さらに、HOTIの1階のトポロジカルバンドギャップは、強いスピンホール効果を示す。我々の発見は、2次元電子材料における新しいハイブリッド秩序トポロジカル位相を明らかにし、スピントロニクスの応用を強調した。 Inspired by recent experimental observations of hybrid topological states [Nature 628, 527 (2024)], we predict hybrid-order topological insulators (HOTIs) in 1H transition metal compounds (TMCs), where both second-order and first-order topological states coexist near the Fermi level. Initially, 1H-TMCs exhibit a second-order topological phase due to the d-orbital band gap. When the p- and d-orbitals couple, first-order topological characteristics emerge. This hybrid-order topological phase transition is tunable via crystal field effects. Combined with first-principles calculations, we illustrate the phase transition with WTe2 and NbSe2. In addition, the first-order topological band gap of the HOTI exhibits a strong spin Hall effect. Our findings reveal a novel hybrid-order topological phase in 2D electron materials and highlight spintronic applications.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# 深層学習を用いたゲノムスケール代謝ネットワークにおける欠失反応の解離のための一般化可能な枠組み A generalizable framework for unlocking missing reactions in genome-scale metabolic networks using deep learning ( http://arxiv.org/abs/2409.13259v1 ) ライセンス: Link先を確認	Xiaoyi Liu, Hongpeng Yang, Chengwei Ai, Ruihan Dong, Yijie Ding, Qianqian Yuan, Jijun Tang, Fei Guo,	(参考訳) 代謝過程の不完全な知識は、GEnome-scale Metabolic Model (GEMs)の精度を妨げ、システム生物学や代謝工学の進歩を妨げる。既存のギャップ埋め法は、計算予測と実験結果の差を最小限に抑えるために、表現型データに依存するのが一般的である。しかし、実験データやアノテートされたゲノムが利用可能になる前に、初期状態のGEMに自動的かつ正確なギャップ埋め方法がない。本研究では,GEM内のハイパーエッジ予測問題としてモデル化することで,ギャップ埋めの問題に対処するディープラーニング駆動ツールであるCLOSEgapsを紹介する。具体的には、CLOSEgapsは代謝ネットワークをハイパーグラフとしてマッピングし、そのハイパートポロジーの特徴を学習し、仮説的な反応を利用して、欠落した反応とギャップを識別する。この革新的なアプローチは、代謝ネットワーク内の既知の反応と仮説的な反応の両方を特徴づけ、キュレーションすることができる。 CLOSEgaps は人工的に導入した GEM の 96% 以上のギャップを正確に埋めることを示した。さらに、CLOSEgapsは24個のGEMの表現型予測を強化し、2つの生物において4つの重要な代謝物(ラクタート、エタノール、プロピオネート、サクシネート)を生産する際の顕著な改善を見出した。あらゆる GEM に対して広く適用可能な解として、CLOSEgaps はギャップ埋めプロセスの自動化と、反応と観察された代謝表現型の間の欠如した関係を明らかにするための有望なモデルである。 Incomplete knowledge of metabolic processes hinders the accuracy of GEnome-scale Metabolic models (GEMs), which in turn impedes advancements in systems biology and metabolic engineering. Existing gap-filling methods typically rely on phenotypic data to minimize the disparity between computational predictions and experimental results. However, there is still a lack of an automatic and precise gap-filling method for initial state GEMs before experimental data and annotated genomes become available. In this study, we introduce CLOSEgaps, a deep learning-driven tool that addresses the gap-filling issue by modeling it as a hyperedge prediction problem within GEMs. Specifically, CLOSEgaps maps metabolic networks as hypergraphs and learns their hyper-topology features to identify missing reactions and gaps by leveraging hypothetical reactions. This innovative approach allows for the characterization and curation of both known and hypothetical reactions within metabolic networks. 
Extensive results demonstrate that CLOSEgaps accurately fills over 96% of artificially introduced gaps for various GEMs. Furthermore, CLOSEgaps enhances phenotypic predictions for 24 GEMs and also finds a notable improvement in producing four crucial metabolites (Lactate, Ethanol, Propionate, and Succinate) in two organisms. As a broadly applicable solution for any GEM, CLOSEgaps represents a promising model to automate the gap-filling process and uncover missing connections between reactions and observed metabolic phenotypes.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# 中国語ASR誤り訂正のための大言語モデルはPinyinを理解すべきである Large Language Model Should Understand Pinyin for Chinese ASR Error Correction ( http://arxiv.org/abs/2409.13262v1 ) ライセンス: Link先を確認	Yuang Li, Xiaosong Qiao, Xiaofeng Zhao, Huan Zhao, Wei Tang, Min Zhang, Hao Yang,	(参考訳) 大規模言語モデルは、生成誤り訂正によって自動音声認識システムを強化することができる。本稿では,標準中国語(マンダリン)の音声表現であるPinyinを補助情報として利用して中国語のASR誤り訂正を改善するPinyin-enhanced GEC(PY-GEC)を提案する。提案手法は, 合成誤差のみをトレーニングに用い, 推論時に最良仮説を用いる。さらに,Pinyinとテキスト間の変換タスクによる特徴空間の整合性を考慮したマルチタスク学習手法を提案する。 Aishell-1とCommon Voiceデータセットの実験は、我々のアプローチがテキストのみの入力でGECを一貫して上回っていることを示している。より重要なことは、PY-GECの有効性とマルチタスクトレーニングの2つの側面から、直感的な説明を提供することである。 1)Pinyin特徴に対する注意重みの増加,及び 2)Pinyinとテキスト隠蔽状態の整列した特徴空間。 Large language models can enhance automatic speech recognition systems through generative error correction (GEC). In this paper, we propose Pinyin-enhanced GEC (PY-GEC), which leverages Pinyin, the phonetic representation of Mandarin Chinese, as supplementary information to improve Chinese ASR error correction. Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference. Additionally, we introduce a multitask training approach involving conversion tasks between Pinyin and text to align their feature spaces. Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input. More importantly, we provide intuitive explanations for the effectiveness of PY-GEC and multitask training from two aspects: 1) increased attention weight on Pinyin features; and 2) aligned feature space between Pinyin and text hidden states.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# ライフスパン認知システムに向けて Towards LifeSpan Cognitive Systems ( http://arxiv.org/abs/2409.13265v1 ) ライセンス: Link先を確認	Yu Wang, Chi Han, Tongtong Wu, Xiaoxin He, Wangchunshu Zhou, Nafis Sadeq, Xiusi Chen, Zexue He, Wei Wang, Gholamreza Haffari, Heng Ji, Julian McAuley,	(参考訳) シミュレーションされたデジタル世界であれ、人間社会であれ、複雑な環境と継続的に対話する人間のようなシステムを構築することは、いくつかの重要な課題を提示している。これの中心は、相互作用を経験と呼ぶ連続して高周波の相互作用を可能にすることである。本稿では,このシステムをLifeSpan Cognitive System (LSCS)と呼ぶ。 LSCSの重要な特徴は、過去の経験を維持し、正確にリコールしながら、インクリメンタルで迅速な更新を行う機能である。本稿は,(1)抽象化と経験の融合,(2)正確なリコールによる長期維持という2つの大きな課題を特定する。これらの特性は、新しい経験を保存し、過去の経験を整理し、関連する歴史的データを活用する方法で環境に反応するために不可欠である。通常、微調整や特定のドメインやタスクのパフォーマンス向上に集中するために大きなコーパスに依存している継続学習を持つ言語モデルとは異なり、LSCSは環境からの新たな情報を高速かつ漸進的に更新する必要がある。上記の2つの課題を解決する可能性を持つ既存の技術は、過去の経験を保存するのに必要な相対空間を測定する概念的尺度であるストレージ複雑度(Storage Complexity)に基づいて、4つのクラスに分類される。これら4つの技術のそれぞれには、それぞれ独自の強みと限界がある。既存の技術がLSCSのみを達成できないことを考えると、LSCSには4種類の技術を統合する新しいパラダイムが提案されている。新パラダイムは,2つのコアプロセス – 吸収エクスペリエンスと生成応答 – を通じて運用される。 Building a human-like system that continuously interacts with complex environments -- whether simulated digital worlds or human society -- presents several key challenges. Central to this is enabling continuous, high-frequency interactions, where the interactions are termed experiences. We refer to this envisioned system as the LifeSpan Cognitive System (LSCS). A critical feature of LSCS is its ability to engage in incremental and rapid updates while retaining and accurately recalling past experiences. We identify two major challenges in achieving this: (1) Abstraction and Experience Merging, and (2) Long-term Retention with Accurate Recall. These properties are essential for storing new experiences, organizing past experiences, and responding to the environment in ways that leverage relevant historical data. 
Unlike language models with continual learning, which typically rely on large corpora for fine-tuning and focus on improving performance within specific domains or tasks, LSCS must rapidly and incrementally update with new information from its environment at a high frequency. Existing technologies with the potential of solving the above two major challenges can be classified into four classes based on a conceptual metric called Storage Complexity, which measures the relative space required to store past experiences. Each of these four classes of technologies has its own strengths and limitations. Given that none of the existing technologies can achieve LSCS alone, we propose a novel paradigm for LSCS that integrates all four classes of technologies. The new paradigm operates through two core processes: Absorbing Experiences and Generating Responses.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# JoyHallo: マンダリンのデジタルヒューマンモデル JoyHallo: Digital human model for Mandarin ( http://arxiv.org/abs/2409.13268v1 ) ライセンス: Link先を確認	Sheng Shi, Xuyang Cao, Jun Zhao, Guoxin Wang,	(参考訳) 音声によるビデオ生成では、マンダリンのビデオを作成することが大きな課題である。包括的なマンダリンデータセットの収集は困難であり、マンダリンの複雑な唇の動きは、英語と比較してモデルトレーニングをさらに複雑にしている。本研究では、JD Health International Inc.の従業員から29時間のマンダリン音声ビデオを収集し、その結果、jdh-Halloデータセットが得られた。このデータセットには、さまざまな年齢と話し方が含まれており、会話と専門の医療トピックの両方を含んでいる。マンダリンのJoyHalloモデルに適応するために、我々は中国語wav2vec2モデルをオーディオ機能埋め込みに使用した。唇, 表情, ポーズの特徴間の機能間関係を捉えるために, 半疎結合構造を提案する。この統合により情報利用効率が向上するだけでなく、推論速度も14.3%向上する。特に、JoyHalloは、英語のビデオを生成する強力な能力を維持しており、優れた言語間の生成能力を誇示している。コードとモデルはhttps://jdh-algo.github.io/JoyHalloで公開されている。 In audio-driven video generation, creating Mandarin videos presents significant challenges. Collecting comprehensive Mandarin datasets is difficult, and the complex lip movements in Mandarin further complicate model training compared to English. In this study, we collected 29 hours of Mandarin speech video from JD Health International Inc. employees, resulting in the jdh-Hallo dataset. This dataset includes a diverse range of ages and speaking styles, encompassing both conversational and specialized medical topics. To adapt the JoyHallo model for Mandarin, we employed the Chinese wav2vec2 model for audio feature embedding. A semi-decoupled structure is proposed to capture inter-feature relationships among lip, expression, and pose features. This integration not only improves information utilization efficiency but also accelerates inference speed by 14.3%. Notably, JoyHallo maintains its strong ability to generate English videos, demonstrating excellent cross-language generation capabilities. The code and models are available at https://jdh-algo.github.io/JoyHallo.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# 経験的自由クラス増分学習のためのアダプティブ・マージングローバル分類器 Adaptive Margin Global Classifier for Exemplar-Free Class-Incremental Learning ( http://arxiv.org/abs/2409.13275v1 ) ライセンス: Link先を確認	Zhongren Yao, Xiaobin Chang,	(参考訳) EFCIL(Exemplar-free class-incremental Learning)は、新しいタスク学習に古いクラスサンプルが欠落しているため、大きな課題となる。古いクラスと新しいクラスの厳密な不均衡のため、学習された分類器は、新しいクラスに偏りやすい。さらに、EFCIL で機能抽出器を継続的に更新することは、例えば、古いクラスの特徴の識別能力を損なう可能性がある。既存の手法は主にバイアス付き分類器学習を扱うことに焦点を当てている。本研究では,提案手法を用いて両事例を考察する。具体的には,データ不均衡やサンプリングといった既存手法のバイアス要因を回避するために,まず分散ベースグローバル分類器(DBGC)を導入する。さらに重要なことに、古いクラスの妥協された分布は、単純な操作、分散拡大(VE)によってシミュレートされる。 VEをDBGCに組み込むと、EFCILの新たな分類が失われる。この損失は、Adaptive Margin Softmax Cross Entropy (AMarX)と等価である。そこで提案手法は,Adaptive Margin Global Classifier (AMGC) と呼ばれる。 AMGCは単純だが有効である。広範囲な実験により、AMGCは、難易度の高いEFCIL設定下で、画像分類結果に優れていることが示されている。詳細な分析も、さらなるデモのために提供されている。 Exemplar-free class-incremental learning (EFCIL) presents a significant challenge as the old class samples are absent for new task learning. Due to the severe imbalance between old and new class samples, the learned classifiers can be easily biased toward the new ones. Moreover, continually updating the feature extractor under EFCIL can compromise the discriminative power of old class features, e.g., leading to less compact and more overlapping distributions across classes. Existing methods mainly focus on handling biased classifier learning. In this work, both cases are considered using the proposed method. Specifically, we first introduce a Distribution-Based Global Classifier (DBGC) to avoid bias factors in existing methods, such as data imbalance and sampling. More importantly, the compromised distributions of old classes are simulated via a simple operation, variance enlarging (VE). Incorporating VE based on DBGC results in a novel classification loss for EFCIL. This loss is proven equivalent to an Adaptive Margin Softmax Cross Entropy (AMarX). The proposed method is thus called Adaptive Margin Global Classifier (AMGC). AMGC is simple yet effective. 
Extensive experiments show that AMGC achieves superior image classification results on its own under a challenging EFCIL setting. Detailed analysis is also provided for further demonstration.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# ランダムサンプリングによるディープ・ニューラル・オペレーター・ネットワークの効率的な学習 Efficient Training of Deep Neural Operator Networks via Randomized Sampling ( http://arxiv.org/abs/2409.13280v1 ) ライセンス: Link先を確認	Sharmila Karumuri, Lori Graham-Brady, Somdatta Goswami,	(参考訳) ニューラル演算子(NOs)は、無限次元関数空間間の写像を学習するためにディープニューラルネットワークを使用する。一般的なNOアーキテクチャであるDeep operator Network (DeepONet)は、様々な科学・工学的応用における複雑な力学のリアルタイム予測に成功している。本稿では,DeepONetのトレーニング中に採用するランダムサンプリング手法を提案する。提案手法は,物理系が定義されている有界領域の時空間位置に対応する基底関数を出力するDeepONetモデルのトランクネットワークを対象としている。伝統的に、損失関数を構築しながら、DeepONetトレーニングは、全ての出力関数がイテレーション毎に評価される時空間点の均一なグリッドを考える。このアプローチは、確率勾配降下(SGD)オプティマイザの制限により、バッチサイズが大きくなり、一般化が貧弱になり、メモリ要求が増大する。トランクネットの入力に対するランダムサンプリングは、これらの課題を軽減し、一般化を改善し、トレーニング中のメモリ要求を低減し、計算能力が大幅に向上する。 3つのベンチマーク例を通じて仮説を検証し、従来のトレーニングアプローチと比較して、全体的なテストエラーを同等または低いものにしながら、トレーニング時間の大幅な削減を実証した。実験の結果,訓練中にトランクネットワーク入力にランダム化を組み込むことで,DeepONetの効率性と堅牢性が向上し,複雑な物理系のモデリングにおけるフレームワークの性能向上に期待できる道筋が得られた。 Neural operators (NOs) employ deep neural networks to learn mappings between infinite-dimensional function spaces. Deep operator network (DeepONet), a popular NO architecture, has demonstrated success in the real-time prediction of complex dynamics across various scientific and engineering applications. In this work, we introduce a random sampling technique to be adopted during the training of DeepONet, aimed at improving the generalization ability of the model, while significantly reducing the computational time. The proposed approach targets the trunk network of the DeepONet model that outputs the basis functions corresponding to the spatiotemporal locations of the bounded domain on which the physical system is defined. Traditionally, while constructing the loss function, DeepONet training considers a uniform grid of spatiotemporal points at which all the output functions are evaluated for each iteration. 
This approach leads to a larger batch size, resulting in poor generalization and increased memory demands, due to the limitations of the stochastic gradient descent (SGD) optimizer. The proposed random sampling over the inputs of the trunk net mitigates these challenges, improving generalization and reducing memory requirements during training, resulting in significant computational gains. We validate our hypothesis through three benchmark examples, demonstrating substantial reductions in training time while achieving comparable or lower overall test errors relative to the traditional training approach. Our results indicate that incorporating randomization in the trunk network inputs during training enhances the efficiency and robustness of DeepONet, offering a promising avenue for improving the framework's performance in modeling complex physical systems.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
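The random-sampling idea in the DeepONet abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `sample_trunk_points` and `subsampled_mse`, the 10 x 10 grid, and the stand-in `predict` function are all assumptions made for the example. It only shows the core step of evaluating the loss on a random subset of trunk-network query points in each iteration instead of on the full uniform grid.

```python
import random


def sample_trunk_points(grid, n_samples, rng):
    """Pick a random subset of trunk query points for one training iteration.

    grid      : list of coordinate tuples, e.g. (x, t)
    n_samples : number of points to keep in this iteration
    rng       : random.Random instance, for reproducibility
    Returns (indices, points), with indices sorted in ascending order.
    """
    if not 0 < n_samples <= len(grid):
        raise ValueError("n_samples must be in 1..len(grid)")
    indices = sorted(rng.sample(range(len(grid)), n_samples))
    return indices, [grid[i] for i in indices]


def subsampled_mse(predict, targets, grid, indices):
    """Mean squared error evaluated only at the sampled points."""
    errors = [(predict(grid[i]) - targets[i]) ** 2 for i in indices]
    return sum(errors) / len(errors)


# Hypothetical toy setup: a 10 x 10 uniform grid over (x, t) in [0, 1]^2.
grid = [(i / 9, j / 9) for i in range(10) for j in range(10)]
targets = [x + t for x, t in grid]  # stand-in for one output function


def predict(point):
    """Stand-in for the model output; deliberately off by 0.1 everywhere."""
    return point[0] + point[1] + 0.1


rng = random.Random(0)
indices, points = sample_trunk_points(grid, 16, rng)
loss = subsampled_mse(predict, targets, grid, indices)
```

In a real training loop the subset would be redrawn at every iteration, so that over many iterations the whole domain is still covered while each step touches only `n_samples` points.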
# 純外部予測のための時間分散深層学習モデル -気象画像時系列を用いた水表深予測への適用- Time Distributed Deep Learning models for Purely Exogenous Forecasting. Application to Water Table Depth Prediction using Weather Image Time Series ( http://arxiv.org/abs/2409.13284v1 ) ライセンス: Link先を確認	Matteo Salis, Abdourrahmane M. Atto, Stefano Ferraris, Rosa Meo,	(参考訳) 地下水資源は水循環において最も重要な要素の1つであるため、それらを正確に予測するモデルを開発することは、持続可能な資源管理フレームワークにおいて重要な課題である。深層学習(DL)モデルは、特に空間分布データ(例えばラスタデータ)を供給することによって、水文学において非常に効果的であることが明らかにされている。多くの地域では、水文学的な測定は定期的に取得することが困難であり、場合によっては、最後に利用可能なデータは最新のものではない。逆に、水資源に大きな影響を及ぼす気象データは、通常より利用でき、高品質である。具体的には,Grana-Maira集水域(Piemonte, IT)の地下水面深度を,外因性気象画像時系列のみを用いて予測する2つの異なるDLモデルを提案する。画像時系列を扱うために、どちらのモデルも最初のTime Distributed Convolutional Neural Network (TDC) で構成され、各ステップで利用可能な画像をベクトル表現にエンコードする。最初のモデルであるTDC-LSTMは、LSTM層に基づくシークエンシャルモジュールを使用して、時間的関係を学習し、予測を出力する。第2のモデルであるTDC-UnPWaveNetは、代わりにWaveNetアーキテクチャの新バージョンを使用しており、ここでは、入力されたものに関して、シーケンスを短く、完全にシフトさせるように適応している。この目的と、UnPWaveNetの異なるシーケンス長を扱うために、タイム分散層のように振る舞う新しいチャネル分散層を設計しました。 TDC-LSTMとTDC-UnPWaveNetはどちらも顕著な結果を示した。 TDC-LSTMはバイアスの低減に重点を置いており、TDC-UnPWaveNetは相関の最大化とKGEに重点を置いている。 Groundwater resources are one of the most relevant elements in the water cycle; therefore, developing models to accurately predict them is a pivotal task in the sustainable resources management framework. Deep Learning (DL) models have proven very effective in hydrology, especially when fed spatially distributed data (e.g. raster data). In many regions, hydrological measurements are difficult to obtain regularly or periodically in time, and in some cases, the last available data are not up to date. Conversely, weather data, which significantly impact water resources, are usually more available and of higher quality. More specifically, we propose two different DL models to predict the water table depth in the Grana-Maira catchment (Piemonte, IT) using only exogenous weather image time series.
To deal with the image time series, both models are made of a first Time Distributed Convolutional Neural Network (TDC) which encodes the image available at each time step into a vectorial representation. The first model, TDC-LSTM, then uses a Sequential Module based on an LSTM layer to learn temporal relations and output the predictions. The second model, TDC-UnPWaveNet, instead uses a new version of the WaveNet architecture, adapted here to output a sequence that is shorter than the input one and completely shifted into the future with respect to it. To this aim, and to deal with the different sequence lengths in the UnPWaveNet, we have designed a new Channel Distributed layer, which acts like a Time Distributed one but on the channel dimension, i.e. applying the same set of operations to each channel of the input. Both TDC-LSTM and TDC-UnPWaveNet have shown remarkable results. However, the two models have focused on different learnable information: TDC-LSTM has focused more on lowering the bias, while TDC-UnPWaveNet has focused more on the temporal dynamics, maximising correlation and KGE.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
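The "Time Distributed" and "Channel Distributed" wrappers mentioned in the abstract above share one idea: the same operation is applied independently along one axis of the input. The sketch below illustrates only that idea with plain Python lists. The helper names, the toy `mean_encoder`, and the trimming operation are assumptions for illustration and are not taken from the paper, where the shared operation would be a CNN or another learned layer.

```python
def time_distributed(fn, sequence):
    """Apply the same function to every time step of a sequence."""
    return [fn(step) for step in sequence]


def channel_distributed(fn, channels):
    """Apply the same function to every channel (input is a list of channels)."""
    return [fn(channel) for channel in channels]


def mean_encoder(image):
    """Toy encoder: collapse a 2-D image (list of rows) into a 1-element vector."""
    pixels = [value for row in image for value in row]
    return [sum(pixels) / len(pixels)]


# Three hypothetical 2 x 2 "weather images", one per time step.
frames = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 4.0], [4.0, 4.0]],
    [[1.0, 1.0], [3.0, 3.0]],
]
encoded = time_distributed(mean_encoder, frames)

# Two hypothetical channels, each holding a length-3 sequence; the shared
# operation keeps only the last two steps of each channel.
channels = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
trimmed = channel_distributed(lambda channel: channel[-2:], channels)
```

Because the function is shared across steps (or channels), the number of parameters in a learned version does not grow with the sequence length or the channel count.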
# 点雲対応のための自己注意重みとしての局所ガウス Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence ( http://arxiv.org/abs/2409.13291v1 ) ライセンス: Link先を確認	Alessandro Riva, Alessandro Raganato, Simone Melzi,	(参考訳) ポイントクラウドマッチングのための現在のデータ駆動手法は、広範囲なトレーニング時間と計算資源を必要とし、モデルデプロイメントとアプリケーションにとって重要な課題を提示している。点雲マッチングタスクにおいて、エンコーダのみのトランスフォーマーアーキテクチャによる最近の進歩は、特に入力形状の各点を中心とするガウス関数に類似した、注意頭における意味論的意味のあるパターンの出現を明らかにしている。本研究では,これらのパターンを,トランスフォーマーアーキテクチャのアテンションヘッドに固定されたアテンション重みとして組み込むことにより,この現象をさらに解明する。本稿では,ガウシアンに対して所定の分散値を利用する方法と,学習可能なパラメータとして分散値を扱う方法の2つを評価する。さらに、ノイズデータの性能を分析し、ノイズに対する堅牢性を改善するための可能性を探る。その結果,注意重みの修正はトレーニングプロセスの促進だけでなく,最適化の安定性の向上にも寄与することがわかった。さらに,注入した情報が最も影響のある特定の層を同定し,その情報に対するネットワークの依存度を理解するためのアブレーション実験を行った。 Current data-driven methodologies for point cloud matching demand extensive training time and computational resources, presenting significant challenges for model deployment and application. In the point cloud matching task, recent advancements with an encoder-only Transformer architecture have revealed the emergence of semantically meaningful patterns in the attention heads, particularly resembling Gaussian functions centered on each point of the input shape. In this work, we further investigate this phenomenon by integrating these patterns as fixed attention weights within the attention heads of the Transformer architecture. We evaluate two variants: one utilizing predetermined variance values for the Gaussians, and another where the variance values are treated as learnable parameters. Additionally we analyze the performances on noisy data and explore a possible way to improve robustness to noise. Our findings demonstrate that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization. Furthermore, we conducted an ablation study to identify the specific layers where the infused information is most impactful and to understand the reliance of the network on this information.	
翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
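As a rough illustration of the fixed attention pattern described in the abstract above, the following sketch builds row-normalized Gaussian weights centered on each point of a small point cloud. It is an assumption-laden toy, not the paper's code: the function name `gaussian_attention`, the use of Euclidean distance, the single shared `sigma`, and the example points are all hypothetical.

```python
import math


def gaussian_attention(points, sigma):
    """Row-normalized Gaussian weights centered on each point.

    points : list of coordinate tuples (all of the same dimension)
    sigma  : standard deviation of the Gaussian (fixed here; it could
             instead be treated as a learnable parameter)
    Returns a square matrix as a list of rows, each row summing to 1.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights = []
    for p in points:
        row = [math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


# Hypothetical 3-point cloud in 3-D.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
attention = gaussian_attention(points, sigma=0.5)
```

Each row plays the role of one query's attention distribution: the weight is largest on the point itself and decays with distance, with `sigma` controlling how local the pattern is.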
# BPMの育児に向けて:持続可能なビジネスプロセスのための人間中心のアプローチ Towards Nudging in BPM: A Human-Centric Approach for Sustainable Business Processes ( http://arxiv.org/abs/2409.13295v1 ) ライセンス: Link先を確認	Cielo Gonzalez Moyano, Finn Klessascheck, Saimir Bala, Stephan A. Fahrenkrog-Petersen, Jan Mendling,	(参考訳) ビジネスプロセス管理(BPM)は、主に技術的なソリューションを見つけることに焦点を当てています。ナッジ(英: Nudging)は、心理学と行動経済学のアプローチであり、人々の行動を導く。本稿では,BPMライフサイクルの異なるフェーズにヌードを組み込む方法について述べる。さらに、より持続可能なビジネスプロセスのための代替戦略として、ヌードがどうあるべきかを概説する。我々は,nudgingの統合がプロセスマイニングやビジネスプロセス管理において,より人間中心となる重要な機会を提供することを示す。ナッジの採用に伴う課題についても論じる。 Business Process Management (BPM) is mostly centered around finding technical solutions. Nudging is an approach from psychology and behavioral economics to guide people's behavior. In this paper, we show how nudging can be integrated into the different phases of the BPM lifecycle. Further, we outline how nudging can be an alternative strategy for more sustainable business processes. We show how the integration of nudging offers significant opportunities for process mining and business process management in general to be more human-centric. We also discuss challenges that come with the adoption of nudging.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment ( http://arxiv.org/abs/2409.13299v1 ) ライセンス: Link先を確認	Yooseok Lim, Sujee Lee,	(参考訳) 個別の患者状況の正確な診断と適切な服薬戦略は、パーソナライズされた医療意思決定プロセスの中核的な要素である。患者の状態を再帰的に評価し、適切な薬剤を投与する治療処置を、強化学習(RL)問題として効果的にモデル化することができる。重要なことに、この文脈におけるRLの成功は、最適な治療戦略を正確に表現する、明確に定義された報酬関数の確立に依存している。しかし、RLにおける学習方向を明示的な指標の限られたセットで定義することは、必要なドメイン知識の本質的な複雑さのためにタスクを複雑にする。このアプローチはまた、RLポリシーが臨床医の治療意図を適切に反映していない可能性を高め、様々な状況や指標を考慮することで決定される。本研究では,臨床医の意図を反映した報酬関数の開発に焦点をあて,オフラインRL環境に沿ったオフライン逆強化学習(IRL)を行うオフラインモデルに基づくガイド・リワード学習(OMG-RL)を導入する。 OMG-RLを通じて、限られたデータから専門家の意図を含むパラメータ化された報酬関数を学習し、エージェントのポリシーを強化する。ヘパリン投与課題に対する提案手法の検証を行った。その結果、OMG-RLによる政策学習は有意義であり、ヘパリンの効果をモニタリングするための重要な指標である活性化部分トロンボプラスチン時間(aPTT)において、学習方針が正に強化されていることが確認された。このアプローチはヘパリン服薬問題だけでなく、一般のRLベースの薬物服薬タスクにも広く利用することができる。 Accurate diagnosis of individual patient conditions and appropriate medication dosing strategies are core elements of personalized medical decision-making processes. This therapeutic procedure, which entails recursively assessing the patient's condition and administering suitable medications, can effectively be modeled as a reinforcement learning (RL) problem. Crucially, the success of RL in this context depends on the establishment of a well-defined reward function that accurately represents the optimal treatment strategy. However, defining the learning direction in RL with only a limited set of explicit indicators complicates the task due to the inherent complexity of the required domain knowledge. This approach may also increase the likelihood that the RL policy does not adequately reflect the clinician's treatment intentions, which are determined by considering various situations and indicators. 
In this study, we focus on developing a reward function that reflects the clinician's intentions and introduce Offline Model-based Guided Reward Learning (OMG-RL), which performs offline inverse reinforcement learning (IRL) aligned with the offline RL environment. Through OMG-RL, we learn a parameterized reward function that includes the expert's intentions from limited data, thereby enhancing the agent's policy. We validate the proposed approach on the heparin dosing task. The results demonstrate that policy learning through OMG-RL is meaningful and confirm that the learned policy is positively reinforced in terms of activated partial thromboplastin time (aPTT), a key indicator for monitoring the effects of heparin. This approach can be broadly utilized not only for the heparin dosing problem but also for RL-based medication dosing tasks in general.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# DNA断片化の予測:機械学習を用いた化学測定法の非破壊的類似 Predicting DNA fragmentation: A non-destructive analogue to chemical assays using machine learning ( http://arxiv.org/abs/2409.13306v1 ) ライセンス: Link先を確認	Byron A Jacobs, Ifthakaar Shaik, Frando Lin,	(参考訳) 全世界では不妊率は増加しており、全出生の2.5%は2022年の体外受精(IVF)によって支えられている。男性不妊は、これらの症例の約半数の原因である。精子DNAの品質はIVFの成功に大きな影響を及ぼす。精子DNAの評価は伝統的に、IVFに対して精子細胞を不適格にする化学測定によって行われる。多くの複合要因が人口危機を招き、近年では全世界で出生率が低下している。そのため、生殖補助技術(ART)が最近の研究の焦点となっている。同時に、人工知能はユビキタスに成長し、現代の生活の多くの側面に浸透している。最先端の機械学習の出現と、多くの分野での例外的な性能を生かし、この研究はこれらの成功に基づき、未染色の精子の画像から精子のDNA断片化を予測する新しい枠組みを提案する。これにより、精子の完全性を維持し、IVFのための精子の最適な選択を可能にする予測モデルが得られる。 Globally, infertility rates are increasing, with 2.5% of all births being assisted by in vitro fertilisation (IVF) in 2022. Male infertility is the cause of approximately half of these cases. The quality of sperm DNA has a substantial impact on the success of IVF. The assessment of sperm DNA is traditionally done through chemical assays which render sperm cells ineligible for IVF. Many compounding factors lead to the population crisis, with fertility rates dropping globally in recent history. As such, assisted reproductive technologies (ART) have been the focus of recent research efforts. Simultaneously, artificial intelligence has grown ubiquitous and is permeating more aspects of modern life. With the advent of state-of-the-art machine learning and its exceptional performance in many sectors, this work builds on these successes and proposes a novel framework for the prediction of sperm cell DNA fragmentation from images of unstained sperm, yielding a predictive model which preserves sperm integrity and allows for optimal selection of sperm for IVF.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# MeMoir: メモリ使用量に基づくソフトウェア駆動のカバートチャネル MeMoir: A Software-Driven Covert Channel based on Memory Usage ( http://arxiv.org/abs/2409.13310v1 ) ライセンス: Link先を確認	Jeferson Gonzalez-Gomez, Jose Alejandro Ibarra-Campos, Jesus Yamir Sandoval-Morales, Lars Bauer, Jörg Henkel,	(参考訳) カバートチャネル攻撃は、現代のコンピューティングシステムに対する深刻な脅威として継続的に研究されてきた。ソフトウェアベースのカバートチャネルは、悪質なアクター間の不正なコミュニケーションを確立するために仮想リソースを活用するため、これらの攻撃の通常、検出が難しい分岐である。本稿では,MeMoirについて紹介する。MeMoirは,初めてメモリ使用量をチャネルの媒体として利用する,ソフトウェア駆動のカバートチャネルである。我々は、この新しいカバートチャネルを、アーキテクチャの異なる2つの実世界のプラットフォーム、すなわち汎用的なIntel x86-64ベースのデスクトップコンピュータとARM64ベースの組み込みシステムに実装した。以上の結果から,アーキテクチャおよびハードウェアに依存しない新しいカバートチャネルが有効であり,エラーの少ない中程度の伝送速度を実現することが示唆された。さらに,Hyper-V仮想化環境からWindows 11ホストシステムへの情報伝達が可能な攻撃事例も提示した。さらに,システムメモリの使用を監視することで,95%以上の精度で,偽陽性と偽陰性率の低いシステムに攻撃が存在するかどうかを予測できる機械学習ベースの検出器を実装した。最後に,他の通常のアプリケーションと比較して,システム内の低電力オーバーヘッドを誘導しながら,攻撃を効果的に軽減するノイズベース対策を提案する。 Covert channel attacks have been continuously studied as severe threats to modern computing systems. Software-based covert channels are a typically hard-to-detect branch of these attacks, since they leverage virtual resources to establish illegitimate communication between malicious actors. In this work, we present MeMoir: a novel software-driven covert channel that, for the first time, utilizes memory usage as the medium for the channel. We implemented the new covert channel on two real-world platforms with different architectures: a general-purpose Intel x86-64-based desktop computer and an ARM64-based embedded system. Our results show that our new architecture- and hardware-agnostic covert channel is effective and achieves moderate transmission rates with very low error. Moreover, we present a real use-case for our attack where we were able to communicate information from a Hyper-V virtualized environment to a Windows 11 host system. 
In addition, we implement a machine learning-based detector that can predict whether an attack is present in the system with an accuracy of more than 95% with low false positive and false negative rates by monitoring the use of system memory. Finally, we introduce a noise-based countermeasure that effectively mitigates the attack while inducing a low power overhead in the system compared to other normal applications.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# UIテスト再利用のためのスキル適応型模倣学習 Skill-Adpative Imitation Learning for UI Test Reuse ( http://arxiv.org/abs/2409.13311v1 ) ライセンス: Link先を確認	Mengzhou Wu, Hao Wang, Jun Ren, Yuan Cao, Yuetong Li, Alex Jiang, Dezhi Ran, Yitao Hu, Wei Yang, Tao Xie,	(参考訳) ユーザインターフェース(UI)テストケースを手作業で作成するコストを軽減するため、UIテストマイグレーションは、同様の機能を持つソースアプリから、ターゲットとするモバイルアプリケーション(アプリ)のテストケースを自動的に生成することを目的としている。従来、このプロセスは、ソースアプリのイベントをテキスト記述に基づいてターゲットアプリのイベントにマッピングする、シーケンシャルなUIイベントマッピング問題としてアプローチされてきた。これまでの研究は、NLPモデルのイベントマッピング精度の向上に重点を置いてきた。 NLP機能を備えた大規模言語モデル(LLM)の出現は、ほぼ完璧なイベントマッピングの可能性を示しているが、我々の研究は、LLMの高精度なイベントマッピングでさえ、ソースとターゲットアプリ間の実装の相違に対処するには不十分であり、UIテストマイグレーションのためのLLM駆動ソリューションの全体的な効果を低下させることを示した。そこで本研究では,2つの鍵となる設計によるUIテストマイグレーションの有効性向上を目的とした,スキル適応型模倣学習フレームワークSAILを提案する。まず、SAILは、ソーステストケースをデモとして活用し、テストケースの基礎となるスキルを多レベルに抽象化し、ソーステストケースからテスト情報を抽出して、ターゲットアプリ上でのテスト生成の知識ベースとする。第2に、SAILは学習したスキルのサブセットを選択的に再利用し、新しいコンテキストおよび履歴認識スキル適応を用いて、ターゲットアプリのテストケースの生成を誘導する。 SAILは任意の模倣学習技術でインスタンス化できるが、LLMのテキスト内学習機能を利用してSAILをインスタンス化する。評価の結果、SAILはUIテストマイグレーションの有効性を大幅に改善し、最先端のアプローチよりも149\%高い成功率を示した。 To alleviate the substantial cost of manually crafting user interface (UI) test cases, UI test migration aims to automatically generate test cases for a target mobile application (app) by adapting those from a source app that shares similar functionalities. Traditionally, this process has been approached as a sequential UI-event-mapping problem, where events in the source app are mapped to those in the target one based on their textual descriptions. Prior research has extensively focused on enhancing the event-mapping accuracy of NLP models. Although the advent of large language models (LLMs) with impressive NLP capabilities suggests the potential for near-perfect event-mapping, our study demonstrates that even the highly accurate event-mapping of LLMs is insufficient to address the implementation discrepancies between the source and the target apps, reducing the overall effectiveness of LLM-driven solutions for UI test migration. 
To address this challenge, in this paper, we propose SAIL, a skill-adaptive imitation learning framework designed to enhance the effectiveness of UI test migration through two key designs. First, SAIL leverages the source test cases as demonstrations and employs a multi-level abstraction of test cases' underlying skills, so as to extract the testing information from source test cases as the knowledge base for the subsequent test generation on the target app. Second, SAIL selectively reuses a subset of the learned skills to guide the generation of test cases for the target app with its novel context- and history-aware skill adaptation. While SAIL can be instantiated with any imitation learning technique, we utilize the in-context learning capabilities of LLMs to instantiate SAIL. Evaluation results show that SAIL substantially improves the effectiveness of UI test migration, with a 149% higher success rate than state-of-the-art approaches.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# GAProtoNet:解釈可能なテキスト分類のためのマルチヘッドグラフアテンションに基づくプロトタイプネットワーク GAProtoNet: A Multi-head Graph Attention-based Prototypical Network for Interpretable Text Classification ( http://arxiv.org/abs/2409.13312v1 ) ライセンス: Link先を確認	Ximing Wen, Wenjuan Tan, Rosina O. Weber,	(参考訳) 事前訓練されたトランスフォーマーベース言語モデル(LM)は、強力な単語埋め込みによるテキスト分類タスクの大幅な改善を達成できることでよく知られているが、そのブラックボックスの性質は、解釈可能性の欠如につながっている。本稿では,LMエンコーダで構築したテキスト分類モデルの決定を記述した,新しいホワイトボックスのマルチヘッドグラフアテンションに基づくプロトタイプネットワークであるGAProtoNetを紹介する。提案手法では,入力ベクトルとプロトタイプをグラフ内のノードとみなし,入力ノードとプロトタイプノードの間のエッジを選択的に構築し,解釈可能なプロトタイプ表現を学習する。推測中、モデルは各プロトタイプに割り当てられた注目スコアによって重み付けされた活性型プロトタイプの線形結合に基づいて決定を行い、その選択を注意重みによって透過的に説明し、最も近いマッチングトレーニング例に投影する。複数の公開データセットを用いた実験により,元のブラックボックスLMの精度を犠牲にすることなく,より優れた結果が得られた。また,提案手法は4種類のネットワーク変動を比較検討し,F1の精度と精度を比較検討した。プロトタイプクラスタのケーススタディと可視化は,LMを用いて構築したブラックボックスモデルの決定を効率的に説明できることを示す。 Pretrained transformer-based Language Models (LMs) are well-known for their ability to achieve significant improvement on text classification tasks with their powerful word embeddings, but their black-box nature, which leads to a lack of interpretability, has been a major concern. In this work, we introduce GAProtoNet, a novel white-box Multi-head Graph Attention-based Prototypical Network designed to explain the decisions of text classification models built with LM encoders. In our approach, the input vector and prototypes are regarded as nodes within a graph, and we utilize multi-head graph attention to selectively construct edges between the input node and prototype nodes to learn an interpretable prototypical representation. During inference, the model makes decisions based on a linear combination of activated prototypes weighted by the attention score assigned for each prototype, allowing its choices to be transparently explained by the attention weights and the prototypes projected into the closest matching training examples. 
Experiments on multiple public datasets show our approach achieves superior results without sacrificing the accuracy of the original black-box LMs. We also compare with four alternative prototypical network variations and our approach achieves the best accuracy and F1 among all. Our case study and visualization of prototype clusters also demonstrate the efficiency in explaining the decisions of black-box models built with LMs.	翻訳日:2024-11-07 07:51:11 公開日:2024-09-20
# 高次元ベイズネットワーク学習のためのリング型分散アルゴリズム A Ring-Based Distributed Algorithm for Learning High-Dimensional Bayesian Networks ( http://arxiv.org/abs/2409.13314v1 ) ライセンス: Link先を確認	Jorge D. Laborda, Pablo Torrijos, José M. Puerta, José A. Gámez,	(参考訳) 高次元データからベイズネットワーク(BN)を学習することは複雑で時間を要する作業である。文献には水平(インスタンス)や垂直(変数)のパーティショニングに基づくアプローチがあるが、GESアルゴリズム自体に基づく手法を除いて、Greedy Equivalence Search (GES)アルゴリズムと同じ理論的性質を保証できない。本稿では, GES を局所学習アルゴリズムとして用い, GES と同じ理論的特性を保証しながら,CPU 時間の短縮を図った有向リングベース分散手法を提案する。この方法は、可能なエッジの集合を分割し、リング内の各プロセッサが受信したサブセットでのみ動作するように制限することを含む。グローバルラーニングプロセスは、収束基準を満たすまで数ラウンドを繰り返す反復アルゴリズムである。各ラウンドにおいて、各プロセッサは、前者のリングからBNを受け取り、それを自身のBNモデルと融合させ、その結果を、エッジの集合に制約された局所学習プロセスの開始解として利用する。その後、環の後継者に得られたモデルを送付する。 3つの大きなドメイン(400-1000変数)で実験を行い、GESとその高速バージョン(fGES)と比較して提案手法の有効性を実証した。 Learning Bayesian Networks (BNs) from high-dimensional data is a complex and time-consuming task. Although there are approaches based on horizontal (instances) or vertical (variables) partitioning in the literature, none can guarantee the same theoretical properties as the Greedy Equivalence Search (GES) algorithm, except those based on the GES algorithm itself. In this paper, we propose a directed ring-based distributed method that uses GES as the local learning algorithm, ensuring the same theoretical properties as GES but requiring less CPU time. The method involves partitioning the set of possible edges and constraining each processor in the ring to work only with its received subset. The global learning process is an iterative algorithm that carries out several rounds until a convergence criterion is met. In each round, each processor receives a BN from its predecessor in the ring, fuses it with its own BN model, and uses the result as the starting solution for a local learning process constrained to its set of edges. Subsequently, it sends the model obtained to its successor in the ring. 
Experiments were carried out on three large domains (400-1000 variables), demonstrating our proposal's effectiveness compared to GES and its fast version (fGES).	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 品質・多様性における性能・再現性トレードオフの探求 Exploring the Performance-Reproducibility Trade-off in Quality-Diversity ( http://arxiv.org/abs/2409.13315v1 ) ライセンス: Link先を確認	Manon Flageat, Hannah Janmohamed, Bryan Lim, Antoine Cully,	(参考訳) 品質多様性(QD)アルゴリズムは多くの領域やアプリケーションで有望な結果を示している。しかし、複雑な実世界のアプリケーションでQDが使用される場合、ソリューションの適合性と行動推定の不確実性は依然として大きな課題である。不確実なアプリケーションの性能を改善するためのいくつかのアプローチが提案されているが、多くの人は重要な課題に対処できない。ほとんどの先行した方法は、適合性と再現性を共同で改善し、それらが矛盾する目的である可能性を無視する。例えば、ロボット工学では、解は不確実な環境で最大速度の90%を確実に歩けるが、より速く歩く解は転倒しやすい。これはトレードオフなので、この2つのソリューションのどちらか一方が他方よりも"良い"ものではありません。したがって、アルゴリズムは本質的に一方の解を選ぶことはできないが、これら2つの矛盾する目的に対して与えられた選好のみを強制することができる。本稿では,不確実なQDに対する性能再現性トレードオフとして,この問題を定式化する。そこで本稿では, トレードオフに対する最適解を求める新たな4つのQDアルゴリズムを提案する。また,これらの選好が事前に定義できない場合のA-posteriori QDアルゴリズムを提案する。以上の結果から,提案手法は与えられた嗜好を満たす解を見出すことができた。重要なことは、このトレードオフを単純に説明すれば、我々のアプローチは既存の不確実なQD手法よりも優れているということです。これは、性能再現性トレードオフを考慮すると、パフォーマンスのみを最適化した場合に通常見逃される重要なステップストーンがアンロックされることを示している。 Quality-Diversity (QD) algorithms have exhibited promising results across many domains and applications. However, uncertainty in fitness and behaviour estimations of solutions remains a major challenge when QD is used in complex real-world applications. While several approaches have been proposed to improve the performance in uncertain applications, many fail to address a key challenge: determining how to prioritise solutions that perform consistently under uncertainty, in other words, solutions that are reproducible. Most prior methods improve fitness and reproducibility jointly, ignoring the possibility that they could be contradictory objectives. For example, in robotics, solutions may reliably walk at 90% of the maximum velocity in uncertain environments, while solutions that walk faster are also more prone to falling over. As this is a trade-off, neither one of these two solutions is "better" than the other. 
Thus, algorithms cannot intrinsically select one solution over the other, but can only enforce given preferences over these two contradictory objectives. In this paper, we formalise this problem as the performance-reproducibility trade-off for uncertain QD. We propose four new a-priori QD algorithms that find optimal solutions for given preferences over the trade-offs. We also propose an a-posteriori QD algorithm for when these preferences cannot be defined in advance. Our results show that our approaches successfully find solutions that satisfy given preferences. Importantly, by simply accounting for this trade-off, our approaches perform better than existing uncertain QD methods. This suggests that considering the performance-reproducibility trade-off unlocks important stepping stones that are usually missed when only performance is optimised.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# JMedBench: 日本の生物医学大言語モデル評価ベンチマーク JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models ( http://arxiv.org/abs/2409.13317v1 ) ライセンス: Link先を確認	Junfeng Jiang, Jiahao Huang, Akiko Aizawa,	(参考訳) 日本語大言語モデル(LLM)の最近の発展は、主に一般ドメインに焦点を当てており、日本の生物医学 LLM の進歩は少ない。ひとつの障害は、比較のための包括的な大規模ベンチマークがないことだ。また, バイオメディカルLLMを評価するための資源も不十分である。そこで本研究では,4つのカテゴリに8つのLLMと5つのタスクにまたがる20のバイオメディカルデータセットを含む新しいベンチマークを提案する。実験結果から,(1)日本の生物医学的課題において,日本語の理解に優れ,生物医学的知識が豊富な LLM がより優れた性能を発揮すること,(2)日本の生物医学的領域を主目的としない LLM が予想外に良好な性能を発揮すること,(3) 日本の生物医学的課題において既存の LLM を改良する余地がまだ残っていること,などが示唆された。さらに、この分野の発展をさらに促進できる洞察を提供する。我々の評価ツールはベンチマークに合わせており、データセットはhttps://huggingface.co/datasets/Coldog2333/JMedBenchで公開されています。 Recent developments in Japanese large language models (LLMs) primarily focus on general domains, with fewer advancements in Japanese biomedical LLMs. One obstacle is the absence of a comprehensive, large-scale benchmark for comparison. Furthermore, the resources for evaluating Japanese biomedical LLMs are insufficient. To advance this field, we propose a new benchmark including eight LLMs across four categories and 20 Japanese biomedical datasets across five tasks. Experimental results indicate that: (1) LLMs with a better understanding of Japanese and richer biomedical knowledge achieve better performance in Japanese biomedical tasks, (2) LLMs that are not mainly designed for Japanese biomedical domains can still perform unexpectedly well, and (3) there is still much room for improving the existing LLMs in certain Japanese biomedical tasks. Moreover, we offer insights that could further enhance development in this field. Our evaluation tools tailored to our benchmark as well as the datasets are publicly available at https://huggingface.co/datasets/Coldog2333/JMedBench to facilitate future research.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# SLaVA-CXR:胸部X線レポート自動化のための小言語と視覚アシスタント SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation ( http://arxiv.org/abs/2409.13321v1 ) ライセンス: Link先を確認	Jinge Wu, Yunsoo Kim, Daqian Shi, David Cliffton, Fenglin Liu, Honghan Wu,	(参考訳) 大規模言語モデル(LLMs)の成功に触発されて、臨床医を支援する医療分野におけるLLMの開発への研究関心が高まっている。しかし、病院では、クローズドソースの商用LLMを使用するにはプライバシーの問題があり、オープンソースの公開LLMを開発するには大規模な計算資源を必要とするが、特に資源効率のよい地域や低所得国では、そうした資源は通常限られている。我々はChest X-Rayレポートの自動化に使用できるオープンソースのSmall Language and Vision Assistant (SLaVA-CXR)を提案する。そこで我々はまず,放射線科医の認知発達をシミュレートしたRe$^3$Training法を提案し,認識・推論・報告の訓練方法においてモデルを最適化する。次に,プライバシー規制に準拠した高品質で多様な学習コーパスを生成できるデータ合成手法RADEXを提案する。実験の結果,2.7Bのバックボーン上に構築されたSLaVA-CXRは,従来の最先端のより大きなモデルを性能で上回るだけでなく,6倍高速な推論効率を実現していることがわかった。 Inspired by the success of large language models (LLMs), there is growing research interest in developing LLMs in the medical domain to assist clinicians. However, for hospitals, using closed-source commercial LLMs involves privacy issues, and developing open-source public LLMs requires large-scale computational resources, which are usually limited, especially in resource-efficient regions and low-income countries. We propose an open-source Small Language and Vision Assistant (SLaVA-CXR) that can be used for Chest X-Ray report automation. To efficiently train a small assistant, we first propose the Re$^3$Training method, which simulates the cognitive development of radiologists and optimizes the model in the Recognition, Reasoning, and Reporting training manner. Then, we introduce a data synthesis method, RADEX, which can generate a high-quality and diverse training corpus with privacy regulation compliance. Extensive experiments show that our SLaVA-CXR, built on a 2.7B backbone, not only outperforms previous state-of-the-art larger models but also achieves 6 times faster inference.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 核スピン-異性体重ね合わせのコヒーレントダイナミクス Coherent dynamics of a nuclear-spin-isomer superposition ( http://arxiv.org/abs/2409.13322v1 ) ライセンス: Link先を確認	Tamar Levin, Ziv Meir,	(参考訳) システムのサイズと複雑さの増加に伴う量子コヒーレンスを保存することは大きな課題である。分子は、様々な大きさと複雑さと多くの自由度を持ち、量子から古典的行動への遷移を研究するための優れたプラットフォームである。分子の量子制御の研究は振動と回転に焦点を当てているが、ここでは同じ分子の2つの核スピン異性体の間の量子重ね合わせを作ることに焦点を当てる。本稿では、2つの非結合の核-スピン-異性体状態間の強い結合を生み出すために、スペクトルにおける避けられた交差を利用して、異性体量子ビットを生成するスキームを提案する。我々は,4レベルハミルトニアンを用いて体系をモデル化し,システムの異なる状態とパラメータのコヒーレントなダイナミクスを探索する。我々の4レベルモデルとアプローチは、同様のエネルギーレベル構造を持つ他のシステムに適用できる。 Preserving quantum coherence with the increase of a system's size and complexity is a major challenge. Molecules, with their diverse sizes and complexities and many degrees of freedom, are an excellent platform for studying the transition from quantum to classical behavior. While most quantum-control studies of molecules focus on vibrations and rotations, we focus here on creating a quantum superposition between two nuclear-spin isomers of the same molecule. We present a scheme that exploits an avoided crossing in the spectrum to create strong coupling between two uncoupled nuclear-spin-isomer states, hence creating an isomeric qubit. We model our scheme using a four-level Hamiltonian and explore the coherent dynamics in the different regimes and parameters of our system. Our four-level model and approach can be applied to other systems with a similar energy-level structure.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 2音駆動とパラメトリックポンプの接合効果による強い機械的スクイーズの発生 Generation of strong mechanical squeezing through the joint effect of two-tone driving and parametric pumping ( http://arxiv.org/abs/2409.13323v1 ) ライセンス: Link先を確認	Xiao-Jie Wu, Huan-Huan Cheng, Qiannan Wu, Cheng-Hua Bai, Shao-Xiong Wu,	(参考訳) オプトメカニカル系における2音駆動とパラメトリックポンプの相乗的機構を利用して、強力な機械的スクイーズを効率的に作成する革新的な手法を提案する。光学パラメトリック増幅器によって誘導されるキャビティフィールドのスクイーズ効果は、2トーン駆動により圧縮されたメカニカル発振器に伝達でき、メカニカル発振器のスクイーズ化の度合いは、任意の単一機構によって得られたものを上回る。我々のプロジェクトは、幅広い条件で強力な機械的スクイーズを生成するために、多用途で効率的なアプローチを提供する。 We propose an innovative scheme to efficiently prepare strong mechanical squeezing by utilizing the synergistic mechanism of two-tone driving and parametric pumping in an optomechanical system. By reasonably choosing the system parameters, the proposal highlights the following prominent advantages: the squeezing effect of the cavity field induced by the optical parametric amplifier can be transferred to the mechanical oscillator, which has been squeezed by the two-tone driving, and the degree of squeezing of the mechanical oscillator will surpass that obtained by any single mechanism; the joint mechanism can enhance the degree of squeezing significantly and break the 3 dB mechanical squeezing limit, which is particularly evident in the range where the red/blue-detuned ratio is sub-optimal; the mechanical squeezing achieved through this distinctive joint mechanism exhibits notable robustness against both thermal noise and decay of the mechanical oscillator. Our project offers a versatile and efficient approach for generating strong mechanical squeezing across a wide range of conditions.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 半教師付きデュアルモーダルセマンティックセグメンテーションに向けて Towards Semi-supervised Dual-modal Semantic Segmentation ( http://arxiv.org/abs/2409.13325v1 ) ライセンス: Link先を確認	Qiulei Dong, Jianan Li, Shuang Deng,	(参考訳) 3Dおよび2Dデータ取得技術の開発により、シーンの点雲と画像の同時取得が容易になり、デュアルモーダルなセマンティックセグメンテーションがさらに容易になった。ポイントクラウドとイメージを同時にセグメンテーションする既存の方法のほとんどは、ラベル付きトレーニングデータの量と品質に大きく依存している。しかし、大量のポイントワイドおよびピクセルワイドラベリング手順は時間がかかり、労働集約的である。そこで本研究では,少数のラベル付き点群,多数のラベルなし点群,およびラベルなし画像を用いて,半教師付きデュアルモーダルセマンティックセグメンテーションタスクを処理する,PD-Netと呼ばれる並列デュアルストリームネットワークを提案する。提案したPD-Netは、2つの並列ストリーム(元のストリームと擬似ラベル予測ストリームと呼ばれる)で構成されている。擬似ラベル予測ストリームは、未ラベルの点雲とその対応する画像の擬似ラベルを予測する。そして、ラベルなしデータを元のストリームに送信して自己学習を行う。各ストリームは、それぞれ3Dデータと2Dデータのための2つのエンコーダデコーダブランチを含む。各ストリームにおいて、複数のデュアルモーダル融合モジュールが二重モーダル特徴を融合するために探索される。さらに、擬似ラベル予測ストリームによって出力される擬似ラベルを最適化するために擬似ラベル最適化モジュールを探索した。 2つの公開データセットの実験結果から、提案手法は、比較対象の半教師付き手法よりも優れているだけでなく、ほとんどの場合、一部の完全教師付き手法に匹敵する性能を達成できることが示された。 With the development of 3D and 2D data acquisition techniques, it has become easy to obtain point clouds and images of scenes simultaneously, which further facilitates dual-modal semantic segmentation. Most existing methods for simultaneously segmenting point clouds and images rely heavily on the quantity and quality of the labeled training data. However, massive point-wise and pixel-wise labeling procedures are time-consuming and labor-intensive. To address this issue, we propose a parallel dual-stream network to handle the semi-supervised dual-modal semantic segmentation task, called PD-Net, by jointly utilizing a small number of labeled point clouds, a large number of unlabeled point clouds, and unlabeled images. The proposed PD-Net consists of two parallel streams (called original stream and pseudo-label prediction stream). The pseudo-label prediction stream predicts the pseudo labels of unlabeled point clouds and their corresponding images. Then, the unlabeled data is sent to the original stream for self-training. 
Each stream contains two encoder-decoder branches for 3D and 2D data respectively. In each stream, multiple dual-modal fusion modules are explored for fusing the dual-modal features. In addition, a pseudo-label optimization module is explored to optimize the pseudo labels output by the pseudo-label prediction stream. Experimental results on two public datasets demonstrate that the proposed PD-Net not only outperforms the comparative semi-supervised methods but also achieves competitive performances with some fully-supervised methods in most cases.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 拡散確率モデルによる生成空力設計 Generative Aerodynamic Design with Diffusion Probabilistic Models ( http://arxiv.org/abs/2409.13328v1 ) ライセンス: Link先を確認	Thomas Wagenaar, Simone Mancini, Andrés Mateo-Gabín,	(参考訳) 空力設計のためのジオメトリの最適化は、ジオメトリを評価し、反復的に改善するために、多くの高価なシミュレーションに依存することが多い。しばしばリフト・アンド・ドラッグ、空力モーメント、表面積の観点で、所望の要求に近く特性を持つ開始幾何を提供することで、シミュレーションの数を減らすことができる。生成モデルは、シミュレーションの大規模なデータセット上でジオメトリを一般化することにより、そのような開始ジオメトリを提供する可能性があることを示す。特に,XFOILシミュレーションで訓練した拡散確率モデルを用いて,所定の空力特性と制約を条件とした2次元翼ジオメトリーを合成する。翼はベルンシュタイン多項式でパラメータ化され、生成された設計の滑らかさを保証する。モデルが同一の要件と制約に対して多様な候補設計を生成可能であることを示し、最適化手順に複数の出発点を提供する設計空間を効果的に探索する。しかし、候補設計の品質は、データセット内の模擬設計の分布に依存する。重要なことに、このデータセットのジオメトリは、生成されたジオメトリが物理的であることを保証するために、拡散モデルの条件付けに使われていない他の要件や制約を満たす必要がある。 The optimization of geometries for aerodynamic design often relies on a large number of expensive simulations to evaluate and iteratively improve the geometries. It is possible to reduce the number of simulations by providing a starting geometry that has properties close to the desired requirements, often in terms of lift and drag, aerodynamic moments and surface areas. We show that generative models have the potential to provide such starting geometries by generalizing geometries over a large dataset of simulations. In particular, we leverage diffusion probabilistic models trained on XFOIL simulations to synthesize two-dimensional airfoil geometries conditioned on given aerodynamic features and constraints. The airfoils are parameterized with Bernstein polynomials, ensuring smoothness of the generated designs. We show that the models are able to generate diverse candidate designs for identical requirements and constraints, effectively exploring the design space to provide multiple starting points to optimization procedures. However, the quality of the candidate designs depends on the distribution of the simulated designs in the dataset. 
Importantly, the geometries in this dataset must satisfy other requirements and constraints that are not used in conditioning of the diffusion model, to ensure that the generated geometries are physical.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 新たなデータセットを用いた非拘束環境における果実・野菜検出の促進 Enhancing Fruit and Vegetable Detection in Unconstrained Environment with a Novel Dataset ( http://arxiv.org/abs/2409.13330v1 ) ライセンス: Link先を確認	Sandeep Khanna, Chiranjoy Chattopadhyay, Suman Kundu,	(参考訳) コンピュータビジョンによる果物や野菜の検出の自動化は、農業の近代化、効率の向上、食品品質の確保、技術的に先進的で持続可能な農業慣行への貢献に不可欠である。本稿では,実環境における果実や野菜の検出とローカライズのためのエンドツーエンドパイプラインを提案する。これを実現するために、FRUVEG67というデータセットをキュレートした。このデータセットには、制約のないシナリオでキャプチャされた67種類の果物や野菜の画像が含まれており、クラス毎に手動で注釈付けされたサンプルはわずかである。我々は,残りの非注釈画像にラベルをつけるためにオブジェクトのバウンディングボックスを生成する半教師付きデータアノテーションアルゴリズム(SSDA)を開発した。 Fruit and Vegetable Detection Network (FVDNet) は3つの異なるグリッド構成を持つYOLOv7のアンサンブルバージョンである。我々は,境界ボックス予測に平均的アプローチ,およびクラス予測に投票機構を用いる。我々は、より小さな物体をよりよく検出するために、焦点損失とともにJensen-Shannon divergence (JSD)を統合した。実験の結果,従来のYOLOに比べてFVDNetの方が優れており,検出性能とローカライゼーション性能が著しく向上していることがわかった。平均平均精度(mAP)は全クラスで0.78であった。さらに,FVDNetの有効性をオープンカテゴリの冷凍機画像を用いて評価し,有望な結果を示した。 Automating the detection of fruits and vegetables using computer vision is essential for modernizing agriculture, improving efficiency, ensuring food quality, and contributing to technologically advanced and sustainable farming practices. This paper presents an end-to-end pipeline for detecting and localizing fruits and vegetables in real-world scenarios. To achieve this, we have curated a dataset named FRUVEG67 that includes images of 67 classes of fruits and vegetables captured in unconstrained scenarios, with only a few manually annotated samples per class. We have developed a semi-supervised data annotation algorithm (SSDA) that generates bounding boxes for objects to label the remaining non-annotated images. For detection, we introduce the Fruit and Vegetable Detection Network (FVDNet), an ensemble version of YOLOv7 featuring three distinct grid configurations. We employ an averaging approach for bounding-box prediction and a voting mechanism for class prediction. We have integrated Jensen-Shannon divergence (JSD) in conjunction with focal loss to better detect smaller objects. 
Our experimental results highlight the superiority of FVDNet compared to previous versions of YOLO, showcasing remarkable improvements in detection and localization performance. We achieved an impressive mean average precision (mAP) score of 0.78 across all classes. Furthermore, we evaluated the efficacy of FVDNet using open-category refrigerator images, where it demonstrates promising results.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 事前学習済み多言語BERTを埋め込みに応用した悪意あるプロンプトインジェクション攻撃の検出の改善 Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection ( http://arxiv.org/abs/2409.13331v1 ) ライセンス: Link先を確認	Md Abdur Rahman, Hossain Shahriar, Fan Wu, Alfredo Cuzzocrea,	(参考訳) 大きな言語モデル(LLM)は、その優れた能力と広範囲のアプリケーションに適用できることで有名である。しかし、この広範な利用は重大な脆弱性をもたらす。また、現在のアプローチでは、現実世界のアプリケーションにおけるこれらの脆弱性の複雑さや進化の性質に適切に対処できないため、大規模な言語モデルにおける悪意あるインジェクション攻撃に対する効果的な検出と緩和戦略の必要性に、大きなギャップがあることがよく観察されている。したがって、本研究は、実際のLLMアプリケーションに最も危険な脆弱性の一つである悪意のあるプロンプトインジェクション攻撃の影響に焦点を当てている。正規のプロンプトから悪意のあるプロンプトを分類するために、多言語BERT、DistilBertのような様々なBERT(Bidirectional Encoder Representations from Transformers)を適用する。また,多言語BERTを用いたプロンプトテキストのトークン化と埋め込み生成が,ガウスナイーブベイズ,ランダムフォレスト,サポートベクターマシン,ロジスティック回帰といった機械学習手法の性能向上にどのように貢献するかを観察した。各モデルの性能は、悪意のあるプロンプトを発見するためにバイナリ分類を改善するために、様々なパラメータで厳格に分析される。プロンプトを埋め込むための多言語BERTアプローチは、既存の作業を大幅に改善し、性能を上回り、ロジスティック回帰により96.55%の精度を達成した。さらに,モデルの誤り予測について検討し,その限界について考察した。この発見は、多様なLLMの脆弱性に最も適したモデルを見つけるために、様々なBERTをチューニングする研究者を導くことができる。 Large language models (LLMs) are renowned for their exceptional capabilities and are applied to a wide range of applications. However, this widespread use brings significant vulnerabilities. There is also a large gap in effective detection and mitigation strategies against malicious prompt injection attacks in large language models, as current approaches may not adequately address the complexity and evolving nature of these vulnerabilities in real-world applications. Therefore, this work focuses on the impact of malicious prompt injection attacks, one of the most dangerous vulnerabilities in real LLM applications. It examines applying various BERT (Bidirectional Encoder Representations from Transformers) models, such as multilingual BERT and DistilBert, to distinguish malicious prompts from legitimate prompts. 
Also, we observed how tokenizing the prompt texts and generating embeddings using multilingual BERT contributes to improving the performance of various machine learning methods: Gaussian Naive Bayes, Random Forest, Support Vector Machine, and Logistic Regression. The performance of each model is rigorously analyzed with various parameters to improve the binary classification used to discover malicious prompts. The multilingual BERT approach to embedding the prompts significantly outperforms existing works and achieves an accuracy of 96.55% with Logistic Regression. Additionally, we investigated the incorrect predictions of the model to gain insights into its limitations. The findings can guide researchers in tuning various BERT models to find the most suitable model for diverse LLM vulnerabilities.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 大規模言語モデルにおける時間意識: Fact Recallのベンチマーク Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time ( http://arxiv.org/abs/2409.13338v1 ) ライセンス: Link先を確認	David Herel, Vojtech Bartek, Tomas Mikolov,	(参考訳) 大統領は誰ですか。答えは質問のタイミングによって変わる。大きな言語モデル(LLM)は様々な推論タスクで評価されるが、時間という重要な次元を見逃してしまうことが多い。現実のシナリオでは、回答の正しさはしばしば時間的文脈に結びついている。本稿では,LLMが時間に敏感な事実を処理できることを厳格に検証するための新しいデータセットを提案する。我々のベンチマークは、LLMの知識と正しい時間コンテキストの整合性を測定するための体系的な方法を提供し、現在の評価手法における重要なギャップを埋め、将来のモデルにおける現実の応用性を改善するための貴重なツールを提供する。 Who is the US President? The answer changes depending on when the question is asked. While large language models (LLMs) are evaluated on various reasoning tasks, they often miss a crucial dimension: time. In real-world scenarios, the correctness of answers is frequently tied to temporal context. In this paper, we introduce a novel dataset designed to rigorously test LLMs' ability to handle time-sensitive facts. Our benchmark offers a systematic way to measure how well LLMs align their knowledge with the correct time context, filling a key gap in current evaluation methods and offering a valuable tool for improving real-world applicability in future models.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# タブラルバイオメディカルデータのための低性能機械学習における特徴重要度の有効性 Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data ( http://arxiv.org/abs/2409.13342v1 ) ライセンス: Link先を確認	Youngro Lee, Giacomo Baruzzo, Jeonghwan Kim, Jongmo Seo, Barbara Di Camillo,	(参考訳) 表型バイオメディカルデータ分析では,特徴の重要性を議論する上で,高精度のチューニングモデルが必須であると考えられる。本研究では,性能の低いモデルも特徴として有用であることを示すとともに,一般的な信念に挑戦する。性能が連続的に低下するにつれて特徴量の変化を観測する実験を提案する。 3つの合成データセットと6つの実バイオメディカルデータセットを用いて、完全なデータセットから得られた特徴のランクを、サンプルサイズ(データ切断)が減ったもの(機能切断)または少ないもの(機能切断)と比較する。合成データセットでは、特徴切断は特徴ランクを変えないが、データ切断は低い性能で高い相違を示す。実際のデータセットでは、フィーチャーカットはデータカットと同じような、あるいは小さな変更を示しているが、いくつかのデータセットは反対である。相関を除去することで特徴の相互作用が制御される場合、特徴の切断は安定した安定性を示す。特徴値の分布を解析し,そのモデルが特徴間の特徴重要度を区別できない可能性を理論的に検証することにより,特徴切断による性能劣化にもかかわらず,データ切断によるものではないにもかかわらず,モデルが特徴重要度を識別できることを明らかにする。本研究は,データサイズが十分であれば,低性能レベルでも特徴重要度を維持可能であると結論付け,表型医療データ解析における最適下地性能に寄与する重要な要因である。本稿では,分類器の性能が十分でない場合でも,特徴量分析と統計解析を併用して相対的に特徴量を比較する可能性を示す。 In tabular biomedical data analysis, tuning models to high accuracy is considered a prerequisite for discussing feature importance, as medical practitioners expect the validity of feature importance to correlate with performance. In this work, we challenge the prevailing belief, showing that low-performing models may also be used for feature importance. We propose experiments to observe changes in feature rank as performance degrades sequentially. Using three synthetic datasets and six real biomedical datasets, we compare the rank of features from full datasets to those with reduced sample sizes (data cutting) or fewer features (feature cutting). In synthetic datasets, feature cutting does not change feature rank, while data cutting shows higher discrepancies with lower performance. In real datasets, feature cutting shows similar or smaller changes than data cutting, though some datasets exhibit the opposite. When feature interactions are controlled by removing correlations, feature cutting consistently shows better stability. 
By analyzing the distribution of feature importance values and theoretically examining the probability that the model cannot distinguish feature importance between features, we reveal that models can still distinguish feature importance despite performance degradation through feature cutting, but not through data cutting. We conclude that the validity of feature importance can be maintained even at low performance levels if the data size is adequate, which is a significant factor contributing to suboptimal performance in tabular medical data analysis. This paper demonstrates the potential for utilizing feature importance analysis alongside statistical analysis to compare features relatively, even when classifier performance is not satisfactory.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# $\textit{"I Don't Use AI for Everything"}$: Exploring Utility, Attitude, and Responsibility of AI-powered Tools in Software Development $\textit{"I Don't Use AI for Everything"}$: Exploring Utility, Attitude, and Responsibility of AI-empowered Tools in Software Development ( http://arxiv.org/abs/2409.13343v1 ) ライセンス: Link先を確認	Shidong Pan, Litian Wang, Tianyi Zhang, Zhenchang Xing, Yanjie Zhao, Qinghua Lu, Xiaoyu Sun,	(参考訳) AIを活用したツールが変革の力として現れ、ソフトウェア開発業界を根本的に改革し、さまざまな分野にまたがる広範な影響を約束している。本研究では、ソフトウェア開発プロセスにおけるAIを活用したツールの採用、影響、およびセキュリティに関する考察を行う。さまざまなバックグラウンドを持つ19人のソフトウェア実践者との半構造化インタビューを通じて、AIツールの有用性、開発者に対する態度、セキュリティとプライバシ責任の3つの重要な側面を探求する。ソフトウェア開発のさまざまな段階において,AIツールが広く採用されていることが判明した。開発者は一般的に、AIに対する肯定的な態度を示し、仕事を置き換える脅威ではなく、効率を高めるアシスタントと見なしている。しかし、彼らはまた、ソフトウェア開発における複雑な、馴染みのない、あるいは非常に専門的なタスクを扱うAIの能力の限界を認識した。セキュリティとプライバシに関して、私たちは開発者の間でさまざまなレベルのリスク意識を見つけました。私たちの研究は、ソフトウェア開発におけるAIの採用状況に関する洞察を提供し、ソフトウェア産業におけるAIの統合を効果的にナビゲートするために、実践者、組織、AIプロバイダ、規制機関に推奨する。 AI-empowered tools have emerged as a transformative force, fundamentally reshaping the software development industry and promising far-reaching impacts across diverse sectors. This study investigates the adoption, impact, and security considerations of AI-empowered tools in the software development process. Through semi-structured interviews with 19 software practitioners from diverse backgrounds, we explore three key aspects: the utility of AI tools, developers' attitudes towards them, and security and privacy responsibilities. Our findings reveal widespread adoption of AI tools across various stages of software development. Developers generally express positive attitudes towards AI, viewing it as an efficiency-enhancing assistant rather than a job replacement threat. However, they also recognized limitations in AI's ability to handle complex, unfamiliar, or highly specialized tasks in software development. 
Regarding security and privacy, we found varying levels of risk awareness among developers, with larger companies implementing more comprehensive risk management strategies. Our study provides insights into the current state of AI adoption in software development and offers recommendations for practitioners, organizations, AI providers, and regulatory bodies to effectively navigate the integration of AI in the software industry.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# マルチモーダルモデルのための新しい適応的微調整アルゴリズム:リモートセンシングにおける自己最適化分類と高品質データセットの選択 A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing ( http://arxiv.org/abs/2409.13345v1 ) ライセンス: Link先を確認	Yi Ren, Tianyi Zhang, Zhixiong Han, Weibin Li, Zhiyang Wang, Wenbo Ji, Chenhao Qin, Chenbin Liang, Licheng Jiao,	(参考訳) マルチモーダル大モデルに対する適応的な微調整アルゴリズムを提案する。このアルゴリズムの中核ステップは2段階の切り離しを含む。まず、大量のデータを意味ベクトル空間に投影し、MiniBatchKMeansアルゴリズムを自動クラスタリングに使用する。この分類により、各クラスタ内のデータは、高いセマンティックな類似性を示す。次に、各クラスタのデータを処理し、マルチモーダル大モデルのベクトル空間における原データと摂動データの変換差を計算する。この差はデータの一般化指標として機能する。この測定値に基づいて、トレーニングのための高一般化ポテンシャルを持つデータを選択する。このアルゴリズムを用いて、GeoChatマルチモーダルリモートセンシングデータセットの3分の1を用いて、2台の3090 GPU上でInternLM-XComposer2-VL-7Bモデルをトレーニングした。その結果,我々のアルゴリズムは最先端のベースラインよりも優れていた。様々なベースライン最適に選択された3分の1のデータセットでトレーニングしたモデルは、実験的な検証に基づいて、フルデータセットでトレーニングしたモデルと比較して、さまざまなリモートセンシングメトリクスのパフォーマンスが1%低下しただけだった。このアプローチは、トレーニング時間を68.2%削減しながら、汎用能力を著しく維持した。さらに、UCMercedおよびAID評価データセットで89.86点、77.19点を記録し、GeoChatデータセットを5.43点、GeoChatデータセットを5.16点上回った。 LRBEN評価データセットでは0.91ポイントの低下しか示さなかった。 We propose an adaptive fine-tuning algorithm for multimodal large models. The core steps of this algorithm involve two stages of truncation. First, the vast amount of data is projected into a semantic vector space, and the MiniBatchKMeans algorithm is used for automated clustering. This classification ensures that the data within each cluster exhibit high semantic similarity. Next, we process the data in each cluster, calculating the translational difference between the original and perturbed data in the multimodal large model's vector space. This difference serves as a generalization metric for the data. Based on this metric, we select the data with high generalization potential for training. We applied this algorithm to train the InternLM-XComposer2-VL-7B model on two 3090 GPUs using one-third of the GeoChat multimodal remote sensing dataset. 
The results demonstrate that our algorithm outperforms the state-of-the-art baselines. The model trained on our optimally chosen one-third dataset, based on experimental validation, exhibited only a 1% reduction in performance across various remote sensing metrics compared to the model trained on the full dataset. This approach significantly preserved general-purpose capabilities while reducing training time by 68.2%. Furthermore, the model achieved scores of 89.86 and 77.19 on the UCMerced and AID evaluation datasets, respectively, surpassing the GeoChat dataset by 5.43 and 5.16 points. It only showed a 0.91-point average decrease on the LRBEN evaluation dataset.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
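The per-cluster selection step described in the abstract above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: cluster labels are assumed to be already available (for instance from MiniBatchKMeans), the "translational difference" is approximated by a plain Euclidean distance, and the function names, the `keep_fraction` parameter, and the `prefer_small` switch are hypothetical. The abstract does not state whether a small or a large difference indicates higher generalization potential, so the direction is left configurable.

```python
import math


def shift(original, perturbed):
    """Euclidean distance between an embedding and its perturbed counterpart."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, perturbed)))


def select_per_cluster(labels, originals, perturbeds,
                       keep_fraction=1 / 3, prefer_small=True):
    """Keep a fraction of each cluster, ranked by embedding shift.

    prefer_small=True keeps the samples whose embeddings move least under
    perturbation; this ranking direction is an assumption, not taken from
    the paper.
    """
    # Group sample indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    selected = []
    for label in sorted(clusters):
        members = clusters[label]
        ranked = sorted(
            members,
            key=lambda i: shift(originals[i], perturbeds[i]),
            reverse=not prefer_small,
        )
        # Always keep at least one sample per cluster.
        keep = max(1, round(len(members) * keep_fraction))
        selected.extend(ranked[:keep])
    return sorted(selected)
```

Selecting inside each cluster, rather than globally, keeps every semantic group represented in the reduced training set.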
# チューニング不要のパーソナライズド画像生成 Imagine yourself: Tuning-Free Personalized Image Generation ( http://arxiv.org/abs/2409.13346v1 ) ライセンス: Link先を確認	Zecheng He, Bo Sun, Felix Juefei-Xu, Haoyu Ma, Ankit Ramchandani, Vincent Cheung, Siddharth Shah, Anmol Kalia, Harihar Subramanyam, Alireza Zareian, Li Chen, Ankit Jain, Ning Zhang, Peizhao Zhang, Roshan Sumbaly, Peter Vajda, Animesh Sinha,	(参考訳) 拡散モデルは様々な画像と画像のタスクにおいて顕著な効果を示した。本研究では,画像のパーソナライズを目的とした最先端モデルであるImagine yourselfを紹介する。従来のチューニングベースのパーソナライズ手法とは異なり、Imagine yourselfはチューニング不要のモデルとして機能し、すべてのユーザが個別に調整することなく共有フレームワークを利用することができる。さらに、従来の研究は、複雑なプロンプトに従って視覚的品質を保ちながら、アイデンティティ保存のバランスをとるという課題に遭遇し、結果として参照画像のコピー・ペースト効果が強いモデルとなった。したがって、参照画像、例えば表情の変化、頭と体のポーズ、生成した画像の多様性を著しく変更する必要のあるプロンプトに従って画像を生成することは困難である。これらの制限に対処するために,提案手法を紹介する。 1)画像の多様性を促進するための新しい合成ペアデータ生成機構 2)3つのテキストエンコーダと、テキスト忠実性を改善するための完全に訓練可能なビジョンエンコーダを備えた完全に平行なアテンションアーキテクチャ 3) 視覚的品質の境界を徐々に推し進める, 粗大な多段階ファインタニング手法を提案する。我々の研究は、Imagine yourselfが最先端のパーソナライズモデルを超え、アイデンティティ保存、視覚的品質、テキストアライメントにおいて優れた能力を示すことを示した。このモデルは、様々なパーソナライズアプリケーションのための堅牢な基盤を確立する。人間の評価結果は、過去のパーソナライゼーションモデルと比較して、モデルのSOTA優越性(アイデンティティ保存、テキスト忠実性、視覚的魅力)を全側面にわたって評価する。 Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based personalization techniques, Imagine yourself operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments. Moreover, previous work met challenges balancing identity preservation, following complex prompts and preserving good visual quality, resulting in models having a strong copy-paste effect of the reference images. Thus, they can hardly generate images following prompts that require significant changes to the reference image, e.g., changing facial expression, head and body poses, and the diversity of the generated images is low. 
To address these limitations, our proposed method introduces 1) a new synthetic paired data generation mechanism to encourage image diversity, 2) a fully parallel attention architecture with three text encoders and a fully trainable vision encoder to improve the text faithfulness, and 3) a novel coarse-to-fine multi-stage finetuning methodology that gradually pushes the boundary of visual quality. Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment. This model establishes a robust foundation for various personalization applications. Human evaluation results validate the model's SOTA superiority across all aspects (identity preservation, text faithfulness, and visual appeal) compared to the previous personalization models.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# V-Hands:リモートホワイトボードインタラクションのためのタッチスクリーンによるハンドトラッキング V-Hands: Touchscreen-based Hand Tracking for Remote Whiteboard Interaction ( http://arxiv.org/abs/2409.13347v1 ) ライセンス: Link先を確認	Xinshuang Liu, Yizhong Zhang, Xin Tong,	(参考訳) ホワイトボードベースのリモートコミュニケーションでは、描画されたコンテンツと手画面のインタラクションのシームレスな統合が、没入的なユーザエクスペリエンスに不可欠である。これまでの方法では、手の動きを捉えるために、かさばる装置のセットアップが必要だったり、静電容量画像から手の動きを正確に追跡できなかったりしていた。本稿では,容量的ビデオフレームから両手の3Dポーズを正確に追跡するリアルタイム手法を提案する。そこで我々は,手の位置をキャパシタフレームから同定し,手関節位置から手関節位置を推定するディープニューラルネットワークを開発し,制約された逆運動論的解法を用いて手関節位置から3次元手ポーズを復元する。さらに,高品質な手画面インタラクションデータをキャプチャする装置を設計し,より正確な同期型容量ビデオと手ポーズデータセットを得た。本手法は,遠隔通信のためのコンパクトな装置構成を維持しながら,キャパシタフレームの3次元ハンドトラッキングの精度と安定性を向上させる。提案手法の有効性を検証し,提案手法の有効性を検証した。私たちのコード、モデル、データセットは https://V-Hands.github.io で公開されています。 In whiteboard-based remote communication, the seamless integration of drawn content and hand-screen interactions is essential for an immersive user experience. Previous methods either require bulky device setups for capturing hand gestures or fail to accurately track the hand poses from capacitive images. In this paper, we present a real-time method for precisely tracking the 3D poses of both hands from capacitive video frames. To this end, we develop a deep neural network to identify hands and infer hand joint positions from capacitive frames, and then recover 3D hand poses from the hand-joint positions via a constrained inverse kinematic solver. Additionally, we design a device setup for capturing high-quality hand-screen interaction data and obtain a more accurate synchronized capacitive video and hand pose dataset. Our method improves the accuracy and stability of 3D hand tracking for capacitive frames while maintaining a compact device setup for remote communication. We validate our scheme design and its superior performance on 3D hand pose tracking and demonstrate the effectiveness of our method in whiteboard-based remote communication. Our code, model, and dataset are available at https://V-Hands.github.io.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# ID-Guard: ブレーキング識別による顔操作のユニバーサルフレームワーク ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification ( http://arxiv.org/abs/2409.13349v1 ) ライセンス: Link先を確認	Zuomin Qu, Wei Lu, Xiangyang Luo, Qian Wang, Xiaochun Cao,	(参考訳) 深層学習に基づく顔操作の誤用は、公民権に対する潜在的な脅威となる。この不正行為を防止すべく、画像に見えない敵の摂動を加えて操作過程を妨害するプロアクティブディフェンス技術が提案され、偽造された出力が観察者を不安にさせる。しかし、その非指向的な出力の破壊は、画像中の人物のアイデンティティ情報の保持を招き、個人のスティグマティゼーションにつながる可能性がある。本稿では,IDガード(ID-Guard)と呼ばれる,顔操作と戦うための新しいユニバーサルフレームワークを提案する。具体的には、特定の顔画像に対応するクロスモデルユニバーサル対向摂動を生成するために、エンコーダ・デコーダネットワークの1つのフォワードパスしか必要としない。顔画像の匿名性を確保するため、偽造顔の識別情報を標的に破壊する新しいIDM(IDM)を導入する。さらに,多タスク学習問題として,異なる顔操作への障害を考慮した摂動を最適化し,クロスモデル性能を向上させるために動的重み付け戦略を設計する。提案フレームワークは, 顔画像の特定領域を効果的に歪ませることによって, 複数の顔の操作に対する防御効果を顕著に報告した。さらに,我々の実験では,破壊された画像が顔の塗り絵やオープンソースの画像認識システムを避けることができるID-Guardの能力を明らかにした。 The misuse of deep learning-based facial manipulation poses a potential threat to civil rights. To prevent this fraud at its source, proactive defense technology was proposed to disrupt the manipulation process by adding invisible adversarial perturbations into images, making the forged output unconvincing to the observer. However, their non-directional disruption of the output may result in the retention of identity information of the person in the image, leading to stigmatization of the individual. In this paper, we propose a novel universal framework for combating facial manipulation, called ID-Guard. Specifically, this framework requires only a single forward pass of an encoder-decoder network to generate a cross-model universal adversarial perturbation corresponding to a specific facial image. To ensure anonymity in manipulated facial images, a novel Identity Destruction Module (IDM) is introduced to destroy the identifiable information in forged faces targetedly. 
Additionally, we optimize the perturbations produced by considering the disruption towards different facial manipulations as a multi-task learning problem and design a dynamic weights strategy to improve cross-model performance. The proposed framework reports impressive results in defending against multiple widely used facial manipulations, effectively distorting the identifiable regions in the manipulated facial images. In addition, our experiments reveal the ID-Guard's ability to enable disrupted images to avoid face inpaintings and open-source image recognition systems.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# 大規模言語モデルにおける感情認知の最近の進歩 Recent Advancement of Emotion Cognition in Large Language Models ( http://arxiv.org/abs/2409.13354v1 ) ライセンス: Link先を確認	Yuyan Chen, Yanghua Xiao,	(参考訳) 大規模言語モデル(LLM)における感情認知は、ソーシャルメディア、人間とコンピュータの相互作用、メンタルヘルスアセスメントなど、さまざまなアプリケーションにおけるパフォーマンス向上に不可欠である。我々は、感情分類、感情的に豊かな反応生成、心の理論を主軸とする現在の研究の展望を探求するとともに、注釈付きデータへの依存や感情処理の複雑さといった課題を認識している。本稿では,感情認知のためのLSMの最近の進歩について,詳細な調査を行う。我々は、Ulric Neisserの認知段階と整合して、重要な研究、方法論、成果、資源を探究する。さらに、教師なし学習アプローチや、より複雑で解釈可能な感情認知LLMの開発など、この発展分野における研究の今後の方向性について概説する。また、LLMの感情認知能力を向上させるために使用されるコントラスト学習などの高度な手法についても論じる。 Emotion cognition in large language models (LLMs) is crucial for enhancing performance across various applications, such as social media, human-computer interaction, and mental health assessment. We explore the current landscape of research, which primarily revolves around emotion classification, emotionally rich response generation, and Theory of Mind assessments, while acknowledging challenges such as dependency on annotated data and complexity in emotion processing. In this paper, we present a detailed survey of recent progress in LLMs for emotion cognition. We explore key research studies, methodologies, outcomes, and resources, aligning them with Ulric Neisser's cognitive stages. Additionally, we outline potential future directions for research in this evolving field, including unsupervised learning approaches and the development of more complex and interpretable emotion cognition LLMs. We also discuss advanced methods such as contrastive learning used to improve LLMs' emotion cognition capabilities.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# EmotionQueen: 大規模言語モデルの共感を評価するベンチマーク EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models ( http://arxiv.org/abs/2409.13359v1 ) ライセンス: Link先を確認	Yuyan Chen, Hao Wang, Songzhou Yan, Sijia Liu, Yueze Li, Yi Zhao, Yanghua Xiao,	(参考訳) 大規模言語モデル(LLM)における感情知能は自然言語処理において非常に重要である。しかし,従来の研究は,LLMの全体的感情知能を評価するには不十分な感情認識など,基本的な感情分析タスクに重点を置いていた。そこで本稿では,LLMの感情的知性を評価するためのEmotionQueenというフレームワークを提案する。このフレームワークには、キーイベント認識、混合イベント認識、インプリシット感情認識、意図認識の4つの固有のタスクが含まれている。 LLMは重要な出来事や暗黙の感情を認識し、共感的な反応を生成するよう要求される。また、感情関連文の認識と応答におけるLLMの能力を評価するための2つの指標を設計する。実験により、LLMの能力と感情知能の限界について重要な結論が得られた。 Emotional intelligence in large language models (LLMs) is of great importance in Natural Language Processing. However, previous research mainly focuses on basic sentiment analysis tasks, such as emotion recognition, which is not enough to evaluate LLMs' overall emotional intelligence. Therefore, this paper presents a novel framework named EmotionQueen for evaluating the emotional intelligence of LLMs. The framework includes four distinctive tasks: Key Event Recognition, Mixed Event Recognition, Implicit Emotional Recognition, and Intention Recognition. LLMs are requested to recognize important events or implicit emotions and generate empathetic responses. We also design two metrics to evaluate LLMs' capabilities in recognition and response for emotion-related statements. Experiments yield significant conclusions about LLMs' capabilities and limitations in emotional intelligence.	翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
# FPBoost: 生存分析のための完全なパラメトリックグラディエントブースティング FPBoost: Fully Parametric Gradient Boosting for Survival Analysis ( http://arxiv.org/abs/2409.13363v1 ) ライセンス: Link先を確認	Alberto Archetti, Eugenio Lomurno, Diego Piccinotti, Matteo Matteucci,	(参考訳) 生存分析は、時間から時間までのデータを分析し、貴重な臨床的洞察を抽出するための重要なツールである。近年,ニューラルネットワークと決定木を利用した機械学習技術が数多く開発されている。これらのうち、最も成功したアプローチは、しばしばモデル化されたハザード関数の形状に関する特定の仮定に依存する。これらの仮定には、比例的ハザード、加速された障害時間、予め定義された時間点の集合での離散推定が含まれる。本研究では,個別のパラメトリック・ハザード・コントリビューションの重み付け和に基づくサバイバルモデル設計のための新しいパラダイムを提案する。我々は,付加的ハザード関数を適用し,生存率や累積ハザード関数に基づくアプローチを改善することにより,フィールドに新たなコントリビューションをもたらすために,よく知られたアンサンブル技術を構築した。さらに、我々はFPBoostと呼ぶモデルを提案し、勾配押し上げによる生存率を直接最適化する最初のアルゴリズムである。我々は、さまざまなデータセットをまたいだアプローチを評価し、さまざまな最先端モデルと比較した。その結果、FPBoostは、基準値と校正値の両方でリスク推定を改善することが示された。 Survival analysis is a critical tool for analyzing time-to-event data and extracting valuable clinical insights. Recently, numerous machine learning techniques leveraging neural networks and decision trees have been developed for this task. Among these, the most successful approaches often rely on specific assumptions about the shape of the modeled hazard function. These assumptions include proportional hazard, accelerated failure time, or discrete estimation at a predefined set of time points. In this study, we propose a novel paradigm for survival model design based on the weighted sum of individual fully parametric hazard contributions. We build upon well-known ensemble techniques to deliver a novel contribution to the field by applying additive hazard functions, improving over approaches based on survival or cumulative hazard functions. Furthermore, the proposed model, which we call FPBoost, is the first algorithm to directly optimize the survival likelihood via gradient boosting. We evaluated our approach across a diverse set of datasets, comparing it against a variety of state-of-the-art models. The results demonstrate that FPBoost improves risk estimation, according to both concordance and calibration metrics.	
翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
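The "weighted sum of individual fully parametric hazard contributions" mentioned in the FPBoost abstract above can be illustrated with a small sketch. This is not the FPBoost implementation: it assumes every contribution is a Weibull hazard, and the `heads` layout of (weight, shape, scale) triples is a hypothetical choice made only for illustration. The survival function uses the standard identity S(t) = exp(-H(t)), where H is the cumulative hazard.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k / lam) * (t / lam)**(k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_cum_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / lam)**k."""
    return (t / scale) ** shape


def hazard(t, heads):
    """Weighted sum of hazards; heads is a list of (weight, shape, scale)."""
    return sum(w * weibull_hazard(t, k, lam) for w, k, lam in heads)


def survival(t, heads):
    """S(t) = exp(-sum_j w_j * H_j(t)), which follows from the additive hazard."""
    total = sum(w * weibull_cum_hazard(t, k, lam) for w, k, lam in heads)
    return math.exp(-total)
```

Because hazards add while survival functions multiply, a non-negative weighted sum of valid hazards is again a valid hazard, which is one reason an additive hazard formulation is convenient for ensembles.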
# 可観測物の量子度生成における量子速度制限 Quantum Speed limit on the production of quantumness of observables ( http://arxiv.org/abs/2409.13365v1 ) ライセンス: Link先を確認	Divyansh Shrimali, Swapnil Bhowmick, Arun Kumar Pati,	(参考訳) 量子システムの古典的でない特徴は、環境や雑音にさらされると劣化することがある。量子系がノイズの存在下で古典的でない特徴を示すのに要する最低時間はどのくらいか? ここでは、2つの与えられた可観測体の可観測子のノルムとして観測可能の量子性に明確な速度制限を証明している。このような量子度測定の速度制限は、量子度の変化率の基本的な上限を設定し、与えられた量によってシステムの量子度を変更するのに必要な時間に対する下限を与える。さらに、量子系における重ね合わせの量をキャプチャする量子コヒーレンスのような古典的でない特徴に対して、速度制限を証明した。得られた速度制限は、興味のある物理過程に対して達成可能であることを実証したので、これらの境界は厳密であると見なすことができる。 Non-classical features of quantum systems can degrade when subjected to environment and noise. Here, we ask a fundamental question: What is the minimum amount of time it takes for a quantum system to exhibit non-classical features in the presence of noise? Here, we prove distinct speed limits on the quantumness of observable as the norm of the commutator of two given observables. The speed limit on such quantumness measures sets the fundamental upper bound on the rate of change of quantumness, which provides the lower bound on the time required to change the quantumness of a system by a given amount. Additionally, we have proved speed limit for the non-classical features such as quantum coherence that captures the amount of superposition in the quantum systems. We have demonstrated that obtained speed limits are attainable for physical processes of interest, and hence, these bounds can be considered to be tight.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
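The abstract above measures the quantumness of observables by the norm of their commutator. A minimal numerical sketch of that quantity follows; the Frobenius (Hilbert-Schmidt) norm is assumed here because the abstract does not say which norm is used, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def frobenius_norm(m):
    """Square root of the sum of squared absolute values of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in m for x in row))


# Pauli X and Z do not commute, so their commutator has a non-zero norm.
PAULI_X = [[0, 1], [1, 0]]
PAULI_Z = [[1, 0], [0, -1]]
quantumness_xz = frobenius_norm(commutator(PAULI_X, PAULI_Z))
```

Commuting observables give a zero commutator and therefore zero quantumness under this measure.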
# RingMo-Aerial:アフィン変換コントラスト学習を用いた空中リモートセンシング基礎モデル RingMo-Aerial: An Aerial Remote Sensing Foundation Model With A Affine Transformation Contrastive Learning ( http://arxiv.org/abs/2409.13366v1 ) ライセンス: Link先を確認	Wenhui Diao, Haichen Yu, Kaiyue Kang, Tong Ling, Di Liu, Yingchao Feng, Hanbo Bi, Libo Ren, Xuexue Li, Yongqiang Mao, Xian Sun,	(参考訳) 空中リモートセンシング(ARS)の視覚タスクは、視角の独特の特徴のために大きな課題を生んでいる。既存の研究は主に特定のタスクのアルゴリズムに焦点を当てており、幅広いARSビジョンアプリケーションに適用性に制限がある。本稿では,ARSビジョンの分野における基礎モデル研究のギャップを埋めることを目的としたRingMo-Aerialモデルを提案する。周波数強化型マルチヘッド・セルフアテンション(FE-MSA)機構とアフィン変換に基づくコントラスト学習事前学習手法を導入することにより、小型目標に対するモデルの検出能力を向上し、ARSの特徴となる傾いた視野角に最適化する。さらに,ARS-Adapterは,様々なARSビジョンタスクにおけるモデルの適応性と有効性を改善するために,効率的なパラメータ調整手法である。実験により、RingMo-Aerialは複数の下流タスクにおいてSOTA性能を達成することを示した。このことは、ARS視覚タスクの性能向上におけるRingMo-Aerialの実用性と有効性を示している。 Aerial Remote Sensing (ARS) vision tasks pose significant challenges due to the unique characteristics of their viewing angles. Existing research has primarily focused on algorithms for specific tasks, which have limited applicability in a broad range of ARS vision applications. This paper proposes the RingMo-Aerial model, aiming to fill the gap in foundation model research in the field of ARS vision. By introducing the Frequency-Enhanced Multi-Head Self-Attention (FE-MSA) mechanism and an affine transformation-based contrastive learning pre-training method, the model's detection capability for small targets is enhanced and optimized for the tilted viewing angles characteristic of ARS. Furthermore, the ARS-Adapter, an efficient parameter fine-tuning method, is proposed to improve the model's adaptability and effectiveness in various ARS vision tasks. Experimental results demonstrate that RingMo-Aerial achieves SOTA performance on multiple downstream tasks. This indicates the practicality and effectiveness of RingMo-Aerial in enhancing the performance of ARS vision tasks.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# ALPEC: 臨床における機械学習による覚醒検出のための総合的評価フレームワークとデータセット ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice ( http://arxiv.org/abs/2409.13367v1 ) ライセンス: Link先を確認	Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Philipp Walter, Gjergji Kasneci,	(参考訳) 睡眠障害の診断には睡眠中の覚醒剤の検出が不可欠である。しかし、臨床実践における機械学習(ML)の使用は、主に臨床プロトコルとMLメソッドのミスマッチによって、基本的な問題によって妨げられている。臨床医は通常、覚醒の開始のみに注釈を付けるが、MLメソッドは開始と終了の両方にアノテーションに依存する。また、覚醒検出モデルに対する臨床ニーズに合わせて標準化された評価手法は存在しない。本研究は, 覚醒剤の局所化と正確な事象数(ALPEC)を重視した新しい後処理・評価フレームワークを導入することで, これらの課題に対処する。我々は,ML実践者が,臨床実践と整合して覚醒的発症を検出することに注力することを推奨する。この変化が現在のトレーニングや評価方法に与える影響について検討し、単純化と課題に対処する。我々は、上記の臨床アノテーション制約を反映し、既存のポリソノグラフィーデータセットに存在しないモダリティを含む、新しい包括的ポリソノグラフィーデータセット(CPS)を利用する。本論文と並行してデータセットを公開し,マルチモーダルデータを利用した覚醒的オンセット検出の利点を実証する。本研究は,MLに基づく覚醒検出を臨床環境に統合し,技術進歩と臨床ニーズとのギャップを減らした。 Detecting arousals in sleep is essential for diagnosing sleep disorders. However, using Machine Learning (ML) in clinical practice is impeded by fundamental issues, primarily due to mismatches between clinical protocols and ML methods. Clinicians typically annotate only the onset of arousals, while ML methods rely on annotations for both the beginning and end. Additionally, there is no standardized evaluation methodology tailored to clinical needs for arousal detection models. This work addresses these issues by introducing a novel post-processing and evaluation framework emphasizing approximate localization and precise event count (ALPEC) of arousals. We recommend that ML practitioners focus on detecting arousal onsets, aligning with clinical practice. We examine the impact of this shift on current training and evaluation schemes, addressing simplifications and challenges. We utilize a novel comprehensive polysomnographic dataset (CPS) that reflects the aforementioned clinical annotation constraints and includes modalities not present in existing polysomnographic datasets. 
We release the dataset alongside this paper, demonstrating the benefits of leveraging multimodal data for arousal onset detection. Our findings significantly contribute to integrating ML-based arousal detection in clinical settings, reducing the gap between technological advancements and clinical needs.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# MCICSAM:Monte Carlo-Guided Interpolation Consistency Segment Anything Model for Semi-Supervised Prostate Zone Segmentation MCICSAM: Monte Carlo-guided Interpolation Consistency Segment Anything Model for Semi-Supervised Prostate Zone Segmentation ( http://arxiv.org/abs/2409.13371v1 ) ライセンス: Link先を確認	Guantian Huang, Beibei Li, Xiaobing Fan, Aritrick Chatterjee, Cheng Wei, Shouliang Qi, Wei Qian, Dianning He,	(参考訳) 前立腺内の様々な領域の正確なセグメンテーションは、前立腺関連疾患の診断と治療に重要である。しかし、特に前立腺画像のような特殊な医療分野におけるラベル付きデータの不足は、大きな課題となっている。 Segment Anything Model (SAM)は、自然画像分割のための新しい大きなモデルであるが、医療画像にはいくつかの課題がある。 SAMの強力な特徴抽出機能を活用し,医用画像アノテーションの低データボリューム問題に対処するために,低ランク適応(LoRA)と、モンテカルロ誘導補間整合性(MCIC)による半教師あり学習手法を用いて,微調整したSAMを強化する。半教師付き学習に基づく前立腺領域セグメンテーションに適用するためのモンテカルロ誘導補間一貫性セグメンテーションモデル(MCICSAM)を提案する。非ラベルデータセクションでは、MCICは入力データに対して2つの異なる補間変換を行い、モンテカルロの不確実性解析を出力に組み込む。これらの補間されたサンプルに課される一貫性の制約により、モデルがラベルのないデータの分布をよりよく適合させ、最終的に半教師付きシナリオのパフォーマンスを向上させることができる。 Dice と Hausdorff Distance at 95th percentile (HD95) を使ってモデル性能を検証する。 MCICSAMはDiceを79.38%、89.95%で、HD95値を3.12と2.27で改善している。同時に、MCICSAMは強い一般化性を示す。この手法は前立腺画像分割の分野で新たな可能性をもたらすことが期待されている。 Accurate segmentation of various regions within the prostate is pivotal for diagnosing and treating prostate-related diseases. However, the scarcity of labeled data, particularly in specialized medical fields like prostate imaging, poses a significant challenge. Segment Anything Model (SAM) is a new large model for natural image segmentation, but there are some challenges in medical imaging. In order to better utilize the powerful feature extraction capability of SAM as well as to address the problem of low data volume for medical image annotation, we use Low-Rank Adaptation (LoRA) and semi-supervised learning methods of Monte Carlo guided interpolation consistency (MCIC) to enhance the fine-tuned SAM. We propose Monte Carlo-guided Interpolation Consistency Segment Anything Model (MCICSAM) for application to semi-supervised learning based prostate region segmentation. 
In the unlabeled data section, MCIC performs two different interpolation transformations on the input data and incorporates Monte Carlo uncertainty analysis in the output, forcing the model to be consistent in its predictions. The consistency constraints imposed on these interpolated samples allow the model to fit the distribution of unlabeled data better, ultimately improving its performance in semi-supervised scenarios. We use Dice and Hausdorff Distance at 95th percentile (HD95) to validate model performance. MCICSAM yields Dice scores of 79.38% and 89.95%, along with improved HD95 values of 3.12 and 2.27, for transition zone and transition zone. At the same time, MCICSAM demonstrates strong generalizability. This method is expected to bring new possibilities in the field of prostate image segmentation.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
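The two evaluation metrics named in the MCICSAM abstract above, Dice and HD95, can be sketched in a few lines. This is a simplified illustration rather than the authors' evaluation code: masks are flat lists of 0/1 values, boundaries are given as lists of 2D points, the 95th percentile uses the nearest-rank convention, and the symmetric distance is taken as the larger of the two directed values. Published HD95 implementations differ on these details, so results may not match a particular toolkit exactly.

```python
import math


def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def _directed_percentile(src, dst, q):
    """q-th percentile (nearest rank) of nearest-neighbour distances src -> dst."""
    dists = sorted(min(math.dist(p, r) for r in dst) for p in src)
    rank = max(1, math.ceil(q * len(dists)))
    return dists[rank - 1]


def hd95(points_a, points_b):
    """Symmetric 95th-percentile Hausdorff distance between two point sets."""
    return max(_directed_percentile(points_a, points_b, 0.95),
               _directed_percentile(points_b, points_a, 0.95))
```

Dice rewards overlap and is higher when better, while HD95 measures boundary error in pixel or millimetre units and is lower when better, which is why the abstract reports both.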
# 非エルミートグライド時間対称性 Non-Hermitian glide-time symmetry ( http://arxiv.org/abs/2409.13372v1 ) ライセンス: Link先を確認	Li-Wei Wang, Jian-Hua Jiang,	(参考訳) 非エルミート系は、従来のエルミート系を超えて、例外点や複雑なスペクトルトポロジーのような興味深い概念や、非エルミート皮膚効果(NHSE)のようなエキゾチックな現象をもたらした。しかしながら、非エルミート系に関する以前の研究は主に固有状態の性質に焦点を当てており、非エルミート力学現象についてはより限定的な議論がなされている。ここでは、非エルミート物理学におけるパリティ時対称性の成功に触発され、グライド時反転(GT)対称性を持つ一次元非エルミート系を理論的に研究する。我々は、GT対称性が特異な物理的性質をもたらし、非エルミート系においてリッチな動的現象を可能にすることを発見した。注目すべきは、異なる動的位相にまたがる多様な挙動を示す動的NHSEを明らかにし、非エルミート力学の豊かさを解明することである。我々は、リッチな非エルミート力学現象を理解するための理論的枠組みを確立する。さらに、GT対称系のリッチな動的位相は、バルク内およびエッジ境界における力学の顕著なチューニングを可能にすることを示す。これらには、バルクにおける指向性波動伝播と増幅、エッジ境界における波のトラップと動的パターンが含まれる。理論的枠組みの発展と豊富な非エルミート力学相の研究の両方により、この研究は非エルミート力学の将来の研究の基盤となり、格子対称性の重要な役割に特に重点を置いている。 Non-Hermitian systems, going beyond conventional Hermitian systems, have brought in intriguing concepts such as exceptional points and complex spectral topology as well as exotic phenomena such as non-Hermitian skin effects (NHSEs). However, previous studies on non-Hermitian systems predominantly focus on the properties of eigenstates, with rather limited discussions on non-Hermitian dynamic phenomena. Here, inspired by the celebrated success of the parity-time symmetry in non-Hermitian physics, we theoretically study a one-dimensional non-Hermitian system with glide-time reversal (GT) symmetry. We discover that the GT symmetry leads to unique physical properties and enables rich dynamic phenomena in non-Hermitian systems. Remarkably, we reveal the dynamic NHSEs that exhibit diverse behaviors across distinct dynamic phases, elucidating the richness of non-Hermitian dynamics. We establish the theoretical frameworks for understanding the rich non-Hermitian dynamic phenomena. We further show that the rich dynamic phases in the GT-symmetric systems enable the remarkable tuning of the dynamics in the bulk as well as at the edge boundaries. 
These include the directional wave propagation and amplification in the bulk, as well as the wave trapping and the dynamic patterns at the edge boundaries. With both the development in the theoretical framework and the study of the rich non-Hermitian dynamic phases, this work serves as a stepping stone for future studies on non-Hermitian dynamics, with a special emphasis on the pivotal role of the lattice symmetry.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# LLMはまだ計画できない, LRMは可能か? OpenAIのo1のPlanBenchに関する予備的評価 LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench ( http://arxiv.org/abs/2409.13373v1 ) ライセンス: Link先を確認	Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati,	(参考訳) 望ましい状況を達成するための行動コースを計画する能力は、長年、知的エージェントのコアコンピテンスと考えられてきた。大規模言語モデル(LLM)の出現により、そのような計画能力を持っているかどうかという問題にかなりの関心が寄せられている。 GPT3のリリース直後の2022年に開発した拡張可能なベンチマークであるPlanBenchは、LLMの計画能力を評価する上で重要なツールであり続けている。 GPT3以来、新しいプライベートおよびオープンソース LLM が多数存在するが、このベンチマークの進捗は驚くほど遅かった。 OpenAIによると、最近のo1(Strawberry)モデルは、自己回帰型LLMの通常の制限から逃れるために特別に構築され、訓練されている。この開発を触媒として利用し、現在のLLMと新しいLRMがPlanBenchにどの程度優れているかを包括的に検討する。ご覧の通り、o1のパフォーマンスはベンチマークの量子的改善であり、競争を上回りますが、それでも飽和には程遠いです。この改善は、そのようなシステムをデプロイする前に考慮すべき正確性、効率、保証に関する問題にもつながる。 The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities. PlanBench, an extensible benchmark we developed in 2022, soon after the release of GPT3, has remained an important tool for evaluating the planning abilities of LLMs. Despite the slew of new private and open source LLMs since GPT3, progress on this benchmark has been surprisingly slow. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs--making it a new kind of model: a Large Reasoning Model (LRM). Using this development as a catalyst, this paper takes a comprehensive look at how well current LLMs and new LRMs do on PlanBench. As we shall see, while o1's performance is a quantum improvement on the benchmark, outpacing the competition, it is still far from saturating it. 
This improvement also brings to the fore questions about accuracy, efficiency, and guarantees which must be considered before deploying such systems.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# ポスト選択1ショット対称量子状態の識別とアクセプタンスにおける性能指標としての誤差最小化測定 Error-Minimizing Measurements in Postselected One-Shot Symmetric Quantum State Discrimination and Acceptance as a Performance Metric ( http://arxiv.org/abs/2409.13379v1 ) ライセンス: Link先を確認	Saurabh Kumar Gupta, Abhishek K. Gupta,	(参考訳) 量子状態を用いた仮説テストでは、2つの可能な状態の1つを含むブラックボックスが与えられると、仮説の1つを優先して測定を行う。ポストセレクトされた仮説テストでは、仮説のいずれかを選択しない3番目の結果が追加される。ポストセレクトされたシナリオでは、最小誤差の1ショット対称仮説テストは、選択された結果の1つが生じるという事実に基づいて条件付けられた文献によって特徴づけられる。この方向にさらに進み、最小限の誤差につながるあらゆる可能な測定値のセットを与えます。パラメトリック形式で任意の誤差最小化測定を行った。どの仮説も選択しないことは、テストの品質を損なうことに注意してください。さらに、これらの測定値が品質によって異なることを示す例を挙げる。ポストセレクトされた仮説テストの品質について議論する必要がある。そこで, 提案手法は, 任意の誤差最小化測定値に対する受理の表現をパラメータとして定義することで, ポストセレクト仮説検定の質を特徴付ける。最小誤差を達成できる測定値のセットについて、受け入れを最大化し、それを達成した例を与えられたので、受け入れの観点で可能な最良の測定値の例を与える。 In hypothesis testing with quantum states, given a black box containing one of the two possible states, measurement is performed to detect in favor of one of the hypotheses. In postselected hypothesis testing, a third outcome is added, corresponding to not selecting any of the hypotheses. In postselected scenario, minimum error one-shot symmetric hypothesis testing is characterized in literature conditioned on the fact that one of the selected outcomes occur. We proceed further in this direction to give the set of all possible measurements that lead to the minimum error. We have given an arbitrary error-minimizing measurement in a parametric form. Note that not selecting any of the hypotheses decimates the quality of testing. We further give an example to show that these measurements vary in quality. There is a need to discuss the quality of postselected hypothesis testing. We then characterize the quality of postselected hypothesis testing by defining a new metric acceptance and give expression of acceptance for an arbitrary error-minimizing measurement in terms of some parameters of the measurement. 
On the set of measurements that achieve minimum error, we have maximized the acceptance, and given an example which achieves that, thus giving an example of the best possible measurement in terms of acceptance.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# 音声合成におけるロバスト協調透かしのための音声コーデック強化 Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis ( http://arxiv.org/abs/2409.13382v1 ) ライセンス: Link先を確認	Lauri Juvela, Xin Wang,	(参考訳) 合成音声の自動検出がますます重要になっているのは、現在の合成法がヒトの音声とほぼ区別がつかず、一般に広くアクセス可能であるためである。音声透かしやその他のアクティブな開示手法は、受動的検出に基づいて従来のディープフェイク防御を補完できるため、研究活動を惹きつけている。アクティブな検出と受動的検出の両方において、堅牢性は大きな関心事である。従来のオーディオ透かしは、特にオーディオコーデックアプリケーションによる攻撃を受けやすい。野生に放出されるほとんどの音声および音声コンテンツは、純粋に分配方法としてオーディオコーデックを通り抜ける。我々は最近,雑音に富むが識別可能な伝送路上で生成した音声をより容易に検出する手法として,協調的な透かしを提案する。本稿では,従来の音声コーデックやニューラルオーディオコーデックと併用するためにチャネル拡張を拡張し,様々な構成に対するコーデックビットレートの転送性および効果を評価する。その結果、勾配近似のための波形領域ストレートスルー推定器を用いて、ブラックボックスオーディオコーデックによって協調的な透かしを確実に拡張できることが示唆された。さらに,この結果から,ニューラルオーディオコーデックによるチャネル拡張は従来のコーデックによく寄与することが示された。リスニングテストでは、8kbpsの高ビットレートコーデックやDACで、協調的な透かしは知覚上の劣化を無視できることを示した。 Automatic detection of synthetic speech is becoming increasingly important as current synthesis methods are both nearly indistinguishable from human speech and widely accessible to the public. Audio watermarking and other active disclosure methods are attracting research activity, as they can complement traditional deepfake defenses based on passive detection. In both active and passive detection, robustness is of major interest. Traditional audio watermarks are particularly susceptible to removal attacks by audio codec application. Most generated speech and audio content released into the wild passes through an audio codec purely as a distribution method. We recently proposed collaborative watermarking as a method for making generated speech more easily detectable over a noisy but differentiable transmission channel. This paper extends the channel augmentation to work with non-differentiable traditional audio codecs and neural audio codecs and evaluates transferability and effect of codec bitrate over various configurations. 
The results show that collaborative watermarking can be reliably augmented by black-box audio codecs using a waveform-domain straight-through estimator for gradient approximation. Furthermore, the results show that channel augmentation with a neural audio codec transfers well to traditional codecs. Listening tests demonstrate that collaborative watermarking incurs negligible perceptual degradation with high-bitrate codecs or DAC at 8 kbps.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
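As a rough illustration of the straight-through estimator mentioned in the abstract above, the following sketch treats a non-differentiable codec as a black box in the forward pass and passes the gradient through unchanged in the backward pass. The uniform quantizer `toy_codec` is a hypothetical stand-in for a real audio codec, and the squared-error loss and all function names are assumptions made for illustration, not the authors' implementation.

```python
def toy_codec(x, step=0.25):
    """Hypothetical stand-in for a black-box codec: uniform quantization."""
    return [round(v / step) * step for v in x]


def ste_forward_backward(x, target):
    """Forward through the codec, backward with a straight-through gradient.

    The loss is sum((y - target)^2) with y = toy_codec(x). The quantizer has
    zero gradient almost everywhere, so the straight-through estimator treats
    dy/dx as the identity and reuses dL/dy as the gradient with respect to x.
    """
    y = toy_codec(x)
    loss = sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    grad_y = [2.0 * (yi - ti) for yi, ti in zip(y, target)]
    grad_x = grad_y  # straight-through: identity Jacobian assumed
    return y, loss, grad_x
```

In an automatic-differentiation framework the same effect is commonly obtained by writing the output as `x + stop_gradient(codec(x) - x)`, so that the forward value equals the codec output while the gradient flows as if the codec were the identity.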
# 弱測定方式における量子メモリによるエンプティ波の試行 A Test of Empty Wave via Quantum Memory in a Weak Measurement Scheme ( http://arxiv.org/abs/2409.13383v1 ) ライセンス: Link先を確認	Jian-Peng Dou, Feng Lu, Hao Tang, Xiao-Wen Shang, Xian-Min Jin,	(参考訳) 量子力学では、長年の疑問が残る: 単一の光子が二重スリットをどう横切るのか? 1つの直感的な図は、光子は1つのスリットのみを通過し、その波動関数は「空」波と「フル」波に分裂することを示している。しかし、この空の波の現実はまだ確認されていない。本稿では、量子メモリと弱い測定を組み合わせた新しい実験構成を提案し、空波の性質について検討する。単一の原子励起は、二重スリット実験において2つの経路に類似した自由空間と量子メモリの間に確率的に分割される。量子メモリは、量子状態が崩壊することなく、保存されたスピン波の存在により単一光子ラマン散乱が増強される経路検出器として機能する。この拡張は古典的な情報として記録され、量子メモリに格納されたスピン波は2回検索され、干渉可視性は79%である。選択後において弱い値が検出される従来の弱い測定方式とは異なり、干渉が起こる前に弱い値を古典的な情報に変換する。量子メモリは,部分的な情報を抽出しながらコヒーレンスを保ち,量子計測に新たな洞察を与える計測装置としての可能性を示す。 In quantum mechanics, a long-standing question remains: How does a single photon traverse double slits? One intuitive picture suggests that the photon passes through only one slit, while its wavefunction splits into an ``empty" wave and a ``full" wave. However, the reality of this empty wave is yet to be verified. Here, we present a novel experimental configuration that combines quantum memory and weak measurement to investigate the nature of the empty wave. A single atomic excitation is probabilistically split between free space and a quantum memory, analogous to the two paths in a double-slit experiment. The quantum memory serves as a path detector, where single-photon Raman scattering is enhanced due to the presence of a stored spin wave, without collapsing the quantum state. This enhancement is recorded as classical information, and the spin wave stored in the quantum memory is retrieved twice, with an interference visibility of 79%. Unlike conventional weak measurement schemes, where weak values are detected during post-selection, our approach converts the weak value into classical information before interference takes place. 
Our results demonstrate the potential of quantum memory as a measurement device that preserves coherence while extracting partial information, offering new insights into quantum measurement.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# 不確実環境におけるロバスト信号制御のためのスケーラブル多目的最適化 Scalable Multi-Objective Optimization for Robust Traffic Signal Control in Uncertain Environments ( http://arxiv.org/abs/2409.13388v1 ) ライセンス: Link先を確認	Weian Guo, Wuzhao Li, Zhiou Zhang, Lun Zhang, Li Li, Dongyang Li,	(参考訳) 知的交通信号制御は、経済効率、環境持続可能性、日常生活の質に重要な影響を及ぼす現代都市経営にとって不可欠である。しかし、この数十年間、大規模な交通ネットワークの管理、交差点の調整、不確実な交通条件下での堅牢性確保において、大きな課題が続いている。本稿では,動的かつ不確実な都市環境におけるロバストな交通信号制御のための,スケーラブルな多目的最適化手法を提案する。本稿では,確率変数と確率的トラフィックパターンを組み込んだ多目的最適化モデルを提案する。本稿では,適応ハイブリッド多目的最適化アルゴリズム (Adaptive Hybrid Multi-Objective Optimization Algorithm, AHMOA) を提案する。 AHMOAは、予測不可能なトラフィックの変化に対応しつつ、平均遅延、ネットワーク安定性、システムの堅牢性など、複数の目的を同時に最適化する。このアルゴリズムは、進化的戦略と、探索と搾取のバランスをとるための適応的なメカニズムを結合し、履歴トラフィックデータを活用するためのメモリベースの評価メカニズムを組み込む。シミュレーションはマンハッタン、パリ、サンパウロ、イスタンブールなど様々な都市で行われている。実験の結果、AHMOAは最先端のアルゴリズムを一貫して上回り、不確実な環境下で複雑な交通システムを管理するためのスケーラブルで堅牢なPareto最適ソリューションを提供する能力があることが示された。 Intelligent traffic signal control is essential to modern urban management, with important impacts on economic efficiency, environmental sustainability, and quality of daily life. However, in recent decades, it has continued to pose significant challenges in managing large-scale traffic networks, coordinating intersections, and ensuring robustness under uncertain traffic conditions. This paper presents a scalable multi-objective optimization approach for robust traffic signal control in dynamic and uncertain urban environments. A multi-objective optimization model is proposed in this paper, which incorporates stochastic variables and probabilistic traffic patterns to capture traffic flow dynamics and uncertainty. We propose an algorithm named Adaptive Hybrid Multi-Objective Optimization Algorithm (AHMOA), which addresses the uncertainties of city traffic, including network-wide signal coordination, fluctuating patterns, and environmental impacts. 
AHMOA simultaneously optimizes multiple objectives, such as average delay, network stability, and system robustness, while adapting to unpredictable changes in traffic. The algorithm combines evolutionary strategies with an adaptive mechanism to balance exploration and exploitation, and incorporates a memory-based evaluation mechanism to leverage historical traffic data. Simulations are conducted in different cities including Manhattan, Paris, Sao Paulo, and Istanbul. The experimental results demonstrate that AHMOA consistently outperforms several state-of-the-art algorithms and can provide scalable, robust Pareto-optimal solutions for managing complex traffic systems in uncertain environments.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# 2次元・3次元の1次構造テンソルスケール空間 Feature-Centered First Order Structure Tensor Scale-Space in 2D and 3D ( http://arxiv.org/abs/2409.13389v1 ) ライセンス: Link先を確認	Pawel Tomasz Pieta, Anders Bjorholm Dahl, Jeppe Revall Frisvad, Siavash Arjomand Bigdeli, Anders Nymark Christensen,	(参考訳) 構造テンソル法は画像構造の2次元および3次元解析によく用いられるが、その結果は多くの場合、ユーザのメソッドパラメータの選択に非常に依存している。微分フィルタの幅を画像特徴量に直結させることにより, 1次構造テンソルスケール空間におけるパラメータ選択を単純化する。リングフィルタのステップを導入することで、ガウス積分/平滑化を特徴端から中心へより正確に微分フィルタ応答をシフトさせる手法に置き換える。さらに、抽出された構造的測度を用いて、スケールマップの既知の不正確さを補正し、2Dと3Dの両方の特徴量を信頼性良く表現できることを示す。従来の1次構造テンソルやそれ以前の構造テンソルスケール空間のアプローチと比較して、我々の解ははるかに正確であり、最小限のユーザ入力で幅広い構造パラメータを抽出するアウト・オブ・ザ・ボックス法として機能する。 The structure tensor method is often used for 2D and 3D analysis of imaged structures, but its results are in many cases very dependent on the user's choice of method parameters. We simplify this parameter choice in first order structure tensor scale-space by directly connecting the width of the derivative filter to the size of image features. By introducing a ring-filter step, we substitute the Gaussian integration/smoothing with a method that more accurately shifts the derivative filter response from feature edges to their center. We further demonstrate how extracted structural measures can be used to correct known inaccuracies in the scale map, resulting in a reliable representation of the feature sizes both in 2D and 3D. Compared to the traditional first order structure tensor, or previous structure tensor scale-space approaches, our solution is much more accurate and can serve as an out-of-the-box method for extracting a wide range of structural parameters with minimal user input.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# 機械学習型原子間ポテンシャルのベンチマークとしての圧力下の水素 Hydrogen under Pressure as a Benchmark for Machine-Learning Interatomic Potentials ( http://arxiv.org/abs/2409.13390v1 ) ライセンス: Link先を確認	Thomas Bischoff, Bastian Jäckl, Matthias Rupp,	(参考訳) 機械学習原子間ポテンシャル(MLPs)は、原子論系のポテンシャルエネルギー表面の高速でデータ駆動の代理モデルであり、アブ初期分子動力学(MD)シミュレーションを数桁の規模で加速することができる。 MLPの性能は、トレーニングで使われていないデータに対するエネルギーと力の予測誤差として一般的に測定される。テストセット上での予測誤差は低いが、MDシミュレーションでは良い性能が保証されない。後者は、加速シミュレーションの実行から得られる物理的動機付けされた性能測定を必要とする。しかし、そのような措置の採用は、それらを計算し解釈するのに必要な努力とドメイン知識によって制限されている。この制限を克服するため,圧力下での水素中の液体-液体相転移のMDシミュレーションにおいて,MDシミュレーションにおいてMDPの性能を自動的に評価するベンチマークシステムを提案する。ベンチマークのh-llpt-24データセットは、異なる温度と質量密度でのMDシミュレーションによる参照測地、エネルギー、力、ストレスを提供する。ベンチマークのPythonコードは、MDシミュレーションを自動で実行し、圧力、安定な分子分数、拡散係数、放射分布関数を定量的に比較、視覚化する。このベンチマークを用いて, 液体-液相転移を再現できない状態のMLPがいくつか存在することを示す。 Machine-learning interatomic potentials (MLPs) are fast, data-driven surrogate models of atomistic systems' potential energy surfaces that can accelerate ab-initio molecular dynamics (MD) simulations by several orders of magnitude. The performance of MLPs is commonly measured as the prediction error in energies and forces on data not used in their training. While low prediction errors on a test set are necessary, they do not guarantee good performance in MD simulations. The latter requires physically motivated performance measures obtained from running accelerated simulations. However, the adoption of such measures has been limited by the effort and domain knowledge required to calculate and interpret them. To overcome this limitation, we present a benchmark that automatically quantifies the performance of MLPs in MD simulations of a liquid-liquid phase transition in hydrogen under pressure, a challenging benchmark system. The benchmark's h-llpt-24 dataset provides reference geometries, energies, forces, and stresses from density functional theory MD simulations at different temperatures and mass densities. 
The benchmark's Python code automatically runs MLP-accelerated MD simulations and calculates, quantitatively compares and visualizes pressures, stable molecular fractions, diffusion coefficients, and radial distribution functions. Employing this benchmark, we show that several state-of-the-art MLPs fail to reproduce the liquid-liquid phase transition.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# Elite-EvGS: イベント・ツー・ビデオ優先の蒸留によるイベントベース3次元ガウス分割学習 Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors ( http://arxiv.org/abs/2409.13392v1 ) ライセンス: Link先を確認	Zixin Zhang, Kanghao Chen, Lin Wang,	(参考訳) イベントカメラは、固定フレームではなく、非同期でスパースなイベントストリームを出力するバイオインスパイアされたセンサーである。高ダイナミックレンジや高時間分解能などの異なる利点から、ロボットマッピングにおいて重要な3D再構成にイベントカメラが応用されている。近年, 3次元ガウススプラッティング(3DGS)などのニューラルレンダリング技術は, 3次元再構成に成功している。しかし、効果的なイベントベースの3DGSパイプラインの開発方法はまだ解明されていない。特に、3DGSは、通常、高品質な初期化と密集した多視点制約に依存しているため、その固有のスパース性から、3DGS最適化に潜在的な問題が現れる。そこで我々は,イベントベースの新しい3DGSフレームワークElite-EvGSを提案する。我々のキーとなる考え方は、既成のイベント・ツー・ビデオ(E2V)モデルから事前知識を抽出し、粗い最適化方法でイベントから3Dシーンを効果的に再構築することである。具体的には、イベントからの3DGS初期化の複雑さに対処するため、E2Vモデルによって生成されたフレームから粗い3DGSを最適化し、イベントを組み込んで詳細を洗練するウォームアップ初期化戦略を導入する。そこで本稿では,ウィンドウスライシングによるイベント監視を段階的に削減する,プログレッシブなイベント監視戦略を提案する。これにより、イベントフレームの時間的ランダム性が微妙に向上し、局所的なテクスチャとグローバルな構造の詳細の最適化に寄与する。ベンチマークデータセットの実験では、Elite-EvGSがより優れたテクスチャと構造の詳細で3Dシーンを再構築できることが示されている。一方,本手法は,高速な動きや低照度シーンなどの多様な課題を含む実世界のデータに対して,高い性能が得られる。 Event cameras are bio-inspired sensors that output asynchronous and sparse event streams, instead of fixed frames. Benefiting from their distinct advantages, such as high dynamic range and high temporal resolution, event cameras have been applied to address 3D reconstruction, important for robotic mapping. Recently, neural rendering techniques, such as 3D Gaussian splatting (3DGS), have been shown successful in 3D reconstruction. However, it still remains under-explored how to develop an effective event-based 3DGS pipeline. In particular, as 3DGS typically depends on high-quality initialization and dense multiview constraints, a potential problem appears for the 3DGS optimization with events given its inherent sparse property. To this end, we propose a novel event-based 3DGS framework, named Elite-EvGS. 
Our key idea is to distill the prior knowledge from off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events in a coarse-to-fine optimization manner. Specifically, to address the complexity of 3DGS initialization from events, we introduce a novel warm-up initialization strategy that optimizes a coarse 3DGS from the frames generated by E2V models and then incorporates events to refine the details. Then, we propose a progressive event supervision strategy that employs the window-slicing operation to progressively reduce the number of events used for supervision. This subtly relieves the temporal randomness of the event frames, benefiting the optimization of local textural and global structural details. Experiments on the benchmark datasets demonstrate that Elite-EvGS can reconstruct 3D scenes with better textural and structural details. Meanwhile, our method yields plausible performance on the captured real-world data, including diverse challenging conditions, such as fast motion and low light scenes.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# PointSAM:リモートセンシング画像のためのポイントアップセグメンテーションモデル PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images ( http://arxiv.org/abs/2409.13401v1 ) ライセンス: Link先を確認	Nanqing Liu, Xun Xu, Yongyi Su, Haojie Zhang, Heng-Chao Li,	(参考訳) Segment Anything Model (SAM)は画像分割のための高度な基礎モデルであり、リモートセンシング画像(RSI)に広く応用されている。 RSIと自然画像のドメインギャップのため、従来の方法では、ソーストレーニング済みのモデルとしてSAMを使用し、完全に教師付きマスクで微調整する。これらの手法とは異なり、我々の研究はより便利で挑戦的なポイントアノテーションを使ってSAMを微調整することに焦点を当てている。 SAMのゼロショット機能を活用して、トレーニング用に擬似ラベルを反復的に生成する自己学習フレームワークを採用する。しかし、擬似ラベルがノイズラベルを含む場合、エラーの蓄積のリスクがある。この問題に対処するため、ターゲットデータセットからターゲットプロトタイプを抽出し、ハンガリーのアルゴリズムを用いて予測プロトタイプとマッチングし、モデルが間違った方向に学習するのを防ぐ。さらに、複雑な背景とRSI内のオブジェクトの密分布のため、ポイントプロンプトを使用すると、複数のオブジェクトが1つとして認識される。この問題を解決するために,インスタンスマスクの非重複性に基づく負のプロンプトキャリブレーション手法を提案する。簡単に言えば、重なり合うマスクのプロンプトを対応する負の信号として使い、洗練されたマスクを生み出す。本稿では,これらの手法を組み合わせることで,ポイントSAMという新しいセグメンテーションモデルを提案する。我々は, WHU, HRSID, NWPU VHR-10を含むRSIデータセットを用いて実験を行い, SAM, SAM2, および他の比較手法による直接試験よりも優れた結果を得た。さらに,PointSAMをポイント・ツー・ボックス・コンバータとして導入し,提案手法を他のポイント・教師付きタスクに拡張できることを示す。コードはhttps://github.com/Lans1ng/PointSAMで公開されている。 Segment Anything Model (SAM) is an advanced foundational model for image segmentation, widely applied to remote sensing images (RSIs). Due to the domain gap between RSIs and natural images, traditional methods typically use SAM as a source pre-trained model and fine-tune it with fully supervised masks. Unlike these methods, our work focuses on fine-tuning SAM using more convenient and challenging point annotations. Leveraging SAM's zero-shot capabilities, we adopt a self-training framework that iteratively generates pseudo-labels for training. However, if the pseudo-labels contain noisy labels, there is a risk of error accumulation. To address this issue, we extract target prototypes from the target dataset and use the Hungarian algorithm to match them with prediction prototypes, preventing the model from learning in the wrong direction. 
Additionally, due to the complex backgrounds and dense distribution of objects in RSI, using point prompts may result in multiple objects being recognized as one. To solve this problem, we propose a negative prompt calibration method based on the non-overlapping nature of instance masks. In brief, we use the prompts of overlapping masks as corresponding negative signals, resulting in refined masks. Combining the above methods, we propose a novel Pointly-supervised Segment Anything Model named PointSAM. We conduct experiments on RSI datasets, including WHU, HRSID, and NWPU VHR-10, and the results show that our method significantly outperforms direct testing with SAM, SAM2, and other comparison methods. Furthermore, we introduce PointSAM as a point-to-box converter and achieve encouraging results, suggesting that this method can be extended to other point-supervised tasks. The code is available at https://github.com/Lans1ng/PointSAM.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
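The prototype matching step described in the abstract above can be sketched as an assignment problem. The sketch below is not the authors' code: it uses cosine distance as an assumed cost, and a brute-force search over permutations as a plainly swapped-in replacement for the Hungarian algorithm. For small inputs it returns the same optimal assignment, but it scales factorially. All names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def match_prototypes(predicted, targets):
    """Return (assignment, cost); assignment[i] is the target index for predicted[i].

    Brute force over all permutations; assumes len(predicted) == len(targets).
    """
    n = len(predicted)
    cost = [[cosine_distance(p, t) for t in targets] for p in predicted]
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(best), sum(cost[i][best[i]] for i in range(n))
```

For larger prototype sets, `scipy.optimize.linear_sum_assignment` solves the same minimum-cost assignment in polynomial time given the cost matrix.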
# マルチモーダルディープラーニングカメラライダー校正モデルの検証と探索 Validation & Exploration of Multimodal Deep-Learning Camera-Lidar Calibration models ( http://arxiv.org/abs/2409.13402v1 ) ライセンス: Link先を確認	Venkat Karramreddy, Liam Mitchell,	(参考訳) 本稿では,マルチモーダルセンサシステムの校正のためのディープラーニングアーキテクチャの探索,評価,実装における革新的な研究について述べる。その背景にあるのは、センサー融合を利用して、3D LiDARと2Dカメラのダイナミックでリアルタイムなアライメントを実現することだ。静的キャリブレーション法は退屈で時間を要するため,この問題を解決するために,畳み込みニューラルネットワーク(CNN)と幾何学的に情報を得た学習を組み合わせることを提案する。我々は、RegNet、CalibNet、LCCNetなどのExtrinsic LiDAR-Camera Calibrationツールの基本原則を活用し、オンラインで利用可能なオープンソースモデルを探索し、その結果を対応する研究論文と比較する。これらの視覚的および測定可能なアウトプットを抽出するために必要な要件は、ソースコードの微調整、トレーニング、バリデーション、テストの各フレームワークを等しく比較することであった。この手法は,どの先進的ネットワークが最も正確かつ一貫した予測を生成するかを調べることを目的としている。一連の実験を通じて、その過程での潜在的な改善の欠点と領域を明らかにします。 LCCNetは、検証したすべてのモデルの中で、最高の結果をもたらすことが分かりました。 This article presents an innovative study in exploring, evaluating, and implementing deep learning architectures for the calibration of multi-modal sensor systems. The focus behind this is to leverage the use of sensor fusion to achieve dynamic, real-time alignment between 3D LiDAR and 2D Camera sensors. Static calibration methods are tedious and time-consuming, which is why we propose utilizing Convolutional Neural Networks (CNNs) coupled with geometrically informed learning to solve this issue. We leverage the foundational principles of Extrinsic LiDAR-Camera Calibration tools such as RegNet, CalibNet, and LCCNet by exploring open-source models that are available online and comparing our results with their corresponding research papers. Requirements for extracting these visual and measurable outputs involved tweaking source code, fine-tuning, training, validation, and testing for each of these frameworks for equal comparisons. This approach aims to investigate which of these advanced networks produces the most accurate and consistent predictions. Through a series of experiments, we reveal some of their shortcomings and areas for potential improvements along the way. 
We find that LCCNet yields the best results out of all the models that we validated.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# クレジットカードの不正検出:ディープラーニングアプローチ Credit Card Fraud Detection: A Deep Learning Approach ( http://arxiv.org/abs/2409.13406v1 ) ライセンス: Link先を確認	Sourav Verma, Joydip Dhar,	(参考訳) クレジットカードは、最近の電子取引におけるオンラインとオフラインの両方の支払いモードにおいて、最も広範なインストール方法の1つである。クレジットカードの発明は電子取引をかなり楽にした。しかし、犯罪に対する新たな詐欺の機会も提供し、詐欺率の上昇につながった。不正なクレジットカード取引により、多くの機関や個人によって実質的な金額が失われている。したがって、改善された動的不正認識フレームワークを適応させることは、すべてのクレジットカード流通銀行が損失を軽減するために必須となった。実際、不正なクレジットカード取引の問題は、コンセプトドリフト(concept drift)、クラス不均衡(class imbalance)、検証レイテンシ(verification latency)といった、関連するリアルタイムの課題に関係している。しかし、現在のシステムの大部分は人工知能(AI)、ファジィ論理、機械学習、データマイニング、遺伝的アルゴリズムなどに基づいており、詐欺検出システム(FDS)のすべての課題にほとんど対処しない仮定に依存している。本稿では,偽陽性率が非常に低い不正カバレッジを得るために,Deep Learningアルゴリズムを理解し,実装することを目的とする。また、一般的なパターンを学習するための教師なし(半教師なし)手法として自動エンコーダを実装することを目的とする。キーワード:クレジットカード詐欺、不正検出システム(FDS)、電子取引、コンセプトドリフト、クラス不均衡、検証レイテンシ、機械学習、ディープラーニング Credit card is one of the most extensive methods of instalment for both online and offline mode of payment for electronic transactions in recent times. The invention of credit cards has provided significant ease in electronic transactions. However, it has also provided new fraud opportunities for criminals, which results in increased fraud rates. Substantial amounts of money have been lost by many institutions and individuals due to fraudulent credit card transactions. Adopting improved and dynamic fraud recognition frameworks has thus become essential for all credit card distributing banks to mitigate their losses. In fact, the problem of fraudulent credit card transactions implicates a number of relevant real-time challenges, namely: Concept drift, Class imbalance, and Verification latency. However, the vast majority of current systems, which are based on artificial intelligence (AI), Fuzzy logic, Machine Learning, Data mining, Genetic Algorithms, and so on, rely on assumptions that hardly address all the relevant challenges of a fraud-detection system (FDS). 
This paper aims to understand & implement Deep Learning algorithms in order to obtain a high fraud coverage with very low false positive rate. Also, it aims to implement an auto-encoder as an unsupervised (semi-supervised) method of learning common patterns. Keywords: Credit card fraud, Fraud-detection system (FDS), Electronic transactions, Concept drift, Class imbalance, Verification latency, Machine Learning, Deep Learning	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# 非定常コストを考慮したマルチエフェクタ時空間計画のコントラスト説明に関するユーザスタディ A User Study on Contrastive Explanations for Multi-Effector Temporal Planning with Non-Stationary Costs ( http://arxiv.org/abs/2409.13427v1 ) ライセンス: Link先を確認	Xiaowei Liu, Kevin McAreavey, Weiru Liu,	(参考訳) 本稿では,スマートホームの時間的計画のためのエンドユーザーアプリケーションとして,コントラスト説明を採用する。本アプリケーションでは、アプライアンスタスクの実行の要件、動的エネルギー関税によるエネルギーの支払い、高容量バッテリーストレージへのアクセス、電力をグリッドに販売することができる。装置の同時スケジューリングは、これをマルチエフェクタ計画の問題とし、動的関税は、非定常的なコスト(または、定常だが外因性事象に依存するコスト)をもたらす。これらの特徴は、一般に既存のPDDLベースのプランナーではプランニング問題がサポートされないため、適切なアプライアンス数や時間的地平線にスケールする独自のドメイン依存プランナーを設計する。我々は,2つのユーザストーリーに基づいて,オンラインクラウドソーシングプラットフォームを用いた128人の参加者を対象に,コントロールされたユーザスタディを実施している。比較質問や説明を提示したユーザは,満足度が高く,理解度が向上する傾向があり,これらの機能にアクセスできないユーザに比べて,推奨されるAIスケジュールに好適に適合する可能性が示唆された。 In this paper, we adopt contrastive explanations within an end-user application for temporal planning of smart homes. In this application, users have requirements on the execution of appliance tasks, pay for energy according to dynamic energy tariffs, have access to high-capacity battery storage, and are able to sell energy to the grid. The concurrent scheduling of devices makes this a multi-effector planning problem, while the dynamic tariffs yield costs that are non-stationary (alternatively, costs that are stationary but depend on exogenous events). These characteristics are such that the planning problems are generally not supported by existing PDDL-based planners, so we instead design a custom domain-dependent planner that scales to reasonable appliance numbers and time horizons. We conduct a controlled user study with 128 participants using an online crowd-sourcing platform based on two user stories. Our results indicate that users provided with contrastive questions and explanations have higher levels of satisfaction, tend to gain improved understanding, and rate the helpfulness more favourably with the recommended AI schedule compared to those without access to these features.	翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
# 大規模マルチモーダルモデルによる指導誘導多粒度セグメントとキャプション Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model ( http://arxiv.org/abs/2409.13407v1 ) ライセンス: Link先を確認	Li Zhou, Xu Yuan, Zenghui Sun, Zikun Zhou, Jingsong Lan,	(参考訳) 大規模マルチモーダルモデル(LMM)は、大規模言語モデルを拡張することで大きな進歩を遂げた。この進歩を踏まえ、LMMの最新の開発は、セグメンテーションモデルの統合による高密度ピクセルワイドセグメンテーションを生成する能力を示しているが、既存の作品のテキスト応答とセグメンテーションマスクはインスタンスレベルに留まり、細部まで細部まで理解とセグメンテーションを行う能力に制限がある。この制限を克服するために、セグメンテーションとキャプション(SegCap)の粒度をユーザ指示に従ってシームレスに調整できるMGLMM(Multi-Granularity Large Multimodal Model)を導入する。このようなタスクをMGSC(Multi-Granularity Segmentation and Captioning)と呼ぶ。 MGSCタスク上でのモデルトレーニングと評価のためのベンチマークが欠如しているのを見て、カスタマイズされた自動アノテーションパイプラインを使用して、複数の粒度のマスクとキャプションを並べたベンチマークを構築した。このベンチマークは、10Kイメージと30K以上の画像・質問ペアで構成されている。我々は、さらなる研究のための自動データセットアノテーションパイプラインの実装とともにデータセットをリリースし、また、異種セグメンテーションデータセットを統一する新しいSegCapデータフォーマットを提案し、マルチタスクトレーニング中にオブジェクトの概念と視覚的特徴を効果的に関連付けることを支援します。大規模な実験により,MGLMMは8つの下流タスクに精通し,MGSC,GCG,画像キャプション,セグメンテーションの参照,複数と空のセグメンテーションタスク,推論セグメンテーションタスクの最先端性能を実現していることがわかった。 MGLMMの優れた性能と汎用性は、マルチモーダル研究の進展にその潜在的影響を浮き彫りにした。 Large Multimodal Models (LMMs) have achieved significant progress by extending large language models. Building on this progress, the latest developments in LMMs demonstrate the ability to generate dense pixel-wise segmentation through the integration of segmentation models. Despite the innovations, the textual responses and segmentation masks of existing works remain at the instance level, showing limited ability to perform fine-grained understanding and segmentation even when provided with detailed textual cues. To overcome this limitation, we introduce a Multi-Granularity Large Multimodal Model (MGLMM), which is capable of seamlessly adjusting the granularity of Segmentation and Captioning (SegCap) following user instructions, from panoptic SegCap to fine-grained SegCap. We name such a new task Multi-Granularity Segmentation and Captioning (MGSC). 
Observing the lack of a benchmark for model training and evaluation over the MGSC task, we establish a benchmark with aligned masks and captions in multi-granularity using our customized automated annotation pipeline. This benchmark comprises 10K images and more than 30K image-question pairs. We will release our dataset along with the implementation of our automated dataset annotation pipeline for further research. Besides, we propose a novel unified SegCap data format to unify heterogeneous segmentation datasets; it effectively facilitates learning to associate object concepts with visual features during multi-task training. Extensive experiments demonstrate that our MGLMM excels at tackling more than eight downstream tasks and achieves state-of-the-art performance in MGSC, GCG, image captioning, referring segmentation, multiple and empty segmentation, and reasoning segmentation tasks. The great performance and versatility of MGLMM underscore its potential impact on advancing multimodal research.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 自動内視鏡石盤認識のための合成画像の妥当性評価 Evaluating the plausibility of synthetic images for improving automated endoscopic stone recognition ( http://arxiv.org/abs/2409.13409v1 ) ライセンス: Link先を確認	Ruben Gonzalez-Perez, Francisco Lopez-Tiro, Ivan Reyes-Amezcua, Eduardo Falcon-Morales, Rosa-Maria Rodriguez-Gueant, Jacques Hubert, Michel Daudon, Gilberto Ochoa-Ruiz, Christian Daul,	(参考訳) 現在、Morpho-Constitutional Analysis (MCA) は腎臓結石の組織学的診断の事実上のアプローチであり、再発を避けるためにパーソナライズされた治療を確立するための重要なステップである。近年では、内視鏡的石盤認識(ESR)と呼ばれる、そのようなタスクを術中実行することに焦点を当てている。どちらの方法も、分析されたサンプルをいくつかのサブグループに分離するために、表面と腎臓石の断面で観察された特徴に依存している。しかし、ESRで見られる高いサーバ内変動と複雑な動作条件を考えると、コンピュータ支援診断にAIを使うことには多くの関心がある。しかし、現在のAIモデルは、優れたパフォーマンスを達成し、目に見えないディストリビューションを一般化するために、大きなデータセットを必要としている。これは大きなラベル付きデータセットの取得が非常に困難であり、腎臓石のクラスは非常に稀であるため、大きな問題である。そこで本研究では,既存の腎臓結石データセットを拡張するための拡散法を提案する。本研究の目的は,前生児データを用いた事前トレーニングモデルに使用可能な多彩な腎臓結石画像を作成することである。本研究では,CCD画像の自然画像と合成画像とを混合することにより,未確認の術中データに非常によく対応できるモデルを訓練することができることを示す。その結果,ImageNetのみで事前学習したベースラインモデルに比べて精度が10%向上する可能性が示唆された。さらに,CCD画像のみを用いたモデル列車と比較して,表面画像の6%,断面画像の10%の改善が見られ,合成画像の有効性が示された。 Currently, the Morpho-Constitutional Analysis (MCA) is the de facto approach for the etiological diagnosis of kidney stone formation, and it is an important step for establishing personalized treatment to avoid relapses. More recently, research has focused on performing such tasks intra-operatively, an approach known as Endoscopic Stone Recognition (ESR). Both methods rely on features observed in the surface and the section of kidney stones to separate the analyzed samples into several sub-groups. However, given the high intra-observer variability and the complex operating conditions found in ESR, there is a lot of interest in using AI for computer-aided diagnosis. However, current AI models require large datasets to attain a good performance and for generalizing to unseen distributions. 
This is a major problem as large labeled datasets are very difficult to acquire, and some classes of kidney stones are very rare. Thus, in this paper, we present a method based on diffusion as a way of augmenting pre-existing ex-vivo kidney stone datasets. Our aim is to create plausible diverse kidney stone images that can be used for pre-training models using ex-vivo data. We show that by mixing natural and synthetic CCD images, it is possible to train models capable of performing very well on unseen intra-operative data. Our results show that it is possible to attain an improvement of 10% in terms of accuracy compared to a baseline model pre-trained only on ImageNet. Moreover, our results show an improvement of 6% for surface images and 10% for section images compared to a model trained on CCD images only, which demonstrates the effectiveness of using synthetic images.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# CT/PET画像における深層学習による腫瘍分節の正弦波正規化 Sine Wave Normalization for Deep Learning-Based Tumor Segmentation in CT/PET Imaging ( http://arxiv.org/abs/2409.13410v1 ) ライセンス: Link先を確認	Jintao Ren, Muheng Li, Stine Sofia Korreman,	(参考訳) 本報告では, オートPETIIIチャレンジのために開発されたCT/PETスキャンにおける腫瘍分離の正常化ブロックについて述べる。 SineNormalはPETデータに周期的な正弦変換を適用して病変検出を強化する。 PET強調領域における強度の変化を強調し、同心リングパターンを生成することにより、特にマルチトラックPETデータセットに挑戦するセグメンテーション精度を向上させることを目的としている。プロジェクトのコードはGitHubで公開されている(https://github.com/BBQtime/Sine-Wave-Normalization-for-Deep-Learning-Based-Tumor-Segmentation-in-CT-PET)。 This report presents a normalization block for automated tumor segmentation in CT/PET scans, developed for the autoPET III Challenge. The key innovation is the introduction of the SineNormal, which applies periodic sine transformations to PET data to enhance lesion detection. By highlighting intensity variations and producing concentric ring patterns in PET highlighted regions, the model aims to improve segmentation accuracy, particularly for challenging multitracer PET datasets. The code for this project is available on GitHub (https://github.com/BBQtime/Sine-Wave-Normalization-for-Deep-Learning-Based-Tumor-Segmentation-in-CT-PET).	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
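To make the idea of a periodic sine transformation concrete, here is a minimal sketch that maps each intensity value to its sine. The abstract does not give the exact formula used by SineNormal, so the functional form `sin(frequency * v)`, the `frequency` parameter, and the function name are all assumptions for illustration only.

```python
import math


def sine_normalize(values, frequency=1.0):
    """Map each intensity v to sin(frequency * v), giving outputs in [-1, 1].

    `frequency` is a hypothetical parameter: larger values make the output
    oscillate faster as the intensity grows.
    """
    return [math.sin(frequency * v) for v in values]
```

Under this assumed form, a region whose intensity falls off smoothly from a bright centre passes through several sine periods, so its iso-intensity contours turn into alternating bands, which is one plausible reading of the concentric ring patterns the abstract mentions.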
# 時間差重み付けによるMS病変の経時的分節化 Longitudinal Segmentation of MS Lesions via Temporal Difference Weighting ( http://arxiv.org/abs/2409.13416v1 ) ライセンス: Link先を確認	Maximilian Rokuss, Yannick Kirchhoff, Saikat Roy, Balint Kovacs, Constantin Ulrich, Tassilo Wald, Maximilian Zenk, Stefan Denner, Fabian Isensee, Philipp Vollmuth, Jens Kleesiek, Klaus Maier-Hein,	(参考訳) 経時的MRI検査における多発性硬化症(MS)病変の正確な分節化は、疾患の進行と治療効果の監視に不可欠である。臨床実習で画像を評価する場合、時間的変化が考慮されるが、既存のディープラーニング手法のほとんどは、異なる時点からのスキャンを別々に扱う。縦断画像を用いた研究の中では、時間点を統合するために用いられる最優先の手法は、チャネルワイズ結合である。本稿では,ベースラインとフォローアップスキャンの時間的差を,差分重みブロックと呼ばれるユニークなアーキテクチャ的帰納バイアスによって明示的に取り込む新しい手法を提案する。 2つのタイムポイントから機能をマージし、スキャン間の変更を強調します。病変のセグメンテーション (Dice Score, Hausdorff distance) と病変検出 (Lesion-level $F_1$ score) において, 2つのデータセットの経時的, 単独のタイムポイントモデルと比較して, 優れたスコアが得られた。私たちのコードはwww.github.com/MIC-DKFZ/Longitudinal-Difference-Weightingで公開されています。 Accurate segmentation of Multiple Sclerosis (MS) lesions in longitudinal MRI scans is crucial for monitoring disease progression and treatment efficacy. Although changes across time are taken into account when assessing images in clinical practice, most existing deep learning methods treat scans from different timepoints separately. Among studies utilizing longitudinal images, a simple channel-wise concatenation is the primary albeit suboptimal method employed to integrate timepoints. We introduce a novel approach that explicitly incorporates temporal differences between baseline and follow-up scans through a unique architectural inductive bias called Difference Weighting Block. It merges features from two timepoints, emphasizing changes between scans. We achieve superior scores in lesion segmentation (Dice Score, Hausdorff distance) as well as lesion detection (lesion-level $F_1$ score) as compared to state-of-the-art longitudinal and single timepoint models across two datasets. Our code is made publicly available at www.github.com/MIC-DKFZ/Longitudinal-Difference-Weighting.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 超伝導回路の熱分光計 Thermal spectrometer for superconducting circuits ( http://arxiv.org/abs/2409.13417v1 ) ライセンス: Link先を確認	Christoforus Dimas Satrya, Yu-Cheng Chang, Rishabh Upadhyay, Ilari K. Makinen, Joonas T. Peltonen, Bayan Karimi, Jukka P. Pekola,	(参考訳) 超伝導回路は、基本量子現象の研究や量子技術応用のための多用途で制御可能なプラットフォームを提供する。量子回路の状態を読み出し、その特性を特徴づける従来の手法は、高価で複雑な計装を含むrf測定方式に基づいている。本稿では、コプラナー導波路共振器を用いた概念実証実験において、超伝導回路の特性を調べるための熱分光計の簡単なdc測定を実演する。共振器内のマイクロ波光子のごく一部はオンチップボルメータによって吸収され、測定可能な温度上昇をもたらす。このプロセスによる温度計のdc信号のモニタリングにより、共振器の共振周波数とラインシェイプ(品質係数)を決定することができる。実証されたスキームは、単純なdc測定であり、200GHzまでの広帯域を持ち、典型的なrf分光計よりかなり優れている。さらに、熱測定は従来のrf測定とは異なり、ローレンツ吸収信号の高周波数独立基準レベルが得られる。低出力状態では、測定は完全にキャリブレーションフリーである。そこで本手法は,従来の手法よりも多くの点で優れている量子回路の代替分光器を提供する。 Superconducting circuits provide a versatile and controllable platform for studies of fundamental quantum phenomena as well as for quantum technology applications. A conventional technique to read out the state of a quantum circuit or to characterize its properties is based on rf measurement schemes involving costly and complex instrumentation. Here we demonstrate a simple dc measurement of a thermal spectrometer to investigate properties of a superconducting circuit, in this proof-of-concept experiment a coplanar waveguide resonator. A fraction of the microwave photons in the resonator is absorbed by an on-chip bolometer, resulting in a measurable temperature rise. By monitoring the dc signal of the thermometer due to this process, we are able to determine the resonance frequency and the lineshape (quality factor) of the resonator. The demonstrated scheme, which is a simple dc measurement, has a wide band up to 200 GHz, well exceeding that of the typical rf spectrometer. Moreover, the thermal measurement yields a highly frequency independent reference level of the Lorentzian absorption signal, unlike the conventional rf measurement. In the low power regime, the measurement is fully calibration-free. 
Our technique thus offers an alternative spectrometer for quantum circuits, which is in many ways superior with respect to conventional methods.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 占有率に基づくデュアルコンタリング Occupancy-Based Dual Contouring ( http://arxiv.org/abs/2409.13418v1 ) ライセンス: Link先を確認	Jisung Hwang, Minhyuk Sung,	(参考訳) 本稿では,数秒の計算時間を達成しつつ,占有関数に対して最先端の性能を実現するデュアルコンタリング手法を提案する。本手法は学習不要であり,GPU並列化を最大限に活用できるよう慎重に設計されている。近年の暗黙的ニューラル表現の急増は、占有場に大きな関心を集め、その結果、それらに基づく広範囲な3D再構成と生成方法がもたらされた。しかし、そのような手法の出力は、結果として生じる占有関数をメッシュに変換する際のボトルネックのために過小評価されてきた。マーチングキューブは階段状のアーティファクトを生じやすく、その後のほとんどの研究は符号付き距離関数を入力として活用することに焦点を当てているため、占有関数に対しては準最適な結果しか得られない。 Manifold Dual Contouring (MDC) に基づくOccupancy-Based Dual Contouring (ODC) を提案する。ODCは主に、格子辺上の点(1D点)と格子セル内の点(3D点)の計算を、距離情報を一切使わないように変更したものである。さらに,1D点とともに局所的な表面法線の計算に用いられ,二次誤差関数による3D点の同定を支援する補助的な2D点を導入する。 1D, 2D, 3Dの点を探索するために, すべての格子の辺, 面, セルにわたって並列化可能な高速アルゴリズムを開発した。複数の3次元ニューラル生成モデルと3Dメッシュデータセットを用いた実験により,本手法が先行研究と比較して最高の忠実度を達成できることが実証された。 We introduce a dual contouring method that provides state-of-the-art performance for occupancy functions while achieving computation times of a few seconds. Our method is learning-free and carefully designed to maximize the use of GPU parallelization. The recent surge of implicit neural representations has led to significant attention to occupancy fields, resulting in a wide range of 3D reconstruction and generation methods based on them. However, the outputs of such methods have been underestimated due to the bottleneck in converting the resulting occupancy function to a mesh. Marching Cubes tends to produce staircase-like artifacts, and most subsequent works focusing on exploiting signed distance functions as input also yield suboptimal results for occupancy functions. Based on Manifold Dual Contouring (MDC), we propose Occupancy-Based Dual Contouring (ODC), which mainly modifies the computation of grid edge points (1D points) and grid cell points (3D points) to not use any distance information. We introduce auxiliary 2D points that are used to compute local surface normals along with the 1D points, helping identify 3D points via the quadric error function. 
To search the 1D, 2D, and 3D points, we develop fast algorithms that are parallelizable across all grid edges, faces, and cells. Our experiments with several 3D neural generative models and a 3D mesh dataset demonstrate that our method achieves the best fidelity compared to prior works.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 状態空間モデル、出現、エルゴード性:安定した予測には、どれくらいのパラメータが必要か? State space models, emergence, and ergodicity: How many parameters are needed for stable predictions? ( http://arxiv.org/abs/2409.13421v1 ) ライセンス: Link先を確認	Ingvar Ziemann, Nikolai Matni, George J. Pappas,	(参考訳) 与えられたタスクを実行するために、モデルのパラメータはいくつ必要か? 自己教師付き学習によって事前訓練された大規模言語モデルは、パラメータの数が臨界スケールに達するにつれて、多段階推論のような創発的な能力を示すと論じられている。本研究では,この現象が単純な理論モデルで類似して再現できるかどうかを考察する。本稿では,線形力学系(自制学習の単純な例)の学習の問題点が相転移を示すことを示す。すなわち、すべての非エルゴード線形系に対して、学習者がそのしきい値より少ないパラメータを使用すると、大きなシーケンス長の有界誤差を達成できないような臨界しきい値が存在する。異なることに、我々のモデルでは、かなりの長距離相関を示すタスクにはパラメータ(出現に類似した現象)の臨界数が必要であり、学習者のパラメトリゼーションの役割についても検討し、隠れ状態を持つ線形力学系の単純なバージョン($\mathbb{R}$の不完全なランダムウォーク)を考える。この状況に対して,フィルタ長が有効メモリ長と水平線に依存する一定の閾値を超えない限り,ランダムウォークを円滑に学習できる線形フィルタを用いた学習者が存在しないことを示す。 How many parameters are required for a model to execute a given task? It has been argued that large language models, pre-trained via self-supervised learning, exhibit emergent capabilities such as multi-step reasoning as their number of parameters reach a critical scale. In the present work, we explore whether this phenomenon can analogously be replicated in a simple theoretical model. We show that the problem of learning linear dynamical systems -- a simple instance of self-supervised learning -- exhibits a corresponding phase transition. Namely, for every non-ergodic linear system there exists a critical threshold such that a learner using fewer parameters than said threshold cannot achieve bounded error for large sequence lengths. Put differently, in our model we find that tasks exhibiting substantial long-range correlation require a certain critical number of parameters -- a phenomenon akin to emergence. We also investigate the role of the learner's parametrization and consider a simple version of a linear dynamical system with hidden state -- an imperfectly observed random walk in $\mathbb{R}$. 
For this situation, we show that there exists no learner using a linear filter which can successfully learn the random walk unless the filter length exceeds a certain threshold depending on the effective memory length and horizon of the problem.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 未知環境におけるロボットダイナミクスの最適化のための因果強化学習 Causal Reinforcement Learning for Optimisation of Robot Dynamics in Unknown Environments ( http://arxiv.org/abs/2409.13423v1 ) ライセンス: Link先を確認	Julian Gerald Dcruz, Sam Mahoney, Jia Yun Chua, Adoundeth Soukhabandith, John Mugabe, Weisi Guo, Miguel Arana-Catania,	(参考訳) 未知の環境におけるロボットの自律的な操作は、物体の運動可能性のような相互作用のダイナミクスの知識が不足しているため、困難である。本研究は,ロボット操作の強化を目的とした,新たな因果強化学習手法を導入し,都市検索・救助(SAR)シナリオに適用する。提案した機械学習アーキテクチャにより、ロボットは、テクスチャや形状などの物体の視覚的特徴と、その動作性などの相互作用における物体のダイナミクスとの間の因果関係を学習し、意思決定プロセスを大幅に改善することができる。我々は因果的発見とRL実験を行い、因果的RLの優れた性能を実証し、非因果的モデルと比較して、複雑な状況下での学習時間を24.5%以上減少させた。 Autonomous operations of robots in unknown environments are challenging due to the lack of knowledge of the dynamics of the interactions, such as the objects' movability. This work introduces a novel Causal Reinforcement Learning approach to enhancing robotics operations and applies it to an urban search and rescue (SAR) scenario. Our proposed machine learning architecture enables robots to learn the causal relationships between the visual characteristics of the objects, such as texture and shape, and the objects' dynamics upon interaction, such as their movability, significantly improving their decision-making processes. We conducted causal discovery and RL experiments demonstrating the Causal RL's superior performance, showing a notable reduction in learning times by over 24.5% in complex situations, compared to non-causal models.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# HMD$^2$:単一エゴセントリックヘッドマウントデバイスによる環境認識運動生成 HMD$^2$: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device ( http://arxiv.org/abs/2409.13426v1 ) ライセンス: Link先を確認	Vladimir Guzov, Yifeng Jiang, Fangzhou Hong, Gerard Pons-Moll, Richard Newcombe, C. Karen Liu, Yuting Ye, Lingni Ma,	(参考訳) 本稿では,外向きカラーカメラと視覚SLAM機能を備えた頭部装着装置を用いて,リアルな全身動作のオンライン生成について検討する。本稿では, 運動再構成と生成のバランスをとるための新しいシステム HMD$^2$ を導入する。再建の観点から,本システムは,頭部運動,SLAM点雲,画像埋め込みなどの解析的特徴と学習的特徴の両方を最大限に活用することを目的としている。生成面では、HMD$^2$はマルチモーダルな条件付き運動拡散モデルを採用し、生成した動きの時間的コヒーレンスを維持するために時系列バックボーンを組み込んでおり、自動回帰インペイントを用いて、最小レイテンシ(0.17秒)でオンライン動作推論を容易にする。集合的に、我々のシステムは、公開可能なスマートグラスを用いて、広範囲の屋内および屋外環境において収集された200時間を超える広範囲なデータセットにスケール可能な、非常に効果的で堅牢なソリューションを提供していることを実証した。 This paper investigates the online generation of realistic full-body human motion using a single head-mounted device with an outward-facing color camera and the ability to perform visual SLAM. Given the inherent ambiguity of this setup, we introduce a novel system, HMD$^2$, designed to balance between motion reconstruction and generation. From a reconstruction standpoint, our system aims to maximally utilize the camera streams to produce both analytical and learned features, including head motion, SLAM point cloud, and image embeddings. On the generative front, HMD$^2$ employs a multi-modal conditional motion Diffusion model, incorporating a time-series backbone to maintain temporal coherence in generated motions, and utilizes autoregressive in-painting to facilitate online motion inference with minimal latency (0.17 seconds). Collectively, we demonstrate that our system offers a highly effective and robust solution capable of scaling to an extensive dataset of over 200 hours collected in a wide range of complex indoor and outdoor environments using publicly available smart glasses.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# テキスト対応マスド画像モデリングによるシーンテキスト除去のためのテキストローカライゼーションの活用 Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling ( http://arxiv.org/abs/2409.13431v1 ) ライセンス: Link先を確認	Zixiao Wang, Hongtao Xie, YuXin Wang, Yadong Qu, Fengjun Guo, Pengwei Liu,	(参考訳) 既存のシーンテキスト削除(STR)タスクは、高価なピクセルレベルのラベリングのため、トレーニングデータ不足に悩まされる。本稿では,低コストなテキスト検出ラベル付きSTRモデル(テキスト境界ボックスなど)を事前学習可能なテキスト対応マスク付き画像モデリングアルゴリズム(TMIM)を導入することで,この問題に対処することを目的とする。間接的補助的タスクのみを用いて暗黙的特徴抽出能力を高める従来の事前訓練方法とは異なり、TMIMではまずSTRタスクを弱教師付きで直接訓練し、STRの知識を明確かつ効率的に探索する。 TMIMでは、まず背景モデリングストリームを構築し、マスクされた非テキスト領域を復元することで背景生成規則を学習する。一方、マスクされたテキスト領域に擬似STRラベルを提供する。次に、擬似ラベルから学習し、そのモデルにエンドツーエンドのSTR能力を持たせるために、テキスト消去ストリームを提案する。 2つのコラボレーティブストリームから恩恵を受けながら、私たちのSTRモデルは、高コストSTRラベルの制限を大幅に軽減する、公開テキスト検出データセットでのみ、素晴らしいパフォーマンスを達成できます。実験により,本手法は他のプレトレイン法よりも優れ,最先端性能(SCUT-EnsTextの37.35 PSNR)が得られた。コードはhttps://github.com/wzx99/TMIMで入手できる。 Existing scene text removal (STR) task suffers from insufficient training data due to the expensive pixel-level labeling. In this paper, we aim to address this issue by introducing a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels (e.g., text bounding box). Different from previous pretraining methods that use indirect auxiliary tasks only to enhance the implicit feature extraction ability, our TMIM first enables the STR task to be directly trained in a weakly supervised manner, which explores the STR knowledge explicitly and efficiently. In TMIM, first, a Background Modeling stream is built to learn background generation rules by recovering the masked non-text region. Meanwhile, it provides pseudo STR labels on the masked text region. Second, a Text Erasing stream is proposed to learn from the pseudo labels and equip the model with end-to-end STR ability. 
Benefiting from the two collaborative streams, our STR model can achieve impressive performance only with the public text detection datasets, which greatly alleviates the limitation of the high-cost STR labels. Experiments demonstrate that our method outperforms other pretrain methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText). Code will be available at https://github.com/wzx99/TMIM.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 摂動高次例外点の固有値スペクトルに対するグラフ理論的アプローチ Graph-theoretical approach to the eigenvalue spectrum of perturbed higher-order exceptional points ( http://arxiv.org/abs/2409.13434v1 ) ライセンス: Link先を確認	Daniel Grom, Julius Kullig, Malte Röntgen, Jan Wiersig,	(参考訳) 例外点はパラメータ空間の特別な縮退点であり、開量子系および波動系を記述する(有効)非エルミート・ハミルトニアンにおいて生じる。 n次の例外点において、n 個の固有値と対応する固有ベクトルが同時に合体する。これらの合体する固有値は、一般に小さな摂動に対して強い応答を示し、これはセンサ応用に有用となりうる。強度$\epsilon$のいわゆるジェネリックな摂動は、固有値を$\epsilon$のn乗根に比例して変化させる。摂動下でこれと異なる固有値の振る舞いは非ジェネリック(non-generic)と呼ばれる。様々な種類の摂動に対する固有値の振る舞いの理解は望ましいものであり、応用にも不可欠である。我々は、高次例外点、すなわち n > 2 の場合の固有値スペクトルに対する摂動効果の理解に寄与するグラフ理論的視点を提唱する。非ジェネリックな摂動の重要性を強調し,その発生に解釈を与えるために,端にミラーを持つ半無限導波路で結合されたマイクロリングの系を具体例として考察する。さらに、そのようなシステムにおいて空洞選択的センシングに生じる飽和効果は、グラフ理論的な描像の中で自然に説明される。 Exceptional points are special degeneracy points in parameter space that can arise in (effective) non-Hermitian Hamiltonians describing open quantum and wave systems. At an n-th order exceptional point, n eigenvalues and the corresponding eigenvectors simultaneously coalesce. These coalescing eigenvalues typically exhibit a strong response to small perturbations which can be useful for sensor applications. A so-called generic perturbation with strength $\epsilon$ changes the eigenvalues proportional to the n-th root of $\epsilon$. A different eigenvalue behavior under perturbation is called non-generic. An understanding of the behavior of the eigenvalues for various types of perturbations is desirable and also crucial for applications. We advocate a graph-theoretical perspective that contributes to the understanding of perturbative effects on the eigenvalue spectrum of higher-order exceptional points, i.e. n > 2. To highlight the relevance of non-generic perturbations and to give an interpretation for their occurrence, we consider an illustrative example, a system of microrings coupled by a semi-infinite waveguide with an end mirror. 
Furthermore, the saturation effect occurring for cavity-selective sensing in such a system is naturally explained within the graph-theoretical picture.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 生成モデルを用いたダウン症候群の脳バイオマーカーの発見に向けて Towards the Discovery of Down Syndrome Brain Biomarkers Using Generative Models ( http://arxiv.org/abs/2409.13437v1 ) ライセンス: Link先を確認	Jordi Malé, Juan Fortea, Mateus Rozalem Aranha, Yann Heuzé, Neus Martínez-Abadías, Xavier Sevillano,	(参考訳) 脳イメージングにより、神経科学者はダウン症候群などの遺伝性疾患や神経発達障害における脳形態を分析し、認知障害や記憶障害の神経解剖学的基盤を解明するための関心領域を特定できるようになった。しかし、脳解剖学、認知能力、アルツハイマー病などの合併症の関連性は、ダウン症候群の集団ではまだよく分かっていない。人工知能の最新の進歩は、大量の脳磁気共鳴イメージングスキャンを解析する自動ツールを開発する機会となり、手動解析のボトルネックを克服する。本研究では、アルツハイマー病による様々な程度の神経変性の影響を受けたダウン症候群患者の脳変化を検出するために、生成モデルの利用を提案する。そこで我々は,脳磁気共鳴画像スキャンの独自のデータセットを活用し,変分オートエンコーダと拡散モデルに基づく最先端の脳異常検出モデルの評価を行った。包括的な評価プロセスに沿って、本研究はいくつかの重要な分析を含む。まず,神経放射線学の専門家による質的評価を行った。第2に, 生成モデルに対する定量的および定性的再構成忠実度調査を行った。第3に,ヒストグラムのポストプロセッシングがモデル性能をいかに向上させるかを検討するため,アブレーション試験を行った。最後に,皮質下構造の定量的体積解析を行った。以上の結果より,ダウン症候群の脳解剖を特徴付ける主要な変化(小脳の縮小,脳室の拡大,大脳皮質の縮小)およびアルツハイマー病による頭頂葉の変化を効果的に検出できるモデルがあることが示唆された。 Brain imaging has allowed neuroscientists to analyze brain morphology in genetic and neurodevelopmental disorders, such as Down syndrome, pinpointing regions of interest to unravel the neuroanatomical underpinnings of cognitive impairment and memory deficits. However, the connections between brain anatomy, cognitive performance and comorbidities like Alzheimer's disease are still poorly understood in the Down syndrome population. The latest advances in artificial intelligence constitute an opportunity for developing automatic tools to analyze large volumes of brain magnetic resonance imaging scans, overcoming the bottleneck of manual analysis. In this study, we propose the use of generative models for detecting brain alterations in people with Down syndrome affected by various degrees of neurodegeneration caused by Alzheimer's disease. To that end, we evaluate state-of-the-art brain anomaly detection models based on Variational Autoencoders and Diffusion Models, leveraging a proprietary dataset of brain magnetic resonance imaging scans. 
Following a comprehensive evaluation process, our study includes several key analyses. First, we conducted a qualitative evaluation by expert neuroradiologists. Second, we performed both quantitative and qualitative reconstruction fidelity studies for the generative models. Third, we carried out an ablation study to examine how the incorporation of histogram post-processing can enhance model performance. Finally, we executed a quantitative volumetric analysis of subcortical structures. Our findings indicate that some models effectively detect the primary alterations characterizing Down syndrome's brain anatomy, including a smaller cerebellum, enlarged ventricles, and cerebral cortex reduction, as well as the parietal lobe alterations caused by Alzheimer's disease.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 脳波表現学習のための差分プライベート・マルチモーダル・ラプラシアンドロップアウト(DP-MLD) Differentially Private Multimodal Laplacian Dropout (DP-MLD) for EEG Representative Learning ( http://arxiv.org/abs/2409.13440v1 ) ライセンス: Link先を確認	Xiaowen Fu, Bingxin Wang, Xinzhou Guo, Guoqing Liu, Yang Xiang,	(参考訳) 近年,マルチモーダル脳波(EEG)学習は,疾患検出において大きな可能性を示している。同時に、法的・倫理的な懸念から、臨床研究におけるプライバシーの確保がますます重要になっている。プライバシー保護のために広く採用されているスキームの一つは、解釈が明確で実装が容易な差分プライバシー(DP)である。 DP下では数多くの手法が提案されているが、モデルや信号データの複雑さのため、マルチモーダル脳波データについては広く研究されていない。本稿では,マルチモーダル脳波学習のための新しいDP-MLD方式を提案する。本手法では,脳波データを言語モデルでテキストとして,他のモダリティのデータをビジョントランスフォーマーで画像として処理し,適切に設計されたクロスアテンション機構によってモダリティ間の特徴を効果的に抽出・統合する,新しいマルチモーダル表現学習モデルを提案する。 DPを実現するために,与えられたプライバシ予算内でランダム性の割り当てと性能を動的に最適化する,新しい適応型の特徴レベルのラプラシアンドロップアウト方式を設計する。パーキンソン病(PD)におけるすくみ足(Freezing of Gait; FoG)のオープンソースマルチモーダルデータセットの実験において,提案手法は分類精度を約4%向上させ,DP下でのマルチモーダル脳波学習における最先端性能を実現する。 Recently, multimodal electroencephalogram (EEG) learning has shown great promise in disease detection. At the same time, ensuring privacy in clinical studies has become increasingly crucial due to legal and ethical concerns. One widely adopted scheme for privacy protection is differential privacy (DP) because of its clear interpretation and ease of implementation. Although numerous methods have been proposed under DP, it has not been extensively studied for multimodal EEG data due to the complexities of models and signal data considered there. In this paper, we propose a novel Differentially Private Multimodal Laplacian Dropout (DP-MLD) scheme for multimodal EEG learning. Our approach proposes a novel multimodal representative learning model that processes EEG data by language models as text and other modal data by vision transformers as images, incorporating well-designed cross-attention mechanisms to effectively extract and integrate cross-modal features. To achieve DP, we design a novel adaptive feature-level Laplacian dropout scheme, where randomness allocation and performance are dynamically optimized within given privacy budgets. 
In the experiment on an open-source multimodal dataset of Freezing of Gait (FoG) in Parkinson's Disease (PD), our proposed method demonstrates an approximately 4% improvement in classification accuracy, and achieves state-of-the-art performance in multimodal EEG learning under DP.	翻訳日:2024-11-07 07:17:49 公開日:2024-09-20
# 自然言語入力による階層学習を用いた検索・救助における選択的探索と情報収集 Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input ( http://arxiv.org/abs/2409.13445v1 ) ライセンス: Link先を確認	Dimitrios Panagopoulos, Adolfo Perrusquia, Weisi Guo,	(参考訳) 近年、ロボットと自律システムは私たちの日常生活にますます不可欠なものとなり、様々な領域にまたがる複雑な問題に対する解決策を提供してきた。しかし、SAR(Search and rescue)オペレーションにおけるそれらの応用は、ユニークな課題を提示している。災害に遭った地域を網羅的に探索することは、地形の広さ、変化する環境、そして関連する時間的制約のためにしばしば実現不可能である。従来のロボットシステムは、事前に定義された探索パターンで動作し、人間の利害関係者が提供する真実を取り入れ、活用する能力が欠如している。このギャップに対処するため,大規模言語モデル(LLM)と階層的強化学習(HRL)フレームワークを連携させるシステムを導入する。提案システムは,人間の利害関係者からの言語入力を実用的なRLインサイトへ翻訳し,検索戦略を調整するように設計されている。 LLMによる人為的情報の利用とHRLによるタスク実行の構造化により、我々のアプローチは自律能力と人間の知能のギャップを埋めるだけでなく、長い地平線とスパース報酬によって特徴づけられる環境におけるエージェントの学習効率と意思決定プロセスを大幅に改善する。 In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring the disaster-stricken area is often infeasible due to the vastness of the terrain, transformed environment, and the time constraints involved. Traditional robotic systems typically operate on predefined search patterns and lack the ability to incorporate and exploit ground truths provided by human stakeholders, which can be the key to speeding up the learning process and enhancing triage. Addressing this gap, we introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework. The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy. 
By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach not only bridges the gap between autonomous capabilities and human intelligence but also significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# Minstrel: 非AIエキスパートのためのマルチエージェントコーディネーションによる構造的プロンプト生成 Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts ( http://arxiv.org/abs/2409.13449v1 ) ライセンス: Link先を確認	Ming Wang, Yuanzhong Liu, Xiaoyu Liang, Yijie Huang, Daling Wang, Xiaocui Yang, Sijia Shen, Shi Feng, Xiaoming Zhang, Chaofeng Guan, Yifei Zhang,	(参考訳) LLMは様々な領域にまたがって高い性能を示してきた。それでも、彼らの仕事を助けるための高品質なプロンプトを定式化することは、非AI専門家にとって挑戦となる。プロンプトエンジニアリングにおける既存の研究は、幾らか分散した最適化原則と設計が経験的に依存したプロンプトオプティマイザを示唆している。残念なことに、これらの取り組みには構造的な設計がなく、高い学習コストが発生しており、特にAIの専門家以外の人々にとって、プロンプトの反復的な更新には適していない。構造的再利用可能なプログラミング言語に着想を得て,構造的プロンプト設計フレームワークであるLangGPTを提案する。さらに、構造的プロンプトの自動生成を実現するために、リフレクションを備えた多世代エージェントであるMinstrelを導入する。実験とケーススタディにより,ミンストレルが生成した構造的プロンプトや手書きによるLLMの性能向上が明らかに示された。さらに,オンラインコミュニティにおけるユーザ調査を通じて,構造的プロンプトの使いやすさを分析した。 LLMs have demonstrated commendable performance across diverse domains. Nevertheless, formulating high-quality prompts to assist them in their work poses a challenge for non-AI experts. Existing research in prompt engineering suggests somewhat scattered optimization principles and designs empirically dependent prompt optimizers. Unfortunately, these endeavors lack a structural design, incurring high learning costs and it is not conducive to the iterative updating of prompts, especially for non-AI experts. Inspired by structured reusable programming languages, we propose LangGPT, a structural prompt design framework. Furthermore, we introduce Minstrel, a multi-generative agent system with reflection to automate the generation of structural prompts. Experiments and the case study illustrate that structural prompts generated by Minstrel or written manually significantly enhance the performance of LLMs. Furthermore, we analyze the ease of use of structural prompts through a user survey in our online community.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# ノイズに頑健で資源効率の高いADMMに基づくフェデレーテッドラーニング Noise-Robust and Resource-Efficient ADMM-based Federated Learning ( http://arxiv.org/abs/2409.13451v1 ) ライセンス: Link先を確認	Ehsan Lari, Reza Arablouei, Vinay Chakravarthi Gogineni, Stefan Werner,	(参考訳) フェデレーテッドラーニング(FL)は、クライアントサーバ通信を活用して、分散データ上でグローバルモデルをトレーニングする。しかし、通信ノイズやエラーはモデルの精度を損なう可能性がある。この問題に対処するために,通信負荷を低減しつつ,通信ノイズに対する堅牢性を高める新しいFLアルゴリズムを提案する。本稿では,重み付き最小二乗(WLS)回帰問題を具体例として,提案アルゴリズムを導出する。まず,通信効率を高めるためにランダムスケジューリングを用いるフェデレーテッドネットワーク上の分散凸最適化問題として,WLS回帰を定式化する。次に、この問題を反復的に解くために交互方向乗数法(ADMM)を適用する。累積的な通信ノイズによる有害な影響を抑えるため,双対変数を除去し,参加する各クライアントで新たなローカルモデル更新を実装するという,鍵となる修正を導入する。この微妙ながら効果的な変更により、各クライアントで2つではなく1つのノイズの多いグローバルモデル更新を使用することになり、加法的通信ノイズに対する堅牢性が改善される。さらに、サーバに選択されていなくてもクライアントがローカル更新を継続できるようにする別の修正を加え、大幅な性能向上を実現した。我々の理論解析は,サーバが各反復においてノイズの多いリンクを介してクライアントのランダムなサブセットと通信する場合でも,平均および平均二乗の意味でアルゴリズムが収束することを確認している。数値結果は,提案アルゴリズムの有効性を検証し,理論的知見を裏付ける。 Federated learning (FL) leverages client-server communications to train global models on decentralized data. However, communication noise or errors can impair model accuracy. To address this problem, we propose a novel FL algorithm that enhances robustness against communication noise while also reducing communication load. We derive the proposed algorithm through solving the weighted least-squares (WLS) regression problem as an illustrative example. We first frame WLS regression as a distributed convex optimization problem over a federated network employing random scheduling for improved communication efficiency. We then apply the alternating direction method of multipliers (ADMM) to iteratively solve this problem. To counteract the detrimental effects of cumulative communication noise, we introduce a key modification by eliminating the dual variable and implementing a new local model update at each participating client. This subtle yet effective change results in using a single noisy global model update at each client instead of two, improving robustness against additive communication noise. 
Furthermore, we incorporate another modification enabling clients to continue local updates even when not selected by the server, leading to substantial performance improvements. Our theoretical analysis confirms the convergence of our algorithm in both the mean and mean-square senses, even when the server communicates with a random subset of clients over noisy links at each iteration. Numerical results validate the effectiveness of our proposed algorithm and corroborate our theoretical findings.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 機械学習におけるパラメータ推定のためのランク1格子を用いたデータ圧縮 Data Compression using Rank-1 Lattices for Parameter Estimation in Machine Learning ( http://arxiv.org/abs/2409.13453v1 ) ライセンス: Link先を確認	Michael Gnewuch, Kumar Harsha, Marcin Wnuk,	(参考訳) 平均二乗誤差とその正則化版は、教師付き機械学習における標準的な損失関数である。しかし、大規模なデータセットに対してこれらの損失を計算することは、計算コストが高くなりうる。 J. Dick と M. Feischl [Journal of Complexity 67 (2021)] のアプローチを改良し、ランク1格子を用いて大規模なデータセットをより小さなサイズに縮約するアルゴリズムを提案する。ランク1格子は準モンテカルロ(QMC)点集合であり、慎重に選べば多次元単位立方体においてよく分布する。前処理ステップの圧縮戦略は、すべての格子点に対して、元のデータと応答に依存する一対の重みを割り当て、その相対的な重要性を表す。その結果、圧縮されたデータにより、最適化ステップにおける反復的な損失計算がはるかに高速になる。我々は、フーリエ係数が十分に速く減衰し、ある種のウィーナー代数やコロボフ空間に属する関数に対して、QMCデータ圧縮アルゴリズムの誤差と前処理ステップのコストを分析する。特に、関数が十分に滑らかである限り、我々のアプローチが任意に高い収束率につながることを証明する。 The mean squared error and regularized versions of it are standard loss functions in supervised machine learning. However, calculating these losses for large data sets can be computationally demanding. Modifying an approach of J. Dick and M. Feischl [Journal of Complexity 67 (2021)], we present algorithms to reduce extensive data sets to a smaller size using rank-1 lattices. Rank-1 lattices are quasi-Monte Carlo (QMC) point sets that are, if carefully chosen, well-distributed in a multidimensional unit cube. The compression strategy in the preprocessing step assigns every lattice point a pair of weights depending on the original data and responses, representing its relative importance. As a result, the compressed data makes iterative loss calculations in optimization steps much faster. We analyze the errors of our QMC data compression algorithms and the cost of the preprocessing step for functions whose Fourier coefficients decay sufficiently fast so that they lie in certain Wiener algebras or Korobov spaces. In particular, we prove that our approach can lead to arbitrarily high convergence rates as long as the functions are sufficiently smooth.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# コンピュータビジョンにおける概念に基づく説明:我々はどこにいてどこへ行くのか? Concept-Based Explanations in Computer Vision: Where Are We and Where Could We Go? ( http://arxiv.org/abs/2409.13456v1 ) ライセンス: Link先を確認	Jae Hee Lee, Georgii Mikriukov, Gesina Schwalbe, Stefan Wermter, Diedrich Wolter,	(参考訳) ニューラル視覚モデルを説明するための概念に基づくXAI(C-XAI)アプローチは、有望な研究分野である。概念(すなわち画像の中の意味的に有意味な部分)を参照する説明は直感的に理解でき、関連する領域のみを明らかにするサリエンシーベースの手法を超えるものだからである。近年のこの分野の顕著な進歩を考えると、コミュニティが進歩とトレンドを批判的に見直す時が来ている。そこで本研究では,C-XAI法をレビューして,興味深く未探索な領域を同定し,今後の研究方向性を提案する。この目的のために、説明すべき概念の選択、概念表現の選択、概念の制御方法の3つの主な方向を考える。最後の点については,知識表現と学習の分野から着想を得た手法を提案し,これが今後のC-XAI研究をいかに充実させうるかを示す。 Concept-based XAI (C-XAI) approaches to explaining neural vision models are a promising field of research, since explanations that refer to concepts (i.e., semantically meaningful parts in an image) are intuitive to understand and go beyond saliency-based techniques that only reveal relevant regions. Given the remarkable progress in this field in recent years, it is time for the community to take a critical look at the advances and trends. Consequently, this paper reviews C-XAI methods to identify interesting and underexplored areas and proposes future research directions. To this end, we consider three main directions: the choice of concepts to explain, the choice of concept representation, and how we can control concepts. For the latter, we propose techniques and draw inspiration from the field of knowledge representation and learning, showing how this could enrich future C-XAI research.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# Facebook URLデータセットにおける時間的エンゲージメント、コンテンツ品質、イデオロギー Engagement, Content Quality and Ideology over Time on the Facebook URL Dataset ( http://arxiv.org/abs/2409.13461v1 ) ライセンス: Link先を確認	Emma Fraxanet, Fabrizio Germano, Andreas Kaltenbrunner, Vicenç Gómez,	(参考訳) ソーシャルメディア利用者のイデオロギーとオンラインニュース消費の関係を解き明かすことで、ユーザのエンゲージメント行動とレコメンダシステムのコンテンツ提供とのフィードバックループに関する重要な洞察が得られる。しかしながら、プラットフォームによって引き起こされる影響からユーザ固有の振る舞いを切り分けることは、特に限られた期間をカバーするデータセットを扱う場合、大きな課題となる。本研究では、Facebook Privacy-Protected Full URLs Datasetを用いて集計分析と縦断分析の両方を行い、2017年1月から2020年12月までの米国におけるニュースURLに関連するユーザエンゲージメント指標を調査した。ニュースソースのイデオロギー的傾向と質を,ユーザの政治的嗜好と合わせて取り入れることで,リベラル,保守,中道の読者について,ニュース消費のイデオロギーと質の重み付け平均を構築した。これにより、(i)リベラル派と保守派のイデオロギー的ギャップ、および(ii)各グループのニュース消費の平均品質の推移を追跡できる。これらの指標は、分極化や誤情報のようなより広い現象と関連付けられている。両指標のトレンドには2つの大きな変化があり,いずれもユーザエンゲージメントの変化と同時に生じている。興味深いことに,両方の変曲点でイデオロギー的ギャップが拡大し,ニュース品質が低下するが,エンゲージメントは第1点以降は増加し,第2点以降は減少する。最後に、Facebookのニュースフィードアルゴリズムの2つのメジャーアップデートとの関係の可能性について議論することで、これらの変化を文脈化する。 Unpacking the relationship between the ideology of social media users and their online news consumption offers critical insight into the feedback loop between users' engagement behavior and the recommender systems' content provision. However, disentangling inherent user behavior from platform-induced influences poses significant challenges, particularly when working with datasets covering limited time periods. In this study, we conduct both aggregate and longitudinal analyses using the Facebook Privacy-Protected Full URLs Dataset, examining user engagement metrics related to news URLs in the U.S. from January 2017 to December 2020. By incorporating the ideological alignment and quality of news sources, along with users' political preferences, we construct weighted averages of ideology and quality of news consumption for liberal, conservative, and moderate audiences. This allows us to track the evolution of (i) the ideological gap between liberals and conservatives and (ii) the average quality of each group's news consumption. 
These metrics are linked to broader phenomena such as polarization and misinformation. We identify two significant shifts in trends for both metrics, each coinciding with changes in user engagement. Interestingly, during both inflection points, the ideological gap widens and news quality declines; however, engagement increases after the first one and decreases after the second. Finally, we contextualize these changes by discussing their potential relation to two major updates to Facebook's News Feed algorithm.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 畳み込みニューラルネットワークを用いた圧縮画像のロバストな顕著物体検出 Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks ( http://arxiv.org/abs/2409.13464v1 ) ライセンス: Link先を確認	Guibiao Liao, Wei Gao,	(参考訳) 顕著物体検出(SOD)は近年大きく進歩している。実際のシナリオでは、圧縮画像(CI)がデータ転送と保存の主要な媒体となる。しかし、畳み込みニューラルネットワーク(CNN)を用いた圧縮画像のSODには、ほとんど注意が払われていない。本稿では,圧縮画像上でのCNNに基づく顕著物体検出の厳密なベンチマークと解析を行う。この問題を包括的に研究するために、既存の公開SODデータセットからさまざまなCI SODデータセットを慎重に構築する。次に, 代表的なCNNベースのSOD法を調査し, 圧縮画像(約264万画像)上でのロバスト性を評価した。重要な点として,評価結果から2つの主要な知見が得られた。 1) 現在最先端のCNNベースのSODモデルは、クリーンな画像では優れているが、圧縮された画像に適用すると大きな性能ボトルネックを示す。 2)CI SODのロバスト性に影響を与える主な要因は,圧縮画像の特性と,顕著性特徴学習の限界に根ざしている。これらの観測に基づいて、我々は、ロバストなCNNベースのCI SODを実現するために、ロバストな特徴表現学習に焦点を当てた、単純で有望なベースラインフレームワークを提案する。広範な実験により本手法の有効性を実証し, クリーンなデータに対する競争力のある精度を維持しつつ, 様々なレベルの画像劣化に対するロバスト性が著しく向上することを示す。我々は、ベンチマークの取り組み、分析的洞察、提案手法が、CNNベースのSODアルゴリズムのロバスト性のより包括的な理解に貢献し、コミュニティにおける今後の研究を促進することを願っている。 Salient object detection (SOD) has achieved substantial progress in recent years. In practical scenarios, compressed images (CI) serve as the primary medium for data transmission and storage. However, scant attention has been directed towards SOD for compressed images using convolutional neural networks (CNNs). In this paper, we are dedicated to strictly benchmarking and analyzing CNN-based salient object detection on compressed images. To comprehensively study this issue, we meticulously establish various CI SOD datasets from existing public SOD datasets. Subsequently, we investigate representative CNN-based SOD methods, assessing their robustness on compressed images (approximately 2.64 million images). Importantly, our evaluation results reveal two key findings: 1) current state-of-the-art CNN-based SOD models, while excelling on clean images, exhibit significant performance bottlenecks when applied to compressed images. 2) The principal factors influencing the robustness of CI SOD are rooted in the characteristics of compressed images and the limitations in saliency feature learning. 
Based on these observations, we propose a simple yet promising baseline framework that focuses on robust feature representation learning to achieve robust CNN-based CI SOD. Extensive experiments demonstrate the effectiveness of our approach, showcasing markedly improved robustness across various levels of image degradation, while maintaining competitive accuracy on clean data. We hope that our benchmarking efforts, analytical insights, and proposed techniques will contribute to a more comprehensive understanding of the robustness of CNN-based SOD algorithms, inspiring future research in the community.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 片道横断CNOTゲートによる高能率耐故障コードスイッチング Efficient fault-tolerant code switching via one-way transversal CNOT gates ( http://arxiv.org/abs/2409.13465v1 ) ライセンス: Link先を確認	Sascha Heußen, Janine Hilder,	(参考訳) コードスイッチングは、2つのQEC符号と相補的なゲートセットを組み合わせることで、FT量子ゲート操作の普遍的なセットを容易にする確立された技術であり、それぞれがフォールトトレラントの実装が容易である。本研究では,トランスバーサルゲートのみを用いることで,FT回路設計の制約を考慮に入れたコードスイッチング方式を提案する。これらのゲートは本質的にFTであり、追加のキュービットオーバーヘッドはない。我々は、既存の量子プロセッサ(例えば、捕捉イオンや中性原子)での動作に適した、低距離カラーコードへのスキームの適用を解析する。超伝導量子ビットに基づくアーキテクチャにおいて生じる接続制約について,簡潔に論じる。回路レベルの雑音の数値シミュレーションにより,本手法により促進される論理的な$T$ゲートは,低い物理誤り率において,フラグ-FTマジック状態注入プロトコルと物理$T$ゲートの両方を上回る可能性が示唆された。トランスバーサルコードスイッチングは、任意のコード距離のコードペアに自然にスケールする。現実的に実現可能な物理エンタングルゲート誤り率において,距離3実装と物理ゲートの両方と比較して,距離5プロトコルの性能向上を観察した。論理的補助量子ビットが十分に確実に準備できることを前提として、このスキームを大規模な並列化でどのように実装できるかを論じる。我々の論理$T$ゲートは、コストのかかる可能性のあるマジック状態ファクトリを回避する。 QECを実行し、FTユニバーサルゲートセットを達成するための要件は、基本的に同じである: 論理補助キュービットをオフラインに準備し、トランスバーサルゲートを実行し、十分高速に測定する。したがって、トランスバーサル符号切替は、FT普遍量子計算のより実用的なハードウェア実現を可能にする。このスキームは、論理量子ビット上で実行される量子アルゴリズムの実験的なデモンストレーションのためのリソース要件を緩和する。 Code switching is an established technique that facilitates a universal set of FT quantum gate operations by combining two QEC codes with complementary sets of gates, which each by themselves are easy to implement fault-tolerantly. In this work, we present a code switching scheme that respects the constraints of FT circuit design by only making use of transversal gates. These gates are intrinsically FT without additional qubit overhead. We analyze application of the scheme to low-distance color codes, which are suitable for operation in existing quantum processors, for instance based on trapped ions or neutral atoms. We briefly discuss connectivity constraints that arise for architectures based on superconducting qubits. Numerical simulations of circuit-level noise indicate that a logical $T$-gate, facilitated by our scheme, could outperform both flag-FT magic state injection protocols and a physical $T$-gate at low physical error rates. 
Transversal code switching naturally scales to code pairs of arbitrary code distance. We observe improved performance of a distance-5 protocol compared to both the distance-3 implementation and the physical gate for realistically attainable physical entangling gate error rates. We discuss how the scheme can be implemented with a large degree of parallelization, provided that logical auxiliary qubits can be prepared reliably enough. Our logical $T$-gate circumvents potentially costly magic state factories. The requirements to perform QEC and to achieve an FT universal gate set are then essentially the same: Prepare logical auxiliary qubits offline, execute transversal gates and perform fast-enough measurements. Transversal code switching thus serves to enable more practical hardware realizations of FT universal quantum computation. The scheme alleviates resource requirements for experimental demonstrations of quantum algorithms run on logical qubits.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 孤立林を用いたフェデレーション学習環境におけるグローバル・アウトリー検出 Global Outlier Detection in a Federated Learning Setting with Isolation Forest ( http://arxiv.org/abs/2409.13466v1 ) ライセンス: Link先を確認	Daniele Malpetti, Laura Azzimonti,	(参考訳) 本稿では,特にクロスサイロシナリオをターゲットとした,フェデレーション学習環境におけるグローバルなアウトレイラの検出手法を提案する。当社のアプローチでは、2つのサーバの使用と、クライアントから1つのサーバにマスキングされたローカルデータの送信を伴います。データのマスキングは、外れ値の識別を引き続き許可しながら、機密情報の開示を防止する。さらに、プライバシーをさらに保護するために、サーバがどのクライアントがマスキングされたデータポイントを所有しているかを知らないよう、置換機構を実装している。サーバは、アイソレーションフォレストまたはその拡張バージョンを使用して、マスクされたデータに対する外れ値検出を実行し、クライアントにアウト値情報を送信し、その後のフェデレーションモデルトレーニングを開始する前に、ローカルデータセットの外れ値の識別と削除を可能にする。このアプローチは、プレーンデータに対する分離フォレストアルゴリズムの集中実行に匹敵する結果をもたらす。 We present a novel strategy for detecting global outliers in a federated learning setting, targeting in particular cross-silo scenarios. Our approach involves the use of two servers and the transmission of masked local data from clients to one of the servers. The masking of the data prevents the disclosure of sensitive information while still permitting the identification of outliers. Moreover, to further safeguard privacy, a permutation mechanism is implemented so that the server does not know which client owns any masked data point. The server performs outlier detection on the masked data, using either Isolation Forest or its extended version, and then communicates outlier information back to the clients, allowing them to identify and remove outliers in their local datasets before starting any subsequent federated model training. This approach provides comparable results to a centralized execution of Isolation Forest algorithms on plain data.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# ライドバーグ原子イオン分子の振動結合 Vibrationally coupled Rydberg atom-ion molecules ( http://arxiv.org/abs/2409.13469v1 ) ライセンス: Link先を確認	Ilango Maran, Liam J. Bond, Jeremy T. Young, Arghavan Safavi-Naini, Rene Gerritsma,	(参考訳) ポールトラップに閉じ込められたイオン結晶がその両端でRydberg原子と結合したハイブリッド原子イオン系におけるRydberg原子イオン分子(RAIMs)の発生について検討した。このようなシステムの実現可能性を評価するため、我々はポールトラップのrf電位がRAIMに与える影響を詳細にFloquet解析し、スケーリング法則に基づく生存確率の定性解析を行う。 RAIMは十分に弱い低周波トラップに対して生存する。次に、このハイブリッドシステムを用いて、イオン結晶の共通運動モードを利用して、チェーンの端で2つのRAIMを形成する確率を抑制(ブロッケード)または強化(アンチブロッケード)し、典型的なブロッケード半径をイオン結晶の長さで置き換える手法を提案する。 We study the occurrence of Rydberg atom-ion molecules (RAIMs) in a hybrid atom-ion system with an ion crystal trapped in a Paul trap coupled to Rydberg atoms at either end. To assess the feasibility of such a system, we perform a detailed Floquet analysis of the effect of the Paul trap's rf potential on the RAIMs and provide a qualitative analysis of the survival probability based on scaling laws. We conclude that the RAIM survives for sufficiently weak and low-frequency traps. We then use this hybrid system and propose a scheme to utilise the common motional modes of the ion crystal to suppress (blockade) or enhance (anti-blockade) the probability of forming two RAIMs at the ends of the chain, replacing the typical blockade radius by the length of the ion crystal.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 決定論的・確率的動的分類法--雑音を伴うランダム対逆攻撃に対抗する Deterministic versus stochastic dynamical classifiers: opposing random adversarial attacks with noise ( http://arxiv.org/abs/2409.13470v1 ) ライセンス: Link先を確認	Lorenzo Chicchi, Duccio Fanelli, Diego Febbe, Lorenzo Buffoni, Francesca Di Patti, Lorenzo Giambagli, Raffele Marino,	(参考訳) 興奮性生物学的ニューロンの相互交叉ダイナミクスを記述するために神経科学で広く用いられている連続可変ファイアリングレート(CVFR)モデルは、ここで訓練され、動的に補助される分類器としてテストされる。この目的のために、モデルは、そのスペクトル分解を通じて、ノード間結合行列に自己整合的に埋め込まれた植込み誘引器のセットを供給される。分類の学習は、課された平衡点の吸引域を形作ることに相当し、それぞれの所属クラスを反映した、対応する目的地目標に向けて異なる項目を誘導する。 CVFRモデルの確率的変種も研究され、分類対象の項目を破損させる敵対的ランダム攻撃に対して頑健であることが判明した。この驚くべき発見は、ノイズと動的特性が互いに共鳴するときに生じる、非常に多くの驚くべき影響の1つである。 The Continuous-Variable Firing Rate (CVFR) model, widely used in neuroscience to describe the intertangled dynamics of excitatory biological neurons, is here trained and tested as a veritable dynamically assisted classifier. To this end, the model is supplied with a set of planted attractors which are self-consistently embedded in the inter-nodes coupling matrix, via its spectral decomposition. Learning to classify amounts to sculpting the basins of attraction of the imposed equilibria, directing different items towards the corresponding destination target, which reflects the class of respective pertinence. A stochastic variant of the CVFR model is also studied and found to be robust to adversarial random attacks, which corrupt the items to be classified. This remarkable finding is one of the very many surprising effects which arise when noise and dynamical attributes are made to mutually resonate.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 皮質誘導性バイアスを伴うRNNにおける刺激と刺激の学習 Stimulus-to-Stimulus Learning in RNNs with Cortical Inductive Biases ( http://arxiv.org/abs/2409.13471v1 ) ライセンス: Link先を確認	Pantelis Vafidis, Antonio Rangel,	(参考訳) 動物は、条件付けのプロセスを通じて経験から外部の事象を予測することを学ぶ。条件付けの自然なメカニズムは刺激の置換であり、これまでの行動的意義のない刺激に対する神経反応は、それが確実に予測する行動学的に重要な刺激によって生成されるものと徐々に同一になる。本研究では,脳皮質における誘導バイアスの2つの形態を応用した刺激置換モデルを提案する。複合刺激表現の形式における表現誘導バイアスと,皮質連想学習の基本単位として機能することが示されている2成分錐体ニューロンの形式におけるアーキテクチャ誘導バイアスである。これらのニューロンの性質は、刺激置換を実装し、シナプスでローカルに利用可能な情報のみを利用する生物学的に妥当な学習規則を可能にする。本モデルでは, 各種条件付け現象を多岐にわたって生成し, 個々の実験課題のパラメータ微調整に頼らずに, 動物実験と共生する訓練量の関連性を学習できることを示す。対照的に、よく用いられるヘビアン規則は、混合選択性による一般的な刺激-刺激関連を学習できず、タスク固有のパラメータの微調整が必要であることを示す。我々の枠組みは、大脳皮質におけるマルチコンパートメントニューロン処理の重要性を強調し、大脳皮質動物を進化の端とみなす方法を示している。 Animals learn to predict external contingencies from experience through a process of conditioning. A natural mechanism for conditioning is stimulus substitution, whereby the neuronal response to a stimulus with no prior behavioral significance becomes increasingly identical to that generated by a behaviorally significant stimulus it reliably predicts. We propose a recurrent neural network model of stimulus substitution which leverages two forms of inductive bias pervasive in the cortex: representational inductive bias in the form of mixed stimulus representations, and architectural inductive bias in the form of two-compartment pyramidal neurons that have been shown to serve as a fundamental unit of cortical associative learning. The properties of these neurons allow for a biologically plausible learning rule that implements stimulus substitution, utilizing only information available locally at the synapses. We show that the model generates a wide array of conditioning phenomena, and can learn large numbers of associations with an amount of training commensurate with animal experiments, without relying on parameter fine-tuning for each individual experimental task. 
In contrast, we show that commonly used Hebbian rules fail to learn generic stimulus-stimulus associations with mixed selectivity, and require task-specific parameter fine-tuning. Our framework highlights the importance of multi-compartment neuronal processing in the cortex, and showcases how it might confer cortical animals the evolutionary edge.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# Flotta: セキュアでフレキシブルなSparkにインスパイアされたフェデレーション学習フレームワーク Flotta: a Secure and Flexible Spark-inspired Federated Learning Framework ( http://arxiv.org/abs/2409.13473v1 ) ライセンス: Link先を確認	Claudio Bonesana, Daniele Malpetti, Sandra Mitrović, Francesca Mangili, Laura Azzimonti,	(参考訳) Flottaは、バイオメディカルフィールドのような高度なセキュリティを必要とする状況下で研究を行う多党コンソーシアムに分散されたセンシティブなデータに基づいて機械学習モデルをトレーニングするために設計されたフェデレートラーニングフレームワークである。 FlottaはPythonパッケージで、Apache Sparkのいくつかの側面にインスパイアされたもので、柔軟性とセキュリティの両方を提供し、コンソーシアム内部のマシンのみを使用して研究を行うことができる。本稿では,フレームワークの主要なコンポーネントと,フレームワークの能力とセキュリティ,柔軟性,ユーザフレンドリさを強調する実践的なユースケースについて述べる。 We present Flotta, a Federated Learning framework designed to train machine learning models on sensitive data distributed across a multi-party consortium conducting research in contexts requiring high levels of security, such as the biomedical field. Flotta is a Python package, inspired in several aspects by Apache Spark, which provides both flexibility and security and allows conducting research using solely machines internal to the consortium. In this paper, we describe the main components of the framework together with a practical use case to illustrate the framework's capabilities and highlight its security, flexibility and user-friendliness.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# 大規模言語モデルにおける非学習ファクチュアル知識の代替選好最適化 Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models ( http://arxiv.org/abs/2409.13474v1 ) ライセンス: Link先を確認	Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo,	(参考訳) 機械学習は、特定のトレーニングデータの影響をモデルから効率的に排除することを目的としている。しかし、既存のLarge Language Models (LLMs) の未学習メソッドは、無視セットに関連する応答を抑えるために、負のフィードバックのみに頼っているため、しばしば非感覚的あるいは一貫性のないアウトプットが発生し、モデルの有用性を低下させ、潜在的なプライバシーリスクを生じさせる、という重大な課題に直面している。この制限に対処するため、我々はAltPO(Alternate Preference Optimization)と呼ばれる新しい手法を提案する。また,新たな評価指標を導入し,その評価基準の妥当性を検証した。大規模な実験により、我々のアプローチは効果的なアンラーニングを可能にするだけでなく、全体的なモデル性能を維持しながら、望ましくないモデル動作を避けることができることが示された。 Machine unlearning aims to efficiently eliminate the influence of specific training data, known as the forget set, from the model. However, existing unlearning methods for Large Language Models (LLMs) face a critical challenge: they rely solely on negative feedback to suppress responses related to the forget set, which often results in nonsensical or inconsistent outputs, diminishing model utility and posing potential privacy risks. To address this limitation, we propose a novel approach called Alternate Preference Optimization (AltPO), which combines negative feedback with in-domain positive feedback on the forget set. Additionally, we introduce new evaluation metrics to assess the quality of responses related to the forget set. Extensive experiments show that our approach not only enables effective unlearning but also avoids undesirable model behaviors while maintaining overall model performance.	翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
# PLOT: 部品発見に対応する部分スロットアテンション付きテキストベースの人物検索 PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery ( http://arxiv.org/abs/2409.13475v1 ) ライセンス: Link先を確認	Jicheol Park, Dongwon Kim, Boseung Jeong, Suha Kwak,	(参考訳) 膨大な画像コレクション内の個人を特定するために自由形式のテキストクエリを利用するテキストベースの人物検索は、視覚的およびテキスト的表現、特に人間の部分レベルでの調整において、ユニークな課題を提示する。既存の手法は、直接的な部分レベルの監督やヒューリスティックな特徴への依存が欠如しているため、部分的な特徴抽出とアライメントに苦慮することが多い。本稿では、スロットアテンションに基づく部分発見モジュールを活用して、特異部分をモジュール間で自律的に識別・整列し、明示的な部分レベルの対応監督を伴わずに解釈可能性と検索精度を向上させる新しいフレームワークを提案する。さらに、テキストベースの動的部分注意は各部分の重要性を調整し、検索結果をさらに改善する。提案手法は3つの公開ベンチマークで評価され,既存手法よりも優れていた。 Text-based person search, employing free-form text queries to identify individuals within a vast image collection, presents a unique challenge in aligning visual and textual representations, particularly at the human part level. Existing methods often struggle with part feature extraction and alignment due to the lack of direct part-level supervision and reliance on heuristic features. We propose a novel framework that leverages a part discovery module based on slot attention to autonomously identify and align distinctive parts across modalities, enhancing interpretability and retrieval accuracy without explicit part-level correspondence supervision. Additionally, text-based dynamic part attention adjusts the importance of each part, further improving retrieval outcomes. Our method is evaluated on three public benchmarks, significantly outperforming existing methods.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# 皮膚科医のような説明可能なAIはメラノーマの診断精度を高める:眼球追跡研究 Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: eye-tracking study ( http://arxiv.org/abs/2409.13476v1 ) ライセンス: Link先を確認	Tirtha Chanda, Sarah Haggenmueller, Tabea-Clara Bucher, Tim Holland-Letz, Harald Kittler, Philipp Tschandl, Markus V. Heppt, Carola Berking, Jochen S. Utikal, Bastian Schilling, Claudia Buerger, Cristian Navarrete-Dechent, Matthias Goebeler, Jakob Nikolas Kather, Carolin V. Schneider, Benjamin Durani, Hendrike Durani, Martin Jansen, Juliane Wacker, Joerg Wacker, Reader Study Consortium, Titus J. Brinker,	(参考訳) 人工知能(AI)システムは、皮膚科医のメラノーマの診断精度を大幅に改善し、説明可能なAI(XAI)システムは、臨床医のAIによる決定に対する信頼と信頼をさらに高めた。これらの進歩にもかかわらず、皮膚科医がAIとXAIの両方のツールとどのように関わるかの客観的評価には、依然として重要な必要性がある。そこで本研究では,76名の皮膚科医を対象に,XAIシステムを用いてメラノーマとネビの16例の皮膚内視鏡像の診断を行った。視線追跡技術は、その相互作用を評価するために用いられた。診断性能は、説明的特徴を欠いた標準的なAIシステムと比較された。以上の結果から,XAIシステムは診断精度を標準AIと比較して2.8ポイント向上した。さらに,AI/XAIシステムと複雑な病変との診断上の相違は,視力の増大による認知負荷の増加と関連していた。これらの知見は、臨床実践、視覚タスクのためのAIツールの設計、医療診断におけるXAIの広範な発展に重要な意味を持っている。 Artificial intelligence (AI) systems have substantially improved dermatologists' diagnostic accuracy for melanoma, with explainable AI (XAI) systems further enhancing clinicians' confidence and trust in AI-driven decisions. Despite these advancements, there remains a critical need for objective evaluation of how dermatologists engage with both AI and XAI tools. In this study, 76 dermatologists participated in a reader study, diagnosing 16 dermoscopic images of melanomas and nevi using an XAI system that provides detailed, domain-specific explanations. Eye-tracking technology was employed to assess their interactions. Diagnostic performance was compared with that of a standard AI system lacking explanatory features. Our findings reveal that XAI systems improved balanced diagnostic accuracy by 2.8 percentage points relative to standard AI. 
Moreover, diagnostic disagreements with AI/XAI systems and complex lesions were associated with elevated cognitive load, as evidenced by increased ocular fixations. These insights have significant implications for clinical practice, the design of AI tools for visual tasks, and the broader development of XAI in medical diagnostics.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
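The "balanced diagnostic accuracy" quoted in the entry above is commonly defined as the mean of sensitivity and specificity. The following is a minimal Python sketch of that common definition only; the function name and the confusion-matrix counts are illustrative assumptions, not data or code from the study, and the paper may compute the metric differently.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for a binary diagnosis.

    tp/fn: melanomas labelled correctly / missed (hypothetical counts).
    tn/fp: nevi labelled correctly / flagged as melanoma (hypothetical counts).
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one case")
    sensitivity = tp / (tp + fn)  # fraction of melanomas detected
    specificity = tn / (tn + fp)  # fraction of nevi correctly cleared
    return (sensitivity + specificity) / 2


# Made-up example: 3 of 4 melanomas found, 1 of 2 nevi cleared.
print(balanced_accuracy(tp=3, fn=1, tn=1, fp=1))  # prints 0.625
```

Unlike plain accuracy, this average is not dominated by whichever class has more cases, which is one reason it is often reported for reader studies with uneven class sizes.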
# コンテンツ・スタイルモデリングに基づくガイド付きマルチコントラストMRI再構成のためのプラグ・アンド・プレイ法 A Plug-and-Play Method for Guided Multi-contrast MRI Reconstruction based on Content/Style Modeling ( http://arxiv.org/abs/2409.13477v1 ) ライセンス: Link先を確認	Chinmay Rao, Matthias van Osch, Nicola Pezzotti, Jeroen de Bresser, Laurens Beljaards, Jakob Meineke, Elwin de Weerdt, Huangling Lu, Mariya Doneva, Marius Staring,	(参考訳) 同じ解剖学の複数のMRIコントラストには冗長な情報が含まれているため、アンダーサンプリングされた後続のコントラストの再構築を導くための先行として、1コントラストが使用できる。この目的のために,学習に基づく指導的再構築手法が提案されている。しかし、2つの重要な課題が残っている。 (a) 大規模なペアトレーニングデータセットの要件、(b) モデルの内部表現と共有情報の活用に関する直感的な理解の欠如である。本稿では,これらの課題に対処するため,ガイド付き再構築のためのモジュラー2段階アプローチを提案する。 2コントラスト画像データのコンテンツ/スタイルモデルは、大部分がペアになっていない方法で学習され、その後、反復再構成においてプラグ・アンド・プレイ演算子として適用される。内容とスタイルのアンタングル化は、コントラスト非依存およびコントラスト固有の要因の明示的な表現を可能にする。これに基づいて、事前情報を再構成に組み込むことは、エイリアス化された再構成内容を参照スキャンから派生したクリーンコンテンツで置き換えることに単純化される。この手法をPnP-MUNITと呼ぶ。解釈可能性や収束性といった様々な側面をシミュレーションで調べる。さらに、その実用性はNYU fastMRI DICOMデータセットと2つの社内生データセットで実証され、与えられたSSIMの学習ベースの非誘導的再構成よりも最大32.6%の高速化が得られる。放射線学的な課題として、PnP-MUNITは診断品質における臨床再建よりも33.3%の加速を可能にした。 Since multiple MRI contrasts of the same anatomy contain redundant information, one contrast can be used as a prior for guiding the reconstruction of an undersampled subsequent contrast. To this end, several learning-based guided reconstruction methods have been proposed. However, two key challenges remain: (a) the requirement of large paired training datasets and (b) the lack of intuitive understanding of the model's internal representation and utilization of the shared information. We propose a modular two-stage approach for guided reconstruction, addressing these challenges. A content/style model of two-contrast image data is learned in a largely unpaired manner and is subsequently applied as a plug-and-play operator in iterative reconstruction. The disentanglement of content and style allows explicit representation of contrast-independent and contrast-specific factors. 
Based on this, incorporating prior information into the reconstruction reduces to simply replacing the aliased reconstruction content with clean content derived from the reference scan. We name this novel approach PnP-MUNIT. Various aspects like interpretability and convergence are explored via simulations. Furthermore, its practicality is demonstrated on the NYU fastMRI DICOM dataset and two in-house raw datasets, obtaining up to 32.6% more acceleration over learning-based non-guided reconstruction for a given SSIM. In a radiological task, PnP-MUNIT allowed 33.3% more acceleration over clinical reconstruction at diagnostic quality.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# ディバイドとコンカレントに基づくシンボル脆弱性検出 Divide and Conquer based Symbolic Vulnerability Detection ( http://arxiv.org/abs/2409.13478v1 ) ライセンス: Link先を確認	Christopher Scherb, Luc Bryan Heitz, Hermann Grieder,	(参考訳) 現代のソフトウェア開発では、複雑なソフトウェアシステムのバグや脆弱性が避けられないため、脆弱性検出が不可欠である。テストフェーズにおけるこれらの脆弱性の検出と排除が不可欠である。ファジィングなどの現在の手法はこの目的のために広く用いられている。ファジィングは、ランダムな突然変異や世代を用いて広範囲のバグや脆弱性を特定するのに効率的であるが、脆弱性の正しさや欠如を保証しない。したがって、重要インフラと制御システムの安全性と安全性を確保するために、非ランダムな手法が好ましい。本稿では,各種ソフトウェア脆弱性を特定するために,シンボル実行と制御フローグラフ解析に基づく脆弱性検出手法を提案する。提案手法では,無関係なプログラム情報を排除し,その処理を高速化し,従来のシンボル実行法やモデル検査法と比較して大規模プログラムの解析を可能にする。 In modern software development, vulnerability detection is crucial due to the inevitability of bugs and vulnerabilities in complex software systems. Effective detection and elimination of these vulnerabilities during the testing phase are essential. Current methods, such as fuzzing, are widely used for this purpose. While fuzzing is efficient in identifying a broad range of bugs and vulnerabilities by using random mutations or generations, it does not guarantee correctness or absence of vulnerabilities. Therefore, non-random methods are preferable for ensuring the safety and security of critical infrastructure and control systems. This paper presents a vulnerability detection approach based on symbolic execution and control flow graph analysis to identify various types of software weaknesses. Our approach employs a divide-and-conquer algorithm to eliminate irrelevant program information, thus accelerating the process and enabling the analysis of larger programs compared to traditional symbolic execution and model checking methods.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# Invertible ResNets for Inverse Imaging Problems: Competitive Performance with Provable Regularization Properties (特集:情報ネットワーク) Invertible ResNets for Inverse Imaging Problems: Competitive Performance with Provable Regularization Properties ( http://arxiv.org/abs/2409.13482v1 ) ライセンス: Link先を確認	Clemens Arndt, Judith Nickel,	(参考訳) 学習に基づく手法は、逆問題、特に画像再構成タスクにおいて顕著な性能を示した。彼らの成功にもかかわらず、これらのアプローチは理論的な保証を欠くことが多く、医療画像のようなセンシティブな応用に不可欠である。 Arndt et al (2023 Inverse Problems 39 125018, 2024 Inverse Problems 40 045021) による最近の研究は、非可逆残差ネットワーク (iResNets) に基づくデータ駆動再構築法を解析することによって、このギャップに対処している。彼らは合理的な仮定の下で、このアプローチが収束正則化スキームを構成することを明らかにした。しかし, 再現法の性能は, 学術的な玩具問題や小型のiResNetアーキテクチャでのみ検証された。本研究では,2つの実世界の画像処理タスク(線形ぼやけた演算子と非線形拡散演算子)におけるiResNetsの性能を評価することで,このギャップに対処する。そのため、Arndtらによる理論的結果のいくつかを非線形逆問題を含むように拡張し、大規模高性能iResNetアーキテクチャの設計に対する洞察を提供する。数値実験により,iResNetモデルの性能を最先端のニューラルネットワークと比較し,その有効性を確認した。さらに,本手法の理論的保証を数値的に検討し,ネットワークの可逆性によって学習したフォワード演算子とその学習正規化をより深く解析できることを示す。 Learning-based methods have demonstrated remarkable performance in solving inverse problems, particularly in image reconstruction tasks. Despite their success, these approaches often lack theoretical guarantees, which are crucial in sensitive applications such as medical imaging. Recent works by Arndt et al (2023 Inverse Problems 39 125018, 2024 Inverse Problems 40 045021) addressed this gap by analyzing a data-driven reconstruction method based on invertible residual networks (iResNets). They revealed that, under reasonable assumptions, this approach constitutes a convergent regularization scheme. However, the performance of the reconstruction method was only validated on academic toy problems and small-scale iResNet architectures. In this work, we address this gap by evaluating the performance of iResNets on two real-world imaging tasks: a linear blurring operator and a nonlinear diffusion operator. 
To do so, we extend some of the theoretical results from Arndt et al to encompass nonlinear inverse problems and offer insights for the design of large-scale performant iResNet architectures. Through numerical experiments, we compare the performance of our iResNet models against state-of-the-art neural networks, confirming their efficacy. Additionally, we numerically investigate the theoretical guarantees of this approach and demonstrate how the invertibility of the network enables a deeper analysis of the learned forward operator and its learned regularization.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# 音声によるオープンドメイン質問応答に対する多モーダルDense Retrievalアプローチ A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering ( http://arxiv.org/abs/2409.13483v1 ) ライセンス: Link先を確認	Georgios Sidiropoulos, Evangelos Kanoulas,	(参考訳) 音声インタフェースを介してQAシステムと対話するユーザの増加に伴い、音声ベースのオープンドメイン質問応答(大量のコーパスと音声質問を含むQA)が重要な課題となっている。音声ベースのオープンドメインQAでは,パス検索が重要な課題である。これまでの研究では、高密度テキストレトリバーに入力する前に音声質問を書き起こす自動音声認識(ASR)モデルによるパイプラインを採用していた。このようなパイプラインにはいくつかの制限がある。 ASRモデルの必要性は、アノテートされた音声データを持たない低リソース言語や特殊なドメインに適用性を制限する。さらに、ASRモデルは、そのエラーをレトリバーに伝達する。本研究では、音声質問を直接処理可能な、ASRフリーでエンドツーエンドにトレーニングされた多モーダル高密度検索器を提案することにより、これらの制限を緩和しようとする。以上の結果から,ASRが重要な単語を誤って書き起こした場合や,単語誤り率の高い書き起こしを発生させた場合に,検索性能が向上する可能性が示唆された。 Speech-based open-domain question answering (QA over a large corpus of text passages with spoken questions) has emerged as an important task due to the increasing number of users interacting with QA systems via speech interfaces. Passage retrieval is a key task in speech-based open-domain QA. So far, previous works adopted pipelines consisting of an automatic speech recognition (ASR) model that transcribes the spoken question before feeding it to a dense text retriever. Such pipelines have several limitations. The need for an ASR model limits the applicability to low-resource languages and specialized domains with no annotated speech data. Furthermore, the ASR model propagates its errors to the retriever. In this work, we try to alleviate these limitations by proposing an ASR-free, end-to-end trained multimodal dense retriever that can work directly on spoken questions. Our experimental results showed that, on shorter questions, our retriever is a promising alternative to the ASR and Retriever pipeline, achieving better retrieval performance in cases where ASR would have mistranscribed important words in the question or have produced a transcription with a high word error rate.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
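The word error rate mentioned in the entry above is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. Below is a minimal Python sketch of that conventional definition; the function name and the example question are hypothetical and are not taken from the paper or its evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One mistranscribed key word out of four gives a WER of 0.25, yet it
# can change what a downstream text retriever looks for.
print(word_error_rate("who wrote the iliad", "who wrote the idiot"))  # prints 0.25
```

The example illustrates why a low WER can still hurt retrieval: a single substituted content word is enough to redirect a text-only retriever.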
# 「弁護士は男性である」:LLMによるヒンディー語生成におけるインプシットジェンダーバイアスの検討 'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs ( http://arxiv.org/abs/2409.13484v1 ) ライセンス: Link先を確認	Ishika Joshi, Ishita Gupta, Adrita Dey, Tapan Parikh,	(参考訳) 大きな言語モデル(LLM)は、翻訳、顧客サポート、教育などのタスクのために、様々な言語でテキストを生成するためにますます使われています。これらの進歩にもかかわらず、LLMは英語で顕著なジェンダーバイアスを示しており、ヒンディー語のような比較的表現の浅い言語でコンテンツを生成する際にさらに顕著になる。本研究はヒンディー語のテキスト生成における性差の暗黙的偏見を調査し,それを英語のそれと比較する。我々はWinoBiasにインスパイアされたHindiデータセットを開発し、GPT-4oやClaude-3 sonnetといったモデルからの応答のステレオタイプパターンを調べた。その結果、ヒンディー語では87.8%、英語のGPT-4o世代では33.4%、ヒンディー語では職業、権力階層、社会階級といったジェンダーステレオタイプが多かった。この研究は、言語間での性別バイアスの変化を強調し、生成的AIシステムにおいてこれらのバイアスをナビゲートするための考察を提供する。 Large Language Models (LLMs) are increasingly being used to generate text across various languages, for tasks such as translation, customer support, and education. Despite these advancements, LLMs show notable gender biases in English, which become even more pronounced when generating content in relatively underrepresented languages like Hindi. This study explores implicit gender biases in Hindi text generation and compares them to those in English. We developed Hindi datasets inspired by WinoBias to examine stereotypical patterns in responses from models like GPT-4o and Claude-3 sonnet. Our results reveal a significant gender bias of 87.8% in Hindi, compared to 33.4% in English GPT-4o generation, with Hindi responses frequently relying on gender stereotypes related to occupations, power hierarchies, and social class. This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# 大規模言語モデルにおける心の理論の強化のための制約付き推論チェイン Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models ( http://arxiv.org/abs/2409.13490v1 ) ライセンス: Link先を確認	Zizheng Lin, Chunkit Chan, Yangqiu Song, Xin Liu,	(参考訳) LLM(Large Language Models)が持つ心の理論(Theory-of-Mind, ToM)能力は限定的であることが示されている。 LLMにおけるToMの改善手法の多くはゼロショットプロンプトを採用しており、複雑なToM推論タスクのパフォーマンスの低下や、非ナラティブコンテキストを扱うことができないといった問題に直面している。本稿では、ドメイン知識とToM次元間の因果関係を利用してこれらの制約に対処する、制約付きチェーン・オブ・ToM(CCoToM)というゼロショットプロンプト手法を提案する。具体的には、CCoToM は LLM に対して、まず関連する ToM 次元(例えば、信念)を推論するように促すことにより、明示的な推論連鎖を構築するよう誘導する。その後、CCoToMは、生成された関連ToM次元とそれに対応する因果関係に基づいて、問い合わせされたToM次元を推測するようにLLMに促す。さらに、CCoToMはプロンプトに適応的に制約を課すことで、帰納バイアスを導入し、ToM次元間の一貫性を改善する。物語の他に、CCoToMは会話のような物語的でないコンテキストも扱える。大規模な実験により、CCoToMはすべてのLLMとデータセットに対して、従来の最先端の手法をはるかに上回っていることが示されている。また,CCoToMについてより深い知見を得るため,詳細な分析を行う。コードを公開しました。 Theory-of-Mind (ToM) ability possessed by Large Language Models (LLMs) has been shown to be limited. Most existing methods for improving ToM in LLMs adopt zero-shot prompting, and they face challenges including poor performance in complex ToM reasoning tasks and an inability to handle non-narrative contexts. We propose a zero-shot prompting method named Constrained Chain-of-ToM (CCoToM) that leverages domain knowledge and the causal relations between ToM dimensions to address these limitations. Specifically, CCoToM guides LLMs to construct explicit reasoning chains by first prompting LLMs to infer related ToM dimensions (e.g., belief). Afterward, CCoToM prompts LLMs to infer the queried ToM dimension based on the generated related ToM dimensions and corresponding causal relations. Additionally, CCoToM adaptively imposes constraints on prompts to introduce inductive biases and improve consistency between ToM dimensions. Besides narratives, CCoToM can also handle non-narrative contexts like conversations. Extensive experiments show that CCoToM consistently outperforms previous state-of-the-art methods by large margins across all LLMs and datasets used. 
We also conduct in-depth analyses to gain deeper insights into CCoToM. We have made our code publicly available.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# DAP-LED:CLIPによる低照度化と劣化の学習 DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring ( http://arxiv.org/abs/2409.13496v1 ) ライセンス: Link先を確認	Ling Wang, Chen Wu, Lin Wang,	(参考訳) 自律走行車やロボットは、低照度とRGBカメラの長時間露光による動きのぼけのため、夜間に信頼できる視覚認識に苦しむことが多い。既存の手法はこの課題に対処するため、既訓練の低照度エンハンスメントとデブラーリングモデルを順次接続する。残念なことに、これらの手法は、過剰に露光された領域における顕著なアーティファクト(例えば、色歪み)を引き起こすか、暗黒領域の運動キューをほとんど学習できなくする。本稿では,視覚言語モデル、例えばCLIP(Contrastive Language-Image Pretraining)が,夜間における多様な劣化レベルを包括的に知覚できることを示す。そこで本研究では,低照度強調とぼけ除去を共同で実現し,深度推定,セグメンテーション,暗所での検出といった下流作業に役立てる,トランスフォーマーを用いた新しい共同学習フレームワーク DAP-LED を提案する。重要な洞察は、CLIPを活用して、夜間に画像から劣化レベルを適応的に学習することだ。これにより、統合タスクの最適化のためのリッチな意味情報と視覚的表現を学習することができる。これを実現するために、まずCLIP誘導クロスフュージョンモジュールを導入し、画像埋め込みからマルチスケールのパッチワイズ劣化ヒートマップを得る。ヒートマップは設計したCLIP拡張変換器ブロックを介して融合され、効果的なモデル最適化のための有用な劣化情報を保持する。実験の結果,既存の手法と比較して,DAP-LEDは暗黒環境での最先端性能を実現していることがわかった。一方、強化された結果は3つの下流タスクに有効であることが示されている。デモやその他の結果については、プロジェクトページを参照してほしい。 Autonomous vehicles and robots often struggle with reliable visual perception at night due to the low illumination and motion blur caused by the long exposure time of RGB cameras. Existing methods address this challenge by sequentially connecting the off-the-shelf pretrained low-light enhancement and deblurring models. Unfortunately, these methods often lead to noticeable artifacts (e.g., color distortions) in the over-exposed regions or make it nearly impossible to learn the motion cues of the dark regions. In this paper, we find that vision-language models, e.g., Contrastive Language-Image Pretraining (CLIP), can comprehensively perceive diverse degradation levels at night. In light of this, we propose a novel transformer-based joint learning framework, named DAP-LED, which can jointly achieve low-light enhancement and deblurring, benefiting downstream tasks, such as depth estimation, segmentation, and detection in the dark. The key insight is to leverage CLIP to adaptively learn the degradation levels from images at night. 
This subtly enables learning rich semantic information and visual representation for optimization of the joint tasks. To achieve this, we first introduce a CLIP-guided cross-fusion module to obtain multi-scale patch-wise degradation heatmaps from the image embeddings. Then, the heatmaps are fused via the designed CLIP-enhanced transformer blocks to retain useful degradation information for effective model optimization. Experimental results show that, compared to existing methods, our DAP-LED achieves state-of-the-art performance in the dark. Meanwhile, the enhanced results are demonstrated to be effective for three downstream tasks. For demo and more results, please check the project page: https://vlislab22.github.io/dap-led/.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# ハイパースペクトルイメージングによる画素レベルの物質分類のための深層学習手法 A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging ( http://arxiv.org/abs/2409.13498v1 ) ライセンス: Link先を確認	Savvas Sifnaios, George Arvanitakis, Fotios K. Konstantinidis, Georgios Tsimiklis, Angelos Amditis, Panayiotis Frangos,	(参考訳) コンピュータビジョンの最近の進歩、特に検出、セグメンテーション、分類は、様々な領域に大きな影響を与えている。しかし、これらの進歩はRGBベースのシステムと結びついており、廃棄物の選別、医薬品、防衛といった産業において、形状や色を超えた高度な物体のキャラクタリゼーションが必要とされるには不十分である。ハイパースペクトル(HS)イメージングは、スペクトル情報と空間情報の両方を撮像し、これらの制限に対処し、特に速度、コスト、安全性の点で、X線蛍光やラマン分光のような従来の技術よりも有利である。本研究では,HSイメージングと深層学習を併用した材料評価の可能性について検討した。研究は以下のとおりである。一 HSカメラ、コンベア及び制御照明を備えた実験装置を設計すること。二半自動マスク生成及びラマン分光法によるラベル付けによる各種プラスチック(HDPE、PET、PP、PS)の多目的データセットの作成三画素レベルの物質分類のためのHS画像に基づいて訓練された深層学習モデルを開発すること。このモデルは99.94\%の分類精度を達成し、色、サイズ、形状のばらつきの堅牢性を証明し、材料重なりを効果的に処理した。ブラックオブジェクトの課題のような制限も議論されている。 RGBからHSイメージングへのコンピュータビジョンの拡張は実現可能であり、従来の手法の大きな制限を克服し、将来的な応用の可能性を示している。 Recent advancements in computer vision, particularly in detection, segmentation, and classification, have significantly impacted various domains. However, these advancements are tied to RGB-based systems, which are insufficient for applications in industries like waste sorting, pharmaceuticals, and defense, where advanced object characterization beyond shape or color is necessary. Hyperspectral (HS) imaging, capturing both spectral and spatial information, addresses these limitations and offers advantages over conventional technologies such as X-ray fluorescence and Raman spectroscopy, particularly in terms of speed, cost, and safety. This study evaluates the potential of combining HS imaging with deep learning for material characterization. 
The research involves: i) designing an experimental setup with an HS camera, a conveyor, and controlled lighting; ii) generating a multi-object dataset of various plastics (HDPE, PET, PP, PS) with semi-automated mask generation and Raman spectroscopy-based labeling; and iii) developing a deep learning model trained on HS images for pixel-level material classification. The model achieved 99.94% classification accuracy, demonstrating robustness to variations in color, size, and shape, and effectively handling material overlap. Limitations, such as challenges with black objects, are also discussed. Extending computer vision beyond RGB to HS imaging proves feasible, overcoming major limitations of traditional methods and showing strong potential for future applications.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# HUT: Hadamard Updated Transformationによるより効率的なファインチューニング手法 HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation ( http://arxiv.org/abs/2409.13501v1 ) ライセンス: Link先を確認	Geyuan Zhang, Xiaofei Zhou, Chuheng Chen,	(参考訳) 下流タスクのための微調整済み言語モデルが、NLPで素晴らしい成果を上げている。しかし,モデルパラメータが急速に大きくなるため,全パラメータの微調整は非現実的となる。これを解決するために、パラメータ効率の良いファインチューニング(PEFT)メソッドはパラメータのサブセットだけを更新する。 LoRAのようなほとんどのPEFTメソッドは、元のパラメータに学習された重み行列の増分を含むインクリメンタルアップデートを使用する。有効ではあるが、これらの手法は複雑なパラメータのダイナミックスをキャプチャする際の制限に直面し、元のパラメータと更新されたパラメータの間に強い相関は保たない。これらの課題を克服するために,元のパラメータから更新パラメータへの変換を直接構成する直接更新変換(UT)パラダイムを提案する。このアプローチにより、元のパラメータと更新されたパラメータの相関が保存されることが保証され、事前トレーニング中に学んだ意味的特徴が活用される。このパラダイムに基づいて,Hadamard Updated Transformation (HUT) 法を提案する。 HUTは、2つの低ランク行列でアダマール変換を用いて元の重量行列を効率的に更新し、より表現力が高く柔軟な更新機構を提供する。これによりHUTは、関数変換によってよりリッチなパラメータ機能をキャプチャし、モデル品質を維持したり改善したりしながら、計算の複雑さを低減できる。 RoBERTaおよびGPT-2に関する理論的解析と広範な実験により、HUTの有効性が検証された。その結果,HUTはモデル品質の観点から他のPEFT法と同等以上の性能を示し,計算複雑性を著しく低減した。 Fine-tuning pre-trained language models for downstream tasks has achieved impressive results in NLP. However, fine-tuning all parameters becomes impractical due to the rapidly increasing size of model parameters. To address this, Parameter Efficient Fine-Tuning (PEFT) methods update only a subset of parameters. Most PEFT methods, such as LoRA, use incremental updates, which involve adding learned weight matrix increments to the original parameters. Although effective, these methods face limitations in capturing complex parameter dynamics and do not maintain a strong correlation between the original and updated parameters. To overcome these challenges, we propose the direct Updated Transformation (UT) paradigm, which constructs a transformation directly from the original to the updated parameters. This approach ensures that the correlation between the original and updated parameters is preserved, leveraging the semantic features learned during pre-training. 
Building on this paradigm, we present the Hadamard Updated Transformation (HUT) method. HUT efficiently updates the original weight matrix using the Hadamard transformation with two low-rank matrices, offering a more expressive and flexible update mechanism. This allows HUT to capture richer parameter features through functional transformations, reducing computational complexity while maintaining or improving model quality. Theoretical analysis and extensive experiments on RoBERTa and GPT-2 validate the effectiveness of HUT. Results show that HUT performs on par with or better than other PEFT methods in terms of model quality, while significantly reducing computational complexity.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
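To make the contrast drawn in this abstract concrete, the following is a minimal sketch comparing a LoRA-style additive update with one possible Hadamard-style (elementwise) update built from two low-rank factors. The specific form `W * (1 + A·B)` is an assumption chosen for illustration only; the abstract does not state HUT's exact formula, and the function names `lora_update` and `hadamard_update` are hypothetical.

```python
# Illustrative sketch only: the exact HUT formula is not given in the abstract.
# Assumption: the updated weight is W' = W ⊙ (1 + A·B), with low-rank A (d×r) and B (r×k).

def matmul(a, b):
    """Plain-Python matrix product of a (n×m) and b (m×p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_update(w, a, b):
    """Additive update W' = W + A·B (the incremental style the abstract attributes to LoRA)."""
    ab = matmul(a, b)
    return [[w[i][j] + ab[i][j] for j in range(len(w[0]))] for i in range(len(w))]

def hadamard_update(w, a, b):
    """Assumed multiplicative update W' = W ⊙ (1 + A·B), applied elementwise."""
    ab = matmul(a, b)
    return [[w[i][j] * (1.0 + ab[i][j]) for j in range(len(w[0]))] for i in range(len(w))]

W = [[1.0, -2.0], [0.5, 4.0]]
A = [[0.1], [0.2]]        # rank r = 1
B_zero = [[0.0, 0.0]]     # zero-initialised factor: both updates leave W unchanged
B = [[1.0, -1.0]]

print(lora_update(W, A, B))
print(hadamard_update(W, A, B))
```

In the additive form the increment is independent of `W`, whereas in the assumed elementwise form each updated entry is a rescaling of the original entry, which is one way to read the abstract's point about keeping the updated parameters correlated with the pretrained ones.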
# 音声によるスケッチ:「非現実的」な音のレンダリング Sketching With Your Voice: "Non-Phonorealistic" Rendering of Sounds via Vocal Imitation ( http://arxiv.org/abs/2409.13507v1 ) ライセンス: Link先を確認	Matthew Caren, Kartik Chandra, Joshua B. Tenenbaum, Jonathan Ragan-Kelley, Karima Ma,	(参考訳) 本研究では,人間の声の模倣を自動生成する手法を提案する。まず、人間の声道の模擬モデルから、まずモデルの制御パラメータを調整して声道模倣を試み、その合成音声を聴覚的特徴の観点から対象音と一致させる。そして,人間の直感に合うようにコミュニケーションの認知理論を適用し,人間の話者が聴取者に対して戦略的に判断する方法について考察する。最後に,本手法にこのようなコミュニケーション的推論を加えると,聴覚的特徴のみに適合するよりも人間の直感に適合することを示す実験とユーザスタディについて述べる。この観察はコンピュータグラフィックスにおける描写の研究に幅広い意味を持っている。 We present a method for automatically producing human-like vocal imitations of sounds: the equivalent of "sketching," but for auditory rather than visual representation. Starting with a simulated model of the human vocal tract, we first try generating vocal imitations by tuning the model's control parameters to make the synthesized vocalization match the target sound in terms of perceptually-salient auditory features. Then, to better match human intuitions, we apply a cognitive theory of communication to take into account how human speakers reason strategically about their listeners. Finally, we show through several experiments and user studies that when we add this type of communicative reasoning to our method, it aligns with human intuitions better than matching auditory features alone does. This observation has broad implications for the study of depiction in computer graphics.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# 正規化変分量子イマジナリー時間進化によるシュウィンガーモデルのシミュレーション Simulating the Schwinger Model with a Regularized Variational Quantum Imaginary Time Evolution ( http://arxiv.org/abs/2409.13510v1 ) ライセンス: Link先を確認	Xiao-Wei Li, Fei Li, Jiapei Zhuang, Man-Hong Yung,	(参考訳) シュウィンガーモデル(Schwinger model)は量子色力学(QCD)における非摂動アルゴリズムのテストのベンチマークとして機能し、強い結合状態におけるQCDとの類似性を強調している。しかし、古典的アルゴリズムは「符号問題」や大規模システム処理の難しさなど、シュウィンガーモデルをシミュレートする際の課題に直面する。これらの制限は、障害を克服するために量子コンピューティング技術を含む代替シミュレーションアプローチの探索を動機付けている。シュウィンガーモデルをシミュレートする既存の変分量子アルゴリズム(VQA)は、主に数学的勾配に基づく最適化に依存しており、直感的かつ物理的に誘導された最適化経路を提供しないこともある。対照的に、変分量子イマジナリー時間進化法(VQITE)は、物理的に着想を得た最適化手法を提供する。したがって、VQITEはSchwingerモデルをシミュレートするための強力なツールである。しかし, 標準VQITE法は, 非可逆行列問題に悩まされるため, 十分に安定ではない。この問題に対処するため,我々は正規化VQITE法(regularized-VQITE (rVQITE)) と呼ばれるVQITEの正規化バージョンを提案した。数値シミュレーションにより,提案手法は性能が向上し,他の手法と比較して収束が速いことを示す。我々は、シュウィンガーモデルにおいて様々な物理観測値の位相図をシミュレートするためにrVQITE法を用い、その結果の位相境界は正確な計算手法から得られるものと一致している。 The Schwinger model serves as a benchmark for testing non-perturbative algorithms in quantum chromodynamics (QCD) because it resembles QCD in strong-coupling regimes, primarily through phenomena such as confinement and charge screening. However, classical algorithms encounter challenges when simulating the Schwinger model, such as the "sign problem" and the difficulty in handling large-scale systems. These limitations motivate the exploration of alternative simulation approaches, including quantum computing techniques, to overcome the obstacles. Existing variational quantum algorithm (VQA) methods for simulating the Schwinger model rely primarily on mathematical gradient-based optimization, which sometimes fails to provide intuitive and physically guided optimization pathways. In contrast, the Variational Quantum Imaginary Time Evolution (VQITE) method offers a physically inspired optimization approach. We therefore argue that VQITE is a promising tool for simulating the Schwinger model. 
However, the standard VQITE method is not sufficiently stable, as it encounters difficulties with the non-invertible matrix problem. To address this issue, we propose a regularized version of VQITE, named Regularized-VQITE (rVQITE), which incorporates a truncation-based approach. Through numerical simulations, we demonstrate that the proposed rVQITE approach achieves better performance and converges faster than other related techniques. We employ the rVQITE method to simulate the phase diagrams of various physical observables in the Schwinger model, and the resulting phase boundaries are in agreement with those obtained from an exact computational approach.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
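The "non-invertible matrix problem" can be illustrated with a small sketch. It assumes, as is common in variational imaginary-time evolution, that each parameter update solves a linear system `A x = c` with a symmetric positive semidefinite matrix `A`; the symbols `A`, `c`, and the function `truncated_solve_2x2` are introduced here for illustration. The abstract says only that rVQITE is truncation-based, so discarding eigen-directions below a cutoff is one plausible reading and not necessarily the authors' scheme.

```python
import math

def truncated_solve_2x2(a_mat, c_vec, cutoff=1e-8):
    """Solve A x = c for a symmetric 2x2 matrix A, skipping eigen-directions
    whose eigenvalue is <= cutoff (a truncated pseudo-inverse)."""
    (a, b), (_, d) = a_mat
    if b == 0.0:
        # Already diagonal: eigenvectors are the coordinate axes.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    else:
        mean = 0.5 * (a + d)
        radius = math.hypot(0.5 * (a - d), b)
        pairs = []
        for lam in (mean + radius, mean - radius):
            # (lam - d, b) satisfies (A - lam*I) v = 0 when b != 0.
            vx, vy = lam - d, b
            norm = math.hypot(vx, vy)
            pairs.append((lam, (vx / norm, vy / norm)))
    x = [0.0, 0.0]
    for lam, (vx, vy) in pairs:
        if lam > cutoff:  # truncation: ignore (near-)singular directions
            coeff = (vx * c_vec[0] + vy * c_vec[1]) / lam
            x[0] += coeff * vx
            x[1] += coeff * vy
    return x

# A singular matrix: a plain inverse does not exist, but the truncated solve is well defined.
singular = [[1.0, 1.0], [1.0, 1.0]]
print(truncated_solve_2x2(singular, [2.0, 2.0]))
```

The cutoff trades accuracy for stability: a larger value discards more directions and gives smaller, more conservative parameter steps.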
# ユニバーサル画像検索のための効率的・識別的特徴抽出 Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval ( http://arxiv.org/abs/2409.13513v1 ) ライセンス: Link先を確認	Morris Florek, David Tschirschwitz, Björn Barz, Volker Rodehorst,	(参考訳) 現在の画像検索システムはドメインの特異性や一般化の問題に直面することが多い。本研究の目的は、様々な領域にまたがる強力な意味的イメージ表現を提供する普遍的特徴抽出器のための、計算効率の良いトレーニングフレームワークを開発することにより、これらの制限を克服することである。この目的のために、リソース効率のトレーニングを可能にするM4D-35kと呼ばれるマルチドメイントレーニングデータセットをキュレートしました。さらに、効率的な普遍的特徴抽出に適合するかどうかについて、最先端のビジュアルセマンティック基礎モデルとマージンに基づく距離学習損失関数の広範な評価と比較を行う。制約のある計算資源にもかかわらず、Google Universal Image Embedding Challengeにおいて、mMP@5の0.721で最先端の成果を達成している。これにより、ベストパフォーマンスメソッドのわずか0.7ポイントのリードボードに、私たちのメソッドを第2位に配置します。しかし、我々のモデルは、全体的なパラメータが32%少なく、トレーニング可能なパラメータが289倍少ない。類似の計算条件を持つ手法と比較して,従来の最先端の手法よりも3.3パーセント高い性能を示した。私たちはコードとM4D-35kのトレーニングセットアノテーションをhttps://github.com/morrisfl/UniFExでリリースしています。 Current image retrieval systems often face domain specificity and generalization issues. This study aims to overcome these limitations by developing a computationally efficient training framework for a universal feature extractor that provides strong semantic image representations across various domains. To this end, we curated a multi-domain training dataset, called M4D-35k, which allows for resource-efficient training. Additionally, we conduct an extensive evaluation and comparison of various state-of-the-art visual-semantic foundation models and margin-based metric learning loss functions regarding their suitability for efficient universal feature extraction. Despite constrained computational resources, we achieve near state-of-the-art results on the Google Universal Image Embedding Challenge, with a mMP@5 of 0.721. This places our method at the second rank on the leaderboard, just 0.7 percentage points behind the best performing method. However, our model has 32% fewer overall parameters and 289 times fewer trainable parameters. 
Compared to methods with similar computational requirements, we outperform the previous state of the art by 3.3 percentage points. We release our code and M4D-35k training set annotations at https://github.com/morrisfl/UniFEx.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
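The mMP@5 score quoted above can be sketched as follows. The abstract does not define the metric, so this assumes the challenge's modified mean precision at 5, in which each query is scored over its top `min(n_q, 5)` results, where `n_q` is the number of relevant index images for that query. The function names and the toy data are hypothetical.

```python
def modified_precision_at_k(ranked_relevance, num_relevant, k=5):
    """Precision over the top min(num_relevant, k) retrieved items of one query.

    ranked_relevance: list of 0/1 flags for the retrieved items, best match first.
    num_relevant: how many relevant items exist in the index for this query.
    """
    depth = min(num_relevant, k)
    if depth == 0:
        return 0.0
    return sum(ranked_relevance[:depth]) / depth

def mean_modified_precision_at_k(queries, k=5):
    """Average the per-query score over (ranked_relevance, num_relevant) pairs."""
    return sum(modified_precision_at_k(r, n, k) for r, n in queries) / len(queries)

toy_queries = [
    ([1, 1, 0, 1, 0], 10),  # many relevant items exist: scored over the top 5
    ([1, 0, 0, 0, 0], 2),   # only 2 relevant items exist: scored over the top 2
]
print(mean_modified_precision_at_k(toy_queries))
```

Capping the depth at `n_q` avoids penalising queries that have fewer than five relevant images in the index.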
# Transducer-based ASRのためのAho-Corasickアルゴリズムを用いたLM支援キーワードバイアス LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR ( http://arxiv.org/abs/2409.13514v1 ) ライセンス: Link先を確認	Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Andres Carofilis, Shashi Kumar, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju,	(参考訳) 近年の音声認識におけるエンドツーエンドモデルの成功にもかかわらず、特殊で語彙外な単語の認識や、テキストによる高速なドメイン適応は依然として困難である。特別なエンティティへのバイアスが全体的なパフォーマンスの低下につながることはよくあることです。単語レベルn-gram言語モデルとAho-Corasick文字列マッチングアルゴリズムに基づく浅層融合アプローチを組み合わせ,名前付きエンティティのバイアスリストを組み合わせることで,音声認識性能を向上させるためのライトオンザフライ方式を提案する。 Aho-Corasickアルゴリズムは他の手法よりも効率的であることが証明され、高速な文脈適応が可能となった。 n-gram言語モデル(n-gram language model)は、失敗と出力のアークを持つグラフとして導入され、アーク重みはn-gram確率から適応される。言語モデルは、言語モデルと1つのコンテキストグラフのバイアスエンティティを組み合わせることで、全体的なパフォーマンスを気にするときに、キーワードバイアスの追加サポートとして使用される。我々は、名前付きエンティティや語彙外エンティティのパフォーマンスを含む、4つの言語、2つのパブリック、および1つのプライベートデータセットに関する知見を実証した。逆実時間係数の実用的差のない一般単語誤り率の21.6%の相対的な改善を実現した。 Despite the recent success of end-to-end models for automatic speech recognition, recognizing special rare and out-of-vocabulary words, as well as fast domain adaptation with text, are still challenging. It often happens that biasing to the special entities leads to a degradation in the overall performance. We propose a light on-the-fly method to improve automatic speech recognition performance by combining a bias list of named entities with a word-level n-gram language model with the shallow fusion approach based on the Aho-Corasick string matching algorithm. The Aho-Corasick algorithm has proved to be more efficient than other methods and allows fast context adaptation. An n-gram language model is introduced as a graph with fail and output arcs, where the arc weights are adapted from the n-gram probabilities. The language model is used as an additional support to keyword biasing when the language model is combined with bias entities in a single context graph to take care of the overall performance. 
We demonstrate our findings on 4 languages, 2 public and 1 private datasets including performance on named entities and out-of-vocabulary entities. We achieve up to 21.6% relative improvement in the general word error rate with no practical difference in the inverse real-time factor.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
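For readers unfamiliar with the string-matching step, here is a minimal sketch of the classic Aho-Corasick automaton: a trie with failure links and merged output sets, which finds every keyword occurrence in one pass. It works on characters for brevity and does not model the biasing weights, the n-gram language model arcs, or the shallow fusion with the Transducer described in the abstract; the function names are hypothetical.

```python
from collections import deque

def build_automaton(keywords):
    """Build trie transitions, failure links, and output sets for the keywords."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # keywords recognised on reaching each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    queue = deque(goto[0].values())  # depth-1 states keep the root as failure link
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit keywords that end at the suffix state
    return goto, fail, out

def find_keywords(text, automaton):
    """Return (start_index, keyword) for every occurrence, scanning text once."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in sorted(out[state]):
            hits.append((i - len(word) + 1, word))
    return hits

automaton = build_automaton(["he", "she", "his", "hers"])
print(find_keywords("ushers", automaton))
```

Because the failure links let the scan continue from the longest matching suffix instead of restarting, the matching cost grows with the text length plus the number of matches, which fits the on-the-fly use described in the abstract.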
# 人工衛星と地上の量子ネットワークのための効率的な絡み合いルーティング Efficient Entanglement Routing for Satellite-Aerial-Terrestrial Quantum Networks ( http://arxiv.org/abs/2409.13517v1 ) ライセンス: Link先を確認	Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo,	(参考訳) 6G以降の時代には、宇宙と地上の量子ネットワーク(SATQN)が、グローバルスケールの量子インターネットの未来を形作っている。本稿では, 衛星, 空中, 地上の量子ネットワーク間の協調関係について検討し, 長距離での高忠実な量子絡み合いを効率よく伝達する。まず、既存の衛星、空中、地上の量子ネットワークの概要を概観する。その後、経路選択と絡み合い発生率(PS-EGR)を共同で最適化することにより、量子ネットワークスループットを最大化する目的で、絡み合いルーティング問題に対処する。元の問題は、本質的に難解な混合整数線形プログラミング(MILP)問題として定式化されていることを考慮し、この問題を効率的に解くためにベンダー分解法(BD)ベースのアルゴリズムを提案する。数値計算により,PS-EGR方式の有効性が検証され,システム内の様々な最適化可能な要因について貴重な知見が得られた。最後に, SATQN における今後の研究に向けて, 今後の課題について検討し, 今後の課題を提示する。 In the era of 6G and beyond, space-aerial-terrestrial quantum networks (SATQNs) are shaping the future of the global-scale quantum Internet. This paper investigates the collaboration among satellite, aerial, and terrestrial quantum networks to efficiently transmit high-fidelity quantum entanglements over long distances. We begin with a comprehensive overview of existing satellite-, aerial-, and terrestrial-based quantum networks. Subsequently, we address the entanglement routing problem with the objective of maximizing quantum network throughput by jointly optimizing path selection and entanglement generation rates (PS-EGR). Given that the original problem is formulated as a mixed-integer linear programming (MILP) problem, which is inherently intractable, we propose a Benders' decomposition (BD)-based algorithm to solve the problem efficiently. Numerical results validate the effectiveness of the proposed PS-EGR scheme, offering valuable insights into various optimizable factors within the system. Finally, we discuss the current challenges and propose promising avenues for future research in SATQNs.	翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
# モラル基礎理論と事前学習言語モデル:現状と課題 A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges ( http://arxiv.org/abs/2409.13521v1 ) ライセンス: Link先を確認	Lorenzo Zangari, Candida M. Greco, Davide Picca, Andrea Tagarelli,	(参考訳) 道徳的価値は初期の文明に深く根ざし、社会秩序と共通の善を規制する規範や法則の中で成文化された。人間の行動と文化的指向の心理的基盤を理解する上で重要な役割を担っている。モラル・ファンデーション理論(MFT)は、異なる文化が個人や社会生活を形作る方法の基礎となる道徳的基盤を識別する確立した枠組みである。自然言語処理,特にプレトレーニング言語モデル(PLM)の最近の進歩は,テキストデータから道徳的次元の抽出と分析を可能にしている。本調査では, MFT インフォームド PLM の総合的なレビューを行い, PLM の道徳的傾向とその MFT の文脈における応用について分析した。また、関連するデータセットやレキシコンをレビューし、トレンド、制限、今後の方向性について議論する。 PLMとMFTの交差点の構造的な概要を提供することにより、この研究はPLMの領域内の道徳心理学的洞察を橋渡しし、道徳的に意識されたAIシステムを構築するためのさらなる研究と開発の道を開く。 Moral values have deep roots in early civilizations, codified within norms and laws that regulated societal order and the common good. They play a crucial role in understanding the psychological basis of human behavior and cultural orientation. The Moral Foundation Theory (MFT) is a well-established framework that identifies the core moral foundations underlying the manner in which different cultures shape individual and social lives. Recent advancements in natural language processing, particularly Pre-trained Language Models (PLMs), have enabled the extraction and analysis of moral dimensions from textual data. This survey presents a comprehensive review of MFT-informed PLMs, providing an analysis of moral tendencies in PLMs and their application in the context of the MFT. We also review relevant datasets and lexicons and discuss trends, limitations, and future directions. By providing a structured overview of the intersection between PLMs and MFT, this work bridges moral psychology insights within the realm of PLMs, paving the way for further research and development in creating morally aware AI systems.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# EMMeTT:効率的なマルチモーダル機械翻訳訓練 EMMeTT: Efficient Multimodal Machine Translation Training ( http://arxiv.org/abs/2409.13523v1 ) ライセンス: Link先を確認	Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg,	(参考訳) 基礎言語モデルのモダリティ拡張に対する関心の高まりは、最も効果的で効率的なマルチモーダルトレーニングアプローチに関する議論を保証している。本研究は、ニューラルマシン翻訳(NMT)に焦点を当て、自動音声翻訳(AST)を含む音声-LLMの共同マルチモーダルトレーニングシステムを提案する。本稿では,Canary-1Bの音声エンコーダで拡張されたデコーダのみのGPTとエンコーダ・デコーダT5の2つの基盤モデルアーキテクチャについて検討する。共同マルチモーダルトレーニングを扱うために,EMMeTTと呼ばれる新しいトレーニングフレームワークを提案する。 EMMeTTは、言語、データセット、モダリティ間のバランスの取れたサンプリング、効率的なシーケンシャルなデータイテレーション、バッチサイズオプティマイザ(OOMptimizer)によって補完されるマルチモーダルデータのための新しい2Dバケットスキームによって、トレーニング効率を向上させる。マルチモーダルなトレーニングは、両方のアーキテクチャに一貫して役立ちます。さらに、EMMeTTで訓練されたSALM-T5は、オリジナルのNMT能力を保ちながら、FLORESとFLEURSの4言語サブセット上でASTベースラインを上回っている。結果、多モーダル翻訳モデルでは、強いテキストと音声の翻訳結果を同時に生成する。 A rising interest in the modality extension of foundation language models warrants discussion on the most effective, and efficient, multimodal training approach. This work focuses on neural machine translation (NMT) and proposes a joint multimodal training regime of Speech-LLM to include automatic speech translation (AST). We investigate two different foundation model architectures, decoder-only GPT and encoder-decoder T5, extended with Canary-1B's speech encoder. To handle joint multimodal training, we propose a novel training framework called EMMeTT. EMMeTT improves training efficiency with the following: balanced sampling across languages, datasets, and modalities; efficient sequential data iteration; and a novel 2D bucketing scheme for multimodal data, complemented by a batch size optimizer (OOMptimizer). We show that a multimodal training consistently helps with both architectures. Moreover, SALM-T5 trained with EMMeTT retains the original NMT capability while outperforming AST baselines on four-language subsets of FLORES and FLEURS. 
The resultant Multimodal Translation Model produces strong text and speech translation results at the same time.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
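The 2D bucketing idea can be sketched as below. The abstract names the scheme but gives no details, so this assumes the two dimensions are input length and output length and that each example goes to the smallest bucket that fits it in both dimensions. The function name `bucket_2d` and the edge values are hypothetical.

```python
import bisect

def bucket_2d(examples, in_edges, out_edges):
    """Group (input_len, output_len) pairs by the smallest bucket edges that fit them.

    in_edges and out_edges are sorted lists of maximum lengths per bucket.
    """
    buckets = {}
    for in_len, out_len in examples:
        i = bisect.bisect_left(in_edges, in_len)    # first edge >= in_len
        j = bisect.bisect_left(out_edges, out_len)  # first edge >= out_len
        if i == len(in_edges) or j == len(out_edges):
            continue  # longer than the largest bucket: dropped in this sketch
        buckets.setdefault((in_edges[i], out_edges[j]), []).append((in_len, out_len))
    return buckets

examples = [(5, 3), (10, 8), (12, 4), (25, 2)]
print(bucket_2d(examples, in_edges=[10, 20], out_edges=[4, 8]))
```

Grouping by both lengths keeps padding low on the input and output sides alike, and it means a separate batch size could in principle be chosen per bucket so that every bucket fits in memory.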
# サイバー防衛のためのコンテキストAI - LLMを用いた自動調査 Contextualized AI for Cyber Defense: An Automated Survey using LLMs ( http://arxiv.org/abs/2409.13524v1 ) ライセンス: Link先を確認	Christoforus Yoga Haryanto, Anne Maria Elvira, Trung Duc Nguyen, Minh Hieu Vu, Yoshiano Hartanto, Emily Lomempow, Arathi Arakala,	(参考訳) 本稿では,2015年から2024年にかけてのサイバー防衛能力向上におけるコンテキストAIの可能性について調査する。私たちは、組織的信頼とガバナンスフレームワークのギャップを指摘しながら、堅牢性、信頼性、統合方法に重点を置いています。文献調査手法として, (A) ChatGPT 4 と (B) Gemma 2:9b を用いた。学術研究にLLMを使うことの有効性と課題について論じ,今後の研究者に洞察を提供する。 This paper surveys the potential of contextualized AI in enhancing cyber defense capabilities, revealing significant research growth from 2015 to 2024. We identify a focus on robustness, reliability, and integration methods, while noting gaps in organizational trust and governance frameworks. Our study employs two LLM-assisted literature survey methodologies: (A) ChatGPT 4 for exploration, and (B) Gemma 2:9b for filtering with Claude 3.5 Sonnet for full-text analysis. We discuss the effectiveness and challenges of using LLMs in academic research, providing insights for future researchers.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 時系列基礎モデルに向けて Towards Long-Context Time Series Foundation Models ( http://arxiv.org/abs/2409.13530v1 ) ライセンス: Link先を確認	Nina Żukowska, Mononito Goswami, Michał Wiliński, Willa Potosnak, Artur Dubrawski,	(参考訳) 時系列基礎モデルは、ゼロショットの設定であっても、幅広い領域にわたる様々なタスクにおいて印象的なパフォーマンスを示している。しかし、これらのモデルのほとんどは短い単変量時系列を入力として扱うように設計されている。これは、特に、時間的および変数内依存関係の強い長い多変量データを扱う医療のような分野において、実用的使用を制限する。本研究は,言語ドメインと時系列ドメインの両方から,様々なコンテキスト拡張手法をカタログ化し,体系的に比較し,エンコーダのみのTSFMが変数間の依存性を効果的にモデル化できるようにするための,新しい圧縮メモリ機構を導入することで,このギャップを埋めるものである。我々は,近年のマルチタスク時系列基盤モデルであるMOMENTを多変量文脈で導入することで,このアプローチの利点を実証する。 Time series foundation models have shown impressive performance on a variety of tasks, across a wide range of domains, even in zero-shot settings. However, most of these models are designed to handle short univariate time series as an input. This limits their practical use, especially in domains such as healthcare with copious amounts of long and multivariate data with strong temporal and intra-variate dependencies. Our study bridges this gap by cataloging and systematically comparing various context expansion techniques from both language and time series domains, and introducing a novel compressive memory mechanism to allow encoder-only TSFMs to effectively model intra-variate dependencies. We demonstrate the benefits of our approach by imbuing MOMENT, a recent family of multi-task time series foundation models, with the multivariate context.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# ロボットがどう動くか予測する高レベルパターン Using High-Level Patterns to Estimate How Humans Predict a Robot will Behave ( http://arxiv.org/abs/2409.13533v1 ) ライセンス: Link先を確認	Sagar Parekh, Lauren Bramblett, Nicola Bezzo, Dylan P. Losey,	(参考訳) ロボットと対話する人間は、ロボットが次に何をするかを予測する。例えば、最近の自動運転車の行動に基づいて、近くの人間のドライバーは、車が同じ車線に留まっていると予測するかもしれない。ロボットが人間の安全でシームレスな相互作用の予測を理解することは重要である。例えば、自動運転車が人間をマージしていないと認識しているなら、自動運転車は実際にマージを意図している。従来の研究は、人間がロボットの振る舞いを正確に予測していると仮定していた。しかし、人間と人間の予測に関する最近の研究は、人間は高いレベルの振る舞いを予測することによって、他のエージェントを近似する傾向があることを示唆している。この発見を,ロボットが人間の行動を予測する方法を推定する2階のマインド・アプローチの開発に応用する。データから直接これらの高いレベルの予測を抽出するために、最近の人間とロボットの軌道を離散的な潜在空間に埋め込む。この潜伏空間の各要素は、異なる種類の振舞い(例えば、人間の前にマージし、同じ車線に残る)を捉え、下層の振舞いと整合した状態空間のベクトル場にデコードする。ロボット行動の高レベルおよびコース予測は実際の人間の予測と一致すると仮定する。本稿では,この仮説を支持するための最初の証拠を概念実証ユーザスタディを通じて提示する。 A human interacting with a robot often forms predictions of what the robot will do next. For instance, based on the recent behavior of an autonomous car, a nearby human driver might predict that the car is going to remain in the same lane. It is important for the robot to understand the human's prediction for safe and seamless interaction: e.g., if the autonomous car knows the human thinks it is not merging -- but the autonomous car actually intends to merge -- then the car can adjust its behavior to prevent an accident. Prior works typically assume that humans make precise predictions of robot behavior. However, recent research on human-human prediction suggests the opposite: humans tend to approximate other agents by predicting their high-level behaviors. We apply this finding to develop a second-order theory of mind approach that enables robots to estimate how humans predict they will behave. To extract these high-level predictions directly from data, we embed the recent human and robot trajectories into a discrete latent space. 
Each element of this latent space captures a different type of behavior (e.g., merging in front of the human, remaining in the same lane) and decodes into a vector field across the state space that is consistent with the underlying behavior type. We hypothesize that our resulting high-level and coarse predictions of robot behavior will correspond to actual human predictions. We provide initial evidence in support of this hypothesis through a proof-of-concept user study.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
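One common way to obtain a discrete latent space is to snap a continuous embedding to its nearest entry in a learned codebook. The sketch below assumes that mechanism for illustration; the abstract does not say how the discretization is done, and the codebook values, behavior labels, and function name are hypothetical.

```python
def nearest_code(embedding, codebook):
    """Return the index of the codebook vector closest to the embedding
    (squared Euclidean distance)."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(embedding, codebook[k]))

# Hypothetical codebook: each entry stands for one high-level behavior type.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
labels = ["stay in lane", "merge in front", "merge behind"]

trajectory_embedding = [0.9, 1.2]  # stand-in for an encoder's output
print(labels[nearest_code(trajectory_embedding, codebook)])
```

Each index then plays the role of one behavior type, which a decoder could map to a vector field over the state space as the abstract describes.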
# フォーミュラ・スーパービジョンによる視覚幾何学的事前学習 Formula-Supervised Visual-Geometric Pre-training ( http://arxiv.org/abs/2409.13535v1 ) ライセンス: Link先を確認	Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh,	(参考訳) コンピュータビジョンの歴史を通じて、画像(視覚)と点雲(幾何学)の統合を研究してきたが、画像と3Dオブジェクト認識の進歩は、これらのモダリティを別々に処理する傾向にある。我々は、この分割を統一トランスモデル上に画像と点雲を統合することで橋渡しすることを目指している。このアプローチは画像と点雲のモジュラリティ固有の特性を統合し、画像における基本的な下流タスクと、視覚幾何学的表現を学習することで、統一トランスフォーマーモデル上での3次元オブジェクト認識を実現する。本研究では,FSVGP (Formula-Supervised Visual-Geometric Pre-training) について述べる。相互モダリティの監督を通じて,視覚的モダリティと幾何学的モダリティの間の教師付き事前学習を可能にする。 FSVGPはまた、実際のデータ収集、モダリティ間のアライメント、人間のアノテーションへの依存を減らす。実験の結果,FSVGPは画像と3Dオブジェクトの分類,検出,セグメンテーションの6つのタスクで,VisualAtomやPC-FractalDBよりも効果的に事前トレーニングを行うことがわかった。これらの成果は、画像および3次元物体認識におけるFSVGPの優れた一般化を示し、視覚幾何学的表現学習における合成事前学習の可能性を強調している。プロジェクトのWebサイトはhttps://ryosuke-yamada.github.io/fdsl-fsvgp/で公開されている。 Throughout the history of computer vision, while research has explored the integration of images (visual) and point clouds (geometric), many advancements in image and 3D object recognition have tended to process these modalities separately. We aim to bridge this divide by integrating images and point clouds on a unified transformer model. This approach integrates the modality-specific properties of images and point clouds and achieves fundamental downstream tasks in image and 3D object recognition on a unified transformer model by learning visual-geometric representations. In this work, we introduce Formula-Supervised Visual-Geometric Pre-training (FSVGP), a novel synthetic pre-training method that automatically generates aligned synthetic images and point clouds from mathematical formulas. Through cross-modality supervision, we enable supervised pre-training between visual and geometric modalities. FSVGP also reduces reliance on real data collection, cross-modality alignment, and human annotation. 
Our experimental results show that FSVGP pre-trains more effectively than VisualAtom and PC-FractalDB across six tasks: image and 3D object classification, detection, and segmentation. These achievements demonstrate FSVGP's superior generalization in image and 3D object recognition and underscore the potential of synthetic pre-training in visual-geometric representation learning. Our project website is available at https://ryosuke-yamada.github.io/fdsl-fsvgp/.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# ShizishanGPT: ツールとリソースを統合する農業用大規模言語モデル ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources ( http://arxiv.org/abs/2409.13537v1 ) ライセンス: Link先を確認	Shuting Yang, Zehui Liu, Wolfgang Mayer,	(参考訳) 大規模言語モデル(LLM)の最近の発展は、複雑な問合せを扱う知的対話システムの能力を大幅に向上させた。しかし、現在のLLMは、特に農業のような技術分野において、専門分野の知識に制限を課している。この問題に対処するため,我々は,Retrieval Augmented Generation(RAG)フレームワークとエージェントアーキテクチャに基づく農業用知的質問応答システムであるShizishanGPTを提案する。シジシャンGPTは、一般的な質問に答える汎用的なGPT-4ベースのモジュール、大言語モデルの知識をタイムリーに更新できない問題に補償する検索エンジンモジュール、ドメイン事実を提供する農業知識グラフモジュール、ドメイン知識を補うためにRAGを使用する検索モジュール、作物の表現型予測、遺伝子発現解析などの特殊なモデルを実行する農業エージェントモジュールを含む5つの主要なモジュールから構成されている。本研究に特化して設計された100の農業問題を含むデータセットを用いてシジシャンGPTを評価した。実験の結果,このツールはモジュール設計と異なるドメイン知識ソースの統合により,より正確かつ詳細な回答を提供するため,一般のLLMよりも優れていた。ソースコード、データセット、モデルウェイトはhttps://github.com/Zaiwen/CropGPTで公開されています。 Recent developments in large language models (LLMs) have led to significant improvements in intelligent dialogue systems' ability to handle complex inquiries. However, current LLMs still exhibit limitations in specialized domain knowledge, particularly in technical fields such as agriculture. To address this problem, we propose ShizishanGPT, an intelligent question answering system for agriculture based on the Retrieval Augmented Generation (RAG) framework and agent architecture. ShizishanGPT consists of five key modules: a generic GPT-4 based module for answering general questions; a search engine module that compensates for the fact that the large language model's own knowledge cannot be updated in a timely manner; an agricultural knowledge graph module for providing domain facts; a retrieval module which uses RAG to supplement domain knowledge; and an agricultural agent module, which invokes specialized models for crop phenotype prediction, gene expression analysis, and so on. We evaluated ShizishanGPT using a dataset containing 100 agricultural questions specially designed for this study. 
The experimental results show that the tool significantly outperforms general LLMs as it provides more accurate and detailed answers due to its modular design and integration of different domain knowledge sources. Our source code, dataset, and model weights are publicly available at https://github.com/Zaiwen/CropGPT.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 第2回パーセプションテストチャレンジのマルチ選択ビデオQAトラックへの第1位ソリューション First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge ( http://arxiv.org/abs/2409.13538v1 ) ライセンス: Link先を確認	Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang,	(参考訳) 本稿では,第2回知覚テストチャレンジの多目的ビデオ質問回答(Multiple-choice Video Question Answering, QA)トラックに対する第1位ソリューションについて述べる。このコンペティションは複雑なビデオ理解の課題を提起し、ビデオコンテンツに関する質問を正確に理解し答えるモデルを必要とした。この課題に対処するために、我々は強力なQwenVL2 (7B)モデルを活用し、提供されたトレーニングセットで微調整しました。さらに、私たちはパフォーマンスを高めるためにモデルアンサンブル戦略とテスト時間拡張を採用しました。連続最適化により,本手法はリーダボード上でのTop-1精度0.7647を達成した。 In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we leveraged the powerful QwenVL2 (7B) model and fine-tuned it on the provided training set. Additionally, we employed model ensemble strategies and Test Time Augmentation to boost performance. Through continuous optimization, our approach achieved a Top-1 Accuracy of 0.7647 on the leaderboard.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
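Model ensembling and test-time augmentation for a multiple-choice task both reduce to combining several probability vectors over the answer options. The report does not say how the outputs are combined, so the sketch below assumes plain averaging followed by an argmax; the function name and the example numbers are hypothetical.

```python
def ensemble_predict(prob_runs):
    """Average per-option probabilities over runs and pick the best option.

    prob_runs: one probability list per model or augmented view, all the same length.
    Returns (index_of_best_option, averaged_probabilities).
    """
    n_options = len(prob_runs[0])
    avg = [sum(run[k] for run in prob_runs) / len(prob_runs) for k in range(n_options)]
    best = max(range(n_options), key=avg.__getitem__)
    return best, avg

# Three runs over one question with three answer options (made-up numbers).
runs = [
    [0.5, 0.3, 0.2],  # e.g. model A on the original clip
    [0.2, 0.5, 0.3],  # e.g. model A on an augmented clip
    [0.1, 0.6, 0.3],  # e.g. model B on the original clip
]
best_option, averaged = ensemble_predict(runs)
print(best_option, averaged)
```

Here the first run alone would choose option 0, while the averaged scores favour option 1, which is the kind of disagreement that averaging is meant to smooth out.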
# FullAnno:MLLMの画像理解を強化するデータエンジン FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs ( http://arxiv.org/abs/2409.13540v1 ) ライセンス: Link先を確認	Jing Hao, Yuxiang Zhao, Song Chen, Yanpeng Sun, Qiang Chen, Gang Zhang, Kun Yao, Errui Ding, Jingdong Wang,	(参考訳) MLLM(Multimodal Large Language Models)は、その強力な推論と一般化機能を備えた幅広い視覚言語タスクにおいて、有望であることを示す。しかし、それらはSupervised Fine-Tuning (SFT) フェーズの高品質なデータに大きく依存している。既存のアプローチは、GPT-4Vによる高品質なデータのキュレートを目標としているが、GPT-4Vの商業的性質と、モデルを指示するために使用するプロンプトの単純さのため、スケーラビリティが低い。そこで我々は,オブジェクトのカテゴリと位置,地域記述,テキスト情報,および画像の高密度キャプションからなる,大規模で高品質できめ細かい画像アノテーションを生成可能なデータエンジンであるFullAnnoシステムを開発した。このエンジンは、複数の専門家モデルを含むカスケードアノテーションプロセスで特徴付けられ、濃密な画像キャプションを生成するためにLSMを指示するためにリッチなプロンプトを使用する。我々は、FullAnnoシステムを用いてCOCOおよびVisual Genomeデータセットを再注釈し、オブジェクトアノテーションの数を3倍にし、元の画像キャプションの長さを15倍にした。実験により、再生したアノテーションは、複数のベンチマークでLLaVA-v1.5の能力を著しく向上できることが示された。再注釈されたデータは、https://arcana-project-page.github.ioで入手できる。 Multimodal Large Language Models (MLLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they heavily depend on high-quality data in the Supervised Fine-Tuning (SFT) phase. Existing approaches aim to curate high-quality data via GPT-4V, but they are not scalable due to the commercial nature of GPT-4V and the simplicity of the prompts used to instruct the model. To this end, we devised the FullAnno system, which is a data engine that can generate large-scale, high-quality, and fine-grained image annotations consisting of the category and position of objects, region descriptions, text information, as well as image dense captions. This engine is characterized by its cascade annotation process, which involves multiple expert models and employs rich prompts to instruct LLMs in generating dense image captions. 
We re-annotated the COCO and Visual Genome datasets using our FullAnno system, tripling the number of object annotations and increasing the length of the original image captions by a factor of 15. Experiments show that the regenerated annotation can significantly enhance the capabilities of LLaVA-v1.5 on several benchmarks. The re-annotated data are available at: https://arcana-project-page.github.io	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 融合と流れ:フォトニックグラフ状態を確実に構築するための正式なプロトコル Fusion and flow: formal protocols to reliably build photonic graph states ( http://arxiv.org/abs/2409.13541v1 ) ライセンス: Link先を確認	Giovanni de Felice, Boldizsár Poór, Lia Yeh, William Cashman,	(参考訳) Photonicsは、計測ベースの量子コンピューティングの実装のための有望なプラットフォームを提供する。最近提案されたフュージョンベースのアーキテクチャは、普遍性とフォールトトレランスを達成することを目的としている。これらの手法では、資源グラフ状態上で核融合と単一ビット計測を行うことにより計算を行う。これらのアーキテクチャの検証には、線形代数的、確率的、制御フロー構造を統一形式言語で結合する必要がある。本稿では,線形光学,ZX計算,データフロープログラミングを融合して,フォトニック量子コンピューティングのためのフレームワークを開発する。パウリの誤差を誘発する核融合測定を特徴付けるとともに、核融合ネットワークのための新しい流れ構造を用いて補正可能であることを示す。任意の核融合を実現するための新しい再帰的・再帰的プロトコルの正しさを証明し、光子源が絡み合った線形光学系に対する普遍性のグラフ理論的証明を提供する。提案するフレームワークは、フォトニック量子コンピューティングのためのコンパイルアルゴリズムの開発方法である。 Photonics offers a promising platform for implementations of measurement-based quantum computing. Recently proposed fusion-based architectures aim to achieve universality and fault-tolerance. In these approaches, computation is carried out by performing fusion and single-qubit measurements on a resource graph state. The verification of these architectures requires linear algebraic, probabilistic, and control flow structures to be combined in a unified formal language. This paper develops a framework for photonic quantum computing by bringing together linear optics, ZX calculus, and dataflow programming. We characterize fusion measurements that induce Pauli errors and show that they are correctable using a novel flow structure for fusion networks. We prove the correctness of new repeat-until-success protocols for the realization of arbitrary fusions and provide a graph-theoretic proof of universality for linear optics with entangled photon sources. The proposed framework paves the way for the development of compilation algorithms for photonic quantum computing.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 半教師付きノード分類のためのグラフ類似性正則化ソフトマックス Graph Similarity Regularized Softmax for Semi-Supervised Node Classification ( http://arxiv.org/abs/2409.13544v1 ) ライセンス: Link先を確認	Yiming Yang, Jun Liu, Wei Wan,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データのために設計された強力なディープラーニングモデルであり、幅広い応用で有効性を示している。ソフトマックス関数は半教師付きノード分類で最も一般的に用いられる分類器である。しかし、ソフトマックス関数はグラフ構造の空間情報を欠いている。本稿では,半教師付きノード分類におけるGNNのためのグラフ類似性正則化ソフトマックスを提案する。非局所的全変動(TV)正則化をソフトマックス活性化関数に組み込むことで、グラフ固有の空間情報をより効果的に捉えることができる。非局所勾配と発散作用素の重みはグラフの隣接行列に基づいて決定される。本稿では,提案手法をGCNとGraphSAGEのアーキテクチャに適用し,それぞれを引用データセットとWebページリンクデータセット上でテストする。数値実験はノード分類と一般化能力において良好な性能を示す。これらの結果は、グラフ類似性正則化ソフトマックスが、同類選択的(assortative)グラフと異類選択的(disassortative)グラフの両方に有効であることを示している。 Graph Neural Networks (GNNs) are powerful deep learning models designed for graph-structured data, demonstrating effectiveness across a wide range of applications. The softmax function is the most commonly used classifier for semi-supervised node classification. However, the softmax function lacks spatial information of the graph structure. In this paper, we propose a graph similarity regularized softmax for GNNs in semi-supervised node classification. By incorporating non-local total variation (TV) regularization into the softmax activation function, we can more effectively capture the spatial information inherent in graphs. The weights in the non-local gradient and divergence operators are determined based on the graph's adjacency matrix. We apply the proposed method to the architectures of GCN and GraphSAGE, testing them on citation and webpage linking datasets, respectively. Numerical experiments demonstrate its good performance in node classification and generalization capabilities. These results indicate that the graph similarity regularized softmax is effective on both assortative and disassortative graphs.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 分割に基づくランダム化平滑化による認証付き敵対的ロバスト性 Certified Adversarial Robustness via Partition-based Randomized Smoothing ( http://arxiv.org/abs/2409.13546v1 ) ライセンス: Link先を確認	Hossein Goli, Farzan Farnia,	(参考訳) ディープニューラルネットワーク分類器を信頼性高く応用するには、敵対的摂動に対するロバスト性証明が必要である。ガウス平滑化は、ノルム有界摂動に対するロバスト性を認証するための広く分析されたアプローチであり、認証される予測半径はガウスノイズの分散と、加法的ガウスノイズの下でのニューラルネットの予測の信頼度に依存する。しかし、高次元画像データセットに適用した場合、高分散のガウスノイズが画像の視認性を著しく損なうため、素のガウス平滑化の認証半径は比較的小さくなる可能性がある。本稿では,ニューラルネットの信頼度スコアを高め,それによって認証予測のロバスト性半径を拡大するために,Pixel Partitioningに基づくランダム化平滑化(PPRS)手法を提案する。提案するPPRSアルゴリズムは,加法的ガウスノイズ下での画像の視認性を向上させることを示す。本稿では,標準的なコンピュータビジョンデータセットとニューラルネットワークアーキテクチャにPPRSを適用した数値結果について論じる。実験の結果,ランダム化平滑化における加法的ガウスノイズに対する予測モデルの認証精度と安定性が大幅に向上することが示された。 A reliable application of deep neural network classifiers requires robustness certificates against adversarial perturbations. Gaussian smoothing is a widely analyzed approach to certifying robustness against norm-bounded perturbations, where the certified prediction radius depends on the variance of the Gaussian noise and the confidence level of the neural net's prediction under the additive Gaussian noise. However, in application to high-dimensional image datasets, the certified radius of the plain Gaussian smoothing could be relatively small, since Gaussian noise with high variances can significantly harm the visibility of an image. In this work, we propose the Pixel Partitioning-based Randomized Smoothing (PPRS) methodology to boost the neural net's confidence score and thus the robustness radius of the certified prediction. We demonstrate that the proposed PPRS algorithm improves the visibility of the images under additive Gaussian noise. We discuss the numerical results of applying PPRS to standard computer vision datasets and neural network architectures. Our empirical findings indicate a considerable improvement in the certified accuracy and stability of the prediction model to the additive Gaussian noise in randomized smoothing.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 自由四元数選択によるパラメータ化制御ゲートの最適化 Optimizing a parameterized controlled gate with Free Quaternion Selection ( http://arxiv.org/abs/2409.13547v1 ) ライセンス: Link先を確認	Hiroyoshi Kurogi, Katsuhiro Endo, Yuki Sato, Michihiko Sugawara, Kaito Wada, Kenji Sugisaki, Shu Kanno, Hiroshi C. Watanabe, Haruyuki Nakano,	(参考訳) 変分アルゴリズムでは、量子回路は伝統的に単一量子ビットゲートに対してパラメータ化される。本研究では、一般化された制御ゲートをパラメータ化し、コスト値の局所最小化に最適なパラメータを推定するアルゴリズムを提案する。提案手法は,Isingおよび分子ハミルトニアンの変分量子固有解法(VQE),フィデリティ最大化のための変分量子アルゴリズム(VQA),時間発展演算子のユニタリコンパイルなど,様々な最適化問題に適用する。提案手法は,他の手法よりも浅い回路で効率よく最適化し,高い表現性を示す。さらに, この手法は, 化学系応用において要求される粒子数保存ゲートを一般化し, 完全に最適化することができる。この特性を利用して、分子ハミルトニアンの時間発展作用素を実際に近似し、トロッター分解による標準実装と比較して浅い回路で力学をシミュレートした。 In variational algorithms, quantum circuits are conventionally parametrized with respect to single-qubit gates. In this study, we parameterize a generalized controlled gate and propose an algorithm to estimate the optimal parameters for locally minimizing the cost value, where we extend the free quaternion selection method, an optimization method for a single-qubit gate. To benchmark the performance, we apply the proposed method to various optimization problems, including the Variational Quantum Eigensolver (VQE) for Ising and molecular Hamiltonians, Variational Quantum Algorithms (VQA) for fidelity maximization, and unitary compilation of time evolution operators. In these applications, the proposed method shows efficient optimization and greater expressibility with shallower circuits than other methods. Furthermore, this method is also capable of generalizing and fully optimizing particle-number-conserving gates, which are in demand in chemical systems applications. Taking advantage of this property, we have actually approximated time evolution operators of molecular Hamiltonian and simulated the dynamics with shallower circuits in comparison to the standard implementation by Trotter decomposition.	翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
# 計算ノートにおけるコンテクスト化されたデータ記述コード生成 Contextualized Data-Wrangling Code Generation in Computational Notebooks ( http://arxiv.org/abs/2409.13551v1 ) ライセンス: Link先を確認	Junjie Huang, Daya Guo, Chenglong Wang, Jiazhen Gu, Shuai Lu, Jeevana Priya Inala, Cong Yan, Jianfeng Gao, Nan Duan, Michael R. Lyu,	(参考訳) データラングリングは、計算ノートブックのさらなる分析のために生データを準備するプロセスであり、データサイエンスにおいて不可欠だが時間を要するステップである。コード生成は、ユーザ意図を実行可能なコードに変換することによって、アナリストのオーバーヘッドを削減するために、データラングリングプロセスを自動化する可能性がある。正確なコードラングリングデータの生成は、テキストコンテキスト、コードコンテキスト、データコンテキストなど、ノートブックに存在するリッチコンテキストの包括的な考慮を必要とする。しかし、ノートブックはしばしば複数の非線形解析タスクを線形コードブロックのシーケンスにインターリーブする。ソースコードブロックでモデルを直接トレーニングするのは、正確なラングリングコード生成のためにコンテキストを完全に活用するのに失敗する。このギャップを埋めるために、コード生成タスクを乱すデータモデルのトレーニングを支援するために、明確でリッチなコンテキストで高品質なデータセットを構築することを目的としています。本研究では,まず,マルチモーダルなコンテキスト依存を明確化したデータラングリングコード生成例を抽出するための自動アプローチであるCoCoMineを提案する。最初はデータフロー分析を採用して、データラングリングコードを含むコードブロックを識別する。次にCoCoMineは、ノートブックのトレースと再生を通じて、コンテキスト化されたデータラングリングコード例を抽出する。 CoCoMineでは、Notebooksでコンテキスト化されたデータラングリングコード生成のための58,221のサンプルを含むデータセットであるCoCoNoteを構築している。データセットの有効性を示すため、トレーニング済みのコードモデルの範囲を微調整し、タスク上で様々な大きな言語モデルを促す。さらに、コード生成を強化するために、データコンテキストとコード/テキストコンテキストを別々にエンコードするDataCoderを提案する。実験結果から,データラングリングコード生成にデータコンテキストを組み込むことの重要性と,本モデルの有効性が示された。コードとデータは url でリリースします。 Data wrangling, the process of preparing raw data for further analysis in computational notebooks, is a crucial yet time-consuming step in data science. Code generation has the potential to automate the data wrangling process to reduce analysts' overhead by translating user intents into executable code. Precisely generating data wrangling code necessitates a comprehensive consideration of the rich context present in notebooks, including textual context, code context and data context. However, notebooks often interleave multiple non-linear analysis tasks into linear sequence of code blocks, where the contextual dependencies are not clearly reflected. 
Directly training models with source code blocks fails to fully exploit the contexts for accurate wrangling code generation. To bridge the gap, we aim to construct a high-quality dataset with clear and rich contexts to help train models for data-wrangling code generation tasks. In this work, we first propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependency. It first adopts data flow analysis to identify the code blocks containing data-wrangling code. Then, CoCoMine extracts the contextualized data-wrangling code examples through tracing and replaying notebooks. With CoCoMine, we construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks. To demonstrate the effectiveness of our dataset, we finetune a range of pretrained code models and prompt various large language models on our task. Furthermore, we also propose DataCoder, which encodes data context and code and textual contexts separately to enhance code generation. Experiment results demonstrate the significance of incorporating data context in data-wrangling code generation and the effectiveness of our model. We release code and data at url...	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# 接地され共参照するキャラクタを持つビジュアルストーリーの生成 Generating Visual Stories with Grounded and Coreferent Characters ( http://arxiv.org/abs/2409.13555v1 ) ライセンス: Link先を確認	Danyang Liu, Mirella Lapata, Frank Keller,	(参考訳) 登場人物は物語において重要である。彼らはプロットを前進させ、感情的なつながりを作り、物語のテーマを具現化する。ビジュアルなストーリーテリング手法は、特定のキャラクタを中心に物語を構築することなく、プロットやそれに関連するイベントをより重視する。その結果、生成されたストーリーは一般的で特徴に乏しく感じられ、キャラクタへの言及が欠落していたり、曖昧であったり、誤っていたりする。これらの問題を緩和するため,キャラクタ中心のストーリー生成という新たなタスクを導入し,一貫して接地され共参照するキャラクタ言及を伴う視覚的なストーリーを予測できる最初のモデルを提案する。我々のモデルは、広く使われているVISTベンチマークの上に構築された新しいデータセットに基づいて微調整されている。具体的には、VISTを視覚的およびテキスト的なキャラクタ共参照チェーンで強化する自動パイプラインを開発する。また、物語におけるキャラクタの豊かさと共参照を測定するための新しい評価指標を提案する。実験結果から,本モデルは,ベースラインや最先端システムと比較して,より一貫性があり共参照する繰り返し登場キャラクタを持つストーリーを生成することがわかった。 Characters are important in narratives. They move the plot forward, create emotional connections, and embody the story's themes. Visual storytelling methods focus more on the plot and events relating to it, without building the narrative around specific characters. As a result, the generated stories feel generic, with character mentions being absent, vague, or incorrect. To mitigate these issues, we introduce the new task of character-centric story generation and present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. Specifically, we develop an automated pipeline to enrich VIST with visual and textual character coreference chains. We also propose new evaluation metrics to measure the richness of characters and coreference in stories. Experimental results show that our model generates stories with recurring characters which are consistent and coreferent to a larger extent compared to baselines and state-of-the-art systems.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# 視覚的拡張による信頼できるヘイトスピーチ検出 Trustworthy Hate Speech Detection Through Visual Augmentation ( http://arxiv.org/abs/2409.13557v1 ) ライセンス: Link先を確認	Ziyuan Yang, Ming Yan, Yingyu Chen, Hui Wang, Zexin Lu, Yi Zhang,	(参考訳) ソーシャルメディアプラットフォームでのヘイトスピーチの急増は大きな課題となっており、ヘイトスピーチ検出(HSD)の重要性がますます高まっている。現在のHSD法は、検出性能を高めるために文脈情報を充実させることに重点を置いているが、ヘイトスピーチの本質的な不確実性を見落としている。本稿では,視覚的拡張による信頼できるヘイトスピーチ検出手法(TrusV-HSD)を提案する。この手法は,拡散により得た視覚画像(diffused visual images)との統合によって意味情報を強化し,信頼性損失(trustworthy loss)によって不確実性を緩和する。 TrusV-HSDは、ペアデータのないマルチモーダル接続を通じて、信頼できる情報を効果的に抽出することで意味表現を学習する。公開HSDデータセットを用いた実験では,TrusV-HSDの有効性が示され,従来の手法よりも顕著な改善が見られた。 The surge of hate speech on social media platforms poses a significant challenge, with hate speech detection (HSD) becoming increasingly critical. Current HSD methods focus on enriching contextual information to enhance detection performance, but they overlook the inherent uncertainty of hate speech. We propose a novel HSD method, named trustworthy hate speech detection method through visual augmentation (TrusV-HSD), which enhances semantic information through integration with diffused visual images and mitigates uncertainty with trustworthy loss. TrusV-HSD learns semantic representations by effectively extracting trustworthy information through multi-modal connections without paired data. Our experiments on public HSD datasets demonstrate the effectiveness of TrusV-HSD, showing remarkable improvements over conventional methods.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# 生成モデルと対向摂動を考慮したニューラルネットワークの効率的な可視化 Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations ( http://arxiv.org/abs/2409.13559v1 ) ライセンス: Link先を確認	Athanasios Karagounis,	(参考訳) 本稿では,既存の手法を改良した生成ネットワークによるディープビジュアライゼーション手法を提案する。従来の複数のネットワークとは対照的に,ジェネレータと識別器のみを必要とするため,使用するネットワーク数を削減し,アーキテクチャを単純化する。さらに,本モデルでは事前学習の知識を少なくし,非対話的学習プロセスを用いて,判別器がジェネレータと競合するのではなく,ガイドとして機能する。この研究のコアコントリビューションは、特定のクラスラベルと整合した詳細な視覚化画像を生成する能力である。本モデルでは,複数層にまたがるクラス情報を伝播することにより,ラベル指向の画像生成を促進できる,ユニークなスキップ接続型ブロック設計を取り入れている。さらに、これらの生成した視覚化を逆例として利用し、元の画像に最小限の修正を施した分類網を効果的に騙す方法について検討する。実験結果から,本手法は標的攻撃と非目標攻撃の両方において従来の対向的事例生成技術より優れ,摂動を最小限に抑えた94.5%の愚行率を達成できた。この研究は、可視化手法と敵の例とのギャップを埋めるものであり、愚かさが可視化品質を評価するための定量的指標となることを示唆している。本研究から得られた知見は、ニューラルネットワークの解釈可能性と敵攻撃に対する脆弱性に関する新たな視点を提供する。 This paper presents a novel approach for deep visualization via a generative network, offering an improvement over existing methods. Our model simplifies the architecture by reducing the number of networks used, requiring only a generator and a discriminator, as opposed to the multiple networks traditionally involved. Additionally, our model requires less prior training knowledge and uses a non-adversarial training process, where the discriminator acts as a guide rather than a competitor to the generator. The core contribution of this work is its ability to generate detailed visualization images that align with specific class labels. Our model incorporates a unique skip-connection-inspired block design, which enhances label-directed image generation by propagating class information across multiple layers. Furthermore, we explore how these generated visualizations can be utilized as adversarial examples, effectively fooling classification networks with minimal perceptible modifications to the original images. 
Experimental results demonstrate that our method outperforms traditional adversarial example generation techniques in both targeted and non-targeted attacks, achieving up to a 94.5% fooling rate with minimal perturbation. This work bridges the gap between visualization methods and adversarial examples, proposing that fooling rate could serve as a quantitative measure for evaluating visualization quality. The insights from this study provide a new perspective on the interpretability of neural networks and their vulnerabilities to adversarial attacks.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# 故障診断のためのログからの故障指示情報の解明と抽出 Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis ( http://arxiv.org/abs/2409.13561v1 ) ライセンス: Link先を確認	Junjie Huang, Zhihan Jiang, Jinyang Liu, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Cong Feng, Hui Dong, Zengyin Yang, Michael R. Lyu,	(参考訳) ログはオンラインサービスシステムのメンテナンスにおいて必須であり、多くの場合、効果的な障害軽減のための重要な情報を含んでいる。既存の異常検出手法は、膨大な実行時データ内の異常なログの識別を容易にするが、障害を理解するには依然として技術者による手動のログメッセージ調査が不可欠であり、これは労働集約的でエラーを起こしやすい。 CloudAでログベースのトラブルシューティングのプラクティスを調べると、エンジニアが診断のためにログ情報の2つのカテゴリを優先していることが分かった。これには、異常なシステムイベントを記録する故障指示記述(fault-indicating descriptions)と、関連するエンティティを指定する故障指示パラメータ(fault-indicating parameters)が含まれる。そこで本研究では,故障診断のためにログからこのような故障指示情報を自動的に抽出する手法LoFIを提案する。 LoFIは2つの重要なステージから構成される。最初の段階では、LoFIは、意味的類似性に基づいて、障害に関連するログを収集する粗粒度フィルタリングを行う。第2段階では、LoFIは事前学習済みの言語モデルと新しいプロンプトベースのチューニング手法を利用して、収集したログから関心のある細粒度の情報を抽出する。我々は、Apache Sparkから収集したログとCloudAから収集した産業データセット上でLoFIを評価する。実験の結果、LoFIは全てのベースライン法を有意差で上回り、最高のベースライン法であるChatGPTよりもF1で25.8~37.9の絶対的な改善を達成している。このことは、故障指示情報の認識におけるLoFIの有効性を強調している。さらに,CloudAにおけるLoFIのデプロイの成功とユーザスタディにより,本手法の有用性が検証された。コードとデータはhttps://github.com/Jun-jie-Huang/LoFIで公開されている。 Logs are imperative in the maintenance of online service systems, which often encompass important information for effective failure mitigation. While existing anomaly detection methodologies facilitate the identification of anomalous logs within extensive runtime data, manual investigation of log messages by engineers remains essential to comprehend faults, which is labor-intensive and error-prone. Upon examining the log-based troubleshooting practices at CloudA, we find that engineers typically prioritize two categories of log information for diagnosis. These include fault-indicating descriptions, which record abnormal system events, and fault-indicating parameters, which specify the associated entities. Motivated by this finding, we propose an approach to automatically extract such fault-indicating information from logs for fault diagnosis, named LoFI.
LoFI comprises two key stages. In the first stage, LoFI performs coarse-grained filtering to collect logs related to the faults based on semantic similarity. In the second stage, LoFI leverages a pre-trained language model with a novel prompt-based tuning method to extract fine-grained information of interest from the collected logs. We evaluate LoFI on logs collected from Apache Spark and an industrial dataset from CloudA. The experimental results demonstrate that LoFI outperforms all baseline methods by a significant margin, achieving an absolute improvement of 25.8~37.9 in F1 over the best baseline method, ChatGPT. This highlights the effectiveness of LoFI in recognizing fault-indicating information. Furthermore, the successful deployment of LoFI at CloudA and user studies validate the utility of our method. The code and data are available at https://github.com/Jun-jie-Huang/LoFI.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# Proxion:Ethereumの衝突脆弱性を見つけるための隠れたスマートコントラクト Proxion: Uncovering Hidden Proxy Smart Contracts for Finding Collision Vulnerabilities in Ethereum ( http://arxiv.org/abs/2409.13563v1 ) ライセンス: Link先を確認	Cheng-Kang Chen, Wen-Yi Chu, Muoi Tran, Laurent Vanbever, Hsu-Chun Hsiao,	(参考訳) プロキシ設計パターンにより、Ethereumスマートコントラクトを同時に不変かつアップグレード可能とし、元のコントラクトをデータストレージを含むプロキシコントラクトと実装ロジックを含むロジックコントラクトに分割する。このアーキテクチャは、セキュリティ上の問題、すなわちプロキシとロジックのコントラクト間の機能衝突とストレージの衝突が知られており、実際のインシデントでユーザから数百万ドル相当のデジタル資産を盗まれている。この懸念に応えて、いくつかの以前の研究がEthereumのプロキシコントラクトを特定して、衝突を検出する方法を模索している。しかし、それらすべてがカバー範囲が限られているために不足しており、多くの場合、利用可能なソースコードや過去のトランザクションとの契約のみの分析に制限される。このギャップを埋めるために、私たちは、すべてのプロキシスマートコントラクトとEthereum内のそれらの衝突を識別する、自動クロスコントラクトアナライザであるProxionを紹介します。 Proxionを際立たせるのは、ソースコードと過去のトランザクションの両方を欠く隠れたスマートコントラクトを分析する能力だ。 Proxionは効率性と精度を向上させる様々な技術を備えており、最先端のツールよりも優れており、特に数百万のプロキシ契約と何千もの未報告の衝突を識別している。我々は、2015年から2023年までの3600万以上の生きた契約を分析し、54.2%がプロキシ契約であり、約150万の契約が少なくとも1つの衝突問題を示すことを明らかにした。 The proxy design pattern allows Ethereum smart contracts to be simultaneously immutable and upgradeable, in which an original contract is split into a proxy contract containing the data storage and a logic contract containing the implementation logic. This architecture is known to have security issues, namely function collisions and storage collisions between the proxy and logic contracts, and has been exploited in real-world incidents to steal users' millions of dollars worth of digital assets. In response to this concern, several previous works have sought to identify proxy contracts in Ethereum and detect their collisions. However, they all fell short due to their limited coverage, often restricting analysis to only contracts with available source code or past transactions. To bridge this gap, we present Proxion, an automated cross-contract analyzer that identifies all proxy smart contracts and their collisions in Ethereum. 
What sets Proxion apart is its ability to analyze hidden smart contracts that lack both source code and past transactions. Equipped with various techniques to enhance efficiency and accuracy, Proxion outperforms the state-of-the-art tools, notably identifying millions more proxy contracts and thousands of unreported collisions. We apply Proxion to analyze over 36 million alive contracts from 2015 to 2023, revealing that 54.2% of them are proxy contracts, and about 1.5 million contracts exhibit at least one collision issue.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# ディープラーニングと機械学習、ビッグデータ分析と管理の強化:テンソルフロー事前学習モデル Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Tensorflow Pretrained Models ( http://arxiv.org/abs/2409.13566v1 ) ライセンス: Link先を確認	Keyu Chen, Ziqian Bi, Qian Niu, Junyu Liu, Benji Peng, Sen Zhang, Ming Liu, Ming Li, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Pohsun Feng,	(参考訳) 本書は、ディープラーニングにおけるTensorFlow事前学習モデルの応用に焦点を当て、画像分類やオブジェクト検出などのタスクにこれらのモデルを効果的に使用するための詳細なガイダンスを提供する。 ResNet、MobileNet、EfficientNetといったモダンアーキテクチャの実践的な実装をカバーし、実世界の実例や実験を通じてトランスファーラーニングのパワーを実証している。この本は線形探索とモデル微調整を比較し、PCA、t-SNE、UMAPといった技術を使って、読者が異なるアプローチの影響を直感的に理解できるように視覚化する。初心者向けに設計された本書には、完全なサンプルコードとステップ・バイ・ステップの指示が含まれており、読者は事前学習されたモデルを利用して、実践的なシナリオにおけるパフォーマンスを改善する方法を素早く習得することができる。この本は、理論的な洞察と実践を融合することにより、読者に様々な深層学習課題に自信を持って取り組む知識を与える。 This book focuses on the application of TensorFlow pre-trained models in deep learning, providing detailed guidance on effectively using these models for tasks such as image classification and object detection. It covers practical implementations of modern architectures like ResNet, MobileNet, and EfficientNet, demonstrating the power of transfer learning through real-world examples and experiments. The book compares linear probing and model fine-tuning, offering visualizations using techniques such as PCA, t-SNE, and UMAP to help readers intuitively understand the impact of different approaches. Designed for beginners to advanced users, this book includes complete example code and step-by-step instructions, enabling readers to quickly master how to leverage pre-trained models to improve performance in practical scenarios. By blending theoretical insights with hands-on practice, this book equips readers with the knowledge to confidently tackle various deep learning challenges.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# ふわふわ雲に対処する:S2および/またはS1画像の時系列を用いたフィールド境界検出 Tackling fluffy clouds: field boundaries detection using time series of S2 and/or S1 imagery ( http://arxiv.org/abs/2409.13568v1 ) ライセンス: Link先を確認	Foivos I. Diakogiannis, Zheng-Shu Zhou, Jeff Wang, Gonzalo Mata, Dave Henry, Roger Lawes, Amy Parker, Peter Caccetta, Rodrigo Ibata, Ondrej Hlinka, Jonathan Richetti, Kathryn Batchelor, Chris Herrmann, Andrew Toovey, John Taylor,	(参考訳) 正確なフィールド境界の画定は、デジタル農業において重要な課題であり、作物のモニタリングから資源管理まで、あらゆることに影響を及ぼす。既存の手法はしばしばノイズに悩まされ、特に光学リモートセンシングにおいて雲被覆を扱う場合、様々な景観に一般化することができない。そこで本研究では,Sentinel-2 (S2) およびSentinel-1 (S1) 画像からの時系列データを活用し,手動の雲フィルタリングを必要とせずに,多様な雲条件下での性能を向上させる新しい手法を提案する。本稿では,衛星画像時系列に特化して設計され,メモリ効率の高いアテンション機構を組み込んだ3次元ビジョントランスフォーマーアーキテクチャを紹介する。 2つのモデルが提案されている: PTAViT3DはS2またはS1データを独立に処理し、PTAViT3D-CAは両方のデータセットを融合して精度を高める。両モデルは、時空間相関を利用しつつ、疎な雲被覆と密な雲被覆の両条件で評価される。その結果,部分的な雲被覆(S2単独,またはS2とS1のデータ融合)や密な雲被覆(S1)の下でもフィールド境界を効果的に画定でき,S1ベースのモデルは空間分解能の点でS2画像に匹敵する性能を示した。このアプローチの重要な強みは、時空間相関をメモリ効率のよい方法で活用することで、雲に汚染された画像を直接処理できる能力にある。この手法は、オーストラリア全土のフィールド境界をマッピングするためにePaddocks製品で使用されており、様々な農業環境に適応可能な堅牢でスケーラブルなソリューションを提供し、既存手法が苦手とする場面でも精度と信頼性を実現する。私たちのコードはhttps://github.com/feevos/tfclで公開されています。 Accurate field boundary delineation is a critical challenge in digital agriculture, impacting everything from crop monitoring to resource management. Existing methods often struggle with noise and fail to generalize across varied landscapes, particularly when dealing with cloud cover in optical remote sensing. In response, this study presents a new approach that leverages time series data from Sentinel-2 (S2) and Sentinel-1 (S1) imagery to improve performance under diverse cloud conditions, without the need for manual cloud filtering. We introduce a 3D Vision Transformer architecture specifically designed for satellite image time series, incorporating a memory-efficient attention mechanism. Two models are proposed: PTAViT3D, which handles either S2 or S1 data independently, and PTAViT3D-CA, which fuses both datasets to enhance accuracy.
Both models are evaluated under sparse and dense cloud coverage by exploiting spatio-temporal correlations. Our results demonstrate that the models can effectively delineate field boundaries, even with partial (S2 or S2 and S1 data fusion) or dense cloud cover (S1), with the S1-based model providing performance comparable to S2 imagery in terms of spatial resolution. A key strength of this approach lies in its capacity to directly process cloud-contaminated imagery by leveraging spatio-temporal correlations in a memory-efficient manner. This methodology, used in the ePaddocks product to map Australia's national field boundaries, offers a robust, scalable solution adaptable to varying agricultural environments, delivering precision and reliability where existing methods falter. Our code is available at https://github.com/feevos/tfcl.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# オーストラリア首都圏における電子投票システムeVACS 2020/2024のセキュリティ分析 Security analysis of the Australian Capital Territory's eVACS 2020/2024 paperless direct recording electronic voting system ( http://arxiv.org/abs/2409.13570v1 ) ライセンス: Link先を確認	Chris Culnane, Andrew Conway, Vanessa Teague, Ty Wilson-Brown,	(参考訳) 本報告では,Ada Web Services Libraryにおける2つの暗号エラーがeVACSに与える影響について述べる。これらのエラーは、2024年3月に公開された2024 eVACSコードの検査とテストの過程で確認した。この問題をAdaCoreに開示し、当時の影響を関連する選挙当局に説明しました。 This report describes the implications for eVACS of two cryptographic errors in the Ada Web Services Library that it depends on. We identified these errors in the course of examining and testing the 2024 eVACS code, which was made publicly available in March 2024. We disclosed the problems to AdaCore, and explained the implications at the time to the relevant electoral authorities.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# ファクトリワイド動的スケジューリングのためのスケーラブルなマルチエージェント強化学習 Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling ( http://arxiv.org/abs/2409.13571v1 ) ライセンス: Link先を確認	Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang,	(参考訳) リアルタイムな動的スケジューリングは、意思決定の複雑さのため、現代の製造プロセスにおいて極めて難しい課題である。近年、強化学習(RL)がこの課題に対処するための影響のある手法として注目されている。しかし、古典的なRL法は通常、大規模な工場規模のスケジューリングには適さない人為的なディスパッチ規則に依存している。このギャップを埋めるために,本論文では,スケジューリング問題を各エージェントが処理するサブプロブレムの集合に分解した後,所望のコーディネーションを得るためにリーダ・フォロワマルチエージェントRL(MARL)の概念を適用した。さらに、エージェントのエラーによる生産能力の壊滅的な損失を防止するためにルールベースの変換アルゴリズムを提案することで、手順をさらに強化する。実験の結果,提案手法は様々な面において,最先端の深部RLに基づくスケジューリングモデルよりも優れていた。さらに、提案したモデルは、要求の変化に対する最も堅牢なスケジューリング性能を提供する。全体として、提案したMARLベースのスケジューリングモデルでは、リアルタイムスケジューリング問題に対する有望な解決策が提示され、様々な製造業における潜在的な応用が期待できる。 Real-time dynamic scheduling is a crucial but notoriously challenging task in modern manufacturing processes due to its high decision complexity. Recently, reinforcement learning (RL) has been gaining attention as an impactful technique to handle this challenge. However, classical RL methods typically rely on human-made dispatching rules, which are not suitable for large-scale factory-wide scheduling. To bridge this gap, this paper applies a leader-follower multi-agent RL (MARL) concept to obtain desired coordination after decomposing the scheduling problem into a set of sub-problems that are handled by each individual agent for scalability. We further strengthen the procedure by proposing a rule-based conversion algorithm to prevent catastrophic loss of production capacity due to an agent's error. Our experimental results demonstrate that the proposed model outperforms the state-of-the-art deep RL-based scheduling models in various aspects. Additionally, the proposed model provides the most robust scheduling performance to demand changes. 
Overall, the proposed MARL-based scheduling model presents a promising solution to the real-time scheduling problem, with potential applications in various manufacturing industries.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# 自分自身の法律:非合意親密メディアに対するDMCAの非有効性 A Law of One's Own: The Inefficacy of the DMCA for Non-Consensual Intimate Media ( http://arxiv.org/abs/2409.13575v1 ) ライセンス: Link先を確認	Li Qiwei, Shihui Zhang, Samantha Paige Pratt, Andrew Timothy Kasper, Eric Gilbert, Sarita Schoenebeck,	(参考訳) NCIM(Non-consensual Intimate Media)は、描写されている個人に対して、インターネット規模の害を与える。その削除を求めるための最も強力なツールの1つは、デジタルミレニアム著作権法(DMCA)である。しかし、DMCAはNCIMの問題に対処するためではなく、著作権保持者を保護するために設計された。本稿では,10年以上にわたる54,000件以上のDMCAレポートと8500万以上の侵害URLからなるデータセットを用いて,NCIM削除に対するDMCAの有効性を評価する。その結果、侵害URLのうち60日以内にウェブサイトのホストから削除されるのは50%未満であり、Google検索が侵害コンテンツをインデックスから削除するまでに中央値で11.7日を要することがわかった。ウェブホスト全体では、最初の48時間以内に削除されるURLはわずか4%である。さらに、非商業的なNCIMについて最も頻繁に報告されるドメインは、大きなプラットフォームではなく、より小さなウェブサイトである。我々は、大小を問わずあらゆるプラットフォームに対して執行可能で、削除までの時間短縮を保証する新しい法律の必要性を強調する。 Non-consensual intimate media (NCIM) presents internet-scale harm to individuals who are depicted. One of the most powerful tools for requesting its removal is the Digital Millennium Copyright Act (DMCA). However, the DMCA was designed to protect copyright holders rather than to address the problem of NCIM. Using a dataset of more than 54,000 DMCA reports and over 85 million infringing URLs spanning over a decade, this paper evaluates the efficacy of the DMCA for NCIM takedown. Results show fewer than 50% of infringing URLs are removed from website hosts in 60 days, and Google Search takes a median of 11.7 days to deindex infringing content. Across web hosts, only 4% of URLs are removed within the first 48 hours. Additionally, the most frequently reported domains for non-commercial NCIM are smaller websites, not large platforms. We stress the need for new laws that ensure a shorter time to takedown and that are enforceable across big and small platforms alike.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# Region Prompt Tuning:領域テキストプロンプトを利用したきめ細かいシーンテキスト検出 Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt ( http://arxiv.org/abs/2409.13576v1 ) ライセンス: Link先を確認	Xingtao Lin, Heqian Qiu, Lanxiao Wang, RUihang Wang, Linfeng XU, Hongliang Li,	(参考訳) プロンプトチューニングの最近の進歩は、シーンテキスト検出などの下流タスクに対して、Contrastive Language-Image Pre-trained (CLIP)のような大規模モデルを適応させることに成功した。通常、テキストプロンプトはテキストエンコーダの入力を補完するが、グローバルな特徴に焦点を合わせて細粒度の詳細を無視するため、シーンテキスト検出のタスクではきめ細かいテキストが見落とされる。本稿では,きめ細かいシーンテキスト検出のための領域プロンプトチューニング(RPT)手法を提案する。提案する領域テキストプロンプトは,細粒度の特徴に焦点を合わせるのに役立つ。領域プロンプトチューニング法は、領域テキストプロンプトを個々の文字に分解し、視覚特徴マップを領域視覚トークンに分割し、文字とトークンを1対1で対応させる。これにより、文字はトークンの局所的な特徴と一致し、詳細な特徴やきめ細かいテキストが見落とされるのを避けることができる。これを実現するために,各文字を対応するトークンにリンクするための共有位置埋め込みを導入し,各領域テキストプロンプトの文字をターゲットの「text」に合わせるために双方向距離損失を用いる。細粒度レベルで情報を洗練するために,符号化前後に文字-トークンレベルの相互作用を実装した。提案手法は,画像-テキスト処理から得られた一般的なスコアマップと,文字とトークンのマッチングから得られた領域スコアマップを組み合わせることで,グローバルな特徴とローカルな特徴のバランスを取った最終的なスコアマップを生成し,これをDBNetに入力してテキストを検出する。 ICDAR2015、TotalText、CTW1500といったベンチマークの実験では、RPTが優れた性能を示し、シーンテキスト検出における有効性が裏付けられた。 Recent advancements in prompt tuning have successfully adapted large-scale models like Contrastive Language-Image Pre-trained (CLIP) for downstream tasks such as scene text detection. Typically, the text prompt complements the text encoder's input, focusing on global features while neglecting fine-grained details, leading to fine-grained text being ignored in the task of scene text detection. In this paper, we propose the region prompt tuning (RPT) method for fine-grained scene text detection, in which the proposed region text prompt helps focus on fine-grained features. The region prompt tuning method decomposes the region text prompt into individual characters and splits the visual feature map into region visual tokens, creating a one-to-one correspondence between characters and tokens. This allows a character to match the local features of a token, thereby avoiding the omission of detailed features and fine-grained text.
To achieve this, we introduce a shared position embedding to link each character with its corresponding token and employ a bidirectional distance loss to align each region text prompt character with the target "text". To refine the information at the fine-grained level, we implement character-token level interactions before and after encoding. Our proposed method combines a general score map from the image-text process with a region score map derived from character-token matching, producing a final score map that can balance global and local features and is fed into DBNet to detect the text. Experiments on benchmarks such as ICDAR2015, TotalText, and CTW1500 demonstrate RPT's impressive performance, underscoring its effectiveness for scene text detection.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# Time and Tokens: エンドツーエンド音声障害検出のベンチマーク Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection ( http://arxiv.org/abs/2409.13582v1 ) ライセンス: Link先を確認	Xuanru Zhou, Jiachen Lian, Cheol Jun Cho, Jingwen Liu, Zongli Ye, Jinming Zhang, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Maria Luisa Gorno Tempini, Gopala Anumanchipalli,	(参考訳) 音声のディフルエンシモデリングは、繰り返し、ブロック、挿入、置換、削除などの音声のディフルエンシを検出するタスクである。最近の進歩は、この問題を時間に基づく物体検出問題として扱う。本研究では,この問題を新しい視点から再考する: 障害のトークン化と検出問題をトークンベース自動音声認識(ASR)問題としてモデル化する。規則に基づく音声とテキストのディフルエンシシミュレータを提案し、VCTKトケンを開発し、その後、Whisperのようなセク2seqアーキテクチャを開発し、良好な性能を持つ新しいベンチマークを構築する。また,提案手法と時間に基づく手法を体系的に比較し,今後の研究を促進するための統一ベンチマークを提案する。より広い科学コミュニティのために、これらのリソースをオープンソースにしています。プロジェクトページはhttps://rorizzz.github.io/で公開されている。 Speech dysfluency modeling is a task to detect dysfluencies in speech, such as repetition, block, insertion, replacement, and deletion. Most recent advancements treat this problem as a time-based object detection problem. In this work, we revisit this problem from a new perspective: tokenizing dysfluencies and modeling the detection problem as a token-based automatic speech recognition (ASR) problem. We propose rule-based speech and text dysfluency simulators and develop VCTK-token, and then develop a Whisper-like seq2seq architecture to build a new benchmark with decent performance. We also systematically compare our proposed token-based methods with time-based methods, and propose a unified benchmark to facilitate future research endeavors. We open-source these resources for the broader scientific community. The project page is available at https://rorizzz.github.io/	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# ニューロシンボリック・コンフォーマル分類 Neurosymbolic Conformal Classification ( http://arxiv.org/abs/2409.13585v1 ) ライセンス: Link先を確認	Arthur Ledaguenel, Céline Hudelot, Mostepha Khouadjia,	(参考訳) 過去数十年間、主にディープラーニング(DL)によって駆動される機械学習(ML)が大幅に改善されてきた。しかし、多くの領域におけるMLの成功にもかかわらず、適合性の保証を提供できないことと、(分布シフトや敵対的攻撃などに直面した)MLシステムの脆弱性が、信頼できるAIシステムの設計を妨げている。この脆弱性を軽減し、MLシステムの動作に関する一定の保証を提供するために、ニューロシンボリックAIや共形予測を含むいくつかの研究の方向性が検討されている。ニューロシンボリック人工知能(Neurosymbolic AI)は、ニューラルネットワークの学習能力とシンボリックシステムの推論能力を組み合わせることを目的とした、成長中の研究分野である。このハイブリダイゼーションの目的の1つは、システムの出力が何らかの事前知識に従うという理論的な保証を提供することである。共形予測(Conformal prediction)とは、単一の予測を信頼セットと呼ばれる予測の集合に変換することによって、MLシステムの不確実性を考慮する一連の手法である。興味深いことに、これには信頼セット内に真のラベルが含まれることに関する統計的保証が伴う。どちらのアプローチも分布自由であり、モデルに依存しない。本稿では,この2つのアプローチが相互に補完する方法について述べる。本稿では,いくつかのニューロシンボリックな共形予測手法を導入し,その特性(信頼セットのサイズ,計算複雑性など)について検討する。 The last decades have seen a drastic improvement in Machine Learning (ML), mainly driven by Deep Learning (DL). However, despite the resounding successes of ML in many domains, the impossibility of providing guarantees of conformity and the fragility of ML systems (faced with distribution shifts, adversarial attacks, etc.) have prevented the design of trustworthy AI systems. Several research paths have been investigated to mitigate this fragility and provide some guarantees regarding the behavior of ML systems, among which are neurosymbolic AI and conformal prediction. Neurosymbolic artificial intelligence is a growing field of research aiming to combine neural network learning capabilities with the reasoning abilities of symbolic systems. One of the objectives of this hybridization can be to provide theoretical guarantees that the output of the system will comply with some prior knowledge. Conformal prediction is a set of techniques that account for the uncertainty of ML systems by transforming a single prediction into a set of predictions, called a confidence set. 
Interestingly, this comes with statistical guarantees regarding the presence of the true label inside the confidence set. Both approaches are distribution-free and model-agnostic. In this paper, we see how these two approaches can complement one another. We introduce several neurosymbolic conformal prediction techniques and explore their different characteristics (size of confidence sets, computational complexity, etc.).	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# 機械学習による量子固有解アルゴリズムの高速化 Accelerating Quantum Eigensolver Algorithms With Machine Learning ( http://arxiv.org/abs/2409.13587v1 ) ライセンス: Link先を確認	Avner Bensoussan, Elena Chachkarova, Karine Even-Mendoza, Sophie Fortz, Connor Lenihan,	(参考訳) 本稿では,NISQデバイス上でのハミルトニアン基底状態エネルギー計算の高速化について検討する。本稿では,量子固有値ソルバをユースケースとして,探索に基づく手法と機械学習を併用して量子アルゴリズムを高速化することを提案する。我々は、XGBoostのPython回帰器を使用して、最大16量子ビットのシステムから古典的に採取したデータで2つの小さなモデルを訓練した。 Eigensolverのハイパーパラメータを最適化することにより,20,24,28量子ビットのシステムに対して予備的アプローチを評価した。これらのモデルはハイパーパラメータ値を予測し、28量子ビットシステムでのテストでは誤差が0.13%-0.15%減少した。しかし,20量子ビット系と24量子ビット系では決定的な結果が得られなかったため,ハミルトニアンの特性に基づくトレーニングデータのさらなる検討を提案する。今後の研究では、機械学習モデルをトレーニングして、ハイパーパラメータを超えて量子アルゴリズムの実行の他の側面やサブルーチンを最適化する予定である。 In this paper, we explore accelerating Hamiltonian ground state energy calculation on NISQ devices. We suggest using search-based methods together with machine learning to accelerate quantum algorithms, exemplified in the Quantum Eigensolver use case. We trained two small models on classically mined data from systems with up to 16 qubits, using XGBoost's Python regressor. We evaluated our preliminary approach on 20-, 24- and 28-qubit systems by optimising the Eigensolver's hyperparameters. These models predict hyperparameter values, leading to a 0.13%-0.15% reduction in error when tested on 28-qubit systems. However, due to inconclusive results with 20- and 24-qubit systems, we suggest further examination of the training data based on Hamiltonian characteristics. In future work, we plan to train machine learning models to optimise other aspects or subroutines of quantum algorithm execution beyond its hyperparameters.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# ChainBuddy: LLMパイプラインを生成するAIエージェントシステム ChainBuddy: An AI Agent System for Generating LLM Pipelines ( http://arxiv.org/abs/2409.13588v1 ) ライセンス: Link先を確認	Jingyue Zhang, Ian Arawjo,	(参考訳) 大規模言語モデル(LLM)が進歩するにつれて、その潜在的なアプリケーションは大幅に成長した。しかし、ユーザ固有のタスクにおけるLCMの挙動を評価し、効果的にパイプラインを構築することは依然として困難である。多くのユーザーはどこから始めるかに苦慮しており、しばしば「ブランクページ」問題と呼ばれる。 ChainBuddyは、ChainForgeプラットフォームに組み込まれた評価LLMパイプラインを生成するためのAIアシスタントである。 ChainBuddyは、LCMの振る舞いを計画し、評価するための単純でユーザフレンドリな方法を提供する。本稿では,ChainBuddyをベースラインインタフェースと比較した内的ユーザスタディを報告する。 AIアシストを使用する場合、参加者は要求の少ない作業負荷を報告し、LCM動作の評価パイプラインのセットアップをより確実に感じた。我々は,AIのオープンエンド評価において,ユーザを支援するインターフェースの将来に対する洞察を導き出す。 As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks and craft effective pipelines to do so. Many users struggle with where to start, often referred to as the "blank page" problem. ChainBuddy, an AI assistant for generating evaluative LLM pipelines built into the ChainForge platform, aims to tackle this issue. ChainBuddy offers a straightforward and user-friendly way to plan and evaluate LLM behavior, making the process less daunting and more accessible across a wide range of possible tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a less demanding workload and felt more confident setting up evaluation pipelines of LLM behavior. We derive insights for the future of interfaces that assist users in the open-ended evaluation of AI.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# MRI分類モデルにおける$k$-Space特徴の影響の解析 Analyzing the Effect of $k$-Space Features in MRI Classification Models ( http://arxiv.org/abs/2409.13589v1 ) ライセンス: Link先を確認	Pascal Passigan, Vayd Ramkumar,	(参考訳) 医療診断における人工知能(AI)の統合は、しばしばモデルの不透明さによって妨げられ、高い精度のシステムは透明な推論なしで「ブラックボックス」として機能する。この制限は、信頼と信頼性が最重要である臨床環境において重大である。これを解決するために、医用イメージングに適した説明可能なAI手法を開発した。画像領域と周波数領域の両方にわたってMRIスキャンを解析する畳み込みニューラルネットワーク(CNN)を用いることで,Uniform Manifold Approximation and Projection (UMAP)を組み込んだ新しいアプローチを導入し,潜在入力埋め込みの可視化を行う。このアプローチは、早期トレーニング効率を高めるだけでなく、追加特徴がモデル予測に与える影響の理解を深め、解釈可能性を高め、より正確で直感的な診断推論をサポートする。 The integration of Artificial Intelligence (AI) in medical diagnostics is often hindered by model opacity, where high-accuracy systems function as "black boxes" without transparent reasoning. This limitation is critical in clinical settings, where trust and reliability are paramount. To address this, we have developed an explainable AI methodology tailored for medical imaging. By employing a Convolutional Neural Network (CNN) that analyzes MRI scans across both image and frequency domains, we introduce a novel approach that incorporates Uniform Manifold Approximation and Projection (UMAP) for the visualization of latent input embeddings. This approach not only enhances early training efficiency but also deepens our understanding of how additional features impact the model predictions, thereby increasing interpretability and supporting more accurate and intuitive diagnostic inferences.	翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
# マルチモーダル生成プリミティブを利用した画像編集 Portrait Video Editing Empowered by Multimodal Generative Priors ( http://arxiv.org/abs/2409.13591v1 ) ライセンス: Link先を確認	Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, Juyong Zhang,	(参考訳) マルチモーダルプロンプトを用いた一貫した表現型スタイリングを実現する強力なポートレートビデオ編集手法であるPortraitGenを紹介する。伝統的なポートレートビデオ編集手法は、しばしば3Dと時間的一貫性に悩まされ、通常、レンダリングの品質と効率性が欠如している。これらの問題に対処するため、我々はポートレートビデオフレームを動的3次元ガウス場に引き上げ、フレーム間の構造的・時間的コヒーレンスを確保する。さらに,洗練されたスタイル編集を可能にするだけでなく,100FPS以上のレンダリング速度を実現するニューラルガウステクスチャ機構を設計する。提案手法は,大規模2次元生成モデルから抽出した知識によるマルチモーダル入力を取り入れたものである。また,表情類似性指導と顔認識画像編集モジュールを内蔵し,反復的データセット更新に伴う劣化問題を効果的に軽減する。大規模な実験により, 時間的一貫性, 編集効率, レンダリング品質が向上した。提案手法の幅広い適用性は、テキスト駆動編集、画像駆動編集、リライティングなど様々なアプリケーションを通じて実証され、ビデオ編集の分野を前進させる大きな可能性を浮き彫りにしている。デモビデオとリリースされたコードは、プロジェクトページで公開されています。 We introduce PortraitGen, a powerful portrait video editing method that achieves consistent and expressive stylization with multimodal prompts. Traditional portrait video editing methods often struggle with 3D and temporal consistency, and typically lack in rendering quality and efficiency. To address these issues, we lift the portrait video frames to a unified dynamic 3D Gaussian field, which ensures structural and temporal coherence across frames. Furthermore, we design a novel Neural Gaussian Texture mechanism that not only enables sophisticated style editing but also achieves rendering speed over 100FPS. Our approach incorporates multimodal inputs through knowledge distilled from large-scale 2D generative models. Our system also incorporates expression similarity guidance and a face-aware portrait editing module, effectively mitigating degradation issues associated with iterative dataset updates. Extensive experiments demonstrate the temporal consistency, editing efficiency, and superior rendering quality of our method. 
The broad applicability of the proposed approach is demonstrated through various applications, including text-driven editing, image-driven editing, and relighting, highlighting its great potential to advance the field of video editing. Demo videos and released code are provided in our project page: https://ustc3dv.github.io/PortraitGen/	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# YesBut: 視覚言語モデルの風刺理解能力を評価するための高品質アノテーション付きマルチモーダルデータセット YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models ( http://arxiv.org/abs/2409.13592v1 ) ライセンス: Link先を確認	Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly,	(参考訳) 風刺やユーモアを理解することは、現在のVision-Languageモデルでも難しい課題である。本稿では,風刺画像検出(画像が風刺的かどうかを検出する),理解(画像が風刺的である理由を生成する),補完(画像の半分が与えられたとき,完成した画像が風刺的になるように2つの選択肢から残りの半分を選択する)という難しい課題を提案し,それらの課題を評価するために,さまざまな画風を含む高品質なデータセットYesBut(2547枚,風刺的1084枚,非風刺的1463枚)を公開する。データセットの各風刺画像は、通常のシナリオと、それと矛盾する滑稽あるいは皮肉なシナリオを描いている。視覚的QAや画像キャプションなどのマルチモーダルタスクにおける現在のVision-Languageモデルの成功にもかかわらず、ベンチマーク実験により、ゼロショット設定におけるYesButデータセット上の提案タスクでは、自動評価と人手評価の両方において、そのようなモデルの性能が低いことが示されている。さらに、さらなる研究のために、119枚の実写の風刺写真データセットをリリースする。データセットとコードはhttps://github.com/abhi1nandy2/yesbut_datasetで公開されている。 Understanding satire and humor is a challenging task even for current Vision-Language models. In this paper, we propose the challenging tasks of Satirical Image Detection (detecting whether an image is satirical), Understanding (generating the reason behind the image being satirical), and Completion (given one half of the image, selecting the other half from 2 given options, such that the complete image is satirical) and release a high-quality dataset YesBut, consisting of 2547 images, 1084 satirical and 1463 non-satirical, containing different artistic styles, to evaluate those tasks. Each satirical image in the dataset depicts a normal scenario, along with a conflicting scenario which is funny or ironic. Despite the success of current Vision-Language Models on multimodal tasks such as Visual QA and Image Captioning, our benchmarking experiments show that such models perform poorly on the proposed tasks on the YesBut Dataset in Zero-Shot Settings with respect to both automated and human evaluation. Additionally, we release a dataset of 119 real, satirical photographs for further research. 
The dataset and code are available at https://github.com/abhi1nandy2/yesbut_dataset.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# クロスターゲットスタンス検出:技術,データセット,課題の調査 Cross-Target Stance Detection: A Survey of Techniques, Datasets, and Challenges ( http://arxiv.org/abs/2409.13594v1 ) ライセンス: Link先を確認	Parisa Jamadi Khiabani, Arkaitz Zubiaga,	(参考訳) スタンス検出は、テキストで表現された視点を所定のターゲットに向けて決定するタスクである。タスク内の特定の方向は、特定のターゲットに関連するサンプルに基づいてトレーニングされたモデルが、新しい、目に見えないターゲットに適用される、クロスターゲットスタンス検出に焦点を当てる。オンラインの視点や意見の分析やマイニングの必要性が高まる中、このタスクは近年大きな関心を集めている。本稿は,過去10年間のクロスターゲットスタンス検出の進歩を概観し,基礎的な統計手法から現代のニューラルモデルやLLMベースのモデルへの進化を概説する。これらの進歩は、精度と適応性に顕著な改善をもたらした。革新的なアプローチには、ゼロショット検出のためのトピックグループ化された注意と敵対的学習の使用、モデルのロバスト性を高める微調整技術などがある。さらに、プロンプトチューニング手法と外部知識の統合により、モデル性能はさらに改善された。これらのモデルを評価するために使用されるデータセットの包括的概要も提供され、この分野の進歩と課題に関する貴重な洞察を提供する。我々は,研究の新たな方向性を強調し,今後の課題への道筋を提案することで結論付ける。 Stance detection is the task of determining the viewpoint expressed in a text towards a given target. A specific direction within the task focuses on cross-target stance detection, where a model trained on samples pertaining to certain targets is then applied to a new, unseen target. With the increasing need to analyze and mine viewpoints and opinions online, the task has recently seen a significant surge in interest. This review paper examines the advancements in cross-target stance detection over the last decade, highlighting the evolution from basic statistical methods to contemporary neural and LLM-based models. These advancements have led to notable improvements in accuracy and adaptability. Innovative approaches include the use of topic-grouped attention and adversarial learning for zero-shot detection, as well as fine-tuning techniques that enhance model robustness. Additionally, prompt-tuning methods and the integration of external knowledge have further refined model performance. A comprehensive overview of the datasets used for evaluating these models is also provided, offering valuable insights into the progress and challenges in the field. We conclude by highlighting emerging directions of research and by suggesting avenues for future work in the task.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# 非エルミート系における断熱増幅への幾何学的寄与 Geometric contribution to adiabatic amplification in non-Hermitian systems ( http://arxiv.org/abs/2409.13595v1 ) ライセンス: Link先を確認	Tomoki Ozawa, Henning Schomerus,	(参考訳) 非エルミート量子力学の概念は、光学、古典力学、メタマテリアルデザインなど、様々な古典システムの理解と操作に有用であることが証明されている。近年, 断熱処理におけるベリー相の非エルミートアナログを実験的に測定した。非エルミート系では、ベリー相は虚部を持ち、全波の強度の増幅や減衰に寄与する。ベリー曲率の虚部が 0 であるとき、この幾何増幅係数はパラメータ空間における断熱経路の初期点と最終点によってのみ決定され、これらの点が経路によってどのように接続されるかには依存しない。我々は、この経路独立が適切な対称性によって保証される非エルミート・ハミルトン群のクラスをリストし、これらのクラスの一部について、増幅係数は初期点と最終点のピーターマン因子の観点でのみ記述できることを見出した。我々の結果は、断熱過程下での波動関数のノルムがどのように変化するかを観察することによって、ピーターマン因子を実験的に得ることができる。我々は、物理的関連性の具体例を用いて、我々の理論を検証した。 Concepts from non-Hermitian quantum mechanics have proven useful in understanding and manipulating a variety of classical systems, such as encountered in optics, classical mechanics, and metamaterial design. Recently, the non-Hermitian analog of the Berry phase for adiabatic processes has been experimentally measured. In non-Hermitian systems, the Berry phase can have an imaginary part, which contributes to the amplification or decay of the total wave intensity. When the imaginary part of the Berry curvature is zero, this geometric amplification factor is determined solely by the initial and final points of the adiabatic path in parameter space, and does not depend on how these points are connected by the path. We list classes of non-Hermitian Hamiltonians where this path independence is guaranteed by suitable symmetries, and find that, for some of these classes, the amplification factor can be written only in terms of the Petermann factors of the initial and final points. Our result can, in turn, be used to experimentally obtain the Petermann factor by observing how the norm of the wavefunction changes under adiabatic processes. We validate our theory using a couple of concrete examples of physical relevance.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# Prithvi WxC:気象と気候の基盤モデル Prithvi WxC: Foundation Model for Weather and Climate ( http://arxiv.org/abs/2409.13598v1 ) ライセンス: Link先を確認	Johannes Schmude, Sujit Roy, Will Trojak, Johannes Jakubik, Daniel Salles Civitarese, Shraddha Singh, Julian Kuehnert, Kumar Ankur, Aman Gupta, Christopher E Phillips, Romeo Kienzler, Daniela Szwarcman, Vishal Gaur, Rajat Shinde, Rohit Lal, Arlindo Da Silva, Jorge Luis Guevara Diaz, Anne Jones, Simon Pfreundschuh, Amy Lin, Aditi Sheshadri, Udaysankar Nair, Valentine Anantharaj, Hendrik Hamann, Campbell Watson, Manil Maskey, Tsengdar J Lee, Juan Bernabe Moreno, Rahul Ramachandran,	(参考訳) AIエミュレータは、HPCシステムで動作する従来の数値天気予報モデルのパフォーマンスに匹敵する可能性があるという認識から、予測、ダウンスケーリング、ナウキャスティングといったユースケースに対処する大規模なAIモデルが増えている。 AI文献における並行した開発は、複数の異なるユースケースに対応するために効果的に調整可能な基盤モデルに焦点を当てているが、気象と気候に関する開発は、特に中期予報に重点を置いて、主に単一のユースケースに焦点を当てている。このギャップを埋めるために、23億パラメータの基盤モデルPrithvi WxCを導入する。これは、Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2)から160変数を用いて開発された。 Prithvi WxCはエンコーダ-デコーダベースのアーキテクチャを採用し、最近の様々なトランスフォーマーモデルの概念を取り入れて、入力データにおける地域的およびグローバルな依存関係を効果的にキャプチャする。このモデルは、異なるトポロジーにおける気象現象を微細な解像度でモデル化するために、大きなトークン数に対応できるように設計されている。さらに,マスクを用いた再構成と予測のパラダイムを組み合わせた混合目的で訓練を行った。本稿では, 自己回帰ロールアウト予測, ダウンスケーリング, 重力波フラックスパラメータ化, 極端現象の推定など, 難しい下流タスクのセットでモデルを検証する。 23億のパラメータを持つ事前トレーニングされたモデルは、関連する微調整ワークフローとともに、Hugging Faceを通じてオープンソースコントリビューションとして公開された。 Triggered by the realization that AI emulators can rival the performance of traditional numerical weather prediction models running on HPC systems, there is now an increasing number of large AI models that address use cases such as forecasting, downscaling, or nowcasting. While the parallel developments in the AI literature focus on foundation models -- models that can be effectively tuned to address multiple, different use cases -- the developments on the weather and climate side largely focus on single-use cases with particular emphasis on mid-range forecasting. 
We close this gap by introducing Prithvi WxC, a 2.3 billion parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). Prithvi WxC employs an encoder-decoder-based architecture, incorporating concepts from various recent transformer models to effectively capture both regional and global dependencies in the input data. The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions. Furthermore, it is trained with a mixed objective that combines the paradigms of masked reconstruction with forecasting. We test the model on a set of challenging downstream tasks namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation. The pretrained model with 2.3 billion parameters, along with the associated fine-tuning workflows, has been publicly released as an open-source contribution via Hugging Face.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# MeLIAD:Metric LearningとEntropy-based Scoringを用いた解釈可能なFew-Shot異常検出 MeLIAD: Interpretable Few-Shot Anomaly Detection with Metric Learning and Entropy-based Scoring ( http://arxiv.org/abs/2409.13602v1 ) ライセンス: Link先を確認	Eirini Cholopoulou, Dimitris K. Iakovidis,	(参考訳) 異常検出(AD)は、欠陥製品を検出し、品質検査を自動化するマルチメディアアプリケーションにおいて重要な役割を果たす。ディープラーニング(DL)モデルは通常、大規模なアノテートデータを必要とする。これらのモデルのブラックボックスの性質は、ユーザーが信頼することを禁じている。これらの課題に対処するために,従来の手法と異なり,真の異常の分布仮定に頼らずに設計による解釈性を実現する,新しい異常検出手法であるMeLIADを提案する。 MeLIADは、拡張テクニックを使わずに、トレーニング用の異常のサンプルをわずかに必要としており、本質的に解釈可能であり、画像がなぜ異常であると特定されたかに関する洞察を提供する可視化を提供する。これは、異常なインスタンスの識別とローカライズのための、新しいトレーニング可能なエントロピーベースのスコアリングコンポーネントと、メトリック学習目的の異常スコアリングコンポーネントを協調的に最適化する新規なロス関数を導入することで達成される。解釈可能性の定量的かつ定性的な評価を含む5つの公開ベンチマークデータセットの実験は、MeLIADが最先端の手法と比較して異常検出とローカライゼーション性能の改善を実現していることを示している。 Anomaly detection (AD) plays a pivotal role in multimedia applications for detecting defective products and automating quality inspection. Deep learning (DL) models typically require large-scale annotated data, which are often highly imbalanced since anomalies are usually scarce. The black box nature of these models prohibits them from being trusted by users. To address these challenges, we propose MeLIAD, a novel methodology for interpretable anomaly detection, which unlike the previous methods is based on metric learning and achieves interpretability by design without relying on any prior distribution assumptions of true anomalies. MeLIAD requires only a few samples of anomalies for training, without employing any augmentation techniques, and is inherently interpretable, providing visualizations that offer insights into why an image is identified as anomalous. This is achieved by introducing a novel trainable entropy-based scoring component for the identification and localization of anomalous instances, and a novel loss function that jointly optimizes the anomaly scoring component with a metric learning objective. 
Experiments on five public benchmark datasets, including quantitative and qualitative evaluation of interpretability, demonstrate that MeLIAD achieves improved anomaly detection and localization performance compared to state-of-the-art methods.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# 時間発展局所作用素における行列要素のパウリ重量要件-平衡温度を超える依存性 Pauli weight requirement of the matrix elements in time-evolved local operators: dependence beyond the equilibration temperature ( http://arxiv.org/abs/2409.13603v1 ) ライセンス: Link先を確認	Carlos Ramos-Marimón, Stefano Carignano, Luca Tagliacozzo,	(参考訳) ハイゼンベルク図における局所作用素の平衡外進化をシミュレートする複雑さは、一般の非可積分系に間に合うように線形に成長する作用素の絡み合いによって支配され、計算資源の指数的な増加をもたらす。この課題を単純化するための有望なアプローチは、作用素の一部を破棄し、ラコフスキーらによって提案された「軽い」パウリ弦(パウリ行列がほとんどない弦)によって形成された部分空間に焦点を当てることである。本研究では, この戦略が同質な生成物状態から始まるクエンチに応用できるかどうかを考察する。エルゴード力学では、これらの初期状態は幅広い平衡温度にアクセスできる。所望の行列要素に集中し、初期状態に平行なパウリ文字列を含む演算子の部分のみを保持することによって、複雑なシナリオを明らかにする。場合によっては、光のパウリ弦は力学を記述するのに十分であり、現在のアルゴリズムによる効率的なシミュレーションを可能にしている。しかし、他のケースでは、より重い文字列が必要となり、現在の能力を超えて計算要求を押し進める。我々は,Bloch球面上のほとんどの点において異なる演算子に対して計算を行う演算子重みエントロピー(Operator Weight Entropy)を用いて,この振る舞いを分析する。 The complexity of simulating the out-of-equilibrium evolution of local operators in the Heisenberg picture is governed by the operator entanglement, which grows linearly in time for generic non-integrable systems, leading to an exponential increase in computational resources. A promising approach to simplify this challenge involves discarding parts of the operator and focusing on a subspace formed by "light" Pauli strings - strings with few Pauli matrices - as proposed by Rakovszki et al. [PRB 105, 075131 (2022)]. In this work, we investigate whether this strategy can be applied to quenches starting from homogeneous product states. For ergodic dynamics, these initial states grant access to a wide range of equilibration temperatures. By concentrating on the desired matrix elements and retaining only the portion of the operator that contains Pauli strings parallel to the initial state, we uncover a complex scenario. In some cases, the light Pauli strings suffice to describe the dynamics, enabling efficient simulation with current algorithms. 
However, in other cases, heavier strings become necessary, pushing computational demands beyond our current capabilities. We analyze this behavior using a newly introduced measure of complexity, the Operator Weight Entropy, which we compute for different operators across most points on the Bloch sphere.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# 自閉症スペクトラム障害児の包括的ビデオ理解に向けて Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder ( http://arxiv.org/abs/2409.13606v1 ) ライセンス: Link先を確認	Aditya Kommineni, Digbalay Bose, Tiantian Feng, So Hyun Kim, Helen Tager-Flusberg, Somer Bishop, Catherine Lord, Sudarsana Kadiri, Shrikanth Narayanan,	(参考訳) 自閉症スペクトラム障害(Autism Spectrum disorder)の文脈における臨床ビデオは、しばしば子供と介護者・臨床専門家の間の長い形式の相互作用であり、複雑な言語行動と非言語行動を含んでいる。これらの動画を客観的に分析することで、自閉症スペクトラム障害児の行動に関する微妙な洞察を臨床医や研究者に提供することができる。これらのビデオを手作業でコーディングするのは時間を要する作業であり、高いレベルのドメイン知識が必要です。したがって、これらの相互作用を計算的に捉える能力は、手作業を強化し、診断手順をサポートすることができる。本研究では,3つのモダリティ(音声,ビデオ,テキスト)にまたがる基礎モデルを用いて,子どものインタラクション・セッションの分析を行う。本稿では,大規模言語モデルを推論エージェントとして利用することにより,複数のモーダルを結合する統一手法を提案する。本研究は,行動認識と異常行動検出という,情報粒度の異なる2つのタスクにおいて,その性能を評価する。提案したマルチモーダルパイプラインは,モダリティに特有の制約に対して堅牢性を提供し,単調な設定に比べて臨床ビデオ解析の性能を向上させる。 Clinical videos in the context of Autism Spectrum Disorder are often long-form interactions between children and caregivers/clinical professionals, encompassing complex verbal and non-verbal behaviors. Objective analyses of these videos could provide clinicians and researchers with nuanced insights into the behavior of children with Autism Spectrum Disorder. Manually coding these videos is a time-consuming task and requires a high level of domain expertise. Hence, the ability to capture these interactions computationally can augment the manual effort and enable supporting the diagnostic procedure. In this work, we investigate the use of foundation models across three modalities: speech, video, and text, to analyse child-focused interaction sessions. We propose a unified methodology to combine multiple modalities by using large language models as reasoning agents. We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection. 
We find that the proposed multimodal pipeline provides robustness to modality-specific limitations and improves performance on the clinical video analysis compared to unimodal settings.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# FIHA:Davidson Scene Graphsを用いた視覚言語モデルの自律幻覚評価 FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs ( http://arxiv.org/abs/2409.13612v1 ) ライセンス: Link先を確認	Bowen Yan, Zhengsong Zhang, Liqiang Jing, Eftekhar Hossain, Xinya Du,	(参考訳) LVLM(Large Vision-Language Models)の急速な開発は、しばしば幻覚の広範な問題を伴い、コスト効率が高く包括的な評価がますます重要になっている。現在のアプローチは、主にコストのかかるアノテーションに依存しており、関係、属性、アスペクト間の依存関係など、すべての側面を評価するという点において、包括的ではない。そこで, FIHA (autonomous Fine-graIned Hallucination evAluation in LVLMs) を導入する。FIHAはLLMフリーかつアノテーションフリーな方法でLVLMの幻覚を評価し, 異なる種類の幻覚間の依存性をモデル化できる。 FIHAは、任意の画像データセット上のQ&Aペアを最小限のコストで生成することができ、画像とキャプションの両方から幻覚評価を可能にする。このアプローチに基づいて, MSCOCO と Foggy の様々な画像に対する多様な質問からなるベンチマークFIHA-v1を導入する。さらに、Davidson Scene Graph(DSG)を用いて、Q&Aペア間の構造を整理し、評価の信頼性を高める。 FIHA-v1を用いた代表モデルの評価を行い,その限界と課題を強調した。コードとデータを公開した。 The rapid development of Large Vision-Language Models (LVLMs) often comes with widespread hallucination issues, making cost-effective and comprehensive assessments increasingly vital. Current approaches mainly rely on costly annotations and are not comprehensive -- in terms of evaluating all aspects such as relations, attributes, and dependencies between aspects. Therefore, we introduce FIHA (autonomous Fine-graIned Hallucination evAluation in LVLMs), which can assess hallucination in LVLMs in an LLM-free and annotation-free way and model the dependency between different types of hallucinations. FIHA can generate Q&A pairs on any image dataset at minimal cost, enabling hallucination assessment from both image and caption. Based on this approach, we introduce a benchmark called FIHA-v1, which consists of diverse questions on various images from MSCOCO and Foggy. Furthermore, we use the Davidson Scene Graph (DSG) to organize the structure among Q&A pairs, which increases the reliability of the evaluation. We evaluate representative models using FIHA-v1, highlighting their limitations and challenges. We released our code and data.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# pAE:ヒト視覚系におけるフィードフォワードとフィードバックストリームの統合による側方遺伝子核のモデリングのための効率的なオートエンコーダアーキテクチャ pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System ( http://arxiv.org/abs/2409.13622v1 ) ライセンス: Link先を確認	Moslem Gorji, Amin Ranjbar, Mohammad Bagher Menhaj,	(参考訳) 視覚野は脳の不可欠な部分であり、階層的に物体を識別する役割を担っている。ボトムアップおよびトップダウン経路の両方で視覚情報を処理する際には、視覚野の前野としての外側原核(LGN)の役割を理解することが重要である。視覚刺激が網膜に達すると、初期処理のためにLGN領域に伝達され、さらに処理するために視覚野に送られる。本研究では,人間の視覚情報処理を近似した深部畳み込みモデルを提案する。我々は,pAEアーキテクチャに基づいて設計した浅層畳み込みモデルを用いて,LGN領域の関数を近似することを目的とする。 pAEモデルは、V1領域からのフィードフォワードとフィードバックストリームを問題に統合しようと試みている。このモデリングフレームワークは、固定カメラが連続的に捉えた自然な画像を含む視覚刺激データセットの時間的および非時間的データ供給モードと、動物(動作中)の画像と動物のない画像の2つのカテゴリを含む。次に,提案モデルとGabor およびbiorthogonal wavelet 関数を用いたウェーブレットフィルタバンク法を比較した。実験の結果,提案手法は人体ベンチマークと高い類似性を持つ結果を得るだけでなく,他のモデルよりも優れた性能を示すことがわかった。 pAEモデルは最終99.26%の予測性能を達成し、時間モードでの人間の結果よりも約28%向上したことを示す。 The visual cortex is a vital part of the brain, responsible for hierarchically identifying objects. Understanding the role of the lateral geniculate nucleus (LGN) as a prior region of the visual cortex is crucial when processing visual information in both bottom-up and top-down pathways. When visual stimuli reach the retina, they are transmitted to the LGN area for initial processing before being sent to the visual cortex for further processing. In this study, we introduce a deep convolutional model that closely approximates human visual information processing. We aim to approximate the function for the LGN area using a trained shallow convolutional model which is designed based on a pruned autoencoder (pAE) architecture. The pAE model attempts to integrate feed forward and feedback streams from/to the V1 area into the problem. 
This modeling framework encompasses both temporal and non-temporal data feeding modes of the visual stimuli dataset containing natural images captured by a fixed camera in consecutive frames, featuring two categories: images with animals (in motion), and images without animals. Subsequently, we compare the results of our proposed deep-tuned model with wavelet filter bank methods employing Gabor and biorthogonal wavelet functions. Our experiments reveal that the proposed method based on the deep-tuned model not only achieves results with high similarity in comparison with human benchmarks but also performs significantly better than other models. The pAE model achieves the final 99.26% prediction performance and demonstrates a notable improvement of around 28% over human results in the temporal mode.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# 準長距離非ハーミタンスキンモードからの超スペクトル感度と非局所二重不純物境界状態 Ultra spectral sensitivity and non-local bi-impurity bound states from quasi-long-range non-hermitian skin modes ( http://arxiv.org/abs/2409.13623v1 ) ライセンス: Link先を確認	Chang Shu, Kai Zhang, Kai Sun,	(参考訳) 量子力学の基本的な信条は、量子系のエネルギースペクトルが無限に弱く空間的に制限された摂動に対して安定であり続けることである。本稿では、このスペクトル安定性の原理が非エルミート系において熱力学極限で失敗することを実証する。例えば、非相互作用非エルミート系 $H_0$ と点のような不純物がいくつかあり、それぞれが局所短距離ポテンシャル $V_i$ を$i=1, \ldots, n$ で導入する。不純物ポテンシャルが十分に弱い場合、単一の不純物を導入するとスペクトルが変わらず、すなわち$H_0$と$H_0 + V_1$はほぼ同じエネルギースペクトルを持つ。しかし、もし2番目の不純物である$H_0 + V_1 + V_2$が導入された場合、これらの局所ポテンシャルがどれほど弱いとしても、それらの距離が十分に大きい限り、エネルギースペクトルの著しい変化が起こり、安定スペクトルの伝統的な期待と直接矛盾する。注目すべきは、この現象は非局所的であり、摂動の影響は2つの不純物の間の距離とともに指数関数的に増加することである。言い換えれば、ハミルトニアンは完全に局所的であるが、そのエネルギースペクトルは単一の無限小弱い不純物の存在に盲目であり、宇宙において大きな距離で分離された2つの無限小弱い不純物の存在を検出することができる。グリーン関数法を用いて、このスペクトル感度の起源を明らかにし、これは非局所的二不純物境界状態の形成から生じる。我々は、そのようなスペクトル不安定性を同定し、特徴付ける解析理論を提供し、数値解と完全に一致することを示す。 A fundamental tenet of quantum mechanics is that the energy spectrum of a quantum system shall remain stable against infinitesimally weak and spatially confined perturbations. In this article, we demonstrate that this principle of spectral stability fails in non-Hermitian systems at the thermodynamic limit. Consider, for instance, a non-interacting non-Hermitian system $H_0$ with a couple of point-like impurities, each of which introduces a local short-range potential $V_i$ with $i=1, \ldots, n$ labeling the impurities. If the impurity potentials are sufficiently weak, introducing a single impurity will not alter the spectrum; that is, $H_0$ and $H_0 + V_1$ have nearly identical energy spectra. However, if a second impurity is introduced, $H_0 + V_1 + V_2$, we find that no matter how weak these local potentials are, as long as the distance between them is sufficiently large, significant alterations in the energy spectrum can arise, directly contradicting the traditional expectation of a stable spectrum. 
Remarkably, this phenomenon is non-local, and the impact of the perturbations increases exponentially with the distance between the two impurities. In other words, although the Hamiltonian is entirely local, its energy spectrum, which is blind to the presence of a single infinitesimally weak impurity, is capable of detecting the presence of two infinitesimally weak impurities separated by a large distance in space. Using Green's function techniques, we uncover the origin of this spectral sensitivity, which arises from the formation of non-local bi-impurity bound states: non-local stationary states with wavepackets propagating back-and-forth between the two impurities. We provide an analytic theory to identify and characterize such spectral instabilities, showing perfect agreement with numerical solutions.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# GSConvモジュールとECAアテンション機構に基づくUnet脳腫瘍画像のセグメンテーションの改善 Improved Unet brain tumor image segmentation based on GSConv module and ECA attention mechanism ( http://arxiv.org/abs/2409.13626v1 ) ライセンス: Link先を確認	Qiyuan Tian, Zhuoyue Wang, Xiaoling Cui,	(参考訳) U-Netアーキテクチャに基づく深層学習アルゴリズムである、脳腫瘍に対する医用画像分割の改良モデルについて述べる。従来のU-Netに基づいて,医療画像分割作業におけるモデルの性能向上を目的としたGSConvモジュールとECAアテンション機構を導入する。これらの改良により、新しいU-Netモデルは、重要なチャネルに柔軟に集中しながら、より効率的なマルチスケール機能の抽出と活用が可能となり、セグメンテーション結果が大幅に改善される。実験中、改良されたU-Netモデルを訓練し、体系的に評価する。トレーニングセットとテストセットの損失曲線を調べた結果,2つの損失値が8エポック目以降に最低点まで急速に減少し,その後徐々に収束して安定することがわかった。これは、我々のモデルが優れた学習能力と一般化能力を持っていることを示している。さらに, 平均IoU (mIoU) の変化を観測した結果, mIoUは35エポック目以降徐々に0.8に近づき, 安定に保たれていることがわかった。従来のU-Netと比較して、GSConvモジュールとECAアテンション機構に基づく改良版は、セグメンテーション効果の明らかな利点を示している。特に脳腫瘍画像エッジの処理において、改良されたモデルによりより正確なセグメンテーション結果が得られる。この成果は、医用画像解析の精度を向上するだけでなく、より信頼性の高い臨床診断支援も提供する。 We discuss an improved medical image segmentation model for brain tumors, a deep learning algorithm based on the U-Net architecture. Building on the traditional U-Net, we introduce the GSConv module and the ECA attention mechanism to improve the performance of the model in medical image segmentation tasks. With these improvements, the new U-Net model is able to extract and utilize multi-scale features more efficiently while flexibly focusing on important channels, resulting in significantly improved segmentation results. During the experiment, the improved U-Net model is trained and evaluated systematically. By looking at the loss curves of the training set and the test set, we find that the loss values of both rapidly decline to the lowest point after the eighth epoch, and then gradually converge and stabilize. This shows that our model has good learning ability and generalization ability. In addition, by monitoring the change in the mean intersection over union (mIoU), we can see that after the 35th epoch, the mIoU gradually approaches 0.8 and remains stable, which further validates the model. 
Compared with the traditional U-Net, the improved version based on GSConv module and ECA attention mechanism shows obvious advantages in segmentation effect. Especially in the processing of brain tumor image edges, the improved model can provide more accurate segmentation results. This achievement not only improves the accuracy of medical image analysis, but also provides more reliable technical support for clinical diagnosis.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
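As a worked illustration of the mIoU metric mentioned in the abstract above, the following minimal Python sketch computes per-class intersection over union and averages the results. The function name `mean_iou`, the flat label lists, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made for illustration; the abstract does not say how the authors compute the metric.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flat lists of integer class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2-class example (hypothetical labels): 0 = background, 1 = tumor.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case each class has an intersection of 2 pixels and a union of 4, so both per-class IoU values, and therefore the mean, are 0.5.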
# Beauty Beyond Words: Ingredient-Based Product Attributesを使用した説明可能な美しいプロダクトレコメンデーション Beauty Beyond Words: Explainable Beauty Product Recommendations Using Ingredient-Based Product Attributes ( http://arxiv.org/abs/2409.13628v1 ) ライセンス: Link先を確認	Siliang Liu, Rahul Suresh, Amin Banitalebi-Dehkordi,	(参考訳) 正確な属性抽出は、美容製品のレコメンデーションと顧客との信頼構築に不可欠である。既存のソリューションはしばしば信頼性が低く不完全であるため、これは未解決の問題である。美容製品材料に基づくエンド・ツー・エンドの教師あり学習を用いて美容特性を抽出するシステムを提案する。私たちのシステムに対する重要な洞察は、新しいエネルギーベースの暗黙的モデルアーキテクチャである。この暗黙的なモデルアーキテクチャは、正確性、説明可能性、堅牢性、柔軟性という面で大きなメリットをもたらします。さらに、暗黙のモデルは、利用可能な追加属性を組み込むように簡単に微調整できるため、現実世界のアプリケーションではより便利です。当社のモデルをeコマーススキンケア製品カタログデータセット上で検証し,その有効性を実証する。最後に, 美容レコメンデーションの具体的属性抽出が, 美容レコメンデーションの説明可能性の向上にどのように貢献するかを示す。 Accurate attribute extraction is critical for beauty product recommendations and building trust with customers. This remains an open problem, as existing solutions are often unreliable and incomplete. We present a system to extract beauty-specific attributes using end-to-end supervised learning based on beauty product ingredients. A key insight to our system is a novel energy-based implicit model architecture. We show that this implicit model architecture offers significant benefits in terms of accuracy, explainability, robustness, and flexibility. Furthermore, our implicit model can be easily fine-tuned to incorporate additional attributes as they become available, making it more useful in real-world applications. We validate our model on a major e-commerce skincare product catalog dataset and demonstrate its effectiveness. Finally, we showcase how ingredient-based attribute extraction contributes to enhancing the explainability of beauty recommendations.	翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
# 一様TC$^0$の変換器 Transformers in Uniform TC$^0$ ( http://arxiv.org/abs/2409.13629v1 ) ライセンス: Link先を確認	David Chiang,	(参考訳) これまでの研究で、平均ハードアテンション変換器(AHAT)とソフトマックスアテンション変換器(SMAT)が認識する言語は、回路複雑性クラスTC$^0$に含まれることが示されている。しかし、これらの結果は有限精度の算術を仮定している。すなわち、O(log n)ビット(nは入力文字列の長さ)の浮動小数点数を用いた場合に、Strobl は AHAT が L-uniform TC$^0$ で近似できることを示し、Merrill と Sabharwal は SMAT が DLOGTIME-uniform TC$^0$ で近似できることを示した。本稿ではこれらの結果を改善し、近似なしのAHAT、浮動小数点精度がO(poly(n))ビットのSMAT、および絶対誤差が最大$2^{-O(poly(n))}$のSMATが、すべてDLOGTIME-uniform TC$^0$に属することを示す。 Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(poly(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# リモートセンシング画像分割参照のための微粒化画像テキストアライメントの探索 Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation ( http://arxiv.org/abs/2409.13637v1 ) ライセンス: Link先を確認	Sen Lei, Xinyu Xiao, Heng-Chao Li, Zhenwei Shi, Qing Zhu,	(参考訳) 言語表現が与えられた場合、リモートセンシング画像セグメンテーション(RRSIS)は、基底オブジェクトを識別し、画像内にピクセル単位のラベルを割り当てることを目的としている。このタスクの重要な課題の1つは、テキストイメージアライメントを通じて差別的なマルチモーダル機能をキャプチャすることである。しかし、既存のRRSIS法では1つのバニラと粗いアライメントを使用し、言語表現を直接抽出して視覚的特徴と融合させる。本稿では,「きめ細かい画像テキストアライメント」により,マルチモーダル情報の抽出を改善することができると論じる。そこで本稿では,視覚的および言語的表現を完全に活用する,FIANetと呼ばれる新たな参照リモートセンシング画像分割手法を提案する。具体的には、原文参照表現を文脈テキストとみなし、さらに接地対象テキストと空間位置テキストに分解する。提案した細粒度画像テキストアライメントモジュール(FIAM)は、入力画像と対応するテキストの特徴を同時に活用し、より優れた識別的マルチモーダル表現を学習する。一方,リモートセンシングにおける地上オブジェクトの様々なスケールを扱うために,テキスト対応マルチスケール拡張モジュール(TMEM)を導入し,クロススケールフュージョンと交差点を適応的に行う。本稿では,RefSegRS と RRSIS-D を含む2つのリモートセンシングデータセットに対する提案手法の有効性を評価する。コードは公開されます。 Given a language expression, referring remote sensing image segmentation (RRSIS) aims to identify the ground objects and assign pixel-wise labels within the imagery. One of the key challenges for this task is to capture discriminative multi-modal features via text-image alignment. However, existing RRSIS methods use a single vanilla, coarse alignment, in which the language expression is directly extracted and fused with the visual features. In this paper, we argue that a "fine-grained image-text alignment" can improve the extraction of multi-modal information. To this end, we propose a new referring remote sensing image segmentation method, termed FIANet, that fully exploits the visual and linguistic representations. Specifically, the original referring expression is regarded as context text, which is further decoupled into ground object text and spatial position text. The proposed fine-grained image-text alignment module (FIAM) simultaneously leverages the features of the input image and the corresponding texts to learn a more discriminative multi-modal representation. 
Meanwhile, to handle the various scales of ground objects in remote sensing, we introduce a Text-aware Multi-scale Enhancement Module (TMEM) to adaptively perform cross-scale fusion and intersections. We evaluate the effectiveness of the proposed methods on two public referring remote sensing datasets including RefSegRS and RRSIS-D, and our method obtains superior performance over several state-of-the-art methods. The code will be publicly available.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# 絡み合った光子のオンチップパルス整形 On-chip pulse shaping of entangled photons ( http://arxiv.org/abs/2409.13638v1 ) ライセンス: Link先を確認	Kaiyi Wu, Lucas M. Cohen, Karthik V. Myilswamy, Navin B. Lingaraju, Hsuan-Hao Lu, Joseph M. Lukens, Andrew M. Weiner,	(参考訳) 6チャネルのマイクロリング共振器ベースのシリコンフォトニックパルス整形器を用いて、絡み合った光子のスペクトル整形を実証する。マイクロ共振器ベースのパルス整形器における熱位相シフタの精密校正により、2つの周波数ビンもつれqudit(共有(独立)シグナル・アイドラーフィルタに対して最大$6\times 6$($3\times 3$)次元のヒルベルト空間に対応)に対する3 GHzグリッド上のラインバイライン位相制御を実証する。パルス整形器の微細スペクトル分解能はナノ秒スケールの時間特性の制御を可能にし、これは理論との整合性に優れた双光子相関関数の直接的一致検出によって観察される。この研究は、我々の知る限り、統合スペクトル整形器を用いたバイフォトンパルス整形の最初の実演であり、量子情報処理への応用に有意義な可能性を秘めている。 We demonstrate spectral shaping of entangled photons with a six-channel microring-resonator-based silicon photonic pulse shaper. Through precise calibration of thermal phase shifters in a microresonator-based pulse shaper, we demonstrate line-by-line phase control on a 3 GHz grid for two frequency-bin-entangled qudits, corresponding to Hilbert spaces of up to $6\times 6$ ($3\times 3$) dimensions for shared (independent) signal-idler filters. The pulse shaper's fine spectral resolution enables control of nanosecond-scale temporal features, which are observed by direct coincidence detection of biphoton correlation functions that show excellent agreement with theory. This work marks, to our knowledge, the first demonstration of biphoton pulse shaping using an integrated spectral shaper and holds significant promise for applications in quantum information processing.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# 精度最適化を超えて:大規模言語モデルファインチューニングのためのコンピュータビジョンの損失 Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning ( http://arxiv.org/abs/2409.13641v1 ) ライセンス: Link先を確認	Daniele Rege Cambrin, Giuseppe Gallipoli, Irene Benedetto, Luca Cagliero, Paolo Garza,	(参考訳) 大きな言語モデル(LLM)は、様々なタスクで素晴らしいパフォーマンスを示しています。しかしながら、現在のトレーニングアプローチでは、標準的なクロスエントロピー損失と、広範なデータ、人間のフィードバック、あるいはパフォーマンスを高めるためのアドホックメソッドを組み合わせる。これらのソリューションは、コスト、複雑さ、あるいはリソース要件のために、スケーラビリティや実現不可能な場合が多い。本研究では,自然言語生成におけるセマンティックセグメンテーションの損失関数の利用について検討した。本研究は,様々な大きさのモデルにまたがって,数学的単語問題と質問応答の解法の有効性を評価する。分析結果から,従来のクロスエントロピー損失は準最適選択であり,FocalやLovászなどの代替(タスク依存)損失を最小限に抑えるために訓練されたモデルでは,追加データや人的フィードバックを必要とせず,正確な一致で+42%向上することがわかった。これらの結果は、より効率的でアクセスしやすいトレーニングプロセスのための、有望な経路であることを示唆している。 Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or Lovász, achieve a mean improvement of +42% on exact match without requiring additional data or human feedback. These findings suggest a promising pathway for more efficient and accessible training processes.	翻訳日:2024-11-07 06:08:43 publicly:2024-09-20
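To make the contrast between cross-entropy and a segmentation-style loss concrete, here is a minimal sketch of the standard focal loss, -(1 - p_t)^gamma * log(p_t), applied to a single token prediction. The helper names, the toy logits, and the default gamma of 2 are assumptions for illustration only; the abstract does not state how the authors adapt the loss to sequences or which hyperparameters they use.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one token: -log p_t."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one token: -(1 - p_t)^gamma * log p_t."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [4.0, 0.0, 0.0]  # the model is confident in the correct token (index 0)
hard = [0.0, 1.0, 0.0]  # the model prefers a wrong token
ratio_easy = focal_loss(easy, 0) / cross_entropy(easy, 0)
ratio_hard = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

With gamma set to 0 the focal loss reduces to cross-entropy. For gamma greater than 0, the ratio of focal loss to cross-entropy is (1 - p_t)^gamma, so confidently correct tokens are down-weighted much more strongly than hard ones.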
# LLMエージェントと自己回帰を用いた順序付きコード解析による断層位置決めの強化 Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection ( http://arxiv.org/abs/2409.13642v1 ) ライセンス: Link先を確認	Md Nakhla Rafi, Dong Jae Kim, Tse-Hsun Chen, Shaowei Wang,	(参考訳) ソフトウェア欠陥の配置と修正は、ソフトウェア開発における時間とリソース集約的な作業である。スペクトルベースのフォールトローカライゼーション(SBFL)のような従来のフォールトローカライゼーション手法は、テストカバレッジデータの統計解析に頼っているが、精度が低い場合が多い。学習ベースのテクニックは、より効果的ではあるが、広範なトレーニングデータを必要とし、計算コストがかかる。大規模言語モデル(LLM)の最近の進歩は、コード理解と推論を強化することによって、フォールトローカライゼーションの有望な改善を提供する。しかし、これらのLSMベースの技術は、トークンの制限、長い入力で性能を低下させ、複数の相互作用するコンポーネントを含む複雑なシステムを持つ大規模プロジェクトを管理するのが困難であるなど、依然として課題に直面している。これらの問題に対処するために,SBFLランキングを分割・分散戦略と統合した,新しいLLMエージェントベースの障害ローカライズ手法であるLLM4FLを紹介する。大規模なカバレッジデータを管理可能なグループに分割し、プロンプトチェーンを通じて複数のLLMエージェントを採用することで、LLM4FLはコードベースをナビゲートし、障害をより効率的にローカライズする。このアプローチには自己回帰と連鎖推論も組み込まれており、エージェントが繰り返し修正を生成し、不審なメソッドを再ランクすることができる。 LLM4FLをDefects4J (V2.0.0)ベンチマークで評価した。 LLM4FLがAutoFLを19.27%上回り,DeepFLやGraceといった最先端の監視技術を上回る性能を示した。さらに,適用範囲分割と連鎖の促進がフォールトローカライゼーション性能に及ぼす影響を強調し,Top-1の精度を最大22%向上させることができることを示す。 Locating and fixing software faults is a time-consuming and resource-intensive task in software development. Traditional fault localization methods, such as Spectrum-Based Fault Localization (SBFL), rely on statistical analysis of test coverage data but often suffer from lower accuracy. Learning-based techniques, while more effective, require extensive training data and can be computationally expensive. Recent advancements in Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning. However, these LLM-based techniques still face challenges, including token limitations, degraded performance with long inputs, and difficulties managing large-scale projects with complex systems involving multiple interacting components. To address these issues, we introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy. 
By dividing large coverage data into manageable groups and employing multiple LLM agents through prompt chaining, LLM4FL navigates the codebase and localizes faults more effectively. The approach also incorporates self-reflection and chain-of-thought reasoning, enabling agents to iteratively generate fixes and re-rank suspicious methods. We evaluated LLM4FL on the Defects4J (V2.0.0) benchmark, comprising 675 real-world faults from 14 open-source Java projects. Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace, all without task-specific training. Additionally, we highlight the impact of coverage splitting and prompt chaining on fault localization performance and show that different method ordering can improve Top-1 accuracy by up to 22%.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
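For readers unfamiliar with the SBFL rankings that the abstract above builds on, the sketch below ranks methods by a suspiciousness score computed from test coverage. The abstract does not name a formula, so the use of Ochiai, a commonly used SBFL metric, is an assumption, as are the function names and the toy coverage data.

```python
import math


def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom > 0 else 0.0


def rank_methods(coverage, outcomes):
    """coverage: {test: set of methods}; outcomes: {test: True if passed}."""
    total_failed = sum(1 for ok in outcomes.values() if not ok)
    methods = set().union(*coverage.values())
    scores = {}
    for m in methods:
        ef = sum(1 for t, cov in coverage.items() if m in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if m in cov and outcomes[t])
        scores[m] = ochiai(ef, ep, total_failed)
    # Highest suspiciousness first; ties broken by name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical coverage matrix: t1 fails, t2 and t3 pass.
coverage = {
    "t1": {"parse", "render"},
    "t2": {"parse", "save"},
    "t3": {"save"},
}
outcomes = {"t1": False, "t2": True, "t3": True}
ranking = rank_methods(coverage, outcomes)
```

Here `render` is covered only by the failing test and ranks first, `parse` is covered by one failing and one passing test, and `save` is never covered by a failing test and scores zero. A ranked list of this kind is what a divide-and-conquer strategy could split into groups for further analysis.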
# 病理歩行分類のための深層学習モデルのベンチマーク信頼性 Benchmarking Reliability of Deep Learning Models for Pathological Gait Classification ( http://arxiv.org/abs/2409.13643v1 ) ライセンス: Link先を確認	Abhishek Jaiswal, Nisheeth Srivastava,	(参考訳) 神経変性疾患の早期発見は、早期診断と治療がより良い予後をもたらす可能性があるため、重要なオープンな問題である。研究者たちは最近、機械学習アルゴリズムの進歩を活用して、変化した歩行の症状を検出することを模索している。しかし、近年の文献では、様々なセンサーやアルゴリズムを用いて、肯定的かつ正確な検出の主張がいくつかなされているが、実際に実現されるには程遠い。本稿では,翻訳を阻害するギャップを同定するための既存手法について分析する。 3つのKinectシミュレーションと1つの実際のパーキンソン病データセットを対象とした一連の実験を用いて、これらのアプローチにおけるエラーの原因と一般化失敗について強調する。これらの観測結果に基づき,我々はAMS-GCN (Asynchronous Multi-Stream Graph Convolutional Network) という強力なベースラインを提案する。 Early detection of neurodegenerative disorders is an important open problem, since early diagnosis and treatment may yield a better prognosis. Researchers have recently sought to leverage advances in machine learning algorithms to detect symptoms of altered gait, possibly corresponding to the emergence of neurodegenerative etiologies. However, while several claims of positive and accurate detection have been made in the recent literature, using a variety of sensors and algorithms, solutions are far from being realized in practice. This paper analyzes existing approaches to identify gaps inhibiting translation. Using a set of experiments across three Kinect-simulated and one real Parkinson's patient datasets, we highlight possible sources of errors and generalization failures in these approaches. Based on these observations, we propose our strong baseline called Asynchronous Multi-Stream Graph Convolutional Network (AMS-GCN) that can reliably differentiate multiple categories of pathological gaits across datasets.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# 物理・等質制約ニューラルネットワークのための重複しないシュワルツ型領域分割法 Non-overlapping, Schwarz-type Domain Decomposition Method for Physics and Equality Constrained Artificial Neural Networks ( http://arxiv.org/abs/2409.13644v1 ) ライセンス: Link先を確認	Qifeng Hu, Shamsulhaq Basir, Inanc Senocak,	(参考訳) 本稿では,偏微分方程式(PDE)の物理インフォームド機械学習に適した,一般化されたインタフェース条件を用いた非重複型シュワルツ型領域分解法を提案する。本手法は,各サブドメインにおける物理と等価制約付き人工ニューラルネットワーク(PECANN)を利用する。 PDEのみを制約するために初期条件と境界条件を用いるPECANN法から分岐し,PDEと境界条件を併用し,各サブドメインに対して特殊に定式化された汎用インターフェース損失関数を制約する。この修正により、隣接するサブドメイン間の情報交換を遅らせる一方で、サブドメイン固有のインタフェースパラメータの学習が促進され、通信オーバーヘッドが大幅に減少する。拡張ラグランジアン法を条件適応更新戦略で利用することにより、各サブドメインにおける制約付き最適化問題を2つの制約なし問題に変換する。このアプローチは、モデルパラメータのアドホックなチューニングを必要とせずに、ニューラルネットワークのトレーニングを可能にする。メッセージ・パッシング・インタフェース・モデルを用いて, 並列スケーリング性能を32プロセスまで高めながら, 様々な前方および逆問題にまたがって, メソッドの一般化能力と堅牢な並列性能を実証する。このアプローチの重要な強みは、Laplace 方程式と Helmholtz 方程式を統一されたフレームワーク内で複数スケールの解で解く能力である。 We introduce a non-overlapping, Schwarz-type domain decomposition method employing a generalized interface condition, tailored for physics-informed machine learning of partial differential equations (PDEs) in both forward and inverse scenarios. Our method utilizes physics and equality constrained artificial neural networks (PECANN) in each subdomain. Diverging from the original PECANN method, which uses initial and boundary conditions to constrain the PDEs alone, our method jointly employs both the boundary conditions and PDEs to constrain a specially formulated generalized interface loss function for each subdomain. This modification enhances the learning of subdomain-specific interface parameters, while delaying information exchange between neighboring subdomains, and thereby significantly reduces communication overhead. By utilizing an augmented Lagrangian method with a conditionally adaptive update strategy, the constrained optimization problem in each subdomain is transformed into a dual unconstrained problem. 
This approach enables neural network training without the need for ad-hoc tuning of model parameters. We demonstrate the generalization ability and robust parallel performance of our method across a range of forward and inverse problems, with solid parallel scaling performance up to 32 processes using the Message Passing Interface model. A key strength of our approach is its capability to solve both Laplace's and Helmholtz equations with multi-scale solutions within a unified framework, highlighting its broad applicability and efficiency.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# DP$^2$-FedSAM:パーソナライズされたシャープネス認識最小化による個人的フェデレーション学習の促進 DP$^2$-FedSAM: Enhancing Differentially Private Federated Learning Through Personalized Sharpness-Aware Minimization ( http://arxiv.org/abs/2409.13645v1 ) ライセンス: Link先を確認	Zhenxiao Zhang, Yuanxiong Guo, Yanmin Gong,	(参考訳) Federated Learning(FL)は、複数のクライアントが生データを共有せずに、協調的にモデルをトレーニングできる分散機械学習アプローチである。 FLで共有されるモデル更新によって、センシティブな情報が推測されるのを防ぐために、差分プライベート・フェデレーション・ラーニング(DPFL)が提案されている。 DPFLは、共有モデル更新にランダムノイズを加えて、FLの形式的かつ厳格なプライバシ保護を保証する。しかし、既存のDPFL法は、特にデータ不均一性のある設定において、モデルユーティリティーの深刻な劣化をもたらすことが多い。モデルの有用性を高めるため,DP$^2$-FedSAM: シャープネスを意識した個人化フェデレート学習手法を提案する。 DP$^2$-FedSAMはパーソナライズされた部分的モデル共有とシャープネス対応の最小化オプティマイザを活用して、ノイズの追加とクリップの悪影響を軽減し、プライバシを犠牲にすることなくモデルの有用性を大幅に改善する。理論的観点から,提案手法のプライバシと収束保証の厳密な理論的解析を行う。 DP$^2$-FedSAMの有効性を評価するため、一般的なベンチマークデータセットに基づいて広範囲な評価を行う。提案手法は,従来のDPFL法,特に異種データ設定と比較して,プライバシーとユーティリティのトレードオフを改善していることを確認した。 Federated learning (FL) is a distributed machine learning approach that allows multiple clients to collaboratively train a model without sharing their raw data. To prevent sensitive information from being inferred through the model updates shared in FL, differentially private federated learning (DPFL) has been proposed. DPFL ensures formal and rigorous privacy protection in FL by clipping and adding random noise to the shared model updates. However, the existing DPFL methods often result in severe model utility degradation, especially in settings with data heterogeneity. To enhance model utility, we propose a novel DPFL method named DP$^2$-FedSAM: Differentially Private and Personalized Federated Learning with Sharpness-Aware Minimization. DP$^2$-FedSAM leverages personalized partial model-sharing and sharpness-aware minimization optimizer to mitigate the adverse impact of noise addition and clipping, thereby significantly improving model utility without sacrificing privacy. 
From a theoretical perspective, we provide a rigorous theoretical analysis of the privacy and convergence guarantees of our proposed method. To evaluate the effectiveness of DP$^2$-FedSAM, we conduct extensive evaluations based on common benchmark datasets. Our results verify that our method improves the privacy-utility trade-off compared to the existing DPFL methods, particularly in heterogeneous data settings.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
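The clipping and noise addition that the abstract above attributes to DPFL can be sketched as follows. This is a generic Gaussian-mechanism step on a single client update, not the DP$^2$-FedSAM algorithm itself: personalization, partial model sharing, the sharpness-aware optimizer, and the calibration of the noise to a privacy budget are all omitted, and the function names and parameter values are assumptions.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip, then add i.i.d. Gaussian noise with std noise_multiplier * clip_norm."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)
raw = [3.0, 4.0]  # toy client update with L2 norm 5
clipped = clip_update(raw, clip_norm=1.0)
noisy = privatize_update(raw, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds each client's influence on the aggregate, which is what lets the added noise be scaled to a known sensitivity. Both steps distort the update, and that distortion is the utility loss the abstract says the method aims to mitigate.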
# OATS: スパースと低ランクの分解による外れ値対応プルーニング OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition ( http://arxiv.org/abs/2409.13652v1 ) ライセンス: Link先を確認	Stephen Zhang, Vardan Papyan,	(参考訳) 近年の大規模ファンデーションモデルへのパラダイムシフトは、ディープラーニングの新しい時代をもたらしたが、実用面で大きな成功を収める一方で、メモリ消費量と計算量の面で極めて高いコストに悩まされている。これらの問題を緩和するために、費用のかかる再トレーニングを必要としないポストホックニューラルネットワークプルーニング技術に協力的な努力が続けられている。かなりの進歩にもかかわらず、既存の手法では圧縮が増加するにつれてモデル性能が着実に低下することが多い。本稿では、入力埋め込みにおける第2モーメント情報を利用して、モデル重みをスパース行列とローランク行列の和に分解する、OATSと呼ばれる大きなトランスフォーマーを圧縮する新しい手法を提案する。再トレーニングなしで、OATSはLlama-3やPhi-3のような大規模言語モデルやViTやDINOv2のようなビジョントランスフォーマーを最大$60\%$圧縮する際に最先端の性能を達成し、同程度にプルーニングされたモデルと比べて最大$1.37\times$のCPU高速化を実現する。 The recent paradigm shift to large-scale foundation models has brought about a new era for deep learning that, while it has found great success in practice, has also been plagued by prohibitively expensive costs in terms of high memory consumption and compute. To mitigate these issues, there has been a concerted effort in post-hoc neural network pruning techniques that do not require costly retraining. Despite the considerable progress being made, existing methods often exhibit a steady drop in model performance as the compression increases. In this paper, we present a novel approach to compressing large transformers, coined OATS, that utilizes the second moment information in the input embeddings to decompose the model weights into a sum of sparse and low-rank matrices. Without any retraining, OATS achieves state-of-the-art performance when compressing models by up to $60\%$ on large language models such as Llama-3 and Phi-3 and vision transformers such as ViT and DINOv2 while delivering up to $1.37\times$ the CPU acceleration versus a model that was comparably pruned.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# ニューラルネットワークに基づく動的システムのモデルに対するニューラルフィルタリング Neural filtering for Neural Network-based Models of Dynamic Systems ( http://arxiv.org/abs/2409.13654v1 ) ライセンス: Link先を確認	Parham Oveissi, Turibius Rozario, Ankit Goel,	(参考訳) 力学系モデリングにおけるニューラルネットワークの適用は、複雑な非線形関数を推定する能力によって顕著になっている。その効果にもかかわらず、ニューラルネットワークは長期的な予測において課題に直面し、予測エラーは時間とともに分散し、精度が低下する。本稿では,ニューラルネットワークを用いた動的システムの長期予測精度を高めるニューラルネットワークフィルタを提案する。拡張カルマンフィルタによって動機付けられたニューラルネットワークフィルタは、ニューラルネットワークの状態予測と物理系からの計測とを組み合わせて、推定状態の精度を向上させる。ニューラルネットワークによる予測精度の向上は、4つの非線形力学系への応用を通じて実証される。数値実験により、ニューラルネットワークフィルタは予測精度を著しく改善し、状態推定共分散を束縛し、ニューラルネットワーク予測より優れていることが示された。 The application of neural networks in modeling dynamic systems has become prominent due to their ability to estimate complex nonlinear functions. Despite their effectiveness, neural networks face challenges in long-term predictions, where the prediction error diverges over time, thus degrading their accuracy. This paper presents a neural filter to enhance the accuracy of long-term state predictions of neural network-based models of dynamic systems. Motivated by the extended Kalman filter, the neural filter combines the neural network state predictions with the measurements from the physical system to improve the estimated state's accuracy. The neural filter's improvements in prediction accuracy are demonstrated through applications to four nonlinear dynamical systems. Numerical experiments show that the neural filter significantly improves prediction accuracy and bounds the state estimate covariance, outperforming the neural network predictions.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
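The idea of combining a model's state prediction with a measurement, as described in the abstract above, can be illustrated with the textbook scalar Kalman measurement update. This is only a sketch of the filter family that the abstract says motivated the neural filter: the linear `surrogate_step` stands in for a learned one-step model, the state is assumed to be measured directly, and all names and noise values are hypothetical.

```python
def kalman_correct(x_pred, p_pred, y, r):
    """Scalar Kalman measurement update for a directly measured state.

    x_pred, p_pred: predicted state and its variance
    y, r:           measurement and its noise variance
    """
    k = p_pred / (p_pred + r)  # Kalman gain, between 0 and 1
    x_post = x_pred + k * (y - x_pred)
    p_post = (1.0 - k) * p_pred
    return x_post, p_post


def surrogate_step(x):
    """Stand-in for a learned one-step model (assumption, not the paper's network)."""
    return 0.9 * x + 0.1


x, p, q, r = 0.0, 1.0, 0.01, 0.04  # state, variance, process and measurement noise
for y in [1.0, 1.0, 1.0]:  # synthetic measurements
    x_pred = surrogate_step(x)
    p_pred = 0.9 * p * 0.9 + q  # propagate variance through the linear stand-in
    x, p = kalman_correct(x_pred, p_pred, y, r)
```

Each correction pulls the prediction toward the measurement in proportion to the gain and shrinks the variance, which keeps the estimate's uncertainty bounded instead of letting prediction error accumulate over a long rollout.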
# 自動広告オークションチューニングのための適応混合重要度サンプリング Adaptive Mixture Importance Sampling for Automated Ads Auction Tuning ( http://arxiv.org/abs/2409.13655v1 ) ライセンス: Link先を確認	Yimeng Jia, Kaushal Paneri, Rong Huang, Kailash Singh Maurya, Pavan Mallapragada, Yifan Shi,	(参考訳) 本稿では,オンライン広告オークションなどの大規模レコメンデータシステムにおいて,キーパフォーマンス指標(KPI)を最適化するための新しいアプローチとして,アダプティブ・ミックス・コンパタンス・サンプリング(AMIS)を提案する。従来の重要サンプリング(IS)手法は、特にマルチモーダルランドスケープの複雑さをナビゲートし、最適化タスクの局所的な最適化を避ける際に、動的環境における課題に直面している。標準適応ISや複数ISのように重要度を更新・混合する代わりに、AMISフレームワークは、提案分布として混合分布を活用し、各繰り返しにおける混合パラメータと混合率の両方を動的に調整し、探索の多様性と効率を向上させる。大規模なオフラインシミュレーションを通じて、AMISは、特にノイズの多い環境で、単純なガウスの重要度サンプリング(GIS)を著しく上回ることを示す。さらに,本手法は,主要な検索エンジン上でのオンラインA/B実験を通じて実世界のシナリオにおいて検証される。これらの結果から,AMISはノイズの多い環境下での収束を促進させ,より正確で信頼性の高い意思決定を重要視しうることが明らかとなった。 This paper introduces Adaptive Mixture Importance Sampling (AMIS) as a novel approach for optimizing key performance indicators (KPIs) in large-scale recommender systems, such as online ad auctions. Traditional importance sampling (IS) methods face challenges in dynamic environments, particularly in navigating through complexities of multi-modal landscapes and avoiding entrapment in local optima for the optimization task. Instead of updating importance weights and mixing samples across iterations, as in canonical adaptive IS and multiple IS, our AMIS framework leverages a mixture distribution as the proposal distribution and dynamically adjusts both the mixture parameters and their mixing rates at each iteration, thereby enhancing search diversity and efficiency. Through extensive offline simulations, we demonstrate that AMIS significantly outperforms simple Gaussian Importance Sampling (GIS), particularly in noisy environments. Moreover, our approach is validated in real-world scenarios through online A/B experiments on a major search engine, where AMIS consistently identifies optimal tuning points that are more likely to be adopted as mainstream configurations. 
These findings indicate that AMIS enhances convergence in noisy environments, leading to more accurate and reliable decision-making in the context of importance sampling off-policy estimators.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
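The sketch below illustrates importance sampling with a mixture proposal whose mixing rates are updated between rounds, in the spirit of the abstract above. It is a toy one-dimensional example, not the AMIS algorithm: the component means and standard deviations are held fixed, the mixing-rate update (each component's share of the total importance weight) is one simple choice among many, and the target density, names, and sample sizes are assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, components):
    return sum(w * normal_pdf(x, mu, s) for w, (mu, s) in zip(weights, components))


def target_pdf(x):
    """Toy target density with mean 1 (assumption for illustration)."""
    return normal_pdf(x, 1.0, 1.0)


def mixture_is_step(weights, components, n, rng):
    """One round: sample from the mixture, then return the self-normalized
    estimate of the target mean and mixing rates re-weighted by importance mass."""
    xs, ws, owners = [], [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        mu, s = components[k]
        x = rng.gauss(mu, s)
        xs.append(x)
        ws.append(target_pdf(x) / mixture_pdf(x, weights, components))
        owners.append(k)
    total = sum(ws)
    estimate = sum(w * x for w, x in zip(ws, xs)) / total
    new_weights = [
        sum(w for w, k in zip(ws, owners) if k == j) / total
        for j in range(len(weights))
    ]
    return estimate, new_weights


rng = random.Random(0)
components = [(-1.0, 2.0), (2.0, 2.0)]  # fixed (mean, std) per component
weights = [0.5, 0.5]
for _ in range(3):
    estimate, weights = mixture_is_step(weights, components, 5000, rng)
```

Dividing by the full mixture density, instead of only the component that produced each sample, keeps individual importance weights bounded when the components overlap. The updated mixing rates shift toward the component that better covers the target.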
# ソフトウェアモデリング課題における行動・相互作用・課題の探索--学生を対象とした実証的研究 Exploring Actions, Interactions and Challenges in Software Modelling Tasks: An Empirical Investigation with Students ( http://arxiv.org/abs/2409.13656v1 ) ライセンス: Link先を確認	Shalini Chakraborty, Javier Troya, Lola Burgueño, Grischa Liebel,	(参考訳) 背景: ソフトウェアモデリングは創造的だが難しいタスクです。モデリング担当者は、モデリングの問題を理解することから、適切なモデリング戦略やモデリングツールでそれを解決するまで、プロセスで迷子になることが多い。モデリングを学ぶ学生は、しばしば表記法やツールに圧倒される。学生にシステマティック・モデリングを教えるためには,学生の実践的モデリング知識と,モデリング中に直面する課題について検討する必要がある。 Aim: 学生のモデリング知識とモデリング行動を探究することを目的としている。さらに,特定のモデリングツールのモデリング課題を解決しつつ,学生の課題について検討する。方法:2つの大学と国から16組の学生が1時間,モデル作成タスクを解く様子を観察し,実証的研究を行った。結果: 個々のモデリングスタイル, ツールのインターフェース, モデリング知識に基づいて, クラス図とシーケンス図のモデリングパターンが異なることがわかった。モデリングツールが学生のモデリングスタイルにどのように影響するか、学生の自信と創造性を育むためにどのように使用できるかを観察した。そこで本研究では,モデリング教育の強化と,実践的なモデリングスキルの獲得を支援するためのガイドラインのセットを開発した。結論: 教育におけるモデリングの指針は、構造化され、体系化されるべきである。本研究により,様々なモデリングスタイルが存在することが明らかとなった。モデラーの創造的な側面、特に学生である間に育むことが不可欠である。そのため、適切なツールを選択することが重要であり、学生はツールがモデリングスタイルにどのように影響するかを理解する必要がある。 Background: Software modelling is a creative yet challenging task. Modellers often find themselves lost in the process, from understanding the modelling problem to solving it with proper modelling strategies and modelling tools. Students learning modelling often get overwhelmed with the notations and tools. To teach students systematic modelling, we must investigate students' practical modelling knowledge and the challenges they face while modelling. Aim: We aim to explore students' modelling knowledge and modelling actions. Further, we want to investigate students' challenges while solving a modelling task on specific modelling tools. Method: We conducted an empirical study by observing 16 pairs of students from two universities and countries solving modelling tasks for one hour. Results: We find distinct patterns of modelling of class and sequence diagrams based on individual modelling styles, the tools' interface and modelling knowledge. 
We observed how modelling tools influence students' modelling styles and how they can be used to foster students' confidence and creativity. Based on these observations, we developed a set of guidelines aimed at enhancing modelling education and helping students acquire practical modelling skills. Conclusions: The guidance for modelling in education needs to be structured and systematic. Our findings reveal that different modelling styles exist, which should be properly studied. It is essential to nurture the creative aspect of a modeller, particularly while they are still students. Therefore, selecting the right tool is important, and students should understand how a tool can influence their modelling style.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# 拡散モデルを用いた自律走行試験のための効率的な領域拡張 Efficient Domain Augmentation for Autonomous Driving Testing Using Diffusion Models ( http://arxiv.org/abs/2409.13661v1 ) ライセンス: Link先を確認	Luciano Baresi, Davide Yi Xian Hu, Andrea Stocco, Paolo Tonella,	(参考訳) シミュレーションベースのテストは、自律運転システム(ADS)の信頼性を評価するために広く用いられているが、その効果は、そのようなシミュレータで利用可能な運用設計領域(ODD)条件によって制限されている。この制限に対処するため、本研究では、ADSシステムレベルのテストを強化するために、生成人工知能技術と物理ベースのシミュレータの統合について検討する。本研究は, 拡散モデルに基づく3つの生成戦略の有効性と計算オーバーヘッド, すなわち, インペインティング, インペインティング, 精細化によるインペインティングについて検討した。具体的には、これらの技術を用いて、新しいODDを表す運転シナリオのシミュレーション生成画像を生成する能力を評価する。セマンティックセグメンテーションに基づく不適切な入力に対して,ニューラル生成画像のセマンティックな保存とリアリズムを確保するために,新しい自動検出手法を採用した。次にシステムレベルのテストを行い、新たに合成されたODDに対するADSの一般化能力を評価する。以上の結果から,拡散モデルがADSのシステムレベルテストにおけるODDカバレッジを向上させることが示唆された。自動セマンティックバリデータは, 偽陽性率を3倍に抑え, 生成した画像の正しさと品質を維持した。我々の手法は実世界のテストの前に新しいADSシステム障害を特定することに成功している。 Simulation-based testing is widely used to assess the reliability of Autonomous Driving Systems (ADS), but its effectiveness is limited by the operational design domain (ODD) conditions available in such simulators. To address this limitation, in this work, we explore the integration of generative artificial intelligence techniques with physics-based simulators to enhance ADS system-level testing. Our study evaluates the effectiveness and computational overhead of three generative strategies based on diffusion models, namely instruction-editing, inpainting, and inpainting with refinement. Specifically, we assess these techniques' capabilities to produce augmented simulator-generated images of driving scenarios representing new ODDs. We employ a novel automated detector for invalid inputs based on semantic segmentation to ensure semantic preservation and realism of the neural generated images. We then perform system-level testing to evaluate the ADS's generalization ability to newly synthesized ODDs. Our findings show that diffusion models help increase the ODD coverage for system-level testing of ADS. 
Our automated semantic validator achieved a false positive rate as low as 3%, retaining the correctness and quality of the generated images for testing. Our approach successfully identified new ADS system failures before real-world testing.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# グラフニューラルネットワークを用いた遺伝子発現からの遺伝子制御ネットワークの解析 Analysis of Gene Regulatory Networks from Gene Expression Using Graph Neural Networks ( http://arxiv.org/abs/2409.13664v1 ) ライセンス: Link先を確認	Hakan T. Otal, Abdulhamit Subasi, Furkan Kurt, M. Abdullah Canbaz, Yasin Uzun,	(参考訳) 遺伝子調節ネットワーク(GRN)の複雑さの解明は、細胞プロセスや疾患のメカニズムを理解する上で重要である。伝統的な計算手法は、しばしばこれらのネットワークの動的な性質に苦しむ。本研究では、GRNのようなグラフ構造化データをモデリングするための強力なアプローチであるグラフニューラルネットワーク(GNN)の利用について検討する。本研究は,グラフ注意ネットワーク v2 (GATv2) を用いて,表現データと文献由来のブールモデルを用いて,GRNの構築と取調べに新たなアプローチを提案する。規制相互作用を正確に予測し、キーレギュレータをピンポイントするモデルの有効性は、GNNフレームワークの目印である高度な注意機構に起因している。これらの知見は、GNNがGRN分析に革命をもたらし、従来の制限に対処し、より豊かな生物学的洞察を提供することを示唆している。 GNNの成功は、我々のモデルが高品質なデータに依存していることが強調されているように、進歩を維持するために強化されたデータ収集方法を求めている。 GRN研究におけるGNNの統合は、パーソナライズド医薬品、薬物発見、および我々の生物学的システムの把握における先駆的な発展を目標とし、ノードとエッジの予測を改善するネットワークの構造解析によって促進される。 Unraveling the complexities of Gene Regulatory Networks (GRNs) is crucial for understanding cellular processes and disease mechanisms. Traditional computational methods often struggle with the dynamic nature of these networks. This study explores the use of Graph Neural Networks (GNNs), a powerful approach for modeling graph-structured data like GRNs. Utilizing a Graph Attention Network v2 (GATv2), our study presents a novel approach to the construction and interrogation of GRNs, informed by gene expression data and Boolean models derived from literature. The model's adeptness in accurately predicting regulatory interactions and pinpointing key regulators is attributed to advanced attention mechanisms, a hallmark of the GNN framework. These insights suggest that GNNs are primed to revolutionize GRN analysis, addressing traditional limitations and offering richer biological insights. The success of GNNs, as highlighted by our model's reliance on high-quality data, calls for enhanced data collection methods to sustain progress. 
The integration of GNNs in GRN research is set to pioneer developments in personalized medicine, drug discovery, and our grasp of biological systems, bolstered by the structural analysis of networks for improved node and edge prediction.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# DiffFluid: プレーンな拡散モデルが流れの予測に有効である DiffFluid: Plain Diffusion Models are Effective Predictors of Flow Dynamics ( http://arxiv.org/abs/2409.13665v1 ) ライセンス: Link先を確認	Dongyu Luo, Jianyu Wu, Jing Wang, Hairun Xie, Xiangyu Yue, Shixiang Tang,	(参考訳) 本稿では, トランスフォーマーを用いたプレーンな拡散モデルが, ダーシー流や高レイノルズ数など各種作業条件下での流体力学の効果的な予測因子であることを示す。複雑な構造に依存して複雑な相関関係を抽出し,基礎となる物理状態を学習する従来の流体力学解法とは異なり,本手法は画像翻訳問題としてフローダイナミクスの予測を定式化し,その問題に対処するためにプレーンな拡散モデルを利用する。このモデル設計の複雑さの低減は、流体力学方程式の複雑な物理的状態や幾何学的特徴を捉える能力を損なうことなく、高精度な解を導く。各種流体関連ベンチマークの予備試験では、DiffFluidは、特に流体力学におけるナヴィエ・ストークス方程式の解法において、一貫して最先端の性能を達成し、相対精度は+44.8%向上した。さらに, ダーシー流方程式とオイラー方程式による翼問題でも, それぞれ+14.0%と+11.3%の相対的な改善を達成した。コードは受理後、https://github.com/DongyuLUO/DiffFluidでリリースされる。 We show that plain diffusion models with Transformers are effective predictors of fluid dynamics under various working conditions, e.g., Darcy flow and high Reynolds numbers. Unlike traditional fluid dynamical solvers that depend on complex architectures to extract intricate correlations and learn underlying physical states, our approach formulates the prediction of flow dynamics as an image translation problem and accordingly leverages a plain diffusion model to tackle it. This reduction in model design complexity does not compromise its ability to capture complex physical states and geometric features of fluid dynamical equations, leading to high-precision solutions. In preliminary tests on various fluid-related benchmarks, our DiffFluid achieves consistent state-of-the-art performance, particularly in solving the Navier-Stokes equations in fluid dynamics, with a relative precision improvement of +44.8%. In addition, we achieve relative improvements of +14.0% and +11.3% on the Darcy flow equation and the airfoil problem with Euler's equation, respectively. Code will be released at https://github.com/DongyuLUO/DiffFluid upon acceptance.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# 短ブロック長誤り訂正符号を用いたデベタック・ウィンター境界を越えた連続可変量子鍵分布の情報再構成 Information Reconciliation for Continuous-Variable Quantum Key Distribution Beyond the Devetak-Winter Bound Using Short Blocklength Error Correction Codes ( http://arxiv.org/abs/2409.13667v1 ) ライセンス: Link先を確認	Kadir Gümüş, João dos Reis Frazão, Aaron Albores-Mejia, Boris Škorić, Gabriele Liga, Yunus Can Gültekin, Thomas Bradley, Alex Alvarado, Chigo Okonkwo,	(参考訳) 本稿では,短いブロック長低レート符号と長いブロック長高レート符号を用いた2段階の誤り訂正方式を用いた照合プロトコルを提案する。この2段階の復号法を用いることで,デベタック・ウィンター境界を超える秘密鍵レートを達成可能であることを示す。短ブロック長の低密度パリティチェックコードを用いてプロトコルをシミュレートし,最大1.5の照合効率が得られることを示す。これらの高い照合効率を用いることで、CV-QKD系の達成可能な距離を2倍にすることができる。 In this paper we introduce a reconciliation protocol with a two-step error correction scheme using a short blocklength low rate code and a long blocklength high rate code. We show that by using this two-step decoding method it is possible to achieve secret key rates beyond the Devetak-Winter bound. We simulate the protocol using short blocklength low-density parity-check codes and show that we can obtain reconciliation efficiencies up to 1.5. Using these high reconciliation efficiencies, it is possible to double the achievable distances of CV-QKD systems.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# ニューラル情報処理システムにおける動的計算の時空的展望 A Spacetime Perspective on Dynamical Computation in Neural Information Processing Systems ( http://arxiv.org/abs/2409.13669v1 ) ライセンス: Link先を確認	T. Anderson Keller, Lyle Muller, Terrence J. Sejnowski, Max Welling,	(参考訳) 現在では、皮質構造における進行波やその他の構造化された時空間的リカレント神経力学の実質的な証拠があるが、これらの観察は通常、地形的に整理された選択性やフィードフォワード受容野の概念と整合することが困難である。構造化された選択性とダイナミクスが矛盾せず、むしろ相補的であるような、ニューラル計算に関する新しい「時空」の視点を導入する。時空間力学は、自然神経系が世界の近似的な視覚的・時間的・抽象的対称性を保存量として符号化し、一般化と長期作業記憶の向上を可能にするメカニズムでありうることを示す。 There is now substantial evidence for traveling waves and other structured spatiotemporal recurrent neural dynamics in cortical structures; but these observations have typically been difficult to reconcile with notions of topographically organized selectivity and feedforward receptive fields. We introduce a new 'spacetime' perspective on neural computation in which structured selectivity and dynamics are not contradictory but instead are complementary. We show that spatiotemporal dynamics may be a mechanism by which natural neural systems encode approximate visual, temporal, and abstract symmetries of the world as conserved quantities, thereby enabling improved generalization and long-term working memory.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# グラフ変分オートエンコーダと帯域最適化グラフニューラルネットを用いた複数条件の予測モデル作成フレームワーク A Generative Framework for Predictive Modeling of Multiple Chronic Conditions Using Graph Variational Autoencoder and Bandit-Optimized Graph Neural Network ( http://arxiv.org/abs/2409.13671v1 ) ライセンス: Link先を確認	Julian Carvajal Rico, Adel Alaeddini, Syed Hasib Akhter Faruqui, Susan P Fisher-Hoch, Joseph B Mccormick,	(参考訳) 複数の慢性疾患(MCC)の出現を予測することは早期介入とパーソナライズされた医療にとって重要である。グラフニューラルネットワーク(GNN)は、MCCに見られるような複雑なグラフデータをモデリングするための効果的な手法である。しかし、GNNの重大な課題は、既存のグラフ構造に依存していることだ。そこで本稿では,MCCの予測分析を強化するために,データの分布を利用してグラフ構造を代表的に構築するGNNの新たな生成フレームワークを提案する。本フレームワークでは,患者データの複雑な関係を捉えるために,グラフ変分オートエンコーダ(GVAE)を採用している。これにより、個々の健康トラジェクトリの包括的な理解が可能になり、元の特徴セットを保存しながら、多様な患者の確率的類似性グラフの作成が容易になる。 GVAEデコーダから生成されたこれらの患者の確率的類似性グラフのバリエーションは、新しいラプラシア正規化技術を用いてGNNによって処理され、時間とともにグラフ構造を洗練し、MCCの予測精度を向上させる。文脈的帯域幅は、確率的に生成されたグラフを評価し、モデル収束までGNNモデルの最良の性能グラフを反復的に識別するように設計されている。我々は,MCC患者の大コホート(n = 1,592)に対して,$\varepsilon$-Greedyおよびmulti-armed Banditアルゴリズムに対して,提案手法の有効性を検証した。これらの進歩は、予測医療分析を変革するための提案されたアプローチの可能性を強調し、MCC管理に対してよりパーソナライズされ、積極的なアプローチを可能にする。 Predicting the emergence of multiple chronic conditions (MCC) is crucial for early intervention and personalized healthcare, as MCC significantly impacts patient outcomes and healthcare costs. Graph neural networks (GNNs) are effective methods for modeling complex graph data, such as those found in MCC. However, a significant challenge with GNNs is their reliance on an existing graph structure, which is not readily available for MCC. To address this challenge, we propose a novel generative framework for GNNs that constructs a representative underlying graph structure by utilizing the distribution of the data to enhance predictive analytics for MCC. Our framework employs a graph variational autoencoder (GVAE) to capture the complex relationships in patient data. 
This allows for a comprehensive understanding of individual health trajectories and facilitates the creation of diverse patient stochastic similarity graphs while preserving the original feature set. These variations of patient stochastic similarity graphs, generated from the GVAE decoder, are then processed by a GNN using a novel Laplacian regularization technique to refine the graph structure over time and improve the prediction accuracy of MCC. A contextual Bandit is designed to evaluate the stochastically generated graphs and iteratively identify the best-performing graph for the GNN model until model convergence. We validate the performance of the proposed contextual Bandit algorithm against $\varepsilon$-Greedy and multi-armed Bandit algorithms on a large cohort (n = 1,592) of patients with MCC. These advancements highlight the potential of the proposed approach to transform predictive healthcare analytics, enabling a more personalized and proactive approach to MCC management.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
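As a rough illustration of the $\varepsilon$-Greedy baseline named in the abstract above, the sketch below picks among a few candidate graphs by their running mean reward. It is a minimal sketch and not the authors' contextual bandit: the function names, the reward values, and the idea of treating each generated graph as one arm are assumptions made for illustration only.

```python
import random

def epsilon_greedy_select(mean_rewards, epsilon, rng):
    """Pick an arm index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(mean_rewards))
    # Exploit: index of the highest running mean (first one on ties).
    return max(range(len(mean_rewards)), key=lambda i: mean_rewards[i])

def update_mean(mean_rewards, counts, arm, reward):
    """Incrementally update the running mean reward of the chosen arm."""
    counts[arm] += 1
    mean_rewards[arm] += (reward - mean_rewards[arm]) / counts[arm]

# Hypothetical setup: three candidate graphs, rewards are validation scores.
rng = random.Random(0)
means, counts = [0.0, 0.0, 0.0], [0, 0, 0]
update_mean(means, counts, 1, 0.8)   # graph 1 scored 0.8 on its first pull
update_mean(means, counts, 1, 0.6)   # graph 1 scored 0.6 on its second pull
best = epsilon_greedy_select(means, epsilon=0.0, rng=rng)  # pure exploitation
```

With `epsilon=0.0` the selection never explores, so it returns the arm with the highest running mean; a positive `epsilon` trades some of that exploitation for random exploration.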
# 非凸平滑性条件の最近の進歩とディープ線形ニューラルネットワークへの適用性 Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks ( http://arxiv.org/abs/2409.13672v1 ) ライセンス: Link先を確認	Vivak Patel, Christian Varner,	(参考訳) ディープラーニングから生じる滑らかな最適化問題における非凸性の存在は、文献における新たな滑らかさ条件とそれに対応する収束解析を引き起こした。我々はこれらの滑らかさ条件を議論し、それらを順序付け、それらが成り立つかどうかを判断するための条件を提供し、二項分類のための深い線形ニューラルネットワークのトレーニングへの適用性を評価する。 The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature and corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.	翻訳日:2024-11-07 06:08:43 公開日:2024-09-20
# SoloParkour: 特権的な経験から学ぶ視覚ロコモーションのための制約付き強化学習 SoloParkour: Constrained Reinforcement Learning for Visual Locomotion from Privileged Experience ( http://arxiv.org/abs/2409.13678v1 ) ライセンス: Link先を確認	Elliot Chane-Sane, Joseph Amigo, Thomas Flayols, Ludovic Righetti, Nicolas Mansard,	(参考訳) パルクールは、脚式ロボットにとって重要な課題であり、限られた感覚入力に基づいて、俊敏性と精度で複雑な環境をナビゲートする必要がある。本研究では,深度画素からロボット制御コマンドに至るまでのエンドツーエンドの視覚ポリシーをトレーニングし,アジャイルで安全な四足歩行を実現するための新しい手法を提案する。本稿では,ロボットの身体的限界におけるアジャイルスキルの出現を最大化し,安全性を確保しつつ,制約付き強化学習(RL)問題としてロボットパルクールを定式化する。まず、ロボットの周囲に関する特権情報を用いて、視覚を用いないポリシーを訓練する。次に、この特権ポリシーから経験を生成し、深度画像を入力とするサンプル効率の高いオフポリシーRLアルゴリズムをウォームスタートする。これによりロボットは、この特権付き体験から視覚的移動への行動に適応し、ピクセルから直接RLを行う場合の高い計算コストを回避することができる。本研究では,実際のSolo-12ロボットにおいて,歩行,登山,跳躍,クロールなど,さまざまなパルクールスキルを実行する能力を示す。 Parkour poses a significant challenge for legged robots, requiring navigation through complex environments with agility and precision based on limited sensory inputs. In this work, we introduce a novel method for training end-to-end visual policies, from depth pixels to robot control commands, to achieve agile and safe quadruped locomotion. We formulate robot parkour as a constrained reinforcement learning (RL) problem designed to maximize the emergence of agile skills within the robot's physical limits while ensuring safety. We first train a policy without vision using privileged information about the robot's surroundings. We then generate experience from this privileged policy to warm-start a sample-efficient off-policy RL algorithm from depth images. This allows the robot to adapt behaviors from this privileged experience to visual locomotion while circumventing the high computational costs of RL directly from pixels. We demonstrate the effectiveness of our method on a real Solo-12 robot, showcasing its capability to perform a variety of parkour skills such as walking, climbing, leaping, and crawling.	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
# 8量子ビット上のフォールトトレラント対測定符号 A fault-tolerant pairwise measurement-based code on eight qubits ( http://arxiv.org/abs/2409.13681v1 ) ライセンス: Link先を確認	Linnea Grans-Samuelsson, David Aasen, Parsa Bonderson,	(参考訳) 回路ノイズに対して誤り訂正が可能な, 故障距離3の8量子ビット上のペアワイズ測定ベース符号を構築する。このコードは、ペアワイズ・パウリ測定の最近接接続を持つ長方形の量子ビット配列のサブセットに実装でき、シンドローム抽出回路は深さ28である。完全クリフォード群を生成する8量子ビット符号のパッチに対するフォールトトレラントな論理演算について述べる。論理的アイドル時と論理的2量子ビット計測時の両方で回路雑音下での性能を推定する。擬似閾値は、アイドル物理量子ビットの雑音量に応じて、$10^{-5}$ から $2\times 10^{-4}$ の間と見積もる。誤り訂正に加えて、ポストセレクション(全ての次数1の故障を補正し、上位の故障のサブセットを拒絶)を使用することで、擬似閾値を最大で1桁改善することができる。 We construct a pairwise measurement-based code on eight qubits that is error correcting for circuit noise, with fault distance 3. The code can be implemented on a subset of a rectangular array of qubits with nearest neighbor connectivity of pairwise Pauli measurements, with a syndrome extraction circuit of depth 28. We describe fault-tolerant logical operations on patches of this eight-qubit code that generate the full Clifford group. We estimate the performance under circuit noise both during logical idle and during a logical two-qubit measurement. We estimate the pseudo-threshold to be between $10^{-5}$ and $2\times 10^{-4}$, depending on the amount of noise on idle physical qubits. The use of post-selection in addition to error correction (correcting all degree one faults and rejecting a subset of the higher degree faults) can improve the pseudo-threshold by up to an order of magnitude.	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
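To make the notion of a pseudo-threshold concrete: it is the physical error rate at which the logical error rate equals the physical one. The sketch below assumes a leading-order model in which the logical error rate is a prefactor times the square of the physical rate, the usual expectation when every single fault is corrected (fault distance 3). The prefactor is a hypothetical number chosen only so that the result falls in the range quoted above; it is not a value from the paper.

```python
def logical_error_rate(p, prefactor):
    """Leading-order model: two faults are needed to cause a logical error."""
    return prefactor * p ** 2

def pseudo_threshold(prefactor):
    """Solve prefactor * p**2 == p for the nonzero root p = 1 / prefactor."""
    return 1.0 / prefactor

A = 5_000.0                      # hypothetical prefactor, for illustration only
p_star = pseudo_threshold(A)     # 2e-4 under this toy model
below = logical_error_rate(p_star / 10, A)   # logical rate at p = 2e-5
```

Below the pseudo-threshold the encoded qubit beats the bare one: at a physical rate ten times smaller than `p_star`, the modelled logical rate is ten times smaller again than that physical rate.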
# ReMEmbR:ロボットナビゲーションのための長距離時空間メモリの構築と推論 ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation ( http://arxiv.org/abs/2409.13682v1 ) ライセンス: Link先を確認	Abrar Anwar, John Welsh, Joydeep Biswas, Soha Pouya, Yan Chang,	(参考訳) 長い時間にわたって複雑な環境をナビゲートし理解することは、ロボットにとって重要な課題である。ロボットと対話する人々は、何が起きたのか、いつ起きたのか、どれくらい前に起きたのかといった質問をしたいかもしれない。この問題を解決するために,ロボットナビゲーションのための長距離ビデオ質問応答システムReMEmbRを導入する。 ReMEmbRを評価するために,長距離ロボットナビゲーションビデオに空間的,時間的,記述的な質問を注釈付けするNaVQAデータセットを導入する。 ReMEmbRは、時間情報、空間情報、画像を利用して、連続的に成長するロボットの履歴を効率的に扱う、メモリビルディングとクエリフェーズを含む構造化されたアプローチを採用している。我々の実験により、ReMEmbRはLLMとVLMのベースラインよりも優れており、低レイテンシで効率的な長距離推論を実現することができることが示された。さらに、ロボットにReMEmbRをデプロイし、アプローチが多様なクエリを処理可能であることを示す。データセット、コード、ビデオ、その他の資料は以下のリンクで見ることができる。 Navigating and understanding complex environments over extended periods of time is a significant challenge for robots. People interacting with the robot may want to ask questions like where something happened, when it occurred, or how long ago it took place, which would require the robot to reason over a long history of their deployment. To address this problem, we introduce a Retrieval-augmented Memory for Embodied Robots, or ReMEmbR, a system designed for long-horizon video question answering for robot navigation. To evaluate ReMEmbR, we introduce the NaVQA dataset where we annotate spatial, temporal, and descriptive questions to long-horizon robot navigation videos. ReMEmbR employs a structured approach involving a memory building and a querying phase, leveraging temporal information, spatial information, and images to efficiently handle continuously growing robot histories. Our experiments demonstrate that ReMEmbR outperforms LLM and VLM baselines, allowing ReMEmbR to achieve effective long-horizon reasoning with low latency. Additionally, we deploy ReMEmbR on a robot and show that our approach can handle diverse queries. 
The dataset, code, videos, and other material can be found at the following link: https://nvidia-ai-iot.github.io/remembr	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
# クラス非依存画像セグメンテーションにおけるボトムアップアプローチ A Bottom-Up Approach to Class-Agnostic Image Segmentation ( http://arxiv.org/abs/2409.13687v1 ) ライセンス: Link先を確認	Sebastian Dille, Ari Blondal, Sylvain Paris, Yağız Aksoy,	(参考訳) クラスに依存しないイメージセグメンテーションは、画像編集ワークフローの自動化において重要なコンポーネントである。文献における既存の手法は、オブジェクト検出がオブジェクトごとのセグメンテーションに先行するクラスベースのアプローチのパラダイムに従って、トップダウンの定式化に固執することが多い。本研究では,クラスに依存しないセグメンテーション問題に対処するためのボトムアップの新たな定式化を提案する。特徴空間の射影球面に直接ネットワークを監督し、計量学習文学に触発された損失と、新しいセグメンテーション空間表現で定義された損失を生かした。セグメンテーションの結果は、推定された特徴の簡単な平均シフトクラスタリングによって得られる。ボトムアップの定式化は、クラスベースのセグメンテーション用に設計されたデータセットで訓練された場合でも、例外的な一般化能力を示す。さらに,細胞分裂と核分裂の課題に対処することで,我々のジェネリックアプローチの有効性を示す。ボトムアップの定式化によって、文献における多様なセグメンテーションの課題に関する貴重な洞察が得られると信じています。 Class-agnostic image segmentation is a crucial component in automating image editing workflows, especially in contexts where object selection traditionally involves interactive tools. Existing methods in the literature often adhere to top-down formulations, following the paradigm of class-based approaches, where object detection precedes per-object segmentation. In this work, we present a novel bottom-up formulation for addressing the class-agnostic segmentation problem. We supervise our network directly on the projective sphere of its feature space, employing losses inspired by metric learning literature as well as losses defined in a novel segmentation-space representation. The segmentation results are obtained through a straightforward mean-shift clustering of the estimated features. Our bottom-up formulation exhibits exceptional generalization capability, even when trained on datasets designed for class-based segmentation. We further showcase the effectiveness of our generic approach by addressing the challenging task of cell and nucleus segmentation. We believe that our bottom-up formulation will offer valuable insights into diverse segmentation challenges in the literature.	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
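The final clustering step mentioned in the abstract above can be pictured with a small flat-kernel mean-shift routine. This is a minimal sketch in plain Python on 2-D toy points with Euclidean distance. The paper's features live on a projective sphere, so the distance, the bandwidth, and the merge rule used here are simplifying assumptions rather than the authors' implementation.

```python
import math

def mean_shift(points, bandwidth, iterations=30):
    """Flat-kernel mean shift: move each mode to the mean of nearby points."""
    modes = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            if not near:          # guard: keep the mode if no point is in range
                shifted.append(m)
                continue
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        modes = shifted
    # Group modes that converged to (almost) the same location.
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

# Toy per-pixel features forming two well-separated groups.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centers = mean_shift(features, bandwidth=1.0)
```

No number of segments is specified in advance; the bandwidth alone decides how many modes survive, which is one reason mean shift suits class-agnostic grouping.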
# 深層学習による消費者製品から発生したマイクロプラスチックおよびナノプラスチックの形態学的検出と分類 Morphological Detection and Classification of Microplastics and Nanoplastics Emerged from Consumer Products by Deep Learning ( http://arxiv.org/abs/2409.13688v1 ) ライセンス: Link先を確認	Hadi Rezvani, Navid Zarrabi, Ishaan Mehta, Christopher Kolios, Hussein Ali Jaafar, Cheng-Hao Kao, Sajad Saeedi, Nariman Yousefi,	(参考訳) プラスチック汚染は、健康や環境システムに影響を及ぼす世界的な問題であり、マイクロプラスチックやナノプラスチックは、飲料水から空気まで様々な媒体で見られる。これらの汚染物質を研究する伝統的な方法は、労働集約的で時間を要するものであり、より効率的な技術への移行を必要とする。そこで本研究では,マイクロ・ナノプラスチックの自動検出・分類をオブジェクト検出アルゴリズムを用いて行う,新しいオープンソースデータセットであるMiNaとマイクロ・ナノプラスチックについて紹介する。このデータセットは、現実的な水性条件下でシミュレートされた走査型電子顕微鏡画像からなる。我々は、MiNaに最先端検出アルゴリズムを適用し、その有効性を評価し、各手法の固有の課題と可能性を特定する。このデータセットは、マイクロプラスチック研究のために利用可能なリソースの重大なギャップを埋めるだけでなく、この分野における将来の進歩のための堅牢な基盤も提供する。 Plastic pollution presents an escalating global issue, impacting health and environmental systems, with micro- and nanoplastics found across mediums from potable water to air. Traditional methods for studying these contaminants are labor-intensive and time-consuming, necessitating a shift towards more efficient technologies. In response, this paper introduces micro- and nanoplastics (MiNa), a novel and open-source dataset engineered for the automatic detection and classification of micro and nanoplastics using object detection algorithms. The dataset, comprising scanning electron microscopy images simulated under realistic aquatic conditions, categorizes plastics by polymer type across a broad size spectrum. We demonstrate the application of state-of-the-art detection algorithms on MiNa, assessing their effectiveness and identifying the unique challenges and potential of each method. The dataset not only fills a critical gap in available resources for microplastic research but also provides a robust foundation for future advancements in the field.	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
# 自己回帰映像の時間的アライメント Temporally Aligned Audio for Video with Autoregression ( http://arxiv.org/abs/2409.13689v1 ) ライセンス: Link先を確認	Ilpo Viertola, Vladimir Iashin, Esa Rahtu,	(参考訳) V-AURAは,ビデオ音声生成における時間的アライメントと関連性を実現するための,最初の自己回帰モデルである。 V-AURAは、高フレームの視覚特徴抽出器と、細粒度な視覚イベントを捕捉し、正確な時間的アライメントを確保するために、モーダルなオーディオ-視覚特徴融合戦略を使用する。さらに,高音声・視覚関連性を有するベンチマークデータセットであるVisualSoundを提案する。 VisualSoundはVGGSoundをベースとしている。キュレーション中、聴覚イベントが視覚イベントと一致していないサンプルを除去する。 V-AURAは、時間的アライメントと意味的関連性において、同等のオーディオ品質を維持しながら、現在の最先端モデルより優れている。コード、サンプル、VisualSoundおよびモデルはhttps://v-aura.notion.siteで入手できる。 We introduce V-AURA, the first autoregressive model to achieve high temporal alignment and relevance in video-to-audio generation. V-AURA uses a high-framerate visual feature extractor and a cross-modal audio-visual feature fusion strategy to capture fine-grained visual motion events and ensure precise temporal alignment. Additionally, we propose VisualSound, a benchmark dataset with high audio-visual relevance. VisualSound is based on VGGSound, a video dataset consisting of in-the-wild samples extracted from YouTube. During the curation, we remove samples where auditory events are not aligned with the visual ones. V-AURA outperforms current state-of-the-art models in temporal alignment and semantic relevance while maintaining comparable audio quality. Code, samples, VisualSound and models are available at https://v-aura.notion.site	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
# 野生における有色拡散固有画像分解 Colorful Diffuse Intrinsic Image Decomposition in the Wild ( http://arxiv.org/abs/2409.13690v1 ) ライセンス: Link先を確認	Chris Careaga, Yağız Aksoy,	(参考訳) 固有の画像分解は、1枚の写真から反射率と効果を分離することを目的としている。問題の複雑さのため、ほとんどの先行研究は単色照明とランベルトの世界を前提としており、照明対応画像編集アプリケーションでは使用を制限している。本研究では,入力画像から拡散アルベド,カラフルな拡散シェーディング,特異残留成分を分離する。我々は、最初は単色照明を、次にランベルト世界の仮定を徐々に取り除き、結果に到達する。この問題をより簡単なサブプロブレムに分割することで、地上のデータセットが限られているにもかかわらず、幅の広いカラフルな拡散シェーディング推定が可能であることを示す。拡張された内在モデルにより、写真の照度を意識した分析が可能となり、明度除去や画素ごとのホワイトバランスなどの画像編集に利用することができる。 Intrinsic image decomposition aims to separate the surface reflectance and the effects from the illumination given a single photograph. Due to the complexity of the problem, most prior works assume a single-color illumination and a Lambertian world, which limits their use in illumination-aware image editing applications. In this work, we separate an input image into its diffuse albedo, colorful diffuse shading, and specular residual components. We arrive at our result by gradually removing first the single-color illumination and then the Lambertian-world assumptions. We show that by dividing the problem into easier sub-problems, in-the-wild colorful diffuse shading estimation can be achieved despite the limited ground-truth datasets. Our extended intrinsic model enables illumination-aware analysis of photographs and can be used for image editing applications such as specularity removal and per-pixel white balancing.	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
# 古典的影を用いた効率的な計測駆動固有エネルギー推定 Efficient Measurement-Driven Eigenenergy Estimation with Classical Shadows ( http://arxiv.org/abs/2409.13691v1 ) ライセンス: Link先を確認	Yizhi Shen, Alex Buzali, Hong-Ye Hu, Katherine Klymko, Daan Camps, Susanne F. Yelin, Roel Van Beeumen,	(参考訳) ターゲットハミルトニアンの下でのリアルタイム進化を利用した量子アルゴリズムは、重要なスペクトル情報を抽出する際の顕著な効率を証明している。しかし、これらの手法のより広い可能性、特に基底状態の計算以上のものは、十分に探究されていない。本研究では,短期的な実装に適した測定駆動型固有値ソルバであるオブザーバブル動的モード分解と,古典的なシャドウトモグラフィを組み合わせたマルチオブザーバブル・ダイナミックモード分解(MODMD)の枠組みを紹介する。 MODMDは、古典的なシャドウ技法のランダムスクランブルを利用して、豊富なスペクトル情報を符号化する信号部分空間を、指数関数的に少ないリソース要求で構築する。特に、一般的なアダマールテスト回路を、低ランクの可観測物を予測するためのプロトコルに置き換え、多くの低ランクの可観測物を予測するための古典的なシャドウトモグラフィーの新たな応用を示す。我々はMODMDのスペクトル近似に関する理論的保証を確立し、異なる誤差源を考慮に入れた。理想的な場合、スペクトル誤差は$\exp(- \Delta E t_{\rm max})$としてスケールすることを証明する。ここで$\Delta E$はハミルトニアンのスペクトルギャップ、$t_{\rm max}$は最大シミュレーション時間である。この分析は、シミュレーションを通して観察される急激な収束の厳密な正当化を与える。本フレームワークの実用性を示すため,代表的な多体系の低エネルギー準位(基底状態および励起状態のエネルギー)の決定といった基本的な課題への応用を検討する。我々の研究は、短期および早期の耐故障性量子デバイス上での測定駆動アルゴリズムを効率的に設計するための道を開く。 Quantum algorithms exploiting real-time evolution under a target Hamiltonian have demonstrated remarkable efficiency in extracting key spectral information. However, the broader potential of these methods, particularly beyond ground state calculations, is underexplored. In this work, we introduce the framework of multi-observable dynamic mode decomposition (MODMD), which combines the observable dynamic mode decomposition, a measurement-driven eigensolver tailored for near-term implementation, with classical shadow tomography. MODMD leverages random scrambling in the classical shadow technique to construct, with exponentially reduced resource requirements, a signal subspace that encodes rich spectral information.
Notably, we replace typical Hadamard-test circuits with a protocol designed to predict low-rank observables, thus marking a new application of classical shadow tomography for predicting many low-rank observables. We establish theoretical guarantees on the spectral approximation from MODMD, taking into account distinct sources of error. In the ideal case, we prove that the spectral error scales as $\exp(- \Delta E t_{\rm max})$, where $\Delta E$ is the Hamiltonian spectral gap and $t_{\rm max}$ is the maximal simulation time. This analysis provides a rigorous justification of the rapid convergence observed across simulations. To demonstrate the utility of our framework, we consider its application to fundamental tasks, such as determining the low-lying, i.e. ground or excited, energies of representative many-body systems. Our work paves the path for efficient designs of measurement-driven algorithms on near-term and early fault-tolerant quantum devices.	翻訳日:2024-11-07 05:57:35 公開日:2024-09-20
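The stated ideal-case scaling can be read as a rule of thumb for how long the simulation must run. The sketch below simply evaluates $\exp(-\Delta E\, t_{\rm max})$ and inverts it for a target error. It ignores prefactors and noise, and the gap and tolerance values are hypothetical numbers chosen for illustration, not results from the paper.

```python
import math

def ideal_spectral_error(gap, t_max):
    """Ideal-case scaling exp(-gap * t_max), with all prefactors dropped."""
    return math.exp(-gap * t_max)

def time_for_target_error(gap, target_error):
    """Invert the scaling: t_max = ln(1 / target_error) / gap."""
    return math.log(1.0 / target_error) / gap

gap = 0.5                                     # hypothetical spectral gap
t_needed = time_for_target_error(gap, 1e-3)   # simulation time for error 1e-3
error_check = ideal_spectral_error(gap, t_needed)
doubled = ideal_spectral_error(gap, 2 * t_needed)
```

Under this scaling, doubling the maximal simulation time squares the error factor, and a smaller gap demands a proportionally longer simulation for the same target.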
# コネクテッド・自動運転車両の多エージェント協調決定のための値ベース並列更新MCTS法 A Value Based Parallel Update MCTS Method for Multi-Agent Cooperative Decision Making of Connected and Automated Vehicles ( http://arxiv.org/abs/2409.13783v1 ) ライセンス: Link先を確認	Ye Han, Lijun Zhang, Dejian Meng, Xingyu Hu, Songyu Weng,	(参考訳) 本稿では,コネクテッド・自動運転車両(CAV)用多車協調運転における横方向および縦方向の共同意思決定の問題を解決するために,有限ホライズンかつ時間割引設定のマルチエージェント・マルコフゲームに対する並列更新によるモンテカルロ木探索(MCTS)法を提案する。部分定常交通流における多車両共同動作空間における並列動作を解析することにより、並列更新法は潜在的に危険な動作を迅速に排除し、探索幅を犠牲にすることなく探索深度を増大させることができる。提案手法は,ランダムに発生する多数のトラフィックフローにおいて検証される。実験の結果,SOTA強化学習アルゴリズムやヒューリスティック手法よりも頑健さと性能がよいことがわかった。提案アルゴリズムを用いた車両運転戦略は,人間の運転者を超えた合理性を示し,コーディネートゾーンにおける交通効率と安全性の優位性を示す。 To solve the problem of lateral and longitudinal joint decision-making of multi-vehicle cooperative driving for connected and automated vehicles (CAVs), this paper proposes a Monte Carlo tree search (MCTS) method with parallel update for multi-agent Markov games with a limited horizon and time-discounted setting. By analyzing the parallel actions in the multi-vehicle joint action space in the partial-steady-state traffic flow, the parallel update method can quickly exclude potentially dangerous actions, thereby increasing the search depth without sacrificing the search breadth. The proposed method is tested in a large number of randomly generated traffic flows. The experimental results show that the algorithm has good robustness and better performance than the SOTA reinforcement learning algorithms and heuristic methods. The vehicle driving strategy using the proposed algorithm shows rationality beyond human drivers, and has advantages in traffic efficiency and safety in the coordinating zone.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
# 物理インフォームドカーネル学習 Physics-informed kernel learning ( http://arxiv.org/abs/2409.13786v1 ) ライセンス: Link先を確認	Nathan Doumèche, Francis Bach, Gérard Biau, Claire Boyer,	(参考訳) 物理インフォームド機械学習は、データ駆動項と偏微分方程式(PDE)正則化の両方を含む損失関数を最小化することにより、物理先行を学習プロセスに統合するのが一般的である。問題をカーネル回帰タスクとして定式化することに基づいて、Fourier法を用いて関連するカーネルを近似し、物理インフォームドリスク関数を最小化するトラクタブルな推定器を提案する。このアプローチを物理インフォームド・カーネル・ラーニング(PIKL)と呼ぶ。この枠組みは理論的な保証を提供し、収束速度に対する物理前の影響の定量化を可能にする。シミュレーションによるPIKL推定器の数値性能をハイブリッドモデリングとPDEの解法の両方で示す。特に、PIKLは、精度と計算時間の両方において、物理インフォームドニューラルネットワークより優れていることを示す。さらに, PIKL が従来の PDE ソルバを超えている場合, 特にノイズのある境界条件のシナリオにおいて同定する。 Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a data-driven term and a partial differential equation (PDE) regularization. Building on the formulation of the problem as a kernel regression task, we use Fourier methods to approximate the associated kernel, and propose a tractable estimator that minimizes the physics-informed risk function. We refer to this approach as physics-informed kernel learning (PIKL). This framework provides theoretical guarantees, enabling the quantification of the physical prior's impact on convergence speed. We demonstrate the numerical performance of the PIKL estimator through simulations, both in the context of hybrid modeling and in solving PDEs. In particular, we show that PIKL can outperform physics-informed neural networks in terms of both accuracy and computation time. Additionally, we identify cases where PIKL surpasses traditional PDE solvers, particularly in scenarios with noisy boundary conditions.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
# テキスト分類のためのマルチソースメタ学習による未知領域の一般化学習 Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification ( http://arxiv.org/abs/2409.13787v1 ) ライセンス: Link先を確認	Yuxuan Hu, Chenwei Zhang, Min Yang, Xiaodan Liang, Chengming Li, Xiping Hu,	(参考訳) ディープラーニング手法の急速な発展に伴い、テキスト分類の分野で多くのブレークスルーがあった。このタスクのために開発されたモデルは、高い精度を達成することが示されている。しかし、これらのモデルのほとんどは、表示されたドメインのラベル付きデータを使って訓練されている。これらのモデルは、モデルの一般化に直接関係する新しい挑戦的未確認領域において、高い精度を維持することは困難である。本稿では,テキスト分類のマルチソース領域一般化について検討し,未知の領域で高い精度を達成できるモデルをトレーニングするために,複数の参照領域を使用するフレームワークを提案する。具体的には、モデル一般化の過程を未知の領域にシミュレートし、十分なドメイン関連特徴を抽出するマルチソースメタラーニングドメイン一般化フレームワークを提案する。メタラーニングフレームワークと協調するドメイン特化機能を記憶するためのメモリ機構を導入した。さらに、モデルが十分なドメイン不変の機能を学ぶことを可能にする、新しい"ジャイ"メカニズムを採用しています。実験により、我々のメタラーニングフレームワークは、目に見えない領域に一般化するモデルの能力を効果的に向上し、マルチソーステキスト分類データセットにおける最先端の手法よりも優れていることが示された。 With the rapid development of deep learning methods, there have been many breakthroughs in the field of text classification. Models developed for this task have been shown to achieve high accuracy. However, most of these models are trained using labeled data from seen domains. It is difficult for these models to maintain high accuracy in a new challenging unseen domain, which is directly related to the generalization of the model. In this paper, we study the multi-source Domain Generalization of text classification and propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain. Specifically, we propose a multi-source meta-learning Domain Generalization framework to simulate the process of model generalization to an unseen domain, so as to extract sufficient domain-related features. We introduced a memory mechanism to store domain-specific features, which coordinate with the meta-learning framework. Besides, we adopt the novel "jury" mechanism that enables the model to learn sufficient domain-invariant features. 
Experiments demonstrate that our meta-learning framework can effectively enhance the ability of the model to generalize to an unseen domain and can outperform the state-of-the-art methods on multi-source text classification datasets.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
# TSP組合せ最適化問題の量子進化アルゴリズム Quantum evolutionary algorithm for TSP combinatorial optimisation problem ( http://arxiv.org/abs/2409.13788v1 ) ライセンス: Link先を確認	Yijiang Ma, Tan Chye Cheah,	(参考訳) 本稿では、量子遺伝的アルゴリズム(QGA)を用いて巡回セールスマン問題(TSP)を解く新しい方法を実装した。我々は、この新しいアプローチがいかにうまく機能するかを、古典的遺伝的アルゴリズム(CGA)として知られる従来の手法と比較した。 TSPは、一連の都市を訪れるための最も効率的な経路を見つけ、距離を最小化し、出発点に戻ることを目的として、組合せ最適化において確立された課題である。我々は、計算複雑性と実用上の重要性から、両方のアルゴリズムの性能をテストするためにTSPを選択した。実験では,国際標準ライブラリTSPLIBからデータセットを選択した。アルゴリズムの設計と実装,TSPインスタンスのさまざまなサイズとタイプの実験を行うことで,最適解の精度,イテレーション数,実行時間,アルゴリズムの安定性を詳細に解析する。実験の結果,特に問題サイズが大きい場合,ほとんどのテストインスタンスにおいて,優れた解がより高速に見つかるという点で,CGAがQGAより優れていたことが示唆された。これは、量子コンピューティングの原理が複雑な組合せ最適化問題を解く新しい方法を提供するが、量子現象の実装と量子回転ゲートの最適角度のようなパラメータの設定は困難であり、望ましい結果を達成するためにさらなる最適化が必要であることを示唆している。さらに、QGAが実際の量子ハードウェア上でテストされていないことに注意する必要がある。これらの制限は将来のさらなる研究に豊かな機会を与える。 This paper implements a quantum genetic algorithm (QGA) for solving the traveling salesman problem (TSP) and compares its performance with that of the traditional classical genetic algorithm (CGA). The TSP is a well-established challenge in combinatorial optimization where the objective is to find the most efficient path that visits a series of cities and returns to the starting point while minimizing the total distance. We chose the TSP to test the performance of both algorithms because of its computational complexity and importance in practical applications. We chose the dataset from the international standard library TSPLIB for our experiments. By designing and implementing both algorithms and conducting experiments on TSP instances of various sizes and types, we provide an in-depth analysis of the accuracy of the optimal solution, the number of iterations, the execution time, and the stability of both algorithms.
The empirical findings indicate that the CGA outperforms the QGA, finding better solutions more quickly in most of the test instances, especially when the problem size is large. This suggests that, although the principles of quantum computing offer a new way to solve complex combinatorial optimisation problems, implementing quantum phenomena and setting parameters such as the optimal angle of the quantum rotation gate are challenging and need further optimisation to achieve the desired results. Additionally, the QGA has not been tested on real quantum hardware, so its true performance remains unverified. These limitations provide rich opportunities for future research.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
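As an illustration of two ingredients named in the abstract above, here is a minimal Python sketch; it is not the authors' implementation. `tour_length` scores a closed tour, and `rotate` applies a rotation gate to one qubit's amplitude pair, which is the usual update step in quantum-inspired genetic algorithms. The function names, the toy coordinates and the angle are assumptions made for illustration only.

```python
import math

def tour_length(cities, order):
    """Total length of the closed tour that visits cities in the given order."""
    total = 0.0
    for i in range(len(order)):
        x1, y1 = cities[order[i]]
        x2, y2 = cities[order[(i + 1) % len(order)]]  # wrap around to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def rotate(alpha, beta, theta):
    """Rotate the amplitude pair (alpha, beta) by angle theta (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta

# Toy instance: the four corners of a 4 x 3 rectangle (not a TSPLIB instance).
cities = [(0, 0), (0, 3), (4, 3), (4, 0)]
perimeter_tour = tour_length(cities, [0, 1, 2, 3])
crossing_tour = tour_length(cities, [0, 2, 1, 3])

# A rotation changes the measurement probabilities but keeps them normalised.
alpha, beta = rotate(1 / math.sqrt(2), 1 / math.sqrt(2), 0.05 * math.pi)
```

The rotation angle plays the role of a step size: a larger angle moves the population towards the current best tour faster but risks premature convergence, which is consistent with the tuning difficulty the abstract reports.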
# 人工軌道の再考--Imitative Generation and Benchmarks Beyond Datasaurus Revisiting Synthetic Human Trajectories: Imitative Generation and Benchmarks Beyond Datasaurus ( http://arxiv.org/abs/2409.13790v1 ) ライセンス: Link先を確認	Bangchao Deng, Xin Jing, Tianyue Yang, Bingqing Qu, Philippe Cudre-Mauroux, Dingqi Yang,	(参考訳) クラウドマネージメントや疫病予防など,様々なアプリケーションにおいて重要な役割を担っている人間の軌道データは,現実的な制約やプライバシー上の懸念から入手が困難である。この文脈では、合成された人間の軌道データは、しばしば要約統計と分布の類似性の下で、現実世界の人間の軌道に可能な限り近いシミュレートするために生成される。しかしながら、人間の移動パターンの複雑さはこれらの類似性(例えば `Datasaurus'')によって過度に単純化され、生成モデル設計と生成された軌道のベンチマークの両方に固有のバイアスをもたらす。そこで我々は,探索と優先回帰モデルを統合したニューラル・テンポラル・ポイント・プロセスとして設計されたhuman-Imitative tRAjectory GenErativeモデルであるMIRAGEを提案する。これは、従来の方法のように特定の統計分布を適合させるのではなく、軌道生成における人間の意思決定プロセスを模倣しているため、Datasaurusの問題を避けている。さらに,4つの典型的な下流タスクに対してトラジェクトリ生成モデルを体系的にベンチマークし,各タスクに対する複数のテクニックと評価指標を統合し,生成したトラジェクトリの究極の有用性を包括的に評価するために,Datasaurusを超える包括的タスクベース評価プロトコルを提案する。我々は,MIRAGEを3つの実世界のユーザトラジェクトリデータセットに対して,大規模なベースラインの収集に対して徹底的に評価する。その結果、MIRAGEが生成した軌道データは、最高の統計的および分布的類似性を59.0-71.5%改善しただけでなく、10.9-33.4%改善したタスクベース評価でも最高の性能が得られることがわかった。 Human trajectory data, which plays a crucial role in various applications such as crowd management and epidemic prevention, is challenging to obtain due to practical constraints and privacy concerns. In this context, synthetic human trajectory data is generated to simulate as close as possible to real-world human trajectories, often under summary statistics and distributional similarities. However, the complexity of human mobility patterns is oversimplified by these similarities (a.k.a. ``Datasaurus''), resulting in intrinsic biases in both generative model design and benchmarks of the generated trajectories. Against this background, we propose MIRAGE, a huMan-Imitative tRAjectory GenErative model designed as a neural Temporal Point Process integrating an Exploration and Preferential Return model. 
It imitates the human decision-making process in trajectory generation, rather than fitting any specific statistical distributions as traditional methods do, thus avoiding the Datasaurus issue. Moreover, we also propose a comprehensive task-based evaluation protocol beyond Datasaurus to systematically benchmark trajectory generative models on four typical downstream tasks, integrating multiple techniques and evaluation metrics for each task, to comprehensively assess the ultimate utility of the generated trajectories. We conduct a thorough evaluation of MIRAGE on three real-world user trajectory datasets against a sizeable collection of baselines. Results show that compared to the best baselines, MIRAGE-generated trajectory data not only achieves the best statistical and distributional similarities with 59.0-71.5% improvement, but also yields the best performance in the task-based evaluation with 10.9-33.4% improvement.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
# 機械学習を用いた肝細胞癌早期診断のためのマルチオミクスデータ統合 Multi-omics data integration for early diagnosis of hepatocellular carcinoma (HCC) using machine learning ( http://arxiv.org/abs/2409.13791v1 ) ライセンス: Link先を確認	Annette Spooner, Mohammad Karimi Moridani, Azadeh Safarchi, Salim Maher, Fatemeh Vafaee, Amany Zekry, Arcot Sowmya,	(参考訳) 様々な患者データから得られた相補的な情報は、患者の疾患状態のより正確なモデリングと、疾患の基盤となる生物学的プロセスの理解に役立つ。しかし、マルチモーダル・マルチオミクスデータの解析は、高次元と様々なサイズ、統計分布、スケールとモダリティ間の信号強度など、多くの課題を呈している。本研究では,異なるモードのマルチクラスデータの遅延統合が可能な,さまざまなアンサンブル機械学習アルゴリズムの性能を比較した。アンサンブル法とそのバリエーションを検証した。一硬くて柔らかい投票権を有する投票権二メタ学習者三ハード・投票、ソフト・投票、メタ・ラーナーを用いたマルチモーダル・アダブーストモデルで、各ブースティングラウンドにおけるモダリティ、PB-MVBoostモデル、及びエキスパート・モデルの新たな応用。これらは、単純な結合をベースラインとして比較した。肝細胞癌(HCC)と乳癌および過敏性腸疾患(IBD)の研究に関する4つの検証データセットを用いて,本法について検討した。受信機の動作曲線の下の領域をパフォーマンスの指標として最大0.85の性能値を達成するモデルを開発し、PB-MVBoostとAdaboostの2つの強化された手法をソフトな投票で評価した。また,選択した特徴の安定性と臨床症状の大きさについても検討した。最後に,マルチモーダルなマルチクラスデータの統合を推奨する。 The complementary information found in different modalities of patient data can aid in more accurate modelling of a patient's disease state and a better understanding of the underlying biological processes of a disease. However, the analysis of multi-modal, multi-omics data presents many challenges, including high dimensionality and varying size, statistical distribution, scale and signal strength between modalities. In this work we compare the performance of a variety of ensemble machine learning algorithms that are capable of late integration of multi-class data from different modalities. The ensemble methods and their variations tested were i) a voting ensemble, with hard and soft vote, ii) a meta learner, iii) a multi-modal Adaboost model using a hard vote, a soft vote and a meta learner to integrate the modalities on each boosting round, the PB-MVBoost model and a novel application of a mixture of experts model. These were compared to simple concatenation as a baseline. 
We examine these methods using data from an in-house study on hepatocellular carcinoma (HCC), along with four validation datasets from studies of breast cancer and irritable bowel disease (IBD). Using the area under the receiver operating characteristic curve as the measure of performance, we develop models that achieve a value of up to 0.85 and find that two boosted methods, PB-MVBoost and Adaboost with a soft vote, were the best-performing models overall. We also examine the stability of the features selected and the size of the clinical signature determined. Finally, we provide recommendations for the integration of multi-modal multi-class data.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
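The difference between the hard and soft votes mentioned above can be sketched in a few lines of Python. This is a generic illustration of late integration, assuming each modality has its own classifier that outputs class probabilities; the function names and numbers are made up and do not come from the paper.

```python
from collections import Counter

def hard_vote(probabilities):
    """Each modality votes for its most likely class; the majority wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average the class probabilities across modalities, then take the argmax."""
    n_classes = len(probabilities[0])
    means = [sum(p[c] for p in probabilities) / len(probabilities)
             for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)

# Hypothetical outputs of three per-modality classifiers for one patient.
per_modality = [
    [0.55, 0.45, 0.00],  # modality 1: weakly prefers class 0
    [0.60, 0.40, 0.00],  # modality 2: weakly prefers class 0
    [0.05, 0.95, 0.00],  # modality 3: strongly prefers class 1
]
hard = hard_vote(per_modality)
soft = soft_vote(per_modality)
```

In this toy case the two rules disagree: the hard vote follows the two weak votes, while the soft vote lets the single confident modality dominate.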
# ソフトグリッパーのマルチモーダルデータ融合のための連続学習 Continual Learning for Multimodal Data Fusion of a Soft Gripper ( http://arxiv.org/abs/2409.13792v1 ) ライセンス: Link先を確認	Nilay Kushawaha, Egidio Falotico,	(参考訳) 連続学習(英: Continual Learning, CL)とは、アルゴリズムが学習した情報を保持しつつ、その環境から新たな知識を継続的に段階的に獲得する能力である。あるデータモダリティに基づいてトレーニングされたモデルは、異なるモダリティでテストした場合、しばしば失敗する。単純なアプローチは、2つのモダリティを融合させ、それらの特徴を結合し、融合したデータでモデルをトレーニングすることかもしれない。しかし、新しいドメインに遭遇するたびに、スクラッチからモデルを再トレーニングする必要があります。本稿では,ラベル付きデータが不足する人工環境において,クラス増分学習シナリオとドメイン増分学習シナリオの両方を活用することで,異なるデータモダリティを漸進的に学習できる連続学習アルゴリズムを提案する。提案アルゴリズムは効率的で,各クラスのプロトタイプを格納する必要がある。本研究では,ソフトな空気圧グリップから得られる触覚データと,ビデオシーケンスから抽出した物体の静止画像から得られる視覚データからなる,難易度の高いマルチモーダルデータセットに対して,アルゴリズムの有効性を評価する。さらに、カスタムデータセットとCore50データセットに関するアブレーション調査を行い、アルゴリズムのさまざまなコンポーネントのコントリビューションを強調します。このアルゴリズムのロバスト性をさらに実証するため,ロボットオペレーティングシステム(ROS)フレームワークと同期したソフトグリップと外部独立カメラセットアップを用いて,オブジェクト分類のリアルタイムな実験を行った。 Continual learning (CL) refers to the ability of an algorithm to continuously and incrementally acquire new knowledge from its environment while retaining previously learned information. A model trained on one data modality often fails when tested with a different modality. A straightforward approach might be to fuse the two modalities by concatenating their features and training the model on the fused data. However, this requires retraining the model from scratch each time it encounters a new domain. In this paper, we introduce a continual learning algorithm capable of incrementally learning different data modalities by leveraging both class-incremental and domain-incremental learning scenarios in an artificial environment where labeled data is scarce, yet non-iid (independent and identical distribution) unlabeled data from the environment is plentiful. The proposed algorithm is efficient and only requires storing prototypes for each class. 
We evaluate the algorithm's effectiveness on a challenging custom multimodal dataset comprising tactile data from a soft pneumatic gripper and visual data from non-stationary images of objects extracted from video sequences. Additionally, we conduct an ablation study on the custom dataset and the Core50 dataset to highlight the contributions of different components of the algorithm. To further demonstrate the robustness of the algorithm, we perform a real-time experiment for object classification using the soft gripper and an external independent camera setup, all synchronized with the Robot Operating System (ROS) framework.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
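The abstract states that the method only needs to store prototypes for each class. A minimal sketch of that general idea is a nearest-prototype classifier that keeps one running mean per class, so new classes can be added without retraining on old data. This is an assumption about the flavour of the approach, not the authors' algorithm; the class name, labels and feature vectors are hypothetical.

```python
import math

class PrototypeClassifier:
    """Stores one running mean (prototype) per class instead of raw samples."""

    def __init__(self):
        self.sums = {}    # label -> element-wise sum of feature vectors
        self.counts = {}  # label -> number of samples seen

    def learn(self, label, features):
        """Update the prototype of `label` with one new feature vector."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def prototype(self, label):
        return [s / self.counts[label] for s in self.sums[label]]

    def predict(self, features):
        """Return the label whose prototype is closest in Euclidean distance."""
        return min(self.sums,
                   key=lambda label: math.dist(self.prototype(label), features))

model = PrototypeClassifier()
model.learn("cup", [1.0, 0.0])
model.learn("cup", [3.0, 0.0])
model.learn("ball", [0.0, 4.0])
first_guess = model.predict([2.1, 0.2])

# A new class arrives later; earlier prototypes are left untouched.
model.learn("box", [10.0, 10.0])
later_guess = model.predict([9.0, 9.0])
```

Memory grows with the number of classes rather than the number of samples, which is the efficiency property the abstract highlights.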
# 完全AI自動ヴァイシング攻撃の可能性について On the Feasibility of Fully AI-automated Vishing Attacks ( http://arxiv.org/abs/2409.13793v1 ) ライセンス: Link先を確認	João Figueiredo, Afonso Carvalho, Daniel Castro, Daniel Gonçalves, Nuno Santos,	(参考訳) ヴァイシング攻撃(英語: vishing attack)とは、個人を騙して個人情報、財務情報、セキュリティ情報などの機密情報を開示する、社会工学の一形態である。攻撃者は、被害者を操作するために、声のコミュニケーションの緊急性や正確性を認識し、しばしば銀行や技術支援のような合法的な組織として振る舞う。情報保護のために設計されたセキュリティコントロールをバイパスすることは、特に深刻な脅威だ。本研究では,AIの出現にともなって,ヴァイシング攻撃がエスカレートする可能性について検討する。理論的には、AIを利用したソフトウェアボットは、潜在的な犠牲者との会話を電話で開始し、機密情報を開示することで、これらの攻撃を自動化することができるかもしれない。この論文を検証するために、公開AI技術を用いて開発されたAIを利用したバイシングシステムであるViKingを紹介する。その中核となる認知プロセッサはLarge Language Model(LLM)で、被害者との会話をコントロールし、通話における音声テキスト変換を容易にする音声テキストと音声音声のモジュールのパイプラインを補完する。 240人の参加者によるコントロールされた社会実験を通じて、ヴァイシングキャンペーンのリスクについて明示的に警告を受けた者でさえも、ViKingが多くの参加者に機密情報を開示するよう説得することに成功していることがわかった。 ViKingのボットとのインタラクションは、一般的に現実的と考えられていた。これらの結果から、VKingのようなツールは、潜在的な悪意のあるアクターに既にアクセス可能であり、また、サイバー認知プログラムの貴重なリソースとして機能する可能性があると結論付けている。 A vishing attack is a form of social engineering where attackers use phone calls to deceive individuals into disclosing sensitive information, such as personal data, financial information, or security credentials. Attackers exploit the perceived urgency and authenticity of voice communication to manipulate victims, often posing as legitimate entities like banks or tech support. Vishing is a particularly serious threat as it bypasses security controls designed to protect information. In this work, we study the potential for vishing attacks to escalate with the advent of AI. In theory, AI-powered software bots may have the ability to automate these attacks by initiating conversations with potential victims via phone calls and deceiving them into disclosing sensitive information. To validate this thesis, we introduce ViKing, an AI-powered vishing system developed using publicly available AI technology. 
It relies on a Large Language Model (LLM) as its core cognitive processor to steer conversations with victims, complemented by a pipeline of speech-to-text and text-to-speech modules that facilitate audio-text conversion in phone calls. Through a controlled social experiment involving 240 participants, we discovered that ViKing has successfully persuaded many participants to reveal sensitive information, even those who had been explicitly warned about the risk of vishing campaigns. Interactions with ViKing's bots were generally considered realistic. From these findings, we conclude that tools like ViKing may already be accessible to potential malicious actors, while also serving as an invaluable resource for cyber awareness programs.	翻訳日:2024-11-07 05:13:17 公開日:2024-09-20
# 内在性単像HDR再建術 Intrinsic Single-Image HDR Reconstruction ( http://arxiv.org/abs/2409.13803v1 ) ライセンス: Link先を確認	Sebastian Dille, Chris Careaga, Yağız Aksoy,	(参考訳) 一般的なカメラの低ダイナミックレンジ(LDR)は、自然界のコントラストのリッチなコントラストを捉えることができず、色や彩度が飽和したピクセルの細部が失われる。 1枚のLDR写真からシーンに存在する輝度の高ダイナミックレンジ(HDR)を再構成することは、多くの計算写真やリアルな画像表示への応用において重要な課題である。 HDR再構成タスクは、シーンに存在するコンテキストを用いて、失われた詳細を推測することを目的としており、ニューラルネットワークは高レベルの幾何学的および照明的キューを理解する必要がある。これにより、データ駆動アルゴリズムが正確で高解像度な結果を生成するのが難しくなる。そこで本研究では,本研究におけるHDR再建問題の物理的に着想を得たリモデリングについて紹介する。固有モデルにより、シェーディング領域のダイナミックレンジを拡張し、アルベド領域の失われた色の詳細を復元するために、個別のネットワークを訓練することができる。課題を2つの単純なサブタスクに分割することで,多種多様な写真の性能向上が期待できる。 The low dynamic range (LDR) of common cameras fails to capture the rich contrast in natural scenes, resulting in loss of color and details in saturated pixels. Reconstructing the high dynamic range (HDR) of luminance present in the scene from single LDR photographs is an important task with many applications in computational photography and realistic display of images. The HDR reconstruction task aims to infer the lost details using the context present in the scene, requiring neural networks to understand high-level geometric and illumination cues. This makes it challenging for data-driven algorithms to generate accurate and high-resolution results. In this work, we introduce a physically-inspired remodeling of the HDR reconstruction problem in the intrinsic domain. The intrinsic model allows us to train separate networks to extend the dynamic range in the shading domain and to recover lost color details in the albedo domain. We show that dividing the problem into two simpler sub-tasks improves performance in a wide variety of photographs.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# メソスコピックシステムとしての重力 Gravity as a mesoscopic system ( http://arxiv.org/abs/2409.13808v1 ) ライセンス: Link先を確認	Pietro Pelliconi, Julian Sonner, Herman Verlinde,	(参考訳) ETHに関する一般的なカオスシステムにおいて,ブラウン運動と熱相関関数の時間変動の概念的,定量的な類似性を引き出すために,確率論的メソスコピック記述を用いる。この枠組みでは、「単純」作用素の熱相関関数は確率過程によって記述され、確率論的意味でのみ顕微鏡理論の特徴を探索することができる。この定式化はAdS$_3$の半古典重力の場合に適用し、ワームホールの寄与は確率過程のモーメントとして自然に特定できることを示す。また,より高次かつ高次に情報を隠蔽し,確率的枠組みの中で自然に正当化できる「マトリシカ人形」の再帰的構造を指摘する。次に、境界面から重力結果を再解釈し、CFTのOPEデータを確率分布に推し進める。この研究の結果、AdSの半古典重力は自然に量子重力のメソスコピックな記述として解釈でき、メソスコピックなホログラフィック双対性はモーメントvs-確率分布双対性として表されることが示された。 We employ a probabilistic mesoscopic description to draw conceptual and quantitative analogies between Brownian motion and late-time fluctuations of thermal correlation functions in generic chaotic systems respecting ETH. In this framework, thermal correlation functions of `simple' operators are described by stochastic processes, which are able to probe features of the microscopic theory only in a probabilistic sense. We apply this formalism to the case of semiclassical gravity in AdS$_3$, showing that wormhole contributions can be naturally identified as moments of stochastic processes. We also point out a `Matryoshka doll' recursive structure in which information is hidden in higher and higher moments, and which can be naturally justified within the stochastic framework. We then re-interpret the gravitational results from the boundary perspective, promoting the OPE data of the CFT to probability distributions. The outcome of this study shows that semiclassical gravity in AdS can be naturally interpreted as a mesoscopic description of quantum gravity, and a mesoscopic holographic duality can be framed as a moment-vs-probability-distribution duality.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# 浅魔術深度量子回路の古典的シミュラビリティ Classical Simulability of Quantum Circuits with Shallow Magic Depth ( http://arxiv.org/abs/2409.13809v1 ) ライセンス: Link先を確認	Yifan Zhang, Yuxuan Zhang,	(参考訳) 量子マジック(quantum magic)は、量子計算が古典的なシミュレーションを超えることができるリソースである。以前の結果は、古典的なシミュラビリティに、$T$ゲートや安定化器ランクの数によって特徴づけられる量子魔法の量と結びついている。しかし、量子回路シミュレーションの硬さに対する量子魔法の分布の影響は未解決のままである。本研究では,3つのタスク,すなわち振幅推定,サンプリング,パウリ可観測値の評価を交互に行う量子回路の古典的シミュラビリティについて検討する。すべての$T$ゲートが1つの層に分散されている場合、振幅推定と乗法誤差へのサンプリングは、合理的な仮定の下では古典的に難解であるが、パウリ可観測性は容易に評価できる。驚いたことに、たった1つの$T$ゲート層を追加するか、単にすべての$T$ゲートを$T^{\frac{1}{2}}$に置き換えるだけで、Pauli評価タスクは、PからGapP完全への急激な複雑性遷移を明らかにする。それでも、精度要件が 1/poly($n$) 加法誤差に緩和されると、振幅を計算するための多項式時間古典的アルゴリズム、Pauli observable を与え、対角ゲートの積に分解可能な任意のマジック-ディープワン回路に対して $\log(n)$ の限界分布からサンプリングすることができる。我々の研究は、高度に魔法の回路をシミュレートする新しい技術を提供しています。 Quantum magic is a resource that allows quantum computation to surpass classical simulation. Previous results have linked the amount of quantum magic, characterized by the number of $T$ gates or stabilizer rank, to classical simulability. However, the effect of the distribution of quantum magic on the hardness of simulating a quantum circuit remains open. In this work, we investigate the classical simulability of quantum circuits with alternating Clifford and $T$ layers across three tasks: amplitude estimation, sampling, and evaluating Pauli observables. In the case where all $T$ gates are distributed in a single layer, performing amplitude estimation and sampling to multiplicative error are already classically intractable under reasonable assumptions, but Pauli observables are easy to evaluate. Surprisingly, with the addition of just one $T$ gate layer or merely replacing all $T$ gates with $T^{\frac{1}{2}}$, the Pauli evaluation task reveals a sharp complexity transition from P to GapP-complete. 
Nevertheless, when the precision requirement is relaxed to 1/poly($n$) additive error, we give a polynomial-time classical algorithm that computes amplitudes, evaluates Pauli observables, and samples from $\log(n)$-sized marginal distributions for any magic-depth-one circuit that is decomposable into a product of diagonal gates. Our research provides new techniques for simulating highly magical circuits while shedding light on their complexity and their significant dependence on the magic depth.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# 実時間量子シミュレーションによるエネルギー・電荷相関の探索:シュウィンガーモデルからの考察 Probing Celestial Energy and Charge Correlations through Real-Time Quantum Simulations: Insights from the Schwinger Model ( http://arxiv.org/abs/2409.13816v1 ) ライセンス: Link先を確認	João Barata, Swagato Mukherjee,	(参考訳) 高エネルギー物理学における光線演算子 (LRO) の応用の最近の発展により, 実時間量子シミュレーションによりLROの相関関数を研究する新たな手法が提案されている。量子シミュレータは、低次元量子場理論 (QFT) における性質 LRO を探索する理想的な実験室を提供すると主張する。これは1+1-d Schwingerモデルで例示され、テンソルネットワーク法を用いてエネルギーと電荷の相関子の計算に焦点をあてている。格子から必要な相関関数を抽出する際のいくつかの課題にもかかわらず、使用する方法論は実際の量子デバイスに拡張することができる。 Motivated by recent developments in the application of light-ray operators (LROs) in high energy physics, we propose a new strategy to study correlation functions of LROs through real-time quantum simulations. We argue that quantum simulators provide an ideal laboratory to explore the properties of LROs in lower-dimensional quantum field theories (QFTs). This is exemplified in the 1+1-d Schwinger model, employing tensor network methods and focusing on the calculation of energy and charge correlators. Despite some challenges in extracting the necessary correlation functions from the lattice, the methodology used can be extended to real quantum devices.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# 正常心のダイナミクスを解き放つためのパーソナライズされた3D+tメッシュ生成モデル A Personalised 3D+t Mesh Generative Model for Unveiling Normal Heart Dynamics ( http://arxiv.org/abs/2409.13825v1 ) ライセンス: Link先を確認	Mengyun Qiao, Kathryn A McGurk, Shuo Wang, Paul M. Matthews, Declan P O Regan, Wenjia Bai,	(参考訳) 心臓の構造と運動を理解することは、心臓血管疾患の診断と管理に不可欠である。心臓の形状や運動パターンは、人口動態、人文計測、疾患の要因に影響される。正常な形状や動きのパターンを解明し、各個人が基準からどう逸脱するかを理解することは、正確な診断とパーソナライズされた治療戦略を促進する。そこで我々は,心臓の形状と運動パターンの分布を学習する条件付き生成モデルであるMeshHeartを開発した。 MeshHeartは、年齢、性別、体重、高さなどの臨床的要因を考慮して、3D+t心筋メッシュ配列を生成することができる。高次元かつ複雑な時空間メッシュデータをモデル化するために、MeshHeartは幾何学的エンコーダを使用して潜時空間内の心臓メッシュを表現し、続いて時間変換器を用いて潜時表現の運動ダイナミクスをモデル化する。 MeshHeartに基づいて、3D+t心筋メッシュ配列の潜時空間を調査し、潜時空間におけるパーソナライズされた規範パターンからのリアルハートの偏差を定量化する、潜時デルタと呼ばれる新しい距離メートル法を提案する。 38,309人の被験者からなる大規模なデータセットを用いた実験では、MeshHeartは心臓メッシュ配列の再構築と生成において高いパフォーマンスを示す。潜伏空間で定義される特徴は心疾患の分類において高い差別性を示すが、潜伏デルタは現象ワイド・アソシエーション研究において臨床表現型と強い相関を示す。この研究のコードとモデルは、デジタル心臓モデリングのさらなる研究に役立てるためにリリースされる。 Understanding the structure and motion of the heart is crucial for diagnosing and managing cardiovascular diseases, the leading cause of global death. There is wide variation in cardiac shape and motion patterns, that are influenced by demographic, anthropometric and disease factors. Unravelling the normal patterns of shape and motion, as well as understanding how each individual deviates from the norm, would facilitate accurate diagnosis and personalised treatment strategies. To this end, we developed a novel conditional generative model, MeshHeart, to learn the distribution of cardiac shape and motion patterns. MeshHeart is capable of generating 3D+t cardiac mesh sequences, taking into account clinical factors such as age, sex, weight and height. To model the high-dimensional and complex spatio-temporal mesh data, MeshHeart employs a geometric encoder to represent cardiac meshes in a latent space, followed by a temporal Transformer to model the motion dynamics of latent representations. 
Based on MeshHeart, we investigate the latent space of 3D+t cardiac mesh sequences and propose a novel distance metric termed latent delta, which quantifies the deviation of a real heart from its personalised normative pattern in the latent space. In experiments using a large dataset of 38,309 subjects, MeshHeart demonstrates a high performance in cardiac mesh sequence reconstruction and generation. Features defined in the latent space are highly discriminative for cardiac disease classification, whereas the latent delta exhibits strong correlation with clinical phenotypes in phenome-wide association studies. The codes and models of this study will be released to benefit further research on digital heart modelling.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# ViTGuard:視覚変換器の逆例に対する注意認識検出 ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer ( http://arxiv.org/abs/2409.13828v1 ) ライセンス: Link先を確認	Shihua Sun, Kenechukwu Nwodo, Shridatt Sugrim, Angelos Stavrou, Haining Wang,	(参考訳) 視覚タスクにおけるトランスフォーマーの使用は、コンピュータビジョン(CV)における畳み込みニューラルネットワーク(CNN)の伝統的な支配的な役割に挑戦している。画像分類タスクにおいて、視覚変換器(ViT)は画像内のパッチ間の空間的関係を効果的に確立し、正確な予測のために重要な領域に注意を向ける。しかし、CNNと同様、ViTは敵の攻撃に弱いため、画像分類器を誤解させ、慎重に設計された摂動のある画像に対して誤った判断を下す。さらに、小さな領域で任意の摂動をもたらす敵パッチ攻撃は、ViTに対してより深刻な脅威となる。さらに悪いことに、もともとCNNモデル用に設計された従来の検出方法は、ViTに適用された場合、実用的でないか、大幅な性能低下を被り、パッチ攻撃を見落としている。本稿では,ViTGuardを,インプットやパッチ攻撃全体にわたって摂動が広まる典型的な攻撃を含む敵攻撃に対して,ViTモデルを防御するための一般的な検出方法として提案する。 ViTGuardはMasked Autoencoder (MAE)モデルを使用して、ランダムにマスキングされたパッチを非マスキング領域から回収し、柔軟な画像再構成戦略を提供する。次に、しきい値に基づく検出器は、注意マップやCLSトークン表現などの特徴的なViT特徴を利用して、通常のサンプルと反対のサンプルを区別する。 MAEモデルは、トレーニング中に敵のサンプルを含まないため、我々の検出器が目に見えない攻撃に対して有効であることを保証する。 ViTGuardは、3つのデータセットにわたる9つの攻撃の下で既存の7つの検出方法と比較される。評価結果は既存の検出器よりもViTGuardの方が優れていることを示している。最後に,検出回避の可能性を考慮し,ViTGuardのアダプティブアタックに対する堅牢性をさらに実証する。 The use of transformers for vision tasks has challenged the traditional dominant role of convolutional neural networks (CNN) in computer vision (CV). For image classification tasks, Vision Transformer (ViT) effectively establishes spatial relationships between patches within images, directing attention to important areas for accurate predictions. However, similar to CNNs, ViTs are vulnerable to adversarial attacks, which mislead the image classifier into making incorrect decisions on images with carefully designed perturbations. Moreover, adversarial patch attacks, which introduce arbitrary perturbations within a small area, pose a more serious threat to ViTs. Even worse, traditional detection methods, originally designed for CNN models, are impractical or suffer significant performance degradation when applied to ViTs, and they generally overlook patch attacks. 
In this paper, we propose ViTGuard as a general detection method for defending ViT models against adversarial attacks, including typical attacks where perturbations spread over the entire input and patch attacks. ViTGuard uses a Masked Autoencoder (MAE) model to recover randomly masked patches from the unmasked regions, providing a flexible image reconstruction strategy. Then, threshold-based detectors leverage distinctive ViT features, including attention maps and classification (CLS) token representations, to distinguish between normal and adversarial samples. The MAE model does not involve any adversarial samples during training, ensuring the effectiveness of our detectors against unseen attacks. ViTGuard is compared with seven existing detection methods under nine attacks across three datasets. The evaluation results show the superiority of ViTGuard over existing detectors. Finally, considering the potential detection evasion, we further demonstrate ViTGuard's robustness against adaptive attacks for evasion.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
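The threshold-based detection step described above can be pictured with a small Python sketch. It assumes that a feature vector (for example a flattened attention map) is available for both the input and its reconstruction; how those features are obtained is outside this sketch. All names, the distance measure and the numbers are hypothetical and are not taken from the ViTGuard paper.

```python
def feature_distance(original, reconstructed):
    """Mean squared difference between two equally long feature vectors."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def fit_threshold(clean_distances, allowed_false_alarms=0):
    """Pick a threshold from clean data only, so no adversarial samples are needed.

    At most `allowed_false_alarms` of the clean distances exceed the result.
    """
    ranked = sorted(clean_distances)
    return ranked[-(allowed_false_alarms + 1)]

def is_adversarial(distance, threshold):
    """Flag an input whose features move too far under reconstruction."""
    return distance > threshold

# Distances measured on clean validation images (made-up numbers).
clean = [0.010, 0.020, 0.015, 0.030]
threshold = fit_threshold(clean)

suspicious = feature_distance([1.0, 0.0], [0.0, 1.0])
flagged = is_adversarial(suspicious, threshold)
```

Because the threshold is calibrated on clean samples alone, the detector makes no assumption about which attack produced a flagged input.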
# エネルギー相関器の量子コンピューティング Quantum Computing for Energy Correlators ( http://arxiv.org/abs/2409.13830v1 ) ライセンス: Link先を確認	Kyle Lee, Francesco Turro, Xiaojun Yao,	(参考訳) 近年、エネルギー相関器は高エネルギー衝突の破砕力学を解明するための強力な観測器として出現している。本研究では,ハミルトニアン格子を用いたエネルギー相関器計算のための最初の数値計算手法を導入する。さらに、量子コンピューティングハードウェアとアルゴリズムの急速な進歩を動機として、量子場理論におけるエネルギー相関子を計算する量子アルゴリズムを提案する。このアルゴリズムには、基底状態の準備、ソース、シンク、エネルギーフラックス、リアルタイム進化演算子、アダマール試験が含まれる。我々はSU(2)純ゲージ理論を$3\times 3$および$5\times 5$ honeycomb lattices with $j_{\rm max} = \frac{1}{2}$で2+1$次元で適用し、古典的手法と量子アルゴリズムの両方を利用し、後者は特定の構成のためにIBMエミュレータを用いてテストした。結果は強い結合状態の期待された挙動と一致し、弱い結合状態と強い結合状態にまたがる閉じ込めのダイナミクスを研究するためにより包括的な研究を動機付けている。 In recent years, energy correlators have emerged as powerful observables for probing the fragmentation dynamics of high-energy collisions. We introduce the first numerical strategy for calculating energy correlators using the Hamiltonian lattice approach, providing access to the intriguing nonperturbative dynamics of these observables. Furthermore, motivated by rapid advances in quantum computing hardware and algorithms, we propose a quantum algorithm for calculating energy correlators in quantum field theories. This algorithm includes ground state preparation, the application of source, sink, energy flux, real-time evolution operators, and the Hadamard test. We validate our approach by applying it to the SU(2) pure gauge theory in $2+1$ dimensions on $3\times 3$ and $5\times 5$ honeycomb lattices with $j_{\rm max} = \frac{1}{2}$ at various couplings, utilizing both classical methods and the quantum algorithm, the latter tested using the IBM emulator for specific configurations. The results are consistent with the expected behavior of the strong coupling regime and motivate a more comprehensive study to probe the confinement dynamics across the weak and strong coupling regimes.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# 部分情報探索による大規模言語モデルの著作権リスクの測定 Measuring Copyright Risks of Large Language Model via Partial Information Probing ( http://arxiv.org/abs/2409.13831v1 ) ライセンス: Link先を確認	Weijie Zhao, Huajie Shao, Zhaozhuo Xu, Suzhen Duan, Denghui Zhang,	(参考訳) LLM(Large Language Models)のトレーニングに使用されるデータソースの探索は、これらのモデルによる著作権侵害の可能性を調査するための重要な方向である。このアプローチは、トレーニングデータにおける著作権物質の使用の可能性を特定することができるが、侵害リスクを直接測定するものではない。近年の研究では、LLMが著作権のあるコンテンツを直接出力できるかどうかのテストに移行している。この方向性に対応するために,著作権資料から部分的な情報を提供することにより,著作権侵害コンテンツを生成するLLMの能力を調査し評価し,著作権侵害コンテンツを生成するために繰り返しプロンプトを使用することを試みる。具体的には、著作権のあるテキストの一部をLSMに入力し、それを完了するように促し、生成したコンテンツとオリジナルの著作権のある資料との重なりを解析する。これらの部分的な入力に基づいて著作権素材と重なり合うコンテンツをLLMが生成できることが本研究で実証された。 Exploring the data sources used to train Large Language Models (LLMs) is a crucial direction in investigating potential copyright infringement by these models. While this approach can identify the possible use of copyrighted materials in training data, it does not directly measure infringing risks. Recent research has shifted towards testing whether LLMs can directly output copyrighted content. Addressing this direction, we investigate and assess LLMs' capacity to generate infringing content by providing them with partial information from copyrighted materials, and try to use iterative prompting to get LLMs to generate more infringing content. Specifically, we input a portion of a copyrighted text into LLMs, prompt them to complete it, and then analyze the overlap between the generated content and the original copyrighted material. Our findings demonstrate that LLMs can indeed generate content highly overlapping with copyrighted materials based on these partial inputs.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
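One way to picture the overlap analysis described above is the following minimal Python sketch. It is not the authors' metric: it simply measures the longest run of consecutive words that a completion shares with the source text. The function names are hypothetical, and the sample strings are a stand-in sentence rather than copyrighted material or real model output.

```python
from difflib import SequenceMatcher

def longest_shared_run(original, generated):
    """Longest sequence of consecutive words that appears in both texts."""
    a, b = original.split(), generated.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]

def overlap_ratio(original, generated):
    """Share of the original text covered by the longest shared run."""
    words = original.split()
    if not words:
        return 0.0
    return len(longest_shared_run(original, generated)) / len(words)

original = "the quick brown fox jumps over the lazy dog"
generated = "a quick brown fox jumps high"
run = longest_shared_run(original, generated)
ratio = overlap_ratio(original, generated)
```

A word-level run is a deliberately crude proxy; a real evaluation would probably also consider paraphrase and several disjoint matches.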
# STOP! 攻撃的進行に対する感度テストによる大規模言語モデルのベンチマーク STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions ( http://arxiv.org/abs/2409.13843v1 ) ライセンス: Link先を確認	Robert Morabito, Sangmitra Madhusudan, Tyler McDonald, Ali Emami,	(参考訳) 大規模言語モデル(LLM)における明示的バイアスと暗黙的バイアスの緩和は、自然言語処理の分野において重要な焦点となっている。しかし、現在の多くの方法論は、それぞれの状況におけるより広い文脈や潜在的なバイアスのスペクトルを考慮せずに、独立したシナリオを評価する。これを解決するために,攻撃性評価(STOP, Sensitivity Testing on Offensive Progressions)データセットを導入した。幅広い9つの人口層と46のサブデデデノグラフィーをカバーし、STOPはインクリシティと包括的カバレッジを保証する。 GPT-4、Mixtral、Llama 3など、いくつかの主要なクローズドおよびオープンソースモデルを評価した。以上の結果から,最も優れたモデルでさえバイアスを不整合に検出し,成功率は19.3%から69.8%であった。また, BBQ, StereoSet, CrowS-Pairsなどのセンシティブなタスクのモデル応答率を最大191%向上し, 性能を維持・改善する。 STOPは、より効果的なバイアス緩和戦略を可能にし、より公平な言語モデルの作成を容易にするLLMにおけるバイアスの複雑な性質を評価するための新しいフレームワークを提供する。 Mitigating explicit and implicit biases in Large Language Models (LLMs) has become a critical focus in the field of natural language processing. However, many current methodologies evaluate scenarios in isolation, without considering the broader context or the spectrum of potential biases within each situation. To address this, we introduce the Sensitivity Testing on Offensive Progressions (STOP) dataset, which includes 450 offensive progressions containing 2,700 unique sentences of varying severity that progressively escalate from less to more explicitly offensive. Covering a broad spectrum of 9 demographics and 46 sub-demographics, STOP ensures inclusivity and comprehensive coverage. We evaluate several leading closed- and open-source models, including GPT-4, Mixtral, and Llama 3. Our findings reveal that even the best-performing models detect bias inconsistently, with success rates ranging from 19.3% to 69.8%. We also demonstrate how aligning models with human judgments on STOP can improve model answer rates on sensitive tasks such as BBQ, StereoSet, and CrowS-Pairs by up to 191%, while maintaining or even improving performance. 
STOP presents a novel framework for assessing the complex nature of biases in LLMs, which will enable more effective bias mitigation strategies and facilitate the creation of fairer language models.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
# 脳拡散MRIにおける視野拡大のための多モード条件変分U-Net Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI ( http://arxiv.org/abs/2409.13846v1 ) ライセンス: Link先を確認	Zhiyuan Li, Tianyuan Yao, Praitayini Kanakaraj, Chenyu Gao, Shunxing Bao, Lianrui Zuo, Michael E. Kim, Nancy R. Newlin, Gaurav Rudravaram, Nazirah M. Khairi, Yuankai Huo, Kurt G. Schilling, Walter A. Kukull, Arthur W. Toga, Derek B. Archer, Timothy J. Hohman, Bennett A. Landman,	(参考訳) 拡散磁気共鳴イメージング(dMRI)における不完全視野(FOV)は、全脳白質結合の体積および束解析を著しく阻害することができる。既存の研究は, 深部生成モデルを用いて, 欠落した領域をインパルス化することを検討したが, ペア化された多モードデータから追加情報をどのように活用するか, そして, インパルス化の質を高め, 下流のトラクトグラフィーに有用かは定かではない。このギャップを埋めるために、FOVの取得した部分における学習拡散特徴を完全な脳解剖学的構造に統合することにより、FOVの不完全部分におけるdMRIスキャンを計算するための新しい枠組みを提案する。この設計により,提案するフレームワークは,dMRIスキャンの計算性能を向上し,不完全なFOVによる破壊dMRIスキャンにおける全脳トラクトグラフィーの修復に有用である,という仮説を立てる。 96名の被験者の異なる2つのコホートでフレームワークを試験し、T1wとdMRIの情報を等しく扱うベースライン計算法と比較した。提案手法は,角相関係数 (p < 1E-5) で示されるような計算性能と,Dice スコア (p < 0.01) で示される下流トラクトグラフィーの精度において有意に向上した。提案手法は, ベースライン法と比較して, 対の多重モードデータから追加情報を活用することにより, dMRIスキャンのインパルス化性能を向上させることが示唆された。提案手法によって達成されたインプットは、全脳トラクトグラフィーを増強し、神経変性に関連する束を分析する際の不確実性を低下させる。 An incomplete field-of-view (FOV) in diffusion magnetic resonance imaging (dMRI) can severely hinder the volumetric and bundle analyses of whole-brain white matter connectivity. Although existing works have investigated imputing the missing regions using deep generative models, it remains unclear how to specifically utilize additional information from paired multi-modality data and whether this can enhance the imputation quality and be useful for downstream tractography. To fill this gap, we propose a novel framework for imputing dMRI scans in the incomplete part of the FOV by integrating the learned diffusion features in the acquired part of the FOV to the complete brain anatomical structure. 
We hypothesize that, by this design, the proposed framework can enhance the imputation performance for dMRI scans and therefore be useful for repairing whole-brain tractography in corrupted dMRI scans with an incomplete FOV. We tested our framework on two cohorts from different sites with a total of 96 subjects and compared it with a baseline imputation method that treats the information from T1w and dMRI scans equally. The proposed framework achieved significant improvements in imputation performance, as demonstrated by the angular correlation coefficient (p < 1E-5), and in downstream tractography accuracy, as demonstrated by the Dice score (p < 0.01). Results suggest that, compared with the baseline method, the proposed framework improved imputation performance in dMRI scans by specifically utilizing additional information from paired multi-modality data. The imputation achieved by the proposed framework enhances whole-brain tractography and therefore reduces the uncertainty when analyzing bundles associated with neurodegeneration.	翻訳日:2024-11-07 05:01:49 publicly:2024-09-20
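The Dice score used above to judge tractography accuracy can be sketched as follows. Representing a bundle as a set of occupied voxel coordinates is an assumption made for this illustration, as are the function name and the toy coordinates.

```python
def dice(bundle_a, bundle_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two sets of voxels."""
    a, b = set(bundle_a), set(bundle_b)
    if not a and not b:
        return 1.0  # two empty bundles are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))

# Voxels occupied by the same bundle in a reference scan and in an imputed scan.
reference = {(10, 12, 7), (10, 13, 7), (11, 13, 7)}
imputed = {(10, 13, 7), (11, 13, 7), (12, 13, 7)}
score = dice(reference, imputed)
```

A score of 1 means the two bundles occupy exactly the same voxels, and 0 means they do not overlap at all.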
# Segment Discovery:Eコマースのターゲティングを強化する Segment Discovery: Enhancing E-commerce Targeting ( http://arxiv.org/abs/2409.13847v1 ) ライセンス: Link先を確認	Qiqi Li, Roopali Singh, Charin Polpanumas, Tanner Fiez, Namita Kumar, Shreya Chakrabarti,	(参考訳) 現代のeコマースサービスは、ゲーム、ショッピング、ビデオストリーミングなどの製品に顧客を巻き込むインセンティブや介入で顧客を狙うことが多い。この顧客エンゲージメントは、より多くの顧客獲得と既存の顧客維持を促進し、顧客エクスペリエンスを改善しながら、企業のビジネスを拡大します。多くの場合、顧客はランダムにターゲットされるか、望ましい行動の妥当性に基づいてターゲットされる。しかし、そのようなポリシーは、介入から最も利益を得るであろう顧客の集合を標的にせず、またいかなる制約も考慮しないため、準最適である。本稿では,要求された制約を考慮しつつ,ビジネスに対する価値を最大化するために,ユーザに対してユースケース特異的な介入を目標とする,アップリフトモデリングと制約付き最適化に基づくポリシフレームワークを提案する。本研究では,2つの大規模実験と実運用による最先端のターゲティング手法の改善について述べる。 Modern e-commerce services frequently target customers with incentives or interventions to engage them in their products such as games, shopping, video streaming, etc. This customer engagement increases acquisition of more customers and retention of existing ones, leading to more business for the company while improving customer experience. Often, customers are either randomly targeted or targeted based on the propensity of desirable behavior. However, such policies can be suboptimal as they do not target the set of customers who would benefit the most from the intervention and they may also not take account of any constraints. In this paper, we propose a policy framework based on uplift modeling and constrained optimization that identifies customers to target for a use-case specific intervention so as to maximize the value to the business, while taking account of any given constraints. We demonstrate improvement over state-of-the-art targeting approaches using two large-scale experimental studies and a production implementation.	翻訳日:2024-11-07 05:01:49 公開日:2024-09-20
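The abstract above combines uplift modeling with constrained optimization. The sketch below is not the paper's formulation. It assumes per-customer uplift predictions and costs are already available and uses a simple greedy budget heuristic. The function `select_targets` and all sample values are hypothetical.

```python
def select_targets(customers, budget):
    """Pick customers to target under a total cost budget.

    customers: list of (customer_id, predicted_uplift, cost) tuples.
    Greedy heuristic: rank by uplift per unit cost, skip non-positive uplift.
    """
    ranked = sorted(
        (c for c in customers if c[1] > 0 and c[2] > 0),
        key=lambda c: c[1] / c[2],
        reverse=True,
    )
    chosen, spent = [], 0.0
    for customer_id, uplift, cost in ranked:
        if spent + cost <= budget:  # respect the budget constraint
            chosen.append(customer_id)
            spent += cost
    return chosen


# Hypothetical data: (id, predicted uplift, cost of the intervention).
customers = [("a", 0.30, 1.0), ("b", 0.05, 1.0), ("c", 0.20, 2.0), ("d", -0.10, 1.0)]
print(select_targets(customers, budget=3.0))
```

Ranking by predicted uplift rather than by propensity is what separates this from the propensity-based targeting the abstract calls suboptimal: customer "d" is never selected because the intervention is predicted to reduce the desired behavior.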
# 対称性を考慮したグラフニューラルネットワークを用いた結晶材料の学習順序付け Learning Ordering in Crystalline Materials with Symmetry-Aware Graph Neural Networks ( http://arxiv.org/abs/2409.13851v1 ) ライセンス: Link先を確認	Jiayu Peng, James Damewood, Jessica Karaguesian, Jaclyn R. Lunger, Rafael Gómez-Bombarelli,	(参考訳) グラフ畳み込みニューラルネットワーク(GCNN)は、構造から特性を予測することにより、触媒やエネルギー貯蔵などの分野における結晶材料の化学空間をスクリーニングする機械学習のワークホースとなっている。しかし、多成分材料は、与えられた格子構造が高次構造から完全に不規則な固体溶液まで、様々な元素配列を包含できる化学(非)秩序を示すことができるため、ユニークな課題である。重要な点として、安定性、強度、触媒性能といった特性は構造だけでなく順序にも依存する。したがって、厳密な材料設計を可能にするため、GCNNが原子配列を識別できることを保証することが重要である。しかし、GCNNのオーダリング・アウェア能力はよく理解されていない。本稿では,多成分材料の秩序に依存したエネルギーを,高スループット原子論シミュレーションで生成したカスタムデータセットで取得するニューラルネットワークアーキテクチャをベンチマークする。従来の対称性不変なGCNNは、同じ物質の様々な対称非等価な原子配列の間の構造的差異を識別できないが、対称性等価なモデルアーキテクチャは本質的に様々な順序の異なる結晶対称性を保存し、区別することができる。 Graph convolutional neural networks (GCNNs) have become a machine learning workhorse for screening the chemical space of crystalline materials in fields such as catalysis and energy storage, by predicting properties from structures. Multicomponent materials, however, present a unique challenge since they can exhibit chemical (dis)order, where a given lattice structure can encompass a variety of elemental arrangements ranging from highly ordered structures to fully disordered solid solutions. Critically, properties like stability, strength, and catalytic performance depend not only on structures but also on orderings. To enable rigorous materials design, it is thus critical to ensure GCNNs are capable of distinguishing among atomic orderings. However, the ordering-aware capability of GCNNs has been poorly understood. Here, we benchmark various neural network architectures for capturing the ordering-dependent energetics of multicomponent materials in a custom-made dataset generated with high-throughput atomistic simulations. 
Conventional symmetry-invariant GCNNs were found unable to discern the structural difference between the diverse symmetrically inequivalent atomic orderings of the same material, while symmetry-equivariant model architectures could inherently preserve and differentiate the distinct crystallographic symmetries of various orderings.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# 言語モデルは、彼らが説くことを実践しているか? LLMで符号化されたジェンダー付き言語改革に関する言語イデオロギーの検討 Do language models practice what they preach? Examining language ideologies about gendered language reform encoded in LLMs ( http://arxiv.org/abs/2409.13852v1 ) ライセンス: Link先を確認	Julia Watson, Sophia Lee, Barend Beekhuizen, Suzanne Stevenson,	(参考訳) 本研究では,LLMが生成したテキストにおける言語イデオロギーを,英語のジェンダー化言語改革(congressperson/-woman/-man などの役割名詞,単数形の they)のケーススタディを通じて研究する。まず、政治的偏見を見出す:「正しい」あるいは「自然な」言語を使うよう求められた場合、LLMは(進歩的な価値観ではなく)保守的な価値観に合わせるよう求められたときと最もよく似た言語を使用する。このことは、一見非政治的な文脈においても、LLMのメタ言語的嗜好が特定の政治的グループの言語イデオロギーを暗黙的に伝達する様子を示している。第二に、LLMは内部の矛盾を示す: LLMは、より明示的なメタ言語的文脈が提供されるときに、より頻繁に性中立な変種を使用する。このことは、LLMが生成したテキストで表現される言語イデオロギーがいかに異なるかを示しており、これはユーザにとって予期せぬことである。本稿では,これらの知見が価値アライメントに与える影響について論じる。 We study language ideologies in text produced by LLMs through a case study on English gendered language reform (related to role nouns like congressperson/-woman/-man, and singular they). First, we find political bias: when asked to use language that is "correct" or "natural", LLMs use language most similarly to when asked to align with conservative (vs. progressive) values. This shows how LLMs' metalinguistic preferences can implicitly communicate the language ideologies of a particular political group, even in seemingly non-political contexts. Second, we find LLMs exhibit internal inconsistency: LLMs use gender-neutral variants more often when more explicit metalinguistic context is provided. This shows how the language ideologies expressed in text produced by LLMs can vary, which may be unexpected to users. We discuss the broader implications of these findings for value alignment.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# 動的ソフトプロンプトを用いた大規模言語モデルのアンロック記憶 Unlocking Memorization in Large Language Models with Dynamic Soft Prompting ( http://arxiv.org/abs/2409.13853v1 ) ライセンス: Link先を確認	Zhepeng Wang, Runxue Bao, Yawen Wu, Jackson Taylor, Cao Xiao, Feng Zheng, Weiwen Jiang, Shangqian Gao, Yanfu Zhang,	(参考訳) 事前訓練された大規模言語モデル(LLM)は、要約、質問応答、翻訳などの自然言語処理(NLP)タスクに革命をもたらした。しかし、LLMはトレーニングデータを記憶する傾向があるため、重大なセキュリティ上のリスクを生じ、プライバシー侵害や著作権侵害につながる可能性がある。この記憶の正確な測定は、これらの潜在的なリスクの評価と緩和に不可欠である。しかし、過去の記憶を特徴付ける試みは、接頭辞のみを使用するか、あるいは接頭辞に一定のソフトプロンプトを前置することで制限されるため、入力の変化に反応することができない。この課題に対処するために,動的にプレフィックスに依存したソフトプロンプトを用いてLLM記憶を推定する新しい手法を提案する。提案手法では,入力の変化に対応するソフトプロンプトを生成するためにトランスフォーマーベースのジェネレータを訓練することにより,記憶データのより正確な抽出を可能にする。提案手法は,従来の手法の限界に対処するだけでなく,各種実験環境において,最先端技術と比較して優れた性能を示す。特に,本手法は,テキスト生成タスクとコード生成タスクの発見可能な記憶率の観点から,バニラベースラインに対する112.75%と32.26%の最大相対的改善を達成できる。 Pretrained large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation. However, LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement. Accurate measurement of this memorization is essential to evaluate and mitigate these potential risks. However, previous attempts to characterize memorization are constrained by either using prefixes only or by prepending a constant soft prompt to the prefixes, which cannot react to changes in input. To address this challenge, we propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts. Our approach involves training a transformer-based generator to produce soft prompts that adapt to changes in input, thereby enabling more accurate extraction of memorized data. Our method not only addresses the limitations of previous methods but also demonstrates superior performance in diverse experimental settings compared to state-of-the-art techniques. 
In particular, our method can achieve the maximum relative improvement of 112.75% and 32.26% over the vanilla baseline in terms of discoverable memorization rate for the text generation task and code generation task respectively.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# Wormhole: 共同進化型シーケンスのための概念的深層表現学習 Wormhole: Concept-Aware Deep Representation Learning for Co-Evolving Sequences ( http://arxiv.org/abs/2409.13857v1 ) ライセンス: Link先を確認	Kunpeng Xu, Lifei Chen, Shengrui Wang,	(参考訳) IoTアプリケーションや金融市場、オンラインアクティビティログといった複雑なシステムを分析するためには、共同進化型シーケンスにおける動的概念の識別と理解が不可欠だ。これらの概念はシーケンシャルなデータの構造と振舞いに関する貴重な洞察を与え、より良い意思決定と予測を可能にします。本稿では,概念を意識した新しい深層表現学習フレームワークであるWormholeを紹介する。本モデルは,動的概念とその遷移の堅牢な識別を保証するために,自己表現層と時間的滑らか性制約を提示する。さらに、概念遷移は潜伏空間の急激な変化を識別し、ワームホールを通過するような新しい行動へのシフトを示すことによって検出される。このメカニズムは、共進化配列内の概念を正確に識別し、これらのワームホールの正確な位置を特定し、学習された表現の解釈可能性を高める。実験により、時系列データを意味のある概念に効果的に分割し、複雑な時間的パターンを分析し、概念のドリフトを検出するための貴重なツールを提供する。 Identifying and understanding dynamic concepts in co-evolving sequences is crucial for analyzing complex systems such as IoT applications, financial markets, and online activity logs. These concepts provide valuable insights into the underlying structures and behaviors of sequential data, enabling better decision-making and forecasting. This paper introduces Wormhole, a novel deep representation learning framework that is concept-aware and designed for co-evolving time sequences. Our model presents a self-representation layer and a temporal smoothness constraint to ensure robust identification of dynamic concepts and their transitions. Additionally, concept transitions are detected by identifying abrupt changes in the latent space, signifying a shift to new behavior - akin to passing through a wormhole. This novel mechanism accurately discerns concepts within co-evolving sequences and pinpoints the exact locations of these wormholes, enhancing the interpretability of the learned representations. Experiments demonstrate that this method can effectively segment time series data into meaningful concepts, providing a valuable tool for analyzing complex temporal patterns and advancing the detection of concept drifts.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
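The abstract above detects concept transitions as abrupt changes in the latent space. The sketch below is only a minimal stand-in for that idea, not the Wormhole model: it assumes latent vectors are already computed and flags time steps where consecutive vectors jump by more than a fixed threshold. `find_transitions`, the threshold and the sample data are all hypothetical.

```python
import math


def find_transitions(latents, threshold):
    """Return indices t where the latent vector jumps abruptly from step t-1 to t."""
    transitions = []
    for t in range(1, len(latents)):
        # Euclidean distance between consecutive latent representations.
        if math.dist(latents[t - 1], latents[t]) > threshold:
            transitions.append(t)
    return transitions


# Hypothetical 2-D latent trajectory with one abrupt shift at index 3.
latents = [(0.0, 0.1), (0.1, 0.1), (0.1, 0.2), (2.0, 2.1), (2.1, 2.0), (2.0, 2.0)]
print(find_transitions(latents, threshold=1.0))
```

The segments between flagged indices then play the role of "concepts".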
# SSE:産業規模データ同化のためのマルチモーダルセマンティックデータ選択と強化 SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation ( http://arxiv.org/abs/2409.13860v1 ) ライセンス: Link先を確認	Maying Shen, Nadine Chang, Sifei Liu, Jose M. Alvarez,	(参考訳) 近年、人工知能のために収集されたデータは、管理不能な量に成長している。特に自動運転車などの産業アプリケーションでは、モデルの性能が飽和している間、モデルトレーニングの計算予算が超過している。データの洪水をナビゲートするために、最もセマンティックに多様性があり重要なデータセット部分を選択するためのフレームワークを提案する。そして、巨大なラベルのないデータプールから意味のある新しいデータを発見し、さらに意味を豊かにする。重要なことは、基礎モデルを利用して各データポイントのセマンティクスを生成することで、説明可能性を提供することができる。 SSE(Semantic Selection and Enrichment framework)が有効であることを定量的に示す。 a) より小さなトレーニングデータセットでモデルパフォーマンスをうまく維持し、 b) オリジナルのデータセットサイズを超えることなく、より小さなデータセットを豊かにすることにより、モデル性能を向上させる。その結果、セマンティックな多様性は最適なデータ選択とモデル性能に欠かせないことを示した。 In recent years, the data collected for artificial intelligence has grown to an unmanageable amount. Particularly within industrial applications, such as autonomous vehicles, model training computation budgets are being exceeded while model performance is saturating -- and yet more data continues to pour in. To navigate the flood of data, we propose a framework to select the most semantically diverse and important dataset portion. Then, we further semantically enrich it by discovering meaningful new data from a massive unlabeled data pool. Importantly, we can provide explainability by leveraging foundation models to generate semantics for every data point. We quantitatively show that our Semantic Selection and Enrichment framework (SSE) can a) successfully maintain model performance with a smaller training dataset and b) improve model performance by enriching the smaller dataset without exceeding the original dataset size. Consequently, we demonstrate that semantic diversity is imperative for optimal data selection and model performance.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# グラフニューラルネットワークによるエアロゾルダイナミクスのシミュレーション Learning to Simulate Aerosol Dynamics with Graph Neural Networks ( http://arxiv.org/abs/2409.13861v1 ) ライセンス: Link先を確認	Fabiana Ferracina, Payton Beeler, Mahantesh Halappanavar, Bala Krishnamoorthy, Marco Minutoli, Laura Fierce,	(参考訳) 大気エアロゾルが気候、天候、大気品質に与える影響は、個々の粒子の特性に依存する。粒子分解モデルは、粒子物理化学的性質においてこの多様性を捉えることができる唯一のモデルであり、これらのモデルは計算的に高価である。粒子分解マイクロフィジカルモデルの加速戦略として、グラフベースのエアロゾルダイナミクス学習(GLAD)を導入し、このモデルを用いて粒子分解モデルPartMC-MOSAICのサロゲートを訓練する。 GLADは、粒子ベースの流体力学モデルをシミュレートするために使用されている機械学習フレームワークであるGraph Network-based Simulator (GNS)を実装している。 GLADでは、各粒子はグラフ内のノードとして表現され、時間とともに粒子の個体群の進化は、学習されたメッセージパッシングによってシミュレートされる。我々は,硫酸塩,黒炭素,有機炭素,水からなる粒子に硫酸を凝縮させる単純なエアロゾルシステムに対するGNSアプローチを実証した。ノードとして粒子を持つグラフを構築し、PartMC-MOSAICから出力されたモデルを用いてグラフニューラルネットワーク(GNN)を訓練する。トレーニングされたGNNは、時間とともにエアロゾルのダイナミクスをシミュレートし、予測するために使用することができる。結果は、フレームワークが化学力学を正確に学習し、異なるシナリオをまたいで一般化し、効率的なトレーニングと予測時間を達成する能力を示す。エアロゾルマイクロ物理と化学のモデリングにおけるフレームワークの堅牢性と適応性を強調し,3つのシナリオにまたがる性能評価を行った。 Aerosol effects on climate, weather, and air quality depend on characteristics of individual particles, which are tremendously diverse and change in time. Particle-resolved models are the only models able to capture this diversity in particle physiochemical properties, and these models are computationally expensive. As a strategy for accelerating particle-resolved microphysics models, we introduce Graph-based Learning of Aerosol Dynamics (GLAD) and use this model to train a surrogate of the particle-resolved model PartMC-MOSAIC. GLAD implements a Graph Network-based Simulator (GNS), a machine learning framework that has been used to simulate particle-based fluid dynamics models. In GLAD, each particle is represented as a node in a graph, and the evolution of the particle population over time is simulated through learned message passing. 
We demonstrate our GNS approach on a simple aerosol system that includes condensation of sulfuric acid onto particles composed of sulfate, black carbon, organic carbon, and water. A graph with particles as nodes is constructed, and a graph neural network (GNN) is then trained using the model output from PartMC-MOSAIC. The trained GNN can then be used for simulating and predicting aerosol dynamics over time. Results demonstrate the framework's ability to accurately learn chemical dynamics and generalize across different scenarios, achieving efficient training and prediction times. We evaluate the performance across three scenarios, highlighting the framework's robustness and adaptability in modeling aerosol microphysics and chemistry.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# 継続的な学習における持続的バックドアアタック Persistent Backdoor Attacks in Continual Learning ( http://arxiv.org/abs/2409.13864v1 ) ライセンス: Link先を確認	Zhen Guo, Abhinav Kumar, Reza Tourani,	(参考訳) バックドア攻撃はニューラルネットワークに重大な脅威となり、敵は特定の入力でモデル出力を操作できる。バックドア攻撃は様々な文脈で研究されているが、継続学習における実践性と持続性、特に新しいデータ分布が学習され統合されるにつれて、モデルパラメータへの継続的な更新が時間とともにこれらの攻撃の有効性に影響を与えることを理解する上ではほとんど注目されていない。このギャップに対処するために、それぞれ最小限の敵対的影響を利用する Blind Task Backdoor と Latent Task Backdoor という2つの永続的なバックドア攻撃を導入する。私たちの盲目タスクバックドアは、トレーニングプロセスを直接制御することなく、損失計算を微調整します。我々はこれらの攻撃を様々な構成で評価し、静的、動的、物理的、意味的なトリガーで有効性を示す。以上の結果から,両攻撃は連続学習アルゴリズム間で連続的に高い成功率を達成するとともに,SentiNetやI-BAUといった最先端の防御を効果的に回避していることがわかった。 Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been studied in various contexts, little attention has been given to their practicality and persistence in continual learning, particularly in understanding how the continual updates to model parameters, as new data distributions are learned and integrated, impact the effectiveness of these attacks over time. To address this gap, we introduce two persistent backdoor attacks, Blind Task Backdoor and Latent Task Backdoor, each leveraging minimal adversarial influence. Our blind task backdoor subtly alters the loss computation without direct control over the training process, while the latent task backdoor influences only a single task's training, with all other tasks trained benignly. We evaluate these attacks under various configurations, demonstrating their efficacy with static, dynamic, physical, and semantic triggers. Our results show that both attacks consistently achieve high success rates across different continual learning algorithms, while effectively evading state-of-the-art defenses, such as SentiNet and I-BAU.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# MAGICS:ロボット安全の収束型ニューラルシンセサイザーのための暗黙の批判的スタックルバーグによるミニマックスアクター付き対向RL MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety ( http://arxiv.org/abs/2409.13867v1 ) ライセンス: Link先を確認	Justin Wang, Haimin Hu, Duy Phuong Nguyen, Jaime Fernández Fisac,	(参考訳) 頑健な最適制御理論は、確実に安全であるロボット制御ポリシーを計算するための厳密な枠組みを提供するが、高次元問題へのスケールに苦慮し、ロボット安全性の抽出可能な合成にディープラーニングの利用が増大する。残念ながら、既存の神経安全合成法は収束保証と解解釈性に欠けることが多い。本稿では、ミニマックス平衡解への局所収束を保証する新しい敵対的強化学習(RL)アルゴリズムである、Implicit Critic Stackelberg(MAGICS)により導かれるミニマックスアクターについて述べる。次に,本手法を用いて,一般の深部RLに基づくロボット安全合成アルゴリズムの局所収束保証を実現する。 OpenAI Gym環境でのシミュレーション実験と36次元四足歩行ロボットによるハードウェア実験の両方を通して、MAGICSは最先端の神経安全合成法よりも堅牢な制御ポリシーが得られることを示した。 While robust optimal control theory provides a rigorous framework to compute robot control policies that are provably safe, it struggles to scale to high-dimensional problems, leading to increased use of deep learning for tractable synthesis of robot safety. Unfortunately, existing neural safety synthesis methods often lack convergence guarantees and solution interpretability. In this paper, we present Minimax Actors Guided by Implicit Critic Stackelberg (MAGICS), a novel adversarial reinforcement learning (RL) algorithm that guarantees local convergence to a minimax equilibrium solution. We then build on this approach to provide local convergence guarantees for a general deep RL-based robot safety synthesis algorithm. Through both simulation studies on OpenAI Gym environments and hardware experiments with a 36-dimensional quadruped robot, we show that MAGICS can yield robust control policies outperforming the state-of-the-art neural safety synthesis methods.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# 深層学習に基づく肺結節検出・分節のためのU-Structure Deep Learning-Based Channel Squeeze U-Structure for Lung Nodule Detection and Segmentation ( http://arxiv.org/abs/2409.13868v1 ) ライセンス: Link先を確認	Mingxiu Sui, Jiacheng Hu, Tong Zhou, Zibo Liu, Likang Wen, Junliang Du,	(参考訳) 本稿では,早期肺癌診断の精度向上を目的とした,肺結節の自動検出・分節のための新しい深層学習法を提案する。提案手法では,ネットワークの複数のセマンティックレベルにわたる特徴抽出と情報統合を最適化する,ユニークな "Channel Squeeze U-Structure" を利用する。このアーキテクチャには、浅い情報処理、チャネル残基構造、チャネルストレッチ統合という3つの重要なモジュールが含まれている。これらのモジュールは、早期診断に欠かせない小さな、知覚不可能な、または地面ガラスの結節を検出し、セグメント化するモデルの能力を増強する。本手法は, 感度, Dice 類似度係数, 精度, IoU (Intersection over Union) において優れた性能を示す。 Lung Image Database Consortium (LIDC)データセットで5倍のクロスバリデーションを用いて大規模な実験を行い、優れた安定性と堅牢性を示した。以上の結果から, 本手法は, コンピュータ支援診断システムの改善に有意な可能性を秘めており, 臨床における放射線科医の信頼性と, 肺がんの早期発見, 特に資源限定環境での担い手として有用であることが示唆された。 This paper introduces a novel deep-learning method for the automatic detection and segmentation of lung nodules, aimed at advancing the accuracy of early-stage lung cancer diagnosis. The proposed approach leverages a unique "Channel Squeeze U-Structure" that optimizes feature extraction and information integration across multiple semantic levels of the network. This architecture includes three key modules: shallow information processing, channel residual structure, and channel squeeze integration. These modules enhance the model's ability to detect and segment small, imperceptible, or ground-glass nodules, which are critical for early diagnosis. The method demonstrates superior performance in terms of sensitivity, Dice similarity coefficient, precision, and mean Intersection over Union (IoU). Extensive experiments were conducted on the Lung Image Database Consortium (LIDC) dataset using five-fold cross-validation, showing excellent stability and robustness. 
The results indicate that this approach holds significant potential for improving computer-aided diagnosis systems, providing reliable support for radiologists in clinical practice and aiding in the early detection of lung cancer, especially in resource-limited settings.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# ジェネレーティブAIは非民主的バイアスとステレオタイプを担い、女性、黒人個人、年齢グループ、AI生成画像の障害のある人々の活動の表現 Generative AI Carries Non-Democratic Biases and Stereotypes: Representation of Women, Black Individuals, Age Groups, and People with Disability in AI-Generated Images across Occupations ( http://arxiv.org/abs/2409.13869v1 ) ライセンス: Link先を確認	Ayoob Sadeghiani,	(参考訳) AI開発におけるAIガバナンスと倫理は重要な関心事となり、IT企業、政府、研究者の間でAIが民主主義にもたらす潜在的なリスクについて活発に議論されている。この短いエッセイは、生成的AIがアウトプットに株式保存グループをいかに含んでいるか、あるいは排除しているか、というリスクを強調することを目的としている。この結果から、生成的AIは性別、人種、年齢、可視性障害について公平に包括的ではないことが明らかとなった。 AI governance and ethics in AI development have become critical concerns, prompting active discussions among tech companies, governments, and researchers about the potential risks AI poses to our democracies. This short essay aims to highlight one such risk: how generative AI includes or excludes equity-deserving groups in its outputs. The findings reveal that generative AI is not equitably inclusive regarding gender, race, age, and visible disability.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# プライバシ問題としての(工業的)フェデレーション学習におけるデータ流通の変化 Data Distribution Shifts in (Industrial) Federated Learning as a Privacy Issue ( http://arxiv.org/abs/2409.13875v1 ) ライセンス: Link先を確認	David Brunner, Alessio Montuoro,	(参考訳) 我々は、少数の強力で潜在的に競合する工業者間のコラボレーションである産業連合学習を、顧客に提供するサービスを改善することを目的とした第三者の仲介により検討する。この設定は、デバイス間設定などでは発生しないプライバシーリスクを隠蔽する、と我々は主張する。企業は知的財産権と生産過程を非常に保護している。生産の変更に関する情報と、その時期を非公開にしておくこと。本研究では,潜在的に微妙な時間的データ分布シフトを検出することによって,共同作業者の1人が競合製品に変化を推測するシナリオについて検討する。このフレーミングでは、たとえトレーニング収束に悪影響を及ぼさないとしても、データ分散シフトは常に問題となる。そこで本研究の目的は,従来の評価指標よりも分布変化の検出が優れている方法を見つけることである。マイナーシフトでさえ、コラボレーティブに学習された機械学習モデルに変換されるという仮定に基づいて、アタッカーは、関連する変更をピックアップするために、文献からメトリクスを選択することで、共有モデルの内部状態を追跡する。ベンチマークデータセットに関する実証的研究では、正直だが正確な攻撃者は、他のクライアントに対して微妙な分布シフトを検出できることを示した。 We consider industrial federated learning, a collaboration between a small number of powerful, potentially competing industrial players, mediated by a third party aspiring to improve the service it provides to its customers. We argue that this configuration harbours covert privacy risks that do not arise in e.g. cross-device settings. Companies are very protective of their intellectual property and production processes. Information about changes to their production and the timing of which is to be kept private. We study a scenario in which one of the collaborators infers changes to their competitors' production by detecting potentially subtle temporal data distribution shifts. In this framing, a data distribution shift is always problematic, even if it has no negative effect on training convergence. Thus, our goal is to find means that allow the detection of distributional shifts better than customary evaluation metrics. Based on the assumption that even minor shifts translate into the collaboratively learned machine learning model, the attacker tracks the shared models' internal state with a selection of metrics from literature in order to pick up on relevant changes. 
In an empirical study on benchmark datasets, we show an honest-but-curious attacker to be capable of detecting subtle distributional shifts on other clients, in some cases long before they become obvious in evaluation.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# 物理インフォームド変分空間ガウス過程 Physics-Informed Variational State-Space Gaussian Processes ( http://arxiv.org/abs/2409.13876v1 ) ライセンス: Link先を確認	Oliver Hamelijnck, Arno Solin, Theodoros Damoulas,	(参考訳) 微分方程式は、多くの科学的・工学的応用に不可欠な重要な力学モデルである。利用可能なデータが豊富にあることで、データ駆動物理インフォームドモデルへの関心が高まっている。ガウス過程(GP)は、事前の知識を取り入れ、不確実性を定量化しながら、複雑な非線形現象をモデル化できるため、このタスクに特に適している。現在のアプローチではいくつかの成功例があるが、計算のスケーリングが不十分な場合や、時間的設定のみに集中する場合に制限がある。本研究は, 線形および非線形の物理的制約を処理し, 効率的な線形インタイム計算コストを実現する変動時空間GPを導入することにより, これらの問題に対処する。我々は,本手法を人工的および実世界の様々な設定で実証し,予測的および計算的両方の性能において最先端の手法より優れていることを示す。 Differential equations are important mechanistic models that are integral to many scientific and engineering applications. With the abundance of available data there has been a growing interest in data-driven physics-informed models. Gaussian processes (GPs) are particularly suited to this task as they can model complex, non-linear phenomena whilst incorporating prior knowledge and quantifying uncertainty. Current approaches have found some success but are limited as they either achieve poor computational scalings or focus only on the temporal setting. This work addresses these issues by introducing a variational spatio-temporal state-space GP that handles linear and non-linear physical constraints while achieving efficient linear-in-time computation costs. We demonstrate our methods in a range of synthetic and real-world settings and outperform the current state-of-the-art in both predictive and computational performance.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# 予測精度の達成:ECML-PKDD 2024におけるボルボの発見へのLSTMと擬似ラベルの活用 Achieving Predictive Precision: Leveraging LSTM and Pseudo Labeling for Volvo's Discovery Challenge at ECML-PKDD 2024 ( http://arxiv.org/abs/2409.13877v1 ) ライセンス: Link先を確認	Carlo Metta, Marco Gregnanin, Andrea Papini, Silvia Giulia Galfrè, Andrea Fois, Francesco Morandin, Marco Fantozzi, Maurizio Parton,	(参考訳) 本稿では,ECML-PKDD 2024におけるVolvo Discovery Challengeにおいて,Long Short-Term Memory Network と pseudo-labeling を用いて,Volvo トラックの部品のメンテナンスニーズを予測する手法を提案する。トレーニングデータを処理して,テストセット構造を反映し,ベースLSTMモデルを適用して,テストデータを反復的にラベル付けした。提案手法はモデルの予測能力を改良し,マクロ平均F1スコア 0.879 を達成し,予測保守における堅牢な性能を実証した。この研究は、産業環境で機械学習技術を効果的に適用するための貴重な洞察を提供する。 This paper presents the second-place methodology in the Volvo Discovery Challenge at ECML-PKDD 2024, where we used Long Short-Term Memory networks and pseudo-labeling to predict maintenance needs for a component of Volvo trucks. We processed the training data to mirror the test set structure and applied a base LSTM model to label the test data iteratively. This approach refined our model's predictive capabilities and culminated in a macro-average F1-score of 0.879, demonstrating robust performance in predictive maintenance. This work provides valuable insights for applying machine learning techniques effectively in industrial settings.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
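The abstract above reports a macro-average F1-score of 0.879. As a reminder of what that metric averages, here is a small sketch with made-up labels. The class names and predictions are hypothetical and unrelated to the challenge data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical risk labels for six time windows.
y_true = ["low", "low", "medium", "high", "high", "high"]
y_pred = ["low", "medium", "medium", "high", "high", "low"]
print(macro_f1(y_true, y_pred))
```

Because every class contributes equally to the mean, rare classes weigh as much as frequent ones, which matters when the classes are imbalanced.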
# 事前学習音声と画像ネットモデルを用いた受動ソナー分類のための伝達学習 Transfer Learning for Passive Sonar Classification using Pre-trained Audio and ImageNet Models ( http://arxiv.org/abs/2409.13878v1 ) ライセンス: Link先を確認	Amirmohammad Mohammadi, Tejashri Kelhe, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples,	(参考訳) 転送学習は、大規模で事前訓練されたモデルを活用し、下流タスクの微調整を行うために一般的に使用される。最も一般的な事前トレーニングモデルは当初、ImageNetを使ってトレーニングされている。しかし、それらの一般化能力は様々なデータモダリティにまたがる可能性がある。本研究では、水中音響目標認識(UATR)の文脈において、事前学習された音声ニューラルネットワーク(PANN)とImageNetの事前学習モデルを比較した。また, 受動的ソナー分類において, ImageNet事前学習モデルの方が若干優れていた。また,モデル事前学習と微調整におけるサンプリングレートの影響についても検討した。本研究は,UATR領域におけるラベル付きデータ不足による制約に対処するために,事前学習モデルの可能性を示す,UATRの伝達学習応用に寄与する。 Transfer learning is commonly employed to leverage large, pre-trained models and perform fine-tuning for downstream tasks. The most prevalent pre-trained models are initially trained using ImageNet. However, their ability to generalize can vary across different data modalities. This study compares pre-trained Audio Neural Networks (PANNs) and ImageNet pre-trained models within the context of underwater acoustic target recognition (UATR). It was observed that the ImageNet pre-trained models slightly out-perform pre-trained audio models in passive sonar classification. We also analyzed the impact of audio sampling rates for model pre-training and fine-tuning. This study contributes to transfer learning applications of UATR, illustrating the potential of pre-trained models to address limitations caused by scarce, labeled data in the UATR domain.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# I Never Said That:A data, taxonomy and baselines on response clarity classification "I Never Said That": A dataset, taxonomy and baselines on response clarity classification ( http://arxiv.org/abs/2409.13879v1 ) ライセンス: Link先を確認	Konstantinos Thomas, Giorgos Filandrianos, Maria Lymperaiou, Chrysoula Zerva, Giorgos Stamou,	(参考訳) 公言における平等と曖昧さは、特に政治科学や政治インタビューの分析において、よく研究された談話現象である。本研究では,政治インタビューから抽出した質問に対する回答の明確さに関する密接に関連する問題を,LLM(Large Language Models)の能力と人間の専門性を活かして解決することを目的としている。そこで本研究では,応答の明瞭さを検知・分類するタスクを編み出した新しい分類法と,政治的インタビューから抽出された質問応答(QA)ペアからなる対応する明瞭度分類データセットを導入する。提案する2段階分類法は,与えられた質問(ハイレベル)に提供された情報の観点から応答の明確さに対処し,不明瞭で不明瞭な応答(低レベル)に関連する回避手法のきめ細かい分類法を提供する。我々はChatGPTと人間のアノテーションを組み合わせて、政治インタビューから個別のQAペアを収集、検証、注釈し、新たに導入された応答明確化タスクに使用します。我々は、さまざまなモデルアーキテクチャ、サイズ、適応手法を用いて詳細な分析を行い、洞察を得、提案したデータセットとタスクに対して新たなベースラインを確立するために、いくつかの実験を行う。 Equivocation and ambiguity in public speech are well-studied discourse phenomena, especially in political science and analysis of political interviews. Inspired by the well-grounded theory on equivocation, we aim to resolve the closely related problem of response clarity in questions extracted from political interviews, leveraging the capabilities of Large Language Models (LLMs) and human expertise. To this end, we introduce a novel taxonomy that frames the task of detecting and classifying response clarity and a corresponding clarity classification dataset which consists of question-answer (QA) pairs drawn from political interviews and annotated accordingly. Our proposed two-level taxonomy addresses the clarity of a response in terms of the information provided for a given question (high-level) and also provides a fine-grained taxonomy of evasion techniques that relate to unclear, ambiguous responses (lower-level). We combine ChatGPT and human annotators to collect, validate and annotate discrete QA pairs from political interviews, to be used for our newly introduced response clarity task. 
We provide a detailed analysis and conduct several experiments with different model architectures, sizes and adaptation methods to gain insights and establish new baselines over the proposed dataset and task.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# ヒストグラム層時間遅延ニューラルネットワークを用いた時間周波数特性の検討 Investigation of Time-Frequency Feature Combinations with Histogram Layer Time Delay Neural Networks ( http://arxiv.org/abs/2409.13881v1 ) ライセンス: Link先を確認	Amirmohammad Mohammadi, Iren'e Masabarakiza, Ethan Barnes, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples,	(参考訳) 深層学習は手動による特徴抽出の頻度を減少させているが、特に水中音響信号のモデル性能向上には、特徴工学によるデータの変換が不可欠である。音声信号が時間周波数表現に変換され、これらのスペクトルのその後の処理が性能に大きな影響を及ぼす。この研究は、ヒストグラム層時間遅延ニューラルネットワークにおいて、異なる時間周波数特徴の組み合わせを使用することによる性能への影響を示す。最適な特徴セットは、特定の特徴の組み合わせが単一データ特徴より優れていることを示す結果と同一視される。 While deep learning has reduced the prevalence of manual feature extraction, transformation of data via feature engineering remains essential for improving model performance, particularly for underwater acoustic signals. The methods by which audio signals are converted into time-frequency representations and the subsequent handling of these spectrograms can significantly impact performance. This work demonstrates the performance impact of using different combinations of time-frequency features in a histogram layer time delay neural network. An optimal set of features is identified with results indicating that specific feature combinations outperform single data features.	翻訳日:2024-11-07 04:50:50 公開日:2024-09-20
# マルチLLMデバイアスフレームワーク A Multi-LLM Debiasing Framework ( http://arxiv.org/abs/2409.13884v1 ) ライセンス: Link先を確認	Deonna M. Owens, Ryan A. Rossi, Sungchul Kim, Tong Yu, Franck Dernoncourt, Xiang Chen, Ruiyi Zhang, Jiuxiang Gu, Hanieh Deilamsalehy, Nedim Lipka,	(参考訳) 大規模言語モデル(LLM)は、社会に多大な利益をもたらす可能性がある強力なツールであるが、社会的不平等を持続するバイアスを示す。データ強化、ゼロショットプロンプト、モデル微調整を用いたバイアス緩和技術が大幅に進歩したにもかかわらず、バイアスは継続的に持続し、微妙なバイアスは人間の検出を妨げる可能性がある。近年,LLMにおける推論や事実性の向上に有効なマルチLLMアプローチへの関心が高まっている。本稿では,LLMのバイアス低減を目的としたマルチLLM脱バイアスフレームワークを提案する。我々の研究は,LLMを疎結合化するための2つの異なるアプローチを初めて導入し,評価した。我々のマルチLLMフレームワークは,LLMのバイアスを著しく低減し,複数のソーシャルグループでベースライン法よりも優れていた。 Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely, yet, they have demonstrated biases that perpetuate societal inequalities. Despite significant advancements in bias mitigation techniques using data augmentation, zero-shot prompting, and model fine-tuning, biases continuously persist, including subtle biases that may elude human detection. Recent research has shown a growing interest in multi-LLM approaches, which have been demonstrated to be effective in improving the quality of reasoning and factuality in LLMs. Building on this approach, we propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs. Our work is the first to introduce and evaluate two distinct approaches within this framework for debiasing LLMs: a centralized method, where the conversation is facilitated by a single central LLM, and a decentralized method, where all models communicate directly. Our findings reveal that our multi-LLM framework significantly reduces bias in LLMs, outperforming the baseline method across several social groups.	翻訳日:2024-11-07 04:39:44 公開日:2024-09-20
# Learning to Play Video Games with Intuitive Physics Priors ( http://arxiv.org/abs/2409.13886v1 ) License: see link	Abhishek Jaiswal, Nisheeth Srivastava,	Video game playing is an extremely structured domain where algorithmic decision-making can be tested without adverse real-world consequences. While prevailing methods rely on image inputs to avoid the problem of hand-crafting state space representations, this approach systematically diverges from the way humans actually learn to play games. In this paper, we design object-based input representations that generalize well across a number of video games. Using these representations, we evaluate an agent's ability to learn games as an infant would, with limited world experience, employing simple inductive biases derived from intuitive representations of physics from the real world. Using such biases, we construct an object category representation to be used by a Q-learning algorithm and assess how well it learns to play multiple games based on observed object affordances. Our results suggest that a human-like object interaction setup capably learns to play several video games and demonstrates superior generalizability, particularly for unfamiliar objects. Further exploring such methods will allow machines to learn in a human-centric way, thus incorporating more human-like learning benefits.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning ( http://arxiv.org/abs/2409.13887v1 ) License: see link	Yixin Wang, Wei Peng, Yu Zhang, Ehsan Adeli, Qingyu Zhao, Kilian M. Pohl,	Many longitudinal neuroimaging studies aim to improve the understanding of brain aging and diseases by studying the dynamic interactions between brain function and cognition. Doing so requires accurate encoding of their multidimensional relationship while accounting for individual variability over time. For this purpose, we propose an unsupervised learning model, Contrastive Learning-based Graph Generalized Canonical Correlation Analysis (CoGraCa), that encodes their relationship via Graph Attention Networks and Generalized Canonical Correlation Analysis. To create brain-cognition fingerprints reflecting the unique neural and cognitive phenotype of each person, the model also relies on individualized and multimodal contrastive learning. We apply CoGraCa to a longitudinal dataset of healthy individuals consisting of resting-state functional MRI and cognitive measures acquired at multiple visits for each participant. The generated fingerprints effectively capture significant individual differences and outperform current single-modal and CCA-based multimodal models in identifying sex and age. More importantly, our encoding provides interpretable interactions between those two modalities.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System ( http://arxiv.org/abs/2409.13888v1 ) License: see link	Zhenyu Zhao, Yexi Jiang,	Features (a.k.a. context) are critical to the performance of contextual multi-armed bandits (MAB). In large-scale online systems, it is important to select and implement the important features for the model: missing important features can lead to sub-optimal reward outcomes, and including irrelevant features can cause overfitting, poor model interpretability, and higher implementation cost. However, feature selection methods for conventional machine learning models fall short for contextual MAB use cases, because conventional methods select features that are correlated with the outcome variable but do not necessarily cause heterogeneous treatment effects among arms, which are what truly matter for contextual MAB. In this paper, we introduce model-free feature selection methods designed for the contextual MAB problem, based on the heterogeneous causal effect contributed by a feature to the reward distribution. Empirical evaluation is conducted on synthetic data as well as real data from an online experiment for optimizing content cover images in a recommender system. The results show that this feature selection method effectively selects the important features, which lead to higher contextual MAB reward than unimportant features. Compared with model-embedded methods, this model-free method has the advantages of fast computation, ease of implementation, and freedom from model mis-specification issues.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Transfer Learning with Clinical Concept Embeddings from Large Language Models ( http://arxiv.org/abs/2409.13893v1 ) License: see link	Yuhe Gao, Runxue Bao, Yuelyu Ji, Yiming Sun, Chenxi Song, Jeffrey P. Ferraro, Ye Ye,	Knowledge sharing is crucial in healthcare, especially when leveraging data from multiple clinical sites to address data scarcity, reduce costs, and enable timely interventions. Transfer learning can facilitate cross-site knowledge transfer, but a major challenge is heterogeneity in clinical concepts across different sites. Large Language Models (LLMs) show significant potential for capturing the semantic meaning of clinical concepts and reducing heterogeneity. This study analyzed electronic health records from two large healthcare systems to assess the impact of semantic embeddings from LLMs on local, shared, and transfer learning models. Results indicate that domain-specific LLMs, such as Med-BERT, consistently perform better in local and direct transfer scenarios, while generic models like OpenAI embeddings require fine-tuning for optimal performance. However, excessive tuning of models with biomedical embeddings may reduce effectiveness, emphasizing the need for balance. This study highlights the importance of domain-specific embeddings and careful model tuning for effective knowledge transfer in healthcare.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models ( http://arxiv.org/abs/2409.13894v1 ) License: see link	Jayneel Vora, Aditya Krishnan, Nader Bouacida, Prabhu RV Shankar, Prasant Mohapatra,	Denoising diffusion models have emerged as state-of-the-art in generative tasks across image, audio, and video domains, producing high-quality, diverse, and contextually relevant data. However, their broader adoption is limited by high computational costs and large memory footprints. Post-training quantization (PTQ) offers a promising approach to mitigate these challenges by reducing model complexity through low-bandwidth parameters. Yet, direct application of PTQ to diffusion models can degrade synthesis quality due to accumulated quantization noise across multiple denoising steps, particularly in conditional tasks like text-to-audio synthesis. This work introduces PTQ4ADM, a novel framework for quantizing audio diffusion models (ADMs). Our key contributions include (1) a coverage-driven prompt augmentation method and (2) an activation-aware calibration set generation algorithm for text-conditional ADMs. These techniques ensure comprehensive coverage of audio aspects and modalities while preserving synthesis fidelity. We validate our approach on TANGO, Make-An-Audio, and AudioLDM models for text-conditional audio generation. Extensive experiments demonstrate PTQ4ADM's capability to reduce the model size by up to 70% while achieving synthesis quality metrics comparable to full-precision models (<5% increase in FD scores). We show that specific layers in the backbone network can be quantized to 4-bit weights and 8-bit activations without significant quality loss. This work paves the way for more efficient deployment of ADMs in resource-constrained environments.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# LLM for Everyone: Representing the Underrepresented in Large Language Models ( http://arxiv.org/abs/2409.13897v1 ) License: see link	Samuel Cahyawijaya,	Natural language processing (NLP) has witnessed a profound impact of large language models (LLMs) that excel in a multitude of tasks. However, the limitation of LLMs in multilingual settings, particularly in underrepresented languages, remains a significant hurdle. This thesis aims to bridge the gap in NLP research and development by focusing on underrepresented languages. A comprehensive evaluation of LLMs is conducted to assess their capabilities in these languages, revealing the challenges of multilingual and multicultural generalization. Addressing the multilingual generalization gap, this thesis proposes data-and-compute-efficient methods to mitigate the disparity in LLM ability in underrepresented languages, allowing better generalization on underrepresented languages without the loss of task generalization ability. The proposed solutions cover cross-lingual continual instruction tuning, retrieval-based cross-lingual in-context learning, and in-context query alignment. Furthermore, a novel method to measure cultural values alignment between LLMs operating in different languages is proposed, ensuring cultural sensitivity and inclusivity. These contributions aim to enhance the multilingual and multicultural alignment of LLMs in underrepresented languages, ultimately advancing the NLP field toward greater equality and inclusiveness.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology ( http://arxiv.org/abs/2409.13902v1 ) License: see link	Aidan Gilson, Xuguang Ai, Thilaka Arunachalam, Ziyou Chen, Ki Xiong Cheong, Amisha Dave, Cameron Duic, Mercy Kibe, Annette Kaminaka, Minali Prasad, Fares Siddig, Maxwell Singer, Wendy Wong, Qiao Jin, Tiarnan D. L. Keenan, Xia Hu, Emily Y. Chew, Zhiyong Lu, Hua Xu, Ron A. Adelman, Yih-Chung Tham, Qingyu Chen,	Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augment Generation (RAG) is popular to address this issue, few studies have implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieves relevant documents to augment LLMs during inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses of LLMs with and without RAG, including over 500 references, on 100 questions with 10 healthcare professionals. The evaluation focuses on factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total. Of these, 45.3% were hallucinated, 34.1% contained minor errors, and 20.6% were correct. In contrast, LLMs with RAG significantly improved accuracy (54.5% being correct) and reduced error rates (18.8% with minor hallucinations and 26.7% with errors). Of the top 10 documents retrieved by RAG, 62.5% were selected as the top references in the LLM response, with an average ranking of 4.9. The use of RAG also improved evidence attribution (increasing from 1.85 to 2.49 on a 5-point scale, P<0.001), albeit with slight decreases in accuracy (from 3.52 to 3.23, P=0.03) and completeness (from 3.47 to 3.27, P=0.17). The results demonstrate that LLMs frequently exhibited hallucinated and erroneous evidence in the responses, raising concerns for downstream applications in the medical domain. RAG substantially reduced the proportion of such evidence but encountered challenges.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data ( http://arxiv.org/abs/2409.13903v1 ) License: see link	Zhao Cheng, Diane Wan, Matthew Abueg, Sahra Ghalebikesabi, Ren Yi, Eugene Bagdasarian, Borja Balle, Stefan Mellem, Shawn O'Banion,	Advances in generative AI point towards a new era of personalized applications that perform diverse tasks on behalf of users. While general AI assistants have yet to fully emerge, their potential to share personal data raises significant privacy challenges. This paper introduces CI-Bench, a comprehensive synthetic benchmark for evaluating the ability of AI assistants to protect personal information during model inference. Leveraging the Contextual Integrity framework, our benchmark enables systematic assessment of information flow across important context dimensions, including roles, information types, and transmission principles. Unlike previous work with smaller, narrowly focused evaluations, we present a novel, scalable, multi-step data pipeline that synthetically generates natural communications, including dialogues and emails, which we use to generate 44 thousand test samples across eight domains. Additionally, we formulate and evaluate a naive AI assistant to demonstrate the need for further study and careful training towards personal assistant tasks. We envision CI-Bench as a valuable tool for guiding future language model development, deployment, system design, and dataset construction, ultimately contributing to the development of AI assistants that align with users' privacy expectations.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# High-dimensional learning of narrow neural networks ( http://arxiv.org/abs/2409.13904v1 ) License: see link	Hugo Cui,	Recent years have been marked by the fast-paced diversification and increasing ubiquity of machine learning applications. Yet, a firm theoretical understanding of the surprising efficiency of neural networks in learning from high-dimensional data still proves largely elusive. In this endeavour, analyses inspired by statistical physics have proven instrumental, enabling the tight asymptotic characterization of the learning of neural networks in high dimensions, for a broad class of solvable models. This manuscript reviews the tools and ideas underlying recent progress in this line of work. We introduce a generic model, the sequence multi-index model, which encompasses numerous previously studied models as special instances. This unified framework covers a broad class of machine learning architectures with a finite number of hidden units, including multi-layer perceptrons, autoencoders, and attention mechanisms; and tasks, including (un)supervised learning, denoising, and contrastive learning, in the limit of large data dimension and comparably large number of samples. We explicate in full detail the analysis of the learning of sequence multi-index models, using statistical physics techniques such as the replica method and approximate message-passing algorithms. This manuscript thus provides a unified presentation of analyses reported in several previous works, and a detailed overview of central techniques in the field of statistical physics of machine learning. This review should be a useful primer for machine learning theoreticians curious about statistical physics approaches; it should also be of value to statistical physicists interested in the transfer of such ideas to the study of neural networks.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Quantum Monte Carlo for Economics: Stress Testing and Macroeconomic Deep Learning ( http://arxiv.org/abs/2409.13909v1 ) License: see link	Vladimir Skavysh, Sofia Priazhkina, Diego Guala, Thomas R. Bromley,	Computational methods both open the frontiers of economic analysis and serve as a bottleneck in what can be achieved. We are the first to study whether the Quantum Monte Carlo (QMC) algorithm can improve the runtime of economic applications, and the challenges in doing so. We provide a detailed introduction to quantum computing and especially the QMC algorithm. Then, we illustrate how to formulate and encode into quantum circuits (a) a bank stress testing model with credit shocks and fire sales, (b) a neoclassical investment model solved with deep learning, and (c) a realistic macro model solved with deep neural networks. We discuss potential computational gains of QMC versus classical computing systems and present a few innovations in benchmarking QMC.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping ( http://arxiv.org/abs/2409.13912v1 ) License: see link	Jiale Wei, Junwei Zheng, Ruiping Liu, Jie Hu, Jiaming Zhang, Rainer Stiefelhagen,	In the field of autonomous driving, Bird's-Eye-View (BEV) perception has attracted increasing attention in the community since it provides more comprehensive information compared with pinhole front-view images and panoramas. Traditional BEV methods, which rely on multiple narrow-field cameras and complex pose estimations, often face calibration and synchronization issues. To overcome these challenges, we introduce OneBEV, a novel BEV semantic mapping approach that uses only a single panoramic image as input, simplifying the mapping process and reducing computational complexity. A distortion-aware module termed Mamba View Transformation (MVT) is specifically designed to handle the spatial distortions in panoramas, transforming front-view features into BEV features without leveraging traditional attention mechanisms. Apart from the efficient framework, we contribute two datasets, nuScenes-360 and DeepAccident-360, tailored for the OneBEV task. Experimental results show that OneBEV achieves state-of-the-art performance with 51.1% and 36.1% mIoU on nuScenes-360 and DeepAccident-360, respectively. This work advances BEV semantic mapping in autonomous driving, paving the way for more advanced and reliable autonomous systems.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Target word activity detector: An approach to obtain ASR word boundaries without lexicon ( http://arxiv.org/abs/2409.13913v1 ) License: see link	Sunit Sivasankaran, Eric Sun, Jinyu Li, Yan Huang, Jing Pan,	Obtaining word timestamp information from end-to-end (E2E) ASR models remains challenging due to the lack of explicit time alignment during training. This issue is further complicated in multilingual models. Existing methods either rely on lexicons or introduce additional tokens, leading to scalability issues and increased computational costs. In this work, we propose a new approach to estimate word boundaries without relying on lexicons. Our method leverages word embeddings from sub-word token units and a pretrained ASR model, requiring only word alignment information during training. Our proposed method can scale up to any number of languages without incurring any additional cost. We validate our approach using a multilingual ASR model trained on five languages and demonstrate its effectiveness against a strong baseline.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Data Pruning via Separability, Integrity, and Model Uncertainty-Aware Importance Sampling ( http://arxiv.org/abs/2409.13915v1 ) License: see link	Steven Grosz, Rui Zhao, Rajeev Ranjan, Hongcheng Wang, Manoj Aggarwal, Gerard Medioni, Anil Jain,	This paper improves upon existing data pruning methods for image classification by introducing a novel pruning metric and pruning procedure based on importance sampling. The proposed pruning metric explicitly accounts for data separability, data integrity, and model uncertainty, while the sampling procedure is adaptive to the pruning ratio and considers both intra-class and inter-class separation to further enhance the effectiveness of pruning. Furthermore, the sampling method can readily be applied to other pruning metrics to improve their performance. Overall, the proposed approach scales well to high pruning ratios and generalizes better across different classification models, as demonstrated by experiments on four benchmark datasets, including the fine-grained classification scenario.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# Measuring Error Alignment for Decision-Making Systems ( http://arxiv.org/abs/2409.13919v1 ) License: see link	Binxia Xu, Antonis Bikakis, Daniel Onah, Andreas Vlachidis, Luke Dickens,	Given that AI systems are set to play a pivotal role in future decision-making processes, their trustworthiness and reliability are of critical concern. Due to their scale and complexity, modern AI systems resist direct interpretation, and alternative ways are needed to establish trust in those systems and determine how well they align with human values. We argue that good measures of the information processing similarities between AI and humans may be able to achieve these same ends. While Representational Alignment (RA) approaches measure similarity between the internal states of two systems, the associated data can be expensive and difficult to collect for human systems. In contrast, Behavioural Alignment (BA) comparisons are cheaper and easier, but questions remain as to their sensitivity and reliability. We propose two new behavioural alignment metrics: misclassification agreement, which measures the similarity between the errors of two systems on the same instances, and class-level error similarity, which measures the similarity between the error distributions of two systems. We show that our metrics correlate well with RA metrics and provide complementary information to another BA metric within a range of domains, and set the scene for a new approach to value alignment.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks ( http://arxiv.org/abs/2409.13920v1 ) License: see link	Sebastian Nehrdich, Oliver Hellwig, Kurt Keutzer,	Morphologically rich languages are notoriously challenging to process for downstream NLP applications. This paper presents a new pretrained language model, ByT5-Sanskrit, designed for NLP applications involving the morphologically rich language Sanskrit. We evaluate ByT5-Sanskrit on established Sanskrit word segmentation tasks, where it outperforms previous data-driven approaches by a considerable margin and matches the performance of the current best lexicon-based model. It is easier to deploy and more robust to data not covered by external linguistic resources. It also achieves new state-of-the-art results in Vedic Sanskrit dependency parsing and OCR post-correction tasks. Additionally, based on the Digital Corpus of Sanskrit, we introduce a novel multitask dataset for the joint training of Sanskrit word segmentation, lemmatization, and morphosyntactic tagging tasks. We fine-tune ByT5-Sanskrit on this dataset, creating a versatile multitask model for various downstream Sanskrit applications. We have used this model in Sanskrit linguistic annotation projects, in information retrieval setups, and as a preprocessing step in a Sanskrit machine translation pipeline. We also show that our approach yields new best scores for lemmatization and dependency parsing of other morphologically rich languages. We thus demonstrate that byte-level pretrained language models can achieve excellent performance for morphologically rich languages, outperforming tokenizer-based models and presenting an important vector of exploration when constructing NLP pipelines for such languages.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# A coherent approach to quantum-classical optimization ( http://arxiv.org/abs/2409.13924v1 ) License: see link	Andrés N. Cáliz, Jordi Riu, Josep Bosch, Pau Torrente, Jose Miralles, Arnau Riera,	Hybrid quantum-classical optimization techniques, which incorporate the pre-optimization of Variational Quantum Algorithms (VQAs) using Tensor Networks (TNs), have been shown to allow for the reduction of quantum computational resources. In the particular case of large optimization problems, commonly found in real-world use cases, this strategy is almost mandatory to reduce the otherwise unfathomable execution costs and improve the quality of the results. We identify the coherence entropy as a crucial metric in determining the suitability of quantum states as effective initialization candidates. Our findings are validated through extensive numerical tests for the Quantum Approximate Optimization Algorithm (QAOA), in which we find that the optimal initialization states are pure Gibbs states. Further, these results are explained with the inclusion of a simple yet novel notion of expressivity adapted to classical optimization problems. Based on this finding, we propose a quantum-classical optimization protocol that significantly improves on previous approaches for such tasks, with specific focus on its effectiveness.	Translated: 2024-11-07 04:39:44 Published: 2024-09-20
# SpaceBlender:3Dシーンのブレンディングでコンテキストリッチなコラボレーションスペースを作る SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending ( http://arxiv.org/abs/2409.13926v1 ) ライセンス: Link先を確認	Nels Numan, Shwetha Rajaram, Balasaravanan Thoravi Kumaravel, Nicolai Marquardt, Andrew D. Wilson,	(参考訳) 生成AIを使用して仮想現実(VR)アプリケーションのための3D空間を作成することへの関心が高まっている。しかし、今日のモデルでは、ユーザの物理的なコンテキストを取り入れることの恩恵を受ける共同作業のサポートが不足している、人工環境が生み出されている。 VRテレプレゼンスをサポートする環境を生成するために,生成AI技術を利用してユーザの物理的環境を統合された仮想空間にブレンドする,新しいパイプラインであるSpaceBlenderを導入する。このパイプラインは、ユーザが提供する2D画像を、深さ推定、メッシュアライメント、幾何学的な先行と適応的なテキストプロンプトによってガイドされた拡散ベースの空間補完からなる反復的なプロセスを通じて、コンテキストに富んだ3D環境に変換する。 20人の参加者がペアで協調的なVR親和性ダイアグラムタスクを行った予備研究において、SpaceBlenderを汎用的な仮想環境と最先端のシーン生成フレームワークと比較し、コラボレーションに適した仮想空間を作成する能力を評価した。参加者はSpaceBlenderが提供する親しみやすさとコンテキストを高く評価しただけでなく、タスクの焦点から逸脱する可能性のある生成環境における複雑さにも言及した。参加者からのフィードバックに基づいて,パイプラインの改善と混合空間の価値と設計について議論する。 There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative AI techniques to blend users' physical surroundings into unified virtual spaces. This pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. In a preliminary within-subjects study, where 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic virtual environment and a state-of-the-art scene generation framework, evaluating its ability to create virtual spaces suitable for collaboration. 
Participants appreciated the enhanced familiarity and context provided by SpaceBlender but also noted complexities in the generative environments that could detract from task focus. Drawing on participant feedback, we propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# コード生成における補助関数を利用した命令学習型言語モデルの活用 Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation ( http://arxiv.org/abs/2409.13928v1 ) ライセンス: Link先を確認	Seonghyeon Lee, Suyeon Kim, Joonwon Jang, Heejae Chon, Dongha Lee, Hwanjo Yu,	(参考訳) 本稿では,コード事前学習言語モデル上に構築された命令学習モデルのコード生成挙動について検討する。本稿では,クエリに追加したり,命令追従機能を備えた補助関数を組み込むための応答プレフィックスを提供することによって,モデルに補助関数を提供する方法をいくつか設計する。実験の結果,基本モデルの補助的機能利用能力と命令追従能力の併用の有効性が示された。特に、オープンソースの言語モデルで我々のアプローチを採用するパフォーマンスは、最近の強力なプロプライエタリな言語モデル、すなわちgpt-4oよりも優れています。 We study the code generation behavior of instruction-tuned models built on top of code pre-trained language models when they could access an auxiliary function to implement a function. We design several ways to provide auxiliary functions to the models by adding them to the query or providing a response prefix to incorporate the ability to utilize auxiliary functions with the instruction-following capability. Our experimental results show the effectiveness of combining the base models' auxiliary function utilization ability with the instruction following ability. In particular, the performance of adopting our approaches with the open-sourced language models surpasses that of the recent powerful proprietary language models, i.e., gpt-4o.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# マルチモーダルAIシステムにおける視点決定の失敗 Failures in Perspective-taking of Multimodal AI Systems ( http://arxiv.org/abs/2409.13929v1 ) ライセンス: Link先を確認	Bridget Leonard, Kristin Woodard, Scott O. Murray,	(参考訳) 本研究は,マルチモーダルAIシステムにおける空間表現に関するこれまでの研究を拡張した。現在のモデルでは、画像からの空間情報の豊富な理解が示されていますが、この情報は、人間や動物の空間認知において使用されるアナログ表現とは異なる、命題表現に根ざしています。これらの限界をさらに探求するため,GPT-4oの視点決定能力を評価するために,認知・発達科学の手法を適用した。我々の分析は、人間の脳の認知発達とマルチモーダルAIの比較を可能にし、将来の研究とモデル開発のためのガイダンスを提供する。 This study extends previous research on spatial representations in multimodal AI systems. Although current models demonstrate a rich understanding of spatial information from images, this information is rooted in propositional representations, which differ from the analog representations employed in human and animal spatial cognition. To further explore these limitations, we apply techniques from cognitive and developmental science to assess the perspective-taking abilities of GPT-4o. Our analysis enables a comparison between the cognitive development of the human brain and that of multimodal AI, offering guidance for future research and model development.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# RN-SDEs:Residual Null-Space Diffusion Stochastic Differential Equations を用いた限定角度CT再構成 RN-SDEs: Limited-Angle CT Reconstruction with Residual Null-Space Diffusion Stochastic Differential Equations ( http://arxiv.org/abs/2409.13930v1 ) ライセンス: Link先を確認	Jiaqi Guo, Santiago Lopez-Tapia, Wing Shun Li, Yunnan Wu, Marcelo Carignano, Vadim Backman, Vinayak P. Dravid, Aggelos K. Katsaggelos,	(参考訳) CTは医用画像から材料分析まで幅広く用いられている画像モダリティである。 1つの大きな課題は、特定の角度でのスキャン情報の欠如から生じ、アーチファクトで歪んだCT画像に繋がる。この結果,リミテッドアングルCT (Limited Angle Computed Tomography, LACT) 再構成問題と呼ばれる問題が発生する。この問題を解決するために,平均回帰確率微分方程式を用いて拡散過程を特徴づける拡散モデルの変種であるResidual Null-Space Diffusion Stochastic Differential Equations (RN-SDEs)を提案する。 RN-SDEの一般化可能性を示すために,ChromSTEMとC4KC-KiTSの2つの異なるLACTデータセットを用いて実験を行った。実験により,学習した平均回帰SDEを先行値として活用し,RNSD(Range-Null Space Decomposition)に基づくデータ一貫性を強調することにより,RN-SDEは高画質の画像の大幅な劣化から復元し,ほとんどのLACTタスクで最先端のパフォーマンスを実現することができることを示す。さらに,計算複雑性と実行効率を定量的に比較し,提案手法の有効性を強調した。 Computed tomography is a widely used imaging modality with applications ranging from medical imaging to material analysis. One major challenge arises from the lack of scanning information at certain angles, leading to distorted CT images with artifacts. This results in an ill-posed problem known as the Limited Angle Computed Tomography (LACT) reconstruction problem. To address this problem, we propose Residual Null-Space Diffusion Stochastic Differential Equations (RN-SDEs), which are a variant of diffusion models that characterize the diffusion process with mean-reverting (MR) stochastic differential equations. To demonstrate the generalizability of RN-SDEs, our experiments are conducted on two different LACT datasets, i.e., ChromSTEM and C4KC-KiTS. 
Through extensive experiments, we show that by leveraging learned Mean-Reverting SDEs as a prior and emphasizing data consistency using Range-Null Space Decomposition (RNSD) based rectification, RN-SDEs can restore high-quality images from severe degradation and achieve state-of-the-art performance in most LACT tasks. Additionally, we present a quantitative comparison of computational complexity and runtime efficiency, highlighting the superior effectiveness of our proposed approach.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# 大規模合成沈殿・浸水データを用いた生成機械学習による高分解能フラッド確率マッピング High-Resolution Flood Probability Mapping Using Generative Machine Learning with Large-Scale Synthetic Precipitation and Inundation Data ( http://arxiv.org/abs/2409.13936v1 ) ライセンス: Link先を確認	Lipai Huang, Federico Antolini, Ali Mostafavi, Russell Blessing, Matthew Garcia, Samuel D. Brody,	(参考訳) 高解像度の洪水確率マップは、既存の洪水リスク評価手法の限界に対処するために不可欠であるが、歴史的イベントデータの提供によってしばしば制限される。また,物理モデルを用いた確率的洪水図作成に必要なシミュレーションデータの生成は,その実現可能性を抑制するための計算と時間的労力が伴う。このギャップに対処するために,生成機械学習を利用して大規模人工浸水データをシミュレートし,確率的洪水図を作成する新しい手法であるFlood-Precip GAN(Flood-Precipitation Generative Adversarial Network)を紹介した。テキサス州ハリス郡に焦点をあてて、Flood-Precip GANは、限られた数の物理ベースのモデル生成降水フロードイベントを使用して、細胞深度推定器を訓練することから始まる。このモデルは降水量に基づく特徴を強調し、普遍的なモデルよりも優れています。その後、制約のあるGAN(Generative Adversarial Network)を用いて、合成沈殿記録を条件付きで生成する。これらの記録をフィルタリングし、真の降水パターンとの密接な整合性を確保するため、戦略的しきい値が確立されている。各細胞について、合成事象はK-アネレスト近傍アルゴリズムを用いて滑らかに処理され、深さ推定器を通して合成深度分布を導出する。この手順を反復し,1万回の人工降水-降水イベントを発生させた後,異なる浸水深さを考慮し,様々な形態で洪水確率マップを構築した。類似度および相関指標による検証は、真のデータに対する合成深度分布の忠実性を確認する。 Flood-Precip GANは、高解像度の洪水確率マップを作成するのに必要な合成洪水深度データを生成するスケーラブルなソリューションを提供する。 High-resolution flood probability maps are essential for addressing the limitations of existing flood risk assessment approaches but are often limited by the availability of historical event data. Also, producing simulated data needed for creating probabilistic flood maps using physics-based models involves significant computation and time effort inhibiting the feasibility. To address this gap, this study introduces Flood-Precip GAN (Flood-Precipitation Generative Adversarial Network), a novel methodology that leverages generative machine learning to simulate large-scale synthetic inundation data to produce probabilistic flood maps. With a focus on Harris County, Texas, Flood-Precip GAN begins with training a cell-wise depth estimator using a limited number of physics-based model-generated precipitation-flood events. 
This model, which emphasizes precipitation-based features, outperforms universal models. Subsequently, a Generative Adversarial Network (GAN) with constraints is employed to conditionally generate synthetic precipitation records. Strategic thresholds are established to filter these records, ensuring close alignment with true precipitation patterns. For each cell, synthetic events are smoothed using a K-nearest neighbors algorithm and processed through the depth estimator to derive synthetic depth distributions. By iterating this procedure and after generating 10,000 synthetic precipitation-flood events, we construct flood probability maps in various formats, considering different inundation depths. Validation through similarity and correlation metrics confirms the fidelity of the synthetic depth distributions relative to true data. Flood-Precip GAN provides a scalable solution for generating synthetic flood depth data needed to create high-resolution flood probability maps, significantly enhancing flood preparedness and mitigation efforts.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# クラウド支援型組込みIoTシステムのための軽量でレジリエントなシグナチャ Lightweight and Resilient Signatures for Cloud-Assisted Embedded IoT Systems ( http://arxiv.org/abs/2409.13937v1 ) ライセンス: Link先を確認	Saif E. Nouma, Attila A. Yavuz,	(参考訳) デジタル署名は、非監査によるスケーラブルな認証を提供し、IoT(Internet of Things)にとって重要なツールである。多くのIoTアプリケーションは、クラウドコンピューティングでよく使われる膨大なリソース制限されたデバイスを保有している。しかし、重要な妥協(物理、マルウェアなど)は、攻撃ベクタの増加とオープンな運用環境のためにIoTに重大な脅威をもたらす。フォワードセキュリティと分散キー管理は、そのような脅威を緩和するための重大な侵害耐性対策である。しかし、フォワードセキュアなシグネチャは、ローエンドのIoTには極端にコストがかかります。本研究では,ハードウェア・アシスト付き軽量・レジリエント・シグナチャ (LRSHA) と,その前方安全版 (FLRSHA) という2つの新しいデジタルシグナチャを作成する。小さなキーと署名サイズで、ほぼ最適に署名できる。我々は、コストのかかる署名操作をなくすためにコミットメント分離や、ハードウェア支援の分散サーバなど、さまざまな設計戦略を相乗化して、耐障害性検証を実現している。本手法は, セキュリティ上の強い仮定(非クラスタ化, 中央サーバ)や検証器の重荷(極大ストレージ, 計算)に悩まされることなく, より高速なフォワードセキュア署名とコンパクトなキー/署名サイズを実現する。我々は,我々のスキームのセキュリティを正式に証明し,コモディティハードウェアと8ビットAVRマイクロコントローラの両方で本格的なオープンソース実装による性能評価を行う。 Digital signatures provide scalable authentication with non-repudiation and are vital tools for the Internet of Things (IoT). Many IoT applications harbor vast quantities of resource-limited devices often used with cloud computing. However, key compromises (e.g., physical, malware) pose a significant threat to IoTs due to increased attack vectors and open operational environments. Forward security and distributed key management are critical breach-resilient countermeasures to mitigate such threats. Yet forward-secure signatures are exorbitantly costly for low-end IoTs, while cloud-assisted approaches suffer from centrality or non-colluding semi-honest servers. In this work, we create two novel digital signatures called Lightweight and Resilient Signatures with Hardware Assistance (LRSHA) and its Forward-secure version (FLRSHA). They offer a near-optimally efficient signing with small keys and signature sizes. We synergize various design strategies, such as commitment separation to eliminate costly signing operations and hardware-assisted distributed servers to enable breach-resilient verification. 
Our schemes achieve orders-of-magnitude faster forward-secure signing and compact key/signature sizes without relying on strong security assumptions (non-colluding or central servers) or placing a heavy burden on the verifier (extreme storage or computation). We formally prove the security of our schemes and validate their performance with full-fledged open-source implementations on both commodity hardware and 8-bit AVR microcontrollers.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# 空間類似性を用いた簡易な教師なし知識蒸留 Simple Unsupervised Knowledge Distillation With Space Similarity ( http://arxiv.org/abs/2409.13939v1 ) ライセンス: Link先を確認	Aditya Singh, Haohan Wang,	(参考訳) 最近の研究によると、自己教師付き学習(SSL)はより小さなアーキテクチャに容易には拡張できない。ラベルなしで小さなネットワークをトレーニングしながら、この欠点を緩和する一つの方法は、教師なし知識蒸留(UKD)を採用することである。既存のUKD手法は、教師と生徒の間で保存すべきサンプル間/サンプル内の関係を人手で設計している。しかし、これは教師のマッピングに存在する他の重要な関係を見落とす可能性がある。本稿では,保存すべきサンプル間の関係をヒューリスティックに構築する代わりに,生徒が教師の埋め込み多様体をモデル化するよう直接促す。写像された多様体が類似しているなら、すべてのサンプル間/サンプル内の関係は間接的に保存される。まず,従来の手法は$L_2$正規化された埋め込み特徴のみに依存するため,教師の潜在多様体を保存できないことを示す。続いて,正規化により失われた情報を捉えるための簡易な目的関数を提案する。提案する損失成分である「space similarity」は,生徒の特徴空間の各次元が,教師の対応する次元に類似するよう促す。様々なベンチマークで提案手法の強い性能を示す広範な実験を行った。 Recent studies indicate that self-supervised learning (SSL) does not readily extend to smaller architectures. One direction to mitigate this shortcoming while simultaneously training a smaller network without labels is to adopt unsupervised knowledge distillation (UKD). Existing UKD approaches handcraft preservation-worthy inter/intra-sample relationships between the teacher and its student. However, this may overlook other key relationships present in the mapping of a teacher. In this paper, instead of heuristically constructing preservation-worthy relationships between samples, we directly motivate the student to model the teacher's embedding manifold. If the mapped manifold is similar, all inter/intra-sample relationships are indirectly conserved. We first demonstrate that prior methods cannot preserve the teacher's latent manifold due to their sole reliance on $L_2$ normalised embedding features. Subsequently, we propose a simple objective to capture the information lost due to normalisation. Our proposed loss component, termed space similarity, motivates each dimension of a student's feature space to be similar to the corresponding dimension of its teacher. We perform extensive experiments demonstrating strong performance of our proposed approach on various benchmarks.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
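The following is a minimal sketch of the idea stated in the abstract above, not the authors' implementation. It assumes the usual per-sample cosine similarity as the normalised term and reads "space similarity" as a cosine similarity taken per feature dimension across the batch. The function names, the `weight` parameter, and the exact form of the loss are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def distillation_loss(student, teacher, weight=1.0):
    """Hypothetical loss: per-sample term plus a per-dimension ("space") term.

    student, teacher: batch x dim embeddings given as lists of lists.
    """
    # Per-sample similarity: compares each row, i.e. normalised embeddings.
    feature_sim = sum(cosine(s, t) for s, t in zip(student, teacher)) / len(student)
    # Per-dimension similarity: compares each column across the batch.
    student_dims = list(zip(*student))
    teacher_dims = list(zip(*teacher))
    space_sim = sum(
        cosine(s, t) for s, t in zip(student_dims, teacher_dims)
    ) / len(student_dims)
    return (1.0 - feature_sim) + weight * (1.0 - space_sim)


# Row 0 of the student is the teacher's row scaled by 2, so the per-sample
# term cannot tell them apart, while the per-dimension term can.
teacher = [[1.0, 2.0], [3.0, 1.0]]
student = [[2.0, 4.0], [3.0, 1.0]]
loss_without_space = distillation_loss(student, teacher, weight=0.0)
loss_with_space = distillation_loss(student, teacher, weight=1.0)
print(loss_without_space, loss_with_space)
```

In this toy case the per-sample term alone reports a perfect match, which illustrates the abstract's point that normalised embeddings discard scale information that a per-dimension term can still see.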
# ペアワイズ特徴比較によるリコースコストの学習 Learning Recourse Costs from Pairwise Feature Comparisons ( http://arxiv.org/abs/2409.13940v1 ) ライセンス: Link先を確認	Kaivalya Rawal, Himabindu Lakkaraju,	(参考訳) 本稿では,ユーザの選好を学習・推論する際にユーザ入力を組み込む新しい手法を提案する。ブラックボックス機械学習モデルのユーザに実行可能なリコースを提供する際には,個々の特徴の変更しやすさに関するユーザ個人の選好を取り入れたいことが多い。こうしたリコース探索アルゴリズムは通常,各特徴を修正コストに関連付ける網羅的なタプル集合を必要とする。人間に直接調査してこのようなコストを得ることは難しいため,本稿では,網羅的でない人間による比較調査から特徴ごとのコストを自動推定するためにBradley-Terryモデルを用いることを提案する。ユーザには,候補となる特徴変更をすべて含むリコース全体どうしを比較し,コストを明示的に定量化することなく,どのリコースが他より実施しやすいかを判断する入力のみを求める。 MAP推定を用いて個々の特徴コストを効率よく学習できることを実証し,すべての特徴ペアの比較データを必ずしも含まないこうした非網羅的な調査でも,各特徴に修正コストが対応づけられた網羅的な特徴コスト集合を学習するのに十分であることを示す。 This paper presents a novel technique for incorporating user input when learning and inferring user preferences. When trying to provide users of black-box machine learning models with actionable recourse, we often wish to incorporate their personal preferences about the ease of modifying each individual feature. These recourse finding algorithms usually require an exhaustive set of tuples associating each feature to its cost of modification. Since it is hard to obtain such costs by directly surveying humans, in this paper, we propose the use of the Bradley-Terry model to automatically infer feature-wise costs using non-exhaustive human comparison surveys. We propose that users only provide inputs comparing entire recourses, with all candidate feature modifications, determining which recourses are easier to implement relative to others, without explicit quantification of their costs. We demonstrate the efficient learning of individual feature costs using MAP estimates, and show that these non-exhaustive human surveys, which do not necessarily contain data for each feature pair comparison, are sufficient to learn an exhaustive set of feature costs, where each feature is associated with a modification cost.	翻訳日:2024-11-07 04:28:44 publication:2024-09-20
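The inference step described in the abstract above can be sketched as follows. This is a hedged illustration, not the paper's formulation: it assumes a recourse is a 0/1 mask over features, that its cost is the sum of the costs of the modified features, and that the Bradley-Terry probability of "A is easier than B" is a logistic function of cost(B) minus cost(A). The Gaussian prior, the plain gradient-ascent optimiser, and all names are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_feature_costs(pairs, n_features, prior_var=10.0, lr=0.5, n_steps=500):
    """MAP estimate of per-feature costs from pairwise recourse comparisons.

    pairs: list of (mask_a, mask_b, a_easier), where each mask is a 0/1 list
    marking the features a recourse modifies and a_easier is 1.0 if the user
    judged recourse A easier to implement than recourse B, else 0.0.
    """
    costs = [0.0] * n_features
    for _ in range(n_steps):
        # Gradient of the zero-mean Gaussian log-prior on the costs.
        grad = [-c / prior_var for c in costs]
        for mask_a, mask_b, a_easier in pairs:
            diff = [b - a for a, b in zip(mask_a, mask_b)]
            # Bradley-Terry: P(A easier than B) = sigmoid(cost(B) - cost(A)).
            p = sigmoid(sum(d * c for d, c in zip(diff, costs)))
            for j, d in enumerate(diff):
                grad[j] += (a_easier - p) * d
        costs = [c + lr * g / len(pairs) for c, g in zip(costs, grad)]
    return costs


# Synthetic survey: the "true" costs are made up purely for this demonstration.
random.seed(0)
true_costs = [0.5, 2.0, 4.0]
pairs = []
for _ in range(400):
    mask_a = [random.randint(0, 1) for _ in true_costs]
    mask_b = [random.randint(0, 1) for _ in true_costs]
    gap = sum((b - a) * c for a, b, c in zip(mask_a, mask_b, true_costs))
    a_easier = 1.0 if random.random() < sigmoid(gap) else 0.0
    pairs.append((mask_a, mask_b, a_easier))

estimated = fit_feature_costs(pairs, n_features=len(true_costs))
print([round(c, 2) for c in estimated])
```

Note that no respondent ever states a numeric cost here; only whole-recourse comparisons are used, which is the property the abstract emphasises.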
# TalkMosaic:マルチモーダルLLMQ&Aインタラクションによる対話型フォトモザイク TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions ( http://arxiv.org/abs/2409.13941v1 ) ライセンス: Link先を確認	Kevin Li, Fulu Li,	(参考訳) 本研究では, 環境保護のテーマとして, 鳥やライオンなどの動物のイメージを構成するために, 幅広い種類の車両の画像を用いて, 合成画像中の車に関する情報を最大化し, 環境問題に対する意識を高める。本稿では,写真モザイク画像中のタイル画像とそれに対応する原車画像とのインタラクティブな切り替えをデスクトップ上に自動的に保存する「クリック・アンド・ディスプレイ」という簡単な操作を用いて,芸術的に構成されたフォトモザイク画像とのインタラクションを示す。カーイメージ情報と関連する知識をChatGPTに組み込むことで,TalkMosaicというマルチモーダルカスタムGPTを構築する。元のカーイメージをTalkMosaicにアップロードすることで、与えられたカーイメージについて質問し、高い環境基準を満たす車イメージのタイヤの購入場所など、効率よく、かつ効果的に回答を得ることができる。スパースアテンションと量子化技術を用いてマルチモーダル LLM の推論を高速化する方法を,提案した確率的 FlashAttention (PrFlashAttention) 法とStaircase Adaptive Quantization (SAQ) 法を用いて詳細に解析する。実装されたプロトタイプは,提案手法の有効性と有効性を示す。 We use images of cars of a wide range of varieties to compose an image of an animal such as a bird or a lion for the theme of environmental protection to maximize the information about cars in a single composed image and to raise the awareness about environmental challenges. We present a novel way of image interaction with an artistically-composed photomosaic image, in which a simple operation of "click and display" is used to demonstrate the interactive switch between a tile image in a photomosaic image and the corresponding original car image, which will be automatically saved on the Desktop. We build a multimodal custom GPT named TalkMosaic by incorporating car images information and the related knowledge to ChatGPT. By uploading the original car image to TalkMosaic, we can ask questions about the given car image and get the corresponding answers efficiently and effectively such as where to buy the tire in the car image that satisfies high environmental standards. We give an in-depth analysis on how to speed up the inference of multimodal LLM using sparse attention and quantization techniques with presented probabilistic FlashAttention (PrFlashAttention) and Staircase Adaptive Quantization (SAQ) methods. 
The implemented prototype demonstrates the feasibility and effectiveness of the presented approach.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# 純拡散: 生成拡散モデルにおけるバックドアによるバックドア対策 PureDiffusion: Using Backdoor to Counter Backdoor in Generative Diffusion Models ( http://arxiv.org/abs/2409.13945v1 ) ライセンス: Link先を確認	Vu Tuan Truong, Long Bao Le,	(参考訳) 拡散モデル(DM)は、幅広い生成タスクにおいて最先端の能力を達成した高度なディープラーニングモデルである。しかし、最近の研究では、バックドア攻撃に関する脆弱性が示されており、モデル入力がバックドアトリガーを含む場合、バックドアDMは、バックドアターゲットと呼ばれる指定結果(例えば有害画像)を連続的に生成する。 DMを攻撃するために様々なバックドア技術が研究されているが、これらの脅威に対する防御方法はまだ限られており、特にバックドアトリガーの反転には不十分である。本稿では,DMに埋め込まれたバックドアトリガを反転させることで,バックドア攻撃を効果的に検出できる新しいバックドア防御フレームワークであるPureDiffusionを紹介する。各種トリガ・ターゲット対に関する広範な実験により、PureDiffusionは、忠実度(逆トリガが元のトリガにどの程度似ているか)とバックドア成功率(逆トリガが対応するバックドア目標に導かれる率)において、既存の防御方法よりも優れた性能を示した。特に、特定のケースでは、PureDiffusionによって反転されたバックドアトリガは、元のトリガよりも高い攻撃成功率を達成する。 Diffusion models (DMs) are advanced deep learning models that achieved state-of-the-art capability on a wide range of generative tasks. However, recent studies have shown their vulnerability regarding backdoor attacks, in which backdoored DMs consistently generate a designated result (e.g., a harmful image) called backdoor target when the models' input contains a backdoor trigger. Although various backdoor techniques have been investigated to attack DMs, defense methods against these threats are still limited and underexplored, especially in inverting the backdoor trigger. In this paper, we introduce PureDiffusion, a novel backdoor defense framework that can efficiently detect backdoor attacks by inverting backdoor triggers embedded in DMs. Our extensive experiments on various trigger-target pairs show that PureDiffusion outperforms existing defense methods with a large gap in terms of fidelity (i.e., how much the inverted trigger resembles the original trigger) and backdoor success rate (i.e., the rate that the inverted trigger leads to the corresponding backdoor target). Notably, in certain cases, backdoor triggers inverted by PureDiffusion even achieve higher attack success rate than the original triggers.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# PyGRF:Python地理ランダムフォレストモデルの改良と公衆衛生・自然災害の事例研究 PyGRF: An improved Python Geographical Random Forest model and case studies in public health and natural disasters ( http://arxiv.org/abs/2409.13947v1 ) ライセンス: Link先を確認	Kai Sun, Ryan Zhenqi Zhou, Jiyeon Kim, Yingjie Hu,	(参考訳) 地理ランダムフォレスト(GRF)は、最近開発された、空間的に明示的な機械学習モデルである。より正確な予測と局所的な解釈を提供する能力により、GRFはすでに多くの研究で使われている。しかし、現在のGRFモデルでは、局所モデルウェイトと帯域幅ハイパーパラメータの決定に制限があり、ローカルトレーニングサンプルの数が不足している可能性があり、時には高い局所予測誤差がある。また、Rパッケージとして実装されているGRFは、現在Pythonバージョンを持っていない。この研究は、理論インフォームドなハイパーパラメータ決定、局所的なトレーニングサンプル展開、空間的に重み付けされた局所予測を導入することで、これらの制限に対処する。また,Python ベースの GRF モデルとパッケージ PyGRF を開発した。 PyGRFの性能をサンプルデータセットで評価し、公衆衛生と自然災害の2つのケーススタディでさらにその利用を実証した。 Geographical random forest (GRF) is a recently developed and spatially explicit machine learning model. With the ability to provide more accurate predictions and local interpretations, GRF has already been used in many studies. The current GRF model, however, has limitations in its determination of the local model weight and bandwidth hyperparameters, potentially insufficient numbers of local training samples, and sometimes high local prediction errors. Also, implemented as an R package, GRF currently does not have a Python version which limits its adoption among machine learning practitioners who prefer Python. This work addresses these limitations by introducing theory-informed hyperparameter determination, local training sample expansion, and spatially-weighted local prediction. We also develop a Python-based GRF model and package, PyGRF, to facilitate the use of the model. We evaluate the performance of PyGRF on an example dataset and further demonstrate its use in two case studies in public health and natural disasters.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# 報酬信号としてのフォローアップ尤度を用いた言語モデルのアライメント Aligning Language Models Using Follow-up Likelihood as Reward Signal ( http://arxiv.org/abs/2409.13948v1 ) ライセンス: Link先を確認	Chen Zhang, Dading Chong, Feng Jiang, Chengguang Tang, Anningzhe Gao, Guohua Tang, Haizhou Li,	(参考訳) 自然な人間同士の会話では、参加者は互いのフォローアップ反応に基づいてフィードバック信号を受け取ることが多い。これらの反応には、口頭での応答、表情、感情状態の変化、その他の非言語的手がかりが含まれる。同様に、人間と機械のインタラクションにおいて、マシンはユーザのフォローアップ発話をフィードバック信号として利用して、ユーザの要求に適切に対処したかどうかを評価することができる。そこで本稿では,人間や商用LLMに基づく選好アノテーションに頼ることなく,好ましい応答とそうでない応答を区別する報酬として,フォローアップ発話の尤度を用いることを提案する。提案する報酬メカニズム「Follow-up Likelihood as Reward」(FLR)は,8つのペアワイズ選好ベンチマークと4つのレーティングベースベンチマークにおいて,大規模な人手またはGPT-4によるアノテーションデータで訓練された強力な報酬モデルに匹敵する性能を示す。FLRメカニズムに基づき,ベースポリシーモデルのオンライン生成から選好データを自動的にマイニングすることを提案する。その後、この選好データを用いて、直接選好最適化(DPO)などの選好からの直接アライメント(DAP)手法により、ベースモデルの有用性を高める。最後に、フォローアップ尤度を与える言語モデルを自然言語フィードバックで微調整することで、報酬モデリングのベンチマークにおけるFLRの性能と、ベースポリシーモデルの有用性をアライメントする効果が大きく向上することを示す。 In natural human-to-human conversations, participants often receive feedback signals from one another based on their follow-up reactions. These reactions can include verbal responses, facial expressions, changes in emotional state, and other non-verbal cues. Similarly, in human-machine interactions, the machine can leverage the user's follow-up utterances as feedback signals to assess whether it has appropriately addressed the user's request. Therefore, we propose using the likelihood of follow-up utterances as rewards to differentiate preferred responses from less favored ones, without relying on human or commercial LLM-based preference annotations. Our proposed reward mechanism, "Follow-up Likelihood as Reward" (FLR), matches the performance of strong reward models trained on large-scale human or GPT-4 annotated data on 8 pairwise-preference and 4 rating-based benchmarks. Building upon the FLR mechanism, we propose to automatically mine preference data from the online generations of a base policy model.
The preference data are subsequently used to boost the helpfulness of the base model through direct alignment from preference (DAP) methods, such as direct preference optimization (DPO). Lastly, we demonstrate that fine-tuning the language model that provides follow-up likelihood with natural language feedback significantly enhances FLR's performance on reward modeling benchmarks and effectiveness in aligning the base policy model's helpfulness.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# Mufu: LLMを用いた低リソース翻訳のための多言語融合学習 Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM ( http://arxiv.org/abs/2409.13949v1 ) ライセンス: Link先を確認	Zheng Wei Lim, Nitish Gupta, Honglin Yu, Trevor Cohn,	(参考訳) 多言語大規模言語モデル (LLM) は優れた翻訳者であるが、これは主に高リソース言語に限られている。多くのLLMでは、低リソース言語との間の翻訳は依然として難しい課題である。この低リソース環境でのデータ効率を最大化するために、自動生成された多言語の翻訳候補と、不正確な翻訳を訂正する命令とをプロンプトに含めるMufuを導入する。 Mufuのプロンプトは、翻訳タスクをポストエディットタスクに変換し、補助的な翻訳候補を用いてLLMの推論能力を活用する。モデルは入力の品質を評価し、言語横断的に意味を整合させ、関連する入力からコピーし、誤っている箇所を上書きすることを求められる。 Flores-200データセット上でのEn-XX翻訳実験により,Mufuスタイルのプロンプトで微調整されたLLMは,低品質な補助翻訳候補に対して頑健であり,低リソースおよび超低リソースの言語ペアの64%でNLLB 1.3B蒸留モデルよりも優れた性能が得られることが示された。さらに、低リソース翻訳におけるファインチューンのみのベースラインに対する平均3.1 chrFの改善を維持しながら、これらのモデルを蒸留して推論コストを削減する。 Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct inaccurate translations in the prompt. Mufu prompts turn a translation task into a postediting one, and seek to harness the LLM's reasoning capability with auxiliary translation candidates, from which the model is required to assess the input quality, align the semantics cross-lingually, copy from relevant inputs and override instances that are incorrect. Our experiments on En-XX translations over the Flores-200 dataset show LLMs finetuned against Mufu-style prompts are robust to poor quality auxiliary translation candidates, achieving performance superior to NLLB 1.3B distilled model in 64% of low- and very-low-resource language pairs. We then distill these models to reduce inference cost, while maintaining on average 3.1 chrF improvement over finetune-only baseline in low-resource translations.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# 高速セグメンテーションのための深層学習とAR/VR設計と製作を可能にする臨界次元メトロジーとキャラクタリゼーション Deep learning for fast segmentation and critical dimension metrology & characterization enabling AR/VR design and fabrication ( http://arxiv.org/abs/2409.13951v1 ) ライセンス: Link先を確認	Kundan Chaudhary, Subhei Shaar, Raja Muthinti,	(参考訳) 拡張現実/バーチャルリアリティ(AR/VR)モジュールで使用されるコンポーネントの設計と製造には,顕微鏡画像の定量的解析が不可欠である。しかし、これらの複雑な画像から関心領域(ROI)を分割し、臨界次元(CD)を抽出するには、プロセス、材料、デバイス最適化において実行可能な決定の鍵となるディープラーニングモデルのような新しい技術が必要である。本研究では,電子顕微鏡画像の多種多様なデータセットを用いて,事前学習したセグメンテーションモデル(SAM)の微調整について報告する。低ランク適応(LoRA)などの手法を用いて,トレーニング時間を短縮し,ROI抽出の精度を高める。モデルが見えない画像に一般化する能力はゼロショット学習を促進し、セグメント化されたROIからCDを正確に抽出するCD抽出モデルをサポートする。本研究では, 表面緩和格子(SRG)とフレネルレンズの断面画像から, 単一モードとマルチクラスモードの両方で, バイナリ画像の正確な抽出を実証する。さらに、これらのバイナリ画像は、関連するCDの抽出を補助する遷移点を特定するために使用される。微調整セグメンテーションモデルとCD抽出モデルの組み合わせは、分析能力の向上、データと洞察の時間の向上、製造プロセスの最適化によって、様々な産業用途に多大な利点をもたらす。 Quantitative analysis of microscopy images is essential in the design and fabrication of components used in augmented reality/virtual reality (AR/VR) modules. However, segmenting regions of interest (ROIs) from these complex images and extracting critical dimensions (CDs) requires novel techniques, such as deep learning models which are key for actionable decisions on process, material and device optimization. In this study, we report on the fine-tuning of a pre-trained Segment Anything Model (SAM) using a diverse dataset of electron microscopy images. We employed methods such as low-rank adaptation (LoRA) to reduce training time and enhance the accuracy of ROI extraction. The model's ability to generalize to unseen images facilitates zero-shot learning and supports a CD extraction model that precisely extracts CDs from the segmented ROIs. We demonstrate the accurate extraction of binary images from cross-sectional images of surface relief gratings (SRGs) and Fresnel lenses in both single and multiclass modes. Furthermore, these binary images are used to identify transition points, aiding in the extraction of relevant CDs. 
The combined use of the fine-tuned segmentation model and the CD extraction model offers substantial advantages to various industrial applications by enhancing analytical capabilities, shortening the time to data and insights, and optimizing manufacturing processes.	翻訳日:2024-11-07 04:28:44 公開日:2024-09-20
# マルチフィル・イン・ザ・ブランク・エクサムによる大規模言語モデルにおけるゼロリソース幻覚検出の強化 A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models ( http://arxiv.org/abs/2409.17173v1 ) ライセンス: Link先を確認	Satoshi Munakata, Taku Fukui, Takao Mohri,	(参考訳) 大規模言語モデル (LLM) はしばしば幻覚を含むテキストを捏造する。確率的に再生成した複数のバージョンと意味的に比較することで,このようなテキストを検出する手法がいくつか開発されている。しかし、再生成された各テキストのストーリーラインが変わると、生成されたテキスト同士が比較不能になり、検出精度が悪化する。本稿では,このストーリーライン変化の問題に対処するために,複数穴埋め試験(multiple-fill-in-the-blank exam)のアプローチを取り入れた幻覚検出手法を提案する。まず,本手法は,原文から複数の対象をマスキングすることで複数穴埋め試験を作成する。第2に、LLMにこの試験へ繰り返し解答させる。このアプローチは、試験解答のストーリーラインが原文のものと一致することを保証する。最後に、原文自体における「hallucination snowballing」の可能性を考慮しつつ、試験解答を採点することで原文の各文の幻覚の度合いを定量化する。実験結果から,本手法は単独で既存手法を上回るだけでなく,既存手法とのアンサンブルにおいても,より明確な最先端性能を達成することが示された。 Large language models (LLMs) often fabricate hallucinatory text. Several methods have been developed to detect such text by semantically comparing it with multiple probabilistically regenerated versions. However, a significant issue is that if the storyline of each regenerated text changes, the generated texts become incomparable, which worsens detection accuracy. In this paper, we propose a hallucination detection method that incorporates a multiple-fill-in-the-blank exam approach to address this storyline-changing issue. First, our method creates a multiple-fill-in-the-blank exam by masking multiple objects from the original text. Second, it prompts an LLM to repeatedly answer this exam. This approach ensures that the storylines of the exam answers align with the original ones. Finally, it quantifies the degree of hallucination for each original sentence by scoring the exam answers, considering the potential for hallucination snowballing within the original text itself. Experimental results show that our method alone not only outperforms existing methods, but also achieves clearer state-of-the-art performance in ensembles with existing methods.	翻訳日:2024-11-06 16:50:22 公開日:2024-09-20
# CSCE: 因果的意義と一貫性の同時強化によるLLM推論の強化 CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Casual Significance and Consistency ( http://arxiv.org/abs/2409.17174v1 ) ライセンス: Link先を確認	Kangsheng Wang, Xiao Zhang, Zizheng Guo, Tianyu Hu, Huimin Ma,	(参考訳) 思考の連鎖(CoT)のようなチェーンベースの推論手法は,大規模言語モデル(LLM)の推論タスクの解決において役割を増している。しかし,推論の1ステップとそれに対応する状態遷移との間の因果錯覚が,特に長距離推論タスクにおいて,LLMの推論能力向上の重大な障害となりつつある。本稿では,因果的意義と一貫性を同時に考慮する非チェーン型の推論フレームワーク,すなわちCausal Significance and Consistency Enhancer (CSCE) を提案する。治療効果評価を利用してLLMの損失関数をカスタマイズし、因果的意義と一貫性の2つの側面から推論能力を高める。これにより、モデルは本質的な因果関係を捉え、さまざまなシナリオで堅牢で一貫したパフォーマンスを維持することができる。さらに、CoTのようなチェーンベースの手法でよく用いられる複数のワンステップ推論のカスケードから、推論プロセス全体を一度に出力する因果強化手法へと推論プロセスを転換し、モデルの推論効率をさらに向上する。大規模な実験により,本手法は推論の成功率と速度の両方を改善することが示された。これらの改善は、非チェーンベースの手法もLLMの推論タスク遂行に役立つことを示している。 Chain-based reasoning methods like chain of thought (CoT) play a rising role in solving reasoning tasks for large language models (LLMs). However, the causal illusions between a step of reasoning and the corresponding state transitions are becoming a significant obstacle to advancing LLMs' reasoning capabilities, especially in long-range reasoning tasks. This paper proposes a non-chain-based reasoning framework for simultaneous consideration of causal significance and consistency, i.e., the Causal Significance and Consistency Enhancer (CSCE). We customize LLM's loss function utilizing treatment effect assessments to enhance its reasoning ability from two aspects: causal significance and consistency. This ensures that the model captures essential causal relationships and maintains robust and consistent performance across various scenarios. Additionally, we transform the reasoning process from the cascading multiple one-step reasoning commonly used in chain-based methods, like CoT, to a causal-enhanced method that outputs the entire reasoning process in one go, further improving the model's reasoning efficiency. Extensive experiments show that our method improves both the reasoning success rate and speed.
These improvements further demonstrate that non-chain-based methods can also aid LLMs in completing reasoning tasks.	翻訳日:2024-11-06 16:50:22 公開日:2024-09-20
# Post-Quantum Cryptography Anonymous Scheme -- PQCWC: Post-Quantum Cryptography Winternitz-Chen Post-Quantum Cryptography Anonymous Scheme -- PQCWC: Post-Quantum Cryptography Winternitz-Chen ( http://arxiv.org/abs/2410.03678v1 ) ライセンス: Link先を確認	Abel C. H. Chen,	(参考訳) 量子コンピューティング技術が成熟するにつれて、主流の非対称暗号法のセキュリティに脅威をもたらす。これに応えて、National Institute of Standards and Technologyは2024年8月にポスト量子暗号(PQC)アルゴリズムの最終版をリリースした。これらの量子後暗号アルゴリズムは主に格子ベースの暗号とハッシュベースの暗号に基づいている。そこで本研究では,プライバシ保護における将来的な応用に向けたPQCに基づく匿名スキームの設計を検討することを目的とした,PQCWC(Post-Quantum Cryptography Winternitz-Chen)匿名スキームを提案する。この研究で設計された匿名のスキームは主にウィンターニッツ署名スキームに基づいており、これは元の公開鍵が証明書に暴露されるのを防ぐことができる。さらに、PQCWC匿名スキームは、バタフライキー拡張機構を統合し、世界で初めてハッシュベースのバタフライキー拡張機構を提案し、登録局と認証局の両方の匿名性を達成し、プライバシーを完全に保護する。実験環境では,Secure Hash Algorithm-1(SHA-1),SHA-2シリーズ,SHA-3シリーズ,BLAKEシリーズなど,さまざまなハッシュアルゴリズムを比較した。提案手法は,鍵長,署名長,鍵生成時間,署名生成時間,署名検証時間を増大させることなく,匿名性を実現することができることを示す。 As quantum computing technology matures, it poses a threat to the security of mainstream asymmetric cryptographic methods. In response, the National Institute of Standards and Technology released the final version of post-quantum cryptographic (PQC) algorithm standards in August 2024. These post-quantum cryptographic algorithms are primarily based on lattice-based and hash-based cryptography. Therefore, this study proposes the Post-Quantum Cryptography Winternitz-Chen (PQCWC) anonymous scheme, aimed at exploring the design of anonymous schemes based on PQC for future applications in privacy protection. The anonymous scheme designed in this study is mainly built on the Winternitz signature scheme, which can prevent the original public key from being exposed in the certificate. Furthermore, the PQCWC anonymous scheme integrates the butterfly key expansion mechanism, proposing the first hash-based butterfly key expansion mechanism in the world, achieving anonymity for both the registration authority and the certificate authority, thereby fully protecting privacy. 
In the experimental environment, this study compares various hash algorithms, including Secure Hash Algorithm-1 (SHA-1), the SHA-2 series, the SHA-3 series, and the BLAKE series. The results demonstrate that the proposed anonymous scheme can achieve anonymity without increasing key length, signature length, key generation time, signature generation time, or signature verification time.	翻訳日:2024-11-02 20:48:16 公開日:2024-09-20
# MRSO: グローバル最適化のための修正ラット群最適化による探索と活用のバランス MRSO: Balancing Exploration and Exploitation through Modified Rat Swarm Optimization for Global Optimization ( http://arxiv.org/abs/2410.03684v1 ) ライセンス: Link先を確認	Hemin Sardar Abdulla, Azad A. Ameen, Sarwar Ibrahim Saeed, Ismail Asaad Mohammed, Tarik A. Rashid,	(参考訳) インテリジェントテクノロジーの急速な進歩は、複雑な問題に対処するために自然な振る舞いを活用する最適化アルゴリズムの開発につながった。このうち、ラットの社会的・行動的特徴にインスパイアされたラット群最適化器(RSO)は、その収束精度と探索能力は制限されているものの、様々な領域でポテンシャルを示した。これらの欠点に対処するため,本研究では,探索と活用のバランスを高めるために,MRSO(Modified Rat Swarm Optimizer)を導入する。 MRSOは探索効率と耐久性を改善するために独自の改良を加えており、溶接ビーム、圧力容器、ギヤトレインの設計といった挑戦的な工学的問題に適合している。古典的ベンチマーク関数による広範囲なテストの結果,MRSOは局所最適解を回避し,9つのマルチモーダル関数のうち6つ、および7つの固定次元マルチモーダル関数すべてにおいて高い精度を達成し,性能を著しく向上することが示された。 CEC 2019ベンチマークでは、MRSOは10関数中6関数で標準RSOよりも優れており、優れたグローバル探索能力を示している。工学的設計問題に適用すると、MRSOは一貫してRSOよりも良好な平均結果を提供し、その有効性が示される。さらに,従来のベンチマークとCEC-2019のベンチマークを用いて、最近のよく知られた8つのアルゴリズムとの比較を行った。 MRSOはこれらのアルゴリズムよりも優れており、23の古典的ベンチマーク関数のうち6つ、CEC-2019の10のベンチマーク関数のうち4つで優れた結果が得られる。これらの結果は、工学アプリケーションにおける最適化タスクのための信頼性が高く効率的なツールとして、MRSOが大きく貢献することを示す。 The rapid advancement of intelligent technology has led to the development of optimization algorithms that leverage natural behaviors to address complex issues. Among these, the Rat Swarm Optimizer (RSO), inspired by rats' social and behavioral characteristics, has demonstrated potential in various domains, although its convergence precision and exploration capabilities are limited. To address these shortcomings, this study introduces the Modified Rat Swarm Optimizer (MRSO), designed to enhance the balance between exploration and exploitation. MRSO incorporates unique modifications to improve search efficiency and durability, making it suitable for challenging engineering problems such as welded beam, pressure vessel, and gear train design. 
Extensive testing with classical benchmark functions shows that MRSO significantly improves performance, avoiding local optima and achieving higher accuracy in six out of nine multimodal functions and in all seven fixed-dimension multimodal functions. In the CEC 2019 benchmarks, MRSO outperforms the standard RSO in six out of ten functions, demonstrating superior global search capabilities. When applied to engineering design problems, MRSO consistently delivers better average results than RSO, proving its effectiveness. Additionally, we compared our approach with eight recent and well-known algorithms using both classical and CEC-2019 benchmarks. MRSO outperforms each of these algorithms, achieving superior results in six out of 23 classical benchmark functions and in four out of ten CEC-2019 benchmark functions. These results further demonstrate MRSO's significant contributions as a reliable and efficient tool for optimization tasks in engineering applications.	翻訳日:2024-11-02 20:48:16 公開日:2024-09-20
# FAIR GPT:ChatGPTにおける研究データ管理の仮想コンサルタント FAIR GPT: A virtual consultant for research data management in ChatGPT ( http://arxiv.org/abs/2410.07108v1 ) ライセンス: Link先を確認	Renat Shigapov, Irene Schumm,	(参考訳) FAIR GPTはChatGPTの最初の仮想コンサルタントであり、研究者や組織がFAIR(Findable, Accessible, Interoperable, Reusable)の原則に準拠したデータやメタデータを作成できるようにする。メタデータの改善、データセットの編成、リポジトリの選択に関するガイダンスを提供する。正確性を確保するために、FAIR GPTは外部APIを使用して、データセットFAIRnessを評価し、制御された語彙を検索し、リポジトリを推奨し、幻覚を最小化し、精度を向上させる。また、ドキュメンテーション(データおよびソフトウェア管理計画、READMEファイル、コードブック)の作成を支援し、適切なライセンスを選択する。本稿では,その特徴,応用,限界について述べる。 FAIR GPT is a first virtual consultant in ChatGPT designed to help researchers and organizations make their data and metadata compliant with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. It provides guidance on metadata improvement, dataset organization, and repository selection. To ensure accuracy, FAIR GPT uses external APIs to assess dataset FAIRness, retrieve controlled vocabularies, and recommend repositories, minimizing hallucination and improving precision. It also assists in creating documentation (data and software management plans, README files, and codebooks), and selecting proper licenses. This paper describes its features, applications, and limitations.	翻訳日:2024-10-31 22:27:10 公開日:2024-09-20
# 不完全文脈情報を用いた胸部X線解析におけるマルチモーダル大言語モデルの有用性 Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information ( http://arxiv.org/abs/2410.07111v1 ) ライセンス: Link先を確認	Choonghan Kim, Seonhee Cho, Joo Heung Yoon,	(参考訳) 背景: 大規模言語モデル (LLM) は, 臨床現場での利用が進んでいるが, 放射線学報告が不完全な場合には性能が低下する可能性がある。本研究では,マルチモーダルLLM(テキストと画像を用いた)が胸部X線撮影報告における精度と理解を向上させ,臨床診断支援により有効となるかどうかを検討した。目的:不完全データとマルチモーダルデータの両方を用いて胸部X線撮影報告から正確な印象を生成するLLMの堅牢性を評価すること。資料と方法:MIMIC-CXRデータベースから300組のX線画像・レポートペアを使用した。 3つのLLM(OpenFlamingo、MedFlamingo、IDEFICS)はテキストのみのフォーマットとマルチモーダルフォーマットの両方でテストされた。印象はまず全文から生成され、それからテキストの20%、50%、80%を除去してテストされた。胸部X線を用いて画像の追加効果を評価し, 統計解析による3つの指標を用いてモデル性能を比較した。結果: テキストのみのモデル (OpenFlamingo, MedFlamingo, IDEFICS) は同様のパフォーマンス (ROUGE-L: 0.39 vs. 0.21 vs. 0.21; F1RadGraph: 0.34 vs. 0.17 vs. 0.17; F1CheXbert: 0.53 vs. 0.40 vs. 0.40) で、完全なテキストではOpenFlamingoが最も優れていた(p<0.001)。すべてのモデルにおいて、不完全なデータではパフォーマンスが低下した。しかし、画像を追加することでMedFlamingoとIDEFICSのパフォーマンスが大幅に向上し(p<0.001)、不完全なテキストであってもOpenFlamingoと同等かそれ以上になった。結論: LLMは不完全な放射線学データでは低品質な出力を生成する可能性があるが, マルチモーダルLLMは信頼性を改善し, 臨床的意思決定を支援することができる。キーワード:大規模言語モデル、マルチモーダル、意味分析、胸部X線撮影、臨床診断支援 Background: Large language models (LLMs) are gaining use in clinical settings, but their performance can suffer with incomplete radiology reports. We tested whether multimodal LLMs (using text and images) could improve accuracy and understanding in chest radiography reports, making them more effective for clinical decision support. Purpose: To assess the robustness of LLMs in generating accurate impressions from chest radiography reports using both incomplete data and multimodal data. Material and Methods: We used 300 radiology image-report pairs from the MIMIC-CXR database. Three LLMs (OpenFlamingo, MedFlamingo, IDEFICS) were tested in both text-only and multimodal formats. Impressions were first generated from the full text, then tested by removing 20%, 50%, and 80% of the text. 
The impact of adding images was evaluated using chest x-rays, and model performance was compared using three metrics with statistical analysis. Results: The text-only models (OpenFlamingo, MedFlamingo, IDEFICS) had similar performance (ROUGE-L: 0.39 vs. 0.21 vs. 0.21; F1RadGraph: 0.34 vs. 0.17 vs. 0.17; F1CheXbert: 0.53 vs. 0.40 vs. 0.40), with OpenFlamingo performing best on complete text (p<0.001). Performance declined with incomplete data across all models. However, adding images significantly boosted the performance of MedFlamingo and IDEFICS (p<0.001), equaling or surpassing OpenFlamingo, even with incomplete text. Conclusion: LLMs may produce low-quality outputs with incomplete radiology data, but multimodal LLMs can improve reliability and support clinical decision-making. Keywords: Large language model; multimodal; semantic analysis; Chest Radiography; Clinical Decision Support;	翻訳日:2024-10-31 22:27:10 公開日:2024-09-20
# 4次元およびそれ以下における位相的および幾何学的多次元多様体データの生成 Generating Topologically and Geometrically Diverse Manifold Data in Dimensions Four and Below ( http://arxiv.org/abs/2410.07115v1 ) ライセンス: Link先を確認	Khalil Mathieu Hannouch, Stephan Chalup,	(参考訳) データのトポロジ的特性を理解することは、多くの研究領域において重要である。近年の研究では、合成4次元画像型データが、4次元畳み込みニューラルネットワークモデルのトレーニングに有用であることが示されている。これらのモデルはまた、永続的ホモロジーのような既存のトポロジ的データ分析技術では不可能な画像前処理技術の使用を許容しているように見える。本稿では,代数トポロジからの手法と形態学などの画像処理技術を組み合わせることで,トポロジラベルを用いたトポロジ的な2次元・3次元・4次元画像データを生成する方法について検討する。これらのアプローチは、これを4Dで達成するためのロードマップを提供することを目的として、2Dと3Dで説明されています。 Understanding the topological characteristics of data is important to many areas of research. Recent work has demonstrated that synthetic 4D image-type data can be useful to train 4D convolutional neural network models to see topological features in these data. These models also appear to tolerate the use of image preprocessing techniques where existing topological data analysis techniques such as persistent homology do not. This paper investigates how methods from algebraic topology, combined with image processing techniques such as morphology, can be used to generate topologically sophisticated and diverse-looking 2-, 3-, and 4D image-type data with topological labels in simulation. These approaches are illustrated in 2D and 3D with the aim of providing a roadmap towards achieving this in 4D.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# 2次深層学習モデルを用いた地中レーダ画像からの埋設物の分類 Classification of Buried Objects from Ground Penetrating Radar Images by using Second Order Deep Learning Models ( http://arxiv.org/abs/2410.07117v1 ) ライセンス: Link先を確認	Douba Jafuno, Ammar Mian, Guillaume Ginolhac, Nickolas Stelzenmuller,	(参考訳) 本稿では,埋設物を分類するために,共分散行列に基づく新しい分類モデルを構築した。提案したモデルの入力は、古典的な地中貫入レーダ(GPR)システムで得られた双曲線サムネイルである。これらのサムネイルは古典的CNNの最初の数層に入力され、畳み込みフィルタの出力を用いて共分散行列が生成される。次に、共分散行列は、Symmetric Positive Definite (SPD)行列を分類するための特定の層からなるネットワークに与えられる。大規模データベース上で、GPRデータ用に設計された浅層ネットワークや、コンピュータビジョンアプリケーションで一般的に使用される従来のCNNよりも、特にトレーニングデータの数が減少した場合や誤ラベルデータが存在する場合において、我々のアプローチが優れていることを示す。また、トレーニングデータとテストセットが異なる気象モードや条件から得られた場合のモデルの有用性についても説明する。 In this paper, a new classification model based on covariance matrices is built in order to classify buried objects. The inputs of the proposed models are the hyperbola thumbnails obtained with a classical Ground Penetrating Radar (GPR) system. These thumbnails are fed into the first layers of a classical CNN, and a covariance matrix is built from the outputs of the convolutional filters. Next, the covariance matrix is given to a network composed of specific layers to classify Symmetric Positive Definite (SPD) matrices. We show on a large database that our approach outperforms shallow networks designed for GPR data and conventional CNNs typically used in computer vision applications, particularly when the number of training data decreases and in the presence of mislabeled data. We also illustrate the interest of our models when training data and test sets are obtained from different weather modes or considerations.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# E-Commerce Query Product Type Prediction のためのトランスファー学習 Transfer Learning for E-commerce Query Product Type Prediction ( http://arxiv.org/abs/2410.07121v1 ) ライセンス: Link先を確認	Anna Tigunova, Thomas Ricatte, Ghadir Eraisha,	(参考訳) eコマース検索エンジンでは、顧客の意図をよく理解することが不可欠だ。特に、正しい商品タイプを検索クエリに関連付けることは、顧客に対して正しい商品を提示する上で重要な役割を担っている。クエリ製品タイプ分類(Q2PT)は、検索クエリが短く曖昧であるため、既存の製品カテゴリの数が非常に多く、数千の値にまたがるため、特に難しいタスクである。さらに、国際市場は、言語や方言の多様性、文化的な違いといった追加の課題に直面し、クエリの解釈に影響を与える。本研究は、グローバルマルチローカライズeコマース市場におけるQ2PT予測に焦点を当てる。各ローカライズ毎にQ2PTモデルをトレーニングする一般的なアプローチは、低リソースストアで大幅なパフォーマンス低下を示す。さらに,本手法では,新たな国へのスムーズな展開が不可能であり,データ収集と新たなローカライズ特化Q2PTモデルをスクラッチからトレーニングする必要がある。そこで本研究では,高リソースから低リソースのローカライズへの変換学習を用いて,Q2PT性能のグローバルな同等性を実現することを提案する。ローカルごとのQ2PTモデルと統合されたQ2PTモデルをベンチマークし、世界中の店舗でトレーニングデータとモデル構造を共有する。さらに,各地域別Q2PTモデルと地域別Q2PTモデルを比較し,国別特性のタスク依存性を示す。グローバルな20の地域を対象とした大規模なeコマースデータセット上で,Q2PTモデルの定量的および定性的な分析を行い,各地域を意識したQ2PTモデルの方が,代替品よりも優れた性能を示した。 Getting a good understanding of the customer intent is essential in e-commerce search engines. In particular, associating the correct product type to a search query plays a vital role in surfacing correct products to the customers. Query product type classification (Q2PT) is a particularly challenging task because search queries are short and ambiguous, the number of existing product categories is extremely large, spanning thousands of values. Moreover, international marketplaces face additional challenges, such as language and dialect diversity and cultural differences, influencing the interpretation of the query. In this work we focus on Q2PT prediction in the global multilocale e-commerce markets. The common approach of training Q2PT models for each locale separately shows significant performance drops in low-resource stores. Moreover, this method does not allow for a smooth expansion to a new country, requiring to collect the data and train a new locale-specific Q2PT model from scratch. 
To tackle this, we propose to use transfer learning from the high-resource to the low-resource locales, to achieve global parity of Q2PT performance. We benchmark the per-locale Q2PT model against the unified one, which shares the training data and model structure across all worldwide stores. Additionally, we compare locale-aware and locale-agnostic Q2PT models, showing the task dependency on the country-specific traits. We conduct extensive quantitative and qualitative analysis of Q2PT models on the large-scale e-commerce dataset across 20 worldwide locales, which shows that the unified locale-aware Q2PT model has superior performance over the alternatives.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# Eコマースにおける高度なAI顧客サービスのためのエンドクラウドコラボレーションフレームワーク End-Cloud Collaboration Framework for Advanced AI Customer Service in E-commerce ( http://arxiv.org/abs/2410.07122v1 ) ライセンス: Link先を確認	Liangyu Teng, Yang Liu, Jing Liu, Liang Song,	(参考訳) 近年、eコマース業界は、高度なAI駆動のカスタマーサービスソリューションに対する需要が急増している。従来のクラウドベースのモデルは、レイテンシ、パーソナライズされたサービス、プライバシの懸念といった面で制限に直面している。さらに、エンドデバイスは大きなAIモデルを効果的に展開するための計算資源を欠いていることが多い。本稿では,eコマースにおける高度なAI顧客サービスのための革新的なエンドクラウドコラボレーション(ECC)フレームワークを提案する。このフレームワークは、クラウドモデルの一般化ポテンシャルを深く探求し、端末チップの計算資源を効果的に活用することにより、大規模クラウドモデルと中小規模エンドモデルの利点を統合し、計算資源への負担をある程度軽減する。具体的には、大規模なクラウドモデルは、教師として機能し、エンドモデルの学習を指導し、促進することで、エンドモデルの大規模で高品質なデータへの依存を著しく減らし、従来のエンドモデルのトレーニングにおけるデータボトルネックに対処し、業界アプリケーションの迅速な展開のための新しいパラダイムを提供する。さらに,クラウドモデルからのガイダンスとリアルタイムユーザフィードバックに基づいて,エンドモデルを継続的にイテレーションし,アップグレードすることが可能なオンライン・エボリューティブ・ラーニング・ストラテジーを導入する。この戦略は、局所的な微調整を行うことでセンシティブな情報のアップロードを回避しながら、モデルがアプリケーションシナリオの迅速な変更に柔軟に対応できることを保証し、プライバシ保護とパーソナライズされたサービスという2つの目標を達成する。電子商取引分野におけるカスタマイズされたモデル微調整手法に体系的な貢献をする。結論として、我々は、詳細なコーパス収集(例えば、データ整理、クリーニング、前処理)を実装し、ECCベースのeコマース顧客サービスの業界特化モデルを訓練する。 In recent years, the e-commerce industry has seen a rapid increase in the demand for advanced AI-driven customer service solutions. Traditional cloud-based models face limitations in terms of latency, personalized services, and privacy concerns. Furthermore, end devices often lack the computational resources to deploy large AI models effectively. In this paper, we propose an innovative End-Cloud Collaboration (ECC) framework for advanced AI customer service in e-commerce. This framework integrates the advantages of large cloud models and mid/small-sized end models by deeply exploring the generalization potential of cloud models and effectively utilizing the computing power resources of terminal chips, alleviating the strain on computing resources to some extent. 
Specifically, the large cloud model acts as a teacher, guiding and promoting the learning of the end model, which significantly reduces the end model's reliance on large-scale, high-quality data and thereby addresses the data bottleneck in traditional end model training, offering a new paradigm for the rapid deployment of industry applications. Additionally, we introduce an online evolutive learning strategy that enables the end model to continuously iterate and upgrade based on guidance from the cloud model and real-time user feedback. This strategy ensures that the model can flexibly adapt to the rapid changes in application scenarios while avoiding the uploading of sensitive information by performing local fine-tuning, achieving the dual goals of privacy protection and personalized service. We make systematic contributions to the customized model fine-tuning methods in the e-commerce domain. To conclude, we implement in-depth corpus collection (e.g., data organization, cleaning, and preprocessing) and train an ECC-based industry-specific model for e-commerce customer service.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# AIとビッグデータによる災害リスク低減の転換--法学と学際的視点から Transforming disaster risk reduction with AI and big data: Legal and interdisciplinary perspectives ( http://arxiv.org/abs/2410.07123v1 ) ライセンス: Link先を確認	Kwok P Chun, Thanti Octavianti, Nilay Dogulu, Hristos Tyralis, Georgia Papacharalampous, Ryan Rowberry, Pingyu Fan, Mark Everard, Maria Francesch-Huidobro, Wellington Migliari, David M. Hannah, John Travis Marshall, Rafael Tolosana Calasanz, Chad Staddon, Ida Ansharyani, Bastien Dieppois, Todd R Lewis, Juli Ponce, Silvia Ibrean, Tiago Miguel Ferreira, Chinkie Peliño-Golle, Ye Mu, Manuel Delgado, Elizabeth Silvestre Espinoza, Martin Keulertz, Deepak Gopinath, Cheng Li,	(参考訳) 複雑な災害リスクを管理するには学際的な努力が必要だ。法、社会科学、自然科学のサイロを断ち切ることは、災害リスク低減のあらゆるプロセスに不可欠である。これにより、法と自然環境の交差に大きな影響を及ぼしたAI技術の急速な進化のための適応システムが可能になる。 AIが法的枠組みや環境管理にどのように影響するかを探求すると同時に、法的および環境的配慮が社会経済領域内でAIをいかに分断するかを検討することが不可欠である。共同制作のレビューの観点から、弁護士、社会科学者、環境科学者の洞察に基づいて、責任あるデータマイニングの原則は、安全性、透明性、公正性、説明責任、競争性に基づいて提案される。この議論は、環境科学と社会科学の知識のAI統合に基づく適応的な法体系を構築するための学際協力のための青写真を提供する。環境科学者と意思決定者の間の言語使用の相違は、安全で信頼性があり、挑戦可能な災害管理フレームワークに対する法的考察の原則に基づいて、AIがいかに有用で正確かを妨げる。ソーシャルネットワークがAIに基づく災害リスク軽減に有用である場合、災害管理の結果のプライバシーと責任に関連する法的意味を考慮する必要がある。公正で説明可能な原則は、環境配慮を強調し、公的な関与に関する社会経済的な議論を促進する。また、AIは教育において重要な役割を担い、次の世代の法学、社会科学、自然科学を融合させ、調和において学際的な解決策に取り組む。 Managing complex disaster risks requires interdisciplinary efforts. Breaking down silos between law, social sciences, and natural sciences is critical for all processes of disaster risk reduction. This enables adaptive systems for the rapid evolution of AI technology, which has significantly impacted the intersection of law and natural environments. Exploring how AI influences legal frameworks and environmental management, while also examining how legal and environmental considerations can confine AI within the socioeconomic domain, is essential. 
From a co-production review perspective, drawing on insights from lawyers, social scientists, and environmental scientists, principles for responsible data mining are proposed based on safety, transparency, fairness, accountability, and contestability. This discussion offers a blueprint for interdisciplinary collaboration to create adaptive law systems based on AI integration of knowledge from environmental and social sciences. Discrepancies in the use of language between environmental scientists and decision-makers in terms of usefulness and accuracy hamper how AI can be used based on the principles of legal considerations for a safe, trustworthy, and contestable disaster management framework. When social networks are useful for mitigating disaster risks based on AI, the legal implications related to privacy and liability of the outcomes of disaster management must be considered. Fair and accountable principles emphasise environmental considerations and foster socioeconomic discussions related to public engagement. AI also has an important role to play in education, bringing together the next generations of law, social sciences, and natural sciences to work on interdisciplinary solutions in harmony.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# クロス・オーガン・クロス・スキャナー・アデノカルシノーマ・セグメンテーションのためのクロス・タスク・プレトレーニング Cross-Task Pretraining for Cross-Organ Cross-Scanner Adenocarcinoma Segmentation ( http://arxiv.org/abs/2410.07124v1 ) ライセンス: Link先を確認	Adrian Galdran,	(参考訳) この要約は、病理組織画像パッチからのCross-OrganおよびCross-Scanner Adenocarcinoma Segmentationに関するコンペティションCOSAS 2024に対する解決策を記述したものである。このタイプのがんをセグメンテーションする作業における大きな課題は、取得装置(顕微鏡)を変更したり、異なる臓器から組織が来たりするときに発生する顕著なドメインシフトである。 COSASで提案された2つのタスクは、3つの異なる臓器からの画像のデータセットでトレーニングし、次に、未知の臓器からのデータのセグメンテーションを予測すること(データセットT1)と、3つの異なるスキャナーで取得した画像のデータセットでトレーニングし、次に別の未知の顕微鏡で取得した画像をセグメンテーションすることであった。我々は,データセット毎の標準トレーニング,データセットT1での事前トレーニングとデータセットT2での微調整(およびその逆、これをCross-Task Pretrainingと呼ぶ),データセットAとBの組み合わせによるトレーニングという,3つの戦略を試して,ドメインシフトギャップを埋めようとした。実験の結果、Cross-Task Pretrainingがドメイン一般化に対してより有望なアプローチであることが示された。 This short abstract describes a solution to the COSAS 2024 competition on Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation from histopathological image patches. The main challenge in the task of segmenting this type of cancer is a noticeable domain shift encountered when changing acquisition devices (microscopes) and also when tissue comes from different organs. The two tasks proposed in COSAS were to train on a dataset of images from three different organs, and then predict segmentations on data from unseen organs (dataset T1), and to train on a dataset of images acquired on three different scanners and then segment images acquired with another unseen microscope. We attempted to bridge the domain shift gap by experimenting with three different strategies: standard training for each dataset, pretraining on dataset T1 and then fine-tuning on dataset T2 (and vice-versa, a strategy we call Cross-Task Pretraining), and training on the combination of dataset A and B. Our experiments showed that Cross-Task Pretraining is a more promising approach to domain generalization.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# 空間集積クラスタを用いた簡易位置細胞型可視化 A Simplified Positional Cell Type Visualization using Spatially Aggregated Clusters ( http://arxiv.org/abs/2410.07125v1 ) ライセンス: Link先を確認	Lee Mason, Jonas Almeida,	(参考訳) 組織画像に細胞型比例データをオーバーレイする新しい手法を提案する。このアプローチは、視覚的乱雑を避けたり、基盤となるスライドを過度に無視したりしながら、空間的コンテキストを保存する。提案手法では,データをクラスタ化し,同一クラスタの隣接点をポリゴンに集約する。 We introduce a novel method for overlaying cell type proportion data onto tissue images. This approach preserves spatial context while avoiding visual clutter or excessively obscuring the underlying slide. Our proposed technique involves clustering the data and aggregating neighboring points of the same cluster into polygons.	翻訳日:2024-10-31 22:17:22 公開日:2024-09-20
# 自動車のインターネットにおける相互情報量に基づくデータ抽出のためのブロックチェーン対応変分情報ボトルネック Blockchain-Enabled Variational Information Bottleneck for Data Extraction Based on Mutual Information in Internet of Vehicles ( http://arxiv.org/abs/2409.17287v1 ) ライセンス: Link先を確認	Cui Zhang, Wenjun Zhang, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief	(参考訳) Internet of Vehicles(IoV)ネットワークは、個々の車両の限られたコンピューティングリソースとデータ処理能力の問題に対処できるが、同時に、車両利用者にプライバシー漏洩のリスクも生じる。ブロックチェーン技術を適用することで、IoV内のセキュアなデータリンクを確立することが可能になる。しかし、IoVの発展に伴い、複数の車両間、および車両と基地局や路側機等との間のデータ通信量が継続的に増加している。インタラクションのボリュームをさらに削減する必要があり、この問題を解決する上では、インテリジェントなデータ圧縮が鍵となる。 VIB技術は、符号化モデルと復号化モデルの訓練を容易にし、送信するデータ量を大幅に減らす。本稿では、ブロックチェーンをVIBと統合したBVIBと呼ぶ革新的なアプローチを紹介し、計算処理の軽量化とネットワークのセキュリティ強化を目的としている。まず、計算負荷問題に対処するため、符号化ネットワークと復号化ネットワークを分離して新しいネットワークフレームワークを構築し、次に、IoVネットワークのセキュリティを高めるための新しいアルゴリズムを提案する。また、最も適切なデータ抽出率を決定するために、データ抽出率がシステム遅延に与える影響についても論じる。 PythonとC++を組み合わせた実験的なフレームワークを構築し,BVIBアプローチの有効性を検証した。包括的シミュレーション研究は、BVIBが他のベースライン手法と比較して一貫して優れていることを示唆している。 The Internet of Vehicles (IoV) network can address the issue of limited computing resources and data processing capabilities of individual vehicles, but it also brings the risk of privacy leakage to vehicle users. Applying blockchain technology can establish secure data links within the IoV, solving the problems of insufficient computing resources for each vehicle and the security of data transmission over the network. However, with the development of the IoV, the amount of data interaction between multiple vehicles and between vehicles and base stations, roadside units, etc., is continuously increasing. There is a need to further reduce the interaction volume, and intelligent data compression is key to solving this problem. The VIB technique facilitates the training of encoding and decoding models, substantially diminishing the volume of data that needs to be transmitted. 
This paper introduces an innovative approach that integrates blockchain with VIB, referred to as BVIB, designed to lighten computational workloads and reinforce the security of the network. We first construct a new network framework by separating the encoding and decoding networks to address the computational burden issue, and then propose a new algorithm to enhance the security of IoV networks. We also discuss the impact of the data extraction rate on system latency to determine the most suitable data extraction rate. An experimental framework combining Python and C++ has been established to substantiate the efficacy of our BVIB approach. Comprehensive simulation studies indicate that the BVIB consistently excels in comparison to alternative foundational methodologies.	翻訳日:2024-09-30 12:41:44 公開日:2024-09-20
# 抑うつ診断対話シミュレーション:三次記憶を伴う自己改善精神科医 Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary Memory ( http://arxiv.org/abs/2409.15084v1 ) ライセンス: Link先を確認	Kunyao Lan, Bingui Jin, Zichen Zhu, Siyuan Chen, Shu Zhang, Kenny Q. Zhu, Mengyue Wu,	(参考訳) 精神疾患、特にうつ病は、現代社会において重大な課題を呈しており、効果的な自動診断方法の開発が必要とされている。本稿では,患者エージェントと精神科医エージェントの対話を模擬してうつ病診断を促進する自己改善型会話エージェントシステムであるエージェント・メンタル・クリニック(AMC)を紹介する。対話の質と診断精度を高めるため,三次記憶構造,「スーパーバイザ」として機能する対話制御・振り返りプラグイン,および記憶サンプリングモジュールからなる精神科医エージェントを設計し,精神科医エージェントが反映するスキルを十分に活用することで,会話を通じた抑うつリスクや自殺リスクの診断において高い精度を達成する。実生活シナリオで収集したデータセットを用いた実験結果から, 精神科医の訓練手順を模擬したシステムが, 代表的なラベル付き症例が少数しか得られない場合でも, LLMの重みを変更することなく, 特定の領域における実生活分布とLLMを整合させる, 有望な最適化手法となりうることが示された。 Mental health issues, particularly depressive disorders, present significant challenges in contemporary society, necessitating the development of effective automated diagnostic methods. This paper introduces the Agent Mental Clinic (AMC), a self-improving conversational agent system designed to enhance depression diagnosis through simulated dialogues between patient and psychiatrist agents. To enhance the dialogue quality and diagnosis accuracy, we design a psychiatrist agent consisting of a tertiary memory structure, a dialogue control and reflect plugin that acts as a "supervisor", and a memory sampling module, fully leveraging the skills reflected by the psychiatrist agent, achieving great accuracy on depression risk and suicide risk diagnosis via conversation. Experiment results on datasets collected in real-life scenarios demonstrate that the system, simulating the procedure of training psychiatrists, can be a promising optimization method for aligning LLMs with real-life distribution in specific domains without modifying the weights of LLMs, even when only a few representative labeled cases are available.	翻訳日:2024-09-26 14:44:12 公開日:2024-09-20
# DS2TA:減衰時空間アテンションを備えたデノイジング・スパイキング・トランスフォーマー DS2TA: Denoising Spiking Transformer with Attenuated Spatiotemporal Attention ( http://arxiv.org/abs/2409.15375v1 ) ライセンス: Link先を確認	Boxun Xu, Hejia Geng, Yuxuan Yin, Peng Li,	(参考訳) Vision Transformer (ViT) は、様々な視覚アプリケーションにおいて現在選択されている高性能モデルである。近年の進歩は、ニューロモルフィックハードウェア上の超低消費電力動作に適した、生物学的にインスパイアされたスパイキング・トランスフォーマーを生み出しているが、スパイキングニューラルネットワークの可能性を完全には引き出せていない。本稿では,視覚アプリケーション専用に設計された、減衰時空間アテンションを備えたデノイジング・スパイキング・トランスフォーマーDS2TAを紹介する。 DS2TAは、時間と空間の両方で発生する入力の発火相関を考慮し、トランスフォーマーアーキテクチャのコアにおけるスパイキングニューロンの計算能力を完全に活用する、新しいスパイキング減衰時空間注意機構を導入している。重要なことに、DS2TAは余分な重みを導入することなくパラメータ効率の良い時空間アテンション計算を容易にする。 DS2TAは、効率的なハッシュマップベースの非線形スパイキングアテンションデノイザを用いて、スパイキングアテンションマップの堅牢性と表現力を高める。 DS2TAは、広く採用されている複数の静的画像および動的ニューロモルフィックデータセットで最先端性能を示す。 4つの時間ステップでCIFAR10では94.92%、CIFAR100では77.47%のtop-1精度を達成し、10の時間ステップでCIFAR10-DVSでは79.1%、DVS-Gestureでは94.44%を達成する。 Vision Transformers (ViT) are current high-performance models of choice for various vision applications. Recent developments have given rise to biologically inspired spiking transformers that thrive in ultra-low power operations on neuromorphic hardware, however, without fully unlocking the potential of spiking neural networks. We introduce DS2TA, a Denoising Spiking transformer with attenuated SpatioTemporal Attention, designed specifically for vision applications. DS2TA introduces a new spiking attenuated spatiotemporal attention mechanism that considers input firing correlations occurring in both time and space, thereby fully harnessing the computational power of spiking neurons at the core of the transformer architecture. Importantly, DS2TA facilitates parameter-efficient spatiotemporal attention computation without introducing extra weights. DS2TA employs efficient hashmap-based nonlinear spiking attention denoisers to enhance the robustness and expressive power of spiking attention maps. DS2TA demonstrates state-of-the-art performances on several widely adopted static image and dynamic neuromorphic datasets. 
Operated over 4 time steps, DS2TA achieves 94.92% top-1 accuracy on CIFAR10 and 77.47% top-1 accuracy on CIFAR100, as well as 79.1% and 94.44% on CIFAR10-DVS and DVS-Gesture using 10 time steps.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-20
# ControlMath: 制御可能なデータ生成は、数学的ジェネリストモデルを促進する ControlMath: Controllable Data Generation Promotes Math Generalist Models ( http://arxiv.org/abs/2409.15376v1 ) ライセンス: Link先を確認	Nuo Chen, Ning Wu, Jianhui Chang, Jia Li,	(参考訳) データ拡張に大規模言語モデル(LLM)を使用すると、数学的推論において奨励的な結果が得られる。しかし、これらのアプローチは問題多様性の制約に直面し、ドメイン内/分散データ生成を制限する可能性がある。そこで本研究では,方程式生成モジュールと2つのLLMエージェントを含む反復的手法であるControlMathを提案する。モジュールは多種多様な方程式を生成し、それを問題職人のエージェントが算術語問題に変換する。 Reverse-Agentは高品質なデータをフィルタし、より少ないデータポイントでより良い結果を得る"less is more"の原則に従って選択する。このアプローチにより、特定の領域や分布に限らず、多様な数学の問題を発生させることができる。その結果,190k の数学語問題を含む ControlMathQA が得られた。我々のデータセットとGSM8Kのようなドメイン内データセットを組み合わせることで、モデルを一般化する数学的能力の向上が達成され、特定のドメイン内およびそれ以上のパフォーマンスが向上する。 Utilizing large language models (LLMs) for data augmentation has yielded encouraging results in mathematical reasoning. However, these approaches face constraints in problem diversity, potentially restricting them to in-domain/distribution data generation. To this end, we propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents. The module creates diverse equations, which the Problem-Crafter agent then transforms into math word problems. The Reverse-Agent filters and selects high-quality data, adhering to the "less is more" principle, achieving better results with fewer data points. This approach enables the generation of diverse math problems, not limited to specific domains or distributions. As a result, we collect ControlMathQA, which involves 190k math word problems. Extensive results prove that combining our dataset with in-domain datasets like GSM8K can help improve the model's mathematical ability to generalize, leading to improved performances both within and beyond specific domains.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-20
# 貧血の鑑別診断支援のための大規模言語モデルへのプロンプティング Prompting Large Language Models for Supporting the Differential Diagnosis of Anemia ( http://arxiv.org/abs/2409.15377v1 ) ライセンス: Link先を確認	Elisa Castagnari, Lillian Muyama, Adrien Coulet,	(参考訳) 実際には、臨床医は、検査、観察、イメージングなどの一連の手順に従って診断を行う。診断決定に到達するための経路は、専門家組織が作成したガイドラインによって文書化され、これらの手順を通じて臨床医が正しい診断に到達するよう指導する。これらのガイドラインは医学的推論や医学的知識の統合に有用であるが、欠点もある。多くの場合、大多数の人口に焦点が当てられているため、まれな状態の患者に対処することができず、更新には時間と費用がかかるため、急速に出現する疾患や新しい診療には適さない。臨床ガイドラインに触発された本研究は,臨床ガイドラインで得られるものと同様の経路を開発することを目的とした。我々は3つのLarge Language Model (LLM)、すなわちGenerative Pretrained Transformer 4 (GPT-4)、Large Language Model Meta AI (LLaMA)、Mistralを、貧血とそのサブタイプを鑑別診断するための合成的で現実的なデータセットでテストした。意思決定プロセスを改善するために高度なプロンプト技術を用いることで,これらのモデルを用いて診断経路を生成する。実験結果から,LLMは患者データからの臨床経路の発見において大きな可能性を秘めており,GPT-4はすべての実験で最高の成績を示した。 In practice, clinicians achieve a diagnosis by following a sequence of steps, such as laboratory exams, observations, or imaging. The pathways to reach diagnosis decisions are documented by guidelines authored by expert organizations, which guide clinicians to reach a correct diagnosis through these sequences of steps. While these guidelines are beneficial for following medical reasoning and consolidating medical knowledge, they have some drawbacks. They often fail to address patients with uncommon conditions due to their focus on the majority population, and are slow and costly to update, making them unsuitable for rapidly emerging diseases or new practices. Inspired by clinical guidelines, our study aimed to develop pathways similar to those that can be obtained in clinical guidelines. We tested three Large Language Models (LLMs), Generative Pretrained Transformer 4 (GPT-4), Large Language Model Meta AI (LLaMA), and Mistral, on a synthetic yet realistic dataset to differentially diagnose anemia and its subtypes. By using advanced prompting techniques to enhance the decision-making process, we generated diagnostic pathways using these models. 
Experimental results indicate that LLMs hold huge potential in clinical pathway discovery from patient data, with GPT-4 exhibiting the best performance in all conducted experiments.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-20
# 臨床転写の自動化に向けて Toward Automated Clinical Transcriptions ( http://arxiv.org/abs/2409.15378v1 ) ライセンス: Link先を確認	Mitchell A. Klusty, W. Vaiden Logan, Samuel E. Armstrong, Aaron D. Mullen, Caroline N. Leach, Jeff Talbert, V. K. Cody Bumgardner,	(参考訳) 事務的な文書作成は医療費の上昇の主要な要因であり、医師のバーンアウトやケアの質の低下など、有害な結果に結びついている。本稿では,音声からテキストへの書き起こしと話者ラベル付け(ダイアリゼーション)の最近の進歩を患者と医療提供者の会話に適用するセキュアなシステムを提案する。このシステムは、正確な書き起こしを生成し、潜在的なエラーを強調して迅速な人間の検証を促進し、必要な手作業をさらに減らすように最適化されている。 40時間以上のシミュレートされた会話に適用した結果、このシステムは臨床転写を自動化するための有望な基盤を提供する。 Administrative documentation is a major driver of rising healthcare costs and is linked to adverse outcomes, including physician burnout and diminished quality of care. This paper introduces a secure system that applies recent advancements in speech-to-text transcription and speaker-labeling (diarization) to patient-provider conversations. This system is optimized to produce accurate transcriptions and highlight potential errors to promote rapid human verification, further reducing the necessary manual effort. Applied to over 40 hours of simulated conversations, this system offers a promising foundation for automating clinical transcriptions.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-20
# 力学量の不可逆対角化とEPRパラドックス Irreversible Diagonalization of Mechanical Quantities and the EPR Paradox ( http://arxiv.org/abs/2409.15379v1 ) ライセンス: Link先を確認	Tao Liu,	(参考訳) 量子力学的射影作用素の閉包関係は完全に真ではなく、フォック状態のユニタリ変換の下で厳密に反証することができる。角運動量 $J_x$, $J_y$, $J_z$ は、フォック状態における連続回転変換の正規直交集合 $\{|\phi_n\rangle\}$ の下で同時に対角化される。 $\{|\phi_n\rangle\}$ の時間反転 $\{ \mathcal{T} |\phi_n\rangle \}$ は座標 $q$ と運動量 $p$ の零点であり、その任意の並進変換 $\{ \mathcal{D} \mathcal{T} |\phi_n\rangle \}$ は座標と運動量の両方を同時に対角化する。ディラック状態ベクトルの抽象表現は非アベル群の単位行列 $\{ \mathcal{U}^{\mathcal{H}} \mathcal{U} \neq \mathcal{U} \mathcal{U}^{\mathcal{H}} \}$ の対称性の破れを意味する。 EPRパラドックスは、物理的現実の可逆的対角化の下での誤謬にすぎず、不可逆的対角化の下で解決される。 The closure relation of quantum mechanical projection operators is not entirely true; it can be strictly falsified under unitary transformations in Fock states. The angular momentum $J_x$, $J_y$ and $J_z$ are simultaneously diagonalized under the orthonormal set $\{|\phi_n\rangle\}$ of continuous rotation transformations in Fock states. $\{|\phi_n\rangle\}$'s time reversal $\{ \mathcal{T} |\phi_n\rangle \}$ is the zero point of coordinates $q$ and momentum $p$, and its arbitrary translation transformation $\{ \mathcal{D} \mathcal{T} |\phi_n\rangle \}$ diagonalizes both coordinates and momentum simultaneously. The abstract representation of the Dirac state vector implies the symmetry breaking of the non-Abelian group unit matrix $\{ \mathcal{U}^{\mathcal{H}} \mathcal{U} \neq \mathcal{U} \mathcal{U}^{\mathcal{H}} \}$. The EPR paradox is merely a fallacy under the reversible diagonalization of physical reality; it is resolved under irreversible diagonalization.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-20
# Kalahi: フィリピン語のための手作りの草の根文化LLM評価スイート Kalahi: A handcrafted, grassroots cultural LLM evaluation suite for Filipino ( http://arxiv.org/abs/2409.15380v1 ) ライセンス: Link先を確認	Jann Railey Montalan, Jian Gang Ngui, Wei Qi Leong, Yosephine Susanto, Hamsawardhini Rengarajan, William Chandra Tjhi, Alham Fikri Aji,	(参考訳) 現在、多言語大言語モデル(LLM)は、必ずしもフィリピンのユーザーに文化的に適切で関連する応答を提供するとは限らない。フィリピン語の母語話者が共同で作成した,文化的LLM評価スイートであるKalahiを紹介する。フィリピンで共有される文化的知識と価値観に関連する生成についてLLMをテストする、150の高品質で手作りのニュアンスに富んだプロンプトで構成されている。Kalahiにおける強力なLLMパフォーマンスは、ある状況下で平均的なフィリピン人が言う、あるいは行うのと同じような反応をモデルが生成する能力を示している。多言語およびフィリピン語をサポートするLLMで実験を行った。その結果、Kalahiはフィリピン人には自明だが、LLMには挑戦的であり、フィリピン語母語話者の正答率89.10%に比べて、最良のモデルでも正答率は46.0%に過ぎなかった。したがって、KalahiはLLMにおけるフィリピンの文化的表現を正確かつ確実に評価することができる。 Multilingual large language models (LLMs) today may not necessarily provide culturally appropriate and relevant responses to their Filipino users. We introduce Kalahi, a cultural LLM evaluation suite collaboratively created by native Filipino speakers. It is composed of 150 high-quality, handcrafted and nuanced prompts that test LLMs for generations that are relevant to shared Filipino cultural knowledge and values. Strong LLM performance in Kalahi indicates a model's ability to generate responses similar to what an average Filipino would say or do in a given situation. We conducted experiments on LLMs with multilingual and Filipino language support. Results show that Kalahi, while trivial for Filipinos, is challenging for LLMs, with the best model answering only 46.0% of the questions correctly compared to native Filipino performance of 89.10%. Thus, Kalahi can be used to accurately and reliably evaluate Filipino cultural representation in LLMs.	翻訳日:2024-09-26 13:20:55 公開日:2024-09-20
# Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer ( http://arxiv.org/abs/2409.09239v3 ) ライセンス: Link先を確認	Xiang Zhang, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan,	(参考訳) Transformerアーキテクチャはさまざまな言語モデリングタスクに優れ、RNNやLSTMといった従来のニューラルネットワークアーキテクチャよりも優れています。これは部分的には、並列トレーニングと勾配のスムーズな流れを可能にする再帰接続の除去によるものである。しかし、これは再帰構造から離れて、トランスフォーマーモデルをチョムスキーの計算階層の下端に配置し、計算能力に制限を与える。その結果、高度なTransformerベースのモデルでさえ、カウント、文字列反転、乗算といったタスクでかなりの困難に直面している。これらのタスクは、一見初等的なように見えるが、Transformerアーキテクチャの能力を超える計算複雑性のレベルを必要とする。同時に、"Chain of Thought"(CoT)のプロンプトの出現により、トランスフォーマーベースの言語モデルでは、以前は不可能あるいは不十分であったタスクに対処することが可能になった。本研究では、ニューラルネットワークの推論能力と計算可能性に対する、ニューラルネットワークにおけるリカレント構造の影響を徹底的に調査し、ニューラルネットワークの計算能力において自己回帰が果たす役割を対比する。そして、CoTアプローチがリカレントな計算を模倣し、言語モデルのコンテキストにおける自己回帰と再帰の間の橋渡しとして機能する方法について光を当てた。この近似反復は、モデルの性能と計算能力を特に向上する。さらに、最近のリカレントベースのトランスフォーマーモデルの設計は、我々の「完全性」の概念によって、その計算能力に焦点を絞ったものである。これを通じて、ニューラルモデルアーキテクチャに関する洞察を提供し、より良いモデル設計を促進することを目指している。 The Transformer architecture excels in a variety of language modeling tasks, outperforming traditional neural architectures such as RNN and LSTM. This is partially due to its elimination of recurrent connections, which allows for parallel training and a smoother flow of gradients. However, this move away from recurrent structures places the Transformer model at the lower end of Chomsky's computational hierarchy, imposing limitations on its computational abilities. Consequently, even advanced Transformer-based models face considerable difficulties in tasks like counting, string reversal, and multiplication. These tasks, though seemingly elementary, require a level of computational complexity that exceeds the capabilities of the Transformer architecture. 
Concurrently, the emergence of "Chain of Thought" (CoT) prompting has enabled Transformer-based language models to tackle tasks that were previously impossible or poorly executed. In this work, we thoroughly investigate the influence of recurrent structures in neural models on their reasoning abilities and computability, contrasting the role autoregression plays in the neural models' computational power. We then shed light on how the CoT approach can mimic recurrent computation and act as a bridge between autoregression and recurrence in the context of language models. It is this approximated recurrence that notably improves the model's performance and computational capacity. Moreover, we revisit recent recurrent-based Transformer model designs, focusing on their computational abilities through our proposed concept of "recurrence-completeness" and identify key theoretical limitations in models like Linear Transformer and RWKV. Through this, we aim to provide insight into the neural model architectures and prompt better model design.	翻訳日:2024-09-24 11:55:37 公開日:2024-09-20
# Pennsieve: 翻訳神経科学のための共同プラットフォーム Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond ( http://arxiv.org/abs/2409.10509v2 ) ライセンス: Link先を確認	Zack Goldblum, Zhongchuan Xu, Haoer Shi, Patryk Orzechowski, Jamaal Spence, Kathryn A Davis, Brian Litt, Nishant Sinha, Joost Wagenaar,	(参考訳) 神経科学データの指数的成長は、データ管理と多分野連携を促進するプラットフォームを必要とする。本稿では,これらのニーズを満たすために構築された,オープンソースのクラウドベースの科学データ管理プラットフォームであるPennsieveを紹介する。 Pennsieveは複雑なマルチモーダルデータセットをサポートし、データの視覚化と分析のためのツールを提供する。データ統合には包括的なアプローチを採用しており、研究者はカスタムメタデータスキーマを定義し、高度なツールを使用してデータをフィルタリングしクエリすることができる。 Pennsieveのモジュラーアーキテクチャにより、外部アプリケーションがその機能を拡張することができ、ピアレビューされたデータパブリッシングメカニズムとの協調ワークスペースは、クラウドとオンプレミスの両方で、ダウンストリーム分析に最適化された高品質なデータセットを促進する。 Pennsieveは、NIH SPARC Initiative、NIH HEAL InitiativeのPrecision Human Pain Network、NIH HEAL RE-JOIN Initiativeなどの主要な神経科学研究プログラムのコアを形成している。世界中の80以上の研究グループと、ペンシルバニア大学を通じて臨床現場で大規模な施設間プロジェクトを行っている。 SPARC.Science、Epilepsy.Science、およびPennsieve Discoverポータルを基盤として、Pennsieveは125TB以上の科学データを格納し、350以上のハイインパクトデータセットで35TB以上のデータを公開している。データ共有の発見可能で、アクセス可能で、相互運用可能で、再利用可能な(FAIR)原則に準拠しており、NIHが承認したデータリポジトリの1つとして認識されている。科学データ管理、発見、分析を容易にすることで、ペンシーブは神経科学などのための堅牢で協力的な研究エコシステムを育てている。 The exponential growth of neuroscientific data necessitates platforms that facilitate data management and multidisciplinary collaboration. In this paper, we introduce Pennsieve - an open-source, cloud-based scientific data management platform built to meet these needs. Pennsieve supports complex multimodal datasets and provides tools for data visualization and analyses. It takes a comprehensive approach to data integration, enabling researchers to define custom metadata schemas and utilize advanced tools to filter and query their data. Pennsieve's modular architecture allows external applications to extend its capabilities, and collaborative workspaces with peer-reviewed data publishing mechanisms promote high-quality datasets optimized for downstream analysis, both in the cloud and on-premises. 
Pennsieve forms the core for major neuroscience research programs including NIH SPARC Initiative, NIH HEAL Initiative's PRECISION Human Pain Network, and NIH HEAL RE-JOIN Initiative. It serves more than 80 research groups worldwide, along with several large-scale, inter-institutional projects at clinical sites through the University of Pennsylvania. Underpinning the SPARC.Science, Epilepsy.Science, and Pennsieve Discover portals, Pennsieve stores over 125 TB of scientific data, with 35 TB of data publicly available across more than 350 high-impact datasets. It adheres to the findable, accessible, interoperable, and reusable (FAIR) principles of data sharing and is recognized as one of the NIH-approved Data Repositories. By facilitating scientific data management, discovery, and analysis, Pennsieve fosters a robust and collaborative research ecosystem for neuroscience and beyond.	翻訳日:2024-09-24 11:55:37 公開日:2024-09-20
# CoMamba: 状態空間モデルで実現するリアルタイム協調認識 CoMamba: Real-time Cooperative Perception Unlocked with State Space Models ( http://arxiv.org/abs/2409.10699v2 ) ライセンス: Link先を確認	Jinlong Li, Xinyu Liu, Baolu Li, Runsheng Xu, Jiachen Li, Hongkai Yu, Zhengzhong Tu,	(参考訳) 協調認識システムは、車両自律の安全性と効率を高める上で重要な役割を担っている。近年の研究では、自動運転におけるV2X(vehicle-to-everything)通信技術の有効性が強調されているが、重要な課題は、車やインフラなどの接続エージェントの拡大するネットワークをまたいで、複数の高帯域特徴を効率的に統合する方法である。本稿では,リアルタイム車載認識に状態空間モデルを活用することを目的とした,新しい協調型3D検出フレームワークであるCoMambaを紹介する。従来の最先端トランスフォーマーベースモデルと比較して、CoMambaは双方向状態空間モデルを用いたよりスケーラブルな3Dモデルであり、注意機構の2次複雑性という問題点を回避している。 V2X/V2Vデータセットの広範な実験を通じて、CoMambaは、リアルタイム処理能力を維持しながら、既存の方法よりも優れたパフォーマンスを実現している。提案手法は,物体検出精度を向上するだけでなく,処理時間を大幅に短縮し,知的輸送ネットワークにおける次世代協調認識システムに有望なソリューションとなる。 Cooperative perception systems play a vital role in enhancing the safety and efficiency of vehicular autonomy. Although recent studies have highlighted the efficacy of vehicle-to-everything (V2X) communication techniques in autonomous driving, a significant challenge persists: how to efficiently integrate multiple high-bandwidth features across an expanding network of connected agents such as vehicles and infrastructure. In this paper, we introduce CoMamba, a novel cooperative 3D detection framework designed to leverage state-space models for real-time onboard vehicle perception. Compared to prior state-of-the-art transformer-based models, CoMamba is a more scalable 3D model that uses bidirectional state space models, bypassing the quadratic complexity of attention mechanisms. Through extensive experimentation on V2X/V2V datasets, CoMamba achieves superior performance compared to existing methods while maintaining real-time processing capabilities. The proposed framework not only enhances object detection accuracy but also significantly reduces processing time, making it a promising solution for next-generation cooperative perception systems in intelligent transportation networks.	翻訳日:2024-09-24 11:55:37 公開日:2024-09-20
# 情報検索景観の探索:新しい評価手法と比較文書分割手法の検討 Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods ( http://arxiv.org/abs/2409.08479v2 ) ライセンス: Link先を確認	Esmaeil Narimissa, David Raithel,	(参考訳) 情報検索における検索・拡張生成(RAG)システムの性能は,処理中の文書の特徴に大きく影響される。本研究では, 教科書の構造的性質, 記事の簡潔さ, 小説の物語的複雑さについて, 明確な検索戦略が必要であることを示した。複数の文書分割手法の比較評価により,再帰的文字分割法は文脈整合性を保つ上で,トークンベースの分割法よりも優れていることが明らかになった。オープンソースのモデルを用いて、質問と回答のペアの包括的なデータセットを生成し、現実的な予測シナリオをシミュレートして、テスト効率とメートル法信頼性を向上させる、新しい評価手法が導入された。評価には、SequenceMatcher、BLEU、METEOR、BERT Scoreなどの重み付けされたスコアを使用して、システムの正確性と妥当性を評価する。このアプローチは、RAGシステムの精度を評価するための洗練された標準を確立し、今後の研究は、チャンクとオーバーラップサイズを最適化し、精度と効率を改善することに注力する。 The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple document-splitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance. This approach establishes a refined standard for evaluating the precision of RAG systems, with future research focusing on optimizing chunk and overlap sizes to improve retrieval accuracy and efficiency.	翻訳日:2024-09-23 20:14:44 公開日:2024-09-20
# Fourier Kolmogorov-Arnold ネットワークによる入射神経表現 Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks ( http://arxiv.org/abs/2409.09323v2 ) ライセンス: Link先を確認	Ali Mehrabian, Parsa Mojarad Adi, Moein Heidari, Ilker Hacihaliloglu,	(参考訳) 入射神経表現(INR)は、少数のパラメータを持つ複雑な信号の連続的かつ解像度に依存しない表現を提供するためにニューラルネットワークを使用する。しかし、既存のINRモデルは、各タスク固有の重要な周波数成分をキャプチャできないことが多い。本稿では,INRに対するフーリエ・コルモゴロフ・アーノルドネットワーク(FKAN)を提案する。提案したFKANは、第1層のフーリエ級数としてモデル化された学習可能なアクティベーション関数を用いて、タスク固有の周波数成分を効果的に制御し、学習する。さらに、学習可能なフーリエ係数を持つアクティベーション関数により、ネットワークの複雑なパターンや詳細をキャプチャする能力が向上し、高解像度で高次元のデータに有用である。実験結果から,提案したFKANモデルは3つの最先端ベースライン方式より優れており,画像表現タスクのピーク信号対雑音比(PSNR)と構造類似度指数(SSIM)と,3次元占有量表現タスクの結合(IoU)とがそれぞれ向上していることがわかった。 Implicit neural representations (INRs) use neural networks to provide continuous and resolution-independent representations of complex signals with a small number of parameters. However, existing INR models often fail to capture important frequency components specific to each task. To address this issue, in this paper, we propose a Fourier Kolmogorov Arnold network (FKAN) for INRs. The proposed FKAN utilizes learnable activation functions modeled as Fourier series in the first layer to effectively control and learn the task-specific frequency components. In addition, the activation functions with learnable Fourier coefficients improve the ability of the network to capture complex patterns and details, which is beneficial for high-resolution and high-dimensional data. Experimental results show that our proposed FKAN model outperforms three state-of-the-art baseline schemes, and improves the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) for the image representation task and intersection over union (IoU) for the 3D occupancy volume representation task, respectively.	翻訳日:2024-09-23 20:14:44 公開日:2024-09-20
# 強化学習における自律ゴール検出とセッセーション:音源推定を事例として Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation ( http://arxiv.org/abs/2409.09541v2 ) ライセンス: Link先を確認	Yiwei Shi, Muning Wen, Qi Zhang, Weinan Zhang, Cunjia Liu, Weiru Liu,	(参考訳) 強化学習は動的環境における意思決定プロセスに革命をもたらしたが、明確なフィードバック信号なしで目標を自律的に検出し達成することに苦慮することが多い。例えば、ソース条件推定問題では、正確な環境情報がないため、明確なフィードバック信号を提供し、ソースの位置がどのように決定されるかを定義し、評価することは困難である。この課題に対処するため,自律目標検出・シースレーション(AGDC)モジュールが開発され,自律目標検出とタスク完了時の停止のための自己フィードバック機構を組み込むことで,様々なRLアルゴリズムが強化された。提案手法は,エージェントの信念を近似することにより,未定義の目標を効果的に識別・停止し,限られたフィードバックでRLアルゴリズムの能力を大幅に向上させる。提案手法の有効性を検証するため,AGDCを深部Q-Network,近性ポリシー最適化,深部決定性ポリシー勾配アルゴリズムと統合し,その性能評価を行った。実験の結果, AGDCによるRLアルゴリズムは, インフォタキシー, エントロキシー, 二重制御などの従来の統計手法や, 非統計的ランダムな行動選択法よりも優れていた。これらの改善は成功率、平均走行距離、探索時間の観点から明らかであり、複雑な実世界のシナリオにおけるAGDCの有効性と効率を強調した。 Reinforcement Learning has revolutionized decision-making processes in dynamic environments, yet it often struggles with autonomously detecting and achieving goals without clear feedback signals. For example, in a Source Term Estimation problem, the lack of precise environmental information makes it challenging to provide clear feedback signals and to define and evaluate how the source's location is determined. To address this challenge, the Autonomous Goal Detection and Cessation (AGDC) module was developed, enhancing various RL algorithms by incorporating a self-feedback mechanism for autonomous goal detection and cessation upon task completion. Our method effectively identifies and ceases undefined goals by approximating the agent's belief, significantly enhancing the capabilities of RL algorithms in environments with limited feedback. To validate effectiveness of our approach, we integrated AGDC with deep Q-Network, proximal policy optimization, and deep deterministic policy gradient algorithms, and evaluated its performance on the Source Term Estimation problem. 
The experimental results showed that AGDC-enhanced RL algorithms significantly outperformed traditional statistical methods such as infotaxis, entrotaxis, and dual control for exploitation and exploration, as well as a non-statistical random action selection method. These improvements were evident in terms of success rate, mean traveled distance, and search time, highlighting AGDC's effectiveness and efficiency in complex, real-world scenarios.	翻訳日:2024-09-23 20:14:44 公開日:2024-09-20
# TG-LLaVA:学習可能な潜伏埋め込みによるテキストガイドLLaVA TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings ( http://arxiv.org/abs/2409.09564v2 ) ライセンス: Link先を確認	Dawei Yan, Pengcheng Li, Yang Li, Hao Chen, Qingguo Chen, Weihua Luo, Wei Dong, Qingsen Yan, Haokui Zhang, Chunhua Shen,	(参考訳) 現在、視覚言語モデル(VLM)の成功に触発されて、多くの研究者がVLMの改善に注力し、有望な成果を上げている。しかし、既存のほとんどのメソッドはコネクタの最適化と言語モデルコンポーネントの強化に集中しており、ビジョンエンコーダ自体の改善は無視している。対照的に、本論文では、視覚エンコーダをテキストで導くことでVLMを最適化し、新しい直交最適化方向を提供するテキストガイド付きLLaVA(TG-LLaVA)を提案する。具体的には、人間の行動に固有の目的駆動論理にインスパイアされ、学習可能な潜伏埋め込みをブリッジとして使用し、テキストの指示を分析し、分析結果を視覚エンコーダにガイダンスとして付加し、それを精製する。その後、別の潜伏埋め込みセットは、高解像度ローカルパッチから追加の詳細なテキスト誘導情報を補助情報として抽出する。最後に、テキストのガイダンスによって、視覚エンコーダは、人間が質問を考えるとき、画像の最も関連性の高い部分に集中する方法と同様に、テキスト関連の特徴を抽出することができる。その結果、より良い回答が得られます。提案手法の有効性を検証した各種データセットの実験を行った。注目すべきは、追加のトレーニングデータを必要とせずに、提案手法は、他の並行メソッドと比較して、ベースライン(LLaVA-1.5)により多くの利益をもたらすことができることだ。さらに,提案手法は異なる設定で常に改善をもたらす。 Currently, inspired by the success of vision-language models (VLMs), an increasing number of researchers are focusing on improving VLMs and have achieved promising results. However, most existing methods concentrate on optimizing the connector and enhancing the language model component, while neglecting improvements to the vision encoder itself. In contrast, we propose Text Guided LLaVA (TG-LLaVA) in this paper, which optimizes VLMs by guiding the vision encoder with text, offering a new and orthogonal optimization direction. Specifically, inspired by the purpose-driven logic inherent in human behavior, we use learnable latent embeddings as a bridge to analyze textual instruction and add the analysis results to the vision encoder as guidance, refining it. Subsequently, another set of latent embeddings extracts additional detailed text-guided information from high-resolution local patches as auxiliary information. 
Finally, with the guidance of text, the vision encoder can extract text-related features, similar to how humans focus on the most relevant parts of an image when considering a question. This results in generating better answers. Experiments on various datasets validate the effectiveness of the proposed method. Remarkably, without the need for additional training data, our proposed method can bring more benefits to the baseline (LLaVA-1.5) compared with other concurrent methods. Furthermore, the proposed method consistently brings improvement in different settings.	翻訳日:2024-09-23 20:14:44 公開日:2024-09-20
# 音素セグメンテーションのための自己教師付き表現を用いた単純なHMM A Simple HMM with Self-Supervised Representations for Phone Segmentation ( http://arxiv.org/abs/2409.09646v2 ) ライセンス: Link先を確認	Gene-Ping Yang, Hao Tang,	(参考訳) 近年の自己教師型表現の進歩にもかかわらず、教師なし音素セグメンテーションは依然として困難である。殆どのアプローチは、自己教師付き学習による音声表現の改善に重点を置いており、その改善が音素セグメンテーションに移行できることを期待している。本稿では,近年のアプローチとは対照的に,メルスペクトログラムのピーク検出は,多くの自己教師付き手法よりも強いベースラインであることを示す。そこで本研究では,音素セグメンテーションのために,自己教師付き表現と境界における特徴を用いた単純な隠れマルコフモデルを提案する。提案手法は従来手法よりも一貫した改善を示し、一般化された定式化により柔軟な設計の適応が可能になる。 Despite the recent advances in self-supervised representations, unsupervised phonetic segmentation remains challenging. Most approaches focus on improving phonetic representations with self-supervised learning, with the hope that the improvement can transfer to phonetic segmentation. In this paper, contrary to recent approaches, we show that peak detection on Mel spectrograms is a strong baseline, better than many self-supervised approaches. Based on this finding, we propose a simple hidden Markov model that uses self-supervised representations and features at the boundaries for phone segmentation. Our results demonstrate consistent improvements over previous approaches, with a generalized formulation allowing versatile design adaptations.	翻訳日:2024-09-23 20:14:44 公開日:2024-09-20
# 高分子ブレンドから得られた原子間力顕微鏡(AFM)画像解析のための機械学習 Machine Learning for Analyzing Atomic Force Microscopy (AFM) Images Generated from Polymer Blends ( http://arxiv.org/abs/2409.11438v2 ) ライセンス: Link先を確認	Aanish Paruchuri, Yunfei Wang, Xiaodan Gu, Arthi Jayaraman,	(参考訳) 本稿では,高分子膜から得られた原子間力顕微鏡画像内の領域を特定するために,教師なし学習技術を用いた新しい機械学習ワークフローを提案する。このワークフローの目的は、2種類のポリマードメインの空間的位置を手動介入をほとんど行わずに同定し、ドメインサイズ分布を計算し、その結果、材料の相分離状態をマクロ相またはミクロ相秩序または乱れ領域として評価することである。高分子科学や工学の分野で頻繁に発生する上記の課題に応用可能な,コンピュータビジョンや信号処理など,他の分野で使用されている既存のアプローチを概観する。次に、コンピュータビジョンと信号処理のこれらのアプローチをAFM画像データセット上で検証し、最初のタスクにおける各アプローチの長所と短所を特定する。最初のドメインセグメンテーションタスクでは、分散統計を特徴量として、離散フーリエ変換または離散コサイン変換を用いたワークフローが最適であることがわかった。コンピュータビジョン分野からのResNet50のディープラーニングアプローチは、DFTやDCTベースのワークフローと比較して、AFM画像の領域分割タスクにおいて、比較的低い性能を示した。第2のタスクでは、144個の入力AFM画像のそれぞれに対して、既存のporespy Pythonパッケージを使用して、DFTベースのワークフローによるその画像の出力から領域サイズ分布を計算する。本稿で共有する情報とオープンソースコードは、結晶またはアモルファス領域、ドメイン間の鋭い界面や粗い界面、ミクロまたはマクロ相分離領域を持ちうるポリマー試料のAFM画像を自動解析するためのMLモデリングとワークフローを必要とする、高分子・軟質材料分野の研究者のためのガイドとして機能する。 In this paper we present a new machine learning workflow with unsupervised learning techniques to identify domains within atomic force microscopy images obtained from polymer films. The goal of the workflow is to identify the spatial location of the two types of polymer domains with little to no manual intervention and calculate the domain size distributions which in turn can help qualify the phase separated state of the material as macrophase or microphase ordered or disordered domains. We briefly review existing approaches used in other fields, such as computer vision and signal processing, that are applicable to the above tasks, which arise frequently in the field of polymer science and engineering. We then test these approaches from computer vision and signal processing on the AFM image dataset to identify the strengths and limitations of each of these approaches for our first task. 
For our first domain segmentation task, we found that the workflow using discrete Fourier transform or discrete cosine transform with variance statistics as the feature works the best. The popular ResNet50 deep learning approach from the computer vision field exhibited relatively poorer performance in the domain segmentation task for our AFM images as compared to the DFT- and DCT-based workflows. For the second task, for each of 144 input AFM images, we then used the existing porespy Python package to calculate the domain size distribution from the output of that image from the DFT-based workflow. The information and open source codes we share in this paper can serve as a guide for researchers in the polymer and soft materials fields who need ML modeling and workflows for automated analyses of AFM images from polymer samples that may have crystalline or amorphous domains, sharp or rough interfaces between domains, or micro or macrophase separated domains.	翻訳日:2024-09-23 20:14:44 公開日:2024-09-20
# TPFL:信頼度に基づくクラスタリングによるTsetlin-Personalized Federated Learning TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering ( http://arxiv.org/abs/2409.10392v2 ) ライセンス: Link先を確認	Rasoul Jafari Gohari, Laya Aliahmadipour, Ezat Valipour,	(参考訳) 機械学習(ML)の世界は、新しいモデルやユーザデータを処理する方法に関して、急速に変化している。これまで行われてきた研究の大部分は、ディープラーニング(DL)ベースのアプローチに重点を置いている。しかしながら、Tsetlin Machine (TM)アルゴリズムのような新しいアルゴリズムが出現するにつれて、特定のドメインやアプリケーションに固有の利点をもたらす可能性のある代替アプローチを模索することへの関心が高まっている。これらのドメインのひとつがフェデレートラーニング(FL)であり、ユーザのプライバシが最も重要である。その斬新さのため、FLではパーソナライズ技術の導入が急増しており、パーソナライズされた条件下でユーザーのプライバシーを維持しながらモデルの精度を向上させている。本研究では,TPFL (Tsetlin-Personalized Federated Learning) と呼ばれる新しい手法を提案する。この手法では,モデルが特定のクラスに対する信頼度に基づいてクラスタにグループ化される。このようにして、クラスタリングは2つの大きな利点の恩恵を受けることができる。第一に、クライアントは自信のあるものしか共有しないため、トレーニング中に特定のクラスのデータが不十分であった可能性があるクライアントの間で、誤った重み集約が排除される。この現象は、データが独立同分布でない(非IID)ときによく発生する。第二に、特定のクラスに対する重みのみを共有することにより、通信コストが大幅に削減され、TPFLは精度と通信コストの両面で効率的になる。 TPFLの結果は,MNIST,FashionMNIST,FEMNISTの3つのデータセットに対して最も高い精度を示した。 The world of Machine Learning (ML) has witnessed rapid changes in terms of new models and ways to process users' data. The majority of work that has been done is focused on Deep Learning (DL) based approaches. However, with the emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there is growing interest in exploring alternative approaches that may offer unique advantages in certain domains or applications. One of these domains is Federated Learning (FL), in which user privacy is of utmost importance. Due to its novelty, FL has seen a surge in the incorporation of personalization techniques to enhance model accuracy while maintaining user privacy under personalized conditions. In this work, we propose a novel approach dubbed TPFL: Tsetlin-Personalized Federated Learning, in which models are grouped into clusters based on their confidence towards a specific class. In this way, clustering can benefit from two key advantages. 
Firstly, clients share only what they are confident about, resulting in the elimination of wrongful weight aggregation among clients whose data for a specific class may not have been sufficient during training. This phenomenon is prevalent when the data are non-Independent and Identically Distributed (non-IID). Secondly, by sharing only weights towards a specific class, communication cost is substantially reduced, making TPFL efficient in terms of both accuracy and communication cost. The results of TPFL demonstrated the highest accuracy on three different datasets, namely MNIST, FashionMNIST, and FEMNIST.	翻訳日:2024-09-23 13:03:06 公開日:2024-09-20
# Contextual Breach: トランスフォーマーベースのQAモデルのロバスト性を評価する Contextual Breach: Assessing the Robustness of Transformer-based QA Models ( http://arxiv.org/abs/2409.10997v3 ) ライセンス: Link先を確認	Asir Saadat, Nahian Ibn Asad, Md Farhan Ishmam,	(参考訳) 文脈質問応答モデルは、現実のシナリオでよく見られる、入力コンテキストに対する敵対的摂動の影響を受けやすい。これらの敵対的ノイズは、テキスト入力を歪ませることで、モデルの性能を劣化させるように設計されている。我々は,SQuADデータセット上で、7種類の異なる敵対的ノイズをそれぞれ5つの異なる強度レベルで文脈に組み込んだ独自のデータセットを提案する。このロバスト性を定量化するために、様々なノイズタイプやレベルにわたってモデル性能を評価するための標準化された尺度を提供するロバストネス指標を利用する。トランスフォーマーに基づく質問応答モデルの実験は、ロバスト性の脆弱性と、現実的なテキスト入力におけるモデルの性能に関する重要な洞察を明らかにしている。 Contextual question-answering models are susceptible to adversarial perturbations to input context, commonly observed in real-world scenarios. These adversarial noises are designed to degrade the performance of the model by distorting the textual input. We introduce a unique dataset that incorporates seven distinct types of adversarial noise into the context, each applied at five different intensity levels on the SQuAD dataset. To quantify the robustness, we utilize robustness metrics providing a standardized measure for assessing model performance across varying noise types and levels. Experiments on transformer-based question-answering models reveal robustness vulnerabilities and important insights into the model's performance in realistic textual input.	翻訳日:2024-09-23 13:03:06 公開日:2024-09-20
# 大規模言語モデルのプロンプト難読化 Prompt Obfuscation for Large Language Models ( http://arxiv.org/abs/2409.11026v2 ) ライセンス: Link先を確認	David Pape, Thorsten Eisenhofer, Lea Schönherr,	(参考訳) 基盤となる大きな言語モデル(LLM)によって実行されるタスクを記述するための詳細な命令を含むシステムプロンプトは、基礎モデルを最小限のオーバーヘッドでツールやサービスに簡単に変換できる。ユーティリティに重大な影響を与えるため、ソフトウェア製品のコードと同様、知的財産と見なされることが多い。しかし、プロンプトインジェクションを用いることで、システムプロンプトを容易に抽出することができる。現在、システムプロンプトの盗難を防ぐための効果的な対策はなく、すべての保護機構をバイパスする、慎重に作られたプロンプトインジェクションによって、すべての安全対策を回避できる。本研究では,従来のシステムプロンプトの代替案を提案する。本稿では,システム自体の実用性をほとんどオーバーヘッドなく維持しながら,システムプロンプトの抽出を防止するために,プロンプト難読化を導入する。中心となる考え方は、同じ機能につながる元のシステムプロンプトの表現を見つけることであるが、難読化されたシステムプロンプトには、元のシステムプロンプトに関する結論を導き出すための情報が含まれていない。機能を維持しながら難読化されたプロンプト表現を見つけるために最適化に基づく手法を実装した。提案手法を評価するために,元のシステムプロンプトと難読化システムプロンプトを用いてシステムの性能を比較するため,8種類のメトリクスを調査し,難読化バージョンが元のシステムと常に同等であることを示す。さらに3つの異なる難読化解除攻撃を行い、難読化プロンプトとLLM自体にアクセスしても、一貫して意味のある情報を抽出することはできないことを示す。全体として,プロンプト難読化は、元のシステムプロンプトと同一の実用性を維持しつつ、知的財産の保護に有効な方法となりうることを示した。 System prompts that include detailed instructions to describe the task performed by the underlying large language model (LLM) can easily transform foundation models into tools and services with minimal overhead. Because of their crucial impact on the utility, they are often considered intellectual property, similar to the code of a software product. However, extracting system prompts is easily possible by using prompt injection. As of today, there is no effective countermeasure to prevent the stealing of system prompts and all safeguarding efforts could be evaded with carefully crafted prompt injections that bypass all protection mechanisms. In this work, we propose an alternative to conventional system prompts. We introduce prompt obfuscation to prevent the extraction of the system prompt while maintaining the utility of the system itself with only little overhead. 
The core idea is to find a representation of the original system prompt that leads to the same functionality, while the obfuscated system prompt does not contain any information that allows conclusions to be drawn about the original system prompt. We implement an optimization-based method to find an obfuscated prompt representation while maintaining the functionality. To evaluate our approach, we investigate eight different metrics to compare the performance of a system using the original and the obfuscated system prompts, and we show that the obfuscated version is constantly on par with the original one. We further perform three different deobfuscation attacks and show that with access to the obfuscated prompt and the LLM itself, we are not able to consistently extract meaningful information. Overall, we showed that prompt obfuscation can be an effective method to protect intellectual property while maintaining the same utility as the original system prompt.	翻訳日:2024-09-23 13:03:06 公開日:2024-09-20
# Parquetデータセットフォーマットと回帰モデルの混合精度トレーニングによる機械学習カーボンフットプリントの改善 -その2- Improve Machine Learning carbon footprint using Parquet dataset format and Mixed Precision training for regression models -- Part II ( http://arxiv.org/abs/2409.11071v2 ) ライセンス: Link先を確認	Andrew Antonopoulos,	(参考訳) これは私の修士論文の2番目の部分であり、回帰MLモデルをトレーニングしながら、Comma-Separated-Values(CSV)とparquetデータセットフォーマットをデフォルトの浮動小数点(32bit)とNvidia混合精度(16bit、32bit)と比較します。分類テストと分析に特化した第1部と同じカスタムPCが実験のために構築され、バッチサイズ、ニューロン、エポックなどの異なるMLハイパーパラメータがDeep Neural Networks (DNN)を構築するために選ばれた。 DNNのデフォルトのハイパーパラメータ値によるベンチマークテストが参照として使用され、実験では異なる設定の組み合わせが使用された。結果はExcelに記録され、グループ間の平均値を計算し、グラフとテーブルを用いて比較するために記述統計が選択された。その結果, 混合精度と特定のハイパーパラメータを併用した場合, 有意差は認められなかった。ベンチマークと比較すると、回帰モデルの最適化は7ワットから11ワットまでの消費電力を削減した。その結果,混合精度は消費電力の向上に寄与するが,過度パラメータを慎重に検討する必要があることがわかった。多数のバッチサイズとニューロンが電力消費に悪影響を及ぼす。しかし,本研究では,ANOVAとTテストの関係を比較するために,推論統計(特にANOVAとTテスト)を必要とした。その結果, 回帰試験における平均値と受理H0との間に有意な有意な有意差は認められなかった。したがって、異なるML技術とParquetデータセットフォーマットを選択することで、計算消費電力と全体のML炭素フットプリントを改善することはできない。しかし、GPUのクラスタによるより広範な実装は、本質的な要因であり、統計分析の結果を変える可能性があるため、サンプルサイズを著しく増大させることができる。 This is the 2nd part of the dissertation for my master degree and compared the power consumption using the Comma-Separated-Values (CSV) and parquet dataset format with the default floating point (32bit) and Nvidia mixed precision (16bit and 32bit) while training a regression ML model. The same custom PC as per the 1st part, which was dedicated to the classification testing and analysis, was built to perform the experiments, and different ML hyper-parameters, such as batch size, neurons, and epochs, were chosen to build Deep Neural Networks (DNN). A benchmarking test with default hyper-parameter values for the DNN was used as a reference, while the experiments used a combination of different settings. The results were recorded in Excel, and descriptive statistics were chosen to calculate the mean between the groups and compare them using graphs and tables. 
The outcome was positive when using mixed precision combined with specific hyper-parameters. Compared to the benchmarking, optimising the regression models reduced the power consumption between 7 and 11 Watts. The regression results show that while mixed precision can help improve power consumption, we must carefully consider the hyper-parameters. A high number of batch sizes and neurons will negatively affect power consumption. However, this research required inferential statistics, specifically ANOVA and T-test, to compare the relationship between the means. The results reported no statistical significance between the means in the regression tests and accepted H0. Therefore, choosing different ML techniques and the Parquet dataset format will not improve the computational power consumption and the overall ML carbon footprint. However, a more extensive implementation with a cluster of GPUs can increase the sample size significantly, as it is an essential factor and can change the outcome of the statistical analysis.	翻訳日:2024-09-23 13:03:05 公開日:2024-09-20
# RoMath: ルーマニアの数学的推論ベンチマーク RoMath: A Mathematical Reasoning Benchmark in Romanian ( http://arxiv.org/abs/2409.11074v2 ) ライセンス: Link先を確認	Adrian Cosma, Ana-Maria Bucur, Emilian Radoi,	(参考訳) 数学は、主に人間の理解のために、長い間自然言語を通して伝えられてきた。機械化数学と証明アシスタントの台頭により、非公式な数学的テキストを理解する必要性が高まっているが、既存のベンチマークのほとんどは英語のみに焦点を絞っており、他の言語を見下ろしている。本稿では,RoMath-Baccalaureate,RoMath-Competitions,RoMath-Syntheticの3つのデータセットからなるルーマニアの数学的推論ベンチマークスイートであるRoMathを紹介する。独特な言語的特徴を持つ低リソース言語であるルーマニア語に焦点を当てることで、RoMathはアングロ中心モデルの限界に対処し、単純な自動翻訳以上の専門的なリソースの必要性を強調している。いくつかのオープンウェイト言語モデルをベンチマークし、表現不足言語のためのリソースを作成することの重要性を強調した。コードとデータセットを利用可能にしています。 Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there is a growing need to understand informal mathematical text, yet most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Baccalaureate, RoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several open-weight language models, highlighting the importance of creating resources for underrepresented languages. We make the code and dataset available.	翻訳日:2024-09-23 13:03:05 公開日:2024-09-20
# 分布外インテント検出のためのダイバーシティグラウンドチャネルプロトタイプ学習 Diversity-grounded Channel Prototypical Learning for Out-of-Distribution Intent Detection ( http://arxiv.org/abs/2409.11114v2 ) ライセンス: Link先を確認	Bo Liu, Liming Zhan, Yujie Feng, Zexin Lu, Chengqiang Xie, Lei Xue, Albert Y. S. Lam, Xiao-Ming Wu,	(参考訳) タスク指向対話システムでは、実世界のシナリオで発生する不正な発話を効果的に処理する必要がある。本研究は, 大規模言語モデル(LLM)のための新たな微調整フレームワークを提案する。IDクラス名から派生したプロトタイプとのセマンティックマッチングを利用する, 内分布(ID)意図分類と外分布(OOD)意図検出を強化することを目的とした。 LLMの高度に区別可能な表現を利用することで、ダイバーシティグラウンドのプロンプトチューニングアプローチを用いて、各IDクラスのセマンティックプロトタイプを構築する。私たちは、IDクラスとOODクラスがセマンティックに近接しているが区別されていない、難易度の高いOODコンテキストで、我々のフレームワークを厳格にテストします。徹底的な評価のために,本手法を一般的な微調整手法と比較した。実験結果から,本手法は,少数ショットID意図分類と近OOD意図検出の両タスクにおいて,優れた性能を示すことがわかった。 In the realm of task-oriented dialogue systems, a robust intent detection mechanism must effectively handle malformed utterances encountered in real-world scenarios. This study presents a novel fine-tuning framework for large language models (LLMs) aimed at enhancing in-distribution (ID) intent classification and out-of-distribution (OOD) intent detection, which utilizes semantic matching with prototypes derived from ID class names. By harnessing the highly distinguishable representations of LLMs, we construct semantic prototypes for each ID class using a diversity-grounded prompt tuning approach. We rigorously test our framework in a challenging OOD context, where ID and OOD classes are semantically close yet distinct, referred to as \emph{near} OOD detection. For a thorough assessment, we benchmark our method against the prevalent fine-tuning approaches. The experimental findings reveal that our method demonstrates superior performance in both few-shot ID intent classification and near-OOD intent detection tasks.	翻訳日:2024-09-23 13:03:05 公開日:2024-09-20

# 代数的暗号解析における形式的冪級数

Formal Power Series on Algebraic Cryptanalysis ( http://arxiv.org/abs/2007.14729v3 )

ライセンス: Link先を確認

Shuhei Nakamura,

(参考訳) 暗号系を多項式方程式系の求解に帰着させる攻撃の計算量評価において、正則次数 (degree of regularity) と first fall degree の上界は、暗号解析でしばしば用いられる。正則次数は半正則性の仮定の下で一変数形式的冪級数を用いて容易に計算できるが、first fall degree の上界を決定するには、入力系の具体的なシジジーを調べる必要がある。本稿では、十分大きな体上の多項式系に対する first fall degree の上界を検討する。この場合、非半正則系の first fall degree は正則次数で上から抑えられ、多重次数付き多項式系の first fall degree は多変数形式的冪級数から定まるある値で上から抑えられることを証明する。さらに、十分大きな体上の多項式系の first fall degree を計算するための理論的な仮定を与える。

In the complexity estimation for an attack that reduces a cryptosystem to solving a system of polynomial equations, the degree of regularity and an upper bound of the first fall degree are often used in cryptanalysis. While the degree of regularity can be easily computed using a univariate formal power series under the semi-regularity assumption, determining an upper bound of the first fall degree requires investigating the concrete syzygies of an input system. In this paper, we investigate an upper bound of the first fall degree for a polynomial system over a sufficiently large field. In this case, we prove that the first fall degree of a non-semi-regular system is bounded above by the degree of regularity, and that the first fall degree of a multi-graded polynomial system is bounded above by a certain value determined from a multivariate formal power series. Moreover, we provide a theoretical assumption for computing the first fall degree of a polynomial system over a sufficiently large field.
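
As a hedged illustration of the univariate computation mentioned in the abstract: under the semi-regularity assumption, the degree of regularity of m homogeneous polynomials of degrees d_1, ..., d_m in n variables is commonly taken to be the index of the first non-positive coefficient of the series ∏(1 - z^{d_i}) / (1 - z)^n. The sketch below truncates that series. The function name, the truncation parameter, and the example systems are assumptions made for illustration and are not taken from the paper.

```python
def degree_of_regularity(n, degrees, max_degree=200):
    """Index of the first non-positive coefficient of
    prod_i (1 - z^d_i) / (1 - z)^n, truncated at max_degree.
    Returns None if no such coefficient appears below the truncation."""
    # Start with the constant series 1.
    coeffs = [0] * (max_degree + 1)
    coeffs[0] = 1
    # Multiply by (1 - z^d) for each equation degree d:
    # new[k] = old[k] - old[k - d].
    for d in degrees:
        coeffs = [c - (coeffs[k - d] if k >= d else 0)
                  for k, c in enumerate(coeffs)]
    # Divide by (1 - z) n times; each division is a running prefix sum.
    for _ in range(n):
        total = 0
        for k in range(max_degree + 1):
            total += coeffs[k]
            coeffs[k] = total
    # Report the first index whose coefficient is zero or negative.
    for k, c in enumerate(coeffs):
        if c <= 0:
            return k
    return None


# Three quadratic equations in three variables (illustrative system).
print(degree_of_regularity(3, [2, 2, 2]))
# Four quadratic equations in three variables (illustrative system).
print(degree_of_regularity(3, [2, 2, 2, 2]))
```

For three quadrics in three variables the series is (1 + z)^3, so the first non-positive coefficient sits at index 4. Adding a fourth quadric gives (1 + z)^4 (1 - z) and lowers the index to 3. Bounding the first fall degree, by contrast, requires the syzygy analysis the paper describes and is not captured by this sketch.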

翻訳日:2024-11-09 15:57:56 公開日:2024-09-20

# 高次元データに関する講義ノート

Lecture notes on high-dimensional data ( http://arxiv.org/abs/2101.05841v7 )

ライセンス: Link先を確認

Sven-Ake Wegner,

(参考訳) 以下は、2019-2020年にイギリスでBScの最終学年の学生に教えた「数学的データサイエンス」の講座の最初の部分に基づく講義ノートである。トピックは、高次元における測度の集中、高次元におけるガウス確率ベクトル、ランダム射影、ガウスデータの分離である。改訂版が教科書 [Mathematical Introduction to Data Science, Springer, Berlin, Heidelberg, 2024, https://link.springer.com/book/10.1007/978-3-662-69426-8] の一部として出版された。

These are lecture notes based on the first part of a course on 'Mathematical Data Science', which I taught to final year BSc students in the UK in 2019-2020. Topics include: concentration of measure in high dimensions; Gaussian random vectors in high dimensions; random projections; separation/disentangling of Gaussian data. A revised version has been published as part of the textbook [Mathematical Introduction to Data Science, Springer, Berlin, Heidelberg, 2024, https://link.springer.com/book/10.1007/978-3-662-69426-8].

翻訳日:2024-11-09 15:57:56 公開日:2024-09-20

# 個人データフローの可視化:Booking.comの事例から

Visualising Personal Data Flows: Insights from a Case Study of Booking.com ( http://arxiv.org/abs/2304.09603v5 )

ライセンス: Link先を確認

Haiyue Yuan, Matthew Boakes, Xiao Ma, Dongmei Cao, Shujun Li,

(参考訳) 商業組織は、絶え間なく増加する個人情報を保持し、処理している。ポリシーや法律は、これらの企業がデータの収集、保管、処理、共有に関してより透明性を持たなければならないように、継続的に変更されている。本稿では、プライバシポリシから抽出した個人データフローを可視化するケーススタディとして、Booking.comを取り上げている。消費者の個人情報の共有方法を示すことによって、私たちは質問を提起し、プライバシポリシを使用してオンラインユーザに対して、個人データフローの真の規模と状況について通知する際の課題と制限に関する議論を拡大します。このケーススタディは、よりデータフロー指向のプライバシポリシ分析に関する今後の研究や、複雑なビジネスエコシステムにおける個人データフローに関するより包括的なオントロジーの構築について教えてくれます。

Commercial organisations are holding and processing an ever-increasing amount of personal data. Policies and laws are continually changing to require these companies to be more transparent regarding the collection, storage, processing and sharing of this data. This paper reports our work of taking Booking.com as a case study to visualise personal data flows extracted from their privacy policy. By showcasing how the company shares its consumers' personal data, we raise questions and extend discussions on the challenges and limitations of using privacy policies to inform online users about the true scale and the landscape of personal data flows. This case study can inform us about future research on more data flow-oriented privacy policy analysis and on the construction of a more comprehensive ontology on personal data flows in complicated business ecosystems.

翻訳日:2024-11-09 15:13:22 公開日:2024-09-20

# ARTICLE: インコンテキスト学習によるアノテータの信頼性

ARTICLE: Annotator Reliability Through In-Context Learning ( http://arxiv.org/abs/2409.12218v2 )

ライセンス: Link先を確認

Sujan Dutta, Deepak Pandita, Tharindu Cyril Weerasooriya, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh,

(参考訳) トレーニングおよび評価データにおけるアノテータの品質を保証することは、NLPにおける機械学習の重要な要素である。感情分析や攻撃的発言の検出といったタスクは本質的に主観的であり、質の低い作業による不一致と、誠実なアノテータ間の意見の相違による不一致とを区別することが難しいため、従来の品質評価アプローチにとって困難な状況を生み出す。一貫性を確保しつつアノテーションにおける多様な視点を増やすことを目的として、自己整合性を通じてアノテーションの品質を推定するインコンテキストラーニング(ICL)フレームワークである \texttt{ARTICLE} を提案する。我々は、複数のLLMを用いて2つの攻撃的発言データセット上でこのフレームワークを評価し、その性能を従来の手法と比較した。以上の結果から、\texttt{ARTICLE} は信頼できるアノテータを識別する堅牢な手法として利用でき、それによりデータ品質を向上できることが示唆された。

Ensuring annotator quality in training and evaluation data is a key piece of machine learning in NLP. Tasks such as sentiment analysis and offensive speech detection are intrinsically subjective, creating a challenging scenario for traditional quality assessment approaches because it is hard to distinguish disagreement due to poor work from that due to differences of opinions between sincere annotators. With the goal of increasing diverse perspectives in annotation while ensuring consistency, we propose \texttt{ARTICLE}, an in-context learning (ICL) framework to estimate annotation quality through self-consistency. We evaluate this framework on two offensive speech datasets using multiple LLMs and compare its performance with traditional methods. Our findings indicate that \texttt{ARTICLE} can be used as a robust method for identifying reliable annotators, hence improving data quality.

翻訳日:2024-11-07 15:49:40 公開日:2024-09-20

# 連続変数を持つフォン・ノイマン型相互作用ハミルトニアンからのスペクトル放送構造

Spectrum Broadcast Structures from von Neumann type interaction Hamiltonians with continuous variables ( http://arxiv.org/abs/2409.12372v2 )

ライセンス: Link先を確認

Alberto Acevedo, Janek Wehr, Jarosław Korbicz,

(参考訳) 本稿では、最近確立されたスペクトル放送構造(SBS)の理論の数学的基礎に貢献する。SBSは多体量子状態であり、客観性の操作的概念を符号化し、より進んだ形のデコヒーレンスを示す。我々は、開放量子系の理論において広く現れるフォン・ノイマン型測定相互作用を介して、中心系がN個の環境と相互作用する場合のSBSと、SBSへの漸近収束について研究する。系が無限次元ヒルベルト空間によってモデル化され、ハミルトニアンにおいて系に付随する作用素が純粋に連続なスペクトルを持つ場合に焦点を当てる。このような設定は、SBSの理論でこれまで扱われてこなかった数学的な複雑さをもたらす。

In this paper, we contribute to the mathematical foundations of the recently established theory of Spectrum Broadcast Structures (SBS). These are multipartite quantum states, encoding an operational notion of objectivity and exhibiting a more advanced form of decoherence. We study SBS and asymptotic convergence to SBS in the case of a central system interacting with N environments via the von Neumann-type measurement interactions, ubiquitous in the theory of open quantum systems. We will be focusing on the case where the system is modeled by an infinite-dimensional Hilbert space and the operators associated with the system in the Hamiltonian have purely continuous spectrum. Such a setup yields mathematical complications that have hitherto not been addressed in the theory of SBS.

翻訳日:2024-11-07 15:14:47 公開日:2024-09-20

# 自動走査透過電子顕微鏡実験における教師なし報酬駆動型画像セグメンテーション

Unsupervised Reward-Driven Image Segmentation in Automated Scanning Transmission Electron Microscopy Experiments ( http://arxiv.org/abs/2409.12462v2 )

ライセンス: Link先を確認

Kamyar Barakati, Utkarsh Pratiush, Austin C. Houston, Gerd Duscher, Sergei V. Kalinin,

(参考訳) 走査透過電子顕微鏡(STEM)における自動実験は、人間の解釈、意思決定、サイト選択分光法、原子操作のためのデータ表現を最適化するために、高速な画像分割を必要とする。現在、セグメンテーションタスクは典型的には、人間のラベル付きデータを必要とし、解像度、サンプリング、ビーム形状の変化に起因する分布外ドリフト効果に敏感な教師付き機械学習手法を用いて実行される。本稿では,STEMにおけるオンザフライ画像解析のための報酬駆動最適化ワークフローの運用とベンチマークを行う。この教師なしのアプローチは、人間のラベルに依存しておらず、完全に説明可能であるため、はるかに堅牢である。説明的フィードバックは、人間が意思決定を検証し、報酬関数のパレートフロンティアに沿って位置を選択することでモデルを調整するのに役立つ。本手法のタイミングと有効性を確立し,高スループットおよび動的自動STEM実験におけるリアルタイム性能を示す。報酬駆動型アプローチは、説明可能な堅牢な分析ワークフローの構築を可能にし、電子顕微鏡や走査型プローブ顕微鏡、化学画像の幅広い画像解析タスクに一般化することができる。

Automated experiments in scanning transmission electron microscopy (STEM) require rapid image segmentation to optimize data representation for human interpretation, decision-making, site-selective spectroscopies, and atomic manipulation. Currently, segmentation tasks are typically performed using supervised machine learning methods, which require human-labeled data and are sensitive to out-of-distribution drift effects caused by changes in resolution, sampling, or beam shape. Here, we operationalize and benchmark a recently proposed reward-driven optimization workflow for on-the-fly image analysis in STEM. This unsupervised approach is much more robust, as it does not rely on human labels and is fully explainable. The explanatory feedback can help the human to verify the decision-making and potentially tune the model by selecting the position along the Pareto frontier of reward functions. We establish the timing and effectiveness of this method, demonstrating its capability for real-time performance in high-throughput and dynamic automated STEM experiments. The reward-driven approach allows the construction of explainable, robust analysis workflows and can be generalized to a broad range of image analysis tasks in electron and scanning probe microscopy and chemical imaging.

翻訳日:2024-11-07 14:41:29 公開日:2024-09-20

# 自己回帰型言語モデルにおける知識蒸留における分布移動の探索と促進

Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models ( http://arxiv.org/abs/2409.12512v2 )

ライセンス: Link先を確認

Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, Dacheng Tao, Min Zhang,

(参考訳) 知識蒸留(KD)は、小さな生徒モデルが大きな教師モデルを模倣するように訓練することで、教師モデルを圧縮する技術である。自己回帰言語モデルにおけるKDの成功は、主に、モード探索のためのReverse KLと、露出バイアスに対処するための生徒生成出力(SGO)に依存している。理論的解析と実験による検証の結果、Reverse KLは教師分布の特定の特徴を効果的に模倣するが、その挙動の大部分は捉えられないことがわかった。一方、SGOは、特に生徒モデルが教師モデルよりもかなり小さい場合に、高い計算コストを伴い、最適化を難しくする。これらの制約は主に、教師モデルの分布が固定されており、さまざまなサイズのモデルに適応的に調整できないことに起因する。そこで、教師ネットワークに小さなオンラインモジュールを組み込み、生徒モデルと同時に訓練するオンライン知識蒸留(OKD)を導入する。この戦略は、オンポリシーサンプリングの必要性をなくし、訓練中に教師のオンラインモジュールのパラメータを最小限更新するだけで済むため、生徒の分布に動的に適応して蒸留を改善できる。複数の生成データセットにわたる広範な結果から、OKDはさまざまなモデルアーキテクチャとサイズにおいて、主要な手法と同等以上の性能を達成し、訓練時間を最大で4分の1に短縮することが示された。

Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them. The success of KD in auto-regressive language models mainly relies on Reverse KL for mode-seeking and student-generated output (SGO) to combat exposure bias. Our theoretical analyses and experimental validation reveal that while Reverse KL effectively mimics certain features of the teacher distribution, it fails to capture most of its behaviors. Conversely, SGO incurs higher computational costs and presents challenges in optimization, particularly when the student model is significantly smaller than the teacher model. These constraints are primarily due to the immutable distribution of the teacher model, which fails to adjust adaptively to models of varying sizes. We introduce Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model. This strategy abolishes the necessity for on-policy sampling and merely requires minimal updates to the parameters of the teacher's online module during training, thereby allowing dynamic adaptation to the student's distribution to make distillation better. Extensive results across multiple generation datasets show that OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
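
To make the "mode-seeking" remark about Reverse KL concrete, here is a minimal sketch comparing forward KL, KL(teacher || student), with reverse KL, KL(student || teacher), on a toy three-token vocabulary. The distributions are hypothetical values picked only for illustration. They are not from the paper, and the sketch does not implement OKD.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical three-token distributions, chosen only for illustration.
teacher = [0.49, 0.02, 0.49]             # two separate modes
student_one_mode = [0.96, 0.02, 0.02]    # commits to a single mode
student_spread = [1 / 3, 1 / 3, 1 / 3]   # spreads mass over every token

for name, student in [("one mode", student_one_mode),
                      ("spread", student_spread)]:
    forward = kl(teacher, student)   # forward KL: KL(teacher || student)
    reverse = kl(student, teacher)   # reverse KL: KL(student || teacher)
    print(f"{name}: forward={forward:.3f} reverse={reverse:.3f}")
```

With these numbers, forward KL is lower for the spread-out student and reverse KL is lower for the student that commits to one mode. That matches the abstract's observation that Reverse KL reproduces certain features of the teacher distribution while missing much of the rest.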

翻訳日:2024-11-07 14:41:29 公開日:2024-09-20

# Michelangelo: 潜在構造クエリによるHaystackを越えた長文脈評価

Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries ( http://arxiv.org/abs/2409.12640v2 )

ライセンス: Link先を確認

Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, Rohan Anil, Ethan Dyer, Siamak Shakeri, Roopali Vij, Harsh Mehta, Vinay Ramasesh, Quoc Le, Ed Chi, Yifeng Lu, Orhan Firat, Angeliki Lazaridou, Jean-Baptiste Lespiau, Nithya Attaluri, Kate Olszewska,

(参考訳) Michelangelo は、大規模言語モデル向けの、最小限で、合成的で、リークしていない長文脈推論評価であり、自動採点も容易である。この評価は、任意の長さのコンテキストに対する評価のための新しい統一的なフレームワークから導かれ、コンテキストから単一の情報を検索する以上のことを行うモデルの能力を測定する。Latent Structure Queries フレームワーク(LSQ)の中心的な考え方は、コンテキスト内の無関係な情報を「削り取り (chisel away)」、コンテキスト内の潜在構造を明らかにすることをモデルに要求するタスクを構築することである。この潜在構造に対するモデルの理解を検証するため、モデルに構造の詳細を問い合わせる。LSQを用いて、コードおよび自然言語のドメインにわたる3つの診断的な長文脈評価を作成し、長文脈言語モデルの能力についてより強いシグナルを提供することを意図している。いくつかの最先端モデルで評価を行い、a) 提案された評価が高シグナルであること、および b) 長文脈情報の統合に大きな改善の余地があることの両方を示す。

We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models which is also easy to automatically score. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts which measure the model's ability to do more than retrieve a single piece of information from its context. The central idea of the Latent Structure Queries framework (LSQ) is to construct tasks which require a model to ``chisel away'' the irrelevant information in the context, revealing a latent structure in the context. To verify a model's understanding of this latent structure, we query the model for details of the structure. Using LSQ, we produce three diagnostic long-context evaluations across code and natural-language domains intended to provide a stronger signal of long-context language model capabilities. We perform evaluations on several state-of-the-art models and demonstrate both that a) the proposed evaluations are high-signal and b) that there is significant room for improvement in synthesizing long-context information.

翻訳日:2024-11-07 14:08:12 公開日:2024-09-20

# PRAGA:空間多モードオミクス解析のためのプロトタイプ対応グラフ適応アグリゲーション

PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis ( http://arxiv.org/abs/2409.12728v2 )

ライセンス: Link先を確認

Xinlei Huang, Zhiqi Ma, Dian Meng, Yanran Liu, Shiwei Ruan, Qingqiang Sun, Xubin Zheng, Ziyue Qiao,

(参考訳) 2023年にNature Methodsによって先進的な生物学的手法として取り上げられた空間マルチモーダルオミクス技術は、空間的文脈を伴う生物学的制御プロセスの解明において重要な役割を担っている。近年、K近傍(KNN)グラフに基づくグラフニューラルネットワークは、シーケンシングスポット間の意味関係をモデル化できることから、空間マルチモーダルオミクス手法において注目されている。しかし、固定されたKNNグラフは、生物学的シーケンシングの過程で避けられないデータ摂動によって隠された潜在的な意味関係を捉えられず、意味情報が失われる。さらに、実際にはスポットのアノテーションやクラス数の事前情報が欠けていることが多く、空間マルチモーダルオミクスモデルの最適化をさらに妨げている。本稿では、新しい空間マルチモーダルオミクス解析フレームワークである PRototype-Aware Graph Adaptative Aggregation for Spatial Multi-modal Omics Analysis (PRAGA) を提案する。PRAGAは動的グラフを構築して潜在的な意味関係を捉え、空間情報と特徴の意味を包括的に統合する。学習可能なグラフ構造は、クロスモーダルな知識を学習することで摂動を除去することもできる。さらに、ベイジアンガウス混合モデルの動的適応性に基づく動的プロトタイプ対照学習を提案し、生物学的事前情報が未知の場合でもマルチモーダルオミクス表現を最適化する。7つの競合手法と比較したシミュレーションおよび実データセットでの定量的・定性的な実験は、PRAGAの優れた性能を示す。

Spatial multi-modal omics technology, highlighted by Nature Methods as an advanced biological technique in 2023, plays a critical role in resolving biological regulatory processes with spatial context. Recently, graph neural networks based on K-nearest neighbor (KNN) graphs have gained prominence in spatial multi-modal omics methods due to their ability to model semantic relations between sequencing spots. However, the fixed KNN graph fails to capture the latent semantic relations hidden by the inevitable data perturbations during the biological sequencing process, resulting in the loss of semantic information. In addition, the common lack of spot annotation and class number priors in practice further hinders the optimization of spatial multi-modal omics models. Here, we propose a novel spatial multi-modal omics resolved framework, termed PRototype-Aware Graph Adaptative Aggregation for Spatial Multi-modal Omics Analysis (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge. Moreover, a dynamic prototype contrastive learning is proposed based on the dynamic adaptability of Bayesian Gaussian Mixture Models to optimize the multi-modal omics representations for unknown biological priors. Quantitative and qualitative experiments on simulated and real datasets with 7 competing methods demonstrate the superior performance of PRAGA.

翻訳日:2024-11-07 13:45:42 公開日:2024-09-20

# 天然シリコン/シリコン-ゲルマニウム中の300$\,$mmウェハ加工スピン量子ビット

Industrial 300$\,$mm wafer processed spin qubits in natural silicon/silicon-germanium ( http://arxiv.org/abs/2409.12731v2 )

ライセンス: Link先を確認

Thomas Koch, Clement Godfrin, Viktor Adam, Julian Ferrero, Daniel Schroller, Noah Glaeser, Stefan Kubicek, Ruoyu Li, Roger Loo, Shana Massar, George Simion, Danny Wan, Kristiaan De Greve, Wolfgang Wernsdorfer,

(参考訳) 汎用量子コンピュータの実現には数千から数百万の量子ビットの操作が必要である。既存の産業用半導体製造技術とインフラをスケールアップと再現性のために利用できることから、シリコンベースのスピン量子ビットはこの目標を達成するための最も有望なプラットフォームの一つとなっている。これまでで最大の半導体ベースの量子プロセッサは、低電荷ノイズ、長い量子ビットコヒーレンス時間、高速な駆動速度で知られるシリコン/シリコン-ゲルマニウムヘテロ構造で実現されたが、構造の複雑さが高いことが工業的実装の課題となっている。ここでは、ヘテロ構造の成長からCoマイクロマグネットのモノリシック集積に至るまで、産業用300$\,$mm半導体ウェハプロセスラインで完全に作製された、天然Si/SiGeヘテロ構造にホストされる量子ドットを実証する。天然シリコンを用いて成長させた量子井戸について、2$\,\mathrm{\mu eV/\sqrt{Hz}}$未満の電荷ノイズ、1$\,$sを超えるスピン緩和時間、それぞれ1$\,\mathrm{\mu s}$および50$\,\mathrm{\mu s}$のコヒーレンス時間$T_2^*$および$T_2^H$を報告する。さらに、最大5$\,$MHzのRabi周波数と、99$\,\%$を超える単一量子ビットゲート忠実度を達成する。スケーラビリティに加えて、300$\,$mmプロセスの高い再現性は、量子ビット品質の最適化に不可欠な、量子ビット指標のプロセスパラメータ依存性の決定論的な研究を可能にする。

The realisation of an universal quantum computer will require the operation of thousands to millions of qubits. The possibility of using existing industrial semiconductor fabrication techniques and infrastructure for up-scaling and reproducibility makes silicon based spin qubits one of the most promising platforms to achieve this goal. The implementation of the up to now largest semiconductor based quantum processor was realized in a silicon/silicon-germanium heterostructure known for its low charge noise, long qubit coherence times and fast driving speeds, but the high structural complexity creates challenges for industrial implementations. Here we demonstrate quantum dots hosted in a natural Si/SiGe heterostructure fully fabricated by an industrial 300$\,$mm semiconductor wafer process line from heterostructure growth to Co micromagnet monolithic integration. We report charge noise values below 2$\,\mathrm{\mu eV/\sqrt{Hz}}$, spin relaxation times of over 1$\,$s and coherence times $T_2^*$ and $T_2^H$ of 1$\,\mathrm{\mu s}$ and 50$\,\mathrm{\mu s}$ respectively, for quantum wells grown using natural silicon. Further, we achieve Rabi frequencies up to 5$\,$MHz and single qubit gate fidelities above 99$\,\%$. In addition to scalability, the high reproducibility of the 300$\,$mm processes enables the deterministic study of qubit metric dependencies on process parameters, which is essential for optimising qubit quality.

翻訳日:2024-11-07 13:45:42 公開日:2024-09-20

# 医学用微調整大言語モデル : 直接選好最適化の役割と意義

Fine Tuning Large Language Models for Medicine: The Role and Importance of Direct Preference Optimization ( http://arxiv.org/abs/2409.12741v2 )

ライセンス: Link先を確認

Thomas Savage, Stephen Ma, Abdessalem Boukil, Vishwesh Patel, Ekanath Rangan, Ivan Rodriguez, Jonathan H Chen,

(参考訳) 医学分野では、大規模言語モデル(LLM)のファインチューニングが十分に活用されていない。ファインチューニングの最も一般的な2つの方法は、Supervised Fine Tuning (SFT) と Direct Preference Optimization (DPO) であるが、どちらの手法をいつ使うべきかをユーザーに示すガイダンスはほとんどない。本研究では、医学における5つの一般的な自然言語タスク(テキストデータの分類、数値データの分類、臨床推論、要約、臨床トリアージ)について、SFTとDPOの性能を比較する。テキストデータの分類にはSFTだけで十分である一方、DPOは、より複雑な臨床推論、要約、臨床トリアージのタスクの性能を向上させることがわかった。本研究の結果は、医学におけるDPOファインチューニングの役割と重要性を確立し、その結果として、この手法の広範な展開を妨げている現在のソフトウェア上のギャップに注意を促す。

Large Language Model (LLM) fine tuning is underutilized in the field of medicine. Two of the most common methods of fine tuning are Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO), but there is little guidance informing users when to use either technique. In this investigation, we compare the performance of SFT and DPO for five common natural language tasks in medicine: Classification with text data, Classification with numeric data, Clinical Reasoning, Summarization, and Clinical Triage. We find that SFT alone is sufficient for Classification with text data, whereas DPO improves performance for the more complex tasks of Clinical Reasoning, Summarization and Clinical Triage. Our results establish the role and importance of DPO fine tuning within medicine, and consequently call attention to current software gaps that prevent widespread deployment of this technique.
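
For readers unfamiliar with what DPO optimizes, the sketch below evaluates the standard per-example DPO objective, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]), from four sequence log-probabilities. This is the generic textbook form, not code from the study. The function name, argument names, and the default β = 0.1 are assumptions for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities.

    The 'chosen' response is the preferred one, the 'rejected' response
    is the dispreferred one, and 'ref' is the frozen reference model.
    """
    # Implicit rewards: how much the policy has moved away from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written so that exp() never overflows.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Policy favours the chosen response more than the reference does: lower loss.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

SFT, by comparison, only maximizes the log-probability of a single target response. The pairwise margin above is what lets DPO use preferred and rejected outputs together.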

翻訳日:2024-11-07 13:34:43 公開日:2024-09-20

# デジタルツインの再利用性向上のための付加製造監視システムのドメイン適応に関する研究

Investigation on domain adaptation of additive manufacturing monitoring systems to enhance digital twin reusability ( http://arxiv.org/abs/2409.12785v2 )

ライセンス: Link先を確認

Jiarui Xie, Zhuo Yang, Chun-Chun Hu, Haw-Ching Yang, Yan Lu, Yaoyao Fiona Zhao,

(参考訳) 粉末床溶融結合(PBF)は、複雑な形状の迅速な製造を可能にする新興の金属付加製造(AM)技術である。しかし、気孔やボーリング(balling)などの欠陥が発生して構造上の不適合を招き、部品の機械的性能が損なわれる可能性がある。一部の欠陥はプロセス中に確率的に生じ、外部からは見えないため、これは品質保証における重要な課題となっている。この問題に対処するために、機械学習(ML)ベースのモデリングを用いたデジタルツイン(DT)を、AMプロセスの監視と制御に導入することができる。メルトプールはプロセス監視において最も一般的に観察される物理現象の1つであり、通常は高速カメラで観察される。ラベル付けと前処理を行った後、メルトプール画像は、プロセス異常検出や造形品質評価などのDTアプリケーション向けのMLベースのモデルの訓練に用いられる。しかし、AM装置や監視機器など、AM設定のばらつきが大きいため、DTの再利用性は制限されている。ある設定で収集したデータセットを用いて訓練したMLモデルの性能は、通常、他の設定に適用すると低下する。本稿では、AM DTの再利用性を高めるため、異なるAM設定間の知識転移パイプラインを提案する。ソースとターゲットのデータセットは、National Institute of Standards and Technology と National Cheng Kung University において、異なるカメラ、材料、AM装置、プロセスパラメータを用いて収集された。提案するパイプラインは、データ前処理、データ拡張、ドメインアライメント、決定アライメントの4つのステップで構成される。ソースデータセットのみを用いて訓練したモデルと比較して、このパイプラインは、ターゲットデータセットのラベル付き訓練データを一切用いずに、メルトプール異常検出の精度を31%向上させた。

Powder bed fusion (PBF) is an emerging metal additive manufacturing (AM) technology that enables rapid fabrication of complex geometries. However, defects such as pores and balling may occur and lead to structural unconformities, thus compromising the mechanical performance of the part. This has become a critical challenge for quality assurance as the nature of some defects is stochastic during the process and invisible from the exterior. To address this issue, digital twin (DT) using machine learning (ML)-based modeling can be deployed for AM process monitoring and control. Melt pool is one of the most commonly observed physical phenomena for process monitoring, usually by high-speed cameras. Once labeled and preprocessed, the melt pool images are used to train ML-based models for DT applications such as process anomaly detection and print quality evaluation. Nonetheless, the reusability of DTs is restricted due to the wide variability of AM settings, including AM machines and monitoring instruments. The performance of the ML models trained using the dataset collected from one setting is usually compromised when applied to other settings. This paper proposes a knowledge transfer pipeline between different AM settings to enhance the reusability of AM DTs. The source and target datasets are collected from the National Institute of Standards and Technology and National Cheng Kung University with different cameras, materials, AM machines, and process parameters. The proposed pipeline consists of four steps: data preprocessing, data augmentation, domain alignment, and decision alignment. Compared with the model trained only using the source dataset, this pipeline increased the melt pool anomaly detection accuracy by 31% without any labeled training data from the target dataset.

翻訳日:2024-11-07 13:23:33 公開日:2024-09-20

# ROVを用いたバイオファウリング蓄積状態推定のための養殖生け簀の自律視覚検査 -- Extended Abstract

Autonomous Visual Fish Pen Inspections for Estimating the State of Biofouling Buildup Using ROV -- Extended Abstract ( http://arxiv.org/abs/2409.12813v2 )

ライセンス: Link先を確認

Matej Fabijanić, Nadir Kapetanović, Nikola Mišković,

(参考訳) 養殖生け簀(fish cage)の検査は、小規模であれ産業規模であれ、あらゆる養殖場で必要な保守作業であり、完全に自動化できる可能性のある作業である。定期検査を行う訓練されたダイバーを自律型海洋ビークルに置き換えることで、人件費を下げ、人間が水中検査を行うことに伴うリスクを取り除くことができる。このようなレベルの自律性を達成するには、バイオファウリングの蓄積状態を推定できる画像処理アルゴリズムを開発する必要がある。本研究の目的は、ROVのための自律制御アルゴリズムの開発から、生け簀の画像の自動セグメンテーション、バイオファウリング状態の正確な推定に至るまで、この検査プロセスを自動化するための完全なソリューションを提案することである。第1の部分は、市販のROVに音響SBL測位システムを追加し、閉ループ制御システムを開発することで達成される。第2の部分は、提案するバイオファウリング推定フレームワークを実装することで実現される。このフレームワークは、AIによる画像セグメンテーションと、確立されたコンピュータビジョン手法による画像処理を用いて、生け簀からROVまでの距離を大まかに推定する。これには、セマンティックセグメンテーションを行うニューラルネットワークの訓練用画像データセットを作成するためのラベリングツールの開発も含まれる。実験結果は、音響トランスポンダを装着したROVを自律ミッションに使用できることを示し、十分な距離推定能力とともに、バイオファウリング推定フレームワークが正確な評価を行えることを示している。結論として、達成されたバイオファウリング推定精度は、養殖業での利用に明確な可能性があることを示している。

The process of fish cage inspections, which is a necessary maintenance task at any fish farm, be it small scale or industrial, is a task that has the potential to be fully automated. Replacing trained divers who perform regular inspections with autonomous marine vehicles would lower the costs of manpower and remove the risks associated with humans performing underwater inspections. Achieving such a level of autonomy implies developing an image processing algorithm that is capable of estimating the state of biofouling buildup. The aim of this work is to propose a complete solution for automating the said inspection process; from developing an autonomous control algorithm for an ROV, to automatically segmenting images of fish cages, and accurately estimating the state of biofouling. The first part is achieved by modifying a commercially available ROV with an acoustic SBL positioning system and developing a closed-loop control system. The second part is realized by implementing a proposed biofouling estimation framework, which relies on AI to perform image segmentation, and by processing images using established computer vision methods to obtain a rough estimate of the distance of the ROV from the fish cage. This also involved developing a labeling tool in order to create a dataset of images for the neural network performing the semantic segmentation to be trained on. The experimental results show the viability of using an ROV fitted with an acoustic transponder for autonomous missions, and demonstrate the biofouling estimation framework's ability to provide accurate assessments, alongside satisfactory distance estimation capabilities. In conclusion, the achieved biofouling estimation accuracy showcases clear potential for use in the aquaculture industry.

翻訳日:2024-11-07 13:23:33 公開日:2024-09-20

# スマートスケーリング: 小規模モデル初期化による大規模言語モデルの事前トレーニングの高速化

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization ( http://arxiv.org/abs/2409.12903v2 )

ライセンス: Link先を確認

Mohammad Samragh, Iman Mirzadeh, Keivan Alizadeh Vahid, Fartash Faghri, Minsik Cho, Moin Nabi, Devang Naik, Mehrdad Farajtabar,

(参考訳) 言語モデルの事前学習フェーズは、しばしばランダムに初期化パラメータから始まる。モデルスケーリングの現在のトレンドでは、大量のパラメータをトレーニングするのは、非常に遅くてコストがかかります。対照的に、小さな言語モデルは訓練に費用がかからないが、大きなモデルの精度を達成できないことが多い。本稿では,これら2つの制度を接続する興味深いアイデアを探求する。より小さな事前学習モデルを用いて,大規模言語モデルを初期化する手法を開発することができるか? このような初期化は、トレーニング時間と最終的な正確性という面で、何らかのメリットをもたらすのだろうか? 本稿では,事前学習した言語モデルのパラメータを,隠れ次元の増大した大規模モデルのパラメータに拡張する手法であるHyperCloningを紹介する。我々の手法は、より大きなモデルがより小さなモデルの機能を保っていることを保証します。結果として、より大きなモデルは、トレーニングを開始する前に、より小さなモデルの予測能力と精度をすでに継承している。このような初期化モデルをトレーニングすることで,大規模言語モデルの事前学習に必要なGPU時間を大幅に削減できることを実証する。

The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small language models are less expensive to train, but they often cannot achieve the accuracy of large models. In this paper, we explore an intriguing idea to connect these two different regimes: Can we develop a method to initialize large language models using smaller pre-trained models? Will such initialization bring any benefits in terms of training time and final accuracy? In this paper, we introduce HyperCloning, a method that can expand the parameters of a pre-trained language model to those of a larger model with increased hidden dimensions. Our method ensures that the larger model retains the functionality of the smaller model. As a result, the larger model already inherits the predictive power and accuracy of the smaller model before the training starts. We demonstrate that training such an initialized model results in significant savings in terms of GPU hours required for pre-training large language models.
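
As an illustration of what "retains the functionality of the smaller model" can mean for a single linear layer, here is a minimal sketch. It assumes the simplest symmetric cloning scheme: the hidden width is doubled, the larger layer receives the duplicated input `[x; x]`, and every weight block is `W/2`. The abstract does not spell out the actual expansion rule, so this scheme and the helper names `matvec` and `clone_linear` are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def clone_linear(W):
    """Expand an (m x n) weight matrix to (2m x 2n) as [[W/2, W/2], [W/2, W/2]].

    Assumption: the larger layer sees the duplicated input [x; x], so each
    output copy equals (W/2)x + (W/2)x = Wx.
    """
    half = [[w / 2 for w in row] for row in W]
    top = [row + row for row in half]          # block row [W/2, W/2]
    return top + [list(r) for r in top]        # stack two identical block rows


W_small = [[1.0, -2.0], [0.5, 4.0]]
x = [3.0, 1.0]

y_small = matvec(W_small, x)                   # output of the small layer
y_big = matvec(clone_linear(W_small), x + x)   # large layer on duplicated input
```

Under these assumptions, `y_big` is simply two copies of `y_small`, so the expanded layer starts from the small layer's behavior before any further training.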

翻訳日:2024-11-07 12:59:09 公開日:2024-09-20

# CorBin-FL:共通ランダム性を用いた個人差分フェデレーション学習機構

CorBin-FL: A Differentially Private Federated Learning Mechanism using Common Randomness ( http://arxiv.org/abs/2409.13133v1 )

ライセンス: Link先を確認

Hojat Allah Salehi, Md Jueal Mia, S. Sandeep Pradhan, M. Hadi Amini, Farhad Shirani,

(参考訳) Federated Learning (FL)は、分散機械学習のための有望なフレームワークとして登場した。複数のクライアント間の協調学習を可能にし、分散データとコンピューティングリソースを活用する。しかし、FLはプライバシー保証、通信効率、全体的なモデル精度のバランスをとることの課題に直面している。本研究では,モデル全体の精度を維持しつつ,相関二項確率量子化を用いて差分プライバシーを実現するプライバシメカニズムであるCorBin-FLを紹介する。このアプローチでは、セキュアなマルチパーティ計算技術を使用して、クライアントが個々のプライバシを損なうことなく、ローカルモデル更新の相関量子化を行うことができる。我々は,CorBin-FLがパラメータレベルの局所差分プライバシー(PLDP)を達成すること,および平均二乗誤差ユーティリティ尺度とPLDPプライバシー尺度との間のプライバシー効用トレードオフを漸近的に最適化することを示す理論的解析を行った。さらに,PLDPに加えて,ユーザレベルおよびサンプルレベルの中央差分プライバシー保証を実現する拡張であるAugCorBin-FLを提案する。両方のメカニズムに対して、プライバシパラメータと平均2乗誤差性能測定値のバウンダリを導出する。 MNISTとCIFAR10データセットの大規模な実験により、我々のメカニズムは、同一のPLDPプライバシー予算の下でモデル精度の点で、ガウスとラプラシアのメカニズムを含む既存の微分プライベートFLメカニズムよりも優れていることが示された。

Federated learning (FL) has emerged as a promising framework for distributed machine learning. It enables collaborative learning among multiple clients, utilizing distributed data and computing resources. However, FL faces challenges in balancing privacy guarantees, communication efficiency, and overall model accuracy. In this work, we introduce CorBin-FL, a privacy mechanism that uses correlated binary stochastic quantization to achieve differential privacy while maintaining overall model accuracy. The approach uses secure multi-party computation techniques to enable clients to perform correlated quantization of their local model updates without compromising individual privacy. We provide theoretical analysis showing that CorBin-FL achieves parameter-level local differential privacy (PLDP), and that it asymptotically optimizes the privacy-utility trade-off between the mean square error utility measure and the PLDP privacy measure. We further propose AugCorBin-FL, an extension that, in addition to PLDP, achieves user-level and sample-level central differential privacy guarantees. For both mechanisms, we derive bounds on privacy parameters and mean squared error performance measures. Extensive experiments on MNIST and CIFAR10 datasets demonstrate that our mechanisms outperform existing differentially private FL mechanisms, including Gaussian and Laplacian mechanisms, in terms of model accuracy under equal PLDP privacy budgets.
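
The idea of correlated binary stochastic quantization can be sketched as follows. This is a toy illustration only: it assumes values clipped to `[-c, c]`, an unbiased two-level quantizer, and an antithetic pairing (`u` and `1 - u`) as the form of correlation. The abstract does not give the actual construction, the secure multi-party protocol, or the privacy calibration, so none of those are modeled here, and the function names are hypothetical.

```python
import random


def binary_quantize(x, c, u):
    """Map x in [-c, c] to +c or -c using a uniform draw u in [0, 1].

    P(+c) = (c + x) / (2c), so the expected output equals x (unbiased).
    """
    return c if u < (c + x) / (2 * c) else -c


def correlated_pair(x_a, x_b, c, rng):
    """Quantize two clients' values from one shared draw (assumed antithetic)."""
    u = rng.random()                       # common randomness shared by both clients
    return binary_quantize(x_a, c, u), binary_quantize(x_b, c, 1.0 - u)


rng = random.Random(0)
c = 1.0
pairs = [correlated_pair(0.3, 0.3, c, rng) for _ in range(20000)]
mean_a = sum(a for a, _ in pairs) / len(pairs)         # close to 0.3
mean_sum = sum(a + b for a, b in pairs) / len(pairs)   # close to 0.6
```

Each client's output stays unbiased, while the shared draw makes the two quantization errors negatively correlated, which tends to reduce the variance of the aggregated update compared with independent draws.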

翻訳日:2024-11-07 11:52:12 公開日:2024-09-20

# ラベルマスキング蒸留によるフェデレートラーニング

Federated Learning with Label-Masking Distillation ( http://arxiv.org/abs/2409.13136v1 )

ライセンス: Link先を確認

Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge,

(参考訳) フェデレーション学習は、グローバルサーバの調整を通じて、複数のローカルクライアントに分散したデータ上でモデルを協調的にトレーニングするための、プライバシ保護の方法を提供する。本稿では,クライアントのユーザ行動が異なるため,異なるクライアント間のラベル分布が著しく異なる,フェデレート学習におけるラベル分布スキューに着目した。このようなケースに直面した場合、ほとんどの既存手法は、クライアントにおけるラベル分布情報の不十分な利用により、最適以下に最適化される。そこで我々は,FedLMDと呼ばれるラベルマスキング蒸留手法を提案し,各クライアントのラベル分布を知覚することで,フェデレーション学習を容易にする。トレーニング中のクラス毎のサンプル数に基づいて、ラベルを多数と少数に分類する。クライアントモデルは、ローカルデータから大多数のラベルの知識を学習する。蒸留のプロセスは、グローバルモデルから大多数のラベルの予測を隠蔽し、クライアントのマイノリティなラベル知識の保存に集中できるようにします。一連の実験により, 提案手法は様々なケースで最先端の性能を達成できることが示されている。さらに,クライアントの限られたリソースを考慮し,計算コストを増大させることなく,従来の軽量なアプローチよりも優れた教師を必要としないFedLMD-Tfを提案する。私たちのコードはhttps://github.com/wnma3mz/FedLMDで利用可能です。

Federated learning provides a privacy-preserving manner to collaboratively train models on data distributed over multiple local clients via the coordination of a global server. In this paper, we focus on label distribution skew in federated learning, where due to the different user behavior of the client, label distributions between different clients are significantly different. When faced with such cases, most existing methods will lead to a suboptimal optimization due to the inadequate utilization of label distribution information in clients. Inspired by this, we propose a label-masking distillation approach termed FedLMD to facilitate federated learning via perceiving the various label distributions of each client. We classify the labels into majority and minority labels based on the number of examples per class during training. The client model learns the knowledge of majority labels from local data. The process of distillation masks out the predictions of majority labels from the global model, so that it can focus more on preserving the minority label knowledge of the client. A series of experiments show that the proposed approach can achieve state-of-the-art performance in various cases. Moreover, considering the limited resources of the clients, we propose a variant FedLMD-Tf that does not require an additional teacher, which outperforms previous lightweight approaches without increasing computational costs. Our code is available at https://github.com/wnma3mz/FedLMD.
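
The masking step described above can be sketched as a distillation loss computed only over a client's minority labels. The exact loss used by FedLMD is not given in the abstract, so the KL form, the count `threshold` that separates majority from minority labels, and the function names are assumptions made for illustration.

```python
import math


def masked_softmax(logits, keep):
    """Softmax restricted to the classes whose index is in `keep`."""
    m = max(logits[i] for i in keep)
    exps = {i: math.exp(logits[i] - m) for i in keep}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def label_masking_kd(student_logits, teacher_logits, class_counts, threshold):
    """KL(teacher || student) over minority classes only (assumed loss form)."""
    minority = [i for i, n in enumerate(class_counts) if n < threshold]
    if not minority:
        return 0.0
    p = masked_softmax(teacher_logits, minority)   # global (teacher) model
    q = masked_softmax(student_logits, minority)   # local (client) model
    return sum(p[i] * math.log(p[i] / q[i]) for i in minority)


counts = [500, 3, 0, 420]      # local samples per class; classes 1 and 2 are minority
teacher = [2.0, 1.0, 0.5, -1.0]
loss_same = label_masking_kd(teacher, teacher, counts, threshold=10)
loss_diff = label_masking_kd([2.0, -1.0, 3.0, -1.0], teacher, counts, threshold=10)
```

Because the majority-class logits are masked out before the softmax, changing them does not affect this term; the client would learn those classes from its own local data instead.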

翻訳日:2024-11-07 11:52:12 公開日:2024-09-20

# リラベル蒸留による深部ネットワーク予測の解釈

Interpret the Predictions of Deep Networks via Re-Label Distillation ( http://arxiv.org/abs/2409.13137v1 )

ライセンス: Link先を確認

Yingying Hua, Shiming Ge, Daichi Zhang,

(参考訳) ブラックボックスのディープネットワークの予測を解釈することで、デプロイメントの信頼性が向上する。本研究では,入力から予測への直接写像を自己超越的に学習するための再ラベル蒸留手法を提案する。画像はVAEサブスペースに投影され、潜在ベクトルをランダムに摂動させることで、いくつかの合成画像を生成する。そして、これらの合成画像は、2つのクラスのうちの1つにアノテートすることができる。その後、ディープネットワークで注釈付けされたラベルを教師として使用し、これらの合成画像をクラスにマッピングすることで、アノテーションを近似する線形学生モデルを訓練する。このようにして、これらの再ラベルされた合成画像はディープネットワークの局所的な分類機構をうまく記述することができ、学習した学生は予測に対してより直感的な説明を提供することができる。本手法の有効性を質的,定量的に検証した。

Interpreting the predictions of a black-box deep network can facilitate the reliability of its deployment. In this work, we propose a re-label distillation approach to learn a direct map from the input to the prediction in a self-supervision manner. The image is projected into a VAE subspace to generate some synthetic images by randomly perturbing its latent vector. Then, these synthetic images can be annotated into one of two classes by identifying whether their labels shift. After that, using the labels annotated by the deep network as teacher, a linear student model is trained to approximate the annotations by mapping these synthetic images to the classes. In this manner, these re-labeled synthetic images can well describe the local classification mechanism of the deep network, and the learned student can provide a more intuitive explanation towards the predictions. Extensive experiments verify the effectiveness of our approach qualitatively and quantitatively.

翻訳日:2024-11-07 11:52:12 公開日:2024-09-20

# 高レベル合成のためのハードウェア設計の比較学習

Learning to Compare Hardware Designs for High-Level Synthesis ( http://arxiv.org/abs/2409.13138v1 )

ライセンス: Link先を確認

Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Rongjian Liang, Weikai Li, Ding Wang, Haoxing Ren, Yizhou Sun, Jason Cong,

(参考訳) 高レベル合成(HLS)は、高レベルコードをハードウェア設計に変換する自動設計プロセスであり、ハードウェアアクセラレーションの迅速な開発を可能にする。 HLSはソースコードに挿入されたディレクティブであるプラグマに依存しており、プラグマは様々な設定と値を持ち、結果として生じるハードウェア設計に大きな影響を及ぼす。 HARPのような最先端のMLベースのHLSメソッドは、まず、ソースコードとプラグマのグラフベースの表現に適用されるグラフニューラルネットワーク(GNN)に基づいて、ディープラーニングモデルを訓練する。その後、設計空間探索(DSE)を行い、プラグマ設計空間を探索し、モデルを用いて候補設計をランク付けし、トップデザインを返却する。しかし、従来のDSE手法は、プラグマ設定とパフォーマンスメトリクスの非常に非線形な関係と、非回避的な方法でパフォーマンスに影響を与えるプラグマ間の複雑な相互作用により、課題に直面している。これらの課題に対処するために,ハードウェア設計を比較して効率的なHLS最適化を行う新しいアプローチである compareXplore を提案する。 CompareXploreは、ペアワイズな選好学習とポイントワイズなパフォーマンス予測を組み合わせたハイブリッドな損失関数を導入し、モデルが相対的な選好と絶対的なパフォーマンスの両方をキャプチャできるようにする。さらに,設計間の最も情報的な差異に着目した新しいノード差注意モジュールを導入し,性能に影響を及ぼす致命的なプラグマを同定する。 CompareXploreは2段階のDSEを採用しており、初期設計プルーニングにポイントワイズ予測モデルが使用され、その後、正確な性能検証のためのペアワイズ比較ステージが採用されている。大規模な実験では、ComparXploreはランキングの指標を大幅に改善し、選択した設計に対して高品質なHLS結果を生成し、既存のSOTA法よりも優れている。

High-level synthesis (HLS) is an automated design process that transforms high-level code into hardware designs, enabling the rapid development of hardware accelerators. HLS relies on pragmas, which are directives inserted into the source code to guide the synthesis process, and pragmas have various settings and values that significantly impact the resulting hardware design. State-of-the-art ML-based HLS methods, such as HARP, first train a deep learning model, typically based on graph neural networks (GNNs) applied to graph-based representations of the source code and pragmas. They then perform design space exploration (DSE) to explore the pragma design space, rank candidate designs using the model, and return the top designs. However, traditional DSE methods face challenges due to the highly nonlinear relationship between pragma settings and performance metrics, along with complex interactions between pragmas that affect performance in non-obvious ways. To address these challenges, we propose compareXplore, a novel approach that learns to compare hardware designs for effective HLS optimization. CompareXplore introduces a hybrid loss function that combines pairwise preference learning with pointwise performance prediction, enabling the model to capture both relative preferences and absolute performance. Moreover, we introduce a novel node difference attention module that focuses on the most informative differences between designs, enabling the model to identify critical pragmas impacting performance. CompareXplore adopts a two-stage DSE, where a pointwise prediction model is used for the initial design pruning, followed by a pairwise comparison stage for precise performance verification. In extensive experiments, compareXplore achieves significant improvements in ranking metrics and generates high-quality HLS results for the selected designs, outperforming the existing SOTA method.
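
A hybrid loss of the kind described above can be sketched for a single pair of designs. The abstract does not give the exact formulation, so the mean-squared pointwise term, the logistic pairwise term, and the mixing weight `alpha` are all assumptions chosen for illustration.

```python
import math


def hybrid_loss(pred_a, pred_b, true_a, true_b, alpha=0.5):
    """Pointwise MSE plus a pairwise logistic ranking term (assumed form).

    pred_*: model scores for two designs; true_*: measured performance,
    where higher means better. alpha weights the pairwise term.
    Ties in the true values are not handled specially in this sketch.
    """
    pointwise = ((pred_a - true_a) ** 2 + (pred_b - true_b) ** 2) / 2
    sign = 1.0 if true_a > true_b else -1.0      # which design is really better
    margin = sign * (pred_a - pred_b)            # > 0 when the predicted order is right
    pairwise = math.log1p(math.exp(-margin))     # logistic ranking loss
    return (1 - alpha) * pointwise + alpha * pairwise


right_order = hybrid_loss(0.9, 0.2, 1.0, 0.0)    # predictions rank the pair correctly
wrong_order = hybrid_loss(0.2, 0.9, 1.0, 0.0)    # predictions swap the pair
```

The pointwise term rewards accurate absolute predictions, while the pairwise term penalizes swapped orderings even when both predictions are numerically close to their targets.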

翻訳日:2024-11-07 11:52:12 公開日:2024-09-20

# G-Fuzz: gVisor用の直接ファジィフレームワーク

G-Fuzz: A Directed Fuzzing Framework for gVisor ( http://arxiv.org/abs/2409.13139v1 )

ライセンス: Link先を確認

Yuwei Li, Yuan Chen, Shouling Ji, Xuhong Zhang, Guanglu Yan, Alex X. Liu, Chunming Wu, Zulie Pan, Peng Lin,

(参考訳) gVisorは、Googleが公開しているコンテナ用のアプリケーションレベルのカーネルである。 gVisorは軽量で分離性も高いため、多くのIT企業で広く使用されている。上流のgVisorの新しい脆弱性が見つかると、下流の開発者が対応するコードをテストしてセキュリティを維持することが重要になる。この目的を達成するために、誘導ファジィングは有望である。それにもかかわらず、gVisorに既存の有向ファジィ法を適用するには多くの課題がある。主な理由は、既存の有向ファザは主にC/C++アプリケーション用であり、gVisorはGo言語で記述されたOSカーネルであるからである。上記の課題に対処するため,gVisor用のファジィフレームワークであるG-Fuzzを提案する。 G-Fuzzには、3つのコアメソッドがあり、軽量できめ細かな距離計算、ターゲットと関連するsyscall推論と利用、探索と利用の動的スイッチがある。 G-Fuzzのメソッドは一般的なもので、他のOSカーネルに転送できる。我々はG-Fuzzの性能を評価するために広範囲な実験を行った。 Syzkaller と比較すると、最先端のカーネルファジターである G-Fuzz は性能を著しく上回っている。さらに,G-Fuzzの各コア法の重要性を厳格に評価した。 G-Fuzzは業界に展開され、深刻な脆弱性を複数発見している。

gVisor is a Google-published application-level kernel for containers. As gVisor is lightweight and has sound isolation, it has been widely used in many IT enterprises (Stripe, DigitalOcean, Cloudflare). When a new vulnerability of the upstream gVisor is found, it is important for the downstream developers to test the corresponding code to maintain security. To achieve this aim, directed fuzzing is promising. Nevertheless, there are many challenges in applying existing directed fuzzing methods to gVisor. The core reason is that existing directed fuzzers are mainly for general C/C++ applications, while gVisor is an OS kernel written in the Go language. To address the above challenges, we propose G-Fuzz, a directed fuzzing framework for gVisor. There are three core methods in G-Fuzz: lightweight and fine-grained distance calculation, target-related syscall inference and utilization, and dynamic switching between exploration and exploitation. Note that the methods of G-Fuzz are general and can be transferred to other OS kernels. We conduct extensive experiments to evaluate the performance of G-Fuzz. G-Fuzz significantly outperforms Syzkaller, the state-of-the-art kernel fuzzer. Furthermore, we have rigorously evaluated the importance of each core method of G-Fuzz. G-Fuzz has been deployed in industry and has detected multiple serious vulnerabilities.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# スコアベースマルチビームポイントクラウドデノイング

Score-Based Multibeam Point Cloud Denoising ( http://arxiv.org/abs/2409.13143v1 )

ライセンス: Link先を確認

Li Ling, Yiping Xie, Nils Bore, John Folkesson,

(参考訳) MBES (Multibeam echo-sounder) は水深測量マッピングのためのデファクトセンサである。近年、安価なMBESセンサーとグローバルマッピングイニシアチブは、利用可能なデータの指数関数的な成長をもたらしている。しかし、生のMBESデータには1-25%のノイズが含まれており、Combined Uncertainty and Bathymetric Estimator (CUBE) などのツールを用いた半自動フィルタリングが必要である。本研究では,3Dポイントクラウドコミュニティからインスピレーションを得て,スコアベースのポイントクラウドデノイジングネットワークをMBESの外れ値検出とデノイジングに応用した。我々は,実際のMBES調査データに基づいて,このネットワークを訓練し,評価した。提案手法は従来の手法よりも優れており,既存のMBES標準ワークフローに容易に組み込むことができる。将来の研究を促進するために、コードと事前訓練されたモデルはオンラインで利用可能である。

The multibeam echo-sounder (MBES) is the de facto sensor for bathymetry mapping. In recent years, cheaper MBES sensors and global mapping initiatives have led to exponential growth of available data. However, raw MBES data contains 1-25% noise that requires semi-automatic filtering using tools such as the Combined Uncertainty and Bathymetric Estimator (CUBE). In this work, we draw inspiration from the 3D point cloud community and adapt a score-based point cloud denoising network for MBES outlier detection and denoising. We trained and evaluated this network on real MBES survey data. The proposed method was found to outperform classical methods, and can be readily integrated into existing MBES standard workflows. To facilitate future research, the code and pretrained model are available online.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# GASA-UNet:3次元医用画像分割のためのグローバル軸自己注意U-Net

GASA-UNet: Global Axial Self-Attention U-Net for 3D Medical Image Segmentation ( http://arxiv.org/abs/2409.13146v1 )

ライセンス: Link先を確認

Chengkun Sun, Russell Stevens Terry, Jiang Bian, Jie Xu,

(参考訳) 複数の臓器の正確なセグメンテーションと画像診断における病理組織の分化は極めて重要であるが、特にニュアンスド分類や曖昧な臓器の境界については困難である。これらの課題に対処するために,GASA-UNetを導入した。このブロックは、異なる解剖学的断面を表す各2次元平面で、画像データを3次元実体として処理する。この空間的文脈内ではVoxelの特徴が定義され、抽出した1DパッチにMHSA(Multi-Head Self-Attention)機構を利用してこれらの平面間の接続を容易にする。位置埋め込み (PE) は我々の注目の枠組みに組み込まれ, 空間的文脈でボクセルの特徴を豊かにし, 組織分類と臓器縁のデライン化を強化した。我々のモデルは, BTCV, AMOS, KiTS23の3つのベンチマークデータセット上で, Diceスコアと正規化表面Dice (NSD) を用いて, より小さな解剖学的構造に対して, セグメンテーション性能の有望な改善を実証した。

Accurate segmentation of multiple organs and the differentiation of pathological tissues in medical imaging are crucial but challenging, especially for nuanced classifications and ambiguous organ boundaries. To tackle these challenges, we introduce GASA-UNet, a refined U-Net-like model featuring a novel Global Axial Self-Attention (GASA) block. This block processes image data as a 3D entity, with each 2D plane representing a different anatomical cross-section. Voxel features are defined within this spatial context, and a Multi-Head Self-Attention (MHSA) mechanism is utilized on extracted 1D patches to facilitate connections across these planes. Positional embeddings (PE) are incorporated into our attention framework, enriching voxel features with spatial context and enhancing tissue classification and organ edge delineation. Our model has demonstrated promising improvements in segmentation performance, particularly for smaller anatomical structures, as evidenced by enhanced Dice scores and Normalized Surface Dice (NSD) on three benchmark datasets, i.e., BTCV, AMOS, and KiTS23.
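
For readers unfamiliar with the Dice score mentioned above, a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), on flat binary masks follows. Returning 1.0 when both masks are empty is a convention assumed here, and the function name is hypothetical; benchmark toolkits may handle that case differently.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]      # predicted voxels for one organ
target = [1, 0, 0, 0, 1, 1]    # ground-truth voxels for the same organ
score = dice_score(pred, target)
```

Small structures have few foreground voxels, so a handful of misclassified voxels changes the score noticeably, which is one reason results on small anatomical structures are reported separately.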

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# QSVMにおける量子カーネルのアンザッツにおける特徴埋め込み配置の影響

The Impact of Feature Embedding Placement in the Ansatz of a Quantum Kernel in QSVMs ( http://arxiv.org/abs/2409.13147v1 )

ライセンス: Link先を確認

Ilmo Salmenperä, Ilmars Kuhtarskis, Arianne Meijer van de Griend, Jukka K. Nurminen,

(参考訳) 量子カーネルの有用な機能マップを設計することは、古典的な機械学習モデルに対するアドバンテージを達成するための重要なタスクである。回路アーキテクチャの選択、すなわち、機能依存ゲートが他のゲートとどのように織り交ぜられるかは、比較的未解明の問題であり、量子埋め込みカーネル(QEK)と呼ばれる量子カーネルのモデルを使用する場合、非常に重要である。我々は,QEKにおける様々なアーキテクチャパターンを研究,分類し,既存のアーキテクチャスタイルが文献が想定しているように振る舞わないことを示す。また、古いものに基づいた新しい代替アーキテクチャも作成し、古いものよりも少ないゲートを含む一方で、同等に機能することを示す。

Designing a useful feature map for a quantum kernel is a critical task when attempting to achieve an advantage over classical machine learning models. The choice of circuit architecture, i.e., how feature-dependent gates should be interwoven with other gates, is a relatively unexplored problem and becomes very important when using a model of quantum kernels called Quantum Embedding Kernels (QEK). We study and categorize various architectural patterns in QEKs and show that existing architectural styles do not behave as the literature supposes. We also produce a novel alternative architecture based on the old ones and show that it performs equally well while containing fewer gates than its older counterparts.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# UniTabNet: テーブル構造認識のためのブリッジングビジョンと言語モデル

UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition ( http://arxiv.org/abs/2409.13148v1 )

ライセンス: Link先を確認

Zhenrong Zhang, Shuhang Liu, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Yu Hu,

(参考訳) デジタル時代には、テーブル構造認識技術は大量の表データを処理するための重要なツールである。従来の手法は主に表構造回復の視覚的側面に焦点を当てていたが、表内のテキスト意味論、特に記述的なテキスト細胞を効果的に理解できない場合が多い。本稿では,画像・テキストモデルに基づくテーブル構造解析のための新しいフレームワークUniTabNetを提案する。 UniTabNetは‘divide-and-conquer’戦略を採用し、画像とテキストのモデルを使ってテーブルセルを分離し、物理デコーダと論理デコーダを統合して完全なテーブル構造を再構築する。我々は、モデルが関連する領域に焦点を向け、予測精度を高めるビジョンガイドにより、我々のフレームワークをさらに強化する。さらに,テーブルイメージのテクスチャ意味を理解するためのモデル機能を改善するために,Language Guiderを導入する。 PubTabNet、PubTables1M、WTW、iFLYTABなどの卓越したテーブル構造データセットに基づいて、UniTabNetは、新しい最先端のパフォーマンスを実現し、我々のアプローチの有効性を実証する。コードは一般公開される予定だ。

In the digital era, table structure recognition technology is a critical tool for processing and analyzing large volumes of tabular data. Previous methods primarily focus on visual aspects of table structure recovery but often fail to effectively comprehend the textual semantics within tables, particularly for descriptive textual cells. In this paper, we introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model. UniTabNet employs a "divide-and-conquer" strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure. We further enhance our framework with the Vision Guider, which directs the model's focus towards pertinent areas, thereby boosting prediction accuracy. Additionally, we introduce the Language Guider to refine the model's capability to understand textual semantics in table images. Evaluated on prominent table structure datasets such as PubTabNet, PubTables1M, WTW, and iFLYTAB, UniTabNet achieves a new state-of-the-art performance, demonstrating the efficacy of our approach. The code will also be made publicly available.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# PIXERによる視覚情報ユーティリティの学習

Learning Visual Information Utility with PIXER ( http://arxiv.org/abs/2409.13151v1 )

ライセンス: Link先を確認

Yash Turkar, Timothy Chase Jr, Christo Aluckal, Karthik Dantu,

(参考訳) 正確な特徴検出は、自律ロボット工学、3D再構成、医療画像、リモートセンシングなど、様々なコンピュータビジョンタスクに欠かせない。視覚特徴の堅牢性向上の進歩にもかかわらず、特定の特徴型アルゴリズムによって処理される前の視覚情報の有用性を計測する手法は存在しない。このギャップに対処するために,PIXER と "Featureness" の概念を導入する。ベイズ学習の一般化を活用することで,モンテカルロサンプリングのようなコストのかかる操作を回避し,広範囲のアプリケーションに適応可能なカスタマイズ可能な特徴定義を許容し,画素の高機能化への寄与の確率と不確実性の両方を定量化する。 PIXERを特徴量選択性のある視覚的オドメトリーで評価し, RMSE軌道における平均31%の改善を実現し, 特徴量が49%減少した。

Accurate feature detection is fundamental for various computer vision tasks, including autonomous robotics, 3D reconstruction, medical imaging, and remote sensing. Despite advancements in enhancing the robustness of visual features, no existing method measures the utility of visual information before processing by specific feature-type algorithms. To address this gap, we introduce PIXER and the concept of "Featureness," which reflects the inherent interest and reliability of visual information for robust recognition, independent of any specific feature type. Leveraging a generalization on Bayesian learning, our approach quantifies both the probability and uncertainty of a pixel's contribution to robust visual utility in a single-shot process, avoiding costly operations such as Monte Carlo sampling and permitting customizable featureness definitions adaptable to a wide range of applications. We evaluate PIXER on visual odometry with featureness selectivity, achieving an average of 31% improvement in RMSE trajectory with 49% fewer features.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# スキップ接続を超えて: 特異点除去のためのプールとアンプール設計

Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities ( http://arxiv.org/abs/2409.13154v1 )

ライセンス: Link先を確認

Chengkun Sun, Jinqian Pan, Juoli Jin, Russell Stevens Terry, Jiang Bian, Jie Xu,

(参考訳) 深層畳み込みニューラルネットワーク(CNN)のトレーニングでは、除去特異点の広範的問題、損失ランドスケープ内の退化多様体につながるノードの一貫した非活性化など、ユニークな課題が提示されている。これらの特異性は、特徴伝播を妨害することで効率的な学習を妨げる。これを軽減するために、私たちは、Max Pooling、Max Unpooling、3倍の畳み込み、スキップ接続を戦略的に組み合わせたアーキテクチャ拡張であるPool Skipを紹介します。この構成は、トレーニングプロセスを安定化し、レイヤ間の機能の整合性を維持するのに役立つ。また, プールスキップの発達を支える重み慣性仮説を提案し, 次元およびアフィン補償による除去特異性に起因する劣化の緩和に関する理論的知見を提供する。本手法は,分類やセグメンテーションなどのタスクを含む2次元の自然画像と3次元の医用画像の両方に焦点をあてて,様々なベンチマークで評価する。以上の結果から,より堅牢なCNNトレーニングとモデル性能向上を目的としたPool Skipの有効性が示唆された。

Training deep Convolutional Neural Networks (CNNs) presents unique challenges, including the pervasive issue of elimination singularities: the consistent deactivation of nodes, which leads to degenerate manifolds within the loss landscape. These singularities impede efficient learning by disrupting feature propagation. To mitigate this, we introduce Pool Skip, an architectural enhancement that strategically combines a Max Pooling, a Max Unpooling, a 3×3 convolution, and a skip connection. This configuration helps stabilize the training process and maintain feature integrity across layers. We also propose the Weight Inertia hypothesis, which underpins the development of Pool Skip, providing theoretical insights into mitigating degradation caused by elimination singularities through dimensional and affine compensation. We evaluate our method on a variety of benchmarks, focusing on both 2D natural and 3D medical imaging applications, including tasks such as classification and segmentation. Our findings highlight Pool Skip's effectiveness in facilitating more robust CNN training and improving model performance.
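
The four ingredients named above can be sketched in one dimension. The abstract lists the components but not how they are wired, so the ordering used here (pool, unpool, convolve, then add the input back) is an assumption, a 3-tap 1D kernel stands in for the 2D convolution, and all function names are hypothetical.

```python
def max_pool(x, size=2):
    """Non-overlapping 1D max pooling; also returns the argmax positions."""
    values, indices = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        k = max(range(size), key=window.__getitem__)
        values.append(window[k])
        indices.append(start + k)
    return values, indices


def max_unpool(values, indices, length):
    """Place each pooled value back at its argmax position; zeros elsewhere."""
    out = [0.0] * length
    for v, i in zip(values, indices):
        out[i] = v
    return out


def conv1d_same(x, kernel):
    """Zero-padded 'same' 1D cross-correlation with an odd-length kernel."""
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def pool_skip(x, kernel):
    """x + conv(unpool(pool(x))): the ordering assumed in this sketch."""
    values, indices = max_pool(x)
    sparse = max_unpool(values, indices, len(x))
    return [a + b for a, b in zip(x, conv1d_same(sparse, kernel))]


x = [1.0, 3.0, 2.0, 0.0]
y = pool_skip(x, [0.0, 1.0, 0.0])     # identity kernel for easy inspection
```

With a zero kernel the block reduces to the identity through the skip path, so in this wiring the input signal is preserved even when the convolution contributes nothing.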

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# 局所更新による分散適応最適化の収束性

Convergence of Distributed Adaptive Optimization with Local Updates ( http://arxiv.org/abs/2409.13155v1 )

ライセンス: Link先を確認

Ziheng Cheng, Margalit Glasgow,

(参考訳) 本稿では,局所的な更新(間欠的通信)を用いた分散適応アルゴリズムについて検討する。現代の機械学習モデルの分散トレーニングにおける適応的手法の実証的成功にもかかわらず、適応的手法における局所的更新の理論的利点、特に通信複雑性の低減の観点からはまだ完全には理解されていない。本稿では,運動量を持つ Local SGD (Local SGDM) と Local Adam がそれぞれ凸および弱凸設定でミニバッチよりも優れていることを示す。我々の解析は,局所反復中の縮小を証明する新しい手法に依存しており,これは一般化された滑らかさ仮定と勾配クリッピングの下で局所的な更新の利点を示すための重要かつ困難なステップである。

We study distributed adaptive algorithms with local updates (intermittent communication). Despite the great empirical success of adaptive methods in distributed training of modern machine learning models, the theoretical benefits of local updates within adaptive methods, particularly in terms of reducing communication complexity, have not yet been fully understood. In this paper, we prove that Local SGD with momentum (Local SGDM) and Local Adam can outperform their minibatch counterparts in convex and weakly convex settings, respectively. Our analysis relies on a novel technique to prove contraction during local iterations, which is a crucial but challenging step to show the advantages of local updates, under a generalized smoothness assumption and gradient clipping.
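
The communication pattern of Local SGD with momentum can be sketched on a toy problem. The sketch assumes exact gradients of one-dimensional quadratics, no gradient clipping, and momentum buffers that stay on each worker between rounds; these simplifications, the hyperparameters, and the function name are illustrative assumptions rather than the setting analyzed in the paper.

```python
def local_sgdm(targets, rounds=50, local_steps=5, lr=0.1, beta=0.9):
    """Local SGD with momentum on f_i(w) = 0.5 * (w - targets[i])**2.

    Each worker takes `local_steps` momentum steps from the shared iterate,
    then the server averages the workers' parameters (one communication).
    """
    w_global = 0.0
    momenta = [0.0] * len(targets)         # assumption: momentum stays on each worker
    for _ in range(rounds):
        local_ws = []
        for i, a in enumerate(targets):
            w = w_global
            for _ in range(local_steps):
                grad = w - a               # exact gradient of worker i's objective
                momenta[i] = beta * momenta[i] + grad
                w -= lr * momenta[i]
            local_ws.append(w)
        w_global = sum(local_ws) / len(local_ws)   # intermittent averaging
    return w_global


w_final = local_sgdm([1.0, 2.0, 6.0])      # minimizer of the average objective is 3.0
```

Here 250 gradient steps per worker cost only 50 communication rounds; a minibatch method that synchronizes at every step would need one round per step.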

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# RRM:ロバスト・リワードモデルトレーニングは、リワードハッキングを緩和する

RRM: Robust Reward Model Training Mitigates Reward Hacking ( http://arxiv.org/abs/2409.13156v1 )

ライセンス: Link先を確認

Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasiia Makarova, Jeremiah Liu, Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh,

(参考訳) リワードモデル(RM)は、大きな言語モデル(LLM)と人間の嗜好の整合において重要な役割を果たす。しかし、特定のプロンプトに結びついたレスポンスペアに依存する従来のRMトレーニングでは、応答長やフォーマットなど、プロンプト非依存のアーティファクトからプロンプト駆動の好みを切り離すのに苦労している。本研究では,従来のRMトレーニング手法の基本的制限を明らかにするとともに,好みを決定する際に,RMがコンテキスト信号と無関係なアーティファクトを効果的に区別することができないことを示す。そこで本稿では,これらのアーティファクトに依存しない好みを学習する因果的枠組みを導入し,それらを排除するために設計された新しいデータ拡張手法を提案する。大規模な実験により,提案手法は望ましくないアーティファクトをフィルタし,より堅牢な報酬モデル(RRM)を実現することができた。我々のRRMは、RewardBench上でGemma-2-9b-itでトレーニングされたペアワイズ報酬モデルの性能を改善し、精度を80.61%から84.15%に向上させる。さらに、RMとRRMの両方を用いて2つのDPOポリシーを訓練し、RRMがDPOポリシーを大幅に強化し、MT-Benchスコアが7.27から8.31に、AlpacaEval-2の長さ制御された勝率が33.46%から52.49%に改善したことを示す。

Reward models (RMs) play a pivotal role in aligning large language models (LLMs) with human preferences. However, traditional RM training, which relies on response pairs tied to specific prompts, struggles to disentangle prompt-driven preferences from prompt-independent artifacts, such as response length and format. In this work, we expose a fundamental limitation of current RM training methods, where RMs fail to effectively distinguish between contextual signals and irrelevant artifacts when determining preferences. To address this, we introduce a causal framework that learns preferences independent of these artifacts and propose a novel data augmentation technique designed to eliminate them. Extensive experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model (RRM). Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it, on RewardBench, increasing accuracy from 80.61% to 84.15%. Additionally, we train two DPO policies using both the RM and RRM, demonstrating that the RRM significantly enhances DPO-aligned policies, improving MT-Bench scores from 7.27 to 8.31 and length-controlled win-rates in AlpacaEval-2 from 33.46% to 52.49%.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# 仮想現実感のための高忠実マスクレスニューラルサーフェス再構成

High-Fidelity Mask-free Neural Surface Reconstruction for Virtual Reality ( http://arxiv.org/abs/2409.13158v1 )

ライセンス: Link先を確認

Haotian Bai, Yize Chen, Lin Wang,

(参考訳) 多視点画像からのオブジェクト中心の表面再構成は、AR/VRのための編集可能なデジタルアセットを作成する上で重要である。幾何学的制約が欠如しているため、既存の方法、例えばNeuSはメッシュ処理でコンパクトな表面を再構築するためにオブジェクトマスクに注釈を付ける必要がある。しかし、マスクの注釈は、その厄介な性質のためにかなりの労働コストをもたらしている。本稿では,多視点オブジェクトマスクを使わずにコンパクトかつ正確な表面を復元することを目的とした,ニューラル暗黙表面再構成のための新しいレンダリングベースフレームワークであるHi-NeuSを提案する。私たちの重要な洞察は、オブジェクト中心のビューの重なり合う領域は、カメラがオブジェクトの周りを周回するときに、自然に関心の対象を浮き彫りにするということです。興味の対象は、複数のビューから蓄積されたレンダリング重量の分布を推定することで特定できる。これにより、多視点レンダリングウェイトを用いて、ニューラルネットワークの符号付き距離関数(SDF)を自己監督的にガイドする幾何学的洗練手法が考案される。具体的には、これらの重みを保ち、それらの分布に基づいて擬似表面を再サンプリングする。これにより、SDFと関心の対象とのアライメントが容易になる。次に、幾何整合性に対するSDFのバイアスを正則化する。さらに, より正確な評価のために, ポストプロセッシングを行わずに, 抽出したメッシュを計測するためにアンマスクド・チャンファー距離(CD)を用いることを提案する。我々のアプローチはNeuSとその変種であるNeuangeloを通じて検証され、異なるNeuSバックボーン間の適応性を実証した。 DTUデータセットの広範囲なベンチマークにより,本手法は表面ノイズを約20%低減し,未加工のCDを約30%改善し,表面の細部を改良した。 Hi-NeuSの優位性はさらに、BlendedMVSとハンドヘルドカメラによるコンテンツ作成に有効である。

Object-centric surface reconstruction from multi-view images is crucial in creating editable digital assets for AR/VR. Due to the lack of geometric constraints, existing methods, e.g., NeuS, necessitate annotating the object masks to reconstruct compact surfaces in mesh processing. Mask annotation, however, incurs considerable labor costs due to its cumbersome nature. This paper presents Hi-NeuS, a novel rendering-based framework for neural implicit surface reconstruction, aiming to recover compact and precise surfaces without multi-view object masks. Our key insight is that the overlapping regions in the object-centric views naturally highlight the object of interest as the camera orbits around objects. The object of interest can be specified by estimating the distribution of the rendering weights accumulated from multiple views, which implicitly identifies the surface that a user intends to capture. This inspires us to design a geometric refinement approach, which takes multi-view rendering weights to guide the signed distance functions (SDF) of neural surfaces in a self-supervised manner. Specifically, it retains these weights to resample a pseudo surface based on their distribution. This facilitates the alignment of the SDF to the object of interest. We then regularize the SDF's bias for geometric consistency. Moreover, we propose to use the unmasked Chamfer Distance (CD) to measure the extracted mesh without post-processing for more precise evaluation. Our approach has been validated through NeuS and its variant Neuralangelo, demonstrating its adaptability across different NeuS backbones. Extensive benchmarking on the DTU dataset shows that our method reduces surface noise by about 20%, and improves the unmasked CD by around 30%, achieving better surface details. The superiority of Hi-NeuS is further validated on BlendedMVS and handheld camera captures for content creation.
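
The Chamfer Distance used for evaluation above can be sketched on small point sets. Conventions differ between papers (squared or unsquared distances, sums or means), so this shows one common variant rather than the exact DTU evaluation protocol, and the function name is hypothetical.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance, both ways.

    Brute force, O(|A| * |B|); intended only for small illustrative inputs.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return (one_way(points_a, points_b) + one_way(points_b, points_a)) / 2


surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
with_floater = surface + [(0.0, 3.0, 4.0)]    # one stray point far from the surface
cd_clean = chamfer_distance(surface, surface)
cd_noisy = chamfer_distance(with_floater, surface)
```

A single stray point raises the distance noticeably, which suggests why measuring the extracted mesh without mask-based post-processing penalizes leftover surface noise.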

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# アフリカの未来を守る : タンザニアにおける子どもの安全、学習、技能獲得のためのサイバーセキュリティ戦略

Protecting Africa's Future: Cybersecurity Strategies for Child Safety, Learning, and Skill Acquisition in Tanzania ( http://arxiv.org/abs/2409.13159v1 )

ライセンス: Link先を確認

Ezekia Gilliard, Abdul Maziko, Gideon Rwechungura, Ahmed Abubakar Aliyu, Erasto Kayumbe,

(参考訳) 今日、アフリカの子どもたちはインターネットからリスクが増している。危険物には有害なコンテンツ、暴力、搾取、虐待、無視が含まれる。これらすべてがモバイルとインターネットのテクノロジー利用の増加によって増加しており、安全を危険にさらすだけでなく、将来に必要なスキルを学ぶ能力にも影響を与えている。本稿では,第3世界のアフリカ諸国が直面している,子どものオンライン安全性の確保と,その発達ニーズを支える上での課題について概説する。これは、子供たちをオンラインの脅威から保護し、デジタルリテラシーを強化するために他国が採用した効果的な慣行と政策を強調している。我々は、他の国が児童虐待から保護し、デジタル世界で成功するためのベストプラクティスと政策を共有することに重点を置いています。この研究は、UNICEFや国連などの組織との国際協力の重要性とともに、タンザニア共和国特有のオンライン安全戦略、法的枠組み、レコメンデーションを強調している。アフリカの政策立案者、教育者、サイバーセキュリティの専門家に実践的なガイダンスと勧告を提供し、大陸内外での子供のオンライン安全活動を強化することを目的としている。

Today, children across Africa are at a growing risk from the Internet. Dangers include harmful content, violence, exploitation, abuse, and neglect. All these have increased due to increased mobile and Internet technology use, which not only places their safety at risk but also affects their ability to learn essential skills for their future. This paper provides an overview of the unique challenges faced by third-world African countries in ensuring the online safety of children while also supporting their developmental needs. It highlights effective practices and policies adopted by other nations to safeguard children from online threats and enhance their digital literacy. We are focusing on sharing the best practices and policies other countries have used to protect children from abuse and help them succeed in the digital world. The study emphasizes the online safety strategies, legal frameworks, and recommendations specific to the United Republic of Tanzania, along with the significance of international collaborations with organizations like UNICEF and the UN. The goal is to provide African policymakers, educators, and cybersecurity professionals with practical guidance and recommendations to strengthen child online safety initiatives both within and beyond the continent.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# ゼロショット点群異常検出に向けて:マルチビュー投影フレームワーク

Towards Zero-shot Point Cloud Anomaly Detection: A Multi-View Projection Framework ( http://arxiv.org/abs/2409.13162v1 )

ライセンス: Link先を確認

Yuqi Cheng, Yunkang Cao, Guoyang Xie, Zhichao Lu, Weiming Shen,

(参考訳) 点群内の異常を検出することは様々な産業アプリケーションにとって重要であるが、従来の教師なし手法は、データ取得コスト、生産初期段階の制約、製品カテゴリ間での限定的な一般化といった課題に直面している。これらの課題を克服するために、事前学習済みの視覚言語モデル(VLM)を利用して異常を検出するMulti-View Projection(MVP)フレームワークを導入する。具体的には、MVPは点群データを多視点の深度画像に投影することで、点群異常検出を画像異常検出に変換する。ゼロショット画像異常検出法に従い、事前学習済みのVLMを用いてこれらの深度画像上の異常を検出する。事前学習済みのVLMは本来ゼロショット点群異常検出向けに設計されておらず、特異性に欠ける可能性があることを考慮し、これらのVLMを微調整するために学習可能な視覚プロンプトと適応的テキストプロンプトの技術を統合し、検出性能を向上させることを提案する。MVTec 3D-ADとReal3D-ADでの広範な実験により、提案するMVPフレームワークの優れたゼロショット異常検出性能とプロンプト技術の有効性が実証された。自動車用プラスチック部品の検査における実世界での評価は、提案手法が実用上の未知のシナリオにも一般化可能であることをさらに示している。コードはhttps://github.com/hustCYQ/MVP-PCLIPで入手できる。

Detecting anomalies within point clouds is crucial for various industrial applications, but traditional unsupervised methods face challenges due to data acquisition costs, early-stage production constraints, and limited generalization across product categories. To overcome these challenges, we introduce the Multi-View Projection (MVP) framework, leveraging pre-trained Vision-Language Models (VLMs) to detect anomalies. Specifically, MVP projects point cloud data into multi-view depth images, thereby translating point cloud anomaly detection into image anomaly detection. Following zero-shot image anomaly detection methods, pre-trained VLMs are utilized to detect anomalies on these depth images. Given that pre-trained VLMs are not inherently tailored for zero-shot point cloud anomaly detection and may lack specificity, we propose the integration of learnable visual and adaptive text prompting techniques to fine-tune these VLMs, thereby enhancing their detection performance. Extensive experiments on the MVTec 3D-AD and Real3D-AD demonstrate our proposed MVP framework's superior zero-shot anomaly detection performance and the prompting techniques' effectiveness. Real-world evaluations on automotive plastic part inspection further showcase that the proposed method can also be generalized to practical unseen scenarios. The code is available at https://github.com/hustCYQ/MVP-PCLIP.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
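
The projection step named in the abstract above (turning a point cloud into multi-view depth images) can be sketched roughly as follows. This is a minimal illustration, not the MVP implementation: the orthographic camera, the rotation about the y-axis, the grid size, and the function names `rotate_y`, `depth_image`, and `multi_view_depth` are all assumptions made for the example.

```python
import math

def rotate_y(points, angle):
    """Rotate (x, y, z) points about the y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def depth_image(points, size=8, lo=-1.0, hi=1.0):
    """Orthographic depth map: each pixel keeps the smallest z that lands in it.

    Pixels that receive no point stay None (background).
    """
    image = [[None] * size for _ in range(size)]
    scale = size / (hi - lo)
    for x, y, z in points:
        col = math.floor((x - lo) * scale)
        row = math.floor((y - lo) * scale)
        if 0 <= row < size and 0 <= col < size:
            current = image[row][col]
            if current is None or z < current:
                image[row][col] = z
    return image

def multi_view_depth(points, num_views=4, size=8):
    """Render depth maps from views evenly spaced around the y-axis."""
    return [
        depth_image(rotate_y(points, 2 * math.pi * k / num_views), size)
        for k in range(num_views)
    ]

# Hypothetical three-point cloud; two points share a pixel in the first view.
cloud = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5), (0.5, 0.5, 0.0)]
views = multi_view_depth(cloud)
```

Each resulting 2D grid could then be handed to an image anomaly detector; how MVP actually renders and scores the views is described in the paper and its repository, not here.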

# 隠れたアクティベーションは十分ではない:ニューラルネットワークの予測に対する一般的なアプローチ

Hidden Activations Are Not Enough: A General Approach to Neural Network Predictions ( http://arxiv.org/abs/2409.13163v1 )

ライセンス: Link先を確認

Samuel Leblanc, Aiky Rasolomanana, Marco Armenta,

(参考訳) 本稿では、クイバー表現論のツールを用いたニューラルネットワーク解析のための新しい数学的枠組みを提案する。この枠組みにより、ニューラルネットワークが認識する、新たなデータサンプルと訓練データとの類似性を定量化できる。データサンプルが誘導するクイバー表現を活用することで、従来の隠れ層出力よりも多くの情報を捉える。このクイバー表現はフォワードパスの計算の複雑さを1つの行列に抽象化するため、行列空間における単純な幾何学的・統計的議論を用いてニューラルネットワークの予測を研究できる。我々の数学的結果はアーキテクチャ非依存かつタスク非依存であり、広く適用できる。概念実証実験として、MNISTとFashionMNISTのデータセットにおいて、異なるMLPアーキテクチャと複数の敵対的攻撃手法に対する敵対的サンプルの検出問題に本研究の結果を適用する。我々の実験は、\href{https://github.com/MarcoArmenta/Hidden-Activations-are-not-Enough}{publicly available repository} で再現できる。

We introduce a novel mathematical framework for analyzing neural networks using tools from quiver representation theory. This framework enables us to quantify the similarity between a new data sample and the training data, as perceived by the neural network. By leveraging the induced quiver representation of a data sample, we capture more information than traditional hidden layer outputs. This quiver representation abstracts away the complexity of the computations of the forward pass into a single matrix, allowing us to employ simple geometric and statistical arguments in a matrix space to study neural network predictions. Our mathematical results are architecture-agnostic and task-agnostic, making them broadly applicable. As proof of concept experiments, we apply our results for the MNIST and FashionMNIST datasets on the problem of detecting adversarial examples on different MLP architectures and several adversarial attack methods. Our experiments can be reproduced with our \href{https://github.com/MarcoArmenta/Hidden-Activations-are-not-Enough}{publicly available repository}.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
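
To give a feel for how a forward pass can be summarized by a single input-dependent matrix, the sketch below uses the simplest setting where this is easy to see: a bias-free ReLU MLP, whose output on a given input equals a product of weight matrices with inactive rows zeroed out. This is only an illustration under that assumption; it is not claimed to be the paper's induced quiver representation, and all function names are hypothetical.

```python
def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Matrix-matrix product on plain lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(v):
    return [max(0.0, t) for t in v]

def forward(weights, x):
    """Ordinary forward pass: ReLU after every layer except the last."""
    h = x
    for i, w in enumerate(weights):
        h = matvec(w, h)
        if i < len(weights) - 1:
            h = relu(h)
    return h

def input_dependent_matrix(weights, x):
    """Collapse the forward pass on `x` into one matrix M with M x = forward(x)."""
    h, m = x, None
    for i, w in enumerate(weights):
        pre = matvec(w, h)
        if i < len(weights) - 1:
            # Zero the rows of units that ReLU switches off for this input.
            w_eff = [row if p > 0 else [0.0] * len(row) for row, p in zip(w, pre)]
            h = relu(pre)
        else:
            w_eff, h = w, pre
        m = w_eff if m is None else matmul(w_eff, m)
    return m

# Hypothetical 2-3-2 network and input.
weights = [
    [[1.0, -1.0], [2.0, 0.0], [-1.0, -1.0]],
    [[1.0, 1.0, 1.0], [0.0, -1.0, 2.0]],
]
x = [1.0, 2.0]
M = input_dependent_matrix(weights, x)
```

Two inputs that activate different units yield different matrices, which is one way a per-sample matrix can carry more information than the final hidden activations alone.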

# 姿勢制御のためのモジュラ衛星の形態と挙動の協調最適化

Morphology and Behavior Co-Optimization of Modular Satellites for Attitude Control ( http://arxiv.org/abs/2409.13166v1 )

ライセンス: Link先を確認

Yuxing Wang, Jie Li, Cong Yu, Xinyang Li, Simeng Huang, Yongzhe Chang, Xueqian Wang, Bin Liang,

(参考訳) モジュラー衛星の出現は宇宙機工学における重要な転換点であり、宇宙探査に柔軟性、レジリエンス、拡張性という新たなパラダイムをもたらした。姿勢制御などの複雑な課題に対処するうえでは、衛星の形態的アーキテクチャと制御器の両方が性能の最適化に不可欠である。最適制御に関する多くの研究にもかかわらず、特定のミッション制約に合わせたモジュラー衛星の最適かつ実用的な組み立て戦略の開発には大きなギャップが残っている。この研究ギャップは主に、設計と制御の協調最適化が本質的に複雑であり、悪名高い二段階(bi-level)最適化ループを伴うことに起因する。この問題は従来、人工進化によって取り組まれてきたが、個々の制御器の適合度に基づいて形態を最適化するため、サンプル効率が低く計算コストが高い。本稿では、モジュラー衛星の形態と制御を同時に最適化し、姿勢制御ミッションにおける性能と効率を向上させる、新しい勾配ベースのアプローチを提案する。モンテカルロシミュレーションにより、この協調最適化アプローチは、進化ベースのアプローチで設計されたものよりもミッション性能の高いモジュラー衛星をもたらすことが示された。さらに、今後の研究の方向性について論じる。

The emergence of modular satellites marks a significant transformation in spacecraft engineering, introducing a new paradigm of flexibility, resilience, and scalability in space exploration endeavors. In addressing complex challenges such as attitude control, both the satellite's morphological architecture and the controller are crucial for optimizing performance. Despite substantial research on optimal control, there remains a significant gap in developing optimized and practical assembly strategies for modular satellites tailored to specific mission constraints. This research gap primarily arises from the inherently complex nature of co-optimizing design and control, a process known for its notorious bi-level optimization loop. Conventionally tackled through artificial evolution, this issue involves optimizing the morphology based on the fitness of individual controllers, which is sample-inefficient and computationally expensive. In this paper, we introduce a novel gradient-based approach to simultaneously optimize both morphology and control for modular satellites, enhancing their performance and efficiency in attitude control missions. Our Monte Carlo simulations demonstrate that this co-optimization approach results in modular satellites with better mission performance compared to those designed by evolution-based approaches. Furthermore, this study discusses potential avenues for future research.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# 電子鼻システムにおけるドリフト補償のための教師なしアテンションベースマルチソースドメイン適応フレームワーク

Unsupervised Attention-Based Multi-Source Domain Adaptation Framework for Drift Compensation in Electronic Nose Systems ( http://arxiv.org/abs/2409.13167v1 )

ライセンス: Link先を確認

Wenwen Zhang, Shuhao Hu, Zhengyuan Zhang, Yuanjin Zheng, Qi Jie Wang, Zhiping Lin,

(参考訳) 電子鼻(E-nose)システムを用いた産業環境における危険・有害・爆発性・可燃性ガスの連続的かつ長期的なモニタリングは、ガスセンサの時変ドリフトによるガス識別精度の低下という重大な課題に直面している。この問題に対処するために、E-noseシステムにおけるドリフト補償付きガス識別のための、新しい教師なしアテンションベースのマルチソースドメイン共有・プライベート特徴融合適応(AMDS-PFFA)フレームワークを提案する。AMDS-PFFAモデルは、初期段階で収集された複数のソースドメインのラベル付きデータを有効に活用し、ターゲットドメインのラベルなしガスセンサアレイドリフト信号中のガスを正確に識別する。モデルの有効性を検証するため、36ヶ月にわたって収集されたカリフォルニア大学アーバイン校(UCI)の標準ドリフトガスデータセットと、30ヶ月にわたる自社開発E-noseシステムのドリフト信号データの両方を用いて、広範な実験的評価を行った。近年のドリフト補償法と比較して、AMDS-PFFAモデルは良好な収束性とともに最も高い平均ガス認識精度を達成し、すべてのターゲットドメインバッチにわたって、UCIデータセットで83.20%、自社開発E-noseシステムのデータで93.96%に達した。これらの結果は、ドリフト補償付きガス識別におけるAMDS-PFFAモデルの優れた性能を示しており、既存手法を大きく上回っている。

Continuous, long-term monitoring of hazardous, noxious, explosive, and flammable gases in industrial environments using electronic nose (E-nose) systems faces the significant challenge of reduced gas identification accuracy due to time-varying drift in gas sensors. To address this issue, we propose a novel unsupervised attention-based multi-source domain shared-private feature fusion adaptation (AMDS-PFFA) framework for gas identification with drift compensation in E-nose systems. The AMDS-PFFA model effectively leverages labeled data from multiple source domains collected during the initial stage to accurately identify gases in unlabeled gas sensor array drift signals from the target domain. To validate the model's effectiveness, extensive experimental evaluations were conducted using both the University of California, Irvine (UCI) standard drift gas dataset, collected over 36 months, and drift signal data from our self-developed E-nose system, spanning 30 months. Compared to recent drift compensation methods, the AMDS-PFFA model achieves the highest average gas recognition accuracy with strong convergence, attaining 83.20% on the UCI dataset and 93.96% on data from our self-developed E-nose system across all target domain batches. These results demonstrate the superior performance of the AMDS-PFFA model in gas identification with drift compensation, significantly outperforming existing methods.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# AI時代の経済政策への挑戦

Economic Policy Challenges for the Age of AI ( http://arxiv.org/abs/2409.13168v1 )

ライセンス: Link先を確認

Anton Korinek,

(参考訳) 本稿では、汎用人工知能(AGI)に向けたAIの変革的な進歩が、経済学者や経済政策立案者にもたらす重大な課題を考察する。AIの時代が、労働の役割を縮小させることで経済の基本構造をどのように変革するのかを検討する。これは前例のない生産性向上をもたらす一方で、雇用の混乱、所得分配、教育と人的資本の価値に関する懸念を生じさせる。また、AGI後に労働にどのような役割が残りうるのか、どの生産要素の重要性が高まるのかを探る。次に、AI時代の経済政策における8つの重要な課題、すなわち(1)不平等と所得分配、(2)教育とスキル開発、(3)社会的・政治的安定、(4)マクロ経済政策、(5)反トラストと市場規制、(6)知的財産、(7)環境への影響、(8)グローバルなAIガバナンスを特定する。最後に、経済学者がこれらの課題のより良い理解にどのように貢献できるかを強調して結論とする。

This paper examines the profound challenges that transformative advances in AI towards Artificial General Intelligence (AGI) will pose for economists and economic policymakers. I examine how the Age of AI will revolutionize the basic structure of our economies by diminishing the role of labor, leading to unprecedented productivity gains but raising concerns about job disruption, income distribution, and the value of education and human capital. I explore what roles may remain for labor post-AGI, and which production factors will grow in importance. The paper then identifies eight key challenges for economic policy in the Age of AI: (1) inequality and income distribution, (2) education and skill development, (3) social and political stability, (4) macroeconomic policy, (5) antitrust and market regulation, (6) intellectual property, (7) environmental implications, and (8) global AI governance. It concludes by emphasizing how economists can contribute to a better understanding of these challenges.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20

# 層ごとのin-situ LPBFモニタリングのための生成拡散モデルによる深層学習ベースの光学画像超解像

Deep Learning based Optical Image Super-Resolution via Generative Diffusion Models for Layerwise in-situ LPBF Monitoring ( http://arxiv.org/abs/2409.13171v1 )

ライセンス: Link先を確認

Francis Ogoke, Sumesh Kalambettu Suresh, Jesse Adamczyk, Dan Bolintineanu, Anthony Garland, Michael Heiden, Amir Barati Farimani,

(参考訳) レーザー粉末床溶融結合(L-PBF)における欠陥の確率的な形成は、高精度用途への採用を妨げている。光学モニタリング技術は層ごとの撮像に基づく欠陥の同定に利用できるが、コストとメモリの制約により高解像度へのスケールが困難である。そこで我々は、生成型ディープラーニングモデルを実装し、低コストで低解像度のビルドプレート画像を詳細な高解像度の光学画像に結びつけることで、コスト効率のよいプロセスモニタリングを可能にする。そのために、条件付き潜在確率的拡散モデルを訓練し、低解像度のWebカメラ画像からビルドプレートの現実的な高解像度画像を生成して、小スケールの特徴の分布と表面粗さを復元する。まず、ピーク信号対雑音比(PSNR)、構造類似度指標(SSIM)、および高周波情報の保存を記述するウェーブレット共分散指標を用いて、生成画像の再構成品質を解析し、モデルの性能を評価する。さらに、Segment Anything基盤モデルに基づくフレームワークを設計し、造形部品の3次元形態を再現して、再構成した試料の表面粗さを解析する。最後に、合成低解像度データを作成することで、実装したフレームワークの他の部品形状へのゼロショット一般化能力を検討する。

The stochastic formation of defects during Laser Powder Bed Fusion (L-PBF) negatively impacts its adoption for high-precision use cases. Optical monitoring techniques can be used to identify defects based on layer-wise imaging, but these methods are difficult to scale to high resolutions due to cost and memory constraints. Therefore, we implement generative deep learning models to link low-cost, low-resolution images of the build plate to detailed high-resolution optical images of the build plate, enabling cost-efficient process monitoring. To do so, a conditional latent probabilistic diffusion model is trained to produce realistic high-resolution images of the build plate from low-resolution webcam images, recovering the distribution of small-scale features and surface roughness. We first evaluate the performance of the model by analyzing the reconstruction quality of the generated images using peak-signal-to-noise-ratio (PSNR), structural similarity index measure (SSIM) and wavelet covariance metrics that describe the preservation of high-frequency information. Additionally, we design a framework based upon the Segment Anything foundation model to recreate the 3D morphology of the printed part and analyze the surface roughness of the reconstructed samples. Finally, we explore the zero-shot generalization capabilities of the implemented framework to other part geometries by creating synthetic low-resolution data.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
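
Of the reconstruction metrics listed in the abstract above, PSNR is the simplest to write down. The sketch below uses the standard definition, 10·log10(MAX² / MSE), on small grayscale images stored as nested lists; the 8-bit peak value of 255 and the toy images are assumptions for illustration, and nothing here reflects the authors' evaluation code.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_errors = [
        (r - g) ** 2
        for ref_row, gen_row in zip(reference, generated)
        for r, g in zip(ref_row, gen_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 images that differ in a single pixel.
high_res = [[0, 255], [255, 0]]
restored = [[0, 255], [255, 10]]
score = psnr(high_res, restored)
```

Higher values mean the generated image is closer to the reference pixel by pixel, which is why the abstract pairs PSNR with SSIM and a wavelet-based metric that are more sensitive to structure and high-frequency detail.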

# より平坦な極小値のためのバイラテラルSharpness-Aware Minimization

Bilateral Sharpness-Aware Minimization for Flatter Minima ( http://arxiv.org/abs/2409.13173v1 )

ライセンス: Link先を確認

Jiaxin Deng, Junbiao Pang, Baochang Zhang, Qingming Huang,

(参考訳) SAM (Sharpness-Aware Minimization) は、Max-Sharpness (MaxS) を小さくすることで一般化を向上させる。実用上の成功にもかかわらず、SAMの一般化向上を支えるMaxSが「Flatness Indicator Problem (FIP)」に直面することを我々は実験的に見出した。すなわち、SAMは勾配上昇方向の平坦性のみを考慮するため、次の最小化領域が十分に平坦にならない。SAMは本質的に貪欲な探索法であるため、より良い平坦度指標(FI)はニューラルネットワークのより良い一般化をもたらす。本稿では、訓練損失と、現在の重みを囲む近傍における最小損失との差を利用することを提案し、これをMin-Sharpness (MinS) と表記する。MaxSとMinSを統合することで、最適化中により平坦な方向を示す、より良いFIを構築した。具体的には、このFIをSAMと組み合わせたバイラテラルSAM (BSAM) を提案し、SAMよりも平坦な最小値を求める。理論解析により、BSAMが局所最小値に収束することを証明する。広範な実験により、BSAMは分類、転移学習、人物姿勢推定、ネットワーク量子化といった様々なタスクにおいて、バニラSAMよりも優れた一般化性能とロバスト性を提供することが示された。コードは https://github.com/ajiaaa/BSAM で公開されている。

Sharpness-Aware Minimization (SAM) enhances generalization by reducing a Max-Sharpness (MaxS). Despite the practical success, we empirically found that the MaxS behind SAM's generalization enhancements faces the "Flatness Indicator Problem" (FIP), where SAM only considers the flatness in the direction of gradient ascent, resulting in a next minimization region that is not sufficiently flat. A better Flatness Indicator (FI) would bring better generalization of neural networks, because SAM is a greedy search method by nature. In this paper, we propose to utilize the difference between the training loss and the minimum loss over the neighborhood surrounding the current weight, which we denote as Min-Sharpness (MinS). By merging MaxS and MinS, we created a better FI that indicates a flatter direction during the optimization. Specifically, we combine this FI with SAM into the proposed Bilateral SAM (BSAM), which finds a flatter minimum than that of SAM. The theoretical analysis proves that BSAM converges to local minima. Extensive experiments demonstrate that BSAM offers superior generalization performance and robustness compared to vanilla SAM across various tasks, i.e., classification, transfer learning, human pose estimation, and network quantization. Code is publicly available at: https://github.com/ajiaaa/BSAM.

翻訳日:2024-11-07 11:41:13 公開日:2024-09-20
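
The two quantities named in the abstract above can be made concrete on a one-dimensional toy loss. In the sketch below, MaxS is the largest loss increase and MinS the largest loss decrease within a radius `rho` of the current weight, both found by brute-force grid search. The grid search, the quadratic `toy_loss`, and the function names are assumptions for illustration; SAM and BSAM approximate these quantities with gradient steps, and the way BSAM merges them is not reproduced here.

```python
def neighborhood(w, rho, steps=200):
    """Evenly spaced points in [w - rho, w + rho], a 1-D stand-in for a norm ball."""
    return [w - rho + 2.0 * rho * i / steps for i in range(steps + 1)]

def max_sharpness(loss, w, rho):
    """Worst-case loss increase within distance rho of w."""
    return max(loss(v) for v in neighborhood(w, rho)) - loss(w)

def min_sharpness(loss, w, rho):
    """Training loss at w minus the minimum loss within distance rho of w."""
    return loss(w) - min(loss(v) for v in neighborhood(w, rho))

def toy_loss(w):
    """Hypothetical quadratic loss with its minimum at w = 0."""
    return w ** 2

maxs_on_slope = max_sharpness(toy_loss, 1.0, 0.5)  # neighborhood rises to 1.5
mins_on_slope = min_sharpness(toy_loss, 1.0, 0.5)  # neighborhood falls to 0.5
mins_at_minimum = min_sharpness(toy_loss, 0.0, 0.5)
```

On the slope both quantities are positive, while at the minimum MinS drops to zero: the loss cannot decrease anywhere nearby, which is information MaxS alone does not provide.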

# RPAF:大規模リコメンダシステムにおけるキャッシュ割り当てのための強化予測アロケーションフレームワーク

RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems ( http://arxiv.org/abs/2409.13175v1 )

ライセンス: Link先を確認

Shuo Su, Xiaoshuang Chen, Yao Wang, Yulin Wu, Ziqiang Zhang, Kaiqiao Zhan, Ben Wang, Kun Gai,

(参考訳) 現代のレコメンダシステムは計算集約的なインフラ上に構築されており、計算資源が限られているため、特にピーク時に各リクエストに対してリアルタイムな計算を行うことは困難である。システムがリアルタイムのレコメンデーションを行う余裕がない場合には、ユーザ単位の結果キャッシュによるレコメンデーションが広く用いられる。しかし、ユーザ全体のエンゲージメントを最大化するようにリアルタイムのレコメンデーションとキャッシュされたレコメンデーションを割り当てることは難しい。本稿では、キャッシュ割り当てにおける2つの重要な課題、すなわち価値と戦略の依存関係(value-strategy dependency)とストリーミング割り当てを示す。そのうえで、これらの問題に対処する強化予測割り当てフレームワーク(RPAF)を提案する。RPAFは、予測段階と割り当て段階からなる強化学習ベースの2段階フレームワークである。予測段階は価値と戦略の依存関係を考慮してキャッシュ選択の価値を推定し、割り当て段階はグローバルな予算制約を満たしつつ個々のリクエストに対するキャッシュ選択を決定する。RPAFの訓練における課題にはグローバル性と予算制約の厳格性が含まれることを示し、この問題に対処するために緩和ローカルアロケータ(RLA)を提案する。さらに、ストリーミング割り当て問題に対処するために、割り当て段階でPoolRankアルゴリズムを用いる。実験の結果、RPAFは計算予算の制約下でユーザのエンゲージメントを大幅に改善することが示された。

Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations to maximize the users' overall engagement. This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. Then, we propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices considering the value-strategy dependency, and the allocation stage determines the cache choices for each individual request while satisfying the global budget constraint. We show that the challenge of training RPAF includes globality and the strictness of budget constraints, and a relaxed local allocator (RLA) is proposed to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users' engagement under computational budget constraints.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
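
To make the allocation problem in the abstract above concrete, the sketch below shows a naive offline baseline: given an estimated value for serving each request in real time and from cache, spend a fixed real-time budget on the requests with the largest gain. It assumes all requests and their values are known up front, so it ignores exactly the value-strategy dependency and streaming arrival that RPAF is designed to handle; the function name and data are hypothetical.

```python
def allocate_real_time(requests, budget):
    """Pick which requests get real-time computation under a fixed budget.

    `requests` maps a request id to (value_if_real_time, value_if_cached).
    Returns the set of ids served in real time; all others use the cache.
    """
    gains = {rid: real_time - cached for rid, (real_time, cached) in requests.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    # Never spend budget on a request whose cached result is at least as good.
    return {rid for rid in ranked[:budget] if gains[rid] > 0}

# Hypothetical estimated engagement values per request.
requests = {
    "a": (0.9, 0.2),
    "b": (0.5, 0.45),
    "c": (0.8, 0.3),
    "d": (0.4, 0.6),
}
chosen = allocate_real_time(requests, budget=2)
```

In a live system requests arrive one at a time and today's choice changes tomorrow's cache contents, so a one-shot ranking like this is only a reference point for what the two-stage framework has to improve upon.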

# 説明可能なAIとLLMを用いた適応型エンドツーエンドIoTセキュリティフレームワーク

An Adaptive End-to-End IoT Security Framework Using Explainable AI and LLMs ( http://arxiv.org/abs/2409.13177v1 )

ライセンス: Link先を確認

Sudipto Baral, Sajal Saha, Anwar Haque,

(参考訳) IoT(Internet of Things)の指数関数的な成長は、サイバーセキュリティ脅威の複雑さと量を大幅に増加させ、高度でスケーラブルかつ解釈可能なセキュリティフレームワークの開発を必要としている。本稿では、機械学習(ML)、説明可能なAI(XAI)、大規模言語モデル(LLM)を活用した、リアルタイムIoT攻撃検出および応答のための革新的で包括的なフレームワークを提案する。SHAP(SHapley Additive exPlanations)やLIME(Local Interpretable Model-agnostic Explanations)といったXAI技術をモデルに依存しないアーキテクチャと統合することにより、さまざまなMLアルゴリズムに対するフレームワークの適応性を確保する。さらに、LLMの組み込みにより検出判断の解釈可能性とアクセシビリティが向上し、検出された脅威について、実行可能で人間に理解しやすい説明をシステム管理者に提供する。我々のエンドツーエンドフレームワークは、モデル開発からデプロイメントへのシームレスな移行を促進するだけでなく、既存の研究にしばしば欠けている実世界への適用能力も備えている。CIC-IOT-2023データセット \cite{neto2023ciciot2023} を用いた実験によると、GeminiとOPENAIのLLMは攻撃緩和においてそれぞれ固有の強みを示す。Geminiは的確で焦点を絞った戦略を提示し、OPENAIは広範で詳細なセキュリティ対策を提供する。XAIにSHAPとLIMEのアルゴリズムを組み込むことで攻撃検出に関する包括的な洞察が得られ、詳細な特徴分析、微調整、および誤分類への適応を通じて精度を高めるモデル改善の機会が明らかになる。

The exponential growth of the Internet of Things (IoT) has significantly increased the complexity and volume of cybersecurity threats, necessitating the development of advanced, scalable, and interpretable security frameworks. This paper presents an innovative, comprehensive framework for real-time IoT attack detection and response that leverages Machine Learning (ML), Explainable AI (XAI), and Large Language Models (LLMs). By integrating XAI techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) with a model-independent architecture, we ensure our framework's adaptability across various ML algorithms. Additionally, the incorporation of LLMs enhances the interpretability and accessibility of detection decisions, providing system administrators with actionable, human-understandable explanations of detected threats. Our end-to-end framework not only facilitates a seamless transition from model development to deployment but also represents a real-world application capability that is often lacking in existing research. Based on our experiments with the CIC-IOT-2023 dataset \cite{neto2023ciciot2023}, Gemini and OPENAI LLMs demonstrate unique strengths in attack mitigation: Gemini offers precise, focused strategies, while OPENAI provides extensive, in-depth security measures. Incorporating SHAP and LIME algorithms within XAI provides comprehensive insights into attack detection, emphasizing opportunities for model improvement through detailed feature analysis, fine-tuning, and the adaptation of misclassifications to enhance accuracy.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20

# API提案における大規模コードモデルの体系的評価:いつ,どれを,どのように

A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How ( http://arxiv.org/abs/2409.13178v1 )

ライセンス: Link先を確認

Chaozheng Wang, Shuzheng Gao, Cuiyun Gao, Wenxuan Wang, Chun Yong Chong, Shan Gao, Michael R. Lyu,

(参考訳) API提案は現代のソフトウェア開発において重要なタスクであり、現在のコンテキストに基づいてサードパーティのAPIを予測・推薦することでプログラマを支援する。大規模コードモデル(LCM)の最近の進歩は、API提案タスクにおいて有望な結果を示している。しかし、これらは主にどのAPIを使うべきかを提案することに重点を置いており、提案されたAPIをいつ使うか、どのように使うかなど、プログラマが実際にAPIを使用する際により多くの支援を必要としうることを見落としている。このギャップを埋めるため、本論文ではAPI提案タスクにおけるLCMを体系的に評価する。調査を容易にするために、まず、853の人気のあるJavaプロジェクトで使用されている176のAPIをカバーする、多様なコードスニペットのコレクションを含むベンチマークを構築する。次に、API提案タスクにおける3つの異なるシナリオを評価対象とする。すなわち、(1) API使用の望ましい位置とタイミングを決定することを目的とした ``\textit{when to use}''、(2) 与えられたライブラリから適切なAPIを特定することを目的とした ``\textit{which to use}''、(3) 与えられたAPIの引数を予測することを目的とした ``\textit{how to use}'' である。この3つのシナリオを考慮することで、開発者へのAPI提案におけるLCMの能力を包括的に評価できる。評価では、3つのシナリオに対して、モデルサイズの異なる9つの一般的なLCMを選択する。また、コンテキスト選択がモデル性能に与える影響を詳細に分析する。

API suggestion is a critical task in modern software development, assisting programmers by predicting and recommending third-party APIs based on the current context. Recent advancements in large code models (LCMs) have shown promise in the API suggestion task. However, they mainly focus on suggesting which APIs to use, ignoring that programmers may demand more assistance while using APIs in practice, including when to use the suggested APIs and how to use the APIs. To mitigate the gap, we conduct a systematic evaluation of LCMs for the API suggestion task in this paper. To facilitate our investigation, we first build a benchmark that contains a diverse collection of code snippets, covering 176 APIs used in 853 popular Java projects. Three distinct scenarios in the API suggestion task are then considered for evaluation, including (1) ``\textit{when to use}'', which aims at determining the desired position and timing for API usage; (2) ``\textit{which to use}'', which aims at identifying the appropriate API from a given library; and (3) ``\textit{how to use}'', which aims at predicting the arguments for a given API. The consideration of the three scenarios allows for a comprehensive assessment of LCMs' capabilities in suggesting APIs for developers. During the evaluation, we choose nine popular LCMs with varying model sizes for the three scenarios. We also perform an in-depth analysis of the influence of context selection on the model performance ...

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20

# ConvLSTMTransNet:インターネットトラフィックテレメトリのためのハイブリッドディープラーニングアプローチ

ConvLSTMTransNet: A Hybrid Deep Learning Approach for Internet Traffic Telemetry ( http://arxiv.org/abs/2409.13179v1 )

ライセンス: Link先を確認

Sajal Saha, Saikat Das, Glaucio H. S. Carvalho,

(参考訳) 本稿では、時系列予測のために設計された新しいハイブリッドディープラーニングモデルConvLSTMTransNetと、インターネットトラフィックテレメトリへの具体的な適用について述べる。このモデルは、畳み込みニューラルネットワーク(CNN)、Long Short-Term Memory(LSTM)ネットワーク、およびTransformerエンコーダの強みを統合し、時系列データに内在する複雑な時空間関係を捉える。ConvLSTMTransNetモデルは、プロバイダエッジルータの高速ポートからサンプリングした実際のインターネットトラフィックデータを用いて、RNN、LSTM、Gated Recurrent Unit (GRU) の3つのベースラインモデルと比較評価された。各モデルの精度は、Mean Absolute Error (MAE)、Root Mean Squared Error (RMSE)、Weighted Absolute Percentage Error (WAPE) といった性能指標を用いて評価した。その結果、ConvLSTMTransNetは予測精度においてベースラインモデルを約10%上回ることが示された。ConvLSTMTransNetが従来のモデルを上回るのは、時間的依存関係を捉え、インターネットトラフィックデータから空間的特徴を抽出する能力を高める革新的なアーキテクチャ上の特徴によるものである。これらの知見は、より正確な予測を達成するために、インターネットトラフィックデータの複雑さに合わせた高度なアーキテクチャを採用することの重要性を示している。

In this paper, we present a novel hybrid deep learning model, named ConvLSTMTransNet, designed for time series prediction, with a specific application to internet traffic telemetry. This model integrates the strengths of Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Transformer encoders to capture complex spatial-temporal relationships inherent in time series data. The ConvLSTMTransNet model was evaluated against three baseline models: RNN, LSTM, and Gated Recurrent Unit (GRU), using real internet traffic data sampled from high-speed ports on a provider edge router. Performance metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Weighted Absolute Percentage Error (WAPE) were used to assess each model's accuracy. Our findings demonstrate that ConvLSTMTransNet significantly outperforms the baseline models by approximately 10% in terms of prediction accuracy. ConvLSTMTransNet surpasses traditional models due to its innovative architectural features, which enhance its ability to capture temporal dependencies and extract spatial features from internet traffic data. Overall, these findings underscore the importance of employing advanced architectures tailored to the complexities of internet traffic data for achieving more precise predictions.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
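
The three accuracy metrics named in the abstract above can be written in a few lines. The sketch uses the usual textbook definitions; in particular, WAPE is taken here as total absolute error divided by total absolute actual value, which is a common definition but an assumption about what the paper computes. The traffic numbers are made up.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def wape(actual, predicted):
    """Total absolute error divided by total absolute actual value."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / sum(abs(a) for a in actual)

# Hypothetical traffic volumes (arbitrary units) and forecasts.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
scores = (mae(actual, predicted), rmse(actual, predicted), wape(actual, predicted))
```

Because WAPE normalizes by the total volume, it stays comparable across links with very different traffic levels, whereas MAE and RMSE are expressed in the units of the series itself.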

# インターネットトラフィック予測におけるデータ制限の克服:トランスファーラーニングとウェーブレット拡張を用いたLSTMモデル

Overcoming Data Limitations in Internet Traffic Forecasting: LSTM Models with Transfer Learning and Wavelet Augmentation ( http://arxiv.org/abs/2409.13181v1 )

ライセンス: Link先を確認

Sajal Saha, Anwar Haque, Greg Sidebottom,

(参考訳) 小規模ISPネットワークにおけるインターネットトラフィックの効果的な予測は、利用可能なデータが限られているために困難である。本稿では、LSTMベースの2つのモデルLSTMSeq2SeqとLSTMSeq2SeqAtnに転移学習とデータ拡張手法を適用して、この課題を検討する。これらのモデルは、まずJuniper Networksが提供する包括的なデータセットで訓練され、その後、より小さなデータセットに適用された。データセットは実際のインターネットトラフィックテレメトリであり、さまざまなネットワークドメインにわたる多様なトラフィックパターンに関する洞察を与える。両モデルとも単一ステップ予測では良好に動作したが、多ステップ予測は特に長期の精度の面で困難であった。小さなデータセットでは、LSTMSeq2Seqが概してLSTMSeq2SeqAtnを上回っており、モデルの複雑さが高いほど性能が良くなるとは限らないことを示している。モデルの有効性はネットワークドメインによって異なり、トラフィック特性の違いの影響を反映している。データ不足に対処するために離散ウェーブレット変換をデータ拡張に用いたところ、特に短期予測においてモデルの性能が大幅に改善された。分析の結果、データが限られたシナリオではデータ拡張が極めて重要であることが示された。さらに、モデルの変動性と一貫性の分析も行い、LSTMSeq2SeqAtnの注意機構は短期予測の一貫性を高める一方で、より長期の予測では変動性が大きくなることが分かった。これらの結果は、トラフィック予測における異なるモデリングアプローチの利点と限界を明らかにしている。総じて、本研究は、特に利用可能なデータが限られた小規模ISPネットワークにおいて、トラフィック予測モデルの精度を高めるうえでの転移学習とデータ拡張の重要性を示している。

Effective internet traffic prediction in smaller ISP networks is challenged by limited data availability. This paper explores this issue using transfer learning and data augmentation techniques with two LSTM-based models, LSTMSeq2Seq and LSTMSeq2SeqAtn, initially trained on a comprehensive dataset provided by Juniper Networks and subsequently applied to smaller datasets. The datasets represent real internet traffic telemetry, offering insights into diverse traffic patterns across different network domains. Our study revealed that while both models performed well in single-step predictions, multi-step forecasts were challenging, particularly in terms of long-term accuracy. In smaller datasets, LSTMSeq2Seq generally outperformed LSTMSeq2SeqAtn, indicating that higher model complexity does not necessarily translate to better performance. The models' effectiveness varied across different network domains, reflecting the influence of distinct traffic characteristics. To address data scarcity, Discrete Wavelet Transform was used for data augmentation, leading to significant improvements in model performance, especially in shorter-term forecasts. Our analysis showed that data augmentation is crucial in scenarios with limited data. Additionally, the study included an analysis of the models' variability and consistency, with attention mechanisms in LSTMSeq2SeqAtn providing better short-term forecasting consistency but greater variability in longer forecasts. The results highlight the benefits and limitations of different modeling approaches in traffic prediction. Overall, this research underscores the importance of transfer learning and data augmentation in enhancing the accuracy of traffic prediction models, particularly in smaller ISP networks with limited data availability.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
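
The abstract above says a Discrete Wavelet Transform was used for data augmentation but does not say how. One plausible scheme, sketched below purely as an assumption, is to take a one-level Haar transform, jitter the detail (high-frequency) coefficients, and invert the transform, which yields a new series with the same coarse trend. The Haar wavelet, the jitter scale, and the function names are all illustrative choices.

```python
import math
import random

def haar_dwt(signal):
    """One-level Haar transform of an even-length sequence -> (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of `haar_dwt`."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def augment(signal, noise_scale=0.1, seed=0):
    """Return a variant of `signal` with randomly rescaled detail coefficients."""
    rng = random.Random(seed)
    approx, detail = haar_dwt(signal)
    jittered = [d * (1.0 + rng.uniform(-noise_scale, noise_scale)) for d in detail]
    return haar_idwt(approx, jittered)

# Hypothetical traffic samples.
traffic = [10.0, 14.0, 9.0, 15.0, 20.0, 18.0, 7.0, 11.0]
synthetic = augment(traffic)
```

Since only the detail coefficients change, each pair of neighbouring samples keeps its original sum, so the augmented series preserves the local averages of the source while varying its fine structure.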

# $\textit{SKIntern}$: より優れたCoT能力を小規模言語モデルに蒸留するためのシンボリック知識の内部化

$\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models ( http://arxiv.org/abs/2409.13183v1 )

ライセンス: Link先を確認

Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Kang Liu, Jun Zhao,

(参考訳) SLM(Small Language Models)は、LLM(Large Language Models)の高い計算要求とプライバシー上の懸念から注目を集めている。いくつかの研究は、LLMから蒸留したCoT(Chains of Thought)データを用いてSLMを微調整し、その推論能力の向上を目指している。さらに、一部のCoT蒸留法は、外部の記号的知識を生成プロセスに導入し、SLMの限られた知識記憶、推論能力、およびドメイン外(OOD)一般化を改善する。しかし、記号的知識の導入は計算オーバーヘッドを増加させ、潜在的なノイズをもたらす。本稿では、カリキュラム学習のもとで事前定義された線形減衰スケジュールに導かれる段階的な微調整プロセスを通じて、SLMが記号的知識とfew-shotの例を徐々に内部化できるようにする革新的な手法 $\textit{SKIntern}$ を導入する。知識を効率的に内部化することにより、$\textit{SKIntern}$ は計算オーバーヘッドを減らし、推論時には質問のみに集中することで推論プロセスを高速化する。ドメイン内(ID)とドメイン外(OOD)の両方のタスクにおいて、幅広いSLMで最先端のベースラインを5\%以上上回り、推論コスト(FLOPsで測定)を最大$4\times$削減する。コードは \url{https://github.com/Xnhyacinth/SKIntern} で公開予定である。

Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs). Some studies fine-tune SLMs using Chains of Thought (CoT) data distilled from LLMs, aiming to enhance their reasoning ability. Furthermore, some CoT distillation methods introduce external symbolic knowledge into the generation process to improve the limited knowledge memory, reasoning ability and out-of-domain (OOD) generalization of SLMs. However, the introduction of symbolic knowledge increases computational overhead and introduces potential noise. In this paper, we introduce $\textit{SKIntern}$, an innovative approach that empowers SLMs to internalize symbolic knowledge and few-shot examples gradually through a progressive fine-tuning process, guided by a predefined linear decay schedule under curriculum learning. By efficiently internalizing knowledge, $\textit{SKIntern}$ reduces computational overhead and speeds up the reasoning process by focusing solely on the question during inference. It outperforms state-of-the-art baselines by over 5\%, while reducing inference costs (measured in FLOPs) by up to $4\times$ across a wide range of SLMs in both in-domain (ID) and out-of-domain (OOD) tasks. Our code will be available at \url{https://github.com/Xnhyacinth/SKIntern}.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
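
The "predefined linear decay schedule" mentioned in the abstract above can be pictured with a small sketch: a ratio that falls linearly from 1 to 0 over training controls how much auxiliary context (symbolic knowledge or few-shot examples) is still shown to the model. How SKIntern actually selects or compresses that context is not stated in the abstract, so the prefix truncation in `truncate_context`, like both function names, is purely a hypothetical stand-in.

```python
def keep_ratio(step, total_steps):
    """Fraction of auxiliary context kept at `step`, decaying linearly from 1 to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return max(0.0, 1.0 - step / total_steps)

def truncate_context(tokens, step, total_steps):
    """Keep only the leading share of `tokens` allowed by the schedule."""
    kept = int(len(tokens) * keep_ratio(step, total_steps))
    return tokens[:kept]

# Hypothetical eight-token knowledge snippet, viewed halfway through training.
knowledge = ["k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8"]
halfway = truncate_context(knowledge, step=5, total_steps=10)
```

By the final step the ratio reaches zero, which matches the abstract's claim that inference needs only the question itself.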

# ASPINN:特異摂動微分方程式を解く漸近戦略

ASPINN: An asymptotic strategy for solving singularly perturbed differential equations ( http://arxiv.org/abs/2409.13185v1 )

ライセンス: Link先を確認

Sen Wang, Peizhi Zhao, Tao Song,

(参考訳) 特異摂動微分方程式 (SPDE) の求解は、境界層において解が急激に変化するために困難である。本論文では、物理情報ニューラルネットワーク (PINN) およびGeneral-Kindred物理情報ニューラルネットワーク (GKPINN) のアプローチを一般化した、漸近的物理情報ニューラルネットワーク (ASPINN) を提案する。これは漸近解析の考え方に基づく分解法である。PINNと比較して、ASPINNは境界層に指数層を配置するため、SPDEの求解において高い適合能力を持つ。GKPINNとは異なり、ASPINNは全結合層の数を減らすことで、訓練コストをより効果的に削減する。さらに、ASPINNは理論上、境界層における解をより正確に近似し、GKPINNと比較して精度も向上する。多様なクラスのSPDEを解くことでASPINNの効果を実証し、ASPINNが境界層問題において有望であることを明確に示す。さらに、MLPの代わりにChebyshev Kolmogorov-Arnold Networks (Chebyshev-KAN) を導入し、様々な実験でより良い性能を達成した。

Solving Singularly Perturbed Differential Equations (SPDEs) presents challenges due to the rapid change of their solutions at the boundary layer. In this manuscript, we propose Asymptotic Physics-Informed Neural Networks (ASPINN), a generalization of Physics-Informed Neural Networks (PINN) and General-Kindred Physics-Informed Neural Networks (GKPINN) approaches. This is a decomposition method based on the idea of asymptotic analysis. Compared to PINN, the ASPINN method has a strong fitting ability for solving SPDEs due to the placement of exponential layers at the boundary layer. Unlike GKPINN, ASPINN lessens the number of fully connected layers, thereby reducing the training cost more effectively. Moreover, ASPINN theoretically approximates the solution at the boundary layer more accurately, and its accuracy is also improved compared to GKPINN. We demonstrate the effect of ASPINN by solving diverse classes of SPDEs, which clearly shows that the ASPINN method is promising in boundary layer problems. Furthermore, we introduce Chebyshev Kolmogorov-Arnold Networks (Chebyshev-KAN) instead of MLP, achieving better performance in various experiments.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
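
Why an exponential term helps at a boundary layer can be seen on a textbook problem that is not taken from the paper: eps·u'' + u' = 0 on [0, 1] with u(0) = 0 and u(1) = 1. Its exact solution changes almost entirely within a distance of order eps from x = 0, and the classical asymptotic approximation is a smooth outer solution (here the constant 1) plus an exponential layer correction. The sketch below, with hypothetical function names, compares the two.

```python
import math

def exact_solution(x, eps):
    """Exact solution of eps*u'' + u' = 0 on [0, 1] with u(0) = 0, u(1) = 1."""
    return (1.0 - math.exp(-x / eps)) / (1.0 - math.exp(-1.0 / eps))

def composite_approximation(x, eps):
    """Outer solution (u = 1) plus an exponential layer correction at x = 0."""
    return 1.0 - math.exp(-x / eps)

eps = 0.1
grid = [i / 100.0 for i in range(101)]
max_gap = max(abs(exact_solution(x, eps) - composite_approximation(x, eps)) for x in grid)
```

A plain network has to learn the steep rise near x = 0 from scratch, whereas a model that already contains a term of the form exp(-x/eps) only needs to fit the slowly varying remainder, which is the intuition behind placing exponential layers at the boundary layer.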

# 適応型大規模言語モデルが糖尿病治療における複数の医療作業を促進する

An adapted large language model facilitates multiple medical tasks in diabetes care ( http://arxiv.org/abs/2409.13191v1 )

ライセンス: Link先を確認

Lai Wei, Zhen Ying, Muyang He, Yutong Chen, Qian Yang, Yanzhe Hong, Jiaping Lu, Xiaoying Li, Weiran Huang, Ying Chen,

(参考訳) 糖尿病は世界的な健康上の重荷となる慢性疾患であり、糖尿病管理の最適化には複数のステークホルダーの協力が必要である。大規模言語モデル(LLM)は、様々な医療シナリオにおいて有望であるが、様々な糖尿病タスクにおけるその効果は証明されていない。本研究では,糖尿病特異的LLMを訓練し,評価するための枠組みを導入した。最初に、データ収集、フィルタリング、拡張、改善を含む包括的なデータ処理パイプラインを開発した。このアプローチは、高品質で糖尿病特異的なデータセットと、完全にスクラッチから構築したいくつかの評価ベンチマークの作成に寄与する。収集したトレーニングデータセットを用いて糖尿病特異的LLMファミリーを微調整し,他のLLMと比較して各種糖尿病タスクの理解と処理において最先端の能力を示した。さらに, 臨床研究により, パーソナライズされた医療の提供, 医学教育の支援, 臨床業務の合理化など, 糖尿病治療におけるモデルの応用可能性が示された。結論として,本研究では糖尿病特異的LLMファミリーを開発・評価する枠組みを導入し,臨床実践の強化と,様々なエンドユーザーに向けたパーソナライズされたデータ駆動型の糖尿病支援の提供の可能性を強調した。コードはGitHubのhttps://github.com/waltonfuture/Diabeticaで提供されている。

Diabetes is a chronic disease that poses a significant global health burden, and optimizing diabetes management requires multi-stakeholder collaboration. Large language models (LLMs) have shown promise in various healthcare scenarios, but their effectiveness across a diverse range of diabetes tasks remains unproven. In this study, we introduced a framework to train and validate diabetes-specific LLMs. We first developed a comprehensive data processing pipeline that includes data collection, filtering, augmentation and refinement. This approach contributes to creating a high-quality, diabetes-specific dataset and several evaluation benchmarks entirely from scratch. Utilizing the collected training dataset, we fine-tuned a diabetes-specific LLM family that demonstrated state-of-the-art proficiency in understanding and processing various diabetes tasks compared to other LLMs. Furthermore, clinical studies showed the potential applications of our models in diabetes care, including providing personalized healthcare, assisting medical education, and streamlining clinical tasks. In conclusion, our study introduced a framework to develop and evaluate a diabetes-specific LLM family, and highlighted its potential to enhance clinical practice and provide personalized, data-driven diabetes support for different end users. The code is provided via GitHub at https://github.com/waltonfuture/Diabetica.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20

# ChemDFM-X:化学のための大規模マルチモーダルモデルを目指して

ChemDFM-X: Towards Large Multimodal Model for Chemistry ( http://arxiv.org/abs/2409.13194v1 )

ライセンス: Link先を確認

Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu,

(参考訳) AIツールの急速な開発は、化学を含む自然科学の研究に前例のない支援を提供すると予想されている。しかし、既存の単一モダリティのタスク特化モデルも、新しい汎用の大規模マルチモーダルモデル(LMM)も、幅広い化学データモダリティやタスクカテゴリをカバーできない。化学者の真の要求に応えるために,LMMの潜在能力を活用した真に実用的で有用な研究アシスタントとして機能するクロスモーダルケミカル・ジェネラル・インテリジェンス(CGI)システムが必要である。本稿では,化学のための初のクロスモーダル対話基盤モデル (ChemDFM-X) を導入する。近似計算とタスク固有モデル予測により、初期モダリティから、多様なマルチモーダルデータを生成する。この戦略は十分な化学訓練コーパスを生成しつつ、過剰なコストを大幅に削減し、7.6Mデータを含む命令チューニングデータセットを生成する。命令の微調整の後、ChemDFM-Xは様々なデータモダリティを持つ様々な化学タスクの広範な実験で評価される。その結果,マルチモーダルおよびモーダル間知識理解におけるChemDFM-Xの能力が示された。 ChemDFM-Xは、CGIに一歩近づいた化学における全てのモダリティの整合に向けた重要なマイルストーンである。

The rapid development of AI tools is expected to offer unprecedented assistance to research in the natural sciences, including chemistry. However, neither existing unimodal task-specific specialist models nor emerging general large multimodal models (LMMs) can cover the wide range of chemical data modalities and task categories. To address the real demands of chemists, a cross-modal Chemical General Intelligence (CGI) system, which serves as a truly practical and useful research assistant utilizing the great potential of LMMs, is greatly needed. In this work, we introduce the first Cross-modal Dialogue Foundation Model for Chemistry (ChemDFM-X). Diverse multimodal data are generated from an initial modality by approximate calculations and task-specific model predictions. This strategy creates sufficient chemical training corpora while significantly reducing excessive expense, resulting in an instruction-tuning dataset containing 7.6M data. After instruction finetuning, ChemDFM-X is evaluated in extensive experiments on different chemical tasks with various data modalities. The results demonstrate the capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension. ChemDFM-X marks a significant milestone toward aligning all modalities in chemistry, a step closer to CGI.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20

# BoilerTAI: 教育フォーラムにおけるジェネレーティブAIを用いた指導の強化プラットフォーム

BoilerTAI: A Platform for Enhancing Instruction Using Generative AI in Educational Forums ( http://arxiv.org/abs/2409.13196v1 )

ライセンス: Link先を確認

Anvit Sinha, Shruti Goyal, Zachary Sy, Rhianna Kuperus, Ethan Dickey, Andres Bejarano,

(参考訳) コントリビューション: このResearch Categoryトラックのフルペーパーは、Generative AI(GenAI)とオンラインの教育フォーラムをシームレスに統合し、スタッフの教育能力を高めるための新しいアプローチを提供する、実用的でスケーラブルなプラットフォームを記述している。このプラットフォームは、学生ポストとLLM(Large Language Model)との相互作用を円滑に進めることによって、指導スタッフが反応を効率的に管理し、洗練し、承認することを可能にする。この貢献は、指導支援の効率性と効果を高め、学生に提供する応答の質と速度を大幅に向上させ、全体としての学習経験を豊かにする。背景: ヴィゴツキーの社会文化的理論とより知識のある他者(MKO)の概念を基礎として,GenAIが学生とインストラクターの教育対話を強化するために補助的なMKOとして機能するかを検討する。調査質問:GenAIは、教育討論フォーラムに投稿された学生の質問に対して、指導要員の負担軽減にどの程度効果があるか? 方法論: 大規模なプログラミングコースにおいて混合メソッドのアプローチを用いることで、AI-TAは、学生の質問を事前に答えるためにAI支援プラットフォームを使用した。我々は、AI生成応答に対する修正頻度などの効率指標を分析し、AI-TAから定性的なフィードバックを収集した。その結果、AI-TAが生み出す反応に対する学生の反応と、人間のインストラクターが与える反応とでは有意な差は認められなかった。これは、GenAIが適切に管理された場合、教育ニーズを効果的に満たせることを示唆している。さらに、AI-TAは、学習の質を損なうことなく教育効率を高めるGenAIの可能性を指して、クエリに応答するために必要な認知負荷の低減を経験した。

Contribution: This Full paper in the Research Category track describes a practical, scalable platform that seamlessly integrates Generative AI (GenAI) with online educational forums, offering a novel approach to augment the instructional capabilities of staff. The platform empowers instructional staff to efficiently manage, refine, and approve responses by facilitating interaction between student posts and a Large Language Model (LLM). This contribution enhances the efficiency and effectiveness of instructional support and significantly improves the quality and speed of responses provided to students, thereby enriching the overall learning experience. Background: Grounded in Vygotsky's socio-cultural theory and the concept of the More Knowledgeable Other (MKO), the study examines how GenAI can act as an auxiliary MKO to enrich educational dialogue between students and instructors. Research Question: How effective is GenAI in reducing the workload of instructional staff when used to pre-answer student questions posted on educational discussion forums? Methodology: Using a mixed-methods approach in large introductory programming courses, human Teaching Assistants (AI-TAs) employed an AI-assisted platform to pre-answer student queries. We analyzed efficiency indicators like the frequency of modifications to AI-generated responses and gathered qualitative feedback from AI-TAs. Findings: The findings indicate no significant difference in student reception to responses generated by AI-TAs compared to those provided by human instructors. This suggests that GenAI can effectively meet educational needs when adequately managed. Moreover, AI-TAs experienced a reduction in the cognitive load required for responding to queries, pointing to GenAI's potential to enhance instructional efficiency without compromising the quality of education.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20

# 大規模言語モデル学習における局所SGDのスケーリング法則の探索

Exploring Scaling Laws for Local SGD in Large Language Model Training ( http://arxiv.org/abs/2409.13198v1 )

ライセンス: Link先を確認

Qiaozhi He, Xiaomin Zhuang, Zhihua Wu,

(参考訳) 本稿では,ゆるく接続されたデバイスでのトレーニングを容易にする分散最適化アルゴリズムである局所SGDについて,LLMトレーニングにおけるスケーリング則を検討する。広範な実験により, モデルパラメータ, データセット, 計算資源が同等であれば, 局所SGDが従来の手法に匹敵する結果を達成することを示す。さらに,マルチクラスタセットアップやエッジコンピューティング環境など,様々な実践シナリオにおける局所SGDの適用について検討する。本研究は, 効果的なマルチクラスタLLMトレーニングに必要な条件を明らかにし, LLMトレーニングプロセスにおけるエッジコンピューティングリソースの活用の可能性と限界について検討した。これは、単一の大規模クラスタトレーニングの代替としての局所SGDの実現可能性を示すものである。

This paper investigates scaling laws in LLM training for local SGD, a distributed optimization algorithm that facilitates training on loosely connected devices. Through extensive experiments, we show that local SGD achieves competitive results compared to conventional methods, given equivalent model parameters, datasets, and computational resources. Furthermore, we explore the application of local SGD in various practical scenarios, including multi-cluster setups and edge computing environments. Our findings elucidate the necessary conditions for effective multi-cluster LLM training and examine the potential and limitations of leveraging edge computing resources in the LLM training process. These results demonstrate the viability of local SGD as an alternative to single large-cluster training.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
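
For readers unfamiliar with local SGD, the following is a minimal sketch of the general technique, not the training setup of the paper above. Each worker takes several SGD steps on its own data without communicating, and the models are averaged only once per round. The quadratic per-worker objective, the noise level, and all names and default values are assumptions chosen so the example stays small and checkable.

```python
import random


def local_sgd(targets, rounds=50, local_steps=8, lr=0.1, seed=0):
    """Minimise the average of f_k(w) = 0.5 * (w - targets[k])**2 over workers k.

    Workers synchronise only once per round, after `local_steps` local updates.
    """
    rng = random.Random(seed)
    w_global = 0.0
    for _ in range(rounds):
        local_models = []
        for target in targets:          # one loop iteration per worker
            w = w_global                # start from the last synchronised model
            for _ in range(local_steps):
                noise = rng.gauss(0.0, 0.01)   # stand-in for mini-batch noise
                grad = (w - target) + noise    # stochastic gradient of f_k
                w -= lr * grad
            local_models.append(w)
        # Communication step: average the local models.
        w_global = sum(local_models) / len(local_models)
    return w_global


estimate = local_sgd([1.0, 2.0, 3.0, 6.0])
```

With these toy objectives the averaged model approaches the mean of the targets (3.0 here), which is the minimiser of the average loss. For non-quadratic losses the local models can drift apart between synchronisations, which is one reason the number of local steps matters in practice.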

# CFSP: 粗から細への活性化情報を用いたLLMのための効率的な構造化プルーニングフレームワーク

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information ( http://arxiv.org/abs/2409.13199v1 )

ライセンス: Link先を確認

Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin,

(参考訳) LLM(Large Language Models)の膨大なパラメータと計算オーバーヘッドは、現実のアプリケーションへの応用を困難にしている。冗長パラメータを除去して非構造的あるいは構造的スパース性を目標とするネットワークプルーニングは,最近,LLM加速のために検討されている。既存のLLMプルーニング研究は非構造化プルーニングに重点を置いているが、これは実用的な高速化のために通常特別なハードウェアサポートを必要とする。対照的に、構造化プルーニングは一般的なデバイスでのレイテンシを低減することができる。しかし、構造化プルーニングを効率的に行い、特にスパース率が高い場合に性能を維持することは依然として課題である。この目的のために、我々は、粗い(ブロック間)ときめ細かい(ブロック内)アクティベーション情報の両方をプルーニングを導く重要度基準として活用する、CFSPと呼ばれる効率的な構造化プルーニングフレームワークを導入する。プルーニングは、特徴アクティベーションを計算するために1回のフォワードパスしか必要としないため、非常に効率的である。具体的には,まず重要度に基づいてブロック間にスパース性予算を割り当て,次に各ブロック内の重要な重みを保持する。さらに,粗い重要度に基づいてトレーニングのオーバーヘッドを適応的に配分し,さらなる性能向上を図るリカバリファインチューニング戦略を導入する。実験結果から, CFSPは, 多様なモデルにおいて, 様々なスパース性予算にわたって既存手法よりも優れていることがわかった。私たちのコードはhttps://github.com/wyxscir/CFSPで公開されます。

The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains a challenge to perform structured pruning efficiently and maintain performance, especially at high sparsity ratios. To this end, we introduce an efficient structured pruning framework named CFSP, which leverages both Coarse (interblock) and Fine-grained (intrablock) activation information as an importance criterion to guide pruning. The pruning is highly efficient, as it only requires one forward pass to compute feature activations. Specifically, we first allocate the sparsity budget across blocks based on their importance and then retain important weights within each block. In addition, we introduce a recovery fine-tuning strategy that adaptively allocates training overhead based on coarse-grained importance to further improve performance. Experimental results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets. Our code will be available at https://github.com/wyxscir/CFSP.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
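
The two-level procedure described in the CFSP abstract above (first split the sparsity budget across blocks by importance, then keep the most important units inside each block) can be sketched as follows. The allocation rule (sparsity proportional to inverse block importance), the per-unit scores, and all function names are hypothetical choices for illustration; the abstract does not state the actual criteria used by CFSP.

```python
def allocate_sparsity(block_importance, target_sparsity):
    """Coarse step: less important blocks receive a larger share of the budget.

    Assumed rule: sparsity is proportional to 1 / importance, scaled so that
    the average over blocks equals `target_sparsity` (before clipping).
    """
    inverse = [1.0 / score for score in block_importance]
    mean_inverse = sum(inverse) / len(inverse)
    # Clipping at 0.95 avoids removing a block entirely; it can lower the average.
    return [min(0.95, target_sparsity * value / mean_inverse) for value in inverse]


def keep_units(unit_scores, sparsity):
    """Fine step: return the indices of the units kept inside one block."""
    n_prune = round(sparsity * len(unit_scores))
    order = sorted(range(len(unit_scores)), key=lambda i: unit_scores[i])
    return sorted(order[n_prune:])  # drop the n_prune lowest-scoring units


# Three blocks with hypothetical importance scores, 50% overall sparsity.
block_sparsity = allocate_sparsity([1.0, 2.0, 4.0], target_sparsity=0.5)
kept = keep_units([0.9, 0.1, 0.5, 0.3], sparsity=0.5)
```

In a real structured-pruning setting each "unit" would be a whole channel, head, or neuron, and the scores would come from activations collected in a forward pass over calibration data.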

# ニューラル・シンボリック協調蒸留:複雑な推論タスクのための小言語モデルの改善

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks ( http://arxiv.org/abs/2409.13203v1 )

ライセンス: Link先を確認

Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Jun Zhao,

(参考訳) 本稿では,大規模言語モデル (LLMs, e.g., $>$ 13B) の複雑な推論能力を学習するための新しい知識蒸留法である $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$) を提案する。複雑な推論タスクは、一般的な認知能力だけでなく専門知識も要求し、その専門知識はしばしば疎でニューラルベースのSLMが効果的に捉えることが難しいため、小言語モデル(Small Language Models, SLMs, e.g., $\leq$ 7B)にとって難しいと我々は主張する。そのため、NesyCDはLLMの一般的な能力と専門知識を異なる方法で蒸留する。一方では,教師のLLMから,パラメータ化されたニューラルネットワークである学生のSLMに一般能力のみを蒸留する。他方では,複雑な推論課題の専門的能力と一般的でない知識については,記号的知識蒸留法を用いて,その専門的知識を記号的知識ベース(KB)内に獲得・保存する。一般的な能力と専門的な能力を分離することにより、提案したNesyCDは、より小さなモデルを活用し、パラメータ化されたニューラルネットワークとシンボリックKBをブレンドすることで、優れた性能を費用対効果よく実現することができる。さらに、専門的なKBはよく一般化し、人間が理解し操作することができる。実験の結果,NesyCDは領域内(BBH, GSM8K)および領域外(AGIEval, ARC)データセット上でのSLMの複雑な推論性能を大幅に向上させることがわかった。特に,提案手法により LLaMA3-8B と Qwen2-7B は GPT-3.5-turbo を上回り,9倍のパラメータを持つ LLaMA3-70B に迫る性能を示した。私たちのコードはhttps://github.com/Xnhyacinth/NesyCDで公開されます。

In this paper, we propose $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., $>$ 13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., $\leq$ 7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for these neural-based SLMs to effectively capture. Therefore, NesyCD distills the general capabilities and specialized knowledge in LLMs in different ways. On the one hand, we distill only general abilities from teacher LLMs into the student SLMs of parameterized neural networks. On the other hand, for the specialized abilities and uncommon knowledge of a complex reasoning task, we employ a symbolic knowledge distillation approach to obtain and store the specialized knowledge within a symbolic knowledge base (KB). By decoupling general and specialized capabilities, the proposed NesyCD can achieve superior performance cost-effectively, utilizing smaller models and blending parameterized neural networks with a symbolic KB. Moreover, the specialized KB generalizes well and can be comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enabled LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo in performance and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20

# 回帰誘導ニューラルネットワークを用いた環境危険因子による健康リスクの集団不均一性の解明

Unveiling Population Heterogeneity in Health Risks Posed by Environmental Hazards Using Regression-Guided Neural Network ( http://arxiv.org/abs/2409.13205v1 )

ライセンス: Link先を確認

Jong Woo Nam, Eun Young Choi, Jennifer A. Ailshire, Yao-Yi Chiang,

(参考訳) 環境の危険は、特定の個人を不均等に高いリスクに陥らせる。これらの危険が人間の健康をますます脅かす中、最も脆弱な集団の正確な同定は公衆衛生にとって重要である。モデレート多重回帰(MMR)は、危険への曝露と他の集団特性の間の相互作用項を線形回帰モデルに付加することにより、これを調査するための簡単な方法を提供する。しかし、脆弱性が多くの特性の組み合わせの中に隠されている場合、MMRは意味のある発見を得る能力に制限されることが多い。本稿では、人工ニューラルネットワーク(ANN)を用いて予測変数を非線形に結合し、焦点予測変数(すなわち環境危険因子への曝露を測る変数)と相互作用する潜在表現を生成するハイブリッド手法である回帰誘導ニューラルネットワーク(ReGNN)を提案する。大気汚染(PM2.5)への曝露が認知機能スコアに与える健康影響の集団不均一性について,ReGNNを用いた調査を行った。従来のMMRモデルの適合結果と比較することにより,従来のMMRでは隠れてしまう集団の不均一性をReGNNを用いて発見できることを実証した。本質的には、ReGNNは、個人の健康リスクに対する感受性を効果的に要約し定量化することで、従来の回帰モデルを強化する新しいツールである。

Environmental hazards place certain individuals at disproportionately higher risks. As these hazards increasingly endanger human health, precise identification of the most vulnerable population subgroups is critical for public health. Moderated multiple regression (MMR) offers a straightforward method for investigating this by adding interaction terms between the exposure to a hazard and other population characteristics to a linear regression model. However, when the vulnerabilities are hidden within a cross-section of many characteristics, MMR is often limited in its ability to yield meaningful discoveries. Here, we introduce a hybrid method, named regression-guided neural networks (ReGNN), which utilizes artificial neural networks (ANNs) to non-linearly combine predictors, generating a latent representation that interacts with a focal predictor (i.e., a variable measuring exposure to an environmental hazard). We showcase the use of ReGNN for investigating the population heterogeneity in the health effects of exposure to air pollution (PM2.5) on cognitive functioning scores. By comparing its results to the fit results of traditional MMR models, we demonstrate that ReGNN can find population heterogeneity that would otherwise remain hidden when using traditional MMR. In essence, ReGNN is a novel tool that enhances traditional regression models by effectively summarizing and quantifying an individual's susceptibility to health risks.

翻訳日:2024-11-07 11:29:51 公開日:2024-09-20
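
The role of the interaction term in moderated multiple regression, as described in the abstract above, can be made concrete with a small sketch. The standard MMR form y = b0 + b1*x + b2*z + b3*x*z is used, so the effect of the exposure x depends on the moderator z through b1 + b3*z. The single tanh unit standing in for the neural network, the coefficient values, and all names are assumptions for illustration only; the abstract does not give the exact functional form of ReGNN.

```python
import math


def mmr_predict(exposure, moderator, b0, b1, b2, b3):
    """Moderated multiple regression with one interaction term."""
    return b0 + b1 * exposure + b2 * moderator + b3 * exposure * moderator


def exposure_effect(moderator, b1, b3):
    """Change in the outcome per unit of exposure, given the moderator value."""
    return b1 + b3 * moderator


def latent_index(characteristics, weights):
    """Hypothetical stand-in for the ANN: one tanh unit combining many traits."""
    return math.tanh(sum(w * c for w, c in zip(weights, characteristics)))


# Hypothetical coefficients; the latent index plays the role of the moderator.
b0, b1, b2, b3 = 1.0, -0.5, 0.2, -0.3
vulnerability = latent_index([1.0, 0.0, 2.0], [0.4, -0.2, 0.1])
individual_effect = exposure_effect(vulnerability, b1, b3)
```

Replacing a single observed moderator with a learned combination of many characteristics is what lets the interaction capture vulnerability that is spread across several traits.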

# 倫理的問題に対するレコメンダシステム監査のための統一因果フレームワーク

A Unified Causal Framework for Auditing Recommender Systems for Ethical Concerns ( http://arxiv.org/abs/2409.13210v1 )

ライセンス: Link先を確認

Vibhhu Sharma, Shantanu Gupta, Nil-Jana Akpinar, Zachary C. Lipton, Liu Leqi,

(参考訳) 推薦システムがさまざまなドメインに広くデプロイされるようになると、ユーザの信念や好みにますます影響を及ぼすようになる。推薦システムの監査は、レコメンデーションアルゴリズムの継続的な改善を保証するだけでなく、バイアスや倫理的懸念といった潜在的な問題から保護するためにも重要である。本稿では、因果レンズからのレコメンデータシステム監査を考察し、監査基準を定義するための一般的なレシピを提供する。この一般的な因果監査フレームワークの下では、既存の監査指標を分類し、それらのギャップを識別する -- 特に、レコメンデーションプロセスのマルチステップのダイナミクスを考慮しつつ、ユーザエージェンシーを監査するための指標が欠如している。筆者らは,我々のフレームワークを活用して,ユーザが自身の推奨に影響を及ぼす能力と,他のユーザの推奨に影響を及ぼす能力をそれぞれ測定する,未来および過去の到達可能性と安定性という2種類の指標を提案する。我々は、これらのメトリクスを計算するための勾配ベースのアプローチとブラックボックスアプローチの両方を提供し、監査人がレコメンデータシステムに対する異なるレベルのアクセスの下でそれらを計算できるようにする。実験では,提案した指標を計算する手法の有効性を実証し,これらの指標を通じてレコメンダシステムの設計を検証した。

As recommender systems become widely deployed in different domains, they increasingly influence their users' beliefs and preferences. Auditing recommender systems is crucial as it not only ensures the continuous improvement of recommendation algorithms but also safeguards against potential issues like biases and ethical concerns. In this paper, we view recommender system auditing from a causal lens and provide a general recipe for defining auditing metrics. Under this general causal auditing framework, we categorize existing auditing metrics and identify gaps in them -- notably, the lack of metrics for auditing user agency while accounting for the multi-step dynamics of the recommendation process. We leverage our framework and propose two classes of such metrics: future- and past-reachability and stability, which measure the ability of a user to influence their own and other users' recommendations, respectively. We provide both a gradient-based and a black-box approach for computing these metrics, allowing the auditor to compute them under different levels of access to the recommender system. In our experiments, we demonstrate the efficacy of methods for computing the proposed metrics and inspect the design of recommender systems through these proposed metrics.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# MalMixer: Retrieval-Augmented Semi-Supervised Learningを用いたFew-Shotのマルウェア分類

MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning ( http://arxiv.org/abs/2409.13213v1 )

ライセンス: Link先を確認

Eric Li, Yifan Zhang, Yu Huang, Kevin Leach,

(参考訳) 近年のマルウェアの増加と拡散は、新しいサンプルをマルウェアファミリーごとに迅速に分類する実践者の能力を試している。労働集約的なリバースエンジニアリングの取り組みとは対照的に、機械学習アプローチはスピードと精度の向上を実証している。しかし、既存のディープラーニングマルウェアのファミリー分類器の多くは、トレーニング前に手動で入念に分析された大量のサンプルを使用して校正されなければならない。さらに、トレーニングセットの範囲を超える新しいマルウェアサンプルが出現するにつれて、トレーニングセットを更新するためには、さらなるリバースエンジニアリングの努力を払わなければならない。野生で発見された新しいサンプルの量は、現代の分類器を適切に訓練するのに十分なマルウェアをリバースエンジニアリングする実践者の能力にかなりの圧力を与えている。本稿では,半教師付き学習を用い,少ないトレーニングデータで高い精度を達成するマルウェアファミリー分類器であるMalMixerを提案する。本稿では、マルウェアの特徴表現を増強し、半教師付きマルウェアファミリー分類の少数ショット性能を向上させるための新しいドメイン知識認識手法を提案する。MalMixerは,少数ショットのマルウェアファミリー分類設定において,最先端のパフォーマンスを実現することを示す。本研究は、軽量なドメイン知識認識型の特徴拡張手法の実現可能性と有効性を確認し、マルウェア分類問題に対処する上で、類似の半教師付き分類器の能力を強調した。

Recent growth and proliferation of malware has tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are painstakingly manually analyzed before training. Furthermore, as novel malware samples arise that are beyond the scope of the training set, additional reverse engineering effort must be employed to update the training set. The sheer volume of new samples found in the wild creates substantial pressure on practitioners' ability to reverse engineer enough malware to adequately train modern classifiers. In this paper, we present MalMixer, a malware family classifier using semi-supervised learning that achieves high accuracy with sparse training data. We present a novel domain-knowledge-aware technique for augmenting malware feature representations, enhancing few-shot performance of semi-supervised malware family classification. We show that MalMixer achieves state-of-the-art performance in few-shot malware family classification settings. Our research confirms the feasibility and effectiveness of lightweight, domain-knowledge-aware feature augmentation methods and highlights the capabilities of similar semi-supervised classifiers in addressing malware classification issues.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# 多重忠実度による不誠実な絡み合いの検出

Detecting unfaithful entanglement by multiple fidelities ( http://arxiv.org/abs/2409.13214v1 )

ライセンス: Link先を確認

Ruiqi Zhang, Zhaohui Wei,

(参考訳) 未知の量子状態に対する証明の絡み合いは、量子コンピューティングと量子物理学の基本的な問題である。実装が容易であるため、現代の量子実験におけるこの問題に対する最も一般的なアプローチは、忠実度に基づく絡み合った証人による標的量子状態の検出である。具体的には、対象状態と絡み合った純状態との忠実度が一定の値を超えると、対象状態が絡み合うことが保証される。しかし、近年では、いわゆる不信な量子状態が存在し、絡み合うことができるが、その絡み合いは、忠実性に基づく絡み合いの証人によっては証明できないことが判明している。本稿では,複数の忠実度を組み合わせることで,忠実度に基づく絡み合いをわずかに修正した場合,この手法で不信な量子状態に対する絡み合いを証明できることを,具体例で示す。特に,修正された絡み目の数学的構造を分析し,それらの最適設計を探索するアルゴリズムを提案する。

Certifying entanglement for unknown quantum states experimentally is a fundamental problem in quantum computing and quantum physics. Because it is easy to implement, one of the most popular approaches to this problem in modern quantum experiments is detecting target quantum states with fidelity-based entanglement witnesses. Specifically, if the fidelity between a target state and an entangled pure state exceeds a certain value, the target state can be guaranteed to be entangled. Recently, however, it has been realized that there exist so-called unfaithful quantum states, which can be entangled, but whose entanglement cannot be certified by any fidelity-based entanglement witness. In this paper, we show by specific examples that if one slightly modifies fidelity-based entanglement witnesses by combining multiple fidelities, it is still possible to certify entanglement for unfaithful quantum states with this popular technique. In particular, we analyze the mathematical structure of the modified entanglement witnesses and propose an algorithm that searches for their optimal designs.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
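
The standard single-fidelity test that the abstract above starts from can be sketched for two qubits. If the fidelity of a state with the Bell state |Phi+> exceeds 1/2 (the largest overlap any product state can reach with |Phi+>), the state is entangled. The isotropic family p*|Phi+><Phi+| + (1-p)*I/4 used as test input and all function names are assumptions for illustration; the multi-fidelity witnesses proposed in the paper are not reproduced here.

```python
import math

# Bell state |Phi+> = (|00> + |11>) / sqrt(2) in the computational basis.
PHI_PLUS = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]


def isotropic_state(p):
    """4x4 density matrix p*|Phi+><Phi+| + (1-p)*I/4 as nested lists."""
    return [[p * PHI_PLUS[i] * PHI_PLUS[j] + ((1 - p) / 4 if i == j else 0.0)
             for j in range(4)] for i in range(4)]


def fidelity_with_pure_state(rho, psi):
    """F = <psi|rho|psi> for a real state vector psi."""
    return sum(psi[i] * rho[i][j] * psi[j] for i in range(4) for j in range(4))


def witness_detects(rho, psi=PHI_PLUS, threshold=0.5):
    """Fidelity-based witness: a fidelity above the threshold certifies entanglement."""
    return fidelity_with_pure_state(rho, psi) > threshold


fidelity = fidelity_with_pure_state(isotropic_state(0.5), PHI_PLUS)
```

For this family the fidelity is p + (1-p)/4, so the test fires exactly when p > 1/3, which coincides with the known entanglement threshold of two-qubit isotropic states. The unfaithful states discussed in the abstract are precisely those for which no such single-fidelity test succeeds.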

# 3D-GSW:放射場における著作権保護のための3Dガウススプラッティング透かし

3D-GSW: 3D Gaussian Splatting Watermark for Protecting Copyrights in Radiance Fields ( http://arxiv.org/abs/2409.13222v1 )

ライセンス: Link先を確認

Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim,

(参考訳) 近年, 高速レンダリングと画像品質により, 3次元空間を表現する革新的な手法として, 3次元ガウススプラッティングが注目されている。しかし、3Dガウシアンスプラッティングの著作権保護はまだ導入されていない。本稿では,3次元ガウススプラッティングのための新しい透かし法を提案する。提案手法は,事前学習した3次元ガウススプラッティングモデルを微調整することにより,バイナリメッセージを3次元ガウスに埋め込む。これを実現するために、離散フーリエ変換を用いて高頻度のパッチを見つけ出し、3Dガウス寄与ベクトルに基づいて3Dガウスを分割する周波数誘導密度化(FGD)を提案する。レンダリングされたピクセルの色に対する3Dガウスの寄与であり、レンダリング品質とビット精度の両方を改善している。さらに、レンダリング品質を向上させるために、適応的な勾配マスクを変更する。実験の結果,本手法は3次元ガウシアンに透かしを埋め込むことができ,攻撃に対するキャパシティとロバスト性を高めることができることがわかった。提案手法は最適化コストを削減し,他の手法と比較して最先端の性能を実現する。

Recently, 3D Gaussian splatting has attracted considerable attention as an innovative method for representing 3D space due to its rapid rendering and image quality. However, copyright protection for 3D Gaussian splatting has not yet been introduced. In this paper, we present a novel watermarking method for 3D Gaussian splatting. The proposed method embeds a binary message into 3D Gaussians by fine-tuning the pre-trained 3D Gaussian splatting model. To achieve this, we present Frequency-Guided Densification (FGD), which utilizes the Discrete Fourier Transform to find high-frequency patches and splits 3D Gaussians based on the 3D Gaussian Contribution Vector. This vector captures each 3D Gaussian's contribution to the rendered pixel colors, which improves both rendering quality and bit accuracy. Furthermore, we modify an adaptive gradient mask to enhance rendering quality. Our experiments show that our method can embed a watermark in 3D Gaussians imperceptibly with increased capacity and robustness against attacks. Our method reduces optimization cost and achieves state-of-the-art performance compared to other methods.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# 並列化可能な物理シミュレータを用いた非把持物体マニピュレーションのためのインクリメンタルFew-Shot適応

Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators ( http://arxiv.org/abs/2409.13228v1 )

ライセンス: Link先を確認

Fabian Baumeister, Lukas Mack, Joerg Stueckler,

(参考訳) 日々の環境やフレキシブル生産といったオープンワールド環境でタスクを実行するインテリジェントロボットにとって、Few-shot適応は重要な機能である。本稿では,モデル予測制御のための物理ベースの力学モデルを反復的に適応させる,非把持操作のための新しいアプローチを提案する。ロボットとオブジェクトの相互作用の少数の例を用いて,モデルのパラメータを漸進的に適応させる。これは、並列化可能な剛体物理シミュレーションを動的世界モデルとして用いたパラメータのサンプリングベース最適化によって達成される。そして、最適化された力学モデルは、効率的なサンプリングベース最適化を用いたモデル予測制御に利用できる。シミュレーションおよび実ロボットを用いたいくつかの物体押し実験において,提案するFew-shot適応手法を評価した。

Few-shot adaptation is an important capability for intelligent robots that perform tasks in open-world settings such as everyday environments or flexible production. In this paper, we propose a novel approach for non-prehensile manipulation which iteratively adapts a physics-based dynamics model for model-predictive control. We adapt the parameters of the model incrementally with a few examples of robot-object interactions. This is achieved by sampling-based optimization of the parameters using a parallelizable rigid-body physics simulation as a dynamic world model. In turn, the optimized dynamics model can be used for model-predictive control using efficient sampling-based optimization. We evaluate our few-shot adaptation approach in several object pushing experiments in simulation and with a real robot.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
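
The idea of adapting a simulator parameter from a few interaction examples by sampling, as described in the abstract above, can be sketched with a toy problem. Here the "simulator" is a closed-form sliding-distance formula and the only parameter is a friction coefficient; the shrinking-interval search, the parameter range, and every name are assumptions made for illustration and are not the method or simulator used in the paper.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def slide_distance(push_speed, friction):
    """Toy simulator: distance a block slides after being released at push_speed."""
    return push_speed ** 2 / (2.0 * friction * G)


def fit_friction(observations, rounds=20, samples=64, seed=0):
    """Sampling-based fit of the friction coefficient to (speed, distance) pairs."""
    rng = random.Random(seed)
    low, high = 0.05, 1.5  # assumed plausible range for the coefficient

    def loss(friction):
        return sum((slide_distance(v, friction) - d) ** 2 for v, d in observations)

    best = 0.5 * (low + high)
    for _ in range(rounds):
        # Each candidate is simulated independently, so this loop could run in parallel.
        candidates = [rng.uniform(low, high) for _ in range(samples)]
        candidates.append(best)  # never lose the best candidate found so far
        best = min(candidates, key=loss)
        half_width = 0.25 * (high - low)  # halve the search interval each round
        low, high = max(0.01, best - half_width), best + half_width
    return best


# A few "observed" pushes generated with a true coefficient of 0.3.
observed = [(v, slide_distance(v, 0.3)) for v in (0.5, 1.0, 1.5)]
estimated_friction = fit_friction(observed)
```

The same fitted model could then be reused inside a sampling-based controller, which evaluates candidate pushes in simulation before executing one.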

# 脳腫瘍セグメンテーションのためのnnU-NetにおけるマルチスケールエンコーダとOmni次元動的畳み込み強化

Multiscale Encoder and Omni-Dimensional Dynamic Convolution Enrichment in nnU-Net for Brain Tumor Segmentation ( http://arxiv.org/abs/2409.13229v1 )

ライセンス: Link先を確認

Sahaj K. Mistry, Sourav Saini, Aashray Gupta, Aayush Gupta, Sunny Rai, Vinit Jakhetiya, Ujjwal Baid, Sharath Chandra Guntuku,

(参考訳) 脳腫瘍の分節はコンピュータ支援診断において重要な役割を担っている。本研究では nnU-Net アーキテクチャを改良した新しいセグメンテーションアルゴリズムを提案する。 nnU-Netアーキテクチャのエンコーダ部では、全次元動的畳み込み層を組み込んで従来の畳み込み層を強化し、特徴表現を改善した。同時に,様々な尺度からの現代的洞察を活用するマルチスケールアテンション戦略を提案する。モデルの有効性はBraTS-2023チャレンジの多様なデータセットで実証される。オムニ次元動的畳み込み(ODConv)層とマルチスケール機能を統合することで、複数の腫瘍セグメンテーションデータセット間でnnU-Netアーキテクチャの性能が大幅に向上する。注目すべきは、BraTS Africaデータセットの検証において、提案したモデルが良好な精度が得られることだ。 ODconvのソースコードと完全なトレーニングコードはGitHubで公開されている。

Brain tumor segmentation plays a crucial role in computer-aided diagnosis. This study introduces a novel segmentation algorithm utilizing a modified nnU-Net architecture. Within the nnU-Net architecture's encoder section, we enhance conventional convolution layers by incorporating omni-dimensional dynamic convolution (ODConv) layers, resulting in improved feature representation. Simultaneously, we propose a multi-scale attention strategy that harnesses contemporary insights from various scales. Our model's efficacy is demonstrated on diverse datasets from the BraTS-2023 challenge. Integrating ODConv layers and multi-scale features yields substantial improvement in the nnU-Net architecture's performance across multiple tumor segmentation datasets. Remarkably, our proposed model attains good accuracy during validation for the BraTS Africa dataset. The ODConv source code, along with the full training code, is available on GitHub.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# DNNの不確かさと敵攻撃との関係

Relationship between Uncertainty in DNNs and Adversarial Attacks ( http://arxiv.org/abs/2409.13232v1 )

ライセンス: Link先を確認

Abigail Adeniran, Adewale Adeyemo,

(参考訳) ディープニューラルネットワーク(DNN)は、最先端の結果を達成し、多くの課題において人間の精度よりも優れており、自然言語処理、パターン認識、予測、制御最適化など、さまざまな分野に採用されている。しかし、DNNは結果の不確実性を伴うため、あるレベルの信頼の域外にある結果を予測する。これらの不確実性は、敵の攻撃によって悪化する可能性があるモデルまたはデータ制約に起因している。敵攻撃は、DNNに摂動入力を提供することを目的としており、DNNは誤った予測をしたり、モデルの不確実性を増大させる。本稿では,DNNの不確実性と敵攻撃との関係を考察し,敵攻撃がDNNの不確実性をいかに引き起こすかを強調した。

Deep Neural Networks (DNNs) have achieved state-of-the-art results and even outperformed human accuracy in many challenging tasks, leading to the adoption of DNNs in a variety of fields including natural language processing, pattern recognition, prediction, and control optimization. However, DNNs are accompanied by uncertainty about their results, causing them to predict an outcome that is either incorrect or outside of a certain level of confidence. These uncertainties stem from model or data constraints, which could be exacerbated by adversarial attacks. Adversarial attacks aim to provide perturbed input to DNNs, causing the DNN to make incorrect predictions or increase model uncertainty. In this review, we explore the relationship between DNN uncertainty and adversarial attacks, emphasizing how adversarial attacks might raise DNN uncertainty.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# Mixupと人工ラベル付きノイズのみを用いたフェデレーション環境におけるラベル不均衡のバランス

Balancing Label Imbalance in Federated Environments Using Only Mixup and Artificially-Labeled Noise ( http://arxiv.org/abs/2409.13235v1 )

ライセンス: Link先を確認

Kyle Sang, Tahseen Rabbani, Furong Huang,

(参考訳) 分散あるいはフェデレーションされた環境のクライアントは、しばしばラベルの異なるサブセットに偏ったデータを保持する。このシナリオは、異種または非IIDフェデレーション学習と呼ばれ、モデルトレーニングとパフォーマンスを著しく妨げることが示されている。本研究では,偏ったラベル分布のバランスをとるための,単純かつ効果的な拡張戦略の限界について検討する。すなわち,特定のラベルクラスで不足しているサンプルを擬似画像で補う戦略である。既存のアルゴリズムは、ローカルトレーニングデータのMixupのような擬似画像のみでトレーニングするが、我々の拡張されたクライアントデータセットは、実画像と擬似画像の両方で構成される。他の文献とはさらに対照的に,(1) DP-Instahide 変種を用いて画像符号化の復号可能性を低減し,(2) ひねりとして,未訓練のStyleGAN が生成する,トレーニング不要の人工ラベル付き「自然ノイズ」を用いてローカルデータを補う。これらのノイズ画像は、自然のシーンに存在するパワースペクトルパターンを模倣し、Mixup画像とともに、クライアント間のラベルの分布を均質化するのに役立つ。ラベルが偏ったCIFAR-10およびMNISTのトレーニングにおいて,Mixupと自然ノイズによる少量の拡張が顕著な改善をもたらすことを実証した。

Clients in a distributed or federated environment will often hold data skewed towards differing subsets of labels. This scenario, referred to as heterogeneous or non-iid federated learning, has been shown to significantly hinder model training and performance. In this work, we explore the limits of a simple yet effective augmentation strategy for balancing skewed label distributions: filling in underrepresented samples of a particular label class using pseudo-images. While existing algorithms exclusively train on pseudo-images such as mixups of local training data, our augmented client datasets consist of both real and pseudo-images. In further contrast to other literature, we (1) use a DP-Instahide variant to reduce the decodability of our image encodings and (2) as a twist, supplement local data using artificially labeled, training-free 'natural noise' generated by an untrained StyleGAN. These noisy images mimic the power spectra patterns present in natural scenes which, together with mixup images, help homogenize label distribution among clients. We demonstrate that small amounts of augmentation via mixups and natural noise markedly improve label-skewed CIFAR-10 and MNIST training.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20
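
Mixup, the augmentation named in the abstract above, forms a pseudo-sample as a convex combination of two real samples and of their one-hot labels. The sketch below shows plain mixup only; the DP-Instahide variant and the StyleGAN noise images mentioned in the abstract are not reproduced, and the toy two-pixel "images", the Beta parameter, and the function names are assumptions for illustration.

```python
import random


def sample_lambda(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha), the usual mixup choice."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam on the first."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


rng = random.Random(0)
cat = ([0.0, 0.0], [1.0, 0.0])  # (pixels, one-hot label) for class 0
dog = ([1.0, 1.0], [0.0, 1.0])  # (pixels, one-hot label) for class 1

# A client short on class-1 data could favour that class with a small weight on cat.
pseudo_x, pseudo_y = mixup(*cat, *dog, lam=0.25)
random_weight = sample_lambda(0.4, rng)
```

Because the soft label carries the same weights as the pixels, adding such pseudo-samples shifts a client's effective label distribution toward the underrepresented class.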

# ひずみ局在を急激な不連続性としてモデル化するDeep Ritz法の能力を探る

Exploring the ability of the Deep Ritz Method to model strain localization as a sharp discontinuity ( http://arxiv.org/abs/2409.13241v1 )

ライセンス: Link先を確認

Omar León, Víctor Rivera, Angel Vázquez-Patiño, Jacinto Ulloa, Esteban Samaniego,

(参考訳) 本研究では, 変位場における急激な不連続性として固体中のひずみ局在をモデル化するためのDeep Ritz Method (DRM) の可能性について探索的検討を行った。このために、弾塑性固体の変分設定において、正則化された強不連続キネマティクスを用いる。対応する数学的モデルは、人工ニューラルネットワーク(ANN)を用いて離散化される。アーキテクチャはキネマティクスを処理し、境界値問題の変分的定式化は損失関数によって処理される。このアプローチの背景にある主な考え方は、ANNのトレーニング可能なパラメータを用いて、平衡問題と局所化帯域の位置の両方を解決することである。概念実証として,1次元および2次元の数値例を通じて,DRM の枠組み内での弾塑性固体のひずみ局在の計算モデリングが実現可能であることを示す。

We present an exploratory study of the possibilities of the Deep Ritz Method (DRM) for the modeling of strain localization in solids as a sharp discontinuity in the displacement field. For this, we use a regularized strong discontinuity kinematics within a variational setting for elastoplastic solids. The corresponding mathematical model is discretized using Artificial Neural Networks (ANNs). The architecture takes care of the kinematics, while the variational statement of the boundary value problem is taken care of by the loss function. The main idea behind this approach is to solve both the equilibrium problem and the location of the localization band by means of trainable parameters in the ANN. As a proof of concept, we show through both 1D and 2D numerical examples that the computational modeling of strain localization for elastoplastic solids within the framework of DRM is feasible.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# 単一画像からの閉塞除去のためのディープジェネレーティブ・アドバイサル・ネットワーク

Deep Generative Adversarial Network for Occlusion Removal from a Single Image ( http://arxiv.org/abs/2409.13242v1 )

ライセンス: Link先を確認

Sankaraganesh Jonna, Moushumi Medhi, Rajiv Ranjan Sahay,

(参考訳) 今日では、安価な撮像デバイスの能力が強化され、インターネット上でのマルチメディアコンテンツの獲得と共有が大幅に増加しています。画像センサー技術の進歩にもかかわらず、閉塞(occlusions)のような厄介な条件は写真撮影を妨げ、監視、検出、認識などのアプリケーションの性能を低下させる可能性がある。閉塞セグメンテーションは、スケールのばらつきや照明の変化などにより困難である。同様に、前景の閉塞からシーンを復元することは、遮蔽された領域を正確に推定し、周囲のコンテキストとの整合性を維持するという複雑さのために、重大な課題を引き起こす。特に、画像のデフェンシングは、形状、テクスチャ、色、パターン、そしてしばしば散らかった環境の様々なバリエーションのために、独自の課題を提示している。本研究では,単一画像からの閉塞の自動検出と除去に焦点を当てた。本稿では,フェンスのセグメンテーションと閉塞補完のための完全自動2段階畳み込みニューラルネットワークを提案する。我々は、GANを利用して、構造とテクスチャの両方を含む現実的なコンテンツを、インペイントのための単一ショットで合成する。ゼロショットの一般化を評価するため,提案したフェンス状閉塞セグメンテーションデータセットを用いて,訓練された閉塞検出モデルを評価した。データセットはGitHubにある。

Nowadays, the enhanced capabilities of inexpensive imaging devices have led to a tremendous increase in the acquisition and sharing of multimedia content over the Internet. Despite advances in imaging sensor technology, annoying conditions like occlusions hamper photography and may deteriorate the performance of applications such as surveillance, detection, and recognition. Occlusion segmentation is difficult because of scale variations, illumination changes, and so on. Similarly, recovering a scene from foreground occlusions also poses significant challenges due to the complexity of accurately estimating the occluded regions and maintaining coherence with the surrounding context. In particular, image de-fencing presents its own set of challenges because of the diverse variations in shape, texture, color, patterns, and the often cluttered environment. This study focuses on the automatic detection and removal of occlusions from a single image. We propose a fully automatic, two-stage convolutional neural network for fence segmentation and occlusion completion. We leverage generative adversarial networks (GANs) to synthesize realistic content, including both structure and texture, in a single shot for inpainting. To assess zero-shot generalization, we evaluated our trained occlusion detection model on our proposed fence-like occlusion segmentation dataset. The dataset can be found on GitHub.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# 認知から予知へ:ソーシャルナビゲーションのための未来認識フレームワーク

From Cognition to Precognition: A Future-Aware Framework for Social Navigation ( http://arxiv.org/abs/2409.13244v1 )

ライセンス: Link先を確認

Zeying Gong, Tianshuai Hu, Ronghe Qiu, Junwei Liang,

(参考訳) 混み合った空間で安全に効率的に移動するためには、ロボットは環境の現在の状態を認識できるだけでなく、将来の人間の動きも予測すべきである。本稿では,人間の軌道を明示的に予測し,将来の人間の進路を阻害する罰則を課すことにより,社会的に認識されたナビゲーションに取り組むための強化学習アーキテクチャであるFalconを提案する。現実的な評価を容易にするために,Social-HM3DとSocial-MP3Dの2つの新しいデータセットを含むSocialNavベンチマークを導入する。このベンチマークでは、自然の人間の動きと軌道パターンを取り入れた、シーン面積の大きさに基づいて、適切な量の人間のエージェントが集まっている大規模な写真リアリスティック屋内シーンを提供する。新しいベンチマークでは,最先端の学習手法と古典的なルールベースの経路計画アルゴリズムを用いて,詳細な実験分析を行う。その結果、今後の予測の重要性が示され、我々の手法は、約90%の個人空間コンプライアンスを維持しつつ、55%のタスク成功率を達成することができた。コードとデータセットをリリースします。デモのビデオはhttps://zeying-gong.github.io/projects/falcon/ で見ることができる。

To navigate safely and efficiently in crowded spaces, robots should not only perceive the current state of the environment but also anticipate future human movements. In this paper, we propose a reinforcement learning architecture, namely Falcon, to tackle socially-aware navigation by explicitly predicting human trajectories and penalizing actions that block future human paths. To facilitate realistic evaluation, we introduce a novel SocialNav benchmark containing two new datasets, Social-HM3D and Social-MP3D. This benchmark offers large-scale photo-realistic indoor scenes populated with a reasonable number of human agents based on scene area size, incorporating natural human movements and trajectory patterns. We conduct a detailed experimental analysis with the state-of-the-art learning-based method and two classic rule-based path-planning algorithms on the new benchmark. The results demonstrate the importance of future prediction, and our method achieves the best task success rate of 55% while maintaining about 90% personal space compliance. We will release our code and datasets. Videos of demonstrations can be viewed at https://zeying-gong.github.io/projects/falcon/ .

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# マルチタスク学習によるクロススキャナ腺癌分画の改善

Understanding Stain Separation Improves Cross-Scanner Adenocarcinoma Segmentation with Joint Multi-Task Learning ( http://arxiv.org/abs/2409.13246v1 )

ライセンス: Link先を確認

Ho Heon Kim, Won Chan Jeong, Young Shin Ko, Young Jin Park,

(参考訳) デジタル病理学は、腫瘍の診断とセグメンテーションに大きな進歩をもたらしたが、臓器、組織の準備、取得(ドメインシフトとして知られる)の違いによる画像の多様性は、現在のアルゴリズムの有効性を制限している。 COSAS(Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation)は、セグメンテーションアルゴリズムのドメインシフトに対するレジリエンスを改善することでこの問題に対処する。提案手法では,マルチデコーダオートエンコーダを用いたマルチタスク学習フレームワーク内での汚れ分離による教師なし学習を採用する。このモデルは、染色マトリクスと染色密度を分離し、色の変化を処理し、スキャナー間の一般化を改善する。さらに,ステン強化技術の混合によりモデルの堅牢性を高め,セグメンテーションにU-netアーキテクチャを使用した。本手法の新規性はマルチタスク学習フレームワーク内での染色分離の利用であり,色の変化から組織構造を効果的に切り離すことができる。このアプローチは、異なる病理組織染色のセグメンテーション精度と一般化を改善し、デジタル病理学におけるより信頼性の高い診断ツールの道を開くことを約束する。

Digital pathology has made significant advances in tumor diagnosis and segmentation, but image variability due to differences in organs, tissue preparation, and acquisition - known as domain shift - limits the effectiveness of current algorithms. The COSAS (Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation) challenge addresses this issue by improving the resilience of segmentation algorithms to domain shift, with Task 2 focusing on adenocarcinoma segmentation using a diverse dataset from six scanners, pushing the boundaries of clinical diagnostics. Our approach employs unsupervised learning through stain separation within a multi-task learning framework using a multi-decoder autoencoder. This model isolates stain matrix and stain density, allowing it to handle color variation and improve generalization across scanners. We further enhanced the robustness of the model with a mixture of stain augmentation techniques and used a U-net architecture for segmentation. The novelty of our method lies in the use of stain separation within a multi-task learning framework, which effectively disentangles histological structures from color variations. This approach shows promise for improving segmentation accuracy and generalization across different histopathological stains, paving the way for more reliable diagnostic tools in digital pathology.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# T2M-X:部分的注釈付きデータから表現型テキスト対運動生成を学習する

T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data ( http://arxiv.org/abs/2409.13251v1 )

ライセンス: Link先を確認

Mingdian Liu, Yilin Liu, Gurunandan Krishnan, Karl S Bayer, Bing Zhou,

(参考訳) テキストプロンプトからヒューマノイドアニメーションを生成することは、アニメーション制作とAR/VR体験に大きな影響を与える。しかし,既存手法では表情や手の動きを除いた身体の動きデータしか生成できない。この制限は、主に全身のモーションデータセットが欠如しているため、プロダクション使用の準備が困難である。このようなデータセットを作成しようとする最近の試みは、人工的に強化されたデータにおける異なる身体部分間の運動の不整合、またはRGBビデオから抽出されたデータ品質の低下をもたらす。本研究では,部分注釈付きデータから表現力のあるテキスト・ツー・モーション生成を学習する2段階のT2M-Xを提案する。 T2M-Xは、高品質なモーション出力を保証するために、体、手、顔用の3つの別個のベクトル量子変分オートエンコーダ(VQ-VAEs)を訓練する。本研究は,データセットの制約に対するロバスト性を示すとともに,定量的および定性的にベースラインを大幅に改善したことを示す。

The generation of humanoid animation from text prompts can profoundly impact animation production and AR/VR experiences. However, existing methods only generate body motion data, excluding facial expressions and hand movements. This limitation, primarily due to a lack of a comprehensive whole-body motion dataset, inhibits their readiness for production use. Recent attempts to create such a dataset have resulted in either motion inconsistency among different body parts in the artificially augmented data or lower quality in the data extracted from RGB videos. In this work, we propose T2M-X, a two-stage method that learns expressive text-to-motion generation from partially annotated data. T2M-X trains three separate Vector Quantized Variational AutoEncoders (VQ-VAEs) for body, hand, and face on respective high-quality data sources to ensure high-quality motion outputs, and a Multi-indexing Generative Pretrained Transformer (GPT) model with motion consistency loss for motion generation and coordination among different body parts. Our results show significant improvements over the baselines both quantitatively and qualitatively, demonstrating its robustness against the dataset limitations.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# 知識グラフとLLMの活用による立法システムのサポートと監視

Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems ( http://arxiv.org/abs/2409.13252v1 )

ライセンス: Link先を確認

Andrea Colombo,

(参考訳) 知識グラフ(KG)は、大規模データセットを構造化された相互接続された情報に整理し、さまざまな分野にわたるデータ分析を強化するために使用されている。立法の文脈において、KGsの潜在的な自然な応用の1つは、法律とそれらの記事とより広範な立法の文脈を結びつける複雑な相互接続のセットをモデル化することである。同時に、GPTのような大規模言語モデル(LLM)の台頭は、テキスト生成や文書起草といった法的な応用に新たな機会をもたらしている。彼らの可能性にもかかわらず、法的な文脈におけるLSMの使用は、新しい法律が毎日発行されるため、幻覚の欠如と最新の情報への依存を必要とするため、非常に重要である。本研究は、立法プロセスの相乗効果と支援について、立法知識グラフとLLMを用いて検討する。我々は、立法制度にKGを使うことの利点、LLMが正確なアウトプットを保証することによって立法活動をどのように支援できるか、そして、非技術系ユーザーがそのような技術を彼らの活動に利用できるようにする方法についての3つの主要な疑問に対処する。この目的のために,我々は,立法分析の実施可能性を高めることを目的とした,イタリアの立法に焦点を当てた対話型プラットフォームであるLegis AI Platformを開発した。

Knowledge Graphs (KGs) have been used to organize large datasets into structured, interconnected information, enhancing data analytics across various fields. In the legislative context, one potential natural application of KGs is modeling the intricate set of interconnections that link laws and their articles with each other and the broader legislative context. At the same time, the rise of large language models (LLMs) such as GPT has opened new opportunities in legal applications, such as text generation and document drafting. Despite their potential, the use of LLMs in legislative contexts is critical since it requires the absence of hallucinations and reliance on up-to-date information, as new laws are published on a daily basis. This work investigates how Legislative Knowledge Graphs and LLMs can synergize and support legislative processes. We address three key questions: the benefits of using KGs for legislative systems, how LLMs can support legislative activities by ensuring an accurate output, and how we can allow non-technical users to use such technologies in their activities. To this aim, we develop Legis AI Platform, an interactive platform focused on Italian legislation that enhances the possibility of conducting legislative analysis and that aims to support lawmaking activities.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# Informative Graph Neural Network を用いたデータドリフトにおける時空間インダクティブ予測

Inductive Spatial Temporal Prediction Under Data Drift with Informative Graph Neural Network ( http://arxiv.org/abs/2409.13253v1 )

ライセンス: Link先を確認

Jialun Zheng, Divya Saxena, Jiannong Cao, Hanchen Yang, Penghui Ruan,

(参考訳) 帰納的時空間予測は、非常にダイナミックなシナリオ(例えば、交通システム、株式市場)に不可欠な、目に見えないデータを予測するために歴史的データを一般化することができる。しかし、外部イベント(都市構造の成長、市場崩壊など)や新たなエンティティ(ロケーション、株式など)は、時間の経過とともにデータドリフトを誘導することで予測精度を損なう可能性がある。既存の研究では、データドリフトに対抗するために不変パターンを抽出するが、パターンの多様性は無視する。この問題に対処するため,多変量パターンを抽出し,データドリフト時の予測精度を向上させるためのインフォーマティブグラフニューラルネットワーク(INF-GNN)を設計した。まず,一意に設計された指標であるRelation Importance (RI) を用いて,安定な実体と異なる空間関係を効果的に選択できる情報サブグラフを構築する。このサブグラフは、近隣のマージを通じて新しいエンティティのデータをさらに一般化する。次に,時間間隔内の影響関数を用いて抽出した貴重なタイムスタンプを強調するための情報的時間記憶バッファを提案する。このメモリバッファは、INF-GNNが影響力のある時間パターンを識別することを可能にする。最後に、RI損失の最適化はパターンの整合性のために設計されている。大規模なデータドリフト下の実世界のデータセットに関する大規模な実験は、INF-GNNが既存の選択肢よりも大幅に優れていることを示した。

Inductive spatial temporal prediction can generalize historical data to predict unseen data, crucial for highly dynamic scenarios (e.g., traffic systems, stock markets). However, external events (e.g., urban structural growth, market crash) and emerging new entities (e.g., locations, stocks) can undermine prediction accuracy by inducing data drift over time. Most existing studies extract invariant patterns to counter data drift but ignore pattern diversity, exhibiting poor generalization to unseen entities. To address this issue, we design an Informative Graph Neural Network (INF-GNN) to distill diversified invariant patterns and improve prediction accuracy under data drift. Firstly, we build an informative subgraph with a uniquely designed metric, Relation Importance (RI), that can effectively select stable entities and distinct spatial relationships. This subgraph further generalizes new entities' data via neighbor merging. Secondly, we propose an informative temporal memory buffer to help the model emphasize valuable timestamps extracted using influence functions within time intervals. This memory buffer allows INF-GNN to discern influential temporal patterns. Finally, RI loss optimization is designed for pattern consolidation. Extensive experiments on real-world data under substantial data drift demonstrate that INF-GNN significantly outperforms existing alternatives.

翻訳日:2024-11-07 11:18:04 公開日:2024-09-20

# 神経群形成による創発的集団再生

Emergent Collective Reproduction via Evolving Neuronal Flocks ( http://arxiv.org/abs/2409.13254v1 )

ライセンス: Link先を確認

Nam H. Le, Richard Watson, Mike Levin, Chrys Buckley,

(参考訳) この研究は、複雑な生殖集団の出現をシミュレートするために、複雑に自己組織化と自然選択を融合させる新しい人工生命の枠組みであるVitaNovaを通じて、個人性(ETI)の進化的遷移の理解を促進する。捕食者と空間的制約によってそれらに挑戦する環境の中で個々のエージェントを動的にモデル化することで、VitaNovaは単純なエージェントが集合的複製を示す凝集単位へと進化するメカニズムを解明する。この結果は, 自己組織的行動と適応的進化戦略の相乗効果を, ETIの基本的要因として示している。このアプローチは、高次の生物学的個性に対する深い理解に寄与するだけでなく、ETIの実証的研究、現在の理論的枠組みの挑戦、拡張における新たな先例となる。

This study facilitates the understanding of evolutionary transitions in individuality (ETIs) through a novel artificial life framework, named VitaNova, that intricately merges self-organization and natural selection to simulate the emergence of complex, reproductive groups. By dynamically modelling individual agents within an environment that challenges them with predators and spatial constraints, VitaNova elucidates the mechanisms by which simple agents evolve into cohesive units exhibiting collective reproduction. The findings underscore the synergy between self-organized behaviours and adaptive evolutionary strategies as fundamental drivers of ETIs. This approach not only contributes to a deeper understanding of higher-order biological individuality but also sets a new precedent in the empirical investigation of ETIs, challenging and extending current theoretical frameworks.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 1H遷移金属化合物のハイブリッド次トポロジカル相と遷移

Hybrid-Order Topological Phase And Transition in 1H Transition Metal Compounds ( http://arxiv.org/abs/2409.13258v1 )

ライセンス: Link先を確認

Ning-Jing Yang, Zhigao Huang, Jian-Min Zhang,

(参考訳) 近年のハイブリッドトポロジカル状態(Nature 628, 527 (2024))の実験的観測から着想を得て, 1H遷移金属化合物(TMC)中のハイブリッド-オーダートポロジカル絶縁体(HOTI)を予測し, フェルミ準位付近で2階と1階のトポロジカル状態が共存することを示した。当初、1H-TMCはd軌道のバンドギャップのために2階の位相位相を示す。 p-軌道とd-軌道がカップリングすると、一階の位相特性が現れる。このハイブリッド秩序トポロジカル相転移は結晶場効果によって調整可能である。第一原理計算と組み合わせて、WTe2とNbSe2の相転移を説明する。さらに、HOTIの1階のトポロジカルバンドギャップは、強いスピンホール効果を示す。我々の発見は、2次元電子材料における新しいハイブリッド秩序トポロジカル位相を明らかにし、スピントロニクスの応用を強調した。

Inspired by recent experimental observations of hybrid topological states [Nature 628, 527 (2024)], we predict hybrid-order topological insulators (HOTIs) in 1H transition metal compounds (TMCs), where both second-order and first-order topological states coexist near the Fermi level. Initially, 1H-TMCs exhibit a second-order topological phase due to the d-orbital band gap. Upon coupling of the p- and d-orbitals, first-order topological characteristics emerge. This hybrid-order topological phase transition is tunable via crystal field effects. Combined with first-principles calculations, we illustrate the phase transition with WTe2 and NbSe2. In addition, the first-order topological band gap of the HOTI exhibits a strong spin Hall effect. Our findings reveal a novel hybrid-order topological phase in 2D electron materials and highlight spintronic applications.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 深層学習を用いたゲノムスケール代謝ネットワークにおける欠失反応の解離のための一般化可能な枠組み

A generalizable framework for unlocking missing reactions in genome-scale metabolic networks using deep learning ( http://arxiv.org/abs/2409.13259v1 )

ライセンス: Link先を確認

Xiaoyi Liu, Hongpeng Yang, Chengwei Ai, Ruihan Dong, Yijie Ding, Qianqian Yuan, Jijun Tang, Fei Guo,

(参考訳) 代謝過程の不完全な知識は、GEnome-scale Metabolic Model (GEMs)の精度を妨げ、システム生物学や代謝工学の進歩を妨げる。既存のギャップ埋め法は、計算予測と実験結果の差を最小限に抑えるために、表現型データに依存するのが一般的である。しかし、実験データやアノテートされたゲノムが利用可能になる前に、初期状態のGEMに自動的かつ正確なギャップ埋め方法がない。本研究では,GEM内のハイパーエッジ予測問題としてモデル化することで,ギャップ埋めの問題に対処するディープラーニング駆動ツールであるCLOSEgapsを紹介する。具体的には、CLOSEgapsは代謝ネットワークをハイパーグラフとしてマッピングし、そのハイパートポロジーの特徴を学習し、仮説的な反応を利用して、欠落した反応とギャップを識別する。この革新的なアプローチは、代謝ネットワーク内の既知の反応と仮説的な反応の両方を特徴づけ、キュレーションすることができる。 CLOSEgaps は人工的に導入した GEM の 96% 以上のギャップを正確に埋めることを示した。さらに、CLOSEgapsは24個のGEMの表現型予測を強化し、2つの生物において4つの重要な代謝物(ラクタート、エタノール、プロピオネート、サクシネート)を生産する際の顕著な改善を見出した。あらゆる GEM に対して広く適用可能な解として、CLOSEgaps はギャップ埋めプロセスの自動化と、反応と観察された代謝表現型の間の欠如した関係を明らかにするための有望なモデルである。

Incomplete knowledge of metabolic processes hinders the accuracy of GEnome-scale Metabolic models (GEMs), which in turn impedes advancements in systems biology and metabolic engineering. Existing gap-filling methods typically rely on phenotypic data to minimize the disparity between computational predictions and experimental results. However, there is still a lack of an automatic and precise gap-filling method for initial-state GEMs before experimental data and annotated genomes become available. In this study, we introduce CLOSEgaps, a deep learning-driven tool that addresses the gap-filling issue by modeling it as a hyperedge prediction problem within GEMs. Specifically, CLOSEgaps maps metabolic networks as hypergraphs and learns their hyper-topology features to identify missing reactions and gaps by leveraging hypothetical reactions. This innovative approach allows for the characterization and curation of both known and hypothetical reactions within metabolic networks. Extensive results demonstrate that CLOSEgaps accurately fills over 96% of artificially introduced gaps for various GEMs. Furthermore, CLOSEgaps enhances phenotypic predictions for 24 GEMs and also finds a notable improvement in producing four crucial metabolites (Lactate, Ethanol, Propionate, and Succinate) in two organisms. As a broadly applicable solution for any GEM, CLOSEgaps represents a promising model to automate the gap-filling process and uncover missing connections between reactions and observed metabolic phenotypes.
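
The hypergraph view described above treats each reaction as a hyperedge joining all metabolites it involves. The following sketch only illustrates that representation, under the assumption of an unsigned incidence matrix. It is not the CLOSEgaps model: a real stoichiometric matrix carries signed coefficients, and the reaction and metabolite names here are hypothetical.

```python
def incidence_matrix(reactions):
    """Build a metabolite-by-reaction 0/1 incidence matrix.

    `reactions` maps a reaction name to the set of metabolites it involves;
    each reaction is one hyperedge of the metabolic hypergraph.
    """
    metabolites = sorted({m for mets in reactions.values() for m in mets})
    names = list(reactions)
    matrix = [[1 if m in reactions[r] else 0 for r in names] for m in metabolites]
    return metabolites, names, matrix


# Two toy reactions sharing one metabolite (illustrative identifiers only).
toy = {"R1": {"glc", "g6p"}, "R2": {"g6p", "f6p"}}
metabolites, names, matrix = incidence_matrix(toy)
```

Under this framing, gap-filling amounts to scoring candidate columns (hypothetical reactions) and deciding which ones belong in the matrix.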

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 中国におけるASR誤り訂正のための大言語モデルはPinyinを理解すべきである

Large Language Model Should Understand Pinyin for Chinese ASR Error Correction ( http://arxiv.org/abs/2409.13262v1 )

ライセンス: Link先を確認

Yuang Li, Xiaosong Qiao, Xiaofeng Zhao, Huan Zhao, Wei Tang, Min Zhang, Hao Yang,

(参考訳) 大規模言語モデルは、生成誤り訂正によって自動音声認識システムを強化することができる。本稿では,標準中国語(Mandarin)の音声表現であるPinyinを補助情報として利用して中国語のASR誤り訂正を改善するPinyin-enhanced GECを提案する。提案手法は, 合成誤差のみをトレーニングに用い, 推論時に最良仮説を用いる。さらに,Pinyinとテキスト間の変換タスクによる特徴空間の整合性を考慮したマルチタスク学習手法を提案する。 Aishell-1とCommon Voiceデータセットの実験は、我々のアプローチがテキストのみの入力でGECを一貫して上回っていることを示している。より重要なことは、PY-GECの有効性とマルチタスクトレーニングの2つの側面から、直感的な説明を提供することである。 1)Pinyin特徴に対する注意重みの増加,及び 2)Pinyinとテキスト隠蔽状態の整列した特徴空間。

Large language models can enhance automatic speech recognition systems through generative error correction. In this paper, we propose Pinyin-enhanced GEC, which leverages Pinyin, the phonetic representation of Mandarin Chinese, as supplementary information to improve Chinese ASR error correction. Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference. Additionally, we introduce a multitask training approach involving conversion tasks between Pinyin and text to align their feature spaces. Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input. More importantly, we provide intuitive explanations for the effectiveness of PY-GEC and multitask training from two aspects: 1) increased attention weight on Pinyin features; and 2) aligned feature space between Pinyin and text hidden states.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# ライフスパン認知システムに向けて

Towards LifeSpan Cognitive Systems ( http://arxiv.org/abs/2409.13265v1 )

ライセンス: Link先を確認

Yu Wang, Chi Han, Tongtong Wu, Xiaoxin He, Wangchunshu Zhou, Nafis Sadeq, Xiusi Chen, Zexue He, Wei Wang, Gholamreza Haffari, Heng Ji, Julian McAuley,

(参考訳) シミュレーションされたデジタル世界であれ、人間社会であれ、複雑な環境と継続的に対話する人間のようなシステムを構築することは、いくつかの重要な課題を提示している。これの中心は、相互作用を経験と呼ぶ連続して高周波の相互作用を可能にすることである。本稿では,このシステムをLifeSpan Cognitive System (LSCS)と呼ぶ。 LSCSの重要な特徴は、過去の経験を維持し、正確にリコールしながら、インクリメンタルで迅速な更新を行う機能である。本稿は,(1)抽象化と経験の融合,(2)正確なリコールによる長期維持という2つの大きな課題を特定する。これらの特性は、新しい経験を保存し、過去の経験を整理し、関連する歴史的データを活用する方法で環境に反応するために不可欠である。通常、微調整や特定のドメインやタスクのパフォーマンス向上に集中するために大きなコーパスに依存している継続学習を持つ言語モデルとは異なり、LSCSは環境からの新たな情報を高速かつ漸進的に更新する必要がある。上記の2つの課題を解決する可能性を持つ既存の技術は、過去の経験を保存するのに必要な相対空間を測定する概念的尺度であるストレージ複雑度(Storage Complexity)に基づいて、4つのクラスに分類される。これら4つの技術のそれぞれには、それぞれ独自の強みと限界がある。既存の技術がLSCSのみを達成できないことを考えると、LSCSには4種類の技術を統合する新しいパラダイムが提案されている。新パラダイムは,2つのコアプロセス – 吸収エクスペリエンスと生成応答 – を通じて運用される。

Building a human-like system that continuously interacts with complex environments -- whether simulated digital worlds or human society -- presents several key challenges. Central to this is enabling continuous, high-frequency interactions, where the interactions are termed experiences. We refer to this envisioned system as the LifeSpan Cognitive System (LSCS). A critical feature of LSCS is its ability to engage in incremental and rapid updates while retaining and accurately recalling past experiences. We identify two major challenges in achieving this: (1) Abstraction and Experience Merging, and (2) Long-term Retention with Accurate Recall. These properties are essential for storing new experiences, organizing past experiences, and responding to the environment in ways that leverage relevant historical data. Unlike language models with continual learning, which typically rely on large corpora for fine-tuning and focus on improving performance within specific domains or tasks, LSCS must rapidly and incrementally update with new information from its environment at a high frequency. Existing technologies with the potential of solving the above two major challenges can be classified into four classes based on a conceptual metric called Storage Complexity, which measures the relative space required to store past experiences. Each of these four classes of technologies has its own strengths and limitations. Given that none of the existing technologies can achieve LSCS alone, we propose a novel paradigm for LSCS that integrates all four classes of technologies. The new paradigm operates through two core processes: Absorbing Experiences and Generating Responses.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# JoyHallo: マンダリンのデジタルヒューマンモデル

JoyHallo: Digital human model for Mandarin ( http://arxiv.org/abs/2409.13268v1 )

ライセンス: Link先を確認

Sheng Shi, Xuyang Cao, Jun Zhao, Guoxin Wang,

(参考訳) 音声によるビデオ生成では、マンダリンのビデオを作成することが大きな課題である。包括的なマンダリンデータセットの収集は困難であり、マンダリンの複雑な唇の動きは、英語と比較してモデルトレーニングをさらに複雑にしている。本研究では、JD Health International Inc.の従業員から29時間のマンダリン音声ビデオを収集し、その結果、jdh-Halloデータセットが得られた。このデータセットには、さまざまな年齢と話し方が含まれており、会話と専門の医療トピックの両方を含んでいる。マンダリンのJoyHalloモデルに適応するために、我々は中国語wav2vec2モデルをオーディオ機能埋め込みに使用した。唇, 表情, ポーズの特徴間の機能間関係を捉えるために, 半疎結合構造を提案する。この統合により情報利用効率が向上するだけでなく、推論速度も14.3%向上する。特に、JoyHalloは、英語のビデオを生成する強力な能力を維持しており、優れた言語間の生成能力を誇示している。コードとモデルはhttps://jdh-algo.github.io/JoyHalloで公開されている。

In audio-driven video generation, creating Mandarin videos presents significant challenges. Collecting comprehensive Mandarin datasets is difficult, and the complex lip movements in Mandarin further complicate model training compared to English. In this study, we collected 29 hours of Mandarin speech video from JD Health International Inc. employees, resulting in the jdh-Hallo dataset. This dataset includes a diverse range of ages and speaking styles, encompassing both conversational and specialized medical topics. To adapt the JoyHallo model for Mandarin, we employed the Chinese wav2vec2 model for audio feature embedding. A semi-decoupled structure is proposed to capture inter-feature relationships among lip, expression, and pose features. This integration not only improves information utilization efficiency but also accelerates inference speed by 14.3%. Notably, JoyHallo maintains its strong ability to generate English videos, demonstrating excellent cross-language generation capabilities. The code and models are available at https://jdh-algo.github.io/JoyHallo.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 経験的自由クラス増分学習のためのアダプティブ・マージングローバル分類器

Adaptive Margin Global Classifier for Exemplar-Free Class-Incremental Learning ( http://arxiv.org/abs/2409.13275v1 )

ライセンス: Link先を確認

Zhongren Yao, Xiaobin Chang,

(参考訳) EFCIL(Exemplar-free class-incremental Learning)は、新しいタスク学習に古いクラスサンプルが欠落しているため、大きな課題となる。古いクラスと新しいクラスの厳密な不均衡のため、学習された分類器は、新しいクラスに偏りやすい。さらに、EFCIL で機能抽出器を継続的に更新することは、例えば、古いクラスの特徴の識別能力を損なう可能性がある。既存の手法は主にバイアス付き分類器学習を扱うことに焦点を当てている。本研究では,提案手法を用いて両事例を考察する。具体的には,データ不均衡やサンプリングといった既存手法のバイアス要因を回避するために,まず分散ベースグローバル分類器(DBGC)を導入する。さらに重要なことに、古いクラスの妥協された分布は、単純な操作、分散拡大(VE)によってシミュレートされる。 VEをDBGCに組み込むと、EFCILの新たな分類が失われる。この損失は、Adaptive Margin Softmax Cross Entropy (AMarX)と等価である。そこで提案手法は,Adaptive Margin Global Classifier (AMGC) と呼ばれる。 AMGCは単純だが有効である。広範囲な実験により、AMGCは、難易度の高いEFCIL設定下で、画像分類結果に優れていることが示されている。詳細な分析も、さらなるデモのために提供されている。

Exemplar-free class-incremental learning (EFCIL) presents a significant challenge as the old class samples are absent for new task learning. Due to the severe imbalance between old and new class samples, the learned classifiers can be easily biased toward the new ones. Moreover, continually updating the feature extractor under EFCIL can compromise the discriminative power of old class features, e.g., leading to less compact and more overlapping distributions across classes. Existing methods mainly focus on handling biased classifier learning. In this work, both cases are considered using the proposed method. Specifically, we first introduce a Distribution-Based Global Classifier (DBGC) to avoid bias factors in existing methods, such as data imbalance and sampling. More importantly, the compromised distributions of old classes are simulated via a simple operation, variance enlarging (VE). Incorporating VE based on DBGC results in a novel classification loss for EFCIL. This loss is proven equivalent to an Adaptive Margin Softmax Cross Entropy (AMarX). The proposed method is thus called Adaptive Margin Global Classifier (AMGC). AMGC is simple yet effective. Extensive experiments show that AMGC achieves superior image classification results on its own under a challenging EFCIL setting. Detailed analysis is also provided for further demonstration.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# ランダムサンプリングによるディープ・ニューラル・オペレーター・ネットワークの効率的な学習

Efficient Training of Deep Neural Operator Networks via Randomized Sampling ( http://arxiv.org/abs/2409.13280v1 )

ライセンス: Link先を確認

Sharmila Karumuri, Lori Graham-Brady, Somdatta Goswami,

(参考訳) ニューラル演算子(NOs)は、無限次元関数空間間の写像を学習するためにディープニューラルネットワークを使用する。一般的なNOアーキテクチャであるDeep operator Network (DeepONet)は、様々な科学・工学的応用における複雑な力学のリアルタイム予測に成功している。本稿では,DeepONetのトレーニング中に採用するランダムサンプリング手法を提案する。提案手法は,物理系が定義されている有界領域の時空間位置に対応する基底関数を出力するDeepONetモデルのトランクネットワークを対象としている。伝統的に、損失関数を構築しながら、DeepONetトレーニングは、全ての出力関数がイテレーション毎に評価される時空間点の均一なグリッドを考える。このアプローチは、確率勾配降下(SGD)オプティマイザの制限により、バッチサイズが大きくなり、一般化が貧弱になり、メモリ要求が増大する。トランクネットの入力に対するランダムサンプリングは、これらの課題を軽減し、一般化を改善し、トレーニング中のメモリ要求を低減し、計算能力が大幅に向上する。 3つのベンチマーク例を通じて仮説を検証し、従来のトレーニングアプローチと比較して、全体的なテストエラーを同等または低いものにしながら、トレーニング時間の大幅な削減を実証した。実験の結果,訓練中にトランクネットワーク入力にランダム化を組み込むことで,DeepONetの効率性と堅牢性が向上し,複雑な物理系のモデリングにおけるフレームワークの性能向上に期待できる道筋が得られた。

Neural operators (NOs) employ deep neural networks to learn mappings between infinite-dimensional function spaces. Deep operator network (DeepONet), a popular NO architecture, has demonstrated success in the real-time prediction of complex dynamics across various scientific and engineering applications. In this work, we introduce a random sampling technique to be adopted during the training of DeepONet, aimed at improving the generalization ability of the model, while significantly reducing the computational time. The proposed approach targets the trunk network of the DeepONet model that outputs the basis functions corresponding to the spatiotemporal locations of the bounded domain on which the physical system is defined. Traditionally, while constructing the loss function, DeepONet training considers a uniform grid of spatiotemporal points at which all the output functions are evaluated for each iteration. This approach leads to a larger batch size, resulting in poor generalization and increased memory demands, due to the limitations of the stochastic gradient descent (SGD) optimizer. The proposed random sampling over the inputs of the trunk net mitigates these challenges, improving generalization and reducing memory requirements during training, resulting in significant computational gains. We validate our hypothesis through three benchmark examples, demonstrating substantial reductions in training time while achieving comparable or lower overall test errors relative to the traditional training approach. Our results indicate that incorporating randomization in the trunk network inputs during training enhances the efficiency and robustness of DeepONet, offering a promising avenue for improving the framework's performance in modeling complex physical systems.
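
The randomized trunk sampling described above can be sketched without any deep learning framework. The code below is a minimal illustration under assumptions: the DeepONet prediction is taken to be the dot product of branch coefficients and trunk basis values, the loss is a mean squared error, and the names `deeponet_output` and `sampled_loss` are hypothetical. It shows only the sampling step, not the authors' training setup.

```python
import random


def deeponet_output(branch_coeffs, trunk_basis):
    """DeepONet-style prediction: dot product of branch and trunk outputs."""
    return sum(b * t for b, t in zip(branch_coeffs, trunk_basis))


def sampled_loss(branch_coeffs, trunk_fn, targets, points, n_samples, rng):
    """Mean squared error over a random subset of query points.

    Instead of evaluating the trunk on the full grid `points`, only
    `n_samples` locations are drawn (without replacement) per iteration.
    """
    idx = rng.sample(range(len(points)), n_samples)
    err = 0.0
    for i in idx:
        pred = deeponet_output(branch_coeffs, trunk_fn(points[i]))
        err += (pred - targets[i]) ** 2
    return err / n_samples, idx


# Toy setup: the "trunk" returns the basis [1, y]; the target is 2 + 3y.
grid = [i / 10 for i in range(11)]
targets = [2.0 + 3.0 * y for y in grid]
loss, used = sampled_loss([2.0, 3.0], lambda y: [1.0, y], targets, grid, 4, random.Random(0))
```

Each call touches 4 of the 11 grid points, so the per-iteration cost and memory scale with the sample size rather than the grid size, which is the effect the abstract attributes to the method.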

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 純外部予測のための時間分散深層学習モデル -気象画像時系列を用いた水表深予測への適用-

Time Distributed Deep Learning models for Purely Exogenous Forecasting. Application to Water Table Depth Prediction using Weather Image Time Series ( http://arxiv.org/abs/2409.13284v1 )

ライセンス: Link先を確認

Matteo Salis, Abdourrahmane M. Atto, Stefano Ferraris, Rosa Meo,

(参考訳) 地下水資源は水循環において最も重要な要素の1つであるため、それらを正確に予測するモデルを開発することは、持続可能な資源管理フレームワークにおいて重要な課題である。深層学習(DL)モデルは、特に空間分布データ(例えばラスタデータ)を供給することによって、水文学において非常に効果的であることが明らかにされている。多くの地域では、水文学的な測定は定期的に、または定期的に取得することは困難であり、場合によっては、最後に利用可能なデータは最新のものではない。逆に、水資源に大きな影響を及ぼす気象データは、通常より利用でき、高品質である。具体的には,Grana-Maira漁獲量(Piemonte, IT)の表層深度を,外因性気象画像時系列のみを用いて予測する2つの異なるDLモデルを提案する。画像時系列を扱うために、どちらのモデルも最初のTime Distributed Convolutional Neural Network (TDC) で構成され、各ステップで利用可能な画像をベクトル表現にエンコードする。最初のモデルであるTDC-LSTMは、LSTM層に基づくシークエンシャルモジュールを使用して、時間的関係を学習し、予測を出力する。第2のモデルであるTDC-UnPWaveNetは、代わりにWaveNetアーキテクチャの新バージョンを使用しており、ここでは、入力されたものに関して、シーケンスを短く、完全にシフトさせるように適応している。この目的と、UnPWaveNetの異なるシーケンス長を扱うために、タイム分散層のように振る舞う新しいチャネル分散層を設計しました。 TDC-LSTMとTDC-UnPWaveNetはどちらも顕著な結果を示した。 TDC-LSTMはバイアスの低減に重点を置いており、TDC-UnPWaveNetは相関の最大化とKGEに重点を置いている。

Groundwater resources are one of the most relevant elements in the water cycle, therefore developing models to accurately predict them is a pivotal task in the sustainable resources management framework. Deep Learning (DL) models have proven very effective in hydrology, especially when fed spatially distributed data (e.g. raster data). In many regions, hydrological measurements are difficult to obtain regularly or periodically in time, and in some cases, the latest available data are not up to date. Conversely, weather data, which significantly impact water resources, are usually more readily available and of higher quality. More specifically, we have proposed two different DL models to predict the water table depth in the Grana-Maira catchment (Piemonte, IT) using only exogenous weather image time series. To deal with the image time series, both models are made of a first Time Distributed Convolutional Neural Network (TDC) which encodes the image available at each time step into a vectorial representation. The first model, TDC-LSTM, then uses a Sequential Module based on an LSTM layer to learn temporal relations and output the predictions. The second model, TDC-UnPWaveNet, instead uses a new version of the WaveNet architecture, adapted here to output a sequence shorter and completely shifted in the future with respect to the input one. To this aim, and to deal with the different sequence lengths in the UnPWaveNet, we have designed a new Channel Distributed layer, that acts like a Time Distributed one but on the channel dimension, i.e. applying the same set of operations to each channel of the input. TDC-LSTM and TDC-UnPWaveNet have both shown remarkable results. However, the two models have focused on different learnable information: TDC-LSTM has focused more on lowering the bias, while the TDC-UnPWaveNet has focused more on the temporal dynamics maximising correlation and KGE.
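
The Time Distributed and Channel Distributed ideas above both amount to applying one shared operation along a chosen axis. The sketch below is a hedged illustration with plain lists: the toy `encode_frame` stands in for the convolutional encoder, and `time_distributed` and `channel_distributed` are hypothetical helper names, not the authors' layers.

```python
def time_distributed(encoder, sequence):
    """Apply the same encoder to the frame at every time step."""
    return [encoder(frame) for frame in sequence]


def channel_distributed(op, series):
    """Apply the same operation to each channel of a [time][channel] series."""
    n_channels = len(series[0])
    return [op([step[c] for step in series]) for c in range(n_channels)]


def encode_frame(frame):
    """Toy stand-in for a CNN encoder: reduce a 2D image to [mean, max]."""
    pixels = [p for row in frame for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


# Three identical toy 2x2 "weather images" encoded step by step.
frames = [[[1.0, 3.0], [5.0, 7.0]]] * 3
encoded = time_distributed(encode_frame, frames)

# The same operation (last value minus first) applied per channel.
deltas = channel_distributed(lambda s: s[-1] - s[0], [[1, 10], [2, 20], [3, 30]])
```

Because the operation is shared, the number of parameters in a learned version would not grow with the sequence length or the channel count.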

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 点雲対応のための自己注意重みとしての局所ガウス

Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence ( http://arxiv.org/abs/2409.13291v1 )

ライセンス: Link先を確認

Alessandro Riva, Alessandro Raganato, Simone Melzi,

(参考訳) ポイントクラウドマッチングのための現在のデータ駆動手法は、広範囲なトレーニング時間と計算資源を必要とし、モデルデプロイメントとアプリケーションにとって重要な課題を提示している。点雲マッチングタスクにおいて、エンコーダのみのトランスフォーマーアーキテクチャによる最近の進歩は、特に入力形状の各点を中心とするガウス関数に類似した、注意頭における意味論的意味のあるパターンの出現を明らかにしている。本研究では,これらのパターンを,トランスフォーマーアーキテクチャのアテンションヘッドに固定されたアテンション重みとして組み込むことにより,この現象をさらに解明する。本稿では,ガウシアンに対して所定の分散値を利用する方法と,学習可能なパラメータとして分散値を扱う方法の2つを評価する。さらに、ノイズデータの性能を分析し、ノイズに対する堅牢性を改善するための可能性を探る。その結果,注意重みの修正はトレーニングプロセスの促進だけでなく,最適化の安定性の向上にも寄与することがわかった。さらに,注入した情報が最も影響のある特定の層を同定し,その情報に対するネットワークの依存度を理解するためのアブレーション実験を行った。

Current data-driven methodologies for point cloud matching demand extensive training time and computational resources, presenting significant challenges for model deployment and application. In the point cloud matching task, recent advancements with an encoder-only Transformer architecture have revealed the emergence of semantically meaningful patterns in the attention heads, particularly resembling Gaussian functions centered on each point of the input shape. In this work, we further investigate this phenomenon by integrating these patterns as fixed attention weights within the attention heads of the Transformer architecture. We evaluate two variants: one utilizing predetermined variance values for the Gaussians, and another where the variance values are treated as learnable parameters. Additionally, we analyze performance on noisy data and explore a possible way to improve robustness to noise. Our findings demonstrate that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization. Furthermore, we conducted an ablation study to identify the specific layers where the infused information is most impactful and to understand the reliance of the network on this information.
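A minimal sketch of what "Gaussians centered on each point as fixed attention weights" could look like, assuming Euclidean distances between input points and a single shared standard deviation `sigma`. The function names and the row normalization are assumptions made for illustration; the abstract does not specify the exact formulation.

```python
import math


def gaussian_attention(points, sigma):
    """Row-normalized Gaussian weights centered on each point.

    points: list of coordinate tuples; sigma: standard deviation
    (a predetermined value here; it could also be a learned parameter).
    Returns an n x n matrix whose row i sums to 1.
    """
    weights = []
    for p in points:
        row = [math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def apply_attention(weights, values):
    """Mix per-point feature vectors using the fixed attention weights."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]
```

Because the weights depend only on the geometry of the input, they need no query or key projections, which is one plausible reading of why fixing them could shorten training.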

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# Towards Nudging in BPM: A Human-Centric Approach for Sustainable Business Processes

Towards Nudging in BPM: A Human-Centric Approach for Sustainable Business Processes ( http://arxiv.org/abs/2409.13295v1 )

ライセンス: Link先を確認

Cielo Gonzalez Moyano, Finn Klessascheck, Saimir Bala, Stephan A. Fahrenkrog-Petersen, Jan Mendling,

(参考訳) ビジネスプロセス管理(BPM)は、主に技術的なソリューションを見つけることに焦点を当てています。ナッジ(英: Nudging)は、心理学と行動経済学のアプローチであり、人々の行動を導く。本稿では,BPMライフサイクルの異なるフェーズにヌードを組み込む方法について述べる。さらに、より持続可能なビジネスプロセスのための代替戦略として、ヌードがどうあるべきかを概説する。我々は,nudgingの統合がプロセスマイニングやビジネスプロセス管理において,より人間中心となる重要な機会を提供することを示す。ナッジの採用に伴う課題についても論じる。

Business Process Management (BPM) is mostly centered around finding technical solutions. Nudging is an approach from psychology and behavioral economics to guide people's behavior. In this paper, we show how nudging can be integrated into the different phases of the BPM lifecycle. Further, we outline how nudging can be an alternative strategy for more sustainable business processes. We show how the integration of nudging offers significant opportunities for process mining and business process management in general to be more human-centric. We also discuss challenges that come with the adoption of nudging.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment

OMG-RL:Offline Model-based Guided Reward Learning for Heparin Treatment ( http://arxiv.org/abs/2409.13299v1 )

ライセンス: Link先を確認

Yooseok Lim, Sujee Lee,

(参考訳) 個別の患者状況の正確な診断と適切な服薬戦略は、パーソナライズされた医療意思決定プロセスの中核的な要素である。患者の状態を再帰的に評価し、適切な薬剤を投与する治療処置を、強化学習(RL)問題として効果的にモデル化することができる。重要なことに、この文脈におけるRLの成功は、最適な治療戦略を正確に表現する、明確に定義された報酬関数の確立に依存している。しかし、RLにおける学習方向を明示的な指標の限られたセットで定義することは、必要なドメイン知識の本質的な複雑さのためにタスクを複雑にする。このアプローチはまた、RLポリシーが臨床医の治療意図を適切に反映していない可能性を高め、様々な状況や指標を考慮することで決定される。本研究では,臨床医の意図を反映した報酬関数の開発に焦点をあて,オフラインRL環境に沿ったオフライン逆強化学習(IRL)を行うオフラインモデルに基づくガイド・リワード学習(OMG-RL)を導入する。 OMG-RLを通じて、限られたデータから専門家の意図を含むパラメータ化された報酬関数を学習し、エージェントのポリシーを強化する。ヘパリン投与課題に対する提案手法の検証を行った。その結果、OMG-RLによる政策学習は有意義であり、ヘパリンの効果をモニタリングするための重要な指標である活性化部分トロンボプラスチン時間(aPTT)において、学習方針が正に強化されていることが確認された。このアプローチはヘパリン服薬問題だけでなく、一般のRLベースの薬物服薬タスクにも広く利用することができる。

Accurate diagnosis of individual patient conditions and appropriate medication dosing strategies are core elements of personalized medical decision-making processes. This therapeutic procedure, which entails recursively assessing the patient's condition and administering suitable medications, can effectively be modeled as a reinforcement learning (RL) problem. Crucially, the success of RL in this context depends on the establishment of a well-defined reward function that accurately represents the optimal treatment strategy. However, defining the learning direction in RL with only a limited set of explicit indicators complicates the task due to the inherent complexity of the required domain knowledge. This approach may also increase the likelihood that the RL policy does not adequately reflect the clinician's treatment intentions, which are determined by considering various situations and indicators. In this study, we focus on developing a reward function that reflects the clinician's intentions and introduce Offline Model-based Guided Reward Learning (OMG-RL), which performs offline inverse reinforcement learning (IRL) aligned with the offline RL environment. Through OMG-RL, we learn a parameterized reward function that includes the expert's intentions from limited data, thereby enhancing the agent's policy. We validate the proposed approach on the heparin dosing task. The results demonstrate that policy learning through OMG-RL is meaningful and confirm that the learned policy is positively reinforced in terms of activated partial thromboplastin time (aPTT), a key indicator for monitoring the effects of heparin. This approach can be broadly utilized not only for the heparin dosing problem but also for RL-based medication dosing tasks in general.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 予測DNA断片化:機械学習を用いた化学測定法の非破壊的類似

Predicting DNA fragmentation: A non-destructive analogue to chemical assays using machine learning ( http://arxiv.org/abs/2409.13306v1 )

ライセンス: Link先を確認

Byron A Jacobs, Ifthakaar Shaik, Frando Lin,

(参考訳) 全世界では不妊率は増加しており、全出生の2.55%は2022年の体外受精(IVF)によって支えられている。男性不妊は、これらの症例の約半数の原因である。精子DNAの品質はIVFの成功に大きな影響を及ぼす。精子DNAの評価は伝統的に、IVFに対して精子細胞を不適格にする化学測定によって行われる。多くの複合要因が人口危機を招き、近年では全世界で出生率が低下している。このような補助的生殖技術(ART)が最近の研究の焦点となっている。同時に、人工知能はユビキタスに成長し、現代の生活の多くの側面に浸透している。最先端の機械学習の出現と、多くの分野での例外的な性能を生かし、この研究はこれらの成功に基づき、不安定な精子の画像から精子のDNA断片化を予測する新しい枠組みを提案する。精子の完全性を維持し、IVFのための精子の最適な選択を可能にする予測モデルをレンダリングする。

Globally, infertility rates are increasing, with 2.5% of all births being assisted by in vitro fertilisation (IVF) in 2022. Male infertility is the cause of approximately half of these cases. The quality of sperm DNA has a substantial impact on the success of IVF. The assessment of sperm DNA is traditionally done through chemical assays, which render sperm cells ineligible for IVF. Many compounding factors lead to the population crisis, with fertility rates dropping globally in recent history. As such, assisted reproductive technologies (ART) have been the focus of recent research efforts. Simultaneously, artificial intelligence has grown ubiquitous and is permeating more aspects of modern life. With the advent of state-of-the-art machine learning and its exceptional performance in many sectors, this work builds on these successes and proposes a novel framework for the prediction of sperm cell DNA fragmentation from images of unstained sperm, yielding a predictive model that preserves sperm integrity and allows for optimal selection of sperm for IVF.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# MeMoir: A Software-Driven Covert Channel based on Memory Usage

MeMoir: A Software-Driven Covert Channel based on Memory Usage ( http://arxiv.org/abs/2409.13310v1 )

ライセンス: Link先を確認

Jeferson Gonzalez-Gomez, Jose Alejandro Ibarra-Campos, Jesus Yamir Sandoval-Morales, Lars Bauer, Jörg Henkel,

(参考訳) カバーチャネル攻撃は、現代のコンピューティングシステムに対する深刻な脅威として継続的に研究されてきた。ソフトウェアベースの秘密チャンネルは、悪質なアクター間の不正なコミュニケーションを確立するために仮想リソースを活用するため、これらの攻撃の通常、検出が難しい分岐である。本稿では,MeMoirについて紹介する。MeMoirは,初めてメモリ使用量をチャネルの媒体として利用する,ソフトウェア駆動のカバートチャネルである。汎用的なIntel x86-64ベースのデスクトップコンピュータとARM64ベースの組み込みシステムである。以上の結果から,新しいアーキテクチャおよびハードウェアに依存しないサーキットチャネルが有効であり,エラーの少ない中程度の伝送速度を実現することが示唆された。さらに,Hyper-V仮想化環境からWindows 11ホストシステムへの情報伝達が可能な攻撃事例も提示した。さらに,システムメモリの使用を監視することで,95%以上の精度で,偽陽性と偽陰性率の低いシステムに攻撃が存在するかどうかを予測できる機械学習ベースの検出器を実装した。最後に,他の通常のアプリケーションと比較して,システム内の低電力オーバーヘッドを誘導しながら,攻撃を効果的に軽減するノイズベース対策を提案する。

Covert channel attacks have been continuously studied as severe threats to modern computing systems. Software-based covert channels are a typically hard-to-detect branch of these attacks, since they leverage virtual resources to establish illegitimate communication between malicious actors. In this work, we present MeMoir: a novel software-driven covert channel that, for the first time, utilizes memory usage as the medium for the channel. We implemented the new covert channel on two real-world platforms with different architectures: a general-purpose Intel x86-64-based desktop computer and an ARM64-based embedded system. Our results show that our new architecture- and hardware-agnostic covert channel is effective and achieves moderate transmission rates with very low error. Moreover, we present a real use case for our attack in which we were able to communicate information from a Hyper-V virtualized environment to a Windows 11 host system. In addition, we implement a machine learning-based detector that, by monitoring the use of system memory, can predict whether an attack is present in the system with an accuracy of more than 95% and with low false positive and false negative rates. Finally, we introduce a noise-based countermeasure that effectively mitigates the attack while inducing a low power overhead in the system compared to other normal applications.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# UIテスト再利用のためのスキル適応型模倣学習

Skill-Adaptive Imitation Learning for UI Test Reuse ( http://arxiv.org/abs/2409.13311v1 )

ライセンス: Link先を確認

Mengzhou Wu, Hao Wang, Jun Ren, Yuan Cao, Yuetong Li, Alex Jiang, Dezhi Ran, Yitao Hu, Wei Yang, Tao Xie,

(参考訳) ユーザインターフェース(UI)テストケースを手作業で作成するコストを軽減するため、UIテストマイグレーションは、同様の機能を持つソースアプリから、ターゲットとするモバイルアプリケーション(アプリ)のテストケースを自動的に生成することを目的としている。従来、このプロセスは、ソースアプリのイベントをテキスト記述に基づいてターゲットアプリのイベントにマッピングする、シーケンシャルなUIイベントマッピング問題としてアプローチされてきた。これまでの研究は、NLPモデルのイベントマッピング精度の向上に重点を置いてきた。 NLP機能を備えた大規模言語モデル(LLM)の出現は、ほぼ完璧なイベントマッピングの可能性を示しているが、我々の研究は、LLMの高精度なイベントマッピングでさえ、ソースとターゲットアプリ間の実装の相違に対処するには不十分であり、UIテストマイグレーションのためのLLM駆動ソリューションの全体的な効果を低下させることを示した。そこで本研究では,2つの鍵となる設計によるUIテストマイグレーションの有効性向上を目的とした,スキル適応型模倣学習フレームワークSAILを提案する。まず、SAILは、ソーステストケースをデモとして活用し、テストケースの基礎となるスキルを多レベルに抽象化し、ソーステストケースからテスト情報を抽出して、ターゲットアプリ上でのテスト生成の知識ベースとする。第2に、SAILは学習したスキルのサブセットを選択的に再利用し、新しいコンテキストおよび履歴認識スキル適応を用いて、ターゲットアプリのテストケースの生成を誘導する。 SAILは任意の模倣学習技術でインスタンス化できるが、LLMのテキスト内学習機能を利用してSAILをインスタンス化する。評価の結果、SAILはUIテストマイグレーションの有効性を大幅に改善し、最先端のアプローチよりも149\%高い成功率を示した。

To alleviate the substantial cost of manually crafting user interface (UI) test cases, UI test migration aims to automatically generate test cases for a target mobile application (app) by adapting those from a source app that shares similar functionalities. Traditionally, this process has been approached as a sequential UI-event-mapping problem, where events in the source app are mapped to those in the target one based on their textual descriptions. Prior research has extensively focused on enhancing the event-mapping accuracy of NLP models. Although the advent of large language models (LLMs) with impressive NLP capabilities suggests the potential for near-perfect event-mapping, our study demonstrates that even the highly accurate event-mapping of LLMs is insufficient to address the implementation discrepancies between the source and the target apps, reducing the overall effectiveness of LLM-driven solutions for UI test migration. To address this challenge, in this paper, we propose SAIL, a skill-adaptive imitation learning framework designed to enhance the effectiveness of UI test migration through two key designs. First, SAIL leverages the source test cases as demonstrations and employs a multi-level abstraction of the test cases' underlying skills, so as to extract the testing information from source test cases as the knowledge base for the subsequent test generation on the target app. Second, SAIL selectively reuses a subset of the learned skills to guide the generation of test cases for the target app with its novel context- and history-aware skill adaptation. While SAIL can be instantiated with any imitation learning technique, we utilize the in-context learning capabilities of LLMs to instantiate SAIL. Evaluation results show that SAIL substantially improves the effectiveness of UI test migration, with a 149% higher success rate than state-of-the-art approaches.

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# GAProtoNet:解釈可能なテキスト分類のためのマルチヘッドグラフアテンションに基づくプロトタイプネットワーク

GAProtoNet: A Multi-head Graph Attention-based Prototypical Network for Interpretable Text Classification ( http://arxiv.org/abs/2409.13312v1 )

ライセンス: Link先を確認

Ximing Wen, Wenjuan Tan, Rosina O. Weber,

(参考訳) 事前訓練されたトランスフォーマーベース言語モデル(LM)は、強力な単語埋め込みによるテキスト分類タスクの大幅な改善を達成できることでよく知られているが、そのブラックボックスの性質は、解釈可能性の欠如につながっている。本稿では,LMエンコーダで構築したテキスト分類モデルの決定を記述した,新しいホワイトボックスのマルチヘッドグラフアテンションに基づくプロトタイプネットワークであるGAProtoNetを紹介する。提案手法では,入力ベクトルとプロトタイプをグラフ内のノードとみなし,入力ノードとプロトタイプノードの間のエッジを選択的に構築し,解釈可能なプロトタイプ表現を学習する。推測中、モデルは各プロトタイプに割り当てられた注目スコアによって重み付けされた活性型プロトタイプの線形結合に基づいて決定を行い、その選択を注意重みによって透過的に説明し、最も近いマッチングトレーニング例に投影する。複数の公開データセットを用いた実験により,元のブラックボックスLMの精度を犠牲にすることなく,より優れた結果が得られた。また,提案手法は4種類のネットワーク変動を比較検討し,F1の精度と精度を比較検討した。プロトタイプクラスタのケーススタディと可視化は,LMを用いて構築したブラックボックスモデルの決定を効率的に説明できることを示す。

Pretrained transformer-based Language Models (LMs) are well-known for their ability to achieve significant improvement on text classification tasks with their powerful word embeddings, but their black-box nature, which leads to a lack of interpretability, has been a major concern. In this work, we introduce GAProtoNet, a novel white-box Multi-head Graph Attention-based Prototypical Network designed to explain the decisions of text classification models built with LM encoders. In our approach, the input vector and prototypes are regarded as nodes within a graph, and we utilize multi-head graph attention to selectively construct edges between the input node and prototype nodes to learn an interpretable prototypical representation. During inference, the model makes decisions based on a linear combination of activated prototypes weighted by the attention score assigned for each prototype, allowing its choices to be transparently explained by the attention weights and the prototypes projected into the closest matching training examples. Experiments on multiple public datasets show our approach achieves superior results without sacrificing the accuracy of the original black-box LMs. We also compare with four alternative prototypical network variations and our approach achieves the best accuracy and F1 among all. Our case study and visualization of prototype clusters also demonstrate the efficiency in explaining the decisions of black-box models built with LMs.
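The inference step described above (a decision formed as a linear combination of prototypes weighted by attention scores) can be sketched as follows. This is a simplified stand-in, not GAProtoNet itself: it uses a single attention head scored by negative squared distance instead of multi-head graph attention, and `classify`, `prototypes`, and `class_weights` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, prototypes, class_weights):
    """Return (logits, attention) for one input vector.

    x: input embedding; prototypes: list of prototype vectors;
    class_weights[j][c]: contribution of prototype j to class c.
    Attention is assumed to be a softmax over negative squared distances.
    """
    scores = [-sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    attention = softmax(scores)
    n_classes = len(class_weights[0])
    logits = [
        sum(a * w[c] for a, w in zip(attention, class_weights))
        for c in range(n_classes)
    ]
    return logits, attention
```

The returned attention vector is what makes the decision inspectable: each entry says how much a given prototype, and therefore its nearest training example, contributed to the prediction.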

翻訳日:2024-11-07 07:51:11 公開日:2024-09-20

# 高次元ベイズネットワーク学習のためのリング型分散アルゴリズム

A Ring-Based Distributed Algorithm for Learning High-Dimensional Bayesian Networks ( http://arxiv.org/abs/2409.13314v1 )

ライセンス: Link先を確認

Jorge D. Laborda, Pablo Torrijos, José M. Puerta, José A. Gámez,

(参考訳) 高次元データからベイズネットワーク(BN)を学習することは複雑で時間を要する作業である。文献には水平(インスタンス)や垂直(変数)のパーティショニングに基づくアプローチがあるが、GESアルゴリズム自体に基づく手法を除いて、Greedy Equivalence Search (GES)アルゴリズムと同じ理論的性質を保証できない。本稿では, GES を局所学習アルゴリズムとして用い, GES と同じ理論的特性を保証しながら,CPU 時間の短縮を図った有向リングベース分散手法を提案する。この方法は、可能なエッジの集合を分割し、リング内の各プロセッサが受信したサブセットでのみ動作するように制限することを含む。グローバルラーニングプロセスは、収束基準を満たすまで数ラウンドを繰り返す反復アルゴリズムである。各ラウンドにおいて、各プロセッサは、前者のリングからBNを受け取り、それを自身のBNモデルと融合させ、その結果を、エッジの集合に制約された局所学習プロセスの開始解として利用する。その後、環の後継者に得られたモデルを送付する。 3つの大きなドメイン(400-1000変数)で実験を行い、GESとその高速バージョン(fGES)と比較して提案手法の有効性を実証した。

Learning Bayesian Networks (BNs) from high-dimensional data is a complex and time-consuming task. Although there are approaches based on horizontal (instances) or vertical (variables) partitioning in the literature, none can guarantee the same theoretical properties as the Greedy Equivalence Search (GES) algorithm, except those based on the GES algorithm itself. In this paper, we propose a directed ring-based distributed method that uses GES as the local learning algorithm, ensuring the same theoretical properties as GES but requiring less CPU time. The method involves partitioning the set of possible edges and constraining each processor in the ring to work only with its received subset. The global learning process is an iterative algorithm that carries out several rounds until a convergence criterion is met. In each round, each processor receives a BN from its predecessor in the ring, fuses it with its own BN model, and uses the result as the starting solution for a local learning process constrained to its set of edges. Subsequently, it sends the model obtained to its successor in the ring. Experiments were carried out on three large domains (400-1000 variables), demonstrating our proposal's effectiveness compared to GES and its fast version (fGES).
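The round structure described above (receive the predecessor's model, fuse it with your own, learn locally on your own edge subset, pass the result on) can be sketched with stand-in components. Everything below is an assumption made for illustration: models are plain sets of edges, `fuse` and `local_learn` are caller-supplied placeholders for BN fusion and edge-constrained GES, and the rounds are synchronous. A real implementation would need a scoring function and a fusion step that keeps the graph acyclic.

```python
def ring_learn(edge_subsets, local_learn, fuse, max_rounds=10):
    """Iterate rounds over a directed ring until no model changes.

    edge_subsets[i]: the edges processor i is allowed to work with.
    local_learn(start, allowed): returns an improved model (placeholder for GES).
    fuse(a, b): combines two models (placeholder for BN fusion).
    """
    k = len(edge_subsets)
    models = [frozenset() for _ in range(k)]
    for _ in range(max_rounds):
        previous = list(models)
        # Processor i receives the model of its predecessor i-1 (index -1 wraps).
        models = [
            local_learn(fuse(previous[i - 1], previous[i]), edge_subsets[i])
            for i in range(k)
        ]
        if models == previous:  # convergence criterion (assumed): nothing changed
            break
    return models


# Toy stand-ins: an edge is "worth adding" if it belongs to a hidden target set.
target = frozenset({("A", "B"), ("B", "C"), ("C", "D")})
subsets = [
    frozenset({("A", "B"), ("A", "C")}),
    frozenset({("B", "C"), ("C", "D")}),
]
result = ring_learn(
    subsets,
    local_learn=lambda start, allowed: frozenset(start | (allowed & target)),
    fuse=lambda a, b: a | b,
)
```

In the toy run, each processor only ever proposes edges from its own subset, yet after a few rounds every processor holds the full target structure because models travel around the ring.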

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 品質・多様性における性能・再現性トレードオフの探求

Exploring the Performance-Reproducibility Trade-off in Quality-Diversity ( http://arxiv.org/abs/2409.13315v1 )

ライセンス: Link先を確認

Manon Flageat, Hannah Janmohamed, Bryan Lim, Antoine Cully,

(参考訳) 品質多様性(QD)アルゴリズムは多くの領域やアプリケーションで有望な結果を示している。しかし、複雑な実世界のアプリケーションでQDが使用される場合、ソリューションの適合性と行動推定の不確実性は依然として大きな課題である。不確実なアプリケーションの性能を改善するためのいくつかのアプローチが提案されているが、多くの人は重要な課題に対処できない。ほとんどの先行した方法は、適合性と再現性を共同で改善し、それらが矛盾する目的である可能性を無視する。例えば、ロボット工学では、解は不確実な環境で最大速度の90%を確実に歩けるが、より速く歩く解は転倒しやすい。これはトレードオフなので、この2つのソリューションのどちらか一方が他方よりも"良い"ものではありません。したがって、アルゴリズムは本質的に一方の解を選ぶことはできないが、これら2つの矛盾する目的に対して与えられた選好のみを強制することができる。本稿では,不確実なQDに対する性能再現性トレードオフとして,この問題を定式化する。そこで本稿では, トレードオフに対する最適解を求める新たな4つのQDアルゴリズムを提案する。また,これらの選好が事前に定義できない場合のA-posteriori QDアルゴリズムを提案する。以上の結果から,提案手法は与えられた嗜好を満たす解を見出すことができた。重要なことは、このトレードオフを単純に説明すれば、我々のアプローチは既存の不確実なQD手法よりも優れているということです。これは、性能再現性トレードオフを考慮すると、パフォーマンスのみを最適化した場合に通常見逃される重要なステップストーンがアンロックされることを示している。

Quality-Diversity (QD) algorithms have exhibited promising results across many domains and applications. However, uncertainty in fitness and behaviour estimations of solutions remains a major challenge when QD is used in complex real-world applications. While several approaches have been proposed to improve the performance in uncertain applications, many fail to address a key challenge: determining how to prioritise solutions that perform consistently under uncertainty, in other words, solutions that are reproducible. Most prior methods improve fitness and reproducibility jointly, ignoring the possibility that they could be contradictory objectives. For example, in robotics, solutions may reliably walk at 90% of the maximum velocity in uncertain environments, while solutions that walk faster are also more prone to falling over. As this is a trade-off, neither one of these two solutions is "better" than the other. Thus, algorithms cannot intrinsically select one solution over the other, but can only enforce given preferences over these two contradictory objectives. In this paper, we formalise this problem as the performance-reproducibility trade-off for uncertain QD. We propose four new a-priori QD algorithms that find optimal solutions for given preferences over the trade-offs. We also propose an a-posteriori QD algorithm for when these preferences cannot be defined in advance. Our results show that our approaches successfully find solutions that satisfy given preferences. Importantly, by simply accounting for this trade-off, our approaches perform better than existing uncertain QD methods. This suggests that considering the performance-reproducibility trade-off unlocks important stepping stones that are usually missed when only performance is optimised.
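To make the trade-off concrete, here is a small sketch of an a-priori preference applied to two candidate solutions. The weighted sum below is an assumption chosen for illustration and is not one of the paper's algorithms; performance is taken as the mean of repeated fitness evaluations and reproducibility as the negative standard deviation.

```python
import statistics


def summarize(evaluations):
    """Return (performance, reproducibility) from repeated fitness evaluations."""
    return statistics.mean(evaluations), -statistics.pstdev(evaluations)


def prefer(candidates, weight):
    """Pick the candidate that best matches a given preference.

    candidates: dict mapping a name to its list of fitness evaluations.
    weight: preference for performance in [0, 1]; 1 - weight goes to
    reproducibility (a simple scalarization, assumed here).
    """
    def score(evaluations):
        performance, reproducibility = summarize(evaluations)
        return weight * performance + (1.0 - weight) * reproducibility

    return max(candidates, key=lambda name: score(candidates[name]))


# A steady walker versus a faster one that sometimes falls over.
walkers = {
    "steady": [0.9, 0.9, 0.9, 0.9],
    "fast": [1.4, 1.4, 0.4, 1.4],
}
```

With `weight=1.0` the faster walker wins on mean fitness, while a low weight such as `0.2` selects the steady one; neither is better in absolute terms, which is the point the abstract makes.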

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# JMedBench: 日本の生物医学大言語モデル評価ベンチマーク

JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models ( http://arxiv.org/abs/2409.13317v1 )

ライセンス: Link先を確認

Junfeng Jiang, Jiahao Huang, Akiko Aizawa,

(参考訳) 日本語大言語モデル(LLM)の最近の発展は、主に一般ドメインに焦点を当てており、日本の生物医学 LLM の進歩は少ない。ひとつの障害は、比較のための包括的な大規模ベンチマークがないことだ。また, バイオメディカルLLMを評価するための資源も不十分である。そこで本研究では,4つのカテゴリに8つのLSMと5つのタスクにまたがる20のバイオメディカルデータセットを含む新しいベンチマークを提案する。実験結果から,(1)日本の生物医学的課題において,日本の生物医学的知識をより深く理解した LLM がより優れた性能を発揮すること,(2)日本の生物医学的領域を主目的としない LLM が相変わらず良好な性能を発揮すること,(3) 日本の生物医学的課題において既存の LLM を改良する余地がまだ残っていること,などが示唆された。さらに、この分野の発展をさらに促進できる洞察を提供する。我々の評価ツールはベンチマークに合わせており、データセットはhttps://huggingface.co/datasets/Coldog2333/JMedBenchで公開されています。

Recent developments in Japanese large language models (LLMs) primarily focus on general domains, with fewer advancements in Japanese biomedical LLMs. One obstacle is the absence of a comprehensive, large-scale benchmark for comparison. Furthermore, the resources for evaluating Japanese biomedical LLMs are insufficient. To advance this field, we propose a new benchmark including eight LLMs across four categories and 20 Japanese biomedical datasets across five tasks. Experimental results indicate that: (1) LLMs with a better understanding of Japanese and richer biomedical knowledge achieve better performance in Japanese biomedical tasks, (2) LLMs that are not mainly designed for Japanese biomedical domains can still perform unexpectedly well, and (3) there is still much room for improving the existing LLMs in certain Japanese biomedical tasks. Moreover, we offer insights that could further enhance development in this field. Our evaluation tools tailored to our benchmark as well as the datasets are publicly available in https://huggingface.co/datasets/Coldog2333/JMedBench to facilitate future research.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# SLaVA-CXR:胸部X線レポート自動化のための小言語と視覚アシスタント

SLaVA-CXR: Small Language and Vision Assistant for Chest X-ray Report Automation ( http://arxiv.org/abs/2409.13321v1 )

ライセンス: Link先を確認

Jinge Wu, Yunsoo Kim, Daqian Shi, David Cliffton, Fenglin Liu, Honghan Wu,

(参考訳) 大規模言語モデル(LLMs)の成功に触発されて、臨床医を支援する医療分野におけるLSMの開発への研究関心が高まっている。しかし、病院では、クローズドソースの商用LCMを使用するにはプライバシーの問題があり、特に資源効率のよい地域や低所得国では、大規模な計算資源を必要とする。我々はChest X-Rayレポートの自動化に使用できるオープンソースのSmall Language and Vision Assistant (SLaVA-CXR)を提案する。そこで我々はまず,放射線技師の認知発達をシミュレートしたRe$3$Training法を提案し,認識・推論・報告の訓練方法においてモデルを最適化する。そこで,プライバシー規制に準拠した高品質で多様な学習コーパスを生成できるデータ合成手法RADEXを提案する。実験の結果,SLaVA-CXRは2.7Bのバックボーン上に構築されており,従来の最先端モデルよりも6倍高速な推論効率を実現していることがわかった。

Inspired by the success of large language models (LLMs), there is growing research interest in developing LLMs in the medical domain to assist clinicians. However, for hospitals, using closed-source commercial LLMs involves privacy issues, and developing open-source public LLMs requires large-scale computational resources, which are usually limited, especially in resource-efficient regions and low-income countries. We propose an open-source Small Language and Vision Assistant (SLaVA-CXR) that can be used for Chest X-Ray report automation. To efficiently train a small assistant, we first propose the Re$^3$Training method, which simulates the cognitive development of radiologists and optimizes the model in the Recognition, Reasoning, and Reporting training manner. Then, we introduce a data synthesis method, RADEX, which can generate a high-quality and diverse training corpus with privacy regulation compliance. The extensive experiments show that our SLaVA-CXR built on a 2.7B backbone not only outperforms but also achieves 6 times faster inference efficiency than previous state-of-the-art larger models.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 核スピン-異性体重ね合わせのコヒーレントダイナミクス

Coherent dynamics of a nuclear-spin-isomer superposition ( http://arxiv.org/abs/2409.13322v1 )

ライセンス: Link先を確認

Tamar Levin, Ziv Meir,

(参考訳) システムのサイズと複雑さの増加に伴う量子コヒーレンスを保存することは大きな課題である。分子は、様々な大きさと複雑さと多くの自由度を持ち、量子から古典的行動への遷移を研究するための優れたプラットフォームである。分子の量子制御の研究は振動と回転に焦点を当てているが、ここでは同じ分子の2つの核スピン異性体の間の量子重ね合わせを作ることに焦点を当てる。本稿では、2つの非結合の核-スピン-異性体状態間の強い結合を生み出すために、スペクトルにおける避けられた交差を利用して、異性体量子ビットを生成するスキームを提案する。我々は,4レベルハミルトニアンを用いて体系をモデル化し,システムの異なる状態とパラメータのコヒーレントなダイナミクスを探索する。我々の4レベルモデルとアプローチは、同様のエネルギーレベル構造を持つ他のシステムに適用できる。

Preserving quantum coherence with the increase of a system's size and complexity is a major challenge. Molecules, with their diverse sizes and complexities and many degrees of freedom, are an excellent platform for studying the transition from quantum to classical behavior. While most quantum-control studies of molecules focus on vibrations and rotations, we focus here on creating a quantum superposition between two nuclear-spin isomers of the same molecule. We present a scheme that exploits an avoided crossing in the spectrum to create strong coupling between two uncoupled nuclear-spin-isomer states, hence creating an isomeric qubit. We model our scheme using a four-level Hamiltonian and explore the coherent dynamics in the different regimes and parameters of our system. Our four-level model and approach can be applied to other systems with a similar energy-level structure.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 2音駆動とパラメトリックポンプの接合効果による強い機械的スクイーズの発生

Generation of strong mechanical squeezing through the joint effect of two-tone driving and parametric pumping ( http://arxiv.org/abs/2409.13323v1 )

ライセンス: Link先を確認

Xiao-Jie Wu, Huan-Huan Cheng, Qiannan Wu, Cheng-Hua Bai, Shao-Xiong Wu,

(参考訳) 光学系における2音駆動とパラメトリックポンプの相乗的機構を利用して、強力な機械的スクイーズを効率的に作成する革新的な手法を提案する。光学パラメトリック増幅器によって誘導されるキャビティフィールドのスクイーズ効果は、2トーン駆動により圧縮されたメカニカル発振器に伝達でき、メカニカル発振器のスクイーズ化の度合いは、任意の単一機構によって得られたものを上回る。我々のプロジェクトは、幅広い条件で強力な機械的スクイーズを生成するために、多用途で効率的なアプローチを提供する。

We propose an innovative scheme to efficiently prepare strong mechanical squeezing by utilizing the synergistic mechanism of two-tone driving and parametric pumping in an optomechanical system. With a reasonable choice of the system parameters, the proposal offers the following prominent advantages: the squeezing effect of the cavity field induced by the optical parametric amplifier can be transferred to the mechanical oscillator, which has already been squeezed by the two-tone driving, and the degree of squeezing of the mechanical oscillator surpasses that obtained by either mechanism alone; the joint mechanism can enhance the degree of squeezing significantly and break the 3 dB mechanical squeezing limit, which is particularly evident in the range where the red/blue-detuned ratio is sub-optimal; the mechanical squeezing achieved through this distinctive joint mechanism exhibits notable robustness against both thermal noise and decay of the mechanical oscillator. Our project offers a versatile and efficient approach for generating strong mechanical squeezing across a wide range of conditions.
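For readers unfamiliar with the 3 dB figure, the usual way to express squeezing in decibels is shown below. The notation is an assumption made for this note and does not come from the abstract: $\langle \delta X^2 \rangle$ is the variance of the squeezed mechanical quadrature and $\langle \delta X^2 \rangle_{\mathrm{zp}}$ is its zero-point (vacuum) value.

```latex
S = -10 \log_{10}\!\left( \frac{\langle \delta X^2 \rangle}{\langle \delta X^2 \rangle_{\mathrm{zp}}} \right)\ \mathrm{dB},
\qquad
S = 3\ \mathrm{dB}
\;\Longleftrightarrow\;
\frac{\langle \delta X^2 \rangle}{\langle \delta X^2 \rangle_{\mathrm{zp}}} = 10^{-0.3} \approx \frac{1}{2}.
```

Breaking the 3 dB limit therefore means pushing the quadrature variance below half of its zero-point value.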

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# Towards Semi-supervised Dual-modal Semantic Segmentation

Towards Semi-supervised Dual-modal Semantic Segmentation ( http://arxiv.org/abs/2409.13325v1 )

ライセンス: Link先を確認

Qiulei Dong, Jianan Li, Shuang Deng,

(参考訳) 3Dおよび2Dデータ取得技術の開発により、シーンの点雲と画像の同時取得が容易になり、デュアルモーダルなセマンティックセマンティックセグメンテーションがさらに容易になった。ポイントクラウドとイメージを同時にセグメンテーションする既存の方法のほとんどは、ラベル付きトレーニングデータの量と品質に大きく依存している。しかし、大量のポイントワイドおよびピクセルワイドラベリング手順は時間がかかり、労働集約的である。そこで本研究では,少数のラベル付き点群,多数のラベル付き点群,およびラベル付き画像を用いて,PD-Netと呼ばれる半教師付きデュアルモーダルセマンティックセマンティックセマンティックセマンティクスタスクを処理する並列デュアルストリームネットワークを提案する。提案したPD-Netは、2つの並列ストリーム(元のストリームと擬似ラベル予測ストリームと呼ばれる)で構成されている。擬似ラベル予測ストリームは、未ラベルの点雲とその対応する画像の擬似ラベルを予測する。そして、ラベルなしデータを元のストリームに送信して自己学習を行う。各ストリームは、それぞれ3Dデータと2Dデータのための2つのエンコーダデコーダブランチを含む。各ストリームにおいて、複数のデュアルモーダル融合モジュールが二重モーダル特徴を融合するために探索される。さらに、擬似ラベル予測ストリームによって出力される擬似ラベルを最適化するために擬似ラベル最適化モジュールを探索した。 2つの公開データセットの実験結果から、提案手法は、比較半教師付き手法よりも優れているだけでなく、ほとんどの場合、完全教師付き手法で競合性能を達成できることが示された。

With the development of 3D and 2D data acquisition techniques, it has become easy to obtain point clouds and images of scenes simultaneously, which further facilitates dual-modal semantic segmentation. Most existing methods for simultaneously segmenting point clouds and images rely heavily on the quantity and quality of the labeled training data. However, massive point-wise and pixel-wise labeling procedures are time-consuming and labor-intensive. To address this issue, we propose a parallel dual-stream network to handle the semi-supervised dual-modal semantic segmentation task, called PD-Net, by jointly utilizing a small number of labeled point clouds, a large number of unlabeled point clouds, and unlabeled images. The proposed PD-Net consists of two parallel streams (called original stream and pseudo-label prediction stream). The pseudo-label prediction stream predicts the pseudo labels of unlabeled point clouds and their corresponding images. Then, the unlabeled data is sent to the original stream for self-training. Each stream contains two encoder-decoder branches for 3D and 2D data respectively. In each stream, multiple dual-modal fusion modules are explored for fusing the dual-modal features. In addition, a pseudo-label optimization module is explored to optimize the pseudo labels output by the pseudo-label prediction stream. Experimental results on two public datasets demonstrate that the proposed PD-Net not only outperforms the comparative semi-supervised methods but also achieves competitive performances with some fully-supervised methods in most cases.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 拡散確率モデルによる生成空力設計

Generative Aerodynamic Design with Diffusion Probabilistic Models ( http://arxiv.org/abs/2409.13328v1 )

ライセンス: Link先を確認

Thomas Wagenaar, Simone Mancini, Andrés Mateo-Gabín,

(参考訳) 空力設計のためのジオメトリの最適化は、ジオメトリを評価し、反復的に改善するために、多くの高価なシミュレーションに依存することが多い。しばしばリフト・アンド・ドラッグ、空力モーメント、表面積の観点で、所望の要求に近く特性を持つ開始幾何を提供することで、シミュレーションの数を減らすことができる。生成モデルは、シミュレーションの大規模なデータセット上でジオメトリを一般化することにより、そのような開始ジオメトリを提供する可能性があることを示す。特に,XFOILシミュレーションで訓練した拡散確率モデルを用いて,所定の空力特性と制約を条件とした2次元翼ジオメトリーを合成する。翼はベルンシュタイン多項式でパラメータ化され、生成された設計の滑らかさを保証する。モデルが同一の要件と制約に対して多様な候補設計を生成可能であることを示し、最適化手順に複数の出発点を提供する設計空間を効果的に探索する。しかし、候補設計の品質は、データセット内の模擬設計の分布に依存する。重要なことに、このデータセットのジオメトリは、生成されたジオメトリが物理的であることを保証するために、拡散モデルの条件付けに使われていない他の要件や制約を満たす必要がある。

The optimization of geometries for aerodynamic design often relies on a large number of expensive simulations to evaluate and iteratively improve the geometries. It is possible to reduce the number of simulations by providing a starting geometry that has properties close to the desired requirements, often in terms of lift and drag, aerodynamic moments and surface areas. We show that generative models have the potential to provide such starting geometries by generalizing geometries over a large dataset of simulations. In particular, we leverage diffusion probabilistic models trained on XFOIL simulations to synthesize two-dimensional airfoil geometries conditioned on given aerodynamic features and constraints. The airfoils are parameterized with Bernstein polynomials, ensuring smoothness of the generated designs. We show that the models are able to generate diverse candidate designs for identical requirements and constraints, effectively exploring the design space to provide multiple starting points to optimization procedures. However, the quality of the candidate designs depends on the distribution of the simulated designs in the dataset. Importantly, the geometries in this dataset must satisfy other requirements and constraints that are not used in conditioning of the diffusion model, to ensure that the generated geometries are physical.
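The smoothness claim follows from the Bernstein basis itself: a shape described by a handful of coefficients is a polynomial in the chordwise position, so it has no kinks. The sketch below evaluates such a polynomial; how the coefficients map to upper and lower airfoil surfaces is not stated in the abstract, so the function names and the example coefficients are illustrative assumptions.

```python
from math import comb


def bernstein(n, i, t):
    """i-th Bernstein basis polynomial of degree n, evaluated at t in [0, 1]."""
    return comb(n, i) * t ** i * (1 - t) ** (n - i)


def bernstein_curve(coefficients, t):
    """Evaluate the polynomial defined by Bernstein coefficients at t."""
    n = len(coefficients) - 1
    return sum(c * bernstein(n, i, t) for i, c in enumerate(coefficients))


# Three hypothetical coefficients describing a smooth thickness-like profile.
coefficients = [0.1, 0.3, 0.2]
samples = [bernstein_curve(coefficients, k / 10) for k in range(11)]
```

A generative model that outputs the coefficients, instead of raw surface points, can only produce shapes from this smooth family, which is a convenient built-in constraint.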

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 新たなデータセットを用いた非拘束環境における果実・野菜検出の促進

Enhancing Fruit and Vegetable Detection in Unconstrained Environment with a Novel Dataset ( http://arxiv.org/abs/2409.13330v1 )

ライセンス: Link先を確認

Sandeep Khanna, Chiranjoy Chattopadhyay, Suman Kundu,

(参考訳) コンピュータビジョンによる果物や野菜の検出の自動化は、農業の近代化、効率の向上、食品品質の確保、技術的に先進的で持続可能な農業慣行への貢献に不可欠である。本稿では,実環境における果実や野菜の検出とローカライズのためのエンドツーエンドパイプラインを提案する。これを実現するために、FRUVEG67というデータセットをキュレートした。このデータセットには、制約のないシナリオでキャプチャされた67種類の果物や野菜の画像が含まれており、クラス毎に手動で注釈付けされたサンプルはわずかである。我々は,残りの非注釈画像にラベルをつけるためにオブジェクトのバウンディングボックスを生成する半教師付きデータアノテーションアルゴリズム(SSDA)を開発した。 Fruit and Vegetable Detection Network (FVDNet) は3つの異なるグリッド構成を持つYOLOv7のアンサンブルバージョンである。我々は,境界ボックス予測に平均的アプローチ,およびクラス予測に投票機構を用いる。我々は、より小さな物体をよりよく検出するために、焦点損失とともにJensen-Shannon divergence (JSD)を統合した。実験の結果,従来のYOLOに比べてFVDNetの方が優れており,検出性能とローカライゼーション性能が著しく向上していることがわかった。平均平均精度(mAP)は全クラスで0.78であった。さらに,FVDNetの有効性をオープンカテゴリの冷凍機画像を用いて評価し,有望な結果を示した。

Automating the detection of fruits and vegetables using computer vision is essential for modernizing agriculture, improving efficiency, ensuring food quality, and contributing to technologically advanced and sustainable farming practices. This paper presents an end-to-end pipeline for detecting and localizing fruits and vegetables in real-world scenarios. To achieve this, we have curated a dataset named FRUVEG67 that includes images of 67 classes of fruits and vegetables captured in unconstrained scenarios, with only a few manually annotated samples per class. We have developed a semi-supervised data annotation algorithm (SSDA) that generates bounding boxes for objects to label the remaining non-annotated images. For detection, we introduce the Fruit and Vegetable Detection Network (FVDNet), an ensemble version of YOLOv7 featuring three distinct grid configurations. We employ an averaging approach for bounding-box prediction and a voting mechanism for class prediction. We have integrated Jensen-Shannon divergence (JSD) in conjunction with focal loss to better detect smaller objects. Our experimental results highlight the superiority of FVDNet compared to previous versions of YOLO, showcasing remarkable improvements in detection and localization performance. We achieved an impressive mean average precision (mAP) score of 0.78 across all classes. Furthermore, we evaluated the efficacy of FVDNet using open-category refrigerator images, where it demonstrates promising results.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
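
The FVDNet abstract describes an ensemble that averages bounding boxes and votes on the class. A minimal sketch of that fusion step might look as follows. It assumes the detections from the individual models have already been matched to the same object, and the function name and data layout are hypothetical rather than taken from the paper.

```python
from collections import Counter

def fuse_detections(predictions):
    """Fuse matched detections from several models.

    predictions: list of (box, label) pairs, one per model, where
    box = (x_min, y_min, x_max, y_max). Matching across models is assumed
    to have been done beforehand.
    """
    boxes = [box for box, _ in predictions]
    labels = [label for _, label in predictions]
    n = len(boxes)
    # Averaging approach for the bounding-box coordinates.
    avg_box = tuple(sum(b[k] for b in boxes) / n for k in range(4))
    # Voting mechanism for the class label (most frequent label wins).
    voted_label = Counter(labels).most_common(1)[0][0]
    return avg_box, voted_label

# Hypothetical outputs of three ensemble members for one object.
preds = [
    ((10, 20, 50, 60), "apple"),
    ((12, 22, 52, 58), "apple"),
    ((14, 18, 48, 62), "tomato"),
]
fused_box, fused_label = fuse_detections(preds)
```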

# 事前学習済み多言語BERTの埋め込みへの適用による悪意あるプロンプトインジェクション攻撃検出の改善

Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection ( http://arxiv.org/abs/2409.13331v1 )

ライセンス: Link先を確認

Md Abdur Rahman, Hossain Shahriar, Fan Wu, Alfredo Cuzzocrea,

(参考訳) 大きな言語モデル(LLM)は、その優れた能力と広範囲のアプリケーションに適用できることで有名である。しかし、この広範な利用は重大な脆弱性をもたらす。また、現在のアプローチでは、現実世界のアプリケーションにおけるこれらの脆弱性の複雑さや進化の性質に適切に対処できないため、大規模な言語モデルにおける悪意あるインジェクション攻撃に対する効果的な検出と緩和戦略の必要性に、大きなギャップがあることがよく観察されている。したがって、本研究は、実際のLLMアプリケーションに最も危険な脆弱性の一つである悪意のあるプロンプトインジェクション攻撃の影響に焦点を当てている。正規のプロンプトから悪意のあるプロンプトを分類するために、多言語BERT、DistilBertのような様々なBERT(Bidirectional Encoder Representations from Transformers)を適用する。また,多言語BERTを用いた迅速なテキストのトークン化と埋め込み生成が,ガウスネーブベイズ,ランダムフォレスト,サポートベクターマシン,ロジスティック回帰といった機械学習手法の性能向上にどのように貢献するかを観察した。各モデルの性能は、悪意のあるプロンプトを発見するためにバイナリ分類を改善するために、様々なパラメータで厳格に分析される。プロンプトを埋め込むための多言語BERTアプローチは、既存の作業を大幅に改善し、性能を上回り、ロジスティック回帰により96.55%の精度を達成した。さらに,モデルの誤り予測について検討し,その限界について考察した。この発見は、多様なLSMの脆弱性に最も適したモデルを見つけるために、様々なBERTをチューニングする研究者を導くことができる。

Large language models (LLMs) are renowned for their exceptional capabilities and are applied across a wide range of applications. However, this widespread use brings significant vulnerabilities. There is also a large gap in effective detection and mitigation strategies against malicious prompt injection attacks on large language models, as current approaches may not adequately address the complexity and evolving nature of these vulnerabilities in real-world applications. This work therefore focuses on the impact of malicious prompt injection attacks, one of the most dangerous vulnerabilities of real-world LLM applications. It examines the application of various BERT (Bidirectional Encoder Representations from Transformers) models, such as multilingual BERT and DistilBert, to distinguishing malicious prompts from legitimate ones. We also observe how tokenizing the prompt texts and generating embeddings with multilingual BERT helps improve the performance of several machine learning methods: Gaussian Naive Bayes, Random Forest, Support Vector Machine, and Logistic Regression. The performance of each model is rigorously analyzed with various parameters to improve the binary classification of malicious prompts. Embedding the prompts with multilingual BERT significantly improved on and outperformed existing work, achieving an accuracy of 96.55% with Logistic Regression. Additionally, we investigated the model's incorrect predictions to gain insights into its limitations. The findings can guide researchers in tuning various BERT models to find the most suitable one for diverse LLM vulnerabilities.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 大規模言語モデルにおける時間意識: Fact Recallのベンチマーク

Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time ( http://arxiv.org/abs/2409.13338v1 )

ライセンス: Link先を確認

David Herel, Vojtech Bartek, Tomas Mikolov,

(参考訳) 大統領は誰ですか。答えは質問のタイミングによって変わる。大きな言語モデル(LLM)は様々な推論タスクで評価されるが、時間という重要な次元を見逃してしまうことが多い。現実のシナリオでは、回答の正しさはしばしば時間的文脈に結びついている。本稿では,LLMが時間に敏感な事実を処理できることを厳格に検証するための新しいデータセットを提案する。我々のベンチマークは、LLMの知識と正しい時間コンテキストの整合性を測定するための体系的な方法を提供し、現在の評価手法における重要なギャップを埋め、将来のモデルにおける現実の応用性を改善するための貴重なツールを提供する。

Who is the US President? The answer changes depending on when the question is asked. While large language models (LLMs) are evaluated on various reasoning tasks, they often miss a crucial dimension: time. In real-world scenarios, the correctness of answers is frequently tied to temporal context. In this paper, we introduce a novel dataset designed to rigorously test LLMs' ability to handle time-sensitive facts. Our benchmark offers a systematic way to measure how well LLMs align their knowledge with the correct time context, filling a key gap in current evaluation methods and offering a valuable tool for improving real-world applicability in future models.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
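
The benchmark described above ties the correctness of an answer to the time at which the question is asked. One way to picture this is a fact table with validity intervals, sketched below. The record layout, the placeholder names, and the exact-match scoring rule are assumptions for illustration and are not taken from the paper's dataset.

```python
from datetime import date

# Hypothetical records: (question, answer, valid_from, valid_until).
FACTS = [
    ("Who leads organization X?", "Alice", date(2015, 1, 1), date(2019, 12, 31)),
    ("Who leads organization X?", "Bob", date(2020, 1, 1), date(2024, 12, 31)),
]

def gold_answer(question, as_of):
    """Return the answer that was valid on the date `as_of`, or None."""
    for q, answer, start, end in FACTS:
        if q == question and start <= as_of <= end:
            return answer
    return None

def is_time_correct(question, as_of, model_answer):
    """Check a model answer against the fact valid at the given time."""
    gold = gold_answer(question, as_of)
    return gold is not None and model_answer.strip().lower() == gold.lower()
```

Under this sketch, the same model answer can be right for one query date and wrong for another, which is the temporal dimension the abstract highlights.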

# タブラルバイオメディカルデータのための低性能機械学習における特徴重要度の有効性

Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data ( http://arxiv.org/abs/2409.13342v1 )

ライセンス: Link先を確認

Youngro Lee, Giacomo Baruzzo, Jeonghwan Kim, Jongmo Seo, Barbara Di Camillo,

(参考訳) 表型バイオメディカルデータ分析では,特徴の重要性を議論する上で,高精度のチューニングモデルが必須であると考えられる。本研究では,性能の低いモデルも特徴として有用であることを示すとともに,一般的な信念に挑戦する。性能が連続的に低下するにつれて特徴量の変化を観測する実験を提案する。 3つの合成データセットと6つの実バイオメディカルデータセットを用いて、完全なデータセットから得られた特徴のランクを、サンプルサイズ(データ切断)が減ったもの(機能切断)または少ないもの(機能切断)と比較する。合成データセットでは、特徴切断は特徴ランクを変えないが、データ切断は低い性能で高い相違を示す。実際のデータセットでは、フィーチャーカットはデータカットと同じような、あるいは小さな変更を示しているが、いくつかのデータセットは反対である。相関を除去することで特徴の相互作用が制御される場合、特徴の切断は安定した安定性を示す。特徴値の分布を解析し,そのモデルが特徴間の特徴重要度を区別できない可能性を理論的に検証することにより,特徴切断による性能劣化にもかかわらず,データ切断によるものではないにもかかわらず,モデルが特徴重要度を識別できることを明らかにする。本研究は,データサイズが十分であれば,低性能レベルでも特徴重要度を維持可能であると結論付け,表型医療データ解析における最適下地性能に寄与する重要な要因である。本稿では,分類器の性能が十分でない場合でも,特徴量分析と統計解析を併用して相対的に特徴量を比較する可能性を示す。

In tabular biomedical data analysis, tuning models to high accuracy is considered a prerequisite for discussing feature importance, as medical practitioners expect the validity of feature importance to correlate with performance. In this work, we challenge the prevailing belief, showing that low-performing models may also be used for feature importance. We propose experiments to observe changes in feature rank as performance degrades sequentially. Using three synthetic datasets and six real biomedical datasets, we compare the rank of features from full datasets to those with reduced sample sizes (data cutting) or fewer features (feature cutting). In synthetic datasets, feature cutting does not change feature rank, while data cutting shows higher discrepancies with lower performance. In real datasets, feature cutting shows similar or smaller changes than data cutting, though some datasets exhibit the opposite. When feature interactions are controlled by removing correlations, feature cutting consistently shows better stability. By analyzing the distribution of feature importance values and theoretically examining the probability that the model cannot distinguish feature importance between features, we reveal that models can still distinguish feature importance despite performance degradation through feature cutting, but not through data cutting. We conclude that the validity of feature importance can be maintained even at low performance levels if the data size is adequate, which is a significant factor contributing to suboptimal performance in tabular medical data analysis. This paper demonstrates the potential for utilizing feature importance analysis alongside statistical analysis to compare features relatively, even when classifier performance is not satisfactory.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
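
The study above compares feature ranks obtained from the full dataset with ranks obtained after data cutting or feature cutting. A minimal sketch of such a comparison uses the Spearman rank correlation, as shown below. The importance values are made up, ties are assumed absent, and the abstract does not state which rank-agreement measure the authors actually use.

```python
def ranks(importances):
    """Rank features by importance: 1 = most important (assumes no ties)."""
    order = sorted(range(len(importances)), key=lambda i: -importances[i])
    result = [0] * len(importances)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation between two importance vectors (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n**2 - 1))

# Hypothetical importances from the full data and from a reduced sample.
full = [0.40, 0.30, 0.20, 0.10]
reduced = [0.35, 0.20, 0.30, 0.15]
agreement = spearman(full, reduced)
```

A value near 1 means the reduced-data model orders the features almost as the full-data model does, even if its predictive performance is lower.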

# $\textit{"I Don't Use AI for Everything"}$: Exploring Utility, Attitude, and Responsibility of AI-empowered Tools in Software Development

$\textit{"I Don't Use AI for Everything"}$: Exploring Utility, Attitude, and Responsibility of AI-empowered Tools in Software Development ( http://arxiv.org/abs/2409.13343v1 )

ライセンス: Link先を確認

Shidong Pan, Litian Wang, Tianyi Zhang, Zhenchang Xing, Yanjie Zhao, Qinghua Lu, Xiaoyu Sun,

(参考訳) AIを活用したツールが変革の力として現れ、ソフトウェア開発業界を根本的に改革し、さまざまな分野にまたがる広範な影響を約束している。本研究では、ソフトウェア開発プロセスにおけるAIを活用したツールの採用、影響、およびセキュリティに関する考察を行う。さまざまなバックグラウンドを持つ19人のソフトウェア実践者との半構造化インタビューを通じて、AIツールの有用性、開発者に対する態度、セキュリティとプライバシ責任の3つの重要な側面を探求する。ソフトウェア開発のさまざまな段階において,AIツールが広く採用されていることが判明した。開発者は一般的に、AIに対する肯定的な態度を示し、仕事を置き換える脅威ではなく、効率を高めるアシスタントと見なしている。しかし、彼らはまた、ソフトウェア開発における複雑な、馴染みのない、あるいは非常に専門的なタスクを扱うAIの能力の限界を認識した。セキュリティとプライバシに関して、私たちは開発者の間でさまざまなレベルのリスク意識を見つけました。私たちの研究は、ソフトウェア開発におけるAIの採用状況に関する洞察を提供し、ソフトウェア産業におけるAIの統合を効果的にナビゲートするために、実践者、組織、AIプロバイダ、規制機関に推奨する。

AI-empowered tools have emerged as a transformative force, fundamentally reshaping the software development industry and promising far-reaching impacts across diverse sectors. This study investigates the adoption, impact, and security considerations of AI-empowered tools in the software development process. Through semi-structured interviews with 19 software practitioners from diverse backgrounds, we explore three key aspects: the utility of AI tools, developers' attitudes towards them, and security and privacy responsibilities. Our findings reveal widespread adoption of AI tools across various stages of software development. Developers generally express positive attitudes towards AI, viewing it as an efficiency-enhancing assistant rather than a job replacement threat. However, they also recognized limitations in AI's ability to handle complex, unfamiliar, or highly specialized tasks in software development. Regarding security and privacy, we found varying levels of risk awareness among developers, with larger companies implementing more comprehensive risk management strategies. Our study provides insights into the current state of AI adoption in software development and offers recommendations for practitioners, organizations, AI providers, and regulatory bodies to effectively navigate the integration of AI in the software industry.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# マルチモーダルモデルのための新しい適応的微調整アルゴリズム:リモートセンシングにおける自己最適化分類と高品質データセットの選択

A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing ( http://arxiv.org/abs/2409.13345v1 )

ライセンス: Link先を確認

Yi Ren, Tianyi Zhang, Zhixiong Han, Weibin Li, Zhiyang Wang, Wenbo Ji, Chenhao Qin, Chenbin Liang, Licheng Jiao,

(参考訳) マルチモーダル大モデルに対する適応的な微調整アルゴリズムを提案する。このアルゴリズムの中核ステップは2段階の切り離しを含む。まず、大量のデータを意味ベクトル空間に投影し、MiniBatchKMeansアルゴリズムを自動クラスタリングに使用する。この分類により、各クラスタ内のデータは、高いセマンティックな類似性を示す。次に、各クラスタのデータを処理し、マルチモーダル大モデルのベクトル空間における原データと摂動データの変換差を計算する。この差はデータの一般化指標として機能する。この測定値に基づいて、トレーニングのための高一般化ポテンシャルを持つデータを選択する。このアルゴリズムを用いて、GeoChatマルチモーダルリモートセンシングデータセットの3分の1を用いて、2台の3090 GPU上でInternLM-XComposer2-VL-7Bモデルをトレーニングした。その結果,我々のアルゴリズムは最先端のベースラインよりも優れていた。様々なベースライン最適に選択された3分の1のデータセットでトレーニングしたモデルは、実験的な検証に基づいて、フルデータセットでトレーニングしたモデルと比較して、さまざまなリモートセンシングメトリクスのパフォーマンスが1%低下しただけだった。このアプローチは、トレーニング時間を68.2%削減しながら、汎用能力を著しく維持した。さらに、UCMercedおよびAID評価データセットで89.86点、77.19点を記録し、GeoChatデータセットを5.43点、GeoChatデータセットを5.16点上回った。 LRBEN評価データセットでは0.91ポイントの低下しか示さなかった。

We propose an adaptive fine-tuning algorithm for multimodal large models. The core steps of this algorithm involve two stages of truncation. First, the vast amount of data is projected into a semantic vector space, and the MiniBatchKMeans algorithm is used for automated clustering. This classification ensures that the data within each cluster exhibit high semantic similarity. Next, we process the data in each cluster, calculating the translational difference between the original and perturbed data in the multimodal large model's vector space. This difference serves as a generalization metric for the data. Based on this metric, we select the data with high generalization potential for training. We applied this algorithm to train the InternLM-XComposer2-VL-7B model on two 3090 GPUs using one-third of the GeoChat multimodal remote sensing dataset. The results demonstrate that our algorithm outperforms the state-of-the-art baselines. The model trained on our optimally chosen one-third dataset, based on experimental validation, exhibited only a 1% reduction in performance across various remote sensing metrics compared to the model trained on the full dataset. This approach significantly preserved general-purpose capabilities while reducing training time by 68.2%. Furthermore, the model achieved scores of 89.86 and 77.19 on the UCMerced and AID evaluation datasets, respectively, surpassing the GeoChat dataset by 5.43 and 5.16 points. It only showed a 0.91-point average decrease on the LRBEN evaluation dataset.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20
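
The second truncation stage described above scores each sample by the difference between its original and perturbed representations and then selects data per cluster. The sketch below illustrates that selection step under several assumptions: cluster labels are already available (for example from MiniBatchKMeans), the difference is measured as a Euclidean distance, and the `largest_first` flag is hypothetical because the abstract does not say whether a small or a large difference indicates high generalization potential.

```python
from collections import defaultdict
from math import dist

def select_per_cluster(cluster_ids, original, perturbed,
                       keep_fraction=1 / 3, largest_first=False):
    """Keep a fraction of each cluster, ordered by embedding shift.

    cluster_ids: cluster label per sample (assumed precomputed).
    original / perturbed: embedding vectors of each sample before and
    after perturbation.
    """
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        score = dist(original[idx], perturbed[idx])  # Euclidean shift
        by_cluster[cid].append((score, idx))
    selected = []
    for scored in by_cluster.values():
        scored.sort(reverse=largest_first)
        keep = max(1, round(len(scored) * keep_fraction))
        selected.extend(idx for _, idx in scored[:keep])
    return sorted(selected)

# Toy data: two clusters of three samples with 2-D embeddings.
cluster_ids = [0, 0, 0, 1, 1, 1]
original = [(0.0, 0.0)] * 6
perturbed = [(0.1, 0.0), (0.5, 0.0), (0.3, 0.0),
             (0.0, 0.9), (0.0, 0.2), (0.0, 0.4)]
chosen = select_per_cluster(cluster_ids, original, perturbed)
```

Selecting within each cluster rather than globally keeps every semantic group represented in the reduced training set.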

# チューニング不要のパーソナライズド画像生成

Imagine yourself: Tuning-Free Personalized Image Generation ( http://arxiv.org/abs/2409.13346v1 )

ライセンス: Link先を確認

Zecheng He, Bo Sun, Felix Juefei-Xu, Haoyu Ma, Ankit Ramchandani, Vincent Cheung, Siddharth Shah, Anmol Kalia, Harihar Subramanyam, Alireza Zareian, Li Chen, Ankit Jain, Ning Zhang, Peizhao Zhang, Roshan Sumbaly, Peter Vajda, Animesh Sinha,

(参考訳) 拡散モデルは様々な画像と画像のタスクにおいて顕著な効果を示した。本研究では,画像のパーソナライズを目的とした最先端モデルであるImagine yourselfを紹介する。従来のチューニングベースのパーソナライズ手法とは異なり、Imagine自身はチューニング不要のモデルとして機能し、すべてのユーザが個別に調整することなく共有フレームワークを利用することができる。さらに、従来の研究は、複雑なプロンプトに従って視覚的品質を保ちながら、アイデンティティ保存のバランスをとるという課題に遭遇し、結果として参照画像のコピー・ペースト効果が強いモデルとなった。そのため、参照画像に大きな変更（例えば表情の変化や頭部・身体のポーズの変更）を要するプロンプトに従った画像生成は難しく、生成画像の多様性も低い。これらの制限に対処するために,提案手法を紹介する。 1)画像の多様性を促進するための新しい合成ペアデータ生成機構 2)3つのテキストエンコーダと、テキスト忠実性を改善するための完全に訓練可能なビジョンエンコーダを備えた完全に平行なアテンションアーキテクチャ 3) 視覚的品質の境界を徐々に推し進める, 粗大な多段階ファインタニング手法を提案する。我々の研究は、Imagine自身が最先端のパーソナライズモデルを超え、アイデンティティ保存、視覚的品質、テキストアライメントにおいて優れた能力を示すことを示した。このモデルは、様々なパーソナライズアプリケーションのための堅牢な基盤を確立する。人間の評価結果は、過去のパーソナライゼーションモデルと比較して、モデルのSOTA優越性(アイデンティティ保存、テキスト忠実性、視覚的魅力)を全側面にわたって評価する。

Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based personalization techniques, Imagine yourself operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments. Moreover, previous work met challenges balancing identity preservation, following complex prompts and preserving good visual quality, resulting in models with a strong copy-paste effect of the reference images. Thus, they can hardly generate images following prompts that require significant changes to the reference image, e.g., changing facial expression or head and body poses, and the diversity of the generated images is low. To address these limitations, our proposed method introduces 1) a new synthetic paired data generation mechanism to encourage image diversity, 2) a fully parallel attention architecture with three text encoders and a fully trainable vision encoder to improve the text faithfulness, and 3) a novel coarse-to-fine multi-stage finetuning methodology that gradually pushes the boundary of visual quality. Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment. This model establishes a robust foundation for various personalization applications. Human evaluation results validate the model's SOTA superiority across all aspects (identity preservation, text faithfulness, and visual appeal) compared to the previous personalization models.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# V-Hands:リモートホワイトボードインタラクションのためのタッチスクリーンによるハンドトラッキング

V-Hands: Touchscreen-based Hand Tracking for Remote Whiteboard Interaction ( http://arxiv.org/abs/2409.13347v1 )

ライセンス: Link先を確認

Xinshuang Liu, Yizhong Zhang, Xin Tong,

(参考訳) ホワイトボードベースのリモートコミュニケーションでは、描画されたコンテンツと手画面のインタラクションのシームレスな統合が、没入的なユーザエクスペリエンスに不可欠である。これまでの方法では、手の動きを捉えるために、かさばる装置のセットアップが必要だったり、静電容量画像から手の動きを正確に追跡できなかったりしていた。本稿では,容量的ビデオフレームから両手の3Dポーズを正確に追跡するリアルタイム手法を提案する。そこで我々は,静電容量フレームから手を識別して手関節位置を推定するディープニューラルネットワークを開発し,制約された逆運動論的解法を用いて手関節位置から3次元手ポーズを復元する。さらに,高品質な手画面インタラクションデータをキャプチャする装置を設計し,より正確な同期型容量ビデオと手ポーズデータセットを得た。本手法は,遠隔通信のためのコンパクトな装置構成を維持しながら,キャパシタフレームの3次元ハンドトラッキングの精度と安定性を向上させる。提案する設計と3次元手ポーズ追跡における優れた性能を検証し,ホワイトボードを用いた遠隔通信における本手法の有効性を実証した。私たちのコード、モデル、データセットはhttps://V-Hands.github.ioで公開されています。

In whiteboard-based remote communication, the seamless integration of drawn content and hand-screen interactions is essential for an immersive user experience. Previous methods either require bulky device setups for capturing hand gestures or fail to accurately track the hand poses from capacitive images. In this paper, we present a real-time method for precisely tracking the 3D poses of both hands from capacitive video frames. To this end, we develop a deep neural network to identify hands and infer hand joint positions from capacitive frames, and then recover 3D hand poses from the hand-joint positions via a constrained inverse kinematic solver. Additionally, we design a device setup for capturing high-quality hand-screen interaction data and obtain a more accurate synchronized capacitive video and hand pose dataset. Our method improves the accuracy and stability of 3D hand tracking for capacitive frames while maintaining a compact device setup for remote communication. We validate our scheme design and its superior performance on 3D hand pose tracking and demonstrate the effectiveness of our method in whiteboard-based remote communication. Our code, model, and dataset are available at https://V-Hands.github.io.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# ID-Guard: ブレーキング識別による顔操作のユニバーサルフレームワーク

ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification ( http://arxiv.org/abs/2409.13349v1 )

ライセンス: Link先を確認

Zuomin Qu, Wei Lu, Xiangyang Luo, Qian Wang, Xiaochun Cao,

(参考訳) 深層学習に基づく顔操作の誤用は、公民権に対する潜在的な脅威となる。この不正行為を防止すべく、画像に見えない敵の摂動を加えて操作過程を妨害するプロアクティブディフェンス技術が提案され、偽造された出力が観察者を不安にさせる。しかし、その非指向的な出力の破壊は、画像中の人物のアイデンティティ情報の保持を招き、個人のスティグマティゼーションにつながる可能性がある。本稿では,IDガード(ID-Guard)と呼ばれる,顔操作と戦うための新しいユニバーサルフレームワークを提案する。具体的には、特定の顔画像に対応するクロスモデルユニバーサル対向摂動を生成するために、エンコーダ・デコーダネットワークの1つのフォワードパスしか必要としない。顔画像の匿名性を確保するため、偽造顔の識別情報を標的に破壊する新しいIDM(IDM)を導入する。さらに,多タスク学習問題として,異なる顔操作への障害を考慮した摂動を最適化し,クロスモデル性能を向上させるために動的重み付け戦略を設計する。提案フレームワークは, 顔画像の特定領域を効果的に歪ませることによって, 複数の顔の操作に対する防御効果を顕著に報告した。さらに,我々の実験では,破壊された画像が顔の塗り絵やオープンソースの画像認識システムを避けることができるID-Guardの能力を明らかにした。

The misuse of deep learning-based facial manipulation poses a potential threat to civil rights. To prevent this fraud at its source, proactive defense technology was proposed to disrupt the manipulation process by adding invisible adversarial perturbations into images, making the forged output unconvincing to the observer. However, their non-directional disruption of the output may result in the retention of identity information of the person in the image, leading to stigmatization of the individual. In this paper, we propose a novel universal framework for combating facial manipulation, called ID-Guard. Specifically, this framework requires only a single forward pass of an encoder-decoder network to generate a cross-model universal adversarial perturbation corresponding to a specific facial image. To ensure anonymity in manipulated facial images, a novel Identity Destruction Module (IDM) is introduced to destroy the identifiable information in forged faces in a targeted manner. Additionally, we optimize the perturbations by treating the disruption of different facial manipulations as a multi-task learning problem and design a dynamic weights strategy to improve cross-model performance. The proposed framework reports impressive results in defending against multiple widely used facial manipulations, effectively distorting the identifiable regions in the manipulated facial images. In addition, our experiments reveal the ID-Guard's ability to enable disrupted images to avoid face inpaintings and open-source image recognition systems.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# 大規模言語モデルにおける感情認知の最近の進歩

Recent Advancement of Emotion Cognition in Large Language Models ( http://arxiv.org/abs/2409.13354v1 )

ライセンス: Link先を確認

Yuyan Chen, Yanghua Xiao,

(参考訳) 大規模言語モデル(LLM)における感情認知は、ソーシャルメディア、人間とコンピュータの相互作用、メンタルヘルスアセスメントなど、さまざまなアプリケーションにおけるパフォーマンス向上に不可欠である。我々は、感情分類、感情的に豊かな反応生成、心の理論を主軸とする現在の研究の展望を探求するとともに、注釈付きデータへの依存や感情処理の複雑さといった課題を認識している。本稿では,感情認知のためのLSMの最近の進歩について,詳細な調査を行う。我々は、Ulric Neisserの認知段階と整合して、重要な研究、方法論、成果、資源を探究する。さらに、教師なし学習アプローチや、より複雑で解釈可能な感情認知LLMの開発など、この発展分野における研究の今後の方向性について概説する。また、LLMの感情認知能力を向上させるために使用されるコントラスト学習などの高度な手法についても論じる。

Emotion cognition in large language models (LLMs) is crucial for enhancing performance across various applications, such as social media, human-computer interaction, and mental health assessment. We explore the current landscape of research, which primarily revolves around emotion classification, emotionally rich response generation, and Theory of Mind assessments, while acknowledging challenges such as dependency on annotated data and complexity in emotion processing. In this paper, we present a detailed survey of recent progress in LLMs for emotion cognition. We explore key research studies, methodologies, outcomes, and resources, aligning them with Ulric Neisser's cognitive stages. Additionally, we outline potential future directions for research in this evolving field, including unsupervised learning approaches and the development of more complex and interpretable emotion cognition LLMs. We also discuss advanced methods such as contrastive learning used to improve LLMs' emotion cognition capabilities.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# EmotionQueen: 大規模言語モデルの共感を評価するベンチマーク

EmotionQueen: A Benchmark for Evaluating Empathy of Large Language Models ( http://arxiv.org/abs/2409.13359v1 )

ライセンス: Link先を確認

Yuyan Chen, Hao Wang, Songzhou Yan, Sijia Liu, Yueze Li, Yi Zhao, Yanghua Xiao,

(参考訳) 大規模言語モデル(LLM)における感情知能は自然言語処理において非常に重要である。しかし,従来の研究は,LLMの全体的感情知能を評価するには不十分な感情認識など,基本的な感情分析タスクに重点を置いていた。そこで本稿では,LLMの感情的知性を評価するためのEmotionQueenというフレームワークを提案する。このフレームワークには、キーイベント認識、混合イベント認識、インプリシット感情認識、意図認識の4つの固有のタスクが含まれている。 LLMは重要な出来事や暗黙の感情を認識し、共感的な反応を生成するよう要求される。また、感情関連文の認識と応答におけるLLMの能力を評価するための2つの指標を設計する。実験により、LLMの能力と感情知能の限界について重要な結論が得られた。

Emotional intelligence in large language models (LLMs) is of great importance in Natural Language Processing. However, previous research mainly focuses on basic sentiment analysis tasks, such as emotion recognition, which is not enough to evaluate LLMs' overall emotional intelligence. Therefore, this paper presents a novel framework named EmotionQueen for evaluating the emotional intelligence of LLMs. The framework includes four distinctive tasks: Key Event Recognition, Mixed Event Recognition, Implicit Emotional Recognition, and Intention Recognition. LLMs are asked to recognize important events or implicit emotions and generate empathetic responses. We also design two metrics to evaluate LLMs' capabilities in recognition and response for emotion-related statements. Experiments yield significant conclusions about LLMs' capabilities and limitations in emotional intelligence.

翻訳日:2024-11-07 07:40:00 公開日:2024-09-20

# FPBoost: 生存分析のための完全なパラメトリックグラディエントブースティング

FPBoost: Fully Parametric Gradient Boosting for Survival Analysis ( http://arxiv.org/abs/2409.13363v1 )

ライセンス: Link先を確認

Alberto Archetti, Eugenio Lomurno, Diego Piccinotti, Matteo Matteucci,

(参考訳) 生存分析は、時間から時間までのデータを分析し、貴重な臨床的洞察を抽出するための重要なツールである。近年,ニューラルネットワークと決定木を利用した機械学習技術が数多く開発されている。これらのうち、最も成功したアプローチは、しばしばモデル化されたハザード関数の形状に関する特定の仮定に依存する。これらの仮定には、比例的ハザード、加速された障害時間、予め定義された時間点の集合での離散推定が含まれる。本研究では,個別のパラメトリック・ハザード・コントリビューションの重み付け和に基づくサバイバルモデル設計のための新しいパラダイムを提案する。我々は,付加的ハザード関数を適用し,生存率や累積ハザード関数に基づくアプローチを改善することにより,フィールドに新たなコントリビューションをもたらすために,よく知られたアンサンブル技術を構築した。さらに、我々はFPBoostと呼ぶモデルを提案し、勾配押し上げによる生存率を直接最適化する最初のアルゴリズムである。我々は、さまざまなデータセットをまたいだアプローチを評価し、さまざまな最先端モデルと比較した。その結果、FPBoostは、基準値と校正値の両方でリスク推定を改善することが示された。

Survival analysis is a critical tool for analyzing time-to-event data and extracting valuable clinical insights. Recently, numerous machine learning techniques leveraging neural networks and decision trees have been developed for this task. Among these, the most successful approaches often rely on specific assumptions about the shape of the modeled hazard function. These assumptions include proportional hazard, accelerated failure time, or discrete estimation at a predefined set of time points. In this study, we propose a novel paradigm for survival model design based on the weighted sum of individual fully parametric hazard contributions. We build upon well-known ensemble techniques to deliver a novel contribution to the field by applying additive hazard functions, improving over approaches based on survival or cumulative hazard functions. Furthermore, the proposed model, which we call FPBoost, is the first algorithm to directly optimize the survival likelihood via gradient boosting. We evaluated our approach across a diverse set of datasets, comparing it against a variety of state-of-the-art models. The results demonstrate that FPBoost improves risk estimation, according to both concordance and calibration metrics.

翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
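
The FPBoost abstract describes a hazard built as a weighted sum of fully parametric hazard contributions. The sketch below shows what such a sum and the implied survival function can look like. Using Weibull heads is an assumption made here for illustration, the parameter values are hypothetical, and the gradient boosting that would learn them is not shown.

```python
from math import exp

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (k / lam) * (t / lam)^(k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_cum_hazard(t, shape, scale):
    """Weibull cumulative hazard H(t) = (t / lam)^k."""
    return (t / scale) ** shape

def ensemble_hazard(t, heads):
    """Weighted sum of parametric hazards; heads = [(weight, shape, scale)]."""
    return sum(w * weibull_hazard(t, k, lam) for w, k, lam in heads)

def ensemble_survival(t, heads):
    """Survival S(t) = exp(-sum_i w_i * H_i(t)) implied by the summed hazard."""
    return exp(-sum(w * weibull_cum_hazard(t, k, lam) for w, k, lam in heads))

# Hypothetical heads: one exponential-like (shape 1) and one increasing hazard.
heads = [(0.5, 1.0, 2.0), (0.5, 2.0, 2.0)]
h_at_2 = ensemble_hazard(2.0, heads)
s_at_2 = ensemble_survival(2.0, heads)
```

Because hazards add while cumulative hazards integrate term by term, the survival function stays in closed form, so a likelihood can be written without discretizing time.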

# 可観測物の量子度生成における量子速度制限

Quantum Speed limit on the production of quantumness of observables ( http://arxiv.org/abs/2409.13365v1 )

ライセンス: Link先を確認

Divyansh Shrimali, Swapnil Bhowmick, Arun Kumar Pati,

(参考訳) 量子システムの古典的でない特徴は、環境や雑音にさらされると劣化することがある。量子系がノイズの存在下で古典的でない特徴を示すのに要する最低時間はどのくらいか? ここでは、2つの与えられた可観測体の可観測子のノルムとして観測可能の量子性に明確な速度制限を証明している。このような量子度測定の速度制限は、量子度の変化率の基本的な上限を設定し、与えられた量によってシステムの量子度を変更するのに必要な時間に対する下限を与える。さらに、量子系における重ね合わせの量をキャプチャする量子コヒーレンスのような古典的でない特徴に対して、速度制限を証明した。得られた速度制限は、興味のある物理過程に対して達成可能であることを実証したので、これらの境界は厳密であると見なすことができる。

Non-classical features of quantum systems can degrade when subjected to environment and noise. Here, we ask a fundamental question: What is the minimum amount of time it takes for a quantum system to exhibit non-classical features in the presence of noise? We prove distinct speed limits on the quantumness of observables, quantified as the norm of the commutator of two given observables. The speed limit on such quantumness measures sets a fundamental upper bound on the rate of change of quantumness, which provides a lower bound on the time required to change the quantumness of a system by a given amount. Additionally, we prove speed limits for non-classical features such as quantum coherence, which captures the amount of superposition in a quantum system. We demonstrate that the obtained speed limits are attainable for physical processes of interest, and hence these bounds can be considered tight.

翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
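
The abstract above measures quantumness through the norm of the commutator of two observables. As a small numerical illustration, the sketch below computes a commutator and its Frobenius norm for the Pauli X and Z matrices. The choice of the Frobenius norm and of these two observables is an assumption, since the abstract does not specify the norm or any state dependence used in the paper.

```python
from math import sqrt

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Commutator [A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def frobenius_norm(m):
    """Square root of the sum of squared absolute entries."""
    return sqrt(sum(abs(x) ** 2 for row in m for x in row))

PAULI_X = [[0, 1], [1, 0]]
PAULI_Z = [[1, 0], [0, -1]]

comm_xz = commutator(PAULI_X, PAULI_Z)
quantumness_xz = frobenius_norm(comm_xz)
```

Commuting observables give a zero norm under this measure, while non-commuting pairs such as X and Z give a strictly positive value.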

# RingMo-Aerial:アフィン変換コントラスト学習を用いた空中リモートセンシング基礎モデル

RingMo-Aerial: An Aerial Remote Sensing Foundation Model With A Affine Transformation Contrastive Learning ( http://arxiv.org/abs/2409.13366v1 )

ライセンス: Link先を確認

Wenhui Diao, Haichen Yu, Kaiyue Kang, Tong Ling, Di Liu, Yingchao Feng, Hanbo Bi, Libo Ren, Xuexue Li, Yongqiang Mao, Xian Sun,

(参考訳) 空中リモートセンシング(ARS)の視覚タスクは、視角の独特の特徴のために大きな課題を生んでいる。既存の研究は主に特定のタスクのアルゴリズムに焦点を当てており、幅広いARSビジョンアプリケーションに適用性に制限がある。本稿では,ARSビジョンの分野における基礎モデル研究のギャップを埋めることを目的としたRingMo-Aerialモデルを提案する。周波数強化型マルチヘッド・セルフアテンション(FE-MSA)機構とアフィン変換に基づくコントラスト学習事前学習手法を導入することにより、小型目標に対するモデルの検出能力を向上し、ARSの特徴となる傾いた視野角に最適化する。さらに,ARS-Adapterは,様々なARSビジョンタスクにおけるモデルの適応性と有効性を改善するために,効率的なパラメータ調整手法である。実験により、RingMo-Aerialは複数の下流タスクにおいてSOTA性能を達成することを示した。このことは、ARS視覚タスクの性能向上におけるRingMo-Aerialの実用性と有効性を示している。

Aerial Remote Sensing (ARS) vision tasks pose significant challenges due to the unique characteristics of their viewing angles. Existing research has primarily focused on algorithms for specific tasks, which have limited applicability in a broad range of ARS vision applications. This paper proposes the RingMo-Aerial model, aiming to fill the gap in foundation model research in the field of ARS vision. By introducing the Frequency-Enhanced Multi-Head Self-Attention (FE-MSA) mechanism and an affine transformation-based contrastive learning pre-training method, the model's detection capability for small targets is enhanced and optimized for the tilted viewing angles characteristic of ARS. Furthermore, the ARS-Adapter, an efficient parameter fine-tuning method, is proposed to improve the model's adaptability and effectiveness in various ARS vision tasks. Experimental results demonstrate that RingMo-Aerial achieves SOTA performance on multiple downstream tasks. This indicates the practicality and effectiveness of RingMo-Aerial in enhancing the performance of ARS vision tasks.

翻訳日:2024-11-07 07:28:56 公開日:2024-09-20

# ALPEC: 臨床における機械学習による覚醒検出のための総合的評価フレームワークとデータセット

ALPEC: A Comprehensive Evaluation Framework and Dataset for Machine Learning-Based Arousal Detection in Clinical Practice ( http://arxiv.org/abs/2409.13367v1 )

ライセンス: Link先を確認

Stefan Kraft, Andreas Theissler, Vera Wienhausen-Wilke, Philipp Walter, Gjergji Kasneci,

(参考訳) 睡眠障害の診断には睡眠中の覚醒剤の検出が不可欠である。しかし、臨床実践における機械学習(ML)の使用は、主に臨床プロトコルとMLメソッドのミスマッチによって、基本的な問題によって妨げられている。臨床医は通常、覚醒の開始のみに注釈を付けるが、MLメソッドは開始と終了の両方にアノテーションに依存する。また、覚醒検出モデルに対する臨床ニーズに合わせて標準化された評価手法は存在しない。本研究は, 覚醒剤の局所化と正確な事象数(ALPEC)を重視した新しい後処理・評価フレームワークを導入することで, これらの課題に対処する。我々は,ML実践者が,臨床実践と整合して覚醒的発症を検出することに注力することを推奨する。この変化が現在のトレーニングや評価方法に与える影響について検討し、単純化と課題に対処する。我々は、上記の臨床アノテーション制約を反映し、既存のポリソノグラフィーデータセットに存在しないモダリティを含む、新しい包括的ポリソノグラフィーデータセット(CPS)を利用する。本論文と並行してデータセットを公開し,マルチモーダルデータを利用した覚醒的オンセット検出の利点を実証する。本研究は,MLに基づく覚醒検出を臨床環境に統合し,技術進歩と臨床ニーズとのギャップを減らした。

Detecting arousals in sleep is essential for diagnosing sleep disorders. However, using Machine Learning (ML) in clinical practice is impeded by fundamental issues, primarily due to mismatches between clinical protocols and ML methods. Clinicians typically annotate only the onset of arousals, while ML methods rely on annotations for both the beginning and end. Additionally, there is no standardized evaluation methodology tailored to clinical needs for arousal detection models. This work addresses these issues by introducing a novel post-processing and evaluation framework emphasizing approximate localization and precise event count (ALPEC) of arousals. We recommend that ML practitioners focus on detecting arousal onsets, aligning with clinical practice. We examine the impact of this shift on current training and evaluation schemes, addressing simplifications and challenges. We utilize a novel comprehensive polysomnographic dataset (CPS) that reflects the aforementioned clinical annotation constraints and includes modalities not present in existing polysomnographic datasets. We release the dataset alongside this paper, demonstrating the benefits of leveraging multimodal data for arousal onset detection. Our findings significantly contribute to integrating ML-based arousal detection in clinical settings, reducing the gap between technological advancements and clinical needs.

翻訳日:2024-11-07 07:28:56 公開日:2024-09-20
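
The ALPEC abstract emphasizes approximate localization and a precise event count for arousal onsets. The sketch below shows one simple way to score onset predictions in that spirit: each annotated onset is matched to at most one predicted onset within a time tolerance. The greedy matching rule, the function name, and the 10-second tolerance are assumptions for illustration and are not the framework defined in the paper.

```python
def match_onsets(annotated, predicted, tolerance_s=10.0):
    """Match predicted onsets to annotated onsets within a tolerance.

    Times are in seconds. Each prediction can be matched at most once,
    so surplus predictions count as false positives.
    """
    annotated = sorted(annotated)
    predicted = sorted(predicted)
    used = set()
    tp = 0
    for t_true in annotated:
        best = None
        for j, t_pred in enumerate(predicted):
            if j in used or abs(t_pred - t_true) > tolerance_s:
                continue
            # Prefer the closest still-unmatched prediction.
            if best is None or abs(t_pred - t_true) < abs(predicted[best] - t_true):
                best = j
        if best is not None:
            used.add(best)
            tp += 1
    return {"tp": tp, "fp": len(predicted) - tp, "fn": len(annotated) - tp}

# Toy example with three annotated onsets and four predictions.
counts = match_onsets([30, 120, 300], [33, 125, 128, 500])
```

One-to-one matching means that two predictions near the same annotated onset are not both rewarded, which keeps the predicted event count honest.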

# MCICSAM:Monte Carlo-Guided Interpolation Consistency Segment Anything Model for Semi-Supervised Prostate Zone Segmentation

MCICSAM: Monte Carlo-guided Interpolation Consistency Segment Anything Model for Semi-Supervised Prostate Zone Segmentation ( http://arxiv.org/abs/2409.13371v1 )

ライセンス: Link先を確認

Guantian Huang, Beibei Li, Xiaobing Fan, Aritrick Chatterjee, Cheng Wei, Shouliang Qi, Wei Qian, Dianning He,

(参考訳) 前立腺内の様々な領域の正確なセグメンテーションは、前立腺関連疾患の診断と治療に重要である。しかし、特に前立腺画像のような特殊な医療分野におけるラベル付きデータの不足は、大きな課題となっている。 Segment Anything Model (SAM)は、自然画像分割のための新しい大きなモデルであるが、医療画像にはいくつかの課題がある。 SAMの強力な特徴抽出機能を活用し,医用画像アノテーションの低データボリューム問題に対処するために,モンテカルロのローランド適応(LoRA)と半教師あり学習手法を用いた補間整合(MCIC)を用いて,SAMの微調整を行う。半教師付き学習に基づく前立腺領域セグメンテーションに適用するためのモンテカルロ誘導補間一貫性セグメンテーションモデル(MCICSAM)を提案する。非ラベルデータセクションでは、MCICは入力データに対して2つの異なる補間変換を行い、モンテカルロの不確実性解析を出力に組み込む。これらの補間されたサンプルに課される一貫性の制約により、モデルがラベルのないデータの分布をよりよく適合させ、最終的に半教師付きシナリオのパフォーマンスを向上させることができる。 Dice と Hausdorff Distance at 95th percentile (HD95) を使ってモデル性能を検証する。 MCICSAMはDiceを79.38%、89.95%で、HD95値を3.12と2.27で改善している。同時に、MCICSAMは強い一般化性を示す。この手法は前立腺画像分割の分野で新たな可能性をもたらすことが期待されている。

Accurate segmentation of various regions within the prostate is pivotal for diagnosing and treating prostate-related diseases. However, the scarcity of labeled data, particularly in specialized medical fields like prostate imaging, poses a significant challenge. Segment Anything Model (SAM) is a new large model for natural image segmentation, but applying it to medical imaging remains challenging. To better utilize the powerful feature extraction capability of SAM and to address the low volume of annotated medical image data, we use Low-Rank Adaptation (LoRA) and Monte Carlo-guided interpolation consistency (MCIC), a semi-supervised learning method, to enhance the fine-tuned SAM. We propose the Monte Carlo-guided Interpolation Consistency Segment Anything Model (MCICSAM) for semi-supervised prostate region segmentation. For unlabeled data, MCIC performs two different interpolation transformations on the input data and incorporates Monte Carlo uncertainty analysis in the output, forcing the model to be consistent in its predictions. The consistency constraints imposed on these interpolated samples allow the model to fit the distribution of unlabeled data better, ultimately improving its performance in semi-supervised scenarios. We use Dice and Hausdorff Distance at the 95th percentile (HD95) to validate model performance. MCICSAM yields Dice scores of 79.38% and 89.95%, along with improved HD95 values of 3.12 and 2.27, for the two prostate zones evaluated, including the transition zone. At the same time, MCICSAM demonstrates strong generalizability. This method is expected to bring new possibilities in the field of prostate image segmentation.
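
The Dice score used above to validate segmentation quality can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `dice_score`, the flat 0/1 mask representation, and the convention for two empty masks are assumptions made for illustration only.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D "masks": 3 overlapping pixels, 4 predicted, 5 in the reference.
pred = [1, 1, 1, 1, 0, 0, 0, 0]
target = [0, 1, 1, 1, 1, 1, 0, 0]
score = dice_score(pred, target)  # 2 * 3 / (4 + 5)
```

A score of 1.0 means the predicted and reference masks coincide; the percentages quoted in the abstract are this quantity scaled by 100.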

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Non-Hermitian glide-time symmetry

Non-Hermitian glide-time symmetry ( http://arxiv.org/abs/2409.13372v1 )

License: see the linked page

Li-Wei Wang, Jian-Hua Jiang,

(参考訳) 非エルミート系は、従来のエルミート系を超えて、例外点や複雑なスペクトルトポロジーのような興味深い概念や、非エルミート皮膚効果(NHSE)のようなエキゾチックな現象をもたらした。しかしながら、非エルミート系に関する以前の研究は主に固有状態の性質に焦点を当てており、非エルミート力学現象についてはより限定的な議論がなされている。ここでは、非エルミート物理学におけるパリティ時対称性の成功に触発され、グライド時反転(GT)対称性を持つ一次元非エルミート系を理論的に研究する。我々は、GT対称性が特異な物理的性質をもたらし、非エルミート系においてリッチな動的現象を可能にすることを発見した。注目すべきは、異なる動的位相にまたがる多様な挙動を示す動的NHSEを明らかにし、非エルミート力学の豊かさを解明することである。我々は、リッチな非エルミート力学現象を理解するための理論的枠組みを確立する。さらに、GT対称系のリッチな動的位相は、バルク内およびエッジ境界における力学の顕著なチューニングを可能にすることを示す。これらには、バルクにおける指向性波動伝播と増幅、エッジ境界における波のトラップと動的パターンが含まれる。理論的枠組みの発展と豊富な非エルミート力学相の研究の両方により、この研究は非エルミート力学の将来の研究の基盤となり、格子対称性の重要な役割に特に重点を置いている。

Non-Hermitian systems, going beyond conventional Hermitian systems, have brought in intriguing concepts such as exceptional points and complex spectral topology as well as exotic phenomena such as non-Hermitian skin effects (NHSEs). However, previous studies on non-Hermitian systems predominantly focus on the properties of eigenstates, with rather limited discussion of non-Hermitian dynamic phenomena. Here, inspired by the celebrated success of parity-time symmetry in non-Hermitian physics, we theoretically study a one-dimensional non-Hermitian system with glide-time reversal (GT) symmetry. We discover that the GT symmetry leads to unique physical properties and enables rich dynamic phenomena in non-Hermitian systems. Remarkably, we reveal dynamic NHSEs that exhibit diverse behaviors across distinct dynamic phases, elucidating the richness of non-Hermitian dynamics. We establish the theoretical frameworks for understanding the rich non-Hermitian dynamic phenomena. We further show that the rich dynamic phases in GT-symmetric systems enable remarkable tuning of the dynamics in the bulk as well as at the edge boundaries. These include directional wave propagation and amplification in the bulk, as well as wave trapping and dynamic patterns at the edge boundaries. With both the development of the theoretical framework and the study of the rich non-Hermitian dynamic phases, this work serves as a stepping stone for future studies on non-Hermitian dynamics, with a special emphasis on the pivotal role of lattice symmetry.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench

LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench ( http://arxiv.org/abs/2409.13373v1 )

License: see the linked page

Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati,

(参考訳) 望ましい状況を達成するための行動コースを計画する能力は、長年、知的エージェントのコアコンピテンスと考えられてきた。大規模言語モデル(LLM)の出現により、そのような計画能力を持っているかどうかという問題にかなりの関心が寄せられている。 GPT3のリリース直後の2022年に開発した拡張可能なベンチマークであるPlanBenchは、LLMの計画能力を評価する上で重要なツールであり続けている。 GPT3以来、新しいプライベートおよびオープンソース LLM が多数存在するが、このベンチマークの進捗は驚くほど遅かった。 OpenAIによると、最近のo1(Strawberry)モデルは、自己回帰型LLMの通常の制限から逃れるために特別に構築され、訓練されている。この開発を触媒として利用し、現在のLLMと新しいLRMがPlanBenchにどの程度優れているかを包括的に検討する。ご覧の通り、o1のパフォーマンスはベンチマークの量子的改善であり、競争を上回りますが、それでも飽和には程遠いです。この改善は、そのようなシステムをデプロイする前に考慮すべき正確性、効率、保証に関する問題にもつながる。

The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities. PlanBench, an extensible benchmark we developed in 2022, soon after the release of GPT3, has remained an important tool for evaluating the planning abilities of LLMs. Despite the slew of new private and open source LLMs since GPT3, progress on this benchmark has been surprisingly slow. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs--making it a new kind of model: a Large Reasoning Model (LRM). Using this development as a catalyst, this paper takes a comprehensive look at how well current LLMs and new LRMs do on PlanBench. As we shall see, while o1's performance is a quantum improvement on the benchmark, outpacing the competition, it is still far from saturating it. This improvement also brings to the fore questions about accuracy, efficiency, and guarantees which must be considered before deploying such systems.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Error-Minimizing Measurements in Postselected One-Shot Symmetric Quantum State Discrimination and Acceptance as a Performance Metric

Error-Minimizing Measurements in Postselected One-Shot Symmetric Quantum State Discrimination and Acceptance as a Performance Metric ( http://arxiv.org/abs/2409.13379v1 )

License: see the linked page

Saurabh Kumar Gupta, Abhishek K. Gupta,

(参考訳) 量子状態を用いた仮説テストでは、2つの可能な状態の1つを含むブラックボックスが与えられると、仮説の1つを優先して測定を行う。ポストセレクトされた仮説テストでは、仮説のいずれかを選択しない3番目の結果が追加される。ポストセレクトされたシナリオでは、最小誤差の1ショット対称仮説テストは、選択された結果の1つが生じるという事実に基づいて条件付けられた文献によって特徴づけられる。この方向にさらに進み、最小限の誤差につながるあらゆる可能な測定値のセットを与えます。パラメトリック形式で任意の誤差最小化測定を行った。どの仮説も選択しないことは、テストの品質を損なうことに注意してください。さらに、これらの測定値が品質によって異なることを示す例を挙げる。ポストセレクトされた仮説テストの品質について議論する必要がある。そこで, 提案手法は, 任意の誤差最小化測定値に対する受理の表現をパラメータとして定義することで, ポストセレクト仮説検定の質を特徴付ける。最小誤差を達成できる測定値のセットについて、受け入れを最大化し、それを達成した例を与えられたので、受け入れの観点で可能な最良の測定値の例を与える。

In hypothesis testing with quantum states, given a black box containing one of two possible states, a measurement is performed to decide in favor of one of the hypotheses. In postselected hypothesis testing, a third outcome is added, corresponding to not selecting any of the hypotheses. In the postselected scenario, minimum-error one-shot symmetric hypothesis testing is characterized in the literature conditioned on the fact that one of the selected outcomes occurs. We proceed further in this direction to give the set of all possible measurements that lead to the minimum error. We give an arbitrary error-minimizing measurement in a parametric form. Note that not selecting any of the hypotheses decimates the quality of testing. We further give an example to show that these measurements vary in quality, so there is a need to discuss the quality of postselected hypothesis testing. We then characterize this quality by defining a new metric, acceptance, and give an expression for the acceptance of an arbitrary error-minimizing measurement in terms of some parameters of the measurement. On the set of measurements that achieve minimum error, we maximize the acceptance and give an example that achieves this maximum, thus giving an example of the best possible measurement in terms of acceptance.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis

Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis ( http://arxiv.org/abs/2409.13382v1 )

License: see the linked page

Lauri Juvela, Xin Wang,

(参考訳) 合成音声の自動検出がますます重要になっているのは、現在の合成法がヒトの音声とほぼ区別がつかず、一般に広くアクセス可能であるためである。音声透かしやその他のアクティブな開示手法は、受動的検出に基づいて従来のディープフェイク防御を補完できるため、研究活動を惹きつけている。アクティブな検出と受動的検出の両方において、堅牢性は大きな関心事である。従来のオーディオ透かしは、特にオーディオコーデックアプリケーションによる攻撃を受けやすい。野生に放出されるほとんどの音声および音声コンテンツは、純粋に分配方法としてオーディオコーデックを通り抜ける。我々は最近,雑音に富むが識別可能な伝送路上で生成した音声をより容易に検出する手法として,協調的な透かしを提案する。本稿では,従来の音声コーデックやニューラルオーディオコーデックと併用するためにチャネル拡張を拡張し,様々な構成に対するコーデックビットレートの転送性および効果を評価する。その結果、勾配近似のための波形領域ストレートスルー推定器を用いて、ブラックボックスオーディオコーデックによって協調的な透かしを確実に拡張できることが示唆された。さらに,この結果から,ニューラルオーディオコーデックによるチャネル拡張は従来のコーデックによく寄与することが示された。リスニングテストでは、8kbpsの高ビットレートコーデックやDACで、協調的な透かしは知覚上の劣化を無視できることを示した。

Automatic detection of synthetic speech is becoming increasingly important as current synthesis methods are both nearly indistinguishable from human speech and widely accessible to the public. Audio watermarking and other active disclosure methods are attracting research activity, as they can complement traditional deepfake defenses based on passive detection. In both active and passive detection, robustness is of major interest. Traditional audio watermarks are particularly susceptible to removal attacks by audio codec application. Most generated speech and audio content released into the wild passes through an audio codec purely as a distribution method. We recently proposed collaborative watermarking as a method for making generated speech more easily detectable over a noisy but differentiable transmission channel. This paper extends the channel augmentation to work with non-differentiable traditional audio codecs and neural audio codecs, and evaluates transferability and the effect of codec bitrate over various configurations. The results show that collaborative watermarking can be reliably augmented by black-box audio codecs using a waveform-domain straight-through estimator for gradient approximation. Furthermore, the results show that channel augmentation with a neural audio codec transfers well to traditional codecs. Listening tests demonstrate that collaborative watermarking incurs negligible perceptual degradation with high-bitrate codecs or DAC at 8 kbps.
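
The straight-through estimator mentioned above can be illustrated on a scalar toy problem. This sketch is not the paper's waveform-domain implementation: the uniform quantizer `quantize` stands in for a black-box codec, and `ste_descent`, the step size, and the target value are hypothetical choices. The forward pass applies the non-differentiable operation, while the backward pass pretends that operation is the identity.

```python
def quantize(x, step=0.25):
    """Stand-in for a non-differentiable codec: uniform rounding to a grid."""
    return round(x / step) * step


def ste_descent(x, target, lr=0.1, steps=50):
    """Minimise (quantize(x) - target)**2 using a straight-through gradient."""
    for _ in range(steps):
        y = quantize(x)              # forward pass uses the real quantizer
        grad_y = 2.0 * (y - target)  # derivative of the loss with respect to y
        grad_x = grad_y * 1.0        # STE: treat dy/dx as 1 in the backward pass
        x -= lr * grad_x
    return x


x_final = ste_descent(x=0.1, target=1.5)
```

The true derivative of `quantize` is zero almost everywhere, so plain gradient descent would never move `x`. With the straight-through substitution, `x` drifts until its quantized value reaches the target, after which the gradient vanishes and the iterate stays put.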

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# A Test of Empty Wave via Quantum Memory in a Weak Measurement Scheme

A Test of Empty Wave via Quantum Memory in a Weak Measurement Scheme ( http://arxiv.org/abs/2409.13383v1 )

License: see the linked page

Jian-Peng Dou, Feng Lu, Hao Tang, Xiao-Wen Shang, Xian-Min Jin,

(参考訳) 量子力学では、長年の疑問が残る: 単一の光子が二重スリットをどう横切るのか? 1つの直感的な図は、光子は1つのスリットのみを通過し、その波動関数は「空」波と「フル」波に分裂することを示している。しかし、この空の波の現実はまだ確認されていない。本稿では、量子メモリと弱い測定を組み合わせた新しい実験構成を提案し、空波の性質について検討する。単一の原子励起は、二重スリット実験において2つの経路に類似した自由空間と量子メモリの間に確率的に分割される。量子メモリは、量子状態が崩壊することなく、保存されたスピン波の存在により単一光子ラマン散乱が増強される経路検出器として機能する。この拡張は古典的な情報として記録され、量子メモリに格納されたスピン波は2回検索され、干渉可視性は79%である。選択後において弱い値が検出される従来の弱い測定方式とは異なり、干渉が起こる前に弱い値を古典的な情報に変換する。量子メモリは,部分的な情報を抽出しながらコヒーレンスを保ち,量子計測に新たな洞察を与える計測装置としての可能性を示す。

In quantum mechanics, a long-standing question remains: How does a single photon traverse double slits? One intuitive picture suggests that the photon passes through only one slit, while its wavefunction splits into an "empty" wave and a "full" wave. However, the reality of this empty wave is yet to be verified. Here, we present a novel experimental configuration that combines quantum memory and weak measurement to investigate the nature of the empty wave. A single atomic excitation is probabilistically split between free space and a quantum memory, analogous to the two paths in a double-slit experiment. The quantum memory serves as a path detector, where single-photon Raman scattering is enhanced due to the presence of a stored spin wave, without collapsing the quantum state. This enhancement is recorded as classical information, and the spin wave stored in the quantum memory is retrieved twice, with an interference visibility of 79%. Unlike conventional weak measurement schemes, where weak values are detected during post-selection, our approach converts the weak value into classical information before interference takes place. Our results demonstrate the potential of quantum memory as a measurement device that preserves coherence while extracting partial information, offering new insights into quantum measurement.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Scalable Multi-Objective Optimization for Robust Traffic Signal Control in Uncertain Environments

Scalable Multi-Objective Optimization for Robust Traffic Signal Control in Uncertain Environments ( http://arxiv.org/abs/2409.13388v1 )

License: see the linked page

Weian Guo, Wuzhao Li, Zhiou Zhang, Lun Zhang, Li Li, Dongyang Li,

(参考訳) 知的交通信号制御は、経済効率、環境持続可能性、日常生活の質に重要な影響を及ぼす現代都市経営にとって不可欠である。しかし、この数十年間、大規模な交通ネットワークの管理、交差点の調整、不確実な交通条件下での堅牢性確保において、大きな課題が続いている。本稿では,動的かつ不確実な都市環境におけるロバストな交通信号制御のための,スケーラブルな多目的最適化手法を提案する。本稿では,確率変数と確率的トラフィックパターンを組み込んだ多目的最適化モデルを提案する。本稿では,適応ハイブリッド多目的最適化アルゴリズム (Adaptive Hybrid Multi-Objective Optimization Algorithm, AHMOA) を提案する。 AHMOAは、予測不可能なトラフィックの変化に対応しつつ、平均遅延、ネットワーク安定性、システムの堅牢性など、複数の目的を同時に最適化する。このアルゴリズムは、進化的戦略と、探索と搾取のバランスをとるための適応的なメカニズムを結合し、履歴トラフィックデータを活用するためのメモリベースの評価メカニズムを組み込む。シミュレーションはマンハッタン、パリ、サンパウロ、イスタンブールなど様々な都市で行われている。実験の結果、AHMOAは最先端のアルゴリズムを一貫して上回り、不確実な環境下で複雑な交通システムを管理するためのスケーラブルで堅牢なPareto最適ソリューションを提供する能力があることが示された。

Intelligent traffic signal control is essential to modern urban management, with important impacts on economic efficiency, environmental sustainability, and quality of daily life. However, in recent decades, it has continued to pose significant challenges in managing large-scale traffic networks, coordinating intersections, and ensuring robustness under uncertain traffic conditions. This paper presents a scalable multi-objective optimization approach for robust traffic signal control in dynamic and uncertain urban environments. We propose a multi-objective optimization model that incorporates stochastic variables and probabilistic traffic patterns to capture traffic flow dynamics and uncertainty. We also propose an algorithm named Adaptive Hybrid Multi-Objective Optimization Algorithm (AHMOA), which addresses the uncertainties of city traffic, including network-wide signal coordination, fluctuating patterns, and environmental impacts. AHMOA simultaneously optimizes multiple objectives, such as average delay, network stability, and system robustness, while adapting to unpredictable changes in traffic. The algorithm combines evolutionary strategies with an adaptive mechanism to balance exploration and exploitation, and incorporates a memory-based evaluation mechanism to leverage historical traffic data. Simulations are conducted in different cities including Manhattan, Paris, Sao Paulo, and Istanbul. The experimental results demonstrate that AHMOA consistently outperforms several state-of-the-art algorithms and can provide scalable, robust Pareto optimal solutions for managing complex traffic systems under uncertain environments.
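
The notion of a Pareto optimal solution referred to above can be sketched as a non-dominated filter. This is only an illustration of the concept, not AHMOA itself: the helper names `dominates` and `pareto_front`, the choice of two objectives, and the numeric values are all hypothetical, and both objectives are assumed to be minimised.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (average delay in seconds, instability score) pairs for five signal plans.
plans = [(30.0, 0.9), (35.0, 0.4), (40.0, 0.5), (50.0, 0.2), (32.0, 1.0)]
front = pareto_front(plans)
```

Here `(40.0, 0.5)` is dropped because `(35.0, 0.4)` is better on both objectives, and `(32.0, 1.0)` is dropped because `(30.0, 0.9)` is. The remaining plans trade delay against stability and none is strictly preferable to another. The pairwise check is quadratic in the number of candidates, which is fine for a small illustration.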

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Feature-Centered First Order Structure Tensor Scale-Space in 2D and 3D

Feature-Centered First Order Structure Tensor Scale-Space in 2D and 3D ( http://arxiv.org/abs/2409.13389v1 )

License: see the linked page

Pawel Tomasz Pieta, Anders Bjorholm Dahl, Jeppe Revall Frisvad, Siavash Arjomand Bigdeli, Anders Nymark Christensen,

(参考訳) 構造テンソル法は画像構造の2次元および3次元解析によく用いられるが、その結果は多くの場合、ユーザのメソッドパラメータの選択に非常に依存している。微分フィルタの幅を画像特徴量に直結させることにより, 1次構造テンソルスケール空間におけるパラメータ選択を単純化する。リングフィルタのステップを導入することで、ガウス積分/平滑化を特徴端から中心へより正確に微分フィルタ応答をシフトさせる手法に置き換える。さらに、抽出された構造的測度を用いて、スケールマップの既知の不正確さを補正し、2Dと3Dの両方の特徴量を信頼性良く表現できることを示す。従来の1次構造テンソルやそれ以前の構造テンソルスケール空間のアプローチと比較して、我々の解ははるかに正確であり、最小限のユーザ入力で幅広い構造パラメータを抽出するアウト・オブ・ザ・ボックス法として機能する。

The structure tensor method is often used for 2D and 3D analysis of imaged structures, but its results are in many cases very dependent on the user's choice of method parameters. We simplify this parameter choice in first order structure tensor scale-space by directly connecting the width of the derivative filter to the size of image features. By introducing a ring-filter step, we substitute the Gaussian integration/smoothing with a method that more accurately shifts the derivative filter response from feature edges to their center. We further demonstrate how extracted structural measures can be used to correct known inaccuracies in the scale map, resulting in a reliable representation of the feature sizes both in 2D and 3D. Compared to the traditional first order structure tensor, or previous structure tensor scale-space approaches, our solution is much more accurate and can serve as an out-of-the-box method for extracting a wide range of structural parameters with minimal user input.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Hydrogen under Pressure as a Benchmark for Machine-Learning Interatomic Potentials

Hydrogen under Pressure as a Benchmark for Machine-Learning Interatomic Potentials ( http://arxiv.org/abs/2409.13390v1 )

License: see the linked page

Thomas Bischoff, Bastian Jäckl, Matthias Rupp,

(参考訳) 機械学習原子間ポテンシャル(MLPs)は、原子論系のポテンシャルエネルギー表面の高速でデータ駆動の代理モデルであり、アブ初期分子動力学(MD)シミュレーションを数桁の規模で加速することができる。 MLPの性能は、トレーニングで使われていないデータに対するエネルギーと力の予測誤差として一般的に測定される。テストセット上での予測誤差は低いが、MDシミュレーションでは良い性能が保証されない。後者は、加速シミュレーションの実行から得られる物理的動機付けされた性能測定を必要とする。しかし、そのような措置の採用は、それらを計算し解釈するのに必要な努力とドメイン知識によって制限されている。この制限を克服するため,圧力下での水素中の液体-液体相転移のMDシミュレーションにおいて,MDシミュレーションにおいてMDPの性能を自動的に評価するベンチマークシステムを提案する。ベンチマークのh-llpt-24データセットは、異なる温度と質量密度でのMDシミュレーションによる参照測地、エネルギー、力、ストレスを提供する。ベンチマークのPythonコードは、MDシミュレーションを自動で実行し、圧力、安定な分子分数、拡散係数、放射分布関数を定量的に比較、視覚化する。このベンチマークを用いて, 液体-液相転移を再現できない状態のMLPがいくつか存在することを示す。

Machine-learning interatomic potentials (MLPs) are fast, data-driven surrogate models of atomistic systems' potential energy surfaces that can accelerate ab-initio molecular dynamics (MD) simulations by several orders of magnitude. The performance of MLPs is commonly measured as the prediction error in energies and forces on data not used in their training. While low prediction errors on a test set are necessary, they do not guarantee good performance in MD simulations. The latter requires physically motivated performance measures obtained from running accelerated simulations. However, the adoption of such measures has been limited by the effort and domain knowledge required to calculate and interpret them. To overcome this limitation, we present a benchmark that automatically quantifies the performance of MLPs in MD simulations of a liquid-liquid phase transition in hydrogen under pressure, a challenging benchmark system. The benchmark's h-llpt-24 dataset provides reference geometries, energies, forces, and stresses from density functional theory MD simulations at different temperatures and mass densities. The benchmark's Python code automatically runs MLP-accelerated MD simulations and calculates, quantitatively compares and visualizes pressures, stable molecular fractions, diffusion coefficients, and radial distribution functions. Employing this benchmark, we show that several state-of-the-art MLPs fail to reproduce the liquid-liquid phase transition.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors

Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors ( http://arxiv.org/abs/2409.13392v1 )

License: see the linked page

Zixin Zhang, Kanghao Chen, Lin Wang,

(参考訳) イベントカメラは、固定フレームではなく、非同期でスパースなイベントストリームを出力するバイオインスパイアされたセンサーである。高ダイナミックレンジや高時間分解能などの異なる利点から、ロボットマッピングにおいて重要な3D再構成にイベントカメラが応用されている。近年, 3次元ガウススプラッティング(3DGS)などのニューラルレンダリング技術は, 3次元再構成に成功している。しかし、効果的なイベントベースの3DGSパイプラインの開発方法はまだ解明されていない。特に、3DGSは、通常、高品質な初期化と密集した多視点制約に依存しているため、その固有のスパース性から、3DGS最適化に潜在的な問題が現れる。そこで我々は,イベントベースの新しい3DGSフレームワークElite-EvGSを提案する。我々のキーとなる考え方は、既成のイベント・ツー・ビデオ(E2V)モデルから事前知識を抽出し、粗い最適化方法でイベントから3Dシーンを効果的に再構築することである。具体的には、イベントからの3DGS初期化の複雑さに対処するため、E2Vモデルによって生成されたフレームから粗い3DGSを最適化し、イベントを組み込んで詳細を洗練するウォームアップ初期化戦略を導入する。そこで本稿では,ウィンドウスライシングによるイベント監視を段階的に削減する,プログレッシブなイベント監視戦略を提案する。これにより、イベントフレームの時間的ランダム性が微妙に向上し、局所的なテクスチャとグローバルな構造の詳細の最適化に寄与する。ベンチマークデータセットの実験では、Elite-EvGSがより優れたテクスチャと構造の詳細で3Dシーンを再構築できることが示されている。一方,本手法は,高速な動きや低照度シーンなどの多様な課題を含む実世界のデータに対して,高い性能が得られる。

Event cameras are bio-inspired sensors that output asynchronous and sparse event streams instead of fixed frames. Benefiting from their distinct advantages, such as high dynamic range and high temporal resolution, event cameras have been applied to 3D reconstruction, which is important for robotic mapping. Recently, neural rendering techniques, such as 3D Gaussian splatting (3DGS), have proven successful in 3D reconstruction. However, how to develop an effective event-based 3DGS pipeline remains under-explored. In particular, because 3DGS typically depends on high-quality initialization and dense multiview constraints, the inherent sparsity of events poses a potential problem for 3DGS optimization. To this end, we propose a novel event-based 3DGS framework, named Elite-EvGS. Our key idea is to distill the prior knowledge from off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events in a coarse-to-fine optimization manner. Specifically, to address the complexity of 3DGS initialization from events, we introduce a novel warm-up initialization strategy that optimizes a coarse 3DGS from the frames generated by E2V models and then incorporates events to refine the details. Then, we propose a progressive event supervision strategy that employs a window-slicing operation to progressively reduce the number of events used for supervision. This subtly relieves the temporal randomness of the event frames, benefiting the optimization of local textural and global structural details. Experiments on the benchmark datasets demonstrate that Elite-EvGS can reconstruct 3D scenes with better textural and structural details. Meanwhile, our method yields plausible performance on captured real-world data, including diverse challenging conditions such as fast motion and low-light scenes.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images

PointSAM: Pointly-Supervised Segment Anything Model for Remote Sensing Images ( http://arxiv.org/abs/2409.13401v1 )

License: see the linked page

Nanqing Liu, Xun Xu, Yongyi Su, Haojie Zhang, Heng-Chao Li,

(参考訳) Segment Anything Model (SAM)は画像分割のための高度な基礎モデルであり、リモートセンシング画像(RSI)に広く応用されている。 RSIと自然画像のドメインギャップのため、従来の方法では、ソーストレーニング済みのモデルとしてSAMを使用し、完全に教師付きマスクで微調整する。これらの手法とは異なり、我々の研究はより便利で挑戦的なポイントアノテーションを使ってSAMを微調整することに焦点を当てている。 SAMのゼロショット機能を活用して、トレーニング用に擬似ラベルを反復的に生成する自己学習フレームワークを採用する。しかし、擬似ラベルがノイズラベルを含む場合、エラーの蓄積のリスクがある。この問題に対処するため、ターゲットデータセットからターゲットプロトタイプを抽出し、ハンガリーのアルゴリズムを用いて予測プロトタイプとマッチングし、モデルが間違った方向に学習するのを防ぐ。さらに、複雑な背景とRSI内のオブジェクトの密分布のため、ポイントプロンプトを使用すると、複数のオブジェクトが1つとして認識される。この問題を解決するために,インスタンスマスクの非重複性に基づく負のプロンプトキャリブレーション手法を提案する。簡単に言えば、重なり合うマスクのプロンプトを対応する負の信号として使い、洗練されたマスクを生み出す。本稿では,これらの手法を組み合わせることで,ポイントSAMという新しいセグメンテーションモデルを提案する。我々は, WHU, HRSID, NWPU VHR-10を含むRSIデータセットを用いて実験を行い, SAM, SAM2, および他の比較手法による直接試験よりも優れた結果を得た。さらに,PointSAMをポイント・ツー・ボックス・コンバータとして導入し,提案手法を他のポイント・教師付きタスクに拡張できることを示す。コードはhttps://github.com/Lans1ng/PointSAMで公開されている。

Segment Anything Model (SAM) is an advanced foundational model for image segmentation, widely applied to remote sensing images (RSIs). Due to the domain gap between RSIs and natural images, traditional methods typically use SAM as a source pre-trained model and fine-tune it with fully supervised masks. Unlike these methods, our work focuses on fine-tuning SAM using more convenient and challenging point annotations. Leveraging SAM's zero-shot capabilities, we adopt a self-training framework that iteratively generates pseudo-labels for training. However, if the pseudo-labels contain noisy labels, there is a risk of error accumulation. To address this issue, we extract target prototypes from the target dataset and use the Hungarian algorithm to match them with prediction prototypes, preventing the model from learning in the wrong direction. Additionally, due to the complex backgrounds and dense distribution of objects in RSI, using point prompts may result in multiple objects being recognized as one. To solve this problem, we propose a negative prompt calibration method based on the non-overlapping nature of instance masks. In brief, we use the prompts of overlapping masks as corresponding negative signals, resulting in refined masks. Combining the above methods, we propose a novel Pointly-supervised Segment Anything Model named PointSAM. We conduct experiments on RSI datasets, including WHU, HRSID, and NWPU VHR-10, and the results show that our method significantly outperforms direct testing with SAM, SAM2, and other comparison methods. Furthermore, we introduce PointSAM as a point-to-box converter and achieve encouraging results, suggesting that this method can be extended to other point-supervised tasks. The code is available at https://github.com/Lans1ng/PointSAM.
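
The prototype matching step described above is a minimum-cost one-to-one assignment. The sketch below is an illustration under stated assumptions, not the PointSAM code: it uses a brute-force search over permutations as a stand-in for the Hungarian algorithm (both return an optimal assignment, but brute force is only practical for a handful of prototypes), it assumes squared Euclidean distance as the matching cost and equal numbers of prototypes on both sides, and the names `sq_dist` and `match_prototypes` are hypothetical.

```python
from itertools import permutations


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def match_prototypes(targets, predictions):
    """Return perm such that predictions[perm[i]] is matched to targets[i]
    with minimum total cost (brute-force stand-in for the Hungarian algorithm)."""
    n = len(targets)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(sq_dist(targets[i], predictions[perm[i]]) for i in range(n)),
    )
    return list(best)


# Toy 2-D prototypes; real prototypes would be high-dimensional feature vectors.
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
predictions = [(0.9, 0.1), (0.1, 0.9), (0.1, 0.0)]
assignment = match_prototypes(targets, predictions)
```

In this toy case the third prediction is closest to the first target, the first prediction to the second target, and the second prediction to the third target. For realistic sizes, a polynomial-time solver such as SciPy's `linear_sum_assignment` would replace the permutation search.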

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Validation & Exploration of Multimodal Deep-Learning Camera-Lidar Calibration models

Validation & Exploration of Multimodal Deep-Learning Camera-Lidar Calibration models ( http://arxiv.org/abs/2409.13402v1 )

License: see the linked page

Venkat Karramreddy, Liam Mitchell,

(参考訳) 本稿では,マルチモーダルセンサシステムの校正のためのディープラーニングアーキテクチャの探索,評価,実装における革新的な研究について述べる。その背景にあるのは、センサー融合を利用して、3D LiDARと2Dカメラのダイナミックでリアルタイムなアライメントを実現することだ。静的キャリブレーション法は退屈で時間を要するため,この問題を解決するために,従来型ニューラルネットワーク(CNN)と幾何学的に情報を得た学習を組み合わせることを提案する。我々は、RegNet、CalibNet、LCCNetなどのExtrinsic LiDAR-Camera Calibrationツールの基本原則を活用し、オンラインで利用可能なオープンソースモデルを探索し、その結果を対応する研究論文と比較する。これらの視覚的および測定可能なアウトプットを抽出するために必要な要件は、ソースコードの微調整、トレーニング、バリデーション、テストの各フレームワークを等しく比較することであった。この手法は,どの先進的ネットワークが最も正確かつ一貫した予測を生成するかを調べることを目的としている。一連の実験を通じて、その過程での潜在的な改善の欠点と領域を明らかにします。 LCCNetは、検証したすべてのモデルの中で、最高の結果をもたらすことが分かりました。

This article presents an innovative study in exploring, evaluating, and implementing deep learning architectures for the calibration of multi-modal sensor systems. The focus is to leverage sensor fusion to achieve dynamic, real-time alignment between 3D LiDAR and 2D camera sensors. Static calibration methods are tedious and time-consuming, which is why we propose utilizing Convolutional Neural Networks (CNN) coupled with geometrically informed learning to solve this issue. We leverage the foundational principles of extrinsic LiDAR-camera calibration tools such as RegNet, CalibNet, and LCCNet by exploring open-source models that are available online and comparing our results with their corresponding research papers. Extracting these visual and measurable outputs required tweaking source code, fine-tuning, training, validation, and testing for each of these frameworks to allow equal comparisons. This approach aims to investigate which of these advanced networks produces the most accurate and consistent predictions. Through a series of experiments, we reveal some of their shortcomings and areas for potential improvement along the way. We find that LCCNet yields the best results out of all the models that we validated.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Credit Card Fraud Detection: A Deep Learning Approach

Credit Card Fraud Detection: A Deep Learning Approach ( http://arxiv.org/abs/2409.13406v1 )

License: see the linked page

Sourav Verma, Joydip Dhar,

(参考訳) クレジットカードは、最近の電子取引におけるオンラインとオフラインの両方の支払いモードにおいて、最も広範なインストール方法の1つである。クレジットカードの発明は電子取引をかなり楽にしたしかし、犯罪に対する新たな詐欺の機会も提供し、詐欺率の上昇につながった。不正なクレジットカード取引により、多くの機関や個人によって実質的な金額が失われている。したがって、改善された動的不正認識フレームワークを適応させることは、すべてのクレジットカード流通銀行が損失を軽減するために必須となった。実際、不正なクレジットカード取引の問題は、コンセプトドリフト(concept drift)、クラス不均衡(class im Balance)、検証レイテンシ(Verification latency)といった、関連するリアルタイムの課題に関係している。しかし、現在のシステムの大部分は人工知能(AI)、ファジィ論理、機械学習、データマイニング、遺伝的アルゴリズムなどに基づいており、詐欺検出システム(FDS)のすべての課題にほとんど対処しない仮定に依存している。本稿では,偽陽性率が非常に低い不正カバレッジを得るために,Deep Learningアルゴリズムを理解し,実装することを目的とする。また、一般的なパターンを学習するための教師なし(半教師なし)手法として自動エンコーダを実装することを目的とする。キーワード:クレジットカード詐欺、不正検出システム(FDS)、電子取引、コンセプトドリフト、クラス不均衡、検証レイテンシ、機械学習、ディープラーニング

Credit cards are one of the most widely used payment methods for both online and offline electronic transactions in recent times. The invention of credit cards has made electronic transactions significantly easier. However, it has also provided new fraud opportunities for criminals, which has resulted in increased fraud rates. A substantial amount of money has been lost by many institutions and individuals due to fraudulent credit card transactions. Adopting improved and dynamic fraud recognition frameworks has thus become essential for all credit card issuing banks to mitigate their losses. In fact, the problem of fraudulent credit card transactions involves a number of relevant real-time challenges, namely: concept drift, class imbalance, and verification latency. However, the vast majority of current systems, which are based on artificial intelligence (AI), fuzzy logic, machine learning, data mining, genetic algorithms, and so on, rely on assumptions that hardly address all the relevant challenges of a fraud-detection system (FDS). This paper aims to understand and implement Deep Learning algorithms in order to obtain high fraud coverage with a very low false positive rate. It also aims to implement an auto-encoder as an unsupervised (semi-supervised) method of learning common patterns. Keywords: Credit card fraud, Fraud-detection system (FDS), Electronic transactions, Concept drift, Class imbalance, Verification latency, Machine Learning, Deep Learning

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# A User Study on Contrastive Explanations for Multi-Effector Temporal Planning with Non-Stationary Costs

A User Study on Contrastive Explanations for Multi-Effector Temporal Planning with Non-Stationary Costs ( http://arxiv.org/abs/2409.13427v1 )

License: see the linked page

Xiaowei Liu, Kevin McAreavey, Weiru Liu,

(参考訳) 本稿では,スマートホームの時間的計画のためのエンドユーザーアプリケーションとして,コンストラッシブな説明を採用する。本アプリケーションでは、アプライアンスタスクの実行の要件、動的エネルギー関税によるエネルギーの支払い、高容量バッテリーストレージへのアクセス、電力をグリッドに販売することができる。装置の同時スケジューリングは、これをマルチエフェクタ計画の問題とし、動的関税は、非定常的なコスト(または、定常だが外因性事象に依存するコスト)をもたらす。これらの特徴は、一般に既存のPDDLベースのプランナーではプランニング問題がサポートされないため、適切なアプライアンス数や時間的地平線にスケールする独自のドメイン依存プランナーを設計する。我々は,2つのユーザストーリーに基づいて,オンラインクラウドソーシングプラットフォームを用いた128人の参加者を対象に,コントロールされたユーザスタディを実施している。比較質問や説明を提示したユーザは,満足度が高く,理解度が向上する傾向があり,これらの機能にアクセスできないユーザに比べて,推奨されるAIスケジュールに好適に適合する可能性が示唆された。

In this paper, we adopt contrastive explanations within an end-user application for temporal planning of smart homes. In this application, users have requirements on the execution of appliance tasks, pay for energy according to dynamic energy tariffs, have access to high-capacity battery storage, and are able to sell energy to the grid. The concurrent scheduling of devices makes this a multi-effector planning problem, while the dynamic tariffs yield costs that are non-stationary (alternatively, costs that are stationary but depend on exogenous events). These characteristics are such that the planning problems are generally not supported by existing PDDL-based planners, so we instead design a custom domain-dependent planner that scales to reasonable appliance numbers and time horizons. We conduct a controlled user study with 128 participants using an online crowd-sourcing platform based on two user stories. Our results indicate that users provided with contrastive questions and explanations have higher levels of satisfaction, tend to gain improved understanding, and rate the helpfulness of the recommended AI schedule more favourably compared to those without access to these features.

Translation date: 2024-11-07 07:28:56 Publication date: 2024-09-20

# Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model

Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model ( http://arxiv.org/abs/2409.13407v1 )

License: see the linked page

Li Zhou, Xu Yuan, Zenghui Sun, Zikun Zhou, Jingsong Lan,

(参考訳) 大規模マルチモーダルモデル(LMM)は、大規模言語モデルを拡張することで大きな進歩を遂げた。この進歩を踏まえ、LMMの最新の開発は、セグメンテーションモデルの統合による高密度ピクセルワイドセグメンテーションを生成する能力を示しているが、既存の作品のテキスト応答とセグメンテーションマスクはインスタンスレベルに留まり、細部まで細部まで理解とセグメンテーションを行う能力に制限がある。この制限を克服するために、スグメンテーションとキャプション(SegCap)の粒度をユーザ指示に従ってシームレスに調整できるMGLMM(Multi-Granularity Large Multimodal Model)を導入する。このようなタスクをMGSC(Multi-Granularity Segmentation and Captioning)と呼ぶ。 MGSCタスク上でのモデルトレーニングと評価のためのベンチマークが欠如しているのを見て、カスタマイズされた自動アノテーションパイプラインを使用して、複数の粒度のマスクとキャプションを並べたベンチマークを構築した。このベンチマークは、10Kイメージと30Kイメージ検索ペアで構成されている。我々は、さらなる研究のための自動データセットアノテーションパイプラインの実装とともにデータセットをリリースし、また、異種セグメンテーションデータセットを統一する新しいSegCapデータフォーマットを提案し、マルチタスクトレーニング中にオブジェクトの概念と視覚的特徴を効果的に関連付けることを支援します。大規模な実験により,MGLMMは8つの下流タスクに精通し,MGSC,GCG,画像キャプション,セグメンテーションの参照,複数と空のセグメンテーションタスク,推論セグメンテーションタスクの最先端性能を実現していることがわかった。 MGLMMの優れた性能と汎用性は、マルチモーダル研究の進展にその潜在的影響を浮き彫りにした。

Large Multimodal Models (LMMs) have achieved significant progress by extending large language models. Building on this progress, the latest developments in LMMs demonstrate the ability to generate dense pixel-wise segmentation through the integration of segmentation models. Despite these innovations, the textual responses and segmentation masks of existing works remain at the instance level, showing limited ability to perform fine-grained understanding and segmentation even when provided with detailed textual cues. To overcome this limitation, we introduce a Multi-Granularity Large Multimodal Model (MGLMM), which is capable of seamlessly adjusting the granularity of Segmentation and Captioning (SegCap) following user instructions, from panoptic SegCap to fine-grained SegCap. We name such a new task Multi-Granularity Segmentation and Captioning (MGSC). Observing the lack of a benchmark for model training and evaluation over the MGSC task, we establish a benchmark with aligned masks and captions in multi-granularity using our customized automated annotation pipeline. This benchmark comprises 10K images and more than 30K image-question pairs. We will release our dataset along with the implementation of our automated dataset annotation pipeline for further research. Besides, we propose a novel unified SegCap data format to unify heterogeneous segmentation datasets; it effectively facilitates learning to associate object concepts with visual features during multi-task training. Extensive experiments demonstrate that our MGLMM excels at tackling more than eight downstream tasks and achieves state-of-the-art performance in MGSC, GCG, image captioning, referring segmentation, multiple and empty segmentation, and reasoning segmentation tasks. The great performance and versatility of MGLMM underscore its potential impact on advancing multimodal research.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 自動内視鏡結石認識を改善するための合成画像の妥当性評価

Evaluating the plausibility of synthetic images for improving automated endoscopic stone recognition ( http://arxiv.org/abs/2409.13409v1 )

ライセンス: Link先を確認

Ruben Gonzalez-Perez, Francisco Lopez-Tiro, Ivan Reyes-Amezcua, Eduardo Falcon-Morales, Rosa-Maria Rodriguez-Gueant, Jacques Hubert, Michel Daudon, Gilberto Ochoa-Ruiz, Christian Daul,

(参考訳) 現在、Morpho-Constitutional Analysis (MCA) は腎臓結石の組織学的診断の事実上のアプローチであり、再発を避けるためにパーソナライズされた治療を確立するための重要なステップである。近年では、内視鏡的石盤認識(ESR)と呼ばれる、そのようなタスクを術中実行することに焦点を当てている。どちらの方法も、分析されたサンプルをいくつかのサブグループに分離するために、表面と腎臓石の断面で観察された特徴に依存している。しかし、ESRで見られる高いサーバ内変動と複雑な動作条件を考えると、コンピュータ支援診断にAIを使うことには多くの関心がある。しかし、現在のAIモデルは、優れたパフォーマンスを達成し、目に見えないディストリビューションを一般化するために、大きなデータセットを必要としている。これは大きなラベル付きデータセットの取得が非常に困難であり、腎臓石のクラスは非常に稀であるため、大きな問題である。そこで本研究では,既存の腎臓結石データセットを拡張するための拡散法を提案する。本研究の目的は,前生児データを用いた事前トレーニングモデルに使用可能な多彩な腎臓結石画像を作成することである。本研究では,CCD画像の自然画像と合成画像とを混合することにより,未確認の術中データに非常によく対応できるモデルを訓練することができることを示す。その結果,ImageNetのみで事前学習したベースラインモデルに比べて精度が10%向上する可能性が示唆された。さらに,CCD画像のみを用いたモデル列車と比較して,表面画像の6%,断面画像の10%の改善が見られ,合成画像の有効性が示された。

Currently, the Morpho-Constitutional Analysis (MCA) is the de facto approach for the etiological diagnosis of kidney stone formation, and it is an important step for establishing personalized treatment to avoid relapses. More recently, research has focused on performing such tasks intra-operatively, an approach known as Endoscopic Stone Recognition (ESR). Both methods rely on features observed in the surface and the section of kidney stones to separate the analyzed samples into several sub-groups. However, given the high intra-observer variability and the complex operating conditions found in ESR, there is a lot of interest in using AI for computer-aided diagnosis. Current AI models, however, require large datasets to attain good performance and to generalize to unseen distributions. This is a major problem as large labeled datasets are very difficult to acquire, and some classes of kidney stones are very rare. Thus, in this paper, we present a method based on diffusion as a way of augmenting pre-existing ex-vivo kidney stone datasets. Our aim is to create plausible diverse kidney stone images that can be used for pre-training models using ex-vivo data. We show that by mixing natural and synthetic CCD images, it is possible to train models capable of performing very well on unseen intra-operative data. Our results show that it is possible to attain an improvement of 10% in terms of accuracy compared to a baseline model pre-trained only on ImageNet. Moreover, our results show an improvement of 6% for surface images and 10% for section images compared to a model trained on CCD images only, which demonstrates the effectiveness of using synthetic images.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# CT/PET画像における深層学習による腫瘍分節の正弦波正規化

Sine Wave Normalization for Deep Learning-Based Tumor Segmentation in CT/PET Imaging ( http://arxiv.org/abs/2409.13410v1 )

ライセンス: Link先を確認

Jintao Ren, Muheng Li, Stine Sofia Korreman,

(参考訳) 本報告では, オートPETIIIチャレンジのために開発されたCT/PETスキャンにおける腫瘍分離の正常化ブロックについて述べる。 SineNormalはPETデータに周期的な正弦変換を適用して病変検出を強化する。 PET強調領域における強度の変化を強調し、同心リングパターンを生成することにより、特にマルチトラックPETデータセットに挑戦するセグメンテーション精度を向上させることを目的としている。プロジェクトのコードはGitHubで公開されている(https://github.com/BBQtime/Sine-Wave-Normalization-for-Deep-Learning-Based-Tumor-Segmentation-in-CT -PET)。

This report presents a normalization block for automated tumor segmentation in CT/PET scans, developed for the autoPET III Challenge. The key innovation is the introduction of the SineNormal, which applies periodic sine transformations to PET data to enhance lesion detection. By highlighting intensity variations and producing concentric ring patterns in PET highlighted regions, the model aims to improve segmentation accuracy, particularly for challenging multitracer PET datasets. The code for this project is available on GitHub (https://github.com/BBQtime/Sine-Wave-Normalization-for-Deep-Learning-Based-Tumor-Segmentation-in-CT -PET).

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 時間差重み付けによるMS病変の経時的分節化

Longitudinal Segmentation of MS Lesions via Temporal Difference Weighting ( http://arxiv.org/abs/2409.13416v1 )

ライセンス: Link先を確認

Maximilian Rokuss, Yannick Kirchhoff, Saikat Roy, Balint Kovacs, Constantin Ulrich, Tassilo Wald, Maximilian Zenk, Stefan Denner, Fabian Isensee, Philipp Vollmuth, Jens Kleesiek, Klaus Maier-Hein,

(参考訳) 経時的MRI検査における多発性硬化症(MS)病変の正確な分節化は、疾患の進行と治療効果の監視に不可欠である。臨床実習で画像を評価する場合、時間的変化が考慮されるが、既存のディープラーニング手法のほとんどは、異なる時点からのスキャンを別々に扱う。縦断画像を用いた研究の中では、時間点を統合するために用いられる最優先の手法は、チャネルワイズ結合である。本稿では,ベースラインとフォローアップスキャンの時間的差を,差分重みブロックと呼ばれるユニークなアーキテクチャ的帰納バイアスによって明示的に取り込む新しい手法を提案する。 2つのタイムポイントから機能をマージし、スキャン間の変更を強調します。病変のセグメンテーション (Dice Score, Hausdorff distance) と病変検出 (Lesion-level $F_1$ score) において, 2つのデータセットの経時的, 単独のタイムポイントモデルと比較して, 優れたスコアが得られた。私たちのコードはwww.github.com/MIC-DKFZ/Longitudinal-Difference-Weightingで公開されています。

Accurate segmentation of Multiple Sclerosis (MS) lesions in longitudinal MRI scans is crucial for monitoring disease progression and treatment efficacy. Although changes across time are taken into account when assessing images in clinical practice, most existing deep learning methods treat scans from different timepoints separately. Among studies utilizing longitudinal images, a simple channel-wise concatenation is the primary albeit suboptimal method employed to integrate timepoints. We introduce a novel approach that explicitly incorporates temporal differences between baseline and follow-up scans through a unique architectural inductive bias called Difference Weighting Block. It merges features from two timepoints, emphasizing changes between scans. We achieve superior scores in lesion segmentation (Dice Score, Hausdorff distance) as well as lesion detection (lesion-level $F_1$ score) as compared to state-of-the-art longitudinal and single timepoint models across two datasets. Our code is made publicly available at www.github.com/MIC-DKFZ/Longitudinal-Difference-Weighting.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 超伝導回路の熱分光計

Thermal spectrometer for superconducting circuits ( http://arxiv.org/abs/2409.13417v1 )

ライセンス: Link先を確認

Christoforus Dimas Satrya, Yu-Cheng Chang, Rishabh Upadhyay, Ilari K. Makinen, Joonas T. Peltonen, Bayan Karimi, Jukka P. Pekola,

(参考訳) 超伝導回路は、基本量子現象の研究や量子技術応用のための多用途で制御可能なプラットフォームを提供する。量子回路の状態を読み出し、その特性を特徴づける従来の手法は、高価で複雑な計装を含むrf測定方式に基づいている。本稿では、コプラナー導波路共振器を用いた概念実証実験において、超伝導回路の特性を調べるための熱分光計の簡単なdc測定を実演する。共振器内のマイクロ波光子のごく一部はオンチップボルメータによって吸収され、測定可能な温度上昇をもたらす。このプロセスによる温度計のdc信号のモニタリングにより、共振器の共振周波数とラインシェイプ(品質係数)を決定することができる。実証されたスキームは、単純なdc測定であり、200GHzまでの広帯域を持ち、典型的なrf分光計よりかなり優れている。さらに、熱測定は従来のrf測定とは異なり、ローレンツ吸収信号の高周波数独立基準レベルが得られる。低出力状態では、測定は完全にキャリブレーションフリーである。そこで本手法は,従来の手法よりも多くの点で優れている量子回路の代替分光器を提供する。

Superconducting circuits provide a versatile and controllable platform for studies of fundamental quantum phenomena as well as for quantum technology applications. A conventional technique to read out the state of a quantum circuit or to characterize its properties is based on rf measurement schemes involving costly and complex instrumentation. Here we demonstrate a simple dc measurement of a thermal spectrometer to investigate properties of a superconducting circuit, in this proof-of-concept experiment a coplanar waveguide resonator. A fraction of the microwave photons in the resonator is absorbed by an on-chip bolometer, resulting in a measurable temperature rise. By monitoring the dc signal of the thermometer due to this process, we are able to determine the resonance frequency and the lineshape (quality factor) of the resonator. The demonstrated scheme, which is a simple dc measurement, has a wide band up to 200 GHz, well exceeding that of the typical rf spectrometer. Moreover, the thermal measurement yields a highly frequency independent reference level of the Lorentzian absorption signal, unlike the conventional rf measurement. In the low power regime, the measurement is fully calibration-free. Our technique thus offers an alternative spectrometer for quantum circuits, which is in many ways superior with respect to conventional methods.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 占有ベースのデュアルコントゥーリング

Occupancy-Based Dual Contouring ( http://arxiv.org/abs/2409.13418v1 )

ライセンス: Link先を確認

Jisung Hwang, Minhyuk Sung,

(参考訳) 本稿では,計算時間を数秒で達成しつつ,占有関数の最先端性能を実現する2つのコントゥーリング手法を提案する。本手法は,GPU並列化を最大化するために,学習不要かつ慎重に設計されている。近年の暗黙の神経表現の急激な増加は、占有領域に大きな関心を惹きつけ、その結果、それらに基づく広範囲な3D再構成と生成方法がもたらされた。しかし、そのような手法の出力は、結果として生じる占有関数をメッシュに変換するボトルネックのために過小評価されている。マーチングキューブは階段のような人工物を産み出す傾向があり、その後のほとんどの研究は符号付き距離関数を入力として活用することに焦点を当て、占有関数に対する準最適結果も得る。 Manifold Dual Contouring (MDC) に基づくOccupancy-based Dual Contouring (ODC) を提案する。本研究では,局所表面の正規点と1D点を同時に計算し,二次誤差関数による3D点の同定を支援する補助的2D点を提案する。 1D, 2D, 3Dの点を探索するために, すべての格子縁, 顔, 細胞に並列化可能な高速アルゴリズムを開発した。複数の3次元ニューラル生成モデルと3Dメッシュデータセットを用いた実験により,本手法が先行研究と比較して最高の忠実度を達成できることが実証された。

We introduce a dual contouring method that provides state-of-the-art performance for occupancy functions while achieving computation times of a few seconds. Our method is learning-free and carefully designed to maximize the use of GPU parallelization. The recent surge of implicit neural representations has led to significant attention to occupancy fields, resulting in a wide range of 3D reconstruction and generation methods based on them. However, the outputs of such methods have been underestimated due to the bottleneck in converting the resulting occupancy function to a mesh. Marching Cubes tends to produce staircase-like artifacts, and most subsequent works focusing on exploiting signed distance functions as input also yield suboptimal results for occupancy functions. Based on Manifold Dual Contouring (MDC), we propose Occupancy-Based Dual Contouring (ODC), which mainly modifies the computation of grid edge points (1D points) and grid cell points (3D points) to not use any distance information. We introduce auxiliary 2D points that are used to compute local surface normals along with the 1D points, helping identify 3D points via the quadric error function. To search the 1D, 2D, and 3D points, we develop fast algorithms that are parallelizable across all grid edges, faces, and cells. Our experiments with several 3D neural generative models and a 3D mesh dataset demonstrate that our method achieves the best fidelity compared to prior works.
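The abstract above mentions identifying 3D cell points via the quadric error function (QEF). As a rough illustration of that general idea only, the following Python sketch solves the textbook dual-contouring QEF: given surface sample points `p_i` with normals `n_i`, it finds the point `x` minimizing the sum of squared plane distances `(n_i · (x - p_i))^2`. The function name `qef_minimizer` and the toy inputs are hypothetical and are not taken from the ODC implementation, which additionally derives its points and normals from occupancy values on the GPU.

```python
import numpy as np

def qef_minimizer(points, normals):
    """Return x minimizing sum_i (n_i . (x - p_i))^2 (classic dual-contouring QEF)."""
    points = np.asarray(points, dtype=float)
    normals = np.asarray(normals, dtype=float)
    # Each (point, normal) pair defines a plane n_i . x = n_i . p_i.
    rhs = np.einsum("ij,ij->i", normals, points)
    # Least-squares solve; lstsq returns the minimum-norm solution
    # (relative to the origin) when the normals are rank deficient.
    x, *_ = np.linalg.lstsq(normals, rhs, rcond=None)
    return x

# Toy example: three axis-aligned planes x = 1, y = 2, z = 3.
points = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
normals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vertex = qef_minimizer(points, normals)
```

In this toy case the three planes meet in a single corner, so `vertex` is that corner. Practical implementations usually solve the system relative to a point inside the grid cell so that the minimum-norm solution stays near the cell.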

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 状態空間モデル、出現、エルゴード性:安定した予測には、どれくらいのパラメータが必要か?

State space models, emergence, and ergodicity: How many parameters are needed for stable predictions? ( http://arxiv.org/abs/2409.13421v1 )

ライセンス: Link先を確認

Ingvar Ziemann, Nikolai Matni, George J. Pappas,

(参考訳) 与えられたタスクを実行するために、モデルのパラメータはいくつ必要か? 自己教師付き学習によって事前訓練された大規模言語モデルは、パラメータの数が臨界スケールに達するにつれて、多段階推論のような創発的な能力を示すと論じられている。本研究では,この現象が単純な理論モデルで類似して再現できるかどうかを考察する。本稿では,線形力学系(自制学習の単純な例)の学習の問題点が相転移を示すことを示す。すなわち、すべての非エルゴード線形系に対して、学習者がそのしきい値より少ないパラメータを使用すると、大きなシーケンス長の有界誤差を達成できないような臨界しきい値が存在する。異なることに、我々のモデルでは、かなりの長距離相関を示すタスクにはパラメータ(出現に類似した現象)の臨界数が必要であり、学習者のパラメトリゼーションの役割についても検討し、隠れ状態を持つ線形力学系の単純なバージョン($\mathbb{R}$の不完全なランダムウォーク)を考える。この状況に対して,フィルタ長が有効メモリ長と水平線に依存する一定の閾値を超えない限り,ランダムウォークを円滑に学習できる線形フィルタを用いた学習者が存在しないことを示す。

How many parameters are required for a model to execute a given task? It has been argued that large language models, pre-trained via self-supervised learning, exhibit emergent capabilities such as multi-step reasoning as their number of parameters reaches a critical scale. In the present work, we explore whether this phenomenon can analogously be replicated in a simple theoretical model. We show that the problem of learning linear dynamical systems -- a simple instance of self-supervised learning -- exhibits a corresponding phase transition. Namely, for every non-ergodic linear system there exists a critical threshold such that a learner using fewer parameters than said threshold cannot achieve bounded error for large sequence lengths. Put differently, in our model we find that tasks exhibiting substantial long-range correlation require a certain critical number of parameters -- a phenomenon akin to emergence. We also investigate the role of the learner's parametrization and consider a simple version of a linear dynamical system with hidden state -- an imperfectly observed random walk in $\mathbb{R}$. For this situation, we show that there exists no learner using a linear filter which can successfully learn the random walk unless the filter length exceeds a certain threshold depending on the effective memory length and horizon of the problem.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 未知環境におけるロボットダイナミクスの最適化のための因果強化学習

Causal Reinforcement Learning for Optimisation of Robot Dynamics in Unknown Environments ( http://arxiv.org/abs/2409.13423v1 )

ライセンス: Link先を確認

Julian Gerald Dcruz, Sam Mahoney, Jia Yun Chua, Adoundeth Soukhabandith, John Mugabe, Weisi Guo, Miguel Arana-Catania,

(参考訳) 未知の環境におけるロボットの自律的な操作は、物体の運動可能性のような相互作用のダイナミクスの知識が不足しているため、困難である。本研究は,ロボット操作の強化を目的とした,新たな因果強化学習手法を導入し,都市検索・救助(SAR)シナリオに適用する。提案した機械学習アーキテクチャにより、ロボットは、テクスチャや形状などの物体の視覚的特徴と、その動作性などの相互作用における物体のダイナミクスとの間の因果関係を学習し、意思決定プロセスを大幅に改善することができる。我々は因果的発見とRL実験を行い、因果的RLの優れた性能を実証し、非因果的モデルと比較して、複雑な状況下での学習時間を24.5%以上減少させた。

Autonomous operations of robots in unknown environments are challenging due to the lack of knowledge of the dynamics of the interactions, such as the objects' movability. This work introduces a novel Causal Reinforcement Learning approach to enhancing robotics operations and applies it to an urban search and rescue (SAR) scenario. Our proposed machine learning architecture enables robots to learn the causal relationships between the visual characteristics of the objects, such as texture and shape, and the objects' dynamics upon interaction, such as their movability, significantly improving their decision-making processes. We conducted causal discovery and RL experiments demonstrating the Causal RL's superior performance, showing a notable reduction in learning times by over 24.5% in complex situations, compared to non-causal models.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# HMD$^2$:単一エゴセントリックヘッドマウントデバイスによる環境認識運動生成

HMD$^2$: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device ( http://arxiv.org/abs/2409.13426v1 )

ライセンス: Link先を確認

Vladimir Guzov, Yifeng Jiang, Fangzhou Hong, Gerard Pons-Moll, Richard Newcombe, C. Karen Liu, Yuting Ye, Lingni Ma,

(参考訳) 本稿では,外向きカラーカメラと視覚SLAM機能を備えた頭部装着装置を用いて,リアルな全身動作のオンライン生成について検討する。本稿では, 運動再構成と生成のバランスをとるための新しいシステム HMD$^2$ を導入する。再建の観点から,本システムは,頭部運動,SLAM点雲,画像埋め込みなどの解析的特徴と学習的特徴の両方を最大限に活用することを目的としている。生成面では、HMD$^2$はマルチモーダルな条件付き運動拡散モデルを採用し、生成した動きの時間的コヒーレンスを維持するために時系列バックボーンを組み込んでおり、自動回帰インペイントを用いて、最小レイテンシ(0.17秒)でオンライン動作推論を容易にする。集合的に、我々のシステムは、公開可能なスマートグラスを用いて、広範囲の屋内および屋外環境において収集された200時間を超える広範囲なデータセットにスケール可能な、非常に効果的で堅牢なソリューションを提供していることを実証した。

This paper investigates the online generation of realistic full-body human motion using a single head-mounted device with an outward-facing color camera and the ability to perform visual SLAM. Given the inherent ambiguity of this setup, we introduce a novel system, HMD$^2$, designed to balance between motion reconstruction and generation. From a reconstruction standpoint, our system aims to maximally utilize the camera streams to produce both analytical and learned features, including head motion, SLAM point cloud, and image embeddings. On the generative front, HMD$^2$ employs a multi-modal conditional motion Diffusion model, incorporating a time-series backbone to maintain temporal coherence in generated motions, and utilizes autoregressive in-painting to facilitate online motion inference with minimal latency (0.17 seconds). Collectively, we demonstrate that our system offers a highly effective and robust solution capable of scaling to an extensive dataset of over 200 hours collected in a wide range of complex indoor and outdoor environments using publicly available smart glasses.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# テキスト対応マスド画像モデリングによるシーンテキスト除去のためのテキストローカライゼーションの活用

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling ( http://arxiv.org/abs/2409.13431v1 )

ライセンス: Link先を確認

Zixiao Wang, Hongtao Xie, YuXin Wang, Yadong Qu, Fengjun Guo, Pengwei Liu,

(参考訳) 既存のシーンテキスト削除(STR)タスクは、高価なピクセルレベルのラベリングのため、トレーニングデータ不足に悩まされる。本稿では,低コストなテキスト検出ラベル付きSTRモデル(テキスト境界ボックスなど)を事前学習可能なテキスト対応マスク付き画像モデリングアルゴリズム(TMIM)を導入することで,この問題に対処することを目的とする。間接的補助的タスクのみを用いて暗黙的特徴抽出能力を高める従来の事前訓練方法とは異なり、TMIMではまずSTRタスクを弱教師付きで直接訓練し、STRの知識を明確かつ効率的に探索する。 TMIMでは、まず背景モデリングストリームを構築し、マスクされた非テキスト領域を復元することで背景生成規則を学習する。一方、マスクされたテキスト領域に擬似STRラベルを提供する。次に、擬似ラベルから学習し、そのモデルにエンドツーエンドのSTR能力を持たせるために、テキスト消去ストリームを提案する。 2つのコラボレーティブストリームから恩恵を受けながら、私たちのSTRモデルは、高コストSTRラベルの制限を大幅に軽減する、公開テキスト検出データセットでのみ、素晴らしいパフォーマンスを達成できます。実験により,本手法は他のプレトレイン法よりも優れ,最先端性能(SCUT-EnsTextの37.35 PSNR)が得られた。コードはhttps://github.com/wzx99/TMIMで入手できる。

The existing scene text removal (STR) task suffers from insufficient training data due to expensive pixel-level labeling. In this paper, we aim to address this issue by introducing a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels (e.g., text bounding boxes). Unlike previous pretraining methods that use indirect auxiliary tasks only to enhance the implicit feature extraction ability, our TMIM is the first to enable the STR task to be trained directly in a weakly supervised manner, which explores the STR knowledge explicitly and efficiently. In TMIM, first, a Background Modeling stream is built to learn background generation rules by recovering the masked non-text region. Meanwhile, it provides pseudo STR labels on the masked text region. Second, a Text Erasing stream is proposed to learn from the pseudo labels and equip the model with end-to-end STR ability. Benefiting from the two collaborative streams, our STR model can achieve impressive performance using only public text detection datasets, which greatly alleviates the limitation of the high-cost STR labels. Experiments demonstrate that our method outperforms other pretraining methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText). Code will be available at https://github.com/wzx99/TMIM.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 摂動高次例外点の固有値スペクトルに対するグラフ理論的アプローチ

Graph-theoretical approach to the eigenvalue spectrum of perturbed higher-order exceptional points ( http://arxiv.org/abs/2409.13434v1 )

ライセンス: Link先を確認

Daniel Grom, Julius Kullig, Malte Röntgen, Jan Wiersig,

(参考訳) 例外点はパラメータ空間の特別な縮退点であり、開量子および波動系を記述する(効果的)非エルミート・ハミルトニアンにおいて生じる。 n階の例外点において、n 個の固有値と対応する固有ベクトルが同時に結合する。これらの結合固有値は、センサ応用に有用な小さな摂動に対して強い応答を示すのが一般的である。強度$\epsilon$のいわゆる一般摂動は、$\epsilon$のn番目の根に比例する固有値を変化させる。摂動下での異なる固有値の振る舞いは非GA(non-generic)と呼ばれる。様々な種類の摂動に対する固有値の振る舞いの理解は望ましいものであり、応用にも不可欠である。我々は、高次例外点の固有値スペクトル、すなわち n > 2 に対する摂動効果の理解に寄与するグラフ理論的視点を提唱する。半無限導波路と端ミラーを結合したマイクロリングのシステムについて,非遺伝的摂動の関連性を強調し,その発生を解釈する。さらに、そのようなシステムにおいて空洞選択的センシングに生じる飽和効果は、グラフ理論図の中で自然に説明される。

Exceptional points are special degeneracy points in parameter space that can arise in (effective) non-Hermitian Hamiltonians describing open quantum and wave systems. At an n-th order exceptional point, n eigenvalues and the corresponding eigenvectors simultaneously coalesce. These coalescing eigenvalues typically exhibit a strong response to small perturbations which can be useful for sensor applications. A so-called generic perturbation with strength $\epsilon$ changes the eigenvalues proportional to the n-th root of $\epsilon$. A different eigenvalue behavior under perturbation is called non-generic. An understanding of the behavior of the eigenvalues for various types of perturbations is desirable and also crucial for applications. We advocate a graph-theoretical perspective that contributes to the understanding of perturbative effects on the eigenvalue spectrum of higher-order exceptional points, i.e. n > 2. To highlight the relevance of non-generic perturbations and to give an interpretation for their occurrence, we consider an illustrative example, a system of microrings coupled by a semi-infinite waveguide with an end mirror. Furthermore, the saturation effect occurring for cavity-selective sensing in such a system is naturally explained within the graph-theoretical picture.
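The scaling statements in the abstract above can be checked numerically on a toy model. The Python sketch below is an illustration under stated assumptions, not the microring system of the paper: it takes a 3x3 Jordan block as a stand-in Hamiltonian at a third-order exceptional point and adds a single perturbation entry of strength `eps`. Placing the entry in the corner that closes the coupling chain into a cycle gives eigenvalue shifts of order `eps**(1/3)` (generic), while placing it elsewhere gives only `eps**(1/2)` (non-generic). The helper name `perturbed_jordan_block` is hypothetical.

```python
import numpy as np

def perturbed_jordan_block(n, eps, row, col):
    """n x n nilpotent Jordan block (EP of order n at eigenvalue 0) plus one perturbation entry."""
    h = np.diag(np.ones(n - 1), k=1).astype(complex)
    h[row, col] += eps
    return h

n, eps = 3, 1e-6

# Generic perturbation: the entry (n-1, 0) closes the chain 0 -> 1 -> 2 into a cycle,
# so the characteristic polynomial becomes lambda^n - eps.
generic = np.linalg.eigvals(perturbed_jordan_block(n, eps, n - 1, 0))

# Non-generic perturbation: the entry (1, 0) only closes a 2-cycle,
# so the characteristic polynomial becomes lambda * (lambda^2 - eps).
non_generic = np.linalg.eigvals(perturbed_jordan_block(n, eps, 1, 0))

generic_shift = np.max(np.abs(generic))          # of order eps**(1/3)
non_generic_shift = np.max(np.abs(non_generic))  # of order eps**(1/2)
```

Reading the matrix as a directed graph, the length of the cycle that the perturbation closes sets the root exponent in this toy case, which is the kind of reasoning a graph-theoretical picture makes visible.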

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 生成モデルを用いたダウン症候群の脳バイオマーカーの発見に向けて

Towards the Discovery of Down Syndrome Brain Biomarkers Using Generative Models ( http://arxiv.org/abs/2409.13437v1 )

ライセンス: Link先を確認

Jordi Malé, Juan Fortea, Mateus Rozalem Aranha, Yann Heuzé, Neus Martínez-Abadías, Xavier Sevillano,

(参考訳) 脳イメージングにより、神経科学者は、ダウン症候群、認知障害や記憶障害の神経解剖学的基盤を解明するための関心領域の特定など、遺伝や神経発達障害の脳形態を分析できるようになった。しかし、脳解剖学、認知能力、アルツハイマー病などの合併症の関連性はまだダウン症候群の集団ではよく分かっていない。人工知能の最新の進歩は、大量の脳磁気共鳴イメージングスキャンを解析する自動ツールを開発する機会となり、手動解析のボトルネックを克服する。本研究では、アルツハイマー病による神経変性の度合いに影響を及ぼすダウン症候群患者の脳変化を検出するための生成モデルを提案する。そこで我々は,脳磁気共鳴画像スキャンの独自のデータセットを活用し,変分オートエンコーダと拡散モデルに基づく最先端の脳異常検出モデルの評価を行った。総合的な評価プロセスの後、本研究はいくつかの重要な分析を含む。まず,神経放射線学の専門家による質的評価を行った。第2に, 生成モデルに対する定量的および定性的再構成忠実度調査を行った。第3に,ヒストグラムのポストプロセッシングがモデル性能をいかに向上させるかを検討するため,アブレーション試験を行った。最後に,皮質下構造の定量的体積解析を行った。以上の結果より,ダウン症候群の脳解剖を特徴付ける一次変化,小脳小脳,拡大した心室,大脳皮質の縮小,およびアルツハイマー病による頭頂葉の変化を効果的に検出できるモデルがあることが示唆された。

Brain imaging has allowed neuroscientists to analyze brain morphology in genetic and neurodevelopmental disorders, such as Down syndrome, pinpointing regions of interest to unravel the neuroanatomical underpinnings of cognitive impairment and memory deficits. However, the connections between brain anatomy, cognitive performance and comorbidities like Alzheimer's disease are still poorly understood in the Down syndrome population. The latest advances in artificial intelligence constitute an opportunity for developing automatic tools to analyze large volumes of brain magnetic resonance imaging scans, overcoming the bottleneck of manual analysis. In this study, we propose the use of generative models for detecting brain alterations in people with Down syndrome affected by various degrees of neurodegeneration caused by Alzheimer's disease. To that end, we evaluate state-of-the-art brain anomaly detection models based on Variational Autoencoders and Diffusion Models, leveraging a proprietary dataset of brain magnetic resonance imaging scans. Following a comprehensive evaluation process, our study includes several key analyses. First, we conducted a qualitative evaluation by expert neuroradiologists. Second, we performed both quantitative and qualitative reconstruction fidelity studies for the generative models. Third, we carried out an ablation study to examine how the incorporation of histogram post-processing can enhance model performance. Finally, we executed a quantitative volumetric analysis of subcortical structures. Our findings indicate that some models effectively detect the primary alterations characterizing Down syndrome's brain anatomy, including a smaller cerebellum, enlarged ventricles, and cerebral cortex reduction, as well as the parietal lobe alterations caused by Alzheimer's disease.

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 脳波代表学習のための個人用マルチモーダルラプラシアンドロップアウト(DP-MLD)

Differentially Private Multimodal Laplacian Dropout (DP-MLD) for EEG Representative Learning ( http://arxiv.org/abs/2409.13440v1 )

ライセンス: Link先を確認

Xiaowen Fu, Bingxin Wang, Xinzhou Guo, Guoqing Liu, Yang Xiang,

(参考訳) 近年,マルチモーダル脳波(EEG)学習は,疾患検出において大きな可能性を秘めている。同時に、法的・倫理的な懸念から、臨床研究におけるプライバシーの確保がますます重要になっている。プライバシー保護のための広く採用されているスキームは、その明確な解釈と実装の容易さのため、差分プライバシー(DP)である。 DP下では数多くの手法が提案されているが、モデルや信号データの複雑さのため、マルチモーダル脳波データについては広く研究されていない。本稿では,マルチモーダル脳波学習のためのDP-MLD方式を提案する。本稿では,言語モデルによる脳波データをテキストとして処理し,視覚変換器による脳波データを画像として処理する多モーダル代表学習モデルを提案する。 DPを実現するために,プライバシ予算内でランダム度割り当てと性能を動的に最適化する新しい適応型機能レベルのラプラシアンドロップアウト方式を設計する。パーキンソン病(PD)におけるフリーズ・オブ・ゲイト(FoG)のオープンソースマルチモーダルデータセットの実験において,提案手法は分類精度をおよそ4倍改善し,DP下でのマルチモーダル脳波学習における最先端性能を実現する。

Recently, multimodal electroencephalogram (EEG) learning has shown great promise in disease detection. At the same time, ensuring privacy in clinical studies has become increasingly crucial due to legal and ethical concerns. One widely adopted scheme for privacy protection is differential privacy (DP) because of its clear interpretation and ease of implementation. Although numerous methods have been proposed under DP, it has not been extensively studied for multimodal EEG data due to the complexities of models and signal data considered there. In this paper, we propose a novel Differentially Private Multimodal Laplacian Dropout (DP-MLD) scheme for multimodal EEG learning. Our approach proposes a novel multimodal representative learning model that processes EEG data by language models as text and other modal data by vision transformers as images, incorporating well-designed cross-attention mechanisms to effectively extract and integrate cross-modal features. To achieve DP, we design a novel adaptive feature-level Laplacian dropout scheme, where randomness allocation and performance are dynamically optimized within given privacy budgets. In experiments on an open-source multimodal dataset of Freezing of Gait (FoG) in Parkinson's Disease (PD), our proposed method demonstrates an approximate 4% improvement in classification accuracy and achieves state-of-the-art performance in multimodal EEG learning under DP.
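For readers unfamiliar with the building block behind Laplacian schemes such as the one described above, the following Python sketch shows the standard Laplace mechanism only: noise with scale `sensitivity / epsilon` is added to a released quantity. It is not the adaptive feature-level dropout of DP-MLD; the function names and the clipped-mean example are assumptions chosen for illustration.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` with epsilon-DP by adding Laplace(sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return np.asarray(value, dtype=float) + rng.laplace(0.0, scale, size=np.shape(value))

def private_mean(samples, lower, upper, epsilon, rng):
    """Differentially private mean of samples clipped to [lower, upper]."""
    clipped = np.clip(np.asarray(samples, dtype=float), lower, upper)
    # Replacing one sample changes the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / clipped.size
    return laplace_mechanism(clipped.mean(), sensitivity, epsilon, rng)

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=1000)
true_mean = float(np.clip(samples, 0.0, 1.0).mean())
noisy_mean = float(private_mean(samples, 0.0, 1.0, epsilon=1.0, rng=rng))
# A very large epsilon (weak privacy) makes the noise negligible.
almost_exact = float(private_mean(samples, 0.0, 1.0, epsilon=1e6, rng=rng))
```

Smaller `epsilon` means a stronger privacy guarantee and proportionally larger noise, which is the trade-off that a privacy budget controls.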

翻訳日:2024-11-07 07:17:49 公開日:2024-09-20

# 自然言語入力による階層学習を用いた検索・救助における選択的探索と情報収集

Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input ( http://arxiv.org/abs/2409.13445v1 )

ライセンス: Link先を確認

Dimitrios Panagopoulos, Adolfo Perrusquia, Weisi Guo,

(参考訳) 近年、ロボットと自律システムは私たちの日常生活にますます不可欠なものとなり、様々な領域にまたがる複雑な問題に対する解決策を提供してきた。しかし、SAR(Search and rescue)オペレーションにおけるそれらの応用は、ユニークな課題を提示している。災害に遭った地域を網羅的に探索することは、地形の広さ、変化する環境、そして関連する時間的制約のためにしばしば実現不可能である。従来のロボットシステムは、事前に定義された探索パターンで動作し、人間の利害関係者が提供する真実を取り入れ、活用する能力が欠如している。このギャップに対処するため,大規模言語モデル(LLM)と階層的強化学習(HRL)フレームワークを連携させるシステムを導入する。提案システムは,人間の利害関係者からの言語入力を実用的なRLインサイトへ翻訳し,検索戦略を調整するように設計されている。 LLMによる人為的情報の利用とHRLによるタスク実行の構造化により、我々のアプローチは自律能力と人間の知能のギャップを埋めるだけでなく、長い地平線とスパース報酬によって特徴づけられる環境におけるエージェントの学習効率と意思決定プロセスを大幅に改善する。

In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring the disaster-stricken area is often infeasible due to the vastness of the terrain, transformed environment, and the time constraints involved. Traditional robotic systems typically operate on predefined search patterns and lack the ability to incorporate and exploit ground truths provided by human stakeholders, which can be the key to speeding up the learning process and enhancing triage. Addressing this gap, we introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework. The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy. By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach not only bridges the gap between autonomous capabilities and human intelligence but also significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# Minstrel: 非AIエキスパートのためのマルチエージェントコーディネーションによる構造的プロンプト生成

Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts ( http://arxiv.org/abs/2409.13449v1 )

ライセンス: Link先を確認

Ming Wang, Yuanzhong Liu, Xiaoyu Liang, Yijie Huang, Daling Wang, Xiaocui Yang, Sijia Shen, Shi Feng, Xiaoming Zhang, Chaofeng Guan, Yifei Zhang,

(参考訳) LLMは様々な領域にまたがって高い性能を示してきた。それでも、彼らの仕事を助けるための高品質なプロンプトを定式化することは、非AI専門家にとって挑戦となる。プロンプトエンジニアリングにおける既存の研究は、幾らか分散した最適化原則と設計が経験的に依存したプロンプトオプティマイザを示唆している。残念なことに、これらの取り組みには構造的な設計がなく、高い学習コストが発生しており、特にAIの専門家以外の人々にとって、プロンプトの反復的な更新には適していない。構造的再利用可能なプログラミング言語に着想を得て,構造的プロンプト設計フレームワークであるLangGPTを提案する。さらに、構造的プロンプトの自動生成を実現するために、リフレクションを備えた多世代エージェントであるMinstrelを導入する。実験とケーススタディにより,ミンストレルが生成した構造的プロンプトや手書きによるLLMの性能向上が明らかに示された。さらに,オンラインコミュニティにおけるユーザ調査を通じて,構造的プロンプトの使いやすさを分析した。

LLMs have demonstrated commendable performance across diverse domains. Nevertheless, formulating high-quality prompts to assist them in their work poses a challenge for non-AI experts. Existing research in prompt engineering suggests somewhat scattered optimization principles and designs empirically dependent prompt optimizers. Unfortunately, these endeavors lack a structural design, which incurs high learning costs and hinders the iterative updating of prompts, especially for non-AI experts. Inspired by structured reusable programming languages, we propose LangGPT, a structural prompt design framework. Furthermore, we introduce Minstrel, a multi-generative agent system with reflection to automate the generation of structural prompts. Experiments and the case study illustrate that structural prompts generated by Minstrel or written manually significantly enhance the performance of LLMs. Furthermore, we analyze the ease of use of structural prompts through a user survey in our online community.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# ノイズに頑健でリソース効率の良いADMMベースのフェデレーテッドラーニング

Noise-Robust and Resource-Efficient ADMM-based Federated Learning ( http://arxiv.org/abs/2409.13451v1 )

ライセンス: Link先を確認

Ehsan Lari, Reza Arablouei, Vinay Chakravarthi Gogineni, Stefan Werner,

(参考訳) フェデレートラーニング(FL)は、クライアントサーバ通信を活用して、分散データ上でグローバルモデルをトレーニングする。しかし、通信ノイズやエラーはモデルの精度を損なう可能性がある。この問題に対処するために,通信負荷を低減しつつ,通信騒音に対する堅牢性を高める新しいFLアルゴリズムを提案する。本稿では,重み付き最小二乗回帰問題(WLS)を具体例として,提案アルゴリズムを導出する。本稿では,分散凸最適化問題としてのWLS回帰を,ランダムスケジューリングを用いた分散ネットワーク上での分散凸最適化問題として,通信効率の向上を目的とした。次に、この問題を反復的に解くために乗算器の交互方向法(ADMM)を適用する。累積的な通信雑音による有害な影響を抑えるため,両変数を排除し,各クライアントで新たなローカルモデル更新を実装することで,鍵となる修正を導入する。この微妙ながら効果的な変更により、各クライアントで2つではなく1つのノイズの多いグローバルモデル更新を使用することで、追加的な通信ノイズに対する堅牢性が改善される。さらに、サーバに選択されていなくてもクライアントがローカル更新を継続できるように、また別の修正を加えて、大幅なパフォーマンス改善を実現しました。我々の理論解析は,サーバが各繰り返しにおけるノイズの多いリンク上でクライアントのランダムなサブセットと通信する場合でも,平均および平均2乗感覚におけるアルゴリズムの収束を確認している。その結果,提案アルゴリズムの有効性を検証し,理論的知見を裏付ける結果を得た。

Federated learning (FL) leverages client-server communications to train global models on decentralized data. However, communication noise or errors can impair model accuracy. To address this problem, we propose a novel FL algorithm that enhances robustness against communication noise while also reducing communication load. We derive the proposed algorithm through solving the weighted least-squares (WLS) regression problem as an illustrative example. We first frame WLS regression as a distributed convex optimization problem over a federated network employing random scheduling for improved communication efficiency. We then apply the alternating direction method of multipliers (ADMM) to iteratively solve this problem. To counteract the detrimental effects of cumulative communication noise, we introduce a key modification by eliminating the dual variable and implementing a new local model update at each participating client. This subtle yet effective change results in using a single noisy global model update at each client instead of two, improving robustness against additive communication noise. Furthermore, we incorporate another modification enabling clients to continue local updates even when not selected by the server, leading to substantial performance improvements. Our theoretical analysis confirms the convergence of our algorithm in both the mean and mean-square senses, even when the server communicates with a random subset of clients over noisy links at each iteration. Numerical results validate the effectiveness of our proposed algorithm and corroborate our theoretical findings.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
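As a point of reference for the entry above, here is a minimal sketch of textbook consensus ADMM applied to a weighted least-squares problem split across clients. It is not the authors' modified algorithm: it keeps the dual variables, assumes noise-free links and full client participation, and fits a single scalar parameter. The names `admm_wls`, `central_wls`, `clients`, and `rho`, as well as the toy data, are illustrative assumptions.

```python
def admm_wls(clients, rho=3.0, iters=200):
    """Textbook consensus ADMM for scalar weighted least squares.

    clients: list of per-client datasets; each dataset is a list of
    (a, y, w) tuples meaning "fit y ~ a * theta with weight w".
    Assumption: noise-free links and every client participates each round.
    """
    # Per-client sufficient statistics: sum(w*a*a) and sum(w*a*y).
    stats = []
    for data in clients:
        saa = sum(w * a * a for a, y, w in data)
        say = sum(w * a * y for a, y, w in data)
        stats.append((saa, say))

    n = len(clients)
    z = 0.0          # global (server) model
    u = [0.0] * n    # scaled dual variable of each client
    for _ in range(iters):
        # Local step: closed-form minimizer of the augmented Lagrangian.
        x = [(say + rho * (z - u[k])) / (saa + rho)
             for k, (saa, say) in enumerate(stats)]
        # Server step: average of local models plus scaled duals.
        z = sum(x[k] + u[k] for k in range(n)) / n
        # Dual step: accumulate each client's disagreement with the server.
        u = [u[k] + x[k] - z for k in range(n)]
    return z


def central_wls(clients):
    """Closed-form WLS solution on the pooled data, for comparison."""
    rows = [row for data in clients for row in data]
    return (sum(w * a * y for a, y, w in rows)
            / sum(w * a * a for a, y, w in rows))


# Hypothetical toy data: three clients, (a, y, w) per sample.
clients = [
    [(1.0, 2.1, 1.0), (2.0, 3.9, 0.5)],
    [(1.5, 3.2, 2.0)],
    [(0.5, 0.9, 1.0), (3.0, 6.3, 0.25)],
]
theta_admm = admm_wls(clients)
theta_central = central_wls(clients)
```

In this baseline each client consumes the global model `z` and also carries the dual variable `u[k]`; the abstract describes removing the dual variable so that only one noisy global update enters each local step.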

# 機械学習におけるパラメータ推定のためのランク1格子を用いたデータ圧縮

Data Compression using Rank-1 Lattices for Parameter Estimation in Machine Learning ( http://arxiv.org/abs/2409.13453v1 )

ライセンス: Link先を確認

Michael Gnewuch, Kumar Harsha, Marcin Wnuk,

(参考訳) 平均二乗誤差とその正則化版は、教師付き機械学習における標準的な損失関数である。しかし、大規模なデータセットに対してこれらの損失を計算することは、計算コストが高くなりうる。 J. Dick と M. Feischl [Journal of Complexity 67 (2021)] のアプローチを修正し、ランク1格子を用いて大規模なデータセットをより小さなサイズに縮小するアルゴリズムを提案する。ランク1格子は準モンテカルロ(QMC)点集合であり、慎重に選択すれば多次元単位立方体内によく分布する。前処理ステップの圧縮戦略は、すべての格子点に対して、元のデータと応答に依存する一対の重みを割り当て、その相対的な重要性を表す。その結果、圧縮されたデータにより、最適化ステップにおける反復的な損失計算がはるかに高速になる。我々は、QMCデータ圧縮アルゴリズムの誤差と前処理ステップのコストを、フーリエ係数が十分に速く減衰し、ある種のウィーナー代数やコロボフ空間に属する関数に対して解析する。特に、関数が十分に滑らかである限り、我々のアプローチが任意に高い収束率をもたらしうることを証明する。

The mean squared error and regularized versions of it are standard loss functions in supervised machine learning. However, calculating these losses for large data sets can be computationally demanding. Modifying an approach of J. Dick and M. Feischl [Journal of Complexity 67 (2021)], we present algorithms to reduce extensive data sets to a smaller size using rank-1 lattices. Rank-1 lattices are quasi-Monte Carlo (QMC) point sets that are, if carefully chosen, well-distributed in a multidimensional unit cube. The compression strategy in the preprocessing step assigns every lattice point a pair of weights depending on the original data and responses, representing its relative importance. As a result, the compressed data makes iterative loss calculations in optimization steps much faster. We analyze the errors of our QMC data compression algorithms and the cost of the preprocessing step for functions whose Fourier coefficients decay sufficiently fast so that they lie in certain Wiener algebras or Korobov spaces. In particular, we prove that our approach can lead to arbitrarily high convergence rates as long as the functions are sufficiently smooth.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
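To make the term "rank-1 lattice" in the entry above concrete, the sketch below generates the point set {i·z/n mod 1} and uses it for plain QMC integration of a smooth periodic function. It only illustrates the point set, not the paper's compression weights. The Fibonacci choice n = 89, z = (1, 55) and the test function `f` are assumptions picked for illustration.

```python
import math


def rank1_lattice(n, z):
    """Return the n points (i * z / n) mod 1, i = 0..n-1, of a rank-1 lattice."""
    return [tuple(((i * zj) % n) / n for zj in z) for i in range(n)]


def f(x):
    # Smooth, periodic test function; its exact integral over [0,1)^2 is 1.
    return (1 + math.cos(2 * math.pi * x[0])) * (1 + math.cos(2 * math.pi * x[1]))


# Fibonacci lattice in two dimensions: n = 89 points, generating vector (1, 55).
points = rank1_lattice(89, (1, 55))

# Equal-weight QMC estimate of the integral of f over the unit square.
estimate = sum(f(p) for p in points) / len(points)
```

For this particular trigonometric polynomial none of its nonzero frequencies h satisfies h·z ≡ 0 (mod 89), so the lattice rule reproduces the integral up to floating-point rounding. For general smooth periodic functions the error depends on how quickly the Fourier coefficients decay, which is the setting the abstract analyzes.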

# コンピュータビジョンにおける概念に基づく説明:我々はどこにいてどこへ行くのか?

Concept-Based Explanations in Computer Vision: Where Are We and Where Could We Go? ( http://arxiv.org/abs/2409.13456v1 )

ライセンス: Link先を確認

Jae Hee Lee, Georgii Mikriukov, Gesina Schwalbe, Stefan Wermter, Diedrich Wolter,

(参考訳) ニューラルビジョンモデルを説明する概念ベースのXAI(C-XAI)アプローチは、有望な研究分野である。概念(画像中の意味的に意味のある部分)を参照する説明は直感的に理解でき、関連領域を明らかにするだけのサリエンシーベースの手法を超えるものだからである。近年のこの分野の顕著な進歩を考えると、コミュニティが進歩と傾向を批判的に見直す時期に来ている。そこで本稿では,C-XAI手法をレビューして興味深い未探索領域を特定し,今後の研究方向性を提案する。この目的のために、説明すべき概念の選択、概念表現の選択、概念の制御方法という3つの主な方向を考える。最後の点については,手法を提案するとともに知識表現と学習の分野から着想を得て,これが今後のC-XAI研究をいかに豊かにしうるかを示す。

Concept-based XAI (C-XAI) approaches to explaining neural vision models are a promising field of research, since explanations that refer to concepts (i.e., semantically meaningful parts in an image) are intuitive to understand and go beyond saliency-based techniques that only reveal relevant regions. Given the remarkable progress in this field in recent years, it is time for the community to take a critical look at the advances and trends. Consequently, this paper reviews C-XAI methods to identify interesting and underexplored areas and proposes future research directions. To this end, we consider three main directions: the choice of concepts to explain, the choice of concept representation, and how we can control concepts. For the latter, we propose techniques and draw inspiration from the field of knowledge representation and learning, showing how this could enrich future C-XAI research.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# Facebook URLデータセットにおける時間的エンゲージメント、コンテンツ品質、イデオロギー

Engagement, Content Quality and Ideology over Time on the Facebook URL Dataset ( http://arxiv.org/abs/2409.13461v1 )

ライセンス: Link先を確認

Emma Fraxanet, Fabrizio Germano, Andreas Kaltenbrunner, Vicenç Gómez,

(参考訳) ソーシャルメディア利用者のイデオロギーとオンラインニュース消費の関係を解明することで、ユーザのエンゲージメント行動とレコメンダシステムのコンテンツ提供との間のフィードバックループに関する重要な洞察が得られる。しかしながら、プラットフォームに起因する影響から固有のユーザ行動を切り離すことは、特に限られた期間しかカバーしないデータセットを扱う場合、大きな課題となる。本研究では、Facebook Privacy-Protected Full URLs Datasetを用いて集計分析と縦断分析の両方を行い、2017年1月から2020年12月までの米国におけるニュースURLに関連するユーザエンゲージメント指標を調査する。ニュースソースのイデオロギー的傾向と品質を,ユーザの政治的嗜好と合わせて取り入れることで,リベラル,保守,中道の読者それぞれについて,ニュース消費のイデオロギーと品質の重み付き平均を構築する。これにより、(i)リベラル派と保守派のイデオロギー的ギャップ、および(ii)各グループのニュース消費の平均品質の推移を追跡できる。これらの指標は、分極化や誤情報といったより広い現象と関連している。両指標の傾向には2つの大きな変化があり,いずれもユーザエンゲージメントの変化と同時に起きている。興味深いことに,どちらの変曲点でもイデオロギー的ギャップが拡大し,ニュース品質が低下するが,エンゲージメントは第1点以降に増加し,第2点以降に減少する。最後に、Facebookのニュースフィードアルゴリズムの2つの主要なアップデートとの潜在的な関係について議論することで、これらの変化を文脈化する。

Unpacking the relationship between the ideology of social media users and their online news consumption offers critical insight into the feedback loop between users' engagement behavior and the recommender systems' content provision. However, disentangling inherent user behavior from platform-induced influences poses significant challenges, particularly when working with datasets covering limited time periods. In this study, we conduct both aggregate and longitudinal analyses using the Facebook Privacy-Protected Full URLs Dataset, examining user engagement metrics related to news URLs in the U.S. from January 2017 to December 2020. By incorporating the ideological alignment and quality of news sources, along with users' political preferences, we construct weighted averages of ideology and quality of news consumption for liberal, conservative, and moderate audiences. This allows us to track the evolution of (i) the ideological gap between liberals and conservatives and (ii) the average quality of each group's news consumption. These metrics are linked to broader phenomena such as polarization and misinformation. We identify two significant shifts in trends for both metrics, each coinciding with changes in user engagement. Interestingly, during both inflection points, the ideological gap widens and news quality declines; however, engagement increases after the first one and decreases after the second. Finally, we contextualize these changes by discussing their potential relation to two major updates to Facebook's News Feed algorithm.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20
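The weighted averages mentioned in the entry above can be sketched as follows. The abstract does not specify the weighting scheme, so this example assumes each news source's ideology and quality scores are weighted by the engagement it receives from a given audience. The field names, score ranges, and numbers are all hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted average; raises if the weights sum to zero."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical per-source rows: ideology in [-1, 1] (negative = liberal),
# quality in [0, 1], and engagement counts broken down by audience.
rows = [
    {"ideology": -0.6, "quality": 0.9, "liberal": 120, "conservative": 10},
    {"ideology": 0.7, "quality": 0.4, "liberal": 15, "conservative": 200},
    {"ideology": 0.1, "quality": 0.8, "liberal": 60, "conservative": 50},
]


def audience_metrics(rows, audience):
    """Engagement-weighted mean ideology and quality for one audience."""
    weights = [r[audience] for r in rows]
    ideology = weighted_mean([r["ideology"] for r in rows], weights)
    quality = weighted_mean([r["quality"] for r in rows], weights)
    return ideology, quality


lib_ideology, lib_quality = audience_metrics(rows, "liberal")
cons_ideology, cons_quality = audience_metrics(rows, "conservative")

# Ideological gap between the two audiences at this time step.
gap = cons_ideology - lib_ideology
```

Repeating this computation per time window would give the kind of longitudinal series for the gap and for per-group quality that the abstract describes tracking.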

# 畳み込みニューラルネットワークを用いた圧縮画像上のロバストな顕著物体検出

Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks ( http://arxiv.org/abs/2409.13464v1 )

ライセンス: Link先を確認

Guibiao Liao, Wei Gao,

(参考訳) 顕著物体検出(SOD)は近年大きく進歩している。実際のシナリオでは、圧縮画像(CI)がデータ転送と保存の主要な媒体となる。しかし、畳み込みニューラルネットワーク(CNN)を用いた圧縮画像のSODにはほとんど注意が向けられていない。本稿では,圧縮画像上でのCNNベースの顕著物体検出の厳密なベンチマークと分析に取り組む。この問題を包括的に研究するために、既存の公開SODデータセットからさまざまなCI SODデータセットを慎重に構築する。次に, 代表的なCNNベースのSOD手法を調査し, 圧縮画像(約264万枚)上でのロバスト性を評価する。重要な点として,評価結果から2つの重要な知見が得られた。 1) 現在の最先端のCNNベースのSODモデルは、クリーンな画像では優れているが、圧縮画像に適用すると大きな性能ボトルネックを示す。 2) CI SODのロバスト性に影響を与える主な要因は,圧縮画像の特性と,顕著性特徴学習の限界に根ざしている。これらの観察に基づいて、ロバストなCNNベースのCI SODを実現するために、ロバストな特徴表現学習に焦点を当てた、単純ながら有望なベースラインフレームワークを提案する。広範な実験により本手法の有効性が示され, クリーンなデータに対する競争力のある精度を維持しつつ, さまざまな画像劣化レベルにわたってロバスト性が著しく向上することが確認された。我々のベンチマークの取り組み、分析的洞察、提案手法が、CNNベースのSODアルゴリズムのロバスト性のより包括的な理解に貢献し、コミュニティにおける今後の研究を促進することを願っている。

Salient object detection (SOD) has achieved substantial progress in recent years. In practical scenarios, compressed images (CI) serve as the primary medium for data transmission and storage. However, scant attention has been directed towards SOD for compressed images using convolutional neural networks (CNNs). In this paper, we are dedicated to strictly benchmarking and analyzing CNN-based salient object detection on compressed images. To comprehensively study this issue, we meticulously establish various CI SOD datasets from existing public SOD datasets. Subsequently, we investigate representative CNN-based SOD methods, assessing their robustness on compressed images (approximately 2.64 million images). Importantly, our evaluation results reveal two key findings: 1) current state-of-the-art CNN-based SOD models, while excelling on clean images, exhibit significant performance bottlenecks when applied to compressed images. 2) The principal factors influencing the robustness of CI SOD are rooted in the characteristics of compressed images and the limitations in saliency feature learning. Based on these observations, we propose a simple yet promising baseline framework that focuses on robust feature representation learning to achieve robust CNN-based CI SOD. Extensive experiments demonstrate the effectiveness of our approach, showcasing markedly improved robustness across various levels of image degradation, while maintaining competitive accuracy on clean data. We hope that our benchmarking efforts, analytical insights, and proposed techniques will contribute to a more comprehensive understanding of the robustness of CNN-based SOD algorithms, inspiring future research in the community.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# 一方向トランスバーサルCNOTゲートによる効率的なフォールトトレラント符号切替

Efficient fault-tolerant code switching via one-way transversal CNOT gates ( http://arxiv.org/abs/2409.13465v1 )

ライセンス: Link先を確認

Sascha Heußen, Janine Hilder,

(参考訳) 符号切替(コードスイッチング)は、相補的なゲートセットを持つ2つのQEC符号を組み合わせることで、FT量子ゲート操作の普遍的なセットを実現する確立された技術であり、それぞれのゲートセットは単独でフォールトトレラントに実装しやすい。本研究では,トランスバーサルゲートのみを用いることでFT回路設計の制約を満たす符号切替方式を提案する。これらのゲートは本質的にFTであり、追加の量子ビットオーバーヘッドを必要としない。我々は、トラップイオンや中性原子などに基づく既存の量子プロセッサでの動作に適した、低距離カラーコードへの本方式の適用を解析する。超伝導量子ビットに基づくアーキテクチャで生じる接続性の制約についても簡潔に論じる。回路レベルノイズの数値シミュレーションにより,本方式で実現される論理$T$ゲートは,低い物理誤り率において,フラグFTマジック状態注入プロトコルと物理$T$ゲートの両方を上回りうることが示唆された。トランスバーサル符号切替は、任意の符号距離の符号ペアに自然にスケールする。現実的に達成可能な物理エンタングルゲート誤り率において,距離5プロトコルは距離3実装と物理ゲートの両方と比較して性能が向上することを観察した。論理補助量子ビットが十分な信頼性で準備できることを前提として、本方式を高度に並列化して実装する方法を論じる。我々の論理$T$ゲートは、コストがかかりうるマジック状態ファクトリーを回避する。 QECを実行するための要件とFTユニバーサルゲートセットを達成するための要件は、本質的に同じになる: 論理補助量子ビットをオフラインで準備し、トランスバーサルゲートを実行し、十分に高速な測定を行うことである。したがって、トランスバーサル符号切替は、FT普遍量子計算のより実用的なハードウェア実現を可能にする。本方式は、論理量子ビット上で実行される量子アルゴリズムの実験的実証に必要なリソース要件を緩和する。

Code switching is an established technique that facilitates a universal set of FT quantum gate operations by combining two QEC codes with complementary sets of gates, which each by themselves are easy to implement fault-tolerantly. In this work, we present a code switching scheme that respects the constraints of FT circuit design by only making use of transversal gates. These gates are intrinsically FT without additional qubit overhead. We analyze application of the scheme to low-distance color codes, which are suitable for operation in existing quantum processors, for instance based on trapped ions or neutral atoms. We briefly discuss connectivity constraints that arise for architectures based on superconducting qubits. Numerical simulations of circuit-level noise indicate that a logical $T$-gate, facilitated by our scheme, could outperform both flag-FT magic state injection protocols and a physical $T$-gate at low physical error rates. Transversal code switching naturally scales to code pairs of arbitrary code distance. We observe improved performance of a distance-5 protocol compared to both the distance-3 implementation and the physical gate for realistically attainable physical entangling gate error rates. We discuss how the scheme can be implemented with a large degree of parallelization, provided that logical auxiliary qubits can be prepared reliably enough. Our logical $T$-gate circumvents potentially costly magic state factories. The requirements to perform QEC and to achieve an FT universal gate set are then essentially the same: Prepare logical auxiliary qubits offline, execute transversal gates and perform fast-enough measurements. Transversal code switching thus serves to enable more practical hardware realizations of FT universal quantum computation. The scheme alleviates resource requirements for experimental demonstrations of quantum algorithms run on logical qubits.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# Isolation Forestを用いたフェデレーション学習環境におけるグローバル外れ値検出

Global Outlier Detection in a Federated Learning Setting with Isolation Forest ( http://arxiv.org/abs/2409.13466v1 )

ライセンス: Link先を確認

Daniele Malpetti, Laura Azzimonti,

(参考訳) 本稿では,特にクロスサイロシナリオを対象として,フェデレーション学習環境におけるグローバルな外れ値を検出する新しい手法を提案する。我々のアプローチでは、2つのサーバを使用し、クライアントから一方のサーバへマスクされたローカルデータを送信する。データのマスキングは、外れ値の識別を可能にしたまま、機密情報の開示を防止する。さらに、プライバシーをより一層保護するために、サーバがどのクライアントがどのマスクされたデータ点を所有しているかを知ることができないよう、置換機構を実装している。サーバは、Isolation Forestまたはその拡張版を使用して、マスクされたデータに対して外れ値検出を実行し、その後、外れ値情報をクライアントに返送する。これによりクライアントは、その後のフェデレーションモデルトレーニングを開始する前に、ローカルデータセット内の外れ値を特定して削除できる。このアプローチは、平文データに対してIsolation Forestアルゴリズムを集中的に実行した場合に匹敵する結果をもたらす。

We present a novel strategy for detecting global outliers in a federated learning setting, targeting in particular cross-silo scenarios. Our approach involves the use of two servers and the transmission of masked local data from clients to one of the servers. The masking of the data prevents the disclosure of sensitive information while still permitting the identification of outliers. Moreover, to further safeguard privacy, a permutation mechanism is implemented so that the server does not know which client owns any masked data point. The server performs outlier detection on the masked data, using either Isolation Forest or its extended version, and then communicates outlier information back to the clients, allowing them to identify and remove outliers in their local datasets before starting any subsequent federated model training. This approach provides comparable results to a centralized execution of Isolation Forest algorithms on plain data.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# ライドバーグ原子イオン分子の振動結合

Vibrationally coupled Rydberg atom-ion molecules ( http://arxiv.org/abs/2409.13469v1 )

ライセンス: Link先を確認

Ilango Maran, Liam J. Bond, Jeremy T. Young, Arghavan Safavi-Naini, Rene Gerritsma,

(参考訳) ポールトラップに捕捉されたイオン結晶が両端でRydberg原子と結合したハイブリッド原子イオン系における、Rydberg原子イオン分子(RAIM)の発生について検討する。このような系の実現可能性を評価するため、ポールトラップのrfポテンシャルがRAIMに与える影響について詳細なFloquet解析を行い、スケーリング則に基づく生存確率の定性的解析を示す。 RAIMは十分に弱く低周波のトラップに対して生存すると結論づける。次に、このハイブリッド系を用いて、イオン結晶の共通運動モードを利用し、鎖の両端で2つのRAIMが形成される確率を抑制(ブロッケード)または増強(アンチブロッケード)する方式を提案する。この方式では、典型的なブロッケード半径がイオン結晶の長さに置き換えられる。

We study the occurrence of Rydberg atom-ion molecules (RAIMs) in a hybrid atom-ion system with an ion crystal trapped in a Paul trap coupled to Rydberg atoms at either of its ends. To assess the feasibility of such a system, we perform a detailed Floquet analysis of the effect of the Paul trap's rf potential on the RAIMs and provide a qualitative analysis of the survival probability based on scaling laws. We conclude that the RAIM survives for sufficiently weak and low-frequency traps. We then use this hybrid system and propose a scheme to utilise the common motional modes of the ion crystal to suppress (blockade) or enhance (anti-blockade) the probability of forming two RAIMs at the ends of the chain, replacing the typical blockade radius by the length of the ion crystal.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# 決定論的動的分類器と確率的動的分類器: ノイズでランダムな敵対的攻撃に対抗する

Deterministic versus stochastic dynamical classifiers: opposing random adversarial attacks with noise ( http://arxiv.org/abs/2409.13470v1 )

ライセンス: Link先を確認

Lorenzo Chicchi, Duccio Fanelli, Diego Febbe, Lorenzo Buffoni, Francesca Di Patti, Lorenzo Giambagli, Raffele Marino,

(参考訳) 興奮性の生物学的ニューロンの絡み合ったダイナミクスを記述するために神経科学で広く用いられている連続変数発火率(CVFR)モデルを、ここでは動的に補助される分類器として訓練し、テストする。この目的のために、モデルには、スペクトル分解を通じてノード間結合行列に自己無撞着に埋め込まれた、植え込まれたアトラクタの集合が与えられる。分類の学習とは、課された平衡点の吸引域(ベイスン)を形作り、異なる項目を、それぞれが属するクラスを反映した対応する目標へと導くことに相当する。 CVFRモデルの確率的変種も研究し、分類対象の項目を破損させるランダムな敵対的攻撃に対して頑健であることを見出した。この注目すべき発見は、ノイズと動的特性が互いに共鳴するときに生じる、数多くの驚くべき効果の1つである。

The Continuous-Variable Firing Rate (CVFR) model, widely used in neuroscience to describe the intertwined dynamics of excitatory biological neurons, is here trained and tested as a veritable dynamically assisted classifier. To this end the model is supplied with a set of planted attractors which are self-consistently embedded in the inter-node coupling matrix, via its spectral decomposition. Learning to classify amounts to sculpting the basins of attraction of the imposed equilibria, directing different items towards the corresponding destination target, which reflects the class to which each item belongs. A stochastic variant of the CVFR model is also studied and found to be robust to adversarial random attacks, which corrupt the items to be classified. This remarkable finding is one of the many surprising effects which arise when noise and dynamical attributes are made to mutually resonate.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# 皮質の帰納バイアスを持つRNNにおける刺激-刺激学習

Stimulus-to-Stimulus Learning in RNNs with Cortical Inductive Biases ( http://arxiv.org/abs/2409.13471v1 )

ライセンス: Link先を確認

Pantelis Vafidis, Antonio Rangel,

(参考訳) 動物は、条件付けのプロセスを通じて、経験から外部の随伴性を予測することを学ぶ。条件付けの自然なメカニズムは刺激置換であり、それまで行動上の意味を持たなかった刺激に対する神経応答が、その刺激が確実に予測する行動上重要な刺激によって生じる応答と、次第に同一になっていく。本研究では,皮質に広く見られる2種類の帰納バイアスを活用した,刺激置換のリカレントニューラルネットワークモデルを提案する。混合刺激表現という形の表現上の帰納バイアスと,皮質連合学習の基本単位として機能することが示されている2コンパートメント錐体ニューロンという形のアーキテクチャ上の帰納バイアスである。これらのニューロンの性質により、シナプスで局所的に利用可能な情報のみを用いて刺激置換を実装する、生物学的に妥当な学習則が可能になる。本モデルは,多岐にわたる条件付け現象を生成し,個々の実験課題ごとのパラメータ微調整に頼ることなく, 動物実験と同程度の訓練量で多数の連合を学習できることを示す。対照的に、よく用いられるヘブ則は、混合選択性を伴う一般的な刺激-刺激連合を学習できず、タスク固有のパラメータ微調整を必要とすることを示す。我々の枠組みは、皮質におけるマルチコンパートメントのニューロン処理の重要性を強調し、それが皮質を持つ動物に進化上の優位性をもたらしうる仕組みを示している。

Animals learn to predict external contingencies from experience through a process of conditioning. A natural mechanism for conditioning is stimulus substitution, whereby the neuronal response to a stimulus with no prior behavioral significance becomes increasingly identical to that generated by a behaviorally significant stimulus it reliably predicts. We propose a recurrent neural network model of stimulus substitution which leverages two forms of inductive bias pervasive in the cortex: representational inductive bias in the form of mixed stimulus representations, and architectural inductive bias in the form of two-compartment pyramidal neurons that have been shown to serve as a fundamental unit of cortical associative learning. The properties of these neurons allow for a biologically plausible learning rule that implements stimulus substitution, utilizing only information available locally at the synapses. We show that the model generates a wide array of conditioning phenomena, and can learn large numbers of associations with an amount of training commensurate with animal experiments, without relying on parameter fine-tuning for each individual experimental task. In contrast, we show that commonly used Hebbian rules fail to learn generic stimulus-stimulus associations with mixed selectivity, and require task-specific parameter fine-tuning. Our framework highlights the importance of multi-compartment neuronal processing in the cortex, and showcases how it might confer an evolutionary edge on cortical animals.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# Flotta: セキュアでフレキシブルなSparkにインスパイアされたフェデレーション学習フレームワーク

Flotta: a Secure and Flexible Spark-inspired Federated Learning Framework ( http://arxiv.org/abs/2409.13473v1 )

ライセンス: Link先を確認

Claudio Bonesana, Daniele Malpetti, Sandra Mitrović, Francesca Mangili, Laura Azzimonti,

(参考訳) Flottaは、バイオメディカルフィールドのような高度なセキュリティを必要とする状況下で研究を行う多党コンソーシアムに分散されたセンシティブなデータに基づいて機械学習モデルをトレーニングするために設計されたフェデレートラーニングフレームワークである。 FlottaはPythonパッケージで、Apache Sparkのいくつかの側面にインスパイアされたもので、柔軟性とセキュリティの両方を提供し、コンソーシアム内部のマシンのみを使用して研究を行うことができる。本稿では,フレームワークの主要なコンポーネントと,フレームワークの能力とセキュリティ,柔軟性,ユーザフレンドリさを強調する実践的なユースケースについて述べる。

We present Flotta, a Federated Learning framework designed to train machine learning models on sensitive data distributed across a multi-party consortium conducting research in contexts requiring high levels of security, such as the biomedical field. Flotta is a Python package, inspired in several aspects by Apache Spark, which provides both flexibility and security and allows conducting research using solely machines internal to the consortium. In this paper, we describe the main components of the framework together with a practical use case to illustrate the framework's capabilities and highlight its security, flexibility and user-friendliness.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# 大規模言語モデルにおける事実知識のアンラーニングのための代替選好最適化

Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models ( http://arxiv.org/abs/2409.13474v1 )

ライセンス: Link先を確認

Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo,

(参考訳) 機械アンラーニングは、忘却セットと呼ばれる特定のトレーニングデータの影響をモデルから効率的に排除することを目的としている。しかし、大規模言語モデル(LLM)に対する既存のアンラーニング手法は重大な課題に直面している。すなわち、忘却セットに関連する応答を抑制するために負のフィードバックのみに頼っているため、無意味な出力や一貫性のない出力が生じることが多く、モデルの有用性を低下させ、潜在的なプライバシーリスクをもたらす。この制限に対処するため、我々は、負のフィードバックと忘却セットに対するドメイン内の正のフィードバックを組み合わせた、AltPO(Alternate Preference Optimization)と呼ばれる新しい手法を提案する。また,忘却セットに関連する応答の品質を評価するための新たな評価指標を導入する。大規模な実験により、我々のアプローチは効果的なアンラーニングを可能にするだけでなく、全体的なモデル性能を維持しながら、望ましくないモデル動作を回避できることが示された。

Machine unlearning aims to efficiently eliminate the influence of specific training data, known as the forget set, from the model. However, existing unlearning methods for Large Language Models (LLMs) face a critical challenge: they rely solely on negative feedback to suppress responses related to the forget set, which often results in nonsensical or inconsistent outputs, diminishing model utility and posing potential privacy risks. To address this limitation, we propose a novel approach called Alternate Preference Optimization (AltPO), which combines negative feedback with in-domain positive feedback on the forget set. Additionally, we introduce new evaluation metrics to assess the quality of responses related to the forget set. Extensive experiments show that our approach not only enables effective unlearning but also avoids undesirable model behaviors while maintaining overall model performance.

翻訳日:2024-11-07 07:04:14 公開日:2024-09-20

# PLOT: 部品発見に対応する部分スロットアテンション付きテキストベースの人物検索

PLOT: Text-based Person Search with Part Slot Attention for Corresponding Part Discovery ( http://arxiv.org/abs/2409.13475v1 )

ライセンス: Link先を確認

Jicheol Park, Dongwon Kim, Boseung Jeong, Suha Kwak,

(参考訳) テキストベースの人物検索は、自由形式のテキストクエリを用いて膨大な画像コレクション内の個人を特定するものであり、視覚表現とテキスト表現の整合、特に人体パーツレベルでの整合という独特の課題を伴う。既存の手法は、直接的なパーツレベルの教師信号がなく、ヒューリスティックな特徴に依存しているため、パーツ特徴の抽出と整合に苦慮することが多い。本稿では、スロットアテンションに基づくパーツ発見モジュールを活用して、特徴的なパーツをモダリティ間で自律的に識別・整合させ、明示的なパーツレベルの対応の教師なしで解釈可能性と検索精度を向上させる新しいフレームワークを提案する。さらに、テキストベースの動的パーツアテンションが各パーツの重要度を調整し、検索結果をさらに改善する。提案手法を3つの公開ベンチマークで評価し,既存手法を大幅に上回ることを示した。

Text-based person search, employing free-form text queries to identify individuals within a vast image collection, presents a unique challenge in aligning visual and textual representations, particularly at the human part level. Existing methods often struggle with part feature extraction and alignment due to the lack of direct part-level supervision and reliance on heuristic features. We propose a novel framework that leverages a part discovery module based on slot attention to autonomously identify and align distinctive parts across modalities, enhancing interpretability and retrieval accuracy without explicit part-level correspondence supervision. Additionally, text-based dynamic part attention adjusts the importance of each part, further improving retrieval outcomes. Our method is evaluated on three public benchmarks, significantly outperforming existing methods.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 皮膚科医のような説明可能なAIはメラノーマの診断精度を高める:眼球追跡研究

Dermatologist-like explainable AI enhances melanoma diagnosis accuracy: eye-tracking study ( http://arxiv.org/abs/2409.13476v1 )

ライセンス: Link先を確認

Tirtha Chanda, Sarah Haggenmueller, Tabea-Clara Bucher, Tim Holland-Letz, Harald Kittler, Philipp Tschandl, Markus V. Heppt, Carola Berking, Jochen S. Utikal, Bastian Schilling, Claudia Buerger, Cristian Navarrete-Dechent, Matthias Goebeler, Jakob Nikolas Kather, Carolin V. Schneider, Benjamin Durani, Hendrike Durani, Martin Jansen, Juliane Wacker, Joerg Wacker, Reader Study Consortium, Titus J. Brinker,

(参考訳) 人工知能(AI)システムは、皮膚科医によるメラノーマの診断精度を大幅に改善しており、説明可能なAI(XAI)システムは、AIによる判断に対する臨床医の確信と信頼をさらに高めている。これらの進歩にもかかわらず、皮膚科医がAIとXAIの両方のツールとどのように関わるかを客観的に評価する必要性が依然として大きい。本研究では,76名の皮膚科医が読影研究に参加し,詳細でドメイン固有の説明を提供するXAIシステムを用いて,メラノーマと母斑の16枚のダーモスコピー画像を診断した。その相互作用を評価するために視線追跡技術を用いた。診断性能は、説明機能を持たない標準的なAIシステムと比較した。その結果,XAIシステムは標準AIと比較してバランス診断精度を2.8パーセントポイント向上させた。さらに,AI/XAIシステムとの診断上の不一致や複雑な病変は,眼球固視の増加から示されるように,認知負荷の上昇と関連していた。これらの知見は、臨床実践、視覚タスクのためのAIツールの設計、医療診断におけるXAIの広範な発展に重要な意味を持つ。

Artificial intelligence (AI) systems have substantially improved dermatologists' diagnostic accuracy for melanoma, with explainable AI (XAI) systems further enhancing clinicians' confidence and trust in AI-driven decisions. Despite these advancements, there remains a critical need for objective evaluation of how dermatologists engage with both AI and XAI tools. In this study, 76 dermatologists participated in a reader study, diagnosing 16 dermoscopic images of melanomas and nevi using an XAI system that provides detailed, domain-specific explanations. Eye-tracking technology was employed to assess their interactions. Diagnostic performance was compared with that of a standard AI system lacking explanatory features. Our findings reveal that XAI systems improved balanced diagnostic accuracy by 2.8 percentage points relative to standard AI. Moreover, diagnostic disagreements with AI/XAI systems and complex lesions were associated with elevated cognitive load, as evidenced by increased ocular fixations. These insights have significant implications for clinical practice, the design of AI tools for visual tasks, and the broader development of XAI in medical diagnostics.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20
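The "balanced diagnostic accuracy" reported in the entry above is, under the usual definition of balanced accuracy, the mean of sensitivity and specificity. The abstract does not spell out the formula, so treating it this way is an assumption, and the confusion-matrix counts below are hypothetical rather than taken from the study.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical reader results on 16 images: 8 melanomas and 8 nevi.
score = balanced_accuracy(tp=7, fn=1, tn=6, fp=2)
```

Unlike plain accuracy, this metric stays informative when the two classes are not equally represented, because each class contributes half of the score.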

# コンテンツ・スタイルモデリングに基づくガイド付きマルチコントラストMRI再構成のためのプラグ・アンド・プレイ法

A Plug-and-Play Method for Guided Multi-contrast MRI Reconstruction based on Content/Style Modeling ( http://arxiv.org/abs/2409.13477v1 )

ライセンス: Link先を確認

Chinmay Rao, Matthias van Osch, Nicola Pezzotti, Jeroen de Bresser, Laurens Beljaards, Jakob Meineke, Elwin de Weerdt, Huangling Lu, Mariya Doneva, Marius Staring,

(参考訳) 同じ解剖学的部位の複数のMRIコントラストには冗長な情報が含まれているため、あるコントラストを、アンダーサンプリングされた後続のコントラストの再構成を導く事前情報として使用できる。この目的のために,学習ベースのガイド付き再構成手法がいくつか提案されている。しかし、2つの重要な課題が残っている。 (a) 大規模なペアトレーニングデータセットを必要とすること、(b) モデルの内部表現と共有情報の利用についての直感的な理解が欠けていることである。本稿では,これらの課題に対処する,ガイド付き再構成のためのモジュール式の2段階アプローチを提案する。 2コントラスト画像データのコンテンツ/スタイルモデルを、ほぼ非ペアの方法で学習し、その後、反復再構成においてプラグ・アンド・プレイ演算子として適用する。コンテンツとスタイルの分離(ディスエンタングルメント)により、コントラストに依存しない要因とコントラスト固有の要因を明示的に表現できる。これに基づくと、事前情報を再構成に組み込むことは、エイリアスを含む再構成コンテンツを、参照スキャンから得られたクリーンなコンテンツに単純に置き換えることに帰着する。この新しい手法をPnP-MUNITと呼ぶ。解釈可能性や収束性といった様々な側面をシミュレーションで調べる。さらに、その実用性をNYU fastMRI DICOMデータセットと2つの社内生データセットで実証し、同一のSSIMにおいて、学習ベースの非ガイド付き再構成に対して最大32.6%高い加速を得た。放射線学的タスクでは、PnP-MUNITは診断品質において臨床再構成に対して33.3%高い加速を可能にした。

Since multiple MRI contrasts of the same anatomy contain redundant information, one contrast can be used as a prior for guiding the reconstruction of an undersampled subsequent contrast. To this end, several learning-based guided reconstruction methods have been proposed. However, two key challenges remain - (a) the requirement of large paired training datasets and (b) the lack of intuitive understanding of the model's internal representation and utilization of the shared information. We propose a modular two-stage approach for guided reconstruction, addressing these challenges. A content/style model of two-contrast image data is learned in a largely unpaired manner and is subsequently applied as a plug-and-play operator in iterative reconstruction. The disentanglement of content and style allows explicit representation of contrast-independent and contrast-specific factors. Based on this, incorporating prior information into the reconstruction reduces to simply replacing the aliased reconstruction content with clean content derived from the reference scan. We name this novel approach PnP-MUNIT. Various aspects like interpretability and convergence are explored via simulations. Furthermore, its practicality is demonstrated on the NYU fastMRI DICOM dataset and two in-house raw datasets, obtaining up to 32.6% more acceleration over learning-based non-guided reconstruction for a given SSIM. In a radiological task, PnP-MUNIT allowed 33.3% more acceleration over clinical reconstruction at diagnostic quality.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 分割統治に基づくシンボリック脆弱性検出

Divide and Conquer based Symbolic Vulnerability Detection ( http://arxiv.org/abs/2409.13478v1 )

ライセンス: Link先を確認

Christopher Scherb, Luc Bryan Heitz, Hermann Grieder,

(参考訳) 現代のソフトウェア開発では、複雑なソフトウェアシステムにおいてバグや脆弱性が避けられないため、脆弱性検出が極めて重要である。テストフェーズでこれらの脆弱性を効果的に検出し排除することが不可欠である。ファジングなどの現在の手法は、この目的のために広く用いられている。ファジングは、ランダムな変異や生成を用いて広範囲のバグや脆弱性を特定するのに効率的であるが、正しさや脆弱性の不在を保証しない。したがって、重要インフラや制御システムの安全性とセキュリティを確保するには、非ランダムな手法が望ましい。本稿では,さまざまな種類のソフトウェアの弱点を特定するための,シンボリック実行と制御フローグラフ解析に基づく脆弱性検出手法を提案する。提案手法は,分割統治アルゴリズムを用いて無関係なプログラム情報を排除することで処理を高速化し,従来のシンボリック実行法やモデル検査法と比べてより大規模なプログラムの解析を可能にする。

In modern software development, vulnerability detection is crucial due to the inevitability of bugs and vulnerabilities in complex software systems. Effective detection and elimination of these vulnerabilities during the testing phase are essential. Current methods, such as fuzzing, are widely used for this purpose. While fuzzing is efficient in identifying a broad range of bugs and vulnerabilities by using random mutations or generations, it does not guarantee correctness or absence of vulnerabilities. Therefore, non-random methods are preferable for ensuring the safety and security of critical infrastructure and control systems. This paper presents a vulnerability detection approach based on symbolic execution and control flow graph analysis to identify various types of software weaknesses. Our approach employs a divide-and-conquer algorithm to eliminate irrelevant program information, thus accelerating the process and enabling the analysis of larger programs compared to traditional symbolic execution and model checking methods.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 逆画像問題のための可逆ResNet: 証明可能な正則化特性を備えた競争力のある性能

Invertible ResNets for Inverse Imaging Problems: Competitive Performance with Provable Regularization Properties ( http://arxiv.org/abs/2409.13482v1 )

ライセンス: Link先を確認

Clemens Arndt, Judith Nickel,

(参考訳) 学習ベースの手法は、逆問題、特に画像再構成タスクにおいて顕著な性能を示してきた。その成功にもかかわらず、これらのアプローチは理論的な保証を欠くことが多く、そうした保証は医用画像のようなセンシティブな応用において極めて重要である。 Arndt et al (2023 Inverse Problems 39 125018, 2024 Inverse Problems 40 045021) による最近の研究は、可逆残差ネットワーク (iResNets) に基づくデータ駆動型の再構成法を解析することで、このギャップに取り組んだ。彼らは、合理的な仮定の下で、このアプローチが収束する正則化スキームを構成することを示した。しかし, この再構成法の性能は, 学術的なトイ問題と小規模なiResNetアーキテクチャでしか検証されていなかった。本研究では,2つの実世界の画像処理タスク(線形ぼかし演算子と非線形拡散演算子)におけるiResNetsの性能を評価することで,このギャップに対処する。そのために、Arndtらの理論的結果の一部を非線形逆問題を含むように拡張し、大規模で高性能なiResNetアーキテクチャの設計に関する洞察を提供する。数値実験により,iResNetモデルの性能を最先端のニューラルネットワークと比較し,その有効性を確認する。さらに,本手法の理論的保証を数値的に調査し,ネットワークの可逆性によって、学習された順演算子とその学習された正則化をより深く解析できることを示す。

Learning-based methods have demonstrated remarkable performance in solving inverse problems, particularly in image reconstruction tasks. Despite their success, these approaches often lack theoretical guarantees, which are crucial in sensitive applications such as medical imaging. Recent works by Arndt et al (2023 Inverse Problems 39 125018, 2024 Inverse Problems 40 045021) addressed this gap by analyzing a data-driven reconstruction method based on invertible residual networks (iResNets). They revealed that, under reasonable assumptions, this approach constitutes a convergent regularization scheme. However, the performance of the reconstruction method was only validated on academic toy problems and small-scale iResNet architectures. In this work, we address this gap by evaluating the performance of iResNets on two real-world imaging tasks: a linear blurring operator and a nonlinear diffusion operator. To do so, we extend some of the theoretical results from Arndt et al to encompass nonlinear inverse problems and offer insights for the design of large-scale performant iResNet architectures. Through numerical experiments, we compare the performance of our iResNet models against state-of-the-art neural networks, confirming their efficacy. Additionally, we numerically investigate the theoretical guarantees of this approach and demonstrate how the invertibility of the network enables a deeper analysis of the learned forward operator and its learned regularization.
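As background for the invertibility property mentioned above, here is a minimal scalar sketch of how a residual map `x + f(x)` can be inverted by fixed-point iteration when `f` is contractive. This is the general iResNet idea rather than the architecture from the paper; the function names and the choice `f(x) = 0.5 * tanh(x)` are assumptions made only for illustration.

```python
import math


def residual(x: float) -> float:
    """Hypothetical residual branch f. Its derivative is bounded by 0.5 < 1,
    so f is a contraction and x + f(x) is invertible."""
    return 0.5 * math.tanh(x)


def forward(x: float) -> float:
    """Residual map y = x + f(x)."""
    return x + residual(x)


def inverse(y: float, iterations: int = 100) -> float:
    """Recover x from y by iterating x <- y - f(x).
    The iteration converges because f is contractive."""
    x = y
    for _ in range(iterations):
        x = y - residual(x)
    return x


x0 = 1.3
y = forward(x0)
x_rec = inverse(y)
```

In a real network the same iteration would run on tensors, and the contraction bound would typically be enforced on the layers (for example by constraining their Lipschitz constants) rather than chosen by hand.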

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 音声によるオープンドメイン質問応答に対する多モーダルDense Retrievalアプローチ

A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering ( http://arxiv.org/abs/2409.13483v1 )

ライセンス: Link先を確認

Georgios Sidiropoulos, Evangelos Kanoulas,

(参考訳) 音声インタフェースを介してQAシステムと対話するユーザの増加に伴い、音声ベースのオープンドメイン質問応答(大量のコーパスと音声質問を含むQA)が重要な課題となっている。音声ベースのオープンドメインQAでは,パス検索が重要な課題である。これまでの研究では、高密度テキストレトリバーに入力する前に音声質問を書き起こす自動音声認識(ASR)モデルによるパイプラインを採用していた。このようなパイプラインにはいくつかの制限がある。 ASRモデルの必要性は、アノテートされた音声データを持たない低リソース言語や特殊なドメインに適用性を制限する。さらに、ASRモデルは、そのエラーをレトリバーに伝達する。本研究では、音声質問を直接処理可能な、ASRフリーでエンドツーエンドにトレーニングされた多モーダル高密度検索器を提案することにより、これらの制限を緩和しようとする。以上の結果から,ASRが重要な単語を誤って書き起こした場合や,単語誤り率の高い書き起こしを発生させた場合に,検索性能が向上する可能性が示唆された。

Speech-based open-domain question answering (QA over a large corpus of text passages with spoken questions) has emerged as an important task due to the increasing number of users interacting with QA systems via speech interfaces. Passage retrieval is a key task in speech-based open-domain QA. Previous works have adopted pipelines consisting of an automatic speech recognition (ASR) model that transcribes the spoken question before feeding it to a dense text retriever. Such pipelines have several limitations. The need for an ASR model limits the applicability to low-resource languages and specialized domains with no annotated speech data. Furthermore, the ASR model propagates its errors to the retriever. In this work, we try to alleviate these limitations by proposing an ASR-free, end-to-end trained multimodal dense retriever that can work directly on spoken questions. Our experimental results show that, on shorter questions, our retriever is a promising alternative to the "ASR and Retriever" pipeline, achieving better retrieval performance in cases where ASR would have mistranscribed important words in the question or produced a transcription with a high word error rate.
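Since the comparison above depends on the word error rate (WER) of the ASR transcription, a small sketch of the usual WER computation may help: the word-level edit distance between reference and hypothesis, divided by the reference length. The function name and example sentences are illustrative assumptions, and the paper's evaluation tooling is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("who wrote the iliad", "who wrote the idiot")
```

Here a single mistranscribed content word yields a modest WER but can still change what a text retriever looks for, which is the failure mode the abstract points to.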

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 「弁護士は男性である」:LLMによるヒンディー語生成におけるインプシットジェンダーバイアスの検討

'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs ( http://arxiv.org/abs/2409.13484v1 )

ライセンス: Link先を確認

Ishika Joshi, Ishita Gupta, Adrita Dey, Tapan Parikh,

(参考訳) 大きな言語モデル(LLM)は、翻訳、顧客サポート、教育などのタスクのために、様々な言語でテキストを生成するためにますます使われています。これらの進歩にもかかわらず、LLMは英語で顕著なジェンダーバイアスを示しており、ヒンディー語のような比較的表現の浅い言語でコンテンツを生成する際にさらに顕著になる。本研究はヒンディー語のテキスト生成における性差の暗黙的偏見を調査し,それを英語のそれと比較する。我々はWinoBiasにインスパイアされたHindiデータセットを開発し、GPT-4oやClaude-3 sonnetといったモデルからの応答のステレオタイプパターンを調べた。その結果、ヒンディー語では87.8%、英語のGPT-4o世代では33.4%、ヒンディー語では職業、権力階層、社会階級といったジェンダーステレオタイプが多かった。この研究は、言語間での性別バイアスの変化を強調し、生成的AIシステムにおいてこれらのバイアスをナビゲートするための考察を提供する。

Large Language Models (LLMs) are increasingly being used to generate text across various languages, for tasks such as translation, customer support, and education. Despite these advancements, LLMs show notable gender biases in English, which become even more pronounced when generating content in relatively underrepresented languages like Hindi. This study explores implicit gender biases in Hindi text generation and compares them to those in English. We developed Hindi datasets inspired by WinoBias to examine stereotypical patterns in responses from models like GPT-4o and Claude-3 sonnet. Our results reveal a significant gender bias of 87.8% in Hindi, compared to 33.4% in English GPT-4o generation, with Hindi responses frequently relying on gender stereotypes related to occupations, power hierarchies, and social class. This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 大規模言語モデルにおけるミンド理論の強化のための制約付き推論チェイン

Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models ( http://arxiv.org/abs/2409.13490v1 )

ライセンス: Link先を確認

Zizheng Lin, Chunkit Chan, Yangqiu Song, Xin Liu,

(参考訳) LLM(Large Language Models)が持つ理論-of-Mind(ToM)能力は制限されている。 LLMにおけるToMの改善手法の多くはゼロショットプロンプトを採用しており、複雑なToM推論タスクのパフォーマンスの低下や、非ナラティブコンテキストを扱うことができないといった問題に直面している。本稿では、ドメイン知識とToM次元間の因果関係を利用してこれらの制約に対処する、制約付きチェーン・オブ・ToM(CCoToM)というゼロショットプロンプト手法を提案する。具体的には、CCoToM は LLM に対して、まず LLM に関連する ToM 次元(例えば、信念)を推論するように促すことにより、明示的な推論連鎖を構築するよう誘導する。その後、CCoToMは、生成されたToM次元とそれに対応する因果関係に基づいて、問い合わせされたToM次元を推測するようにLCMに促す。さらに、CCoToMはインダクティブバイアスを導入し、ToM次元間の一貫性を改善するプロンプトに適応的に制約を課す。物語の他に、CCoToMは会話のような物語的でないコンテキストも扱える。大規模な実験により、CCoToMはすべてのLLMとデータセットに対して、従来の最先端の手法をはるかに上回っていることが示されている。また,CCoToMについてより深い知見を得るため,詳細な分析を行う。コードを公開しました。

Theory-of-Mind (ToM) ability possessed by Large Language Models (LLMs) has been shown to be limited. Most existing methods for improving ToM in LLMs adopt zero-shot prompting, and they face challenges including poor performance in complex ToM reasoning tasks and an inability to handle non-narrative contexts. We propose a zero-shot prompting method named Constrained Chain-of-ToM (CCoToM) that leverages domain knowledge and the causal relations between ToM dimensions to address these limitations. Specifically, CCoToM guides LLMs to construct explicit reasoning chains by first prompting LLMs to infer related ToM dimensions (e.g., belief). Afterward, CCoToM prompts LLMs to infer the queried ToM dimension based on the generated related ToM dimensions and corresponding causal relations. Additionally, CCoToM adaptively imposes constraints on prompts to introduce inductive biases and improve consistency between ToM dimensions. Besides narratives, CCoToM can also handle non-narrative contexts like conversations. Extensive experiments show that CCoToM consistently outperforms previous state-of-the-art methods by large margins across all LLMs and datasets used. We also conduct in-depth analyses to gain deeper insights into CCoToM. We have made our code publicly available.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# DAP-LED:CLIPによる低照度化と劣化の学習

DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring ( http://arxiv.org/abs/2409.13496v1 )

ライセンス: Link先を確認

Ling Wang, Chen Wu, Lin Wang,

(参考訳) 自律走行車やロボットは、RGBカメラの長時間露光による照度と動きのぼかしが低いため、夜間に信頼できる視覚に苦しむことが多い。既存の手法はこの課題に対処し、既訓練の低照度エンハンスメントとデブロアリングモデルを順次接続する。残念なことに、これらの手法は、過剰に露光された領域における顕著な人工物(色歪み)を引き起こすか、暗黒領域の運動キューをほとんど学ばないようにする。本稿では,視覚言語モデルであるCLIP(Contrastive Language- Image Pretraining)が,夜間における多様な劣化レベルを包括的に知覚できることを示す。そこで本研究では,低照度化と劣化を共同で実現し,深度推定,セグメンテーション,暗黒領域の検出といった下流作業に役立てる,トランスフォーマーを用いた新しい共同学習フレームワーク DAP-LED を提案する。重要な洞察は、CLIPを活用して、夜間に画像から劣化レベルを適応的に学習することだ。これにより、統合タスクの最適化のためのリッチな意味情報と視覚的表現を学習することができる。これを実現するために、まずCLIP誘導クロスフュージョンモジュールを導入し、画像埋め込みからマルチスケールのパッチワイズ分解ヒートマップを得る。熱マップは設計したCLIP拡張変換器ブロックを介して融合され、効率的なモデル最適化のための有用な劣化情報を保持する。実験の結果,既存の手法と比較して,DAP-LEDは暗黒環境での最先端性能を実現していることがわかった。一方、強化された結果は3つの下流タスクに有効であることが示されている。デモやその他の結果については、プロジェクトページを参照してほしい。

Autonomous vehicles and robots often struggle with reliable visual perception at night due to the low illumination and motion blur caused by the long exposure time of RGB cameras. Existing methods address this challenge by sequentially connecting off-the-shelf pretrained low-light enhancement and deblurring models. Unfortunately, these methods often lead to noticeable artifacts (e.g., color distortions) in the over-exposed regions or make it nearly impossible to learn the motion cues of the dark regions. In this paper, we find that vision-language models, e.g., Contrastive Language-Image Pretraining (CLIP), can comprehensively perceive diverse degradation levels at night. In light of this, we propose a novel transformer-based joint learning framework, named DAP-LED, which can jointly achieve low-light enhancement and deblurring, benefiting downstream tasks such as depth estimation, segmentation, and detection in the dark. The key insight is to leverage CLIP to adaptively learn the degradation levels from images at night. This enables learning rich semantic information and visual representation for optimization of the joint tasks. To achieve this, we first introduce a CLIP-guided cross-fusion module to obtain multi-scale patch-wise degradation heatmaps from the image embeddings. Then, the heatmaps are fused via the designed CLIP-enhanced transformer blocks to retain useful degradation information for effective model optimization. Experimental results show that, compared to existing methods, our DAP-LED achieves state-of-the-art performance in the dark. Meanwhile, the enhanced results are demonstrated to be effective for three downstream tasks. For a demo and more results, please check the project page: https://vlislab22.github.io/dap-led/.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# ハイパースペクトルイメージングによる画素レベルの物質分類のための深層学習手法

A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging ( http://arxiv.org/abs/2409.13498v1 )

ライセンス: Link先を確認

Savvas Sifnaios, George Arvanitakis, Fotios K. Konstantinidis, Georgios Tsimiklis, Angelos Amditis, Panayiotis Frangos,

(参考訳) コンピュータビジョンの最近の進歩、特に検出、セグメンテーション、分類は、様々な領域に大きな影響を与えている。しかし、これらの進歩はRGBベースのシステムと結びついており、廃棄物の選別、医薬品、防衛といった産業において、形状や色を超えた高度な物体のキャラクタリゼーションが必要とされるには不十分である。ハイパースペクトル(HS)イメージングは、スペクトル情報と空間情報の両方を撮像し、これらの制限に対処し、特に速度、コスト、安全性の点で、X線蛍光やラマン分光のような従来の技術よりも有利である。本研究では,HSイメージングと深層学習を併用した材料評価の可能性について検討した。研究は以下のとおりである。一 HSカメラ、コンベア及び制御照明を備えた実験装置を設計すること。二半自動マスク生成及びラマン分光法によるラベル付けによる各種プラスチック(HDPE、PET、PP、PS)の多目的データセットの作成三画素レベルの物質分類のためのHS画像に基づいて訓練された深層学習モデルを開発すること。このモデルは99.94\%の分類精度を達成し、色、サイズ、形状のばらつきの堅牢性を証明し、材料重なりを効果的に処理した。ブラックオブジェクトの課題のような制限も議論されている。 RGBからHSイメージングへのコンピュータビジョンの拡張は実現可能であり、従来の手法の大きな制限を克服し、将来的な応用の可能性を示している。

Recent advancements in computer vision, particularly in detection, segmentation, and classification, have significantly impacted various domains. However, these advancements are tied to RGB-based systems, which are insufficient for applications in industries like waste sorting, pharmaceuticals, and defense, where advanced object characterization beyond shape or color is necessary. Hyperspectral (HS) imaging, capturing both spectral and spatial information, addresses these limitations and offers advantages over conventional technologies such as X-ray fluorescence and Raman spectroscopy, particularly in terms of speed, cost, and safety. This study evaluates the potential of combining HS imaging with deep learning for material characterization. The research involves: i) designing an experimental setup with an HS camera, conveyor, and controlled lighting; ii) generating a multi-object dataset of various plastics (HDPE, PET, PP, PS) with semi-automated mask generation and Raman spectroscopy-based labeling; and iii) developing a deep learning model trained on HS images for pixel-level material classification. The model achieved 99.94% classification accuracy, demonstrating robustness in color, size, and shape invariance, and effectively handling material overlap. Limitations, such as challenges with black objects, are also discussed. Extending computer vision beyond RGB to HS imaging proves feasible, overcoming major limitations of traditional methods and showing strong potential for future applications.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# HUT: Hadamard Updated Transformationによるより効率的なファインチューニング手法

HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation ( http://arxiv.org/abs/2409.13501v1 )

ライセンス: Link先を確認

Geyuan Zhang, Xiaofei Zhou, Chuheng Chen,

(参考訳) 下流タスクのための微調整済み言語モデルが、NLPで素晴らしい成果を上げている。しかし,モデルパラメータが急速に大きくなるため,パラメータの微調整は不可能となる。これを解決するために、パラメータ効率の良いファインチューニング(PEFT)メソッドはパラメータのサブセットだけを更新する。 LoRAのようなほとんどのPEFTメソッドは、元のパラメータに学習された重み行列の増分を含むインクリメンタルアップデートを使用する。有効ではあるが、これらの手法は複雑なパラメータのダイナミックスをキャプチャする際の制限に直面し、元のパラメータと更新されたパラメータの間に強い相関は保たない。これらの課題を克服するために,元のパラメータから更新パラメータへの変換を直接構成する直接更新変換(UT)パラダイムを提案する。このアプローチにより、元のパラメータと更新されたパラメータの相関が保存されることが保証され、事前トレーニング中に学んだ意味的特徴が活用される。このパラダイムに基づいて,Hadamard Updated Transformation (HUT) 法を提案する。 HUTは、2つの低ランク行列でアダマール変換を用いて元の重量行列を効率的に更新し、より表現力が高く柔軟な更新機構を提供する。これによりHUTは、関数変換によってよりリッチなパラメータ機能をキャプチャし、モデル品質を維持したり改善したりしながら、計算の複雑さを低減できる。 RoBERTaおよびGPT-2に関する理論的解析と広範な実験により、HUTの有効性が検証された。その結果,HUTはモデル品質の観点から他のPEFT法と同等以上の性能を示し,計算複雑性を著しく低減した。

Fine-tuning pre-trained language models for downstream tasks has achieved impressive results in NLP. However, fine-tuning all parameters becomes impractical due to the rapidly increasing size of model parameters. To address this, Parameter Efficient Fine-Tuning (PEFT) methods update only a subset of parameters. Most PEFT methods, such as LoRA, use incremental updates, which involve adding learned weight matrix increments to the original parameters. Although effective, these methods face limitations in capturing complex parameter dynamics and do not maintain a strong correlation between the original and updated parameters. To overcome these challenges, we propose the direct Updated Transformation (UT) paradigm, which constructs a transformation directly from the original to the updated parameters. This approach ensures that the correlation between the original and updated parameters is preserved, leveraging the semantic features learned during pre-training. Building on this paradigm, we present the Hadamard Updated Transformation (HUT) method. HUT efficiently updates the original weight matrix using the Hadamard transformation with two low-rank matrices, offering a more expressive and flexible update mechanism. This allows HUT to capture richer parameter features through functional transformations, reducing computational complexity while maintaining or improving model quality. Theoretical analysis and extensive experiments on RoBERTa and GPT-2 validate the effectiveness of HUT. Results show that HUT performs on par with or better than other PEFT methods in terms of model quality, while significantly reducing computational complexity.
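To make the contrast between incremental and transformation-style updates concrete, the sketch below compares a LoRA-style additive update `W + BA` with one possible element-wise (Hadamard) update `W ⊙ (1 + BA)`. The abstract does not give HUT's actual formula, so the multiplicative form, the function names, and the toy matrices are all assumptions for illustration only.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def additive_update(w, b, a):
    """LoRA-style incremental update: W + B @ A."""
    delta = matmul(b, a)
    return [[w[i][j] + delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def hadamard_update(w, b, a):
    """Hypothetical transformation-style update: W * (1 + B @ A), element-wise.
    Each new weight is a rescaling of the corresponding original weight."""
    scale = matmul(b, a)
    return [
        [w[i][j] * (1.0 + scale[i][j]) for j in range(len(w[0]))]
        for i in range(len(w))
    ]


w = [[1.0, 2.0], [3.0, 4.0]]  # original weights (2 x 2)
b = [[0.5], [0.0]]            # low-rank factor (2 x 1)
a = [[1.0, -1.0]]             # low-rank factor (1 x 2)

w_additive = additive_update(w, b, a)
w_hadamard = hadamard_update(w, b, a)
```

In this toy form the multiplicative update ties each new weight to its original value, which is one way to read the "correlation between the original and updated parameters" that the abstract emphasizes.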

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 音声によるスケッチ:「非現実的」な音のレンダリング

Sketching With Your Voice: "Non-Phonorealistic" Rendering of Sounds via Vocal Imitation ( http://arxiv.org/abs/2409.13507v1 )

ライセンス: Link先を確認

Matthew Caren, Kartik Chandra, Joshua B. Tenenbaum, Jonathan Ragan-Kelley, Karima Ma,

(参考訳) 本研究では,人間の声の模倣を自動生成する手法を提案する。まず、人間の声道の模擬モデルから、まずモデルの制御パラメータを調整して声道模倣を試み、その合成音声を聴覚的特徴の観点から対象音と一致させる。そして,人間の直感に合うようにコミュニケーションの認知理論を適用し,人間の話者が聴取者に対して戦略的に判断する方法について考察する。最後に,本手法にこのようなコミュニケーション的推論を加えると,聴覚的特徴のみに適合するよりも人間の直感に適合することを示す実験とユーザスタディについて述べる。この観察はコンピュータグラフィックスにおける描写の研究に幅広い意味を持っている。

We present a method for automatically producing human-like vocal imitations of sounds: the equivalent of "sketching," but for auditory rather than visual representation. Starting with a simulated model of the human vocal tract, we first try generating vocal imitations by tuning the model's control parameters to make the synthesized vocalization match the target sound in terms of perceptually-salient auditory features. Then, to better match human intuitions, we apply a cognitive theory of communication to take into account how human speakers reason strategically about their listeners. Finally, we show through several experiments and user studies that when we add this type of communicative reasoning to our method, it aligns with human intuitions better than matching auditory features alone does. This observation has broad implications for the study of depiction in computer graphics.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 正規化変分量子イマジナリー時間進化によるシュウィンガーモデルのシミュレーション

Simulating the Schwinger Model with a Regularized Variational Quantum Imaginary Time Evolution ( http://arxiv.org/abs/2409.13510v1 )

ライセンス: Link先を確認

Xiao-Wei Li, Fei Li, Jiapei Zhuang, Man-Hong Yung,

(参考訳) シュウィンガーモデル(Schwinger model)は量子色力学(QCD)における非摂動アルゴリズムのテストのベンチマークとして機能し、強い結合状態におけるQCDとの類似性を強調している。しかし、古典的アルゴリズムは「符号問題」や大規模システム処理の難しさなど、シュウィンガーモデルをシミュレートする際の課題に直面する。これらの制限は、障害を克服するために量子コンピューティング技術を含む代替シミュレーションアプローチの探索を動機付けている。シュウィンガーモデルをシミュレートする既存の変分量子アルゴリズム(VQA)は、主に数学的勾配に基づく最適化に依存しており、直感的かつ物理的に誘導された最適化経路を提供しないこともある。対照的に、変分量子イマジナリー時間進化法(VQITE)は、物理的に着想を得た最適化手法を提供する。したがって、VQITEはSchwingerモデルをシミュレートするための強力なツールである。しかし, 標準VQITE法は, 非可逆行列問題に悩まされるため, 十分に安定ではない。この問題に対処するため,我々は正規化VQITE法(regularized-VQITE (rVQITE)) と呼ばれるVQITEの正規化バージョンを提案した。数値シミュレーションにより,提案手法は性能が向上し,他の手法と比較して収束が速いことを示す。我々は、シュウィンガーモデルにおいて様々な物理観測値の位相図をシミュレートするためにrVQITE法を用い、その結果の位相境界は正確な計算手法から得られるものと一致している。

The Schwinger model serves as a benchmark for testing non-perturbative algorithms in quantum chromodynamics (QCD) because of its similarities to QCD in strong coupling regimes, primarily phenomena such as confinement and charge screening. However, classical algorithms encounter challenges when simulating the Schwinger model, such as the "sign problem" and the difficulty of handling large-scale systems. These limitations motivate the exploration of alternative simulation approaches, including quantum computing techniques. Existing variational quantum algorithm (VQA) methods for simulating the Schwinger model rely primarily on mathematical gradient-based optimization, which sometimes fails to provide intuitive and physically guided optimization pathways. In contrast, the Variational Quantum Imaginary Time Evolution (VQITE) method offers a physically inspired optimization approach. Therefore, we argue that VQITE holds promise as a potent tool for simulating the Schwinger model. However, the standard VQITE method is not sufficiently stable, as it encounters difficulties with the non-invertible matrix problem. To address this issue, we propose a regularized version of VQITE, named the Regularized-VQITE (rVQITE) method, which incorporates a truncation-based approach. Through numerical simulations, we demonstrate that our proposed rVQITE approach achieves better performance and exhibits faster convergence compared to other related techniques. We employ the rVQITE method to simulate the phase diagrams of various physical observables in the Schwinger model, and the resulting phase boundaries are in agreement with those obtained from an exact computational approach.
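For orientation, the "non-invertible matrix problem" can be read against the linear system that variational imaginary time evolution usually solves at each step. The form below is the standard McLachlan-principle formulation, not an equation from this abstract. The truncated inverse is only one plausible reading of "truncation-based", and the symbols $A$, $C$, $\theta$, $\lambda_k$, $v_k$ and the cutoff $\epsilon$ are assumptions introduced for this sketch.

```latex
% Parameter update for an ansatz |\psi(\theta)\rangle under imaginary time evolution
\sum_j A_{ij}\,\dot{\theta}_j = C_i,
\qquad
A_{ij} = \operatorname{Re}\!\left(
  \frac{\partial \langle \psi(\theta) |}{\partial \theta_i}
  \frac{\partial | \psi(\theta) \rangle}{\partial \theta_j}
\right),
\qquad
C_i = -\operatorname{Re}\!\left(
  \frac{\partial \langle \psi(\theta) |}{\partial \theta_i}
  \, H \, | \psi(\theta) \rangle
\right)

% If A is singular or ill-conditioned, a truncated pseudo-inverse keeps only
% eigen-directions with eigenvalue above a cutoff \epsilon:
A = \sum_k \lambda_k \, v_k v_k^{\mathsf{T}},
\qquad
A_\epsilon^{+} = \sum_{k \,:\, \lambda_k > \epsilon} \frac{1}{\lambda_k} \, v_k v_k^{\mathsf{T}},
\qquad
\dot{\theta} = A_\epsilon^{+} C
```

Discarding the near-zero eigenvalues avoids dividing by them, which is the instability a plain inverse of $A$ would run into.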

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# ユニバーサル画像検索のための効率的・識別的特徴抽出

Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval ( http://arxiv.org/abs/2409.13513v1 )

ライセンス: Link先を確認

Morris Florek, David Tschirschwitz, Björn Barz, Volker Rodehorst,

(参考訳) 現在の画像検索システムはドメインの特異性や一般化の問題に直面することが多い。本研究の目的は、様々な領域にまたがる強力な意味的イメージ表現を提供する普遍的特徴抽出器のための、計算効率の良いトレーニングフレームワークを開発することにより、これらの制限を克服することである。この目的のために、リソース効率のトレーニングを可能にするM4D-35kと呼ばれるマルチドメイントレーニングデータセットをキュレートしました。さらに、効率的な普遍的特徴抽出に適合するかどうかについて、最先端のビジュアルセマンティック基礎モデルとマージンに基づく距離学習損失関数の広範な評価と比較を行う。制約のある計算資源にもかかわらず、Google Universal Image Embedding Challengeにおいて、mMP@5の0.721で最先端の成果を達成している。これにより、ベストパフォーマンスメソッドのわずか0.7ポイントのリードボードに、私たちのメソッドを第2位に配置します。しかし、我々のモデルは、全体的なパラメータが32%少なく、トレーニング可能なパラメータが289倍少ない。類似の計算条件を持つ手法と比較して,従来の最先端の手法よりも3.3パーセント高い性能を示した。私たちはコードとM4D-35kのトレーニングセットアノテーションをhttps://github.com/morrisfl/UniFExでリリースしています。

Current image retrieval systems often face domain specificity and generalization issues. This study aims to overcome these limitations by developing a computationally efficient training framework for a universal feature extractor that provides strong semantic image representations across various domains. To this end, we curated a multi-domain training dataset, called M4D-35k, which allows for resource-efficient training. Additionally, we conduct an extensive evaluation and comparison of various state-of-the-art visual-semantic foundation models and margin-based metric learning loss functions regarding their suitability for efficient universal feature extraction. Despite constrained computational resources, we achieve near state-of-the-art results on the Google Universal Image Embedding Challenge, with an mMP@5 of 0.721. This places our method at the second rank on the leaderboard, just 0.7 percentage points behind the best performing method. However, our model has 32% fewer overall parameters and 289 times fewer trainable parameters. Compared to methods with similar computational requirements, we outperform the previous state of the art by 3.3 percentage points. We release our code and M4D-35k training set annotations at https://github.com/morrisfl/UniFEx.
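The reported score uses mMP@5, a modified mean precision at 5. The abstract does not define it, so the sketch below follows the definition as I understand it from the challenge: precision over the top `min(n_q, 5)` results, where `n_q` is the number of relevant index images for the query, averaged over queries. Treat that definition, the function names, and the toy data as assumptions.

```python
def modified_precision_at_k(ranked_relevance, num_relevant, k=5):
    """Precision over the top min(num_relevant, k) ranked results.
    ranked_relevance: booleans, True where the retrieved item is relevant.
    num_relevant: total number of relevant index items for this query."""
    cutoff = min(num_relevant, k)
    if cutoff == 0:
        raise ValueError("query must have at least one relevant index item")
    hits = sum(1 for is_relevant in ranked_relevance[:cutoff] if is_relevant)
    return hits / cutoff


def mean_modified_precision_at_k(queries, k=5):
    """Average the per-query score over (ranked_relevance, num_relevant) pairs."""
    scores = [modified_precision_at_k(rel, n, k) for rel, n in queries]
    return sum(scores) / len(scores)


queries = [
    ([True, False, True, True, False], 10),  # 3 hits in the top 5
    ([True, True, False, False, False], 2),  # only 2 relevant items exist, both found
]
score = mean_modified_precision_at_k(queries)
```

The second query shows the point of the modification: a query with fewer than five relevant items is not penalized for the positions it could never fill.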

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# Transducer-based ASRのためのAho-Corasickアルゴリズムを用いたLM支援キーワードバイアス

LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR ( http://arxiv.org/abs/2409.13514v1 )

ライセンス: Link先を確認

Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Andres Carofilis, Shashi Kumar, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju,

(参考訳) 近年の音声認識におけるエンドツーエンドモデルの成功にもかかわらず、特殊で語彙外な単語の認識や、テキストによる高速なドメイン適応は依然として困難である。特別なエンティティへのバイアスが全体的なパフォーマンスの低下につながることはよくあることです。単語レベルn-gram言語モデルとAho-Corasick文字列マッチングアルゴリズムに基づく浅層融合アプローチを組み合わせ,名前付きエンティティのバイアスリストを組み合わせることで,音声認識性能を向上させるためのライトオンザフライ方式を提案する。 Aho-Corasickアルゴリズムは他の手法よりも効率的であることが証明され、高速な文脈適応が可能となった。 n-gram言語モデル(n-gram language model)は、失敗と出力のアークを持つグラフとして導入され、アーク重みはn-gram確率から適応される。言語モデルは、言語モデルと1つのコンテキストグラフのバイアスエンティティを組み合わせることで、全体的なパフォーマンスを気にするときに、キーワードバイアスの追加サポートとして使用される。我々は、名前付きエンティティや語彙外エンティティのパフォーマンスを含む、4つの言語、2つのパブリック、および1つのプライベートデータセットに関する知見を実証した。逆実時間係数の実用的差のない一般単語誤り率の21.6%の相対的な改善を実現した。

Despite the recent success of end-to-end models for automatic speech recognition, recognizing rare and out-of-vocabulary words, as well as fast domain adaptation with text, remains challenging. Biasing toward special entities often degrades overall performance. We propose a lightweight on-the-fly method to improve automatic speech recognition performance by combining a bias list of named entities with a word-level n-gram language model, using a shallow fusion approach based on the Aho-Corasick string matching algorithm. The Aho-Corasick algorithm has proved to be more efficient than other methods and allows fast context adaptation. An n-gram language model is introduced as a graph with fail and output arcs, where the arc weights are adapted from the n-gram probabilities. The language model is used as additional support for keyword biasing: it is combined with the bias entities in a single context graph to preserve overall performance. We demonstrate our findings on 4 languages, 2 public and 1 private datasets including performance on named entities and out-of-vocabulary entities. We achieve up to 21.6% relative improvement in the general word error rate with no practical difference in the inverse real-time factor.
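The fail and output arcs mentioned above come from the Aho-Corasick automaton. The sketch below shows only the character-level string matching, under the assumption that a plain keyword list stands in for the bias list. It does not reproduce the n-gram arc weights or the shallow-fusion scoring, and the function names are illustrative.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto, fail and output tables for a list of keywords."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state for the longest proper suffix match
    output = [[]]  # output[state] -> keywords ending at this state

    # Phase 1: insert all keywords into a trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Phase 2: breadth-first traversal to set fail links and merge outputs.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_keywords(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]  # follow fail arcs on a mismatch
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
matches = find_keywords("ushers", automaton)
```

The text is scanned once regardless of how many keywords are in the list, which is why this structure suits on-the-fly biasing with large entity lists.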

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# 人工衛星と地上の量子ネットワークのための効率的な絡み合いルーティング

Efficient Entanglement Routing for Satellite-Aerial-Terrestrial Quantum Networks ( http://arxiv.org/abs/2409.13517v1 )

ライセンス: Link先を確認

Yu Zhang, Yanmin Gong, Lei Fan, Yu Wang, Zhu Han, Yuanxiong Guo,

(参考訳) 6G以降の時代には、宇宙と地上の量子ネットワーク(SATQN)が、グローバルスケールの量子インターネットの未来を形作っている。本稿では, 衛星, 空中, 地上の量子ネットワーク間の協調関係について検討し, 長距離での高忠実な量子絡み合いを効率よく伝達する。まず、既存の衛星、空中、地上の量子ネットワークの概要を概観する。その後、経路選択と絡み合い発生率(PS-EGR)を共同で最適化することにより、量子ネットワークスループットを最大化する目的で、絡み合いルーティング問題に対処する。元の問題は、本質的に難解な混合整数線形プログラミング(MILP)問題として定式化されていることを考慮し、この問題を効率的に解くためにベンダー分解法(BD)ベースのアルゴリズムを提案する。数値計算により,PS-EGR方式の有効性が検証され,システム内の様々な最適化可能な要因について貴重な知見が得られた。最後に, SATQN における今後の研究に向けて, 今後の課題について検討し, 今後の課題を提示する。

In the era of 6G and beyond, space-aerial-terrestrial quantum networks (SATQNs) are shaping the future of the global-scale quantum Internet. This paper investigates the collaboration among satellite, aerial, and terrestrial quantum networks to efficiently transmit high-fidelity quantum entanglements over long distances. We begin with a comprehensive overview of existing satellite-, aerial-, and terrestrial-based quantum networks. Subsequently, we address the entanglement routing problem with the objective of maximizing quantum network throughput by jointly optimizing path selection and entanglement generation rates (PS-EGR). Given that the original problem is formulated as a mixed-integer linear programming (MILP) problem, which is inherently intractable, we propose a Benders' decomposition (BD)-based algorithm to solve the problem efficiently. Numerical results validate the effectiveness of the proposed PS-EGR scheme, offering valuable insights into various optimizable factors within the system. Finally, we discuss the current challenges and propose promising avenues for future research in SATQNs.

翻訳日:2024-11-07 06:53:09 公開日:2024-09-20

# モラル基礎理論と事前学習言語モデル:現状と課題

A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges ( http://arxiv.org/abs/2409.13521v1 )

ライセンス: Link先を確認

Lorenzo Zangari, Candida M. Greco, Davide Picca, Andrea Tagarelli,

(参考訳) 道徳的価値は初期の文明に深く根ざし、社会秩序と共通の善を規制する規範や法則の中で成文化された。人間の行動と文化的指向の心理的基盤を理解する上で重要な役割を担っている。モラル・ファンデーション理論(MFT)は、異なる文化が個人や社会生活を形作る方法の基礎となる道徳的基盤を識別する確立した枠組みである。自然言語処理,特にプレトレーニング言語モデル(PLM)の最近の進歩は,テキストデータから道徳的次元の抽出と分析を可能にしている。本調査では, MFT インフォームド PLM の総合的なレビューを行い, PLM の道徳的傾向とその MFT の文脈における応用について分析した。また、関連するデータセットやレキシコンをレビューし、トレンド、制限、今後の方向性について議論する。 PLMとMFTの交差点の構造的な概要を提供することにより、この研究はPLMの領域内の道徳心理学的洞察を橋渡しし、道徳的に意識されたAIシステムを構築するためのさらなる研究と開発の道を開く。

Moral values have deep roots in early civilizations, codified within norms and laws that regulated societal order and the common good. They play a crucial role in understanding the psychological basis of human behavior and cultural orientation. The Moral Foundation Theory (MFT) is a well-established framework that identifies the core moral foundations underlying the manner in which different cultures shape individual and social lives. Recent advancements in natural language processing, particularly Pre-trained Language Models (PLMs), have enabled the extraction and analysis of moral dimensions from textual data. This survey presents a comprehensive review of MFT-informed PLMs, providing an analysis of moral tendencies in PLMs and their application in the context of the MFT. We also review relevant datasets and lexicons and discuss trends, limitations, and future directions. By providing a structured overview of the intersection between PLMs and MFT, this work bridges moral psychology insights within the realm of PLMs, paving the way for further research and development in creating morally aware AI systems.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# EMMeTT:効率的なマルチモーダル機械翻訳訓練

EMMeTT: Efficient Multimodal Machine Translation Training ( http://arxiv.org/abs/2409.13523v1 )

ライセンス: Link先を確認

Piotr Żelasko, Zhehuai Chen, Mengru Wang, Daniel Galvez, Oleksii Hrinchuk, Shuoyang Ding, Ke Hu, Jagadeesh Balam, Vitaly Lavrukhin, Boris Ginsburg,

(参考訳) 基礎言語モデルのモダリティ拡張に対する関心の高まりは、最も効果的で効率的なマルチモーダルトレーニングアプローチに関する議論を保証している。本研究は、ニューラルマシン翻訳(NMT)に焦点を当て、自動音声翻訳(AST)を含む音声-LLMの共同マルチモーダルトレーニングシステムを提案する。本稿では,Canary-1Bの音声エンコーダで拡張されたデコーダのみのGPTとエンコーダ・デコーダT5の2つの基盤モデルアーキテクチャについて検討する。共同マルチモーダルトレーニングを扱うために,EMMeTTと呼ばれる新しいトレーニングフレームワークを提案する。 EMMeTTは、言語、データセット、モダリティ間のバランスの取れたサンプリング、効率的なシーケンシャルなデータイテレーション、バッチサイズオプティマイザ(OOMptimizer)によって補完されるマルチモーダルデータのための新しい2Dバケットスキームによって、トレーニング効率を向上させる。マルチモーダルなトレーニングは、両方のアーキテクチャに一貫して役立ちます。さらに、EMMeTTで訓練されたSALM-T5は、オリジナルのNMT能力を保ちながら、FLORESとFLEURSの4言語サブセット上でASTベースラインを上回っている。結果、多モーダル翻訳モデルでは、強いテキストと音声の翻訳結果を同時に生成する。

A rising interest in the modality extension of foundation language models warrants discussion on the most effective and efficient multimodal training approach. This work focuses on neural machine translation (NMT) and proposes a joint multimodal training regime of Speech-LLM to include automatic speech translation (AST). We investigate two different foundation model architectures, decoder-only GPT and encoder-decoder T5, extended with Canary-1B's speech encoder. To handle joint multimodal training, we propose a novel training framework called EMMeTT. EMMeTT improves training efficiency with the following: balanced sampling across languages, datasets, and modalities; efficient sequential data iteration; and a novel 2D bucketing scheme for multimodal data, complemented by a batch size optimizer (OOMptimizer). We show that multimodal training consistently helps with both architectures. Moreover, SALM-T5 trained with EMMeTT retains the original NMT capability while outperforming AST baselines on four-language subsets of FLORES and FLEURS. The resultant Multimodal Translation Model produces strong text and speech translation results at the same time.
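The abstract names a 2D bucketing scheme without describing it. One plausible reading, sketched below purely as an assumption, groups examples by both input length and output length, so that each batch contains sequences of similar size in both dimensions. The bucket boundaries, function names, and example data are hypothetical and not taken from the paper.

```python
from bisect import bisect_left
from collections import defaultdict


def bucket_index(length, upper_bounds):
    """Index of the first bucket whose upper bound is >= length.
    upper_bounds must be sorted in ascending order."""
    idx = bisect_left(upper_bounds, length)
    if idx == len(upper_bounds):
        raise ValueError(f"length {length} exceeds the largest bucket bound")
    return idx


def assign_2d_buckets(examples, input_bounds, output_bounds):
    """Group (name, input_len, output_len) examples by a 2D bucket key."""
    buckets = defaultdict(list)
    for name, input_len, output_len in examples:
        key = (
            bucket_index(input_len, input_bounds),
            bucket_index(output_len, output_bounds),
        )
        buckets[key].append(name)
    return dict(buckets)


examples = [("a", 4, 3), ("b", 9, 12), ("c", 5, 2), ("d", 18, 4)]
buckets = assign_2d_buckets(examples, input_bounds=[5, 10, 20], output_bounds=[4, 16])
```

With both dimensions bounded per bucket, a batch size could then be chosen per bucket to fit memory, which is presumably where a batch size optimizer would come in.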

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# サイバー防衛のためのコンテキストAI - LLMを用いた自動調査

Contextualized AI for Cyber Defense: An Automated Survey using LLMs ( http://arxiv.org/abs/2409.13524v1 )

ライセンス: Link先を確認

Christoforus Yoga Haryanto, Anne Maria Elvira, Trung Duc Nguyen, Minh Hieu Vu, Yoshiano Hartanto, Emily Lomempow, Arathi Arakala,

(参考訳) 本稿では,2015年から2024年にかけてのサイバー防衛能力向上におけるコンテキストAIの可能性について調査する。私たちは、組織的信頼とガバナンスフレームワークのギャップを指摘しながら、堅牢性、信頼性、統合方法に重点を置いています。文献調査手法として, (A) ChatGPT 4 と (B) Gemma 2:9b を用いた。学術研究にLLMを使うことの有効性と課題について論じ,今後の研究者に洞察を提供する。

This paper surveys the potential of contextualized AI in enhancing cyber defense capabilities, revealing significant research growth from 2015 to 2024. We identify a focus on robustness, reliability, and integration methods, while noting gaps in organizational trust and governance frameworks. Our study employs two LLM-assisted literature survey methodologies: (A) ChatGPT 4 for exploration, and (B) Gemma 2:9b for filtering with Claude 3.5 Sonnet for full-text analysis. We discuss the effectiveness and challenges of using LLMs in academic research, providing insights for future researchers.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# 時系列基礎モデルに向けて

Towards Long-Context Time Series Foundation Models ( http://arxiv.org/abs/2409.13530v1 )

ライセンス: Link先を確認

Nina Żukowska, Mononito Goswami, Michał Wiliński, Willa Potosnak, Artur Dubrawski,

(参考訳) 時系列基礎モデルは、ゼロショットの設定であっても、幅広い領域にわたる様々なタスクにおいて印象的なパフォーマンスを示している。しかし、これらのモデルのほとんどは短い単変量時系列を入力として扱うように設計されている。これは、特に、時間的および変数内依存関係の強い長い多変量データを扱う医療のような分野において、実用的使用を制限する。本研究は,言語ドメインと時系列ドメインの両方から,様々なコンテキスト拡張手法をカタログ化し,体系的に比較し,エンコーダのみのTSFMが変数間の依存性を効果的にモデル化できるようにするための,新しい圧縮メモリ機構を導入することで,このギャップを埋めるものである。我々は,近年のマルチタスク時系列基盤モデルであるMOMENTを多変量文脈で導入することで,このアプローチの利点を実証する。

Time series foundation models have shown impressive performance on a variety of tasks, across a wide range of domains, even in zero-shot settings. However, most of these models are designed to handle short univariate time series as an input. This limits their practical use, especially in domains such as healthcare with copious amounts of long and multivariate data with strong temporal and intra-variate dependencies. Our study bridges this gap by cataloging and systematically comparing various context expansion techniques from both language and time series domains, and introducing a novel compressive memory mechanism to allow encoder-only TSFMs to effectively model intra-variate dependencies. We demonstrate the benefits of our approach by imbuing MOMENT, a recent family of multi-task time series foundation models, with the multivariate context.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# ロボットがどう動くか予測する高レベルパターン

Using High-Level Patterns to Estimate How Humans Predict a Robot will Behave ( http://arxiv.org/abs/2409.13533v1 )

ライセンス: Link先を確認

Sagar Parekh, Lauren Bramblett, Nicola Bezzo, Dylan P. Losey,

(参考訳) ロボットと対話する人間は、ロボットが次に何をするかを予測することが多い。例えば、自動運転車の直近の挙動に基づいて、近くの人間のドライバーは、その車が同じ車線に留まると予測するかもしれない。安全でシームレスな相互作用のためには、ロボットが人間の予測を理解することが重要である。例えば、人間は自動運転車が合流しないと考えているが、実際には自動運転車が合流するつもりである場合、それを把握していれば自動運転車は事故を防ぐように行動を調整できる。従来の研究は、人間がロボットの振る舞いを正確に予測していると仮定することが多かった。しかし、人間同士の予測に関する最近の研究は、その逆を示唆している。すなわち、人間は高レベルの振る舞いを予測することによって、他のエージェントを近似する傾向がある。この知見を応用して、人間がロボットの振る舞いをどう予測しているかをロボットが推定できるようにする、二次の心の理論に基づくアプローチを開発する。データから直接これらの高レベルの予測を抽出するために、人間とロボットの直近の軌道を離散的な潜在空間に埋め込む。この潜在空間の各要素は、異なる種類の振る舞い(例えば、人間の前に合流する、同じ車線に留まる)を捉え、その振る舞いの種類と整合する状態空間上のベクトル場にデコードされる。こうして得られるロボットの振る舞いに関する高レベルで粗い予測は、実際の人間の予測と一致すると仮定する。本稿では、この仮説を支持する初期の証拠を、概念実証のユーザスタディを通じて提示する。

A human interacting with a robot often forms predictions of what the robot will do next. For instance, based on the recent behavior of an autonomous car, a nearby human driver might predict that the car is going to remain in the same lane. It is important for the robot to understand the human's prediction for safe and seamless interaction: e.g., if the autonomous car knows the human thinks it is not merging -- but the autonomous car actually intends to merge -- then the car can adjust its behavior to prevent an accident. Prior works typically assume that humans make precise predictions of robot behavior. However, recent research on human-human prediction suggests the opposite: humans tend to approximate other agents by predicting their high-level behaviors. We apply this finding to develop a second-order theory of mind approach that enables robots to estimate how humans predict they will behave. To extract these high-level predictions directly from data, we embed the recent human and robot trajectories into a discrete latent space. Each element of this latent space captures a different type of behavior (e.g., merging in front of the human, remaining in the same lane) and decodes into a vector field across the state space that is consistent with the underlying behavior type. We hypothesize that our resulting high-level and coarse predictions of robot behavior will correspond to actual human predictions. We provide initial evidence in support of this hypothesis through a proof-of-concept user study.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# フォーミュラ・スーパービジョンによる視覚幾何学的事前学習

Formula-Supervised Visual-Geometric Pre-training ( http://arxiv.org/abs/2409.13535v1 )

ライセンス: Link先を確認

Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh,

(参考訳) コンピュータビジョンの歴史を通じて、画像(視覚)と点雲(幾何学)の統合が研究されてきたが、画像認識と3Dオブジェクト認識の進歩の多くは、これらのモダリティを別々に処理する傾向にある。我々は、統一トランスフォーマーモデル上に画像と点雲を統合することで、この分断を橋渡しすることを目指している。このアプローチは画像と点雲のモダリティ固有の特性を統合し、視覚幾何学的表現を学習することで、統一トランスフォーマーモデル上で画像認識と3Dオブジェクト認識における基本的な下流タスクを実現する。本研究では、数式から整列した合成画像と点雲を自動生成する新しい合成事前学習手法であるFSVGP (Formula-Supervised Visual-Geometric Pre-training) を導入する。相互モダリティの監督を通じて、視覚的モダリティと幾何学的モダリティの間の教師付き事前学習を可能にする。FSVGPはまた、実データの収集、モダリティ間のアライメント、人間のアノテーションへの依存を減らす。実験の結果、FSVGPは画像と3Dオブジェクトの分類、検出、セグメンテーションの6つのタスクで、VisualAtomやPC-FractalDBよりも効果的に事前学習を行うことがわかった。これらの成果は、画像および3Dオブジェクト認識におけるFSVGPの優れた汎化性能を示し、視覚幾何学的表現学習における合成事前学習の可能性を強調している。プロジェクトのWebサイトはhttps://ryosuke-yamada.github.io/fdsl-fsvgp/で公開されている。

Throughout the history of computer vision, while research has explored the integration of images (visual) and point clouds (geometric), many advancements in image and 3D object recognition have tended to process these modalities separately. We aim to bridge this divide by integrating images and point clouds on a unified transformer model. This approach integrates the modality-specific properties of images and point clouds and achieves fundamental downstream tasks in image and 3D object recognition on a unified transformer model by learning visual-geometric representations. In this work, we introduce Formula-Supervised Visual-Geometric Pre-training (FSVGP), a novel synthetic pre-training method that automatically generates aligned synthetic images and point clouds from mathematical formulas. Through cross-modality supervision, we enable supervised pre-training between visual and geometric modalities. FSVGP also reduces reliance on real data collection, cross-modality alignment, and human annotation. Our experimental results show that FSVGP pre-trains more effectively than VisualAtom and PC-FractalDB across six tasks: image and 3D object classification, detection, and segmentation. These achievements demonstrate FSVGP's superior generalization in image and 3D object recognition and underscore the potential of synthetic pre-training in visual-geometric representation learning. Our project website is available at https://ryosuke-yamada.github.io/fdsl-fsvgp/.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# ShizishanGPT: ツールとリソースを統合する農業用大規模言語モデル

ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources ( http://arxiv.org/abs/2409.13537v1 )

ライセンス: Link先を確認

Shuting Yang, Zehui Liu, Wolfgang Mayer,

(参考訳) 大規模言語モデル(LLM)の最近の発展は、複雑な問合せを扱う知的対話システムの能力を大幅に向上させた。しかし、現在のLLMは、特に農業のような技術分野において、専門分野の知識に依然として限界がある。この問題に対処するため、我々は、Retrieval Augmented Generation(RAG)フレームワークとエージェントアーキテクチャに基づく農業用知的質問応答システムであるShizishanGPTを提案する。ShizishanGPTは5つの主要なモジュールから構成されている。すなわち、一般的な質問に答える汎用的なGPT-4ベースのモジュール、大規模言語モデル自身の知識をタイムリーに更新できないという問題を補う検索エンジンモジュール、ドメインの事実を提供する農業知識グラフモジュール、RAGを用いてドメイン知識を補う検索モジュール、そして作物の表現型予測や遺伝子発現解析などの特殊なモデルを呼び出す農業エージェントモジュールである。本研究のために特別に設計した100の農業に関する質問を含むデータセットを用いてShizishanGPTを評価した。実験の結果、このツールはモジュール設計と異なるドメイン知識ソースの統合により、より正確かつ詳細な回答を提供するため、一般のLLMを大きく上回った。ソースコード、データセット、モデルウェイトはhttps://github.com/Zaiwen/CropGPTで公開されている。

Recent developments in large language models (LLMs) have led to significant improvements in intelligent dialogue systems' ability to handle complex inquiries. However, current LLMs still exhibit limitations in specialized domain knowledge, particularly in technical fields such as agriculture. To address this problem, we propose ShizishanGPT, an intelligent question answering system for agriculture based on the Retrieval Augmented Generation (RAG) framework and agent architecture. ShizishanGPT consists of five key modules: a generic GPT-4 based module for answering general questions; a search engine module that compensates for the problem that the large language model's own knowledge cannot be updated in a timely manner; an agricultural knowledge graph module for providing domain facts; a retrieval module which uses RAG to supplement domain knowledge; and an agricultural agent module, which invokes specialized models for crop phenotype prediction, gene expression analysis, and so on. We evaluated ShizishanGPT using a dataset containing 100 agricultural questions specially designed for this study. The experimental results show that the tool significantly outperforms general LLMs as it provides more accurate and detailed answers due to its modular design and integration of different domain knowledge sources. Our source code, dataset, and model weights are publicly available at https://github.com/Zaiwen/CropGPT.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# 第2回パーセプションテストチャレンジのマルチ選択ビデオQAトラックへの第1位ソリューション

First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge ( http://arxiv.org/abs/2409.13538v1 )

ライセンス: Link先を確認

Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang,

(参考訳) 本稿では,第2回知覚テストチャレンジの多目的ビデオ質問回答(Multiple-choice Video Question Answering, QA)トラックに対する第1位ソリューションについて述べる。このコンペティションは複雑なビデオ理解の課題を提起し、ビデオコンテンツに関する質問を正確に理解し答えるモデルを必要とした。この課題に対処するために、我々は強力なQwenVL2 (7B)モデルを活用し、提供されたトレーニングセットで微調整しました。さらに、私たちはパフォーマンスを高めるためにモデルアンサンブル戦略とテスト時間拡張を採用しました。連続最適化により,本手法はリーダボード上でのTop-1精度0.7647を達成した。

In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we leveraged the powerful QwenVL2 (7B) model and fine-tuned it on the provided training set. Additionally, we employed model ensemble strategies and Test Time Augmentation to boost performance. Through continuous optimization, our approach achieved a Top-1 Accuracy of 0.7647 on the leaderboard.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
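
The abstract above mentions model ensembling and test-time augmentation but does not say how predictions are combined. As a minimal sketch, assuming each model and each augmented view returns a probability per answer option, one common choice is to average those probabilities and pick the highest. The function name, the data layout, and the numbers are hypothetical and not taken from the report.

```python
from collections import defaultdict

def ensemble_answer(option_probs):
    """Average per-option probabilities over several predictions and pick the best.

    option_probs: list of dicts mapping option label -> probability, one dict
    per (model, augmented view) pair. This layout is an assumption.
    """
    totals = defaultdict(float)
    for probs in option_probs:
        for option, p in probs.items():
            totals[option] += p
    n = len(option_probs)
    averaged = {option: total / n for option, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged

# Made-up predictions for one multiple-choice question.
predictions = [
    {"A": 0.50, "B": 0.30, "C": 0.20},  # model 1, original clip
    {"A": 0.35, "B": 0.45, "C": 0.20},  # model 1, augmented clip
    {"A": 0.40, "B": 0.35, "C": 0.25},  # model 2, original clip
]
best, averaged = ensemble_answer(predictions)
print(best, round(averaged[best], 3))
```

Averaging lets a single view that prefers a different option (the second entry here) be outvoted by the others, which is the usual motivation for combining views.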

# FullAnno:MLLMの画像理解を強化するデータエンジン

FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs ( http://arxiv.org/abs/2409.13540v1 )

ライセンス: Link先を確認

Jing Hao, Yuxiang Zhao, Song Chen, Yanpeng Sun, Qiang Chen, Gang Zhang, Kun Yao, Errui Ding, Jingdong Wang,

(参考訳) MLLM(Multimodal Large Language Models)は、その強力な推論能力と汎化能力により、幅広い視覚言語タスクにおいて有望であることが示されている。しかし、それらはSupervised Fine-Tuning (SFT) フェーズの高品質なデータに大きく依存している。既存のアプローチは、GPT-4Vによる高品質なデータのキュレートを目標としているが、GPT-4Vの商業的性質と、モデルへの指示に使用するプロンプトの単純さのため、スケーラビリティが低い。そこで我々は、オブジェクトのカテゴリと位置、領域記述、テキスト情報、および画像の高密度キャプションからなる、大規模で高品質できめ細かい画像アノテーションを生成可能なデータエンジンであるFullAnnoシステムを開発した。このエンジンは、複数の専門家モデルを含むカスケードアノテーションプロセスを特徴とし、高密度な画像キャプションを生成するようLLMに指示するためにリッチなプロンプトを使用する。我々は、FullAnnoシステムを用いてCOCOおよびVisual Genomeデータセットを再アノテーションし、オブジェクトアノテーションの数を3倍に、元の画像キャプションの長さを15倍にした。実験により、再生成したアノテーションは、複数のベンチマークでLLaVA-v1.5の能力を大きく向上できることが示された。再アノテーションされたデータは、https://arcana-project-page.github.io で入手できる。

Multimodal Large Language Models (MLLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they heavily depend on high-quality data in the Supervised Fine-Tuning (SFT) phase. The existing approaches aim to curate high-quality data via GPT-4V, but they are not scalable due to the commercial nature of GPT-4V and the simplicity of the prompts used to instruct the model. To this end, we devised the FullAnno system, which is a data engine that can generate large-scale, high-quality, and fine-grained image annotations consisting of the category and position of objects, region descriptions, text information, as well as dense image captions. This engine is characterized by its cascade annotation process, which involves multiple expert models and employs rich prompts to instruct LLMs in generating dense image captions. We re-annotated the COCO and Visual Genome datasets using our FullAnno system, tripling the number of object annotations and increasing the length of the original image captions by a factor of 15. Experiments show that the regenerated annotations can significantly enhance the capabilities of LLaVA-v1.5 on several benchmarks. The re-annotated data are available at: https://arcana-project-page.github.io

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# 融合と流れ:フォトニックグラフ状態を確実に構築するための正式なプロトコル

Fusion and flow: formal protocols to reliably build photonic graph states ( http://arxiv.org/abs/2409.13541v1 )

ライセンス: Link先を確認

Giovanni de Felice, Boldizsár Poór, Lia Yeh, William Cashman,

(参考訳) Photonicsは、計測ベースの量子コンピューティングの実装のための有望なプラットフォームを提供する。最近提案されたフュージョンベースのアーキテクチャは、普遍性とフォールトトレランスを達成することを目的としている。これらの手法では、資源グラフ状態上で核融合と単一ビット計測を行うことにより計算を行う。これらのアーキテクチャの検証には、線形代数的、確率的、制御フロー構造を統一形式言語で結合する必要がある。本稿では,線形光学,ZX計算,データフロープログラミングを融合して,フォトニック量子コンピューティングのためのフレームワークを開発する。パウリの誤差を誘発する核融合測定を特徴付けるとともに、核融合ネットワークのための新しい流れ構造を用いて補正可能であることを示す。任意の核融合を実現するための新しい再帰的・再帰的プロトコルの正しさを証明し、光子源が絡み合った線形光学系に対する普遍性のグラフ理論的証明を提供する。提案するフレームワークは、フォトニック量子コンピューティングのためのコンパイルアルゴリズムの開発方法である。

Photonics offers a promising platform for implementations of measurement-based quantum computing. Recently proposed fusion-based architectures aim to achieve universality and fault-tolerance. In these approaches, computation is carried out by performing fusion and single-qubit measurements on a resource graph state. The verification of these architectures requires linear algebraic, probabilistic, and control flow structures to be combined in a unified formal language. This paper develops a framework for photonic quantum computing by bringing together linear optics, ZX calculus, and dataflow programming. We characterize fusion measurements that induce Pauli errors and show that they are correctable using a novel flow structure for fusion networks. We prove the correctness of new repeat-until-success protocols for the realization of arbitrary fusions and provide a graph-theoretic proof of universality for linear optics with entangled photon sources. The proposed framework paves the way for the development of compilation algorithms for photonic quantum computing.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20

# 半スーパービジョンノード分類のためのグラフ類似性正規化ソフトマックス

Graph Similarity Regularized Softmax for Semi-Supervised Node Classification ( http://arxiv.org/abs/2409.13544v1 )

ライセンス: Link先を確認

Yiming Yang, Jun Liu, Wei Wan,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データのために設計された強力なディープラーニングモデルであり、ソフトマックス関数は半教師付きノード分類の最も一般的な分類法である。しかし、ソフトマックス関数はグラフ構造の空間情報を欠いている。本稿では,半教師付きノード分類におけるGNNのためのグラフ類似性正規化ソフトマックスを提案する。非局所的全変動(TV)正規化をソフトマックス活性化関数に組み込むことで、グラフ固有の空間情報をより効果的に捉えることができる。非局所勾配と発散作用素の重みはグラフの隣接行列に基づいて決定される。本稿では,提案手法をGCNとGraphSAGEのアーキテクチャに適用し,それぞれを引用とWebページリンクデータセット上でテストする。数値実験はノード分類と一般化能力において優れた性能を示す。これらの結果は、グラフ類似性が正則化されたソフトマックスは、因数グラフと非因数グラフの両方に有効であることを示している。

Graph Neural Networks (GNNs) are powerful deep learning models designed for graph-structured data, demonstrating effectiveness across a wide range of applications. The softmax function is the most commonly used classifier for semi-supervised node classification. However, the softmax function lacks spatial information of the graph structure. In this paper, we propose a graph similarity regularized softmax for GNNs in semi-supervised node classification. By incorporating non-local total variation (TV) regularization into the softmax activation function, we can more effectively capture the spatial information inherent in graphs. The weights in the non-local gradient and divergence operators are determined based on the graph's adjacency matrix. We apply the proposed method to the GCN and GraphSAGE architectures, testing them on citation and webpage linking datasets, respectively. Numerical experiments demonstrate its good performance in node classification and generalization capabilities. These results indicate that the graph similarity regularized softmax is effective on both assortative and disassortative graphs.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
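
For orientation, the softmax has a standard variational form, and a graph regularizer can be added to that form. The first identity below is a known fact. The second is only one plausible way to write a graph-TV-regularized variant: the weight $\lambda$, the use of adjacency entries $A_{ij}$ as edge weights, and the anisotropic penalty are all assumptions, not the formulation of the paper summarized above.

```latex
% Softmax as the minimizer of a linear term plus negative entropy over the simplex \Delta_K
\operatorname{softmax}(o) = \arg\min_{u \in \Delta_K} \Big\{ -\langle u, o \rangle + \sum_{k=1}^{K} u_k \log u_k \Big\}

% Assumed regularized variant over all N nodes, coupling neighbors through A
U^\star = \arg\min_{u_i \in \Delta_K} \sum_{i=1}^{N} \Big( -\langle u_i, o_i \rangle + \sum_{k=1}^{K} u_{ik} \log u_{ik} \Big) + \lambda \sum_{k=1}^{K} \sum_{i,j=1}^{N} A_{ij}\, \lvert u_{ik} - u_{jk} \rvert
```

With $\lambda = 0$ the problem decouples across nodes and each row $u_i$ reduces to the ordinary softmax of $o_i$. A positive $\lambda$ penalizes class probabilities that differ across connected nodes.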

# 分割型ランダム化平滑化による正逆ロバスト性証明

Certified Adversarial Robustness via Partition-based Randomized Smoothing ( http://arxiv.org/abs/2409.13546v1 )

ライセンス: Link先を確認

Hossein Goli, Farzan Farnia,

(参考訳) ディープニューラルネットワーク分類器の信頼性の高い応用には、敵の摂動に対する堅牢性証明が必要である。ガウスの平滑化は、正規有界摂動に対するロバスト性を証明するための広く分析されたアプローチであり、認定された予測半径はガウスのノイズの分散と、加法的なガウスのノイズの下でのニューラルネットの予測の信頼度に依存する。しかし、高次元画像データセットに適用した場合、高分散のガウス雑音が画像の視認性を著しく損なうため、原ガウス滑らか化の認定半径は比較的小さい可能性がある。本稿では,Pixel Partitioningに基づくランダム化平滑化手法を提案する。提案するPPRSアルゴリズムは,加法ガウス雑音下での画像の可視性を向上させる。本稿では,標準的なコンピュータビジョンデータセットとニューラルネットワークアーキテクチャにPPRSを適用した数値結果について論じる。実験により, ランダムな平滑化における付加ガウス雑音に対する予測モデルの精度と安定性が著しく向上したことが示された。

A reliable application of deep neural network classifiers requires robustness certificates against adversarial perturbations. Gaussian smoothing is a widely analyzed approach to certifying robustness against norm-bounded perturbations, where the certified prediction radius depends on the variance of the Gaussian noise and the confidence level of the neural net's prediction under the additive Gaussian noise. However, in application to high-dimensional image datasets, the certified radius of the plain Gaussian smoothing could be relatively small, since Gaussian noise with high variances can significantly harm the visibility of an image. In this work, we propose the Pixel Partitioning-based Randomized Smoothing (PPRS) methodology to boost the neural net's confidence score and thus the robustness radius of the certified prediction. We demonstrate that the proposed PPRS algorithm improves the visibility of the images under additive Gaussian noise. We discuss the numerical results of applying PPRS to standard computer vision datasets and neural network architectures. Our empirical findings indicate a considerable improvement in the certified accuracy and stability of the prediction model to the additive Gaussian noise in randomized smoothing.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
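
The abstract above notes that the certified radius depends on the noise variance and on the classifier's confidence under noise. A minimal sketch of that dependence, assuming the widely cited L2 bound from the randomized-smoothing literature (Cohen et al., 2019), is shown here. The abstract does not state which bound PPRS uses, so treat the formula choice and the function name as assumptions.

```python
from statistics import NormalDist

def certified_radius(sigma, p_top, p_runner_up):
    """L2 radius certified by Gaussian smoothing (Cohen et al., 2019 form).

    sigma: standard deviation of the additive Gaussian noise.
    p_top: lower bound on the smoothed probability of the predicted class.
    p_runner_up: upper bound on the smoothed probability of any other class.
    """
    if p_top <= p_runner_up:
        return 0.0  # no certificate: the smoothed classifier would abstain
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return 0.5 * sigma * (inv_cdf(p_top) - inv_cdf(p_runner_up))

# Higher confidence under noise gives a larger certified radius at fixed sigma.
for p in (0.6, 0.9, 0.99):
    print(p, round(certified_radius(0.5, p, 1.0 - p), 3))
```

This is why raising the confidence score under noise, which the abstract gives as the goal of pixel partitioning, translates directly into a larger radius.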

# 自由四元数選択によるパラメータ化制御ゲートの最適化

Optimizing a parameterized controlled gate with Free Quaternion Selection ( http://arxiv.org/abs/2409.13547v1 )

ライセンス: Link先を確認

Hiroyoshi Kurogi, Katsuhiro Endo, Yuki Sato, Michihiko Sugawara, Kaito Wada, Kenji Sugisaki, Shu Kanno, Hiroshi C. Watanabe, Haruyuki Nakano,

(参考訳) 変分アルゴリズムでは、量子回路は伝統的に単一量子ビットゲートに対してパラメータ化される。本研究では、一般化された制御ゲートをパラメータ化し、コスト値の局所最小化に最適なパラメータを推定するアルゴリズムを提案する。提案手法は,Isingおよび分子ハミルトニアンの変分量子固有解法(VQE),フィデリティ最大化のための変分量子アルゴリズム(VQA),時間発展演算子のユニタリコンパイルなど,様々な最適化問題に適用する。提案手法は,他の手法よりも浅い回路で効率よく最適化し,高い表現性を示す。さらに, この手法は, 化学系応用において要求される粒子数保存ゲートを一般化し, 完全に最適化することができる。この特性を利用して、分子ハミルトニアンの時間発展作用素を実際に近似し、トロッター分解による標準実装と比較して浅い回路で力学をシミュレートした。

In variational algorithms, quantum circuits are conventionally parametrized with respect to single-qubit gates. In this study, we parameterize a generalized controlled gate and propose an algorithm to estimate the optimal parameters for locally minimizing the cost value, where we extend the free quaternion selection method, an optimization method for a single-qubit gate. To benchmark the performance, we apply the proposed method to various optimization problems, including the Variational Quantum Eigensolver (VQE) for Ising and molecular Hamiltonians, Variational Quantum Algorithms (VQA) for fidelity maximization, and unitary compilation of time evolution operators. In these applications, the proposed method shows efficient optimization and greater expressibility with shallower circuits than other methods. Furthermore, this method is also capable of generalizing and fully optimizing particle-number-conserving gates, which are in demand in chemical systems applications. Taking advantage of this property, we have actually approximated time evolution operators of molecular Hamiltonian and simulated the dynamics with shallower circuits in comparison to the standard implementation by Trotter decomposition.

翻訳日:2024-11-07 06:41:58 公開日:2024-09-20
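
As background for the quaternion view of a single-qubit gate that the abstract above builds on, here is a minimal sketch. It relies only on the standard fact that a unit quaternion (q0, q1, q2, q3) defines the unitary q0*I - i*(q1*X + q2*Y + q3*Z). The optimization rule of the proposed method is not given in the abstract and is not reproduced, and the function names are hypothetical.

```python
import math

def gate_from_quaternion(q):
    """Return the 2x2 unitary q0*I - i*(q1*X + q2*Y + q3*Z) for a quaternion q."""
    norm = math.sqrt(sum(c * c for c in q))
    q0, q1, q2, q3 = (c / norm for c in q)  # normalize so the gate is unitary
    return [
        [complex(q0, -q3), complex(-q2, -q1)],
        [complex(q2, -q1), complex(q0, q3)],
    ]

def is_unitary(u, tol=1e-12):
    """Check that U times its conjugate transpose is the identity."""
    for i in range(2):
        for j in range(2):
            entry = sum(u[i][k] * u[j][k].conjugate() for k in range(2))
            expected = 1.0 if i == j else 0.0
            if abs(entry - expected) > tol:
                return False
    return True

# (0, 1, 0, 1) normalizes to -i * (X + Z) / sqrt(2), a Hadamard up to global phase.
hadamard_like = gate_from_quaternion((0.0, 1.0, 0.0, 1.0))
print(is_unitary(hadamard_like))
```

Because every unit quaternion yields a valid gate, choosing a gate amounts to choosing a point on the unit sphere in four dimensions.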

# 計算ノートにおけるコンテクスト化されたデータ記述コード生成

Contextualized Data-Wrangling Code Generation in Computational Notebooks ( http://arxiv.org/abs/2409.13551v1 )

ライセンス: Link先を確認

Junjie Huang, Daya Guo, Chenglong Wang, Jiazhen Gu, Shuai Lu, Jeevana Priya Inala, Cong Yan, Jianfeng Gao, Nan Duan, Michael R. Lyu,

(参考訳) データラングリングは、計算ノートブックのさらなる分析のために生データを準備するプロセスであり、データサイエンスにおいて不可欠だが時間を要するステップである。コード生成は、ユーザ意図を実行可能なコードに変換することによって、アナリストのオーバーヘッドを削減するために、データラングリングプロセスを自動化する可能性がある。正確なコードラングリングデータの生成は、テキストコンテキスト、コードコンテキスト、データコンテキストなど、ノートブックに存在するリッチコンテキストの包括的な考慮を必要とする。しかし、ノートブックはしばしば複数の非線形解析タスクを線形コードブロックのシーケンスにインターリーブする。ソースコードブロックでモデルを直接トレーニングするのは、正確なラングリングコード生成のためにコンテキストを完全に活用するのに失敗する。このギャップを埋めるために、コード生成タスクを乱すデータモデルのトレーニングを支援するために、明確でリッチなコンテキストで高品質なデータセットを構築することを目的としています。本研究では,まず,マルチモーダルなコンテキスト依存を明確化したデータラングリングコード生成例を抽出するための自動アプローチであるCoCoMineを提案する。最初はデータフロー分析を採用して、データラングリングコードを含むコードブロックを識別する。次にCoCoMineは、ノートブックのトレースと再生を通じて、コンテキスト化されたデータラングリングコード例を抽出する。 CoCoMineでは、Notebooksでコンテキスト化されたデータラングリングコード生成のための58,221のサンプルを含むデータセットであるCoCoNoteを構築している。データセットの有効性を示すため、トレーニング済みのコードモデルの範囲を微調整し、タスク上で様々な大きな言語モデルを促す。さらに、コード生成を強化するために、データコンテキストとコード/テキストコンテキストを別々にエンコードするDataCoderを提案する。実験結果から,データラングリングコード生成にデータコンテキストを組み込むことの重要性と,本モデルの有効性が示された。コードとデータは url でリリースします。

Data wrangling, the process of preparing raw data for further analysis in computational notebooks, is a crucial yet time-consuming step in data science. Code generation has the potential to automate the data wrangling process to reduce analysts' overhead by translating user intents into executable code. Precisely generating data wrangling code necessitates a comprehensive consideration of the rich context present in notebooks, including textual context, code context and data context. However, notebooks often interleave multiple non-linear analysis tasks into a linear sequence of code blocks, where the contextual dependencies are not clearly reflected. Directly training models with source code blocks fails to fully exploit the contexts for accurate wrangling code generation. To bridge the gap, we aim to construct a high-quality dataset with clear and rich contexts to help train models for data wrangling code generation tasks. In this work, we first propose an automated approach, CoCoMine, to mine data-wrangling code generation examples with clear multi-modal contextual dependency. It first adopts data flow analysis to identify the code blocks containing data wrangling code. Then, CoCoMine extracts the contextualized data-wrangling code examples through tracing and replaying notebooks. With CoCoMine, we construct CoCoNote, a dataset containing 58,221 examples for Contextualized Data-wrangling Code generation in Notebooks. To demonstrate the effectiveness of our dataset, we finetune a range of pretrained code models and prompt various large language models on our task. Furthermore, we also propose DataCoder, which encodes data context and code and textual contexts separately to enhance code generation. Experiment results demonstrate the significance of incorporating data context in data-wrangling code generation and the effectiveness of our model. We release code and data at url...

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
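
The abstract above says data flow analysis is used to identify the code blocks that contain data wrangling code. As a minimal sketch of what such a pass can look like, the snippet below runs a simple def-use propagation over notebook cells with Python's `ast` module. It is an illustrative stand-in and not CoCoMine's analysis; the function names and the toy notebook are hypothetical.

```python
import ast

def cell_defs_and_uses(source):
    """Return (names assigned, names read) for one notebook cell."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
    return defs, uses

def cells_touching(cells, seed):
    """Indices of cells that define, read, or derive values from `seed`."""
    tainted, hits = {seed}, []
    for index, source in enumerate(cells):
        defs, uses = cell_defs_and_uses(source)
        if (uses | defs) & tainted:
            hits.append(index)
            tainted |= defs  # values derived from tainted data are tainted too
    return hits

# Toy notebook: cell 2 is unrelated to the dataframe and should be skipped.
notebook = [
    "import pandas as pd\ndf = pd.read_csv('sales.csv')",
    "clean = df.dropna()",
    "threshold = 10",
    "top = clean[clean['units'] > threshold]",
]
print(cells_touching(notebook, "df"))
```

The cells are only parsed, never executed, so no pandas installation or data file is needed. A real analysis would also have to handle attribute mutation, function calls with side effects, and out-of-order execution, which this sketch ignores.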

# 接地的特徴とコアフェレントな特徴を持つビジュアルストーリーの生成

Generating Visual Stories with Grounded and Coreferent Characters ( http://arxiv.org/abs/2409.13555v1 )

ライセンス: Link先を確認

Danyang Liu, Mirella Lapata, Frank Keller,

(参考訳) 登場人物は物語において重要である。彼らはプロットを前進させ、感情的なつながりを作り、物語のテーマを具現化する。ビジュアルなストーリーテリング手法は、特定のキャラクターに関する物語を構築することなく、それに関連するプロットやイベントをより重視する。その結果、生成されたストーリーはジェネリックに感じられ、キャラクタが不在、曖昧、または誤っている。これらの問題を緩和するため,キャラクタ中心のストーリー生成という新たなタスクを導入し,一貫した接地と中核的なキャラクタの言及で視覚的なストーリーを予測できる最初のモデルを提案する。我々のモデルは、広く使われているVISTベンチマークの上に構築された新しいデータセットに基づいて微調整されています。具体的には、VISTを視覚的およびテキスト的文字コア参照チェーンで強化する自動パイプラインを開発する。また、物語における文字の豊かさとコア参照を測定するための新しい評価指標を提案する。実験結果から,本モデルは,ベースラインや最先端システムと比較して,一貫性とコアフェレントな繰り返しキャラクタを持つストーリーを生成することがわかった。

Characters are important in narratives. They move the plot forward, create emotional connections, and embody the story's themes. Visual storytelling methods focus more on the plot and events relating to it, without building the narrative around specific characters. As a result, the generated stories feel generic, with character mentions being absent, vague, or incorrect. To mitigate these issues, we introduce the new task of character-centric story generation and present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. Specifically, we develop an automated pipeline to enrich VIST with visual and textual character coreference chains. We also propose new evaluation metrics to measure the richness of characters and coreference in stories. Experimental results show that our model generates stories with recurring characters which are consistent and coreferent to a larger extent compared to baselines and state-of-the-art systems.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# 視覚増強による信頼できるヘイトスピーチ検出

Trustworthy Hate Speech Detection Through Visual Augmentation ( http://arxiv.org/abs/2409.13557v1 )

ライセンス: Link先を確認

Ziyuan Yang, Ming Yan, Yingyu Chen, Hui Wang, Zexin Lu, Yi Zhang,

(参考訳) ソーシャルメディアプラットフォームでのヘイトスピーチの急増は、ヘイトスピーチ検出(HSD)がますます批判的になり、大きな課題となっている。現在のHSD法は、検出性能を高めるために文脈情報を充実させることに重点を置いているが、ヘイトスピーチの本質的な不確実性を見落としている。本稿では,視覚的拡張(TrusV-HSD)による信頼に値するヘイトスピーチ検出手法を提案する。 TrusV-HSDは、ペアデータのないマルチモーダル接続を通じて、信頼できる情報を効果的に抽出することで意味表現を学習する。公開HSDデータセットを用いた実験では,TrusV-HSDの有効性が示され,従来の手法よりも顕著な改善が見られた。

The surge of hate speech on social media platforms poses a significant challenge, with hate speech detection (HSD) becoming increasingly critical. Current HSD methods focus on enriching contextual information to enhance detection performance, but they overlook the inherent uncertainty of hate speech. We propose a novel HSD method, named trustworthy hate speech detection method through visual augmentation (TrusV-HSD), which enhances semantic information through integration with diffused visual images and mitigates uncertainty with trustworthy loss. TrusV-HSD learns semantic representations by effectively extracting trustworthy information through multi-modal connections without paired data. Our experiments on public HSD datasets demonstrate the effectiveness of TrusV-HSD, showing remarkable improvements over conventional methods.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# 生成モデルと対向摂動を考慮したニューラルネットワークの効率的な可視化

Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations ( http://arxiv.org/abs/2409.13559v1 )

ライセンス: Link先を確認

Athanasios Karagounis,

(参考訳) 本稿では,既存の手法を改良した生成ネットワークによるディープビジュアライゼーション手法を提案する。従来の複数のネットワークとは対照的に,ジェネレータと識別器のみを必要とするため,使用するネットワーク数を削減し,アーキテクチャを単純化する。さらに,本モデルでは事前学習の知識を少なくし,非対話的学習プロセスを用いて,判別器がジェネレータと競合するのではなく,ガイドとして機能する。この研究のコアコントリビューションは、特定のクラスラベルと整合した詳細な視覚化画像を生成する能力である。本モデルでは,複数層にまたがるクラス情報を伝播することにより,ラベル指向の画像生成を促進できる,ユニークなスキップ接続型ブロック設計を取り入れている。さらに、これらの生成した視覚化を逆例として利用し、元の画像に最小限の修正を施した分類網を効果的に騙す方法について検討する。実験結果から,本手法は標的攻撃と非目標攻撃の両方において従来の対向的事例生成技術より優れ,摂動を最小限に抑えた94.5%の愚行率を達成できた。この研究は、可視化手法と敵の例とのギャップを埋めるものであり、愚かさが可視化品質を評価するための定量的指標となることを示唆している。本研究から得られた知見は、ニューラルネットワークの解釈可能性と敵攻撃に対する脆弱性に関する新たな視点を提供する。

This paper presents a novel approach for deep visualization via a generative network, offering an improvement over existing methods. Our model simplifies the architecture by reducing the number of networks used, requiring only a generator and a discriminator, as opposed to the multiple networks traditionally involved. Additionally, our model requires less prior training knowledge and uses a non-adversarial training process, where the discriminator acts as a guide rather than a competitor to the generator. The core contribution of this work is its ability to generate detailed visualization images that align with specific class labels. Our model incorporates a unique skip-connection-inspired block design, which enhances label-directed image generation by propagating class information across multiple layers. Furthermore, we explore how these generated visualizations can be utilized as adversarial examples, effectively fooling classification networks with minimal perceptible modifications to the original images. Experimental results demonstrate that our method outperforms traditional adversarial example generation techniques in both targeted and non-targeted attacks, achieving up to a 94.5% fooling rate with minimal perturbation. This work bridges the gap between visualization methods and adversarial examples, proposing that fooling rate could serve as a quantitative measure for evaluating visualization quality. The insights from this study provide a new perspective on the interpretability of neural networks and their vulnerabilities to adversarial attacks.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
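
The abstract above proposes fooling rate as a quantitative measure but does not define it. A minimal sketch under a common definition, the fraction of inputs whose predicted label is changed by the perturbation, is given below. The targeted variant, the function name, and the labels are assumptions made for illustration.

```python
def fooling_rate(clean_preds, adversarial_preds, target=None):
    """Fraction of inputs whose prediction is changed by the perturbation.

    With `target` set, only flips from another label to `target` count,
    which models a targeted attack.
    """
    if len(clean_preds) != len(adversarial_preds):
        raise ValueError("prediction lists must have the same length")
    fooled = 0
    for before, after in zip(clean_preds, adversarial_preds):
        if target is None:
            fooled += before != after
        else:
            fooled += before != target and after == target
    return fooled / len(clean_preds)

# Made-up predictions before and after perturbation.
clean = ["cat", "dog", "cat", "bird"]
attacked = ["dog", "dog", "bird", "dog"]
print(fooling_rate(clean, attacked))         # non-targeted
print(fooling_rate(clean, attacked, "dog"))  # targeted at "dog"
```

Inputs that already carry the target label are excluded from the targeted count, so a classifier that was right about "dog" from the start is not credited to the attack.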

# 故障診断のためのログからの故障指示情報のデミスティファイションと抽出

Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis ( http://arxiv.org/abs/2409.13561v1 )

ライセンス: Link先を確認

Junjie Huang, Zhihan Jiang, Jinyang Liu, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Cong Feng, Hui Dong, Zengyin Yang, Michael R. Lyu,

(参考訳) ログはオンラインサービスシステムのメンテナンスにおいて必須であり、多くの場合、効果的な障害軽減のための重要な情報を含んでいる。既存の異常検出手法は、膨大な実行時データ内の異常なログの識別を容易にするが、障害を理解するには技術者による手動でのログメッセージの調査が依然として不可欠であり、これは労働集約的でエラーを起こしやすい。CloudAでのログベースのトラブルシューティングの実践を調べたところ、エンジニアは通常、診断のためにログ情報の2つのカテゴリを優先していることが分かった。これには、異常なシステムイベントを記録する障害指示記述(fault-indicating descriptions)と、関連するエンティティを指定する障害指示パラメータ(fault-indicating parameters)が含まれる。この知見に基づき、障害診断のためにログからそのような障害指示情報を自動的に抽出する手法LoFIを提案する。LoFIは2つの重要なステージから構成される。第1段階では、LoFIは意味的類似性に基づいて、障害に関連するログを収集する粗粒度のフィルタリングを行う。第2段階では、LoFIは事前学習済みの言語モデルと新しいプロンプトベースのチューニング手法を利用して、収集したログから関心のある細粒度の情報を抽出する。我々は、Apache Sparkから収集したログとCloudAの産業データセット上でLoFIを評価する。実験の結果、LoFIは全てのベースライン手法を大きく上回り、最良のベースライン手法であるChatGPTに対してF1で25.8~37.9の絶対的な改善を達成した。このことは、障害指示情報の認識におけるLoFIの有効性を示している。さらに、CloudAにおけるLoFIのデプロイの成功とユーザスタディにより、本手法の有用性が検証された。コードとデータはhttps://github.com/Jun-jie-Huang/LoFI で公開されている。

Logs are imperative in the maintenance of online service systems, which often encompass important information for effective failure mitigation. While existing anomaly detection methodologies facilitate the identification of anomalous logs within extensive runtime data, manual investigation of log messages by engineers remains essential to comprehend faults, which is labor-intensive and error-prone. Upon examining the log-based troubleshooting practices at CloudA, we find that engineers typically prioritize two categories of log information for diagnosis. These include fault-indicating descriptions, which record abnormal system events, and fault-indicating parameters, which specify the associated entities. Motivated by this finding, we propose an approach to automatically extract such fault-indicating information from logs for fault diagnosis, named LoFI. LoFI comprises two key stages. In the first stage, LoFI performs coarse-grained filtering to collect logs related to the faults based on semantic similarity. In the second stage, LoFI leverages a pre-trained language model with a novel prompt-based tuning method to extract fine-grained information of interest from the collected logs. We evaluate LoFI on logs collected from Apache Spark and an industrial dataset from CloudA. The experimental results demonstrate that LoFI outperforms all baseline methods by a significant margin, achieving an absolute improvement of 25.8~37.9 in F1 over the best baseline method, ChatGPT. This highlights the effectiveness of LoFI in recognizing fault-indicating information. Furthermore, the successful deployment of LoFI at CloudA and user studies validate the utility of our method. The code and data are available at https://github.com/Jun-jie-Huang/LoFI.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
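
The first stage described above, coarse-grained filtering by semantic similarity, can be sketched as follows. This is a minimal illustration that swaps in bag-of-words cosine similarity for whatever semantic representation LoFI actually uses. The threshold, the fault hints, the log lines, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case token counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def coarse_filter(logs, fault_hints, threshold=0.3):
    """Keep log lines whose similarity to any fault hint reaches the threshold."""
    hints = [bag_of_words(h) for h in fault_hints]
    return [line for line in logs
            if max(cosine(bag_of_words(line), h) for h in hints) >= threshold]

# Made-up log lines in the style of a cluster computing job.
logs = [
    "INFO Executor added on worker node 3",
    "ERROR Failed to connect to driver at host 10.0.0.5 connection refused",
    "WARN Task 12 lost executor heartbeat timed out",
    "INFO Job 7 finished in 3.2 s",
]
hints = ["connection refused failed to connect", "executor lost timed out"]
for line in coarse_filter(logs, hints):
    print(line)
```

A cheap filter like this only narrows the candidate set. The fine-grained extraction of descriptions and parameters in the second stage is left to the language model and is not sketched here.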

# Proxion:Ethereumの衝突脆弱性を見つけるための隠れたスマートコントラクト

Proxion: Uncovering Hidden Proxy Smart Contracts for Finding Collision Vulnerabilities in Ethereum ( http://arxiv.org/abs/2409.13563v1 )

ライセンス: Link先を確認

Cheng-Kang Chen, Wen-Yi Chu, Muoi Tran, Laurent Vanbever, Hsu-Chun Hsiao,

(参考訳) プロキシ設計パターンにより、Ethereumスマートコントラクトを同時に不変かつアップグレード可能とし、元のコントラクトをデータストレージを含むプロキシコントラクトと実装ロジックを含むロジックコントラクトに分割する。このアーキテクチャは、セキュリティ上の問題、すなわちプロキシとロジックのコントラクト間の機能衝突とストレージの衝突が知られており、実際のインシデントでユーザから数百万ドル相当のデジタル資産を盗まれている。この懸念に応えて、いくつかの以前の研究がEthereumのプロキシコントラクトを特定して、衝突を検出する方法を模索している。しかし、それらすべてがカバー範囲が限られているために不足しており、多くの場合、利用可能なソースコードや過去のトランザクションとの契約のみの分析に制限される。このギャップを埋めるために、私たちは、すべてのプロキシスマートコントラクトとEthereum内のそれらの衝突を識別する、自動クロスコントラクトアナライザであるProxionを紹介します。 Proxionを際立たせるのは、ソースコードと過去のトランザクションの両方を欠く隠れたスマートコントラクトを分析する能力だ。 Proxionは効率性と精度を向上させる様々な技術を備えており、最先端のツールよりも優れており、特に数百万のプロキシ契約と何千もの未報告の衝突を識別している。我々は、2015年から2023年までの3600万以上の生きた契約を分析し、54.2%がプロキシ契約であり、約150万の契約が少なくとも1つの衝突問題を示すことを明らかにした。

The proxy design pattern allows Ethereum smart contracts to be simultaneously immutable and upgradeable, in which an original contract is split into a proxy contract containing the data storage and a logic contract containing the implementation logic. This architecture is known to have security issues, namely function collisions and storage collisions between the proxy and logic contracts, and has been exploited in real-world incidents to steal millions of dollars' worth of users' digital assets. In response to this concern, several previous works have sought to identify proxy contracts in Ethereum and detect their collisions. However, they all fell short due to their limited coverage, often restricting analysis to only contracts with available source code or past transactions. To bridge this gap, we present Proxion, an automated cross-contract analyzer that identifies all proxy smart contracts and their collisions in Ethereum. What sets Proxion apart is its ability to analyze hidden smart contracts that lack both source code and past transactions. Equipped with various techniques to enhance efficiency and accuracy, Proxion outperforms the state-of-the-art tools, notably identifying millions more proxy contracts and thousands of unreported collisions. We apply Proxion to analyze over 36 million alive contracts from 2015 to 2023, revealing that 54.2% of them are proxy contracts, and about 1.5 million contracts exhibit at least one collision issue.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
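
The storage-collision issue named in the abstract above can be illustrated with a toy model. The following is a minimal Python sketch, not Proxion's analysis and not EVM code. It assumes a hypothetical proxy that keeps `owner` in slot 0 and a hypothetical logic contract that keeps an `initialized` flag in the same slot, and it mimics delegatecall semantics by running the logic function against the proxy's storage. All names and values are illustrative assumptions.

```python
# Toy model of a proxy/logic storage collision (illustrative only, not EVM code).
# The slot layouts, function names, and the owner value are hypothetical.

PROXY_LAYOUT = {"owner": 0}          # proxy declares `owner` at slot 0
LOGIC_LAYOUT = {"initialized": 0}    # logic declares `initialized` at slot 0


def logic_initialize(storage):
    """Logic-contract function: sets its `initialized` flag to 1."""
    storage[LOGIC_LAYOUT["initialized"]] = 1


def delegatecall(proxy_storage, logic_function):
    """Mimic delegatecall: the logic code runs against the proxy's storage."""
    logic_function(proxy_storage)


def colliding_slots(layout_a, layout_b):
    """Return slots that both layouts use for differently named variables."""
    by_slot_a = {slot: name for name, slot in layout_a.items()}
    return sorted(
        slot for name, slot in layout_b.items()
        if slot in by_slot_a and by_slot_a[slot] != name
    )


proxy_storage = {PROXY_LAYOUT["owner"]: 0xA11CE}  # owner set at deployment
delegatecall(proxy_storage, logic_initialize)

# The logic's write to `initialized` has overwritten the proxy's `owner`.
owner_after = proxy_storage[PROXY_LAYOUT["owner"]]
collisions = colliding_slots(PROXY_LAYOUT, LOGIC_LAYOUT)
```

In this sketch the layout comparison flags slot 0 before any call is made. A real analyzer has to recover such layouts from bytecode, which this toy does not attempt.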

# ディープラーニングと機械学習、ビッグデータ分析と管理の強化:テンソルフロー事前学習モデル

Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Tensorflow Pretrained Models ( http://arxiv.org/abs/2409.13566v1 )

ライセンス: Link先を確認

Keyu Chen, Ziqian Bi, Qian Niu, Junyu Liu, Benji Peng, Sen Zhang, Ming Liu, Ming Li, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Pohsun Feng,

(参考訳) 本書は、ディープラーニングにおけるTensorFlow事前学習モデルの応用に焦点を当て、画像分類やオブジェクト検出などのタスクにこれらのモデルを効果的に使用するための詳細なガイダンスを提供する。 ResNet、MobileNet、EfficientNetといったモダンアーキテクチャの実践的な実装をカバーし、実世界の実例や実験を通じて転移学習のパワーを実証している。この本は線形プロービングとモデルのファインチューニングを比較し、PCA、t-SNE、UMAPといった技術を使った可視化によって、読者が異なるアプローチの影響を直感的に理解できるようにする。初心者から上級者までを対象に設計された本書には、完全なサンプルコードとステップ・バイ・ステップの指示が含まれており、読者は事前学習されたモデルを利用して、実践的なシナリオにおけるパフォーマンスを改善する方法を素早く習得することができる。この本は、理論的な洞察と実践を融合することにより、読者に様々な深層学習課題に自信を持って取り組む知識を与える。

This book focuses on the application of TensorFlow pre-trained models in deep learning, providing detailed guidance on effectively using these models for tasks such as image classification and object detection. It covers practical implementations of modern architectures like ResNet, MobileNet, and EfficientNet, demonstrating the power of transfer learning through real-world examples and experiments. The book compares linear probing and model fine-tuning, offering visualizations using techniques such as PCA, t-SNE, and UMAP to help readers intuitively understand the impact of different approaches. Designed for beginners to advanced users, this book includes complete example code and step-by-step instructions, enabling readers to quickly master how to leverage pre-trained models to improve performance in practical scenarios. By blending theoretical insights with hands-on practice, this book equips readers with the knowledge to confidently tackle various deep learning challenges.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# ふわふわ雲に対処する:S2および/またはS1画像の時系列を用いたフィールド境界検出

Tackling fluffy clouds: field boundaries detection using time series of S2 and/or S1 imagery ( http://arxiv.org/abs/2409.13568v1 )

ライセンス: Link先を確認

Foivos I. Diakogiannis, Zheng-Shu Zhou, Jeff Wang, Gonzalo Mata, Dave Henry, Roger Lawes, Amy Parker, Peter Caccetta, Rodrigo Ibata, Ondrej Hlinka, Jonathan Richetti, Kathryn Batchelor, Chris Herrmann, Andrew Toovey, John Taylor,

(参考訳) 正確なフィールド境界線作成は、デジタル農業において重要な課題であり、作物のモニタリングから資源管理まで、あらゆることに影響を及ぼす。既存の手法はしばしばノイズに悩まされ、特に光リモートセンシングにおいて雲のカバーを扱う場合、様々な風景を一般化することができない。そこで本研究では,Sentinel-2 (S2) およびSentinel-1 (S1) 画像からの時系列データを活用し,手動の雲フィルタリングを必要とせずに多様な雲条件下での性能を向上させる手法を提案する。本稿では,衛星画像時系列に特化して設計され,メモリ効率のよい注意機構を組み込んだ3次元ビジョントランスフォーマーアーキテクチャについて紹介する。 2つのモデルが提案されている: PTAViT3DはS2またはS1データを独立に処理し、PTAViT3D-CAは両方のデータセットを融合して精度を高める。両モデルとも、時空間相関を利用して、疎な雲被覆と密な雲被覆の下で評価される。その結果,部分的な雲被覆(S2,またはS2とS1のデータ融合)や密な雲被覆(S1)であってもフィールド境界を効果的に導出でき,S1ベースのモデルは空間分解能の点でS2画像に匹敵する性能を提供することが示された。このアプローチの重要な強みは、時空間相関をメモリ効率のよい方法で活用することで、雲に汚染された画像を直接処理できる能力にある。この手法は、オーストラリア全土のフィールド境界をマッピングするためにePaddocks製品で使用されており、様々な農業環境に適応可能な堅牢でスケーラブルなソリューションを提供する。私たちのコードはhttps://github.com/feevos/tfclで公開されています。

Accurate field boundary delineation is a critical challenge in digital agriculture, impacting everything from crop monitoring to resource management. Existing methods often struggle with noise and fail to generalize across varied landscapes, particularly when dealing with cloud cover in optical remote sensing. In response, this study presents a new approach that leverages time series data from Sentinel-2 (S2) and Sentinel-1 (S1) imagery to improve performance under diverse cloud conditions, without the need for manual cloud filtering. We introduce a 3D Vision Transformer architecture specifically designed for satellite image time series, incorporating a memory-efficient attention mechanism. Two models are proposed: PTAViT3D, which handles either S2 or S1 data independently, and PTAViT3D-CA, which fuses both datasets to enhance accuracy. Both models are evaluated under sparse and dense cloud coverage by exploiting spatio-temporal correlations. Our results demonstrate that the models can effectively delineate field boundaries, even with partial (S2 or S2 and S1 data fusion) or dense cloud cover (S1), with the S1-based model providing performance comparable to S2 imagery in terms of spatial resolution. A key strength of this approach lies in its capacity to directly process cloud-contaminated imagery by leveraging spatio-temporal correlations in a memory-efficient manner. This methodology, used in the ePaddocks product to map Australia's national field boundaries, offers a robust, scalable solution adaptable to varying agricultural environments, delivering precision and reliability where existing methods falter. Our code is available at https://github.com/feevos/tfcl.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# オーストラリア首都圏における電子投票システムeVACS 2020/2024のセキュリティ分析

Security analysis of the Australian Capital Territory's eVACS 2020/2024 paperless direct recording electronic voting system ( http://arxiv.org/abs/2409.13570v1 )

ライセンス: Link先を確認

Chris Culnane, Andrew Conway, Vanessa Teague, Ty Wilson-Brown,

(参考訳) 本報告では,Ada Web Services Libraryにおける2つの暗号エラーがeVACSに与える影響について述べる。これらのエラーは、2024年3月に公開された2024 eVACSコードの検査とテストの過程で確認した。この問題をAdaCoreに開示し、当時の影響を関連する選挙当局に説明しました。

This report describes the implications for eVACS of two cryptographic errors in the Ada Web Services Library that it depends on. We identified these errors in the course of examining and testing the 2024 eVACS code, which was made publicly available in March 2024. We disclosed the problems to AdaCore, and explained the implications at the time to the relevant electoral authorities.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# ファクトリワイド動的スケジューリングのためのスケーラブルなマルチエージェント強化学習

Scalable Multi-agent Reinforcement Learning for Factory-wide Dynamic Scheduling ( http://arxiv.org/abs/2409.13571v1 )

ライセンス: Link先を確認

Jaeyeon Jang, Diego Klabjan, Han Liu, Nital S. Patel, Xiuqi Li, Balakrishnan Ananthanarayanan, Husam Dauod, Tzung-Han Juang,

(参考訳) リアルタイムな動的スケジューリングは、意思決定の複雑さのため、現代の製造プロセスにおいて極めて難しい課題である。近年、強化学習(RL)がこの課題に対処するための影響のある手法として注目されている。しかし、古典的なRL法は通常、大規模な工場規模のスケジューリングには適さない人為的なディスパッチ規則に依存している。このギャップを埋めるために,本論文では,スケジューリング問題を各エージェントが処理するサブプロブレムの集合に分解した後,所望のコーディネーションを得るためにリーダ・フォロワマルチエージェントRL(MARL)の概念を適用した。さらに、エージェントのエラーによる生産能力の壊滅的な損失を防止するためにルールベースの変換アルゴリズムを提案することで、手順をさらに強化する。実験の結果,提案手法は様々な面において,最先端の深部RLに基づくスケジューリングモデルよりも優れていた。さらに、提案したモデルは、要求の変化に対する最も堅牢なスケジューリング性能を提供する。全体として、提案したMARLベースのスケジューリングモデルでは、リアルタイムスケジューリング問題に対する有望な解決策が提示され、様々な製造業における潜在的な応用が期待できる。

Real-time dynamic scheduling is a crucial but notoriously challenging task in modern manufacturing processes due to its high decision complexity. Recently, reinforcement learning (RL) has been gaining attention as an impactful technique to handle this challenge. However, classical RL methods typically rely on human-made dispatching rules, which are not suitable for large-scale factory-wide scheduling. To bridge this gap, this paper applies a leader-follower multi-agent RL (MARL) concept to obtain desired coordination after decomposing the scheduling problem into a set of sub-problems that are handled by each individual agent for scalability. We further strengthen the procedure by proposing a rule-based conversion algorithm to prevent catastrophic loss of production capacity due to an agent's error. Our experimental results demonstrate that the proposed model outperforms the state-of-the-art deep RL-based scheduling models in various aspects. Additionally, the proposed model provides the most robust scheduling performance to demand changes. Overall, the proposed MARL-based scheduling model presents a promising solution to the real-time scheduling problem, with potential applications in various manufacturing industries.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# 自分だけの法律:非合意の親密メディアに対するDMCAの非有効性

A Law of One's Own: The Inefficacy of the DMCA for Non-Consensual Intimate Media ( http://arxiv.org/abs/2409.13575v1 )

ライセンス: Link先を確認

Li Qiwei, Shihui Zhang, Samantha Paige Pratt, Andrew Timothy Kasper, Eric Gilbert, Sarita Schoenebeck,

(参考訳) NCIM(Non-consensual Intimate Media)は、表現されている個人に対して、インターネット規模の害を与えるメディアである。削除を求める最も強力なツールの1つは、デジタルミレニアム著作権法(DMCA)である。しかし、DMCAはNCIMの問題に対処するためではなく、著作権保持者を保護するために設計された。本稿では,10年以上にわたる54,000件以上のDMCAレポートと8500万件以上の侵害URLからなるデータセットを用いて,NCIM削除に対するDMCAの有効性を評価する。その結果、侵害URLのうち60日以内にウェブサイトのホストから削除されるのは50%未満であり、Google検索が侵害コンテンツをインデックスから削除するまでの期間は中央値で11.7日であった。ウェブホスト全体では、最初の48時間以内に削除されるURLはわずか4%である。さらに、非商業的なNCIMについて最も頻繁に報告されるドメインは、大きなプラットフォームではなく、より小さなウェブサイトである。我々は、大小のプラットフォームを問わず執行可能で、削除までの時間短縮を保証する新しい法律の必要性を強調する。

Non-consensual intimate media (NCIM) presents internet-scale harm to individuals who are depicted. One of the most powerful tools for requesting its removal is the Digital Millennium Copyright Act (DMCA). However, the DMCA was designed to protect copyright holders rather than to address the problem of NCIM. Using a dataset of more than 54,000 DMCA reports and over 85 million infringing URLs spanning over a decade, this paper evaluates the efficacy of the DMCA for NCIM takedown. Results show that fewer than 50% of infringing URLs are removed from website hosts within 60 days, and Google Search takes a median of 11.7 days to deindex infringing content. Across web hosts, only 4% of URLs are removed within the first 48 hours. Additionally, the most frequently reported domains for non-commercial NCIM are smaller websites, not large platforms. We stress the need for new laws that ensure a shorter time to takedown and are enforceable across big and small platforms alike.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# Region Prompt Tuning:Regional Text Promptを利用したきめ細かいシーンテキスト検出

Region Prompt Tuning: Fine-grained Scene Text Detection Utilizing Region Text Prompt ( http://arxiv.org/abs/2409.13576v1 )

ライセンス: Link先を確認

Xingtao Lin, Heqian Qiu, Lanxiao Wang, RUihang Wang, Linfeng XU, Hongliang Li,

(参考訳) プロンプトチューニングの最近の進歩は、シーンテキスト検出などの下流タスクに対して、Contrastive Language-Image Pre-trained (CLIP)のような大規模モデルに適応することに成功した。通常、テキストプロンプトはテキストエンコーダの入力を補完し、細粒度の詳細を無視しながらグローバルな特徴に焦点を合わせ、シーンテキスト検出のタスクではきめ細かいテキストが無視される。本稿では,詳細なシーンテキスト検出のための領域プロンプトチューニング(RPT)手法を提案する。リージョンプロンプトチューニング法は、地域テキストプロンプトを個々の文字に分解し、視覚特徴マップを地域視覚トークンに分割し、文字とトークンを1対1で対応させる。これにより、文字はトークンの局所的な特徴と一致し、詳細な特徴やきめ細かいテキストが省略されるのを避けることができる。これを実現するために,各文字を対応するトークンにリンクするための共有位置埋め込みを導入し,各領域のテキストプロンプト文字をターゲットの `text'' に合わせるために双方向距離ロスを用いる。細粒度レベルで情報を洗練するために,符号化前後の文字-トークンレベルの相互作用を実装した。提案手法は,画像テキストプロセスから得られた一般的なスコアマップと文字とトークンのマッチングから得られた領域スコアマップを組み合わせることで,グローバルな特徴とローカルな特徴のバランスを保ち,DBNetに入力してテキストを検知する最終的なスコアマップを生成する。 ICDAR2015、TotalText、CTW1500といったベンチマークの実験では、RTTのパフォーマンスが印象的であり、シーンテキスト検出の有効性が強調されている。

Recent advancements in prompt tuning have successfully adapted large-scale models like Contrastive Language-Image Pre-trained (CLIP) for downstream tasks such as scene text detection. Typically, the text prompt complements the text encoder's input, focusing on global features while neglecting fine-grained details, so fine-grained text is ignored in the task of scene text detection. In this paper, we propose the region prompt tuning (RPT) method for fine-grained scene text detection, in which the proposed region text prompt helps focus on fine-grained features. The region prompt tuning method decomposes the region text prompt into individual characters and splits the visual feature map into region visual tokens, creating a one-to-one correspondence between characters and tokens. This allows each character to match the local features of a token, thereby avoiding the omission of detailed features and fine-grained text. To achieve this, we introduce a shared position embedding to link each character with its corresponding token and employ a bidirectional distance loss to align each region text prompt character with the target "text". To refine the information at the fine-grained level, we implement character-token level interactions before and after encoding. Our proposed method combines a general score map from the image-text process with a region score map derived from character-token matching, producing a final score map that balances global and local features and is fed into DBNet to detect the text. Experiments on benchmarks like ICDAR2015, TotalText, and CTW1500 demonstrate RPT's impressive performance, underscoring its effectiveness for scene text detection.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# Time and Tokens: エンドツーエンド音声障害検出のベンチマーク

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection ( http://arxiv.org/abs/2409.13582v1 )

ライセンス: Link先を確認

Xuanru Zhou, Jiachen Lian, Cheol Jun Cho, Jingwen Liu, Zongli Ye, Jinming Zhang, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Maria Luisa Gorno Tempini, Gopala Anumanchipalli,

(参考訳) 音声のディスフルエンシモデリングは、繰り返し、ブロック、挿入、置換、削除などの音声のディスフルエンシを検出するタスクである。最近の進歩の多くは、この問題を時間に基づく物体検出問題として扱う。本研究では,この問題を新しい視点から再考する: ディスフルエンシをトークン化し、検出問題をトークンベースの自動音声認識(ASR)問題としてモデル化する。規則に基づく音声とテキストのディスフルエンシシミュレータを提案してVCTK-tokenを開発し、さらにWhisperのようなseq2seqアーキテクチャを開発して、良好な性能を持つ新しいベンチマークを構築する。また,提案するトークンベースの手法と時間に基づく手法を体系的に比較し,今後の研究を促進するための統一ベンチマークを提案する。より広い科学コミュニティのために、これらのリソースをオープンソースにしています。プロジェクトページはhttps://rorizzz.github.io/で公開されている。

Speech dysfluency modeling is a task to detect dysfluencies in speech, such as repetition, block, insertion, replacement, and deletion. Most recent advancements treat this problem as a time-based object detection problem. In this work, we revisit this problem from a new perspective: tokenizing dysfluencies and modeling the detection problem as a token-based automatic speech recognition (ASR) problem. We propose rule-based speech and text dysfluency simulators and develop VCTK-token, and then develop a Whisper-like seq2seq architecture to build a new benchmark with decent performance. We also systematically compare our proposed token-based methods with time-based methods, and propose a unified benchmark to facilitate future research endeavors. We open-source these resources for the broader scientific community. The project page is available at https://rorizzz.github.io/

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
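
The idea of a rule-based text dysfluency simulator that produces token-labelled targets, as described in the abstract above, can be sketched in a few lines. This is an illustrative Python sketch only and not the authors' simulator. The token names `[REP]` and `[DEL]`, the function names, and the example sentence are all assumptions made for the illustration.

```python
# Minimal rule-based text dysfluency injection (hypothetical token names).

def inject_repetition(words, index, times=1, token="[REP]"):
    """Repeat words[index] `times` extra times; label each extra copy with `token`."""
    if not 0 <= index < len(words):
        raise IndexError("index out of range")
    # Dysfluent text: the chosen word appears times + 1 times in a row.
    dysfluent = words[:index] + [words[index]] * (times + 1) + words[index + 1:]
    # Target sequence: the extra copies are replaced by the dysfluency token.
    target = words[:index] + [token] * times + words[index:]
    return dysfluent, target


def inject_deletion(words, index, token="[DEL]"):
    """Drop words[index] from the text and mark its position in the target."""
    if not 0 <= index < len(words):
        raise IndexError("index out of range")
    dysfluent = words[:index] + words[index + 1:]
    target = words[:index] + [token] + words[index + 1:]
    return dysfluent, target


reference = "please call stella".split()
spoken, labels = inject_repetition(reference, index=1, times=2)
shortened, del_labels = inject_deletion(reference, index=0)
```

A token-based recognizer would then be trained to emit a sequence like `labels` from audio that realizes `spoken`, so that detection reduces to predicting where the special tokens occur.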

# ニューロシンボリック・コンフォーマル分類

Neurosymbolic Conformal Classification ( http://arxiv.org/abs/2409.13585v1 )

ライセンス: Link先を確認

Arthur Ledaguenel, Céline Hudelot, Mostepha Khouadjia,

(参考訳) 過去数十年間、主にディープラーニング(DL)によって駆動される機械学習(ML)が大幅に改善されてきた。しかし、多くの領域におけるMLの成功にもかかわらず、(分散シフトや敵攻撃などに直面した)MLシステムの整合性の保証や脆弱性を提供することの不可能さは、信頼できるAIシステムの設計を妨げている。この脆弱性を軽減し、ニューロシンボリックAIと共形予測を含むMLシステムの動作に関するいくつかの保証を提供するために、いくつかの研究パスが研究されている。ニューロシンボリック人工知能(Neurosymbolic AI)は、ニューラルネットワーク学習能力とシンボリックシステムの推論能力を組み合わせることを目的とした研究分野である。このハイブリダイゼーションの目的の1つは、システムの出力が何らかの事前の知識に従うという理論的な保証を提供することである。コンフォーマル予測(Conformal prediction)とは、一意の予測を信頼セットと呼ばれる一連の予測に変換することによって、MLシステムの不確実性を考慮した一連の手法である。興味深いことに、これは信頼セット内の真のラベルの存在に関する統計的保証が伴う。どちらのアプローチも分布自由であり、モデルに依存しない。本稿では,この2つのアプローチが相互に補完する方法について述べる。本稿では,いくつかのニューロシンボリックな共形予測手法を導入し,その特性(信頼性セットのサイズ,計算複雑性など)について検討する。

The last decades have seen a drastic improvement of Machine Learning (ML), mainly driven by Deep Learning (DL). However, despite the resounding successes of ML in many domains, the impossibility of providing guarantees of conformity and the fragility of ML systems (faced with distribution shifts, adversarial attacks, etc.) have prevented the design of trustworthy AI systems. Several research paths have been investigated to mitigate this fragility and provide some guarantees regarding the behavior of ML systems, among which are neurosymbolic AI and conformal prediction. Neurosymbolic artificial intelligence is a growing field of research aiming to combine neural network learning capabilities with the reasoning abilities of symbolic systems. One of the objectives of this hybridization can be to provide theoretical guarantees that the output of the system will comply with some prior knowledge. Conformal prediction is a set of techniques that make it possible to account for the uncertainty of ML systems by transforming a single prediction into a set of predictions, called a confidence set. Interestingly, this comes with statistical guarantees regarding the presence of the true label inside the confidence set. Both approaches are distribution-free and model-agnostic. In this paper, we see how these two approaches can complement one another. We introduce several neurosymbolic conformal prediction techniques and explore their different characteristics (size of confidence sets, computational complexity, etc.).

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20
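
The step of turning a single prediction into a confidence set, described in the abstract above, can be sketched with standard split conformal classification. This is a minimal sketch of the generic technique and does not reproduce the neurosymbolic variants the paper introduces. The calibration data, the function names, and the choice of nonconformity score (one minus the predicted probability of the true label) are assumptions made for the illustration.

```python
import math


def conformal_quantile(cal_probs, cal_labels, alpha):
    """Threshold q_hat from calibration scores s_i = 1 - p_i[y_i]."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return scores[rank - 1]


def confidence_set(probs, q_hat):
    """All labels whose nonconformity score 1 - p[k] is at most q_hat."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]


# Hypothetical calibration data: predicted class probabilities and true labels.
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
cal_labels = [0, 1, 2, 0]

q_hat = conformal_quantile(cal_probs, cal_labels, alpha=0.25)
pred_set = confidence_set([0.5, 0.45, 0.05], q_hat)
```

Under the usual exchangeability assumption between calibration and test points, a set built this way contains the true label with probability at least 1 - alpha, whatever the underlying model is. That is the distribution-free, model-agnostic property the abstract refers to.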

# 機械学習による量子固有解アルゴリズムの高速化

Accelerating Quantum Eigensolver Algorithms With Machine Learning ( http://arxiv.org/abs/2409.13587v1 )

ライセンス: Link先を確認

Avner Bensoussan, Elena Chachkarova, Karine Even-Mendoza, Sophie Fortz, Connor Lenihan,

(参考訳) 本稿では,NISQデバイス上でのハミルトニアン基底状態エネルギー計算の高速化について検討する。量子固有値ソルバ(Quantum Eigensolver)のユースケースを例として,探索ベースの手法と機械学習を併用して量子アルゴリズムを高速化することを提案する。我々は、XGBoostのPython回帰器を使用して、最大16量子ビットのシステムから古典的にマイニングしたデータで2つの小さなモデルを訓練した。 Eigensolverのハイパーパラメータを最適化することにより,20量子ビット,24量子ビット,28量子ビットのシステムで予備的アプローチを評価した。これらのモデルはハイパーパラメータ値を予測し、28量子ビットシステムでのテストでは誤差を0.13%-0.15%減少させる。しかし,20量子ビット系と24量子ビット系では決定的な結果が得られなかったため,ハミルトニアンの特性に基づくトレーニングデータのさらなる検討を提案する。今後の研究では、機械学習モデルをトレーニングして、ハイパーパラメータを超えて量子アルゴリズムの実行の他の側面やサブルーチンを最適化する予定です。

In this paper, we explore accelerating Hamiltonian ground state energy calculation on NISQ devices. We suggest using search-based methods together with machine learning to accelerate quantum algorithms, exemplified in the Quantum Eigensolver use case. We trained two small models on classically mined data from systems with up to 16 qubits, using XGBoost's Python regressor. We evaluated our preliminary approach on 20-, 24- and 28-qubit systems by optimising the Eigensolver's hyperparameters. These models predict hyperparameter values, leading to a 0.13%-0.15% reduction in error when tested on 28-qubit systems. However, due to inconclusive results with 20- and 24-qubit systems, we suggest further examination of the training data based on Hamiltonian characteristics. In future work, we plan to train machine learning models to optimise other aspects or subroutines of quantum algorithm execution beyond its hyperparameters.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# ChainBuddy: LLMパイプラインを生成するAIエージェントシステム

ChainBuddy: An AI Agent System for Generating LLM Pipelines ( http://arxiv.org/abs/2409.13588v1 )

ライセンス: Link先を確認

Jingyue Zhang, Ian Arawjo,

(参考訳) 大規模言語モデル(LLM)が進歩するにつれて、その潜在的なアプリケーションは大幅に成長した。しかし、ユーザ固有のタスクにおけるLLMの挙動を評価し、そのための効果的なパイプラインを構築することは依然として困難である。多くのユーザーはどこから始めるかに苦慮しており、これはしばしば「ブランクページ」問題と呼ばれる。 ChainBuddyは、ChainForgeプラットフォームに組み込まれた、評価用LLMパイプラインを生成するためのAIアシスタントであり、この問題への対処を目指している。 ChainBuddyは、LLMの振る舞いを計画し評価するための単純でユーザフレンドリな方法を提供し、幅広いタスクやユースケースにわたってプロセスをより取り組みやすくする。本稿では,ChainBuddyをベースラインインタフェースと比較した被験者内ユーザスタディを報告する。 AIアシストを使用する場合、参加者は作業負荷が少ないと報告し、LLMの挙動の評価パイプラインのセットアップにより自信を持てた。我々は,AIのオープンエンド評価において,ユーザを支援するインターフェースの将来に対する洞察を導き出す。

As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-specific tasks and craft effective pipelines to do so. Many users struggle with where to start, often referred to as the "blank page" problem. ChainBuddy, an AI assistant for generating evaluative LLM pipelines built into the ChainForge platform, aims to tackle this issue. ChainBuddy offers a straightforward and user-friendly way to plan and evaluate LLM behavior, making the process less daunting and more accessible across a wide range of possible tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a less demanding workload and felt more confident setting up evaluation pipelines of LLM behavior. We derive insights for the future of interfaces that assist users in the open-ended evaluation of AI.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# MRI分類モデルにおける$k$-Space特徴の影響の解析

Analyzing the Effect of $k$-Space Features in MRI Classification Models ( http://arxiv.org/abs/2409.13589v1 )

ライセンス: Link先を確認

Pascal Passigan, Vayd Ramkumar,

(参考訳) 医療診断における人工知能(AI)の統合は、しばしばモデル不透明さによって妨げられ、高い精度のシステムは透明な推論なしで「ブラックボックス」として機能する。この制限は、信頼性と信頼性が最重要である臨床環境において重要である。これを解決するために、医用イメージングに適した説明可能なAI手法を開発した。画像領域と周波数領域の両方にわたるMRIスキャンを解析する畳み込みニューラルネットワーク(CNN)を用いることで,一様マニフォールド近似と投影UMAPを組み込んだ新しいアプローチを導入し,潜時入力埋め込みの可視化を行う。このアプローチは、早期トレーニング効率を高めるだけでなく、追加機能がモデル予測に与える影響の理解を深め、解釈可能性を高め、より正確で直感的な診断推論をサポートする。

The integration of Artificial Intelligence (AI) in medical diagnostics is often hindered by model opacity, where high-accuracy systems function as "black boxes" without transparent reasoning. This limitation is critical in clinical settings, where trust and reliability are paramount. To address this, we have developed an explainable AI methodology tailored for medical imaging. By employing a Convolutional Neural Network (CNN) that analyzes MRI scans across both image and frequency domains, we introduce a novel approach that incorporates Uniform Manifold Approximation and Projection (UMAP) for the visualization of latent input embeddings. This approach not only enhances early training efficiency but also deepens our understanding of how additional features impact the model predictions, thereby increasing interpretability and supporting more accurate and intuitive diagnostic inferences.

翻訳日:2024-11-07 06:30:58 公開日:2024-09-20

# マルチモーダル生成事前知識を活用したポートレートビデオ編集

Portrait Video Editing Empowered by Multimodal Generative Priors ( http://arxiv.org/abs/2409.13591v1 )

ライセンス: Link先を確認

Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, Juyong Zhang,

(参考訳) マルチモーダルプロンプトを用いた一貫した表現型スタイリングを実現する強力なポートレートビデオ編集手法であるPortraitGenを紹介する。伝統的なポートレートビデオ編集手法は、しばしば3Dと時間的一貫性に悩まされ、通常、レンダリングの品質と効率性が欠如している。これらの問題に対処するため、我々はポートレートビデオフレームを動的3次元ガウス場に引き上げ、フレーム間の構造的・時間的コヒーレンスを確保する。さらに,洗練されたスタイル編集を可能にするだけでなく,100FPS以上のレンダリング速度を実現するニューラルガウステクスチャ機構を設計する。提案手法は,大規模2次元生成モデルから抽出した知識によるマルチモーダル入力を取り入れたものである。また,表情類似性指導と顔認識画像編集モジュールを内蔵し,反復的データセット更新に伴う劣化問題を効果的に軽減する。大規模な実験により, 時間的一貫性, 編集効率, レンダリング品質が向上した。提案手法の幅広い適用性は、テキスト駆動編集、画像駆動編集、リライティングなど様々なアプリケーションを通じて実証され、ビデオ編集の分野を前進させる大きな可能性を浮き彫りにしている。デモビデオとリリースされたコードは、プロジェクトページで公開されています。

We introduce PortraitGen, a powerful portrait video editing method that achieves consistent and expressive stylization with multimodal prompts. Traditional portrait video editing methods often struggle with 3D and temporal consistency, and typically fall short in rendering quality and efficiency. To address these issues, we lift the portrait video frames to a unified dynamic 3D Gaussian field, which ensures structural and temporal coherence across frames. Furthermore, we design a novel Neural Gaussian Texture mechanism that not only enables sophisticated style editing but also achieves a rendering speed of over 100 FPS. Our approach incorporates multimodal inputs through knowledge distilled from large-scale 2D generative models. Our system also incorporates expression similarity guidance and a face-aware portrait editing module, effectively mitigating degradation issues associated with iterative dataset updates. Extensive experiments demonstrate the temporal consistency, editing efficiency, and superior rendering quality of our method. The broad applicability of the proposed approach is demonstrated through various applications, including text-driven editing, image-driven editing, and relighting, highlighting its great potential to advance the field of video editing. Demo videos and released code are provided on our project page: https://ustc3dv.github.io/PortraitGen/

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# YesBut: 視覚言語モデルの風刺理解能力を評価するための高品質アノテーション付きマルチモーダルデータセット

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models ( http://arxiv.org/abs/2409.13592v1 )

ライセンス: Link先を確認

Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly,

(参考訳) 風刺やユーモアを理解することは、現在のVision-Languageモデルでも難しい課題です。本稿では,風刺画像検出(画像が風刺的かどうかを検出する),理解(画像の背景にある理由を生成する),コンプリート(画像の一方が風刺的であるような2つの選択肢から残りの半分を選択),高品質なデータセットYesBut(2547枚,風刺的1084枚,非風刺的1463枚)の課題を提示し,それらの課題を評価する。データセットの各風刺画像は、笑いや皮肉のような矛盾するシナリオとともに、通常のシナリオを描いている。視覚的QAや画像キャプションなどのマルチモーダルタスクにおける現在のビジョンランゲージモデルの成功にもかかわらず、ベンチマーク実験により、ゼロショット設定におけるYesButデータセットにおける提案されたタスクでは、自動化と人的評価の両方において、そのようなモデルが不十分であることが示されている。さらに、さらなる研究のために、119枚のリアルな風刺写真データセットをリリースする。データセットとコードはhttps://github.com/abhi1nandy2/yesbut_datasetで公開されている。

Understanding satire and humor is a challenging task even for current Vision-Language Models. In this paper, we propose the challenging tasks of Satirical Image Detection (detecting whether an image is satirical), Understanding (generating the reason behind the image being satirical), and Completion (given one half of the image, selecting the other half from 2 given options, such that the complete image is satirical) and release a high-quality dataset YesBut, consisting of 2547 images, 1084 satirical and 1463 non-satirical, containing different artistic styles, to evaluate those tasks. Each satirical image in the dataset depicts a normal scenario, along with a conflicting scenario which is funny or ironic. Despite the success of current Vision-Language Models on multimodal tasks such as Visual QA and Image Captioning, our benchmarking experiments show that such models perform poorly on the proposed tasks on the YesBut Dataset in zero-shot settings with respect to both automated and human evaluation. Additionally, we release a dataset of 119 real, satirical photographs for further research. The dataset and code are available at https://github.com/abhi1nandy2/yesbut_dataset.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# クロスターゲットスタンス検出:技術,データセット,課題の調査

Cross-Target Stance Detection: A Survey of Techniques, Datasets, and Challenges ( http://arxiv.org/abs/2409.13594v1 )

ライセンス: Link先を確認

Parisa Jamadi Khiabani, Arkaitz Zubiaga,

(参考訳) スタンス検出は、テキストで表現された視点を所定のターゲットに向けて決定するタスクである。タスク内の特定の方向は、特定のターゲットに関連するサンプルに基づいてトレーニングされたモデルが、新しい、目に見えないターゲットに適用される、クロスターゲットスタンス検出に焦点を当てる。オンラインの視点や意見の分析やマイニングの必要性が高まる中、このタスクは近年大きな関心を集めている。本稿は,過去10年間の目標間姿勢検出の進歩を概観し,基礎統計手法から現代ニューラルモデル,LLMモデルへの進化を概説する。これらの進歩は、精度と適応性に顕著な改善をもたらした。イノベーティブなアプローチには、トピックグループ化された注意とゼロショット検出のための逆学習の使用、モデルロバスト性を高める微調整技術などがある。さらに、プロンプトチューニング手法と外部知識の統合により、モデル性能はさらに改善された。これらのモデルを評価するために使用されるデータセットの包括的概要も提供され、この分野の進歩と課題に関する貴重な洞察を提供する。我々は,研究の新たな方向性を強調し,今後の課題への道筋を提案することで結論付ける。

Stance detection is the task of determining the viewpoint expressed in a text towards a given target. A specific direction within the task focuses on cross-target stance detection, where a model trained on samples pertaining to certain targets is then applied to a new, unseen target. With the increasing need to analyze and mine viewpoints and opinions online, the task has recently seen a significant surge in interest. This review paper examines the advancements in cross-target stance detection over the last decade, highlighting the evolution from basic statistical methods to contemporary neural and LLM-based models. These advancements have led to notable improvements in accuracy and adaptability. Innovative approaches include the use of topic-grouped attention and adversarial learning for zero-shot detection, as well as fine-tuning techniques that enhance model robustness. Additionally, prompt-tuning methods and the integration of external knowledge have further refined model performance. A comprehensive overview of the datasets used for evaluating these models is also provided, offering valuable insights into the progress and challenges in the field. We conclude by highlighting emerging directions of research and by suggesting avenues for future work in the task.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# 非エルミート系における断熱増幅への幾何学的寄与

Geometric contribution to adiabatic amplification in non-Hermitian systems ( http://arxiv.org/abs/2409.13595v1 )

ライセンス: Link先を確認

Tomoki Ozawa, Henning Schomerus,

(参考訳) 非エルミート量子力学の概念は、光学、古典力学、メタマテリアルデザインなど、様々な古典システムの理解と操作に有用であることが証明されている。近年, 断熱処理におけるベリー相の非エルミートアナログを実験的に測定した。非エルミート系では、ベリー相は虚部を持ち、全波の強度の増幅や減衰に寄与する。ベリー曲率の虚部が 0 であるとき、この幾何増幅係数はパラメータ空間における断熱経路の初期点と最終点によってのみ決定され、これらの点が経路によってどのように接続されるかには依存しない。我々は、この経路独立が適切な対称性によって保証される非エルミート・ハミルトン群のクラスをリストし、これらのクラスの一部について、増幅係数は初期点と最終点のピーターマン因子の観点でのみ記述できることを見出した。我々の結果は、断熱過程下での波動関数のノルムがどのように変化するかを観察することによって、ピーターマン因子を実験的に得ることができる。我々は、物理的関連性の具体例を用いて、我々の理論を検証した。

Concepts from non-Hermitian quantum mechanics have proven useful in understanding and manipulating a variety of classical systems, such as encountered in optics, classical mechanics, and metamaterial design. Recently, the non-Hermitian analog of the Berry phase for adiabatic processes has been experimentally measured. In non-Hermitian systems, the Berry phase can have an imaginary part, which contributes to the amplification or decay of the total wave intensity. When the imaginary part of the Berry curvature is zero, this geometric amplification factor is determined solely by the initial and final points of the adiabatic path in parameter space, and does not depend on how these points are connected by the path. We list classes of non-Hermitian Hamiltonians where this path independence is guaranteed by suitable symmetries, and find that, for some of these classes, the amplification factor can be written only in terms of the Petermann factors of the initial and final points. Our result can, in turn, be used to experimentally obtain the Petermann factor by observing how the norm of the wavefunction changes under adiabatic processes. We validate our theory using a couple of concrete examples of physical relevance.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20
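
The Petermann factor mentioned in the abstract above has a standard definition in terms of left and right eigenvectors, K_n = <L_n|L_n><R_n|R_n> / |<L_n|R_n>|^2, which can be evaluated numerically. The following is a minimal NumPy sketch of that definition only. It does not reproduce the paper's adiabatic amplification result, and the two-mode gain/loss matrix used as an example is a hypothetical choice made for the illustration.

```python
import numpy as np


def petermann_factors(hamiltonian):
    """Petermann factor K_n = <L_n|L_n><R_n|R_n> / |<L_n|R_n>|^2 for each eigenmode."""
    eigenvalues, right = np.linalg.eig(hamiltonian)  # columns are right eigenvectors
    # Rows of the inverse are left eigenvectors, normalised so <L_n|R_m> = delta_nm.
    left_rows = np.linalg.inv(right)
    right_norms = np.sum(np.abs(right) ** 2, axis=0)      # <R_n|R_n>
    left_norms = np.sum(np.abs(left_rows) ** 2, axis=1)   # <L_n|L_n>
    # With this biorthogonal normalisation the denominator |<L_n|R_n>|^2 equals 1.
    return eigenvalues, left_norms * right_norms


gamma = 0.5
# Hypothetical example: two coupled modes with gain/loss rate gamma and unit coupling.
pt_dimer = np.array([[1j * gamma, 1.0], [1.0, -1j * gamma]])
_, k_pt = petermann_factors(pt_dimer)

# A Hermitian matrix has orthogonal eigenvectors, so every factor equals 1.
_, k_hermitian = petermann_factors(np.array([[0.0, 1.0], [1.0, 0.0]]))
```

For this particular two-mode matrix with gamma below 1, both modes have K = 1 / (1 - gamma^2), so the factor grows without bound as the exceptional point at gamma = 1 is approached. The matrix inverse in the sketch fails there because the eigenvectors coalesce.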

# Prithvi WxC:気象と気候の基盤モデル

Prithvi WxC: Foundation Model for Weather and Climate ( http://arxiv.org/abs/2409.13598v1 )

ライセンス: Link先を確認

Johannes Schmude, Sujit Roy, Will Trojak, Johannes Jakubik, Daniel Salles Civitarese, Shraddha Singh, Julian Kuehnert, Kumar Ankur, Aman Gupta, Christopher E Phillips, Romeo Kienzler, Daniela Szwarcman, Vishal Gaur, Rajat Shinde, Rohit Lal, Arlindo Da Silva, Jorge Luis Guevara Diaz, Anne Jones, Simon Pfreundschuh, Amy Lin, Aditi Sheshadri, Udaysankar Nair, Valentine Anantharaj, Hendrik Hamann, Campbell Watson, Manil Maskey, Tsengdar J Lee, Juan Bernabe Moreno, Rahul Ramachandran,

(参考訳) AIエミュレータがHPCシステムで動作する従来の数値天気予報モデルのパフォーマンスに匹敵しうるという認識をきっかけに、予測、ダウンスケーリング、ナウキャスティングといったユースケースに対処する大規模なAIモデルが増えている。 AI文献における並行した開発は、複数の異なるユースケースに対応するために効果的に調整可能な基盤モデルに焦点を当てているが、気象と気候に関する開発は、主に中期予測に特に重点を置いて、単一のユースケースに焦点を当てている。このギャップを埋めるために、23億パラメータの基盤モデルであるPrithvi WxCを導入する。これは、Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2)から160変数を用いて開発された。 Prithvi WxCはエンコーダ-デコーダベースのアーキテクチャを採用し、様々なトランスフォーマーモデルの概念を取り入れて、入力データにおける地域的およびグローバルな依存関係を効果的にキャプチャする。このモデルは、異なるトポロジーの気象現象を微細な解像度でモデル化するために、大きなトークン数に対応できるように設計されている。さらに,マスク付き再構成と予測のパラダイムを組み合わせた混合目的で訓練を行った。本稿では, 自己回帰ロールアウト予測, ダウンスケーリング, 重力波フラックスパラメータ化, 極端事象推定など, 困難な下流タスクのセットでモデルを検証する。 23億のパラメータを持つ事前トレーニングされたモデルは、関連する微調整ワークフローとともに、Hugging Faceを通じてオープンソースコントリビューションとして公開された。

Triggered by the realization that AI emulators can rival the performance of traditional numerical weather prediction models running on HPC systems, there is now an increasing number of large AI models that address use cases such as forecasting, downscaling, or nowcasting. While the parallel developments in the AI literature focus on foundation models -- models that can be effectively tuned to address multiple, different use cases -- the developments on the weather and climate side largely focus on single-use cases with particular emphasis on mid-range forecasting. We close this gap by introducing Prithvi WxC, a 2.3 billion parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). Prithvi WxC employs an encoder-decoder-based architecture, incorporating concepts from various recent transformer models to effectively capture both regional and global dependencies in the input data. The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions. Furthermore, it is trained with a mixed objective that combines the paradigms of masked reconstruction with forecasting. We test the model on a set of challenging downstream tasks namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation. The pretrained model with 2.3 billion parameters, along with the associated fine-tuning workflows, has been publicly released as an open-source contribution via Hugging Face.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# MeLIAD:Metric LearningとEntropy-based Scoringを用いた解釈可能なFew-Shot異常検出

MeLIAD: Interpretable Few-Shot Anomaly Detection with Metric Learning and Entropy-based Scoring ( http://arxiv.org/abs/2409.13602v1 )

ライセンス: Link先を確認

Eirini Cholopoulou, Dimitris K. Iakovidis,

(参考訳) 異常検出(AD)は、欠陥製品の検出や品質検査の自動化を行うマルチメディアアプリケーションにおいて重要な役割を果たす。ディープラーニング(DL)モデルは通常、大規模なアノテーション付きデータを必要とするが、異常は通常まれであるため、そのデータはしばしば著しく不均衡である。また、これらのモデルのブラックボックス性は、ユーザーからの信頼を妨げている。これらの課題に対処するために、解釈可能な異常検出のための新しい手法であるMeLIADを提案する。MeLIADは従来の手法と異なり、メトリック学習に基づいており、真の異常に関する事前の分布仮定に頼ることなく、設計段階から解釈可能性を実現する。MeLIADは、拡張手法を用いることなく、訓練に少数の異常サンプルしか必要とせず、本質的に解釈可能であり、画像がなぜ異常と判定されたかについての洞察を与える可視化を提供する。これは、異常なインスタンスの識別と位置特定のための新しい訓練可能なエントロピーベースのスコアリングコンポーネントと、この異常スコアリングコンポーネントをメトリック学習の目的関数と同時に最適化する新しい損失関数を導入することで達成される。解釈可能性の定量的および定性的評価を含む5つの公開ベンチマークデータセットでの実験により、MeLIADが最先端の手法と比較して異常検出および位置特定の性能を向上させることが示された。

Anomaly detection (AD) plays a pivotal role in multimedia applications for detecting defective products and automating quality inspection. Deep learning (DL) models typically require large-scale annotated data, which are often highly imbalanced since anomalies are usually scarce. The black box nature of these models prohibits them from being trusted by users. To address these challenges, we propose MeLIAD, a novel methodology for interpretable anomaly detection, which unlike the previous methods is based on metric learning and achieves interpretability by design without relying on any prior distribution assumptions of true anomalies. MeLIAD requires only a few samples of anomalies for training, without employing any augmentation techniques, and is inherently interpretable, providing visualizations that offer insights into why an image is identified as anomalous. This is achieved by introducing a novel trainable entropy-based scoring component for the identification and localization of anomalous instances, and a novel loss function that jointly optimizes the anomaly scoring component with a metric learning objective. Experiments on five public benchmark datasets, including quantitative and qualitative evaluation of interpretability, demonstrate that MeLIAD achieves improved anomaly detection and localization performance compared to state-of-the-art methods.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# 時間発展局所作用素における行列要素のパウリ重量要件-平衡温度を超える依存性

Pauli weight requirement of the matrix elements in time-evolved local operators: dependence beyond the equilibration temperature ( http://arxiv.org/abs/2409.13603v1 )

ライセンス: Link先を確認

Carlos Ramos-Marimón, Stefano Carignano, Luca Tagliacozzo,

(参考訳) ハイゼンベルク描像における局所演算子の非平衡時間発展をシミュレートする複雑さは、演算子エンタングルメントによって支配される。これは一般的な非可積分系では時間に対して線形に成長し、計算資源の指数的な増加をもたらす。この課題を単純化する有望なアプローチは、Rakovszkiらによって提案されたように、演算子の一部を破棄し、「軽い」パウリ弦(パウリ行列を少数しか含まない弦)が張る部分空間に焦点を当てることである。本研究では、この戦略が一様な直積状態から始まるクエンチに適用できるかどうかを検討する。エルゴード的な力学では、これらの初期状態によって幅広い平衡化温度にアクセスできる。所望の行列要素に注目し、初期状態に平行なパウリ弦を含む演算子の部分のみを保持すると、複雑な状況が明らかになる。軽いパウリ弦だけで力学を十分に記述でき、現在のアルゴリズムで効率的にシミュレーションできる場合もある。しかし、より重い弦が必要となり、計算コストが現在の能力を超える場合もある。我々は、新たに導入した複雑さの尺度である演算子重みエントロピー(Operator Weight Entropy)を用いてこの振る舞いを解析し、ブロッホ球面上のほとんどの点にわたり、異なる演算子についてこれを計算する。

The complexity of simulating the out-of-equilibrium evolution of local operators in the Heisenberg picture is governed by the operator entanglement, which grows linearly in time for generic non-integrable systems, leading to an exponential increase in computational resources. A promising approach to simplify this challenge involves discarding parts of the operator and focusing on a subspace formed by "light" Pauli strings - strings with few Pauli matrices - as proposed by Rakovszki et al. [PRB 105, 075131 (2022)]. In this work, we investigate whether this strategy can be applied to quenches starting from homogeneous product states. For ergodic dynamics, these initial states grant access to a wide range of equilibration temperatures. By concentrating on the desired matrix elements and retaining only the portion of the operator that contains Pauli strings parallel to the initial state, we uncover a complex scenario. In some cases, the light Pauli strings suffice to describe the dynamics, enabling efficient simulation with current algorithms. However, in other cases, heavier strings become necessary, pushing computational demands beyond our current capabilities. We analyze this behavior using a newly introduced measure of complexity, the Operator Weight Entropy, which we compute for different operators across most points on the Bloch sphere.
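
As a rough illustration of the "light" Pauli string idea in the abstract, the following Python sketch counts the weight of a Pauli string and truncates an operator to strings below a weight cutoff. The dictionary representation, the function names, and the `max_weight` parameter are assumptions made for this example and are not taken from the paper; the paper's Operator Weight Entropy is not reproduced here.

```python
from collections import defaultdict
from typing import Dict


def pauli_weight(pauli_string: str) -> int:
    """Count the non-identity factors in a Pauli string such as 'IXZIY'."""
    return sum(1 for p in pauli_string if p != "I")


def truncate_to_light_strings(
    operator: Dict[str, complex], max_weight: int
) -> Dict[str, complex]:
    """Keep only the terms whose Pauli weight is at most max_weight.

    `operator` maps Pauli strings to coefficients (hypothetical layout).
    """
    return {s: c for s, c in operator.items() if pauli_weight(s) <= max_weight}


def weight_distribution(operator: Dict[str, complex]) -> Dict[int, float]:
    """Fraction of the squared-coefficient norm carried by each weight."""
    total = sum(abs(c) ** 2 for c in operator.values())
    dist: Dict[int, float] = defaultdict(float)
    for s, c in operator.items():
        dist[pauli_weight(s)] += abs(c) ** 2 / total
    return dict(dist)


# Toy operator on three sites: one light string and one heavy string.
example = {"XII": 0.6, "XYZ": 0.8}
light_part = truncate_to_light_strings(example, max_weight=1)
```

In this toy case the truncation keeps only the weight-1 term, and `weight_distribution(example)` shows how much of the operator norm the discarded heavy string carried.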

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# 自閉症スペクトラム障害のための子どもを包摂した臨床ビデオ理解に向けて

Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder ( http://arxiv.org/abs/2409.13606v1 )

ライセンス: Link先を確認

Aditya Kommineni, Digbalay Bose, Tiantian Feng, So Hyun Kim, Helen Tager-Flusberg, Somer Bishop, Catherine Lord, Sudarsana Kadiri, Shrikanth Narayanan,

(参考訳) 自閉症スペクトラム障害(Autism Spectrum Disorder)の文脈における臨床ビデオは、多くの場合、子どもと養育者・臨床専門家との間の長時間のやり取りであり、複雑な言語的・非言語的行動を含んでいる。これらのビデオを客観的に分析することで、自閉症スペクトラム障害児の行動に関する詳細な洞察を臨床医や研究者に提供できる可能性がある。これらのビデオを手作業でコーディングするのは時間のかかる作業であり、高度な専門知識を必要とする。したがって、これらのやり取りを計算的に捉える能力は、手作業を補強し、診断手順を支援しうる。本研究では、音声、ビデオ、テキストの3つのモダリティにわたる基盤モデルを用いて、子どもを中心としたインタラクション・セッションの分析を検討する。大規模言語モデルを推論エージェントとして用いることで、複数のモダリティを統合する統一的な手法を提案する。情報粒度の異なる2つのタスク、すなわち活動認識と異常行動検出において、その性能を評価する。提案したマルチモーダルパイプラインは、モダリティ固有の制約に対する頑健性を備え、単一モダリティの設定と比べて臨床ビデオ解析の性能を向上させることがわかった。

Clinical videos in the context of Autism Spectrum Disorder are often long-form interactions between children and caregivers/clinical professionals, encompassing complex verbal and non-verbal behaviors. Objective analyses of these videos could provide clinicians and researchers with nuanced insights into the behavior of children with Autism Spectrum Disorder. Manually coding these videos is a time-consuming task and requires a high level of domain expertise. Hence, the ability to capture these interactions computationally can augment the manual effort and enable supporting the diagnostic procedure. In this work, we investigate the use of foundation models across three modalities: speech, video, and text, to analyse child-focused interaction sessions. We propose a unified methodology to combine multiple modalities by using large language models as reasoning agents. We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection. We find that the proposed multimodal pipeline provides robustness to modality-specific limitations and improves performance on the clinical video analysis compared to unimodal settings.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# FIHA:Davidson Scene Graphsを用いた視覚言語モデルの自律幻覚評価

FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs ( http://arxiv.org/abs/2409.13612v1 )

ライセンス: Link先を確認

Bowen Yan, Zhengsong Zhang, Liqiang Jing, Eftekhar Hossain, Xinya Du,

(参考訳) LVLM(Large Vision-Language Models)の急速な発展には、しばしば広範な幻覚の問題が伴い、コスト効率が高く包括的な評価がますます重要になっている。現在のアプローチは主にコストのかかるアノテーションに依存しており、関係、属性、側面間の依存関係といったすべての側面を評価するという点で包括的ではない。そこで、FIHA (autonomous Fine-graIned Hallucination evAluation in LVLMs) を導入する。FIHAは、LLMフリーかつアノテーションフリーな方法でLVLMの幻覚を評価でき、異なる種類の幻覚間の依存関係をモデル化する。FIHAは任意の画像データセット上で最小限のコストでQ&Aペアを生成でき、画像とキャプションの両方からの幻覚評価を可能にする。この手法に基づき、MSCOCOとFoggyの様々な画像に対する多様な質問からなるFIHA-v1というベンチマークを導入する。さらに、Davidson Scene Graph(DSG)を用いてQ&Aペア間の構造を整理し、評価の信頼性を高める。FIHA-v1を用いて代表的なモデルを評価し、その限界と課題を明らかにする。コードとデータは公開済みである。

The rapid development of Large Vision-Language Models (LVLMs) often comes with widespread hallucination issues, making cost-effective and comprehensive assessments increasingly vital. Current approaches mainly rely on costly annotations and are not comprehensive in terms of evaluating all aspects, such as relations, attributes, and dependencies between aspects. Therefore, we introduce FIHA (autonomous Fine-graIned Hallucination evAluation in LVLMs), which can assess hallucination in LVLMs in an LLM-free and annotation-free way and model the dependency between different types of hallucinations. FIHA can generate Q&A pairs on any image dataset at minimal cost, enabling hallucination assessment from both image and caption. Based on this approach, we introduce a benchmark called FIHA-v1, which consists of diverse questions on various images from MSCOCO and Foggy. Furthermore, we use the Davidson Scene Graph (DSG) to organize the structure among Q&A pairs, which increases the reliability of the evaluation. We evaluate representative models using FIHA-v1, highlighting their limitations and challenges. We have released our code and data.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# pAE:ヒト視覚系におけるフィードフォワードとフィードバックストリームの統合による外側膝状体のモデリングのための効率的なオートエンコーダアーキテクチャ

pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System ( http://arxiv.org/abs/2409.13622v1 )

ライセンス: Link先を確認

Moslem Gorji, Amin Ranjbar, Mohammad Bagher Menhaj,

(参考訳) 視覚野は脳の重要な部位であり、物体を階層的に識別する役割を担っている。ボトムアップとトップダウンの両経路で視覚情報を処理する際、視覚野の前段の領域としての外側膝状体(LGN)の役割を理解することが重要である。視覚刺激は網膜に達すると、初期処理のためにLGN領域に伝達され、その後さらなる処理のために視覚野に送られる。本研究では、人間の視覚情報処理をよく近似する深層畳み込みモデルを提案する。pruned autoencoder(pAE)アーキテクチャに基づいて設計した、学習済みの浅い畳み込みモデルを用いて、LGN領域の機能を近似することを目指す。pAEモデルは、V1領域との間のフィードフォワードおよびフィードバックのストリームを問題に統合しようとするものである。このモデリングフレームワークは、視覚刺激データセットの時間的および非時間的なデータ入力モードの両方を扱う。このデータセットは固定カメラが連続フレームで撮影した自然画像からなり、動物(動作中)を含む画像と動物を含まない画像の2つのカテゴリを持つ。次に、提案する深層調整モデルの結果を、ガボールおよび双直交ウェーブレット関数を用いたウェーブレットフィルタバンク法と比較する。実験の結果、深層調整モデルに基づく提案手法は、人間のベンチマークと高い類似性を持つ結果を得るだけでなく、他のモデルよりも有意に優れた性能を示すことがわかった。pAEモデルは最終的に99.26%の予測性能を達成し、時間モードにおいて人間の結果を約28%上回る顕著な改善を示す。

The visual cortex is a vital part of the brain, responsible for hierarchically identifying objects. Understanding the role of the lateral geniculate nucleus (LGN) as a prior region of the visual cortex is crucial when processing visual information in both bottom-up and top-down pathways. When visual stimuli reach the retina, they are transmitted to the LGN area for initial processing before being sent to the visual cortex for further processing. In this study, we introduce a deep convolutional model that closely approximates human visual information processing. We aim to approximate the function for the LGN area using a trained shallow convolutional model which is designed based on a pruned autoencoder (pAE) architecture. The pAE model attempts to integrate feed forward and feedback streams from/to the V1 area into the problem. This modeling framework encompasses both temporal and non-temporal data feeding modes of the visual stimuli dataset containing natural images captured by a fixed camera in consecutive frames, featuring two categories: images with animals (in motion), and images without animals. Subsequently, we compare the results of our proposed deep-tuned model with wavelet filter bank methods employing Gabor and biorthogonal wavelet functions. Our experiments reveal that the proposed method based on the deep-tuned model not only achieves results with high similarity in comparison with human benchmarks but also performs significantly better than other models. The pAE model achieves the final 99.26% prediction performance and demonstrates a notable improvement of around 28% over human results in the temporal mode.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# 準長距離非エルミートスキンモードに由来する超スペクトル感度と非局所的二不純物束縛状態

Ultra spectral sensitivity and non-local bi-impurity bound states from quasi-long-range non-hermitian skin modes ( http://arxiv.org/abs/2409.13623v1 )

ライセンス: Link先を確認

Chang Shu, Kai Zhang, Kai Sun,

(参考訳) 量子力学の基本原則の一つは、量子系のエネルギースペクトルが、無限小に弱く空間的に局在した摂動に対して安定であり続けることである。本稿では、このスペクトル安定性の原理が、熱力学極限において非エルミート系では成り立たないことを示す。例えば、相互作用のない非エルミート系 $H_0$ に点状の不純物がいくつかあり、それぞれが局所的な短距離ポテンシャル $V_i$ を導入する場合を考える。ここで $i=1, \ldots, n$ は不純物のラベルである。不純物ポテンシャルが十分に弱ければ、単一の不純物を導入してもスペクトルは変化しない。すなわち、$H_0$ と $H_0 + V_1$ はほぼ同一のエネルギースペクトルを持つ。しかし、2つ目の不純物を導入して $H_0 + V_1 + V_2$ とすると、これらの局所ポテンシャルがどれほど弱くても、不純物間の距離が十分に大きい限り、エネルギースペクトルに著しい変化が生じうる。これは安定なスペクトルという従来の期待と直接矛盾する。注目すべきことに、この現象は非局所的であり、摂動の影響は2つの不純物間の距離とともに指数関数的に増大する。言い換えれば、ハミルトニアンは完全に局所的であるにもかかわらず、そのエネルギースペクトルは、単一の無限小に弱い不純物の存在は検知しない一方で、空間的に大きく離れた2つの無限小に弱い不純物の存在を検出できる。グリーン関数の手法を用いて、このスペクトル感度の起源を明らかにする。これは非局所的な二不純物束縛状態、すなわち2つの不純物の間を波束が往復伝播する非局所的な定常状態の形成に起因する。このようなスペクトル不安定性を同定し特徴づける解析理論を与え、数値解と完全に一致することを示す。

A fundamental tenet of quantum mechanics is that the energy spectrum of a quantum system shall remain stable against infinitesimally weak and spatially confined perturbations. In this article, we demonstrate that this principle of spectral stability fails in non-Hermitian systems at the thermodynamic limit. Consider, for instance, a non-interacting non-Hermitian system $H_0$ with a couple of point-like impurities, each of which introduces a local short-range potential $V_i$ with $i=1, \ldots, n$ labeling the impurities. If the impurity potentials are sufficiently weak, introducing a single impurity will not alter the spectrum; that is, $H_0$ and $H_0 + V_1$ have nearly identical energy spectra. However, if a second impurity is introduced, $H_0 + V_1 + V_2$, we find that no matter how weak these local potentials are, as long as the distance between them is sufficiently large, significant alterations in the energy spectrum can arise, directly contradicting the traditional expectation of a stable spectrum. Remarkably, this phenomenon is non-local, and the impact of the perturbations increases exponentially with the distance between the two impurities. In other words, although the Hamiltonian is entirely local, its energy spectrum, which is blind to the presence of a single infinitesimally weak impurity, is capable of detecting the presence of two infinitesimally weak impurities separated by a large distance in space. Using Green's function techniques, we uncover the origin of this spectral sensitivity, which arises from the formation of non-local bi-impurity bound states: non-local stationary states with wavepackets propagating back-and-forth between the two impurities. We provide an analytic theory to identify and characterize such spectral instabilities, showing perfect agreement with numerical solutions.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# GSConvモジュールとECAアテンション機構に基づくUnet脳腫瘍画像のセグメンテーションの改善

Improved Unet brain tumor image segmentation based on GSConv module and ECA attention mechanism ( http://arxiv.org/abs/2409.13626v1 )

ライセンス: Link先を確認

Qiyuan Tian, Zhuoyue Wang, Xiaoling Cui,

(参考訳) U-Netアーキテクチャに基づく深層学習アルゴリズムである、脳腫瘍の医用画像セグメンテーションの改良モデルについて述べる。従来のU-Netをもとに、GSConvモジュールとECAアテンション機構を導入し、医用画像セグメンテーションタスクにおけるモデルの性能を向上させる。これらの改良により、新しいU-Netモデルは、重要なチャネルに柔軟に注目しながら、マルチスケール特徴をより効率的に抽出・活用できるようになり、セグメンテーション結果が大幅に改善される。実験では、改良したU-Netモデルを訓練し、体系的に評価する。訓練セットとテストセットの損失曲線を見ると、両者の損失値は8エポック目以降に最低点まで急速に低下し、その後徐々に収束して安定することがわかる。これは、モデルが優れた学習能力と汎化能力を持つことを示している。さらに、mIoU(mean Intersection over Union)の変化を追跡すると、35エポック目以降にmIoUは徐々に0.8に近づいて安定し、モデルの有効性をさらに裏付けている。従来のU-Netと比較して、GSConvモジュールとECAアテンション機構に基づく改良版は、セグメンテーション性能において明確な優位性を示す。特に脳腫瘍画像の境界部分の処理において、改良モデルはより正確なセグメンテーション結果を与える。この成果は、医用画像解析の精度を向上させるだけでなく、臨床診断に対してより信頼性の高い技術的支援を提供する。

We discuss an improved model for medical image segmentation of brain tumors, a deep learning algorithm based on the U-Net architecture. Building on the traditional U-Net, we introduce the GSConv module and the ECA attention mechanism to improve the performance of the model in medical image segmentation tasks. With these improvements, the new U-Net model is able to extract and utilize multi-scale features more efficiently while flexibly focusing on important channels, resulting in significantly improved segmentation results. During the experiment, the improved U-Net model is trained and evaluated systematically. By looking at the loss curves of the training set and the test set, we find that the loss values of both rapidly decline to the lowest point after the eighth epoch, and then gradually converge and stabilize. This shows that our model has good learning ability and generalization ability. In addition, by monitoring the change in the mean intersection over union (mIoU), we can see that after the 35th epoch, the mIoU gradually approaches 0.8 and remains stable, which further validates the model. Compared with the traditional U-Net, the improved version based on the GSConv module and the ECA attention mechanism shows obvious advantages in segmentation quality. Especially in the processing of brain tumor image edges, the improved model can provide more accurate segmentation results. This achievement not only improves the accuracy of medical image analysis, but also provides more reliable technical support for clinical diagnosis.
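
For readers unfamiliar with the mIoU metric tracked in the abstract, here is a minimal Python sketch of how it is commonly computed for label masks. The function name, the flattened-list input format, and the choice to skip classes absent from both masks are assumptions for illustration; the paper's exact evaluation protocol is not specified in the abstract.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union across classes for flattened label masks.

    Classes that appear in neither mask are skipped (an assumption of this
    sketch; other conventions exist).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious: List[float] = []
    for c in range(num_classes):
        # Pixels where both masks carry class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one mask carries class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x2 masks, flattened row by row: 0 = background, 1 = tumor.
predicted = [0, 1, 1, 1]
reference = [0, 1, 0, 1]
score = mean_iou(predicted, reference, num_classes=2)
```

In the toy masks the background IoU is 1/2 and the tumor IoU is 2/3, so the mean falls between the two.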

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# Beauty Beyond Words: 成分ベースの製品属性を用いた説明可能な美容製品レコメンデーション

Beauty Beyond Words: Explainable Beauty Product Recommendations Using Ingredient-Based Product Attributes ( http://arxiv.org/abs/2409.13628v1 )

ライセンス: Link先を確認

Siliang Liu, Rahul Suresh, Amin Banitalebi-Dehkordi,

(参考訳) 正確な属性抽出は、美容製品のレコメンデーションと顧客との信頼構築に不可欠である。既存のソリューションはしばしば信頼性が低く不完全であるため、これは未解決の問題のままである。本稿では、美容製品の成分に基づくエンドツーエンドの教師あり学習を用いて、美容に特有の属性を抽出するシステムを提案する。本システムの鍵となるのは、新しいエネルギーベースの暗黙的モデルアーキテクチャである。この暗黙的モデルアーキテクチャが、精度、説明可能性、頑健性、柔軟性の面で大きな利点をもたらすことを示す。さらに、この暗黙的モデルは、追加の属性が利用可能になった際にそれらを組み込むよう容易に微調整できるため、実世界のアプリケーションでより有用である。大手eコマースのスキンケア製品カタログデータセット上でモデルを検証し、その有効性を実証する。最後に、成分ベースの属性抽出が美容レコメンデーションの説明可能性の向上にどのように寄与するかを示す。

Accurate attribute extraction is critical for beauty product recommendations and building trust with customers. This remains an open problem, as existing solutions are often unreliable and incomplete. We present a system to extract beauty-specific attributes using end-to-end supervised learning based on beauty product ingredients. A key insight to our system is a novel energy-based implicit model architecture. We show that this implicit model architecture offers significant benefits in terms of accuracy, explainability, robustness, and flexibility. Furthermore, our implicit model can be easily fine-tuned to incorporate additional attributes as they become available, making it more useful in real-world applications. We validate our model on a major e-commerce skincare product catalog dataset and demonstrate its effectiveness. Finally, we showcase how ingredient-based attribute extraction contributes to enhancing the explainability of beauty recommendations.

翻訳日:2024-11-07 06:19:44 公開日:2024-09-20

# 一様TC$^0$に含まれるトランスフォーマー

Transformers in Uniform TC$^0$ ( http://arxiv.org/abs/2409.13629v1 )

ライセンス: Link先を確認

David Chiang,

(参考訳) 先行研究は、average-hard attention transformer(AHAT)およびsoftmax-attention transformer(SMAT)が認識する言語が、回路計算量クラスTC$^0$に含まれることを示してきた。しかし、これらの結果は有限精度の算術を仮定している。すなわち、O(log n)ビット(nは入力文字列の長さ)の浮動小数点数を用いて、StroblはAHATがL-uniform TC$^0$で近似できることを示し、MerrillとSabharwalはSMATがDLOGTIME-uniform TC$^0$で近似できることを示した。本稿ではこれらの結果を改善し、近似なしのAHAT、浮動小数点精度がO(poly(n))ビットのSMAT、および絶対誤差が高々$2^{-O(poly(n))}$のSMATが、いずれもDLOGTIME-uniform TC$^0$に含まれることを示す。

Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(poly(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.
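
To make the term "average-hard attention" concrete, the sketch below shows the usual definition for a single query: attention mass is split uniformly over the positions with the maximal score, and the output is the average of their value vectors. The function name and the plain-list representation are assumptions for illustration, and the sketch uses ordinary Python floats rather than the precision regimes analyzed in the paper.

```python
from typing import List, Sequence


def average_hard_attention(
    scores: Sequence[float], values: Sequence[Sequence[float]]
) -> List[float]:
    """Average the value vectors at all positions that tie for the top score."""
    if len(scores) != len(values) or not scores:
        raise ValueError("scores and values must be non-empty and equally long")
    top = max(scores)
    # Every position achieving the maximum receives equal attention weight.
    winners = [j for j, s in enumerate(scores) if s == top]
    dim = len(values[0])
    return [sum(values[j][k] for j in winners) / len(winners) for k in range(dim)]


# Positions 1 and 2 tie for the maximum score, so their values are averaged.
output = average_hard_attention([1.0, 3.0, 3.0], [[1.0, 0.0], [0.0, 2.0], [4.0, 2.0]])
```

Softmax attention would instead give every position a nonzero weight, which is why the two transformer variants are analyzed separately in the abstract.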